Module 4b: Selecting rows and columns in pandas

Module 4b: Selecting rows and columns in pandas#

In this section you will learn how:

to select specific rows and columns in a dataframe,
to fill a dataframe with values.

Selecting rows and columns in a pandas dataframe follows different rules from accessing elements in NumPy arrays or Python lists. You can do that either by specifying their labels or indices.

import pandas as pd
import numpy as np

# load the dataframe from the previous section, method is further explained in module 4d
transposed_dict_df = pd.read_csv("transposed_dict_df-4a.csv")

Selection by labels - columns#

This is one of the main ways to select a single column in a Pandas dataframe by its name: referring to the label as a string character in brackets:

# select a single column by name 
transposed_dict_df["height"]

  175.00
    1.79
    1.75
    1.63
    1.68
    1.59
    1.90
    1.71
    1.66
    1.78
Name: height, dtype: float64

Now if you want to select multiple columns, you need to put their names in a Python list, that then goes into the dataframe square brackets:

# select multiple columns by name
transposed_dict_df[["height", "weight"]]

	height	weight
0	175.00	70.9
1	1.79	103.4
2	1.75	85.1
3	1.63	55.9
4	1.68	50.0
5	1.59	64.0
6	1.90	92.9
7	1.71	75.4
8	1.66	90.7
9	1.78	82.3

Mini-exercise 1

Give names to the rows in transposed_dict_df as they used to be - i.e. row 0 gets label “participant1” etc. Don’t type participant labels manually, automate it.

# your code goes here

💡 Solution

One of the possible ways to achieve this result:

participant_labels = []
n_rows = transposed_dict_df.shape[0]
for i in range(0, n_rows):
    label = "participant"+str(i+1)
    participant_labels.append(label)
transposed_dict_df.index = participant_labels

Selection by labels - rows#

Let us reuse the dataframe that you updated in the exercise above. To select a row by its label, you need to use .loc:

# selecting a single row
print("Single row selection")
print(transposed_dict_df.loc["participant1"])
print("------")
print("Multiple row selection")
# selecting multiple rows - you need to put the labels in a list 
# like in multiple column selection previously
transposed_dict_df.loc[["participant1", "participant8"]]

Single row selection

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[6], line 3
      1 # selecting a single row
      2 print("Single row selection")
----> 3 print(transposed_dict_df.loc["participant1"])
      4 print("------")
      5 print("Multiple row selection")

File ~\AppData\Local\Programs\Python\Python313\Lib\site-packages\pandas\core\indexing.py:1191, in _LocationIndexer.__getitem__(self, key)
   1189 maybe_callable = com.apply_if_callable(key, self.obj)
   1190 maybe_callable = self._check_deprecated_callable_usage(key, maybe_callable)
-> 1191 return self._getitem_axis(maybe_callable, axis=axis)

File ~\AppData\Local\Programs\Python\Python313\Lib\site-packages\pandas\core\indexing.py:1431, in _LocIndexer._getitem_axis(self, key, axis)
   1429 # fall thru to straight lookup
   1430 self._validate_key(key, axis)
-> 1431 return self._get_label(key, axis=axis)

File ~\AppData\Local\Programs\Python\Python313\Lib\site-packages\pandas\core\indexing.py:1381, in _LocIndexer._get_label(self, label, axis)
   1379 def _get_label(self, label, axis: AxisInt):
   1380     # GH#5567 this will fail if the label is not present in the axis.
-> 1381     return self.obj.xs(label, axis=axis)

File ~\AppData\Local\Programs\Python\Python313\Lib\site-packages\pandas\core\generic.py:4301, in NDFrame.xs(self, key, axis, level, drop_level)
   4299             new_index = index[loc]
   4300 else:
-> 4301     loc = index.get_loc(key)
   4303     if isinstance(loc, np.ndarray):
   4304         if loc.dtype == np.bool_:

File ~\AppData\Local\Programs\Python\Python313\Lib\site-packages\pandas\core\indexes\range.py:417, in RangeIndex.get_loc(self, key)
    415         raise KeyError(key) from err
    416 if isinstance(key, Hashable):
--> 417     raise KeyError(key)
    418 self._check_indexing_error(key)
    419 raise KeyError(key)

KeyError: 'participant1'

Selecting both rows and columns by label#

If you want to select rows and columns by label together, you need to specify rows with .loc first and then in the next brackets list desired columns.

# single row and single column
print(transposed_dict_df.loc["participant8"]["weight"])

75.4

Mini-exercise 2

Select smoking and diseases data for participants 5 and 9 in one line using labels.

# your code goes here

💡 Solution

One of the possible ways to achieve this result:

print(transposed_dict_df.loc[["participant5", "participant9"]][["smoking", "diseases"]])

Selecting values by their integer position#

If you don’t want to use labels, row and column selection can be done with indices - you need to use .iloc[.., ..] for this. For example, to access diseases (column 5) present in participant 3 (row 2), you need to type transposed_dict_df.iloc[2, 5]. Pandas dataframes also use 0-based indexing, slicing rules are very similar to NumPy arrays.

# selecting a single value
print("Single value")
print(transposed_dict_df.iloc[2, 5])
print("----")
print("Range")
# select a range (the end position 7, 5 are not included in the selected range)
print(transposed_dict_df.iloc[2:7, 3:5])
print("----")
print("Specific values")
# selecting multiple specific values - provide indexes in a list
print(transposed_dict_df.iloc[[0, 5, 7], [0, 1, 2]])
print("----")
print("Entire row")
# select the entire row
print(transposed_dict_df.iloc[2, :])
print("----")
print("Entire column")
# select the entire column
print(transposed_dict_df.iloc[:, 0])

Single value
('Lung cancer',)
----
Range
              weight  smoking
participant3    85.1     True
participant4    55.9    False
participant5    50.0    False
participant6    64.0     True
participant7    92.9    False
----
Specific values
              gender  age  height
participant1  female   25  175.00
participant6  female   32    1.59
participant8    male   46    1.71
----
Entire row
gender                  male
age                       18
height                  1.75
weight                  85.1
smoking                 True
diseases    ('Lung cancer',)
Name: participant3, dtype: object
----
Entire column
participant1     female
participant2       male
participant3       male
participant4     female
participant5     female
participant6     female
participant7       male
participant8       male
participant9     female
participant10      male
Name: gender, dtype: object

Filling out values in a datafame#

Sometimes you might want to change values in a dataframe. Here are some examples for how to do it for a single value or larger data structures:

transposed_dict_df.iloc[[0, 5, 7], [0, 1, 2]] = np.nan # insert a null value

transposed_dict_df.iloc[[0, 5, 7], [0, 1, 2]]

	gender	age	height
participant1	NaN	NaN	NaN
participant6	NaN	NaN	NaN
participant8	NaN	NaN	NaN

# can fill out the range of values with a numpy array, Python lists or a Pandas dataframe

revised_weight_height = np.array([[1.88, 1.65, 1.72], [50, 73, 87]])

transposed_dict_df.iloc[[1, 2, 3], [2, 3]] = revised_weight_height.T

transposed_dict_df.iloc[[1, 2, 3], [2, 3]]

	height	weight
participant2	1.88	50.0
participant3	1.65	73.0
participant4	1.72	87.0

Mini-exercise 3

Participant 1 has their height recorded in centimeters in transposed_dict_df. Change it to meters and update the dataframe.

# your code goes here

💡 Solution

One of the possible ways to achieve this result:

transposed_dict_df.loc["participant1"][["height"]] = 1.75

Save the dataframe to work on it in the next section. The method .to_csv will be explained in more detail in module 4d.

transposed_dict_df.to_csv("transposed_dict_df-4b.csv")

Module 4b: Selecting rows and columns in pandas

Contents

Module 4b: Selecting rows and columns in pandas#

Selection by labels - columns#

Selection by labels - rows#

Selecting both rows and columns by label#

Selecting values by their integer position#

Filling out values in a datafame#