Module 6f: Extra (OPTIONAL)#
- Learn some useful functions we use often.
These bonus concepts aren’t essential for day-to-day data analysis, but they are useful to know and often come up in more advanced tasks and when reading other people’s code.
Recursion#
A recursive function is a function that calls itself to solve a problem step by step. Recursion is not commonly used for DataFrame operations, but it helps you understand how some algorithms work.
Let’s say we want to calculate a factorial:
def factorial(n):
    # Base case: 0! and 1! are both 1, so stop recursing here
    if n == 0 or n == 1:
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)
factorial(3) returns 3 * factorial(2), which becomes 3 * 2 * factorial(1), and so on until the base case returns 1. The final result is 3 * 2 * 1 = 6.
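To watch the chain of calls unfold, here is a small sketch that adds print statements to the same logic. The name factorial_traced and the depth parameter are introduced here purely for illustration.

```python
def factorial_traced(n, depth=0):
    """Same logic as factorial, but prints each call and each return."""
    indent = "  " * depth  # deeper calls are indented further
    print(f"{indent}factorial({n}) called")
    if n == 0 or n == 1:
        print(f"{indent}base case reached, returning 1")
        return 1
    # The multiplication happens only after the inner call has returned
    result = n * factorial_traced(n - 1, depth + 1)
    print(f"{indent}factorial({n}) returns {result}")
    return result


result = factorial_traced(3)
```

The printed lines first move inward (3, then 2, then 1) and then back outward as each call returns its value to the one that made it.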
Note
Be careful! Recursive functions must always have a base case (a stopping condition). Without one, the function keeps calling itself until Python stops it with a RecursionError.
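As a minimal sketch of what a missing base case looks like in practice, the two hypothetical functions below differ only in their stopping condition. The function names are made up for this example.

```python
def countdown_no_base_case(n):
    # No stopping condition: this function always calls itself again
    return countdown_no_base_case(n - 1)


try:
    countdown_no_base_case(3)
    outcome = "finished"
except RecursionError:
    # Python gives up once the chain of calls gets too deep
    outcome = "RecursionError"

print(outcome)


def countdown(n):
    # Base case: stop once we reach zero
    if n <= 0:
        return "done"
    return countdown(n - 1)


print(countdown(3))
```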
Anonymous (Lambda) Functions#
A lambda function is a quick way to define a simple function in one line, without using def. It’s very useful with .apply() in pandas.
Let’s label patients with a bmi_status column, based on their BMI:
stroke_data = setup_code.stroke_data
stroke_data['bmi_status'] = stroke_data['bmi'].apply(lambda x: 'high' if x > 30 else 'normal')
This adds a new column with the label ‘high’ if BMI is over 30 and ‘normal’ if it is 30 or below. The code above is equivalent to writing the following:
def bmi_label(x):
    # Same rule as the lambda, written as a named function
    return 'high' if x > 30 else 'normal'

stroke_data['bmi_status'] = stroke_data['bmi'].apply(bmi_label)
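One detail worth checking with a rule like this is what happens to missing values. The sketch below uses plain Python and a few made-up BMI values (not taken from the dataset) to show that a NaN fails the x > 30 test and so ends up labelled 'normal'. The bmi_label_safe function is one possible way to handle this, offered as an assumption rather than the course's approach.

```python
import math

# A lambda and a def that apply the same rule
bmi_label_lambda = lambda x: 'high' if x > 30 else 'normal'


def bmi_label(x):
    return 'high' if x > 30 else 'normal'


# Hypothetical BMI values, including one missing value (NaN)
sample_bmis = [36.6, 22.4, 30.0, float('nan')]

labels = [bmi_label_lambda(b) for b in sample_bmis]
print(labels)  # the NaN entry is labelled 'normal' because NaN > 30 is False


def bmi_label_safe(x):
    # Check for missing values before comparing
    if math.isnan(x):
        return 'unknown'
    return 'high' if x > 30 else 'normal'


safe_labels = [bmi_label_safe(b) for b in sample_bmis]
print(safe_labels)
```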
Useful Built-in Functions: map, filter, and zip#
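These are functions you can use with lists or Series to quickly transform or filter data. In Python 3, the built-in map, filter, and zip return lazy iterators, so wrap them in list() when you want to see the results.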
map(function, iterable)#
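Applies a function to every item in a list or Series. The example below uses the pandas Series.map method, which does the same job as the built-in map but returns a new Series.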
#Example: Let’s get stroke status labels (string) for each patient.
stroke_flags = stroke_data['stroke'].map(lambda x: 'Yes' if x == 1 else 'No')
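For comparison, here is a minimal sketch of the built-in map on a plain list. The stroke_values list and the to_label helper are hypothetical stand-ins for the real column.

```python
# Hypothetical stroke flags (1 = stroke, 0 = no stroke)
stroke_values = [1, 0, 0, 1]


def to_label(x):
    return 'Yes' if x == 1 else 'No'


# map returns a lazy map object; nothing is computed until we consume it
lazy_labels = map(to_label, stroke_values)
stroke_labels = list(lazy_labels)
print(stroke_labels)

# A list comprehension gives the same result
same_labels = [to_label(x) for x in stroke_values]
```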
filter(function, iterable)#
Keeps only the items of a list (or any iterable) for which the function returns True.
#Example: Get a list of patient ages 80 or older
ages = stroke_data['age'].tolist()
senior_patients = list(filter(lambda x: x >= 80, ages))
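The same selection can also be written as a list comprehension, which many Python programmers find easier to read. A small sketch with made-up ages:

```python
# Hypothetical ages, not taken from the dataset
sample_ages = [67.0, 61.0, 80.0, 49.0, 82.0]

# Using filter with a lambda
seniors_filter = list(filter(lambda x: x >= 80, sample_ages))

# Equivalent list comprehension
seniors_comprehension = [x for x in sample_ages if x >= 80]

print(seniors_filter)
print(seniors_comprehension)
```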
zip(iterable1, iterable2, ...)#
Combines two or more lists element by element into tuples. Unlike map and filter, zip does not take a function.
#Example: Pair each patient’s ID and age into tuples
patient_ids = stroke_data['id'].head(3)
ages = stroke_data['age'].head(3)
list(zip(patient_ids, ages))
[(9046, 67.0), (51676, 61.0), (31112, 80.0)]
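A few other zip behaviours are worth knowing. The sketch below reuses the three ID and age values from the output above in plain lists. The variable names are chosen for this example only.

```python
ids = [9046, 51676, 31112]
ages_full = [67.0, 61.0, 80.0]
ages_short = [67.0, 61.0]  # one value fewer than ids

# zip stops at the shortest input, so the last ID is silently dropped
short_pairs = list(zip(ids, ages_short))
print(short_pairs)

# Pairs can be turned directly into a dictionary for lookups
age_by_id = dict(zip(ids, ages_full))
print(age_by_id[51676])

# zip(*pairs) reverses the operation and splits pairs back into separate tuples
unzipped_ids, unzipped_ages = zip(*zip(ids, ages_full))
print(unzipped_ids)
```

Because zip drops unmatched items without warning, it is worth checking that both inputs have the same length before pairing them.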