import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..', 'shared')))
import setup_code
stroke_data = setup_code.stroke_data

Module 6f: Extra (OPTIONAL)#

Section Objectives:
- Learn a few extra tools that come up often: recursion, lambda functions, and the built-in functions map, filter, and zip.

These bonus concepts aren’t essential for day-to-day data analysis, but they are useful to know and often come up in more advanced tasks and when reading other people’s code.

Recursion#

A recursive function is a function that calls itself to solve a problem step by step. Recursion is not commonly used for DataFrame operations, but it helps you understand how some algorithms work.

Let’s say we want to calculate a factorial:

def factorial(n):
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)

factorial(3) returns 3 * factorial(2), which becomes 3 * 2 * factorial(1). The call factorial(1) hits the base case and returns 1, so the final result is 3 * 2 * 1 = 6.

Note

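Be careful! Recursive functions must always have a base case (a stopping condition) that every call eventually reaches. Without one, the function keeps calling itself until Python stops it with a RecursionError.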
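To see why the base case matters, here is a minimal sketch. The input check for negative numbers and the helper countdown_without_base_case are illustrative additions, not part of the original example.

```python
def factorial(n):
    """Recursive factorial with a guard so every call reaches the base case."""
    if n < 0:
        # A negative n would step away from the base case and never stop.
        raise ValueError("factorial is not defined for negative numbers")
    if n <= 1:  # base case: stop recursing
        return 1
    return n * factorial(n - 1)


def countdown_without_base_case(n):
    # Hypothetical function with no stopping condition.
    return countdown_without_base_case(n - 1)


try:
    countdown_without_base_case(3)
    hit_limit = False
except RecursionError:
    # Python gives up once the call stack gets too deep.
    hit_limit = True

print(factorial(5))  # 120
print(hit_limit)     # True
```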

Anonymous (Lambda) Functions#

A lambda function is a quick way to define a simple function in one line, without using def. It’s very useful with .apply() in pandas. Let’s label patients with a bmi_status column, based on their BMI:

stroke_data = setup_code.stroke_data
stroke_data['bmi_status'] = stroke_data['bmi'].apply(lambda x: 'high' if x > 30 else 'normal')

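This adds a new column with the label ‘high’ if BMI is over 30 and ‘normal’ if it is less than or equal to 30. Any missing BMI value (NaN) would also be labeled ‘normal’, because the comparison NaN > 30 is False. The code above is the same as writing the following: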

def bmi_label(x):
    return 'high' if x > 30 else 'normal'

stroke_data['bmi_status'] = stroke_data['bmi'].apply(bmi_label)
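The same idea can be tried without pandas. The sketch below uses a small made-up list of BMI values (not taken from stroke_data) and a hypothetical helper, label_with_missing, to show one way of handling missing values separately.

```python
import math

# Made-up BMI values; float('nan') stands in for a missing measurement.
bmi_values = [36.6, 28.1, float('nan'), 30.0]

# A lambda bound to a name only for illustration; normally you would
# pass it straight to another function such as .apply().
label = lambda x: 'high' if x > 30 else 'normal'


def label_with_missing(x):
    """Same rule as the lambda, but missing values get their own label."""
    if math.isnan(x):
        return 'unknown'
    return 'high' if x > 30 else 'normal'


simple_labels = [label(x) for x in bmi_values]
safe_labels = [label_with_missing(x) for x in bmi_values]

print(simple_labels)  # ['high', 'normal', 'normal', 'normal']
print(safe_labels)    # ['high', 'normal', 'unknown', 'normal']
```

Once the logic needs more than one condition, a named function like this is usually easier to read than a lambda.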

Useful Built-in Functions: map, filter, and zip#

These are functions you can use with lists or Series to quickly transform or filter data.

map(function, iterable)#

Applies a function to every item in an iterable, such as a list or Series.

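# Example: get a stroke status label (string) for each patient.
# Note: this calls the pandas Series.map method, which works like the
# built-in map(function, iterable) but returns a Series.
stroke_flags = stroke_data['stroke'].map(lambda x: 'Yes' if x == 1 else 'No')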
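For comparison, here is a small sketch of the built-in map on a plain list. The stroke codes are made up for illustration. Unlike the pandas method, the built-in returns a lazy iterator, so it is wrapped in list() to see the values.

```python
# Made-up stroke codes: 1 = stroke, 0 = no stroke.
stroke_codes = [1, 0, 0, 1]

# map() does not compute anything until the result is consumed.
lazy_labels = map(lambda x: 'Yes' if x == 1 else 'No', stroke_codes)

# list() consumes the iterator and collects the results.
stroke_flags = list(lazy_labels)

print(stroke_flags)  # ['Yes', 'No', 'No', 'Yes']
```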

filter(function, iterable)#

Keeps only the items of an iterable for which the function returns True.

# Example: get a list of patient ages 80 or older
ages = stroke_data['age'].tolist()
senior_patients = list(filter(lambda x: x >= 80, ages))
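A list comprehension gives the same result as filter and is often preferred in Python code. The sketch below uses made-up ages to show both forms side by side.

```python
# Made-up ages for illustration.
ages = [67.0, 61.0, 80.0, 49.0, 82.0]

# filter() keeps the items for which the lambda returns True.
seniors_with_filter = list(filter(lambda x: x >= 80, ages))

# The equivalent list comprehension.
seniors_with_comprehension = [x for x in ages if x >= 80]

print(seniors_with_filter)         # [80.0, 82.0]
print(seniors_with_comprehension)  # [80.0, 82.0]
```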

zip(iterable1, iterable2)#

Combines two (or more) iterables element by element into tuples. Unlike map and filter, zip does not take a function.

# Example: pair each patient’s ID and age into tuples
patient_ids = stroke_data['id'].head(3)
ages = stroke_data['age'].head(3)

list(zip(patient_ids, ages))
[(9046, 67.0), (51676, 61.0), (31112, 80.0)]
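Two details of zip are worth knowing: it stops at the shortest input, and its pairs can be turned directly into a dictionary. The sketch below reuses the three IDs and ages from the output above as plain lists. The shortened list is an illustrative assumption.

```python
patient_ids = [9046, 51676, 31112]
ages = [67.0, 61.0, 80.0]

# Pairs can be fed to dict() to build a lookup from ID to age.
age_by_id = dict(zip(patient_ids, ages))
print(age_by_id[51676])  # 61.0

# If one input is shorter, zip silently drops the extra items.
short_ages = [67.0, 61.0]  # deliberately one item short
pairs = list(zip(patient_ids, short_ages))
print(pairs)  # [(9046, 67.0), (51676, 61.0)]
```

Because the extra items are dropped without a warning, it is worth checking that both inputs have the same length before zipping them.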