Hide code cell source

import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..', 'shared')))
import setup_code
stroke_data = setup_code.stroke_data

Module 6e: Parameters and Arguments#

Section Objectives:
- Differentiate between arguments and parameters in functions.
- Recognize how the number of arguments affects function calls and definitions.
- Implement default parameter values in function definitions.

Parameters vs Arguments#

In programming, we differentiate between the variable names declared in the function, and the actual variables passed in the function.

A parameter is the placeholder variable name you declare in the function. It is part of the function’s signature. An argument is the actual variable or value you pass in the function when you call it.

Let’s go back to the count_stroke_patients() function to understand the difference better.

stroke_data = setup_code.stroke_data
def count_stroke_patients(df):
    """
    Return the number of patients in the dataset who had a stroke.

    Parameters
    ----------
    df : pandas.DataFrame
        The dataset containing patient information. Must include a 'stroke' column
        with binary values (e.g., 0 = no stroke, 1 = stroke).

    Returns
    -------
    int
        The number of patients who had a stroke.
    """
    stroke_count = df['stroke'].sum()
    return stroke_count

When we defined this function, we used df as the parameter. It isn’t a real dataset; it is an indicator for you to know that the function requires a dataframe that will be used in its code [@params_vs_args].

count_stroke_patients(stroke_data)

When we called this function, we used the dataframe stroke_data as the argument. This is the actual data we want the function to work with.

Note

It’s good practice to use meaningful parameter names. It makes your code more understandable and easier to maintain.

Multiple Parameters, Multiple Arguments#

Functions can have more than one parameter. This allows you to pass multiple pieces of information into it. When you call the function, you must provide the same number of arguments (unless some parameters have default values, which you’ll learn later).

The order matters — the first argument goes to the first parameter, the second to the second, etc.

This is called positional arguments (we’ll discuss keyword arguments later).

def patient_summary(df, column):
    """
    Print the average value for a given column in the dataframe.
    """
    avg_value = df[column].mean()
    print(f"The average {column} is {avg_value:.2f}")

With two parameters, we call the function with two arguments as follows:

patient_summary(stroke_data, "age")
patient_summary(stroke_data, "avg_glucose_level")
The average age is 43.23
The average avg_glucose_level is 106.15

Try switching the argument order to see what the function does:

### Your code here ###

Default parameters/default arguments#

You can give your function parameters a default value so that callers don’t have to provide every argument every time. These are called default parameters or optional arguments.

def check_high_glucose(df, threshold=125):
    """
    Add a column 'glucose_status' labeling patients as 'high' or 'normal'
    based on avg_glucose_level threshold.
    """
    def glucose_level(glucose):
        if glucose > threshold:
            return 'high'
        else:
            return 'normal'
    
    df['glucose_status'] = df['avg_glucose_level'].apply(glucose_level)
    return df

The parameter threshold has a default value of 125.

If you call the function without specifying threshold, it will use 125.

check_high_glucose(stroke_data)  # Uses default threshold = 125
id gender age hypertension heart_disease ever_married work_type Residence_type avg_glucose_level bmi smoking_status stroke glucose_status
0 9046 Male 67.0 0 1 Yes Private Urban 228.69 36.6 formerly smoked 1 high
1 51676 Female 61.0 0 0 Yes Self-employed Rural 202.21 NaN never smoked 1 high
2 31112 Male 80.0 0 1 Yes Private Rural 105.92 32.5 never smoked 1 normal
3 60182 Female 49.0 0 0 Yes Private Urban 171.23 34.4 smokes 1 high
4 1665 Female 79.0 1 0 Yes Self-employed Rural 174.12 24.0 never smoked 1 high
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5105 18234 Female 80.0 1 0 Yes Private Urban 83.75 NaN never smoked 0 normal
5106 44873 Female 81.0 0 0 Yes Self-employed Urban 125.20 40.0 never smoked 0 high
5107 19723 Female 35.0 0 0 Yes Self-employed Rural 82.99 30.6 never smoked 0 normal
5108 37544 Male 51.0 0 0 Yes Private Rural 166.29 25.6 formerly smoked 0 high
5109 44679 Female 44.0 0 0 Yes Govt_job Urban 85.28 26.2 Unknown 0 normal

5110 rows Ɨ 13 columns

But you can override it by passing your own value:

check_high_glucose(stroke_data, threshold=140)  
id gender age hypertension heart_disease ever_married work_type Residence_type avg_glucose_level bmi smoking_status stroke glucose_status
0 9046 Male 67.0 0 1 Yes Private Urban 228.69 36.6 formerly smoked 1 high
1 51676 Female 61.0 0 0 Yes Self-employed Rural 202.21 NaN never smoked 1 high
2 31112 Male 80.0 0 1 Yes Private Rural 105.92 32.5 never smoked 1 normal
3 60182 Female 49.0 0 0 Yes Private Urban 171.23 34.4 smokes 1 high
4 1665 Female 79.0 1 0 Yes Self-employed Rural 174.12 24.0 never smoked 1 high
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5105 18234 Female 80.0 1 0 Yes Private Urban 83.75 NaN never smoked 0 normal
5106 44873 Female 81.0 0 0 Yes Self-employed Urban 125.20 40.0 never smoked 0 normal
5107 19723 Female 35.0 0 0 Yes Self-employed Rural 82.99 30.6 never smoked 0 normal
5108 37544 Male 51.0 0 0 Yes Private Rural 166.29 25.6 formerly smoked 0 high
5109 44679 Female 44.0 0 0 Yes Govt_job Urban 85.28 26.2 Unknown 0 normal

5110 rows Ɨ 13 columns

Note

  • Use default values only when it makes sense. For example, if there’s a common or typical value that fits most cases, it’s helpful to set it as a default.

  • Parameters with default values must come after parameters without defaults.

  • You cannot have a required parameter after a default one.

Positional vs Keyword arguments#

Using positional arguments#

You might have noticed by now that,even without default arguments, the order of called arguments matters. Arguments are matched to parameters based on their position — first with first, second with second, and so on. You must pass them in the same sequence the function expects. Let’s take a look at the following function:

def describe_patient(df, index, age_col, gender_col, hypertension_col):
    """
    Print a basic description of a patient at the given index.

    Parameters
    ----------
    df : pandas.DataFrame
        The dataset containing patient information.
    index : int
        The row index of the patient in the dataset.
    age_col : str
        Name of the column containing age information.
    gender_col : str
        Name of the column containing gender information.
    hypertension_col : str
        Name of the column indicating whether the patient has hypertension (0 or 1).

    Returns
    -------
    None
        This function only prints output to the screen.
    """
    patient = df.iloc[index]
    
    print(f"Patient #{index}: {patient[age_col]}-year-old {patient[gender_col]}.")
    
    if patient[hypertension_col]:
        print("Has hypertension.")
    else:
        print("No hypertension.")

To call the function and follow the correct positional argument order, we use:

describe_patient(stroke_data, 10, 'age', 'gender', 'hypertension')
Patient #10: 81.0-year-old Female.
Has hypertension.

stroke_data goes to df, 10 goes to 10 goes to index, age goes to age_col, gender goes to gender_col, and hypertension goes to hypertension_col.

What happens if we swap the arguments 'age' and 'gender'?

describe_patient(stroke_data, 10, 'gender', 'age', 'hypertension')
Patient #10: Female-year-old 81.0.
Has hypertension.

You’ll get nonsense output because Python assigns ā€˜Male’ to index (which should be an int), and so on.

Using keyword arguments#

You can explicitly specify which argument refers back to which parameter in your function as follows:

describe_patient(
    df=stroke_data,
    index=10,
    age_col='age',
    gender_col='gender',
    hypertension_col='hypertension'
)
Patient #10: 81.0-year-old Female.
Has hypertension.

The order then no longer matters since Python assigns the values based on the parameter names.

Arbitrary positional arguments (*args)#

Sometimes, you don’t know in advance how many positional arguments someone will pass.

The *args syntax collects all extra positional arguments into a tuple.

Let’s try a different function which shows you patient data based on the index provided:

def show_patient_columns(df, index, *columns):
    """
    Print specified column values for a patient at a given index.
    
    Parameters:
        df (pd.DataFrame): The dataset.
        index (int): Patient row index.
        *columns (str): One or more column names to display.
    """
    patient = df.iloc[index]
    print(f"Patient #{index} data:")
    for col in columns:
        print(f"{col}: {patient[col]}")

Since we are using arbitrary positional arguments or *args, you can call the function with any number of columns:

show_patient_columns(stroke_data, 10, 'age', 'bmi', 'avg_glucose_level')
Patient #10 data:
age: 81.0
bmi: 29.7
avg_glucose_level: 80.43

Arbitrary keywords arguments (**kwargs)#

The **kwargs syntax collects any extra keyword arguments into a dictionary, allowing flexible labeled data input. Arguments are passed as key=value pairs, and the function accesses them by key name.

def patient_summary(df, index, **info):
    """
    Print a summary report for a specific patient, including extra labeled information.

    Parameters
    ----------
    df : pandas.DataFrame
        The stroke dataset containing patient data.
    index : int
        The row index of the patient in the DataFrame.
    **info : dict
        Arbitrary keyword arguments where keys are labels and values are either
        column names in `df` or literal information to print.

    Returns
    -------
    None
        Prints the summary information without returning any value.
    """
    patient = df.iloc[index]
    print(f"Summary for patient #{index}:")
    for label, col_name in info.items():
        if col_name in df.columns:
            print(f"{label.capitalize()}: {patient[col_name]}")
        else:
            print(f"{label.capitalize()}: {col_name}")  # literal if not a column

We call it like this:

patient_summary(
    stroke_data,
    7,
    age='age',
    gender='gender',
    smoker='smoking_status'
)
Summary for patient #7:
Age: 69.0
Gender: Female
Smoker: never smoked

Here, info is a dictionary: {ā€˜age’: ā€˜age’, ā€˜gender’: ā€˜gender’, ā€˜smoker’: ā€˜smoking_status’}.

*args vs **kwargs#

Both *args and **kwargs let your functions accept an arbitrary number of arguments beyond those explicitly declared, but they work differently:

Aspect

*args

**kwargs

Collects

Extra positional arguments (tuple)

Extra keyword arguments (dict)

Usage

When number/order of args vary

When number/names of args vary

How to pass

Without keywords, just values

Using key=value pairs

Access in function

By position (iteration over tuple)

By keys (iteration over dict)

Example use case

Columns list to print

Named patient info or options

Flexibility

Flexible number but no labels

Flexible number and labels

TLDR;

  • Use *args if you want to accept many values without caring about their names, like a list of columns.

  • Use **kwargs if you want to accept many options or labeled data, where each argument’s name matters.

Quick Practice#

*args and **kwargs not mutually exclusive. You can combine both in a function!

Write a function named patient_report that:

  1. Takes these parameters:

    • df: the stroke dataset DataFrame

    • index: an integer representing the patient row index

    • *columns: any number of column names to display

    • **extra_info: any number of keyword arguments with extra info to print

  2. Prints a report for the patient at the given index:

    • For each column in *columns, print the column name and that patient’s value.

    • For each key-value pair in **extra_info, print the key (capitalized) and the value as is.

Hint 1

Use .capitalize() on the keys from extra_info for nicer formatting.

Hint 2

Loop over columns (the *args) to print each requested column and value. Loop over extra_info.items() (the **kwargs) to print each key-value pair.

Solution


def patient_report(df, index, *columns, **extra_info):
    """
    Print a detailed report for a specific patient in the DataFrame.
    Parameters
    ----------
    df : pandas.DataFrame
        The stroke dataset containing patient data.
    index : int
        The row index of the patient in the DataFrame.
    *columns : str
        Variable length argument list of column names to display values for.
    **extra_info : dict
        Arbitrary keyword arguments representing additional labeled information to include in the report.
    Returns
    -------
    None
        This function prints the patient report directly and does not return a value.
    """
    patient = df.iloc[index]
    print(f"Patient #{index} report:")
    for col in columns:
        print(f"{col}: {patient[col]}")
    for key, val in extra_info.items():
        print(f"{key.capitalize()}: {val}")
    

References#