Module 6b: Return Statements#
- Understand the concept of return values from functions versus functions that only print output without returning values.
Functions that print instead of return#
Sometimes you may want a function to display something on the screen rather than calculate and return a value. This is common when:
- You're debugging (checking your logic step by step)
- You want to summarize or visualize data directly for the user
- You're building exploratory tools, not reusable computations
Let's modify the smoking count function as an example. We'll add a print statement to help us see what's going on during execution.
def count_smokers_by_category(df):
    if 'smoking_status' in df.columns:
        print("Column found. Counting smoking categories...")
        return df['smoking_status'].value_counts()
    else:
        print("Column 'smoking_status' not found in DataFrame.")
        return None
None represents the absence of a value, and has the NoneType data type. You'll see other programming languages call it null, nil, or undefined.
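As a quick check of the note above, the following minimal sketch inspects None directly. The helper find_age and its sample dictionary are hypothetical, introduced only for illustration. The point is that a caller usually tests for a missing result with `is None`.

```python
def find_age(patients, name):
    """Return the age stored for `name`, or None if the name is absent."""
    if name in patients:
        return patients[name]
    return None


# Made-up sample data, for illustration only
patients = {"patient_a": 67, "patient_b": 12}

missing = find_age(patients, "patient_z")

print(type(None).__name__)   # NoneType
print(missing is None)       # True

# Compare against None with `is`, not `==`
if missing is None:
    print("No age recorded for this patient.")
```

The same check applies to count_smokers_by_category: test its result with `is None` before using it, in case the column was not found.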
Now let's write a function that doesn't return any value. It just prints a useful summary.
def print_stroke_summary(df):
    total = len(df)
    stroke_count = df['stroke'].sum()
    no_stroke_count = total - stroke_count
    stroke_rate = stroke_count / total * 100

    print("Stroke Summary Report:")
    print(f"Total records: {total}")
    print(f"Stroke cases: {stroke_count}")
    print(f"No stroke cases: {no_stroke_count}")
    print(f"Stroke rate: {stroke_rate:.2f}%")
This function prints information to the screen, but doesn't return anything. If you tried to assign its output to a variable, you'd get:
result = print_stroke_summary(stroke_data)
print(result) # Output: None
Stroke Summary Report:
Total records: 5110
Stroke cases: 249
No stroke cases: 4861
Stroke rate: 4.87%
None
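To see why the difference matters in practice, here is a small sketch contrasting a function that prints with one that returns. Both function names are hypothetical. The counts are the ones shown in the report above.

```python
def print_rate(cases, total):
    """Display the rate; nothing is returned."""
    print(f"Rate: {cases / total * 100:.2f}%")


def compute_rate(cases, total):
    """Return the rate as a percentage so the caller can reuse it."""
    return cases / total * 100


printed = print_rate(249, 5110)      # prints "Rate: 4.87%"
computed = compute_rate(249, 5110)

print(printed)               # None
print(round(computed, 2))    # 4.87
```

Only the returned value can be stored, compared, or passed to another function. The printed text is gone once it has been displayed.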
When your function ends without a return statement (for example, when its last line is a print call), Python adds return None behind the scenes. The same happens when you use the return keyword by itself, without a value.
We've seen similar behavior with loops: a for or while loop implicitly ends with a continue statement [@automatetheboringstuff_ch3].
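The three cases described above can be compared side by side. This is a minimal sketch with hypothetical function names. Each call evaluates to None.

```python
def no_return():
    total = 1 + 1          # work happens, but nothing is returned


def bare_return():
    return                 # `return` with no value


def explicit_none():
    return None            # the same result, written out


results = [no_return(), bare_return(), explicit_none()]
print(results)   # [None, None, None]
```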
Functions with no return statement#
A function does not need a return statement. If it doesn't return anything explicitly, it still runs; it just returns None by default.
This is common in:
- Plotting and visualization
- Functions designed for side effects (like writing to a file or showing a chart)
In Module 5, you learned how to write code to plot a histogram for the average glucose level. Let's reuse that code, but turn it into a general-purpose display-only function which plots the histogram of any passed column.
import matplotlib.pyplot as plt  # skip if already imported earlier in your notebook

def plot_histogram(df, column, bins=30):
    """
    Plot a normalized histogram for any numeric column in the DataFrame.

    Parameters:
        df (pd.DataFrame): The dataset.
        column (str): The name of the numeric column to plot.
        bins (int): Number of bins for the histogram (default 30).
    """
    plt.figure(figsize=(6, 4))
    plt.hist(df[column].dropna(), bins=bins, alpha=0.6, edgecolor='black', density=True)
    plt.xlabel(column, fontsize=14)
    plt.ylabel('Probability density', fontsize=14)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.title(f'Normalized Histogram of {column}', fontsize=18)
    plt.show()
If you want to keep the figure for later use, such as saving it to a file or modifying it, you can update the function to return the figure object as follows:
def plot_histogram(df, column, bins=30):
    """
    Plot a normalized histogram for any numeric column in the DataFrame.

    Parameters:
        df (pd.DataFrame): The dataset.
        column (str): The name of the numeric column to plot.
        bins (int): Number of bins for the histogram (default 30).
    """
    fig = plt.figure(figsize=(6, 4))
    plt.hist(df[column].dropna(), bins=bins, alpha=0.6, edgecolor='black', density=True)
    plt.xlabel(column, fontsize=14)
    plt.ylabel('Probability density', fontsize=14)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.title(f'Normalized Histogram of {column}', fontsize=18)
    plt.show()
    return fig
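Once a function returns the figure, the caller can keep working with it. The sketch below is one possible illustration, not the course's own code: make_histogram is a hypothetical, simplified stand-in for plot_histogram, the sample values are made up, and the Agg backend is selected only so the example runs without a display (in a notebook you would normally leave that line out).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt


def make_histogram(values, bins=30):
    """Build a normalized histogram and return the Figure (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(values, bins=bins, alpha=0.6, edgecolor="black", density=True)
    ax.set_ylabel("Probability density")
    return fig


# Made-up glucose-like values, for illustration only
sample_values = [85.0, 92.5, 101.3, 110.8, 78.4, 95.1, 130.2, 88.9]

fig = make_histogram(sample_values, bins=5)
fig.suptitle("Sample histogram")          # modify the figure after the call

output_path = os.path.join(tempfile.mkdtemp(), "histogram.png")
fig.savefig(output_path, dpi=150)         # save the figure to a file
plt.close(fig)                            # release the figure's memory
```

None of the last four steps would be possible with the display-only version, because the caller would receive None instead of a figure.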
Functions that return multiple variables#
So far, we've seen functions that return one value, like a number or a string. But what if you need to return multiple results from a single function?
In Python, you can do that easily using tuples (see Module 2). When you return multiple values from a function, Python automatically creates a tuple behind the scenes.
Why would you return multiple values? Sometimes a function does more than one thing and needs to give back multiple pieces of information. For example:
Suppose you have a dataset of patients. You want a function that counts how many are adults (age 18 or older) and how many are minors (under 18).
def count_adults_and_minors(df):
    """
    Count how many patients are adults (age >= 18) and minors (age < 18).

    Parameters:
        df (pd.DataFrame): The stroke dataset with an 'age' column.

    Returns:
        tuple: (number_of_adults, number_of_minors)
    """
    adults = df[df["age"] >= 18].shape[0]
    minors = df[df["age"] < 18].shape[0]
    return adults, minors

adult_count, minor_count = count_adults_and_minors(stroke_data)
What the function actually returns is the tuple (adults, minors). Because the assignment has two names on its left-hand side, Python unpacks that tuple into adult_count and minor_count.
You could assign the tuple to a variable first:
result = count_adults_and_minors(stroke_data)
print(result)
(4254, 856)
Then access the different parts as follows:
print(result[0]) # adults
print(result[1]) # minors
4254
856
But using unpacking (as we did earlier) is cleaner and easier to read.
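The packing and unpacking described in this section can be checked without a dataset. This is a minimal standalone sketch. The function min_and_max and the list of ages are made up for illustration.

```python
def min_and_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)   # the comma builds a tuple


# Made-up ages, for illustration only
ages = [3, 45, 17, 82, 18]

result = min_and_max(ages)
print(type(result).__name__)   # tuple
print(result)                  # (3, 82)

youngest, oldest = min_and_max(ages)   # unpack into two names
print(youngest, oldest)                # 3 82
```

The number of names on the left must match the number of returned values, otherwise Python raises a ValueError.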
Quick Practice#
Write a function that does two things:
- Prints the average stroke rate for each smoking status group.
- Plots a bar chart to visualize those averages (check Module 5 or official documentation if you've forgotten how to do this!).
- Optional: returns the plot object so you can store or save it later.
def plot_stroke_by_smoking(df):
    """
    Show average stroke rate by smoking status group.

    Parameters:
        df (pd.DataFrame): The stroke dataset.

    Returns:
        matplotlib.figure.Figure: The figure object for further use (e.g. saving).
    """
    # Step 1: Group by smoking status and calculate average stroke rate
    # Step 2: Print summary
    # Step 3: Plot
    # Step 4: Return figure
    pass
Hint 1
To calculate the average of a column grouped by categories in another column, you can use this pattern:
grouped_averages = df.groupby('grouping_column')['value_column'].mean()
You'll get a Series where the index contains each unique group from the grouping column, and the values are the calculated averages for each group from the value column.
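To make the shape of that result concrete, here is a small sketch on a made-up table. The column names group and score and all of the values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Made-up data, for illustration only
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "score": [1, 0, 0, 0, 1],
})

group_means = toy.groupby("group")["score"].mean()

# The index holds the group labels, the values hold each group's average
for label, average in group_means.items():
    print(f"{label}: {average:.3f}")
# a: 0.500
# b: 0.333
```

For a column of 0s and 1s, the mean of each group is the fraction of rows in that group equal to 1.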
Hint 2
Use plt.bar() or sns.barplot() to create the chart.
Solution
To print the average stroke rate grouped by smoking status and plot it as a bar chart, you can use:
def plot_stroke_by_smoking(df):
    """
    Show average stroke rate by smoking status group.

    Parameters:
        df (pd.DataFrame): The stroke dataset.

    Returns:
        matplotlib.figure.Figure: The figure object for further use (e.g. saving).
    """
    # Step 1: Group by smoking status and calculate average stroke rate
    stroke_rates = df.groupby('smoking_status')['stroke'].mean()

    # Step 2: Print summary
    print("Average stroke rate by smoking status:")
    for group, rate in stroke_rates.items():
        print(f"- {group}: {rate:.3f}")

    # Step 3: Plot
    fig = plt.figure(figsize=(6, 4))
    stroke_rates.plot(kind='bar', color='teal', edgecolor='black')
    plt.title("Stroke Rate by Smoking Status")
    plt.ylabel("Average Stroke Rate")
    plt.xlabel("Smoking Status")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

    # Step 4: Return figure
    return fig