Solutions — Data Analysis with Python

Interactive, in-browser edition — completed code you can run and edit.

Author

Dr. Chester Ismay

About this page

These are the completed solutions, runnable in your browser — no install needed. Click Run Code on any cell (run the import and data-loading cells near the top first). Edit any cell to experiment.

Looking for the blanks to fill in yourself? See the exercises page.

Intro: Foundations of Data Analysis with Python

Walkthrough: Setting Up the Python Environment

If you haven’t already installed Python, Jupyter, and the necessary packages, follow the setup instructions in the README of the course repo.

Exercise: Setting Up the Python Environment

By completing this exercise, you will be able to:
- Import necessary Python packages
- Check for successful package loading

Follow the instructions above in Walkthrough to check for correct installation of necessary packages. We’ll wait a few minutes to make sure as many of you are set up as possible. Please give a thumbs up in the pulse check if you are ready to move on.

Module 1: Data Wrangling with Pandas

Walkthrough 1.1: Loading and Inspecting Data with Pandas

Import data from a CSV or from an Excel file

Common Pitfall: read_excel relies on the openpyxl engine for .xlsx files. If you see an ImportError or Missing optional dependency 'openpyxl', install it with pip install openpyxl. Also note that read_excel reads only the first sheet by default; pass sheet_name= if your workbook has multiple sheets.
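As a minimal sketch of the loading step, the cell below reads a small CSV from an in-memory string so it runs without any files on disk. The rows are made-up stand-ins, and the column names other than code, income_group, and inflation_rate are assumptions for illustration. With a real file you would pass its path instead.

```python
import io

import pandas as pd

# Made-up stand-in for a CSV file on disk (values are illustrative only).
csv_text = """code,year,income_group,inflation_rate
AAA,2010,High income,1.5
AAA,2015,High income,
BBB,2010,Low income,8.2
"""

# read_csv accepts a file path or any file-like object.
economies = pd.read_csv(io.StringIO(csv_text))

# Quick first look: dimensions, first rows, dtypes, and missing-value counts.
print(economies.shape)
print(economies.head())
print(economies.dtypes)
print(economies.isnull().sum())

# An Excel workbook is read the same way; sheet_name picks a sheet by
# position or name (the file and sheet names below are hypothetical):
# pd.read_excel("workbook.xlsx", sheet_name="Sheet1")
```

The empty inflation_rate field in the second row is read as a missing value, which is why isnull().sum() reports one missing entry for that column.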

Perform an initial exploration of the data

Exercise 1.1: Loading and Inspecting Data with Pandas

By completing this exercise, you will be able to use pandas to:
- Import data from a CSV or from an Excel file
- Perform an initial exploration of the data

Walkthrough 1.2: Cleaning and Preparing Data with Pandas

Handle missing data

Remove rows

Remove columns

Replace missing values with specific value

This can be extended to replace missing values with the mean, median, or mode of the column too.
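A minimal sketch of that extension, using a made-up single-column DataFrame rather than the course data:

```python
import numpy as np
import pandas as pd

# Illustrative values only; two entries are missing.
df = pd.DataFrame({"inflation_rate": [1.0, np.nan, 3.0, 3.0, np.nan]})

# Fill with the column mean (missing values are skipped when computing it).
mean_filled = df["inflation_rate"].fillna(df["inflation_rate"].mean())

# Fill with the column median.
median_filled = df["inflation_rate"].fillna(df["inflation_rate"].median())

# mode() returns a Series because ties are possible, so take the first entry.
mode_filled = df["inflation_rate"].fillna(df["inflation_rate"].mode()[0])

print(mean_filled.tolist())
print(median_filled.tolist())
print(mode_filled.tolist())
```

Each call returns a new Series, so df itself still contains the missing values until a result is assigned back to the column.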

Common Pitfall: Methods like dropna, fillna, and rename return a new DataFrame by default and do NOT modify the original in place. Assign the result to a variable (or pass inplace=True) or your changes will be lost. Reassigning, as done here, is the recommended pattern.
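The difference is easy to see on a toy DataFrame (made-up values, not the course data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "code": ["AAA", "BBB", "CCC"],
    "inflation_rate": [1.5, np.nan, 8.2],
})

# Calling dropna without assigning the result leaves df unchanged.
df.dropna()
rows_after_unassigned_call = len(df)

# Assigning the result keeps the cleaned data.
df_clean = df.dropna()
rows_after_assigning = len(df_clean)

print(rows_after_unassigned_call, rows_after_assigning)
```

The first count still includes the row with the missing value; only the assigned result has it removed.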

Convert a column to a different data type

Rename a column

Changing a DataFrame’s index

Set the index

Reset the index

Filtering rows based on conditions

Conditions on a single column

Common Pitfall: When combining multiple conditions, wrap each comparison in parentheses and use & (and) / | (or), not the Python keywords and/or. Forgetting the parentheses, e.g. economies['inflation_rate'] < 0 & ..., usually fails with a TypeError or ValueError: & binds more tightly than <, so Python evaluates 0 & ... first and then tries to compare the result, which is not the filter you intended.
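A short sketch of the parenthesized pattern on a toy DataFrame. The rows are made up; only the column names economies['inflation_rate'] and income_group come from this page.

```python
import pandas as pd

economies = pd.DataFrame({
    "code": ["AAA", "BBB", "CCC", "DDD"],
    "income_group": ["High income", "Low income", "High income", "Low income"],
    "inflation_rate": [-0.5, 7.0, 2.0, -1.2],
})

# Each comparison sits in its own parentheses and the two are joined with &.
deflation_high_income = economies[
    (economies["inflation_rate"] < 0)
    & (economies["income_group"] == "High income")
]

# The same pattern with | keeps rows that satisfy either condition.
negative_or_low_income = economies[
    (economies["inflation_rate"] < 0)
    | (economies["income_group"] == "Low income")
]

print(deflation_high_income)
print(negative_or_low_income)
```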

Conditions on multiple columns

Exercise 1.2: Cleaning and Preparing Data with Pandas

By completing this exercise, you will be able to use pandas to:
- Handle missing data
- Convert a column to a different data type
- Rename a column
- Change a DataFrame’s index
- Filter a DataFrame

Handle Missing Data

Remove rows

Remove columns

Replace missing values with specific value

Convert a Column to a Different Data Type and Rename a Column

Convert a Column to a Different Data Type

Rename a Column

Change a DataFrame’s Index and Filter a DataFrame

Change a DataFrame’s Index

Filter a DataFrame

Walkthrough 1.3: Transforming and Aggregating Data with Pandas

Grouping data

Applying Functions

Applying a function element-wise with `map()`

Applying a Function to Groups with `groupby()` and `agg()`

Summary tables

Analyzing categorical data

Using cross-tabulation

By getting group counts

Exercise 1.3: Transforming and Aggregating Data with Pandas

By completing this exercise, you will be able to use pandas to:
- Aggregate data effectively by grouping it
- Transform data by applying functions element-wise or to groups
- Create summary tables
- Analyze categorical data using cross-tabulation and counts

Grouping Data

Applying Functions

Applying a function element-wise with `map()`

Applying a function to groups with `groupby()` and `agg()`

Summary Tables

Analyzing Categorical Data

Using Cross-Tabulation

By Getting Group Counts

Self-Check

By the end of this module, you should be able to:

Load data from CSV and Excel files with read_csv and read_excel
Inspect a DataFrame using head(), info(), describe(), dtypes, and isnull().sum()
Handle missing data by dropping rows/columns or filling values with dropna and fillna
Convert column data types, rename columns, and set or reset the index
Filter rows using single and multiple boolean conditions
Aggregate and summarize data with groupby, agg, pivot_table, crosstab, and value_counts
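
As a compact reminder of the aggregation tools in the last item, here is a sketch on a made-up four-row DataFrame. The year column and all values are assumptions for illustration.

```python
import pandas as pd

economies = pd.DataFrame({
    "year": [2010, 2010, 2020, 2020],
    "income_group": ["High income", "Low income", "High income", "Low income"],
    "inflation_rate": [1.0, 6.0, 2.0, 8.0],
})

# Mean and maximum inflation per income group.
by_group = economies.groupby("income_group")["inflation_rate"].agg(["mean", "max"])

# Summary table: income groups as rows, years as columns.
summary = economies.pivot_table(
    values="inflation_rate", index="income_group", columns="year", aggfunc="mean"
)

# Cross-tabulation: number of rows for each income group and year.
counts = pd.crosstab(economies["income_group"], economies["year"])

# Number of rows per income group.
group_sizes = economies["income_group"].value_counts()

print(by_group)
print(summary)
print(counts)
print(group_sizes)
```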

Module 2: Data Visualization Basics with Matplotlib and Seaborn

Walkthrough 2.1: Creating Basic Plots with Matplotlib

Line plot

Bar chart

Adding labels and titles

Adjusting axes and tick marks

Common Pitfall: Call plt.figure() before each new plot and plt.show() after it. If you skip plt.figure(), matplotlib keeps drawing onto the previous figure, and successive plots can overlap or accumulate unexpectedly within the same axes.
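A minimal sketch of that figure-per-plot pattern with made-up values. The Agg backend line is an assumption so the sketch also runs as a plain script; leave it out in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt

years = [2010, 2015, 2020]
inflation = [1.5, 2.1, 3.4]  # illustrative values only

# First plot on its own figure.
fig_line = plt.figure()
plt.plot(years, inflation)
plt.title("Line plot")
plt.show()

# Starting a new figure keeps the bar chart separate from the line plot.
fig_bar = plt.figure()
plt.bar(["Low", "Middle", "High"], [3.0, 5.0, 9.0])
plt.title("Bar chart")
plt.show()
```

Because each plot starts with plt.figure(), the line and the bars end up on two separate figures rather than sharing one set of axes.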

Exercise 2.1: Creating Basic Plots with Matplotlib

By completing this exercise, you will be able to use matplotlib to:
- Create line plots and bar charts
- Add labels and titles
- Adjust axes and tick marks

Line Plot

Bar Chart

Adding Labels and Titles

Adjusting Axes and Tick Marks

Walkthrough 2.2: Data Visualization Techniques with Seaborn

Heatmap

Common Pitfall: corr() only works on numeric columns. Select numeric columns first (as shown with select_dtypes); passing a DataFrame that still contains string columns such as code or income_group will either error or silently drop them depending on your pandas version.
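A sketch of the select-then-correlate step on a toy DataFrame. The values are made up, and the gdp_percapita and gross_savings column names are assumptions based on the indicators discussed later on this page.

```python
import pandas as pd

economies = pd.DataFrame({
    "code": ["AAA", "BBB", "CCC", "DDD"],
    "income_group": ["High income", "Low income", "High income", "Low income"],
    "gdp_percapita": [40000.0, 1500.0, 52000.0, 900.0],
    "gross_savings": [25.0, 12.0, 30.0, 10.0],
})

# Keep only numeric columns so corr() never sees the string columns.
numeric_cols = economies.select_dtypes(include="number")

# Pairwise correlations between the remaining columns.
corr_matrix = numeric_cols.corr()
print(corr_matrix)
```

The resulting matrix can then be passed to seaborn, for example sns.heatmap(corr_matrix, annot=True).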

Pair plot

Violin plot

Customizing Seaborn plots

Exercise 2.2: Data Visualization Techniques with Seaborn

By completing this exercise, you will be able to use seaborn to:
- Create heatmaps
- Design pair plots and violin plots
- Customize Seaborn plots

Heatmap

Pair Plot

Violin Plot

Customizing Seaborn Plots

Self-Check

By the end of this module, you should be able to:

Create line plots and bar charts (vertical and horizontal) with matplotlib
Add axis labels, titles, and gridlines to a plot
Adjust axis limits and tick marks with ylim, xticks, and yticks
Build heatmaps, pair plots, and violin plots with seaborn
Compute a correlation matrix on numeric columns before plotting it
Customize seaborn plots with palettes, hues, and matplotlib styling

Module 3: Interactive Data Visualization with Plotly

Walkthrough 3.1: Interactive Charts and Dashboards with Plotly

Basic interactive chart

Common Pitfall: If a Plotly figure shows up blank in a Jupyter notebook, the renderer was likely not initialized. Make sure the setup cell that runs pyo.init_notebook_mode(connected=True) was executed first, and remember that you must call fig.show() – simply creating fig does not display anything.

Adding interactive elements

Designing a simple dashboard

Common Pitfall: Plotly subplot positions are 1-indexed, so the first cell is row=1, col=1 (not 0). The rows/cols you pass to make_subplots must also be large enough to hold every add_trace position; otherwise Plotly raises an out-of-range error.

Exercise 3.1: Interactive Charts and Dashboards with Plotly

By completing this exercise, you will be able to use plotly to:
- Create a basic interactive chart
- Add interactive elements: hover, zoom, and selection tools
- Design a simple dashboard with multiple charts

Basic Interactive Chart

Adding Interactive Elements

Designing a Simple Dashboard

Walkthrough 3.2: Creating a Dynamic Data Report

Selecting relevant data

Building a dynamic report

Adding contextual text and summaries

Exercise 3.2: Creating a Dynamic Data Report

By completing this exercise, you will be able to use pandas and plotly to:
- Select relevant data
- Build a dynamic report
- Add contextual text and summaries

Selecting Relevant Data

Building a Dynamic Report

Adding Contextual Text and Summaries

Self-Check

By the end of this module, you should be able to:

Create basic interactive charts with Plotly Express (px.line, px.scatter)
Add interactive elements such as hover text, color encoding, and custom labels
Build multi-panel dashboards with make_subplots and add_trace
Combine go.Scatter and go.Bar traces into a single figure
Add contextual annotations and summary text to a report with add_annotation
Confirm Plotly figures render by initializing the notebook mode and calling fig.show()

Module 4: Real-World Data Analysis Project

Walkthrough 4.1: Interactive Charts and Dashboards with Plotly

Selecting a Dataset

Questions to Ask:

What industry problem or area of interest does the dataset align with?
- Is the dataset relevant to economic analysis, market research, policy planning, or another industry?
Does the dataset provide sufficient complexity and scope for a thorough analysis?
- Does it include multiple variables and data points across different time periods and categories (e.g., income groups, countries)?
What specific questions or hypotheses do we want to explore with this dataset?
- Are we interested in comparing economic indicators across countries, understanding the impact of GDP per capita on other variables, or identifying trends over time?

Example:

Dataset: The economies dataset.
Industry Problem: Understanding economic disparities between countries and the impact of economic indicators on overall economic health.
Specific Questions:
- How do GDP per capita and gross savings vary across different income groups?
- How has the inflation rate changed over time for specified income groups?

Applying Cleaning, Transforming, and Analysis Techniques

Questions to Ask:

What cleaning steps are necessary to prepare the data for analysis?
- Are there any missing values that need to be handled? Are there any inconsistencies in data types?
What transformations are required to make the data analysis-ready?
- Do we need to create new columns, filter specific rows, or aggregate data by certain categories?
How can we analyze the data to uncover patterns, trends, or anomalies?
- What statistical methods or visualizations can we use to explore relationships between variables?

Example:

Cleaning:

Common Pitfall: Build new columns on a DataFrame you fully own (here economies_cleaned, created by fillna), not on a filtered slice of another DataFrame. Assigning to a column of a slice can trigger pandas’ SettingWithCopyWarning and may not update the data you expect. When you need an independent copy, use .copy().
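A sketch of this pattern on made-up rows. The fill value, the year and gdp_percapita columns, and the new column names are assumptions for illustration, not the course solution.

```python
import numpy as np
import pandas as pd

economies = pd.DataFrame({
    "code": ["AAA", "AAA", "BBB", "BBB"],
    "year": [2010, 2015, 2010, 2015],
    "gdp_percapita": [100.0, 110.0, 50.0, np.nan],
})

# fillna returns a new DataFrame, so economies_cleaned is fully owned.
economies_cleaned = economies.fillna({"gdp_percapita": 60.0})  # illustrative fill value

# Percent change within each country, never across countries.
economies_cleaned["gdp_pct_change"] = (
    economies_cleaned.groupby("code")["gdp_percapita"].pct_change()
)

# When filtering first, take an explicit copy before adding columns.
recent = economies_cleaned[economies_cleaned["year"] == 2015].copy()
recent["gdp_thousands"] = recent["gdp_percapita"] / 1000

print(economies_cleaned)
print(recent)
```

The first row of each country has no previous value, so its percent change is missing. The grouped calculation assumes the rows are already ordered by year within each country.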

Transforming:
Analyzing:

Initial Findings and Interpretation

Questions to Ask:

What do the initial findings tell us about the data?
- Are there any notable patterns or trends in the data? Are there any unexpected results?
How do these insights relate to the problem defined earlier?
- Do the findings help us understand economic disparities between countries? Do they provide insights into the impact of certain economic indicators?
What hypotheses can we test based on the initial results?
- Can we test hypotheses about the relationship between GDP per capita and other economic indicators? Can we refine our analysis to explore these hypotheses further?

Example:

Initial Findings:
- GDP per Capita vs. Gross Savings: The scatter plot shows that high-income countries generally have higher GDP per capita and gross savings. There seems to be a slight positive correlation between these two indicators.
- Inflation Rate Over Time: The line plot indicates that inflation rates vary significantly over time and across different income groups. Low and lower middle income countries tend to experience higher volatility in inflation rates.
Interpretation:
- These findings suggest that economic health, as measured by GDP per capita and gross savings, is strongly influenced by the income group of a country. High-income countries appear to have more stable and higher economic performance.
- The volatility in inflation rates among low-income countries may indicate economic instability, which could be a key area for policy intervention.
Hypotheses:
- Hypothesis 1: High-income countries have a higher average GDP per capita and gross savings compared to low-income countries.
- Hypothesis 2: Low-income countries experience greater volatility in inflation rates compared to high-income countries.
Next Steps:
- Conduct further analysis to test these hypotheses, using statistical methods to confirm the observed patterns.
- Explore other economic indicators to gain a more comprehensive understanding of economic disparities and trends.
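
A first descriptive look at Hypothesis 2 could compare the spread of inflation rates by income group. The sketch below uses made-up numbers purely to show the shape of the calculation; it is not a result from the economies dataset.

```python
import pandas as pd

# Illustrative values only: four observations per income group.
economies = pd.DataFrame({
    "income_group": ["High income"] * 4 + ["Low income"] * 4,
    "inflation_rate": [1.8, 2.0, 2.2, 2.0, 3.0, 12.0, 6.0, 15.0],
})

# Mean level and standard deviation (a simple volatility measure) per group.
volatility = economies.groupby("income_group")["inflation_rate"].agg(["mean", "std"])
print(volatility)
```

A larger standard deviation for one group is only descriptive evidence; a formal statistical test would be the next step.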

By following these steps, you can effectively select, clean, transform, and analyze the economies dataset to gain valuable insights and address common industry problems or research questions.

Walkthrough 4.2: Finalizing and Presenting Your Data Analysis Project

Integrate Feedback to Refine the Analysis

Questions to Ask:

What feedback have you received from peers, stakeholders, or mentors?
- Is there feedback on the clarity of the analysis, choice of visualizations, or the comprehensiveness of the analysis?
How can you incorporate this feedback into your analysis?
- Are there additional variables that need to be analyzed? Do you need to clean the data further or adjust the visualizations?
What new questions or hypotheses have emerged from the feedback?
- Does the feedback suggest new directions for the analysis or areas that need more focus?

Example:

Feedback:
- Peers suggested that the analysis should also consider the impact of unemployment rates.
- Stakeholders requested more clarity on the relationship between GDP per capita and inflation rates across different income groups.
Refining the Analysis:
- Additional data is needed to meet the request for more clarity; alternatively, a further drill-down into specific countries may be helpful.

Finalize the Presentation with Impactful Visuals and Narrative

Questions to Ask:

What are the key insights from the analysis that need to be highlighted?
- What are the most important findings that should be communicated to the audience?
How can you create impactful visuals that clearly convey these insights?
- What types of charts or visualizations best represent the data and findings?
What narrative will you use to guide the audience through the presentation?
- How will you structure the presentation to tell a compelling story with the data?

Example:

Key Insights:
- High-income countries have higher GDP per capita and gross savings.
- There is a positive correlation between GDP per capita and gross savings.
- Low-income countries experience greater volatility in inflation rates.
- Unemployment rates vary significantly across income groups.
Impactful Visuals:
Narrative:
- Introduction: Introduce the dataset and the industry problem. Explain why understanding economic indicators across different income groups is important.
- Key Findings: Present the key findings using the visualizations created. Highlight the relationship between GDP per capita, gross savings, inflation rates, and unemployment rates.
- Detailed Analysis: Dive deeper into each key finding, providing more context and interpretation. Explain the significance of the trends and patterns observed in the data.
- Conclusion: Summarize the insights and discuss potential implications for policy or business decisions. Suggest areas for further research or analysis based on the findings.

Rehearse the Presentation

Questions to Ask:

How will you structure your presentation to ensure a smooth flow?
- What order will you present the visualizations and insights? How will you transition between different sections?
How will you engage your audience and ensure they understand the key points?
- What techniques will you use to highlight important information and keep the audience’s attention?
What potential questions or feedback might you receive, and how will you address them?
- How will you prepare for questions about the data, analysis methods, or findings?

Example:

Structuring the Presentation:
- Start with an overview of the dataset and the industry problem.
- Move on to the key findings, using the most impactful visualizations to illustrate each point.
- Provide a detailed analysis of each finding, explaining the significance and implications.
- Conclude with a summary of insights and suggestions for further research.
Engaging the Audience:
- Use clear and concise language to explain complex concepts.
- Highlight key points using annotations or callouts on the visualizations.
- Encourage questions and interaction to keep the audience engaged.
Preparing for Questions:
- Anticipate common questions about the data sources, cleaning methods, and analysis techniques.
- Prepare explanations for any limitations of the data or analysis.
- Be ready to discuss potential next steps and areas for further research based on the findings.

By following these steps, you can effectively integrate feedback, finalize your presentation with impactful visuals and narrative, and rehearse to ensure a smooth and engaging delivery.

Self-Check

By the end of this module, you should be able to:

Select a dataset and frame the industry problem and questions it can answer
Apply cleaning and transforming techniques to prepare data for analysis
Engineer new columns, such as a grouped pct_change, for deeper analysis
Produce exploratory visuals and interpret initial findings into hypotheses
Integrate feedback to refine the analysis and choice of visualizations
Finalize and rehearse a presentation that tells a clear, impactful data story
