Tuesday Tip #32: Fancy filtering in pandas ๐ŸŽฉ


Hi Reader,

Starting next week, Iโ€™ll be offering a Black Friday sale on ALL of my courses.

Iโ€™ll send you the details tomorrow! ๐Ÿšจ


๐Ÿ”— Link of the week

โ€‹Visual Vocabulary (PDF)

Not sure which type of visualization to use? This beautiful poster from the Financial Times will help direct you to a suitable visualization based on the type of data you have and the story you're trying to tell.

Also available in Spanish, French, Chinese, and Japanese.


๐Ÿ‘‰ Tip #32: Explore, filter, and reshape your data with pandas

โ€‹Two weeks ago, I showed you three visualizations that I created from the World Happiness Report using Datawrapper.

โ€‹Last week, I walked through the steps needed to clean the data in order to create the first visualization, namely a world map.

Today, I'll show you how I transformed the raw data into the line chart data using Python's pandas library. Here are the specific steps:

  1. Read in & clean the data
  2. Decide which years to keep
  3. Keep only those years
  4. Keep countries that have ALL of those years
  5. Change to wide format
  6. Write data to CSV file

If you want to follow along with the code, you can run it online using Google Colab.

Hereโ€™s the end result:


Step 1: Read in & clean the data

To start, I read in the dataset from a URL using the read_excel function and selected which columns to keep.

Then, I standardized the column names and rounded the happiness column to 2 digits using the round method.


Step 2: Decide which years to keep

I wanted the line chart to span multiple years so that we could examine happiness trends over time. However, not every country has happiness data for every year, so I needed to decide which years to include.

To inform this decision, I selected the year column, used the value_counts method to count the number of times each year appears, and used the sort_index method to sort the resulting Series by year.

I knew the line chart would end with 2022, so I decided to start with 2011 since that year had a lot of data and was long enough ago to create a interesting visualization.


Step 3: Keep only those years

First, I used the range and set functions to create a set of integers from 2011 through 2022, which I called years.

Then, I used the isin method to create a boolean Series marking which year values were within the years set, and I used that Series to filter the DataFrame.

As you can see, the only years left in the dataset are 2011 through 2022.


Step 4: Keep countries that have ALL of those years

I decided to limit the visualization to only include countries that had complete data for the entire 12-year span. Thus, I needed to eliminate countries that were missing any years between 2011 and 2022.

There are many ways to accomplish this (as I learned from my Stack Overflow question), but the simplest method is to group the data by country, and then for each country, check whether the set of that country's years is equal to the years set.

You can see that there are only 83 countries left in the dataset, and each country has data for all 12 years.


Step 5: Change to wide format

In order to create this particular visualization, Datawrapper needs the year as the index, the country as the columns, and the happiness as the values within the table. In other words, the data needs to be in "wide" format rather than "long" format.

To reshape the data, I used the pivot method, specifying which columns should be used as the index, the columns, and the values.


Step 6: Write data to CSV file

Finally, I wrote the contents of the DataFrame to a file using the to_csv method. I included the index in the CSV (as the first column) since it contains the year values.


Publish with Datawrapper

I uploaded the CSV into Datawrapper, customized the chart's appearance (by setting United States to red, for example), and published it online.

Hereโ€™s the end result, which you can click on and interact with:


If you enjoyed this weekโ€™s tip, please forward it to a friend! Takes only a few seconds, and it really helps me reach more people! ๐Ÿ™

Also, stay tuned for an email tomorrow about my Black Friday sale! ๐Ÿค‘

- Kevin

P.S. Frog riding a unicycleโ€‹

Did someone awesome forward you this email? Sign up here to receive Data Science tips every week!

Learn Artificial Intelligence from Data School ๐Ÿค–

Join 25,000+ intelligent readers and receive AI tips every Tuesday!

Read more from Learn Artificial Intelligence from Data School ๐Ÿค–

Hi Reader, I'm thrilled to announce that my new book, Master Machine Learning with scikit-learn, is now on sale! Buy from Amazon I poured my heart and soul into making this the highest quality and most practical Machine Learning book available. Publishing this book is a dream come true, and I'd be grateful if you'd consider picking up a copy! ๐Ÿ™ Option 1: Get the paperback from Amazon ($19) Although most technical books of this size (300+ pages) tend to sell for at least $39, I've priced the...

Hi Reader, A few months ago, I announced that my new book, Master Machine Learning with scikit-learn, would be published in December. Since then, my personal life has undergone some dramatic changes ๐Ÿฅด During the transition, it has been challenging to focus on anything other than bare life essentials ๐Ÿฝ๏ธ ๐Ÿ›Œ ๐Ÿšฟ Thankfully, my life has begun to steady (yay!), and so in the past few weeks I've been able to wrap up some key pieces of the project! โœ… I'm thrilled to hold in my hands the FINAL proof...

Hi Reader, happy new year! ๐ŸŽ‰ I wanted to share with you the three most important articles I found that look back at AI progress in 2025 and look forward at what is coming in 2026 and beyond. Iโ€™ve extracted the key points from each article, but if you have the time and interest, Iโ€™d encourage you to read the full articles! ๐Ÿ’  The Shape of AI: Jaggedness, Bottlenecks and Salients By Ethan Mollick โ€œJaggednessโ€ describes the uneven abilities of AI: Itโ€™s superhuman in some areas and far below human...