Introduction to Data Analysis Using Pandas (PyCon UK 2023)

2023-09-22 15:30–17:00, Room L

Working with data can be challenging: it often doesn’t come in the best format for analysis, and understanding it well enough to extract insights requires both time and the skills to manipulate and visualize it. This session will teach you to effectively use pandas to make this process easier.

Pandas makes it possible to work with tabular data and perform every part of an analysis, from collection and manipulation through aggregation and visualization. While most of this session focuses on pandas, our discussion of visualization will also give a high-level introduction to Matplotlib (the library that pandas uses for its visualization features; used directly, it makes it possible to create custom layouts, add annotations, etc.) and Seaborn (another plotting library, which offers additional plot types and the ability to visualize long-format data).

Section 1: Getting Started With Pandas

We will begin by introducing the Series, DataFrame, and Index classes, which are the basic building blocks of the pandas library, and showing how to work with them. By the end of this section, you will be able to create DataFrames and perform operations on them to inspect and filter data.
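As a rough taste of what this section covers, the sketch below builds a small DataFrame, pulls out a Series and the Index objects, and then inspects and filters the data. The data and the names `df`, `temps`, and `warm_and_dry` are invented for illustration and are not taken from the session materials.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["London", "Cardiff", "Leeds", "Bristol"],
        "temp_c": [18.5, 16.0, 15.5, 17.0],
        "rainy": [False, True, True, False],
    }
)

# Selecting a single column returns a Series.
temps = df["temp_c"]
print(type(temps).__name__)

# Row labels and column labels are both Index objects.
print(df.index)
print(df.columns)

# Inspect the data: column types, non-null counts, summary statistics.
df.info()
print(df.describe())

# Filter rows with a Boolean mask: warmer than 16.5 degrees and not rainy.
warm_and_dry = df[(df["temp_c"] > 16.5) & ~df["rainy"]]
print(warm_and_dry)
```

The filter combines two conditions with `&` and negates a Boolean column with `~`; each condition needs its own parentheses because of operator precedence.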

Section 2: Data Wrangling

To prepare our data for analysis, we need to perform data wrangling. We will learn how to clean and reformat data (e.g. renaming columns, fixing data type mismatches), restructure/reshape it, and enrich it (e.g. discretizing columns, calculating aggregations, combining data sources).
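The wrangling steps named above might look roughly like the following sketch. The two small tables, the column names, and the bin edges are all assumptions made up for this example; the session itself may use different data and techniques.

```python
import pandas as pd

# Hypothetical raw data, invented for illustration only.
raw = pd.DataFrame(
    {
        "Station ID": ["A", "A", "B", "B"],
        "Date": ["2023-09-01", "2023-09-02", "2023-09-01", "2023-09-02"],
        "Temp (C)": ["18.5", "16.0", "15.5", "21.0"],  # numbers stored as strings
    }
)
stations = pd.DataFrame({"station": ["A", "B"], "city": ["London", "Leeds"]})

# Clean: rename columns and fix data type mismatches.
clean = raw.rename(
    columns={"Station ID": "station", "Date": "date", "Temp (C)": "temp_c"}
).assign(
    date=lambda x: pd.to_datetime(x["date"]),
    temp_c=lambda x: x["temp_c"].astype(float),
)

# Reshape: long format to wide format (one row per date, one column per station).
wide = clean.pivot(index="date", columns="station", values="temp_c")

# Enrich: discretize a numeric column into labeled bins (bin edges are arbitrary).
clean["temp_band"] = pd.cut(
    clean["temp_c"], bins=[0, 16, 20, 40], labels=["cool", "mild", "warm"]
)

# Enrich: calculate an aggregation per group.
mean_temp = clean.groupby("station")["temp_c"].mean()

# Enrich: combine data sources with a left join on the shared key.
merged = clean.merge(stations, on="station", how="left")

print(wide)
print(mean_temp)
print(merged)
```

Note that `pd.cut` uses right-closed intervals by default, so a reading of exactly 16.0 falls into the "cool" band here.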

Section 3: Data Visualization

The human brain excels at finding patterns in visual representations of data, so in this section we will learn how to visualize data using pandas, along with the Matplotlib and Seaborn libraries, to help us better understand it.
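A minimal sketch of how pandas and Matplotlib fit together is shown below: pandas draws the plot and hands back a Matplotlib Axes object, which can then be customized directly. The data, labels, and output file name are hypothetical, and Seaborn is left out to keep the example short.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs as a script; not needed in Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {"day": [1, 2, 3, 4, 5], "temp_c": [18.5, 16.0, 15.5, 21.0, 19.0]}
)

# The pandas plotting API returns a Matplotlib Axes object.
ax = df.plot(x="day", y="temp_c", kind="line", title="Daily temperature", legend=False)

# From here on we use Matplotlib directly to label and annotate the plot.
ax.set_ylabel("Temperature (°C)")
hottest = df.loc[df["temp_c"].idxmax()]
ax.annotate(
    "warmest day",
    xy=(hottest["day"], hottest["temp_c"]),
    xytext=(0, -30),
    textcoords="offset points",
    ha="center",
    arrowprops={"arrowstyle": "->"},
)

# Save the figure to a file (the file name is arbitrary).
ax.figure.savefig("daily_temperature.png")
```

Because the plot is an ordinary Matplotlib Axes, anything Matplotlib supports (custom layouts, extra annotations, styling) can be layered on top of what pandas produces.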

Target Audience

This tutorial is for anyone with basic knowledge of Python and an interest in learning how to analyze data in Python. We will be working with Jupyter Notebooks, so attendees should familiarize themselves with the interface (i.e., know how to run/edit a cell) beforehand.

Environment Setup

Please set up your environment prior to the session by following the instructions here.

Suitable for beginners: yes

Stefanie Molin

Stefanie Molin is a software engineer at Bloomberg in NYC, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of Hands-On Data Analysis with Pandas, which is currently in its second edition and has been translated into Korean. She holds a bachelor’s degree in operations research from Columbia University, as well as a master’s degree in computer science, with a specialization in ML, from Georgia Tech.