Data Analysis with Jupyter Notebook

Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It has become a popular tool for data cleaning, transformation, analysis and visualization. In this article, we will walk through a data analysis project in Jupyter Notebook using Python.

Getting Started

First, you need to install Jupyter Notebook on your machine. The easiest way is to install the Anaconda distribution, which includes Jupyter Notebook. Once it is installed, launch Jupyter Notebook from your terminal or from Anaconda Navigator. This opens a browser window showing the Notebook Dashboard.

The Dashboard allows you to create new notebooks or open existing ones. Notebook files have a .ipynb extension. Let’s create a new Python 3 notebook for our data analysis project.

Importing Data

We will use the popular Titanic dataset for this tutorial. To import the data, we run the following code in a code cell:

import pandas as pd

titanic = pd.read_csv('titanic.csv')

This loads the titanic.csv file into a pandas DataFrame. We can preview the first five rows using:

titanic.head()

And get a concise summary of each column's data type and non-null count using:

titanic.info()
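
Note that `info()` reports column types and non-null counts rather than numerical statistics. As a minimal sketch, two complementary checks are `describe()` for summary statistics of the numeric columns and `isna().sum()` for missing-value counts. The small `sample` DataFrame below is a made-up stand-in for the real data, used only so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data; the values are made up.
sample = pd.DataFrame({
    "Age": [22.0, None, 26.0, None],
    "Cabin": [None, "C85", None, None],
    "Survived": [0, 1, 1, 0],
})

# Summary statistics (count, mean, std, quartiles) for numeric columns.
stats = sample.describe()
print(stats)

# Number of missing values in each column.
missing_counts = sample.isna().sum()
print(missing_counts)
```

Running the same two calls on `titanic` should show which columns need attention before cleaning.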

Data Cleaning

Now we can start cleaning the data. Let's fill missing age values with the mean age:

mean_age = titanic["Age"].mean()
titanic["Age"] = titanic["Age"].fillna(mean_age)
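
The mean can be pulled upward by a few very old passengers, so the median is a common alternative for filling gaps. The sketch below compares the two on a made-up `ages` Series (not the real data) and confirms that no missing values remain after filling. It illustrates an alternative, not the method used in this article.

```python
import pandas as pd

# Hypothetical ages, made up for illustration; 80 acts as an outlier.
ages = pd.Series([22.0, None, 38.0, 26.0, None, 80.0], name="Age")

mean_age = ages.mean()      # skips missing values by default
median_age = ages.median()  # less sensitive to the outlier

# Fill the gaps with the median and confirm nothing is missing afterwards.
filled = ages.fillna(median_age)
remaining_missing = int(filled.isna().sum())
print(mean_age, median_age, remaining_missing)
```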

We can also drop identifier columns that we will not use in this analysis:

titanic = titanic.drop(['PassengerId', 'Name', 'Ticket'], axis=1)
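
In a notebook, cells are often re-run, and dropping the same columns a second time raises a `KeyError` because they no longer exist. A hedged sketch of a more forgiving variant uses the `columns=` keyword together with `errors="ignore"`. The `sample` DataFrame is a made-up stand-in for the real data.

```python
import pandas as pd

# Hypothetical two-row stand-in for the Titanic data; the values are made up.
sample = pd.DataFrame({
    "PassengerId": [1, 2],
    "Name": ["Passenger A", "Passenger B"],
    "Survived": [0, 1],
    "Age": [22.0, 38.0],
})

# columns= names the axis explicitly; errors="ignore" skips labels that are
# absent (here "Ticket") instead of raising a KeyError.
trimmed = sample.drop(columns=["PassengerId", "Name", "Ticket"], errors="ignore")
print(list(trimmed.columns))
```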

Data Analysis 

The data is ready for analysis. Let's explore the relationship between age and survival:

import seaborn as sns

sns.FacetGrid(data=titanic, hue="Survived", height=6).map(sns.kdeplot, "Age").add_legend()

This draws one kernel density estimate of age for each value of Survived, so we can compare the age distributions of survivors and non-survivors.
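
A numeric complement to the density plot is to bin ages and compute the survival rate per bin. The sketch below assumes illustrative bin edges and labels (`child`, `adult`, `senior`) that are not part of the original analysis, and it runs on a made-up `sample` DataFrame rather than the real data.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data; the values are made up.
sample = pd.DataFrame({
    "Age": [4, 30, 45, 70, 8, 25],
    "Survived": [1, 0, 1, 0, 1, 0],
})

# Assumed bin edges: (0, 12], (12, 60], (60, 120]. Adjust as needed.
sample["AgeGroup"] = pd.cut(
    sample["Age"],
    bins=[0, 12, 60, 120],
    labels=["child", "adult", "senior"],
)

# Survived is 0 or 1, so the group mean is the survival rate.
rate_by_age_group = sample.groupby("AgeGroup", observed=True)["Survived"].mean()
print(rate_by_age_group)
```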

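We can also use pandas groupby to compute the survival rate by passenger class. Because the code takes the mean of Survived within each class, a column coded as 0 or 1 yields the fraction of passengers who survived:
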
titanic[['Pclass', 'Survived']].groupby(['Pclass']).mean()

The same pattern extends to other columns. Because each cell displays its result as soon as it runs, Jupyter Notebook makes this kind of exploration interactive.
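
As one example of a further grouping, the sketch below computes the survival rate for each combination of two columns. It assumes the dataset has a `Sex` column, which the text above does not mention, and it uses a made-up `sample` DataFrame so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data; the values are made up.
sample = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Pclass": [1, 3, 1, 3, 3],
    "Survived": [1, 1, 0, 0, 1],
})

# Group by two keys; the result is a Series indexed by (Sex, Pclass).
rates = sample.groupby(["Sex", "Pclass"])["Survived"].mean()
print(rates)

# unstack() pivots Pclass into columns for easier side-by-side reading.
print(rates.unstack())
```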

Conclusion

In this article, we saw how Jupyter Notebook provides a powerful yet approachable way to do data cleaning, analysis and visualization. The ability to combine code, results, plots and narrative text helps turn the analysis into an engaging story. While our example used the Titanic dataset, the same workflow applies to other tabular datasets. Jupyter Notebook shortens the cycle of exploration and analysis, which leaves more time for digging into the data.
