Pandas Profiling: A Quick Way to Analyze Data

Aditya Kumar
3 min read · Mar 9, 2023


Pandas Profiling is a Python library that helps you understand a dataset quickly. It generates an HTML report that summarizes the data frame, shows the relationships among its fields, and much more.

Let's load some data into a data frame, and then apply Pandas Profiling to it.

Fig 1: Importing Pandas and reading CSV.
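
The screenshot for this step is not reproduced here, so below is a minimal sketch of what loading a CSV into a data frame can look like. The sample data, the column names, and the `load_data` helper are all hypothetical; in practice you would pass the path of your own CSV file to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """name,age,city,salary
Asha,29,Delhi,52000
Ben,35,Mumbai,61000
Chen,29,Delhi,
Dina,41,Pune,75000
"""


def load_data(source):
    """Read CSV data from a file path or file-like object into a data frame."""
    return pd.read_csv(source)


df = load_data(io.StringIO(SAMPLE_CSV))
print(df.head())  # first rows of the data frame
print(df.shape)   # (number of rows, number of columns)
```

With a real file, the call would simply be `load_data("your_file.csv")`, where the file name is whatever your dataset is called.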

Once we have the data frame, let's install Pandas Profiling.

Fig 2: Installing Pandas Profiling
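
The install command itself appears only in the screenshot, so the exact version that was pinned is not known and is left as a placeholder in the comment below. As a small sketch, the helper `installed_version` (a hypothetical name) uses the standard library to check whether the package is already present before you try to import it.

```python
from importlib import metadata

# Install from a terminal or notebook cell first, for example:
#   pip install pandas-profiling==<version>
# The pinned version is not stated in the text, so <version> is a placeholder.


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pandas-profiling"))
```

Note that the project has since been renamed to `ydata-profiling`, so on a recent setup you may need to check for that package name instead.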

I pinned a specific version because the installation did not work for me without one. Once the library is installed, we can import it and ask it to create an HTML file.

Fig 3: Creating the HTML file.
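
Since the screenshot is missing, here is a hedged sketch of how the report is typically generated. `ProfileReport` and its `to_file` method come from the library; the `build_report` wrapper, the output file name, and the report title are assumptions made for illustration. The import falls back to the older package name because the library was renamed from `pandas-profiling` to `ydata-profiling`.

```python
def build_report(df, output_path="report.html", title="Profiling Report"):
    """Profile a data frame and write the result to an HTML file."""
    # Imported lazily so this function can be defined before the library is installed.
    try:
        from ydata_profiling import ProfileReport  # newer package name
    except ImportError:
        from pandas_profiling import ProfileReport  # older package name

    profile = ProfileReport(df, title=title)
    profile.to_file(output_path)  # writes the HTML report to disk
    return output_path
```

Calling `build_report(df)` on the data frame loaded earlier should write `report.html` to the current directory, which you can then open in a browser.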

After execution, we have an HTML file that contains the information we need to understand the data.

Fig 4: Profiling HTML page.

Above is the first page of the HTML file, which gives brief information about the fields of the data frame.

Fig 5: Pandas Profiling HTML page overview.

The first tab gives an overview of the whole dataset: how many fields it has, whether any values are missing, and whether any two fields are correlated.

Fig 6: Dataset Statistics
Fig 7: Correlation among data
Fig 8: Detail of every field.

This helps reveal whether any field has repeated or missing values.
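
To make these figures concrete, the same kinds of counts can be cross-checked with plain pandas. This is only an illustrative sketch on a hypothetical data frame; it is not how the report itself computes its statistics.

```python
import pandas as pd

# Hypothetical data frame; column names and values are for illustration only.
df = pd.DataFrame(
    {
        "age": [29, 35, 29, None],
        "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
    }
)

missing_per_field = df.isna().sum()          # missing values in each field
distinct_per_field = df.nunique()            # distinct non-missing values in each field
duplicate_rows = int(df.duplicated().sum())  # rows identical to an earlier row

print(missing_per_field)
print(distinct_per_field)
print(duplicate_rows)
```

A field whose distinct count is far below the row count has many repeated values, which is the same signal the per-field detail in the report points to.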

Fig 9: Relation between 2 fields using graph

We can change the fields in the Interactions tab and check the relation between them.

Fig 10: Correlations among the data
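
For a quick sanity check of what a correlation view shows, pandas can compute a correlation matrix directly. The sketch below uses a hypothetical data frame and `DataFrame.corr`, which defaults to Pearson correlation; the report may use other measures as well, so treat this as an independent cross-check rather than a reproduction of the report.

```python
import pandas as pd

# Hypothetical numeric data: "score" rises with "hours", "errors" falls.
df = pd.DataFrame(
    {
        "hours": [1, 2, 3, 4, 5],
        "score": [2, 4, 6, 8, 10],
        "errors": [5, 4, 3, 2, 1],
    }
)

corr = df.corr()  # pairwise Pearson correlation between numeric fields
print(corr.round(2))
```

Values near 1 or -1 indicate a strong linear relationship between two fields, while values near 0 indicate little linear relationship.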

Using this library saves time on exploratory analysis: it presents the data as graphs alongside a descriptive summary of each field.


Written by Aditya Kumar

Data Scientist with 6 years of experience. To find out more, connect with me on https://www.linkedin.com/in/adityakumar529/
