Introduction
Exploratory Data Analysis. You hear the term a lot, but what actually is it? Are there different categories of data analysis? And how do you do it? In this post I hope to answer those questions and more. To start things off, there are 4 major types of data analysis: Exploratory, Descriptive, Predictive, and Inferential. This post focuses on exploratory data analysis; future posts will cover the other 3 types. To explain exploratory data analysis I will be using the article titled “What is Exploratory Data Analysis?” on Medium written by Prasad Patil (the article can be found here).
Summary of Article
The article first gives the raw definition of exploratory data analysis, which is (according to the article): “Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, to spot anomalies, to test hypothesis and to check assumptions with the help of summary statistics and graphical representations.” The article then demonstrates each part of that definition by taking a dataset and performing exploratory data analysis on it. It uses a multitude of functions to do so. The functions accomplish the following tasks:
- Getting to know the dataset (using the .head() and .tail() functions and the .shape attribute, which is accessed without parentheses)
- Seeing if there are null values in the data (using the .info() function)
- Getting some basic statistical measures run on the dataset (using the .describe() function)
- Counting how often each value appears in the dataset's quality column (using .quality.value_counts())
- The correlation between any two columns in the data (using Seaborn)
- A box and whisker plot of the data (using Seaborn)
- A distribution plot of each of the columns in the data (using Seaborn)
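To make the list above concrete, here is a minimal sketch of those steps in pandas. It is not the article's code: the small table, its column names (other than quality), and the plot_overview helper are made up for illustration, and I am assuming Seaborn's heatmap, boxplot, and histplot calls as reasonable stand-ins for the plots the article shows.

```python
import pandas as pd

# Hypothetical stand-in data, invented for this sketch (not the article's dataset).
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 9.5],
    "pH": [3.51, 3.20, 3.26, 3.16, None, 3.40],  # one missing value on purpose
    "quality": [5, 5, 6, 6, 7, 5],
})

# Getting to know the dataset.
first_rows = df.head(3)     # first 3 rows
last_rows = df.tail(3)      # last 3 rows
n_rows, n_cols = df.shape   # .shape is an attribute, so no parentheses

# Null values: .info() prints the non-null count per column,
# and .isnull().sum() gives the number of missing values directly.
df.info()
null_counts = df.isnull().sum()

# Basic statistical measures (count, mean, std, min, quartiles, max).
summary = df.describe()

# How often each value of the quality column occurs.
quality_counts = df["quality"].value_counts()

# Pairwise correlation between the numeric columns.
correlations = df.corr()


def plot_overview(frame):
    """Draw the three kinds of plots from the list (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(frame.corr(), annot=True)  # correlation heatmap
    plt.figure()
    sns.boxplot(data=frame)                # box and whisker plot per column
    for column in frame.columns:
        plt.figure()
        sns.histplot(data=frame, x=column) # distribution of one column
    plt.show()
```

If Seaborn and Matplotlib are installed, calling plot_overview(df) draws the plots. Everything above the helper runs with pandas alone.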
The article then closes by offering an alternate, broader definition of exploratory data analysis: “Exploratory Data Analysis is a philosophical and an artistical approach to gauge every nuance from the data at early encounter.”
My Take
Overall, the article was a good one. I really liked the way it was written and the insight it offered. It gave great visuals for each function it talked about, and I also liked how it provided definitions for terms that readers might not be familiar with. Through the article, I learned that exploratory data analysis is the first part of data analysis, where you get to know your dataset: its size, how much data there is, how good that data is, whether it is skewed, whether any columns are correlated, the distribution of each column, and much more. It not only showed me how to perform exploratory data analysis but also introduced me to some extremely useful functions that I didn’t know existed, such as the .describe() function. I will be sure to use .describe(), among the others discussed in the article, in my own data analysis. I do still have one question, however: Where is the line between exploratory data analysis and the next phase in data analysis?
Conclusion
All in all, this was a very good article. Its visuals, its definitions, and its use of a real dataset to walk you through exploratory data analysis back up in practice what it explains in words, which makes for an interesting and informative read. I really recommend you read it (the article can be found here).