This article was published as a part of the Data Science Blogathon.

Exploratory Data Analysis, or EDA, is an important step in any Data Analysis or Data Science project. EDA is the process of investigating the dataset to discover patterns and anomalies (outliers) and to form hypotheses based on our understanding of the dataset.

EDA involves generating summary statistics for numerical data in the dataset and creating various graphical representations to understand the data better. In this article, we will understand EDA with the help of an example dataset. We will use the Python language (Pandas library) for this purpose.

We will start by importing the libraries we will require for performing EDA. These include NumPy, Pandas, Matplotlib, and Seaborn.

We will now read the data from a CSV file into a Pandas DataFrame. You can download the dataset for your reference.

Fortunately for us, there are no missing values in this dataset. We will now proceed to analyze this dataset, observe patterns, and identify outliers with the help of graphs and figures.

We will look at the distribution of students across gender, race/ethnicity, their lunch status, and whether they have taken a test preparation course or not. We will be using a bar graph for this purpose.
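The steps described in this article (reading CSV data into a DataFrame, checking for missing values, and drawing bar graphs of the categorical columns) could be sketched roughly as follows. This is only an illustration under stated assumptions: the article's file name and exact column names are not given here, so the sketch uses a small made-up CSV held in memory, and the column names and values are hypothetical stand-ins based on the categories mentioned in the text. It uses only Pandas and Matplotlib to keep the example short.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the article's CSV file: a few made-up rows.
# Column names and values are assumptions based on the categories in the text.
CSV_TEXT = """gender,race/ethnicity,lunch,test preparation course
female,group B,standard,not completed
male,group C,free/reduced,completed
female,group C,standard,not completed
male,group A,standard,not completed
female,group B,free/reduced,completed
male,group C,standard,not completed
"""

# Read the CSV data into a DataFrame.
# For a real file, pass its path to pd.read_csv instead of a StringIO object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Count the missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Draw one bar graph per categorical column, showing how many
# students fall into each category.
columns = ["gender", "race/ethnicity", "lunch", "test preparation course"]
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
category_counts = {}
for ax, column in zip(axes.flat, columns):
    counts = df[column].value_counts()
    category_counts[column] = counts
    counts.plot(kind="bar", ax=ax, title=column)
    ax.set_ylabel("Number of students")
fig.tight_layout()
```

In an interactive session, calling `plt.show()` afterwards displays the four charts. With the real dataset, only the argument to `pd.read_csv` and possibly the entries in `columns` would need to change.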