Zomato Data Analytics and Visualization PT2
In this part, we will plunge into data analytics and answer some business questions from this dataset.
Dataset
The first step in this project is to get the datasets and the .ipynb file (Jupyter Notebook) to practice with; you can get all of them here.
Data Exploration Process
1. Load your data: This involves importing the necessary libraries in your Jupyter notebook and reading the dataset into a DataFrame, as shown here.
2. Know your data: This is very important. You need at least a summary of what your data contains, so you know the data types and can decide whether any of them need to be changed before the data is passed to a model. The summary also lets you check for missing values.
3. Identify missing values: This is one of the most important steps in data analysis. Missing values are easy to find with df.isnull().sum(), which counts the missing entries in each column.
It’s quite evident that the only feature with missing values is Cuisines.
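Steps 1 to 3 can be sketched in pandas as follows. This is a minimal sketch, not the notebook’s actual code: the file name in the comment is hypothetical, the column names are assumed to follow the public Kaggle Zomato dataset, and a tiny made-up DataFrame stands in for the real file so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Step 1: load the data. With the real file this would be something like
# pd.read_csv("zomato.csv", encoding="latin-1"); that file name and encoding
# are assumptions, so a tiny made-up DataFrame stands in for it here.
df = pd.DataFrame({
    "Restaurant Name": ["Spice Hub", "Burger Barn", "Curry Corner", "Taco Town"],
    "Country Code": [1, 216, 1, 216],
    "Cuisines": ["North Indian", "American", np.nan, "Mexican"],
    "Aggregate rating": [3.5, 4.1, 0.0, 3.2],
})

# Step 2: know your data. info() prints column names, dtypes and non-null counts.
df.info()
print(df.dtypes)

# Step 3: count the missing values in every column...
missing = df.isnull().sum()
print(missing)

# ...and keep only the columns that actually have gaps.
print(missing[missing > 0])
```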
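4. Replace missing values: A good way to handle the gaps here is to fill them with the mode. In statistics, the mode is the most frequently occurring value. Because Cuisines is a text (categorical) column, it has no meaningful mean or median, so the mode is the safe choice: it fills the gaps with a value that already appears in the data instead of introducing a new one. The mode can be obtained with df.mode(); for a single column, df['Cuisines'].mode()[0] gives the most frequent value (mode() returns a Series because there can be ties).
5. Join datasets: There are two datasets, the initial Zomato dataset and a country dataset, so we need to join them, which we can do with the pandas merge function.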
pd.read_excel(): loads the country dataset from its Excel file into a new variable, df_country.
pd.merge(): joins the two datasets, that is, the country dataset and the initial dataset we had.
on: names the column the two datasets share (the country column), which is used as the join key. The separate how parameter sets the type of join: 'left' keeps every row of the first dataset, 'right' keeps every row of the second, and 'inner' and 'outer' are also available.
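Steps 4 and 5 might look like this in pandas. Again, this is a sketch under assumptions: both small DataFrames are made-up stand-ins, the key column name "Country Code" and the result name final_df are hypothetical, and how="left" is just one of the join types described above.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the initial dataset (column names are assumptions).
df = pd.DataFrame({
    "Restaurant Name": ["Spice Hub", "Burger Barn", "Curry Corner", "Taco Town", "Naan Stop"],
    "Country Code": [1, 216, 1, 216, 1],
    "Cuisines": ["North Indian", "American", np.nan, "Mexican", "North Indian"],
})
# In the notebook this table would come from pd.read_excel(...) on the
# country file; here it is a made-up two-row stand-in.
df_country = pd.DataFrame({
    "Country Code": [1, 216],
    "Country": ["India", "United States"],
})

# Step 4: fill missing cuisines with the mode (most frequent value).
# Series.mode() returns a Series because there can be ties, so take [0].
cuisine_mode = df["Cuisines"].mode()[0]
df["Cuisines"] = df["Cuisines"].fillna(cuisine_mode)

# Step 5: join on the shared key column. how="left" keeps every row of df,
# even if its code were missing from df_country.
final_df = pd.merge(df, df_country, on="Country Code", how="left")
print(final_df)
```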
Breaking off the Loop
Yeah, so I decided to focus on just one part of the data: the rating features. Ratings are really important because they reveal the pain points of Zomato’s customers.
Understanding them helps the company make the right decisions and improve.
Ratings
- Create a table: We need to build a DataFrame (a table) that summarizes the ratings so we can analyze them properly. The code for this uses these functions:
groupby(): groups the rows by the values of one or more columns, here the rating features
size(): counts the number of rows in each group
reset_index(): turns the group labels back into ordinary columns, converting the grouped result into a DataFrame
rename(columns=...): renames columns, for example giving the count column (named 0 by default) a readable name
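Chained together, these four calls produce the rating table. The sketch below groups made-up rows by two assumed rating columns, "Aggregate rating" and "Rating text"; the real notebook may group by different or additional features, and the name "Rating Count" is a hypothetical choice.

```python
import pandas as pd

# Made-up rows; the column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "Aggregate rating": [0.0, 3.2, 3.2, 4.1, 0.0, 0.0, 3.5],
    "Rating text": ["Not rated", "Average", "Average", "Very Good",
                    "Not rated", "Not rated", "Good"],
})

ratings = (
    df.groupby(["Aggregate rating", "Rating text"])  # one group per rating
      .size()                                         # number of rows in each group
      .reset_index()                                  # group keys become columns again
      .rename(columns={0: "Rating Count"})            # the count column is named 0 by default
)
print(ratings)
```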
Business Inferences and Conclusions
This is the moment we have been waiting for: drawing inferences and making business decisions is the most important part of data analysis.
Let’s ask some questions and use Python as a tool for data analytics to answer those questions.
Questions
1. What’s the average rating from customers?
The easiest way to get insight into data is to visualize it. I visualized the data with the Python seaborn library, plotting “Aggregate rating” against “rating count”, which is how many times each rating occurs in the data. The graph clearly shows that a rating of 0 is the most frequent, and that most of the other ratings fall between 2.8 and 3.7, that is, roughly 3 to 4 stars.
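A plot of this kind, plus a count-weighted average, can be sketched with seaborn as below. The rating table here is made-up toy data, not the article’s figures; the column names are assumptions, and matplotlib.use("Agg") is only there so the script runs outside a notebook. Because 0 ratings are so common, the sketch computes the average both with and without them, since including them drags the mean down.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rating table (toy numbers, not the article's results).
ratings = pd.DataFrame({
    "Aggregate rating": [0.0, 3.0, 3.5, 4.0],
    "Rating Count": [4, 3, 2, 1],
})

# Bar chart of how often each rating occurs.
fig, ax = plt.subplots(figsize=(8, 4))
sns.barplot(data=ratings, x="Aggregate rating", y="Rating Count", ax=ax)
ax.set_title("Rating count per aggregate rating")


def weighted_mean(table):
    """Average rating, weighting each rating value by how often it occurs."""
    return (table["Aggregate rating"] * table["Rating Count"]).sum() / table["Rating Count"].sum()


mean_all = weighted_mean(ratings)
mean_rated = weighted_mean(ratings[ratings["Aggregate rating"] > 0])
print(f"All ratings: {mean_all:.2f}")     # 2.00 for this toy table
print(f"Excluding 0: {mean_rated:.2f}")  # 3.33 for this toy table
```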
2. Which countries gave 0 ratings?
We want to know which countries the 0 ratings come from. The country with the most 0 ratings is India.
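One way to count the 0 ratings per country is to filter first and then group. This sketch assumes a hypothetical merged DataFrame, final_df, with "Country" and "Aggregate rating" columns; the rows are invented for illustration.

```python
import pandas as pd

# Made-up merged rows (after joining the country dataset).
final_df = pd.DataFrame({
    "Country": ["India", "India", "India", "United States", "United Kingdom", "United States"],
    "Aggregate rating": [0.0, 0.0, 3.9, 0.0, 4.2, 3.1],
})

zero_by_country = (
    final_df[final_df["Aggregate rating"] == 0]  # keep only the 0 ratings
    .groupby("Country")
    .size()                                      # number of 0 ratings per country
    .sort_values(ascending=False)
    .reset_index(name="Zero rating count")
)
print(zero_by_country)
```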
3. The number of people using each currency
This will help the company decide which payment gateway systems to support. Indian rupees are clearly the most used, followed by dollars ($).
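Counting entries per currency is another groupby. The column names and currency labels below are assumptions modelled on the Kaggle Zomato data, and the rows are made up.

```python
import pandas as pd

# Made-up merged rows; "Country"/"Currency" names and labels are assumptions.
final_df = pd.DataFrame({
    "Country": ["India", "India", "United States", "India", "United States"],
    "Currency": ["Indian Rupees(Rs.)", "Indian Rupees(Rs.)", "Dollar($)",
                 "Indian Rupees(Rs.)", "Dollar($)"],
})

currency_counts = (
    final_df.groupby(["Country", "Currency"])
    .size()                                  # rows per country/currency pair
    .reset_index(name="Count")
    .sort_values("Count", ascending=False, ignore_index=True)
)
print(currency_counts)
```

For a plain tally without the country, final_df["Currency"].value_counts() gives the same counts.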
4. Which countries have online deliveries?
Answering this helps identify markets that Zomato has not yet penetrated, and it can lead to further questions about why people in some countries don’t use online delivery.
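Here is a sketch of how the online-delivery question could be answered, assuming a "Has Online delivery" column holding "Yes"/"No" values and a hypothetical merged final_df; the data is made up. Countries that never appear with "Yes" are the candidates for an unpenetrated market.

```python
import pandas as pd

# Made-up merged rows; the column names and "Yes"/"No" values are assumptions.
final_df = pd.DataFrame({
    "Country": ["India", "India", "India", "United States", "United States", "Singapore"],
    "Has Online delivery": ["Yes", "No", "Yes", "No", "No", "No"],
})

# Rows per (delivery flag, country) pair.
delivery = (
    final_df.groupby(["Has Online delivery", "Country"])
    .size()
    .reset_index(name="Count")
)
print(delivery)

# Countries where at least one entry offers online delivery...
with_delivery = sorted(final_df.loc[final_df["Has Online delivery"] == "Yes", "Country"].unique())
# ...and countries with none at all (possible untapped markets).
without_delivery = sorted(set(final_df["Country"]) - set(with_delivery))
print(with_delivery, without_delivery)
```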
Conclusion
I’m glad you followed along this far. Data analytics has proven to be a great tool for growing a business, making informed decisions, and satisfying customers, which is the basis of any company’s growth.
If you found value here, make sure you subscribe, like, clap and share.
SEE YOU LATER!!!😊😊😊