Retail Sales Alysis
Analysing Sales Data To Optimize Product Inventory
Management
BY
Olaseni Oyefejo
December, 2023
Table of Content
Introduction
Tools used
Skills
Data sources
Data Preprocessing
Exploratory Data Analysis
Analysing the data
Sales Forecasting
Insights and Recommendations
Reflections and Conclusions
INTRODUCTION
This project is a real world case study scenario within the retail industry. A retail chain seeks
actionable insights to optimize their product inventory management strategy.
TOOLS USED
1. Microsoft Excel
2. Microsoft Power BI
SKILLS
1.
2.
3.
4.
5.
Extract, Load and Transform (ETL)
Data visualisation
Data analysis
Statistical analysis
Problem solving
DATA SOURCES
The dataset was shared with participants of the 20DaysOfDataChallenge organised by Code
Vixen Academy. It consists of 307,646 rows and 9 columns.
Snapshot of dataset:
Link to dataset:
Codevixens Data Challenge Project Dataset
DATA PREPROCESSING
I downloaded the data in an excel format and made a copy which I began work on. I made the
following decisions in the preprocessing of the data:
1. Blanks:
● I replaced blank rows in the suppliers column with “open market purchase”
● Replaced blank rows in item type column with type “wine” It had similar item description
with those in the wine category
● 3 rows that had a blank retail sales were replaced with 0. It is relatively small. The
supplier was blank and the warehouse sale and retail transfers had 0
2. Outliers: Lots of outliers were present in the dataset. This was expected since after all its
a sales distribution cutting across different years and item types. However, some
appeared to be completely out of line, out of their league and were treated appropriately
as follows:
● Retail sales column: Two rows with the item description “ice” were replaced with the
average of ice types “688”. These values were way higher than other values in the ice
category
● Warehouse sales: A row with item description “istore training overview” was removed.
This didn't seem like an item and it was the only one in its category which is unusual in a
data of about 30,000 rows
● The retail sale of a row with item description “street supplies” was replaced with the
average of street supplies category “7.01”
3. Split texts:
● Splitted the item description column into description and unit
4. Calculated columns:
● Created a date column which combines the month and year column for an efficient time
analysis
P.S. The data had no duplicated rows.
Here is a snapshot of the dataset after undergoing the ETL processes:
EXPLORATORY DATA ANALYSIS
To further understand the data, I conducted an exploratory data analysis (EDA) on the Retail
sales. I implemented the Pivot tables sn scatter plot to do this.
Row
Labels
Sum of
RETAIL
SALES
Min of
RETAIL
SALES2
Max of
RETAIL
SALES2
Count of
RETAIL
SALES2
Average of
RETAIL SALES2
LIQUOR
802,691.43
-2.47
1,816.49
64,910.00
12.37
WINE
746,498.59
-6.49
813.84
187,641.00
3.98
BEER
574,220.53
-0.13
1,494.00
42,413.00
13.54
NON-AL
COHOL
31,276.31
-0.08
854.00
1,908.00
16.39
STR_SU
PPLIES
2,740.88
0.00
368.64
405.00
6.77
REF
663.63
-0.08
36.00
126.00
5.27
KEGS
0.00
0.00
0.00
10,146.00
0.00
DUNNA
GE
0.00
0.00
0.00
95.00
0.00
Grand
Total
2,158,091.37
-6.49
1,816.49
307,644.00
7.01
Median = 0.32
From the above Table and Scatter plot, we notice that the majority of the sales are distributed
between 0 and 10. The higher values we see could be based on the item type or increase in
prices over the years.
We can deduce the following from the pivot table:
● “Liquor” contributes the Highest Total sale
● “Non Alcohol” contributes the Highest average sale
● “Wine” is the most frequently purchased item
I further explored the retail sales based on the Item types using the box plots
This reveals the reason behind those outliers we noticed in the scatter plot. Liquor has the
highest incremental values (Outliers) followed by Beer, Non-Alcohol and Wine.
ANALYSING THE DATA
To investigate factors affecting the retail sales, a scatter plot was plotted between Retail sales
and retail transfers.
As observed in the plot above, As the Retail transfers increase, the Retails sales increases. This
indicates a positive correlation between the Retail sales and the Retail transfers irrespective of
the Item type.
Then i wanted to investigate if it was the same thing with the wholesales sales.
Alas! The above was what I got. The trendline shows a positive correlation as well, however, the
variability at the right and left warrant further investigation which will not be covered in this
project due to time constraint. I however was curious about the item types that correlated. I
filtered the item type to observe each type and discovered that “Beer” and “Wine” were the only
item types that had a positive correlation.
SALES FORECASTING
Next i decided to study the trend of the average sales over the years and make a 3 year
forecast
The chart shows the fall in average sales in 2018. After which there was an increase of 19% in
2019 and an increase of 12% in 2020. As a result of this trend, it is forecasted that there will be
a decrease of 8% in 2021, 2022 and 2023. It further gave an upper bound of 11% increase and
a lower bound of 26% decrease. These are the forecasted worst and best case scenarios.
This forecast was made with a confidence interval of 95%.
DATA VISUALISATION
This is a visualisation summarising the Retails sales data. The retail sales makes up about 21%
of the total sum of sales while the Warehouse sales accounted for the rest. About 186,000
counts of items were sold, 122,000 items remained unsold and 113 items suffered losses. It also
shows the top 10 suppliers frequented by count of retail sales made.
INSIGHTS AND RECOMMENDATIONS
1. Correlation Analysis: The perfect positive correlation between retail sales and retail
transfers suggests that as one increases, the other also increases consistently. This
indicates a strong relationship between sales and the transfer of products between retail
locations. Evaluate the efficiency of the transfer process in response to changing sales
patterns. Ensure that transfers are dynamic and responsive to shifts in demand.
2. The decreasing sales forecast warrants the evaluation of the factors contributing to the
decline. Explore the reasons behind the sales forecast decrease. Identify any shifts in
customer preferences, changes in market conditions, or external factors influencing
demand. This insight is crucial for adjusting inventory levels and product assortments.
3. Collaboration with suppliers: Strengthen collaboration with suppliers to establish flexible
and responsive supply chains. Discuss the possibility of adjusting order quantities and
delivery schedules based on changing retail sales patterns.
4. Optimise Warehouse sales: Among the item types, Wine and Beer retail sales had a
positive correlation with their corresponding Warehouse sales. There is a need to
optimise the warehouse sales of these items to ensure that warehouse inventory aligns
with their expected retail demand.
5. Consider implementing targeted promotions and marketing strategies to stimulate retail
sales. A decline in the sales forecast may be addressed through initiatives to attract
customers and drive sales.
6. Embrace agile supply chain practices that allow for quick adjustments in inventory levels.
This may involve implementing just-in-time inventory management or utilizing
demand-driven replenishment strategies.
7. Establish a continuous monitoring system to track the effectiveness of implemented
strategies. Regularly update and adapt inventory management practices based on
ongoing sales trends and market dynamics.
8. Finally, there is the need to implement a more robust model that incorporates relevant
variables, market trends, and seasonality.
CONCLUSION AND REFLECTION
This was a very insightful project and as I analysed the data, I discovered more aspects that
needed broader, deeper and further investigation which I would make further research on but
cannot do at the moment due to time constraint. 307,645 rows were analysed however
emphasis was on the Retail sales data.
Aside from time constraints, I encountered the issue of delay in queries load time due to the
large volume of the data.
In further analysis, I would normalise the dataset by creating smaller tables with each of the item
types being a table on its own. I would also add more pages, bookmarks and more features to
the dashboard to make it more interactive for the user.