PowerPoint deck for demand forecasting
Demand forecasting
By Apurv Jain
1. Introduction
The art of predicting demand for a product or service at some future date on the basis of the present and past behaviour of some related event.
Benefits of Demand Forecasting
Planning and decision making: Demand forecasting helps businesses maintain the right level of inventory to meet customer demand without overstocking or understocking.
Business forecasting: Accurate demand forecasts help optimize resource allocation across the organization.
Production analysis: Production analysis driven by demand forecasts can lead to cost reduction.
Types of Demand Forecasting
Short Term: Concerned with a short time period, usually less than a year.
Medium Term: Needed when a company is considering expanding or modifying its production facilities.
Long Term: Needed for capacity expansion, such as the growth of the firm, recruitment and diversification policies; usually more than 3-5 years.
Problem Statement
Create a forecast of what the visits will look like over the next year, based on the historical data points.
Solution
Read the data (119 rows, 2 columns: 'Date' and 'selected period') into a Python Jupyter notebook.
Performed data manipulation, exploratory data analysis and feature engineering.
Built a demand forecasting model using XGBoost and tuned its hyperparameters to improve accuracy.
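A minimal sketch of the loading step, assuming the data is a CSV file whose two columns are named exactly `Date` and `Selected Period` (the file format and exact column spelling are assumptions). A small inline sample stands in for the real file:

```python
import io

import pandas as pd


def load_visits(source) -> pd.DataFrame:
    """Read the weekly visits data and parse the Date column.

    `source` can be a file path or any file-like object; the column
    names 'Date' and 'Selected Period' are assumed, not confirmed.
    """
    df = pd.read_csv(source, parse_dates=["Date"])
    # keep rows in chronological order, which time-series splits rely on
    return df.sort_values("Date").reset_index(drop=True)


# hypothetical three-row sample standing in for the real 119-row file
sample = io.StringIO(
    "Date,Selected Period\n"
    "2012-01-02,120\n"
    "2011-12-26,100\n"
    "2012-01-09,95\n"
)
df = load_visits(sample)
print(df.shape)  # (3, 2)
```

Sorting after parsing matters here because every later step (rolling trend, chronological split, `TimeSeriesSplit`) assumes row order equals time order.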
Understanding the Time Series plot
The data consists of weekly observations spanning Dec 26, 2011 to Mar 31, 2014, with random spikes and dips across the whole time series.
There is no visible cyclic pattern, i.e. no seasonality in the data, which is further supported by the ACF (autocorrelation function) plot produced with statsmodels.
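The ACF check can also be done numerically. Below is a numpy sketch of the sample autocorrelation (the quantity `statsmodels.graphics.tsaplots.plot_acf` draws) together with the simple constant bound ±1.96/√n; note that statsmodels' plot uses Bartlett's formula by default, which widens the band with lag. With weekly data, yearly seasonality would appear as a spike near lag 52. Both series below are synthetic, not the project's data:

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[: len(x) - k] * dev[k:]) / denom
                     for k in range(max_lag + 1)])


def significant_lags(x, max_lag):
    """Lags >= 1 whose |r_k| exceeds the approximate 95% bound 1.96/sqrt(n)."""
    r = sample_acf(x, max_lag)
    bound = 1.96 / np.sqrt(len(x))
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


rng = np.random.default_rng(0)
n = 119  # same length as the weekly series, but synthetic values
noise = rng.normal(size=n)  # no seasonal structure
seasonal = np.sin(2 * np.pi * np.arange(n) / 52) + 0.1 * rng.normal(size=n)

print(significant_lags(seasonal, 60))  # includes lag 52 for the seasonal series
```

For a series with no seasonality, only a scattering of lags (about 5% by chance) should cross the bound, with no regular repetition.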
The histogram and box plot of the data clearly show some outliers, which have to be handled before modelling.
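The deck does not state how the outliers were removed. One common choice, and the rule a box plot's whiskers follow by default, is to drop points more than 1.5 × IQR beyond the quartiles. A sketch under that assumption, with the column name `Selected Period` assumed and the values invented:

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose `column` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = df[column].between(lower, upper)  # inclusive on both ends
    return df[mask].reset_index(drop=True)


# invented values: one spike (900) and one dip (5)
toy = pd.DataFrame({"Selected Period": [100, 105, 98, 110, 900, 102, 5, 97]})
clean = remove_iqr_outliers(toy, "Selected Period")
print(len(toy), "->", len(clean))  # 8 -> 6
```

Dropping rows leaves gaps in the weekly sequence; an alternative that keeps the calendar intact is to clip the values to the bounds instead.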
Analysis of Trend and Seasonality
The trend of the data follows a zigzag pattern: it first increases, then decreases, and then increases again.
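One way to make such a trend visible is a centred rolling mean. The 13-week window (about one quarter) is an arbitrary choice for illustration, not a setting from the original analysis, and the series is synthetic, shaped to rise, fall and rise again:

```python
import numpy as np
import pandas as pd

# synthetic weekly series on the same calendar as the data (Mondays, 119 weeks)
dates = pd.date_range("2011-12-26", periods=119, freq="W-MON")
level = np.concatenate([np.linspace(100, 160, 40),   # rising
                        np.linspace(160, 110, 40),   # falling
                        np.linspace(110, 170, 39)])  # rising again
series = pd.Series(level, index=dates)

# centred 13-week moving average; the first and last 6 points are NaN
trend = series.rolling(window=13, center=True).mean()
```

A centred window avoids the lag a trailing window introduces, at the cost of undefined values at both ends of the series.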
Figure: ACF plot of the data. It shows no seasonality in the data, as there are no repeating spikes at regular lags.
Feature Engineering
Carried out feature engineering to create date-based features such as day of week, day of year, month-year, quarter and year, to get more accurate predictions:
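import pandas as pd

df["Date"] = pd.to_datetime(df["Date"])  # the .dt accessors need a datetime column
df["day_of_week"] = df["Date"].dt.dayofweek
df["day_of_year"] = df["Date"].dt.dayofyear
df["month_year"] = df["Date"].dt.to_period("M")  # Period dtype: needs a numeric encoding before XGBoost
df["quarter"] = df["Date"].dt.quarter
df["year"] = df["Date"].dt.year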
Analysis of distribution
Figure: demand by month and by year. No general trend can be seen; the graphs show a zigzag pattern.
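The month-wise and year-wise views behind such a figure can be computed with `pivot_table` and `groupby`. A sketch that reuses the `year` feature from the feature-engineering slide and adds a hypothetical `month` column; the visit counts are random placeholders:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2011-12-26", periods=119, freq="W-MON")
rng = np.random.default_rng(1)
df = pd.DataFrame({"Date": dates,
                   "Selected Period": rng.integers(80, 160, size=len(dates))})

df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month  # hypothetical helper column for this view

# mean weekly visits per (month, year): one line per year when plotted
by_year_month = df.pivot_table(index="month", columns="year",
                               values="Selected Period", aggfunc="mean")
# mean weekly visits per calendar month, pooled over all years
by_month = df.groupby("month")["Selected Period"].mean()
```

Because the data starts in late December 2011 and ends in March 2014, the 2011 and 2014 columns are mostly empty, so year-to-year comparisons rest essentially on 2012 and 2013.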
Model building using XGBoost
● Divided the data into X (feature) and Y (target) variables and split it into training and test sets, with 90 rows for training and 24 rows for testing.
X variables = "day_of_year", "month_year", "quarter", "year"
Y variable = "Selected Period"
● Used XGBoost to build the demand forecasting model, as it is known for its high predictive accuracy. It can capture complex relationships between demand and various factors, making it suitable for forecasting demand even when the patterns are intricate.
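For time series the split has to respect time order: the test rows must come after the training rows, otherwise the model is trained on the future. A sketch of a chronological 90/24 split on a synthetic frame (scikit-learn's `train_test_split(X, y, test_size=24, shuffle=False)` does the same). Since `month_year` is a pandas `Period` column that XGBoost cannot take directly, this sketch swaps in a numeric month index; `month_index` is a hypothetical name, not from the original code:

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the 90 + 24 = 114 modelling rows
dates = pd.date_range("2011-12-26", periods=114, freq="W-MON")
df = pd.DataFrame({"Date": dates,
                   "Selected Period": np.arange(114, dtype=float)})

df["day_of_year"] = df["Date"].dt.dayofyear
df["quarter"] = df["Date"].dt.quarter
df["year"] = df["Date"].dt.year
# numeric replacement for the Period-typed 'month_year' column
df["month_index"] = df["Date"].dt.year * 12 + df["Date"].dt.month

features = ["day_of_year", "month_index", "quarter", "year"]
target = "Selected Period"

train_size = 90
# no shuffling: first 90 weeks for training, the remaining 24 for testing
X_train, X_test = df[features].iloc[:train_size], df[features].iloc[train_size:]
y_train, y_test = df[target].iloc[:train_size], df[target].iloc[train_size:]
```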
● Conducted hyperparameter tuning, using a grid search over time-series cross-validation splits, to get more accurate predictions:
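from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

# 4 expanding-window folds, each validating on the next 10 weeks
cv_split = TimeSeriesSplit(n_splits=4, test_size=10)
model = XGBRegressor()
parameters = {
    "max_depth": [3, 4, 5, 6, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    "n_estimators": [100, 300, 500, 700, 900, 1000],
    "colsample_bytree": [0.3, 0.5, 0.7],
}
# exhaustive search over all 450 combinations; default scoring for a regressor is R^2
grid_search = GridSearchCV(estimator=model, cv=cv_split, param_grid=parameters)
grid_search.fit(X_train, y_train)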
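The problem statement asks for next year's visits, which means scoring dates beyond the data. A sketch of building the 52 future weekly rows with the same kind of date features, assuming weekly dates continue from the last observation (Mar 31, 2014) and using a hypothetical numeric `month_index` column in place of the Period-typed `month_year`:

```python
import pandas as pd


def future_features(last_date, horizon=52):
    """Feature rows for the `horizon` weeks after `last_date` (weekly spacing)."""
    dates = pd.date_range(pd.Timestamp(last_date) + pd.Timedelta(weeks=1),
                          periods=horizon, freq="7D")
    out = pd.DataFrame({"Date": dates})
    out["day_of_year"] = out["Date"].dt.dayofyear
    out["month_index"] = out["Date"].dt.year * 12 + out["Date"].dt.month
    out["quarter"] = out["Date"].dt.quarter
    out["year"] = out["Date"].dt.year
    return out


future = future_features("2014-03-31")
# with a fitted search (not run here):
# forecast = grid_search.best_estimator_.predict(future.drop(columns="Date"))
```

The columns passed to `predict` must match the training features in name and order. Tree ensembles such as XGBoost do not extrapolate: for `year` and `month_index` values beyond the training range they behave as if they had seen the largest training values, which is one more reason to treat a one-year-ahead forecast from this setup with caution.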
Model Evaluation
MAE:-
MSE:-
MAPE:-
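The three metrics compare the test-set actuals with the predictions. A numpy sketch (scikit-learn's `mean_absolute_error`, `mean_squared_error` and `mean_absolute_percentage_error` compute the same quantities, the last one as a fraction rather than a percentage); the numbers are made up:

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return MAE, MSE and MAPE (in percent) for paired actuals/predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    # MAPE is undefined when an actual value is 0
    mape = np.mean(np.abs(err / y_true)) * 100
    return mae, mse, mape


mae, mse, mape = regression_errors([100, 200, 400], [110, 180, 400])
print(mae, mse, mape)
```

Here the errors are -10, 20 and 0, so MAE = 30/3 = 10, MSE = (100 + 400 + 0)/3 ≈ 166.7 and MAPE = (10% + 10% + 0%)/3 ≈ 6.7%. MSE squares the errors, so a few large misses dominate it far more than they dominate MAE.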
The dataset is very small (119 rows). After removing outliers, the training data has only 90 rows, which is too few for the algorithm to learn the relationship between the variables and predict the output accurately. As a result, the actual and predicted outputs differ considerably, which explains the high error values.