Demand and Supply comparison for rides
ber-demand-and-supply-gap-analysis
August 8, 2024
[1]: import
import
import
import
numpy as np
pandas as pd
matplotlib.pyplot as plt
seaborn as sns
[2]: uber=pd.read_csv(r"D:\PYTHON\COURSE-3-EDA&STASTISTICS\ASSIGNMENT\Uber Request␣
↪Data.csv")
uber.head()
[2]:
0
1
2
3
4
Request id Pickup point
619
Airport
867
Airport
1807
City
2532
Airport
3112
City
0
1
2
3
4
Drop timestamp
11/7/2016 13:00
11/7/2016 18:47
12/7/2016 9:58
12/7/2016 22:-:25:47
Driver id-
Trip
Trip
Trip
Trip
Trip
[3]: uber.info()
RangeIndex: 6745 entries, 0 to 6744
Data columns (total 6 columns):
Request id
6745 non-null int64
Pickup point
6745 non-null object
Driver id
4095 non-null float64
Status
6745 non-null object
Request timestamp
6745 non-null object
Drop timestamp
2831 non-null object
dtypes: float64(1), int64(1), object(4)
memory usage: 316.2+ KB
1
Status
Completed
Completed
Completed
Completed
Completed
Request timestamp
11/7/2016 11:51
11/7/2016 17:57
12/7/2016 9:17
12/7/2016 21:-:33:16
\
[4]: #checking for any duplicate values in'Request id'
sum(uber.duplicated(subset='Request id'))==0
[4]: True
[5]: #checking for null values column wise
uber.isnull().sum()
[5]: Request id
Pickup point
Driver id
Status
Request timestamp
Drop timestamp
dtype: int64
-
[6]: uber['Request timestamp']=uber['Request timestamp'].astype(str)
uber['Request timestamp']=uber['Request timestamp'].str.replace('/','-')
uber['Request timestamp']=pd.to_datetime(uber['Request␣
↪timestamp'],dayfirst=True)
[7]: uber['Request timestamp'].head()
[7]:-:51:-:57:-:17:-:08:-:33:16
Name: Request timestamp, dtype: datetime64[ns]
[8]: uber['Drop timestamp']=pd.to_datetime(uber['Drop timestamp'],dayfirst=True)
uber['Drop timestamp'].head()
[8]:-:00:-:47:-:58:-:03:-:25:47
Name: Drop timestamp, dtype: datetime64[ns]
0.0.1
converted ‘Request timestamp’ and ‘Drop timestamp’ into the required format
using pd.to_datetime
[9]: plt.figure(figsize=(4,6))
sns.countplot(x="Status",data=uber)
plt.show()
2
[10]: uber['Status'].value_counts()
[10]: Trip Completed
No Cars Available
Cancelled
Name: Status, dtype:
1
-
int64
count of requests is plotted against ‘Status’.It is apparent that
combined total of ‘no cars available’ and ‘cancelled’ is more than
‘trips completed’
[11]: plt.figure(figsize=(2,4))
sns.countplot(x="Pickup point",data=uber)
plt.show()
3
[12]: uber['Pickup point'].value_counts()
[12]: City
3507
Airport
3238
Name: Pickup point, dtype: int64
2
count of requests is plotted against Pickup points,number of
request differ by less amount.
[13]: uber.head()
[13]:
0
1
2
3
4
0
1
2
3
Request id Pickup point
619
Airport
867
Airport
1807
City
2532
Airport
3112
City
Driver id-
Trip
Trip
Trip
Trip
Trip
Drop timestamp-:00:-:47:-:58:-:03:00
4
Status
Completed
Completed
Completed
Completed
Completed
Request timestamp-:51:-:57:-:17:-:08:-:33:16
\
-:25:47
[14]: uber['pickup hour']=uber['Request timestamp'].dt.hour
[ ]:
3
created different column hour of day to facilitate analysis
[15]: uber.info()
RangeIndex: 6745 entries, 0 to 6744
Data columns (total 7 columns):
Request id
6745 non-null int64
Pickup point
6745 non-null object
Driver id
4095 non-null float64
Status
6745 non-null object
Request timestamp
6745 non-null datetime64[ns]
Drop timestamp
2831 non-null datetime64[ns]
pickup hour
6745 non-null int64
dtypes: datetime64[ns](2), float64(1), int64(2), object(2)
memory usage: 368.9+ KB
4
now analyzing problems for uber
[16]: plt.figure(num=None, figsize=(10,8), dpi=100 )
sns.barplot(x='pickup hour',y='Request id',data=uber,estimator=len)
plt.title("Number of request per hour")
plt.ylabel("count of Request id")
plt.show()
C:\New folder\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using
a non-tuple sequence for multidimensional indexing is deprecated; use
`arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted
as an array index, `arr[np.array(seq)]`, which will result either in an error or
a different result.
return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
5
4.1
from the graph it is evident that max requests are made between 0500
hours-1000 hours in the morning and 1700 hours -2200 hours in the evening
[17]: plt.figure(figsize=(5,8))
fig1=sns.countplot(x='Status',hue='Pickup point',data=uber)
fig1.set_title("Frequency of trips according to pickup points")
ncount=len(uber)
for p in fig1.patches:
x=p.get_bbox().get_points()[:,0]
y=p.get_bbox().get_points()[1,1]
fig1.annotate((y),(x.mean(),y),ha='center',va='bottom')
plt.show()
6
7
4.1.1
we can see the relative proportion of trips ie when the trip is completed,when
4.1.2
it is cancelled and when there was no car available for both city and airport
4.1.3
while ‘trips completed’ status is greater in city and ‘no cars available’ status is
higher in
4.1.4
airport location.
[18]: uber['Timeslot']=pd.cut(uber['pickup␣
↪hour'],bins=[0,3,10,15,20,22,24],labels=["Early␣
↪Morning","Morning","Afternoon","Evening","Night","Late Night"])
4.2
using pd.cut function to create a new column ‘Timeslot’ from ‘pickup hour’
4.3
and created bins for different timeslots of morning etc according to time.
[19]: uber.head()
[19]:
0
1
2
3
4
0
1
2
3
4
Request id Pickup point
619
Airport
867
Airport
1807
City
2532
Airport
3112
City
Drop timestamp-:00:-:47:-:58:-:03:-:25:47
Driver id-
pickup hour-
Trip
Trip
Trip
Trip
Trip
Status
Completed
Completed
Completed
Completed
Completed
Request timestamp-:51:-:57:-:17:-:08:-:33:16
Timeslot
Afternoon
Evening
Morning
Night
Morning
[20]: plt.figure(figsize=(8,9),dpi=100)
fig2=sns.countplot(x='Timeslot',hue='Status',data=uber)
fig2.set_title("Frequency of trips according to diff timeslots")
ncount=len(uber)
for p in fig2.patches:
x=p.get_bbox().get_points()[:,0]
y=p.get_bbox().get_points()[1,1]
fig2.annotate((y),(x.mean(),y),ha='center',va='bottom')
plt.show()
8
\
4.4
graph showing relative proportion of trips according to timeslots
4.5
no cars available is high during evening (from 1500 hours -2000 hours) and
4.6
cancellation and trips completed is high during morning(from 0300 hours
-1000 hours)
[21]: hour_pivot_df=uber.pivot_table(values=['Request id'],index=['pickup␣
↪hour','Pickup point','Status'],aggfunc='count')
hour_pivot_df
9
[21]:
Request id
pickup hour Pickup point Status
0
Airport
No Cars Available
Trip Completed
City
Cancelled
No Cars Available
Trip Completed
1
Airport
No Cars Available
Trip Completed
City
Cancelled
No Cars Available
Trip Completed
2
Airport
No Cars Available
Trip Completed
City
Cancelled
No Cars Available
Trip Completed
3
Airport
No Cars Available
Trip Completed
City
Cancelled
No Cars Available
Trip Completed
4
Airport
Cancelled
No Cars Available
Trip Completed
City
Cancelled
No Cars Available
Trip Completed
5
Airport
Cancelled
No Cars Available
Trip Completed
City
Cancelled
…
18
City
Trip Completed
19
Airport
Cancelled
No Cars Available
Trip Completed
City
Cancelled
No Cars Available
Trip Completed
20
Airport
Cancelled
No Cars Available
Trip Completed
City
Cancelled
No Cars Available
Trip Completed
21
Airport
Cancelled
10
-
…-
City
22
Airport
City
23
Airport
City
No Cars Available
Trip Completed
Cancelled
No Cars Available
Trip Completed
Cancelled
No Cars Available
Trip Completed
Cancelled
No Cars Available
Trip Completed
No Cars Available
Trip Completed
Cancelled
No Cars Available
Trip Completed
-
[139 rows x 1 columns]
4.7
hourwise breakup of ‘status’ ie trips completed,cancellations and no cars
available for both c’pickup points’ ie city and airport
[22]: #now plotting for requests per hour according to status
plt.figure(figsize=(12,10),dpi=100,facecolor='w',edgecolor='k')
sns.barplot(x='pickup hour',y='Request id',hue='Status',data=uber,estimator=len)
plt.title("no of requests per hour according to status")
plt.ylabel("count of Request id")
plt.show()
11
4.8
from the graph it is evident that max no of requests that are made between
5-10 am are cancelled and max no of requests between 5-10 pm in night
show ‘no cars available’
[23]: #plot of no. of requests from city to airport/airport to city
plt.figure(figsize=(12,10),dpi=80,facecolor='y',edgecolor='k')
sns.barplot(x='pickup hour', y='Request id', hue='Pickup point',data=uber,␣
↪estimator=len)
plt.title("frequency of request from city-airport/airport-city")
plt.ylabel("count of Request id")
plt.show()
12
4.9
no of request from city to airport is max between 5-10 am in the morning
and no of requests from airport to city is max between 1700 hours-2200
hours
[24]: #plots of frquency of request
plt.figure(figsize=(14,10),dpi=80,facecolor='y',edgecolor='k')
plt.subplot(1,2,1)
sns.barplot(x='Timeslot',y='Request id',hue='Status',data=uber[(uber['Pickup␣
↪point']=='Airport')&
(uber['Status']!='Trip Completed')],estimator=len)
plt.title("AIRPORT")
plt.ylabel("Count of Request id")
plt.subplot(1,2,2)
sns.barplot(x='Timeslot',y='Request id',hue='Status',data=uber[(uber['Pickup␣
↪point']=='City')&
(uber['Status']!='Trip Completed')],estimator=len)
13
plt.title("CITY")
plt.ylabel("Count of Request id")
plt.show()
4.10
creating demand and supply and gap between them.Demand= Trips completed+cancellation+no cars available and supply is the trips which were
completed . So supply=Trips completed
[25]: # creating demand ,supply and gap between them according to timeslots
df=pd.DataFrame(uber.groupby(by='Timeslot',as_index=False)['Request id'].
↪count())
df.columns.values[1]='Demand'
df_1=uber[uber.Status=='Trip Completed'].
↪groupby(by='Timeslot',as_index=False)['Request id'].count()
df_1.columns.values[1]='Supply'
df['Supply']=df_1['Supply']
df['Gap']=df['Demand']-df['Supply']
[26]: df
14
[26]:
0
1
2
3
4
5
Timeslot
Early Morning
Morning
Afternoon
Evening
Night
Late Night
Demand-
Supply-
Gap-
[27]: # plot showing demand ,supply and gap between them according to timeslot
df.plot(x='Timeslot', y=['Demand','Supply','Gap'], kind='bar',figsize=(14,12),␣
↪color=['blue','red','green'],
title="Demand Supply and Gap according to timeslots")
plt.show()
15
4.11
from the graph it is clear that a huge gap exists between demand and
supply in the ’morning and ’evening timeslots
[28]: #creating columns for demand,suply and gap according to timeslots for airport
df_Airport=uber[uber['Pickup point']=='Airport']
df_A=pd.DataFrame(df_Airport.groupby(by='Timeslot', as_index=False)['Request␣
↪id'].count())
df_A.columns.values[1]='Demand'
df_2=df_Airport[df_Airport.Status=='Trip Completed'].
↪groupby(by='Timeslot',as_index=False)['Request id'].count()
df_2.columns.values[1]='Supply'
df_A['Supply']=df_2['Supply']
df_A['Gap']=df_A['Demand']-df_A['Supply']
[29]: df_A
[29]:
0
1
2
3
4
5
Timeslot
Early Morning
Morning
Afternoon
Evening
Night
Late Night
Demand-
Supply-
Gap-
[30]: #plot showing demand ,supply and gap between them for airport
df_A.
↪plot(x='Timeslot',y=['Demand','Supply','Gap'],kind='bar',figsize=(12,10),color=['green','blu
title="Demand, Supply and gap between requests for Airport to city")
plt.show()
16
4.12
it is evident from the graph that the gap is highest for evening and night
timeslots
[31]: # now creating demand ,suopply and gap between them for city
df_City=uber[uber['Pickup point']=='City']
df_C=pd.DataFrame(df_City.groupby(by='Timeslot', as_index=False)['Request id'].
↪count())
df_C.columns.values[1]='Demand'
df_3=df_City[df_City.Status=='Trip Completed'].
↪groupby(by='Timeslot',as_index=False)['Request id'].count()
df_3.columns.values[1]='Supply'
df_C['Supply']=df_3['Supply']
df_C['Gap']=df_C['Demand']-df_C['Supply']
[32]: df_C
17
[32]:
0
1
2
3
4
5
Timeslot
Early Morning
Morning
Afternoon
Evening
Night
Late Night
Demand-
Supply-
Gap-
[33]: #plot showing demand ,supply and gap between them for city
df_C.
↪plot(x='Timeslot',y=['Demand','Supply','Gap'],kind='bar',figsize=(12,10),color=['yellow','bl
title="Demand, Supply and gap between requests from city to airport")
plt.show()
18
4.13
it is evident from the graph that maximum of requests in the morning
timeslot go unmet and hence a huge gap can be seen there
[ ]:
[34]: uber
[34]:-
…-
Request id Pickup point
619
Airport
867
Airport
1807
City
2532
Airport
3112
City
3879
Airport
4270
Airport
5510
Airport
6248
City
267
City
1467
Airport
1983
City
2784
Airport
3075
City
3379
City
3482
Airport
4652
City
5335
Airport
535
Airport
960
Airport
1934
Airport
2083
Airport
2211
Airport
3096
Airport
3881
Airport
5254
City
5434
City
5916
City
669
City
1567
Airport
…
…
6683
City
6686
Airport
6688
Airport
6689
Airport
6693
City
6696
City
6697
Airport
6709
Airport
Driver id-
…
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
19
No
No
No
No
No
No
No
No
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
Trip
…
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Status
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Completed
Available
Available
Available
Available
Available
Available
Available
Available
\
-
-
-
Airport
City
City
City
City
Airport
City
Airport
Airport
City
Airport
Airport
Airport
Airport
City
City
City
City
Airport
City
City
Airport
Request timestamp-:51:-:57:-:17:-:08:-:33:-:57:-:15:-:11:-:57:-:46:-:08:-:30:-:49:-:02:-:23:-:23:-:01:-:24:-:00:-:45:-:17:-:46:-:00:00
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
No
No
No
No
No
No
No
No
No
No
No
No
No
No
No
No
No
No
No
No
No
No
Drop timestamp-:00:-:47:-:58:-:03:-:25:-:28:-:13:-:07:-:50:-:25:-:02:-:57:-:23:-:16:-:35:-:20:-:36:-:18:-:31:-:23:-:23:-:40:-:28:00
20
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Cars
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
Available
pickup hour-
Timeslot
Afternoon
Evening
Morning
Night
Morning
Night
Morning
Morning
Evening
Morning
Morning
Afternoon
Morning
Morning
Afternoon
Evening
Afternoon
Night
Morning
Evening
Afternoon
Afternoon
Evening
-
…-
-
08:17:29
21:54:18
21:23:03
02:41:38
10:00:43
13:08:00
06:21:00
…
22:34:01
22:36:13
22:37:37
22:43:46
22:49:33
22:49:45
22:51:18
22:56:00
22:58:15
23:03:23
23:08:14
23:11:41
23:14:36
23:14:39
23:16:48
23:18:21
23:21:53
23:26:50
23:27:55
23:35:50
23:39:15
23:42:51
23:43:54
23:46:03
23:46:20
23:49:03
23:50:05
23:52:06
23:54:39
23:55:03
-
09:22:37
22:51:23
22:25:19
03:24:43
10:53:06
13:49:00
07:10:00
…
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
NaT
-
…-
Morning
Night
Night
Early Morning
Morning
Afternoon
Morning
…
Night
Night
Night
Night
Night
Night
Night
Night
Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
Late Night
[6745 rows x 8 columns]
[40]: df_7=pd.crosstab([uber['Pickup␣
↪point'],uber['Status']],uber['Timeslot'],margins=True)
df_7
[40]: Timeslot
Pickup point Status
Early Morning
21
Morning
Afternoon
Evening
\
Airport
City
Cancelled
No Cars Available
Trip Completed
Cancelled
No Cars Available
Trip Completed
-
All
Timeslot
Pickup point Status
Airport
Cancelled
No Cars Available
Trip Completed
City
Cancelled
No Cars Available
Trip Completed
All
-
Night
Late Night
All
-
-
-
[ ]:
[ ]:
22
-
-