Analysis in R
Analysis of the Difference in Use of Cyclistic Bikes between
Members and Casual Riders (2022 First Quarter)
Frederick Eze
Objective
The objective of this business task is to identify how annual members and casual riders use Cyclistic bikes
differently.
Data Source
The dataset used for the analysis consisted of one record for each Cyclistic bike ride, collected and stored in
the Cyclistic database.
Data Cleaning and Manipulation
Importing the Files This was done in RStudio. It began with loading the tidyverse library, then importing
the CSV files and assigning each one to a variable.
library(tidyverse)
ctd_2022_01 <- read_csv("cyclistic_trip_data_2022/cyclistic_trip_data_2022_01/cyclistic_trip_data_2022_0
ctd_2022_02 <- read_csv("cyclistic_trip_data_2022/cyclistic_trip_data_2022_02/cyclistic_trip_data_2022_0
ctd_2022_03 <- read_csv("cyclistic_trip_data_2022/cyclistic_trip_data_2022_03/cyclistic_trip_data_2022_0
Joining the Tables The three monthly tables were then combined into one using the full_join function.
join1<-full_join(ctd_2022_01,ctd_2022_02)
ctd_2022<-full_join(join1,ctd_2022_03)
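When the monthly tables share exactly the same columns, stacking them with bind_rows is a common alternative to full_join. The sketch below uses two small made-up tables (jan and feb are hypothetical stand-ins, not the Cyclistic files) to show that both approaches return the same rides when no row appears in more than one table.

```r
library(dplyr)

# Toy stand-ins for two monthly tables; the values are made up.
jan <- tibble(ride_id = c("A1", "A2"), member_casual = c("member", "casual"))
feb <- tibble(ride_id = c("B1", "B2"), member_casual = c("casual", "member"))

# bind_rows() appends the rows of feb beneath jan, matching columns by name.
stacked <- bind_rows(jan, feb)

# full_join() on every shared column, which is what it does when `by` is omitted.
# With no overlapping rows it keeps all four rides.
joined <- full_join(jan, feb, by = c("ride_id", "member_casual"))

nrow(stacked)
nrow(joined)
```

bind_rows never tries to match rows, so it avoids the join message about shared columns and does not merge rows that happen to be identical across months.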
Mutate The mutate function was used to add three columns: ride_length, the duration of the ride;
ride_date, the start date separated from the time in the started_at column; and day_of_week, the day
of the week on which the ride started.
ctd_2022<-mutate(ctd_2022, ride_length=difftime(ended_at, started_at, units = "mins"), ride_date=ctd_202
ctd_2022<-mutate(ctd_2022, day_of_week=wday(ctd_2022$ride_date, label = TRUE))
head(ctd_2022)
## # A tibble: 6 x 16
##   ride_id          rideable_type started_at ended_at
## 1 C2F7DD78E82EC875 electric_bike-:59:-:02:44
## 2 A6CF8980A652D272 electric_bike-:41:-:46:17
## 3 BD0F91DFF741C66D classic_bike-:53:-:58:01
## 4 CBB80ED- classic_bike-:18:-:33:00
## 5 DDC963BFDDA51EEA classic_bike-:31:-:37:12
## 6 A39C6F6CC0586C0B classic_bike-:48:-:51:31
## # i 12 more variables: start_station_name, start_station_id,
## #   end_station_name, end_station_id, start_lat,
## #   start_lng, end_lat, end_lng, member_casual,
## #   ride_length, ride_date, day_of_week
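The three derived columns can be illustrated on a small made-up table. This is a sketch under assumptions: the toy timestamps are invented, and as_date from lubridate is used here to extract the date, which may differ from the expression used in the original mutate call.

```r
library(dplyr)
library(lubridate)

# Two made-up rides; timestamps are parsed as UTC by ymd_hms().
rides <- tibble(
  started_at = ymd_hms(c("2022-01-03 08:00:00", "2022-01-08 17:30:00")),
  ended_at   = ymd_hms(c("2022-01-03 08:12:30", "2022-01-08 18:15:00"))
)

rides <- rides %>%
  mutate(
    ride_length = difftime(ended_at, started_at, units = "mins"),  # duration in minutes
    ride_date   = as_date(started_at),                             # date without the time
    day_of_week = wday(ride_date, label = TRUE)                    # ordered factor of day labels
  )

rides
```

Computing day_of_week from the ride_date column inside the same mutate call works because mutate evaluates its arguments in order.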
Filtering/Sorting and Making New Dataframes Two new data frames, no_of_rides and avg_ride_length,
were created to analyse the number of rides and the average ride length for each rider type, bike type and day of the week.
no_of_rides <- ctd_2022 %>%
  select(ride_id, member_casual, rideable_type, day_of_week) %>%
  group_by(member_casual, rideable_type, day_of_week) %>%
  summarise(no_rides = n_distinct(ride_id))
avg_ride_length <- ctd_2022 %>%
  select(member_casual, rideable_type, ride_length, day_of_week) %>%
  group_by(day_of_week, member_casual, rideable_type) %>%
  summarise(avg_time = mean(ride_length))
head(no_of_rides)
New Dataframe Output
## # A tibble: 6 x 4
## # Groups:   member_casual, rideable_type [1]
##   member_casual rideable_type day_of_week no_rides
## 1 casual        classic_bike  Sun             9368
## 2 casual        classic_bike  Mon             8262
## 3 casual        classic_bike  Tue             5236
## 4 casual        classic_bike  Wed             7371
## 5 casual        classic_bike  Thu             5982
## 6 casual        classic_bike  Fri             4343
head(avg_ride_length)
## # A tibble: 6 x 4
## # Groups:   day_of_week, member_casual [3]
##   day_of_week member_casual rideable_type avg_time
## 1 Sun         casual        classic_bike  mins
## 2 Sun         casual        docked_bike   mins
## 3 Sun         casual        electric_bike mins
## 4 Sun         member        classic_bike  mins
## 5 Sun         member        electric_bike mins
## 6 Mon         casual        classic_bike  mins
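The grouped summaries can be sketched on a small made-up table. The toy_trips data below is hypothetical, and both summaries are computed in one call for brevity. Setting .groups = "drop" is an optional addition that returns an ungrouped table and avoids the grouping message printed by summarise.

```r
library(dplyr)

# Four made-up rides; ride_length is in minutes.
toy_trips <- tibble(
  ride_id       = c("a", "b", "c", "d"),
  member_casual = c("casual", "casual", "member", "member"),
  rideable_type = c("classic_bike", "classic_bike", "classic_bike", "electric_bike"),
  day_of_week   = c("Sun", "Sun", "Mon", "Mon"),
  ride_length   = c(10, 20, 5, 15)
)

ride_summary <- toy_trips %>%
  group_by(member_casual, rideable_type, day_of_week) %>%
  summarise(
    no_rides = n_distinct(ride_id),   # number of unique rides in the group
    avg_time = mean(ride_length),     # mean duration in the group
    .groups  = "drop"
  )

ride_summary
```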
Analysis
Total Number of Rides Using the no_of_rides data frame, the total number of rides taken by annual
members was compared with the total taken by casual riders. Annual members took considerably more rides
than casual riders, as the plot below shows.
ggplot(data = no_of_rides) +
  geom_col(mapping = aes(x = member_casual, y = no_rides, fill = member_casual)) +
  labs(title = "Total Number of Rides of Member and Casual Riders", x="Member or Casual Rider", y="Numbe
[Figure: Total Number of Rides of Member and Casual Riders. x-axis: Member or Casual Rider; y-axis: Number of Rides; legend: member_casual (casual, member)]
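The totals behind this comparison can also be checked numerically by summing the per-group counts for each rider type. The sketch below uses a made-up stand-in for no_of_rides (toy_rides and its counts are hypothetical), so the printed totals are illustrative only.

```r
library(dplyr)

# Toy stand-in for no_of_rides; the counts are made up for illustration.
toy_rides <- tibble(
  member_casual = c("casual", "casual", "member", "member"),
  rideable_type = c("classic_bike", "electric_bike", "classic_bike", "electric_bike"),
  day_of_week   = c("Sun", "Sun", "Sun", "Sun"),
  no_rides      = c(100, 150, 400, 300)
)

# Sum the group counts so there is one total per rider type.
totals <- toy_rides %>%
  group_by(member_casual) %>%
  summarise(total_rides = sum(no_rides), .groups = "drop")

totals
```

Because geom_col stacks bars that share an x position, the bar heights in the plot correspond to these sums.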
Total Number of Rides on Each Bike Further analysis showed that annual members took the most trips
on classic bikes, followed by electric bikes, with no rides recorded on docked bikes. Casual riders took
the most trips on electric bikes, followed by classic bikes, with docked bikes being used the least.
ggplot(data = no_of_rides) +
  geom_col(mapping = aes(x = member_casual, y = no_rides, fill = member_casual)) +
  facet_wrap(~rideable_type) +
  labs(title = "Number of Rides on Each Bike", subtitle = "The total number of rides on each bike by mem
[Figure: Number of Rides on Each Bike. Subtitle: The total number of rides on each bike by member and casual riders. Panels: classic_bike, docked_bike, electric_bike. x-axis: Member or Casual; y-axis: Number of Rides; legend: member_casual (casual, member)]
Number of Rides Per Day of Week Analysis showed that annual members took more rides on both classic
and electric bikes on weekdays than on weekends. For casual riders, the number of electric bike rides
remained steady during the week with a drop on Fridays. Classic bike rides by casual riders appeared to be
higher on the weekends and to decline during the week, while docked bike rides appeared to remain
steady.
ggplot(data = no_of_rides) +
  geom_col(mapping = aes(x = day_of_week, y = no_rides, fill = member_casual)) +
  facet_grid(rideable_type ~ member_casual) +
  labs(title = "Number of Rides on Each Bike Per Day of Week", subtitle = "The total number of rides on
  theme(axis.text.x = element_text(angle = 45))
[Figure: Number of Rides on Each Bike Per Day of Week. Subtitle: The total number of rides on each bike per day of week by member and casual riders. Panels: rows classic_bike, docked_bike, electric_bike; columns casual, member. x-axis: Day of the Week (Sun to Sat); y-axis: Number of Rides; legend: member_casual (casual, member)]
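The weekday and weekend pattern can be summarised in a single table by labelling each day and comparing rides per day. This is a sketch on made-up counts (toy_rides is hypothetical, and treating Saturday and Sunday as the weekend is an assumption). The %in% test also works when day_of_week is a factor, as produced by wday with label = TRUE.

```r
library(dplyr)

# Made-up daily ride counts for each rider type.
toy_rides <- tibble(
  member_casual = rep(c("casual", "member"), each = 7),
  day_of_week   = rep(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"), times = 2),
  no_rides      = c(90, 40, 30, 35, 30, 45, 100,   # casual
                    50, 80, 85, 90, 85, 75, 55)    # member
)

by_week_part <- toy_rides %>%
  mutate(week_part = if_else(day_of_week %in% c("Sat", "Sun"), "weekend", "weekday")) %>%
  group_by(member_casual, week_part) %>%
  # Divide by the number of distinct days so 5 weekdays and 2 weekend days are comparable.
  summarise(rides_per_day = sum(no_rides) / n_distinct(day_of_week), .groups = "drop")

by_week_part
```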
Average Ride Length Using the avg_ride_length data frame, the average ride duration of annual members
was compared with that of casual riders. Rides by casual riders were considerably longer on average than
rides by annual members, as the plot below shows.
ggplot(data = avg_ride_length) +
  geom_col(mapping = aes(x = member_casual, y = avg_time, fill = member_casual)) +
  labs(title = "Average Ride Length of Member and Casual Riders", x="Member or Casual", y="Average Time
[Figure: Average Ride Length of Member and Casual Riders. x-axis: Member or Casual; y-axis: Average Time (mins); legend: member_casual (casual, member)]
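One point worth checking when reading this plot: geom_col stacks bars that share an x position by default, so when the data has one row per day and bike type, the bar height is the sum of those group averages and not an overall average. The sketch below uses made-up rides (toy_trips is hypothetical) to contrast the stacked height with an overall mean computed directly from ride-level data.

```r
library(dplyr)

# Five made-up rides; ride_length is in minutes.
toy_trips <- tibble(
  member_casual = c("casual", "casual", "casual", "member", "member"),
  rideable_type = c("classic_bike", "classic_bike", "docked_bike", "classic_bike", "electric_bike"),
  ride_length   = c(10, 20, 60, 8, 12)
)

# Per-group averages, analogous to avg_ride_length.
group_avgs <- toy_trips %>%
  group_by(member_casual, rideable_type) %>%
  summarise(avg_time = mean(ride_length), .groups = "drop")

# What a stacked bar would show: the sum of the group averages.
stacked_height <- group_avgs %>%
  group_by(member_casual) %>%
  summarise(stacked = sum(avg_time), .groups = "drop")

# Overall mean per rider type, computed from the individual rides.
overall_avg <- toy_trips %>%
  group_by(member_casual) %>%
  summarise(avg_time = mean(ride_length), .groups = "drop")

stacked_height
overall_avg
```

In this toy data the ordering of the two rider types is the same either way, but the magnitudes differ, so the y-axis values of a stacked plot should not be read as average minutes per ride.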
Average Ride Length on Each Bike Further analysis showed that casual riders had a much longer average
ride duration on docked bikes than on the other two bike types, followed by classic bikes, with electric
bikes having the shortest ride duration. It also showed that annual members took slightly longer trips on
classic bikes than on electric bikes, although the difference was not significant.
ggplot(data = avg_ride_length) +
  geom_col(mapping = aes(x = member_casual, y = avg_time, fill = member_casual)) +
  facet_wrap(~rideable_type) +
  labs(title = "Average Ride Length on Each Bike", subtitle = "The average ride length on each bike by m
[Figure: Average Ride Length on Each Bike. Subtitle: The average ride length on each bike by member and casual riders. Panels: classic_bike, docked_bike, electric_bike. x-axis: Member or Casual; y-axis: Average Time (mins); legend: member_casual (casual, member)]
Average Ride Length Per Day of Week for Each Bike Analysis showed that the average ride duration
of casual riders on classic and electric bikes was roughly the same throughout the week, with little fluctuation.
On docked bikes, the Thursday average was higher than on the other days, for reasons that could not be
determined from the data. The average ride duration of annual members on classic and electric bikes was
also roughly the same throughout the week, with little fluctuation.
ggplot(data = avg_ride_length) +
  geom_col(mapping = aes(x = day_of_week, y = avg_time, fill = member_casual)) +
  facet_grid(rideable_type ~ member_casual) +
  labs(title = "Average Ride Length on Each Bike Per Day of Week", subtitle = "The average ride length o
  theme(axis.text.x = element_text(angle = 45))
[Figure: Average Ride Length on Each Bike Per Day of Week. Subtitle: The average ride length on each bike per day of week by member and casual riders. Panels: rows classic_bike, docked_bike, electric_bike; columns casual, member. x-axis: Day of the Week (Sun to Sat); y-axis: Average Time (mins); legend: member_casual (casual, member)]
Conclusion
• Annual members take more rides than casual riders
• Annual members take more rides during the week while casual riders take more trips during the weekend
• Annual members take the most trips on classic bikes while casual riders take the most trips on electric
bikes
• Only casual riders use docked bikes
• Casual riders take longer trips than annual members
• Casual riders use docked bikes for significantly longer trips