Electric Car Analysis
In [30]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
In [31]: df = pd.read_excel('C:\\Users\\eribo/Downloads/ElectricCarData_v3.xlsx')
df.head()
Out[31]:
   Region         Country  Developed/Developing/Under Developed  Car Maker  Electric/ICE  Model    Number of Seats  USD Cost  Mileage (km)  Charging Time (min)
0  North America  USA      Developed                             Tesla      Electric      Model 3  5                ?         ?             ?
1  North America  USA      Developed                             Tesla      Electric      Model 3  7                ?         ?             ?
2  North America  USA      Developed                             Tesla      Electric      Model 3  5                ?         ?             ?
3  North America  USA      Developed                             Tesla      Electric      Model 3  5                44130.0   653.0         34.0
4  North America  USA      Developed                             Tesla      Electric      Model 3  5                44130.0   645.0         35.0
[? = value not legible in this export; remaining columns truncated]
In [32]: # Display basic information about the dataset
df.info(), df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1001 entries, 0 to 1000
Data columns (total 16 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Region                                1001 non-null   object
 1   Country                               1001 non-null   object
 2   Developed/Developing/Under Developed  1001 non-null   object
 3   Car Maker                             1001 non-null   object
 4   Electric/ICE                          1001 non-null   object
 5   Model                                 1000 non-null   object
 6   Number of Seats                       1001 non-null   int64
 7   USD Cost                              1001 non-null   float64
 8   Mileage (km)                          1001 non-null   float64
 9   Charging Time (min)                   1001 non-null   float64
 10  Comment                               1001 non-null   object
 11  Platform                              1001 non-null   object
 12  Theme/Factor                          564 non-null    object
 13  Factor Status                         0 non-null      float64
 14  Sentiment                             510 non-null    object
 15  Source                                997 non-null    object
dtypes: float64(4), int64(1), object(11)
memory usage: 125.2+ KB
Out[32]:
(None,
           Region Country Developed/Developing/Under Developed Car Maker  \
 0  North America     USA                            Developed     Tesla
 1  North America     USA                            Developed     Tesla
 2  North America     USA                            Developed     Tesla
 3  North America     USA                            Developed     Tesla
 4  North America     USA                            Developed     Tesla

   Electric/ICE    Model  Number of Seats  USD Cost  Mileage (km)  \
 0     Electric  Model 3                5         ?             ?
 1     Electric  Model 3                7         ?             ?
 2     Electric  Model 3                5         ?             ?
 3     Electric  Model 3                5   44130.0         653.0
 4     Electric  Model 3                5   44130.0         645.0

   Charging Time (min)                                            Comment  \
 0                    ?  Electric drive just seems so appropriate for a...
 1                    ?               Finally 88mph within a parking area!
 2                    ?  I hope I'll be able to afford this later on in...
 3                 34.0  Edit: Just watched the performance tests, and ...
 4                 35.0  Honestly, this is way better than the Alpha 5....

   Platform Theme/Factor  Factor Status Sentiment  \
 0  Youtube          NaN            NaN       NaN
 1  Youtube          NaN            NaN       NaN
 2  Youtube          NaN            NaN       NaN
 3  Youtube          NaN            NaN       NaN
 4  Youtube          NaN            NaN       NaN

                                         Source
 0  https://www.youtube.com/watch?v=diC_U1O0YJA
 1  https://www.youtube.com/watch?v=diC_U1O0YJA
 2  https://www.youtube.com/watch?v=diC_U1O0YJA
 3  https://www.youtube.com/watch?v=diC_U1O0YJA
 4  https://www.youtube.com/watch?v=diC_U1O0YJA  )
[? = value not legible in this export]
In [33]: df.drop(columns=['Factor Status'], inplace=True)
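`Factor Status` is the only column with 0 non-null values in the `df.info()` output, which is why it is dropped by name. A slightly more general variant, sketched here on a small made-up frame (the values are illustrative, not taken from the dataset), finds and drops every column that is entirely empty:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; all values are made up for illustration
demo = pd.DataFrame({
    "Car Maker": ["Tesla", "BYD", "Ford"],
    "USD Cost": [44130.0, 30000.0, np.nan],
    "Factor Status": [np.nan, np.nan, np.nan],  # entirely empty, like in the real dataset
})

# Boolean flag per column: True when every value in that column is missing
all_missing = demo.isna().all()
empty_cols = all_missing[all_missing].index.tolist()

# Drop only the fully empty columns; partially missing ones (USD Cost) are kept
cleaned = demo.dropna(axis=1, how="all")

print(empty_cols)
print(cleaned.columns.tolist())
```

Unlike dropping by name, this also catches any other column that turns out to be empty in a future version of the spreadsheet.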
Exploratory Data Analysis (EDA)
Distribution of USD Cost
In [34]: # Set visualization style
sns.set_theme(style="whitegrid")
# Summary statistics for numeric columns
summary_stats = df.describe()
# Count of electric vs ICE cars
car_type_counts = df["Electric/ICE"].value_counts()
# Distribution of USD Cost
plt.figure(figsize=(8, 5))
sns.histplot(df["USD Cost"], bins=30, kde=True)
plt.title("Distribution of Electric Car Prices (USD)")
plt.xlabel("Price (USD)")
plt.ylabel("Frequency")
plt.show()
# Display results
summary_stats, car_type_counts
Out[34]:
([summary_stats: describe() table with rows count, mean, std, min, 25%, 50%, 75% and max for Number of Seats, USD Cost, Mileage (km) and Charging Time (min); values not legible in this export],
 Electric               969
 PHEV                    17
 Hybrid                  10
 Unknown                  2
 BEV                      2
 Electric/ICE/Hybrid      1
 Name: Electric/ICE, dtype: int64)
Compare average cost of Electric vs ICE vehicles
In [43]: # Compare average cost of Electric vs ICE vehicles
avg_cost_comparison = df.groupby("Electric/ICE")["USD Cost"].mean()
# Plot the comparison
plt.figure(figsize=(18, 6))
sns.barplot(x=avg_cost_comparison.index, y=avg_cost_comparison.values, palette="coolwarm")
plt.title("Average Cost of Electric vs ICE Vehicles")
plt.xlabel("Vehicle Type")
plt.ylabel("Average Price (USD)")
plt.show()
# Display numerical comparison
avg_cost_comparison
Out[43]:
Electric/ICE
BEV
Electric
Electric/ICE/Hybrid
Hybrid
PHEV
Unknown
Name: USD Cost, dtype: float64
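The category index above mixes overlapping labels: `BEV` (battery electric vehicle) describes the same kind of car as `Electric`, and no category is labelled purely `ICE` despite the column name. One way to harmonise the labels before averaging is a mapping dictionary. The mapping below is an assumption (in particular, folding `Electric/ICE/Hybrid` into `Hybrid` is a judgement call), and the prices are made up:

```python
import pandas as pd

# Made-up rows using the category labels seen in the Electric/ICE column
demo = pd.DataFrame({
    "Electric/ICE": ["Electric", "BEV", "PHEV", "Hybrid", "Electric/ICE/Hybrid", "Unknown"],
    "USD Cost": [40000.0, 50000.0, 35000.0, 28000.0, 30000.0, 20000.0],
})

# Hypothetical mapping: fold BEV into Electric and the mixed label into Hybrid
label_map = {"BEV": "Electric", "Electric/ICE/Hybrid": "Hybrid"}
demo["Vehicle Type"] = demo["Electric/ICE"].replace(label_map)

# Average cost per harmonised vehicle type
avg_cost = demo.groupby("Vehicle Type")["USD Cost"].mean()
print(avg_cost)
```

Keeping the original column and writing the harmonised labels to a new `Vehicle Type` column makes it easy to compare both groupings.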
In [49]: import matplotlib.pyplot as plt
import seaborn as sns
# Set visualization style
sns.set_style("whitegrid")
# Create a figure with subplots (1 row, 2 columns)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Count of Electric vs. ICE cars
sns.countplot(x="Electric/ICE", data=df, palette="coolwarm", ax=axes[0])
axes[0].set_title("Count of Electric vs. ICE Cars")
axes[0].set_xlabel("Type of Vehicle")
axes[0].set_ylabel("Count")
# Count of cars by region type (Developed vs Developing)
sns.countplot(x="Developed/Developing/Under Developed", data=df, palette="viridis", ax=axes[1])
axes[1].set_title("Number of Cars by Region Type")
axes[1].set_xlabel("Region Type")
axes[1].set_ylabel("Count")
axes[1].tick_params(axis='x', rotation=45)
# Adjust layout for better spacing
plt.tight_layout()
plt.show()
Key Insights from Exploratory Data Analysis (EDA):
1. The dataset is dominated by Electric cars (969 of 1,001 entries), with a few Plug-in Hybrid (PHEV, 17), Hybrid (10), Unknown (2), BEV (2) and Electric/ICE/Hybrid (1) entries. No entry is labelled purely ICE, and BEV (battery electric vehicle) overlaps with Electric.
2. The average cost of electric cars is USD 92,931, with a wide price range from USD 0 (probably incorrect or missing data) to USD 344,000.
3. Average mileage is 448 km per charge, with some models reaching up to 1,440 km.
4. Charging time varies significantly, from a few minutes up to 1,200 minutes (20 hours).
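Point 2 treats a cost of 0 as probably missing. A minimal sketch of that assumption, on made-up prices, converts zeros to `NaN` so that `mean()` skips them instead of pulling the average down:

```python
import numpy as np
import pandas as pd

# Made-up prices; the 0 stands in for a missing or mis-entered value
costs = pd.Series([44130.0, 0.0, 92000.0, 344000.0], name="USD Cost")

# Assumption: a price of exactly 0 means "not recorded", not "free"
cleaned = costs.replace(0, np.nan)

print(costs.mean())          # average including the zero
print(cleaned.mean())        # average over recorded prices only (NaN is skipped)
print(cleaned.isna().sum())  # how many prices were treated as missing
```

On the real data the same `replace(0, np.nan)` would be applied to `df["USD Cost"]` before recomputing the summary statistics.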
SENTIMENT ANALYSIS
Findings on Challenges and Sentiment Toward EVs
In [39]: # Recompute sentiment distribution for each challenge type
cost_sentiment = cost_comments.groupby("Developed/Developing/Under Developed")["Sentiment Category"].value_counts().unstack()
infrastructure_sentiment = infrastructure_comments.groupby("Developed/Developing/Under Developed")["Sentiment Category"].value_counts().unstack()
policy_sentiment = policy_comments.groupby("Developed/Developing/Under Developed")["Sentiment Category"].value_counts().unstack()
# Plot sentiment distribution for each challenge type
fig, axes = plt.subplots(1, 3, figsize=(18, 8))
cost_sentiment.plot(kind="bar", stacked=True, colormap="coolwarm", ax=axes[0])
axes[0].set_title("Sentiment on Cost Challenges")
axes[0].set_ylabel("Count of Comments")
axes[0].set_xlabel("Country Category")
infrastructure_sentiment.plot(kind="bar", stacked=True, colormap="coolwarm", ax=axes[1])
axes[1].set_title("Sentiment on Infrastructure Challenges")
axes[1].set_ylabel("Count of Comments")
axes[1].set_xlabel("Country Category")
policy_sentiment.plot(kind="bar", stacked=True, colormap="coolwarm", ax=axes[2])
axes[2].set_title("Sentiment on Policy & Government Support Challenges")
axes[2].set_ylabel("Count of Comments")
axes[2].set_xlabel("Country Category")
plt.tight_layout()
plt.show()
# Display sentiment distribution tables
cost_sentiment, infrastructure_sentiment, policy_sentiment
Out[39]:
(three tables: cost_sentiment, infrastructure_sentiment and policy_sentiment, each indexed by Developed/Developing/Under Developed (Developed, Developing, Unknown; the cost table has a second Developing row) with Sentiment Category columns Negative, Neutral and Positive; the cell values are not reliably recoverable from this export)
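The cell above (In [39]) relies on `cost_comments`, `infrastructure_comments` and `policy_comments`, which are not defined in the cells shown. One plausible way to build such subsets is keyword matching on the `Comment` column. The keyword lists and the helper `filter_by_keywords` below are hypothetical, not the notebook's actual definitions, and the comments are made up:

```python
import re

import pandas as pd

# Made-up comments with the columns the In [39] cell uses
demo = pd.DataFrame({
    "Developed/Developing/Under Developed": ["Developed", "Developing", "Developed"],
    "Comment": [
        "Too expensive for me, the price is crazy",
        "There is no charging station near my town",
        "The government subsidy made it affordable",
    ],
    "Sentiment Category": ["Negative", "Negative", "Positive"],
})

# Hypothetical keyword lists for each challenge type
KEYWORDS = {
    "cost": ["price", "expensive", "afford", "cost"],
    "infrastructure": ["charging station", "charger", "grid"],
    "policy": ["government", "subsidy", "tax", "incentive"],
}

def filter_by_keywords(frame, words):
    """Hypothetical helper: keep rows whose Comment contains any word (case-insensitive)."""
    pattern = "|".join(re.escape(w) for w in words)
    return frame[frame["Comment"].str.contains(pattern, case=False, regex=True, na=False)]

cost_comments = filter_by_keywords(demo, KEYWORDS["cost"])
infrastructure_comments = filter_by_keywords(demo, KEYWORDS["infrastructure"])
policy_comments = filter_by_keywords(demo, KEYWORDS["policy"])

print(len(cost_comments), len(infrastructure_comments), len(policy_comments))
```

With substring matching a comment can land in more than one challenge group: here the subsidy comment counts toward both cost ("afford" matches "affordable") and policy. Whether the original analysis allowed such overlap is not shown.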
In [36]: # Convert all comments to strings to avoid AttributeError
df["Comment"] = df["Comment"].astype(str)
# Reapply manual sentiment analysis
df["Sentiment Category"] = df["Comment"].apply(manual_sentiment_analysis)
# Recount sentiment categories
sentiment_counts = df["Sentiment Category"].value_counts()
# Display updated sentiment counts
sentiment_counts
Out[36]:
Neutral     782
Positive    149
Negative     70
Name: Sentiment Category, dtype: int64
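`manual_sentiment_analysis` is not defined in the cells shown. Its name suggests a hand-written rule rather than a trained model. A minimal lexicon-based sketch with the same shape (one comment string in, one of Negative/Neutral/Positive out) could look like the following; the word lists and the scoring rule are assumptions, not the notebook's actual function:

```python
import re

# Hypothetical word lists; the notebook's real lexicon is not shown
POSITIVE_WORDS = {"great", "love", "better", "amazing", "affordable", "good"}
NEGATIVE_WORDS = {"expensive", "bad", "worse", "problem", "hate", "slow"}

def manual_sentiment_analysis(text: str) -> str:
    """Label a comment by counting positive vs. negative lexicon hits."""
    words = re.findall(r"[a-z']+", str(text).lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

for comment in [
    "Honestly, this is way better than the Alpha 5",
    "Way too expensive and charging is slow",
    "Finally 88mph within a parking area!",
]:
    print(manual_sentiment_analysis(comment), "|", comment)
```

Any comment with no lexicon hit falls through to Neutral, so a rule of this kind tends to produce a large Neutral share, which is one possible reading of the 782 Neutral labels above.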
Sentiment distribution by country development status
In [37]: # Sentiment distribution by country development status
sentiment_by_dev_status = df.groupby("Developed/Developing/Under Developed")["Sentiment Category"].value_counts().unstack()
# Plot sentiment distribution by country development status
sentiment_by_dev_status.plot(kind="bar", figsize=(10, 6), stacked=True, colormap="coolwarm")
plt.title("Sentiment Toward Electric Cars: Developed vs. Developing Countries")
plt.ylabel("Count of Comments")
plt.xlabel("Country Category")
plt.legend(title="Sentiment Category")
plt.show()
# Display sentiment distribution
sentiment_by_dev_status
Out[37]:
Sentiment Category                    Negative  Neutral  Positive
Developed/Developing/Under Developed
Developed                                 40.0    449.0     108.0
Developing                                 6.0     81.0      12.0
Developing                                 NaN      NaN       1.0
Underdeveloped                             2.0      6.0       8.0
Unknown                                   22.0    246.0      20.0
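The table above has two `Developing` rows. Most likely the underlying labels differ only in invisible whitespace or case (an assumption; the raw values are not shown). Stripping whitespace before grouping merges such rows, as this sketch on made-up labels shows:

```python
import pandas as pd

col = "Developed/Developing/Under Developed"

# Made-up labels: "Developing " carries a trailing space, as suspected in the real data
demo = pd.DataFrame({
    col: ["Developed", "Developing", "Developing ", "Developing"],
    "Sentiment Category": ["Positive", "Neutral", "Positive", "Negative"],
})

# Grouping the raw labels gives two rows that both print as "Developing"
before = demo.groupby(col)["Sentiment Category"].value_counts().unstack()

# Strip leading/trailing whitespace so visually identical labels group together
demo[col] = demo[col].str.strip()
after = demo.groupby(col)["Sentiment Category"].value_counts().unstack(fill_value=0)

print(before.index.tolist())
print(after)
```

`unstack(fill_value=0)` also replaces the `NaN` cells with 0, which is the natural value for "no comments of this sentiment".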
Sentiment Differences: Developed vs. Developing Countries
Developed Countries:
1. Most of the discussion: 597 of the 1,001 comments, with 108 positive and 40 negative mentions.
2. The majority are neutral (449 of 597), suggesting many fact-based discussions.
Developing Countries:
1. Far fewer comments (99, plus 1 under a second Developing label), dominated by neutral opinions (81).
2. Very little negativity (only 6 mentions), possibly due to lower EV adoption.
Underdeveloped Countries:
1. Mostly positive (8 mentions) and neutral (6 mentions), with very few negative (2 mentions).
2. Discussion is limited (16 comments), likely due to low penetration of EVs.
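Because developed countries contribute far more comments than the other groups (597 versus roughly 100 and 16 in the Out[37] table), raw counts make the groups hard to compare. Normalising each row to shares is one way to compare them. This sketch uses the counts from Out[37] directly rather than recomputing them from `df`, and merges the two `Developing` rows (an assumption that they are the same category):

```python
import pandas as pd

# Counts copied from Out[37]; the two "Developing" rows are merged (12 + 1 positive)
counts = pd.DataFrame(
    {"Negative": [40, 6, 2, 22], "Neutral": [449, 81, 6, 246], "Positive": [108, 13, 8, 20]},
    index=["Developed", "Developing", "Underdeveloped", "Unknown"],
)

# Divide each row by its total so every row sums to 1
shares = counts.div(counts.sum(axis=1), axis=0)

print(shares.round(3))
```

On these counts about 18% of comments from developed countries are positive, against 13% for developing countries and half of the 16 underdeveloped-country comments; with so few underdeveloped comments that last share is fragile. With the full DataFrame, `pd.crosstab(df["Developed/Developing/Under Developed"], df["Sentiment Category"], normalize="index")` produces the same kind of table in one call.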
Sentiment Differences Across Electric Car Brands
In [38]: # Extract sentiment distribution by car brand (maker)
sentiment_by_brand = df.groupby("Car Maker")["Sentiment Category"].value_counts().unstack()
# Select the 6 brands with the most mentions
top_brands = sentiment_by_brand.sum(axis=1).nlargest(6).index
sentiment_by_top_brands = sentiment_by_brand.loc[top_brands]
# Plot sentiment comparison for top brands
sentiment_by_top_brands.plot(kind="bar", figsize=(12, 6), stacked=True, colormap="coolwarm")
plt.title("Sentiment Comparison Across Electric Car Brands")
plt.ylabel("Count of Comments")
plt.xlabel("Car Brand")
plt.legend(title="Sentiment Category")
plt.xticks(rotation=45)
plt.show()
# Display sentiment distribution for top brands
sentiment_by_top_brands
Out[38]:
Sentiment Category  Negative  Neutral  Positive
Car Maker
Ford                     6.0     74.0       9.0
Tesla                    4.0     76.0       9.0
Hyundai                  6.0     58.0       9.0
BYD                      1.0     53.0      16.0
Audi                     1.0     51.0       5.0
Tesla                    9.0     33.0       9.0
Sentiment Differences Across Electric Car Brands
Ford: Tied with the first Tesla entry for the most mentions (89), mostly neutral (74), with some positive (9) and negative (6) comments.
Tesla: Appears as two separate entries (likely a data inconsistency in the brand label), 140 mentions in total.
One entry has 76 neutral, 9 positive and 4 negative mentions.
The other has 33 neutral, 9 positive and 9 negative mentions.
BYD: Only one negative mention, mostly neutral (53) with 16 positive mentions.
Audi & Hyundai: Mostly neutral, with Audi showing fewer positive mentions (5) than Hyundai (9).
Key Insights:
Neutral comments are the large majority for every brand, suggesting discussions are often fact-based rather than emotionally charged.
Tesla and Ford generate both positive and negative reactions (Tesla: 18 positive and 13 negative across its two entries; Ford: 9 and 6, the same split as Hyundai), which may indicate brand loyalty but also criticism.
BYD appears to have the most positive perception overall, with 16 positive mentions against a single negative one.
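The two Tesla rows are probably the same brand under two slightly different labels (for example a trailing space; the raw values are not shown), so summing them gives a single Tesla total. This sketch starts from the Out[38] counts, merges rows whose names differ only in surrounding whitespace, and adds a positive-to-negative ratio, the comparison behind the BYD observation. The `"Tesla "` label is a hypothetical stand-in for whatever distinguishes the second row:

```python
import pandas as pd

# Counts copied from Out[38]; "Tesla " (trailing space) is a hypothetical stand-in
# for the second Tesla label in the real data
counts = pd.DataFrame(
    {"Negative": [6, 4, 6, 1, 1, 9], "Neutral": [74, 76, 58, 53, 51, 33], "Positive": [9, 9, 9, 16, 5, 9]},
    index=["Ford", "Tesla", "Hyundai", "BYD", "Audi", "Tesla "],
)

# Merge brand labels that differ only in surrounding whitespace
merged = counts.groupby(counts.index.str.strip()).sum()

# Positive comments per negative comment; higher means a more favourable balance
merged["Pos/Neg ratio"] = merged["Positive"] / merged["Negative"]

print(merged.sort_values("Pos/Neg ratio", ascending=False))
```

On the real data, applying `df["Car Maker"] = df["Car Maker"].str.strip()` before the In [38] groupby would achieve the same merge, if whitespace is indeed the cause.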