Etinosa David Eribo | Freelancer

Electric Car Analysis

In [30]:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn import preprocessing
    import warnings

    warnings.filterwarnings('ignore')
    %matplotlib inline

In [31]:
    df = pd.read_excel('C:\\Users\\eribo/Downloads/ElectricCarData_v3.xlsx')
    df.head()

Out[31]:
              Region  Country  Developed/Developing/Under Developed  Car Maker  Electric/ICE    Model  Number of Seats  USD Cost  Mileage (km)  Charging Time (min)
    0  North America      USA                             Developed      Tesla      Electric  Model 3                5   44130.0         640.0                 31.0
    1  North America      USA                             Developed      Tesla      Electric  Model 3                7         -         660.0                 32.0
    2  North America      USA                             Developed      Tesla      Electric  Model 3                5   44130.0         652.0                 33.0
    3  North America      USA                             Developed      Tesla      Electric  Model 3                5   44130.0         653.0                 34.0
    4  North America      USA                             Developed      Tesla      Electric  Model 3                5   44130.0         645.0                 35.0

In [32]:
    # Display basic information about the dataset
    df.info(), df.head()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1001 entries, 0 to 1000
    Data columns (total 16 columns):
     #   Column                                Non-Null Count  Dtype
    ---  ------                                --------------  -----
     0   Region                                1001 non-null   object
     1   Country                               1001 non-null   object
     2   Developed/Developing/Under Developed  1001 non-null   object
     3   Car Maker                             1001 non-null   object
     4   Electric/ICE                          1001 non-null   object
     5   Model                                 1000 non-null   object
     6   Number of Seats                       1001 non-null   int64
     7   USD Cost                              1001 non-null   float64
     8   Mileage (km)                          1001 non-null   float64
     9   Charging Time (min)                   1001 non-null   float64
     10  Comment                               1001 non-null   object
     11  Platform                              1001 non-null   object
     12  Theme/Factor                          564 non-null    object
     13  Factor Status                         0 non-null      float64
     14  Sentiment                             510 non-null    object
     15  Source                                997 non-null    object
    dtypes: float64(4), int64(1), object(11)
    memory usage: 125.2+ KB

Out[32]:
    (None,
               Region  Country  Developed/Developing/Under Developed  Car Maker  Electric/ICE  \
     0  North America      USA                             Developed      Tesla      Electric
     1  North America      USA                             Developed      Tesla      Electric
     2  North America      USA                             Developed      Tesla      Electric
     3  North America      USA                             Developed      Tesla      Electric
     4  North America      USA                             Developed      Tesla      Electric

          Model  Number of Seats  USD Cost  Mileage (km)  \
     0  Model 3                5   44130.0         640.0
     1  Model 3                7         -         660.0
     2  Model 3                5   44130.0         652.0
     3  Model 3                5   44130.0         653.0
     4  Model 3                5   44130.0         645.0

        Charging Time (min)                                            Comment  \
     0                 31.0  Electric drive just seems so appropriate for a...
     1                 32.0               Finally 88mph within a parking area!
     2                 33.0  I hope I'll be able to afford this later on in...
     3                 34.0  Edit: Just watched the performance tests, and ...
     4                 35.0  Honestly, this is way better than the Alpha 5....

       Platform  Theme/Factor  Factor Status  Sentiment  \
     0  Youtube           NaN            NaN        NaN
     1  Youtube           NaN            NaN        NaN
     2  Youtube           NaN            NaN        NaN
     3  Youtube           NaN            NaN        NaN
     4  Youtube           NaN            NaN        NaN

                                             Source
     0  https://www.youtube.com/watch?v=diC_U1O0YJA
     1  https://www.youtube.com/watch?v=diC_U1O0YJA
     2  https://www.youtube.com/watch?v=diC_U1O0YJA
     3  https://www.youtube.com/watch?v=diC_U1O0YJA
     4  https://www.youtube.com/watch?v=diC_U1O0YJA  )

In [33]:
    # "Factor Status" has 0 non-null values, so drop it
    df.drop(columns=['Factor Status'], inplace=True)

Exploratory Data Analysis (EDA)

Distribution of USD Cost

In [34]:
    # Set visualization style
    sns.set_theme(style="whitegrid")

    # Summary statistics for numeric columns
    summary_stats = df.describe()

    # Count of electric vs ICE cars
    car_type_counts = df["Electric/ICE"].value_counts()

    # Distribution of USD Cost
    plt.figure(figsize=(8, 5))
    sns.histplot(df["USD Cost"], bins=30, kde=True)
    plt.title("Distribution of Electric Car Prices (USD)")
    plt.xlabel("Price (USD)")
    plt.ylabel("Frequency")
    plt.show()

    # Display results
    summary_stats, car_type_counts

Out[34]:
    (       Number of Seats  USD Cost  Mileage (km)  Charging Time (min)
     count                -         -             -                    -
     mean                 -         -             -                    -
     std                  -         -             -                    -
     min                  -         -             -                    -
     25%                  -         -             -                    -
     50%                  -         -             -                    -
     75%                  -         -             -                    -
     max                  -         -             -                    -,
     Electric               969
     PHEV                    17
     Hybrid                  10
     Unknown                  2
     BEV                      2
     Electric/ICE/Hybrid      1
     Name: Electric/ICE, dtype: int64)

Compare average cost of Electric vs ICE vehicles

In [43]:
    # Compare average cost of Electric vs ICE vehicles
    avg_cost_comparison = df.groupby("Electric/ICE")["USD Cost"].mean()

    # Plot the comparison
    plt.figure(figsize=(18, 6))
    sns.barplot(x=avg_cost_comparison.index, y=avg_cost_comparison.values, palette="coolwarm")
    plt.title("Average Cost of Electric vs ICE Vehicles")
    plt.xlabel("Vehicle Type")
    plt.ylabel("Average Price (USD)")
    plt.show()

    # Display numerical comparison
    avg_cost_comparison

Out[43]:
    Electric/ICE
    BEV                    -
    Electric               -
    Electric/ICE/Hybrid    -
    Hybrid                 -
    PHEV                   -
    Unknown                -
    Name: USD Cost, dtype: float64

In [49]:
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Set visualization style
    sns.set_style("whitegrid")

    # Create a figure with subplots (1 row, 2 columns)
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Count of Electric vs. ICE cars
    sns.countplot(x="Electric/ICE", data=df, palette="coolwarm", ax=axes[0])
    axes[0].set_title("Count of Electric vs. ICE Cars")
    axes[0].set_xlabel("Type of Vehicle")
    axes[0].set_ylabel("Count")

    # Count of cars by region type (Developed vs Developing)
    sns.countplot(x="Developed/Developing/Under Developed", data=df, palette="viridis", ax=axes[1])
    axes[1].set_title("Number of Cars by Region Type")
    axes[1].set_xlabel("Region Type")
    axes[1].set_ylabel("Count")
    axes[1].tick_params(axis='x', rotation=45)

    # Adjust layout for better spacing
    plt.tight_layout()
    plt.show()

Key Insights from Exploratory Data Analysis (EDA):

1. The dataset is dominated by Electric cars (969 of 1,001 entries), with a few Plug-in Hybrid (PHEV, 17), Hybrid (10), and other categories.
2. The average cost of electric cars is USD 92,931, with a wide price range from USD 0 (possibly incorrect or missing data) to USD 344,000.
3. Average mileage is 448 km per charge, with some models reaching up to 1,440 km.
4. Charging time varies significantly, from a few minutes to 1,200 minutes.
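The category counts in Out[34] split what may be a single vehicle type across labels such as Electric and BEV, and insight 2 flags a price of 0 as possibly invalid. The following is a minimal sketch of how both issues might be handled before computing averages. It runs on a small made-up frame rather than the real data, and the label mapping (BEV treated as Electric) and the helper name `clean_vehicle_data` are assumptions, not something the dataset or notebook documents.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are illustrative only).
toy = pd.DataFrame({
    "Electric/ICE": ["Electric", "BEV", "PHEV", "Electric ", "Hybrid"],
    "USD Cost": [44130.0, 0.0, 52000.0, 61000.0, 30000.0],
})

# Assumed mapping: treat "BEV" as the same category as "Electric".
LABEL_MAP = {"BEV": "Electric"}


def clean_vehicle_data(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with normalized type labels and zero prices set to NaN."""
    out = frame.copy()
    # Strip stray whitespace, then merge labels according to the assumed mapping.
    out["Electric/ICE"] = out["Electric/ICE"].str.strip().replace(LABEL_MAP)
    # Treat a price of 0 as missing so it does not pull the mean down.
    out["USD Cost"] = out["USD Cost"].mask(out["USD Cost"] == 0)
    return out


cleaned = clean_vehicle_data(toy)
avg_cost = cleaned.groupby("Electric/ICE")["USD Cost"].mean()
print(avg_cost)
```

Whether a zero price should be dropped, imputed, or kept depends on how the data was collected; masking it to NaN simply keeps it out of `mean()` without deleting the row.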
SENTIMENT ANALYSIS

Findings on Challenges and Sentiment Toward EVs

In [39]:
    # Note: cost_comments, infrastructure_comments, and policy_comments
    # are not defined in the cells shown in this export.

    # Recompute sentiment distribution for each challenge type
    cost_sentiment = cost_comments.groupby("Developed/Developing/Under Developed")["Sentiment Category"].value_counts().unstack()
    infrastructure_sentiment = infrastructure_comments.groupby("Developed/Developing/Under Developed")["Sentiment Category"].value_counts().unstack()
    policy_sentiment = policy_comments.groupby("Developed/Developing/Under Developed")["Sentiment Category"].value_counts().unstack()

    # Plot sentiment distribution for each challenge type
    fig, axes = plt.subplots(1, 3, figsize=(18, 8))

    cost_sentiment.plot(kind="bar", stacked=True, colormap="coolwarm", ax=axes[0])
    axes[0].set_title("Sentiment on Cost Challenges")
    axes[0].set_ylabel("Count of Comments")
    axes[0].set_xlabel("Country Category")

    infrastructure_sentiment.plot(kind="bar", stacked=True, colormap="coolwarm", ax=axes[1])
    axes[1].set_title("Sentiment on Infrastructure Challenges")
    axes[1].set_ylabel("Count of Comments")
    axes[1].set_xlabel("Country Category")

    policy_sentiment.plot(kind="bar", stacked=True, colormap="coolwarm", ax=axes[2])
    axes[2].set_title("Sentiment on Policy & Government Support Challenges")
    axes[2].set_ylabel("Count of Comments")
    axes[2].set_xlabel("Country Category")

    plt.tight_layout()
    plt.show()

    # Display sentiment distribution tables
    cost_sentiment, infrastructure_sentiment, policy_sentiment

Out[39]:
    [Three count tables: cost_sentiment, infrastructure_sentiment, policy_sentiment. Each is indexed by "Developed/Developing/Under Developed" (rows: Developed, Developing, Unknown; the cost table also contains a second "Developing" row) with "Sentiment Category" columns Negative, Neutral, Positive. The cell values are not recoverable from this export.]

In [36]:
    # Note: manual_sentiment_analysis is not defined in the cells shown in this export.

    # Convert all comments to strings to avoid AttributeError
    df["Comment"] = df["Comment"].astype(str)

    # Reapply manual sentiment analysis
    df["Sentiment Category"] = df["Comment"].apply(manual_sentiment_analysis)

    # Recount sentiment categories
    sentiment_counts = df["Sentiment Category"].value_counts()

    # Display updated sentiment counts
    sentiment_counts

Out[36]:
    Neutral     782
    Positive    149
    Negative     70
    Name: Sentiment Category, dtype: int64

Sentiment distribution by country development status

In [37]:
    # Sentiment distribution by country development status
    sentiment_by_dev_status = df.groupby("Developed/Developing/Under Developed")["Sentiment Category"].value_counts().unstack()

    # Plot sentiment distribution by country development status
    sentiment_by_dev_status.plot(kind="bar", figsize=(10, 6), stacked=True, colormap="coolwarm")
    plt.title("Sentiment Toward Electric Cars: Developed vs. Developing Countries")
    plt.ylabel("Count of Comments")
    plt.xlabel("Country Category")
    plt.legend(title="Sentiment Category")
    plt.show()

    # Display sentiment distribution
    sentiment_by_dev_status

Out[37]:
    Sentiment Category                    Negative  Neutral  Positive
    Developed/Developing/Under Developed
    Developed                                 40.0    449.0     108.0
    Developing                                 6.0     81.0      12.0
    Developing                                 NaN      NaN       1.0
    Underdeveloped                             2.0      6.0       8.0
    Unknown                                   22.0    246.0      20.0

Sentiment Differences: Developed vs. Developing Countries

Developed Countries:

1. Positive (108 mentions) and negative (40 mentions) counts are the highest of any category, reflecting more overall discussion.
2. The majority of comments are neutral (449 of 597 mentions), suggesting many fact-based discussions.

Developing Countries:

1. Fewer comments overall, with neutral opinions (81 mentions) dominating. A second "Developing" row with a single positive comment points to an inconsistent label in the data.
2. Very little negativity (only 6 mentions), possibly due to lower EV adoption.

Underdeveloped Countries:

1. Mostly positive (8 mentions) and neutral (6 mentions), with very few negative comments (2 mentions).
2. Discussion is limited (16 comments in total), likely due to low penetration of EVs.
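The function `manual_sentiment_analysis` is not defined in the cells shown. The sketch below is a hypothetical keyword-based stand-in: the word lists, the tie-breaking rule, and the name `keyword_sentiment` are assumptions, not the author's implementation. It then builds a row-normalized cross-tabulation on a toy frame, so that groups with very different comment volumes (597 Developed versus 16 Underdeveloped comments in Out[37]) can be compared by share rather than by raw count.

```python
import pandas as pd

# Hypothetical keyword lists; the notebook's actual rules are not shown.
POSITIVE_WORDS = {"love", "great", "better", "amazing", "good"}
NEGATIVE_WORDS = {"expensive", "bad", "worse", "hate", "problem"}


def keyword_sentiment(comment: str) -> str:
    """Label a comment by comparing positive and negative keyword hits."""
    # Lowercase each token and strip surrounding punctuation.
    words = {w.strip(".,!?:;\"'").lower() for w in str(comment).split()}
    pos = len(words & POSITIVE_WORDS)
    neg = len(words & NEGATIVE_WORDS)
    if pos > neg:
        return "Positive"
    if neg > pos:
        return "Negative"
    return "Neutral"  # no hits, or a tie


# Toy comments (illustrative only); "Status" stands in for the
# "Developed/Developing/Under Developed" column.
toy = pd.DataFrame({
    "Status": ["Developed", "Developed", "Developing", "Developing"],
    "Comment": [
        "Honestly, this is way better than the old one.",
        "Too expensive for what it offers.",
        "Charging stations are 20 km apart here.",
        "I love the design!",
    ],
})
toy["Sentiment Category"] = toy["Comment"].apply(keyword_sentiment)

# Row-normalized shares: each status group sums to 1.0.
shares = pd.crosstab(toy["Status"], toy["Sentiment Category"], normalize="index")
print(shares)
```

A keyword rule like this labels any comment without a listed word as Neutral, which is one plausible reason a rule-based approach can produce a large Neutral class such as the 782 of 1,001 comments in Out[36].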
Sentiment Differences Across Electric Car Brands

In [38]:
    # Extract sentiment distribution by car brand (maker)
    sentiment_by_brand = df.groupby("Car Maker")["Sentiment Category"].value_counts().unstack()

    # Select top brands with most mentions
    top_brands = sentiment_by_brand.sum(axis=1).nlargest(6).index
    sentiment_by_top_brands = sentiment_by_brand.loc[top_brands]  # Top 6 brands by total mentions

    # Plot sentiment comparison for top brands
    sentiment_by_top_brands.plot(kind="bar", figsize=(12, 6), stacked=True, colormap="coolwarm")
    plt.title("Sentiment Comparison Across Electric Car Brands")
    plt.ylabel("Count of Comments")
    plt.xlabel("Car Brand")
    plt.legend(title="Sentiment Category")
    plt.xticks(rotation=45)
    plt.show()

    # Display sentiment distribution for top brands
    sentiment_by_top_brands

Out[38]:
    Sentiment Category  Negative  Neutral  Positive
    Car Maker
    Ford                     6.0     74.0       9.0
    Tesla                    4.0     76.0       9.0
    Hyundai                  6.0     58.0       9.0
    BYD                      1.0     53.0      16.0
    Audi                     1.0     51.0       5.0
    Tesla                    9.0     33.0       9.0

Sentiment Differences Across Electric Car Brands

Ford: 89 mentions, tied with the first Tesla entry for the most under a single label; mostly neutral (74), with some positive (9) and negative (6) comments.

Tesla: Sentiment is split across two Tesla entries (likely a data inconsistency in the Car Maker labels). One entry has 76 neutral, 9 positive, and 4 negative mentions. The other has 33 neutral, 9 positive, and 9 negative mentions. Combined, the two entries total 140 mentions, more than any other brand.

BYD: Only one negative mention; mostly neutral (53) with 16 positive mentions.

Audi & Hyundai: Mostly neutral, with Audi showing fewer positive mentions (5) than Hyundai (9).

Key Insights:

1. Most brands receive predominantly neutral sentiment, suggesting discussions are often fact-based rather than emotionally charged.
2. Tesla and Ford draw both positive and negative reactions (Tesla: 18 positive and 13 negative across its two entries; Ford: 9 positive and 6 negative), which may indicate brand loyalty alongside criticism.
3. BYD appears to have the most positive perception overall, with the highest positive count (16 of 70 mentions) and only one negative mention.
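The two Tesla rows in Out[38] suggest that the Car Maker column holds two distinct strings that print identically, for example one with leading or trailing whitespace. That cause is an assumption, since the export does not show the raw values. A minimal sketch of normalizing the labels before grouping, on a toy frame:

```python
import pandas as pd

# Toy data with a whitespace variant of the same brand (illustrative only).
toy = pd.DataFrame({
    "Car Maker": ["Tesla", "Tesla ", "Ford", " Tesla", "Ford"],
    "Sentiment Category": ["Positive", "Negative", "Neutral", "Neutral", "Positive"],
})

# Before cleaning, the variants count as three separate brands.
print(toy["Car Maker"].nunique())

# Trim the ends and collapse internal runs of whitespace to one space.
toy["Car Maker"] = (
    toy["Car Maker"].str.strip().str.replace(r"\s+", " ", regex=True)
)

# Same aggregation as In [38], with missing combinations filled as 0.
counts = (
    toy.groupby("Car Maker")["Sentiment Category"]
    .value_counts()
    .unstack(fill_value=0)
)
print(counts)
```

Checking `df["Car Maker"].unique()` on the real data would show whether whitespace is actually the cause; if the variants differ in some other way (capitalization, a typo), an explicit mapping would be needed instead.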