Exploring Market Trends: An Exploratory Data Analysis (EDA) Approach
BY LEKE, John Oluwagbemiga
For HNG Stage 2 Task
Introduction
This report presents an exploratory data analysis (EDA) of a marketing dataset to uncover insights
that can help improve campaign performance and guide decision-making. The dataset,
“marketing_campaign_dataset”, was provided by HNG Tech as part of an internship task on applying
the EDA approach.
Objectives of this project:
The focus is on understanding which marketing channels are the most effective, analyzing ROI
trends, and identifying location-based engagement patterns to see where campaigns are performing
well and where there’s room for improvement.
Beyond just the numbers, this analysis aims to provide actionable recommendations that can help
optimize future marketing strategies and ensure resources are being used in the best possible way.
1. Initial dataset
Tools
• R & RStudio
• Google Sheets
• Google Slides
Data Overview
The dataset consists of:
• Rows: 200,005
• Columns: 15
• No missing values
2. First view of the dataset when loaded into the RStudio environment
Key Columns Identified:
• Campaign_ID: Unique identifier for each marketing campaign.
• Channel_Used: Marketing channel (Google Ads, YouTube, etc.).
• Conversion_Rate: Percentage of people who completed a desired action.
• ROI: Return on investment.
• Clicks: Number of ad clicks.
• Impressions: Number of times the ad was displayed.
• Acquisition_Cost: Cost to acquire a customer.
• Location: Region where the campaign was run.
Data Cleaning and Preparation
In this phase, we checked for missing values, duplicates, and data types, and counted the total
number of rows and columns.
Counting total rows and columns
We used the code shown in the image above to count the rows and columns. The dataset has a total
of 200,005 rows and 15 columns.
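The original code appears only as a screenshot, so the following is a minimal sketch of how such a count could be done in base R. The small `campaigns` data frame is a hypothetical stand-in for the real dataset, and the file name in the comment is an assumption.

```r
# Hypothetical stand-in for the loaded dataset. In practice it might be read
# with something like read.csv("marketing_campaign_dataset.csv") (file name assumed).
campaigns <- data.frame(
  Campaign_ID  = 1:4,
  Channel_Used = c("Email", "YouTube", "Website", "Email"),
  ROI          = c(5.1, 3.2, 6.8, 4.4)
)

dims   <- dim(campaigns)    # integer vector: c(rows, columns)
n_rows <- nrow(campaigns)   # number of rows
n_cols <- ncol(campaigns)   # number of columns

cat("Rows:", n_rows, "Columns:", n_cols, "\n")
```

On the full dataset, the same calls would be expected to report the 200,005 rows and 15 columns stated above.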
Checking for missing values and data type
The dataset was checked for missing values, duplicates, and incorrect data types, with the
following results:
• 0 missing values
• 0 duplicates
• The data types are numeric (num) and character (chr)
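These checks can be sketched in base R as follows. This is an illustrative example on a hypothetical three-row data frame, not the code used in the report; the `Date` column name is an assumption.

```r
# Hypothetical sample standing in for the marketing dataset.
campaigns <- data.frame(
  Campaign_ID  = c(1, 2, 3),
  Channel_Used = c("Email", "YouTube", "Website"),
  Date         = c("2021-01-01", "2021-01-02", "2021-01-03"),  # assumed column name
  stringsAsFactors = FALSE
)

n_missing    <- sum(is.na(campaigns))       # total count of NA cells
n_duplicates <- sum(duplicated(campaigns))  # rows identical to an earlier row
col_types    <- sapply(campaigns, class)    # class of each column

str(campaigns)  # compact overview showing num / chr types per column
```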
The date column was stored as character (chr), so it was converted from chr to Date format with
this code:
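The original conversion code is not reproduced in the text. A minimal sketch of such a conversion, assuming the column is called `Date` and uses the year-month-day format (both are assumptions), could look like this:

```r
# Hypothetical sample; the column name and the "%Y-%m-%d" format are assumptions.
campaigns <- data.frame(
  Campaign_ID = 1:3,
  Date        = c("2021-01-01", "2021-01-02", "2021-01-03"),
  stringsAsFactors = FALSE
)

# Convert the character column to R's Date class.
campaigns$Date <- as.Date(campaigns$Date, format = "%Y-%m-%d")

class(campaigns$Date)  # "Date"
```

If the real dates are stored in another layout, the `format` string would need to match it; values that do not match become `NA`.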
Findings and Insights
Key Statistics and Observations
We checked the summary of the key columns to see the statistical summary.
• Conversion Rate: Ranges from 1% to 15%, with an average of 8%.
• Acquisition Cost: Varies between $5,000 and $20,000, with an average of $12,504.
• ROI: Ranges from 2.0 to 8.0, with an average of 5.0.
• Clicks & Impressions:
  o Clicks range from 100 to 1,000, with a mean of ~550.
  o Impressions range from 1,000 to 10,000, averaging 5,507.
• Engagement Score: Ranges from 1 to 10, with an average of 5.5.
Performing basic statistics
We computed the mean, median, and standard deviation of Return on Investment (ROI), Conversion
Rate, and Acquisition Cost. The “summary()” function reports the mean and median (along with the
minimum, maximum, and quartiles); it does not report the standard deviation, which needs a
separate call such as “sd()”.
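As an illustration, the three statistics can be computed for several columns at once in base R. The values below are made up for the sketch and are not taken from the dataset.

```r
# Made-up illustrative values; column names follow the key columns listed earlier.
campaigns <- data.frame(
  ROI              = c(2.0, 4.0, 5.0, 6.0, 8.0),
  Conversion_Rate  = c(0.01, 0.05, 0.08, 0.11, 0.15),
  Acquisition_Cost = c(5000, 9000, 12500, 16000, 20000)
)

# One column of results per variable, one row per statistic.
basic_stats <- sapply(campaigns, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})

print(basic_stats)
summary(campaigns$ROI)  # min, quartiles, median, mean, max (no standard deviation)
```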
Unique target audiences and marketing channels
We listed the unique target audiences and marketing channels, then used the “table()” function
to count how many times each category appears.
The unique target audiences are age and gender categories such as “men 18-24”, and the unique
marketing channels are “Google Ads”, “YouTube”, “Instagram”, “Website”, “Facebook”, and “Email”.
We then counted how many times each category appears.
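A short sketch of this step, using a hypothetical four-row sample. The `Target_Audience` column name is an assumption, since the report does not list it among the key columns.

```r
# Hypothetical sample; Target_Audience is an assumed column name.
campaigns <- data.frame(
  Target_Audience = c("Men 18-24", "Women 35-44", "Men 18-24", "Women 35-44"),
  Channel_Used    = c("Email", "YouTube", "Email", "Website"),
  stringsAsFactors = FALSE
)

audiences <- unique(campaigns$Target_Audience)  # distinct audience categories
channels  <- unique(campaigns$Channel_Used)     # distinct channels

audience_counts <- table(campaigns$Target_Audience)  # frequency of each audience
channel_counts  <- table(campaigns$Channel_Used)     # frequency of each channel

print(audience_counts)
print(channel_counts)
```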
Checking for Outliers
Outliers are extreme values that are much higher or lower than the rest of the data. Identifying
and managing them is crucial because they can distort results and lead to misleading conclusions.
We checked for outliers in impressions, clicks, and acquisition cost.
We used the “summary()” function to compute key statistics such as the minimum, maximum, median,
and quartiles, which are the inputs needed to check for outliers.
Checking for outliers using the interquartile range (IQR) method
Formula: IQR = Q3 - Q1
Where:
• Q1 (1st quartile) = 25th percentile
• Q3 (3rd quartile) = 75th percentile
Outliers are values outside the range [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR]
Checking Outliers for Impressions
IQR = 7753 - 3266 = 4487
Lower Bound = 3266 - (1.5 × 4487) = -3464.5 (below the minimum of 1,000, so no lower outliers)
Upper Bound = 7753 + (1.5 × 4487) = 14483.5 (above the maximum of 10,000, so no upper outliers)
Therefore, there are no outliers in Impressions.
Checking Outliers for Clicks
IQR = 775 - 325 = 450
Lower Bound = 325 - (1.5 × 450) = -350 (below the minimum of 100, so no lower outliers)
Upper Bound = 775 + (1.5 × 450) = 1450 (above the maximum of 1,000, so no upper outliers)
Therefore, there are no outliers in Clicks.
Checking Outliers for Acquisition Cost
IQR = 16264 - 8740 = 7524
Lower Bound = 8740 - (1.5 × 7524) = -2546 (below the minimum of $5,000, so no lower outliers)
Upper Bound = 16264 + (1.5 × 7524) = 27550 (above the maximum of $20,000, so no upper outliers)
Therefore, there are no outliers in Acquisition Cost.
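The hand calculations above can be reproduced with a small helper. This is a sketch; `iqr_bounds` and `find_outliers` are hypothetical helper names introduced here, not functions from the report or from base R.

```r
# Hypothetical helper: IQR and the 1.5 x IQR fences from the two quartiles.
iqr_bounds <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(iqr = iqr, lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles as reported in the text.
clicks_bounds      <- iqr_bounds(325, 775)
impressions_bounds <- iqr_bounds(3266, 7753)
cost_bounds        <- iqr_bounds(8740, 16264)

# Hypothetical helper: return the values of x that fall outside the fences.
find_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  b <- iqr_bounds(q[1], q[2], k)
  x[x < b[["lower"]] | x > b[["upper"]]]
}

print(clicks_bounds)
find_outliers(c(100, 325, 550, 775, 1000))        # nothing flagged
find_outliers(c(100, 325, 550, 775, 1000, 5000))  # flags the extreme value 5000
```

A variable has no outliers under this rule when its minimum is above the lower fence and its maximum is below the upper fence, which is the comparison made for each variable above.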
Initial visualization for outlier detection
Key observations
• The dashed lines (the whiskers) extend to the minimum and maximum values within the normal
  range.
• There are no points beyond the whiskers, meaning no outliers under the IQR (interquartile
  range) rule.
• The median (black line) is centered in all boxplots, indicating a relatively balanced
  distribution.
Based on the boxplots, there are no outliers in these variables.
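To show how a boxplot exposes outliers, here is a small base R sketch on made-up click counts (not the report's data). Calling `boxplot()` with `plot = FALSE` returns the numbers behind the plot instead of drawing it.

```r
# Made-up click counts spread evenly between 100 and 1,000.
clicks <- seq(100, 1000, by = 50)

# plot = FALSE returns the statistics; drop the argument to draw the boxplot.
bp <- boxplot(clicks, plot = FALSE)

bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out    # points beyond the whiskers (empty here, so no outliers)

# Adding one extreme value makes it appear in $out.
bp_extreme <- boxplot(c(clicks, 5000), plot = FALSE)
bp_extreme$out
```

An empty `$out` corresponds to a boxplot with no points drawn beyond the whiskers.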
Calculating metrics to assess campaign effectiveness
We calculated Click-Through Rate (CTR), Cost Per Click (CPC), and Conversion Rate per channel to
assess campaign effectiveness. A new variable named “channel_performance” was created to compare
campaign effectiveness across channels. For each channel, the following were calculated:
• Average Click-Through Rate (CTR = Clicks ÷ Impressions)
• Average Cost Per Click (CPC = Acquisition Cost ÷ Clicks)
• Average Conversion Rate
• Average Return on Investment (ROI)
The displayed result compares campaign effectiveness across the channels used.
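A minimal base R sketch of this per-channel summary follows. The report's own code is not shown, so the use of `aggregate()` here is an assumption about one way to build `channel_performance`, and the four rows are made-up values.

```r
# Made-up rows; column names follow the key columns listed earlier.
campaigns <- data.frame(
  Channel_Used     = c("Email", "Email", "YouTube", "YouTube"),
  Clicks           = c(100, 200, 400, 500),
  Impressions      = c(1000, 2000, 8000, 5000),
  Acquisition_Cost = c(5000, 8000, 10000, 20000),
  Conversion_Rate  = c(0.05, 0.07, 0.10, 0.12),
  ROI              = c(4, 6, 3, 5)
)

# Row-level metrics, using the definitions given in the text.
campaigns$CTR <- campaigns$Clicks / campaigns$Impressions
campaigns$CPC <- campaigns$Acquisition_Cost / campaigns$Clicks

# Average each metric within each channel.
channel_performance <- aggregate(
  cbind(CTR, CPC, Conversion_Rate, ROI) ~ Channel_Used,
  data = campaigns,
  FUN  = mean
)

print(channel_performance)
```

Note that this averages the per-campaign ratios; dividing total clicks by total impressions per channel is a different calculation and can give a different CTR.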
Code and Methodology in Visualizing
Creating Insights
We analyzed the dataset to extract meaningful insights, such as comparing campaign performance
across channels, and used the ggplot2 library to visualize them.
A Bar Chart of the Average CTR against Channel Used
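The plotting code in the original is an image. A sketch of a comparable ggplot2 bar chart is given here; the `Avg_CTR` column name and all CTR values are made up for illustration and are not results from the dataset.

```r
library(ggplot2)

# Made-up averages, one row per channel (Avg_CTR is an assumed column name).
channel_performance <- data.frame(
  Channel_Used = c("Email", "Facebook", "Google Ads",
                   "Instagram", "Website", "YouTube"),
  Avg_CTR      = c(0.140, 0.141, 0.139, 0.140, 0.142, 0.140)
)

ctr_plot <- ggplot(channel_performance, aes(x = Channel_Used, y = Avg_CTR)) +
  geom_col(fill = "steelblue") +                        # one bar per channel
  geom_text(aes(label = round(Avg_CTR, 3)), vjust = -0.5) +  # numeric labels
  labs(title = "Average CTR by Channel Used",
       x = "Channel Used", y = "Average CTR") +
  theme_minimal()

print(ctr_plot)
```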
Result:
This bar chart compares the Average Click Through Rate (CTR) by Channel Used. All channels
(Email, Facebook, Google Ads, Instagram, Website, and YouTube) have similar CTR values with no
significant outliers. No single channel outperforms or underperforms noticeably. Adding numeric
labels could improve clarity for precise comparison.
Identifying high-performing and underperforming campaigns based on Return on Investment (ROI)
To assess performance based on ROI, the median ROI was calculated and stored in a new variable
with this code:
The median Return on Investment (ROI) is 5.01.
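One way the median could be used to separate the two groups is sketched below. The labelling rule (above the median counts as high-performing) is an assumption for illustration, and the five ROI values are made up.

```r
# Made-up ROI values for five hypothetical campaigns.
campaigns <- data.frame(
  Campaign_ID = 1:5,
  ROI         = c(2.5, 4.0, 5.0, 6.5, 7.8)
)

median_roi <- median(campaigns$ROI)

# Assumed rule: strictly above the median is "High-performing";
# campaigns at or below the median are labelled "Underperforming".
campaigns$Performance <- ifelse(campaigns$ROI > median_roi,
                                "High-performing", "Underperforming")

performance_counts <- table(campaigns$Performance)
print(performance_counts)
```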
Creating a boxplot to show the campaign performance based on ROI and campaign type
To evaluate campaign performance, we analyzed the Return on Investment (ROI) across all
campaigns. This boxplot provides a clear visual representation of ROI distribution, helping us
distinguish high-performing and underperforming campaigns.
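A sketch of such a boxplot with ggplot2 is shown here. The `Campaign_Type` column name is an assumption, and the ROI values are randomly generated placeholders rather than the dataset's values.

```r
library(ggplot2)

set.seed(42)  # reproducible placeholder data
campaign_types <- c("Display", "Email", "Influencer", "Search", "Social Media")

# Placeholder ROI values drawn uniformly from the 2 to 8 range reported earlier.
campaigns <- data.frame(
  Campaign_Type = rep(campaign_types, each = 50),  # assumed column name
  ROI           = runif(250, min = 2, max = 8)
)

roi_plot <- ggplot(campaigns, aes(x = Campaign_Type, y = ROI, fill = Campaign_Type)) +
  geom_boxplot() +
  labs(title = "ROI Distribution by Campaign Type",
       x = "Campaign Type", y = "ROI") +
  theme_minimal() +
  theme(legend.position = "none")  # the x axis already names each group

print(roi_plot)
```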
ROI Distribution of Campaigns Performance
This box plot illustrates the Return on Investment (ROI) distribution for the marketing campaign
types (Display, Email, Influencer, Search, and Social Media). The median ROI is similar across all
campaign types, indicating steady performance. The range of ROI values is fairly consistent, with
no significant outliers. This indicates that all campaign types deliver comparable returns, with
little variation between types.
Generating Insights on Location-Based Trends for Campaign Success
To uncover demographic or cultural influences on campaign success, we analyze how different
locations impact key marketing metrics like ROI, CTR (Click-Through Rate), CPC (Cost Per Click),
and Conversion Rate. We achieve this by grouping the dataset by location to get the average
performance by each region.
The table shows the average performance metrics for ROI, CTR (Click-Through Rate), CPC (Cost Per
Click), and Conversion Rate.
Visualizing a heatmap to show performance metrics by location
We visualized a heatmap of Conversion Rate vs CTR by Location with this code:
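The original heatmap code is not included in the text. The sketch below shows one possible layout, with locations on one axis, the two metrics on the other, and colour showing each metric rescaled to 0-1. That layout, the rescaling, and all values are assumptions made for illustration.

```r
library(ggplot2)

# Made-up rows, two per location.
campaigns <- data.frame(
  Location        = rep(c("Houston", "Los Angeles", "Miami", "New York"), each = 2),
  Clicks          = c(600, 640, 610, 630, 540, 560, 480, 500),
  Impressions     = rep(5000, 8),
  Conversion_Rate = c(0.070, 0.075, 0.072, 0.074, 0.080, 0.080, 0.085, 0.090)
)
campaigns$CTR <- campaigns$Clicks / campaigns$Impressions

# Average CTR and conversion rate per location.
location_performance <- aggregate(cbind(CTR, Conversion_Rate) ~ Location,
                                  data = campaigns, FUN = mean)

# Long format: one row per location and metric.
heat_data <- data.frame(
  Location = rep(location_performance$Location, times = 2),
  Metric   = rep(c("CTR", "Conversion_Rate"), each = nrow(location_performance)),
  Value    = c(location_performance$CTR, location_performance$Conversion_Rate)
)

# Rescale within each metric so both share one colour scale.
heat_data$Scaled <- ave(heat_data$Value, heat_data$Metric,
                        FUN = function(v) (v - min(v)) / (max(v) - min(v)))

heatmap_plot <- ggplot(heat_data, aes(x = Metric, y = Location, fill = Scaled)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(title = "Conversion Rate vs CTR by Location", fill = "Relative level") +
  theme_minimal()

print(heatmap_plot)
```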
Result:
We used a heatmap for location-based trends because it helps determine which regions respond
best to campaigns and show high engagement (CTR), and it identifies underperforming locations
that may need different marketing strategies.
• Houston & Los Angeles (red) have relatively high CTR but lower conversion rates, which suggests
  strong engagement but potential issues in converting clicks into customers.
• Miami (purple) has a moderate CTR and conversion rate.
• New York (blue) has the lowest CTR of all locations yet a higher conversion rate.
Plotting the Distribution of Acquisition Cost Across Marketing Channels
We used ggplot2 to create a density plot because it provides more customization options. We
achieved this by running the code:
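The original code is shown only as an image. A sketch of a comparable density plot follows, using randomly generated placeholder costs within the $5,000 to $20,000 range reported earlier rather than the dataset's values.

```r
library(ggplot2)

set.seed(7)  # reproducible placeholder data
channels <- c("Email", "Facebook", "Google Ads", "Instagram", "Website", "YouTube")

# Placeholder acquisition costs, 100 per channel.
campaigns <- data.frame(
  Channel_Used     = rep(channels, each = 100),
  Acquisition_Cost = runif(600, min = 5000, max = 20000)
)

density_plot <- ggplot(campaigns, aes(x = Acquisition_Cost, fill = Channel_Used)) +
  geom_density(alpha = 0.4) +  # semi-transparent so overlapping curves stay visible
  labs(title = "Distribution of Acquisition Cost Across Marketing Channels",
       x = "Acquisition Cost ($)", y = "Density", fill = "Channel Used") +
  theme_minimal()

print(density_plot)
```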
The density plot shows how acquisition costs are distributed across different marketing
channels like YouTube, Website, Instagram, Google Ads, Facebook, and Email. Since the
colours are evenly spread, it suggests that costs are fairly consistent across all channels, with
no one platform standing out as significantly higher or lower. There aren’t any noticeable
peaks or outliers, meaning the spending pattern is pretty balanced across the board.
Recommendations
• Optimize campaigns by focusing on high-ROI channels like Email and Website, and improve
  engagement in New York.
• Reduce costs by reallocating budgets from high Cost Per Click (CPC) channels to more
  cost-effective ones.
• Target high-performing audiences by prioritizing men 18-34, and adjust messaging for
  women 35-44.
• Improve conversion in key regions like Houston & Los Angeles, which have high CTR but low
  conversion.
• Identify common factors in successful campaigns by analyzing ROI drivers.
Conclusion & Next Steps
The analysis shows that while the campaigns are getting good engagement, they are not converting as
well as they should. Email and Website bring the best ROI, but Houston and Los Angeles have high
engagement with low conversions, meaning there is room for improvement. New York, on the other
hand, has a high conversion rate but low clicks, so better ad engagement could help. Since no
outliers were detected, the averages reported here are not distorted by extreme values.
Moving forward, the focus should be on improving conversion strategies, shifting budgets to more
cost-effective channels, refining audience targeting and boosting ad engagement where needed. Using
predictive modelling and analyzing the customer journey will also help make smarter, data-driven
decisions.