Globox A/B Testing Report
A Mastery Project
Olayemi Elebute
28 August 2023
Table of Contents
Project Overview
Stakeholder Profile
Data Extraction
Result of statistical analysis
Data Visualization
Novelty effect
Power Analysis
Recommendations
Appendices
PROJECT OVERVIEW
This report presents the results, analysis and recommendations from an A/B test conducted by
Globox, an e-commerce company whose goal is to bring more awareness to its food and drink
offerings, which have seen significant growth in previous months.
The experiment randomly assigned website visitors to one of two versions of the company's
mobile website: the Control group (Group A) saw the normal view, while the Treatment group
(Group B) saw a banner highlighting key food and drink products. The banner was available on
mobile only. The test sets out to determine whether users in the Treatment group were more
likely to make a purchase than users in the Control group.
Finally, based on its findings, the report recommends whether to launch the experience to all
website visitors, depending on whether the results show an improvement in the test metrics.
STAKEHOLDERS PROFILE
Role                                  Task and performance metrics
Growth Product and Engineering Team   The team develops features for the GloBox website that drive growth in users and revenue. It comprises a product manager, a user experience designer, an engineering manager, software engineers, and the data analyst.
Product Growth Manager                Decides the goals and projects of the growth team, measures their success against defined KPIs, and communicates results to other company leaders.
User Experience Designer              Conducts user research and designed the experience that the A/B test is evaluating.
Head of Marketing                     Works on targeting audiences with effective marketing campaigns to drive customers to the GloBox website. Collaborates frequently with other departments to design website experiences that align well with current marketing efforts.
DATA EXTRACTION
The data analyst used SQL to extract the data-set that was then used for the analysis. The
fields extracted for each website user are user ID, country, gender, test group, conversion
status and amount spent, as shown in table 1 below.
Table 2 below gives a brief summary of the two test groups.
Test group      Sample   Number of Converted   Proportions   Conversion rate%   Total Spent ($)   Average spent ($)
Control (A)     24343    955                   0.039         3.92               82,145.90         3.37
Treatment (B)   24600    1139                  0.046         4.63               83,415.32         3.39
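The derived columns in the summary table follow directly from the sample sizes, conversion counts and totals. As a minimal sketch in Python (the names are illustrative and not part of the original analysis, which was carried out in a spreadsheet), they can be recomputed as follows:

```python
# Recompute the derived columns of the test-group summary from the raw counts.
# All names here are illustrative; the report's own figures come from a spreadsheet.
groups = {
    "Control (A)": {"sample": 24343, "converted": 955, "total_spent": 82145.90},
    "Treatment (B)": {"sample": 24600, "converted": 1139, "total_spent": 83415.32},
}


def summarize(stats):
    """Return (proportion converted, conversion rate in %, average spent per user)."""
    proportion = stats["converted"] / stats["sample"]
    average_spent = stats["total_spent"] / stats["sample"]
    return proportion, 100 * proportion, average_spent


summary = {name: summarize(stats) for name, stats in groups.items()}

for name, (proportion, rate_pct, average_spent) in summary.items():
    print(f"{name}: proportion={proportion:.3f}, "
          f"rate={rate_pct:.2f}%, average spent=${average_spent:.2f}")
```

Note that the average is taken over all users in the group, including those who did not purchase.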
RESULT OF STATISTICAL ANALYSIS
The data analyst carried out four statistical tests to establish statistical significance and
confidence levels for the test metrics.
Two tests on proportions, a hypothesis test and a confidence interval, were carried out on the
conversion rates of the test groups using the z-test. In addition, two tests on sample means
were carried out on the average amount spent by the test groups using the t-test.
Sample proportion test result
1. Hypothesis test result on the difference in conversion rate between test groups
The null hypothesis states that there is no difference between the conversion rates of the
test groups, while the alternative hypothesis states that there is a difference.
Calculation                   Notation   Value
sample size (group A)         n1         24343
sample size (group B)         n2         24600
sample proportion (group A)   p1 hat     0.0392
sample proportion (group B)   p2 hat     0.0463
pooled proportion             p hat      0.0428
test statistic                T          3.8815
alpha                         a          0.05
p-value                       pval       0.0001
The table above shows a statistically significant difference between the conversion rates of
the two groups (A and B): the p-value of 0.0001 is below the alpha of 0.05, so we reject the
null hypothesis.
(The full spreadsheet calculation behind this result is found in Appendix 1.)
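For readers who want to retrace the test outside the spreadsheet, the sketch below implements a standard two-sided, pooled two-proportion z-test in Python using only the standard library. The function name is illustrative, and the inputs are the conversion counts and sample sizes from the summary table. Because it works from unrounded counts, the statistic it prints may differ slightly from the tabulated 3.8815, for example if the spreadsheet used rounded proportions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                    # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se                                # treatment minus control
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    return pooled, z, p_value


# Converted users and sample sizes for Control (A) and Treatment (B).
pooled, z, p_value = two_proportion_z_test(955, 24343, 1139, 24600)
print(f"pooled proportion = {pooled:.4f}, z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```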
2. Confidence interval test result for difference in conversion rate between test groups
The analyst uses a 95% confidence level. The corresponding critical value is 1.96, the 97.5th
percentile of the standard normal distribution, since 95% of the distribution lies within 1.96
standard deviations of the mean.
Calculation                         Notation          Value
sample size (group A)               n1                24343
sample size (group B)               n2                24600
sample proportion (group A)         p1 hat            0.039
sample proportion (group B)         p2 hat            0.046
Difference in sample proportion     p2 hat - p1 hat   0.007
Critical value                      z                 -
Standard error                      se                -
Margin of error                     (z*se)            -
Confidence interval (upper bound)   CI                -
Confidence interval (lower bound)   CI                -
From the above analysis, the 95% confidence interval for the true difference in proportions
between group A and group B is (0.003, 0.0105), around an observed difference of 0.007. We can
be 95% confident that the true difference in conversion rate lies within these bounds. Because
the interval does not include zero, it agrees with the result of the hypothesis test.
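The interval can be retraced with the usual normal-approximation formula for a difference in proportions. The sketch below assumes the unpooled standard error, which is the common choice for a confidence interval; the report does not state which form the spreadsheet used, and the function name is illustrative. Small differences from the quoted bounds are to be expected from rounding.

```python
from math import sqrt
from statistics import NormalDist


def proportion_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p2 - p1 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    margin = z_crit * se                                      # margin of error
    return diff - margin, diff + margin


lower, upper = proportion_diff_ci(955, 24343, 1139, 24600)
print(f"95% CI for the difference in conversion rate: ({lower:.4f}, {upper:.4f})")
```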
Sample mean test result
1. Hypothesis test on the difference in average amount spent between test groups
The null hypothesis states that there is no difference in the average amount spent between the
test groups, while the alternative hypothesis states that there is a difference.
Calculation                     Notation   Value
sample size (group A)           n1         24343
sample size (group B)           n2         24600
sample mean (group A)           x1         -
sample mean (group B)           x2         -
sample standard deviation (A)   Sa         -
sample standard deviation (B)   Sb         -
Standard error                  SE         -
Test statistic                  T          -
degrees of freedom              df         -
p-value                         pval       -
The result shows no statistically significant difference in the average amount spent between
the test groups (A and B): the p-value of 0.9438 is above the alpha of 0.05, so we fail to
reject the null hypothesis.
2. Confidence interval test result for difference in average amount spent between test groups
Calculation                         Notation       Value
sample mean (group A)               x1             -
sample mean (group B)               x2             -
critical t-value
Standard error                      SE             -
difference in mean                  t.statistics   -
Margin of error                     t*SE           -
Confidence Interval (lower bound)   CI             -
Confidence Interval (upper bound)   CI             -
From the above analysis, the 95% confidence interval for the true difference in means between
group A and group B is (-0.438, 0.471), around an observed difference of 0.016. Although we can
be 95% confident that the true difference in means lies within these bounds, the interval is
wide and includes zero. This indicates considerable uncertainty in the estimate: we cannot say
whether the banner raised or lowered the average amount spent, and a repeat of the sample
collection could give a noticeably different result.
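The intermediate values of the t-test are not tabulated, but the reported interval and p-value can be checked against each other. The sketch below backs the standard error out of the interval's half-width and recomputes the test statistic and p-value. It assumes a normal approximation to the t distribution, which is reasonable with roughly 24,000 users per group; this is a consistency check and not the analyst's original calculation.

```python
from statistics import NormalDist

# Interval for the difference in average amount spent, as reported above.
lower, upper = -0.438, 0.471

z_crit = NormalDist().inv_cdf(0.975)   # normal approximation to the critical t-value
diff = (lower + upper) / 2             # point estimate is the midpoint of the interval
margin = (upper - lower) / 2           # margin of error is the half-width
se = margin / z_crit                   # implied standard error
t_stat = diff / se
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"difference = {diff:.4f}, SE = {se:.4f}, t = {t_stat:.4f}, p-value = {p_value:.4f}")
```

The p-value recovered this way is close to the reported 0.9438, so the interval and the hypothesis test tell the same story.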
A/B TEST RESULT DATA VISUALIZATION
Figure 1 below is one of the visuals showing the relationship between the test metrics and the
test groups. Other Tableau visualizations showing insights from the data-set are linked in the
Appendix and are based on five business questions raised by stakeholders.
The dashboard in figure 2 below shows the relationships between the test metrics and users'
demographic attributes and device. It focuses on five views: test metrics (conversion rate and
average amount spent) versus test group; average amount spent per user for each test group;
test metrics versus user device; test metrics versus user gender; and test metrics versus user
country.
The visualizations were built in Tableau and give insight into user behaviour across the
different test dimensions, which later form the basis for recommendations to improve the
customer experience on the website.
NOVELTY EFFECTS
The novelty effect is a change in user behaviour on first exposure to a new treatment (in this
case, the banner) that is driven by curiosity rather than by the treatment's usefulness.
Figure 3 below shows the difference in conversion rate and total amount spent between the test
groups across the test period.
The chart above shows a slight rise in the conversion rate of group B (the treatment group)
over the last few days of the test period, after what appears to be a fairly flat trend between
25th January and 3rd February. Overall, the conversion rate rose from 4.53% on the first day of
the test to 5.97% on the last day, which suggests that the banner's effect is not entirely
short-lived.
However, the total amount spent by users in the treatment group points the other way: spending
appears to decline after the first day of the test. This may reflect a novelty effect, along
with other factors such as the season in which the test ran in the different countries of the
treatment group.
In summary, the conversion rate of group B showed potential following the banner's
introduction, but we cannot rule out a novelty effect, so the banner's effectiveness remains
uncertain. Since we cannot say the same for the average amount spent by both groups across the
test period, the test may need to run much longer to reveal a clearer trend and to confirm that
the observed changes are not the result of a novelty effect.
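A novelty check of this kind comes down to tracking each metric per day and per test group. The sketch below shows one way to compute a daily conversion rate in Python. The rows are made up purely for illustration (including the dates) and only mimic the shape of the novelty-check extract in Appendix B2; the function name is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical rows shaped like the novelty-check extract: (join_dt, group, converted).
# These values are invented for illustration and are not taken from the Globox data.
rows = [
    ("2023-01-25", "B", True),
    ("2023-01-25", "B", False),
    ("2023-01-25", "A", False),
    ("2023-01-26", "B", False),
    ("2023-01-26", "A", True),
    ("2023-01-26", "A", False),
]


def daily_conversion_rate(rows):
    """Return {(date, group): share of that day's users who converted}."""
    users = defaultdict(int)
    converted = defaultdict(int)
    for join_dt, group, is_converted in rows:
        users[(join_dt, group)] += 1
        converted[(join_dt, group)] += int(is_converted)
    return {key: converted[key] / users[key] for key in sorted(users)}


rates = daily_conversion_rate(rows)
for (join_dt, group), rate in rates.items():
    print(f"{join_dt} group {group}: {rate:.1%}")
```

A treatment-group rate that starts high and then falls towards the control-group rate would point to a novelty effect; a gap that holds or widens over the test period would not.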
POWER ANALYSIS
As a result of the banner's introduction, the conversion rate in the treatment group was 18%
higher, in relative terms, than in the control group. This difference is statistically
significant, as corroborated by the hypothesis test, in which we rejected the null hypothesis
in favour of the alternative hypothesis that the conversion rates of the test groups differ.
The percentage change in conversion rate between the test groups is calculated from Group A:
0.039 and Group B: 0.046:
(0.046 - 0.039)/0.039 = 0.179, approximately 18%.
The meaningful change for the business, if it launches the banner, is an increase in revenue
(possible when either the conversion rate or the amount spent increases), and the practical
significance bar is therefore set at a 10 - 15% increase in revenue. The treatment group (B)
had a 4.6% conversion rate, an 18% increase over the control group as a result of the banner.
However, entering a baseline conversion rate of 3.9%, a minimum detectable effect of 10%, a
significance level of 0.05 and a statistical power of 0.8 into the statsig.com/calculator gives
a required total sample size of 60,900. Our A/B test had a smaller sample (48,943), and so we
conclude that the change in conversion rate is statistically significant but not practically
or substantially significant.
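The report does not describe how the calculator arrives at its figure. As a rough cross-check, the sketch below applies the textbook normal-approximation formula for comparing two proportions. The assumptions are mine and are not confirmed by the source: a one-sided test, equal group sizes, and the baseline variance used for both groups. Under those assumptions the result lands close to the 60,900 quoted above; a two-sided test would call for a larger sample.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(baseline, relative_mde, alpha=0.05, power=0.8, two_sided=False):
    """Approximate total sample size (both groups) to detect a relative lift in a proportion.

    Assumes equal group sizes and the baseline variance in both groups.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2) if two_sided else norm.inv_cdf(1 - alpha)
    z_beta = norm.inv_cdf(power)
    delta = baseline * relative_mde              # absolute difference to detect
    variance = 2 * baseline * (1 - baseline)     # variance of the difference, per user pair
    per_group = (z_alpha + z_beta) ** 2 * variance / delta ** 2
    return 2 * ceil(per_group)


relative_lift = (0.046 - 0.039) / 0.039          # observed relative change, about 18%
needed = total_sample_size(baseline=0.039, relative_mde=0.10)
needed_two_sided = total_sample_size(baseline=0.039, relative_mde=0.10, two_sided=True)

print(f"observed relative lift: {relative_lift:.1%}")
print(f"total sample needed (one-sided assumption): {needed}")
print(f"total sample needed (two-sided assumption): {needed_two_sided}")
print(f"sample collected: {24343 + 24600}")
```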
RECOMMENDATIONS
After a thorough analysis and visualization of the data-sets, I strongly recommend that we
continue iterating, based on the following justification.
Continue Iterating (Justification)
The test showed great promise in one of the test metrics, the conversion rate of the treatment
group. However, the novelty effect check across the test duration pointed to a sharp decline in
total amount spent after the first few days, even though the conversion rate held steady and
ended higher on the last day of the test than in the first few days.
Also, the hypothesis test on conversion rate showed a clear difference between the groups: the
treatment group (B) converted at 4.6% against 3.9% for the control group (A), an 18% increase in
conversion rate following the banner's introduction. On the other hand, the hypothesis test on
the amount spent did not show a statistically significant difference between the test groups.
Lastly, the power analysis showed that the sample size may not be large enough for the result
to be substantially significant, given the business's practical significance bar of a 10 - 15%
increase in revenue.
This means that we did not observe enough improvement in the test metrics to be confident about
launching the banner at this time, as the launch may not deliver the desired revenue. I
recommend repeating the A/B test with improvements to the banner and running it for more days,
to reach the estimated sample size of 60,900 from the power analysis.
APPENDIX
A1. Excel sheet for Globox data-set
https://eu.docworkspace.com/d/sIKqRrNFM846zpwY
A2. Excel sheet for hypothesis test
https://eu.docworkspace.com/d/sIJORrNFM2JuzpwY
A3. Tableau visualization for Question 1 to 5
Q1. Conversion rate and average amount spent per user between the two user groups
https://public.tableau.com/views/GloboxDataset/Q1Visual?:language=enUS&publish=yes&:display_count=n&:origin=viz_share_link
Q2. Average amount spent per user for each test group
https://public.tableau.com/views/GloboxDatasetVisualQuestion2/Q2Visual?:language=enUS&publish=yes&:display_count=n&:origin=viz_share_link
Q3. Relationship between test metrics and user device
https://public.tableau.com/views/GloboxDatasetVisualQuestion3/Q3Visual?:language=enUS&:retry=yes&publish=yes&:display_count=n&:origin=viz_share_link
Q4. Relationship between test metrics and user gender
https://public.tableau.com/views/GloboxDatasetVisualQuestion4/Q4Visual?:language=enUS&publish=yes&:display_count=n&:origin=viz_share_link
Q5. Relationship between test metrics and user country
https://public.tableau.com/views/GloboxDatasetVisualQuestion5/Q5Visual?:language=enUS&publish=yes&:display_count=n&:origin=viz_share_link
A4. Powerpoint presentation https://eu.docworkspace.com/d/sIKSRrNFM4smBqgY
B. SQL Query
B1. Query for main Globox data-set extraction
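-- Per-user extract: demographics, device, test group, conversion status and total spent
SELECT
    u.id AS user_id,
    u.country,
    u.gender,
    g.device,
    "group",
    -- Rows with a positive purchase amount are labelled as converted
    CASE
        WHEN spent > 0 THEN 'Converted'
        ELSE 'Not converted'
    END AS conversion_status,
    -- Users without any activity get a total of 0 instead of NULL
    SUM(COALESCE(spent, 0)) AS Total_spent
FROM users u
LEFT JOIN activity a ON u.id = a.uid
LEFT JOIN groups g ON u.id = g.uid
GROUP BY u.id, u.country, u.gender, g.device, "group", conversion_status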
B2. Query for Novelty check
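-- Same extract as B1, with the date each user joined the test (join_dt) added
SELECT
    u.id AS user_id,
    u.country,
    u.gender,
    g.device,
    "group",
    g.join_dt,
    -- Rows with a positive purchase amount are labelled as converted
    CASE
        WHEN spent > 0 THEN 'Converted'
        ELSE 'Not converted'
    END AS conversion_status,
    -- Users without any activity get a total of 0 instead of NULL
    SUM(COALESCE(spent, 0)) AS Total_spent
FROM users u
LEFT JOIN activity a ON u.id = a.uid
LEFT JOIN groups g ON u.id = g.uid
GROUP BY u.id, u.country, u.gender, g.device, g.join_dt, "group", conversion_status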