Olayemi Samuel Elebute | Freelancer Portfolio Item #366492

Globox A/B Testing Report
A Mastery Project
Olayemi Elebute
28 August 2023

Table of Contents
Project Overview
Stakeholder Profile
Data Extraction
Result of Statistical Analysis
Data Visualization
Novelty Effect
Power Analysis
Recommendations
Appendices

PROJECT OVERVIEW
This report presents the results, analysis and recommendations from the A/B test conducted by Globox. Globox is an e-commerce company whose goal is to raise awareness of its food and drink offerings, which have grown significantly in previous months. The experiment randomly assigned website visitors to one of two versions of the company's website. One version showed the normal view. The other showed a banner highlighting the company's key food and drink products and was available only on mobile. The test set out to determine whether users of the mobile site in the Treatment group (Group B), who saw the banner, were more likely to make a purchase than users in the Control group (Group A). Based on its findings, the experiment aims to recommend whether to launch the experience to all website users and visitors, provided the results show a potential increase in the test metrics.

STAKEHOLDERS PROFILE
Role | Task and performance metrics
Growth product and engineering team | Develops features for the GloBox website that drive growth in users and revenue. The team comprises a product manager, a user experience designer, an engineering manager, software engineers and the data analyst.
Product Growth Manager | Decides the goals and projects of the growth team, measures their success against defined KPIs, and communicates results to other company leaders.
User Experience Designer | Conducts user research and designed the experience that the A/B test is evaluating.
Head of Marketing | Targets audiences with effective marketing campaigns to drive customers to the GloBox website. Collaborates frequently with other departments to design website experiences that align well with current marketing efforts.

DATA EXTRACTION
The data analyst used SQL to extract the dataset used in the analysis. For each website user, the extracted data include user ID, country, gender, device, test group, conversion status and amount spent, as shown in Table 1 below.

Table 2 below summarizes the results for each test group.

Test group | Sample size | Number converted | Proportion converted | Conversion rate (%) | Total spent ($) | Average spent per user ($)
Control (A) | 24,343 | 955 | 0.039 | 3.92 | 82,145.90 | 3.37
Treatment (B) | 24,600 | 1,139 | 0.046 | 4.63 | 83,415.32 | 3.39

RESULT OF STATISTICAL ANALYSIS
The data analyst carried out four statistical tests to establish the statistical significance of the test metrics and the confidence in them. Two tests on proportions, a hypothesis test and a confidence interval, compared the conversion rate between test groups using the z-test. Two tests on sample means, again a hypothesis test and a confidence interval, compared the average amount spent between test groups using the t-test.

Sample proportion test results
1. Hypothesis test on the difference in conversion rate between test groups
The null hypothesis states that there is no difference between the conversion rates of the two test groups. The alternative hypothesis states that there is a difference.
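The conversion-rate hypothesis test described above, and the 95% confidence interval reported after it, can both be reproduced from the counts in Table 2. Below is a minimal Python sketch that uses only the standard library. It assumes the analyst applied the textbook pooled two-proportion z-test for the hypothesis test and the unpooled (Wald) interval for the confidence interval. The spreadsheet in the appendix remains the reference calculation.

```python
import math

def normal_two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test of H0: p1 == p2 (two-sided).

    x = number of converted users, n = number of users in the group.
    Returns (z, p_value), with z measured as p2 - p1.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z, normal_two_sided_p(z)

def diff_in_proportions_ci(x1: int, n1: int, x2: int, n2: int, z_crit: float = 1.96):
    """Wald interval for p2 - p1 using the unpooled standard error (95% by default)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    margin = z_crit * se
    return diff - margin, diff + margin

# Counts from Table 2: control (A) first, treatment (B) second
z, p = two_proportion_z_test(955, 24343, 1139, 24600)
low, high = diff_in_proportions_ci(955, 24343, 1139, 24600)
print(f"z = {z:.4f}, p-value = {p:.4f}")
print(f"95% CI for p2 - p1: ({low:.4f}, {high:.4f})")
```

With unrounded proportions, the sketch gives z of about 3.86 and an interval of about (0.0035, 0.0107). These differ slightly from the values reported in this section, which appear to be based on rounded proportions (for example, 0.0463 - 0.0392 = 0.0071). The conclusion, rejecting the null hypothesis, is the same either way.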
Calculation | Notation | Value
Sample size (group A) | n1 | 24343
Sample size (group B) | n2 | 24600
Sample proportion (group A) | p1 hat | 0.0392
Sample proportion (group B) | p2 hat | 0.0463
Pooled proportion | p hat | 0.0428
Test statistic | z | 3.8815
Significance level | alpha | 0.05
p-value | pval | 0.0001

The table above shows a statistically significant difference between the conversion rates of the two groups (A and B), because the p-value of 0.0001 is below alpha (0.05). We therefore reject the null hypothesis. (The full spreadsheet calculation behind this result is in Appendix A2.)

2. Confidence interval for the difference in conversion rate between test groups
The analyst used a 95% confidence level. The corresponding critical value, 1.96, is the 97.5th percentile of the standard normal distribution, because 95% of the distribution lies within 1.96 standard deviations of the mean.

Calculation | Notation | Value
Sample size (group A) | n1 | 24343
Sample size (group B) | n2 | 24600
Sample proportion (group A) | p1 hat | 0.039
Sample proportion (group B) | p2 hat | 0.046
Difference in sample proportions | p2 hat - p1 hat | 0.007
Critical value | z | 1.96
Standard error | se | -
Margin of error | z*se | -
Confidence interval (upper bound) | CI | 0.0105
Confidence interval (lower bound) | CI | 0.003

From the above analysis, the 95% confidence interval for the true difference in conversion rates (group B minus group A) is (0.003, 0.0105). We can be 95% confident that the true difference lies within this interval. The observed difference of 0.007 is the point estimate. The interval excludes zero, so it agrees with the hypothesis test result.

Sample mean test results
1. Hypothesis test on the difference in average amount spent between test groups
The null hypothesis states that there is no difference in the average amount spent between the test groups. The alternative hypothesis states that there is a difference.
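The average-spend hypothesis test stated above, together with the confidence interval for the difference in means that follows it, can be sketched in the same way. The report does not say which t-test variant was used. This sketch assumes Welch's unequal-variance t-test. The inputs are hypothetical per-user spend lists, with zeros for non-converters, such as the Total_spent column from query B1 split by group. The p-value uses a normal approximation, which is only appropriate because each group has tens of thousands of users.

```python
import math
from statistics import mean, variance

def welch_t_test(a, b):
    """Welch's t-test of H0: mean(a) == mean(b), two-sided.

    Returns (t, df, p). The p-value uses a normal approximation to the
    t distribution, which is only adequate when df is large (as it is
    with roughly 24,000 users per group).
    """
    n1, n2 = len(a), len(b)
    v1 = variance(a) / n1  # squared standard error of mean(a)
    v2 = variance(b) / n2  # squared standard error of mean(b)
    t = (mean(b) - mean(a)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, large-df approximation
    return t, df, p

def diff_in_means_ci(a, b, crit=1.96):
    """Interval for mean(b) - mean(a); crit=1.96 gives about 95% for large df."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(b) - mean(a)
    return diff - crit * se, diff + crit * se

# Toy per-user spend values (zeros = non-converters), NOT the Globox data.
# With so few values the large-df approximation does not hold; this only
# demonstrates the calls.
control_spend = [0.0, 0.0, 12.5, 0.0, 40.0, 0.0]
treatment_spend = [0.0, 25.0, 0.0, 0.0, 18.0, 0.0, 9.5]
t, df, p = welch_t_test(control_spend, treatment_spend)
low, high = diff_in_means_ci(control_spend, treatment_spend)
print(f"t = {t:.3f}, df = {df:.1f}, approx. p = {p:.3f}, CI = ({low:.2f}, {high:.2f})")
```

Welch's form is a common default when the group variances may differ. When the two groups are nearly equal in size, as they are here, it gives almost the same standard error as the pooled-variance t-test.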
Calculation | Notation | Value
Sample size (group A) | n1 | 24343
Sample size (group B) | n2 | 24600
Sample mean (group A) | x1 | 3.37
Sample mean (group B) | x2 | 3.39
Sample standard deviation (A) | Sa | -
Sample standard deviation (B) | Sb | -
Standard error | SE | -
Test statistic | t | -
Degrees of freedom | df | -
p-value | pval | 0.9438

The table above shows no statistically significant difference in the average amount spent between the test groups (A and B), because the p-value of 0.9438 is greater than alpha (0.05). We therefore fail to reject the null hypothesis.

2. Confidence interval for the difference in average amount spent between test groups
Calculation | Notation | Value
Sample mean (group A) | x1 | 3.37
Sample mean (group B) | x2 | 3.39
Critical t-value | t* | -
Standard error | SE | -
Difference in means | x2 - x1 | 0.016
Margin of error | t*SE | -
Confidence interval (lower bound) | CI | -0.438
Confidence interval (upper bound) | CI | 0.471

From the above analysis, the 95% confidence interval for the true difference in means (group B minus group A) is (-0.438, 0.471). The interval includes zero, which is consistent with the hypothesis test: the data do not rule out the possibility that the banner had no effect on the average amount spent. The interval is also wide relative to the observed difference in means (0.016), which indicates considerable uncertainty in estimating the true difference.

A/B TEST RESULT DATA VISUALIZATION
Figure 1 below shows the relationship between the test metrics and the test groups. Other Tableau visualizations of the dataset are in Appendix A3. These were based on five business questions raised by stakeholders. Figure 2 below shows the dashboard for the dataset, which displays the interactions and relationships between the test metrics and users' demographic attributes and device.
The dashboard focuses on five key test parameters:
- the test metrics (conversion rate and average amount spent) by test group;
- the average amount spent per user for each test group;
- the relationship between the test metrics and users' device;
- the relationship between the test metrics and users' gender;
- the relationship between the test metrics and users' country.

The visualization was built in Tableau. It gave insights into user behaviour across the different test indices, which later formed the basis for recommendations to improve customer experience on the website.

NOVELTY EFFECTS
A novelty effect is a change in user behaviour when a new treatment is introduced (in this case, the banner) that may be driven by curiosity rather than by the treatment's usefulness. Figure 3 below shows the difference in conversion rate and total amount spent between the test groups across the test period.

The chart shows a slight rise in the conversion rate of group B (treatment group) over the last few days of the test period. Before that, the trend appears steady between 25 January and 3 February. The treatment group's conversion rate of 4.53% on the first day of the test rose to 5.97% by the last day. This suggests that the banner's effectiveness may not be entirely short-lived. However, the total amount spent by users in the treatment group tells a different story: total spending appears to decline after the first day of the test. This may point to a novelty effect, or to other factors such as the season in which the test ran in the treatment group's different countries.

In summary, the conversion rate of group B (treatment group) across the test period showed promise after the banner's introduction. However, we cannot rule out a novelty effect, so the banner may or may not have been effective.
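The daily trend in Figure 3 can be computed from the output of the novelty-check query (Appendix B2), which returns one row per user together with the user's join date. The sketch below is a minimal illustration under two assumptions. First, each row has already been reduced to a hypothetical (group, join_dt, total_spent) tuple. Second, users are bucketed by the date they joined the test. The report does not state exactly how the chart was aggregated.

```python
from collections import defaultdict
from datetime import date

def daily_metrics(rows):
    """Per-day, per-group conversion rate and total spend.

    rows: iterable of (group, join_dt, total_spent) tuples, one per user
    (a hypothetical shape derived from query B2).
    Returns {(join_dt, group): (conversion_rate, total_spent)} in date order.
    """
    users = defaultdict(int)
    converted = defaultdict(int)
    spent = defaultdict(float)
    for group, join_dt, total_spent in rows:
        key = (join_dt, group)
        users[key] += 1
        spent[key] += total_spent
        if total_spent > 0:  # a user counts as converted if they spent anything
            converted[key] += 1
    return {key: (converted[key] / users[key], spent[key]) for key in sorted(users)}

# Toy rows for illustration only (not Globox data; the year is arbitrary)
rows = [
    ("A", date(2023, 1, 25), 0.0),
    ("A", date(2023, 1, 25), 30.0),
    ("B", date(2023, 1, 25), 45.5),
    ("B", date(2023, 1, 26), 0.0),
]
for (day, group), (rate, total) in daily_metrics(rows).items():
    print(day, group, f"conversion={rate:.2%}", f"spent={total:.2f}")
```

Plotting the two groups' daily values side by side, or their difference, gives the kind of chart discussed above. A treatment advantage that fades after the first days would be consistent with a novelty effect.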
The average amount spent did not show the same promise as the conversion rate across the test period. The test may therefore need to run much longer to reveal a clearer trend and to confirm that the observed changes are not the result of a novelty effect.

POWER ANALYSIS
After the banner's introduction, the conversion rate of the treatment group was about 18% higher than that of the control group (a relative change). The hypothesis test above shows that this difference is statistically significant: we rejected the null hypothesis in favour of the alternative that the conversion rates of the test groups differ. The percentage change in conversion rate between the test groups is calculated as follows. With Group A at 0.039 and Group B at 0.046, (0.046 - 0.039) / 0.039 = 0.179, or approximately 18%.

For the business, the meaningful outcome of launching the banner is an increase in revenue, which can come from an increase in either conversion rate or amount spent. The practical significance bar is therefore set at a 10 to 15% increase in revenue. To check whether the test was large enough to detect a change of this size, the statsig.com/calculator was used with the following inputs:
- baseline conversion rate: 3.9%
- minimum detectable effect: 10%
- significance level: 0.05
- statistical power: 0.8

These inputs give a required total sample size of 60,900. Our A/B test had only 48,943 users, so it was underpowered relative to the practical significance bar. Although the difference in conversion rate is statistically significant, we cannot conclude that it is practically significant for the business.

RECOMMENDATIONS
After a thorough analysis and visualization of the datasets, I strongly recommend that we continue iterating, for the following reasons.

Continue Iterating (Justification)
The test showed great promise in one of the test metrics, particularly the conversion rate of the treatment group.
However, the novelty-effect check across the test duration pointed to a sharp decline in total amount spent after the first few days. This happened despite the promising conversion rate, which held steady and ended higher on the last day of the test than in the first few days. The hypothesis test on conversion rate showed a statistically significant difference between the test groups. The treatment group (B) converted at 4.6% against 3.9% for the control group (A), a relative increase of about 18% after the banner's introduction. On the other hand, the hypothesis test on the average amount spent found no statistically significant difference between the test groups. Lastly, the power analysis showed that the sample size was not large enough to detect an effect at the business's practical significance bar of a 10 to 15% increase in revenue.

Taken together, we did not observe enough improvement in the test metrics to be confident about launching the banner at this time, because the launch may not deliver the desired revenue. I therefore recommend repeating the A/B test with an improved banner and running it for more days, so that it reaches the sample size of 60,900 estimated in the power analysis.

APPENDIX
A1. Excel sheet for Globox dataset: https://eu.docworkspace.com/d/sIKqRrNFM846zpwY
A2. Excel sheet for hypothesis test: https://eu.docworkspace.com/d/sIJORrNFM2JuzpwY
A3. Tableau visualizations for Questions 1 to 5
Q1. Conversion rate and average amount spent between the two user groups: https://public.tableau.com/views/GloboxDataset/Q1Visual?:language=enUS&publish=yes&:display_count=n&:origin=viz_share_link
Q2. Average amount spent per user for each test group: https://public.tableau.com/views/GloboxDatasetVisualQuestion2/Q2Visual?:language=enUS&publish=yes&:display_count=n&:origin=viz_share_link
Q3. Relationship between test metrics and user device: https://public.tableau.com/views/GloboxDatasetVisualQuestion3/Q3Visual?:language=enUS&:retry=yes&publish=yes&:display_count=n&:origin=viz_share_link
Q4. Relationship between test metrics and user gender: https://public.tableau.com/views/GloboxDatasetVisualQuestion4/Q4Visual?:language=enUS&publish=yes&:display_count=n&:origin=viz_share_link
Q5. Relationship between test metrics and user country: https://public.tableau.com/views/GloboxDatasetVisualQuestion5/Q5Visual?:language=enUS&publish=yes&:display_count=n&:origin=viz_share_link
A4. PowerPoint presentation: https://eu.docworkspace.com/d/sIKSRrNFM4smBqgY

B. SQL Queries
B1. Query for main Globox dataset extraction

-- One row per user. Conversion status is decided on the user's total spend,
-- so a user with several activity rows is never split across two statuses.
SELECT
    u.id AS user_id,
    u.country,
    u.gender,
    g.device,
    g."group",
    CASE WHEN SUM(COALESCE(a.spent, 0)) > 0 THEN 'Converted'
         ELSE 'Not converted'
    END AS conversion_status,
    SUM(COALESCE(a.spent, 0)) AS Total_spent   -- users with no activity get 0
FROM users u
LEFT JOIN activity a ON u.id = a.uid
LEFT JOIN groups g ON u.id = g.uid
GROUP BY u.id, u.country, u.gender, g.device, g."group";

B2. Query for novelty check

-- Same as B1, plus the date each user joined the test (join_dt).
SELECT
    u.id AS user_id,
    u.country,
    u.gender,
    g.device,
    g."group",
    g.join_dt,
    CASE WHEN SUM(COALESCE(a.spent, 0)) > 0 THEN 'Converted'
         ELSE 'Not converted'
    END AS conversion_status,
    SUM(COALESCE(a.spent, 0)) AS Total_spent
FROM users u
LEFT JOIN activity a ON u.id = a.uid
LEFT JOIN groups g ON u.id = g.uid
GROUP BY u.id, u.country, u.gender, g.device, g.join_dt, g."group";
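As a supporting calculation for the Power Analysis section, the required sample size can also be approximated with the textbook formula for comparing two proportions. The sketch below assumes a two-sided test, equal group sizes, the normal approximation and no continuity correction. It is not necessarily the formula used by the statsig.com calculator. The function names are illustrative.

```python
import math
from statistics import NormalDist

def relative_lift(p_control: float, p_treatment: float) -> float:
    """Relative change in conversion rate, e.g. (0.046 - 0.039) / 0.039."""
    return (p_treatment - p_control) / p_control

def sample_size_per_group(baseline: float, relative_mde: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in EACH group to detect a relative lift of relative_mde.

    Assumes a two-sided two-proportion z-test, equal group sizes and
    no continuity correction.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

print(f"Observed relative lift: {relative_lift(0.039, 0.046):.3f}")
n = sample_size_per_group(baseline=0.039, relative_mde=0.10)
print(f"Users per group: {n}, total: {2 * n}")
```

With the report's inputs (baseline 3.9%, 10% relative minimum detectable effect, alpha 0.05, power 0.8), this formula gives roughly 40,500 users per group, or about 81,000 in total. That is more than the 60,900 quoted from the calculator. Calculators can differ in their formulas and defaults, so either figure should be treated as an approximate target. On both estimates, the 48,943 users in this test fall short.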