Introduction
The use of data has driven unmatched success in almost every industry. Some of this data is obtained through A/B testing, a statistical experiment in which two or more versions of a variable are compared. It is also sometimes called bucket testing because each respondent is placed into one of a fixed set of buckets. For example, if I run a study on whether people prefer sleeping in hot rooms or cold rooms, it is an A/B test because each respondent can only pick from a fixed set of options (in this case two): hot room or cold room. This type of testing has drawbacks, however, and people often take its results too literally and fail to consider the context in which they apply. This need for caution is described further in the article “Delusive Extrapolation and A/B Testing” by Staffan Nöteberg (the article can be found here).
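To make the hot-room/cold-room example concrete, here is a minimal sketch of how such a two-option result might be checked for statistical significance. The survey counts are invented for illustration; the test is a standard two-sided one-proportion z-test against a 50/50 null hypothesis, computed with only the standard library:

```python
import math

# Hypothetical survey results (invented numbers): each respondent picks
# exactly one of the two options.
hot_room = 230   # respondents preferring the hot room
cold_room = 270  # respondents preferring the cold room
n = hot_room + cold_room

# Two-sided one-proportion z-test against the null hypothesis that the
# preference is a 50/50 coin flip.
p_hat = cold_room / n
p0 = 0.5
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Convert z to a two-sided p-value via the error function
# (the normal CDF is 0.5 * (1 + erf(x / sqrt(2)))).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With these made-up counts the split is 54% vs 46%, which at 500 respondents is suggestive but not conclusive at the conventional 0.05 level, a small preview of why sample size matters so much in these tests.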
Summary of Article
The article lists a multitude of reasons to be careful with this test. The first concerns extrapolation: Nöteberg points out that the test is a rather simple one, yet we use it to predict rather complicated outcomes. He also warns us to be wary of how accurately we can predict the future, especially with something as simple as this test. Next, he moves from how the test is perceived by the respondent to the preliminary work that happens before the test. Nöteberg cites feedback loops, nonlinearities, and the size of the data set as further sources of error. The next point Nöteberg raises is the lack of proof of true causation. One goal of a study is to determine whether two variables are correlated, and ultimately whether one causes the other. No matter how well the study is designed, there is always a chance that something other than the variable you studied is the driving force behind the results. Lastly, the article closes by discussing the effect of time. Nöteberg reminds us that what is true at one time might not be true at another.
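The warning about causation can be illustrated with Simpson's paradox, a classic way that "something else" drives the result. The sketch below uses invented counts (adapted from the well-known kidney-stone textbook example, with generic segment names): variant A wins inside every segment, yet pooling the segments makes variant B look better, because the segments are unevenly sized:

```python
# Invented (success, trials) counts per variant and per segment,
# adapted from the classic kidney-stone illustration of Simpson's paradox.
data = {
    "A": {"segment_1": (81, 87), "segment_2": (192, 263)},
    "B": {"segment_1": (234, 270), "segment_2": (55, 80)},
}

def rate(successes: int, trials: int) -> float:
    return successes / trials

def pooled(segments: dict) -> float:
    """Success rate after merging all segments of one variant."""
    total_s = sum(s for s, _ in segments.values())
    total_n = sum(n for _, n in segments.values())
    return rate(total_s, total_n)

a_seg = {k: rate(*v) for k, v in data["A"].items()}
b_seg = {k: rate(*v) for k, v in data["B"].items()}

print("per-segment A:", a_seg)   # A is better in every segment...
print("per-segment B:", b_seg)
print("pooled A:", round(pooled(data["A"]), 3),
      "pooled B:", round(pooled(data["B"]), 3))  # ...yet B wins overall
```

A naive reading of the pooled numbers would credit variant B, when in fact the uneven segment sizes, not the variant itself, are the driving force, exactly the kind of confounding the article warns about.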
My Take
Overall, the article was a good one. It was pure statistics, but I liked it. It explained the concepts well, and since this subject is on the newer side for me, that was much appreciated. The fact that the author provided hyperlinks for anything technical also helped me understand it, as it gave me more resources to look into. I was shocked by the sheer number of factors that can make these tests unreliable. Since these tests are paramount to data science, it was scary to see how many possible flaws they can have. I am also amazed, however, that even with so many possible flaws, we as data scientists can predict the future with such precision; it speaks volumes about the subject. I also liked how the author included real-world examples from notable companies like The Coca-Cola Company to show readers how heavily this is used in practice, and how real companies face these same problems. One question I had, however, was whether there is a quantitative way to measure how far an A/B test can be extrapolated. The author mentioned the problems with over-extrapolation, but how much is too much? Is there a quantitative boundary we can compute so that we can preserve the integrity of our experiments and our findings?
Conclusion
All in all, this was a good, thought-provoking article. It was written very well, with a plethora of extra resources to consult if needed. It was also very interesting to look into the foundations of data science and see how many possible flaws this type of test has. I recommend you read it (the article can be found here).