Surprising A/B Testing Result #1

One reason we run experiments is that it is difficult, if not impossible, to know consistently whether the changes we want to make to a website will help us reach our objectives. Quite often, changes that we expect to help our site turn out to hurt the most important metrics.

The MSN Real Estate site wanted us to run a test to improve revenue for an advertising widget. They had a design company make five new widgets to compete with the current one. (By convention, the default user experience is called the Control and the competitors are known as Treatments. If no Treatment is better than the Control, best practice is to keep the Control.) The experiment tested all six widgets concurrently by randomly assigning one-sixth of users to see each widget over a two-week period. Here are the six competitors. Before you read further, test your intuition by guessing which widget would perform the best.
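For readers curious how an even six-way split can be done, here is a minimal sketch in Python. It is not the assignment code used in this experiment; it shows one common approach, hashing a user identifier into a bucket so that the same user always sees the same widget. The names `VARIANTS`, `assign_variant`, the experiment label `"widget-test"` and the `user-N` identifiers are all made up for illustration.

```python
import hashlib
from collections import Counter

# Hypothetical labels: one Control plus five Treatments, as in the post.
VARIANTS = ["Control"] + [f"Treatment {i}" for i in range(1, 6)]


def assign_variant(user_id: str, experiment: str = "widget-test") -> str:
    """Map a user to one of the six variants, deterministically.

    Hashing the experiment name together with the user id gives each
    user a stable bucket, so a returning user sees the same widget.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an integer and reduce it to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[bucket]


# Sanity check on made-up user ids: each variant should get about 1/6.
counts = Counter(assign_variant(f"user-{i}") for i in range(60_000))
for name in VARIANTS:
    print(f"{name}: {counts[name] / 60_000:.3f}")
```

Hashing rather than drawing a fresh random number on every visit keeps each user's experience consistent for the whole two weeks, which is the property we care about here.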

Before the experiment, we surveyed about 60 people, asking them to guess which widget would bring in the most advertising revenue. Members of the MSN Real Estate team, the design company and our experimentation team participated in the survey. The widget that won, Treatment 5, is the one that got the fewest votes. This widget, the simplest one, was statistically significantly better than every other widget on revenue as well as click-throughs, and it increased revenue by 9.7% over the Control.
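To make "statistically significantly better" and "a 9.7% increase" a little more concrete, here is a rough sketch of how a relative lift and a two-sided significance test can be computed for one Treatment against the Control. It is not the analysis we ran: the function names are invented, the per-user revenue numbers are toy values, and the test is a large-sample z-test with unequal variances, chosen because it needs only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def relative_lift(control, treatment):
    """Relative change in mean revenue per user, Treatment vs. Control."""
    return (mean(treatment) - mean(control)) / mean(control)


def two_sample_z_test(control, treatment):
    """Large-sample z-test for a difference in means (unequal variances).

    Returns the z statistic and a two-sided p-value. With thousands of
    users per variant, the normal approximation is usually reasonable.
    """
    standard_error = sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    z = (mean(treatment) - mean(control)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Toy per-user revenue values, NOT data from the experiment.
control_revenue = [1.0, 2.0, 3.0] * 100
treatment_revenue = [1.1, 2.2, 3.3] * 100

lift = relative_lift(control_revenue, treatment_revenue)
z, p = two_sample_z_test(control_revenue, treatment_revenue)
print(f"lift = {lift:.1%}, z = {z:.2f}, p = {p:.4f}")
```

With five Treatments in play, a real analysis would also account for making several comparisons at once; the sketch covers a single pairwise comparison only.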

An experiment is a great way to test objectively, with end users under natural conditions, which option will perform best. With an experiment we can draw a direct causal link and conclude that changing to the new alternative will improve our primary metrics. What an experiment does not tell us is why. For example, in this experiment, did Treatment 5 perform better because it took the least space on the page? Because it required the least input from the user? Because it was less confusing? Or for some other reason? The experiment itself cannot tell us which of these explains the better performance. We can begin to understand why by consulting design and usability experts and by running follow-up experiments that test new alternatives to the best-performing Treatment.
