
Surprising A/B Testing Result #1

One reason we run experiments is that it is difficult, if not impossible, to consistently predict whether changes we want to make to a website will help us reach our objectives. Quite often, changes we expect to help a site end up hurting the metrics that matter most.

The MSN Real Estate site wanted us to run a test to improve revenue from an advertising widget. They had a design company create five new widgets to compete with the current one. (By convention, the default user experience is called the Control and the competitors are known as Treatments. If no Treatment beats the Control, best practice is to keep the Control.) The experiment tested all six widgets concurrently by randomly assigning one-sixth of users to each widget over a two-week period. Here are the six competitors. Before you read further, test your intuition by guessing which widget would perform best.
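For readers curious about the mechanics, here is a minimal sketch of how a one-sixth random split is often implemented; the hashing scheme and identifiers below are illustrative assumptions, not MSN's actual system. Hashing a stable user ID gives each user a consistent, roughly uniform assignment, so the same user sees the same widget on every visit:

```python
import hashlib

VARIANTS = ["Control", "T1", "T2", "T3", "T4", "T5"]

def assign_variant(user_id: str, experiment: str = "re-widget") -> str:
    """Deterministically bucket a user into one of the six variants."""
    # Hash the experiment name plus the user ID; the digest is effectively
    # uniform, so taking it modulo 6 yields a one-sixth split per variant.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

print(assign_variant("user-12345"))  # same user, same widget, every visit
```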

Prior to the experiment, we ran a survey asking about 60 people to guess which widget would bring in the most advertising revenue. Members of the MSN Real Estate team, the design company, and our experimentation team participated. The winning widget, Treatment 5, was the one that received the fewest votes. This widget, the simplest of the six, was statistically significantly better than every other widget on both revenue and click-throughs, with a 9.7% revenue increase over the Control.
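The post reports only the outcome, not the computation behind it. As an illustration (with made-up revenue-per-user numbers, not the real MSN data), a lift like this is commonly checked with a two-sample test such as Welch's t-test:

```python
import numpy as np
from scipy import stats

# Synthetic revenue-per-user samples; Treatment 5's mean is ~9.7% higher.
rng = np.random.default_rng(0)
control = rng.exponential(scale=1.000, size=50_000)
treatment5 = rng.exponential(scale=1.097, size=50_000)

# Welch's t-test (no equal-variance assumption) on the two samples.
t_stat, p_value = stats.ttest_ind(treatment5, control, equal_var=False)
lift = treatment5.mean() / control.mean() - 1
print(f"observed lift = {lift:.1%}, p-value = {p_value:.2g}")
```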

An experiment is great for objectively testing, with end users under natural conditions, which option performs best. With an experiment we can draw a direct causal link and conclude that switching to the winning alternative will improve our primary metrics. What an experiment does not tell us is why. For example, did Treatment 5 perform better because it took the least space on the page? Because it required the least input from the user? Because it was the least confusing? Or for some other reason? The experiment itself cannot tell us which of these explanations is correct. We can begin to understand why by consulting design and usability experts and by running follow-up experiments that test new alternatives to the best-performing Treatment.

Improving Call Center Sales With Multi-Factor Experimentation

Call centers are a natural place to run experiments because all the elements needed for good experimental results are present. (For a list of these elements, see our white paper.) A typical call center has potentially hundreds of agents handling many calls per day, and each call has several quantifiable outcomes related to the objective of interest. For example, if the objective is to improve efficiency, the length of each call can be measured; if the objective is to increase revenue, sales per call is the natural metric. Other metrics may include customer satisfaction, return call rate, customer retention, and so on.

In some cases several call centers from the same organization take part in the same experiment, which helps ensure the results apply across all of the organization's call centers. Industries that use experimentation in call centers include credit card companies, banks, service providers, retailers, and online businesses. Experimentation works equally well whether the calls are inbound or outbound.

Case Study – Improving Call Center Sales

This organization wanted to improve net sales in its call centers. They had eight call centers that received calls about their credit cards and chose three of them for this improvement effort. In addition to the primary objective of increasing sales, they wanted to decrease time on call and improve employee satisfaction, since they were experiencing high employee turnover. After a number of brainstorming sessions with customer service representatives (CSRs), team leads, and managers, the list of ideas was narrowed down to those that were actually tested. The experiment ran in three call centers, included hundreds of CSRs and 24 team leads, and tested 10 ideas.

The ideas they tested were:

  1. Sales coach availability (a coach ready to help after any sales call, or not)
  2. Unit manager monitoring calls (or not)
  3. Use of lead associates as coaches (instead of dedicated sales coaches)
  4. Operations manager available on the floor (or not)
  5. Use of unit managers as coaches
  6. Increased time off the phone for call center associates
  7. Increased training to access customer and product information
  8. New hire coaching (or not)
  9. Self-paced training for call center associates (via taped calls, or not)
  10. Self-paced training for call center associates (via Web, or not)
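The write-up does not name the statistical design used in the call centers. As one common choice for screening ten two-level factors, here is a sketch of a 12-run Plackett-Burman design (the same family of design used in the retail case study below), with synthetic outcomes standing in for the real call center data:

```python
import numpy as np

# Classic 12-run Plackett-Burman generator row (11 two-level factors).
gen = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])
rows = [np.roll(gen, i) for i in range(11)] + [-np.ones(11, dtype=int)]
design = np.array(rows)[:, :10]          # keep 10 columns for the 10 ideas

# Synthetic per-group outcomes (e.g., a net sales index) for illustration.
rng = np.random.default_rng(1)
true_effects = rng.normal(0, 2, size=10)
sales = 100 + design @ true_effects + rng.normal(0, 1, size=12)

# Main effect of a factor = mean response at +1 minus mean response at -1.
effects = design.T @ sales / (len(sales) / 2)
for i, e in enumerate(effects, start=1):
    print(f"idea {i:2d}: estimated effect = {e:+.2f}")
```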

Five of these factors were identified as improving at least one of the key metrics (increasing sales, decreasing call time, improving employee satisfaction). The increase in net sales was approximately four times what Citibank management had hoped the experiment would achieve and translated into millions of dollars in additional sales per year!

In addition to the improvement in sales, the implemented factors also had a notable positive impact on employee morale and engagement.

Increasing Sales in Retail Using Multi-Factor Experimentation

We have personally been involved with experiments to improve retail operations at big-box retailers such as Saks and several of its subsidiaries, Toys-R-Us, and AutoNation. Other retailers that have made improvements through multi-factor experimentation include Pilot, Quick Chek, Lowe’s, and others. The most common goal in these experiments is revenue improvement, with cost reduction or profit margin often a secondary objective. The business ideas tested with these clients included media factors (e.g., TV, radio, newspaper), in-store signage, in-store layout, sales force appearance, staffing levels, sales process, loyalty program features, and more.

Case Study – Retail Store Chain

This case study involves a large regional retailer in the U.S. with over 150 stores in a six-state region. After an analysis of the most recent years of weekly sales and discussions with operations managers, 32 stores were ruled out as ineligible due to metric instability. Since we were using comp sales (i.e., the ratio of this year’s sales to last year’s sales), any store that had not been open for at least a year, or that had an unusual disruption in sales the previous year, was disqualified. Any store where major changes had taken place since the previous year (e.g., a major remodel or an increase in square footage) was also eliminated, as was any store expected to have a disruption during the experiment.
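In code terms, the eligibility screen looks roughly like the sketch below; the file and column names are hypothetical, since the actual store data is not public:

```python
import pandas as pd

stores = pd.read_csv("stores.csv")  # hypothetical store-level data

# Keep only stores where comp sales (this year / last year) is meaningful:
eligible = stores[
    (stores["weeks_open"] >= 52)                   # open at least a year
    & ~stores["disruption_last_year"]              # clean baseline year
    & ~stores["major_remodel_since_last_year"]     # no major changes
    & ~stores["disruption_expected_during_test"]   # stable during the test
]
print(f"{len(eligible)} of {len(stores)} stores eligible")
```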

After many brainstorming sessions and rounds of filtering, we were left with 23 ideas to be tested. Examples of the ideas tested:

  1. Change in sales associate attire
  2. Changes to the newspaper circular
  3. Store signage (prices, directions)
  4. Product demos
  5. Checkout procedure
  6. Changes to sales training
  7. Sales incentives
  8. Management incentives

Since there were 23 ideas to be tested, we needed a statistical design that could estimate the effect of each idea on sales independently and with maximum sensitivity, or power. (The technical term for the design we used is a 24-run Plackett-Burman design; see Plackett and Burman, 1946.)

We needed 24 groups of stores for this test, with every store in a group receiving the same set of factor settings. Therefore, a subset of the 118 eligible stores was randomly assigned to the 24 groups, with an almost equal number of stores in each group. Since some media factors (newspaper and radio) were in the mix and cannot be targeted to individual stores, we used a restricted randomization scheme: stores sharing the same newspaper and radio market had to receive the same settings for those factors. The screening experiment ran for six weeks.
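Here is a sketch of how the design matrix and store assignment might be constructed. The pbdesign() function is from the third-party pyDOE2 package, the store IDs are invented, and the media-market restriction described above is noted but not enforced in this simplified version:

```python
import numpy as np
from pyDOE2 import pbdesign  # third-party package for standard designs

design = pbdesign(23)             # 24 runs x 23 factors, coded -1/+1
assert design.shape == (24, 23)

# Randomly split 118 eligible stores into 24 near-equal groups; group g
# runs the factor settings in design row g. A real assignment would also
# honor the radio/newspaper market restriction.
rng = np.random.default_rng(42)
store_ids = np.arange(1, 119)
rng.shuffle(store_ids)
groups = np.array_split(store_ids, 24)
for g, members in enumerate(groups[:3]):
    print(f"group {g}: stores {sorted(members.tolist())}")
```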

Seven of the 23 factors in the screening test were statistically significant, five of them with a positive effect. These five were carried into the refining experiment. Since fewer factors were being tested, the second experiment was logistically much simpler, but its size (number of stores and number of weeks) needed to be about the same to achieve the same sensitivity as the screening experiment. This time we chose a statistical design that would allow us to estimate interactions among the factors: a full factorial in the five factors, which required 32 groups of stores.
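Why 32 groups? A full factorial in five two-level factors enumerates all 2^5 = 32 combinations of settings, one per group, and supports estimating every interaction. A sketch, with illustrative factor names:

```python
from itertools import product

factors = ["attire", "circular", "signage", "demos", "training"]  # illustrative
runs = list(product([-1, +1], repeat=len(factors)))
print(len(runs))  # 32 treatment combinations, one per store group

# Unlike the screening design, the full factorial supports estimating
# every two-factor interaction, e.g. the signage x demos contrast:
col = {name: i for i, name in enumerate(factors)}
interaction = [r[col["signage"]] * r[col["demos"]] for r in runs]
print(interaction[:8])
```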

The final analysis showed that all five factors in the refining design would help sales and predicted a 10% increase in sales if all five were implemented. Follow-up analysis confirmed a sales increase of approximately 10% for the chain after the five ideas were implemented.