Increasing Sales in Retail Using Multi-Factor Experimentation

We have been personally involved in experiments to improve retail operations at big-box retailers such as Saks and several of its subsidiaries, Toys-R-Us, and AutoNation. Other retailers that have made improvements through multi-factor experimentation include Pilot, Quick Chek, and Lowe’s. The most common goal in these experiments is revenue improvement, with cost reduction or profit margin often a secondary objective. The business ideas tested with these clients included media factors (e.g., TV, radio, newspaper), in-store signage, in-store layout, sales force appearance, staffing levels, sales process, loyalty program features, and more.

Case Study – Retail Store Chain

This case study involves a large regional retailer in the U.S. with over 150 stores in a six-state region. After an analysis of the most recent years of weekly sales and discussions with operations managers, 32 stores were ruled ineligible because their sales metric was unstable. Since we were using comp sales (i.e., the ratio of this year’s sales to last year’s sales) as the metric, any store that had not been open for at least a year, or that had an unusual disruption in sales last year, was disqualified. Also eliminated was any store where major changes had taken place since the previous year (e.g., major remodeling or an increase in square footage), as was any store expected to have a disruption during the experiment.
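The comp sales metric described above is a simple ratio, and a minimal sketch makes the definition concrete. The function name and the sales figures below are hypothetical and are not taken from the case study.

```python
def comp_sales(this_year, last_year):
    """Return this year's sales divided by last year's sales for the same period."""
    if last_year <= 0:
        # A store with no sales last year (e.g., not yet open) has no valid comp ratio.
        raise ValueError("last year's sales must be positive to form a comp ratio")
    return this_year / last_year


# Hypothetical weekly sales for one store (illustrative numbers only).
ratio = comp_sales(this_year=105_000.0, last_year=100_000.0)
print(f"comp sales = {ratio:.2f}")  # values above 1 mean sales grew year over year
```

The guard on last year's sales reflects why stores open for less than a year cannot be measured this way.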

After many brainstorming sessions and filtering of ideas, we were left with 23 ideas to be tested. Examples of the ideas tested:

  1. Change in sales associate attire,
  2. Changes to the newspaper circular,
  3. Store signage (prices, directions),
  4. Product demos,
  5. Checkout procedure,
  6. Changes to sales training,
  7. Sales incentives,
  8. Management incentives, etc.

Since there were 23 ideas to be tested, we needed a statistical design that could estimate the effect of each idea on sales independently of the others, with maximum sensitivity (statistical power). (The technical term for the statistical design we used is a 24-run Plackett-Burman design (ref), which accommodates up to 23 two-level factors.)
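To illustrate why such a design lets each effect be estimated independently, here is a minimal sketch that builds one standard 24-run, 23-factor Plackett-Burman matrix using the quadratic-residue (Paley) construction. This is an assumption about the construction: the matrix actually used in the case study is not given and may differ by row or column order or by sign. The function names and the synthetic response are hypothetical.

```python
def plackett_burman_24():
    """Build a 24-run x 23-factor two-level design with entries +1 / -1."""
    q = 23  # prime with q % 4 == 3, so the Paley construction applies
    residues = {(k * k) % q for k in range(1, q)}  # nonzero quadratic residues mod 23

    def entry(diff):
        diff %= q
        if diff == 0:
            return 1  # diagonal of the cyclic block is set to +1
        return 1 if diff in residues else -1

    # 23 cyclic rows, followed by a final row with every factor at its low level.
    design = [[entry(i - j) for j in range(q)] for i in range(q)]
    design.append([-1] * q)
    return design


def main_effect(design, response, factor):
    """Mean response at the high level minus mean response at the low level."""
    half = len(design) // 2  # balanced columns: 12 runs at each level
    return sum(row[factor] * y for row, y in zip(design, response)) / half


design = plackett_burman_24()

# Synthetic response (not real data): only factor 0 matters, adding +3 or -3.
response = [100 + 3 * row[0] for row in design]
effects = [main_effect(design, response, j) for j in range(23)]
print(effects[0], effects[1])
```

Because every pair of columns is orthogonal and every column is balanced, the synthetic effect of factor 0 is recovered in full while the other 22 estimates are zero, which is the sense in which the effects are estimated independently.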

This test required 24 groups of stores, with every store in a group receiving the same combination of factor settings. Therefore, a subset of the 118 stores was randomly assigned to the 24 groups, with an almost equal number of stores in each group. Since we had some media factors (newspaper and radio) in the mix, we used a restricted randomization scheme (stores in the same group had to receive the same radio and newspaper settings). The screening experiment ran for six weeks.
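A balanced random assignment of this kind can be sketched as a shuffle followed by dealing stores into groups in turn. This is a simplified illustration only: it uses all 118 stores, the store IDs and seed are hypothetical, and it does not implement the media restriction described above.

```python
import random


def assign_to_groups(store_ids, n_groups, seed=None):
    """Randomly split stores into n_groups whose sizes differ by at most one."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(store_ids)
    rng.shuffle(shuffled)
    groups = [[] for _ in range(n_groups)]
    # Deal the shuffled stores out one at a time, like cards.
    for position, store in enumerate(shuffled):
        groups[position % n_groups].append(store)
    return groups


stores = [f"store_{i:03d}" for i in range(1, 119)]  # 118 hypothetical store IDs
groups = assign_to_groups(stores, n_groups=24, seed=2024)
sizes = sorted(len(group) for group in groups)
print(sizes[0], sizes[-1])  # smallest and largest group size
```

With 118 stores and 24 groups, this yields 22 groups of five stores and 2 groups of four. A restricted scheme would add a step that rejects or repairs assignments that violate the media constraint.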

Seven of the 23 factors in the screening test were statistically significant, five of them with a positive effect. These five were carried into the refining experiment. Since fewer factors were being tested, the second experiment was logistically much simpler, but its size (number of stores and number of weeks) needed to be about the same to achieve the same sensitivity as the screening experiment. We chose a statistical design that would allow us to estimate interactions among the factors: a full factorial in the five factors, which required 32 groups of stores (2^5 = 32).
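The 32 groups follow directly from enumerating every combination of five two-level factors. The sketch below generates those combinations and shows how a two-factor interaction column is formed. The factor labels A through E are placeholders, since the five retained ideas are not named in the text.

```python
from itertools import product

factors = ["A", "B", "C", "D", "E"]  # placeholder labels for the five ideas

# Every combination of low (-1) and high (+1) settings: 2**5 = 32 runs.
runs = [dict(zip(factors, levels)) for levels in product((-1, 1), repeat=len(factors))]


def interaction(run, first, second):
    """Two-factor interaction contrast: the product of the two factor settings."""
    return run[first] * run[second]


ab_column = [interaction(run, "A", "B") for run in runs]
print(len(runs))  # number of store groups required
```

In a full factorial, each interaction column is balanced and orthogonal to the main-effect columns, so interactions can be estimated separately from the main effects.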

The final analysis showed that all five factors in the refining design would help sales, and it predicted a 10% increase in sales if all five were implemented. Follow-up analysis confirmed a sales increase of approximately 10% for the chain after the five ideas were implemented.