Reducing Drop-out Rate in WIC Program

The Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) provides federal grants to states for supplemental foods, health care referrals, and nutrition education for low-income pregnant, breastfeeding, and non-breastfeeding postpartum women, and for infants and children up to age five who are found to be at nutritional risk. The objective is to meet the nutritional needs of young children and of pregnant and nursing mothers. Participation has been shown to improve the health of those in the program and to help children arrive at school ready to learn. WIC is one of the most cost-effective U.S. programs for providing nutrition to young families, serving about half of all infants born in the United States.

The state of Texas wanted help in reducing its drop-out rate. A family may become ineligible due to its income level or if the youngest child is older than five. However, if a family is eligible, it is better for the children if they remain in the program. We gave the program's administrators a dashboard (built in Power BI) with three views:

  • Current state
  • Past trends
  • Drop-out predictions

Current State

The first view shows a map of Texas with the Local Agencies highlighted. The number of certificates for the five certificate types is shown overall (upper left), by county (lower left), by city (lower center), and by Local Agency-Clinic. One can click on any of these to zero in on the status of that area. For example, clicking on Houston in the lower center brings up the next view.

Now the information specific to Houston is highlighted.

Past Trends

The next view shows a history of the number of certificates of each type. The default view covers the state as a whole and the entire period of the data. There are some interesting trends, especially for Category ‘C’ in the lower-left panel. Management may already have an explanation for these trends, but it is helpful to see them over time. Again, one can select a specific city, county, Local Agency, or clinic, as well as a subset of the date range.

When we select Dallas for the most recent two years we get the next view.

Drop-out Predictions

The final view gives a prediction of who is likely to drop out of the program before their certificate end date. The opening view covers the entire state. A client is considered a “likely” dropout based on the threshold set by the slider bar in the upper right, which the user can change if desired. In this example, a client is flagged as a likely dropout if their probability of dropping out exceeds 50%. The list in the lower left shows the clients with the highest probability of dropping out, sorted in decreasing order of that probability. (The client ID (CID) has been hidden for confidentiality reasons.) Some clients listed here have a very high probability of dropping out. The lower-center panel shows the number of likely dropouts by city, and the lower-right panel by clinic for a given Local Agency.

One way local management can use this view is to retrieve the information specific to their city or clinic. When the highest bar in the lower-right graph is clicked, that clinic's management gets a list of their clients who are most likely to drop out of the program and can take appropriate action to encourage them to stay.

These predictions were based on numerous sources of data and were constructed using Machine Learning.
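The write-up does not name the modeling technique, so the sketch below is only one plausible shape for such a dropout model: a scikit-learn logistic regression trained on synthetic stand-in records. The feature names, the labeling rule, and the library choice are all assumptions for illustration, not the team's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for historical client records: each row is one client,
# columns are hypothetical features (months enrolled, youngest child's age
# in months, missed-appointment count).
n = 1000
X = np.column_stack([
    rng.integers(1, 48, n),   # months enrolled
    rng.integers(0, 60, n),   # youngest child's age in months
    rng.integers(0, 6, n),    # missed appointments
])
# Fabricated rule purely to give the demo a learnable signal.
y = ((X[:, 2] >= 3) | (X[:, 1] > 50)).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Dropout probability per client, as fed to the dashboard.
p_dropout = model.predict_proba(X)[:, 1]
likely = p_dropout > 0.5   # the dashboard's default 50% slider threshold
```

The `likely` mask corresponds to the slider in the upper right of the view: moving the slider just changes the `0.5` cutoff applied to the same probabilities.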

Surprising A/B Testing Result #1

One reason we run experiments is that it is difficult, if not impossible, to consistently know whether changes we want to make to a website will help us reach our objectives. Quite often, changes we expect to help the site turn out to hurt the most important metrics.

The MSN Real Estate site wanted us to run a test to improve revenue from an advertising widget. They had a design company create five new widgets to compete with the current one. (By convention, the default user experience is called the Control and the challengers are known as Treatments. If no Treatment beats the Control, best practice is to keep the Control.) The experiment tested all six widgets concurrently by randomly assigning one-sixth of users to each widget over a two-week period. Here are the six competitors. Before you read further, check your understanding by guessing which widget would perform best.
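The write-up does not say how the one-sixth split was implemented; a common pattern is deterministic hash-based bucketing, sketched here with placeholder variant labels. Each user always lands in the same bucket, and buckets are close to uniform.

```python
import hashlib

# Placeholder labels for the six variants in the experiment.
WIDGETS = ["Control", "T1", "T2", "T3", "T4", "T5"]

def assign_widget(user_id: str) -> str:
    """Deterministically map a user to one of the six widgets.

    Hashing the user ID gives a stable assignment (a returning user always
    sees the same widget) and an approximately uniform one-sixth split.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % len(WIDGETS)
    return WIDGETS[bucket]
```

For example, `assign_widget("user-123")` returns the same variant on every call, so a user's experience stays consistent for the whole two-week period.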

Prior to the experiment, we surveyed about 60 people, asking them to guess which widget would bring in the most advertising revenue. Members of the MSN Real Estate team, the design company, and our experimentation team participated. The widget that won the experiment was the one that got the fewest votes: Treatment 5. This widget, the simplest one, was statistically significantly better than every other widget for both revenue and click-throughs, with a 9.7% increase in revenue over the Control.

An experiment is great for objectively testing with end users under natural conditions which option will perform best. With an experiment we can draw a direct causal link and conclude that changing to the new alternative will improve our primary metrics. What an experiment does not tell us is “why?”. For example, in this experiment, did Treatment 5 perform better because it took the least space on the page? Or because it required the least input from the user? Or was less confusing? Or for some other reason? The experiment itself cannot tell us which of these is the reason for the better performance. We can begin to understand why by consulting design and usability experts and by running follow-up experiments that test new alternatives to the best-performing Treatment.
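The write-up does not show the statistics behind "statistically significantly better," but a standard way to compare two widgets on click-through rate is a two-proportion z-test. The sketch below uses only the standard library; the counts in the usage example are invented, not the experiment's data.

```python
from math import erf, sqrt

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rates.

    Returns (z, p_value), where z > 0 means variant B's rate is higher.
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

For example, `two_proportion_ztest(500, 10_000, 560, 10_000)` compares a 5.0% click-through rate against a 5.6% one on ten thousand users each.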

Improving Performance for Entering College Students

A state university wanted to increase the success rate of incoming students in its “Gateway” courses. Gateway courses are pre-college-level Math and English courses that students must pass before moving on to credit courses if their qualifications do not place them directly into college-level work. There were eight Math and five English Gateway courses. Most were sequenced, so a student who needed to start with the first course in a sequence might take as many as three or four courses before their first college-credit course. The courses a student needed to complete in Math also depended on whether they were in the Science, Education, Social Sciences, or Business track.

A student’s work in a course was considered a success if they achieved at least a B-. Our objective was to predict success based on the many factors that could be related to a student’s performance in a specific (Gateway or first college-credit) course. We used data sources such as:

Pre-college data

  • Placement testing results
  • High school coursework and grades
  • High school attended
  • Date of last course in Math or English (and grade)
  • Whether English is the student’s first language
  • GED completion if no high school degree
  • Demographic data
  • Veteran status

Previous Gateway courses

  • Courses started, completed and grades
  • Instructors
  • Time since last course

Next Gateway or college credit course

  • Instructor
  • Student status (e.g. PT/FT, number of concurrent courses)
  • Student employment status (number of hours per week)
  • Number of classroom sessions per week
  • Classroom hours per week

The resulting model worked quite well in predicting each student’s success in the next Gateway or college-credit course. The school is using the model to recommend which Gateway course a student should take next and to advise students on how to succeed in that course. Administrators and faculty may also use the results to improve the sequence of Gateway courses for all students. Finally, there was one accelerated Gateway Math course that could be taken in one quarter or at a slower pace as two courses over two quarters; the study helped the school understand who should not take the accelerated version.
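As a sketch of how the model's output could drive advising, the snippet below picks the eligible course where a student's predicted chance of earning a B- or better is highest. The course codes and probabilities are invented for illustration, not taken from the study.

```python
# Hypothetical per-course success probabilities for one student, as a model
# like the one described above might produce for the courses the student is
# eligible to take next (course codes are made up).
predicted_success = {
    "MATH-050": 0.42,
    "MATH-060": 0.71,
    "ENGL-090": 0.66,
}

def recommend_next_course(predictions):
    """Advise the course where the student is most likely to succeed
    (i.e. the course with the highest predicted probability of a B- or better)."""
    return max(predictions, key=predictions.get)
```

A real advising workflow would layer prerequisites and track requirements on top of this, but the core recommendation is just an argmax over the model's probabilities.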

Crime and Economics

Crime has always been a problem for society. Over the course of human history there have been many instances in which criminal activity rose to an uncontrollable level. One example of this calamitous process of social upheaval was observed in the events leading up to the French Revolution; late-nineteenth-century England provides another. Clearly, it would be prudent to investigate whether these surges in lawlessness share a root cause.

This is not the first time such an endeavor has been carried out in data science. The cause of the spike in violent crime that peaked in the United States in the early 1990s has long been a subject of debate. Strangely enough, one of the more credible arguments attributes the violence to emissions from gasoline containing a lead additive. Lead was largely phased out of U.S. gasoline during the 1980s, but the neurological damage to the children who grew up breathing those fumes had already taken its toll.

Case Study – Seattle WA

The purpose of this report is to determine what possible causes, if any, may account for increasing crime trends. The candidate causes were limited to socioeconomic variables that can be estimated reliably. The research covers only the city of Seattle, WA, for the six-year period beginning in 2008; however, it was structured so that the process could be repeated in other cities. My research plan included the following stages:

  1. Researching theories of notable experts in the field.
    • The primary source was Gary Becker and his ‘Economic Model of Crime’.
  2. Empirical analysis of the raw data and of the quantified results.
    • This was accomplished using a time-series multiple regression model. The independent variables were CPI, unemployment (with lags), an economic growth index, and population.
  3. Interviewing a law enforcement official and researching historical validity.
  4. Recommending further study and actions the public can take to prevent increasing crime trends.
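Stage 2's time-series multiple regression might be set up as follows. Everything here is a stand-in: the series are synthetic, the lag length is arbitrary, and crime and CPI enter in logs so that the CPI coefficient reads directly as an elasticity (percent change in crime per percent change in prices), matching how the report states its results.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 72  # six years of monthly observations

# Synthetic stand-ins for the report's series (the real inputs would be
# official city and BLS data): CPI, unemployment rate, growth index, population.
cpi = 200 * np.exp(np.cumsum(rng.normal(0.002, 0.001, T)))
unemp = 6 + np.cumsum(rng.normal(0, 0.05, T))
growth = 100 + np.cumsum(rng.normal(0.1, 0.2, T))
pop = 600_000 + np.arange(T) * 500
crime = np.exp(1.0 + 3.0 * np.log(cpi) + 0.05 * unemp + rng.normal(0, 0.02, T))

lag = 2  # unemployment enters with a lag, per the plan above
y = np.log(crime[lag:])
X = np.column_stack([
    np.ones(T - lag),
    np.log(cpi[lag:]),   # log-log term: its coefficient is the CPI elasticity
    unemp[:-lag],        # lagged unemployment
    growth[lag:],
    pop[lag:],
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
cpi_elasticity = beta[1]  # e.g. a value of 3 reads as: 1% CPI rise ~ 3% more crime
```

In this log-log form, a fitted elasticity of 3 for assaults is exactly the "1% increase in prices causes a 3% increase in assaults" statement in the findings below.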

I found a strong connection between economic indicators and reported criminal activity. The model indicated that inflation (the consumer price index) had the largest impact on increasing crime. The results can be interpreted as follows:

A 1% increase in prices (CPI) causes a

  • 3% increase in Assaults
  • 4% increase in Burglaries
  • 5% increase in Robberies

This study was done in the hope of finding an economic variable that would not just predict sudden increases in crime but help prevent them. The link between crime and rising prices helps decision makers understand the crime problem and may suggest proactive steps society can take, provided the model and results are accurate.

For further details, please see the full report.

Improving Call Center Sales With Multi-Factor Experimentation

Call centers are a natural place to run experiments because all the elements needed for good experimental results are present. (For a list of these elements, see White Paper.) In a typical call center, hundreds of agents may handle many calls per day. Each call can have several quantifiable outcomes related to the objective of interest. For example, if the objective is to improve efficiency, the length of each call can be measured; if the objective is to increase revenue, sales per call is measured. Other metrics may include customer satisfaction, return-call rate, customer retention, etc.

In some cases, several call centers from the same organization take part in the same experiment, which helps ensure the results apply to all of them. Industries that use experimentation in call centers include credit card companies, banks, service, retail, and online businesses. Experimentation works equally well whether the calls are inbound or outbound.

Case Study – Improving Call Center Sales

This organization wanted to improve net sales in its call centers. It had eight call centers receiving calls about its credit cards and chose three for this improvement effort. In addition to the primary objective of increasing sales, they wanted to decrease time on call and improve employee satisfaction, since they were experiencing high employee turnover. After a number of brainstorming sessions with customer service representatives (CSRs), team leads, and managers, the list of ideas was narrowed down to those that were actually tested. The experiment ran in three call centers, included hundreds of CSRs and 24 team leads, and tested 10 ideas.

The ideas they tested were:

  1. Sales coach availability (coach was ready to coach after any sales call or not)
  2. Unit manager monitoring calls (or not)
  3. Use of lead associates as coaches (instead of dedicated sales coaches)
  4. Operations manager available on the floor (or not)
  5. Use of unit managers as coaches
  6. Increase the time off the phone for call center associates
  7. Increased training to access customer and product information
  8. New hire coaching (or not)
  9. Self-paced training for call center associates (via taped calls, or not)
  10. Self-paced training for call center associates (via Web, or not)

Five of these factors improved at least one of the key metrics (increasing sales, decreasing call time, improving employee satisfaction). The increase in net sales was approximately four times what Citibank management had hoped the experiment would achieve and resulted in millions of dollars of additional sales per year!
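The analysis behind a multi-factor screening test like this one reduces to contrasts: each factor's main effect is the average response when the idea is "on" minus the average when it is "off." A toy sketch with three factors and invented responses (not the study's 10-factor design or its data):

```python
import numpy as np

# Coded design matrix: rows are experimental groups, columns are factors
# (+1 = idea on, -1 = idea off). A tiny 4-run, 3-factor illustration.
design = np.array([
    [+1, +1, -1],
    [+1, -1, +1],
    [-1, +1, +1],
    [-1, -1, -1],
])
net_sales = np.array([120.0, 115.0, 98.0, 95.0])  # invented responses

# Main effect of each factor: mean response at its high level minus
# mean response at its low level.
effects = (design.T @ net_sales) / (len(net_sales) / 2)
```

With a balanced two-level design like this, every factor's effect is estimated from all of the runs at once, which is what makes testing 10 ideas in one experiment so much more efficient than 10 one-at-a-time tests.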

In addition to improvement in sales, the implemented factors also had a notable positive impact on employee morale and engagement.

Reducing Legal Fees

Although we would hope it weren’t the case, sometimes our customers have a problem. It may be with the product or service itself, with the licensing process, or with some other issue. This often leads to a “moment of truth,” when the response to the customer determines whether that customer abandons you, remains an unhappy customer, or becomes a loyal advocate. These interactions should be anticipated and a planned response kept ready. Some responses are quick and inexpensive; others may be prolonged and expensive. Are the expensive solutions worth it? Can we implement inexpensive solutions with higher ROI than the more expensive ones? An experiment can help determine how we should respond.

Case Study – Insurance Company

This insurance company had completed several process improvement projects, some using designed experiments, to improve cost structure and customer satisfaction in its call center, insurance application, and claims processes, but it still had a significant problem: an alarming trend toward more claimants hiring attorneys when there was bodily injury (BI) after an auto accident. Examining a large number of claims, they found that when an attorney was involved, 1) the cost to the company was significantly higher, 2) the claims process took significantly longer, and 3) the claimant received less than $100 on average. Therefore, it seemed to be in everyone’s best interest to reduce attorney involvement in the claims process (except the attorneys’ 😉).

At the beginning of this effort, 40% of the claims with BI had attorney involvement. Initial process improvement efforts targeting the “low-hanging fruit” lowered that to 36%. Each reduction of one percentage point equated to savings of more than $5,000,000 for the company.

They held brainstorming sessions with many employees and did several surveys to get a long list of potential ideas they could test to reduce the percentage of claims where an attorney was retained in the first 60 days after the accident. They narrowed the list of ideas to be tested down to 15.

The screening test showed four ideas that had a beneficial effect (in order of benefit):

  1. Pay more than blue book value when the value of the automobile was debatably higher. (This was a controversial finding that went against the philosophy of the CEO and the industry: paying more than strictly necessary on property damage claims in order to reduce the overall cost of bodily injury claims.)
  2. When appropriate (and legal), give the claimant an open-ended BI release form, i.e., if the claimant came back later with medical expenses beyond those originally anticipated, the company would pay them.
  3. Increase the number of in-person contacts with the claimant.
  4. Increase agents’ range and discretion in settling BI claims.

All of these ideas were put into a second (refining) experiment; not only were they validated, but the company also obtained a more precise estimate of the cost and benefit of each. The final, verified improvement was a reduction of an additional 8 percentage points in claims with attorney involvement, from 36% to 28%.
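Taking the stated more-than-$5M-per-point figure at face value, the implied annual savings from the full improvement can be tallied directly (so the totals below are lower bounds):

```python
# Stated in the case study: each 1-point reduction in the attorney-involvement
# rate saved the company more than $5,000,000 per year.
SAVINGS_PER_POINT = 5_000_000

baseline, after_quick_wins, after_experiments = 40, 36, 28  # percent of BI claims

initial_savings = (baseline - after_quick_wins) * SAVINGS_PER_POINT
experiment_savings = (after_quick_wins - after_experiments) * SAVINGS_PER_POINT
total_savings = (baseline - after_experiments) * SAVINGS_PER_POINT
# total_savings == 60_000_000 (at least, since $5M per point is a lower bound)
```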

Increasing Sales in Retail Using Multi-Factor Experimentation

We have personally been involved in experiments to improve retail operations with big-box retailers such as Saks and several of its subsidiaries, Toys-R-Us, and AutoNation. Other retailers that have made improvements through multi-factor experimentation include Pilot, Quick Chek, Lowe’s, and others. The most common goal in these experiments is revenue improvement, with cost reduction or profit margin often a secondary objective. The business ideas tested with these clients included media factors (e.g., TV, radio, newspaper), in-store signage, in-store layout, sales force appearance, staffing levels, sales process, loyalty program features, and more.

Case Study – Retail Store Chain

This case study involves a large regional retailer in the U.S. with over 150 stores in a six-state region. After an analysis of the most recent years of weekly sales and discussions with operations managers, 32 stores were ruled out as ineligible due to metric instability. Since we were using comp sales (the ratio of this year’s sales to last year’s sales), any store that had not been open for at least a year, or that had an unusual disruption in sales last year, was disqualified. So was any store where major changes had taken place since the previous year (e.g., major remodeling or an increase in square footage), as well as any store expected to have a disruption during the experiment.
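The eligibility screen described above can be sketched as a simple filter. The field names and sample records are hypothetical; the point is that each disqualifying condition from the paragraph becomes one boolean check.

```python
def comp_sales(this_year: float, last_year: float) -> float:
    """Comparable-store sales: this year's sales over last year's."""
    return this_year / last_year

def eligible(store: dict) -> bool:
    """Screen a store for the experiment, per the criteria described above.

    Field names are hypothetical stand-ins for the study's actual data.
    """
    return (
        store["months_open"] >= 12            # open at least a year (comp sales exist)
        and not store["disruption_last_year"] # no unusual sales disruption last year
        and not store["major_changes"]        # no remodel / square-footage change
        and not store["expected_disruption"]  # none expected during the test
    )

stores = [
    {"months_open": 30, "disruption_last_year": False,
     "major_changes": False, "expected_disruption": False},
    {"months_open": 8, "disruption_last_year": False,
     "major_changes": False, "expected_disruption": False},
]
eligible_stores = [s for s in stores if eligible(s)]
```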

After many brainstorming sessions and filtering of ideas, we were left with 23 ideas to be tested. Examples of the ideas tested:

  1. Change in sales associate attire,
  2. Changes to the newspaper circular,
  3. Store signage (prices, directions),
  4. Product demos,
  5. Checkout procedure,
  6. Changes to sales training,
  7. Sales incentives,
  8. Management incentives, etc.

Since there were 23 ideas to be tested, we needed a statistical design that could estimate the effect of each on sales independently, with maximum sensitivity or power. (The technical term for the design we used is a 24-run Plackett-Burman design (ref).)
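For illustration, here is how a Plackett-Burman design is constructed. This shows the 12-run version (which handles up to 11 two-level factors) rather than the study's 24-run design, but the construction is the same: cyclic shifts of a generator row plus a final row of all low levels, giving mutually orthogonal columns so every factor's effect can be estimated independently.

```python
import numpy as np

# Generator row for the 12-run Plackett-Burman design (Plackett & Burman, 1946).
gen = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

# Rows 1-11 are cyclic shifts of the generator; row 12 is all -1.
rows = [np.roll(gen, i) for i in range(11)]
rows.append(-np.ones(11, dtype=int))
design = np.array(rows)   # 12 runs x 11 two-level factors

# Orthogonality check: for a Plackett-Burman design, distinct columns are
# uncorrelated, i.e. design.T @ design = 12 * I.
gram = design.T @ design
```

Each row of `design` is one experimental group's recipe (+1 = idea on, -1 = idea off); the 24-run version used in the study is built the same way from a length-23 generator.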

We needed 24 groups of stores for this test, where each group received the same set of factors. Therefore, a subset of the 118 eligible stores was randomly assigned to the 24 groups, with an almost equal number of stores in each group. Since some media factors (newspaper and radio) were in the mix, we used a restricted randomization scheme (stores sharing a newspaper or radio market had to receive the same media settings). The screening experiment ran for six weeks.

Seven of the 23 factors in the screening test were statistically significant, five of them with a positive effect. These five were carried into the refining experiment. With fewer factors being tested, the second experiment was logistically much simpler, but its size (number of stores and number of weeks) needed to be about the same to achieve the same sensitivity as the screening experiment. We chose a statistical design that would allow us to estimate interactions among the factors: a full factorial in the five factors, which required 32 groups of stores.
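A full factorial in five two-level factors is simply every on/off combination of the factors, which is why 32 groups were required. The factor names below are placeholders:

```python
from itertools import product

factors = ["F1", "F2", "F3", "F4", "F5"]  # placeholder names for the five ideas

# Full factorial: every combination of low (-1) and high (+1) for each factor.
runs = list(product([-1, +1], repeat=len(factors)))
# 2**5 = 32 distinct runs, enough to estimate all main effects and interactions.
```

Unlike the 24-run screening design, which confounds interactions with main effects, this 32-run design leaves every interaction among the five factors estimable.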

The final analysis showed all five of the factors in the refining design would help sales and gave a prediction of a 10% increase in sales if all five were implemented. Follow-up analysis confirmed a sales increase of approximately 10% for the chain after implementation of the five ideas.