Category Archives: Statistics

Reducing Drop-out Rate in WIC Program

The Women, Infants and Children (WIC) program, formally the Special Supplemental Nutrition Program for Women, Infants, and Children, provides federal grants to states for supplemental foods, health care referrals, and nutrition education for low-income pregnant, breastfeeding, and non-breastfeeding postpartum women, and for infants and children up to age five who are found to be at nutritional risk. Its objective is to meet the nutritional needs of young children and of nursing mothers. The program has been shown to improve the health of participants and to help children be ready to start school. WIC is one of the most cost-effective U.S. programs for providing nutrition to young families, and it serves about half of all infants born in the United States.

The state of Texas wanted help in reducing the program's drop-out rate. A family may become ineligible because of its income level or because its youngest child is older than five. However, as long as a family remains eligible, it is better for the children to stay in the program. We gave the program's administrators a dashboard (built in Power BI) with three views:

  • Current state
  • Past trends
  • Drop-out predictions

Current State

The first view shows a map of Texas with the Local Agencies highlighted. The number of certificates for each of the five certificate types is shown overall (upper left), by county (lower left), by city (lower center), and by Local Agency-Clinic. Clicking on any of these zooms in on the status of that area. For example, clicking on Houston in the lower center panel produces the next view.

Now the information specific to Houston is highlighted.

Past Trends

The next view shows a history of the number of certificates of each type. The default view covers the state as a whole and the entire period of the data. There are some interesting trends, especially for Category ‘C’ in the lower left panel. Management may already have an explanation for these trends, but it is helpful to see them over time. Again, one can select a specific city, county, Local Agency, or clinic, as well as a subset of the date range.

Selecting Dallas for the most recent two years gives the next view.

Drop-out Predictions

The final view predicts which clients are likely to drop out of the program before their Certificate end date. The opening view covers the entire state. A client is classified as a “likely” dropout when their predicted probability of dropping out exceeds the threshold set by the slider bar in the upper right, which the user can adjust. In this example, a client is a likely dropout if that probability is greater than 50%. The list in the lower left shows the clients with the highest probability of dropping out, sorted in decreasing order of that probability; the client ID (CID) has been hidden for confidentiality. Some of the clients listed here have a very high probability of dropping out. The lower middle panel shows the number of likely dropouts by city, and the lower right panel shows the number by clinic for a given Local Agency.

Local management can use this view to retrieve the information specific to their city or clinic. For example, clicking the highest bar in the lower right graph gives that clinic's management a list of its clients who are most likely to drop out of the program, so staff can take appropriate action to encourage them to stay.

These predictions were built with machine learning, drawing on numerous sources of data.
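The prediction view's logic (flag clients whose predicted dropout probability exceeds the slider threshold, sort them by decreasing probability, and count them by city or clinic) can be sketched in a few lines of Python. This is a minimal illustration, not the Power BI implementation: the `Client` record, its field names, and the sample values are hypothetical, and the probabilities are assumed to come from an already-trained model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Client:
    # Hypothetical record; field names are illustrative only.
    cid: str
    city: str
    clinic: str
    p_dropout: float  # model-predicted probability of dropping out


def likely_dropouts(clients, threshold=0.5):
    """Clients whose dropout probability exceeds the threshold,
    sorted from highest to lowest probability (the lower-left list)."""
    flagged = [c for c in clients if c.p_dropout > threshold]
    return sorted(flagged, key=lambda c: c.p_dropout, reverse=True)


def counts_by(clients, attr):
    """Count clients per value of `attr` ("city" or "clinic"), as in the bar charts."""
    return Counter(getattr(c, attr) for c in clients)


clients = [
    Client("A1", "Houston", "Clinic 1", 0.91),
    Client("A2", "Houston", "Clinic 2", 0.42),
    Client("A3", "Dallas", "Clinic 3", 0.67),
    Client("A4", "Houston", "Clinic 1", 0.55),
]

flagged = likely_dropouts(clients, threshold=0.5)
print([c.cid for c in flagged])        # ['A1', 'A3', 'A4']
print(counts_by(flagged, "city"))      # Counter({'Houston': 2, 'Dallas': 1})

# Drill-down: the list a clinic manager sees after clicking that clinic's bar.
clinic_list = [c for c in flagged if c.clinic == "Clinic 1"]
print([c.cid for c in clinic_list])    # ['A1', 'A4']
```

Moving the slider simply changes `threshold`; the sorting and counting steps are unchanged.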

Improving Performance for Entering College Students

A state university wanted to increase the success rate of incoming students in its “Gateway” courses. Gateway courses are pre-college-level Math and English courses that students must pass before moving on to credit courses if their qualifications do not place them directly into college-level courses. There were eight Math and five English Gateway courses. Most were sequenced, so a student who had to start with the first course in a sequence might need to take three or four courses before their first college credit course. The Math courses a student needed to complete also depended on whether they were in the Science, Education, Social Sciences, or Business track.

A student’s work in a course was considered a success if they earned at least a B-. Our objective was to predict success in a specific course (a Gateway course or the first college credit course) from the many factors that could be related to it. We used data sources such as the following:

Pre-college data

  • Placement testing results
  • High school coursework and grades
  • High school attended
  • Date of last course in Math or English (and grade)
  • Whether English is the student’s first language
  • GED completion if no high school degree
  • Demographic data
  • Veteran status

Previous Gateway courses

  • Courses started, completed and grades
  • Instructors
  • Time since last course

Next Gateway or college credit course

  • Instructor
  • Student status (e.g., part-time or full-time, number of concurrent courses)
  • Student employment status (number of hours per week)
  • Number of classroom sessions per week
  • Classroom hours per week
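The success criterion (a grade of at least B-) turns each course outcome into a binary label that a classifier can be trained on. A minimal Python sketch follows; it assumes a standard A to F letter-grade scale with plus/minus steps, and the scale and the choice to reject unrecognized grades (such as withdrawals) are assumptions, not details from the study.

```python
# Letter grades from highest to lowest (assumed scale; the institution's may differ).
GRADE_ORDER = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]
RANK = {g: i for i, g in enumerate(GRADE_ORDER)}
THRESHOLD = "B-"  # success criterion from the study


def is_success(grade: str) -> bool:
    """Return True if the grade is B- or better."""
    grade = grade.strip().upper()
    if grade not in RANK:
        # Assumption: non-letter outcomes (e.g., withdrawals) need their own rule.
        raise ValueError(f"unrecognized grade: {grade!r}")
    # A lower index in GRADE_ORDER means a higher grade.
    return RANK[grade] <= RANK[THRESHOLD]


outcomes = ["A", "B-", "C+", "B+", "F"]
labels = [int(is_success(g)) for g in outcomes]
print(labels)  # [1, 1, 0, 1, 0]
```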

The resulting model predicted quite well whether each student would succeed in the next Gateway or college credit course. The school is using it to recommend which Gateway course each student should take next and to advise students on how to succeed in that course. Administrators and faculty may also use the results to improve the sequence of Gateway courses for all students. Finally, one accelerated Gateway Math course could be taken in a single quarter or at a slower pace as two courses over two quarters; the study helped the school identify which students should not take the accelerated version.
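One plausible way to turn such predictions into a course recommendation (an assumption; the study's actual decision rule is not described here) is to recommend the most advanced course in the student's sequence whose predicted probability of success clears a cutoff. The course names, probabilities, and the 0.6 cutoff below are hypothetical.

```python
def recommend_next(sequence, predicted, cutoff=0.6):
    """Return the most advanced course in `sequence` whose predicted
    probability of success (B- or better) is at least `cutoff`.

    sequence:  course names ordered from least to most advanced
    predicted: dict mapping course name -> predicted probability of success
    Falls back to the first course in the sequence if none meets the cutoff.
    """
    for course in reversed(sequence):
        if predicted.get(course, 0.0) >= cutoff:
            return course
    return sequence[0]


# Hypothetical three-course Math sequence and model outputs for one student.
sequence = ["Math Gateway 1", "Math Gateway 2", "Math Gateway 3"]
predicted = {"Math Gateway 1": 0.88, "Math Gateway 2": 0.67, "Math Gateway 3": 0.31}
print(recommend_next(sequence, predicted))  # Math Gateway 2
```

Taking the most advanced course that clears the cutoff, rather than the course with the single highest predicted probability, avoids steering every student toward the easiest course, which would usually score highest.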

Crime and Economics

Crime has always been a problem for society. Throughout history there have been many instances in which criminal activity rose to uncontrollable levels. The events leading up to the French Revolution are one example of this calamitous process of increasing social upheaval; late-nineteenth-century England provides another. It would therefore be prudent to determine whether these surges in lawlessness share a root cause.

This is not the first time such an endeavor has been carried out in data science. The cause of the rise in violent crime across the United States, which peaked in the early 1990s, has long been a subject of debate. Strangely enough, one of the more credible explanations attributes it to lead emissions from leaded gasoline. Lead was largely phased out of U.S. gasoline by 1986 and banned from on-road fuel in 1996, but the neurological damage to children who grew up breathing those fumes had already taken its toll.

Case Study – Seattle, WA

The purpose of this report is to determine what causes, if any, may account for increasing crime trends. The candidate causes were limited to socioeconomic variables that can be estimated reliably. The research covered only Seattle, WA, for the six-year period beginning in 2008, but it was structured so that the process could be repeated in other cities. My research plan included the following stages:

  1. Researching theories of notable experts in the field.
    • The primary source was Gary Becker’s ‘Economic Model of Crime’.
  2. Empirical analysis of the raw data and interpretation of the quantified results.
    • This was done with a multiple regression model on time-series data. The independent variables were the consumer price index (CPI), unemployment (with lags), an economic growth index, and population.
  3. Interview with a law enforcement official, along with research into historical validity.
  4. Recommendations for further study and for actions the public can take to prevent increasing crime trends.
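The regression in stage 2 can be sketched with NumPy: build a design matrix that includes lagged unemployment, then fit it by ordinary least squares. This is a minimal illustration under stated assumptions, not the report's model: the lag choice (current value plus lags of 1 and 2 periods), the monthly frequency, and all data below are hypothetical, and the series are synthetic, not Seattle data.

```python
import numpy as np


def lagged_design(cpi, unemployment, growth, population, lags=(1, 2)):
    """Build a regression design matrix for a time series.

    Columns: intercept, CPI, current unemployment, unemployment lagged by
    each value in `lags`, growth index, population. All inputs are 1-D arrays
    over the same periods; the first max(lags) periods are dropped because
    their lagged values are unavailable. Returns (X, k), where k is the
    number of dropped periods.
    """
    k = max(lags)
    n = len(cpi)
    cols = [np.ones(n - k), cpi[k:], unemployment[k:]]
    # Row i corresponds to period t = k + i, so lag L uses unemployment[t - L].
    cols += [unemployment[k - lag : n - lag] for lag in lags]
    cols += [growth[k:], population[k:]]
    return np.column_stack(cols), k


def fit_ols(X, y):
    """Ordinary least squares coefficients via NumPy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Synthetic monthly data for six years (NOT the Seattle data).
rng = np.random.default_rng(0)
n = 72
cpi = 100 + np.cumsum(rng.normal(0.2, 0.3, n))
unemployment = 5 + rng.normal(0, 0.5, n)
growth = rng.normal(2, 1, n)
population = 6.0 + 0.01 * np.arange(n) + rng.normal(0, 0.01, n)  # in 100,000s

X, k = lagged_design(cpi, unemployment, growth, population)
true_beta = np.array([10.0, 0.5, 1.0, 0.8, 0.3, -0.2, 2.0])
crime = X @ true_beta + rng.normal(0, 0.1, n - k)  # synthetic crime series

beta = fit_ols(X, crime)
print(np.round(beta, 2))
```

Plain OLS ignores autocorrelation in the residuals, which is common in time series; a fuller analysis would test for it (for example with a Durbin-Watson statistic) or use an estimator that accounts for it.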

I found a strong connection between economic indicators and reported criminal activity. The model indicated that rising prices (inflation, as measured by the consumer price index) had the largest effect on crime. These results can be interpreted as follows:

According to the model, a 1% increase in prices (CPI) is associated with a

  • 3% increase in Assaults
  • 4% increase in Burglaries
  • 5% increase in Robberies
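Reading a coefficient as "a 1% change in CPI goes with an x% change in crime" is the standard interpretation of an elasticity, which holds when both crime and CPI enter the regression in logarithms. The summary above does not state the model's functional form, so the log-log specification below is an assumption, and the symbols are illustrative:

```latex
\ln(\mathrm{Crime}_t) = \beta_0 + \beta_1 \ln(\mathrm{CPI}_t) + \sum_j \gamma_j X_{jt} + \varepsilon_t,
\qquad
\%\Delta\,\mathrm{Crime} \approx \beta_1 \cdot \%\Delta\,\mathrm{CPI}
```

Here X_{jt} stands for the other regressors (unemployment and its lags, the growth index, population). Under this specification the exact effect of a 1% CPI increase is 1.01^{β1} − 1: for assaults (β1 = 3) that is about 3.03%, for burglaries (β1 = 4) about 4.06%, and for robberies (β1 = 5) about 5.10%, so the rounded figures above are consistent with this reading for small changes.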

This study was done in the hope of finding an economic variable that would not only predict sudden increases in crime but also help prevent them. The link between crime and rising prices helps decision makers understand the crime problem and may suggest proactive steps society can take, provided the model and results are accurate.

For further details, please see the full report.