Using Data from Schools and Child Welfare Agencies to Predict Near-Term Academic Risks

Independent Research Report

The National Center for Education Evaluation and Regional Assistance at the Institute of Education Sciences (U.S. Department of Education) examined data from Allegheny County students to better understand predictors of near-term academic risks. The goal of this research is to provide information for administrators, researchers, and student support staff in local education agencies who are interested in identifying students who are likely to have near-term academic problems such as absenteeism, suspensions, poor grades, and low performance on state tests.

What is this report about? 

The report describes an approach for developing a predictive model and assesses how well the model identifies at-risk students using data from two local education agencies in Allegheny County, Pennsylvania: a large local education agency and a smaller charter school network. It also examines which types of predictors are individually related to each type of near-term academic problem: in-school variables (performance, behavior, and consequences) and out-of-school variables (human services involvement and public benefit receipt). Understanding these relationships can clarify why the model might flag students as at risk and how best to support them.
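The report does not include code, but the general workflow it describes, assembling student-level predictors and training a classifier to flag near-term risk, can be illustrated with a brief, hypothetical sketch. The sketch below uses scikit-learn's GradientBoostingClassifier on synthetic data; all column names (prior_gpa, child_welfare_involvement, and so on) and the chronic absenteeism outcome are illustrative placeholders, not the study's actual variables or model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5_000

# Synthetic stand-in for a student-level analytic file; real work would
# join school records (attendance, grades, discipline) with human
# services data (child welfare, homeless services, juvenile justice).
students = pd.DataFrame({
    "prior_gpa": rng.normal(2.8, 0.7, n).clip(0, 4),
    "prior_absence_rate": rng.beta(2, 18, n),
    "prior_suspensions": rng.poisson(0.3, n),
    "child_welfare_involvement": rng.binomial(1, 0.08, n),
    "emergency_homeless_services": rng.binomial(1, 0.03, n),
    "juvenile_justice_involvement": rng.binomial(1, 0.04, n),
})

# Synthetic outcome: chronic absenteeism in the following year.
risk = (
    -1.5 * students["prior_gpa"]
    + 12 * students["prior_absence_rate"]
    + 0.6 * students["prior_suspensions"]
    + 0.8 * students["child_welfare_involvement"]
    + rng.normal(0, 1, n)
)
students["chronically_absent_next_year"] = (risk > np.quantile(risk, 0.8)).astype(int)

X = students.drop(columns="chronically_absent_next_year")
y = students["chronically_absent_next_year"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate out-of-sample accuracy and rank students by predicted risk
# so support staff could review the highest-risk group first.
pred = model.predict_proba(X_test)[:, 1]
print(f"AUC: {roc_auc_score(y_test, pred):.3f}")
flagged = X_test.assign(risk_score=pred).nlargest(50, "risk_score")
```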

What are the takeaways?

The study finds that predictive models using machine learning algorithms identify at-risk students with moderate to high accuracy. In-school variables drawn from school data are the strongest predictors across all outcomes, and predictive performance declines only slightly when out-of-school variables drawn from human services data are excluded and only school data are used. However, some out-of-school events and services, including child welfare involvement, emergency homeless services, and juvenile justice system involvement, are individually related to near-term academic problems. The models are more accurate for the large local education agency than for the smaller charter school network. They are better at predicting low grade point average, course failure, and scores below the basic level on state tests in grades 3–8 than at predicting chronic absenteeism, suspensions, and scores below the basic level on high school end-of-course standardized tests. The findings suggest that many local education agencies could apply machine learning algorithms to existing school data to identify students at risk of near-term academic problems that are known precursors to school dropout.
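A hypothetical continuation of the earlier sketch illustrates the kind of comparison behind the finding about predictor sets: training one model on all predictors and another on school-data columns only, then comparing out-of-sample accuracy (here, AUC). The column names and the synthetic split (X_train, X_test, y_train, y_test) carry over from the sketch above and are illustrative only, not the study's actual analysis.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# In-school predictors only (placeholder column names from the sketch above).
school_cols = ["prior_gpa", "prior_absence_rate", "prior_suspensions"]

def fit_and_score(columns):
    """Train a gradient boosting model on the given columns and return test AUC."""
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(X_train[columns], y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test[columns])[:, 1])

auc_all = fit_and_score(list(X_train.columns))
auc_school_only = fit_and_score(school_cols)
print(f"AUC, all predictors:   {auc_all:.3f}")
print(f"AUC, school data only: {auc_school_only:.3f}")
```

In this framing, a small gap between the two AUC values would correspond to the report's conclusion that models built on school data alone perform nearly as well as models that also draw on human services data.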