In the private sector, companies sponsor data science competitions to solve stagnating problems, outsource difficult work and advertise to the labor market. Netflix announced the first major data challenge in 2006 in hopes of improving its recommendation algorithm by 10%. The reward—a cool $1 million.
In the first week, a team submitted an algorithm that bested Netflix’s proprietary model. Throughout the competition, Netflix required competitors to publish their code and, eventually, the winning algorithm, to the benefit of the data science community at large.
Most recently, the real estate marketplace Zillow hosted a challenge to improve its zEstimate and awarded a $1.2 million prize to a team of data scientists who had never met each other in person. The team exchanged code through GitHub, coordinated through Slack, and spent hours on video chat. This was truly a 21st-century collaboration.
Inspired by the private sector and motivated to increase collaboration among the Army’s data scientists, Maj. Gen. John Ferrari, then director of program analysis and evaluation (PA&E) in the Army’s Office of the Deputy Chief of Staff for Programs, and his deputy, Krystyna Kolesar, this past spring proposed the service’s first data science challenge. This friendly competition, held from July 15 to Aug. 9, encouraged analysts to build teams of complementary talents to best predict the number of aviation flying hours required to meet the Army’s readiness goals based on historical execution and outcomes.
The development team relied on contributions from across PA&E, including data cleaning by Maj. Oliver Moore and full-time technical support from PA&E data specialists Jeff Ward and Michael Zager.
The challenge question seemed relatively simple: “How many flight hours will a combat aviation brigade fly in a given month?” However, many entangled data elements affected the answer. Disaggregating the data and identifying the relative importance of each data element echoed the readiness challenge faced by commanders and staffs across the Army.
As evidence of the complexity of the problem, the Army, at the headquarters level, has seen a variance in predicted flight hours required for combat formations, due in no small part to the rapidly changing operating environment, the development of a more modern fleet, and variations in pilot performance.
The Flying Hour Challenge team focused on delivering a minimum viable product quickly. In just over a month, the PA&E team canvassed the Army and obtained seven years of the following historical records (at the month level): maintenance history by aircraft, individual personnel records, fuel usage, ammunition expenditure, force structure, accident records, deployment information, training center rotations, weather by location and flight hours by aircraft. The most recent two years of data were held back so that each team’s predictions could be scored against them. Teams then had three weeks to clean the data, analyze it and produce a model.
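The hold-back described above can be illustrated with a short sketch. This is not the PA&E team's code; the record layout (`month`, `brigade`, `flight_hours`), the brigade label and the dates are hypothetical, chosen only to show how seven years of monthly records might be divided so that the most recent two years stay hidden from the modelers.

```python
from datetime import date


def split_by_holdout(records, holdout_years=2):
    """Split monthly records into a training set and a held-back set.

    Assumes each record is a dict whose "month" value is a date stamped
    on the first day of that month (a hypothetical layout).
    """
    latest = max(r["month"] for r in records)
    # Last month that teams are allowed to see.
    cutoff = date(latest.year - holdout_years, latest.month, 1)
    train = [r for r in records if r["month"] <= cutoff]
    holdout = [r for r in records if r["month"] > cutoff]
    return train, holdout


# Illustrative data only: 84 months (seven years) for one made-up brigade.
records = [
    {
        "month": date(2012 + i // 12, 1 + i % 12, 1),
        "brigade": "CAB-A",
        "flight_hours": 1000.0 + i,
    }
    for i in range(84)
]

train, holdout = split_by_holdout(records)
print(len(train), "months for modeling;", len(holdout), "months held back")
```

Splitting by date rather than at random keeps the test honest: a model is judged only on months that come after everything it was allowed to learn from.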
Access to the data was critical for this learning event. The PA&E team leveraged its cloud-based cProbe system to host the data, as well as baseline analysis tools, primarily RStudio software. Making the data and software accessible over the network through a cloud server saved each team time and resources, while allowing the PA&E team to closely observe activity.
Midway through the competition, the teams needed more data-processing capacity to handle the load of multiple teams testing complex models simultaneously. Hosting on the cloud allowed the PA&E data management team to increase computing capacity overnight, rather than take weeks to order and install required hardware.
Teams used data science techniques, including linear regression, machine learning, social network analysis and other analytics. To determine a winner, the PA&E team measured each submission against the most recent two years of each combat aviation brigade’s monthly flying hours. To inspire creativity in presenting results, a second competitive category was set: best data visualization.
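The article does not say which error measure decided the winner. As a sketch only, the comparison against held-back flying hours could look like the following, which assumes mean absolute error as the metric; the team names, brigade labels and hour values are invented for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap, in flight hours, across all (brigade, month) keys."""
    missing = set(actual) - set(predicted)
    if missing:
        raise ValueError(f"submission is missing {len(missing)} predictions")
    return sum(abs(predicted[key] - actual[key]) for key in actual) / len(actual)


def rank_submissions(actual, submissions):
    """Return (team, score) pairs sorted from lowest error to highest."""
    scores = {
        team: mean_absolute_error(actual, predicted)
        for team, predicted in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Invented held-back values, keyed by (brigade, month).
actual = {
    ("CAB-A", "2018-01"): 1200.0,
    ("CAB-A", "2018-02"): 1000.0,
    ("CAB-B", "2018-01"): 800.0,
}

# Invented submissions from two hypothetical teams.
submissions = {
    "team_1": {
        ("CAB-A", "2018-01"): 1100.0,
        ("CAB-A", "2018-02"): 1050.0,
        ("CAB-B", "2018-01"): 900.0,
    },
    "team_2": {
        ("CAB-A", "2018-01"): 1190.0,
        ("CAB-A", "2018-02"): 1020.0,
        ("CAB-B", "2018-01"): 790.0,
    },
}

ranking = rank_submissions(actual, submissions)
for team, score in ranking:
    print(f"{team}: {score:.1f} hours average error")
```

Other measures, such as root mean squared error, would penalize large misses more heavily and could produce a different ranking.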
Of the 19 teams in the competition, 12 submitted a final model for prediction and/or visualization, and the top three teams for each category were recognized. Capt. Iain Cruickshank, the overall winner, is a doctoral candidate at Carnegie Mellon University in Pittsburgh, studying societal computing with professor Kathleen Carley. His research focuses on dynamic social network analysis using machine learning. Cruickshank described his predictive model as a classic implementation of machine learning. Lt. Col. Kelly Ryan of the Army’s Artificial Intelligence Task Force described him as an innovative problem-solver who finds new ways to frame problems: “The way Capt. Cruickshank sees a problem often opens up new avenues for solutions.”
For the second category, Maj. Gen. Karl Gingrich, who became PA&E director in July, selected the best visualization. The winning team, from the U.S. Army Network Enterprise Technology Command, provided a clear, concise narrative of its modeling process and stimulating visualizations that helped the viewer understand the data, and ultimately, the Army flying-hour program.
The value of a clear visualization is hard to overstate anywhere in the data science community, and especially in an army, where battle plans and intelligence insights are often depicted pictorially.
The challenge teams identified data factors that affect Army readiness predictive models. Funding constraints in prior years influenced training, operations, supply and maintenance. Modernization changed the mix of aircraft in the units, introducing new patterns of maintenance and supply consumption. Deployments and combat training center rotations both built and consumed readiness.
Finally, aviation units historically have task-organized to provide ground commanders with a concentrated mix of capabilities focused on training and deployments. As Army units change their deployment patterns and units receive modernized equipment, creating adaptive predictive models for readiness will prove essential.
Looking ahead, Gingrich plans “to continue this kind of competition in order to solve real-world challenges together, as an analytic community.”
The Flying Hour Challenge exemplified how analysts and data scientists from across the Army can come together virtually and crowdsource to solve complex problems. As the Army changes focus to confront new threats, these types of data analytics will help leaders understand their decision space and allocate scarce resources more effectively.