In complex systems such as airline travel, predicting delays can be daunting. Among the many contributing factors, including maintenance problems, security concerns, and congestion, weather stands out as the major cause of late aircraft arrivals. According to the Bureau of Transportation Statistics, weather accounted for 33 to 46% of all delay minutes during the past 10 years. That figure includes extreme weather as well as 53% of National Aviation System delays and spillover from previous flights (‘Aircraft Arriving Late’).
Examination of the TranStats database reveals significant aircraft delays during weather such as fog, thundershowers, and snow. High winds often accompany heavy rain and are a cause of delay in their own right. In the United States, inclement weather was particularly severe during January 2017. Choosing Chicago (“The Windy City”) as the point of origin, we take flights to Atlanta, another major airport, in the relatively warm state of Georgia. To model how weather affects delays on this route, we use Weather Underground to capture temperature, precipitation, wind gust speed, and whether weather events such as rain or snow occurred, in both Chicago and Atlanta. With this dataset we are able to perform a causal analysis.
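One way to picture a row of such a dataset is as a pair of weather observations, one per airport, flattened into a numeric feature vector. The sketch below is illustrative only: the class name, field names, units, and sample values are assumptions, not the schema or data used in this analysis.

```python
from dataclasses import dataclass


@dataclass
class AirportWeather:
    """Hypothetical daily weather observation for one airport (units are assumed)."""
    temperature_f: float
    precipitation_in: float
    wind_gust_mph: float
    rain: bool
    snow: bool


def to_feature_vector(origin, destination):
    """Flatten origin and destination observations into one list of floats."""
    features = []
    for obs in (origin, destination):
        features.extend([
            obs.temperature_f,
            obs.precipitation_in,
            obs.wind_gust_mph,
            float(obs.rain),  # weather events become 0.0 / 1.0 indicators
            float(obs.snow),
        ])
    return features


# Made-up example values, not taken from Weather Underground.
chicago = AirportWeather(28.0, 0.1, 35.0, rain=False, snow=True)
atlanta = AirportWeather(55.0, 0.0, 12.0, rain=False, snow=False)
row = to_feature_vector(chicago, atlanta)
```

Encoding rain and snow as 0/1 indicators keeps every feature numeric, which both a linear model and a neural network can consume directly.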
Traditional statistical techniques such as multiple linear regression can help determine weights for the various factors causing delay, but linear models are unlikely to fit well when most flights have zero delay and only occasionally suffer medium to large spikes. Given this complexity, machine learning offers potential help not only in model construction (open source libraries such as Google’s TensorFlow make creating models relatively straightforward) but also in accuracy. The experiments led to the following findings:
- TensorFlow’s DNNRegressor (a deep neural network) performs better than the LinearRegressor, with about 25% lower loss when each is trained on a 465-row training set.
- Loss can be reduced further by increasing the training set to about two-thirds of the total dataset, but it begins increasing again as the evaluation set becomes too small, relative to the training set, to offer any meaningful comparison.
- When using the predict_scores method on the evaluation datasets, zeroing out the negative predictions results in a substantially improved prediction model compared with the naive model, which allows negative delays (not present in the training dataset).
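The last finding, zeroing out negative predictions, can be sketched in a few lines. This is a minimal illustration in plain Python rather than the original TensorFlow code; the function names and the sample numbers are hypothetical and are not drawn from the flight dataset.

```python
def clip_negative_delays(predictions):
    """Replace each negative predicted delay with 0; leave other values unchanged."""
    return [max(p, 0.0) for p in predictions]


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in minutes, between actual and predicted delays."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values for illustration only.
actual_delays = [0.0, 0.0, 12.0, 0.0, 45.0]
raw_predictions = [-6.0, 3.0, 10.0, -2.0, 30.0]

clipped_predictions = clip_negative_delays(raw_predictions)
raw_mae = mean_absolute_error(actual_delays, raw_predictions)
clipped_mae = mean_absolute_error(actual_delays, clipped_predictions)
```

Because no actual delay in the data is negative, moving a negative prediction up to 0 can only bring it closer to the true value, so the clipped error is never worse than the raw error.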
Perhaps the ultimate benchmark for a machine learning model is a simple, intuitive one. For flight delays, where the most common case is no delay, the simplest model assumes a delay of 0 minutes for every flight. Under that assumption, the average error is approximately 3 minutes. The best DNN came close, with an average error of about 5 minutes, but was still less accurate. What is clear, however, is that the linear regression model was much worse, with 13 minutes of error on average. With further tuning, it should be possible to improve the DNN model to beat the 3-minute benchmark. Machine learning results are summarized below: