Researchers from the Leverhulme Centre for Demographic Science have outlined a framework for understanding the limits of predictive accuracy in a correspondence article published today in Nature Computational Science.
Predictive machines use data and computational models to forecast future outcomes in everyday life, from court rulings to medical diagnoses. Advancements in computational science and artificial intelligence have made these predictions more accurate than ever, but just how accurate can they truly be, and is there a limit to their accuracy?
In their correspondence, Dr Charles Rahal and Jiani Yan explore the fundamental limits of prediction and argue that, in open and dynamic systems, we can never know whether further improvements are possible unless predictions are already perfect.
Jiani Yan, first author and DPhil student at the Leverhulme Centre for Demographic Science, explains: ‘Predictions rely on sets of information, but since both the quality and quantity of information are always evolving, estimates of accuracy remain in flux. Large language models illustrate this challenge. While trained on vast datasets, they can only reduce their errors if the data they process is both relevant and extensive. Our framework helps researchers to think about the limits of prediction in order to break down sources of error, which is especially relevant in social and demographic scenarios where the accuracy of predictions is rapidly improving.’
The framework helps researchers distinguish between three distinct types of ‘epistemic’ error, which result from missing knowledge and can be reduced with more relevant data, better utilisation of that data, or with better models. It also enables researchers to think about the isolation of ‘aleatoric’ errors, which are inherently random and cannot be modelled. While aleatoric errors cannot be measured directly, they define the limits of accuracy based on our current understanding, which will evolve over time.
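The distinction can be illustrated with a toy simulation. The sketch below is not the authors' framework or code; it is a minimal example under stated assumptions: an invented outcome built from two invented variables (`x1`, `x2`) plus random noise whose variance is known only because we simulate it. It shows prediction error falling as more relevant data and a better model are added (two of the routes for reducing epistemic error), then levelling off at the noise variance, which stands in for aleatoric error. All names, coefficients, and sample sizes are hypothetical.

```python
import numpy as np


def error_breakdown(n=20000, noise_sd=0.5, seed=0):
    """Toy illustration: test error of three models against a known noise floor."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Irreducible randomness; its variance is known here only because we simulate it.
    noise = rng.normal(scale=noise_sd, size=n)
    # Hypothetical data-generating process, chosen purely for illustration.
    y = 2.0 * x1 + x2 + 0.5 * x2**2 + noise

    half = n // 2
    train, test = slice(0, half), slice(half, n)

    def test_mse(columns):
        # Ordinary least squares with an intercept, fitted on the training half
        # and scored by mean squared error on the held-out half.
        X = np.column_stack([np.ones(n)] + columns)
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        return float(np.mean((y[test] - X[test] @ coef) ** 2))

    return {
        # Only x1 is observed: error from missing relevant data.
        "limited_data": test_mse([x1]),
        # x2 is added, but the model is still purely linear.
        "more_data": test_mse([x1, x2]),
        # A quadratic term lets the model match the true functional form.
        "better_model": test_mse([x1, x2, x2**2]),
        # Variance of the simulated noise: the floor no model can beat on average.
        "aleatoric_floor": noise_sd**2,
    }


if __name__ == "__main__":
    for name, value in error_breakdown().items():
        print(f"{name}: {value:.3f}")
```

In this simulation the floor is known because the noise is generated explicitly. In real, open systems it is not observable, which is the point the correspondence makes: a remaining error could be irreducible randomness, or it could be epistemic error that better data or models would remove.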
Dr Charles Rahal, senior author and Associate Professor in Data Science and Informatics at Oxford Population Health’s Demographic Science Unit and the Leverhulme Centre for Demographic Science, said: ‘By distinguishing these errors, our framework helps developers think about where predictions can be improved. It also reminds us that while some models may perform well, we can never truly know their accuracy until they make perfect predictions, something that may never be achievable.’