The chances are small that tools such as artificial intelligence (AI) or machine learning (ML) hold a secret trick that will enable the human mind to solve complex problems better and faster, as if by magic, since these tools are limited by our own ability to design and codify algorithms for pattern recognition. This was clearly illustrated by the recent reputation crisis that affected some generative AI engines, whose biased outcomes were driven by the socio-political views and preferences of their programmers. However, AI and ML tools developed under clear governance and controls can help us uncover complex relationships and patterns in large volumes of data, a task that can be daunting for humans.
An area of risk management that requires sound judgment and a deep understanding of complex relationships is the quantification of risk (from credit, market and operational risk to reputation, compliance and conduct risk). To overcome the uncertainty associated with risk quantification, academics and practitioners rely heavily on assumptions about the random nature of severe losses and on models of loss likelihood and severity based on statistical analysis of historical data, including the use of AI and ML pattern recognition techniques. Note, however, that the purpose of using these tools to estimate future losses is simply to help us form a reasonable judgment in situations where we have limited or incomplete information. Furthermore, the sole purpose of the statistical analysis of current or past data is to obtain a judgment that allows us to estimate the likelihood of future events, based on the subjective belief that there is a causal relationship between what we have observed in the past and what we expect to occur in the future.
To illustrate, given a random sequence of events such as flipping a fair coin, the likelihood of the next head or tail is not ‘derived’ from information about the next coin flip (which is always unknown in advance) but ‘inferred’ statistically from events already observed. That is, it is ‘assumed’ to be determined by the frequency of heads and tails already observed. As additional coin flips occur, this assumption may become stronger if it is supported by the new information. However, a prediction of future events based only on past observations is always a subjective judgment, inferred from the available data and the strong belief that nothing else about the problem will change in an unexpected way. Such a change could happen, for example, if the fair coin were accidentally replaced with a biased coin after a few flips, or if the way we flip the coin changed unintentionally.
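A minimal numerical sketch of this point, using simulated coin flips (the bias level, sample sizes and random seed are illustrative assumptions), shows how a frequency-based estimate quietly degrades when the underlying process changes without our knowledge:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Flip a fair coin 100 times and estimate P(heads) from the observed frequency.
fair_flips = rng.binomial(n=1, p=0.5, size=100)
print("Estimated P(heads), fair coin:", fair_flips.mean())

# Now suppose the coin is silently swapped for a biased one (p = 0.7).
# The pooled frequency keeps "learning" from data generated by a different
# process, so the inference drifts unless the change is detected.
biased_flips = rng.binomial(n=1, p=0.7, size=100)
all_flips = np.concatenate([fair_flips, biased_flips])
print("Pooled estimate after the unnoticed swap:", all_flips.mean())
```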
When we infer that some observed property may also hold in the future, we are making an educated, but subjective, judgment. This is the situation currently faced by financial institutions, which always have imperfect or limited information on clients, borrowers, competitors, markets and changes in the business environment, and often need to analyze large volumes of historical data to obtain meaningful estimates of the likelihood and severity of different events.
Statistically based relationships and models are approximations and, with rare exceptions, no single approach can perfectly capture all the details of real-world events. We often select the “best” approach among alternatives as the one that most closely captures the observed relationships according to predetermined criteria for weighting estimation errors. For example, if our criteria include maximizing the probability that predictions align with the distribution of observed outcomes, a maximum likelihood approach can be used. These criteria provide a means to quantify whether the approach is fit for purpose.
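To make the selection criterion concrete, here is a hedged sketch of a maximum likelihood fit of a loss severity distribution; the lognormal assumption, the simulated losses and the parameter values are purely illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Simulated loss severities (the lognormal form is a common but illustrative choice).
losses = rng.lognormal(mean=10.0, sigma=1.5, size=500)

# Maximum likelihood fit: choose the parameters that maximize the probability
# of the observed losses under the assumed distribution.
shape, loc, scale = stats.lognorm.fit(losses, floc=0)
print("Fitted sigma (shape):", shape)
print("Fitted mu (log of scale):", np.log(scale))

# The log-likelihood quantifies how well this candidate captures the data and
# can be compared against alternative distributions under the same criterion.
loglik = stats.lognorm.logpdf(losses, shape, loc, scale).sum()
print("Log-likelihood:", loglik)
```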
As AI and ML tools become more readily available, financial institutions have increased their use in the identification of significant relationships across risk drivers, their combinations and complex transformations. This type of data-driven, number-crunching analysis often results in overfitting and artificial relationships tailored to the data, which limits the validity of statistical inference made from the data and may provide a false sense of confidence in the observed relationships. This issue can go unnoticed, leading to incorrect inference and model selection and, ultimately, to poor decision making.
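The sketch below illustrates how such artificial relationships can arise: a regression of pure noise on many candidate “risk drivers” (all values are simulated and illustrative) appears to explain most of the variation in-sample, yet the apparent relationships vanish on fresh data from the same process:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=1)

# 50 observations of a target that is pure noise, screened against 40 candidate
# "risk drivers" that are also pure noise (sizes are illustrative).
n_obs, n_drivers = 50, 40
X = rng.normal(size=(n_obs, n_drivers))
y = rng.normal(size=n_obs)

# In-sample, the regression appears to explain most of the variance...
model = LinearRegression().fit(X, y)
print("In-sample R^2:", model.score(X, y))

# ...but out of sample, on new data from the same (unrelated) process,
# the "relationships" disappear and predictive power collapses.
X_new = rng.normal(size=(n_obs, n_drivers))
y_new = rng.normal(size=n_obs)
print("Out-of-sample R^2:", model.score(X_new, y_new))
```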
The issues above lead to a key question for making valid inferences about risk quantification from data: which tool should we use? Critical to this process are a conceptually sound hypothesis-testing framework, well-defined assumptions to be tested, and a clear understanding of the data used to support the analysis.
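As a simple illustration of hypothesis testing in this context, the sketch below checks whether an observed default count is consistent with an assumed default probability; the portfolio size, default count and assumed probability are hypothetical values chosen only for illustration:

```python
from scipy.stats import binomtest

# Hypothetical example: test whether an observed default count is consistent
# with an assumed (null) default probability of 2%. All counts are illustrative.
observed_defaults = 35
portfolio_size = 1000
assumed_pd = 0.02

result = binomtest(observed_defaults, portfolio_size, assumed_pd, alternative="greater")
print("p-value:", result.pvalue)
# A small p-value suggests the assumed default probability understates the
# observed default experience; a large one means the data do not contradict it.
```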
There is no shortage of AI and ML tools to choose from (e.g., deep learning neural networks, Bayesian inference models, or other complex techniques). For example, neural networks are adaptive nonlinear models composed of layers of interconnected processing units (highly stylized artificial neurons). Each processing unit takes one or more inputs, which are weighted and transformed into an output reflecting a nonlinear response that feeds other processing units. Two basic features characterize a neural network: its connectivity (architecture or topology) and its learning process (how its internal parameters are determined). A common misconception is that these tools are “black boxes” with hidden or unclear variable relationships. Many of these tools are simply higher-order regressions based on conventional statistical methods that can be used to perform inductive inference. Usually, these tools replace a single regression equation with multiple nested regressions that need to be solved simultaneously. Depending on the tool’s connectivity, estimating the internal regression parameters can be a daunting computational task. Regression parameters are often found by leveraging the mutual reinforcement and feedback of all interacting regressions, based on the data used to train these tools during the learning process. The novelty of these tools lies in their ability to identify nonlinear relationships across risk characteristics and other variables with little or no a priori information.
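As a rough illustration of this “nested regressions” view, the sketch below computes the forward pass of a tiny two-layer network. The borrower attributes, layer sizes and random weights are assumptions made only for illustration; in practice, the weights would be determined by the learning process rather than drawn at random:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def sigmoid(z):
    # Nonlinear response of each processing unit.
    return 1.0 / (1.0 + np.exp(-z))

# Three illustrative, standardized borrower attributes (values are assumptions).
x = np.array([0.4, -1.2, 0.7])

# Hidden layer: each unit is effectively a regression on the inputs,
# passed through a nonlinear transformation.
W1 = rng.normal(size=(4, 3))   # weights of 4 hidden units on 3 inputs
b1 = rng.normal(size=4)
hidden = sigmoid(W1 @ x + b1)

# Output layer: another regression, this time on the hidden-unit outputs,
# producing a score that could be read as a probability of default.
W2 = rng.normal(size=(1, 4))
b2 = rng.normal(size=1)
score = sigmoid(W2 @ hidden + b2)
print("Illustrative default score:", float(score[0]))
```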
Despite the ability of AI and ML tools to identify complex relationships in data, the key goal for risk quantification remains the same: provide insight into the risk assessment of future events aligned with appropriate cost criteria for reducing estimation or classification errors. Any good risk classification approach should make the number of misclassifications or estimation errors as small as possible.
For binary choices, such as distinguishing financially sound borrowers from future credit defaulters, the misclassification costs can be defined by four simple outcomes (illustrated in the sketch that follows this list):
Hit: the model correctly classifies a troubled borrower as a defaulter.
Miss: the model assigns low risk to a defaulter.
False alarm: a financially sound borrower is classified as a defaulter.
Correct rejection: the model assigns low risk to a financially sound borrower.
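A minimal counting sketch of these four outcomes, using illustrative labels and unit costs (the cost figures are assumptions, not calibrated values), shows how raw misclassification counts can be combined with the financial costs discussed below:

```python
import numpy as np

# Illustrative labels: 1 = defaulter, 0 = financially sound borrower.
actual    = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])
predicted = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

hits               = np.sum((actual == 1) & (predicted == 1))
misses             = np.sum((actual == 1) & (predicted == 0))
false_alarms       = np.sum((actual == 0) & (predicted == 1))
correct_rejections = np.sum((actual == 0) & (predicted == 0))
print(hits, misses, false_alarms, correct_rejections)

# Each outcome can carry a different financial cost: a miss (lending to a
# future defaulter) is typically far costlier than a false alarm (a missed
# lending opportunity). The unit costs below are illustrative assumptions.
cost_per_miss, cost_per_false_alarm = 100.0, 5.0
expected_cost = misses * cost_per_miss + false_alarms * cost_per_false_alarm
print("Cost-weighted misclassification:", expected_cost)
```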
The relative magnitudes of the outcomes above depend on the ability to differentiate borrowers based on adequate credit risk attributes. Notice, however, that risk classification approaches evaluated only in terms of misclassification probabilities ignore other financial costs, which could lead to significant credit losses or missed investment opportunities. Ultimately, people’s knowledge and experience in identifying key risk attributes and determining the overall business impact, combined with adequate AI and ML tools for pattern recognition, are the most important ingredients of successful risk quantification.