Ask a room of financial services regulators whether the absence of historical cyber loss data would hinder their ability to adopt cyber risk quantification (CRQ), and almost all of them will say yes. In research I conducted with prudential regulators across the Caribbean region, 97.5% of respondents identified data availability as a barrier to CRQ adoption, and 55% said it would certainly hinder progress.
While it would be premature to assume this finding proves data scarcity is the real obstacle, it is worth taking seriously, because it shows that the widely held perception is an obstacle in its own right.
If regulators believe quantification requires data they don’t have, they won’t attempt it, regardless of whether the belief is accurate. And it isn’t.
The claim examined
To be fair, the position isn’t unreasonable on its face. Quantification sounds like it requires numbers, numbers imply data, and cyber loss data in smaller financial markets is genuinely limited. The inference can feel rigorous; it can even sound like genuine scientific caution rather than avoidance.
But the claim carries a hidden assumption: that the data needed is historical loss data, and that without it, meaningful analysis cannot begin. That assumption fails under scrutiny because it conflates two quite different things: (1) the need for quantitative inputs, and (2) the need for historical frequency data to generate them.
A misdiagnosis dressed up as rigour
I’ve written before about the distinction between data problems and uncertainty problems in cyber risk, and this is where that distinction does its most important work.
When an analyst or regulator says “we need more historical data,” they’re implicitly treating the constraint as an aleatory one: we haven’t observed the phenomenon enough times to characterise its distribution. That is sometimes true, but in cyber risk it often isn’t. Much of the uncertainty we face when assessing cyber exposure is epistemic: it exists because we haven’t measured, tested, or modelled what we already have access to, not because the underlying phenomenon is insufficiently observed.
Consider the question of whether a sophisticated attacker could bypass your detection controls. Is the uncertainty there primarily about historical breach frequency, or is it primarily about threat actor capability, detection coverage, and control effectiveness? In most cases it’s the latter, and none of those questions requires a loss database to begin answering. They require threat intelligence, honest control testing, and structured expert judgment. More breach data from other organisations doesn’t resolve this; better reasoning about what you already know can.
Treating an epistemic problem as if it requires aleatory data is a category error, and one that routinely justifies inaction. There will never be enough historical cyber incidents in your specific sector, against your specific infrastructure, with your specific control configuration, to satisfy the demand it creates. “We don’t have enough data” quietly becomes a permanent deferral.
What CRQ frameworks actually require
Here’s what makes the perception particularly frustrating: the quantification frameworks most associated with cyber risk were specifically designed for environments where historical loss data is sparse.
The FAIR (Factor Analysis of Information Risk) model doesn’t ask for a loss database. It asks for estimates of loss event frequency and loss magnitude, decomposed into their component parts: threat event frequency, vulnerability, contact frequency, probability of action, primary loss, and secondary risk. Each of those components can be estimated through structured expert judgment, threat intelligence, analogous data, and scenario reasoning. The starting point for quantification with FAIR is disciplined thinking, not data completeness.
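To make that concrete, here is a minimal sketch of a FAIR-style Monte Carlo estimate built entirely from expert-elicited ranges. Every figure and distribution choice below (the triangular ranges, the Poisson frequency) is an illustrative assumption, not a value from any real assessment; a production analysis would use calibrated elicitation and better-fitting distributions such as PERT or lognormal. The structure is the point: no loss database appears anywhere.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # Monte Carlo trials

# Illustrative expert-elicited ranges (min, most likely, max) for FAIR components.
# Triangular distributions stand in for calibrated PERT/lognormal fits.
tef = rng.triangular(1, 4, 12, N)           # threat event frequency (events/year)
vuln = rng.triangular(0.05, 0.15, 0.40, N)  # vulnerability: P(threat event becomes a loss event)
primary = rng.triangular(50e3, 250e3, 2e6, N)    # primary loss per event ($)
secondary = rng.triangular(0, 100e3, 1.5e6, N)   # secondary loss per event ($)

# Loss event frequency as a Poisson draw around the expected annual rate.
lef = rng.poisson(tef * vuln)

# Simplification: the same per-event loss is applied to every event in a trial year.
annual_loss = lef * (primary + secondary)

print(f"Median annual loss exposure:   ${np.median(annual_loss):,.0f}")
print(f"95th percentile loss exposure: ${np.percentile(annual_loss, 95):,.0f}")
```

The output is a loss exposure distribution that can be sharpened as better inputs arrive; nothing about producing it required waiting for a historical incident dataset.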
Douglas Hubbard’s work in applied information economics makes a related and equally important point: we consistently overestimate how much data we need to reduce uncertainty meaningfully. A small number of well-calibrated, properly structured estimates can produce more defensible outputs than a qualitative risk matrix backed by no quantitative reasoning at all. The standard for analysis shouldn’t be “do we have enough data to be certain?” It should be “does this output reduce uncertainty enough to improve the decision?” Those are very different bars, and most organisations can clear the second one without clearing the first.
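That decision-focused standard can be shown in a few lines as well. Suppose the only input available is a calibrated 90% interval for annual loss exposure; the figures and the lognormal assumption below are hypothetical, but they are enough to answer the question a decision-maker actually needs answered: how likely are losses to exceed a stated tolerance?

```python
import numpy as np
from scipy import stats

# Hypothetical calibrated 90% interval for annual cyber loss exposure ($),
# standing in for a structured expert elicitation.
lo, hi = 100_000, 5_000_000      # 5th and 95th percentile estimates
risk_appetite = 2_000_000        # hypothetical loss tolerance for the decision

# Fit a lognormal whose 5th/95th percentiles match the elicited interval.
z = stats.norm.ppf(0.95)                      # ~1.645
mu = (np.log(lo) + np.log(hi)) / 2
sigma = (np.log(hi) - np.log(lo)) / (2 * z)

# The decision-relevant output: probability of exceeding tolerance.
p_exceed = 1 - stats.lognorm.cdf(risk_appetite, s=sigma, scale=np.exp(mu))
print(f"P(annual loss exceeds ${risk_appetite:,.0f}): {p_exceed:.0%}")
```

Whether that probability is acceptable is a governance question, but the analysis needed to pose it sharply took two elicited numbers, not a regional loss database.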
The counterargument worth addressing
A reasonable objection at this point: even if structured estimation is a viable starting point, doesn’t better data still improve the output? Yes, it does.
Richer incident data, more mature threat intelligence, and deeper historical context all improve precision over time. The case for better regional cyber incident reporting and data aggregation is legitimate and worth pursuing. For exactly this reason, the Financial Stability Board has been pushing for greater convergence in incident reporting through initiatives such as the Format for Incident Reporting Exchange (FIRE).
But there’s a meaningful difference between “better data improves CRQ outputs over time” and “we cannot begin until we have it.” The first is a long-term improvement agenda, and it is true. The second is a reason not to start, and it is the perception that needs correcting.
What the finding actually tells us
Returning to the survey result: 97.5% of regulators flagging data availability as a barrier isn’t primarily evidence of a data infrastructure problem. It’s evidence of a methodology literacy gap. The regulators surveyed aren’t wrong that data matters, but they’re working from an incomplete picture of what the relevant frameworks actually demand as a starting point.
That gap is addressable. It doesn’t require a regional data aggregation programme before progress can be made. It requires investment in understanding what CRQ actually involves (the frameworks, the estimation approaches, the analytical logic) so that practitioners can assess their real starting position rather than the one they’ve assumed.
For a practical introduction to what those frameworks look like and how a regulator might apply them, a forthcoming primer in this series will cover that ground in accessible terms.
The real barrier to CRQ adoption in the region isn’t data. It’s the belief that data is the barrier.