Reading note: Super-forecasting
Superforecasting: The Art and Science of Prediction | Philip Tetlock and Dan Gardner (2015)
A recent cult classic that I’d never read
Super-forecasting has rapidly become a cult classic in the rationalist/economics/tech canon. I found myself noticing references to it everywhere, usually somewhere in the same sentence as ‘updating priors’.
In the past year, I’ve also become fascinated with prediction markets. These have created an economic framework for super-forecasters to both show off and get paid for their skills. More broadly, the fact that some people are much better at predicting the future than the rest of us seems non-obvious… and worthy of intense study.
Super-forecasting was both an enjoyable and useful read. My primary reflection is that what makes super-forecasters great at predicting the future is simple, yet rare.
Let me explain.
It started with the intelligence agencies, not finance bros
I’d always assumed that forecasting geopolitical events had stemmed from the finance sector. Placing bets on how markets would react to unfolding events or elections is almost the definition of a futures market. But the history is more interesting - it’s the intelligence community, particularly in the US, that has historically been heavily invested in better forecasting.
Tetlock partnered with IARPA in 2011 to run the Good Judgment Project, which aimed to identify cutting-edge methods for forecasting geopolitical events. The result: super-forecasters outperformed intelligence analysts who had access to classified data. This is a pretty stunning conclusion. True ‘alpha’ in the intelligence community comes not from having the best information, but from using that information most wisely.
Look for the background rate, then add your own flavour
There’s a simple reason that super-forecasters repeatedly beat the rest of us. Tetlock and Gardner show it’s not raw intellect or fancy degrees - it’s a process.
Super-forecasters address highly complex (yet short-term) questions by first looking for background rates of occurrence, even when the question is the topical issue of the day.
This is best done when you Fermi-ize a question into manageable parts:
Fermi knew people could do much better and the key was to break down the question with more questions like “What would have to be true for this to happen?”. Here, we can break the question down by asking, “What information would allow me to answer the question?”
There’s a useful analogy in the book. What are the chances that a specific family in the US (who happen to have an Italian last name, among other traits) owns a pet?
If we Fermi-ize this analysis, the ‘outside view’ is the probability that any family in the US owns a pet. Forget the personal characteristics for now… start with the base rate.
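To make that concrete, here’s a minimal sketch of the outside-view-first approach. The base rate and the adjustments are made-up numbers for illustration, not figures from the book:

```python
# Outside view first: start from the base rate, then nudge for the inside view.
# All numbers here are illustrative assumptions, not figures from the book.
base_rate = 0.6  # assumed P(any given US family owns a pet)

# Inside view: small adjustments for case-specific details.
adjustments = {
    "has_backyard": +0.05,
    "frequent_travellers": -0.10,
}

estimate = base_rate + sum(adjustments.values())
estimate = min(max(estimate, 0.0), 1.0)  # keep it a valid probability
print(f"Adjusted estimate: {estimate:.0%}")  # -> 55%
```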
Easier said than done, of course:
It’s natural to be drawn to the inside view. It’s usually concrete and filled with engaging detail we can use to craft a story about what’s going on. The outside view is typically abstract, bare, and doesn’t lend itself so readily to storytelling.
This approach has its limits, of course. Taking the long-run average of a data series spanning multiple decades doesn’t work well where there’s been a significant, ‘structural’ change to the event probability (e.g. extreme weather events).
Yet discarding historical data should only be done for very good reason, not out of recency bias. If you’re thinking that this is essentially the Bayesian method - you’d be correct. Super-forecasters happen to be exceptional at updating their priors with just the right level of precision.
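For anyone who wants the mechanics, here’s a sketch of a single Bayesian update in odds form, with illustrative numbers of my own choosing:

```python
# One Bayesian update in odds form: posterior odds = prior odds * likelihood ratio.
# The prior and the likelihood ratio are illustrative assumptions.
prior = 0.55             # starting probability, anchored on the base rate
likelihood_ratio = 2.0   # new evidence is twice as likely if the claim is true

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(f"{prior:.0%} -> {posterior:.0%}")  # 55% -> 71%
```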
The word ‘likely’ is almost meaningless
As an economist, I probably use the word ‘likely’ as a crutch more than most. It turns out I’m not alone. Yet in forecasting, it’s dangerous to use such a vague term to make recommendations.
Does ‘likely’ mean 51% or 99%, or anything in between? There’s a very big difference between taking a course of action with a 51% chance of success and one that’s almost certain.
Sherman Kent, the so-called ‘father of intelligence analysis’, tried to implement a new system for communicating probabilities at the CIA. While it was never adopted, it remains a great rule of thumb in day-to-day conversation.
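Kent’s scale (reproduced from memory here, so treat the exact bands as approximate) mapped estimative words to explicit probability ranges:

Certain: 100%
Almost certain: 93% (give or take about 6%)
Probable: 75% (give or take about 12%)
Chances about even: 50% (give or take about 10%)
Probably not: 30% (give or take about 10%)
Almost certainly not: 7% (give or take about 5%)
Impossible: 0%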
I may well adopt this myself.
Super-forecasting is skill, not luck
I was skeptical. If you have a group of people flip coins for long enough, one of them is going to flip 10 heads in a row. Is super-forecasting really just putting people from the right-hand tail of a distribution on a pedestal? Apparently not.
Tetlock and Gardner explain:
The correlation between how well individuals do from one year to the next is about 0.65… So we should still expect considerable regression to the mean. And we observe just that. Each year, roughly 30% of the individual super-forecasters fall from the ranks of the top 2% the next year. But that also implies a good deal of consistency over time: 70% of super-forecasters remain super-forecasters.
Is there some luck in being a super-forecaster (e.g. landing in the right tail of the distribution)? Of course… but a 0.65 year-to-year correlation and 70% persistence at the top isn’t just luck… there’s a whole chunk of skill in there too.
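As a sanity check, here’s a quick simulation of the luck-only world (all parameters are arbitrary choices on my part). If making the top 2% were pure chance, almost nobody would stay there the following year - nothing like the 70% persistence Tetlock reports:

```python
# A luck-only null: 10,000 "forecasters" answer coin-flip questions for two years.
# If top-2% status were pure chance, barely anyone would stay in the top 2%.
import random

random.seed(42)
N, QUESTIONS = 10_000, 100

def yearly_score() -> int:
    # Number of correct 50/50 guesses out of QUESTIONS - pure luck.
    return sum(random.random() < 0.5 for _ in range(QUESTIONS))

year1 = [yearly_score() for _ in range(N)]
year2 = [yearly_score() for _ in range(N)]

cutoff1 = sorted(year1, reverse=True)[N // 50 - 1]  # top-2% threshold, year 1
cutoff2 = sorted(year2, reverse=True)[N // 50 - 1]  # top-2% threshold, year 2

top1 = {i for i in range(N) if year1[i] >= cutoff1}
top2 = {i for i in range(N) if year2[i] >= cutoff2}
print(f"Stayed in the top 2%: {len(top1 & top2) / len(top1):.0%}")  # a few %, not 70%
```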
The best forecasters update and change their mind often
Changing your mind often, especially in light of new data, seems obvious in theory. Yet in practice, almost no one does it.
Tetlock and Gardner explain that ‘active open-mindedness’ goes beyond encountering new perspectives - it’s the ability to rewrite beliefs as needed:
For super-forecasters, beliefs are hypotheses to be tested, not treasures to be guarded. It would be facile to reduce super-forecasting to a bumper-sticker slogan, but if I had to, that would be it.
What’s more, the data from the forecasting tournaments backs this up:
Super-forecasters update much more frequently, on average, than regular forecasters. That obviously matters. An updated forecast is likely to be a better-informed forecast and therefore a more accurate forecast.
But hang on… how much of this is that super-forecasters are just news junkies watching every scrap of new information that becomes available? That’s been tested too:
…super-forecasters’ initial forecasts were at least 50% more accurate than those of regular forecasters. Even if the tournament had asked for only one forecast, and did not permit updating, super-forecasters would have won decisively.
Tetlock and Gardner are quick to point out that a forecaster can both under- and overreact to new information. Overreaction is a fundamental misunderstanding of priors (e.g. treating the latest data point as the whole story, rather than adding it to the time series).
Underreaction, however, seems to be more deep-rooted:
…case of what psychologists call “belief perseverance”. People can be astonishingly intransigent - and capable of rationalising like crazy to avoid acknowledging new information that upsets their settled beliefs… More commonly, when we are confronted by facts impossible to ignore, we budge, grudgingly, but the degree of change is likely to be less than it should be.
This makes sense. Splitting the difference or ‘meeting in the middle’ feels like progress (and it’s generally a move in the right direction) - but it doesn’t make you correct.
Is this underreaction a function of ego? The authors propose that super-forecasters:
…aren’t deeply committed to their judgements, which makes it easier to admit when a forecast is off track and adjust
Is this useful? I imagine it’s incredibly hard to disassociate yourself from the outcomes if you’re a member of the intelligence community.
So what’s the answer? Change your mind as frequently as the data changes:
A forecaster who doesn’t adjust her views in light of new information won’t capture the value of that information, while a forecaster who is so impressed by the new information that he bases his forecast entirely on it will lose the value of the old information that underpinned his prior forecast. But the forecaster who carefully balances old and new captures the value in both - and puts it into her new forecast. The best way to do that is by updating often but bit by bit.
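One way to picture ‘updating often but bit by bit’ is a damped adjustment toward whatever the new information alone would imply. The step size below is an arbitrary illustrative choice, not something from the book:

```python
# "Update often, but bit by bit": move a fraction of the way toward the new
# evidence instead of jumping all the way.
def update(old_forecast: float, evidence_implied: float, step: float = 0.3) -> float:
    """Shift the forecast part-way toward what the new information alone implies.
    step=0 ignores news entirely (underreaction); step=1 discards the prior
    entirely (overreaction)."""
    return old_forecast + step * (evidence_implied - old_forecast)

forecast = 0.40
for signal in (0.70, 0.65, 0.80):   # successive pieces of news, made-up numbers
    forecast = update(forecast, signal)
    print(f"{forecast:.2f}")        # 0.49, 0.54, 0.62 - drifting, not lurching
```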
Don’t bother trying to forecast more than a few years out
Tetlock and Gardner never mention chaos theory by name in the book - but they hint at the same conclusions.
There’s a whole section recounting the basics of Nassim Taleb’s The Black Swan and fat-tailed distributions. It’s important that forecasters recognise the size of the tails of the distribution they’re dealing with. Some tails are small (human height) and some are enormous (human wealth). The non-zero probabilities at the outer extremes are worthy of consideration.
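A quick sketch of just how different those tails are, using a normal distribution for the ‘small’ case and a Pareto distribution for the ‘enormous’ one (the parameters are illustrative choices of mine, not from the book):

```python
# How fast tails die off: a thin (normal) tail vs a fat (Pareto) tail.
import math

def normal_tail(k: float) -> float:
    """P(X > k standard deviations) for a standard normal distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def pareto_tail(k: float, alpha: float = 1.5) -> float:
    """P(X > k) for a Pareto distribution with minimum value 1."""
    return k ** -alpha

for k in (3, 5, 10):
    print(f"{k}x: normal {normal_tail(k):.1e}  vs  pareto {pareto_tail(k):.1e}")
# At 10x, the normal tail is ~1e-23 (height-like: never happens);
# the Pareto tail is ~3e-2 (wealth-like: happens all the time).
```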
Yet, even if we assume there are fat tails… that doesn’t help long-term (e.g. 10-year) forecasts for sensitive variables like geopolitics:
Taleb, Kahneman, and I agree there is no evidence that geopolitical or economic forecasters can predict anything ten years out beyond the excruciatingly obvious - “there will be conflicts” - and the odd lucky hits that are inevitable whenever lots of forecasters make lots of forecasts. These limits of predictability are the predictable results of the butterfly dynamics of nonlinear systems.
The sweet spot for super-forecasters seems to be predictions for events less than 18 months away.
Unresolved questions and open loops
First, there’s a sequel needed for this book: how are super-forecasters interacting with the financial markets? One assumption is that they should all be filthy rich… though I suspect it’s a mixed bag.
Why aren’t the folks who are brilliant at forecasting geopolitics either making a fortune themselves on Polymarket, or consulting for traditional quant houses? Maybe they already are… there’s certainly a book in that story.
Second, AI models are extremely good at weighting variables and updating priors. I don’t think this means AI-assisted forecasters will be able to forecast farther into the future… more that the fidelity of updating a prediction (e.g. is it 60 per cent or 62 per cent?) may improve.
I also imagine an AI model could learn your biases quickly and help course-correct your under- and overreactions, given enough data.
If I had to make a bet, I think we’re going to see more super-forecasters minted in the next five years than we have since Brier scores were first measured.
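For reference, the Brier score mentioned above is just the mean squared error between forecast probabilities and the 0/1 outcomes (this is the common two-option simplification; lower is better, 0 is perfect):

```python
# Brier score: mean squared error between forecast probabilities and outcomes.
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A confident, well-calibrated forecaster beats a hedger who always says 50%.
outcomes = [1, 0, 1, 1, 0]
print(brier_score([0.9, 0.2, 0.8, 0.7, 0.1], outcomes))  # 0.038
print(brier_score([0.5] * 5, outcomes))                  # 0.25
```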


