Consumers of free-to-read media represent a broad cross-section of society, and they are not equally informed. If we are to rely on readers to evaluate opinion articles, we need mechanisms that unearth the wisest members of the crowd and extract signal from the noise.
Opinions fall into two categories: those that will be verified soon and those that won't be conclusively proven for a long time, if ever. Attaining signal before a near-term definitive event is straightforward via markets or forecasting competitions. However, most opinion we read in the media falls into the latter category. Standard market and forecasting-competition designs don't work well here, because in the absence of a conclusive event the only way to measure individual performance is against the aggregated inputs: the truth becomes whatever the crowd says it is. Under these conditions contrarians are disincentivised to participate, as they know they won't be 'proved' right within a reasonable time frame, if ever. This in turn ensures an unattractive price (or probability, in the forecasting world) for conformists. The end result is low participation and a poor signal of collective intelligence. We therefore need a different scoring or payout rule for participants: one whose incentives for questions that won't have a verifiable answer any time soon mirror those for questions that do. Many might reasonably doubt that we could ever produce a reliable signal on unverifiable opinions, but there are powerful reasons to think we can if we get the incentives right. In this post I'll outline why.
Prediction markets and reputation-based forecasting tournaments are proving remarkably effective at harnessing collective intelligence and guiding better decisions across a wide range of fields – from academic research and corporate strategy to policymaking and intelligence analysis. Prediction markets established at firms like Google, Ford and Firm X that tap into the wisdom of all employees have improved corporate decisions relative to traditional hierarchical methods, while betting platforms like Polymarket allow participants to wager on political and economic outcomes, recently outperforming experts and models in predicting the outcome of the US Presidential election. The Netflix Prize also showcased the power of crowdsourced insight, with thousands of participants iteratively testing and refining models to dramatically improve recommendation algorithms. Possibly the finest example of the power of properly incentivised crowd intelligence is IARPA's Aggregative Contingent Estimation (ACE) Program outperforming the CIA at geopolitical forecasts by 20-30%, despite not having access to the same classified information the CIA does – an incredibly impressive feat. These experiments all show that pooling knowledge in structured, accountable ways can lead to sharper insights and more reliable decision-making across diverse domains.
Both market and reputation-based approaches to prediction appear to achieve comparable efficacy, which is unsurprising as both filter the crowd, albeit in different ways. Markets tend to ensure that only those with better information, and consequently higher conviction, speak up, because it is costly to do so. Forecasting tournaments separate the wheat from the chaff by allowing people to express levels of confidence and by giving higher weight to those with a better track record. What is often overlooked, however, is that there is another filtering mechanism at the very beginning of the process too. People generally need to sign up to participate in forecasting tournaments. This alone creates a selection bias whereby those with an interest in – and likely an aptitude for – making predictions self-select to participate. This group isn't representative of the wider population even before it is further stratified by relative performance across many forecasts. These features of markets and forecasting competitions are important because they elevate the opinions of the most insightful members of the crowd. Studies have shown crowd wisdom outcomes aren't particularly 'wise' when mechanisms like these are absent. For example, an unaccountable survey of readers of opinion articles wouldn't produce a good view of an opinion's accuracy, as the views of the most insightful would be lost in the noise.
Standard market and reputation-based forecasting designs can't unearth the wisest members of the crowd in every situation, however. Many predictions can't easily be verified over the near or medium term because the information to do so won't be available for some time. Other questions are even fuzzier. Many might never have an answer without outlandish technological breakthroughs like time travel, reading people's thoughts, or observing outcomes in parallel universes. That's because the verifiable answers to these questions are lost to history (the people who knew the truth are dead), exist only in people's minds (e.g. someone's true intentions), or are counterfactuals where we can never know how the alternative would have turned out. It's hard to even define a contract for these types of markets or competitions, let alone entice bettors or forecasters who believe they have insight to take part. Those in a minority are aware their view won't be validated for some time, if ever. Consequently they have no incentive to bet on their genuine view. In the absence of validating information, it is a racing certainty the market would develop into a popularity contest (a Keynesian beauty contest) rather than a truth-seeking one: predicting the aggregate crowd view becomes the only way to rationally profit. The paradox is that since contrarians are disincentivised, the majority position becomes extremely crowded, rendering it unattractive even to those who hold that view to enter the market. What we're left with is an illiquid market or low participation in a forecasting competition, and a poor picture of the true aggregate crowd position.
This is a highly unsatisfactory situation, as most of the questions that interest us, and thus attract journalists to write about them, have indefinitely unverifiable answers. It's particularly unsatisfactory because we know markets and reputation-based competitions are excellent at helping us predict the future. It would be a shame if we couldn't find a way to harness them effectively for these types of questions too. The key to doing so is recreating the incentives that exist for verifiable questions. More specifically, we need a mechanism that encourages people who believe they have insight to participate with a truth-finding profit/reputation-maximisation strategy rather than a popularity-finding one, even in the absence of a verifiable 'settling' event.
So, do we have such a mechanism? We do. A very clever one that was originally invented for an entirely different reason. Twenty years ago, in 2004, Drazen Prelec designed a mechanism to elicit honest answers in the absence of a definitive ground truth. The specific problem Prelec was trying to solve was people giving dishonest answers in surveys. The social sciences depend heavily on surveys, and the quality of those surveys depends on honest responses. In many instances this is hard to obtain. In psychology, for example, surveys on substance abuse, a socially stigmatised behaviour, often suffer from underreporting or misreporting. In economics, 'willingness to pay' surveys suffer from the opposite problem, overreporting: respondents often claim a willingness to pay higher taxes to fund a public project than they are really prepared to pay when push comes to shove. In political science there is the well-known 'shy voter' phenomenon, which came to the fore recently in the case of the Polymarket whale who bet the house on Trump winning the election after polling people using "The Neighbour Method" – a technique that draws on Prelec's insights – to account for the shy voter problem.
In all these situations there is no way to verify whether respondents are really telling the truth. Nobody can know what their true opinion is. Yet in most situations they do have a true opinion, even if a weak one. So the question Prelec asked himself was: how do you incentivise people to tell the truth about matters that are subjective or otherwise not objectively verifiable? His ingenious solution was to introduce a scoring system tied to others' predictions and perceptions. This scoring rule, which he named the Bayesian Truth Serum (BTS), reduces the payoff for simply conforming or giving "safe" answers, encouraging people to report their genuine beliefs.
It works by scoring people on two separate things:
- Their ability to predict how the crowd will answer (the distribution of others' answers)
- How unexpectedly common their own answer is
An unexpectedly common answer is one that turns out to be more prevalent among the actual answers than the crowd's aggregated predictions (from the first question) suggested. Prelec calls it the Surprisingly Popular Answer (SPA).
The genius of the method is that it frees people to answer the second question – their own answer, the aim of the game – honestly. Any reward for anticipating the popular or socially desirable response is captured through the first question, the crowd prediction, and respondents are then further rewarded for giving what they believe to be the true answer, because each person rationally expects their own answer to be more common than the crowd predicts, precisely because they themselves hold it. For example, if I believe Ankara is the capital of Turkey but think many other people believe it is Istanbul, then by answering Ankara while predicting Istanbul will be the more common answer, I expect my true belief to be surprisingly popular. If there are even just a few other people out there who share your view, you'll score better still. The rational strategy is therefore to predict the crowd's view as accurately as you can and state your own honest position. This strategy is in fact a Nash equilibrium. It's an extremely clever mechanism whose significance, once grasped, opens one's eyes to all manner of applications beyond surveys of personal sentiment or intentions.
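To make the scoring concrete, here is a minimal sketch of a BTS-style scoring rule for a multiple-choice question, following the structure of Prelec's formula: an information score for how surprisingly common your own answer is, plus a prediction score for how well you predicted the crowd. The function name and the small numerical safeguards are my own, and for simplicity the sketch doesn't exclude each respondent from the population averages.

```python
import numpy as np

def bts_scores(answers, predictions, alpha=1.0, eps=1e-6):
    """Minimal Bayesian Truth Serum scoring sketch.

    answers:     (n,) integer array, each respondent's own answer in [0, m)
    predictions: (n, m) array, each respondent's predicted share of the crowd
                 endorsing each answer (rows sum to 1)
    alpha:       weight on the prediction score
    """
    n = len(answers)
    m = predictions.shape[1]

    # Actual fraction of respondents endorsing each answer
    actual = np.bincount(answers, minlength=m) / n

    # Geometric mean of the crowd's predicted frequency for each answer
    log_pred_bar = np.log(np.clip(predictions, eps, 1)).mean(axis=0)
    geo_pred = np.exp(log_pred_bar)

    # Information score: how surprisingly common your own answer is
    info = np.log(np.clip(actual[answers], eps, 1)) - np.log(geo_pred[answers])

    # Prediction score: how closely your predicted distribution matches the
    # actual answer frequencies (a negative KL-divergence style penalty)
    pred = (actual * (np.log(np.clip(predictions, eps, 1))
                      - np.log(np.clip(actual[None, :], eps, 1)))).sum(axis=1)

    return info + alpha * pred
```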
Indeed, it took Prelec over ten years to realise the crowd-wisdom predictive power of his own invention. Some 13 years after his seminal BTS paper, he released another paper dedicated to the predictive ability of the aggregate outcome of this survey design: specifically, the surprisingly popular answer (SPA). He had realised that the answer which is more common than the crowd predicted, when tested against information that is verifiable or becomes verifiable in the near future, turns out to be the right answer a lot of the time. In fact, the SPA turns out to be true more often than many prevalent forecasting methods, and it performs particularly well on binary questions. In hindsight this probably shouldn't be too surprising: as any mechanism designer will tell you, if you create the right incentives, you'll generally get the desired outcome. By incentivising people to be both accurate and honest, BTS produces a 'surprisingly popular answer' that is predictive of reality in many cases.
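The selection rule itself is simple to state: pick the answer whose actual share of votes most exceeds the crowd's average predicted share. Here is a minimal sketch (the helper name is mine), reusing the array conventions from the snippet above.

```python
def surprisingly_popular(answers, predictions):
    """Return the index of the surprisingly popular answer.

    answers:     (n,) integer array of respondents' own answers
    predictions: (n, m) array of predicted answer shares (rows sum to 1)
    """
    m = predictions.shape[1]
    actual = np.bincount(answers, minlength=m) / len(answers)  # actual vote shares
    predicted = predictions.mean(axis=0)                       # mean predicted shares
    return int(np.argmax(actual - predicted))                  # most "surprising" option
```

In the Ankara/Istanbul example above, Istanbul might win the raw vote, but because almost everyone, including the Ankara voters, predicts Istanbul will dominate, Ankara's actual share exceeds its predicted share and it emerges as the SPA.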

The breakthrough this represents for decision-making is huge. We now have a mechanism whose incentives remain the same regardless of whether the truth is definitively verifiable or not; the Nash equilibrium is the same Nash equilibrium for any type of question. This is game-changing, because it can be used to evaluate the many, many opinions that fall into the category where the truth isn't verifiable. A BTS survey can form the bedrock of any market or competition structure established to evaluate an opinion of any type, as long as – and this is absolutely key – individual respondents are scored using BTS rather than against any aggregate output. The aggregate must be used solely to score the opinion itself. Otherwise estimating the aggregate becomes the respondents' goal rather than accuracy and honesty, and we'd be back to square one.
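To make that separation concrete, here is a hypothetical sketch reusing the two functions above, with made-up survey data: individual payouts come only from BTS scores, while the aggregate is used only to deliver a verdict on the opinion itself.

```python
import numpy as np

# Hypothetical survey on a single opinion; answers: 0 = "disagree", 1 = "agree"
answers = np.array([1, 0, 1, 1, 0, 1])     # each respondent's own answer
predictions = np.array([                   # each respondent's predicted crowd split
    [0.30, 0.70],
    [0.60, 0.40],
    [0.25, 0.75],
    [0.40, 0.60],
    [0.55, 0.45],
    [0.35, 0.65],
])

payouts = bts_scores(answers, predictions)            # respondents scored on BTS alone
verdict = surprisingly_popular(answers, predictions)  # the aggregate scores the opinion
```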
Upon this BTS foundation we can also apply other insights from forecasting science to improve the accuracy of the aggregate result, such as allowing people to make probabilistic judgments, upweighting those with strong track records (a signal of general forecasting ability) and upweighting those whose own answer deviates strongly from what they predict the crowd thinks (a signal of domain-specific knowledge). The latter two insights enable us to isolate the knowledgeable within the crowd, an essential requirement when the crowd represents the masses. Those with large deviations between their own answers and their crowd predictions are also the members of the crowd that create an SPA, which most likely explains its predictive ability. Finally, we can incorporate an economic element where it costs money to pass judgment and there are rewards for being a top performer in BTS competitions. This is likely a crucial addition, as mechanisms that isolate the insightful require there to be insightful members of the crowd in the first place; the possibility of profit incentivises the participation of those who can add value. The combination of these proven crowd wisdom techniques can produce a system that enables us to confidently evaluate all sorts of opinions we haven't been able to systematically evaluate until now, because we haven't had the tools to do so. Hopefully this will represent a huge step forward for the quality of our media and our collective intelligence.
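As a rough illustration of how such upweighting might work, here is a minimal sketch assuming probabilistic judgments on a binary claim. The specific weighting scheme (multiplying a track-record factor by an answer-vs-prediction deviation factor) is purely illustrative and not taken from Prelec's papers.

```python
import numpy as np

def weighted_consensus(own_probs, predicted_crowd_probs, track_records):
    """Aggregate probabilistic judgments on a binary claim.

    own_probs:             each respondent's own probability the claim is true
    predicted_crowd_probs: each respondent's prediction of the crowd's average probability
    track_records:         each respondent's normalised past BTS performance (0..1)

    Upweights (a) strong track records and (b) large gaps between a respondent's
    own view and the crowd view they predict -- the two signals discussed above.
    The weighting scheme itself is a placeholder, not a published formula.
    """
    own = np.asarray(own_probs, dtype=float)
    crowd = np.asarray(predicted_crowd_probs, dtype=float)
    record = np.asarray(track_records, dtype=float)

    deviation = np.abs(own - crowd)                # proxy for private, domain-specific knowledge
    weights = (1.0 + record) * (1.0 + deviation)   # simple multiplicative scheme
    weights /= weights.sum()

    return float(np.dot(weights, own))             # weighted probability the claim is true
```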