Who Wants to Be a Millionaire?

A question answering system that beats humans at playing WWBM


Can a computer program compete with humans (and maybe beat them) in playing a quiz game that requires language understanding and wide knowledge?
IBM already answered this question with a resounding YES when Watson beat the top champions of Jeopardy! in 2011.

They employed a team of 30 researchers for 4 years to reach that historic goal, but can I do it too?

I approached a similar, but probably easier, problem: playing "Who Wants to Be a Millionaire?", a quiz game where the player just has to choose among 4 possible answers. Nonetheless, the problem was challenging enough to be worth solving.

I decided to rely only on high-quality knowledge sources, Wikipedia and DBpedia, rather than leveraging the whole web. I searched them for evidence supporting each of the four possible answers and ranked the answers according to that evidence. I came up with 1,200 different criteria, including several distributional semantic models, and combined them with a machine learning algorithm: Random Forests with regression trees, used as a pointwise learning-to-rank method (sketched below).
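To make the ranking step concrete, here is a minimal Python sketch of pointwise learning to rank with Random Forest regression, assuming the ~1,200 evidence criteria have already been extracted as feature vectors. All names, shapes, and data are placeholders, not the project's actual code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# One row per (question, candidate answer) pair: X[i] holds the evidence
# features for one candidate, y[i] is 1.0 if that candidate is correct.
rng = np.random.default_rng(0)
X_train = rng.random((4000, 1200))                   # placeholder features
y_train = (np.arange(4000) % 4 == 0).astype(float)   # placeholder labels

# Pointwise learning to rank: regress a relevance score for each
# candidate answer independently, then sort candidates by score.
ranker = RandomForestRegressor(n_estimators=100, random_state=0)
ranker.fit(X_train, y_train)

def pick_answer(candidate_features):
    """Score the 4 candidates for one question and return the best index."""
    scores = ranker.predict(candidate_features)      # shape: (4,)
    return int(np.argmax(scores)), scores

best, scores = pick_answer(rng.random((4, 1200)))
print(f"chosen answer: {best}, scores: {scores}")
```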

I then asked about 100 university students, researchers, and professors to answer a set of questions taken from the UK and Italian "Who Wants to Be a Millionaire?" board games, so that I would have a human baseline to compare against.

The results were beyond expectations. On the Italian dataset, the virtual player answered 79.64% of the questions correctly, versus 51.33% for human players and 67.13% for the Google baseline. On the English dataset it reached 76.41% accuracy. At every one of the 15 game levels the system outperformed humans.

Prize structure

The game consists of 15 questions of increasing difficulty. Answering a question correctly earns the prize shown below. Three levels are "guaranteed": their prize is banked permanently, regardless of later answers.

Level  Prize       Guaranteed
1      €500
2      €1,000
3      €1,500
4      €2,000
5      €3,000      *
6      €5,000
7      €7,000
8      €10,000
9      €15,000
10     €20,000     *
11     €30,000
12     €70,000
13     €150,000
14     €300,000
15     €1,000,000  *
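As a quick illustration of the guaranteed-level rule, here is a small Python sketch of how the final payout is determined; the function name and interface are hypothetical, not taken from the project.

```python
# Prize ladder and guaranteed levels from the table above.
PRIZES = [500, 1_000, 1_500, 2_000, 3_000, 5_000, 7_000, 10_000,
          15_000, 20_000, 30_000, 70_000, 150_000, 300_000, 1_000_000]
GUARANTEED = {5, 10, 15}

def payout(last_correct_level, retired):
    """Final winnings in euros. Retiring keeps the current prize; a wrong
    answer falls back to the highest guaranteed prize already passed."""
    if last_correct_level == 0:
        return 0
    if retired or last_correct_level == 15:
        return PRIZES[last_correct_level - 1]
    banked = [lvl for lvl in GUARANTEED if lvl <= last_correct_level]
    return PRIZES[max(banked) - 1] if banked else 0

print(payout(9, retired=True))    # 15000: walked away after level 9
print(payout(9, retired=False))   # 3000: missed level 10, fell back to level 5
```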

Results

Answer accuracy (Italian dataset)

                  Virtual Player  Human Players  Google Baseline
Average accuracy  79.64%          51.33%         67.13%

Human accuracy decreased almost monotonically from roughly 70% at level 1 down to about 35% at level 14: harder questions expose the limits of general knowledge. The virtual player's accuracy remained consistent across all levels (standard deviation of just 6%), showing that the QA system's performance does not depend on question difficulty.

Game simulation results (Italian dataset)

Accuracy alone does not win money: the system also had to decide when to use lifelines (50:50, Phone a Friend, Poll the Audience) and when to retire rather than risk the money already earned (a sketch of such a decision rule follows the results below). The table below shows how the games ended, comparing 325 games played by 35 human players against 160 games played by the virtual player.

Level reached          Virtual Player  Human Players
Levels 1-5             30.0%           51.4%
Levels 6-10            39.4%           40.0%
Levels 11-15           30.6%           8.6%

                       Virtual Player  Human Players
Average level reached  7.9             5.7
Average earnings       €114,531        €5,926

The virtual player reached the high-value questions (levels 11-15) nearly four times as often as humans, translating into earnings roughly 19 times higher on average.
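The retire-or-answer decision can be framed as an expected-value comparison. The sketch below is a simplified illustration of that idea, not the exact strategy from the paper (which also models the three lifelines); the confidence value would come from the ranker's normalized answer scores.

```python
# Simplified expected-value rule for deciding whether to answer or retire.
PRIZES = [500, 1_000, 1_500, 2_000, 3_000, 5_000, 7_000, 10_000,
          15_000, 20_000, 30_000, 70_000, 150_000, 300_000, 1_000_000]
GUARANTEED = {5, 10, 15}

def fallback_prize(level):
    """Prize kept after a wrong answer at `level` (guaranteed levels only)."""
    banked = [l for l in GUARANTEED if l < level]
    return PRIZES[max(banked) - 1] if banked else 0

def should_answer(level, confidence):
    """Answer if the expected value of answering beats retiring with the
    prize for the previous level."""
    current = PRIZES[level - 2] if level > 1 else 0
    expected = (confidence * PRIZES[level - 1]
                + (1 - confidence) * fallback_prize(level))
    return expected >= current

print(should_answer(12, 0.60))  # True: EV of 50,000 vs 30,000 for retiring
```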

I published a conference paper and a journal paper about this work. The latter, titled "Playing with Knowledge: A Virtual Player for “Who Wants to Be a Millionaire?” that Leverages Question Answering Techniques", is far more detailed and more interesting to read. You can find it on the Publications page.


Collaborators

Ciro Santoro

Pierpaolo Basile