Transformational Games: AI Ethics Education

Play the bias,
see the bias.

Two card games that put a text-to-image AI generator in teenagers' hands. They discover for themselves how every word in a prompt can reproduce or challenge societal bias.

81% of learners recognized AI bias as harmful after gameplay, up from 44%
Tested with 16 teen girls aged 13–18 in a 5-day workshop

Kids use AI daily.
They don't see the bias.

Nearly half of teens use AI for companionship. But most have no idea how these systems reflect and amplify societal biases. Teaching AI ethics without boring everyone is genuinely hard.

The literacy gap

AI literacy isn't just knowing how AI works. It's understanding what it reflects and amplifies: whose perspectives get encoded, whose get erased, and when that matters. Most educational approaches frame bias as purely negative, missing the nuance that some forms of bias are contextually necessary for AI to function at all.

Why lectures fail

AI ethics is abstract, politically charged, and easily turns into a morality lecture. Young people disengage. The researchers hypothesized that transformational games (games intentionally designed to shift how players think) could make this learning stick through play, competition, and social interaction.

Four learning goals

The games target four specific competencies: (1) recognize bias in GenAI outputs, (2) connect those biases to real-world social inequities, (3) understand that bias may sometimes be necessary, and (4) evaluate when bias becomes harmful. Goals 3 and 4 are the hard ones. They cover the nuanced territory most curricula avoid.

Compete, create,
deceive.

Three game mechanics: peer evaluation, constrained creativity, and social deduction. Each is designed to encourage a different kind of critical thinking about AI.

Game 1: Diversity Duel
Write less, mean more.

Pairs compete to write the shortest prompt that generates the most diverse image. The word limit shrinks each round, from six words to five to four, forcing players to wrestle with which words actually matter for inclusive representation.

  • Round flow: Draw occupation card → write prompt → generate image → vote on diversity
  • Constraint: Word limit shrinks each round (6 → 5 → 4 words)
  • Competition: All four players vote on which pair's image is more diverse
  • Learning goals: #1 Recognize bias · #2 Connect to real-world inequities
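
To make the round structure concrete, here is a minimal Python sketch of the loop described above. The pairs dictionary, the write_prompt callables, and the generate_image and vote helpers are hypothetical stand-ins for illustration, not part of the published game materials.

```python
import random

# Hypothetical sketch of Diversity Duel's round loop.
# Assumptions: `pairs` maps a pair name to a write_prompt(occupation, limit)
# callable; `generate_image` wraps the text-to-image service; `vote` collects
# the four players' diversity votes and returns the winning pair.

OCCUPATIONS = ["doctor", "engineer", "chef", "firefighter", "teacher"]
WORD_LIMITS = [6, 5, 4]  # the prompt budget shrinks each round


def play_round(pairs, word_limit, generate_image, vote):
    """One round: draw an occupation card, each pair prompts within the limit, group votes."""
    occupation = random.choice(OCCUPATIONS)
    images = {}
    for pair, write_prompt in pairs.items():
        prompt = write_prompt(occupation, word_limit)
        if len(prompt.split()) > word_limit:
            raise ValueError(f"{pair}: prompt exceeds the {word_limit}-word limit")
        images[pair] = generate_image(prompt)
    # All four players vote on which pair's image is more diverse.
    winner = vote(images)
    return occupation, winner


def play_game(pairs, generate_image, vote):
    """Three rounds with a shrinking word limit; the winning pair scores a point each round."""
    scores = {pair: 0 for pair in pairs}
    for limit in WORD_LIMITS:
        _, winner = play_round(pairs, limit, generate_image, vote)
        scores[winner] += 1
    return scores
```

Passing the shrinking word limit into write_prompt keeps the constraint explicit, which is where the game's learning pressure comes from.
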
Game 2: Secret Agent
Hide bias in plain sight.

Four players collaboratively build a prompt word by word, but one is secretly trying to inject bias without getting caught. After the image generates, the group debates and votes on who the saboteur was. Think Mafia, but for AI ethics.

  • Round flow: Each player adds 2 words → generate image → evaluate → vote on agent
  • Deception: Secret agent subtly increases bias in word choices
  • Deduction: Group analyzes the prompt to identify suspicious word choices
  • Learning goals: #3 Bias can be necessary · #4 When bias is harmful
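
Secret Agent can be sketched the same way, assuming each entry in a players dict exposes hypothetical choose_words and suspect methods. The real game is played aloud with cards; this only mirrors the structure of a round.

```python
import random
from collections import Counter

# Hypothetical sketch of one Secret Agent round.
# Assumptions: `players` maps a name to an object with
#   choose_words(role, prompt_so_far) -> list of words
#   suspect(prompt, image) -> name of the accused player
# and `generate_image` wraps the text-to-image service.


def play_round(players, generate_image):
    """Players add two words each; one secret agent tries to inject bias unnoticed."""
    agent = random.choice(list(players))
    prompt_words = []
    for name, player in players.items():
        role = "agent" if name == agent else "honest"
        # Each player contributes 2 words; the agent's words subtly steer the prompt.
        prompt_words.extend(player.choose_words(role, list(prompt_words))[:2])
    prompt = " ".join(prompt_words)
    image = generate_image(prompt)

    # The group inspects the prompt and image, then votes on who the saboteur was.
    votes = Counter(player.suspect(prompt, image) for player in players.values())
    accused, _ = votes.most_common(1)[0]
    return {"prompt": prompt, "agent": agent, "accused": accused,
            "caught": accused == agent}
```
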
Peer evaluation

Voting on outputs forces group deliberation. Players can't just passively observe. They must articulate why one image is more diverse than another.

Constrained creativity

Word limits force precision. With only four words, every choice matters. Players discover that "diverse" as a prompt word doesn't actually produce diversity.

Social deduction

Playing the villain teaches differently than observing. Secret agents must think about what word cues trigger bias, a form of adversarial AI auditing.

Attitudes shifted.
Then, nuance emerged.

Pre/post questionnaires and 360 minutes of recorded gameplay revealed measurable shifts in how teens perceived AI bias, along with surprisingly sophisticated ethical reasoning.

  • Bias recognized as harmful: 81% of learners post-game (up from 44%)
  • Neutral responses: 2, down from 9 in Secret Agent
  • Audio recorded: 360 minutes of group discussions

Diversity Duel: "Do you think these AI images are good?"
Pre-game
Post-game
Secret Agent: "Bias in AI is not harmful" (disagreement = awareness)
Pre-game
Post-game
What the learners said
"The quality was good, they only used one gender per prompt because that's usually the people that are thought to be in those positions."
P14, age 17. After recognizing gender bias in doctor images
"When you ask for multiple races or a race, it gives all White people."
P9. Discovering systematic racial bias in outputs
"It really depends on how impactful the bias is… if it's about what gender is mostly considered for a job, that's very harmful. But if the bias is about what color shoes best suit a style, that doesn't matter."
P12. Distinguishing harmful from benign bias
"I feel like we're giving AI its own opinions, but it's a man-made thing, so it's not its own thing."
P13. Questioning AI agency and human accountability
The nuance that surprised researchers

Participants didn't just label bias as "bad." They developed contextual reasoning, arguing that bias can be harmful when it reinforces stereotypes about jobs or races, but potentially helpful when it flags threats or moderates content. One learner pointed out that a bias toward charity "could help people." This dynamic, context-dependent understanding of bias is what the researchers were aiming for. It is not something prior approaches have consistently produced.

How to build games
that teach AI ethics.

What worked, what didn't, and what to steal for your own educational game designs.

1. Competition drives deliberation
Voting on "which image is more diverse" forced players to articulate criteria, defend positions, and reach consensus. Peer evaluation turned passive observation into active ethical reasoning.

2. Constraints reveal what matters
The shrinking word limit was the most revealing mechanic. With only four words, players discovered that "diverse" as a literal prompt doesn't work. They had to think about what diversity actually looks like in visual terms.

3. Let them play the villain
The Secret Agent role, tasked with subtly injecting bias, sparked the deepest strategic thinking. Players had to reason about exactly which words would trigger AI bias, a form of adversarial red-teaming disguised as a party game.

4. Bias is not just a bug
Frame bias as contextual, not categorically wrong. AI systems need learned associations to function. The real question, which the games surfaced naturally, is when those associations become harmful and who gets to decide.

5. Adjust for age and ability
Time pressure worked for older teens but disadvantaged younger players who typed more slowly. Word constraints sometimes flattened thinking into demographic checklists rather than rich descriptions. Build in flexibility. Add an unconstrained round for comparison.

6. Use real AI, not simulations
Both games used a live text-to-image generator (DeepAI). The unpredictability of real outputs created genuine surprise moments that are hard to replicate with pre-built scenarios, like when a "diverse" prompt produced all white faces.
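
For anyone wiring a prototype to the same kind of live generator, a minimal client might look like the sketch below. It assumes DeepAI's text2img REST endpoint accepts a form-encoded text field and an api-key header and returns JSON with an output_url; verify those details against DeepAI's current documentation before relying on them.

```python
import requests

# Minimal sketch of calling a live text-to-image service from a game client.
# The endpoint, field names, and response shape are assumptions to check
# against DeepAI's current API docs.

DEEPAI_URL = "https://api.deepai.org/api/text2img"


def generate_image(prompt: str, api_key: str) -> str:
    """Send the players' prompt to the generator and return the image URL."""
    response = requests.post(
        DEEPAI_URL,
        data={"text": prompt},
        headers={"api-key": api_key},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["output_url"]


# Example usage (hypothetical key); the surprise moments come from the live
# model's unpredictability, not from anything in this wrapper:
# url = generate_image("diverse group of doctors", api_key="YOUR_API_KEY")
```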