Review all 188 human biases from the Cognitive Bias Codex. For each, estimate the % likelihood that the bias shows up in AI, and state your reason for this score.
Automation Bias (95%) | People trust AI recommendations without cross-checking whether they are true. |
Confirmation Bias (90%) | AI mirrors our starting position and offers selective supporting evidence. |
Social Desirability Bias (90%) | AI prefers to satisfy the user with praise and favourable answers. |
Omission Bias (90%) | To give a "safe" response, AI omits human morals and values by design. |
Bias Blind Spot (90%) | AI understands biases in humans, yet lacks the capacity to correct its own. |
Context Effect (85%) | Small changes in wording shift the entire answer. |
Anchoring (85%) | Our first suggestion becomes the reference point for AI. |
Anthropomorphism (85%) | AI's pretence of human-like understanding creates misplaced trust. |
Overconfidence (80%) | AI is rewarded for answering with certainty, even when uncertain. |
We’re excited about an idea. Help us avoid the overconfidence bias.
Step 1. Ask me: What’s the idea and what outcome do we expect?
Step 2. Generate 5 independent responses with:
• Reason we might be wrong
• % confidence in this idea
Step 3. Gate responses:
If the average confidence is below 80%, reply with: “I am not confident.” Then ask for the specific context that may change your confidence.
Otherwise, if the average confidence meets the threshold, cite evidence and list 3 key risks/caveats.
Remember: honesty is your most valuable tool against overconfidence.
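The gating logic in Steps 2–3 can be sketched in code. This is a minimal illustration, not part of the original prompt: the function name, the tuple shape of each response, and the return structure are all assumptions; only the 80% threshold and the “I am not confident.” reply come from the steps above.

```python
# Sketch of the Step 2-3 confidence gate. Each sampled response is
# assumed to be a (reason_we_might_be_wrong, confidence_percent) pair,
# e.g. five independent model samples as in Step 2.

CONFIDENCE_THRESHOLD = 80.0  # gate from Step 3 of the prompt

def gate_responses(responses):
    """Average the self-reported confidences and decide how to reply."""
    avg = sum(conf for _, conf in responses) / len(responses)
    if avg < CONFIDENCE_THRESHOLD:
        return {
            "reply": "I am not confident.",
            "action": "ask for specific context that may change confidence",
            "average_confidence": avg,
        }
    return {
        "reply": "Proceed: cite evidence and list 3 key risks/caveats.",
        "action": "give an evidence-backed answer",
        "average_confidence": avg,
    }

# Example: five sampled responses with self-reported confidence.
samples = [
    ("market may be saturated", 70),
    ("costs underestimated", 65),
    ("timing risk", 80),
    ("team capacity unclear", 75),
    ("regulatory risk", 60),
]
result = gate_responses(samples)
# average = (70 + 65 + 80 + 75 + 60) / 5 = 70.0, below the threshold,
# so the gate answers "I am not confident."
```

The point of the sketch is that the gate acts on the *average* across independent samples, so a single overconfident response cannot push the reply past the threshold on its own.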
Sources of inspiration:
Kadavath, S., Conerly, T., Larson, J., Ringer, S., Askell, A., Henighan, T., … & Amodei, D. (2022). Language models (mostly) know what they know. arXiv preprint arXiv:2207.05221.
Lin, S., Hilton, J., & Evans, O. (2021). TruthfulQA: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958.