🧠 Turn AI Hallucinations Into Reliable Evidence
Can you turn AI hallucinations into reliable evidence? Spot the three types of hallucinations, and apply proven strategies to get more reliable outputs from LLMs in your research.

Kia ora, Namaskaram 🙏🏾
Do you trust your favourite AI chatbot?
If you're using ChatGPT, Claude, Copilot, or any other LLM for research, you're bound to encounter "hallucinations."
These are answers that may sound right but are often made up, taken out of context, or just confident-sounding waffle.
📊 Evidence on Hallucinations
To reduce hallucinations, it helps to know which type you're dealing with, so you can prompt effectively and get more reliable evidence.
Type I – Factual Inaccuracies (sounds right, but the facts are wrong)
👉🏾 Use RAG (Retrieval-Augmented Generation)
Cross-check outputs against trusted sources, like giving your AI assistant access to a curated, reliable research library.
📈 Effectiveness: Very High
📚 Why: The evidence highlights RAG as one of the most potent strategies. It grounds LLM outputs in external, trustworthy information (e.g., knowledge graphs, databases, APIs). This significantly reduces reliance on memorised or fabricated content.
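To make the idea concrete, here is a minimal sketch of RAG in plain Python. The "library", the keyword-overlap retriever, and the prompt wording are all illustrative stand-ins; a real pipeline would use vector search over your actual sources and send the grounded prompt to an LLM API.

```python
# Toy "research library" standing in for a trusted external source.
LIBRARY = [
    "RAG grounds LLM outputs in external, trustworthy sources.",
    "Chain-of-Thought prompting asks the model to reason step by step.",
    "Prompt engineering adds context so the model interprets meaning correctly.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(question: str) -> str:
    """Prepend retrieved evidence so the model answers from sources, not memory."""
    evidence = "\n".join(retrieve(question, LIBRARY))
    return (
        "Answer using ONLY the evidence below.\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}"
    )

print(build_grounded_prompt("What does RAG ground outputs in?"))
```

The key move is the last function: instead of letting the model answer from memory, the prompt pins it to retrieved evidence, which is what reduces fabricated facts.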
Type II – Semantic Distortions (misunderstands the context)
👉🏾 Use Prompt Engineering
Be clear and specific, as if you're briefing a junior colleague. The more context you give, the better the AI understands what you're asking and how you want it answered.
📈 Effectiveness: Medium to High
📚 Why: The evidence notes that well-structured, context-rich prompts (including techniques like in-context learning and instruction tuning) can improve the relevance and precision of outputs. It's especially helpful for improving how LLMs interpret meaning, but it depends heavily on user skill.
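The "briefing a junior colleague" idea can be sketched as code: the same topic, prompted two ways. The role, audience, format, and constraint lines are illustrative examples, not a fixed recipe.

```python
def vague_prompt(topic: str) -> str:
    """The kind of prompt that invites semantic distortions."""
    return f"Tell me about {topic}."

def structured_prompt(topic: str) -> str:
    """Context-rich brief: role, task, audience, format, and constraints."""
    return (
        "You are a research assistant for a behavioural science team.\n"
        f"Task: summarise the current evidence on {topic}.\n"
        "Audience: practitioners with no statistics background.\n"
        "Format: 3 bullet points, each naming the type of source used.\n"
        "Constraint: if the evidence is uncertain, say so explicitly."
    )

print(structured_prompt("nudging and habit formation"))
```

Each added line removes one way the model could misread your intent: the role sets the register, the audience sets the depth, and the constraint gives it permission to admit uncertainty instead of guessing.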
Type III – Fluency Discrepancies (overconfident output)
👉🏾 Use Chain-of-Thought (CoT) Prompting
Ask the AI to reason step by step and write each step down; this makes the logic visible and shows how it arrived at the final output.
📈 Effectiveness: High
📚 Why: Chain-of-Thought prompting encourages step-by-step reasoning, which the evidence shows improves factual consistency and logical coherence. It reduces the risk of "fluent nonsense" by making the AI reveal its internal logic.
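A CoT prompt is just a wrapper around your question. This sketch shows one possible wording; the exact phrasing is an assumption, and you should adapt it to your task.

```python
def cot_prompt(question: str) -> str:
    """Wrap a question so the model shows its reasoning before answering."""
    return (
        f"Question: {question}\n"
        "Think through this step by step, numbering each step.\n"
        "Then state your final answer on a line starting with 'Answer:'.\n"
        "If any step relies on a fact you are unsure of, flag it."
    )

print(cot_prompt("Does loss aversion apply to time as well as money?"))
```

Because the steps are numbered and the final answer is on a labelled line, you can check each link in the chain rather than judging the answer by how fluent it sounds.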
Hallucinations have been reduced, but they haven't disappeared.
Even with the best prompts and the latest model updates, they're likely here to stay.
Because that's just how the models work.
"LLMs are 'good at things that don't have wrong answers' but 'very bad at precise information retrieval'."
The real question is…
When should we use LLMs for more reliable research?
| ✅ When to Use LLMs for Research | ❌ When Not to Use LLMs for Research |
| --- | --- |
Want my full ChatGPT Playbook for Behaviour Change, taught live? Join my upcoming 2-week course.

Written by Vishal George, Chief Behavioural Scientist at Behavioural by Design.
P.S. Newsletter readers can use the code EVIDENCE to get $50 off the course (offer expires in 48 hours).