Kia ora, Namaskaram 🙏🏾

Ever notice your LLM gives good answers to short questions—but fails miserably when you add lots of context?

You're experiencing context rot.

This is a phenomenon where AI performance degrades as you add more information, even when that information should help.

In 2025, Chroma Research tested 18 leading models (including the latest GPT, Claude, and Gemini versions) on simple tasks like "find this sentence" and "repeat this text." The results were sobering: even state-of-the-art models showed significant performance decay as input length increased.

The problem is how we're feeding LLMs information.

While the leading AI chatbots are racing toward million-token context windows, Chroma's research reveals that bigger context doesn't mean better results.

In fact, it often means worse results.

Here are 5 warning signs your prompts are suffering from context rot and how to fix them.

Sign 1: Your question uses different words than the answer you need

In long documents, AI often fails to connect questions with answers that use different vocabulary.

🧠 Evidence from Chroma Research:
When your vocabulary doesn't match the document's vocabulary, AI struggles to find what you need. You ask "What were the main challenges last quarter?" but the 50-page report says "We faced difficulties in Q3." In short contexts, AI makes this connection. In long contexts (thousands of words), performance decays.

💻 Vishal's Evidence-Based Fix:

Add keywords that may be missing from your question: "What's the writing advice mentioned in this document? Search for tips, recommendations, practices, insights, examples of what works well, and personal experiences."
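
If you build prompts programmatically, you can automate this keyword expansion. A minimal Python sketch, assuming a hand-rolled synonym map (the map and helper are illustrative, not from Chroma's report):

```python
# Hypothetical helper: expand a question with related keywords before
# sending it alongside a long document. The synonym map is illustrative.
SYNONYMS = {
    "advice": ["tips", "recommendations", "practices", "insights"],
    "challenges": ["difficulties", "obstacles", "setbacks", "risks"],
}

def expand_question(question: str) -> str:
    """Append related terms so the question's vocabulary overlaps the document's."""
    extra = [
        term
        for keyword, terms in SYNONYMS.items()
        if keyword in question.lower()
        for term in terms
    ]
    if not extra:
        return question
    return f"{question} Also search for: {', '.join(extra)}."

print(expand_question("What's the writing advice mentioned in this document?"))
# -> "...this document? Also search for: tips, recommendations, practices, insights."
```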

Sign 2: Irrelevant information poisons your results

Even one piece of content that looks relevant but isn't can tank performance.

🧠 Evidence from Chroma Research:
Chroma found that introducing just one "distractor"—information semantically similar to what you're looking for but incorrect—significantly hurt performance. Add four distractors? Performance tanks completely. This explains why adding "helpful" background context often makes answers worse.

💻 Vishal's Evidence-Based Fix:

Less context is often better than irrelevant context.

If a piece of background doesn't directly serve your question, leave it out.
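
If you assemble context in code, you can enforce this by scoring each chunk against the question and dropping weak matches before they reach the model. A rough sketch using plain word overlap; a production pipeline would likely use embeddings instead, and the 0.1 threshold is an arbitrary placeholder:

```python
def word_overlap(question: str, chunk: str) -> float:
    """Fraction of question words that also appear in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def filter_chunks(question: str, chunks: list[str], threshold: float = 0.1) -> list[str]:
    """Keep only chunks with some lexical overlap with the question."""
    return [c for c in chunks if word_overlap(question, c) >= threshold]

chunks = [
    "Q3 revenue fell 8% due to supply delays.",
    "Our office plants are thriving this year.",  # plausible-looking distractor: dropped
]
print(filter_chunks("What hurt revenue in Q3?", chunks))
# -> ['Q3 revenue fell 8% due to supply delays.']
```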

Sign 3: You're burying critical information in the middle

Information at the start or end of your prompt gets retrieved reliably. Information in the middle? Often lost.

🧠 Evidence from Chroma Research:
The "lost in the middle" problem is real and persistent across all models tested. When relevant information sits in the middle of long contexts, retrieval performance degrades considerably—even for models specifically designed for long contexts.

💻 Vishal's Evidence-Based Fix:

Put your most critical context at the start of your prompt.
Add supporting context in the middle.
Repeat the critical context at the end of your prompt.
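
If you script your prompts, a template makes this sandwich structure hard to forget. A minimal sketch; the section labels are my own convention, not from the report:

```python
def sandwich_prompt(question: str, critical: str, supporting: str) -> str:
    """Place critical context and the question at the edges, where
    retrieval is most reliable; supporting detail goes in the middle."""
    return (
        f"Key context: {critical}\n\n"
        f"Question: {question}\n\n"
        f"Background: {supporting}\n\n"
        f"Reminder of the key context: {critical}\n"
        f"Now answer the question: {question}"
    )

print(sandwich_prompt(
    question="Which market should we enter first?",
    critical="Board decision due Friday; budget capped at $2M.",
    supporting="(full market research report pasted here)",
))
```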

Sign 4: Well-structured documents can work against you

Counterintuitively, coherent documents make retrieval harder than random chunks.

🧠 Evidence from Chroma Research:
Models get "trapped" following narrative arcs in well-written documents. They attend to the story flow rather than locating specific information.

💻 Vishal's Evidence-Based Fix:

Breaking documents into 3-5 sentence chunks without preserving narrative actually improves retrieval.
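
Here's a minimal chunker along these lines. The regex sentence split is a naive assumption that works for simple prose; swap in a proper sentence tokenizer for messy text:

```python
import re

def chunk_sentences(text: str, size: int = 4) -> list[str]:
    """Split text into sentences, then group them into chunks of ~3-5
    sentences, deliberately ignoring section and narrative boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i : i + size])
        for i in range(0, len(sentences), size)
    ]

doc = "First sentence. Second one! Third? Fourth. Fifth and last."
for chunk in chunk_sentences(doc, size=4):
    print(chunk)
# chunk 1: the first four sentences; chunk 2: the fifth.
```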

Sign 5: You've hit the output ceiling

Models start refusing, truncating, or inventing content when output gets too long.

🧠 Evidence from Chroma Research:
When asked to repeat long sequences, models literally can't output what they just read beyond a certain length. They refuse the task, truncate responses, or start hallucinating. This isn't a storage problem—it's an output generation limit that affects all long-context tasks.

💻 Vishal's Evidence-Based Fix:

Ask for a short summary first, then drill into specific parts.

Instead of starting with: "Summarise this full document with 180 biases and tell me which ones are most likely to show up in AI."

Try: "Summarise the first 45 biases and tell me which ones are most likely to show up in AI”.

Then try: "Summarise the next 45 biases and tell me which ones are most likely to show up in AI”.
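
Scripted, the same drill-down looks like this. `ask_llm` is a hypothetical stand-in for whatever chat API you use:

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for your chat API call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def summarise_in_batches(items: list[str], batch_size: int = 45) -> list[str]:
    """Summarise a long list in batches small enough to stay under
    the model's output ceiling, then collect the partial summaries."""
    summaries = []
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        prompt = (
            f"Summarise the following {len(batch)} biases and tell me which "
            f"ones are most likely to show up in AI:\n" + "\n".join(batch)
        )
        summaries.append(ask_llm(prompt))
    return summaries
```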

Quality of context beats quantity of context

Chroma's research shows that even the most capable models struggle when overloaded with information.

The five patterns above all point to the same lesson: engineer your context, don't just expand it.

Because context engineering is the new prompt engineering.

📚 Reference

Hong, K., Troynikov, A., & Huber, J. (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Technical report, Chroma, July 2025.

Read Chroma’s technical report on Context Rot: How Increasing Input Tokens Impacts LLM Performance

Designed with 💚 Vishal George

Founder & Chief Behavioural Scientist

A few ways to keep learning:

🧠 Why I Created a Pattern Language for AI - Watch video on Youtube
🃏 Thinking Fast & Wise with AI - Get 27 Prompt Cards to think clearly, deeply and wisely with AI.
📚 Five AI on Substack - Subscribe to an annual plan to access all my premium resources in one platform.
