Tokenization Error
Definition
Text is split into tokens incorrectly, damaging meaning.
Solution
Use the right tokenizer and test domain-specific terms, names, code, and multilingual text.
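A cheap way to catch this early is to round-trip the exact strings your domain cares about through the tokenizer you deploy with. A minimal sketch, assuming the `tiktoken` package; the sample strings are illustrative placeholders:

```python
# Minimal tokenizer smoke test, assuming the `tiktoken` package.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Illustrative probes: domain terms, names, code, and multilingual text.
samples = ["EBITDA margin", "Müller", "def fib(n):", "東京タワー"]

for text in samples:
    tokens = enc.encode(text)
    # Baseline checks: a lossless round trip and no runaway token counts.
    assert enc.decode(tokens) == text
    print(f"{text!r} -> {len(tokens)} tokens")
```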
The following are further RAG & LLM generative AI failure terms and explanations from the AI Failure Dictionary.
Definition
The model struggles with rare words, names, slang, or new terms.
Solution
Use subword tokenization, domain data, updated embeddings, or retrieval.
Definition
The model misses or mislabels names, dates, places, organizations, or products.
Solution
Add better labeled examples and evaluate entity-level precision and recall.
Definition
The model identifies an entity but links it to the wrong real-world concept.
Solution
Use better knowledge bases, disambiguation logic, and context-aware linking.
Definition
The model misunderstands references such as "he," "she," "they," or "it."
Solution
Use better context modeling and coreference-specific training or evaluation data.
Definition
The model misses words like "not," "never," or "without."
Solution
Add negation-heavy examples and rule-based checks for high-risk tasks.
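For high-risk tasks, a simple rule-based pass can flag outputs that drop (or invent) a negation relative to the input. A minimal sketch; the cue list is an illustrative assumption and should be extended for your domain:

```python
import re

# Illustrative negation cues; extend for your domain.
NEGATION_CUES = re.compile(r"\b(not|never|no|without)\b|n't\b", re.IGNORECASE)

def negation_mismatch(source: str, output: str) -> bool:
    """Flag outputs whose negation status differs from the source text."""
    return bool(NEGATION_CUES.search(source)) != bool(NEGATION_CUES.search(output))

# Example: the output silently drops "not", so this flags it for review.
print(negation_mismatch("The drug is not approved.", "The drug is approved."))  # True
```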
Definition
The model interprets sarcastic text literally.
Solution
Use domain-specific examples and human review for sensitive decisions.
Definition
The model chooses the wrong meaning for a word or phrase.
Solution
Use more context or ask a clarifying question when the meaning is unclear.
Definition
The model ignores important surrounding text.
Solution
Use better prompts, retrieval, context windows, and targeted evaluation.
Definition
The model assigns the wrong sentiment.
Solution
Use balanced labeled data, domain-specific evaluation, and error analysis.
Definition
The model misunderstands what the user wants.
Solution
Clarify intent labels and train on real user examples.
Definition
The system extracts the wrong values from user input.
Solution
Improve annotation rules, examples, schema validation, and extraction checks.
Definition
The model mistranslates text or loses meaning.
Solution
Use domain-tuned translation and human review for high-risk content.
Definition
A summary includes details not found in the source text.
Solution
Use grounding checks, citation requirements, and source-faithfulness evaluation.
Definition
Generated text does not accurately reflect the input.
Solution
Use extractive checks, citations, factual consistency metrics, and verification prompts.
Definition
Harmful content is missed or safe content is incorrectly flagged.
Solution
Use balanced safety datasets, threshold tuning, and human review for edge cases.
Definition
The model performs worse on dialects, multilingual text, slang, or new terms.
Solution
Continuously evaluate language coverage and update domain data.
Definition
Small wording changes cause large output differences.
Solution
Test prompts across many examples and use robust templates.
Definition
The text is too long for the model to process fully.
Solution
Use chunking, summarization, retrieval, or hierarchical processing.
Definition
The model has access to long text but fails to use the right part.
Solution
Use retrieval, section ranking, summaries, and long-context evaluation.
Definition
The system focuses on matching words instead of meaning.
Solution
Use semantic search, embeddings, hybrid retrieval, and relevance evaluation.
Definition
Text is categorized incorrectly because the model misses the real meaning.
Solution
Improve labels, add representative examples, and evaluate semantic edge cases.
Definition
The model selects too much or too little text as an entity.
Solution
Improve annotation guidelines and use entity-span evaluation.
Definition
Generated text slowly moves away from the original subject.
Solution
Use stronger prompts, section constraints, and validation checks.
Definition
The system fails to retrieve useful documents.
Solution
Improve indexing, embeddings, query rewriting, search strategy, and retrieval evaluation.
Definition
The retriever misses important relevant documents.
Solution
Increase top-k, improve chunking, use hybrid search, and expand queries carefully.
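One common hybrid-search recipe is reciprocal rank fusion (RRF), which merges keyword and vector result lists without having to reconcile their score scales. A minimal sketch; the two ranked lists and the `k` constant are illustrative:

```python
from collections import defaultdict

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score each doc by the sum of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative: fuse a keyword (BM25) list with a vector-search list.
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(rrf([keyword_hits, vector_hits]))  # doc1 and doc3 rise to the top
```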
Definition
The retriever returns too many irrelevant documents.
Solution
Use reranking, metadata filters, better embeddings, and relevance thresholds.
Definition
Documents are split in a way that loses meaning.
Solution
Use semantic chunking, overlap, and document-aware splitting.
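A minimal sketch of paragraph-aware splitting with overlap; the `chunk_size` and `overlap` values are illustrative, not tuned:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> list[str]:
    """Split on paragraph boundaries, carrying an overlapping tail forward."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if len(current) + len(para) > chunk_size and current:
            chunks.append(current.strip())
            # Carry a tail of the previous chunk forward so facts that
            # straddle a boundary appear in both chunks (may cut mid-word;
            # fine for a sketch).
            current = current[-overlap:]
        current += "\n\n" + para
    if current.strip():
        chunks.append(current.strip())
    return chunks
```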
Definition
The needed answer is split across chunks and not retrieved together.
Solution
Use chunk overlap, parent-child retrieval, larger chunks, or hierarchical retrieval.
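Parent-child retrieval is one way to reassemble a split answer: match on small child chunks, then hand the model the larger parent section they came from. A minimal sketch; the documents are toy placeholders and the scoring is naive keyword overlap, purely for illustration:

```python
# Toy corpus: small child chunks point back to a larger parent section.
parents = {"doc1-sec2": "Full section text containing both halves of the fact."}
children = [
    {"parent": "doc1-sec2", "text": "first half of the fact"},
    {"parent": "doc1-sec2", "text": "second half of the fact"},
]

def score(query: str, text: str) -> int:
    # Naive keyword-overlap score; a real system would use embeddings.
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str, top_k: int = 5) -> list[str]:
    hits = sorted(children, key=lambda c: score(query, c["text"]), reverse=True)[:top_k]
    parent_ids = {h["parent"] for h in hits}     # dedupe children sharing a parent
    return [parents[pid] for pid in parent_ids]  # return coarse-grained context

print(retrieve("half of the fact"))
```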
Definition
Irrelevant retrieved text distracts the model.
Solution
Use stricter retrieval filters, reranking, and context pruning.
Definition
Too much retrieved content overwhelms the model.
Solution
Select only the most relevant evidence and summarize where appropriate.
Definition
The answer is not present in the knowledge base.
Solution
Update the corpus or return a clear "not found" response.
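A simple guard is to check the best retrieval score before generating, and admit absence instead of forcing an answer. A minimal sketch; `MIN_SCORE` is an illustrative threshold and `generate_answer` is a stand-in for your LLM call:

```python
MIN_SCORE = 0.35  # illustrative threshold; tune on labeled queries

def generate_answer(query: str, context: str) -> str:
    # Stand-in for the actual LLM call.
    return f"[answer to {query!r} grounded in retrieved context]"

def answer(query: str, hits: list[tuple[str, float]]) -> str:
    """hits: (chunk_text, similarity_score) pairs, best first."""
    if not hits or hits[0][1] < MIN_SCORE:
        return "Not found in the current knowledge base."
    context = "\n\n".join(text for text, _ in hits[:3])
    return generate_answer(query, context)
```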
Definition
The vector or search index is outdated.
Solution
Schedule reindexing and monitor document freshness.
Definition
Indexed documents become outdated compared with the real world.
Solution
Track document versions, owners, expiration dates, and refresh cycles.
Definition
The system retrieves an old or incorrect version of a document.
Solution
Use version-aware metadata filters and deprecate outdated content.
Definition
The retriever returns repeated or near-identical chunks.
Solution
Apply deduplication and diversity-aware retrieval.
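Exact and near-exact duplicates can be dropped before they reach the prompt. A minimal sketch using whitespace-normalized hashing; real systems often add an embedding-similarity pass for paraphrased duplicates:

```python
import hashlib

def dedupe(chunks: list[str]) -> list[str]:
    """Drop chunks whose whitespace-normalized text has already been seen."""
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        key = hashlib.sha256(" ".join(chunk.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```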
Definition
The query and documents are embedded in a way that fails to capture meaning.
Solution
Use stronger or domain-specific embedding models and evaluate retrieval quality.
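Whatever embedding model you choose, measure it: recall@k over a small labeled set of query-to-relevant-document pairs catches most regressions. A minimal sketch; the labels and the stub retriever are illustrative:

```python
def recall_at_k(labeled: list[tuple[str, set[str]]], search, k: int = 10) -> float:
    """labeled: (query, relevant_doc_ids) pairs; search(query, k) -> ranked ids."""
    hits = sum(
        bool(set(search(query, k)) & relevant)
        for query, relevant in labeled
    )
    return hits / len(labeled)

# Illustrative usage with a stub retriever.
labeled = [("reset password", {"kb-12"}), ("refund policy", {"kb-40", "kb-41"})]
stub_search = lambda q, k: ["kb-12", "kb-07"][:k]
print(recall_at_k(labeled, stub_search))  # 0.5
```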
Definition
Retrieved results are semantically related but not actually useful.
Solution
Use reranking, relevance labels, and task-specific retrieval evaluation.
Definition
The retriever misunderstands the user's search intent.
Solution
Use intent detection, query rewriting, and clarification for ambiguous queries.
Definition
The system rewrites the user query incorrectly.
Solution
Evaluate rewrite quality and keep original query signals available.
Definition
Expanded terms move retrieval away from the true intent.
Solution
Limit expansion and validate expanded queries against relevance metrics.
Definition
Incorrect metadata filters exclude relevant documents.
Solution
Validate metadata quality and test filter logic.
Definition
Important metadata such as date, author, version, or product is unavailable.
Solution
Enrich documents during ingestion and enforce metadata requirements.
Definition
The correct document exists but is not included in the selected top results.
Solution
Tune top-k, retrieval scoring, reranking, and hybrid search.
Definition
The reranker fails to move the best evidence to the top.
Solution
Train or select stronger rerankers and evaluate with labeled queries.
Definition
Relevant documents are retrieved, but the model does not use them correctly.
Solution
Use answer-evidence prompts and verification checks.
Definition
The final answer is not supported by retrieved context.
Solution
Require citations, refuse unsupported claims, and validate claim-source alignment.
Definition
The model cites a source that does not support the answer.
Solution
Check each claim against its cited source before final output.
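Before the final output, each claim can be checked against the text of its cited source. The sketch below uses naive word overlap as the support signal; production systems typically use an entailment model or an LLM judge instead:

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def supports(claim: str, source: str, threshold: float = 0.6) -> bool:
    """Crude support signal: share of claim words present in the cited source."""
    claim_words = words(claim)
    overlap = len(claim_words & words(source)) / max(len(claim_words), 1)
    return overlap >= threshold

source = "The warranty on all devices lasts two years."
print(supports("The warranty lasts two years.", source))               # True
print(supports("Refunds are processed within thirty days.", source))   # False
```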
Definition
The model gives an answer without showing where it came from.
Solution
Require traceable source references for factual claims.
Definition
The answer cannot be traced back to reliable evidence.
Solution
Add source linking, evidence snippets, and audit logs.
Definition
The model retrieves the right evidence but combines it incorrectly.
Solution
Use structured synthesis prompts, chain verification, and contradiction checks.
Definition
The system fails when the answer requires multiple documents or reasoning steps.
Solution
Use iterative retrieval, graph retrieval, or query decomposition.
Definition
Search or retrieval takes too long.
Solution
Optimize indexes, cache results, reduce candidate sets, and tune infrastructure.
Definition
Similarity search fails to find the most useful content.
Solution
Use hybrid search, better embeddings, metadata filters, and reranking.
Definition
The model generates false or unsupported information.
Solution
Use grounding, retrieval, citations, uncertainty handling, and factuality evaluation.
Definition
The model invents facts, numbers, sources, citations, or events.
Solution
Require evidence and allow the model to say "I do not know" when information is missing.
Definition
The model sounds certain even when it is wrong.
Solution
Calibrate responses and require source-backed claims for factual answers.
Definition
The model gives answers that conflict with itself or known facts.
Solution
Use consistency checks, better context management, and verification prompts.
Definition
The model gives an unclear answer that can be interpreted multiple ways.
Solution
Ask clarifying questions or enforce structured output.
Definition
The model does not follow the user or system instruction.
Solution
Use clearer prompts, examples, schemas, and output validators.
Definition
The model receives competing instructions and follows the wrong one.
Solution
Define instruction priority and remove contradictions.
Definition
The model misunderstands what the user actually asked.
Solution
Add task clarification, examples, and intent checks.
Definition
The model forgets or ignores important information from earlier context.
Solution
Use conversation summaries, memory, retrieval, and better context selection.
Definition
The response slowly moves away from the original question or task.
Solution
Use tighter prompts, checkpoints, and validation against the user request.
Definition
The model repeats the same phrase, idea, or pattern.
Solution
Use decoding controls, repetition penalties, and response validation.
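Repetition can be penalized at decode time and also caught after the fact. A minimal post-hoc validator that flags repeated n-grams; the `n` and `max_repeats` thresholds are illustrative:

```python
from collections import Counter

def has_repetition(text: str, n: int = 4, max_repeats: int = 2) -> bool:
    """Flag any n-gram that occurs more than max_repeats times."""
    tokens = text.lower().split()
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return any(count > max_repeats for count in ngrams.values())

looped = "I can help with that. I can help with that. I can help with that. Also..."
print(has_repetition(looped))  # True: the same 4-gram appears three times
```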
Definition
The model gives repetitive, generic, or overly similar answers.
Solution
Improve prompting, sampling settings, examples, and output diversity checks.
Definition
The model gives too much detail when a concise answer is needed.
Solution
Specify length, audience, and format constraints.
Definition
The model gives an incomplete answer.
Solution
Use coverage rubrics, checklists, and completeness validation.
Definition
The model adds unnecessary or unsupported information.
Solution
Limit scope and require evidence for added claims.
Definition
The model misunderstands which role or persona it should follow.
Solution
Use clear role instructions and periodic role reminders.
Definition
The output does not follow the requested format.
Solution
Use schemas, examples, structured output, and validators.
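Structured output is cheapest to enforce with a schema check before the response leaves the system. A minimal sketch using the `jsonschema` package; the schema itself is an illustrative assumption:

```python
import json
import jsonschema  # pip install jsonschema

# Illustrative schema for a structured model response.
SCHEMA = {
    "type": "object",
    "required": ["answer", "sources"],
    "properties": {
        "answer": {"type": "string"},
        "sources": {"type": "array", "items": {"type": "string"}},
    },
}

def validate_output(raw: str) -> dict:
    """Parse and validate model output; raise (or trigger a retry) on violations."""
    data = json.loads(raw)  # fails fast on non-JSON
    jsonschema.validate(instance=data, schema=SCHEMA)
    return data

print(validate_output('{"answer": "42", "sources": ["doc1"]}'))
```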
Definition
The model refuses a safe request or answers an unsafe request.
Solution
Improve safety policy interpretation and refusal evaluation.
Definition
The model produces harmful, unsafe, or policy-violating content.
Solution
Use safety filters, policy checks, red-team testing, and human review for high-risk cases.
Definition
The model generates offensive, abusive, hateful, or unsafe language.
Solution
Use toxicity filtering, safer training data, and moderation policies.
Definition
The model guesses instead of saying it does not know.
Solution
Instruct the model to express uncertainty and ask for missing information.
Definition
The model claims it can do something it cannot actually do.
Solution
Clearly define available tools, limits, and system capabilities.
Definition
The explanation does not match the real basis for the answer.
Solution
Use evidence-based explanations and separate hidden reasoning from user-facing justification.
Definition
The model produces different answers for the same input.
Solution
Lower temperature, use deterministic settings where possible, and validate outputs.
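Where the provider supports it, pin the sampling parameters. A minimal sketch assuming an OpenAI-style client; the model name is illustrative, and the `seed` parameter reduces, but does not fully guarantee, run-to-run variation:

```python
# Sketch assuming the `openai` Python client; model name is illustrative.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
    temperature=0,   # greedy-ish decoding
    seed=42,         # best-effort reproducibility where supported
)
print(response.choices[0].message.content)
```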
Definition
The model claims it used a tool or source when it did not.
Solution
Separate tool execution from response generation and log tool calls.
Definition
Unexpected harmful or incorrect behavior appears at scale.
Solution
Use staged rollout, monitoring, red teaming, and incident response.
Definition
The AI system behaves in a way that does not match human goals, rules, or expectations.
Solution
Improve instruction tuning, policy design, evaluation, and guardrails.