ML & Deep Learning Training Failures: terms and explanations from the AI Failure Dictionary.
Overfitting
Definition
The model memorizes training data and performs poorly on new data.
Solution
Use more data, regularization, simpler models, dropout, cross-validation, or early stopping.
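A minimal sketch of detecting and reducing overfitting, assuming scikit-learn; the synthetic dataset and the C values are illustrative:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Few samples, many features: a setting that invites memorization.
    X, y = make_classification(n_samples=200, n_features=100, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    # Smaller C means stronger L2 regularization; watch the train/validation gap shrink.
    for C in (100.0, 1.0, 0.01):
        model = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
        gap = model.score(X_tr, y_tr) - model.score(X_val, y_val)
        print(f"C={C}: train-validation gap = {gap:.3f}")

A large train-validation gap is the classic overfitting signature; dropout and early stopping play the same regularizing role in neural networks.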
Underfitting
Definition
The model is too simple to learn useful patterns.
Solution
Use better features, a stronger model, more training, or lower regularization.
Development-to-Production Gap
Definition
The model works in development but fails on real-world examples.
Solution
Use realistic validation sets, production-like test data, and edge-case coverage.
Poor Generalization
Definition
The model does not transfer well to unseen examples.
Solution
Improve data diversity and evaluate on data that better represents deployment conditions.
Data Leakage
Definition
The model learns from information it should not access.
Solution
Audit features, joins, target timing, and train-test boundaries.
Train-Test Overlap
Definition
The same or very similar examples appear in both training and test sets.
Solution
Use duplicate detection, grouped splitting, and leakage checks.
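A minimal sketch of both checks, assuming pandas and scikit-learn; the column names text and user_id are hypothetical:

    import pandas as pd
    from sklearn.model_selection import GroupShuffleSplit

    df = pd.DataFrame({
        "text": ["spam offer", "hello friend", "spam offer", "meeting at 3"],
        "user_id": [1, 2, 1, 3],
        "label": [1, 0, 1, 0],
    })

    # Drop exact duplicates so the same example cannot sit on both sides.
    df = df[~df.duplicated(subset=["text"], keep="first")]

    # Grouped splitting keeps all rows from one user on one side of the split.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
    train_idx, test_idx = next(splitter.split(df, groups=df["user_id"]))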
Test Set Contamination
Definition
Evaluation examples accidentally influence training.
Solution
Keep test sets locked, private, and separate from training decisions.
Incorrect Data Splitting
Definition
Training, validation, and test sets are divided incorrectly.
Solution
Use stratified, grouped, time-based, or user-based splits depending on the problem.
Temporal Leakage
Definition
Time-based data is split randomly, causing future information to leak.
Solution
Train on the past and validate on future time windows.
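A minimal sketch using scikit-learn's TimeSeriesSplit, which always trains on earlier rows and validates on strictly later ones; rows are assumed to be sorted oldest-first:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(100).reshape(-1, 1)  # feature rows, oldest first
    for train_idx, val_idx in TimeSeriesSplit(n_splits=4).split(X):
        print(f"train rows 0-{train_idx[-1]}, validate rows {val_idx[0]}-{val_idx[-1]}")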
Class Imbalance
Definition
The model ignores minority classes because majority classes dominate training.
Solution
Use class weights, resampling, threshold tuning, and minority-class data collection.
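A minimal sketch of class weighting, assuming scikit-learn; the 95/5 split is illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils.class_weight import compute_class_weight

    y = np.array([0] * 950 + [1] * 50)  # 95/5 imbalance
    weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
    print(dict(zip(np.unique(y), weights)))  # minority class is weighted up

    # Most estimators accept the same idea directly:
    model = LogisticRegression(class_weight="balanced")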
Hyperparameter Sensitivity
Definition
Small parameter changes cause large performance differences.
Solution
Use systematic search, robust validation, and multiple random seeds.
Insufficient Regularization
Definition
The model is not constrained enough, increasing overfitting risk.
Solution
Use L1/L2 regularization, dropout, pruning, simpler models, or early stopping.
Optimization Failure
Definition
The learning algorithm fails to find a good solution.
Solution
Tune learning rate, optimizer, loss function, initialization, and preprocessing.
Local Minima
Definition
Training gets stuck in a poor solution.
Solution
Use better initialization, adaptive optimizers, learning-rate schedules, or restarts.
High Variance
Definition
Model performance changes significantly across datasets or runs.
Solution
Use more data, cross-validation, ensembling, and regularization.
High Bias
Definition
The model is too limited and consistently misses important patterns.
Solution
Use richer features, a more expressive model, or reduce excessive regularization.
Poor Calibration
Definition
Predicted probabilities do not match real-world likelihoods.
Solution
Use calibration methods such as Platt scaling, isotonic regression, or temperature scaling.
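A minimal sketch, assuming scikit-learn: method="sigmoid" is Platt scaling and method="isotonic" is isotonic regression; temperature scaling is the neural-network analogue.

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=1000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Wrap an uncalibrated classifier; cross-validation fits the calibrator.
    calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
    calibrated.fit(X_tr, y_tr)
    probs = calibrated.predict_proba(X_te)  # calibrated probabilities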
Random Seed Sensitivity
Definition
Results change significantly when the random seed changes.
Solution
Run multiple seeds and report mean, variance, and confidence intervals.
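A minimal sketch of multi-seed reporting, assuming scikit-learn; five seeds is an illustrative choice:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=400, random_state=0)
    scores = []
    for seed in range(5):  # rerun split and training under each seed
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
        model = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    print(f"accuracy {np.mean(scores):.3f} +/- {np.std(scores):.3f} over 5 seeds")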
Reproducibility Failure
Definition
The team cannot reproduce training results.
Solution
Version code, data, features, configs, environments, and random seeds.
Poor Experiment Tracking
Definition
Training parameters, datasets, metrics, and artifacts are not recorded properly.
Solution
Use experiment tracking tools such as MLflow, Weights & Biases, or a model registry.
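A minimal sketch with MLflow; the parameter names and values are illustrative, and log_artifact expects the file to already exist:

    import mlflow

    with mlflow.start_run(run_name="baseline-logreg"):
        mlflow.log_param("C", 1.0)
        mlflow.log_param("dataset_version", "v3")
        mlflow.log_metric("val_accuracy", 0.87)
        mlflow.log_artifact("confusion_matrix.png")  # any saved file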
Missing Baseline
Definition
The team does not compare the model against a simple baseline.
Solution
Build a simple baseline first and require new models to beat it meaningfully.
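A minimal sketch, assuming scikit-learn: DummyClassifier gives the majority-class floor any real model must clear meaningfully.

    from sklearn.datasets import make_classification
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
    print("baseline accuracy:", baseline.score(X_te, y_te))  # already ~0.9 here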
Accuracy Paradox
Definition
A model has high accuracy but performs poorly on the important class.
Solution
Use precision, recall, F1, ROC-AUC, PR-AUC, and class-specific metrics.
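A minimal sketch with illustrative labels: accuracy is 90% while recall on the important class is only 50%.

    import numpy as np
    from sklearn.metrics import classification_report, roc_auc_score

    y_true  = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
    y_pred  = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])  # one positive missed
    y_score = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.45, 0.9])

    print(classification_report(y_true, y_pred))  # per-class precision/recall/F1
    print("ROC-AUC:", roc_auc_score(y_true, y_score))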
Spurious Correlation
Definition
The model learns a false pattern that does not truly cause the outcome.
Solution
Run stress tests, causal review, subgroup analysis, and out-of-distribution evaluation.
Data Drift
Definition
Production data differs from training data.
Solution
Monitor data distributions and retrain or adapt when shifts are detected.
Covariate Shift
Definition
Input feature distribution changes between training and production.
Solution
Track feature distributions and update data, features, or models as needed.
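A minimal sketch of a per-feature drift check with a two-sample Kolmogorov-Smirnov test (SciPy); the distributions and alert threshold are illustrative:

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    train_feature = rng.normal(0.0, 1.0, 5000)  # training-time distribution
    prod_feature = rng.normal(0.4, 1.0, 5000)   # shifted production samples

    stat, p_value = ks_2samp(train_feature, prod_feature)
    if p_value < 0.01:  # alert threshold, tune per feature
        print(f"drift detected: KS statistic {stat:.3f}")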
Label Shift
Definition
The distribution of output classes changes over time.
Solution
Monitor class distribution and recalibrate or retrain.
Concept Drift
Definition
The relationship between inputs and outputs changes over time.
Solution
Detect drift and retrain with newer labeled data.
Model Degradation
Definition
Model quality gets worse as the environment changes.
Solution
Monitor performance and schedule retraining or model refreshes.
Out-of-Distribution Inputs
Definition
The model receives data very different from what it saw during training.
Solution
Detect OOD inputs, reject uncertain cases, or route them to human review.
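A minimal sketch of confidence-based routing; maximum softmax probability is a simple (and imperfect) OOD signal, and the 0.7 threshold is illustrative:

    import numpy as np

    def route(probs, threshold=0.7):
        """Return a class index, or flag the input for human review."""
        if np.max(probs) < threshold:  # model unsure: likely unfamiliar input
            return "human_review"
        return int(np.argmax(probs))

    print(route(np.array([0.34, 0.33, 0.33])))  # -> human_review
    print(route(np.array([0.05, 0.90, 0.05])))  # -> 1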
Insufficient Signal
Definition
The input data does not contain enough predictive information.
Solution
Improve features, collect stronger signals, or reconsider whether the task is learnable.
Vanishing Gradients
Definition
Gradients become too small, so early layers stop learning.
Solution
Use better activations, normalization, residual connections, and architecture changes.
Exploding Gradients
Definition
Gradients become too large, causing unstable training.
Solution
Use gradient clipping, normalization, lower learning rates, and stable initialization.
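A minimal PyTorch sketch of gradient clipping inside one training step; the model and max_norm value are illustrative:

    import torch

    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = loss_fn(model(x), y)
    loss.backward()
    # Rescale gradients so their global norm never exceeds 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
    opt.zero_grad()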
Dead Neurons
Definition
A neural unit stops activating and contributes little or nothing.
Solution
Adjust initialization, learning rate, architecture, or activation function.
Dying ReLU
Definition
A ReLU neuron outputs zero for most or all inputs.
Solution
Use Leaky ReLU, GELU, better initialization, or a lower learning rate.
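A minimal PyTorch sketch; the layer sizes are illustrative:

    import torch.nn as nn

    # Leaky ReLU keeps a small slope for negative inputs, so units a plain
    # ReLU would zero out permanently still receive gradient.
    model = nn.Sequential(
        nn.Linear(64, 128),
        nn.LeakyReLU(negative_slope=0.01),  # or nn.GELU()
        nn.Linear(128, 10),
    )

    # He (Kaiming) initialization is matched to ReLU-family activations.
    for layer in model:
        if isinstance(layer, nn.Linear):
            nn.init.kaiming_normal_(layer.weight, nonlinearity="leaky_relu")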
Activation Saturation
Definition
Activation functions enter flat regions where gradients are very small.
Solution
Use modern activations, normalization, and better initialization.
Poor Weight Initialization
Definition
Initial weights make training slow, unstable, or ineffective.
Solution
Use initialization methods such as Xavier, He initialization, or pretrained weights.
Learning Rate Too High
Definition
Training jumps around and fails to converge.
Solution
Lower the learning rate or use a learning-rate scheduler.
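A minimal PyTorch sketch of a plateau-based schedule; the constant stand-in loss just demonstrates the halving behavior:

    import torch

    model = torch.nn.Linear(10, 2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Halve the LR when validation loss stops improving for 3 epochs.
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=3)

    for epoch in range(10):
        val_loss = 0.5  # stand-in for a real validation loss
        sched.step(val_loss)
        print(epoch, opt.param_groups[0]["lr"])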
Learning Rate Too Low
Definition
Training is extremely slow or gets stuck.
Solution
Increase the learning rate or use adaptive optimizers.
Batch Size Issues
Definition
Batch size causes noisy gradients or poor generalization.
Solution
Tune batch size, use gradient accumulation, and scale learning rates carefully.
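A minimal PyTorch sketch of gradient accumulation, which simulates a larger batch without extra memory; the sizes are illustrative:

    import torch

    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()
    accum_steps = 4  # effective batch = 4 micro-batches of 8

    opt.zero_grad()
    for step in range(16):
        x, y = torch.randn(8, 10), torch.randn(8, 1)
        loss = loss_fn(model(x), y) / accum_steps  # average over micro-batches
        loss.backward()  # gradients add up in .grad
        if (step + 1) % accum_steps == 0:
            opt.step()
            opt.zero_grad()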
Mode Collapse
Definition
A generative model produces limited or repetitive outputs.
Solution
Use better objectives, diversity penalties, training stabilization, and evaluation for diversity.
Catastrophic Forgetting
Definition
A model forgets old knowledge when trained on new data.
Solution
Use replay data, regularization, frozen layers, or parameter-efficient fine-tuning.
Representation Collapse
Definition
Learned representations become too similar and lose useful distinctions.
Solution
Use contrastive objectives, normalization, better negatives, and representation diagnostics.
Embedding Collapse
Definition
Embeddings lose semantic diversity and become less useful.
Solution
Improve training objectives, data diversity, and embedding evaluation.
Noisy Gradients
Definition
Training updates are unstable because gradients are too noisy.
Solution
Tune batch size, optimizer, gradient accumulation, and learning rate.
Loss Plateau
Definition
The loss stops improving before reaching good performance.
Solution
Adjust learning rate, architecture, data quality, optimizer, or schedule.
Non-Convergence
Definition
The model never reaches a stable or useful solution.
Solution
Debug data, labels, loss function, optimizer, architecture, and preprocessing.
Attention Failure
Definition
Attention focuses on irrelevant or too narrow parts of the input.
Solution
Use better data, architecture tuning, attention diagnostics, and regularization.
Transfer Learning Failure
Definition
A pretrained model does not adapt well to the target task.
Solution
Use domain data, careful fine-tuning, and task-specific validation.
Fine-Tuning Degradation
Definition
Fine-tuning damages useful pretrained behavior.
Solution
Use lower learning rates, LoRA or PEFT, frozen layers, and validation gates.
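A minimal PyTorch sketch of the freezing idea; the backbone here is a stand-in for real pretrained weights:

    import torch

    backbone = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU())
    head = torch.nn.Linear(64, 3)  # new task-specific head

    # Freeze pretrained weights so fine-tuning cannot overwrite them.
    for p in backbone.parameters():
        p.requires_grad = False

    # Train only the head, with a deliberately small learning rate.
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)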
Catastrophic Interference
Definition
New learning disrupts previously learned patterns.
Solution
Use continual-learning strategies and mixed old/new training data.
Insufficient Model Capacity
Definition
The model is too small or constrained for the task.
Solution
Increase model capacity, improve architecture, or simplify the task.
Overparameterization
Definition
The model has more parameters than needed, increasing cost and overfitting risk.
Solution
Use smaller models, regularization, pruning, distillation, or model selection.
Training Instability
Definition
Loss, gradients, or metrics fluctuate unpredictably.
Solution
Inspect data, reduce learning rate, stabilize optimization, and monitor gradients.
Missed Detection (False Negative)
Definition
The model fails to identify an object that is present.
Solution
Add more labeled examples, tune thresholds, and improve detection architecture.
False Positive Detection
Definition
The model detects something that is not actually there.
Solution
Add hard negative examples and tune confidence thresholds.
Misclassification
Definition
The image is assigned to the wrong class.
Solution
Improve labels, augmentation, class balance, and confusion-matrix analysis.
Localization Error
Definition
The object is detected, but the bounding box is inaccurate.
Solution
Improve annotations and use localization-focused loss functions.
Segmentation Failure
Definition
The model fails to correctly separate object regions.
Solution
Improve masks, add training examples, and evaluate segmentation metrics.
Occlusion Failure
Definition
The model fails when objects are partially hidden.
Solution
Use occlusion augmentation and realistic training data.
Scale Sensitivity
Definition
The model fails when objects are too small, too large, or at unusual distances.
Solution
Use multi-scale training and feature pyramid networks.
Rotation Sensitivity
Definition
The model fails when objects appear at different angles.
Solution
Use rotation augmentation and rotation-invariant architectures when needed.
Lighting Sensitivity
Definition
Shadows, glare, darkness, or brightness reduce performance.
Solution
Use lighting augmentation and collect diverse image data.
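A minimal torchvision sketch; the jitter magnitudes are illustrative and should be sanity-checked against real conditions:

    from torchvision import transforms

    # Randomized lighting changes teach robustness to brightness,
    # contrast, and color casts.
    train_tf = transforms.Compose([
        transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.3),
        transforms.RandomAutocontrast(p=0.3),
        transforms.ToTensor(),
    ])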
Blur Sensitivity
Definition
Motion blur or low image quality causes mistakes.
Solution
Use blur augmentation and image quality checks.
Background Bias
Definition
The model relies on background patterns instead of the main object.
Solution
Train with diverse backgrounds and object-focused augmentation.
Texture Bias
Definition
The CNN relies too much on texture rather than shape.
Solution
Use shape-focused augmentation, diverse data, and robustness testing.
Domain Shift
Definition
Production images differ from training images in camera, lighting, angle, or environment.
Solution
Collect production samples and use domain adaptation.
Adversarial Example
Definition
A visual pattern tricks the model into a wrong prediction.
Solution
Use adversarial testing, robust training, and input monitoring.
Small-Object Detection Failure
Definition
The model misses tiny objects in the image.
Solution
Use higher-resolution inputs, tiling, and small-object-focused training.
Class Confusion
Definition
Visually similar classes are repeatedly confused.
Solution
Collect more examples, improve labels, and analyze the confusion matrix.
Harmful Augmentation
Definition
Augmentation creates unrealistic or harmful training examples.
Solution
Review augmentations against real-world conditions and remove harmful transforms.
Annotation Errors
Definition
Bounding boxes or masks in training data are incorrect.
Solution
Audit labels and improve reviewer quality control.
Preprocessing Mismatch
Definition
Image resizing, cropping, or normalization differs between training and inference.
Solution
Share the same preprocessing pipeline across training and production.
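The principle generalizes beyond images: bundle preprocessing and model into one artifact. A minimal scikit-learn sketch:

    import joblib
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=300, random_state=0)

    # Inference now applies exactly the transforms used in training.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
    pipe.fit(X, y)
    joblib.dump(pipe, "model.joblib")  # ship one artifact, not two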
OCR Errors
Definition
Text in images is read incorrectly.
Solution
Improve image quality, tune OCR models, and validate extracted text.
Pose Variation
Definition
The model fails when objects or people appear in unusual positions.
Solution
Add pose-diverse data and augmentation.
Image Quality Mismatch
Definition
A model trained on high-quality images fails on low-quality production images.
Solution
Train and evaluate with production-quality images.
Poor Frame Sampling
Definition
A video model misses important moments because frames are sampled poorly.
Solution
Tune sampling strategy and evaluate temporal coverage.