Decoding Strategies: terms and explanations from the LLM Optimization Dictionary.

Greedy Decoding
Definition
Always picks the single highest-probability token at each step. Fully deterministic, no randomness.
Purpose
The fastest decoding strategy, but not optimal for whole-sequence probability, and prone to the repetitive loops that sampling or beam search escape.
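A minimal sketch of one greedy step, assuming `logits` is a NumPy vector of next-token logits produced by some model:

```python
import numpy as np

def greedy_step(logits: np.ndarray) -> int:
    # Deterministically pick the single highest-logit token.
    return int(np.argmax(logits))

# Toy logits standing in for a real model's output over a 5-token vocabulary.
print(greedy_step(np.array([0.1, 2.3, -0.5, 1.7, 0.0])))  # -> 1
```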
Beam Search
Definition
Maintains k partial sequences, expanding each with every candidate token and keeping the k highest-scoring sequences at each step.
Purpose
Used in machine translation and summarization, where quality trumps speed. A beam width of k=4 is a common production default.
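A toy sketch of the loop, with `step_log_probs` as a hypothetical stand-in for a model's next-token log-probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 10  # toy vocabulary size

def step_log_probs(seq):
    # Hypothetical stand-in for a language model; real code would run the model on seq.
    logits = rng.normal(size=VOCAB)
    return logits - np.log(np.exp(logits).sum())  # normalize to log-probabilities

def beam_search(k=4, max_len=5):
    beams = [([], 0.0)]  # (partial token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            logp = step_log_probs(seq)
            for tok in range(VOCAB):
                candidates.append((seq + [tok], score + logp[tok]))
        # Keep only the k highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

print(beam_search(k=4)[0])  # best sequence and its cumulative log-probability
```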
Temperature
Definition
Divides all logits by T before softmax. T<1 sharpens the distribution; T>1 flattens it.
Purpose
T → 0 converges to greedy decoding; T = 1 is standard sampling. The primary creativity control knob.
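A minimal sketch of temperature sampling over a raw logit vector; parameter names are illustrative:

```python
import numpy as np

def temperature_sample(logits, T=0.7, rng=None):
    rng = rng or np.random.default_rng()
    z = logits / T                 # T < 1 sharpens, T > 1 flattens
    probs = np.exp(z - z.max())    # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```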
Top-k Sampling
Definition
Zeros out all but the k highest-probability tokens, then samples from the normalized remainder.
Purpose
k=50 is a common default. Too small causes repetition; too large allows incoherent low-probability tokens.
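A sketch of one top-k step, assuming a NumPy logit vector:

```python
import numpy as np

def top_k_sample(logits, k=50, rng=None):
    rng = rng or np.random.default_rng()
    k = min(k, len(logits))
    idx = np.argsort(logits)[-k:]          # indices of the k largest logits
    probs = np.exp(logits[idx] - logits[idx].max())
    probs /= probs.sum()                   # renormalize over the survivors
    return int(rng.choice(idx, p=probs))
```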
Top-p (Nucleus) Sampling
Definition
Constructs the smallest token set whose cumulative probability is ≥ p, then samples from it.
Purpose
Self-calibrates vocabulary size to the model's local confidence. Generally preferred over fixed top-k.
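A sketch of the nucleus construction: sort by probability, cut at cumulative mass p, renormalize:

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                # most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1    # smallest set with mass >= p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```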
Min-p Sampling
Definition
Filters out tokens whose probability is below p_min × p_top, where p_top is the top token's probability.
Purpose
Adaptive and elegant: nearly greedy at high-confidence steps, diverse at uncertain steps. Smooth tradeoff.
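A sketch of the min-p filter; `p_min` is an illustrative parameter name:

```python
import numpy as np

def min_p_sample(logits, p_min=0.05, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    threshold = p_min * probs.max()        # cutoff scales with the top token's probability
    keep = np.where(probs >= threshold)[0]
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```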
Mirostat
Definition
Controls output quality with a feedback loop, adjusting its sampling parameters at each step to hit a target perplexity τ in real time.
Purpose
Provides a direct knob for output quality (target perplexity) rather than raw randomness, giving consistent output across contexts.
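A simplified sketch of the feedback idea as a proportional controller on temperature; note the published Mirostat algorithm adapts a truncation threshold rather than temperature, so this is only illustrative:

```python
import numpy as np

def feedback_sample(logits, state, tau=3.0, lr=0.1, rng=None):
    # state carries the controller's current temperature between steps.
    rng = rng or np.random.default_rng()
    z = logits / state["T"]
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    tok = int(rng.choice(len(probs), p=probs))
    surprise = -np.log(probs[tok])            # observed surprise of the sampled token
    state["T"] -= lr * (surprise - tau)       # too surprising -> cool down, too bland -> heat up
    state["T"] = max(state["T"], 0.1)         # keep temperature positive
    return tok

state = {"T": 1.0}
# tok = feedback_sample(model_logits, state, tau=3.0)  # call once per decoding step
```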
Repetition Penalty
Definition
Divides the logit of any previously generated token by a penalty scalar greater than 1.0 (negative logits are multiplied instead, so the penalty always lowers their probability).
Purpose
Directly reduces loops. Too aggressive causes the model to avoid necessary common words like pronouns.
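A sketch following the common CTRL-style formulation, which divides positive logits and multiplies negative ones:

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    out = logits.copy()
    for tok in set(generated_ids):
        # Dividing a negative logit would *raise* its probability,
        # so negative logits are multiplied instead.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out
```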
Frequency Penalty
Definition
Reduces the logit of each token by its generation count times a penalty factor.
Purpose
Proportional penalty encourages diverse vocabulary; less blunt than repetition penalty for long outputs.
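A sketch of the count-proportional deduction:

```python
from collections import Counter
import numpy as np

def apply_frequency_penalty(logits, generated_ids, alpha=0.5):
    # Subtract alpha * (number of times the token was generated) from its logit.
    out = logits.copy()
    for tok, count in Counter(generated_ids).items():
        out[tok] -= alpha * count
    return out
```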
Presence Penalty
Definition
Reduces the logit of each token by a fixed amount if it has appeared at least once in the output.
Purpose
Promotes topic shifts and new concepts. Weaker than frequency penalty for heavily repeated tokens.
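A sketch of the flat, once-triggered deduction:

```python
import numpy as np

def apply_presence_penalty(logits, generated_ids, beta=0.6):
    # A fixed deduction for any token that has appeared at least once.
    out = logits.copy()
    for tok in set(generated_ids):
        out[tok] -= beta
    return out
```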
Logit Bias
Definition
Manually increments or decrements the raw logit for specific token IDs before softmax is applied.
Purpose
Used to ban offensive tokens, force JSON brackets, or guide constrained generation in production systems.
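A sketch of the bias pass; the token IDs in the usage comment are hypothetical:

```python
import numpy as np

def apply_logit_bias(logits, bias):
    # bias maps token ID -> additive offset; a large negative value bans a token.
    out = logits.copy()
    for tok, b in bias.items():
        out[tok] += b
    return out

# e.g. ban (hypothetical) token 13 and nudge token 7 upward:
# biased = apply_logit_bias(logits, {13: -1e9, 7: 2.0})
```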
Grammar-Constrained Decoding
Definition
Masks invalid tokens at each step using a finite state machine or context-free grammar parser.
Purpose
Guarantees structurally valid JSON/SQL/YAML output without post-processing. Essential for tool-calling.
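A toy sketch with a hand-written state machine over a four-token vocabulary; real systems compile a JSON schema or grammar into such a machine automatically:

```python
import numpy as np

VOCAB = ["{", "}", "0", "1"]  # toy vocabulary for the pattern: "{" digits "}"

def allowed(state):
    # 0 = expect "{", 1 = inside (digits or closing "}"), 2 = done.
    return {0: ["{"], 1: ["0", "1", "}"], 2: []}[state]

def advance(state, tok):
    if state == 0 and tok == "{": return 1
    if state == 1 and tok == "}": return 2
    return state

def constrained_step(logits, state, rng):
    mask = np.full(len(VOCAB), -np.inf)
    for t in allowed(state):
        mask[VOCAB.index(t)] = 0.0            # only grammar-legal tokens survive
    z = logits + mask
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    tok = VOCAB[int(rng.choice(len(VOCAB), p=probs))]
    return tok, advance(state, tok)

rng = np.random.default_rng(1)
state, out = 0, []
while state != 2:
    tok, state = constrained_step(rng.normal(size=len(VOCAB)), state, rng)
    out.append(tok)
print("".join(out))  # always matches the pattern, e.g. "{10}"
```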
Classifier-Free Guidance (CFG)
Definition
Runs conditional and unconditional passes and extrapolates between them: logits = uncond + s × (cond - uncond).
Purpose
Sharpens prompt adherence at the cost of 2× compute per step. Borrowed from image generation models.
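The extrapolation itself is one line; `s` is the guidance scale:

```python
import numpy as np

def cfg_logits(cond_logits, uncond_logits, s=1.5):
    # s = 1 recovers the conditional logits; s > 1 pushes further toward the prompt.
    return uncond_logits + s * (cond_logits - uncond_logits)
```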
Epsilon Sampling
Definition
Filters out tokens with probability below an absolute threshold ε (e.g., 0.0001) before sampling.
Purpose
Removes statistically improbable tokens without the sharp cutoff of top-k; complements top-p well.
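A sketch of the absolute-threshold filter, with a fallback in case ε removes every token:

```python
import numpy as np

def epsilon_sample(logits, eps=1e-4, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    keep = np.where(probs >= eps)[0]       # drop tokens below the absolute floor
    if keep.size == 0:                     # degenerate case: fall back to the top token
        keep = np.array([int(np.argmax(probs))])
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```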
Typical Sampling
Definition
Selects tokens whose information content (surprise) is closest to the distribution's expected entropy, rather than simply the most probable ones.
Purpose
Produces more human-like output by avoiding both too-obvious and too-surprising next tokens.
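A sketch of the idea: rank tokens by how close their surprise (-log p) sits to the distribution's entropy, then keep a fixed probability mass (the `mass` parameter is illustrative):

```python
import numpy as np

def typical_sample(logits, mass=0.9, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    distance = np.abs(-np.log(probs + 1e-12) - entropy)   # closeness to "typical" surprise
    order = np.argsort(distance)                          # most typical first
    cumulative = np.cumsum(probs[order])
    keep = order[:np.searchsorted(cumulative, mass) + 1]
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```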
Contrastive Decoding
Definition
Boosts tokens the large expert model favors over a small amateur model at each step.
Purpose
Reduces hallucination and improves factual accuracy without a reward model; the only extra cost is running the small amateur model alongside.
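A sketch of the scoring rule, including the plausibility constraint that restricts candidates to tokens the expert itself rates highly; `alpha` is illustrative:

```python
import numpy as np

def contrastive_scores(expert_logits, amateur_logits, alpha=0.1):
    ep = np.exp(expert_logits - expert_logits.max()); ep /= ep.sum()
    ap = np.exp(amateur_logits - amateur_logits.max()); ap /= ap.sum()
    plausible = ep >= alpha * ep.max()     # ignore tokens even the expert finds unlikely
    # Boost tokens where the expert is far more confident than the amateur.
    return np.where(plausible, np.log(ep + 1e-12) - np.log(ap + 1e-12), -np.inf)

# next_token = int(np.argmax(contrastive_scores(expert_logits, amateur_logits)))
```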