LLM Optimization Dictionary

Decoding Strategies

Decoding Strategies terms and explanations from the LLM Optimization Dictionary.

16 terms in this chapter
01

Greedy Decoding

Definition

Always picks the single highest-probability token at each step. Fully deterministic, no randomness.

Purpose

The fastest decoding method, but not guaranteed to find the highest-probability sequence; it is prone to repetitive loops that sampling or beam search escape.
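
A one-line NumPy sketch of the selection rule (the function name is illustrative):

import numpy as np

def greedy_step(logits: np.ndarray) -> int:
    # Deterministically take the argmax token at this step.
    return int(np.argmax(logits))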

02

Beam Search

Definition

Keeps the B highest-scoring partial sequences at each step, expanding each candidate and pruning back to the best B.

Purpose

Finds higher-probability sequences than greedy decoding can; standard for translation, though prone to bland, repetitive open-ended text.

03

Temperature Sampling

Definition

Divides all logits by T before softmax. T<1 sharpens the distribution; T>1 flattens it.

Purpose

T → 0 converges to greedy decoding; T = 1 is standard sampling. The primary creativity control knob.
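
A minimal NumPy sketch of one sampling step, assuming a 1-D logits vector (function name and default T are illustrative):

import numpy as np

def temperature_sample(logits: np.ndarray, T: float = 0.8, rng=None) -> int:
    rng = rng or np.random.default_rng()
    scaled = logits / T                    # T < 1 sharpens, T > 1 flattens
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))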

04

Top-k Sampling

Definition

Zeros out all but the k highest-probability tokens, then samples from the normalized remainder.

Purpose

k=50 is a common default. Too small causes repetition; too large allows incoherent low-probability tokens.
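
A sketch of the filtering step; downstream code would softmax and sample from the surviving logits (k = 50 mirrors the default above):

import numpy as np

def top_k_filter(logits: np.ndarray, k: int = 50) -> np.ndarray:
    # Keep the k highest logits; mask everything else to -inf.
    k = min(k, logits.size)
    out = np.full(logits.shape, -np.inf)
    top = np.argpartition(logits, -k)[-k:]
    out[top] = logits[top]
    return out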

05

Top-p (Nucleus) Sampling

Definition

Constructs the smallest token set whose cumulative probability ≥ p, then samples from it.

Purpose

Self-calibrates vocabulary size to the model's local confidence. Generally preferred over fixed top-k.
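
A NumPy sketch of the nucleus construction (p = 0.9 is an illustrative default):

import numpy as np

def top_p_filter(logits: np.ndarray, p: float = 0.9) -> np.ndarray:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]              # most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # include the token that crosses p
    out = np.full(logits.shape, -np.inf)
    keep = order[:cutoff]
    out[keep] = logits[keep]
    return out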

06

Min-p Sampling

Definition

Filters tokens whose probability is below p_min × p_top (the top token's probability).

Purpose

Adaptive and elegant: nearly greedy at high-confidence steps, diverse at uncertain steps. Smooth tradeoff.
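
A NumPy sketch of the adaptive threshold (p_min = 0.05 is an illustrative default):

import numpy as np

def min_p_filter(logits: np.ndarray, p_min: float = 0.05) -> np.ndarray:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    threshold = p_min * probs.max()  # cutoff scales with the top token's confidence
    return np.where(probs >= threshold, logits, -np.inf)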

07

Mirostat

Definition

Controls output quality with a feedback loop: at each step it adjusts how aggressively the distribution is truncated so that observed surprise (perplexity) tracks a target value τ in real time.

Purpose

Provides a direct knob for target perplexity, rather than indirect proxies like temperature or top-k, so output quality stays consistent across contexts.
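
A rough NumPy sketch of a Mirostat-v2-style step, following the feedback idea above; the learning rate, target τ, and update rule are illustrative rather than the paper's exact formulation:

import numpy as np

def mirostat_v2_step(logits: np.ndarray, mu: float, tau: float = 5.0,
                     lr: float = 0.1, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    surprise = -np.log2(probs)        # per-token surprise in bits
    keep = surprise <= mu             # truncate overly surprising tokens
    if not keep.any():
        keep[np.argmax(probs)] = True # always keep the top token
    trunc = np.where(keep, probs, 0.0)
    trunc /= trunc.sum()
    tok = int(rng.choice(len(trunc), p=trunc))
    mu -= lr * (surprise[tok] - tau)  # feedback: steer surprise toward tau
    return tok, mu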

08

Repetition Penalty

Definition

Divides the logit of any previously generated token by a penalty scalar greater than 1.0 (multiplying negative logits instead, so seen tokens always become less likely).

Purpose

Directly reduces loops. Too aggressive causes the model to avoid necessary common words like pronouns.
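
A NumPy sketch of the adjustment (penalty = 1.2 is an illustrative default):

import numpy as np

def repetition_penalty(logits: np.ndarray, generated: list[int],
                       penalty: float = 1.2) -> np.ndarray:
    out = logits.astype(float).copy()
    for tok in set(generated):
        # Divide positive logits, multiply negative ones, so seen
        # tokens lose probability regardless of sign.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out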

09

Frequency Penalty

Definition

Reduces the logit of each token by its generation count times a penalty factor.

Purpose

Proportional penalty encourages diverse vocabulary; less blunt than repetition penalty for long outputs.
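
A NumPy sketch of the count-proportional adjustment (alpha is an illustrative coefficient):

import numpy as np
from collections import Counter

def frequency_penalty(logits: np.ndarray, generated: list[int],
                      alpha: float = 0.5) -> np.ndarray:
    out = logits.astype(float).copy()
    for tok, count in Counter(generated).items():
        out[tok] -= alpha * count  # penalty grows with each repetition
    return out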

10

Presence Penalty

Definition

Reduces the logit of each token by a fixed amount if it has appeared at least once in the output.

Purpose

Promotes topic shifts and new concepts. Weaker than frequency penalty for heavily repeated tokens.
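
A NumPy sketch of the flat adjustment (beta is an illustrative coefficient):

import numpy as np

def presence_penalty(logits: np.ndarray, generated: list[int],
                     beta: float = 0.6) -> np.ndarray:
    out = logits.astype(float).copy()
    out[list(set(generated))] -= beta  # same penalty whether seen once or often
    return out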

11

Logit Bias

Definition

Manually increments or decrements the raw logit for specific token IDs before softmax is applied.

Purpose

Used to ban offensive tokens, force JSON brackets, or guide constrained generation in production systems.
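
A NumPy sketch of the adjustment; the bias mapping is a hypothetical input keyed by token ID:

import numpy as np

def apply_logit_bias(logits: np.ndarray, bias: dict[int, float]) -> np.ndarray:
    out = logits.astype(float).copy()
    for token_id, b in bias.items():
        out[token_id] += b  # a large negative value effectively bans a token
    return out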

12

Constrained Decoding

Definition

Masks invalid tokens at each step using a finite state machine or context-free grammar parser.

Purpose

Guarantees structurally valid JSON/SQL/YAML output without post-processing. Essential for tool-calling.
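
A sketch of the masking step; in practice allowed_ids would come from a grammar engine or FSM compiled over the tokenizer vocabulary, and here is just a hypothetical input:

import numpy as np

def constrained_step(logits: np.ndarray, allowed_ids: list[int]) -> int:
    out = np.full(logits.shape, -np.inf)  # every invalid token is masked out
    out[allowed_ids] = logits[allowed_ids]
    return int(np.argmax(out))            # or sample from the masked distribution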

13

CFG (Classifier-Free Guidance)

Definition

Runs conditional and unconditional passes; extrapolates: logits = uncond + s×(cond - uncond).

Purpose

Sharpens prompt adherence at the cost of 2× compute per step. Borrowed from image generation models.
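
A NumPy sketch of the combination formula above (s = 1.5 is an illustrative guidance scale):

import numpy as np

def cfg_logits(cond: np.ndarray, uncond: np.ndarray, s: float = 1.5) -> np.ndarray:
    # Extrapolate past the conditional logits; s = 1 recovers them exactly.
    return uncond + s * (cond - uncond)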

14

Epsilon Sampling

Definition

Filters out tokens with probability below an absolute threshold ε (e.g., 0.0001) before sampling.

Purpose

Removes statistically improbable tokens without the sharp cutoff of top-k; complements top-p well.
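
A NumPy sketch of the absolute floor (eps matches the example value above):

import numpy as np

def epsilon_filter(logits: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return np.where(probs >= eps, logits, -np.inf)  # absolute probability floor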

15

Typical Sampling

Definition

Keeps tokens whose information content (−log p) is closest to the distribution's entropy, rather than simply the highest-probability tokens.

Purpose

Produces more human-like output by avoiding both too-obvious and too-surprising next tokens.
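
A sketch in the spirit of locally typical sampling: rank tokens by how far their information content sits from the entropy, then keep a cumulative probability mass (the mass cutoff is an illustrative default):

import numpy as np

def typical_filter(logits: np.ndarray, mass: float = 0.9) -> np.ndarray:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    info = -np.log(probs + 1e-12)               # information content per token
    entropy = float((probs * info).sum())
    order = np.argsort(np.abs(info - entropy))  # most "typical" tokens first
    cutoff = np.searchsorted(np.cumsum(probs[order]), mass) + 1
    out = np.full(logits.shape, -np.inf)
    out[order[:cutoff]] = logits[order[:cutoff]]
    return out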

16

Contrastive Decoding

Definition

Boosts tokens the large expert model favors over a small amateur model at each step.

Purpose

Reduces hallucination and improves factual accuracy without a reward model; the only extra cost is running the small amateur model alongside.
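
A sketch of the scoring rule in the spirit of Li et al.'s contrastive decoding, with a plausibility mask so the amateur cannot promote tokens the expert itself rejects; alpha and the epsilon constants are illustrative:

import numpy as np

def contrastive_scores(expert_logits: np.ndarray, amateur_logits: np.ndarray,
                       alpha: float = 0.1) -> np.ndarray:
    e = np.exp(expert_logits - expert_logits.max()); e /= e.sum()
    a = np.exp(amateur_logits - amateur_logits.max()); a /= a.sum()
    plausible = e >= alpha * e.max()  # only tokens the expert finds credible
    scores = np.log(e + 1e-12) - np.log(a + 1e-12)
    return np.where(plausible, scores, -np.inf)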
