API
Definition
An application programming interface that lets software systems exchange requests, responses, data, or actions.
MLOps, LLMOps & Observability terms and explanations from the Agentic AI Glossary.
Definition
An API gateway is a single entry point that routes client requests to backend services and enforces cross-cutting policies such as authentication, rate limiting, and logging.
Definition
Authentication is the process of verifying the identity of a user, service, or system, typically with credentials such as passwords, API keys, or signed tokens.
Definition
Authorization determines what an authenticated user or service is allowed to do, such as which resources it can read, write, or manage.
Definition
A background job is work executed outside the request-response cycle, such as sending emails, processing files, or calling slow external services.
Definition
A Python task queue used to run background jobs, scheduled work, and asynchronous processing.
Definition
A circuit breaker is a pattern that stops calls to a failing dependency after repeated errors, letting it recover instead of being hammered by retries.
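As a minimal sketch (the class name, thresholds, and error handling here are illustrative, not a standard implementation), a circuit breaker might look like:

```python
import time

class CircuitBreaker:
    """Open the circuit after max_failures consecutive errors;
    reject calls until reset_after seconds have passed."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open; call rejected")
            # Half-open: the cooldown elapsed, allow one trial call.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a dependency that is already down.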
Definition
A cron job is a task scheduled to run automatically at fixed times or intervals, such as nightly data syncs or hourly cleanups.
Definition
A full-featured Python web framework used to build database-backed web applications and admin systems.
Definition
A Node.js web framework used to build APIs, web servers, and middleware services.
Definition
A Python web framework often used to build fast, typed APIs for AI applications and model services.
Definition
A lightweight Python web framework commonly used for small APIs, prototypes, and internal AI tools.
Definition
An API query language that lets clients request exactly the fields they need from a service.
Definition
A high-performance API protocol that uses typed service definitions and binary messages for fast system-to-system communication.
Definition
A signed JSON Web Token used to prove identity or authorization claims between services.
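A JWT is three base64url-encoded parts (header, payload, signature) joined by dots. As a sketch of the HS256 variant using only the standard library (a real service would normally use a vetted JWT library and also check claims such as expiry, which this omits):

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    # base64url without padding, as JWTs use.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> dict:
    header, body, sig = token.split(".")
    expected = b64url(
        hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    )
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

Because the signature covers the header and payload, any tampering with the claims invalidates the token.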
Definition
A distributed event streaming platform used to move high-volume messages between systems.
Definition
A message broker is middleware that receives messages from producers and delivers them to consumers, decoupling services that communicate asynchronously.
Definition
A JavaScript runtime used to build server-side applications, APIs, real-time services, and tool backends.
Definition
An authorization standard that lets applications access resources on behalf of a user without sharing the user's password.
Definition
A queue is a data structure or service that holds messages or tasks until a consumer processes them, typically in first-in, first-out order.
Definition
A message broker used to route tasks or events through queues between applications.
Definition
Rate limiting restricts how many requests a client can make in a time window, protecting services from overload and abuse.
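One common rate-limiting algorithm is the token bucket: each request spends a token, and tokens refill at a steady rate up to a burst capacity. A minimal sketch (names and parameters are illustrative):

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A server would typically keep one bucket per client (for example, per API key) and reject requests with HTTP 429 when `allow()` returns False.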
Definition
Redis Queue (RQ) is a simple Python library for queueing jobs in Redis and processing them with background workers.
Definition
A request timeout is the maximum time a caller waits for a response before aborting, preventing slow dependencies from blocking the whole system.
Definition
An API style that uses standard HTTP methods such as GET, POST, PUT, and DELETE to access resources.
Definition
A retry policy defines when and how failed operations are retried, typically with a maximum attempt count and increasing delays between attempts.
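A common retry policy is exponential backoff: double the delay after each failure so a struggling dependency gets breathing room. A sketch (the helper name and defaults are illustrative; production code usually also adds jitter and retries only transient errors):

```python
import time

def retry(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on exception with exponential backoff:
    delays of base_delay, 2*base_delay, 4*base_delay, ..."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

The injectable `sleep` makes the policy easy to test without real waiting.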
Definition
A web streaming method where a server pushes one-way updates to a browser or client over HTTP.
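On the wire, each server-sent event is a few `field: value` lines ending with a blank line. A small formatter for that frame format (the function name is illustrative):

```python
def sse_event(data, event=None):
    """Format one server-sent event frame: an optional `event:` line,
    one `data:` line per line of payload, and a blank-line terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"
```

An HTTP handler streaming LLM output would yield one such frame per chunk with `Content-Type: text/event-stream`.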
Definition
A Java framework that simplifies building production-ready APIs, services, and enterprise applications.
Definition
A streaming response sends output to the client incrementally as it is produced instead of waiting for the full result, which is how LLM token output is usually delivered.
Definition
A durable workflow platform used to run long-lived processes with retries, state, and recovery.
Definition
An HTTP callback that lets one system notify another system when an event occurs.
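Webhook receivers usually verify an HMAC signature over the raw request body so that only the real sender can trigger the callback. A sketch with the standard library (the header name and exact scheme vary by provider; this is illustrative):

```python
import hashlib, hmac

def sign_payload(secret: bytes, body: bytes) -> str:
    # The sender computes this and ships it in a header.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_webhook(secret: bytes, body: bytes, signature: str) -> bool:
    # compare_digest avoids timing side channels when checking signatures.
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)
```

A request whose body was altered in transit, or signed with the wrong secret, fails verification.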
Definition
A persistent two-way connection that allows servers and clients to exchange messages in real time.
Definition
A worker is a process that pulls tasks from a queue and executes them in the background, separate from the servers handling user requests.
Definition
API design is the process of defining an API's endpoints, data formats, error handling, and versioning so clients can use it predictably.
Definition
The high-level structure of components, data flows, models, tools, storage, and controls in a system.
Definition
The percentage of time an AI service is usable and able to respond within expected reliability targets.
Definition
Using specialized tools for each layer instead of one integrated platform.
Definition
A caching strategy defines what to cache, where to store it, how long entries live, and when they are invalidated.
Definition
The order of decisions and actions that determines which model, tool, branch, or human approval path runs next.
Definition
A cost budget sets a spending limit or target for a system, guiding trade-offs such as model choice, caching, and context size.
Definition
The path data follows through ingestion, retrieval, model calls, tools, storage, logging, and output.
Definition
Database design is the process of choosing data models, schemas, indexes, and storage engines to match access patterns and scale requirements.
Definition
Deployment design covers how software is packaged, released, scaled, and rolled back across environments.
Definition
Built to satisfy enterprise expectations for security, governance, scale, support, compliance, and reliability.
Definition
Evaluation design defines the datasets, metrics, judges, and pass criteria used to measure whether an AI system behaves as intended.
Definition
A review of possible failure points and their impact so teams can design tests, guardrails, and recovery paths.
Definition
A behavior the system must provide, such as answering questions, calling tools, creating tickets, or generating reports.
Definition
Guardrail design specifies the input, output, and action checks that keep an AI system within safety, policy, and quality boundaries.
Definition
High-level design describes a system's major components, responsibilities, and data flows without implementation detail.
Definition
A latency budget sets a response-time target for a request and allocates portions of it to steps such as retrieval, model calls, and tool execution.
Definition
Low-level design specifies the internal details of components, such as classes, interfaces, schemas, and algorithms, ready for implementation.
Definition
How easy the system is to update, debug, test, document, and operate as requirements change.
Definition
Memory design defines what an agent stores across turns or sessions, how it is retrieved, and when it is summarized or discarded.
Definition
A quality constraint such as latency, availability, security, privacy, cost, scalability, or maintainability.
Definition
Observability design plans the logs, traces, metrics, and dashboards a system will need for debugging and monitoring before it is built.
Definition
Using an integrated platform for knowledge, actions, orchestration, monitoring, and governance rather than many disconnected tools.
Definition
A platform strategy is a team's deliberate choice of platforms, vendors, and build-versus-buy trade-offs for its AI stack.
Definition
Proof of concept: a small validation project before larger rollout.
Definition
Design practices that limit collection, exposure, retention, and misuse of personal or sensitive data.
Definition
The level of reliability, security, observability, scalability, and support needed for live use.
Definition
A limited implementation used to validate feasibility, value, risk, and stakeholder confidence.
Definition
Queue design covers choices such as message ordering, delivery guarantees, retries, dead-letter handling, and consumer scaling.
Definition
RAG design covers the chunking, embedding, indexing, retrieval, and prompt-assembly choices in a retrieval-augmented generation system.
Definition
The process of discovering user goals, constraints, data sources, risks, and success metrics before designing a system.
Definition
A planned evolution of capabilities, milestones, and product direction.
Definition
Tool design defines the interfaces, schemas, permissions, and error handling for the tools an agent can call.
Definition
Comparing design choices such as speed versus accuracy, cost versus quality, or autonomy versus safety.
Definition
Risk that a product architecture becomes difficult to migrate away from due to proprietary dependencies.
Definition
An AWS service for building generative AI applications with managed foundation models and enterprise controls.
Definition
The API used to access Claude models and related capabilities such as tool use and structured outputs.
Definition
Automatically adding or removing compute capacity based on traffic, queue depth, latency, or resource usage.
Definition
A serverless compute service that runs code in response to events without managing servers.
Definition
Microsoft's platform for building, evaluating, deploying, and managing AI applications and agents.
Definition
Microsoft cloud object storage for files, documents, images, logs, and other unstructured data.
Definition
A serverless compute service in Azure for running event-driven code.
Definition
A portable package that bundles application code, dependencies, and configuration so it runs consistently across environments.
Definition
NVIDIA's parallel computing platform that lets software use GPUs for accelerated model training and inference.
Definition
The process of releasing application, model, or agent changes into an environment where users or systems can access them.
Definition
A container platform commonly used to package AI services, APIs, workers, and supporting infrastructure.
Definition
External configuration values used to set API keys, endpoints, feature flags, and runtime settings without changing code.
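In Python this typically means reading `os.environ` with explicit defaults (the variable names below are illustrative, not a convention; the assignment only simulates what a shell or deploy system would normally set):

```python
import os

# Simulate a value that the deploy environment would normally provide.
os.environ["MODEL_NAME"] = "demo-model"

# Read configuration with explicit fallbacks instead of hard-coding.
model = os.getenv("MODEL_NAME", "fallback-model")
timeout = float(os.getenv("REQUEST_TIMEOUT_S", "30"))
```

Keeping such values out of code lets the same image run in dev, staging, and production with different settings.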
Definition
Google Cloud Storage, an object storage service for files, model artifacts, datasets, and logs.
Definition
A managed serverless platform for running containerized services that scale based on request traffic.
Definition
A graphics processing unit used to accelerate matrix operations for model training and inference.
Definition
A Kubernetes package manager used to install and configure applications with reusable charts.
Definition
A versioned container template containing code, dependencies, and runtime instructions used to start containers.
Definition
Managing cloud resources with versioned configuration files so environments can be reviewed, reproduced, and automated.
Definition
A container orchestration system for deploying, scaling, and managing services across clusters.
Definition
A component that distributes traffic across service instances to improve availability, throughput, and fault tolerance.
Definition
Hosting a model behind an API or inference endpoint so applications can send inputs and receive predictions.
Definition
Durable storage for large files such as documents, logs, datasets, embeddings, exports, and model artifacts.
Definition
A local runtime for downloading and running open-source language models on a developer machine.
Definition
The API used to access OpenAI models, embeddings, tools, structured outputs, and agent capabilities.
Definition
The smallest deployable unit in Kubernetes, usually containing one or more containers that run together.
Definition
Amazon Simple Storage Service, an object store often used for datasets, logs, artifacts, and document repositories.
Definition
A secure service for storing and rotating API keys, database passwords, tokens, and certificates.
Definition
A deployment model where the cloud provider manages servers and automatically scales execution for events or requests.
Definition
A network-accessible application component, often exposing an API for agents, tools, users, or other systems.
Definition
An infrastructure-as-code tool used to define and provision cloud resources through configuration files.
Definition
Text Generation Inference, a server for deploying and serving transformer-based language models.
Definition
A tensor processing unit, Google's accelerator hardware for machine learning workloads.
Definition
Google Cloud's managed platform for building, training, deploying, and monitoring ML and generative AI systems.
Definition
A high-throughput inference engine for serving large language models efficiently.
Definition
Running work without blocking the main request, useful for slow tools, background tasks, and parallel operations.
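With `asyncio`, independent slow calls can overlap instead of running back to back. A sketch where the "tools" are stand-in coroutines (names and delays are illustrative):

```python
import asyncio

async def call_tool(name: str, delay: float) -> str:
    # Stand-in for a slow tool or API call.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_parallel():
    # Both calls run concurrently, so total time is roughly the
    # slowest one, not the sum of both delays.
    return await asyncio.gather(
        call_tool("search", 0.05),
        call_tool("summarize", 0.05),
    )

results = asyncio.run(run_parallel())
```

`asyncio.gather` preserves argument order in its result list, which keeps downstream handling simple.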
Definition
Grouping multiple requests or tasks together to improve throughput and reduce overhead.
Definition
Saving reusable results, prompts, embeddings, retrieval outputs, or responses to reduce latency and cost.
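A response cache can be as simple as a dictionary keyed by a hash of the request. A sketch (the class and the injected `generate` callable are illustrative; real caches also need eviction and TTLs):

```python
import hashlib

class ResponseCache:
    """Cache results keyed by a hash of (model, prompt); `generate`
    is whatever expensive call you want to avoid repeating."""

    def __init__(self, generate):
        self.generate = generate
        self.store = {}
        self.hits = 0

    def get(self, model: str, prompt: str):
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        result = self.generate(model, prompt)
        self.store[key] = result
        return result
```

Identical requests after the first one return instantly and cost nothing in model usage.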
Definition
Reducing model, infrastructure, token, storage, and operations cost while preserving quality.
Definition
The average spend for one user or API request, including model usage, retrieval, tools, and infrastructure.
Definition
The average AI system cost attributable to one user over a period, useful for pricing and capacity planning.
Definition
Training a smaller model to imitate a larger model so inference becomes cheaper or faster.
Definition
The compute and provider cost required to run a model and generate outputs after training is complete.
Definition
Routing difficult, risky, or low-confidence tasks from a smaller model to a stronger model.
Definition
Temporarily rejecting or delaying lower-priority work to protect system stability during overload.
Definition
Running independent steps at the same time to reduce total task latency.
Definition
A prompt cache stores previously processed prompt prefixes or results so repeated prompts can reuse earlier work, cutting latency and token cost.
Definition
A response cache stores complete model responses keyed by the request so identical or near-identical requests can be answered without a new model call.
Definition
A retrieval cache stores the documents or chunks returned for a query so repeated or similar queries can skip the retrieval step.
Definition
Sending simple or low-risk tasks to cheaper smaller models while reserving larger models for harder cases.
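A router can start as a plain function applying cheap heuristics before any model is called. A sketch (the tier names and thresholds are made up for illustration; real routers often use classifiers or confidence scores):

```python
def route_model(prompt: str, needs_tools: bool = False) -> str:
    """Pick a model tier from simple request features."""
    if needs_tools or len(prompt) > 2000:
        return "large-model"   # complex or tool-using requests
    if len(prompt) > 300:
        return "medium-model"  # moderately long requests
    return "small-model"       # short, simple requests
```

Even crude routing like this can shift the bulk of traffic onto cheaper models while keeping hard cases on the strongest one.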
Definition
Delivering output incrementally as it is generated, improving perceived responsiveness for users.
Definition
The price associated with processing input and output tokens during model inference.
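Since providers usually quote prices per million tokens, the cost of one call is simple arithmetic (the prices below are made-up illustrations, not any provider's actual rates):

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one call given per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# e.g. 12k input tokens and 800 output tokens at $3 / $15 per million:
cost = request_cost(12_000, 800, in_price_per_m=3.00, out_price_per_m=15.00)
```

Output tokens are typically priced several times higher than input tokens, which is why limiting response length matters for cost control.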
Definition
Controlling prompt size, retrieved context, output length, and caching to manage cost and latency.
Definition
A throughput metric showing how quickly a model produces or processes tokens.
Definition
Testing different model, prompt, workflow, or UX variants against real or representative traffic.
Definition
An AI audit trail is a durable record of model versions, prompts, decisions, and approvals that supports compliance review and incident investigation.
Definition
Batch inference runs a model over a large set of inputs on a schedule or on demand, rather than serving one request at a time.
Definition
Blue-green deployment runs two identical environments and switches all traffic from the old version to the new one, enabling near-instant rollback.
Definition
A canary deployment releases a new version to a small slice of traffic first, expanding the rollout only if quality and health metrics stay within bounds.
Definition
CI/CD for ML automates testing, building, and deploying models and pipelines, extending software CI/CD with data validation and model evaluation gates.
Definition
Concept drift occurs when the relationship between inputs and correct outputs changes over time, degrading a model trained on older patterns.
Definition
Ongoing testing of models, prompts, retrieval, and agents as data, users, and behavior change over time.
Definition
Continuous monitoring watches production metrics, quality signals, and costs over time so regressions are detected soon after they appear.
Definition
A data card documents a dataset's sources, composition, intended uses, and limitations for transparency and governance.
Definition
Data drift occurs when the distribution of production inputs shifts away from the data a model was trained or evaluated on.
Definition
Dataset versioning tracks snapshots of training and evaluation data so experiments and deployed models can be reproduced and audited.
Definition
The ability to identify drift signals in inputs, outputs, logs, retrieved content, or system behavior.
Definition
Experiment tracking records the parameters, code versions, datasets, and metrics of training or evaluation runs so results can be compared and reproduced.
Definition
A storage system for feature data that an AI application can save, query, or retrieve during execution.
Definition
An inference endpoint is a network-accessible URL where a deployed model accepts inputs and returns predictions.
Definition
LLMOps is the practice of operating LLM applications in production, covering prompt and model versioning, evaluation, monitoring, cost control, and governance.
Definition
MLOps is the practice of reliably building, deploying, and operating machine learning systems, combining ML workflows with DevOps discipline.
Definition
A model card documents a model's intended use, training data, evaluation results, and limitations.
Definition
Model drift is the degradation of a deployed model's performance over time as data, users, or the world change.
Definition
Model governance is the set of policies, reviews, and controls that determine how models are approved, deployed, monitored, and retired.
Definition
Model monitoring tracks a deployed model's inputs, outputs, quality, latency, and errors to detect problems in production.
Definition
A model registry is a central catalog of trained models with their versions, metadata, lineage, and deployment stage.
Definition
Model versioning assigns identifiers to each trained model artifact so deployments can be traced, compared, and rolled back.
Definition
Testing AI behavior on saved datasets or scenarios before release, without affecting live users.
Definition
Online inference serves predictions in real time in response to individual requests, typically under strict latency requirements.
Definition
A rollback reverts a deployment to a previous known-good version of a model, prompt, or application after a problem is detected.
Definition
A shadow deployment sends a copy of live traffic to a new version whose outputs are logged but never returned to users, allowing safe comparison.
Definition
A record of the agent's steps, tool calls, observations, and decisions during execution.
Definition
Arize Phoenix is an open-source observability tool for tracing, evaluating, and debugging LLM applications.
Definition
An audit log is an append-only record of who did what and when, used for security review and compliance.
Definition
Durable records of prompts, decisions, tool calls, data access, outputs, and human approvals.
Definition
Cost monitoring tracks spend on model calls, tokens, infrastructure, and tools so teams can spot spikes and attribute costs.
Definition
The percentage of requests, tool calls, or workflow steps that fail or return invalid results.
Definition
An evaluation dashboard displays quality metrics, regression trends, and failure examples across model or prompt versions.
Definition
A mechanism for using user feedback, human review, logs, and metrics to improve future behavior.
Definition
Humanloop is a platform for managing, evaluating, and monitoring prompts and LLM applications.
Definition
LangSmith is LangChain's platform for tracing, debugging, evaluating, and monitoring LLM applications.
Definition
The elapsed time between a request and an agent response or action.
Definition
Logging is the practice of recording events, errors, and context from a running system for debugging and analysis.
Definition
Metrics are numeric measurements, such as latency, error rate, and token usage, collected over time to track system health.
Definition
Visibility into prompts, model calls, retrieval, tool calls, traces, metrics, errors, and outcomes.
Definition
OpenTelemetry is an open standard and SDK for collecting traces, metrics, and logs in a vendor-neutral way.
Definition
A record of prompt templates, context, model parameters, and generated outputs for debugging.
Definition
A recorded timeline of retrieval activity that helps engineers debug behavior and understand execution history.
Definition
Retry count is the number of times an operation was retried, a useful signal for flaky tools or unstable dependencies.
Definition
A session ID is a unique identifier that groups all requests, traces, and logs belonging to one user session or conversation.
Definition
Session replay reconstructs a user session or agent run step by step from recorded events so engineers can see exactly what happened.
Definition
A timed unit of work inside a trace, such as retrieval, model call, or tool execution.
Definition
Operational data emitted by systems for monitoring, debugging, evaluation, and cost analysis.
Definition
The amount of work an AI system can process over a period of time.
Definition
Time to first token is the delay between sending a request and receiving the first token of a streamed response, a key perceived-latency metric.
Definition
Tool monitoring tracks the success rate, latency, errors, and usage of the tools an agent calls.
Definition
A record of tool invocations, inputs, outputs, latency, errors, and retries.
Definition
A unique identifier that connects logs and spans for one request or workflow.
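In practice this means generating one ID per request and attaching it to every log line. A sketch (real tracing systems use specific formats such as W3C `traceparent`; a random UUID is enough to illustrate correlation, and the helper names are made up):

```python
import json, uuid

def new_trace_id() -> str:
    # One random ID per incoming request or workflow run.
    return uuid.uuid4().hex

def log_event(trace_id: str, step: str, **fields) -> str:
    # Every log line carries the trace ID so one request's steps
    # can be stitched back together later.
    return json.dumps({"trace_id": trace_id, "step": step, **fields})

tid = new_trace_id()
line1 = log_event(tid, "retrieval", docs=3)
line2 = log_event(tid, "model_call", tokens=512)
```

Searching logs for one `trace_id` then returns the full story of a single request across services.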
Definition
Tracing records the path of a request through a system as a tree of timed spans, showing where time is spent and where failures occur.
Definition
TruLens is an open-source library for evaluating and tracking LLM applications using feedback functions.
Definition
User feedback is signal collected from users, such as ratings, corrections, or reports, used to evaluate and improve system behavior.
Definition
Weights & Biases is a platform for experiment tracking, model management, and evaluation of ML and LLM projects.