API
Definition
An application programming interface that lets software systems exchange requests, responses, data, or actions.
MLOps, LLMOps & Observability terms and explanations from the Agentic AI Glossary.
Definition
An API gateway is a single entry point that routes client requests to backend services and enforces cross-cutting policies such as authentication, rate limiting, and logging.
Definition
Authentication is the process of verifying the identity of a user, service, or system, typically with credentials such as passwords, API keys, or signed tokens.
Definition
Authorization determines what an authenticated user or service is allowed to do, such as which resources it can read, write, or manage.
Definition
A background job is work executed outside the request-response cycle, such as sending emails, processing files, or calling slow external services.
Definition
A Python task queue used to run background jobs, scheduled work, and asynchronous processing.
Definition
A circuit breaker is a pattern that stops calls to a failing dependency after repeated errors, letting it recover instead of being hammered by retries.
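As a minimal sketch (the class name, thresholds, and error handling here are illustrative, not a standard implementation), a circuit breaker might look like:

```python
import time

class CircuitBreaker:
    """Open the circuit after max_failures consecutive errors;
    reject calls until reset_after seconds have passed."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open; call rejected")
            # Half-open: the cooldown elapsed, allow one trial call.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a dependency that is already down.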
Definition
A cron job is a task scheduled to run automatically at fixed times or intervals, such as nightly data syncs or hourly cleanups.
Definition
A full-featured Python web framework used to build database-backed web applications and admin systems.
Definition
A Node.js web framework used to build APIs, web servers, and middleware services.
Definition
A Python web framework often used to build fast, typed APIs for AI applications and model services.
Definition
A lightweight Python web framework commonly used for small APIs, prototypes, and internal AI tools.
Definition
An API query language that lets clients request exactly the fields they need from a service.
Definition
A high-performance API protocol that uses typed service definitions and binary messages for fast system-to-system communication.
Definition
A signed JSON Web Token used to prove identity or authorization claims between services.
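A JWT is three base64url-encoded parts (header, payload, signature) joined by dots. As a sketch of the HS256 variant using only the standard library (a real service would normally use a vetted JWT library and also check claims such as expiry, which this omits):

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    # base64url without padding, as JWTs use.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> dict:
    header, body, sig = token.split(".")
    expected = b64url(
        hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    )
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

Because the signature covers the header and payload, any tampering with the claims invalidates the token.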
Definition
A distributed event streaming platform used to move high-volume messages between systems.
Definition
A message broker is middleware that receives messages from producers and delivers them to consumers, decoupling services that communicate asynchronously.
Definition
A JavaScript runtime used to build server-side applications, APIs, real-time services, and tool backends.
Definition
An authorization standard that lets applications access resources on behalf of a user without sharing the user's password.
Definition
A queue is a data structure or service that holds messages or tasks until a consumer processes them, typically in first-in, first-out order.
Definition
A message broker used to route tasks or events through queues between applications.
Definition
Rate limiting restricts how many requests a client can make in a time window, protecting services from overload and abuse.
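One common rate-limiting algorithm is the token bucket: each request spends a token, and tokens refill at a steady rate up to a burst capacity. A minimal sketch (names and parameters are illustrative):

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A server would typically keep one bucket per client (for example, per API key) and reject requests with HTTP 429 when `allow()` returns False.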
Definition
Redis Queue (RQ) is a simple Python library for queueing jobs in Redis and processing them with background workers.
Definition
A request timeout is the maximum time a caller waits for a response before aborting, preventing slow dependencies from blocking the whole system.
Definition
An API style that uses standard HTTP methods such as GET, POST, PUT, and DELETE to access resources.
Definition
A retry policy defines when and how failed operations are retried, typically with a maximum attempt count and increasing delays between attempts.
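A common retry policy is exponential backoff: double the delay after each failure so a struggling dependency gets breathing room. A sketch (the helper name and defaults are illustrative; production code usually also adds jitter and retries only transient errors):

```python
import time

def retry(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on exception with exponential backoff:
    delays of base_delay, 2*base_delay, 4*base_delay, ..."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

The injectable `sleep` makes the policy easy to test without real waiting.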
Definition
A web streaming method where a server pushes one-way updates to a browser or client over HTTP.
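On the wire, each server-sent event is a few `field: value` lines ending with a blank line. A small formatter for that frame format (the function name is illustrative):

```python
def sse_event(data, event=None):
    """Format one server-sent event frame: an optional `event:` line,
    one `data:` line per line of payload, and a blank-line terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"
```

An HTTP handler streaming LLM output would yield one such frame per chunk with `Content-Type: text/event-stream`.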
Definition
A Java framework that simplifies building production-ready APIs, services, and enterprise applications.
Definition
A streaming response sends output to the client incrementally as it is produced instead of waiting for the full result, which is how LLM token output is usually delivered.
Definition
A durable workflow platform used to run long-lived processes with retries, state, and recovery.
Definition
An HTTP callback that lets one system notify another system when an event occurs.
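Webhook receivers usually verify an HMAC signature over the raw request body so that only the real sender can trigger the callback. A sketch with the standard library (the header name and exact scheme vary by provider; this is illustrative):

```python
import hashlib, hmac

def sign_payload(secret: bytes, body: bytes) -> str:
    # The sender computes this and ships it in a header.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_webhook(secret: bytes, body: bytes, signature: str) -> bool:
    # compare_digest avoids timing side channels when checking signatures.
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)
```

A request whose body was altered in transit, or signed with the wrong secret, fails verification.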
Definition
A persistent two-way connection that allows servers and clients to exchange messages in real time.
Definition
A worker is a process that pulls tasks from a queue and executes them in the background, separate from the servers handling user requests.
Definition
API design is the process of defining an API's endpoints, data formats, error handling, and versioning so clients can use it predictably.
Definition
The high-level structure of components, data flows, models, tools, storage, and controls in a system.
Definition
The percentage of time an AI service is usable and able to respond within expected reliability targets.
Definition
Using specialized tools for each layer instead of one integrated platform.
Definition
A caching strategy defines what to cache, where to store it, how long entries live, and when they are invalidated.
Definition
The order of decisions and actions that determines which model, tool, branch, or human approval path runs next.
Definition
A cost budget sets a spending limit or target for a system, guiding trade-offs such as model choice, caching, and context size.
Definition
The path data follows through ingestion, retrieval, model calls, tools, storage, logging, and output.
Definition
Database design is the process of choosing data models, schemas, indexes, and storage engines to match access patterns and scale requirements.
Definition
Deployment design covers how software is packaged, released, scaled, and rolled back across environments.
Definition
Built to satisfy enterprise expectations for security, governance, scale, support, compliance, and reliability.
Definition
Evaluation design defines the datasets, metrics, judges, and pass criteria used to measure whether an AI system behaves as intended.
Definition
A review of possible failure points and their impact so teams can design tests, guardrails, and recovery paths.
Definition
A behavior the system must provide, such as answering questions, calling tools, creating tickets, or generating reports.
Definition
Guardrail design specifies the input, output, and action checks that keep an AI system within safety, policy, and quality boundaries.
Definition
High-level design describes a system's major components, responsibilities, and data flows without implementation detail.
Definition
A latency budget sets a response-time target for a request and allocates portions of it to steps such as retrieval, model calls, and tool execution.
Definition
Low-level design specifies the internal details of components, such as classes, interfaces, schemas, and algorithms, ready for implementation.
Definition
How easy the system is to update, debug, test, document, and operate as requirements change.
Definition
Memory design defines what an agent stores across turns or sessions, how it is retrieved, and when it is summarized or discarded.
Definition
A quality constraint such as latency, availability, security, privacy, cost, scalability, or maintainability.
Definition
Observability design plans the logs, traces, metrics, and dashboards a system will need for debugging and monitoring before it is built.
Definition
Using an integrated platform for knowledge, actions, orchestration, monitoring, and governance rather than many disconnected tools.
Definition
A platform strategy is a team's deliberate choice of platforms, vendors, and build-versus-buy trade-offs for its AI stack.
Definition
Proof of concept: a small validation project before larger rollout.
Definition
Design practices that limit collection, exposure, retention, and misuse of personal or sensitive data.
Definition
The level of reliability, security, observability, scalability, and support needed for live use.
Definition
A limited implementation used to validate feasibility, value, risk, and stakeholder confidence.
Definition
Queue design covers choices such as message ordering, delivery guarantees, retries, dead-letter handling, and consumer scaling.
Definition
RAG design covers the chunking, embedding, indexing, retrieval, and prompt-assembly choices in a retrieval-augmented generation system.
Definition
The process of discovering user goals, constraints, data sources, risks, and success metrics before designing a system.
Definition
A planned evolution of capabilities, milestones, and product direction.
Definition
Tool design defines the interfaces, schemas, permissions, and error handling for the tools an agent can call.
Definition
Comparing design choices such as speed versus accuracy, cost versus quality, or autonomy versus safety.
Definition
Risk that a product architecture becomes difficult to migrate away from due to proprietary dependencies.
Definition
An AWS service for building generative AI applications with managed foundation models and enterprise controls.
Definition
The API used to access Claude models and related capabilities such as tool use and structured outputs.
Definition
Automatically adding or removing compute capacity based on traffic, queue depth, latency, or resource usage.
Definition
A serverless compute service that runs code in response to events without managing servers.
Definition
Microsoft's platform for building, evaluating, deploying, and managing AI applications and agents.
Definition
Microsoft cloud object storage for files, documents, images, logs, and other unstructured data.
Definition
A serverless compute service in Azure for running event-driven code.
Definition
A portable package that bundles application code, dependencies, and configuration so it runs consistently across environments.
Definition
NVIDIA's parallel computing platform that lets software use GPUs for accelerated model training and inference.
Definition
The process of releasing application, model, or agent changes into an environment where users or systems can access them.
Definition
A container platform commonly used to package AI services, APIs, workers, and supporting infrastructure.
Definition
External configuration values used to set API keys, endpoints, feature flags, and runtime settings without changing code.
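In Python this typically means reading `os.environ` with explicit defaults (the variable names below are illustrative, not a convention; the assignment only simulates what a shell or deploy system would normally set):

```python
import os

# Simulate a value that the deploy environment would normally provide.
os.environ["MODEL_NAME"] = "demo-model"

# Read configuration with explicit fallbacks instead of hard-coding.
model = os.getenv("MODEL_NAME", "fallback-model")
timeout = float(os.getenv("REQUEST_TIMEOUT_S", "30"))
```

Keeping such values out of code lets the same image run in dev, staging, and production with different settings.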
Definition
Google Cloud Storage, an object storage service for files, model artifacts, datasets, and logs.
Definition
A managed serverless platform for running containerized services that scale based on request traffic.
Definition
A graphics processing unit used to accelerate matrix operations for model training and inference.
Definition
A Kubernetes package manager used to install and configure applications with reusable charts.
Definition
A versioned container template containing code, dependencies, and runtime instructions used to start containers.
Definition
Managing cloud resources with versioned configuration files so environments can be reviewed, reproduced, and automated.
Definition
A container orchestration system for deploying, scaling, and managing services across clusters.
Definition
A component that distributes traffic across service instances to improve availability, throughput, and fault tolerance.
Definition
Hosting a model behind an API or inference endpoint so applications can send inputs and receive predictions.
Definition
Durable storage for large files such as documents, logs, datasets, embeddings, exports, and model artifacts.
Definition
A local runtime for downloading and running open-source language models on a developer machine.
Definition
The API used to access OpenAI models, embeddings, tools, structured outputs, and agent capabilities.
Definition
The smallest deployable unit in Kubernetes, usually containing one or more containers that run together.
Definition
Amazon Simple Storage Service, an object store often used for datasets, logs, artifacts, and document repositories.
Definition
A secure service for storing and rotating API keys, database passwords, tokens, and certificates.
Definition
A deployment model where the cloud provider manages servers and automatically scales execution for events or requests.
Definition
A network-accessible application component, often exposing an API for agents, tools, users, or other systems.
Definition
An infrastructure-as-code tool used to define and provision cloud resources through configuration files.
Definition
Text Generation Inference, a server for deploying and serving transformer-based language models.
Definition
A tensor processing unit, Google's accelerator hardware for machine learning workloads.
Definition
Google Cloud's managed platform for building, training, deploying, and monitoring ML and generative AI systems.
Definition
A high-throughput inference engine for serving large language models efficiently.
Definition
Running work without blocking the main request, useful for slow tools, background tasks, and parallel operations.
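With `asyncio`, independent slow calls can overlap instead of running back to back. A sketch where the "tools" are stand-in coroutines (names and delays are illustrative):

```python
import asyncio

async def call_tool(name: str, delay: float) -> str:
    # Stand-in for a slow tool or API call.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_parallel():
    # Both calls run concurrently, so total time is roughly the
    # slowest one, not the sum of both delays.
    return await asyncio.gather(
        call_tool("search", 0.05),
        call_tool("summarize", 0.05),
    )

results = asyncio.run(run_parallel())
```

`asyncio.gather` preserves argument order in its result list, which keeps downstream handling simple.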
Definition
Grouping multiple requests or tasks together to improve throughput and reduce overhead.
Definition
Saving reusable results, prompts, embeddings, retrieval outputs, or responses to reduce latency and cost.
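A response cache can be as simple as a dictionary keyed by a hash of the request. A sketch (the class and the injected `generate` callable are illustrative; real caches also need eviction and TTLs):

```python
import hashlib

class ResponseCache:
    """Cache results keyed by a hash of (model, prompt); `generate`
    is whatever expensive call you want to avoid repeating."""

    def __init__(self, generate):
        self.generate = generate
        self.store = {}
        self.hits = 0

    def get(self, model: str, prompt: str):
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        result = self.generate(model, prompt)
        self.store[key] = result
        return result
```

Identical requests after the first one return instantly and cost nothing in model usage.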
Definition
Reducing model, infrastructure, token, storage, and operations cost while preserving quality.
Definition
The average spend for one user or API request, including model usage, retrieval, tools, and infrastructure.
Definition
The average AI system cost attributable to one user over a period, useful for pricing and capacity planning.
Definition
Training a smaller model to imitate a larger model so inference becomes cheaper or faster.
Definition
The compute and provider cost required to run a model and generate outputs after training is complete.
Definition
Routing difficult, risky, or low-confidence tasks from a smaller model to a stronger model.
Definition
Temporarily rejecting or delaying lower-priority work to protect system stability during overload.
Definition
Running independent steps at the same time to reduce total task latency.
Definition
A prompt cache stores previously processed prompt prefixes or results so repeated prompts can reuse earlier work, cutting latency and token cost.
Definition
A response cache stores complete model responses keyed by the request so identical or near-identical requests can be answered without a new model call.
Definition
A retrieval cache stores the documents or chunks returned for a query so repeated or similar queries can skip the retrieval step.
Definition
Sending simple or low-risk tasks to cheaper smaller models while reserving larger models for harder cases.
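A router can start as a plain function applying cheap heuristics before any model is called. A sketch (the tier names and thresholds are made up for illustration; real routers often use classifiers or confidence scores):

```python
def route_model(prompt: str, needs_tools: bool = False) -> str:
    """Pick a model tier from simple request features."""
    if needs_tools or len(prompt) > 2000:
        return "large-model"   # complex or tool-using requests
    if len(prompt) > 300:
        return "medium-model"  # moderately long requests
    return "small-model"       # short, simple requests
```

Even crude routing like this can shift the bulk of traffic onto cheaper models while keeping hard cases on the strongest one.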
Definition
Delivering output incrementally as it is generated, improving perceived responsiveness for users.
Definition
The price associated with processing input and output tokens during model inference.
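Since providers usually quote prices per million tokens, the cost of one call is simple arithmetic (the prices below are made-up illustrations, not any provider's actual rates):

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one call given per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# e.g. 12k input tokens and 800 output tokens at $3 / $15 per million:
cost = request_cost(12_000, 800, in_price_per_m=3.00, out_price_per_m=15.00)
```

Output tokens are typically priced several times higher than input tokens, which is why limiting response length matters for cost control.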
Definition
Controlling prompt size, retrieved context, output length, and caching to manage cost and latency.
Definition
A throughput metric showing how quickly a model produces or processes tokens.
Definition
Testing different model, prompt, workflow, or UX variants against real or representative traffic.
Definition
An AI audit trail is a durable record of model versions, prompts, decisions, and approvals that supports compliance review and incident investigation.
Definition
Batch inference runs a model over a large set of inputs on a schedule or on demand, rather than serving one request at a time.
Definition
Blue-green deployment runs two identical environments and switches all traffic from the old version to the new one, enabling near-instant rollback.
Definition
A canary deployment releases a new version to a small slice of traffic first, expanding the rollout only if quality and health metrics stay within bounds.
Definition
CI/CD for ML automates testing, building, and deploying models and pipelines, extending software CI/CD with data validation and model evaluation gates.
Definition
Concept drift occurs when the relationship between inputs and correct outputs changes over time, degrading a model trained on older patterns.
Definition
Ongoing testing of models, prompts, retrieval, and agents as data, users, and behavior change over time.
Definition
Continuous monitoring watches production metrics, quality signals, and costs over time so regressions are detected soon after they appear.
Definition
A data card documents a dataset's sources, composition, intended uses, and limitations for transparency and governance.
Definition
Data drift occurs when the distribution of production inputs shifts away from the data a model was trained or evaluated on.
Definition
Dataset versioning tracks snapshots of training and evaluation data so experiments and deployed models can be reproduced and audited.
Definition
The ability to identify drift signals in inputs, outputs, logs, retrieved content, or system behavior.
Definition
Experiment tracking records the parameters, code versions, datasets, and metrics of training or evaluation runs so results can be compared and reproduced.
Definition
A storage system for feature data that an AI application can save, query, or retrieve during execution.
Definition
An inference endpoint is a network-accessible URL where a deployed model accepts inputs and returns predictions.
Definition
LLMOps is the practice of operating LLM applications in production, covering prompt and model versioning, evaluation, monitoring, cost control, and governance.
Definition
MLOps is the practice of reliably building, deploying, and operating machine learning systems, combining ML workflows with DevOps discipline.
Definition
A model card documents a model's intended use, training data, evaluation results, and limitations.
Definition
Model drift is the degradation of a deployed model's performance over time as data, users, or the world change.
Definition
Model governance is the set of policies, reviews, and controls that determine how models are approved, deployed, monitored, and retired.
Definition
Model monitoring tracks a deployed model's inputs, outputs, quality, latency, and errors to detect problems in production.
Definition
A model registry is a central catalog of trained models with their versions, metadata, lineage, and deployment stage.
Definition
Model versioning assigns identifiers to each trained model artifact so deployments can be traced, compared, and rolled back.
Definition
Testing AI behavior on saved datasets or scenarios before release, without affecting live users.
Definition
Online inference serves predictions in real time in response to individual requests, typically under strict latency requirements.
Definition
A rollback reverts a deployment to a previous known-good version of a model, prompt, or application after a problem is detected.
Definition
A shadow deployment sends a copy of live traffic to a new version whose outputs are logged but never returned to users, allowing safe comparison.
Definition
A record of the agent's steps, tool calls, observations, and decisions during execution.
Definition
Arize Phoenix is an open-source observability tool for tracing, evaluating, and debugging LLM applications.
Definition
An audit log is an append-only record of who did what and when, used for security review and compliance.
Definition
Durable records of prompts, decisions, tool calls, data access, outputs, and human approvals.
Definition
Cost monitoring tracks spend on model calls, tokens, infrastructure, and tools so teams can spot spikes and attribute costs.
Definition
The percentage of requests, tool calls, or workflow steps that fail or return invalid results.
Definition
An evaluation dashboard displays quality metrics, regression trends, and failure examples across model or prompt versions.
Definition
A mechanism for using user feedback, human review, logs, and metrics to improve future behavior.
Definition
Humanloop is a platform for managing, evaluating, and monitoring prompts and LLM applications.
Definition
LangSmith is LangChain's platform for tracing, debugging, evaluating, and monitoring LLM applications.
Definition
The elapsed time between a request and an agent response or action.
Definition
Logging is the practice of recording events, errors, and context from a running system for debugging and analysis.
Definition
Metrics are numeric measurements, such as latency, error rate, and token usage, collected over time to track system health.
Definition
Visibility into prompts, model calls, retrieval, tool calls, traces, metrics, errors, and outcomes.
Definition
OpenTelemetry is an open standard and SDK for collecting traces, metrics, and logs in a vendor-neutral way.
Definition
A record of prompt templates, context, model parameters, and generated outputs for debugging.
Definition
A recorded timeline of retrieval activity that helps engineers debug behavior and understand execution history.
Definition
Retry count is the number of times an operation was retried, a useful signal for flaky tools or unstable dependencies.
Definition
A session ID is a unique identifier that groups all requests, traces, and logs belonging to one user session or conversation.
Definition
Session replay reconstructs a user session or agent run step by step from recorded events so engineers can see exactly what happened.
Definition
A timed unit of work inside a trace, such as retrieval, model call, or tool execution.
Definition
Operational data emitted by systems for monitoring, debugging, evaluation, and cost analysis.
Definition
The amount of work an AI system can process over a period of time.
Definition
Time to first token is the delay between sending a request and receiving the first token of a streamed response, a key perceived-latency metric.
Definition
Tool monitoring tracks the success rate, latency, errors, and usage of the tools an agent calls.
Definition
A record of tool invocations, inputs, outputs, latency, errors, and retries.
Definition
A unique identifier that connects logs and spans for one request or workflow.
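In practice this means generating one ID per request and attaching it to every log line. A sketch (real tracing systems use specific formats such as W3C `traceparent`; a random UUID is enough to illustrate correlation, and the helper names are made up):

```python
import json, uuid

def new_trace_id() -> str:
    # One random ID per incoming request or workflow run.
    return uuid.uuid4().hex

def log_event(trace_id: str, step: str, **fields) -> str:
    # Every log line carries the trace ID so one request's steps
    # can be stitched back together later.
    return json.dumps({"trace_id": trace_id, "step": step, **fields})

tid = new_trace_id()
line1 = log_event(tid, "retrieval", docs=3)
line2 = log_event(tid, "model_call", tokens=512)
```

Searching logs for one `trace_id` then returns the full story of a single request across services.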
Definition
Tracing records the path of a request through a system as a tree of timed spans, showing where time is spent and where failures occur.
Definition
TruLens is an open-source library for evaluating and tracking LLM applications using feedback functions.
Definition
User feedback is signal collected from users, such as ratings, corrections, or reports, used to evaluate and improve system behavior.
Definition
Weights & Biases is a platform for experiment tracking, model management, and evaluation of ML and LLM projects.