Glossary
AI Security Glossary.
Plain-language definitions for the AI security terms that matter. Linked, citable, and short.
- Adversarial Testing (AI) Systematic testing of an AI system against attacks, edge cases, and failure modes.
- Agentic AI AI systems that take actions in the world via tool use, not just produce text. Includes custom GPTs, autonomous agents, and tool-using LLM applications.
- AI Firewall Network-layer security control that inspects AI traffic (prompts, outputs, API calls) for policy violations, sensitive data, threats.
- AI Governance The organizational structures, accountabilities, and processes for managing AI risk and ensuring responsible AI use.
- AI Policy (Internal) Organizational policy governing AI use: what's allowed, what's prohibited, who approves, how violations are handled.
- AI Red-Teaming Adversarial testing of AI systems to identify safety, security, and robustness failures before production.
- Constitutional AI Anthropic's approach to training language models with a defined set of principles (a constitution) used during fine-tuning to bias toward safer outputs.
- Custom GPT OpenAI's brand for user- or organization-built variants of ChatGPT with custom instructions, knowledge, and tool access.
- Differential Privacy A mathematical framework for measuring and bounding the privacy loss when statistics or models are released from sensitive data.
- Embedding Inversion Recovering source text or sensitive features from vector embeddings without direct access to the original content.
- EU AI Act Regulation (EU) 2024/1689. Risk-tiered AI obligations including prohibited practices, high-risk system requirements, transparency obligations.
- Excessive Agency (LLM06) An LLM-based agent has more permissions, tool access, or autonomy than its task requires.
- Hallucination (LLM) When an LLM generates plausible but factually incorrect content.
- Indirect Prompt Injection Prompt injection delivered through retrieved or referenced content (web pages, documents, emails) rather than direct user input.
- ISO/IEC 42001 International standard for AI management systems. Certifiable. Published 2023.
- Jailbreak (LLM) A specific class of prompt injection that bypasses an LLM's safety training to elicit content the model was tuned to refuse.
- LLM Firewall A security control that inspects prompts and outputs of LLM applications for policy violations, prompt injection, sensitive data, or jailbreak patterns.
- MITRE ATLAS Adversarial Threat Landscape for AI Systems. MITRE's catalog of adversarial machine learning tactics and techniques. Modeled on MITRE ATT&CK.
- Model Card Standardized documentation of an AI model's intended use, training data, performance, limitations, and ethical considerations.
- Model Inversion An attack that recovers training data or sensitive features by querying a model and analyzing outputs.
- NIST AI RMF NIST AI Risk Management Framework. Voluntary US framework defining four functions (Govern, Map, Measure, Manage) for AI risk.
- OWASP LLM Top 10 OWASP's catalog of the top 10 risks for LLM applications. Updated annually. The most-cited LLM security framework.
- Prompt Injection An attack where crafted input causes an LLM to override its instructions or context. Direct injection comes through user input. Indirect injection comes through retrieved or referenced content the LLM processes.
- RAG Security Security considerations specific to Retrieval-Augmented Generation pipelines: vector-store access control, corpus integrity, embedding inversion, indirect prompt injection.
- RLHF (Reinforcement Learning from Human Feedback) Technique for fine-tuning language models using human feedback to align outputs with preferred behaviors.
- Shadow AI Unsanctioned AI use within an organization, including consumer-tier ChatGPT, Copilot trial, custom GPTs, browser extensions.
- Sycophancy (LLM) An LLM's tendency to agree with the user's stated position or assumption rather than provide accurate analysis.
- System Card OpenAI's term for documentation of an AI system's safety properties, limitations, and risk evaluations.
- Training Data Poisoning Adversarial manipulation of training, fine-tuning, or RAG-corpus data to alter model behavior.
- Vector Store A database optimized for similarity search over high-dimensional embeddings. Foundational component of RAG pipelines.
- Vendor Due Diligence (AI) The process of assessing an AI vendor's security, privacy, and operational posture before procurement.
Want a personalized read?
Take the free AI Posture Check. The glossary terms become specific recommendations for your environment.
Take the AI Posture Check