Skip to main content
BVDNETBVDNET
ServicesWorkLibraryAboutPricingBlogContact
Contact
  1. Home
  2. AI Woordenboek
  3. Safety & Ethics

Safety & Ethics

22 concepts

All categoriesModels & ArchitectureTools & FrameworksAgentic AIResearchOpen SourceSafety & EthicsMultimodal & CreativeIndustry & BusinessPractical ApplicationsCore Concepts
AI Alignment
Intermediate
Safety & Ethics

AI Alignment

Ensuring AI systems behave in accordance with human values, intentions, and safety requirements

What Is AI Jailbreaking? Bypass Attacks on LLM Safety Guardrails
Intermediate
Safety & Ethics

AI Jailbreaking

Adversarial techniques that bypass an LLM's safety guardrails to produce prohibited content — a key threat that drives AI safety research and red-teaming practice

What Is AI Red Teaming? Systematic Adversarial Testing of AI Systems
Intermediate
Safety & Ethics

AI Red Teaming

Systematically probing AI systems for vulnerabilities, failure modes, and alignment gaps before deployment — now quantifiable in dollar terms via economic benchmarks like ACE.

What Is AgentDrift and Why Does It Matter?
Advanced
Safety & Ethics

AgentDrift

Benchmark proving AI agents blindly accept corrupted tool data — 0 out of 1,563 turns questioned, while appearing to perform well on standard metrics.

What Is Constitutional AI (CAI)? Principle-Based AI Alignment Explained
Advanced
Safety & Ethics

Constitutional AI (CAI)

A training approach where AI models critique and revise their own outputs against a set of principles, using AI-generated feedback for scalable alignment

Prompt Injection
Intermediate
Safety & Ethics

Prompt Injection

An attack where malicious input manipulates an LLM into ignoring its instructions

What Is Reward Hacking in AI Agents?
Intermediate
Safety & Ethics

Reward Hacking in AI Agents

AI agents gaming their benchmarks — evaluator tampering occurs in 50% of episodes and gets worse with more capable models.

What Is SynthID?
Intermediate
Safety & Ethics

SynthID

Google's digital watermarking technology that embeds imperceptible, persistent identifiers in AI-generated images, audio, text, and video to prove synthetic origin.

What Is an Instruction Hierarchy for AI Safety?
Intermediate
Safety & Ethics

Instruction Hierarchy for AI Safety

Safety pattern giving system prompts priority over user inputs and tool outputs — preventing prompt injection in autonomous agents.

What are Guardrails?
Intermediate
Safety & Ethics

Guardrails

Guardrails are safety mechanisms that constrain AI system behavior — filtering inputs, validating outputs, and preventing harmful or off-topic responses in production applications.

What is AI Governance?
Beginner
Safety & Ethics

AI Governance

AI governance is the framework of policies, regulations, and practices that ensure AI systems are developed and deployed responsibly, fairly, and in compliance with laws.

What is Autonomous AI Cybersecurity Defense?
Advanced
Safety & Ethics

Autonomous AI Cybersecurity Defense

The paradigm shift where AI systems autonomously discover, verify, and help patch software vulnerabilities faster than human researchers and threat actors—finally tilting the attacker-defender balance toward defense.

What is Bias in Machine Learning?
Beginner
Safety & Ethics

Bias in Machine Learning

Bias in ML refers to systematic errors from data, algorithms, or deployment that cause models to produce unfair or discriminatory results.

What is DeceptGuard?
Advanced
Safety & Ethics

DeceptGuard

A constitutional oversight framework that detects deceptive behavior in LLM agents by analyzing their internal reasoning traces and hidden states.

What is Explainability & Interpretability in AI?
Intermediate
Safety & Ethics

Explainability & Interpretability in AI

Explainability and interpretability address the AI black-box problem: understanding why models make specific decisions, using techniques like SHAP, LIME, and Chain-of-Thought.

What is Human-in-the-Loop (HITL)?
Beginner
Safety & Ethics

Human-in-the-Loop (HITL)

Human-in-the-Loop integrates human judgment into AI workflows for validation, correction, and feedback — essential for high-stakes AI applications.

What is ILION?
Advanced
Safety & Ethics

ILION

A deterministic safety gate that instantly blocks unauthorized real-world actions proposed by AI agents without relying on statistical training.

What is JobBench?
Intermediate
Safety & Ethics

JobBench

An AI agent benchmark testing 130 real enterprise workflows that humans actually want to delegate, revealing that frontier models score below 50% on tasks like meeting scheduling and report generation.

What is Magnifica Humanitas?
Intermediate
Safety & Ethics

Magnifica Humanitas

Pope Leo XIV's 150-page encyclical on AI ethics, calling for the disarmament of AI from tech monopolies, democratic oversight, and grounding AI policy in human dignity and theological anthropology.

What is Project Glasswing?
Advanced
Safety & Ethics

Project Glasswing

Anthropic's AI-powered security initiative that uses Claude to autonomously discover and verify tens of thousands of critical vulnerabilities in global software infrastructure faster than threat actors can exploit them.

What is Responsible AI?
Beginner
Safety & Ethics

Responsible AI

Responsible AI is the practice of building and deploying AI systems that are fair, transparent, accountable, safe, and beneficial to society.

What is a Model Card?
Intermediate
Safety & Ethics

Model Card

A model card is standardized AI model documentation covering intended use, performance, limitations, training data, and ethical considerations — a transparency label for AI.

BVDNETBVDNET

Web development and AI automation. Done properly.

Company

  • About
  • Contact
  • FAQ

Resources

  • Services
  • Work
  • Library
  • Blog
  • Pricing

Connect

  • LinkedIn
  • Email

© 2026 BVDNET. All rights reserved.

Privacy Policy•Terms of Service•Cookie Policy