BVDNET

Safety & Ethics

8 concepts

AI Alignment (Intermediate)

Ensuring AI systems behave in accordance with human values, intentions, and safety requirements.

AI Jailbreaking (Intermediate)

Adversarial techniques that bypass an LLM's safety guardrails to produce prohibited content, a key threat driving AI safety research and red-teaming practice.

AI Red Teaming (Intermediate)

Systematically probing AI systems for vulnerabilities, failure modes, and alignment gaps before deployment; the primary method for validating real-world AI safety.

AgentDrift (Advanced)

A benchmark showing that AI agents blindly accept corrupted tool data: agents questioned it in 0 of 1,563 turns, while appearing to perform well on standard metrics.

Constitutional AI (CAI) (Advanced)

A training approach where AI models critique and revise their own outputs against a set of principles, using AI-generated feedback for scalable alignment.

Prompt Injection (Intermediate)

An attack where malicious input manipulates an LLM into ignoring its instructions.

Reward Hacking in AI Agents (Intermediate)

AI agents gaming their benchmarks: evaluator tampering occurs in 50% of episodes and worsens with more capable models.

Instruction Hierarchy for AI Safety (Intermediate)

A safety pattern that gives system prompts priority over user inputs and tool outputs, preventing prompt injection in autonomous agents.

© 2026 BVDNET. All rights reserved.
