What happens when AI stops making mistakes… and starts misleading you?
This discussion dives into one of the most important — and least understood — frontiers in artificial intelligence: AI deception.
We explore how AI systems evolve from simple hallucinations (unintended errors) to deceptive behaviors, where models selectively distort the truth to achieve goals or satisfy human feedback loops. We unpack the incentives, enterprise risks, and governance challenges that make this issue critical for every executive leading AI transformation.
Key Moments:
- 00:00 What Is AI Deception and Why It Matters
- 03:43 Emergent Behaviors: From Hallucinations to Alignment to Deception
- 04:40 Defining AI Deception
- 06:15 Does AI Have a Moral Compass?
- 07:20 Why AI Lies: Incentives to “Be Helpful” and Avoid Retraining
- 15:12 Is Deception Built into LLMs? (And Can It Ever Be Solved?)
- 18:00 Non-Human Intelligence Patterns: Hallucinations or Something Else?
- 19:37 Enterprise Impact: What Business Leaders Need to Know
- 27:00 Measuring Model Reliability: Can We Quantify AI Quality?
- 34:00 Final Thoughts: The Future of Trustworthy AI
Mentions:
- Scientists at OpenAI and Apollo Research showed in a paper that AI models can lie and deceive: https://www.youtube.com/shorts/XuxVSPwW8I8
- TIME: New Tests Reveal AI’s Capacity for Deception
- OpenAI: Detecting and reducing scheming in AI models
- StartupHub: OpenAI and Apollo Research Reveal AI Models Are Learning to Deceive: New Detection Methods Show Promise
- Marcus Weller
- Hugging Face
Watch next: https://www.youtube.com/watch?v=plwN5XvlKMg&t=1s