When your AI has to make decisions in the real world, the data you don’t have can hurt people.
DiffuseDrive CEO Balint Pasztor joins IT Visionaries to unpack the $124B data scarcity problem holding back autonomous systems, and to explain how synthetic data (done right) can compress years of data collection into hours.
We dig into why edge cases matter (and how to safely create them), closing the sim-to-real gap with diffusion models, preventing model drift, and building a data engine for physical AI across defense, robotics, and industrial automation. If you care about self-driving cars, drones, or QA on the factory floor, or just about shipping AI that survives the messiness of reality, this one’s for you.
Key Moments:
- 00:00 Introduction to Autonomous Driving Challenges
- 00:26 Meet Balint Pasztor and DiffuseDrive
- 01:14 The Importance of Synthetic Data
- 06:39 The Role of Synthetic Data in AI Training
- 18:07 Understanding Diffusion Models
- 23:28 Challenges in Real-World Data Collection
- 26:46 Three Steps to Improve AI Performance
- 32:13 Overcoming Non-Obvious Data Challenges
- 36:00 Balancing Data Quantity and Quality
- 40:28 Future of Autonomous Systems and Physical AI
