Overview
At DAIS 2026, Databricks announced three major capabilities to accelerate machine learning workflows: Genie Code as an AI agent for ML engineering, AI Runtime as a serverless GPU training platform, and substantial improvements to real-time ML infrastructure with Feature Store and high-capacity Model Serving.
The core message: generic coding agents cannot make the nuanced decisions that production ML requires, such as those involving data quality, feature lineage, and impact on business metrics. Genie Code is designed specifically to understand that context.
Genie Code for Machine Learning
Genie Code is an AI coding agent that spans the ML lifecycle, from experimentation to production deployment. It draws on Unity Catalog for data context and governance, and connects natively to Feature Store, Model Serving, Serverless Compute, and AI Runtime.
MLflow integration provides complete lifecycle management: feature engineering, experimentation, deployment, monitoring, and drift detection. The Danfoss case is notable: they built a complete ML pipeline in 90 minutes from raw data to a governed, production-ready deployment.
Bosch uses Genie Code for managing parallel thread workloads. FactSet reduced model training time from days to hours with simplified infrastructure setup.
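Drift detection, listed above as part of lifecycle management, can be illustrated with a small sketch. The example below computes the Population Stability Index (PSI), one common drift statistic; the source does not say which statistic MLflow or Genie Code uses, so this is an assumption chosen for illustration. The function name `psi`, the bin count, and the floor value are all hypothetical choices.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a live sample.

    Illustrative only: not the drift metric of any particular product.
    """
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if all values match.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the baseline range into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Baseline feature values seen at training time vs. a shifted live sample.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
```

A frequently cited rule of thumb treats a PSI above 0.25 as a significant shift; comparing `baseline` with itself yields zero, while `shifted` lands well above that threshold.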
AI Runtime (Public Preview)
AI Runtime is a serverless GPU training platform that removes infrastructure complexity for deep learning workloads. It takes only 2-3 clicks to configure and offers on-demand NVIDIA A10 and H100 GPUs with pay-per-use pricing and no idle-time commitments.
It supports multinode training for high-performance distributed workloads, RDMA (remote direct memory access) networking, and optimized data loading to maximize GPU utilization. It integrates with Lakeflow Jobs and DABs (Databricks Asset Bundles) for orchestration, with built-in MLflow experiment tracking and Unity Catalog governance.
This is the same infrastructure that Databricks used internally to train foundation models such as DBRX and KARL, and it is now available to all customers.
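The trade-off behind pay-per-use pricing can be worked through with simple arithmetic. The sketch below compares a pay-per-use bill against an always-on reserved cluster. Every rate and quantity in it is hypothetical and is not Databricks pricing; the function names are invented for this example.

```python
def pay_per_use_cost(gpu_hours_used: float, rate_per_gpu_hour: float) -> float:
    """Cost when billed only for GPU hours actually consumed."""
    return gpu_hours_used * rate_per_gpu_hour


def reserved_cost(gpus: int, hours_in_period: float, rate_per_gpu_hour: float) -> float:
    """Cost when a fixed number of GPUs is paid for over the whole period, busy or idle."""
    return gpus * hours_in_period * rate_per_gpu_hour


def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the period the GPUs must be busy before reserving becomes cheaper."""
    return reserved_rate / on_demand_rate


# Hypothetical scenario: 8 GPUs, a 720-hour month, each GPU busy for 120 hours.
ON_DEMAND_RATE = 4.0   # assumed price per GPU hour, pay-per-use
RESERVED_RATE = 2.5    # assumed price per GPU hour, committed capacity

on_demand_bill = pay_per_use_cost(8 * 120, ON_DEMAND_RATE)   # 3840.0
reserved_bill = reserved_cost(8, 720, RESERVED_RATE)         # 14400.0
threshold = break_even_utilization(ON_DEMAND_RATE, RESERVED_RATE)  # 0.625
```

Under these assumed rates, the reserved cluster only wins once the GPUs are busy more than 62.5% of the time, which is why bursty experimentation tends to favor pay-per-use billing.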
Real-Time ML
Enhanced Feature Store: Features are defined declaratively and materialized automatically for both training and serving. Streaming features support real-time responses to customer activity, and online features are served through Lakebase with low-latency access.
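The idea of a single declarative definition feeding both training and serving can be sketched in plain Python. This is a conceptual illustration, not the Databricks Feature Store API: `Event`, `WindowedSum`, `training_rows`, and `online_lookup` are hypothetical names, and a real system would materialize results rather than rescan events.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    user_id: str
    amount: float
    timestamp: float  # seconds since epoch


@dataclass(frozen=True)
class WindowedSum:
    """Hypothetical declarative feature: sum of `amount` over a trailing time window."""
    name: str
    window_seconds: float

    def compute(self, events: Iterable[Event], user_id: str, as_of: float) -> float:
        # Only events inside (as_of - window, as_of] count, so no future data leaks in.
        start = as_of - self.window_seconds
        return sum(
            e.amount
            for e in events
            if e.user_id == user_id and start < e.timestamp <= as_of
        )


def training_rows(
    feature: WindowedSum,
    events: Iterable[Event],
    labels: Iterable[Tuple[str, float, int]],
) -> List[Tuple[float, int]]:
    """Offline path: point-in-time feature value for each (user_id, label_time, label)."""
    history = list(events)
    return [(feature.compute(history, uid, ts), y) for uid, ts, y in labels]


def online_lookup(
    feature: WindowedSum, events: Iterable[Event], user_id: str, now: float
) -> float:
    """Online path: the same definition evaluated at request time."""
    return feature.compute(list(events), user_id, now)


spend_1h = WindowedSum(name="spend_1h", window_seconds=3600)
events = [
    Event("u1", 10.0, 100.0),
    Event("u1", 5.0, 3000.0),
    Event("u2", 7.0, 3000.0),
    Event("u1", 2.0, 5000.0),
]
```

Because both paths call the same `compute` method, the training value for a label at time 3600 and the serving value at time 5000 are derived from one definition, which is the property that guards against training/serving skew.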
Model Serving improvements: A high-QPS inference engine supports 300K+ queries per second with sub-10 millisecond p99 latency overhead. It adapts automatically to model types and traffic patterns, so no manual tuning is required, and it supports both CPU and GPU model serving.
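To make the throughput and latency figures concrete, the sketch below shows how a p99 value is derived from latency samples and how Little's law relates request rate to concurrency. The nearest-rank percentile method and the 10 ms time in system are assumptions for illustration; the source does not describe how Databricks measures its p99 overhead.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def in_flight_requests(qps: float, latency_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x average time in system."""
    return qps * latency_seconds


# 100 synthetic latency samples of 1..100 ms; the p99 is the 99th smallest value.
latencies_ms = list(range(1, 101))
p99_ms = percentile(latencies_ms, 99)  # 99

# At 300K requests per second and an assumed 10 ms in the system,
# roughly 3,000 requests are in flight at any moment.
concurrency = in_flight_requests(300_000, 0.010)
```

The concurrency figure is what a serving layer has to absorb across its replicas, which is why even a few milliseconds of added overhead matter at this request rate.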
Customer results are compelling: up to 90%+ infrastructure cost reductions versus self-managed systems, 2x latency improvements, and scaling beyond 100K QPS with minimal maintenance overhead. Grammarly and GoGuardian are cited as reference customers for high-QPS production serving.
Operational intelligence: Genie Code assists with querying inference tables, automates performance debugging on serving endpoints, and supports root-cause analysis for production alerts.
Key Points
- Genie Code understands data quality, feature lineage, and business metrics impact
- Danfoss: complete ML pipeline in 90 minutes with Genie Code
- AI Runtime: serverless, on-demand NVIDIA A10 and H100 GPUs, configured in 2-3 clicks
- Multinode training with RDMA and optimized data loading
- Feature Store with declarative features and streaming for real-time ML
- Model Serving: 300K+ QPS, sub-10ms p99 latency
- Up to 90%+ infrastructure cost reduction vs. self-managed systems
- Genie Code integrated for operational analysis of production endpoints
- Same infrastructure used to train DBRX and KARL, now available to customers
Why It Matters
Production ML has a persistent problem: the gap between prototype and production system. A data scientist can train a model in days, but getting it to production with governance, monitoring, scalable serving, and a retraining pipeline can take weeks or months.
AI Runtime closes the training gap: instead of waiting weeks for the infrastructure team to provision GPUs, ML teams can access them in minutes. Genie Code closes the engineering gap: instead of the data scientist having to learn their company’s specific MLOps best practices, the agent knows them and applies them automatically.
The serverless model for AI Runtime is particularly compelling. Teams pay only for what they use, with no instance commitments, which makes it viable to experiment with large models that previously required expensive dedicated infrastructure. That extends deep learning capabilities beyond the largest tech companies.