Overview

Databricks Mosaic AI is a suite of capabilities within the Databricks Lakehouse Platform, designed to support the development, deployment, and management of generative AI applications and large language models (LLMs). It targets developers and technical buyers involved in the full machine learning lifecycle, from data ingestion and preparation to model serving and monitoring. The platform is engineered to facilitate the fine-tuning of open-source LLMs and the creation of Retrieval Augmented Generation (RAG) applications.

The core proposition of Mosaic AI is its integration with the broader Databricks Lakehouse architecture, which unifies data warehousing and data lakes. This integration aims to provide a single platform for data management, machine learning, and AI workloads. Components like Unity Catalog, for instance, offer unified governance across data, analytics, and AI assets, including structured data, unstructured data, and machine learning models [Unity Catalog documentation]. This approach seeks to address challenges associated with data silos and fragmented MLOps toolchains.

Mosaic AI supports various stages of AI development. For model training and fine-tuning, it provides environments compatible with popular machine learning frameworks and integrates with MLflow for experiment tracking and model lifecycle management [Mosaic AI product page]. For deployment, it offers real-time model serving capabilities, enabling models to be exposed as API endpoints. The platform also includes tools for building AI agents and for managing vector databases, which are critical for RAG applications that require semantic search over large datasets.
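The RAG pattern mentioned above can be illustrated with a small, library-free sketch. It is not the Mosaic AI Vector Search API: the three-dimensional embeddings, the `DOCS` corpus, and the helper names `cosine_similarity`, `retrieve`, and `build_prompt` are all hypothetical. A real application would obtain embeddings from an embedding model and delegate the nearest-neighbour search to a managed vector index.

```python
import math

# Hypothetical toy corpus: each document carries a hand-written 3-dimensional
# "embedding". Real embeddings come from an embedding model and are much larger.
DOCS = [
    {"text": "Delta Lake provides ACID transactions for data lakes.", "vec": [0.9, 0.1, 0.0]},
    {"text": "MLflow tracks experiments and manages model versions.", "vec": [0.1, 0.9, 0.1]},
    {"text": "Unity Catalog governs access to data and AI assets.", "vec": [0.0, 0.2, 0.9]},
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, docs, k=1):
    """Return the k documents whose embeddings are most similar to query_vec."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Assemble an LLM prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {d['text']}" for d in context_docs)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical query embedding that lies close to the MLflow document.
query_vec = [0.2, 0.8, 0.1]
top_docs = retrieve(query_vec, DOCS, k=1)
prompt = build_prompt("How do I track experiments?", top_docs)
print(prompt)
```

The retrieval step is the part a vector database takes over at scale. The prompt assembly step stays in application code, and the resulting prompt is what would be sent to a served LLM.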

The platform is positioned for organizations that require a managed environment for their AI initiatives, particularly those dealing with large volumes of data and complex machine learning pipelines. It emphasizes enterprise readiness through compliance attestations such as SOC 2 Type II, features that support regulatory obligations such as GDPR and HIPAA, and integration with existing enterprise data strategies. The developer experience is primarily Python-centric, leveraging familiar tools like Jupyter notebooks and the MLflow ecosystem, while also supporting other languages like Scala, Java, and R through its SDKs [Databricks ML documentation]. This allows for a consistent workflow from data engineering to model deployment within a unified environment.

Key features

  • Mosaic AI Model Serving: Provides capabilities for deploying and managing machine learning models, including LLMs, as scalable API endpoints for real-time inference.
  • Mosaic AI Vector Search: Offers a managed vector database service designed to store and query embeddings, facilitating semantic search and Retrieval Augmented Generation (RAG) applications.
  • Mosaic AI Playground: A web-based interface for experimenting with and evaluating LLMs, including custom fine-tuned models, to test prompts and responses.
  • Mosaic AI Agent Framework: Tools and libraries for building AI agents that can interact with external systems, perform tasks, and integrate with enterprise data sources.
  • Mosaic AI Gateway: A centralized interface for managing and routing requests to various LLMs, including third-party models and internal fine-tuned models, with features for access control and monitoring.
  • Unity Catalog: A unified governance solution for data, analytics, and AI assets across the Databricks Lakehouse Platform, providing centralized access control, auditing, and lineage [Unity Catalog overview].
  • MLflow Integration: Deep integration with MLflow for tracking experiments, managing models, and deploying machine learning workflows across the lifecycle.
  • Fine-tuning LLMs: Tools and infrastructure to adapt and fine-tune open-source large language models with proprietary data for specific use cases.
  • Data Preparation and Engineering: Leverages the Databricks Lakehouse Platform for scalable data ingestion, transformation, and feature engineering to support AI model development.
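As a rough illustration of what "models exposed as API endpoints" means in practice, the sketch below builds an HTTP request for a serving endpoint without sending it. The workspace URL, endpoint name, and token are placeholders, and `build_invocation_request` is a hypothetical helper. The `/serving-endpoints/<name>/invocations` path and the `dataframe_records` payload shape are assumptions based on commonly documented Databricks conventions; confirm both against the Model Serving documentation for your endpoint type.

```python
import json
import urllib.request


def build_invocation_request(workspace_url, endpoint_name, token, records):
    """Build (but do not send) a POST request for a model serving endpoint.

    Assumes the endpoint accepts a JSON body with a "dataframe_records" list;
    check the endpoint's expected input format before relying on this.
    """
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values: replace with a real workspace URL, endpoint name, and token.
request = build_invocation_request(
    workspace_url="https://example-workspace.cloud.databricks.com/",
    endpoint_name="my-llm-endpoint",
    token="REPLACE_ME",
    records=[{"prompt": "The quick brown fox jumps over the lazy "}],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually call the endpoint, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Separating request construction from sending makes it easy to inspect the payload first. In production code the token would normally come from a secret store or environment variable rather than a literal.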

Pricing

Databricks Mosaic AI uses a custom enterprise pricing model. Costs are typically based on consumption of compute resources, measured in Databricks Units (DBUs), plus cloud infrastructure usage. Specific pricing is generally determined through direct consultation with Databricks sales and tailored to an organization's usage patterns and requirements. A free tier, Databricks Community Edition, is available for educational and personal use, offering limited compute and storage resources [Databricks Pricing Page].

| Service Component | Pricing Model (as of 2026-06-11) | Notes |
| --- | --- | --- |
| Databricks Units (DBUs) | Consumption-based | Primary unit of compute consumption for workloads like notebooks, jobs, and model serving. DBU rates vary by cloud provider, region, and workload type. |
| Cloud Infrastructure | Cloud provider rates | Charges for underlying cloud resources (e.g., EC2, S3, Azure VMs, Azure Blob Storage, GCP Compute Engine, Cloud Storage) managed by Databricks. |
| Unity Catalog | Included with Databricks Platform usage | Governance features are integrated; specific usage may incur DBU or storage costs. |
| Mosaic AI Model Serving | DBU consumption for inference | Costs based on compute used for serving models. |
| Mosaic AI Vector Search | Consumption-based (storage and queries) | Charges for storing vector embeddings and processing search queries. |
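
The consumption model described above comes down to simple arithmetic: DBUs consumed times the DBU rate, plus whatever the cloud provider charges for the underlying resources. The sketch below shows that calculation. `estimate_monthly_cost` is a hypothetical helper and every number in it is a made-up placeholder, not a published Databricks or cloud provider rate. For some serverless offerings the infrastructure share may already be folded into the DBU rate, in which case the infrastructure term would be zero.

```python
def estimate_monthly_cost(dbus_per_hour, hours_per_month, dbu_rate_usd, infra_cost_per_hour_usd):
    """Estimate monthly spend as DBU charges plus cloud infrastructure charges."""
    dbu_cost = dbus_per_hour * hours_per_month * dbu_rate_usd
    infra_cost = hours_per_month * infra_cost_per_hour_usd
    return {"dbu_cost": dbu_cost, "infra_cost": infra_cost, "total": dbu_cost + infra_cost}


# All figures below are made-up placeholders, not published rates.
estimate = estimate_monthly_cost(
    dbus_per_hour=4.0,            # hypothetical DBU consumption of the workload
    hours_per_month=100,          # hypothetical running time
    dbu_rate_usd=0.5,             # hypothetical price per DBU
    infra_cost_per_hour_usd=1.25, # hypothetical cloud VM cost per hour
)
print(estimate)
```

With these placeholder inputs the DBU portion is 200.0, the infrastructure portion is 125.0, and the total is 325.0. Substituting the rates quoted for a given cloud, region, and workload type gives a first-order estimate to bring into a pricing conversation.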

Common integrations

  • Cloud Providers: Deep integration with AWS, Azure, and Google Cloud for underlying infrastructure and services [Databricks Getting Started].
  • MLflow: Native integration for experiment tracking, model registry, and MLOps workflows [MLflow LLM documentation].
  • Hugging Face: Support for integrating and fine-tuning models from the Hugging Face ecosystem [Hugging Face Transformers documentation].
  • LangChain: Compatibility with LangChain for building LLM-powered applications and agents [LangChain Databricks integration].
  • Apache Spark: Built on Apache Spark for large-scale data processing and analytics.
  • Delta Lake: Utilizes Delta Lake for reliable and scalable data lake storage.

Alternatives

  • Amazon SageMaker: A cloud-based machine learning service from Amazon Web Services that provides tools for building, training, and deploying ML models.
  • Google Cloud Vertex AI: Google Cloud's unified platform for machine learning development, offering tools for data preparation, model training, and deployment.
  • Azure Machine Learning: Microsoft Azure's cloud service for accelerating the building and deployment of machine learning models.
  • H2O.ai: An open-source and commercial AI platform offering automated machine learning (AutoML) and deep learning capabilities.
  • DataRobot: An enterprise AI platform that provides automated machine learning, MLOps, and decision intelligence capabilities.

Getting started

To begin using Databricks Mosaic AI, you typically start by setting up a Databricks workspace and then leveraging Python notebooks for model development and deployment. The following example demonstrates how to load a pre-trained model and use it for inference within a Databricks notebook environment.

# This example assumes you have a Databricks workspace configured
# and necessary libraries (e.g., transformers, torch) installed in your cluster.

# Import necessary libraries
import torch
from transformers import pipeline

# Initialize a text generation pipeline using a pre-trained LLM
# For demonstration, we'll use a small model. For production, consider larger models
# or fine-tuned custom models served via Mosaic AI Model Serving.

# Ensure you have access to the model or it's available from Hugging Face Hub
# If using a custom fine-tuned model, you would load it from MLflow Model Registry

generator = pipeline(
    "text-generation", 
    model="distilgpt2", 
    torch_dtype=torch.bfloat16
)

# Define a prompt for text generation
prompt = "The quick brown fox jumps over the lazy "

# Generate text
print("Generating text...")
results = generator(prompt, max_new_tokens=50, num_return_sequences=1)

# Print the generated text
for res in results:
    print(res["generated_text"])

# Example of logging a model with MLflow (conceptual, requires MLflow setup)
# from mlflow.pyfunc import PythonModel
# import mlflow

# class MyLLMModel(PythonModel):
#     def load_context(self, context):
#         self.generator = pipeline(
#             "text-generation", 
#             model="distilgpt2", 
#             torch_dtype=torch.bfloat16
#         )

#     def predict(self, context, model_input):
#         return self.generator(model_input["prompt"].tolist(), max_new_tokens=50)

# with mlflow.start_run() as run:
#     mlflow.pyfunc.log_model(
#         "my_llm_model", 
#         python_model=MyLLMModel(),
#         artifacts={},
#         registered_model_name="DistilGPT2_Generator"
#     )
#     print(f"Model logged with run_id: {run.info.run_id}")