Is Determined AI open source?

Yes, Determined AI offers an open-source community edition. An enterprise version with additional features and support is also available.

Which deep learning frameworks does Determined AI support?

Determined AI integrates with popular deep learning frameworks including TensorFlow and PyTorch.

Who owns Determined AI?

Determined AI was acquired by Hewlett Packard Enterprise (HPE) in 2021.

How does Determined AI handle hyperparameter optimization?

Determined AI includes built-in algorithms and strategies for hyperparameter optimization, allowing users to systematically search for optimal model configurations.

What are the core products of Determined AI?

The core product is the Determined AI Platform, available as an open-source community edition and an enterprise offering.

Does Determined AI provide an API?

Yes, Determined AI provides a REST API and a Python SDK for programmatic interaction with the platform.

Determined AI — Distributed Deep Learning Training Platform

What is Determined AI used for?

Determined AI is used for distributed deep learning training, hyperparameter optimization, experiment tracking, and resource management to accelerate and scale the development of machine learning models.

Determined AI is an open-source deep learning training platform designed to streamline the development and deployment of machine learning models. It provides tools for distributed training, hyperparameter optimization, experiment tracking, and resource management, aiming to improve the efficiency and scalability of deep learning workflows for MLOps.

Overview

Determined AI provides tools for managing the lifecycle of deep learning experimentation and model development. Acquired by HPE in 2021, the platform addresses the main challenges of scaling deep learning workloads: distributed training, hyperparameter optimization, and resource allocation. It runs on both on-premises and cloud-based GPU clusters, so machine learning teams can share compute resources efficiently.

The platform is particularly suited to organizations doing computationally intensive deep learning research and development. It automates the distribution of training jobs across multiple GPUs or nodes, which can reduce training times for large models and datasets. Integrated support for common deep learning frameworks such as TensorFlow and PyTorch hides some of the complexity of distributed computing environments (Determined AI distributed training documentation).
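To make the idea of distributing a training job concrete, the sketch below shows data-parallel training in plain Python: a global batch is split evenly across workers, each worker computes a gradient on its own shard, and the gradients are averaged before the update. The function names and the toy one-parameter model are assumptions made for illustration. This is not Determined's implementation; a real job delegates these steps to the framework's distributed backend.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (input x, target y)


def shard_batch(batch: Sequence[Example], num_workers: int) -> List[List[Example]]:
    """Split a global batch into equal per-worker shards (round-robin by rank)."""
    if len(batch) % num_workers != 0:
        raise ValueError("global batch size must be divisible by the number of workers")
    return [list(batch[rank::num_workers]) for rank in range(num_workers)]


def local_gradient(weight: float, shard: Sequence[Example]) -> float:
    """Gradient of the mean squared error of y = weight * x on one shard."""
    return sum(2.0 * (weight * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(
    weight: float, batch: Sequence[Example], num_workers: int, lr: float
) -> float:
    """One SGD step where each worker handles one shard of the global batch."""
    shards = shard_batch(batch, num_workers)
    grads = [local_gradient(weight, shard) for shard in shards]  # one per worker
    avg_grad = sum(grads) / num_workers  # stands in for an all-reduce
    return weight - lr * avg_grad
```

Because the shards are the same size, averaging the per-worker gradients gives the same update as computing the gradient on the whole batch, so adding workers changes the wall-clock time of a step, not its result.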

Beyond distributed training, Determined AI includes features for experiment tracking and management. This allows developers to log metrics, model checkpoints, and configuration parameters for each training run, facilitating reproducibility and comparison of different model iterations. Hyperparameter optimization is another core offering, providing algorithms and strategies to systematically search for optimal model configurations, which can improve model performance without extensive manual tuning. The platform also includes a web UI for monitoring experiments, managing resources, and visualizing training progress.
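As an illustration of what a systematic search can look like, here is a minimal sketch of synchronous successive halving, the early-stopping idea that ASHA-style methods build on. It is a simplified stand-in, not Determined's searcher: the `toy_validation_loss` objective, the function names, and the budget units are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

Config = Dict[str, float]


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Sample so that every order of magnitude between low and high is equally likely."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def toy_validation_loss(config: Config, budget: int) -> float:
    """Hypothetical objective: best near lr = 1e-3, improving with more training."""
    return (math.log10(config["learning_rate"]) + 3.0) ** 2 + 1.0 / budget


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_budget: int,
    eta: int = 2,
) -> Config:
    """Train every config briefly, keep the best 1/eta, and repeat with eta times the budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


candidates = [{"learning_rate": lr} for lr in (1e-4, 1e-3, 1e-2, 1e-1)]
best = successive_halving(candidates, toy_validation_loss, min_budget=10)
```

Most of the compute goes to the configurations that look promising early, which is why this family of methods usually needs less total training than evaluating every candidate to completion.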

For developers, Determined AI provides a Python SDK and a command-line interface (CLI) for defining and interacting with experiments. Both are designed to fit into existing MLOps workflows, giving programmatic control over training jobs and access to experiment results. The open-source community edition allows for flexibility and customization, while the enterprise offering adds features and support for production environments (Determined AI homepage).

As organizations increasingly adopt large language models (LLMs) and other complex neural networks, the need for platforms that can manage substantial computational demands becomes more pronounced. Determined AI positions itself to meet these requirements by offering a scalable infrastructure for deep learning, in line with the industry trend toward MLOps platforms that support the full lifecycle of AI development (Thoughtworks article on MLOps platforms).

Key features

Distributed Deep Learning Training: Automates the distribution of training jobs across multiple GPUs and nodes, supporting frameworks like TensorFlow and PyTorch for faster model training.
Hyperparameter Optimization: Provides built-in algorithms (e.g., ASHA, PBT) to systematically search for optimal hyperparameters, reducing manual effort and improving model performance.
Experiment Tracking and Management: Logs and organizes all aspects of deep learning experiments, including metrics, model checkpoints, and configuration, for reproducibility and comparison.
Resource Management: Manages GPU and CPU resources across a cluster, enabling efficient scheduling and utilization for multiple users and workloads.
Model Versioning and Checkpointing: Automatically saves model states and allows for easy rollback or continuation of training from specific checkpoints.
Web UI and CLI: Offers a web-based interface for monitoring experiments, managing resources, and visualizing results, alongside a command-line interface for programmatic control.
Python SDK: Provides a Python library for defining experiments, submitting jobs, and interacting with the Determined AI platform.
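The experiment tracking and checkpointing features in the list above come down to recording, for each run, its configuration and its validation metrics over time, then comparing runs on a chosen metric. The sketch below shows that bookkeeping in plain Python. The `TrialRecord` class and `best_trial` helper are hypothetical names used for illustration and are not part of the Determined API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrialRecord:
    """Everything logged for one training run."""

    trial_id: int
    hparams: Dict[str, float]
    # (batches_trained, metrics) for each validation pass, in the order logged.
    validations: List[Tuple[int, Dict[str, float]]] = field(default_factory=list)

    def log_validation(self, batches_trained: int, metrics: Dict[str, float]) -> None:
        self.validations.append((batches_trained, dict(metrics)))

    def best_validation(
        self, metric: str, smaller_is_better: bool = True
    ) -> Optional[Tuple[int, float]]:
        """Return (batches_trained, value) of the best validation, or None if none was logged."""
        scored = [(step, m[metric]) for step, m in self.validations if metric in m]
        if not scored:
            return None
        pick = min if smaller_is_better else max
        return pick(scored, key=lambda item: item[1])


def best_trial(
    trials: List[TrialRecord], metric: str, smaller_is_better: bool = True
) -> Optional[TrialRecord]:
    """Pick the trial whose best validation value wins on the given metric."""
    scored = [
        (trial, trial.best_validation(metric, smaller_is_better)) for trial in trials
    ]
    scored = [(trial, result) for trial, result in scored if result is not None]
    if not scored:
        return None
    pick = min if smaller_is_better else max
    return pick(scored, key=lambda pair: pair[1][1])[0]
```

The step returned by `best_validation` is the point a checkpoint would be restored from when continuing or rolling back training.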

Pricing

As of May 2026, Determined AI is available as a free open-source community edition and as an enterprise edition with custom pricing. The enterprise offering typically adds advanced features, dedicated support, and compliance options for production environments.

Edition	Description	Key Features	Pricing Model
Community Edition	Open-source version of the Determined AI platform.	Distributed training, hyperparameter optimization, experiment tracking, resource management.	Free (self-supported)
Enterprise Edition	Commercial offering with additional features and support.	Community Edition features plus enhanced security, scalability, support, and enterprise integrations.	Custom enterprise pricing (Determined AI contact sales)

Common integrations

Deep Learning Frameworks: Integrates directly with TensorFlow and PyTorch for defining and executing training jobs (Determined AI training overview).
Containerization: Uses Docker to package environments and dependencies (Determined AI Docker reference).
Cloud Providers: Supports deployment on major cloud platforms such as AWS, Google Cloud, and Azure for scalable compute resources.
Kubernetes: Can be deployed on Kubernetes clusters for orchestration and resource management.

Alternatives

MLflow: An open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, reproducible runs, and model deployment (MLflow homepage).
Weights & Biases: A proprietary MLOps platform that provides tools for experiment tracking, model visualization, and collaboration for deep learning projects (Weights & Biases homepage).
Kubeflow: An open-source project dedicated to making deployments of machine learning workflows on Kubernetes simple, portable, and scalable (Kubeflow homepage).

Getting started

To begin using Determined AI, you typically install the Determined CLI and then configure an experiment. Here's a basic example of defining an experiment for a simple PyTorch model and submitting it to a Determined AI cluster:

# experiment.yaml
# Field names follow the Determined experiment configuration reference;
# check them against the version you have installed.
name: my_first_experiment
entrypoint: model_def:MyModel

# Constant hyperparameters for a single trial. To search over the learning
# rate instead, replace the constant with a range, e.g.
#   learning_rate: {type: log, base: 10, minval: -4, maxval: -2}
# and switch the searcher to a search method such as random or adaptive_asha.
hyperparameters:
  learning_rate: 0.001
  global_batch_size: 64

searcher:
  name: single
  metric: validation_loss
  smaller_is_better: true
  max_length:
    batches: 100

resources:
  slots_per_trial: 1

# model_def.py
import torch.nn as nn
import torch.optim as optim
from determined.pytorch import DataLoader, PyTorchTrial, PyTorchTrialContext
from torchvision import datasets, transforms


class MyModel(PyTorchTrial):
    def __init__(self, context: PyTorchTrialContext):
        self.context = context
        # Wrap the model and optimizer so Determined can handle device
        # placement, checkpointing, and distributed training.
        self.model = self.context.wrap_model(nn.Linear(784, 10))
        self.optimizer = self.context.wrap_optimizer(
            optim.Adam(self.model.parameters(), lr=self.context.get_hparam("learning_rate"))
        )
        # Standard MNIST normalization, shared by both data loaders.
        self.transform = transforms.Compose(
            [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
        )

    def build_training_data_loader(self):
        train_dataset = datasets.MNIST(
            "./data", train=True, download=True, transform=self.transform
        )
        # Use Determined's DataLoader; the per-slot batch size is derived
        # from global_batch_size and slots_per_trial.
        return DataLoader(train_dataset, batch_size=self.context.get_per_slot_batch_size())

    def build_validation_data_loader(self):
        val_dataset = datasets.MNIST(
            "./data", train=False, download=True, transform=self.transform
        )
        return DataLoader(val_dataset, batch_size=self.context.get_per_slot_batch_size())

    def train_batch(self, batch, epoch_idx, batch_idx):
        data, target = batch
        output = self.model(data.view(-1, 784))  # flatten the 28x28 images
        loss = nn.functional.cross_entropy(output, target)
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer)
        return {"loss": loss}

    def evaluate_batch(self, batch):
        data, target = batch
        output = self.model(data.view(-1, 784))
        # Return per-batch means; they are averaged across validation batches.
        loss = nn.functional.cross_entropy(output, target)
        accuracy = (output.argmax(dim=1) == target).float().mean()
        return {"validation_loss": loss, "accuracy": accuracy}

# To run this experiment, save the two files above as `experiment.yaml` and `model_def.py` in the same directory.
# Then, from your terminal, assuming a Determined AI cluster is running and the `det` CLI is configured:
# det experiment create experiment.yaml .

This example defines a simple MNIST classifier in PyTorch. The experiment.yaml file sets the entrypoint, hyperparameters, searcher, and resources. The model_def.py file contains the model and defines how to build the data loaders and how to train and evaluate on each batch. The det experiment create command takes the configuration file and the directory containing the model code (the trailing .) and submits the experiment to the Determined AI cluster for execution.
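The same submission can also be scripted. The sketch below builds an experiment configuration as a Python dictionary and hands it to the Python SDK. It assumes the `determined` package is installed, that `determined.experimental.client` exposes `login` and `create_experiment` as described in the SDK documentation, and that a master is reachable at the placeholder address `http://localhost:8080`. The `build_config` and `submit` helpers are hypothetical names, and the configuration field names should be checked against the version you run.

```python
from typing import Any, Dict


def build_config(learning_rate: float, global_batch_size: int, max_batches: int) -> Dict[str, Any]:
    """Build a single-trial experiment configuration as a plain dictionary."""
    return {
        "name": "my_first_experiment",
        "entrypoint": "model_def:MyModel",
        "hyperparameters": {
            "learning_rate": learning_rate,
            "global_batch_size": global_batch_size,
        },
        "searcher": {
            "name": "single",
            "metric": "validation_loss",
            "smaller_is_better": True,
            "max_length": {"batches": max_batches},
        },
        "resources": {"slots_per_trial": 1},
    }


def submit(config: Dict[str, Any], model_dir: str = ".", master: str = "http://localhost:8080"):
    """Submit the experiment and block until it finishes (assumed SDK calls)."""
    # Imported lazily so build_config can be used without the SDK installed.
    from determined.experimental import client

    client.login(master=master)  # credentials are taken from the environment or CLI session
    experiment = client.create_experiment(config=config, model_dir=model_dir)
    return experiment.wait()


config = build_config(learning_rate=0.001, global_batch_size=64, max_batches=100)
# submit(config)  # uncomment once a cluster is running
```

Building the configuration in code makes it easy to generate variants, for example one experiment per dataset, while keeping the model definition unchanged.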
