Overview
ClearML is an MLOps platform that provides tools for experiment tracking, pipeline orchestration, data management, and model serving. Developed by Allegro AI, with roots going back to 2017, the platform aims to support the entire machine learning lifecycle from research and development through to production deployment. Its architecture is modular, so users can adopt individual components as needed or use the full suite for integrated MLOps workflows. ClearML offers both a self-hosted community edition and a cloud-based service, which suits organizations of different sizes and requirements.
The platform's core offering, ClearML Experiment Management, enables developers to log, compare, and reproduce machine learning experiments. This includes tracking hyperparameters, metrics, code versions, and environment configurations. This level of detail supports reproducibility and collaboration among data scientists and ML engineers, a point that industry commentary on ML infrastructure (for example, from a16z.com) also emphasizes. ClearML Data adds a versioning system for datasets and links data lineage directly to specific experiments and models. Models are therefore trained on tracked, reproducible data, which helps with model governance and makes it easier to investigate data drift.
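The dataset versioning described above can be sketched with the `Dataset` class from the Python SDK. This is a minimal illustration, not a reference workflow: the project and dataset names passed in are hypothetical, `write_sample_csv` is a helper invented here to produce something to version, and the ClearML calls assume a server connection is already configured.

```python
import csv
from pathlib import Path


def write_sample_csv(folder: Path) -> Path:
    """Write a tiny CSV file so there is something to version (illustrative helper)."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "samples.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["feature", "label"])
        writer.writerows([[0.1, 0], [0.9, 1]])
    return path


def publish_dataset(folder: Path, project: str, name: str) -> str:
    """Create a new dataset version from `folder` and return its ID."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_project=project, dataset_name=name)
    dataset.add_files(path=str(folder))
    dataset.upload()    # push the file contents to storage
    dataset.finalize()  # close this version so it can no longer change
    return dataset.id


def fetch_dataset(project: str, name: str) -> str:
    """Return the path of a local, read-only copy of the latest dataset version."""
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=project, dataset_name=name)
    return dataset.get_local_copy()
```

In a training script, passing the returned ID to `task.connect({"dataset_id": dataset_id})` is one way to record which dataset version an experiment used.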
For automating ML workflows, the ClearML platform includes capabilities for defining and executing ML pipelines. These pipelines can encompass data preprocessing, model training, evaluation, and deployment steps while hiding much of the underlying infrastructure. Pipelines can run in various execution environments, from local machines to distributed cloud compute resources on providers like AWS or Google Cloud. Once models are trained, ClearML Serving handles their deployment into production environments, with options for managing endpoints, monitoring performance, and performing A/B testing or canary rollouts. The Python SDK is the primary interface to the platform, allowing developers to integrate its functionality directly into their existing ML codebases. Together, these components position ClearML as a solution for organizations seeking to operationalize machine learning at scale, from initial model development to continuous integration and delivery (CI/CD) of ML systems.
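As a rough sketch of how such a pipeline might be defined with the SDK, the example below wires two plain Python functions into a `PipelineController`. The pipeline and project names are hypothetical, the two step functions are toy stand-ins for real preprocessing and training, and nothing is started until the caller asks for it.

```python
def preprocess(values):
    """Scale raw values into the [0, 1] range."""
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero for constant input
    return [(v - low) / span for v in values]


def train(data):
    """Stand-in for model training: 'fits' a mean predictor."""
    return sum(data) / len(data)


def build_pipeline(raw_values):
    """Wire the two steps into a ClearML pipeline (not started here)."""
    # Imported lazily so the step functions can be tried without the SDK.
    from clearml import PipelineController

    pipe = PipelineController(
        name="demo-pipeline",     # hypothetical pipeline name
        project="Pipeline Demo",  # hypothetical project name
        version="1.0.0",
    )
    pipe.add_function_step(
        name="preprocess",
        function=preprocess,
        function_kwargs={"values": raw_values},
        function_return=["data"],
    )
    pipe.add_function_step(
        name="train",
        function=train,
        # "${preprocess.data}" refers to the value returned by the previous step
        function_kwargs={"data": "${preprocess.data}"},
        function_return=["model"],
        parents=["preprocess"],
    )
    return pipe
```

Calling `build_pipeline([3, 5, 9]).start_locally(run_pipeline_steps_locally=True)` should run everything on the local machine for debugging, while `start(queue=...)` hands the steps to agents listening on a queue.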
Key features
- Experiment Tracking and Management: Automatically logs code, hyperparameters, metrics, and models for every experiment, providing a centralized dashboard for comparison and reproduction.
- ML Pipeline Orchestration: Defines and automates complex machine learning workflows, including data preparation, model training, and evaluation, across various compute environments.
- Data Versioning for ML: Tracks and versions datasets used in experiments, ensuring data lineage and reproducibility of model training runs.
- Model Serving and Deployment: Facilitates deploying trained models as API endpoints, with features for monitoring, scaling, and managing different model versions in production.
- Resource Management: Manages and allocates compute resources for ML tasks, supporting both on-premises and cloud infrastructures.
- Collaboration Tools: Provides shared dashboards, experiment comparisons, and project management features to enhance team collaboration on ML projects.
- Flexible Deployment Options: Available as a self-hosted community edition or a cloud service, offering deployment flexibility.
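Experiment management and resource management meet when an existing experiment is cloned, given new hyperparameters, and queued for an agent to execute. The sketch below assumes a template task whose hyperparameters were registered with `task.connect()` on a plain dictionary (stored under the "General" section by default) and a queue named "default"; the task ID, the queue name, and the helper names are all placeholders.

```python
from itertools import product


def parameter_grid(space):
    """Expand {'name': [values, ...]} into a list of flat parameter dicts."""
    names = sorted(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def launch_sweep(template_task_id, space, queue_name="default"):
    """Clone the template task once per grid point and enqueue each clone."""
    # Imported lazily so parameter_grid() can be used without the SDK.
    from clearml import Task

    launched = []
    for params in parameter_grid(space):
        clone = Task.clone(source_task=template_task_id, name=f"sweep {params}")
        # Override only the swept values; other parameters keep the template's settings.
        clone.update_parameters({f"General/{k}": v for k, v in params.items()})
        Task.enqueue(clone, queue_name=queue_name)
        launched.append(clone.id)
    return launched
```

An agent serving the chosen queue picks up each clone and reruns the original code with the overridden values, so every grid point appears as a separate experiment for comparison.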
Pricing
ClearML offers a free self-hosted community edition and a tiered cloud pricing model. The cloud service includes different plans based on usage and features.
| Tier | Features | Price/Month |
|---|---|---|
| ClearML Community (Self-hosted) | Full MLOps suite, self-managed infrastructure, open-source. | Free |
| ClearML Cloud - Starter | Basic experiment tracking, pipeline orchestration, limited team access. | $99 |
| ClearML Cloud - Pro | Advanced MLOps features, enhanced security, increased collaboration. | Custom |
| ClearML Cloud - Enterprise | Dedicated support, advanced compliance, custom integrations, white-glove service. | Custom |
For detailed information on specific feature inclusions and current pricing, refer to the official ClearML pricing page.
Common integrations
- TensorFlow/Keras: Automatic logging of experiments, models, and metrics from TensorFlow and Keras training runs. See ClearML TensorFlow Keras integration guide.
- PyTorch: Seamless integration for tracking PyTorch model training, including logging gradients, weights, and losses. Consult the ClearML PyTorch integration documentation.
- Scikit-learn: Record and compare machine learning experiments conducted with Scikit-learn, including model parameters and evaluation metrics. Detailed instructions are available in the ClearML Scikit-learn integration documentation.
- Hugging Face Transformers: Directly track training runs and model artifacts when using Hugging Face's Transformers library. Find more information on the ClearML Hugging Face integration page.
- Jupyter Notebooks/Labs: Automatic integration with interactive development environments for easy experiment logging. See ClearML Jupyter integration documentation.
- AWS SageMaker: Orchestrate and track ML experiments and pipelines executed on AWS SageMaker. Review the ClearML AWS SageMaker integration guide.
Alternatives
- MLflow: An open-source platform for managing the ML lifecycle, focusing on experiment tracking, reproducible runs, and model deployment.
- Weights & Biases: A proprietary platform offering experiment tracking, model optimization, and collaboration tools for machine learning.
- Comet ML: A meta machine learning platform for tracking, comparing, debugging, and optimizing model development.
- Databricks MLflow: The managed version of MLflow within the Databricks Lakehouse Platform, providing integrated ML development and operations.
- Google Cloud Vertex AI: A unified platform for building, deploying, and scaling ML models, integrating various Google Cloud ML services.
Getting started
To begin using ClearML for experiment tracking, install the Python SDK and initialize a task inside your training script. The following Python example logs a simple scikit-learn model training run.
from clearml import Task
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import json
# Initialize a ClearML Task
task = Task.init(project_name='Iris Classification Demo',
                 task_name='RandomForest Classifier Training',
                 output_uri=True,  # Automatically uploads artifacts
                 auto_connect_frameworks=True)  # Automatically connects to frameworks like scikit-learn
# Get the task logger
logger = task.logger
# Simulate hyperparameters
hyperparameters = {
    'n_estimators': 100,
    'max_depth': 10,
    'random_state': 42
}
# Log hyperparameters to ClearML
task.connect(hyperparameters)
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=hyperparameters['random_state'])
# Initialize and train a RandomForestClassifier
model = RandomForestClassifier(n_estimators=hyperparameters['n_estimators'],
                               max_depth=hyperparameters['max_depth'],
                               random_state=hyperparameters['random_state'])
model.fit(X_train, y_train)
# Make predictions and evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# Log metrics to ClearML
logger.report_scalar(title='Accuracy', series='Test Accuracy', value=accuracy, iteration=0)
# Log the trained model as an artifact
task.upload_artifact('trained_model', artifact_object=model)
# Log additional data, e.g., a confusion matrix serialized as JSON
# A more complete example would calculate and log a proper confusion matrix
fake_confusion_matrix = [[10, 0, 0], [0, 9, 1], [0, 1, 9]]
# report_text takes the message as its first argument
logger.report_text('Confusion Matrix: ' + json.dumps(fake_confusion_matrix))
print(f"Model training complete. Test Accuracy: {accuracy:.2f}")
# The task will automatically be marked as completed when the script finishes.
To run this code, ensure you have the clearml and scikit-learn libraries installed (pip install clearml scikit-learn). Before running, set up your ClearML server connection details, either by configuring environment variables (CLEARML_API_HOST, CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY) or by creating a clearml.conf file. Upon execution, this script will create a new experiment in your ClearML project, logging the hyperparameters, the calculated accuracy, and the trained model artifact.
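Before running the script, it can help to confirm that connection details are discoverable at all. The check below is a small sketch rather than part of the SDK: `clearml_config_source` is a hypothetical helper, and it assumes the configuration file lives at `~/clearml.conf` unless the `CLEARML_CONFIG_FILE` environment variable points elsewhere.

```python
import os
from pathlib import Path

# The three environment variables named in the setup instructions
REQUIRED_VARS = (
    "CLEARML_API_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def clearml_config_source(env=None, home=None):
    """Return 'env', 'file', or None depending on where credentials were found."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # All three variables must be set and non-empty to count as configured.
    if all(env.get(name) for name in REQUIRED_VARS):
        return "env"

    # Assumed default location, optionally overridden by CLEARML_CONFIG_FILE.
    config_path = Path(env.get("CLEARML_CONFIG_FILE") or home / "clearml.conf")
    if config_path.expanduser().is_file():
        return "file"
    return None
```

A training script could call `clearml_config_source()` at startup and exit with a clear message when it returns `None`, instead of failing later inside `Task.init`.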