Welcome to OctoML’s documentation!

The OctoML Platform is a machine learning model optimization and deployment service powered by octoml.ai. This documentation provides an overview of how to use the OctoML Platform via the web interface as well as the Python SDK. To use our programmatic tools, you must first create an access token on the Settings tab of the Account page in the OctoML UI.
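The example scripts in this documentation pass the token as `MY_ACCESS_TOKEN`. Rather than pasting it into a script, you could read it from an environment variable. The sketch below assumes a variable named `OCTOMIZER_API_TOKEN`. That name and the `load_access_token` helper are hypothetical choices for illustration, not something the platform requires.

```python
import os


def load_access_token(var_name: str = "OCTOMIZER_API_TOKEN") -> str:
    """Return the access token stored in an environment variable.

    The default variable name is a hypothetical convention, not one
    defined by the OctoML SDK.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail early with a hint about where the token comes from.
        raise RuntimeError(
            f"Set {var_name} to the access token created on the Settings tab "
            "of the Account page in the OctoML UI."
        )
    return token


# Usage: MY_ACCESS_TOKEN = load_access_token()
```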

Example script to prepare your model for deployment

Here is a simple Python program that creates a project, uploads a statically shaped ONNX model to the OctoML Platform, and starts a workflow that optimizes the model and builds a Python package you can download later:

from octomizer import client, project, workflow
from octomizer.models import onnx_model

# Pass your API token below:
client = client.OctomizerClient(access_token=MY_ACCESS_TOKEN)

# Create a Project.
my_project = project.Project(client, name="My Project")

# Specify the model file to upload.
model_file = "mnist.onnx"

# Upload the model to OctoML.
model = onnx_model.ONNXModel(
    client,
    name=model_file,
    model=model_file,
    project=my_project,
)

# Optimize the model. By default, the resulting package will be a Python wheel.
wrkflow = model.get_uploaded_model_variant().accelerate(platform="broadwell")

# Save the workflow uuid somewhere so you can use it to access benchmark metrics or the resulting package later.
print(wrkflow.uuid)
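One way to save the workflow uuid somewhere is to write it to a small local JSON file that a follow-up script can read. The file name `octoml_workflows.json` and the helpers `record_workflow` and `latest_workflow` are hypothetical; the sketch only assumes that the workflow id can be represented as a string.

```python
import json
from pathlib import Path

# Hypothetical local file used to remember launched workflow ids.
RECORD_FILE = Path("octoml_workflows.json")


def record_workflow(model_name: str, workflow_id: str, path: Path = RECORD_FILE) -> None:
    """Append a workflow id to the list stored under the given model name."""
    records = json.loads(path.read_text()) if path.exists() else {}
    records.setdefault(model_name, []).append(workflow_id)
    path.write_text(json.dumps(records, indent=2))


def latest_workflow(model_name: str, path: Path = RECORD_FILE) -> str:
    """Return the most recently recorded workflow id for a model."""
    records = json.loads(path.read_text())
    return records[model_name][-1]


# Usage: record_workflow("mnist.onnx", workflow_id) after launching,
# then later pass latest_workflow("mnist.onnx") to client.get_workflow(...).
```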

After you receive an email notification about the completion of the acceleration workflow, you can view performance benchmark metrics on the hardware you chose and download a packaged version of the accelerated model, either by visiting the UI or invoking the following code:

# Look up the workflow you previously launched using the workflow id.
wrkflow = client.get_workflow("<INSERT WORKFLOW ID>")

if wrkflow.completed():
    # To view benchmark metrics, either visit the UI or invoke something similar to:
    if wrkflow.has_benchmark_stage():
        engine = wrkflow.proto.benchmark_stage_spec.engine
        metrics = wrkflow.metrics()
        print(engine, metrics)

    # Save the resulting Python wheel to the current directory.
    wrkflow.save_package(".")
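As an alternative to waiting for the email notification, a script could poll until the workflow reports completion. The `wait_until` helper below is a generic, hypothetical sketch: it accepts any zero-argument callable that returns True when the work is done, and it gives up after a timeout so a workflow that never completes does not block forever. With the SDK, one assumed way to get fresh status on each check is to look the workflow up again, e.g. `wait_until(lambda: client.get_workflow(workflow_id).completed(), interval_s=60)`; verify that pattern against the SDK reference before relying on it.

```python
import time
from typing import Callable


def wait_until(
    is_done: Callable[[], bool],
    timeout_s: float = 3600.0,
    interval_s: float = 30.0,
) -> bool:
    """Poll is_done() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        # Never sleep past the deadline.
        time.sleep(max(0.0, min(interval_s, deadline - time.monotonic())))
```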

To save a different package type, execute something similar to:

from octomizer.package_type import PackageType
wrkflow.save_package(".", package_type=PackageType.LINUX_SHARED_OBJECT)

See available hardware targets

To see a list of available hardware platforms on which you can package and accelerate your model, you may execute:


The command above outputs a list of objects like the following:

[display_name: "Intel Cascade Lake (AWS c5.12xlarge)"
available_by_default: true
platform: "aws_c5.12xlarge"
vendor_name: "Intel"
vcpu_count: 48
hardware_provider_name: "AWS"
hardware_provider_target_name: "c5.12xlarge"
architecture_name: "Cascade Lake"
supported_model_runtimes: TVM
supported_model_runtimes: ONNXRUNTIME
supported_model_runtimes: TENSORFLOW_RUNTIME
supported_model_runtimes: TFLITE_RUNTIME
, ...]

When preparing a model for deployment, you will need the “platform” string of the relevant hardware object listed above (e.g. “aws_c5.12xlarge”). This is the value you pass as the platform argument, as in accelerate(platform="broadwell") in the first example.
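With many targets available, it can help to filter the listing before picking a platform string. The sketch below does not call the SDK. It assumes each hardware object has already been converted into a plain dictionary using the field names shown in the listing (if the objects are protobuf messages, as the output format suggests, `google.protobuf.json_format.MessageToDict` with `preserving_proto_field_name=True` is one possible route). The helper name `platforms_supporting` and the second target entry are hypothetical.

```python
from typing import Iterable, List, Mapping


def platforms_supporting(
    targets: Iterable[Mapping], runtime: str, min_vcpus: int = 0
) -> List[str]:
    """Return platform strings of targets that list `runtime` and have at least min_vcpus vCPUs."""
    return [
        t["platform"]
        for t in targets
        if runtime in t.get("supported_model_runtimes", [])
        and int(t.get("vcpu_count", 0)) >= min_vcpus
    ]


# Example data: the first entry mirrors the listing above; the second is hypothetical.
targets = [
    {
        "display_name": "Intel Cascade Lake (AWS c5.12xlarge)",
        "platform": "aws_c5.12xlarge",
        "vcpu_count": 48,
        "supported_model_runtimes": ["TVM", "ONNXRUNTIME", "TENSORFLOW_RUNTIME", "TFLITE_RUNTIME"],
    },
    {
        "display_name": "Hypothetical small target",
        "platform": "example_platform",
        "vcpu_count": 4,
        "supported_model_runtimes": ["TVM"],
    },
]

print(platforms_supporting(targets, "ONNXRUNTIME", min_vcpus=8))  # ['aws_c5.12xlarge']
```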

Please contact us if you have any questions or problems.