Tutorials

Accelerate a public Keras SavedModel

First make sure you have TensorFlow 2.6 installed and the latest version of OctoML’s SDK:

! pip install tensorflow==2.6
! python3 -m pip install octomizer-sdk --extra-index-url https://octo.jfrog.io/artifactory/api/pypi/pypi-local/simple --upgrade

Import libraries:

from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2

import octomizer.client
import octomizer.models.tensorflow_saved_model as tf_model

Now fetch a public model from Keras’s applications module and save it in the SavedModel format:

model = MobileNetV2(weights='imagenet')

# Calling `save('my_model')` creates a SavedModel folder `my_model`.
model.save("my_model")

# Create a tarball for the model
! tar -czvf my_model.tgz my_model

Upload the Keras SavedModel to OctoML’s Platform:

# Pass your API token below:
client = octomizer.client.OctomizerClient(access_token=MY_ACCESS_TOKEN)

# Upload the model to Octomizer.
model = tf_model.TensorFlowSavedModel(client, name="my_model", model="my_model.tgz")

This model has dynamic inputs, so you’ll need to specify the input shapes before acceleration; otherwise you’ll get an error message saying “Dynamic inputs are not supported.”:

# Check the automatically inferred shapes.
inputs = model.get_uploaded_model_variant().inputs
print(inputs)
input_name = list(inputs[0].keys())[0]

# The command above prints a tuple whose first element maps input names to shapes, e.g. {'input_0:0': [-1, 224, 224, 3]}.
# The -1 in the first dimension means that you need to specify the batch size.

input_shapes = {input_name: [1, 224, 224, 3]} # Notice the -1 has been replaced by 1.
input_dtypes = inputs[1]

Now accelerate the model. By default, the resulting package will be a Python wheel:

wrkflow = model.get_uploaded_model_variant().octomize(
    platform="broadwell",
    input_shapes=input_shapes,
    input_dtypes=input_dtypes)

# Save the workflow uuid somewhere so you can use it to access benchmark metrics or the resulting package later.
print(wrkflow.uuid)

After you receive an email notification about the completion of the acceleration workflow, you can view performance benchmark metrics on the hardware you chose and download a packaged version of the accelerated model, either by visiting the UI or invoking the following code:

# Look up the workflow you previously launched using the workflow id
wrkflow = client.get_workflow("<INSERT WORKFLOW ID>")

# To view benchmark metrics, either visit the UI or invoke something similar to:
if wrkflow.completed() and wrkflow.has_benchmark_stage():
    engine = wrkflow.proto.benchmark_stage_spec.engine
    metrics = wrkflow.metrics()
    print(engine, metrics)
    # Save the resulting Python wheel to the current directory.
    wrkflow.save_package(".")

Accelerate a custom Keras SavedModel

First make sure you have TensorFlow 2.6 installed and the latest version of OctoML’s SDK:

! pip install tensorflow==2.6
! python3 -m pip install octomizer-sdk --extra-index-url https://octo.jfrog.io/artifactory/api/pypi/pypi-local/simple --upgrade

Import libraries:

from tensorflow import keras
import numpy as np
import octomizer.client
import octomizer.models.tensorflow_saved_model as tf_model

Define and save your Keras SavedModel:

def get_model():
    # Create a simple model using Keras.
    inputs = keras.Input(shape=(32,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

model = get_model()

# Train the model.
test_input = np.random.random((128, 32))
test_target = np.random.random((128, 1))
model.fit(test_input, test_target)

# Calling `save('my_model')` creates a SavedModel folder `my_model`.
# Calling keras.models.save_model(model, "my_model") also works.
model.save("my_model")

# Create a tarball for the model
! tar -czvf my_model.tgz my_model

The remaining steps to upload the custom Keras SavedModel, disambiguate inputs, accelerate the model, and view results are the same as the steps specified above for public Keras SavedModels.
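
For reference, here is a condensed sketch of those remaining steps, reusing only the calls shown above; the [1, 32] shape follows from the keras.Input(shape=(32,)) definition, with the inferred -1 batch dimension replaced by 1:

# Upload the tarball created above.
client = octomizer.client.OctomizerClient(access_token=MY_ACCESS_TOKEN)
uploaded_model = tf_model.TensorFlowSavedModel(client, name="my_model", model="my_model.tgz")

# Disambiguate the dynamic batch dimension, then accelerate.
inputs = uploaded_model.get_uploaded_model_variant().inputs
input_name = list(inputs[0].keys())[0]
input_shapes = {input_name: [1, 32]}  # (batch_size, feature_dim) for the Dense model above
input_dtypes = inputs[1]

wrkflow = uploaded_model.get_uploaded_model_variant().octomize(
    platform="broadwell",
    input_shapes=input_shapes,
    input_dtypes=input_dtypes)
print(wrkflow.uuid)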

Accelerate a public TensorFlow GraphDef

First make sure you have TensorFlow 2.6 installed and the latest version of OctoML’s SDK:

! pip install tensorflow==2.6
! python3 -m pip install octomizer-sdk --extra-index-url https://octo.jfrog.io/artifactory/api/pypi/pypi-local/simple --upgrade

Import libraries:

import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

import octomizer.client
import octomizer.models.tensorflow_graph_def_model as tf_model

Now fetch a public model from Keras’s applications module, which we’ll convert to a GraphDef:

model = MobileNetV2(weights='imagenet')

Convert the model to a GraphDef:

full_model = tf.function(lambda x: model(x))
full_model = full_model.get_concrete_function(x=tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# Get frozen ConcreteFunction
frozen_func = convert_variables_to_constants_v2(full_model)
frozen_func.graph.as_graph_def()

# Save frozen graph from frozen ConcreteFunction to hard drive
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir=".",
                  name="myGraphDef.pb",
                  as_text=False)

Upload the GraphDef to OctoML’s Platform:

# Pass your API token below:
client = octomizer.client.OctomizerClient(access_token=MY_ACCESS_TOKEN)

# Upload the model to Octomizer.
model = tf_model.TensorFlowGraphDefModel(client, name="myGraphDef.pb", model="myGraphDef.pb")

This model has dynamic inputs, so you’ll need to specify the input shapes before acceleration; otherwise you’ll get an error message saying “Dynamic inputs are not supported.”:

# Check the automatically inferred shapes.
inputs = model.get_uploaded_model_variant().inputs
print(inputs)
input_name = list(inputs[0].keys())[0]

# The command above prints a tuple whose first element maps input names to shapes, e.g. {'x:0': [-1, 224, 224, 3]}.
# The -1 in the first dimension means that you need to specify the batch size.

input_shapes = {input_name: [1, 224, 224, 3]} # Notice the -1 has been replaced by 1.
input_dtypes = inputs[1]

The remaining steps to accelerate the model and view results are the same as the steps specified above for public Keras SavedModels.
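
From here, the acceleration call is identical to the Keras example (a condensed sketch, reusing the variables defined above):

wrkflow = model.get_uploaded_model_variant().octomize(
    platform="broadwell",
    input_shapes=input_shapes,
    input_dtypes=input_dtypes)
print(wrkflow.uuid)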

Accelerate a custom TensorFlow GraphDef

First make sure you have TensorFlow 2.6 installed and the latest version of OctoML’s SDK:

! pip install tensorflow==2.6
! python3 -m pip install octomizer-sdk --extra-index-url https://octo.jfrog.io/artifactory/api/pypi/pypi-local/simple --upgrade

Import libraries:

import tensorflow as tf
from tensorflow import keras
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2
import numpy as np

import octomizer.client
import octomizer.models.tensorflow_graph_def_model as tf_model

Define the model:

def get_model():
    # Create a simple model using Keras.
    inputs = keras.Input(shape=(32,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

model = get_model()

# Train the model.
test_input = np.random.random((128, 32))
test_target = np.random.random((128, 1))
model.fit(test_input, test_target)

The remaining steps to convert the model to a GraphDef, upload the GraphDef to OctoML, disambiguate inputs, accelerate the model, and view results are the same as the steps specified above for public GraphDef models.
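
For reference, here is a condensed sketch of those remaining steps for this small model, using only the conversion and upload calls shown above; the [1, 32] shape follows from the keras.Input(shape=(32,)) definition:

# Convert the trained Keras model to a frozen GraphDef.
full_model = tf.function(lambda x: model(x))
full_model = full_model.get_concrete_function(x=tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))
frozen_func = convert_variables_to_constants_v2(full_model)
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir=".",
                  name="myGraphDef.pb",
                  as_text=False)

# Upload, disambiguate the batch dimension, and accelerate.
client = octomizer.client.OctomizerClient(access_token=MY_ACCESS_TOKEN)
graph_def_model = tf_model.TensorFlowGraphDefModel(client, name="myGraphDef.pb", model="myGraphDef.pb")
inputs = graph_def_model.get_uploaded_model_variant().inputs
input_name = list(inputs[0].keys())[0]
wrkflow = graph_def_model.get_uploaded_model_variant().octomize(
    platform="broadwell",
    input_shapes={input_name: [1, 32]},
    input_dtypes=inputs[1])
print(wrkflow.uuid)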

End-to-end PyTorch object detection example using ImageNet

First, make sure you have all necessary libraries installed:

! pip install onnxruntime==1.8.0
! pip install onnx==1.9.0
! pip install torch==1.9.0
! python3 -m pip install octomizer-sdk --extra-index-url https://octo.jfrog.io/artifactory/api/pypi/pypi-local/simple --upgrade

Import libraries:

import numpy as np
import octomizer.client
import octomizer.models.onnx_model as onnx_model
import torch.onnx
import torchvision
import urllib

Download a public PyTorch model:

model = torchvision.models.squeezenet1_0(pretrained=True)

Export the PyTorch model to ONNX:

MODEL_NAME = "squeezenet_from_pt"
ONNX_FILE = MODEL_NAME + ".onnx"

# A standard ImageNet input has 3 color channels (RGB) and images of dimension 224x224.
# Image values can be randomized since we only need the network structure to accelerate the model.
rand_input = torch.randn(1, 3, 224, 224)

# OctoML supports ONNX opsets 13 and 11, so make sure to specify these opsets when converting your model.
try:
    torch.onnx.export(model, rand_input, ONNX_FILE, opset_version=13)
except Exception:
    try:
        torch.onnx.export(model, rand_input, ONNX_FILE, opset_version=11)
    except Exception:
        print("Could not export to ONNX using opset 13 or 11")

Upload the model to the OctoML platform:

# Pass your API token below:
client = octomizer.client.OctomizerClient(access_token=MY_ACCESS_TOKEN)

# Upload the model to Octomizer.
model = onnx_model.ONNXModel(client, name=MODEL_NAME, model=ONNX_FILE)

# Check the automatically inferred shapes.
print(model.get_uploaded_model_variant().inputs)

We’ve confirmed that this model has static shapes (1,3,224,224), so there’s no action needed to disambiguate the shapes. Models with dynamic shapes would have negative values in the dynamic dimensions; for instance, a shape of (-1,3,224,224) means that the first dimension (usually batch size) is dynamic and needs to be disambiguated by the user before acceleration. An example can be found here: https://app.octoml.ai/docs/tutorials.html#accelerate-a-public-keras-savedmodel
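
If the inferred shapes had contained a -1 (say, a batch-dynamic variant of this model), you would disambiguate them exactly as in the Keras tutorial above before accelerating. A minimal sketch for that hypothetical case:

inputs = model.get_uploaded_model_variant().inputs
input_name = list(inputs[0].keys())[0]
input_shapes = {input_name: [1, 3, 224, 224]}  # replace the -1 batch dimension with 1
input_dtypes = inputs[1]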

Now accelerate the model. By default, the resulting package will be a Python wheel:

wrkflow = model.get_uploaded_model_variant().octomize(platform="broadwell")

# Dynamically shaped models would require `input_shapes` and `input_dtypes` as additional parameters in the octomize() call.

# Save the workflow uuid somewhere so you can use it to access benchmark metrics or the resulting package later.
print(wrkflow.uuid)

# Also save the MODEL_NAME you used somewhere because you will need to call `import <MODEL_NAME>` after you download the resulting package,
# unless you set a custom package name per https://app.octoml.ai/docs/api/octomizer/octomizer.html?highlight=package#octomizer.model_variant.ModelVariant.octomize
print(MODEL_NAME)

After you receive an email notification about the completion of the acceleration workflow, you can view performance benchmark metrics on the hardware you chose and download a packaged version of the accelerated model, either by visiting the UI or invoking the following code:

# Look up the workflow you previously launched using the workflow id
wrkflow = client.get_workflow("<INSERT WORKFLOW ID>")

# To view benchmark metrics, either visit the UI or invoke something similar to:
if wrkflow.completed() and wrkflow.has_benchmark_stage():
    engine = wrkflow.proto.benchmark_stage_spec.engine
    metrics = wrkflow.metrics()
    print(engine, metrics)
    # Save the resulting Python wheel to the current directory.
    wrkflow.save_package(".")

Install the wheel generated by OctoML:

! pip install squeezenet_from_pt-0.1.0-py3-none-any.whl

To test the accelerated model, download a publicly available image from PyTorch:

# Download a picture of a Samoyed dog from PyTorch
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")

try:
    urllib.URLopener().retrieve(url, filename)
except Exception:
    try:
        urllib.request.urlretrieve(url, filename)
    except Exception:
        print("Could not download the image")


# Use boilerplate image processing code from PyTorch-- see https://pytorch.org/hub/pytorch_vision_squeezenet/
from PIL import Image
from torchvision import transforms

input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the PyTorch model

Run the accelerated model on the image:

# The module name defaults to the name of the model, if the model has alphanumeric characters only.
# You could also have customized this package name when creating an acceleration workflow.
import squeezenet_from_pt

best_model = squeezenet_from_pt.OctomizedModel()
outputs = best_model.run(input_batch.numpy()) # Run the accelerated model

# The accelerated model outputs a tvm.nd.NDArray object with shape (1,1000),
# with confidence scores over ImageNet's 1000 classes. You can convert the
# output to a numpy array by calling .numpy().

# Find the index of the ImageNet label predicted with highest probability.
pred = np.argmax(outputs[0].numpy())

# Download ImageNet labels
! wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

# Read the labels in ImageNet and print out the label predicted with highest probability
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]
    print("Accelerated model detects object: " + categories[pred])

Finally, check that the original, unaccelerated PyTorch model shows the same output for the given image:

# Run the original PyTorch model
with torch.no_grad():
    orig_out = model(input_batch)

# The original output has unnormalized scores. To get probabilities, you can run a softmax on it.
orig_probs = torch.nn.functional.softmax(orig_out[0], dim=0)

# Read the labels in ImageNet and print out the label predicted with highest probability
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]
    orig_pred = np.argmax(orig_probs.numpy())
    print("Unaccelerated model detects object: " + categories[orig_pred])

End-to-end Transformers question answering example

First install all necessary libraries:

! pip install transformers
! pip install onnxruntime==1.8.0
! python3 -m pip install octomizer-sdk --extra-index-url https://octo.jfrog.io/artifactory/api/pypi/pypi-local/simple --upgrade

Then export a large BERT model, finetuned on the SQuAD question answering dataset, to ONNX using the transformers conversion tool:

! python -m transformers.onnx --model=bert-large-uncased-whole-word-masking-finetuned-squad --feature=question-answering onnx/

Now upload the model to the OctoML platform:

import octomizer.client
import octomizer.models.onnx_model as onnx_model

# Pass your API token below:
client = octomizer.client.OctomizerClient(access_token="<INSERT ACCESS TOKEN>")

# Upload the ONNX model
MODEL_NAME = "bert_squad_qa"
ONNX_FILE = "onnx/model.onnx"
accel_model = onnx_model.ONNXModel(client, name=MODEL_NAME, model=ONNX_FILE)

# Check the automatically inferred input shapes.
inputs = accel_model.get_uploaded_model_variant().inputs
print(inputs)

The command above prints ({'input_ids': [-1, -1], 'attention_mask': [-1, -1], 'token_type_ids': [-1, -1]}, {'input_ids': 'int64', 'attention_mask': 'int64', 'token_type_ids': 'int64'}).

Notice that the input shapes printed above have negative values, which means they are dynamic and need to be disambiguated. For transformer models, the inputs input_ids, attention_mask, and token_type_ids need to have the same shape: [batch_size, maximum_sequence_length].

In this example, we will specify a batch size of 1 and maximum sequence length of 128. input_ids indicate the IDs of the tokens (words or subwords) in an input sequence. attention_mask is a binary tensor indicating which indices are padded so that the model does not attend to them. token_type_ids represents a binary mask identifying the two types of sequence in the model (question or context):

input_shapes = {'input_ids': [1, 128], 'attention_mask': [1, 128], 'token_type_ids': [1, 128]}
input_dtypes = inputs[1] # Use the input data types that OctoML automatically inferred

OctoML delivers the best performance on Transformer-based models on CPU targets through optimized use of ONNX Runtime and packaging, which you get by calling benchmark(). For GPU targets, we recommend TVM-based acceleration by calling octomize() instead (a sketch of that variant follows the code block below). For this CPU target, launch a benchmark workflow:

wrkflow = accel_model.get_uploaded_model_variant().benchmark(
    platform="broadwell",
    input_shapes=input_shapes,
    input_dtypes=input_dtypes,
    create_package=True)

# Save the workflow uuid somewhere so you can use it to access benchmark metrics or the resulting package later.
print(wrkflow.uuid)

# Also save the MODEL_NAME you used somewhere because you will need to call `import <MODEL_NAME>` after you download the resulting package,
# unless you set a custom package name per https://app.octoml.ai/docs/api/octomizer/octomizer.html?highlight=package#octomizer.model_variant.ModelVariant.octomize
print(MODEL_NAME)
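
For a GPU target, the equivalent call uses octomize() instead. A minimal sketch; the platform string below is a placeholder, so substitute a GPU target name returned by client.get_hardware_targets():

wrkflow = accel_model.get_uploaded_model_variant().octomize(
    platform="<INSERT GPU PLATFORM>",
    input_shapes=input_shapes,
    input_dtypes=input_dtypes)
print(wrkflow.uuid)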

After you receive an email notification about the completion of the acceleration workflow, you can view performance benchmark metrics on the hardware you chose and download a packaged version of the accelerated model, either by visiting the UI or invoking the following code:

# Look up the workflow you previously launched using the workflow id
wrkflow = client.get_workflow("<INSERT WORKFLOW ID>")

# To view benchmark metrics, either visit the UI or invoke something similar to:
if wrkflow.completed() and wrkflow.has_benchmark_stage():
    engine = wrkflow.proto.benchmark_stage_spec.engine
    metrics = wrkflow.metrics()
    print(engine, metrics)
    # Save the resulting Python wheel to the current directory.
    wrkflow.save_package(".")

Install the wheel generated by OctoML:

! pip install bert_squad_qa-0.1.0-py3-none-any.whl

Import the accelerated model:

import bert_squad_qa

best_model = bert_squad_qa.OctomizedModel()

Now set up a sample input for the accelerated model:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
question, context = "What are some example applications of BERT?", "…BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications."
encoded_input = tokenizer.encode_plus(question, context, return_tensors="np")

Run the accelerated model:

start_scores, end_scores = best_model.run(*encoded_input.values())

Now let’s interpret the results:

import numpy as np
input_ids = encoded_input['input_ids']
tokens = tokenizer.convert_ids_to_tokens(input_ids.squeeze())

# Find the tokens with the highest `start` and `end` scores.
answer_start = np.argmax(start_scores)
answer_end = np.argmax(end_scores)

# Combine the tokens in the answer and print it out.
answer = ' '.join(tokens[answer_start:answer_end+1])
print(answer)

The question we asked the accelerated model was “What are some example applications of BERT?” The model answered “… bert model can be fine ##tu ##ned with just one additional output layer to create state - of - the - art models for a wide range of tasks , such as question answering and language inference.”

Supported Hardware

To get the supported hardware targets available for acceleration, you can use the following code:

import octomizer.client

# Pass your API token below:
client = octomizer.client.OctomizerClient(access_token=MY_ACCESS_TOKEN)

# Get the list of available hardware targets
targets = client.get_hardware_targets()
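
To see what came back, you can print each entry (a minimal sketch; the fields each target carries are summarized below):

# Print each hardware target to inspect its fields.
for target in targets:
    print(target)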

Each hardware target in the list includes information such as the display name, platform, vendor name, number of vCPUs, architecture name, and the model runtimes supported on that target (e.g. TVM, ONNX Runtime). To accelerate a model on the Intel Cascade Lake target (AWS c5.12xlarge), whose platform name is aws_c5.12xlarge, pass that name as the platform parameter to the octomize function:

# Assuming the model has already been uploaded and its inputs have been inferred
wrkflow = model.get_uploaded_model_variant().octomize(platform="aws_c5.12xlarge")

After your model has been benchmarked, the lscpu information will be included in the benchmark results:

result = wrkflow.result()
print(result.benchmark_result.lscpu_output)

The output will look something like this:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Stepping:            7
CPU MHz:             1941.374
BogoMIPS:            5999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-47
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni

End-to-End: Optimizing a PyTorch Model, Packaging It in Triton, and Pushing to AWS ECR

First, let’s make sure we have all the right packages installed:

! pip install onnxruntime==1.8.0
! pip install onnx==1.9.0
! pip install torch==1.9.0
! python3 -m pip install octomizer-sdk --extra-index-url https://octo.jfrog.io/artifactory/api/pypi/pypi-local/simple --upgrade

Import libraries:

import numpy as np
import octomizer.client
import octomizer.models.onnx_model as onnx_model
import torch.onnx
import torchvision.models
import urllib

Download a public PyTorch model:

model = torchvision.models.squeezenet1_0(pretrained=True)

Export the PyTorch model to ONNX:

MODEL_NAME = "squeezenet_from_pt"
ONNX_FILEPATH = MODEL_NAME + ".onnx"

# A standard ImageNet input has 3 color channels (RGB) and images of dimension 224x224.
# Image values can be randomized since we only need the network structure to accelerate the model.
rand_input = torch.randn(1, 3, 224, 224)

# OctoML supports ONNX opsets 13 and 11, so make sure to specify these opsets when converting your model.

try:
    torch.onnx.export(model, rand_input, ONNX_FILEPATH, opset_version=13)
except Exception:
    try:
        torch.onnx.export(model, rand_input, ONNX_FILEPATH, opset_version=11)
    except Exception:
        print("Could not export to ONNX using opset 13 or 11")

Upload the model to the OctoML platform:

# Pass your API token below:
client = octomizer.client.OctomizerClient(access_token=MY_ACCESS_TOKEN)

# Upload the model to Octomizer.
model = onnx_model.ONNXModel(client, name=MODEL_NAME, model=ONNX_FILEPATH)

# Check the automatically inferred shapes.
print(model.get_uploaded_model_variant().inputs)

Now we’ll accelerate the model for an AWS c5n.xlarge instance. This step is likely to take several hours. To test a faster flow, set kernel_trials=0 in the octomize() call (see the sketch after the code block below):

wrkflow = model.get_uploaded_model_variant().octomize(platform="aws_c5n.xlarge")

# Dynamically shaped models would require `input_shapes` and `input_dtypes` as additional parameters in the octomize() call.

# Save the workflow uuid somewhere so you can use it to access benchmark metrics or the resulting package later.
print(wrkflow.uuid)
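
If you just want to exercise the flow quickly, you can launch a faster, less-optimized run as mentioned above (a sketch setting kernel_trials=0 in the octomize() call):

# Faster, less-optimized run for testing the end-to-end flow.
wrkflow = model.get_uploaded_model_variant().octomize(
    platform="aws_c5n.xlarge",
    kernel_trials=0)
print(wrkflow.uuid)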

After you receive an email notification about the completion of the acceleration workflow, you can view performance benchmark metrics on the hardware you chose.

You can also see the options available for packaging. In this tutorial, we focus on the "docker_build_triton" package:

# Look up the workflow you previously launched using the workflow id
wrkflow = client.get_workflow("<INSERT WORKFLOW ID>")

# To view benchmark metrics, either visit the UI or invoke something similar to:
if wrkflow.completed() and wrkflow.has_benchmark_stage():
    engine = wrkflow.proto.benchmark_stage_spec.engine
    metrics = wrkflow.metrics()
    print(engine, metrics)

    # Here we print the package types that are available. Notice that Triton is among them.
    packages = wrkflow.proto.status.result.package_result
    print(packages)

Once we’ve confirmed that we have a Docker package available, let’s take a moment to prepare the destination for our package: AWS ECR. For more detail on setting up ECR, see https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html.

Start by logging into your ECR registry using the AWS CLI:

! aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

Now that we’ve logged in, we need to specify the repository name and tag that we’re going to use for the model container image. If you haven’t already created a destination repository in ECR, you’ll need to do so first; see https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html:

# Create the repository in ECR:
! aws ecr create-repository --repository-name $MODEL_NAME --profile [PROFILE_NAME]

# Create a local mirror for that repository, and also specify the tag
mirror_repository_name = '[AWS_ACCOUNT_ID].dkr.ecr.[REGION].amazonaws.com/' + MODEL_NAME
tag = 'v1'

print(mirror_repository_name + ":" + tag)

Now we are ready to build an image! To do so, we call the docker_build_triton() method on the workflow we just completed, passing in the repository name and tag:

wrkflow.docker_build_triton(mirror_repository_name+":"+tag)

Now that we’ve built the image, let’s confirm that we have it in our local registry:

! docker images

Finally, let’s push that image to AWS ECR and confirm that we’ve succeeded. Note that this image is about 12 GB, so the first push will take a while:

! docker push [AWS_ACCOUNT_ID].dkr.ecr.[REGION].amazonaws.com/squeezenet_from_pt:v1

Now that we’ve pushed this image, let’s confirm that it’s in ECR:

! aws ecr list-images --repository-name $MODEL_NAME --profile [PROFILE_NAME]

The optimized model container is now in AWS ECR and ready to be pulled wherever you need it!
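
For example, a deployment host that has authenticated against the same registry can pull the image back down (a minimal sketch using the same placeholders as above):

! docker pull [AWS_ACCOUNT_ID].dkr.ecr.[REGION].amazonaws.com/squeezenet_from_pt:v1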