Deploying Octomized Models

Once you have Octomized and packaged a model using either the web interface or Python SDK, deployment of the optimized model to your target hardware is simple.

Python wheel deployment

Required Dependencies

A Linux x86 machine is required. The following software dependencies are required to run the Octomized model. For Debian/Ubuntu images you must have:

  1. ldd --version shows GLIBC >= 2.27. (For CPU packages, GLIBC >= 2.24 should be sufficient)

  2. sudo apt-get install libssl-dev zlib1g-dev build-essential

  3. sudo apt-get install libffi-dev. Note: if you've compiled your own Python distribution (e.g. through pyenv install), you will need to re-compile it after this step

  4. python >= 3.7 (3.7 recommended)

Additionally, if you have packaged your model on a GPU platform, CUDA 10.2 and Vulkan 1.2.135 are required. On Ubuntu, the Vulkan dependency can be installed by running apt install libvulkan-dev.

Running Model Inference

First, download the Python wheel file produced by the Octomizer. In the SDK, this is done with:

# Octomize the model for the chosen hardware platform, wait for the workflow
# to finish, then download the resulting Python wheel package.
wrkflow = modelvar.octomize(PLATFORM)
wrkflow.wait()
wrkflow.save_package(OUTPUT_DIRECTORY)

This will save a file in the form of <package_name>-0.1.0-py3-none-any.whl, where <package_name> will be the specified package name or, if unspecified, a name derived from the name field of your model.

You can install the wheel into your Python environment using the Python pip command:

$ pip install <package_name>-0.1.0-py3-none-any.whl

Once installed, the model can be imported and invoked from Python as follows:

import <package_name>
import numpy as np

model = <package_name>.OctomizedModel()

Please confirm that the input information for the packaged model is correct with:

idict = model.get_input_dict()
print(idict)
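
The returned dictionary maps each input name to its expected shape and dtype. For a hypothetical single-input image model, the printout might look something like the following (the input name and sizes are illustrative only, not a promise about your model):

{'input_1': {'shape': [1, 3, 224, 224], 'dtype': 'float32'}}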

Now you can provide inputs to run inference on.

Inference for single image input

If you have a model that takes in an image, you can run something like:

import cv2
image_path = <image path here>
image_size = <image size here>       # Please consult `idict` above for value
input_dtype = <input dtype here>     # Please consult `idict` above for value

# If the image is in grayscale, instead of the following line, invoke
# img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
img = cv2.imread(image_path)
img = cv2.resize(img, dsize=(image_size, image_size))

# Note that if you provided an RGB image, `img.shape` will look like
# (image_size, image_size, 3). If the `idict` info you printed above indicates that
# your model expects an input of shape (1, 3, image_size, image_size), you
# should uncomment the following transposition to match the input data to the format
# expected by your model.
#
# img = img.transpose((2, 0, 1))
input_nparr = img.astype(input_dtype)
# The next line assumes that the batch dimension for this image is 1
input_nparr = input_nparr.reshape([1, *input_nparr.shape])

# If you provided a grayscale image, `img.shape` will look like
# (image_size, image_size). If the `idict` info you printed above indicates that
# your model expects an input of shape (1, 1, image_size, image_size), ensure that
# you properly reshape the data to the expected format as follows:
# input_nparr = input_nparr.reshape([1, 1, image_size, image_size])

Note that you will need to adjust the above code depending on how your model’s inputs were pre-processed for training.
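
For example, if your model was trained on inputs scaled to the [0, 1] range and normalized with per-channel mean and standard deviation, a sketch of the additional preprocessing might look like the following (the statistics shown are the common ImageNet values and are illustrative only; substitute whatever your own training pipeline used):

# Hypothetical normalization step -- only apply it if your model was trained this way.
# Assumes `input_nparr` has shape (1, 3, image_size, image_size) after the transpose above.
mean = np.array([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1)
std = np.array([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1)
input_nparr = ((input_nparr / 255.0 - mean) / std).astype(input_dtype)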

Now that you’ve processed your image, you can run your model on the processed inputs with:

outputs = model.run(input_nparr)

At this point please confirm the output is as you would expect. If your model produces multiple outputs, note that the order of outputs is preserved across model formats.

Inference for multiple inputs

This example code runs a model on multiple random NumPy array inputs; please adjust it for your own purposes:

idict = model.get_input_dict()
inputs = []
for iname in idict.keys():
  ishape = idict[iname]["shape"]
  idtype = idict[iname]["dtype"]
  inp = np.random.random(ishape).astype(idtype)
  inputs.append(inp)

# Run the model. If your model has multiple inputs, you may pass multiple inputs
# in the style of *args to the `run` function, or if you prefer, you may provide
# a dict of input name to value in **kwargs style.
outputs = model.run(*inputs)

# Note that the order of outputs is preserved from both onnx and relay formats.
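
If you prefer the keyword style mentioned in the comment above, the same inputs can be passed by name. This short sketch reuses the inputs and idict built above:

# Build a name -> array mapping and pass it to `run` in **kwargs style.
named_inputs = dict(zip(idict.keys(), inputs))
outputs = model.run(**named_inputs)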

The return type of model.run is List[tvm.runtime.ndarray.NDArray]; see the TVM documentation for NDArray for details.

To access the first output of the inference as a NumPy array, you may run:

out = outputs[0].numpy()
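
For example, for a classification model you might inspect the top-scoring class as follows (this sketch assumes the first output holds the class scores, which may not hold for your model):

# Hypothetical post-processing for a classifier: report the highest-scoring class index.
scores = outputs[0].numpy()
predicted_class = int(np.argmax(scores))
print(predicted_class)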

C++ Linux shared object deployment

Required Dependencies

A Linux x86 machine is required. The following software dependencies are required to build and run the Octomized model. For Debian/Ubuntu images you must have:

  1. ldd --version shows GLIBC >= 2.27. (For CPU packages, GLIBC >= 2.24 should be sufficient)

  2. sudo apt-get install libssl-dev zlib1g-dev build-essential libffi-dev

Additionally, if you have packaged your model on a GPU platform, CUDA 10.2 and Vulkan 1.2.135 are required. On Ubuntu, the Vulkan dependency can be installed by running apt install libvulkan-dev.

Running Model Inference

First, download the .tar.gz file produced by the Octomizer. In the SDK, this is done with:

# Octomize the model, requesting a Linux shared object package, wait for the
# workflow to finish, then download the resulting .tar.gz archive.
wrkflow = modelvar.octomize(PLATFORM, package_type=PackageType.LINUX_SHARED_OBJECT)
wrkflow.wait()
wrkflow.save_package(OUTPUT_DIRECTORY)

This will save a file in the form of <package_name>.tar.gz, where <package_name> will be the specified package name or, if unspecified, a name derived from the name field of your model.

Data is passed into and out of the model via tvm::runtime::NDArray; see the TVM documentation for NDArray for details.

Note that the order of outputs is preserved from both ONNX and Relay formats.

Also note that packaged models will fail to run on machines where TVM has previously been installed.

Source code for the sample program is provided in the tar file for your convenience. See the provided README.md for more information about the package's directory layout.

Inference on sample random inputs

The sample program by default runs model inference on random inputs. To check that the model was successfully packaged, you may simply run:

$ tar xvf <package_name>.tar.gz
$ cd <package_name>
$ make sample_program
$ LD_LIBRARY_PATH=$(pwd) ./sample_program

Inference for single image input

First, install libopencv. If you're on Linux, you can do this with sudo apt install libopencv-dev. Otherwise, please follow the install instructions here: https://docs.opencv.org/master/df/d65/tutorial_table_of_content_introduction.html. Remember to run make install after running the cmake build commands if you choose to build from source.

Some modifications to the Makefile are necessary to build sample_program with access to the OpenCV libraries:

  1. Add -lopencv_core to the sample_program target block’s compilation command.

  2. If you built OpenCV from source, you may additionally need to add something like -I/path/to/opencv/include/dir to the compile command for the sample_program target in the Makefile. On Linux, this will look like -I/usr/local/include/opencv4 (the opencv4 directory contains an opencv2 directory that we will reference later).
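
Note that, depending on which OpenCV functions your additions call (the image example below uses cv::imread and cv::resize), you may also need to link -lopencv_imgcodecs and -lopencv_imgproc in the same way.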

Next, you’ll need to make some modifications to the file sample_program.cpp.

  1. Add the header #include <opencv2/opencv.hpp>

  2. To the bottom of the main function, add the following code, which assumes your model takes float-type inputs. If you have not modified sample_program.cpp, adding the code below will cause model inference to run twice: the first time on random inputs, the second on an image of your choosing:

std::string image_path = <image path here>;
int64_t image_size = <image size here>;     // Consult the generated code for input_map
                                            // to determine the correct image size

// If the image is in grayscale, instead of the following line, invoke
// cv::Mat image = cv::imread(image_path, cv::IMREAD_GRAYSCALE);
cv::Mat image = cv::imread(image_path);

// Check for failure
if (image.empty()) {
  std::cout << "Unable to find and read image" << std::endl;
  return -1;
}

cv::Mat dst;
cv::resize(image, dst, cv::Size(image_size, image_size));

std::vector<float> data(dst.begin<uint8_t>(), dst.end<uint8_t>());
std::vector<int64_t> shape = input_shape_for_<input name here>;     // Consult the generated code for the input name

int64_t num_input_elements = 1;
for (int64_t s : shape) {
  num_input_elements *= s;
}
tvm::runtime::NDArray input_arr = tvm_ndarray_from_vec<float>(
    data,
    shape,
    ctx,
    "float32",
    num_input_elements
);
std::vector<tvm::runtime::NDArray> input_vec = { input_arr };

auto image_outputs = model.run(input_vec);

// You can access the data from the output NDArrays with the following:
for (tvm::runtime::NDArray output : image_outputs) {
  int64_t num_output_elements = 1;
  for (int64_t s : output.Shape()) {
    num_output_elements *= s;
  }

  auto dat = static_cast<float*>(output->data);
  for (int i = 0; i < num_output_elements; i++) {
    std::cout << dat[i] << std::endl;
  }
}
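
After making these changes, rebuild and rerun the sample program as before with make sample_program and LD_LIBRARY_PATH=$(pwd) ./sample_program.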