How to convert Pytorch model to CoreML

10 minute read

Published:

In this article, I will discuss how to convert a PyTorch model into a deployable format for on-device inference.

Model Creation and Conversion for On-Device Deployment

Welcome back to our series on deploying machine learning models to end-user devices. In this article, we delve into CoreML and ONNX, two pivotal technologies in this realm. Our goal is to create a PyTorch model designed for sequence prediction and then convert it for use with CoreML and ONNX.

Understanding CoreML and ONNX

What is CoreML?

CoreML is Apple’s framework for integrating machine learning models into iOS, macOS, watchOS, and tvOS apps. It’s designed for optimal performance and is capable of running various model types, including neural networks, decision trees, and support vector machines.

What is ONNX?

ONNX (Open Neural Network Exchange) is an open format used to represent machine learning models. It enables models to be used across multiple platforms and frameworks, aiding in flexibility and interoperability.

If your deployment is exclusively for Apple devices, CoreML is an excellent choice. However, for cross-platform deployment encompassing Apple, Windows, and Android, ONNX is a versatile option, providing broad compatibility beyond the scope of Apple’s CoreML. In order to make this series more comprehensive, we will play with both CoreML and ONNX.

Building a PyTorch Model for Sequence Prediction

To keep this series on end-to-end model deployment clear and focused, we'll work with a simple machine learning task: building a model that classifies the integer part of an array's sum divided by 20 into one of three categories: 0, 1, or 2. Specifically, the model is trained on data generated by the formula y = int(sum(X) / 20), where X has 6 elements and max(X) < 10. For example, if the input is [5, 1, 2, 3, 2, 1], the sum is 14 and 14 / 20 rounds down to 0, so the model should predict '0'. We treat this as a classification problem.
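The labeling rule is easy to verify by hand. Below is a tiny, hypothetical helper (not part of the training code that follows) that reproduces the formula:

    import numpy as np

    def label_for(x):
        # y = int(sum(X) / 20); six values below 10 sum to under 60, so y is 0, 1, or 2
        return int(np.sum(x) / 20)

    print(label_for([5, 1, 2, 3, 2, 1]))  # sum = 14 -> 0
    print(label_for([9, 9, 9, 9, 9, 9]))  # sum = 54 -> 2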

  1. Create a conda environment and install dependencies
  • conda create -n on-device python==3.10

  • conda activate on-device

  • pip install torch==2.1.0 onnx==1.14.1 onnxruntime==1.16.3 coremltools==7.1

  2. Import libraries and check dependencies

    import torch
    import torch.nn as nn
    import torch.optim as optim
    import numpy as np
    import onnx
    import onnxruntime as ort
    from onnxruntime.training import artifacts
    import torch.nn.functional as F
    import coremltools as ct

    print(torch.__version__)  # 2.1.0
    print(onnx.__version__)   # 1.14.1
    print(ort.__version__)    # 1.16.3
    print(ct.__version__)     # 7.1

  3. Create a function to generate model training data

    import numpy as np

    def generate_training_data(data_size):
        # Generate random floats in [0, 10) for the input data X
        X = np.random.uniform(0.0, 10.0, (data_size, 6))

        # Compute the labels: y = int(sum(X) / 20) for each row
        y = (np.sum(X, axis=1) / 20)
        y = y.astype(int)

        return torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.long)
    

    # Parameters
    data_size = 500  # Number of samples to generate

    # Generate data
    X, y = generate_training_data(data_size)

    # Train/test split
    X_train = X[:400]
    y_train = y[:400]
    X_test = X[400:]
    y_test = y[400:]

    # The first 4 samples (X[:4], y[:4]) look like:
    # tensor([[8.3398, 0.7825, 3.4119, 3.5665, 5.3777, 3.2396],
    #         [0.1323, 6.1142, 1.3378, 0.1998, 2.3530, 2.1843],
    #         [1.5627, 3.9146, 6.3159, 7.3567, 6.9889, 9.6894],
    #         [1.4717, 0.0659, 3.8867, 5.7530, 5.5593, 9.9149]])
    # tensor([1, 0, 1, 1])

  4. Create a simple LSTM model for the classification task

    class LSTMNumberPredictor(nn.Module):
        def __init__(self, num_classes, hidden_dim, num_layers):
            super(LSTMNumberPredictor, self).__init__()
            self.hidden_dim = hidden_dim
            self.num_layers = num_layers

            # LSTM layer
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_dim,
                                num_layers=num_layers, batch_first=True)

            # Fully connected layer
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, x):
            # Initialize hidden and cell states for the first input
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim)

            # Forward propagate LSTM
            out, _ = self.lstm(x, (h0, c0))  # out: (batch_size, seq_length, hidden_dim)
            # Alternative: out, _ = self.lstm(x)  # zero states are the default

            # Decode the hidden state of the last time step
            out = self.fc(out[:, -1, :])
            return out

    # Example usage
    model = LSTMNumberPredictor(num_classes=3, hidden_dim=50, num_layers=1)
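    Before training, a quick optional sanity check (a sketch, assuming the definitions above) confirms that the model maps a batch of shape (batch_size, 6, 1) to one logit per class:

    # Untrained model, random batch of 4 sequences of length 6
    dummy = torch.randn(4, 6, 1)  # (batch_size, seq_length, input_size)
    print(model(dummy).shape)     # torch.Size([4, 3])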

  5. Train the PyTorch model

    learning_rate = 0.001
    num_epochs = 300  # Number of epochs for training

    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    # Reshape input for the LSTM: (batch_size, seq_length, input_size)
    X_train = X_train.unsqueeze(-1)
    X_test = X_test.unsqueeze(-1)

    # Training loop
    for epoch in range(num_epochs):
        model.train()
        optimizer.zero_grad()

        # Forward pass
        outputs = model(X_train)
        loss_train = criterion(outputs, y_train)

        # Evaluate on the held-out split (no gradients needed)
        with torch.no_grad():
            outputs_test = model(X_test)
            loss_test = criterion(outputs_test, y_test)

        # Backward and optimize
        loss_train.backward()
        optimizer.step()

        if (epoch + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], '
                  f'Training Loss: {loss_train.item():.4f}, '
                  f'Testing Loss: {loss_test.item():.4f}')

The training loop prints the training and testing loss every 100 epochs. Once the trained PyTorch model produces a satisfactory loss, we can convert it to either the CoreML or the ONNX format.
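To make "satisfactory" concrete, a quick accuracy check on the held-out split (a sketch, not part of the original training code) can be run after the loop:

# Optional: measure accuracy on the test split
model.eval()
with torch.no_grad():
    preds = model(X_test).argmax(dim=1)
print(f"Test accuracy: {(preds == y_test).float().mean().item():.2%}")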

Convert PyTorch model to CoreML

Use coremltools to convert the model. You need to specify the input shapes the model expects and trace the PyTorch model. “Tracing” a model in the context of machine learning, particularly in PyTorch, refers to a process of converting a dynamic neural network into a static computational graph. This concept is crucial in optimizing models for deployment, including conversion to different formats like CoreML or ONNX.

Tracing a PyTorch model involves running the model with a sample input tensor and recording the operations performed during this forward pass. This process creates a static representation (a traced graph) of the model. The traced graph is a sequence of operations as they were executed with the provided input, essentially capturing the model’s behavior in a fixed state.

Let's trace our trained model:

# Set model status to evaluation 
model.eval()

# Trace the model with random data.
example_input_for_trace = X_train[:1]

traced_model = torch.jit.trace(model, (example_input_for_trace, ))
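A quick check (assuming the variables above) confirms that the traced graph reproduces the original model's output:

# The traced model should agree with the original on the trace input
with torch.no_grad():
    assert torch.allclose(model(example_input_for_trace),
                          traced_model(example_input_for_trace))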

When the traced model is ready, we can convert it to CoreML:

import coremltools as ct

# Convert to Core ML program using the Unified Conversion API.
model_ct = ct.convert(
    traced_model,
    # convert_to="mlprogram",
    # compute_precision=ct.precision.FLOAT32,
    inputs=[ct.TensorType(shape=example_input_for_trace.shape)],
)

# check names of input and output
print(model_ct.input_description, model_ct.output_description) 
# you may see Features(x), Features(linear_0)
  • ct.convert: This function from the CoreMLTools library (ct is commonly used as an abbreviation for coremltools) is used to convert models from different frameworks (like PyTorch) to the CoreML format.

  • traced_model: This is the PyTorch model that has been converted to a traced model. Tracing is a process in PyTorch where a dynamic computation graph (like those used in PyTorch) is converted into a static graph. This is typically done using torch.jit.trace.

  • inputs=[ct.TensorType(shape=example_input_for_trace.shape)]: This specifies the input type and shape for the CoreML model. ct.TensorType is used to define the data type and shape of the input tensor. The shape is obtained from example_input_for_trace.shape, which should be a tensor representing a typical input to the model. This information is crucial for CoreML to understand how to handle inputs for the model.
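If the auto-generated names (x, linear_0) are inconvenient, coremltools also accepts a name on the input type; the snippet below is a sketch assuming the same traced model:

# Optional: name the input at conversion time
model_ct_named = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="seq_input", shape=example_input_for_trace.shape)],
)
print(model_ct_named.input_description)  # should now show seq_input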

Now you can use the CoreML model for inference. Use the input and output names shown by model_ct.input_description and model_ct.output_description to query the model:

example_input_test_trace = X_train[2:3]

# test inference using converted coreml model
coreml_pred = model_ct.predict({"x": example_input_test_trace.numpy()})['linear_0']
np.argmax(coreml_pred, axis=1)

# save the converted coreml model
model_ct.save("lstm_model.mlpackage")
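As a sanity check (a sketch, assuming the variables above), the converted model should agree with the original PyTorch model on the same sample:

# Compare CoreML and PyTorch predictions on one sample
with torch.no_grad():
    torch_pred = model(example_input_test_trace).argmax(dim=1).numpy()
print(torch_pred, np.argmax(coreml_pred, axis=1))  # the two should match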

Convert PyTorch model to ONNX

To convert the PyTorch model to ONNX, we first define names for the input and output layers of the ONNX model; these names identify the layers in the exported graph. Here, the input layer is named "seq_input" and the output layer "my_output":

# Define input / output names
input_names = ["seq_input"]
output_names = ["my_output"]

Now, we can do the conversion:

torch.onnx.export(model,
                  (example_input_for_trace,),
                  "lstm_model.onnx",
                  verbose=False,
                  input_names=input_names,
                  output_names=output_names,
                  dynamic_axes={'seq_input' : {0: 'batch'},    # variable length axes
                                'my_output' : {0: 'batch'}}
                 )
  • torch.onnx.export: This is the main function used for the conversion. It takes the model and other parameters to perform the conversion.

  • model: The trained PyTorch model that you are converting.

  • (example_input_for_trace,): A sample input for the model. This is used to trace the model’s operations. It’s wrapped in a tuple, which is indicated by the comma.

  • "lstm_model.onnx": The filename where the ONNX model will be saved. In this case, the model will be saved as lstm_model.onnx.

  • verbose=False: This parameter controls the verbosity of the export process. Setting it to False means that detailed logging of the conversion process will be suppressed.

  • input_names and output_names: These are the lists you defined earlier, setting the names for the input and output layers in the ONNX model.

  • dynamic_axes: This is a dictionary that specifies which axes of the input and output tensors are dynamic. Here, {0: 'batch'} indicates that the first dimension (axis 0) of both seq_input and my_output is a batch dimension whose size can change. This matters for models that must accept varying batch sizes; a short demonstration appears at the end of this article.

The code effectively converts the PyTorch model into the ONNX format while also setting up the names and dynamic behavior (such as variable batch size) of the input and output tensors. The ONNX model created is saved as lstm_model.onnx, which can then be used in environments that support ONNX.

We can also run a sanity check on lstm_model.onnx and do inference using the onnxruntime Python API:

# Load the ONNX model; use a new name so the trained PyTorch `model` is not overwritten
onnx_model = onnx.load("lstm_model.onnx")
onnx.checker.check_model(onnx_model)

ort_session = ort.InferenceSession("lstm_model.onnx")

seq = example_input_for_trace[:1].numpy()

onnx_pred = ort_session.run(
    ["my_output"],
    {"seq_input": seq},
)
  • onnx_model = onnx.load("lstm_model.onnx"): Loads the saved ONNX model into the variable onnx_model (a new name, so the trained PyTorch model is not overwritten).

  • onnx.checker.check_model(onnx_model): Verifies the integrity and correctness of the loaded ONNX model. It's a way to ensure that the model is well-formed and doesn't have inconsistencies or errors.

  • ort_session = ort.InferenceSession("lstm_model.onnx"): Initializes an inference session for the model using ONNX Runtime. This session is used to run the model.

  • seq = example_input_for_trace[:1].numpy(): Prepares the input data (a single example from the batch) for the model.

  • onnx_pred = ort_session.run(["my_output"], {"seq_input": seq}): Runs the model inference. ["my_output"] specifies the output you are interested in, as named in the exported model. {"seq_input": seq} provides the input data; the key "seq_input" matches the name of the input layer in the ONNX model, and seq is the input prepared earlier.
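Because the export declared dynamic_axes, the same session should accept other batch sizes. The following sketch (assuming the session and tensors above) exercises the dynamic batch dimension:

# Run the dynamic batch axis with 5 samples at once
batch = X_test[:5].numpy()  # shape (5, 6, 1) after the earlier unsqueeze
onnx_batch_pred = ort_session.run(["my_output"], {"seq_input": batch})[0]
print(np.argmax(onnx_batch_pred, axis=1))  # 5 class predictions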