In the rapidly evolving landscape of artificial intelligence, deploying machine learning models efficiently and reliably is a critical challenge. As AI moves from the cloud to the edge, frameworks that enable on-device inference become increasingly vital. This article introduces two prominent solutions in this space: Apple's Core ML and the Open Neural Network Exchange (ONNX). We will explore what each framework is, the purpose it serves, and how to use it in practice, examining its components and workflow before closing with a comprehensive comparison to guide your development choices.
Section 1: Understanding Apple's Core ML
1.1 What is Core ML?
Core ML is Apple's foundational machine learning framework, designed to integrate trained machine learning models into Apple applications seamlessly. It provides a unified way to incorporate various types of machine learning models -- from deep neural networks to tree ensembles and support vector machines -- directly into apps running on iOS, macOS, watchOS, and tvOS devices. The primary purpose of Core ML is to enable on-device inference, meaning that predictions are made locally on the user's device rather than relying on cloud servers. This approach offers significant advantages, including enhanced user privacy, faster inference speeds due to reduced network latency, the ability to function offline, and potentially lower server infrastructure costs for developers.
1.2 The Core ML Ecosystem and Components
The Core ML ecosystem consists of several key components that work together to facilitate on-device machine learning:
The Core ML Framework is the runtime engine embedded within Apple's operating systems. It is responsible for executing machine learning models efficiently on the device's hardware, including leveraging specialized silicon like the Neural Engine in Apple's A-series and M-series chips for accelerated performance. This framework handles the low-level details of model execution, allowing developers to focus on integrating AI capabilities into their applications.
Core ML Tools is a powerful Python package that serves as the bridge between popular machine learning training frameworks and Core ML. Its main function is to convert models trained in frameworks such as TensorFlow, PyTorch, Keras, scikit-learn, and XGBoost into Apple's native Core ML model format. This conversion process often includes optimizations specific to Apple hardware, such as quantization and layer fusion, to ensure maximum performance and efficiency on device.
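To make the optimization point concrete, here is a minimal sketch of post-conversion weight quantization with Core ML Tools, assuming a neural-network-format model such as the `MobileNetV2.mlmodel` produced later in this article; the exact quantization utilities available depend on your `coremltools` version and platform (weight quantization is primarily supported on macOS).
```python
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load a previously converted neural-network-format Core ML model.
model = ct.models.MLModel('MobileNetV2.mlmodel')

# Quantize the 32-bit float weights down to 8 bits. This typically shrinks
# the file to roughly a quarter of its original size at a small accuracy cost.
quantized_model = quantization_utils.quantize_weights(model, nbits=8)

# Save the smaller model alongside the original.
quantized_model.save('MobileNetV2_8bit.mlmodel')
```
Whether 8-bit weights are acceptable depends on the model, so it is worth spot-checking accuracy after quantizing.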
The .mlmodel format is Apple's native model representation for Core ML. When a model is converted using Core ML Tools, it is packaged into a `.mlmodel` file (or, for the newer ML Program format, a `.mlpackage` bundle). This file contains the model's architecture, weights, input and output descriptions, and metadata. Xcode, Apple's integrated development environment, automatically generates Swift or Objective-C interfaces from these files, making it straightforward for app developers to interact with the model programmatically.
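Outside of Xcode, the contents of a converted model can also be inspected from Python. The sketch below assumes the `MobileNetV2.mlmodel` file generated in Section 1.3 and prints its input/output descriptions along with some editable metadata (the author string is a placeholder).
```python
import coremltools as ct

# Load the converted model and read its interface description.
model = ct.models.MLModel('MobileNetV2.mlmodel')
spec = model.get_spec()

print("Inputs:")
for inp in spec.description.input:
    print(" ", inp.name, inp.type.WhichOneof('Type'))

print("Outputs:")
for out in spec.description.output:
    print(" ", out.name, out.type.WhichOneof('Type'))

# Metadata such as author and description can be edited before saving.
model.author = "Example Author"  # placeholder value
model.short_description = "MobileNetV2 image classifier"
model.save('MobileNetV2_annotated.mlmodel')
```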
1.3 How to Use Core ML: A Step-by-Step Guide
Utilizing Core ML in an application typically involves a sequence of steps, starting from model creation and ending with on-device inference.
Step 1: Obtain or Train a Model
The first step involves acquiring a machine learning model. This could be a pre-trained model from a public repository, such as MobileNetV2 for image classification, or a custom model trained by a developer using frameworks like TensorFlow, PyTorch, or Keras. For our running example, we will use a pre-trained Keras MobileNetV2 model.
Step 2: Convert the Model to .mlmodel Format
Once a model is obtained, it needs to be converted into the `.mlmodel` format using Core ML Tools. This Python library provides functions to load models from various sources and convert them, often with options for optimization.
Let's demonstrate converting a pre-trained Keras MobileNetV2 model for image classification into a Core ML model. This model expects images of size 224x224 pixels with three color channels (RGB).
Code Snippet 1.3.2: Converting a Keras MobileNetV2 Model to Core ML
import coremltools as ct
import numpy as np
import tensorflow as tf
# 1. Load a pre-trained Keras MobileNetV2 model.
# This model is pre-trained on the ImageNet dataset and is suitable for
# image classification tasks.
print("Loading Keras MobileNetV2 model...")
model = tf.keras.applications.MobileNetV2(weights='imagenet')
print("Keras MobileNetV2 model loaded successfully.")
# 2. Define input and output specifications for the Core ML model.
# For image classification, the input is an image, and the output is
# a set of class probabilities and the predicted class label.
# The input shape is (batch_size, height, width, channels).
# Core ML expects images to be normalized. MobileNetV2 expects pixel
# values in the range [-1, 1]. Core ML applies the scale first and then
# adds the bias (y = scale * x + bias), so scale=1/127.5 and
# bias=[-1, -1, -1] map pixel values from [0, 255] to [-1, 1].
input_name = 'image'
# Build the list of ImageNet class labels in class-index order.
# `decode_predictions` sorts labels by score, so feeding it strictly
# decreasing scores returns the labels ordered by class index (0..999).
descending_scores = np.arange(1000, 0, -1, dtype=np.float32).reshape(1, 1000)
class_labels = tf.keras.applications.mobilenet_v2.decode_predictions(
    descending_scores, top=1000
)[0]
class_labels_list = [label for _, label, _ in class_labels]
# Create a classifier configuration to indicate that the model performs
# classification and to supply the label for each output index.
classifier_config = ct.ClassifierConfig(class_labels=class_labels_list)
# 3. Convert the Keras model to Core ML format.
# We specify the input type as an image, its shape, and preprocessing parameters.
print("Converting Keras model to Core ML format...")
coreml_model = ct.convert(
    model,
    inputs=[ct.ImageType(name=input_name,
                         shape=(1, 224, 224, 3),
                         bias=[-1, -1, -1],
                         scale=1/127.5,
                         channel_first=False)],  # Keras uses channels-last (NHWC)
    classifier_config=classifier_config,
    convert_to="neuralnetwork"  # Saves as a single .mlmodel file; the newer
                                # "mlprogram" format requires a .mlpackage bundle
)
print("Model conversion complete.")
# 4. Save the Core ML model to a .mlmodel file.
# This file will be dragged into an Xcode project.
output_filename = 'MobileNetV2.mlmodel'
coreml_model.save(output_filename)
print(f"Core ML model saved as '{output_filename}'")
This script will generate a `MobileNetV2.mlmodel` file, which is ready for integration into an Apple application.
Step 3: Integrate the .mlmodel into an Xcode Project
After conversion, the `.mlmodel` file is simply dragged and dropped into an Xcode project. Xcode automatically processes this file and generates a Swift or Objective-C class that provides a type-safe interface to interact with the model. This generated class simplifies model loading, input preparation, and prediction retrieval.
Step 4: Perform Inference in an iOS/macOS App
With the model integrated, developers can use the generated class to load the model and perform predictions. For image-based models, Apple's Vision framework often works in conjunction with Core ML, simplifying image preprocessing and result interpretation.
Let's look at a simplified Swift snippet for performing image classification using the converted MobileNetV2 model.
Code Snippet 1.3.4: Performing Image Classification with Core ML in Swift
import CoreML
import Vision
import UIKit
// This function demonstrates how to classify a UIImage using a Core ML model.
// It assumes that 'MobileNetV2.mlmodel' has been added to the Xcode project
// and Xcode has generated the 'MobileNetV2' class.
func classifyImage(image: UIImage) {
// 1. Load the Core ML model.
// The `MobileNetV2` class is automatically generated by Xcode from the .mlmodel file.
// MLModelConfiguration allows for specifying compute units (CPU, GPU, Neural Engine).
guard let coreMLModel = try? MobileNetV2(configuration: MLModelConfiguration()) else {
fatalError("Failed to load Core ML model. Ensure 'MobileNetV2.mlmodel' is in your project.")
}
// 2. Create a Vision model from the Core ML model.
// Vision framework provides convenience for image processing and model execution.
guard let visionModel = try? VNCoreMLModel(for: coreMLModel.model) else {
fatalError("Failed to create VNCoreMLModel from the Core ML model.")
}
// 3. Create a Vision request for classification.
// This request will execute the Core ML model and return classification results.
let request = VNCoreMLRequest(model: visionModel) { request, error in
// Handle potential errors during the request.
if let error = error {
print("Vision request failed with error: \(error.localizedDescription)")
return
}
// Process the classification results.
// Results are typically an array of VNClassificationObservation objects.
guard let results = request.results as? [VNClassificationObservation],
let topResult = results.first else {
print("No classification results found.")
return
}
// Print the top detected class and its confidence.
let confidence = String(format: "%.2f", topResult.confidence * 100)
print("Detected: \(topResult.identifier) with confidence \(confidence)%")
}
// 4. Convert the UIImage to CIImage for Vision processing.
// Vision framework primarily works with CIImage or CVPixelBuffer.
guard let ciImage = CIImage(image: image) else {
fatalError("Failed to convert UIImage to CIImage.")
}
// 5. Perform the Vision request using an image request handler.
// The handler executes the request on the provided image.
let handler = VNImageRequestHandler(ciImage: ciImage)
do {
try handler.perform([request])
} catch {
print("Failed to perform Vision request: \(error.localizedDescription)")
}
}
// Example usage (e.g., in a UIViewController after an image is selected):
// if let sampleImage = UIImage(named: "my_sample_image") {
// classifyImage(image: sampleImage)
// }
Figure 1.3.5: Core ML Workflow Diagram
This diagram illustrates the typical flow of using Core ML, from model training to on-device inference.
+---------------------+ +---------------------+ +---------------------+
| ML Training | | Core ML Tools | | Xcode Project |
| (TensorFlow, | | (Python Package) | | (iOS/macOS App) |
| PyTorch, Keras) | | | | |
+----------+----------+ +----------+----------+ +----------+----------+
| | |
| Train/Obtain Model | Convert Model | Add .mlmodel
| | to .mlmodel |
v v v
+---------------------+ +---------------------+ +---------------------+
| Trained Model |----->| .mlmodel File |----->| Generated Swift/ |
| (e.g., Keras HDF5) | | (Core ML Format) | | Obj-C Interface |
+---------------------+ +---------------------+ +----------+----------+
|
| Use in App
v
+---------------------+
| Core ML Framework |
| (On-Device Runtime)|
+----------+----------+
|
| Perform Inference
v
+---------------------+
| App Prediction |
| Results |
+---------------------+
Section 2: Exploring ONNX (Open Neural Network Exchange)
2.1 What is ONNX?
ONNX, which stands for Open Neural Network Exchange, is an open-source format designed to represent machine learning models. It acts as an intermediate representation that enables interoperability between different machine learning frameworks. The primary purpose of ONNX is to allow developers to train a model in one framework (e.g., PyTorch), convert it to the ONNX format, and then deploy it for inference using a different framework or runtime (e.g., ONNX Runtime, which supports various hardware and operating systems). This solves the problem of framework lock-in and facilitates a more flexible and efficient machine learning workflow. The benefits of using ONNX include greater flexibility in choosing training and deployment tools, improved hardware acceleration through optimized runtimes, and simplified cross-platform deployment across diverse environments, from cloud servers to edge devices.
2.2 The ONNX Ecosystem and Components
The ONNX ecosystem is built around its open standard and supporting tools:
The ONNX Format, identified by the `.onnx` file extension, is a serialized representation of a model's computation graph. It defines a common set of operators (e.g., convolution, ReLU, matrix multiplication) and a standard data type system. This standardized representation ensures that a model exported to ONNX can be understood and executed by any compatible ONNX runtime, regardless of the original training framework.
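Because an `.onnx` file is just this serialized graph, it can be opened and inspected with the `onnx` Python package. The short sketch below assumes the `mobilenet_v2.onnx` file exported in Section 2.3, but any ONNX model works the same way.
```python
import onnx
from collections import Counter

# Load the serialized graph and run the structural validator.
model = onnx.load("mobilenet_v2.onnx")
onnx.checker.check_model(model)

# The opset version tells runtimes which operator definitions to use.
print("Opset version:", model.opset_import[0].version)

# Count the standard operators used in the graph (Conv, Relu, Gemm, ...).
op_counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in op_counts.most_common(10):
    print(f"{op_type}: {count}")
```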
ONNX Runtime is a high-performance inference engine for ONNX models. It is designed to maximize performance across a wide range of hardware, including CPUs, GPUs, and specialized AI accelerators, and supports various operating systems like Windows, Linux, macOS, Android, and iOS. ONNX Runtime achieves this by providing optimized execution providers that leverage the underlying hardware capabilities. It offers APIs for multiple languages, including Python, C++, C#, Java, and JavaScript.
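As a quick illustration of execution providers, the sketch below lists the providers compiled into the local `onnxruntime` build and creates a session that prefers CUDA when it is available, falling back to the CPU otherwise (again assuming the `mobilenet_v2.onnx` file from Section 2.3).
```python
import onnxruntime as ort

# Providers compiled into this onnxruntime build,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'].
available = ort.get_available_providers()
print("Available providers:", available)

# Keep only the preferred providers that are actually available;
# ONNX Runtime tries them in the order given.
preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preferred if p in available]

session = ort.InferenceSession("mobilenet_v2.onnx", providers=providers)
print("Providers in use:", session.get_providers())
```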
ONNX Converters and Exporters are typically integrated directly into popular machine learning frameworks. For example, PyTorch has a built-in `torch.onnx.export` function, and TensorFlow provides tools to convert models to ONNX. These tools are responsible for translating the model's architecture and weights from the native framework's representation into the standardized ONNX graph format.
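For TensorFlow and Keras models, a commonly used exporter is the third-party `tf2onnx` package. The following is a minimal sketch, assuming `pip install tf2onnx` and the same Keras MobileNetV2 used elsewhere in this article; the exact API may differ slightly between tf2onnx releases.
```python
import tensorflow as tf
import tf2onnx

# Load the Keras MobileNetV2 model used throughout this article.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

# Describe the input signature so tf2onnx can trace the graph.
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)

# Convert the model and write the .onnx file in a single call.
model_proto, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="mobilenet_v2_tf.onnx"
)
print("Exported graph outputs:", [out.name for out in model_proto.graph.output])
```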
2.3 How to Use ONNX: A Step-by-Step Guide
Using ONNX involves exporting a model from its native framework and then performing inference with ONNX Runtime.
Step 1: Train or Obtain a Model
Similar to Core ML, the process begins with a trained machine learning model. This could be a model developed in PyTorch, TensorFlow, Keras, or other frameworks. For our running example, we will use a pre-trained PyTorch MobileNetV2 model.
Step 2: Export the Model to ONNX Format
The next crucial step is to convert the trained model into the ONNX format. This is typically done using the export functionalities provided by the training framework itself.
Let's demonstrate exporting a pre-trained PyTorch MobileNetV2 model to the ONNX format. A dummy input is required to trace the model's computational graph during export.
Code Snippet 2.3.2: Exporting a PyTorch MobileNetV2 Model to ONNX
import torch
import torchvision.models
# 1. Load a pre-trained PyTorch MobileNetV2 model.
# The `pretrained=True` argument downloads the weights trained on ImageNet.
print("Loading PyTorch MobileNetV2 model...")
model = torchvision.models.mobilenet_v2(pretrained=True)
model.eval() # Set the model to evaluation mode; crucial for inference.
print("PyTorch MobileNetV2 model loaded successfully.")
# 2. Create a dummy input tensor.
# This tensor is used to trace the model's computational graph during export.
# The shape (batch_size, channels, height, width) must match the model's
# expected input dimensions. MobileNetV2 expects 224x224 RGB images.
dummy_input = torch.randn(1, 3, 224, 224, requires_grad=True)
print(f"Dummy input tensor created with shape: {dummy_input.shape}")
# 3. Export the PyTorch model to ONNX format.
# - `model`: The PyTorch model instance to export.
# - `dummy_input`: The input tensor used for tracing.
# - `"mobilenet_v2.onnx"`: The name of the output ONNX file.
# - `export_params=True`: Stores the trained parameter weights inside the ONNX file.
# - `opset_version=11`: Specifies the ONNX operator set version to use.
# Version 11 is a commonly supported and stable version.
# - `do_constant_folding=True`: Performs constant folding optimization during export.
# - `input_names=['input']`: Assigns a name to the input tensor in the ONNX graph.
# - `output_names=['output']`: Assigns a name to the output tensor in the ONNX graph.
# - `dynamic_axes`: Allows for variable input dimensions, e.g., batch size.
print("Exporting PyTorch model to ONNX format...")
torch.onnx.export(
model,
dummy_input,
"mobilenet_v2.onnx",
export_params=True,
opset_version=11,
do_constant_folding=True,
input_names=['input'],
output_names=['output'],
dynamic_axes={'input' : {0 : 'batch_size'},
'output' : {0 : 'batch_size'}}
)
print("Model exported to mobilenet_v2.onnx successfully.")
This script will generate a `mobilenet_v2.onnx` file, which is a portable representation of the model.
Step 3: Perform Inference with ONNX Runtime
Once the model is in ONNX format, it can be loaded and executed using ONNX Runtime. This involves creating an inference session, preparing input data, and running the model to get predictions.
Let's look at a Python snippet for performing inference with the exported ONNX MobileNetV2 model. This example includes basic image preprocessing.
Code Snippet 2.3.3: Performing Inference with ONNX Runtime in Python
import onnxruntime as ort
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
import os # For checking file existence
# 1. Define the path to the ONNX model.
onnx_model_path = "mobilenet_v2.onnx"
# Ensure the ONNX model file exists.
if not os.path.exists(onnx_model_path):
print(f"Error: ONNX model not found at '{onnx_model_path}'. Please run the export script first.")
exit()
# 2. Load the ONNX model and create an inference session.
# 'providers' specifies the execution backend. 'CPUExecutionProvider' is
# universally available. Other options include 'CUDAExecutionProvider' for GPUs.
print(f"Loading ONNX model from '{onnx_model_path}'...")
session = ort.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
print("ONNX model loaded successfully.")
# 3. Get input and output names from the session.
# These names are used to feed data into and retrieve data from the model.
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
print(f"Model input name: '{input_name}', output name: '{output_name}'")
# 4. Prepare a sample image for inference.
# For demonstration, we'll create a dummy image. In a real application,
# you would load and preprocess an actual image.
print("Preparing dummy image for inference...")
# Create a 224x224 red image for demonstration purposes
dummy_image = Image.new('RGB', (224, 224), color = 'red')
# Define preprocessing steps to match MobileNetV2's expected input.
# This includes resizing, cropping, converting to tensor, and normalization.
preprocess = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(), # Converts PIL Image to PyTorch Tensor (C, H, W) and scales to [0.0, 1.0]
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(dummy_image).unsqueeze(0).numpy() # Add batch dimension (1, C, H, W)
print(f"Preprocessed input tensor shape: {input_tensor.shape}")
# 5. Run inference using the ONNX Runtime session.
# The `run` method takes a list of output names and a dictionary of input feeds.
print("Running inference...")
outputs = session.run([output_name], {input_name: input_tensor})
print("Inference complete.")
# 6. Process the output.
# The output is a list of NumPy arrays, one for each requested output.
# For MobileNetV2, this is a (1, 1000) array of raw class scores (logits);
# apply a softmax if calibrated probabilities are needed.
predicted_class_id = np.argmax(outputs[0]).item()
print(f"Predicted class ID: {predicted_class_id}")
# In a real application, you would map this ID to a human-readable label.
# For example, using ImageNet class labels.
Figure 2.3.4: ONNX Workflow Diagram
This diagram illustrates the typical flow of using ONNX, emphasizing its role as an interchange format.
+---------------------+ +---------------------+ +---------------------+
| ML Training | | ONNX Export | | ONNX Runtime |
| (PyTorch, | | (Framework-specific| | (Python, C++, etc.)|
| TensorFlow) | | exporter) | | |
+----------+----------+ +----------+----------+ +----------+----------+
| | |
| Train/Obtain Model | Export Model | Load .onnx
| | to .onnx |
v v v
+---------------------+ +---------------------+ +---------------------+
| Trained Model |----->| .onnx File |----->| Inference Engine |
| (e.g., PyTorch | | (ONNX Format) | | (Optimized for |
| state_dict) | | | | various hardware) |
+---------------------+ +---------------------+ +----------+----------+
|
| Perform Inference
v
+---------------------+
| Application |
| Prediction Results |
+---------------------+
Section 3: Core ML vs. ONNX: A Comparative Analysis
While both Core ML and ONNX facilitate the deployment of machine learning models, they serve distinct purposes and operate within different ecosystems. Understanding their differences is crucial for selecting the right tool for a given task.
3.1 Core Purpose and Philosophy
Core ML is fundamentally an Apple-centric framework. Its core purpose is to provide a highly optimized and integrated solution for running machine learning models directly on Apple devices. The philosophy behind Core ML is to offer a seamless developer experience within the Apple ecosystem, leveraging Apple's hardware and software optimizations to deliver the best possible on-device AI performance and user experience for iOS, macOS, watchOS, and tvOS applications.
ONNX, on the other hand, is built on the principle of interoperability and open standards. Its core purpose is to serve as a universal interchange format for machine learning models, allowing models to be moved effortlessly between different training frameworks and deployed across a wide array of hardware and operating systems. The philosophy of ONNX is to break down framework silos, providing flexibility and choice for developers and MLOps teams who need to deploy models in diverse, heterogeneous environments.
3.2 Ecosystem and Integration
Core ML is deeply integrated into Apple's software and hardware ecosystem. It works hand-in-hand with other Apple frameworks like Vision for computer vision tasks, Natural Language for text processing, and Sound Analysis for audio processing. This tight integration allows developers to build sophisticated AI-powered features with minimal effort, taking full advantage of Apple's Neural Engine for accelerated computation. The development workflow is streamlined within Xcode, providing a cohesive experience for Apple platform developers.
ONNX boasts broad support across the wider machine learning community. It integrates with major ML frameworks such as PyTorch, TensorFlow, Keras, scikit-learn, and MXNet through dedicated exporters. The ONNX Runtime, its primary inference engine, is designed for cross-platform deployment, supporting Windows, Linux, macOS, Android, and iOS. This broad compatibility makes ONNX an excellent choice for scenarios requiring deployment to a variety of environments, from cloud servers to diverse edge devices, without being tied to a specific vendor's ecosystem.
3.3 Model Conversion and Representation
Core ML uses its proprietary `.mlmodel` format, which is specifically designed and optimized for execution on Apple hardware. The `coremltools` package handles the conversion from various training frameworks to this format. During conversion, `coremltools` can apply Apple-specific optimizations, such as quantization (reducing model precision for smaller size and faster inference) and layer fusion (combining multiple operations into a single, more efficient one). The `.mlmodel` file is a compiled representation tailored for the Core ML runtime.
ONNX utilizes the `.onnx` format, which is an open, graph-based representation. It defines a standard set of operators and a common data type system, acting as an intermediate representation that describes the model's computation graph. This format is not tied to any specific hardware or runtime, making it highly portable. Models are exported to ONNX directly from their training frameworks (e.g., `torch.onnx.export` in PyTorch). While ONNX also supports optimizations, these are generally more generic and aim for broad compatibility rather than specific hardware targeting.
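As one concrete example of those generic optimizations, ONNX Runtime can apply graph-level passes (constant folding, node fusion, and so on) when a session is created, and can optionally write the optimized graph back to disk. A brief sketch, assuming the `mobilenet_v2.onnx` file from Section 2.3:
```python
import onnxruntime as ort

# Request the full set of graph optimizations and ask ONNX Runtime to
# serialize the optimized graph so it can be inspected or reused.
options = ort.SessionOptions()
options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
options.optimized_model_filepath = "mobilenet_v2_optimized.onnx"

session = ort.InferenceSession(
    "mobilenet_v2.onnx",
    sess_options=options,
    providers=["CPUExecutionProvider"],
)
print("Optimized graph written to mobilenet_v2_optimized.onnx")
```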
3.4 Performance and Optimization
Core ML is engineered to extract maximum performance from Apple's silicon. It automatically leverages the CPU, GPU, and the dedicated Neural Engine (if present) on Apple devices, intelligently distributing workloads for optimal speed and energy efficiency. This deep hardware integration often results in superior performance for Core ML models on Apple platforms compared to generic cross-platform solutions. Developers have some control over compute units via `MLModelConfiguration`, but much of the optimization is handled automatically by the framework.
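A similar compute-unit preference can be expressed from Python when testing a converted model with `coremltools` on a Mac. The sketch below assumes the `MobileNetV2.mlmodel` from Section 1.3 and a coremltools version that exposes `ct.ComputeUnit`.
```python
import coremltools as ct

# Restrict execution to the CPU, e.g. to compare against the default
# behaviour where Core ML may also use the GPU and Neural Engine.
cpu_only_model = ct.models.MLModel(
    'MobileNetV2.mlmodel',
    compute_units=ct.ComputeUnit.CPU_ONLY,
)

# The default preference lets Core ML schedule work across all units.
default_model = ct.models.MLModel(
    'MobileNetV2.mlmodel',
    compute_units=ct.ComputeUnit.ALL,
)
print("Loaded the model with CPU-only and ALL compute-unit preferences.")
```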
ONNX, through ONNX Runtime, provides highly optimized execution across various platforms. ONNX Runtime uses "execution providers" (e.g., CPU, CUDA, TensorRT, OpenVINO) that are specifically tuned for different hardware accelerators. While it can achieve excellent performance, the level of optimization and hardware acceleration depends heavily on the chosen execution provider and its availability for the target hardware. For instance, using the CUDA execution provider on an NVIDIA GPU will yield high performance, but a generic CPU provider might not match Core ML's Neural Engine performance on an Apple device.
3.5 Use Cases and Target Audience
Core ML is primarily targeted at developers building applications for Apple's ecosystem (iOS, macOS, watchOS, tvOS) who need to integrate machine learning capabilities directly into their apps. Typical use cases include real-time image recognition, natural language processing, style transfer, and recommendation engines that benefit from on-device processing, privacy, and offline functionality. It is the go-to choice for developers whose primary deployment target is Apple hardware.
ONNX caters to a broader audience, including machine learning researchers, MLOps engineers, and developers who need to deploy models across a heterogeneous mix of platforms and hardware. Its primary use cases involve model portability, enabling a "train once, deploy anywhere" philosophy. This is particularly valuable for cloud deployments, cross-platform desktop applications, and edge devices that might not be part of the Apple ecosystem. It is ideal when flexibility in deployment targets and interchangeability between frameworks are paramount.
3.6 Interoperability with Each Other
A common question arises regarding the interoperability between Core ML and ONNX. Can models be converted from one format to the other?
Converting from ONNX to Core ML has historically been possible: older `coremltools` releases (and the companion `onnx-coreml` converter) accepted `.onnx` files as input, so a model exported to ONNX from any framework could be turned into a `.mlmodel` for deployment on Apple devices. Be aware, however, that recent versions of `coremltools` have removed the ONNX frontend in favor of converting directly from TensorFlow and PyTorch, so for new projects the recommended path is usually to convert from the original training framework. Either way, models can be trained in diverse environments and then specifically optimized for Apple hardware via Core ML.
However, the reverse conversion – from Core ML's `.mlmodel` format to ONNX – is generally not directly supported or practical. The `.mlmodel` format is a runtime-specific, often compiled representation tailored for Apple's ecosystem. It is not designed as an interchange format like ONNX. Therefore, if you have a model only in `.mlmodel` format and need to deploy it outside the Apple ecosystem, you would typically need to revert to the original training framework's model or re-implement it, rather than converting the `.mlmodel` directly to ONNX.
Conclusion
Both Apple's Core ML and ONNX are powerful tools in the machine learning deployment landscape, each with its unique strengths and target applications. Core ML excels in providing a highly optimized, integrated, and privacy-focused solution for deploying AI models exclusively within the Apple ecosystem. It offers unparalleled performance on Apple hardware, thanks to deep integration with the Neural Engine and other system frameworks, making it the ideal choice for iOS, macOS, watchOS, and tvOS app developers.
ONNX, conversely, stands out as an open and flexible standard for model interchange. It addresses the critical need for interoperability across diverse machine learning frameworks and deployment platforms. By providing a common model representation and a high-performance runtime, ONNX empowers developers to train models in their preferred framework and deploy them efficiently across a wide array of hardware and operating systems, from cloud servers to various edge devices.
Ultimately, these two technologies are not mutually exclusive but often complementary. Developers might train a model in PyTorch, export it to ONNX for broad deployment, and then, for their Apple-specific applications, convert the ONNX model to Core ML using `coremltools` to leverage Apple's unique hardware optimizations. The choice between them, or the decision to use both in a pipeline, depends on the specific deployment targets, performance requirements, and ecosystem considerations of your machine learning project.
Addendum: Full Running Example
This addendum provides complete, runnable code examples for both Core ML and ONNX, building upon the MobileNetV2 image classification model.
A.1 Core ML Example: Image Classification on iOS
This example includes a Python script to convert a Keras MobileNetV2 model to Core ML, and then Swift code for an iOS application that uses this model to classify an image.
A.1.1 Python Script: Convert Keras MobileNetV2 to MobileNetV2.mlmodel
This script converts the Keras MobileNetV2 model into the Core ML format. You need to run this script first on your development machine (e.g., a Mac) where Python and `coremltools` are installed.
Prerequisites:
* Python 3.7+
* `pip install tensorflow coremltools`
File: `convert_mobilenet_to_coreml.py`
import coremltools as ct
import numpy as np
import tensorflow as tf
def convert_keras_mobilenet_to_coreml():
"""
Loads a pre-trained Keras MobileNetV2 model, converts it to Core ML format,
and saves it as 'MobileNetV2.mlmodel'.
"""
print("Starting Core ML model conversion process...")
# 1. Load a pre-trained Keras MobileNetV2 model.
# This model is pre-trained on the ImageNet dataset.
try:
model = tf.keras.applications.MobileNetV2(weights='imagenet')
print("Keras MobileNetV2 model loaded successfully.")
except Exception as e:
print(f"Error loading Keras MobileNetV2 model: {e}")
print("Please ensure you have an internet connection to download weights.")
return
# 2. Define input and output specifications for the Core ML model.
# MobileNetV2 expects images of size 224x224 with 3 color channels.
# The pixel values should be normalized to the range [-1, 1].
# Core ML Tools handles this normalization with `bias` and `scale` parameters.
    input_name = 'image'
    # Get ImageNet class labels for the classifier configuration.
    # This allows the Core ML model to output human-readable labels directly.
    # `decode_predictions` sorts labels by score, so we feed it strictly
    # decreasing scores to recover the labels in class-index order (0..999).
    descending_scores = np.arange(1000, 0, -1, dtype=np.float32).reshape(1, 1000)
    class_labels_raw = tf.keras.applications.mobilenet_v2.decode_predictions(
        descending_scores, top=1000
    )[0]
    class_labels_list = [label for _, label, _ in class_labels_raw]
    print(f"Loaded {len(class_labels_list)} ImageNet class labels.")
    # Create a classifier configuration. This tells Core ML that the model
    # performs classification and supplies the label for each output index.
    classifier_config = ct.ClassifierConfig(class_labels=class_labels_list)
# 3. Convert the Keras model to Core ML format.
# - `model`: The Keras model instance.
# - `inputs`: A list of input specifications. For an image model,
# `ct.ImageInput` is used.
# - `shape`: The expected input image shape (batch_size, height, width, channels).
    # - `bias`: Values added to each channel (R, G, B) after scaling.
    # - `scale`: Multiplier applied to every pixel before the bias is added.
    #   Core ML computes y = scale * x + bias, so scale=1/127.5 and
    #   bias=[-1, -1, -1] map pixels from [0, 255] to [-1, 1]:
    #   (pixel / 127.5) - 1, which equals (pixel - 127.5) / 127.5.
    # - `channel_first=False`: Keras uses channels-last (NHWC) format.
    # - `name`: The name of the input feature in the Core ML model.
    # - `classifier_config`: The classification configuration.
    # - `convert_to="neuralnetwork"`: Targets the classic Core ML format, which
    #   is saved as a single .mlmodel file. The newer "mlprogram" format is more
    #   flexible but must be saved as a .mlpackage bundle instead.
print("Converting Keras model to Core ML format...")
try:
        coreml_model = ct.convert(
            model,
            inputs=[ct.ImageType(name=input_name,
                                 shape=(1, 224, 224, 3),
                                 bias=[-1, -1, -1],
                                 scale=1/127.5,
                                 channel_first=False)],
            classifier_config=classifier_config,
            convert_to="neuralnetwork"
        )
print("Model conversion complete.")
except Exception as e:
print(f"Error during Core ML conversion: {e}")
return
# 4. Save the Core ML model.
output_filename = 'MobileNetV2.mlmodel'
try:
coreml_model.save(output_filename)
print(f"Core ML model saved successfully as '{output_filename}'")
except Exception as e:
print(f"Error saving Core ML model: {e}")
if __name__ == "__main__":
convert_keras_mobilenet_to_coreml()
A.1.2 Swift Code: iOS App for Image Classification
This Swift code is for an iOS application that uses the `MobileNetV2.mlmodel` to classify an image. Create a new iOS project in Xcode (e.g., "CoreMLImageClassifier"), drag the generated `MobileNetV2.mlmodel` file into the project navigator, and replace the `ViewController.swift` content with the following. Also, add a sample image (e.g., "sample_image.jpg") to your Assets.xcassets or directly to the project.
File: `ViewController.swift`
import UIKit
import CoreML
import Vision
class ViewController: UIViewController {
// MARK: - UI Elements
// UIImageView to display the selected or sample image
private let imageView: UIImageView = {
let imageView = UIImageView()
imageView.contentMode = .scaleAspectFit
imageView.translatesAutoresizingMaskIntoConstraints = false
imageView.backgroundColor = .lightGray
imageView.layer.cornerRadius = 8
imageView.clipsToBounds = true
return imageView
}()
// UILabel to display the classification result
private let resultLabel: UILabel = {
let label = UILabel()
label.textAlignment = .center
label.font = UIFont.systemFont(ofSize: 18, weight: .medium)
label.numberOfLines = 0
label.text = "Tap 'Classify Sample' or 'Select Image'"
label.translatesAutoresizingMaskIntoConstraints = false
return label
}()
// UIButton to trigger classification of a bundled sample image.
// Declared `lazy` so that `self` is available when the target-action is added.
private lazy var classifySampleButton: UIButton = {
let button = UIButton(type: .system)
button.setTitle("Classify Sample Image", for: .normal)
button.titleLabel?.font = UIFont.boldSystemFont(ofSize: 18)
button.translatesAutoresizingMaskIntoConstraints = false
button.addTarget(self, action: #selector(classifySampleImage), for: .touchUpInside)
return button
}()
// UIButton to allow the user to select an image from the photo library.
// Declared `lazy` so that `self` is available when the target-action is added.
private lazy var selectImageButton: UIButton = {
let button = UIButton(type: .system)
button.setTitle("Select Image from Library", for: .normal)
button.titleLabel?.font = UIFont.boldSystemFont(ofSize: 18)
button.translatesAutoresizingMaskIntoConstraints = false
button.addTarget(self, action: #selector(selectImage), for: .touchUpInside)
return button
}()
// MARK: - View Lifecycle
override func viewDidLoad() {
super.viewDidLoad()
setupUI()
}
// MARK: - UI Setup
private func setupUI() {
view.backgroundColor = .white
title = "Core ML Image Classifier"
// Add UI elements to the view
view.addSubview(imageView)
view.addSubview(resultLabel)
view.addSubview(classifySampleButton)
view.addSubview(selectImageButton)
// Setup Auto Layout constraints
NSLayoutConstraint.activate([
imageView.topAnchor.constraint(equalTo: view.safeAreaLayoutGuide.topAnchor, constant: 20),
imageView.leadingAnchor.constraint(equalTo: view.leadingAnchor, constant: 20),
imageView.trailingAnchor.constraint(equalTo: view.trailingAnchor, constant: -20),
imageView.heightAnchor.constraint(equalTo: imageView.widthAnchor), // Maintain aspect ratio
resultLabel.topAnchor.constraint(equalTo: imageView.bottomAnchor, constant: 20),
resultLabel.leadingAnchor.constraint(equalTo: view.leadingAnchor, constant: 20),
resultLabel.trailingAnchor.constraint(equalTo: view.trailingAnchor, constant: -20),
classifySampleButton.topAnchor.constraint(equalTo: resultLabel.bottomAnchor, constant: 30),
classifySampleButton.centerXAnchor.constraint(equalTo: view.centerXAnchor),
classifySampleButton.heightAnchor.constraint(equalToConstant: 44),
selectImageButton.topAnchor.constraint(equalTo: classifySampleButton.bottomAnchor, constant: 15),
selectImageButton.centerXAnchor.constraint(equalTo: view.centerXAnchor),
selectImageButton.heightAnchor.constraint(equalToConstant: 44),
selectImageButton.bottomAnchor.constraint(lessThanOrEqualTo: view.safeAreaLayoutGuide.bottomAnchor, constant: -20)
])
}
// MARK: - Image Selection
@objc private func selectImage() {
let picker = UIImagePickerController()
picker.delegate = self
picker.sourceType = .photoLibrary
present(picker, animated: true)
}
// MARK: - Classification Logic
@objc private func classifySampleImage() {
// Load a sample image bundled with the app.
// Make sure "sample_image.jpg" is added to your Xcode project.
guard let sampleImage = UIImage(named: "sample_image") else {
resultLabel.text = "Error: 'sample_image.jpg' not found in assets."
print("Error: sample_image.jpg not found.")
return
}
imageView.image = sampleImage
performImageClassification(image: sampleImage)
}
private func performImageClassification(image: UIImage) {
resultLabel.text = "Classifying image..."
// 1. Load the Core ML model.
// The `MobileNetV2` class is automatically generated by Xcode from the .mlmodel file.
// MLModelConfiguration allows for specifying compute units (CPU, GPU, Neural Engine).
guard let coreMLModel = try? MobileNetV2(configuration: MLModelConfiguration()) else {
resultLabel.text = "Failed to load Core ML model."
print("Error: Failed to load Core ML model.")
return
}
// 2. Create a Vision model from the Core ML model.
// Vision framework provides convenience for image processing and model execution.
guard let visionModel = try? VNCoreMLModel(for: coreMLModel.model) else {
resultLabel.text = "Failed to create VNCoreMLModel."
print("Error: Failed to create VNCoreMLModel.")
return
}
// 3. Create a Vision request for classification.
// This request will execute the Core ML model and return classification results.
let request = VNCoreMLRequest(model: visionModel) { [weak self] request, error in
DispatchQueue.main.async {
// Handle potential errors during the request.
if let error = error {
self?.resultLabel.text = "Classification failed: \(error.localizedDescription)"
print("Vision request failed with error: \(error.localizedDescription)")
return
}
// Process the classification results.
// Results are typically an array of VNClassificationObservation objects.
guard let results = request.results as? [VNClassificationObservation],
let topResult = results.first else {
self?.resultLabel.text = "No classification results found."
print("No classification results found.")
return
}
// Display the top detected class and its confidence.
let confidence = String(format: "%.2f", topResult.confidence * 100)
self?.resultLabel.text = "Detected: \(topResult.identifier)\nConfidence: \(confidence)%"
print("Detected: \(topResult.identifier) with confidence \(confidence)%")
}
}
// 4. Convert the UIImage to CIImage for Vision processing.
// Vision framework primarily works with CIImage or CVPixelBuffer.
guard let ciImage = CIImage(image: image) else {
resultLabel.text = "Failed to convert UIImage to CIImage."
print("Error: Failed to convert UIImage to CIImage.")
return
}
// 5. Perform the Vision request using an image request handler.
// The handler executes the request on the provided image.
// The `orientation` parameter is important for correct image interpretation.
let handler = VNImageRequestHandler(ciImage: ciImage, orientation: image.cgImagePropertyOrientation)
DispatchQueue.global(qos: .userInitiated).async {
do {
try handler.perform([request])
} catch {
DispatchQueue.main.async {
self.resultLabel.text = "Failed to perform Vision request: \(error.localizedDescription)"
print("Error: Failed to perform Vision request: \(error.localizedDescription)")
}
}
}
}
}
// MARK: - UIImagePickerControllerDelegate & UINavigationControllerDelegate
extension ViewController: UIImagePickerControllerDelegate, UINavigationControllerDelegate {
func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
picker.dismiss(animated: true)
// Get the selected image from the info dictionary
guard let selectedImage = info[.originalImage] as? UIImage else {
resultLabel.text = "Error: Could not retrieve image from library."
print("Error: Could not retrieve image from library.")
return
}
imageView.image = selectedImage
performImageClassification(image: selectedImage)
}
func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
picker.dismiss(animated: true)
}
}
// MARK: - UIImage Extension for Orientation Handling
extension UIImage {
// Helper to get the correct CGImagePropertyOrientation from UIImage.
// This is crucial for Vision requests to process images correctly.
var cgImagePropertyOrientation: CGImagePropertyOrientation {
switch imageOrientation {
case .up: return .up
case .upMirrored: return .upMirrored
case .down: return .down
case .downMirrored: return .downMirrored
case .left: return .left
case .leftMirrored: return .leftMirrored
case .right: return .right
case .rightMirrored: return .rightMirrored
@unknown default: return .up
}
}
}
To make the "Select Image from Library" button work, you need to add a privacy description to your app's `Info.plist` file. Right-click `Info.plist` in Xcode, select "Open As" -> "Source Code", and add the following keys:
```xml
<key>NSPhotoLibraryUsageDescription</key>
<string>This app needs access to your photo library to select images for classification.</string>
```
Then, build and run the app on an iOS device or simulator.
A.2 ONNX Example: Model Export and Inference in Python
This example includes a Python script to export a PyTorch MobileNetV2 model to ONNX and another Python script to perform inference using the exported ONNX model with ONNX Runtime.
A.2.1 Python Script: Export PyTorch MobileNetV2 to ONNX
This script converts a pre-trained PyTorch MobileNetV2 model into the ONNX format.
Prerequisites:
* Python 3.7+
* `pip install torch torchvision`
File: `export_mobilenet_to_onnx.py`
import torch
import torchvision.models
import os
def export_pytorch_mobilenet_to_onnx():
"""
Loads a pre-trained PyTorch MobileNetV2 model, exports it to ONNX format,
and saves it as 'mobilenet_v2.onnx'.
"""
print("Starting ONNX model export process...")
# 1. Load a pre-trained PyTorch MobileNetV2 model.
# `pretrained=True` downloads the weights trained on ImageNet.
try:
model = torchvision.models.mobilenet_v2(pretrained=True)
model.eval() # Set the model to evaluation mode; crucial for inference.
print("PyTorch MobileNetV2 model loaded successfully.")
except Exception as e:
print(f"Error loading PyTorch MobileNetV2 model: {e}")
print("Please ensure you have an internet connection to download weights.")
return
# 2. Create a dummy input tensor.
# This tensor is used to trace the model's computational graph during export.
# The shape (batch_size, channels, height, width) must match the model's
# expected input dimensions. MobileNetV2 expects 224x224 RGB images.
# `requires_grad=True` is often used during training but can be set to False
# for export if gradients are not needed for tracing.
dummy_input = torch.randn(1, 3, 224, 224, requires_grad=False)
print(f"Dummy input tensor created with shape: {dummy_input.shape}")
# 3. Export the PyTorch model to ONNX format.
# - `model`: The PyTorch model instance to export.
# - `dummy_input`: The input tensor used for tracing.
# - `"mobilenet_v2.onnx"`: The name of the output ONNX file.
# - `export_params=True`: Stores the trained parameter weights inside the ONNX file.
# - `opset_version=11`: Specifies the ONNX operator set version to use.
# Version 11 is a commonly supported and stable version.
# - `do_constant_folding=True`: Performs constant folding optimization during export.
# - `input_names=['input']`: Assigns a name to the input tensor in the ONNX graph.
# - `output_names=['output']`: Assigns a name to the output tensor in the ONNX graph.
# - `dynamic_axes`: Allows for variable input dimensions, e.g., batch size.
# This makes the ONNX model more flexible for different batch sizes.
output_filename = "mobilenet_v2.onnx"
print(f"Exporting PyTorch model to ONNX format as '{output_filename}'...")
try:
torch.onnx.export(
model,
dummy_input,
output_filename,
export_params=True,
opset_version=11,
do_constant_folding=True,
input_names=['input'],
output_names=['output'],
dynamic_axes={'input' : {0 : 'batch_size'},
'output' : {0 : 'batch_size'}}
)
print(f"Model exported to '{output_filename}' successfully.")
except Exception as e:
print(f"Error during ONNX export: {e}")
if __name__ == "__main__":
export_pytorch_mobilenet_to_onnx()
A.2.2 Python Script: Perform Inference with ONNX Runtime
This script loads the `mobilenet_v2.onnx` model and performs inference on a sample image using ONNX Runtime.
Prerequisites:
* Python 3.7+
* `pip install onnxruntime numpy Pillow torchvision`
* Ensure `mobilenet_v2.onnx` is in the same directory, generated by the export script.
* Optionally, place a `sample_image.jpg` in the same directory for realistic testing.
File: `run_onnx_inference.py`
import onnxruntime as ort
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
import os
import json # To load ImageNet class labels
def get_imagenet_labels(label_file="imagenet_class_index.json"):
"""
Loads ImageNet class labels from a JSON file.
If the file doesn't exist, it attempts to download a common one.
"""
if not os.path.exists(label_file):
print(f"Label file '{label_file}' not found. Attempting to download...")
import urllib.request
url = "https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json"
try:
urllib.request.urlretrieve(url, label_file)
print("Label file downloaded successfully.")
except Exception as e:
print(f"Error downloading label file: {e}")
return None
try:
with open(label_file) as f:
labels = json.load(f)
# Labels are typically stored as { "index": ["ID", "class_name"] }
# We want a list ordered by index.
sorted_labels = [labels[str(i)][1] for i in range(len(labels))]
return sorted_labels
except Exception as e:
print(f"Error loading or parsing label file: {e}")
return None
def preprocess_image(image_path="sample_image.jpg"):
"""
Loads an image from the given path, preprocesses it to match MobileNetV2's
input requirements, and returns it as a NumPy array.
If image_path doesn't exist, a dummy red image is created.
"""
if os.path.exists(image_path):
print(f"Loading image from '{image_path}'...")
image = Image.open(image_path).convert('RGB')
else:
print(f"Image '{image_path}' not found. Creating a dummy red image for inference.")
image = Image.new('RGB', (224, 224), color = 'red') # Create a 224x224 red image
# Define preprocessing steps to match MobileNetV2's expected input:
# 1. Resize the image to 256x256.
# 2. Center crop it to 224x224.
# 3. Convert to a PyTorch Tensor (which also scales pixel values to [0.0, 1.0]).
# 4. Normalize with ImageNet's mean and standard deviation.
preprocess = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(), # Converts PIL Image to PyTorch Tensor (C, H, W) and scales to [0.0, 1.0]
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# Apply preprocessing and add a batch dimension (1, C, H, W).
input_tensor = preprocess(image).unsqueeze(0).numpy()
print(f"Preprocessed input tensor shape: {input_tensor.shape}")
return input_tensor
def run_onnx_inference():
"""
Performs inference on an ONNX MobileNetV2 model using ONNX Runtime.
"""
onnx_model_path = "mobilenet_v2.onnx"
# Ensure the ONNX model file exists.
if not os.path.exists(onnx_model_path):
print(f"Error: ONNX model not found at '{onnx_model_path}'.")
print("Please run 'export_mobilenet_to_onnx.py' first to generate the model.")
return
# 1. Load the ONNX model and create an inference session.
# 'providers' specifies the execution backend. 'CPUExecutionProvider' is
# universally available. 'CUDAExecutionProvider' can be used for NVIDIA GPUs.
print(f"Loading ONNX model from '{onnx_model_path}'...")
try:
session = ort.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
print("ONNX model loaded successfully.")
except Exception as e:
print(f"Error loading ONNX model or creating session: {e}")
return
# 2. Get input and output names from the session.
# These names are used to feed data into and retrieve data from the model.
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
print(f"Model input name: '{input_name}', output name: '{output_name}'")
# 3. Prepare an image for inference.
# You can replace "sample_image.jpg" with the path to your own image.
input_tensor = preprocess_image(image_path="sample_image.jpg")
# 4. Run inference using the ONNX Runtime session.
# The `run` method takes a list of output names to fetch and a dictionary
# of input feeds, mapping input names to NumPy arrays.
print("Running inference...")
try:
outputs = session.run([output_name], {input_name: input_tensor})
print("Inference complete.")
except Exception as e:
print(f"Error during ONNX inference: {e}")
return
    # 5. Process the output.
    # The output is a list of NumPy arrays, one for each requested output.
    # For MobileNetV2, this is a (1, 1000) array of raw class scores (logits).
    logits = outputs[0][0]  # Scores for the first (and only) item in the batch
    # Apply a softmax so the reported confidence is an actual probability.
    probabilities = np.exp(logits - np.max(logits))
    probabilities /= probabilities.sum()
    predicted_class_id = int(np.argmax(probabilities))
    confidence = float(np.max(probabilities))
    print(f"Predicted class ID: {predicted_class_id}")
    print(f"Confidence: {confidence:.4f}")
# 6. Map the predicted class ID to a human-readable label.
imagenet_labels = get_imagenet_labels()
if imagenet_labels and predicted_class_id < len(imagenet_labels):
predicted_label = imagenet_labels[predicted_class_id]
print(f"Predicted label: {predicted_label}")
else:
print("Could not retrieve ImageNet labels or ID out of bounds.")
if __name__ == "__main__":
run_onnx_inference()