In the first part of this series I used the Azure Custom Vision service to create an image classifier to allow me to easily identify my daughter’s cuddly toys. Once it was created I tested it by uploading an image and seeing what tags the classifier found for the image.
In the second part I accessed this model from a Xamarin app, so that I could use the camera to take a photo to run through the classifier using a NuGet package that talks to the Custom Vision service.
In the third part I showed how to download this model for iOS and run it locally, on device, using CoreML.
In this part we’re going to switch OS and run these models on Android. Like iOS, Android offers APIs to run AI models on device, taking advantage of the hardware. Unlike on iOS, this is not an API baked into the OS; instead it is a separate library you add to your project and run. The upside of this is broader OS support - it works all the way back to API 21.
The library in question is called TensorFlow - an open source machine learning library from Google that can run on pretty much any platform and is accessible from most languages. When running on device you can think of it as similar to CoreML on iOS: a generic way of running all sorts of machine learning models, including the ones generated by the Custom Vision service.
Running models with TensorFlow
For mobile apps we can use the Android bindings for the TensorFlow library. Like with CoreML these bindings contain a full API for running all sorts of models, as well as an easy API surface that we can use to do image classification - similar to the CoreML Vision APIs.
To download the TensorFlow model, head to the Performance tab in the Custom Vision portal, select the latest iteration from the list on the left (we’ll cover iterations in a future post), and click the Export link at the top. Select Android (TensorFlow) then click Download. This will download a zip file containing 2 files:
model.pb - this is the actual TensorFlow model
labels.txt - the labels for the tags
Once these files have been downloaded you will need to add them to the Assets folder in your Android app (make sure their Build Action is set to AndroidAsset so they are packaged with the app).
Importing the Android TensorFlow bindings
The TensorFlow Android bindings consist of a native library with a Jar binding. To use these from a Xamarin app we need a Xamarin binding, and this has already been created for us by Larry O’Brien from Xamarin and is available on GitHub here: https://github.com/lobrien/TensorFlow.Xamarin.Android. This binding is not available on NuGet yet, so you will need to clone the repo and compile it yourself. Once you’ve compiled it, add a reference to the resulting TensorFlowXamarin.dll in your Android app.
Creating the model
The TensorFlow binding includes a class called TensorFlowInferenceInterface which can be used to easily run image classification models.
using Org.Tensorflow.Contrib.Android;
...
// Load the exported Custom Vision model from the app's assets
var assets = Android.App.Application.Context.Assets;
var inferenceInterface = new TensorFlowInferenceInterface(assets, "model.pb");
This will load the model.pb model file from the assets folder into a TensorFlowInferenceInterface instance, ready to run.
Once we have the model we need to feed it some data, run it, then extract and interpret the output.
Feeding the model
Just like with CoreML, the model doesn’t understand images as such; instead it needs binary data in the same format - a 227x227 array of 32-bit ARGB values. Again this is fairly easy to create from an Android Bitmap: we just need to convert the bitmap to one of the right size and color space. Once it’s in the right color space we need to adjust the colors so that the average of all the color values is 0 - neural networks work better when the average of all inputs is 0 (thanks to the ever awesome Frank Krueger for teaching me this). Different domains are trained in different ways, so they will need different adjustments.
You can see some Java sample code for this at https://github.com/Azure-Samples/cognitive-services-android-customvision-sample, with the adjustments you need to make to the image bytes detailed in the ReadMe.
Luckily I’ve done the hard work converting it to Xamarin for you and it’s on GitHub here.
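To give a rough idea of what that conversion involves, here is a minimal sketch in C#. The helper name and the per-channel mean values below are illustrative assumptions, and the channel order can vary - check the sample ReadMe for the exact adjustments your model’s domain needs:

// Minimal sketch: turn an Android Bitmap into the float array the model expects.
// InputSize matches the 227x227 input of the exported model; the per-channel
// means (and the R/G/B order) are placeholders - use the values from the ReadMe.
const int InputSize = 227;

float[] GetImageBytes(Android.Graphics.Bitmap bitmap)
{
    // Resize to the dimensions the model was trained on
    var scaled = Android.Graphics.Bitmap.CreateScaledBitmap(bitmap, InputSize, InputSize, true);

    var intValues = new int[InputSize * InputSize];
    scaled.GetPixels(intValues, 0, InputSize, 0, 0, InputSize, InputSize);

    var floatValues = new float[InputSize * InputSize * 3];
    for (var i = 0; i < intValues.Length; i++)
    {
        var pixel = intValues[i];
        // Pull out R, G and B and subtract a per-channel mean so the
        // inputs average out around 0
        floatValues[i * 3 + 0] = ((pixel >> 16) & 0xFF) - 124f; // R (placeholder mean)
        floatValues[i * 3 + 1] = ((pixel >> 8) & 0xFF) - 117f;  // G (placeholder mean)
        floatValues[i * 3 + 2] = (pixel & 0xFF) - 105f;         // B (placeholder mean)
    }
    return floatValues;
}

The resulting floatValues array is what gets fed into the inference interface below.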
Once you have the binary data, you pass it to the TensorFlow inference interface as a named input called “Placeholder” - this is the input name expected by models exported from the Custom Vision service:
inferenceInterface.Feed("Placeholder", floatValues, 1, InputSize, InputSize, 3);
The additional parameters describe the shape of the float buffer: the first is the batch size (1, as we’re classifying a single image at a time), the next two are the width and height of the image (227 for both in our case), and the last is the number of floats per pixel - 3, for R, G and B.
Running the model
Once the model has been fed, it needs to be run, asking for a list of named outputs - models can produce multiple outputs, so we have to list all the ones we want. In our case the only output we need is called “loss”:
inferenceInterface.Run(new[] { "loss" });
Getting the outputs
Once the model has run, we can extract the output we are interested in - in our case the “loss” output. This comes back as an array of floats containing one entry per tag, with each value representing the probability of the image matching that tag, from 0 to 1 (1 being 100% probability). We have to pre-create this array before passing it in - so how do we know how big it is, and how do we know which tag each value is for?
The answer comes from the labels.txt file that was downloaded along with the model.pb file. This file contains a list of tags, one per line. Add this file to your app’s assets, then load it using:
var assets = Android.App.Application.Context.Assets;

// Read the labels - one tag per line, ignoring the trailing empty line
List<string> labels;
using (var sr = new StreamReader(assets.Open("labels.txt")))
{
    var content = sr.ReadToEnd();
    labels = content.Split('\n')
                    .Select(s => s.Trim())
                    .Where(s => !string.IsNullOrEmpty(s))
                    .ToList();
}
This will give you a list of labels - the file contains an empty line at the end by default, so remember to trim whitespace and remove any empty lines. You can then create a float array of the same size and put the output in there:
var outputs = new float[labels.Count];
inferenceInterface.Fetch("loss", outputs);
The float values map index for index with the labels - so if labels[0] was foo and labels[1] was bar, outputs[0] would be the probability of the image being foo, and outputs[1] would be the probability of the image being bar.
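As a quick sketch (using the labels and outputs variables from the snippets above), you could pair the two up and pick the most likely tag like this:

// Pair each label with its probability and take the most likely match
var best = labels.Zip(outputs, (label, probability) => new { label, probability })
                 .OrderByDescending(p => p.probability)
                 .First();

Console.WriteLine($"Best match: {best.label} ({best.probability:P0})");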
TensorFlow is only supported on API 21 and above - so don’t forget to set your minimum supported Android version to API 21 in your AndroidManifest.xml.
You can read more on exporting and using models here. In the next post in this series we’ll look at the plugin NuGet package I’ve created to make it easy to use these models from a cross platform app.