In the first part of this series I used the Azure Custom Vision service to create an image classifier to allow me to easily identify my daughter’s cuddly toys. Once it was created I tested it by uploading an image and seeing what tags the classifier found for the image.
In the second part I accessed this model from a Xamarin app, so that I could use the camera to take a photo to run through the classifier using a NuGet package that talks to the Custom Vision service.
In the third part I showed how to download this model for iOS and run it locally, on device, using CoreML.
In this part we’re going to switch OS and run these models on Android. Like iOS, Android offers APIs to run AI models on device, taking advantage of the hardware. Unlike on iOS, this is not an API baked into the OS; instead it is a separate library you add to your project and run. The upside of this is broader OS support - it works all the way back to API 21.
The library in question is called TensorFlow - an open source machine learning library from Google that can run on pretty much any platform and is accessible from most languages. When running on device you can think of it as similar to CoreML on iOS: a generic way of running all sorts of machine learning models, including the ones generated by the Custom Vision service.
Running models with TensorFlow
For mobile apps we can use the Android bindings for the TensorFlow library. Like with CoreML these bindings contain a full API for running all sorts of models, as well as an easy API surface that we can use to do image classification - similar to the CoreML Vision APIs.
To download the TensorFlow model, head to the Performance tab in the Custom Vision portal, select the latest iteration from the list on the left (we’ll cover iterations in a future post), and click the Export link at the top. Select Android (TensorFlow) then click Download. This will download a zip file containing 2 files:
model.pb - this is the actual TensorFlow model
labels.txt - the labels for the tags
Once these files have been downloaded you will need to add them to the Assets folder in your Android app (make sure their Build Action is set to AndroidAsset so they are packaged with the app).
Importing the Android TensorFlow bindings
The TensorFlow Android bindings consist of a native library with a Jar binding. To use these from a Xamarin app we need a Xamarin binding, and this has already been created for us by Larry O’Brien from Xamarin and is available on GitHub here: https://github.com/lobrien/TensorFlow.Xamarin.Android. This binding is not available on NuGet yet, so you will need to clone the repo and compile it yourself. Once you’ve compiled it, add a reference to the resulting TensorFlowXamarin.dll in your Android app.
Creating the model
The TensorFlow binding includes a class called TensorFlowInferenceInterface which can be used to easily run image classification models.
using Org.Tensorflow.Contrib.Android;
...
// Load the exported Custom Vision model from the app's assets
var assets = Android.App.Application.Context.Assets;
var inferenceInterface = new TensorFlowInferenceInterface(assets, "model.pb");
This will load the model.pb model file from the assets folder into a TensorFlowInferenceInterface instance, ready to run.
Once we have the model we need to feed it some data, run it, then extract and interpret the output.
Feeding the model
Just like with CoreML, the model doesn’t understand images as such; instead it needs binary data in the same format - a 227x227 array of 32-bit ARGB values. Again this is fairly easy to create from an Android Bitmap: we just need to convert the bitmap to one of the right size and color space. Once it’s in the right color space we need to adjust the colors so that the average of all the color values is 0 - neural networks work better when the average of all inputs is 0 (thanks to the ever awesome Frank Krueger for teaching me this). Different domains are trained in different ways, so they will need different adjustments.
You can see some Java sample code for this at https://github.com/Azure-Samples/cognitive-services-android-customvision-sample, with the adjustments you need to make to the image bytes detailed in the ReadMe.
Luckily I’ve done the hard work converting it to Xamarin for you and it’s on GitHub here.
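To give a rough idea of what that conversion involves, here is a minimal sketch in C#. The helper name and the per-channel mean values below are illustrative assumptions, and the channel order can vary - check the sample ReadMe for the exact adjustments your model’s domain needs:

// Minimal sketch: turn an Android Bitmap into the float array the model expects.
// InputSize matches the 227x227 input of the exported model; the per-channel
// means (and the R/G/B order) are placeholders - use the values from the ReadMe.
const int InputSize = 227;

float[] GetImageBytes(Android.Graphics.Bitmap bitmap)
{
    // Resize to the dimensions the model was trained on
    var scaled = Android.Graphics.Bitmap.CreateScaledBitmap(bitmap, InputSize, InputSize, true);

    var intValues = new int[InputSize * InputSize];
    scaled.GetPixels(intValues, 0, InputSize, 0, 0, InputSize, InputSize);

    var floatValues = new float[InputSize * InputSize * 3];
    for (var i = 0; i < intValues.Length; i++)
    {
        var pixel = intValues[i];
        // Pull out R, G and B and subtract a per-channel mean so the
        // inputs average out around 0
        floatValues[i * 3 + 0] = ((pixel >> 16) & 0xFF) - 124f; // R (placeholder mean)
        floatValues[i * 3 + 1] = ((pixel >> 8) & 0xFF) - 117f;  // G (placeholder mean)
        floatValues[i * 3 + 2] = (pixel & 0xFF) - 105f;         // B (placeholder mean)
    }
    return floatValues;
}

The resulting floatValues array is what gets fed into the inference interface below.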
Once you have the binary data, you pass it to the TensorFlow inference interface as a named input called “Placeholder” - this is the input name expected by models exported from the Custom Vision service:
inferenceInterface.Feed("Placeholder", floatValues, 1, InputSize, InputSize, 3);
The additional parameters describe the shape of the float buffer: the first is the batch size (1, as we’re classifying a single image at a time), the next two are the width and height of the image (227 for both in our case), and the last is the number of floats per pixel - 3, for R, G and B.
Running the model
Once the model has been fed, it needs to be run, asking for a list of named outputs - models can produce multiple outputs, so we have to list all the ones we want. In our case the only output we need is called “loss”:
inferenceInterface.Run(new[] { "loss" });
Getting the outputs
Once the model has run, we can extract the output we are interested in - in our case the “loss” output. This comes back as an array of floats containing one entry per tag, with each value representing the probability of the image matching that tag, from 0 to 1 (1 being 100% probability). We have to pre-create this array before passing it in - so how do we know how big it is, and how do we know which tag each value is for?
The answer comes from the labels.txt file that was downloaded along with the model.pb file. This file contains a list of tags, one per line. Add this file to your app’s assets, then load it using:
var assets = Android.App.Application.Context.Assets;

// Read the labels - one tag per line, ignoring the trailing empty line
List<string> labels;
using (var sr = new StreamReader(assets.Open("labels.txt")))
{
    var content = sr.ReadToEnd();
    labels = content.Split('\n')
                    .Select(s => s.Trim())
                    .Where(s => !string.IsNullOrEmpty(s))
                    .ToList();
}
This will give you a list of labels - the file contains an empty line at the end by default, so remember to trim whitespace and remove any empty lines. You can then create a float array of the same size and put the output in there:
var outputs = new float[labels.Count];
inferenceInterface.Fetch("loss", outputs);
The float values map index for index with the labels - so if labels[0] was foo and labels[1] was bar, outputs[0] would be the probability of the image being foo, and outputs[1] would be the probability of the image being bar.
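As a quick sketch (using the labels and outputs variables from the snippets above), you could pair the two up and pick the most likely tag like this:

// Pair each label with its probability and take the most likely match
var best = labels.Zip(outputs, (label, probability) => new { label, probability })
                 .OrderByDescending(p => p.probability)
                 .First();

Console.WriteLine($"Best match: {best.label} ({best.probability:P0})");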
TensorFlow is only supported on API 21 and above - so don’t forget to set your minimum supported Android version to API 21 in your AndroidManifest.xml.
You can read more on exporting and using models here. In the next post in this series we’ll look at the plugin NuGet package I’ve created to make it easy to use these models from a cross platform app.