Describing a photo in a mobile app using Azure Computer Vision

I recently gave an introductory Xamarin talk at Imperial College London and wanted to build a cool app to show off what you can do on mobile using the awesome Cognitive Services available on Azure. I only had about 30-40 minutes to both introduce Xamarin and build an app, so I decided to throw together a simple app that takes a photo and describes it using the Azure Computer Vision service.

It really is simple to set up and use this service. Head to the Computer Vision cognitive services site and click the big Try the Computer Vision API button. Log in with an appropriate provider and get an API key, noting the region the key is for.

From inside your Xamarin app, install the pre-release Microsoft.Azure.CognitiveServices.Vision.ComputerVision NuGet package into all the projects. Then install the Xam.Plugin.Media NuGet package and follow the instructions in the readme.txt that opens automatically to enable permissions and other gumpf.

Add some code to take a photo using the media plugin:

    var opts = new Plugin.Media.Abstractions.StoreCameraMediaOptions();
    var photo = await Plugin.Media.CrossMedia.Current.TakePhotoAsync(opts);
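
The snippet above assumes a camera is present and that the user actually takes a picture. As a minimal sketch of a more defensive version, the helper below initializes the plugin, checks that a camera is usable, and passes back null if the user cancels. The class and method names (PhotoHelper, TryTakePhotoAsync) are made up for illustration and are not part of the plugin.

```csharp
using System.Threading.Tasks;
using Plugin.Media;
using Plugin.Media.Abstractions;

public static class PhotoHelper
{
    // Hypothetical helper: returns the captured photo, or null if the device
    // has no usable camera or the user backs out of the camera UI.
    public static async Task<MediaFile> TryTakePhotoAsync()
    {
        // The media plugin asks to be initialized before it is first used.
        await CrossMedia.Current.Initialize();

        if (!CrossMedia.Current.IsCameraAvailable ||
            !CrossMedia.Current.IsTakePhotoSupported)
        {
            return null;
        }

        var opts = new StoreCameraMediaOptions();

        // TakePhotoAsync itself returns null when the user cancels.
        return await CrossMedia.Current.TakePhotoAsync(opts);
    }
}
```

Callers would then check the result for null before doing anything else with the photo.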

Next, set up the Computer Vision API by creating an ApiKeyServiceClientCredentials with your API key, then constructing an instance of ComputerVisionAPI using these credentials, not forgetting to set the region to the one your key was issued for.

    var creds = new ApiKeyServiceClientCredentials("<your key here>");
    var visionApi = new ComputerVisionAPI(creds)
    {
        AzureRegion = AzureRegions.Westeurope
    };

Finally, get a stream containing the image and pass it to the Computer Vision API.

    var desc = await visionApi.DescribeImageInStreamAsync(photo.GetStream());
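
The one-liner never disposes the stream it opens. One way to tidy that up, sketched here under the assumption that you are on the same pre-release package (where the client class is ComputerVisionAPI), is to wrap the call in a using block. VisionHelper and DescribePhotoAsync are hypothetical names chosen for this example.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
using Plugin.Media.Abstractions;

public static class VisionHelper
{
    // Hypothetical helper: sends the photo to the service and makes sure
    // the underlying stream is disposed once the call has completed.
    public static async Task<ImageDescription> DescribePhotoAsync(
        ComputerVisionAPI visionApi, MediaFile photo)
    {
        using (var stream = photo.GetStream())
        {
            return await visionApi.DescribeImageInStreamAsync(stream);
        }
    }
}
```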

You can then access a description for the image using the Captions property on the ImageDescription that is returned. You can also get a list of tags for the image using the Tags property. The image below shows my app using this to caption an image.
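
As a rough sketch of how you might turn that result into text for a label, the helper below picks the caption with the highest confidence and joins the tags into a single string. It assumes the Captions entries expose Text and Confidence and that Tags is a list of strings, as in the package's models. DescriptionFormatter and its methods are illustrative names, not part of the SDK.

```csharp
using System.Linq;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;

public static class DescriptionFormatter
{
    // Returns the text of the most confident caption, or a fallback
    // message if the service returned no captions at all.
    public static string BestCaption(ImageDescription desc)
    {
        var best = desc?.Captions?
            .OrderByDescending(c => c.Confidence)
            .FirstOrDefault();

        return best?.Text ?? "No description available";
    }

    // Joins the tags into a comma-separated string (empty if there are none).
    public static string TagList(ImageDescription desc)
    {
        var tags = desc?.Tags ?? Enumerable.Empty<string>();
        return string.Join(", ", tags);
    }
}
```

In the app you could then assign DescriptionFormatter.BestCaption(desc) to whatever label sits under the photo.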

The app running showing an image and its description

You can find the code for this on my GitHub, and you can read more about the Computer Vision service in the docs.