Building Better Models with Open Data
Learn how we designed, built, and deployed the @WhereML Twitter bot, which can identify where in the world a picture was taken using only the pixels in the image. We'll dive deep into artificial intelligence and deep learning with the MXNet framework, and also talk about working with the Twitter Account Activity API. The bot autoscales and is powered by Amazon API Gateway and AWS Lambda, which means, as a customer, you don't manage any infrastructure. The bot uses the Multimedia Commons dataset as its training set.
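To give a flavor of the serverless plumbing, here is a minimal sketch of a Lambda handler (behind API Gateway) answering Twitter's CRC challenge for the Account Activity API. The handler name, environment variable, and event routing are illustrative assumptions rather than the bot's actual code; only the `sha256=` HMAC response format comes from Twitter's webhook documentation.

```python
# Sketch of a Lambda handler for a Twitter Account Activity API webhook.
# Twitter periodically sends GET /webhook?crc_token=... and expects an
# HMAC-SHA256 of the token keyed with the app's consumer secret.
import base64
import hashlib
import hmac
import json
import os

# Assumed environment variable; the real deployment may store this elsewhere.
CONSUMER_SECRET = os.environ["TWITTER_CONSUMER_SECRET"]

def lambda_handler(event, context):
    params = event.get("queryStringParameters") or {}
    crc_token = params.get("crc_token")
    if crc_token:
        # Answer the CRC check so Twitter keeps the webhook registered.
        digest = hmac.new(
            CONSUMER_SECRET.encode("utf-8"),
            crc_token.encode("utf-8"),
            hashlib.sha256,
        ).digest()
        body = {"response_token": "sha256=" + base64.b64encode(digest).decode("utf-8")}
        return {"statusCode": 200, "body": json.dumps(body)}

    # POST requests carry account activity events (e.g. mentions with photos);
    # those would be handed off to the model for inference.
    return {"statusCode": 200, "body": ""}
```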
In 2017 I read a paper about geolocating images using just the pixels in the image: no EXIF GPS data and no external hints are used during inference. I fell in love with the idea and pursued it, eventually turning it into a Twitter bot. This work builds on the prior PlaNet and LocationNet work. In this talk we'll cover a few key topics:
- Open Datasets on AWS
- Effective Machine Learning Pipelines with SageMaker's Python SDK (see the sketch after this list)
- Apache MXNet
- Twitter API
- AWS Lambda and Webhooks
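Expanding on the SageMaker item above, a training job can be launched from a few lines of the SageMaker Python SDK. This is a minimal sketch assuming v2-style arguments; the entry point script, IAM role, instance type, hyperparameters, and S3 paths are placeholders rather than the project's real values.

```python
# Sketch: kick off an MXNet training job with the SageMaker Python SDK.
from sagemaker.mxnet import MXNet

estimator = MXNet(
    entry_point="train.py",   # script containing the MXNet training loop (placeholder)
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.4.1",
    py_version="py3",
    hyperparameters={"epochs": 10, "learning-rate": 0.001},
)

# Training data (e.g. image/label lists derived from Multimedia Commons) in S3.
estimator.fit({"train": "s3://my-bucket/whereml/train"})
```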
First, I'll outline how we used the Multimedia Commons dataset to build and train a ResNet/LocationNet-based model with Apache MXNet, and how we used Google's S2 Spherical Geometry Library to create a sparse index of cells over the earth. I'll also briefly cover how to support continuous training as new images arrive. From there, we'll discuss how to take a model to production (from Jupyter notebook to containerized app). Finally, we'll close with a discussion of webhooks and the Twitter API.
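As a rough illustration of the S2 labeling scheme, the sketch below maps a photo's coordinates to an S2 cell token (used as a class label) and maps a predicted token back to the cell's center. It uses the s2sphere package, a pure-Python port of Google's S2 library; the actual bot may bind the C++ library directly, and the cell level and helper names here are assumptions for illustration.

```python
# Sketch: S2 cells as classification labels for image geolocation.
import s2sphere

CELL_LEVEL = 10  # coarser level = larger cells; chosen here only for illustration

def latlng_to_cell_token(lat, lng, level=CELL_LEVEL):
    """Map a photo's coordinates to the token of its enclosing S2 cell."""
    leaf = s2sphere.CellId.from_lat_lng(s2sphere.LatLng.from_degrees(lat, lng))
    return leaf.parent(level).to_token()

def cell_token_to_latlng(token):
    """Map a predicted cell token back to the cell's center point."""
    center = s2sphere.CellId.from_token(token).to_lat_lng()
    return center.lat().degrees, center.lng().degrees

# Example round trip with the Eiffel Tower's coordinates.
token = latlng_to_cell_token(48.8584, 2.2945)
print(token, cell_token_to_latlng(token))
```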