
11 Apr 2016

Operational Machine Learning for Developers


Machine learning (ML) is the unsung hero that powers many applications, systems, sensors, devices, and products. It is so pervasive that we can assume its presence in most applications and systems without having to call it out specifically.

In simple terms, machine learning is a computer’s ability to learn from data, and it is one of the most useful tools we have to develop intelligent systems and applications. Machine learning is used widely today for all kinds of tasks, from churn prediction in large companies, to web search, to medical diagnostics, to robotics. It’s hard to find a field that cannot benefit from machine learning in one way or another.

Machine learning’s intuitive, versatile and robust approach to finding patterns in the available data makes it a priceless asset for anyone who wants to turn data into insights and predictions. What’s more, today it is more accessible than ever before, thanks to a variety of open source tools and programming languages.

What developers actually need to know about Machine Learning

Something is wrong in the way ML is being taught to developers.

Most ML teachers like to explain how different learning algorithms work and spend tons of time on that. For a beginner who wants to start using ML, being able to choose an algorithm and set its parameters looks like the #1 barrier to entry, and knowing how the different techniques work seems to be a key requirement for removing that barrier. Many practitioners argue, however, that you only need one technique to get started: random forests. Other techniques may sometimes outperform them, but in general, random forests are the most likely to perform best on a variety of problems (see Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?), which makes them more than enough for a developer just getting started with ML.

We would further argue that you don’t need to know all the inner workings of random forest learning algorithms (or of the simpler decision tree learners they are built on). A high-level understanding of the algorithms, the intuitions behind them, their main parameters, and their possibilities and limitations is enough. You’ll know enough to start practicing and experimenting with ML, as there are great open source ML libraries (such as scikit-learn in Python) and cloud platforms that make it super easy to create predictive models from data.
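
To make this concrete, here is a minimal sketch of what training a random forest with scikit-learn looks like. The dataset, file name and column names are made up for illustration; only the scikit-learn calls themselves are real.

```python
# Minimal sketch: training a random forest with scikit-learn.
# The CSV file and column names are hypothetical, for illustration only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = pd.read_csv("customers.csv")            # hypothetical churn dataset
X = data[["age", "monthly_spend", "visits"]]   # made-up feature columns
y = data["churned"]                            # made-up target column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```

Default parameters already give a reasonable baseline; the number of trees (n_estimators) is usually the first parameter worth increasing.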

So, if we just give an overview of only one technique, what else can we teach?

Dataiku sticker (seen at PAPIs). Their Data Science Studio makes it easy to experiment with and deploy ML models.

Deploying ML models into production

It turns out that, when using ML in real-world applications, most of the work takes place before and after the learning. ML instructors rarely provide an end-to-end view of what it takes to use ML in a predictive application that’s deployed in production. They just explain one part of the problem, then assume you’ll figure out the rest and connect the dots on your own: for instance, the dots between the ML libraries you were taught to use in Python, R, or MATLAB, and your production application, which is developed in Ruby, Swift, C++, etc.

Fortunately, today there are new and accessible solutions to this “last-mile problem”. They revolve around the use of REST (HTTP) APIs. Models need to be exposed as APIs, and if scaling the number of predictions served by a given model becomes an issue, these APIs can be served from multiple endpoints with load balancers in front. Platforms-as-a-Service can help with that; see for instance Microsoft Azure ML’s scaling capabilities, Amazon ML’s, and Yhat’s Analytics Load Balancer (which you can also run on your own private infrastructure/cloud). Some of these platforms allow you to use whatever ML library you want, while others restrict you to their own proprietary ones. In our upcoming workshop, we’ve chosen to use Azure to deploy models created with scikit-learn as APIs, and also to demonstrate how Amazon and BigML provide an even higher level of abstraction (while still producing accurate models) that can make them easier to work with in many cases.
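
To give an idea of what “exposing a model as an API” means in practice, here is a minimal sketch that wraps a previously trained scikit-learn model behind an HTTP endpoint. Flask, the model file name and the JSON input format are assumptions for illustration; the managed platforms mentioned above do the equivalent for you, plus the scaling and load balancing.

```python
# Minimal sketch: serving a trained model behind a REST API with Flask.
# Flask, "model.pkl" and the JSON format are illustrative assumptions,
# not the API of any of the platforms mentioned in the post.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.pkl")  # a previously trained scikit-learn model

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                       # e.g. {"features": [34, 120.5, 3]}
    prediction = model.predict([payload["features"]])  # one row of features
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Several such processes behind a load balancer can serve the same model if prediction volume grows; that is exactly the part the platforms above take care of.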

ML workflow

Deployment is not the only post-learning challenge in real-world ML. You also need to find appropriate ways to evaluate and monitor your models’ performance and impact, before and after deployment.
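
Before deployment, the usual way to estimate that performance is with held-out data or cross-validation. Here is a minimal sketch with scikit-learn, reusing the hypothetical X and y from the earlier example:

```python
# Minimal sketch: estimating model performance before deployment.
# X and y are the hypothetical features and target from the earlier example.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

model = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("Cross-validated AUC: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

After deployment, the same metric can be tracked on live predictions once their true outcomes become known, which is how you detect that a model’s performance is drifting.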

ML workflow by Azure

The ML workflow diagram above also presents some of the steps to take before learning a model, which are about preparing the right dataset for the algorithms to run on. Before actually running any algorithm you need to…

– Define the right ML problem to tackle for your organization

– Engineer features, i.e. find ways to represent the objects on which you’ll be making predictions with ML

– Figure out when/how often you’ll need to make predictions, and how much time you’ll have for that (is there a way to do predictions in batches or do you absolutely need all your predictions to be real-time?)

– Collect data

– Prepare the actual dataset to run learning algorithms on, i.e. extract features from the “raw” collected data and clean it (see the sketch after this list)

– Figure out when/how often you’ll need to learn new/updated models, and how much time you’ll have for that.
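
As an illustration of the dataset-preparation step above, here is a minimal sketch with pandas that turns a hypothetical raw event log into one row of features per customer. The file and column names are made up for illustration.

```python
# Minimal sketch: turning "raw" collected data into a feature table with pandas.
# The events.csv file and its columns are hypothetical, for illustration only.
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["timestamp"])  # raw event log
events = events.dropna(subset=["customer_id"])                 # basic cleaning

# One row per customer, with a few simple engineered features
features = events.groupby("customer_id").agg(
    visits=("timestamp", "count"),
    total_spend=("amount", "sum"),
    last_seen=("timestamp", "max"),
)
features["days_since_last_seen"] = (pd.Timestamp.now() - features["last_seen"]).dt.days
features = features.drop(columns=["last_seen"])
```

The resulting table is what the learning algorithm actually sees; the quality of these features usually matters more than the choice of algorithm.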

Operational Machine Learning Workshop for Developers


We are excited to officially announce today that, in collaboration with PAPIs, we are launching a new learning track called PAPIs Workshops. In partnership with leading education and industry organizations, we will offer practical, industry-focused learning programs in various locations around the world (starting with Madrid, London and Boston).

Most Machine Learning courses are given from the perspective of a Data Scientist and focus on the techniques and algorithms used to learn from data. This workshop instead takes the perspective of an application developer and provides an end-to-end view of integrating ML into your applications. We’ll go all the way from data preparation to the integration of predictive models in your domain and their deployment in production.

Our first workshop is aimed at developers and is a platform-agnostic introduction to operational Machine Learning with open source and cloud platforms. It’s a 2-day hands-on workshop given in a classroom setting. Day 1 covers an intro to ML and the creation, operationalization, and evaluation of predictive models. Day 2 covers model selection, ensembles, data preparation, a practical overview of advanced topics such as unsupervised learning and deep learning, and a methodology for developing your own ML use case.

We’re using Python with libraries such as Pandas, scikit-learn and SKLL, and cloud platforms such as Microsoft Azure ML, Amazon ML, BigML and Indico. I think these platforms are great for many organizations and real-world use cases, but even if you find they’re not the perfect fit for you, I’d still recommend using them for learning and practicing ML. ML-as-a-Service makes it much quicker to set up work environments (e.g. Azure ML has the most popular libraries preinstalled and can run interactive Jupyter notebooks, which you can access from your browser), but also to experiment with ML thanks to the higher levels of abstraction they provide (e.g. combining one-click clustering, anomaly detection, and classification models with BigML, or quickly featurizing text and images with Indico’s Deep Learning API).

For more information on how to attend, participate or become a sponsor, please visit http://www.papis.io/workshops/operational-machine-learning

Special Offer – 30% Discount!

Please take a moment to register now and take advantage of the special 30% discount. Visit the event page and register before 22nd May to get 30% off. If you have any questions about the workshop or registration, please feel free to contact us at hello@persontyle.com.

Happy Machine Learning!

Dr. Louis Dorard and Ali Syed
