Automated machine learning or AutoML explained

On Aug 21, 2019

The two biggest barriers to the use of machine learning (both classical machine learning and deep learning) are skills and computing resources. You can solve the second problem by throwing money at it, either for the purchase of accelerated hardware (such as computers with high-end GPUs) or for the rental of compute resources in the cloud (such as instances with attached GPUs, TPUs, and FPGAs).

On the other hand, solving the skills problem is harder. Data scientists often command hefty salaries and may still be hard to recruit. Google was able to train many of its employees on its own TensorFlow framework, but most companies barely have people skilled enough to build machine learning and deep learning models themselves, much less teach others how.

What is AutoML?

Automated machine learning, or AutoML, aims to reduce or eliminate the need for skilled data scientists to build machine learning and deep learning models. Instead, an AutoML system allows you to provide the labeled training data as input and receive an optimized model as output.

There are several ways of going about this. One approach is for the software to simply train every kind of model on the data and pick the one that works best. A refinement of this would be for it to build one or more ensemble models that combine the other models, which sometimes (but not always) gives better results.
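To make the "train everything and pick the best" idea concrete, here is a minimal sketch in plain Python. The three toy model types, the data, and the helper names (`fit_mean`, `fit_linear`, `fit_nearest`, `sweep`) are all hypothetical illustrations, not the internals of any real AutoML product, and the sketch leaves out ensembling entirely.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares for a single input variable."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    """1-nearest-neighbor: copy the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def sweep(candidates, train, valid):
    """Train every candidate on `train`, score it on `valid`, keep the best."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Toy data following y = 2x + 1, split into training and validation sets.
train = ([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
valid = ([6.0, 7.0, 8.0], [13.0, 15.0, 17.0])
candidates = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

best_name, scores = sweep(candidates, train, valid)
print(best_name, scores)
```

Scoring on held-out validation data rather than on the training data is what keeps the sweep from simply rewarding the model that memorizes best.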

A second technique is to optimize the hyperparameters (explained below) of the best model or models to train an even better model. A third is feature engineering (also explained below), which is a valuable addition to any model training. Finally, one way of de-skilling deep learning is to use transfer learning, essentially customizing a well-trained general model for specific data.

What is hyperparameter optimization?

All machine learning models have parameters, such as the weights for each variable or feature in the model, that are learned from the training data. In neural networks, these are usually determined by back-propagation of the errors, plus iteration under the control of an optimizer such as stochastic gradient descent.

Most machine learning models also have hyperparameters that are set outside of the training loop. These often include the learning rate, the dropout rate, and model-specific parameters such as the number of trees in a Random Forest.

Hyperparameter tuning or hyperparameter optimization (HPO) is an automatic way of sweeping or searching through one or more of the hyperparameters of a model to find the set that results in the best trained model. This can be time-consuming, since you need to train the model again (the inner loop) for each set of hyperparameter values in the sweep (the outer loop). If you train many models in parallel, you can reduce the time required at the expense of using more hardware.
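The inner-loop and outer-loop structure can be sketched as a small grid search. This is only an illustration under stated assumptions: the model is a one-weight linear fit, the grid values are arbitrary, and the function names (`train`, `loss`, `grid_search`) are hypothetical. Real HPO tools typically use smarter search strategies than an exhaustive grid.

```python
from itertools import product

def train(xs, ys, learning_rate, epochs):
    """Inner loop: fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

def loss(w, xs, ys):
    """Mean squared error of the fitted weight on (xs, ys)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(grid, train_set, valid_set):
    """Outer loop: retrain once per hyperparameter combination."""
    best = None
    for lr, epochs in product(grid["learning_rate"], grid["epochs"]):
        w = train(*train_set, learning_rate=lr, epochs=epochs)
        score = loss(w, *valid_set)
        if best is None or score < best["loss"]:
            best = {"learning_rate": lr, "epochs": epochs, "w": w, "loss": score}
    return best

# Toy data following y = 3x.
train_set = ([1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 9.0, 12.0])
valid_set = ([5.0, 6.0], [15.0, 18.0])
grid = {"learning_rate": [0.001, 0.01, 0.05], "epochs": [10, 100]}

best = grid_search(grid, train_set, valid_set)
print(best)
```

Each pass through the outer loop is independent of the others, which is why the sweep parallelizes so easily across machines.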

What is feature engineering?

A feature is an individual measurable property or characteristic of a phenomenon being observed. The concept of a “feature” is related to that of an explanatory variable, which is used in statistical techniques such as linear regression. A feature vector combines all of the features for a single row into a numerical vector. Feature engineering is the process of finding the best set of variables and the best data encoding and normalization for input to the model training process.

Part of the art of choosing features is to pick a minimum set of independent variables that explain the problem. If two variables are highly correlated, either they need to be combined into a single feature, or one should be dropped. Sometimes people perform principal component analysis (PCA) to convert correlated variables into a set of linearly uncorrelated variables.
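One simple way to spot candidates for dropping or combining is to compute the Pearson correlation between every pair of columns. The sketch below uses made-up data, and the 0.95 threshold is an arbitrary assumption rather than a recommended value; `pearson` and `correlated_pairs` are hypothetical helper names.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlated_pairs(columns, threshold=0.95):
    """Return pairs of column names whose |r| meets the threshold."""
    return [
        (p, q)
        for p, q in combinations(columns, 2)
        if abs(pearson(columns[p], columns[q])) >= threshold
    ]

# Made-up data: the two height columns carry the same information.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [34.0, 21.0, 45.0, 29.0, 52.0],
}
print(correlated_pairs(columns))
```

Here the two height columns are flagged as redundant, so one of them could be dropped without losing information.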

To use categorical data for machine classification, you need to encode the text labels into another form. There are two common encodings.

One is label encoding, which means that each text label value is replaced with a number. The other is one-hot encoding, which means that each text label value is turned into a column with a binary value (1 or 0). Most machine learning frameworks have functions that do the conversion for you. In general, one-hot encoding is preferred, as label encoding can sometimes confuse the machine learning algorithm into thinking that the encoded column is ordered.
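The difference between the two encodings is easiest to see side by side. This is a minimal sketch in plain Python with hypothetical helper names (`label_encode`, `one_hot_encode`); in practice you would normally use the conversion functions your framework provides.

```python
def label_encode(values):
    """Replace each distinct label with an integer, in sorted label order."""
    index = {label: i for i, label in enumerate(sorted(set(values)))}
    return [index[v] for v in values], index

def one_hot_encode(values):
    """Turn each distinct label into its own 0/1 column."""
    labels = sorted(set(values))
    rows = [[1 if v == label else 0 for label in labels] for v in values]
    return rows, labels

colors = ["red", "green", "blue", "green"]

codes, mapping = label_encode(colors)
print(codes)    # [2, 1, 0, 1]

rows, columns = one_hot_encode(colors)
print(columns)  # ['blue', 'green', 'red']
print(rows)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Note how the label codes imply blue < green < red, an ordering that means nothing for colors. The one-hot rows carry no such implication, at the cost of one column per distinct label.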

To use numeric data for machine regression, you usually need to normalize the data. Otherwise, the numbers with larger ranges might tend to dominate the Euclidean distance between feature vectors, their effects could be magnified at the expense of the other fields, and the steepest descent optimization might have difficulty converging. There are a number of ways to normalize and standardize data for machine learning, including min-max normalization, mean normalization, standardization, and scaling to unit length. This process is often called feature scaling.
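Two of these scalings, min-max normalization and standardization, fit in a few lines. The sketch below uses made-up income and age columns and assumes each column has at least two distinct values (otherwise the divisions would fail); the function names are hypothetical.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and scale values to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up columns whose raw ranges differ by three orders of magnitude.
incomes = [20000.0, 40000.0, 60000.0, 100000.0]
ages = [20.0, 30.0, 40.0, 60.0]

print(min_max(incomes))
print(min_max(ages))
print(standardize(ages))
```

After min-max scaling, both columns span the same 0-to-1 range, so neither dominates a distance calculation merely because of its units.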

Some of the transformations that people use to construct new features or reduce the dimensionality of feature vectors are simple. For example, subtract Year of Birth from Year of Death and you construct Age at Death, which is a prime independent variable for lifetime and mortality analysis. In other cases, feature construction may not be so obvious.

What is transfer learning?

Transfer learning is sometimes called custom machine learning, and sometimes called AutoML (mostly by Google). Rather than starting from scratch when training models from your data, Google Cloud AutoML implements automatic deep transfer learning (meaning that it starts from an existing deep neural network trained on other data) and neural architecture search (meaning that it finds the right combination of extra network layers) for language pair translation, natural language classification, and image classification.

That’s a different process from what’s usually meant by AutoML, and it doesn’t cover as many use cases. On the other hand, if you need a customized deep learning model in a supported area, transfer learning will often produce a superior model.

AutoML implementations

There are many implementations of AutoML that you can try. Some are paid services, and some are free source code. The lists below are by no means complete or final.

AutoML services

All of the big three cloud services have some kind of AutoML. Amazon SageMaker does hyperparameter tuning but doesn’t automatically try multiple models or perform feature engineering. Azure Machine Learning has both AutoML, which sweeps through features and algorithms, and hyperparameter tuning, which you typically run on the best algorithm chosen by AutoML. Google Cloud AutoML, as I discussed earlier, is deep transfer learning for language pair translation, natural language classification, and image classification.

A number of smaller companies offer AutoML services as well. For example, DataRobot, which claims to have invented AutoML, has a strong reputation in the market. And while dotData has a tiny market share and a mediocre UI, it has strong feature engineering capabilities and covers many enterprise use cases. H2O.ai Driverless AI, which I reviewed in 2017, can help a data scientist turn out models like a Kaggle master, doing feature engineering, algorithm sweeps, and hyperparameter optimization in a unified way.

AutoML frameworks

AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention. Auto-Keras is an open source software library for automated machine learning, developed at Texas A&M, that provides functions to automatically search for architecture and hyperparameters of deep learning models. NNI (Neural Network Intelligence) is a toolkit from Microsoft to help users design and tune machine learning models (e.g., hyperparameters), neural network architectures, or a complex system’s parameters in an efficient and automatic way.

You can find additional AutoML projects and a fairly complete and current list of papers about AutoML on GitHub.