This is how to protect your machine-learning applications

On Apr 29, 2020

Modern machine learning (ML) has become an important tool in a very short time. We’re using ML models across our organisations, either rolling our own in R and Python, using tools like TensorFlow to learn and explore our data, or building on cloud-hosted services like Azure’s Cognitive Services. It’s a technology that helps predict maintenance schedules, spot fraud and damaged parts, and parse our speech, responding in a flexible way.

The models that drive our ML applications are incredibly complex, built by training neural networks on large data sets. But there’s a big problem: they’re hard to explain or understand. Why does a model parse a red blob with white text as a stop sign and not a soft drink advert? That complexity hides the risks that are baked into our models, and the possible attacks that can severely disrupt the business processes and services we’re building on them.

Understanding threats

It’s easy to imagine an attack on a self-driving car that could make it ignore stop signs, simply by changing a few details on the sign, or an attack on a facial recognition system that would make it identify a pixelated bandana as Brad Pitt. These adversarial attacks exploit the way ML models work: by changing the physical objects a model sees, they distort its input data and steer it into responses it was never intended to give.

Microsoft is thinking a lot about how to protect machine learning systems. They’re key to its future — from tools being built into Office, to its Azure cloud-scale services, and managing its own and your networks, even delivering security services through ML-powered tools like Azure Sentinel. With so much investment riding on its machine-learning services, it’s no wonder that many of Microsoft’s presentations at the RSA security conference focused on understanding the security issues with ML and on how to protect machine-learning systems.

Protecting machine learning

Many attacks on machine-learning systems depend on access to the models used, so you need to keep your models private. That goes for small models that might be helping run your production lines as much as the massive models that drive the likes of Google, Bing and Facebook. If I get access to your model, I can work out how to affect it, either looking for the right data to feed it that will poison the results, or finding a way past the model to get the results I want.

Much of Microsoft’s work in this area has been published in a paper on failure modes in machine learning, written in conjunction with the Berkman Klein Center. As the paper points out, a lot of work has been done in finding ways to attack machine learning, but not much on how to defend it. We need to build a credible set of defences around machine learning’s neural networks, in much the same way as we protect our physical and virtual network infrastructures.

Attacks on ML systems are failures of the underlying models, which respond in unexpected, and possibly detrimental, ways. We need to understand what the failure modes of machine-learning systems are, and then understand how we can respond to those failures. The paper talks about two failure modes: intentional failures, where an attacker deliberately subverts a system, and unintentional failures, where there’s an unsafe element in the ML model being used that appears correct but delivers bad outcomes.

By understanding the failure modes we can build threat models and apply them to our ML-based applications and services, and then respond to those threats and defend our new applications.

Intentional failures: How to attack ML

The paper suggests 11 different attack classifications, many of which get around our standard defence models. It’s possible to compromise a machine-learning system without needing access to the underlying software and hardware, so standard authorisation techniques can’t protect ML-based systems and we need to consider alternative approaches.

What are these attacks? The first, perturbation attacks, modify queries to change the response to one the attackers desire. That’s matched by poisoning attacks, which achieve the same result by contaminating the training data. Machine-learning models often include important intellectual property, and some attacks like model inversion aim to extract that data. Similarly, a membership inference attack will try to determine whether specific data was in the initial training set. Closely related is the concept of model stealing, using queries to extract the model.
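A perturbation attack is easiest to see on a very small model. The sketch below is an illustration only, not a method taken from the paper: it assumes a toy linear classifier with made-up weights, and nudges each input feature by a small amount in the direction that pushes the score towards the opposite label. The names WEIGHTS, BIAS, perturb and epsilon are all hypothetical.

```python
# Toy linear classifier: score = w . x + b, label 1 if score > 0, else 0.
# The weights are invented for illustration and come from no real model.
WEIGHTS = [0.9, -0.4, 0.3]
BIAS = -0.2


def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return 1 if score(x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def perturb(x, epsilon):
    """Move every feature by at most epsilon so the score shifts
    towards the opposite label. For a linear model the gradient of
    the score with respect to the input is simply WEIGHTS, so the
    sign of each weight gives the direction to push that feature."""
    direction = -1 if predict(x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


original = [0.5, 0.2, 0.1]
adversarial = perturb(original, epsilon=0.15)

print(predict(original), predict(adversarial))
```

With these assumed weights, the original input is classified as 1 and the perturbed input as 0, even though no feature has moved by more than 0.15. Real attacks on neural networks follow the same idea, but need either the model’s gradients or enough queries to estimate them.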

Other attacks include reprogramming the system around the ML model, so that either results or inputs are changed. Closely related are adversarial attacks that change physical objects, adding duct tape to signs to confuse navigation or using specially printed bandanas to disrupt facial-recognition systems. Some attacks depend on the provider: a malicious provider can extract training data from customer systems. They can add backdoors to systems, or compromise models as they’re downloaded.

While many of these attacks are new and targeted specifically at machine learning, the systems under attack are still computer systems and applications. They remain vulnerable to existing exploits and techniques, so attackers can use familiar approaches to disrupt ML applications.

Building safe machine learning

It’s a long list of attack types, but understanding what’s possible allows us to think about the threats our applications face. More importantly, it gives us an opportunity to think about defences and how we protect machine-learning systems: building better, more secure training sets, locking down ML platforms, controlling access to inputs and outputs, and working with trusted applications and services.
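One way to picture controlling access to inputs and outputs is a thin wrapper around the model. The following is a minimal sketch under stated assumptions, not a recommended or complete defence: GuardedModel, toy_model, the feature ranges and the query budget are all hypothetical names and values chosen for illustration.

```python
class GuardedModel:
    """Hypothetical wrapper that limits what callers can send and learn."""

    def __init__(self, model, feature_ranges, max_queries_per_client=100):
        self._model = model              # callable returning a probability in [0, 1]
        self._ranges = feature_ranges    # one (low, high) pair per feature
        self._max_queries = max_queries_per_client
        self._counts = {}

    def predict(self, client_id, features):
        # Per-client query budget: makes bulk probing of the model harder.
        # Rejected requests count against the budget too.
        used = self._counts.get(client_id, 0)
        if used >= self._max_queries:
            raise PermissionError("query budget exhausted")
        self._counts[client_id] = used + 1

        # Input control: refuse anything outside the expected ranges.
        if len(features) != len(self._ranges):
            raise ValueError("unexpected number of features")
        for value, (low, high) in zip(features, self._ranges):
            if not low <= value <= high:
                raise ValueError("feature out of expected range")

        # Output control: return a coarse label, never the raw score.
        probability = self._model(features)
        return "fraud" if probability >= 0.5 else "ok"


def toy_model(features):
    # Stand-in for a real fraud model: the average of two features.
    return 0.5 * features[0] + 0.5 * features[1]


guarded = GuardedModel(toy_model, [(0.0, 1.0), (0.0, 1.0)], max_queries_per_client=2)
print(guarded.predict("client-a", [0.9, 0.8]))
```

Returning only a label gives an attacker less signal per query than a raw probability, and the budget raises the cost of model stealing. Neither measure stops a determined attacker on its own; they sit alongside ordinary authentication and logging.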

Attacks are not the only risk: we must be aware of unintended failures — problems that come from the algorithms we use or from how we’ve designed and tested our ML systems. We need to understand how reinforcement learning systems behave, how systems respond in different environments, if there are natural adversarial effects, or how changing inputs can change results.

If we’re to defend machine-learning applications, we need to ensure that they have been tested as fully as possible, in as many conditions as possible. The apocryphal stories of early machine-learning systems that identified trees instead of tanks, because all the training images were of tanks under trees, are a sign that these aren’t new problems, and that we need to be careful about how we train, test, and deploy machine learning.
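Testing in as many conditions as possible can be made concrete by scoring a model separately for each condition, rather than relying on one overall accuracy figure. The sketch below is illustrative only: the detector, the examples and the 0.8 threshold are invented to mimic the tanks-and-trees story, and the function names are hypothetical.

```python
from collections import defaultdict


def accuracy_by_condition(model, examples):
    """examples is an iterable of (features, label, condition) tuples.
    Returns a dict mapping each condition to the model's accuracy on it."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for features, label, condition in examples:
        total[condition] += 1
        if model(features) == label:
            correct[condition] += 1
    return {c: correct[c] / total[c] for c in total}


def weak_conditions(scores, threshold=0.8):
    """Names of the conditions where accuracy falls below the threshold."""
    return sorted(c for c, acc in scores.items() if acc < threshold)


def tank_detector(features):
    # A deliberately flawed stand-in model: it has learned to spot
    # tree cover, not tanks.
    return features["tree_cover"] > 0.5


# Each example: (features, true "tank present" label, condition name).
examples = [
    ({"tree_cover": 0.9}, True, "tank in forest"),
    ({"tree_cover": 0.8}, True, "tank in forest"),
    ({"tree_cover": 0.1}, False, "empty field"),
    ({"tree_cover": 0.0}, True, "tank in desert"),
    ({"tree_cover": 0.7}, False, "empty forest"),
]

scores = accuracy_by_condition(tank_detector, examples)
print(weak_conditions(scores))
```

On the conditions that resemble the training data, the flawed detector looks perfect. Only the per-condition breakdown shows that it fails on tanks in the open and on forests without tanks.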

We can only defend against intentional attacks if we know that we’ve protected ourselves and our systems from mistakes we’ve made. The old adage “test, test, and test again” is key to building secure and safe machine learning — even when we’re using pre-built models and service APIs.