AI spots critical Microsoft security bugs 97% of the time

On Apr 18, 2020

Microsoft claims to have developed a system that correctly distinguishes between security and non-security software bugs 99% of the time, and that accurately identifies critical, high-priority security bugs on average 97% of the time. In the coming months, it plans to open-source the methodology on GitHub, along with example models and other resources.

The work suggests that such a system could be used to support human experts. It was trained on a data set of 13 million work items and bugs from 47,000 developers at Microsoft, stored across Azure DevOps and GitHub repositories. For context, Coralogix estimates that developers create 70 bugs per 1,000 lines of code and that fixing a bug takes 30 times longer than writing a line of code; in the U.S., $113 billion is spent annually on identifying and fixing product defects.

In building the model, Microsoft says, security experts approved the training data, and statistical sampling was used to give those experts a manageable amount of data to review. The data was then encoded into numerical representations called feature vectors, and Microsoft researchers designed the system as a two-step process. First, the model learned to classify bugs as security or non-security; then it learned to apply a severity label (critical, important, or low-impact) to each security bug.
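The two-step design can be illustrated with a small cascade in Python. This is a minimal sketch, not Microsoft's implementation: the names `Prediction`, `classify_bug`, `toy_is_security`, and `toy_severity` are hypothetical, and the keyword rules merely stand in for the two trained models (they also skip the feature-vector encoding step). What it shows is the control flow: the severity stage runs only on bugs the first stage flags as security-related.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Prediction:
    is_security: bool
    severity: Optional[str]  # "critical", "important", "low-impact", or None


def classify_bug(
    title: str,
    is_security: Callable[[str], bool],
    severity_of: Callable[[str], str],
) -> Prediction:
    # Step 1: security vs. non-security.
    if not is_security(title):
        return Prediction(is_security=False, severity=None)
    # Step 2: runs only for bugs that step 1 flagged as security-related.
    return Prediction(is_security=True, severity=severity_of(title))


# Hypothetical keyword rules standing in for the two trained models.
SECURITY_WORDS = {"overflow", "injection", "xss", "privilege", "bypass"}


def toy_is_security(title: str) -> bool:
    return any(word in SECURITY_WORDS for word in title.lower().split())


def toy_severity(title: str) -> str:
    words = set(title.lower().split())
    if words & {"remote", "privilege"}:
        return "critical"
    if words & {"injection", "bypass"}:
        return "important"
    return "low-impact"


for title in ["Remote buffer overflow in parser", "Button misaligned on settings page"]:
    print(title, "->", classify_bug(title, toy_is_security, toy_severity))
```

Keeping the stages separate means each can be trained and retrained on its own labels, and the severity stage never has to learn what a non-security bug looks like.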

Microsoft’s model combines two techniques to make its bug predictions. The first is term frequency-inverse document frequency (TF-IDF), an information retrieval technique that weights a word by how often it appears in a given document (here, a bug title) and discounts words that appear in many documents across the collection of titles, so common words count for less. (Microsoft says its bug titles are generally very short, around 10 words.) The second is logistic regression, which uses a logistic function to model the probability that a given class or event applies, such as a bug being security-related.
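To make the two techniques concrete, here is a from-scratch Python sketch that turns bug titles into TF-IDF vectors and trains a logistic regression classifier on them with batch gradient descent. It is an illustration under stated assumptions, not Microsoft's model: the toy titles and labels are invented, the IDF uses the smoothed variant log((1 + N) / (1 + df)) + 1 (one common choice among several), and the learning rate and epoch count are arbitrary.

```python
import math
from collections import Counter, defaultdict


def tokenize(title: str) -> list[str]:
    # Lowercase and split on whitespace; real pipelines normalize more carefully.
    return title.lower().split()


def fit_idf(titles: list[str]) -> dict[str, float]:
    # Smoothed IDF: words found in fewer titles get larger weights.
    n = len(titles)
    df = Counter()
    for t in titles:
        df.update(set(tokenize(t)))  # count each word once per title
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(title: str, idf: dict[str, float]) -> dict[str, float]:
    # Sparse vector: term frequency in this title times the word's IDF.
    tokens = tokenize(title)
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {w: (c / len(tokens)) * idf[w] for w, c in counts.items() if w in idf}


def sigmoid(z: float) -> float:
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: dict[str, float], weights: dict[str, float], bias: float) -> float:
    # P(security | title) = sigmoid(bias + sum of weight * TF-IDF score).
    z = bias + sum(weights.get(w, 0.0) * v for w, v in x.items())
    return sigmoid(z)


def train_logreg(vectors, labels, lr=1.0, epochs=500):
    # Batch gradient descent on the mean log loss.
    weights: dict[str, float] = defaultdict(float)
    bias = 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w: dict[str, float] = defaultdict(float)
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            err = predict_proba(x, weights, bias) - y  # d(log loss)/dz
            for w, v in x.items():
                grad_w[w] += err * v
            grad_b += err
        for w, g in grad_w.items():
            weights[w] -= lr * g / n
        bias -= lr * grad_b / n
    return dict(weights), bias


titles = [
    "buffer overflow in image parser",
    "sql injection in login form",
    "privilege escalation via service",
    "xss in comment field",
    "typo in settings dialog",
    "button misaligned on home page",
    "slow startup on large projects",
    "crash when closing empty tab",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = security bug (invented labels)

idf = fit_idf(titles)
weights, bias = train_logreg([tfidf(t, idf) for t in titles], labels)
for title in ["buffer overflow", "crash when closing"]:
    print(title, round(predict_proba(tfidf(title, idf), weights, bias), 3))
```

In practice a library such as scikit-learn (`TfidfVectorizer` plus `LogisticRegression`) would replace this hand-rolled code; writing it out shows that the classifier is just a weighted sum of TF-IDF scores passed through the logistic function.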

Microsoft says that the model is deployed in production internally, and that it is continually retrained with data approved by security experts who monitor the number of bugs generated in software development.