How big data is reshaping the world of forensics

Tune in to the average TV cop show and you’re likely to think crime scene forensics is a solved problem. Fingerprints, bullets, and fragments of broken glass can be unambiguously traced back to the source, then used by clever investigators to prove who committed a crime beyond a reasonable doubt.

And indeed, according to a landmark 2009 National Academy of Sciences report on forensic science, such techniques have done a great deal to help capture criminals and ensure innocent people go free.

“Over the last two decades, advances in some forensic science disciplines, especially the use of DNA technology, have demonstrated that the forensic sciences have great additional potential to help law enforcement identify criminals,” according to the report. “Many crimes that may have gone unsolved are now being solved because the forensic sciences are helping to identify the perpetrators.”

Yet not all forensic disciplines are created equal, according to the committee of scientific and legal experts that produced the report. The report found that some techniques suffered from “a notable dearth of peer-reviewed, published studies,” that many hadn’t been tested thoroughly enough to establish how likely two samples were to be falsely declared a match, and that the justice system relied heavily on the opinions of individual experts, not all of whom had the same level of training or accuracy.

“That report 10 years ago was the critical turning point and the mechanism that generated interest at a national level to initiate the kind of foundational research that needed to be done in forensic science,” says Sarah Chu, senior advisor on forensic science policy at the Innocence Project.

In the roughly 10 years since that report was released, scholars, forensic practitioners and the legal system have worked to put the forensic methods that frequently help decide guilt or innocence on a sounder scientific footing. In some cases, they’ve worked to standardize techniques within particular disciplines and to gather data that can help evaluate how likely two pieces of evidence are to match by coincidence.

How to balance human skill with automated analysis

Like medicine, advertising, and other fields that are incorporating statistical analysis and large digital data sets, the forensics world has begun to consider how to balance human skill with automated analysis. And since humans aren’t likely to leave the field anytime soon, crime labs have introduced statistical training for investigators and, in some cases, begun slipping test cases in with real-world evidence for technicians to analyze, in order to spot problems.

“We’ve got known positive and known negative that we send through the system, and we see what happens,” says Peter Stout, president and CEO of the Houston Forensic Science Center, an independent public authority that took over forensic work from the Houston Police Department in 2014 after widespread reports of problems in HPD labs. “They enter in before the analyst would have means of understanding that it’s a control.”
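To make the idea of blind controls concrete, here is a minimal Python sketch, not the Houston center’s actual system, of how a lab might turn results on known-positive and known-negative controls into false-positive and false-negative rates. The `BlindControl` record and the `error_rates` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlindControl:
    # Hypothetical record: ground truth is known to quality-assurance staff,
    # while the analyst reports a result without knowing it is a control.
    truth_positive: bool
    reported_positive: bool


def error_rates(controls):
    """Return (false_positive_rate, false_negative_rate) over blind controls."""
    negatives = [c for c in controls if not c.truth_positive]
    positives = [c for c in controls if c.truth_positive]
    # A false positive is a known negative that was reported as positive.
    false_pos = sum(c.reported_positive for c in negatives)
    # A false negative is a known positive that was reported as negative.
    false_neg = sum(not c.reported_positive for c in positives)
    fpr = false_pos / len(negatives) if negatives else float("nan")
    fnr = false_neg / len(positives) if positives else float("nan")
    return fpr, fnr


controls = [
    BlindControl(truth_positive=True, reported_positive=True),
    BlindControl(truth_positive=True, reported_positive=False),
    BlindControl(truth_positive=False, reported_positive=False),
    BlindControl(truth_positive=False, reported_positive=False),
]
print(error_rates(controls))  # (0.0, 0.5)
```

Because analysts can’t tell controls from casework, rates like these reflect everyday performance rather than behavior on an announced proficiency test.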

Other reforms since the new forensic center took over range from stricter labeling standards for police evidence submissions, with unclearly attributed samples turned away, to new lab management software built on Microsoft’s Azure Government cloud tools.

“Most labs are dealing with stuff in a couple of servers in a closet,” says Stout. “I simply can’t do as much to secure those as Microsoft can do with all the stuff they have for security.”

Around the country, the way forensic experts speak about their work in court has also changed in recent years. They’ve moved away from vague language like saying samples matched “to a reasonable scientific certainty” to spelling out details about the degree of certainty involved, says Matt Redle, former prosecuting attorney of Sheridan County, Wyoming.

“Science doesn’t deal with certainty—it deals with measurement of uncertainty,” says Redle, a former chair of the American Bar Association Criminal Justice Section. “It’s a huge change in terms of the transparency that people are calling for and the field is trying to develop so that jurors can be informed of what the measurement of uncertainty of a different test is.”

“Good science is not cheap”

Figuring out what that level of uncertainty might be isn’t always easy, says Clifford Spiegelman, distinguished professor of statistics at Texas A&M University. Experts often need to figure out how prevalent particular observations, like a level of gunshot residue or a particular type of paint, are in the world at large, which can mean extensive—and costly—data collection. Tracking types of auto paint that might show up on cars or suspects after collisions or auto-related crimes would mean regular surveys of new and aftermarket paints, he says.

“Good science is not cheap—even for simple things like collecting representative paint samples,” he says, estimating it could cost “millions and millions of dollars to go collect random paint samples from junk yards and body shops and all over.”
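Spiegelman’s point about cost follows from basic sampling statistics: the uncertainty in an estimated prevalence shrinks only roughly with the square root of the number of samples collected. The sketch below uses the standard Wilson score interval for a proportion. The counts (3 matching samples out of 500, versus 30 out of 5,000) are hypothetical and serve only to show that ten times the sampling narrows the interval by roughly a factor of three.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical survey: the same 0.6% prevalence observed at two sample sizes.
small = wilson_interval(3, 500)
large = wilson_interval(30, 5000)
print(small, large)
```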

One area where it’s possible for analysts to speak with confidence about the likelihood of false matches is DNA evidence, experts say.

“You do in DNA get quantitative evidence,” says Hal Stern, chancellor’s professor in the Department of Statistics at the University of California, Irvine. “We can determine if the two samples, the suspect and the crime scene, have the same DNA profile and we can estimate the probability that another person would have that profile.”
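The probability Stern describes is often called a random match probability. A simplified textbook version multiplies genotype frequencies across independent genetic markers (loci), using the Hardy-Weinberg proportions p² for a locus where both alleles are the same and 2pq for a locus with two different alleles. The Python sketch below assumes that simple model and uses made-up allele frequencies; real casework relies on population databases and applies further statistical corrections.

```python
import math


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, 2pq if heterozygous."""
    return p * p if q is None else 2 * p * q


def random_match_probability(profile):
    """Multiply per-locus genotype frequencies, assuming loci are independent.

    `profile` is a hypothetical list of allele-frequency tuples, one per locus:
    (p,) for a homozygous locus or (p, q) for a heterozygous one.
    """
    rmp = 1.0
    for alleles in profile:
        rmp *= genotype_frequency(*alleles)
    return rmp


# Two illustrative loci: heterozygous (0.1, 0.2) and homozygous (0.15,).
rmp = random_match_probability([(0.1, 0.2), (0.15,)])
print(rmp)  # about 0.0009, i.e. roughly 1 in 1,100
```

With the dozen or more loci used in practice, these products become very small, which is why DNA evidence can be reported with quantitative strength.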

Scientists have made progress in other areas of forensics as well, especially when it comes to fingerprints. While head of the latent print branch at the Army Criminal Investigation Laboratory, Henry Swofford worked on developing a software tool called FRStat for testing the strength of fingerprint evidence. The system doesn’t replace human analysts—instead, they use their usual methods for comparing two prints and document the areas where the two appear similar. Then, the software says how likely that similarity is assuming that the prints are from the same source, and how likely it would be if the prints came from two different people, based on existing databases of fingerprints. When Swofford left the Army lab in September, the software was in use in about 35 other labs around the country, and the Defense Department had tentative plans to open-source the tool, he says.
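Comparing how likely an observed similarity is under the same-source and different-source hypotheses is the idea behind a likelihood ratio. The Python sketch below is a generic score-based version, not FRStat’s actual algorithm: it estimates each distribution from hypothetical reference similarity scores with a simple Gaussian kernel density estimate and divides the two densities. The function names, the bandwidth of 0.05, and the scores themselves are assumptions for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from reference scores (Gaussian kernel)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def score_likelihood_ratio(score, same_source_scores, diff_source_scores, bandwidth=0.05):
    """Density of `score` among same-source pairs divided by its density
    among different-source pairs. Values above 1 favor a common source."""
    f_same = gaussian_kde(same_source_scores, bandwidth)
    f_diff = gaussian_kde(diff_source_scores, bandwidth)
    denominator = f_diff(score)
    if denominator == 0.0:
        return math.inf  # score lies far outside all different-source data
    return f_same(score) / denominator


# Hypothetical similarity scores from known same- and different-source pairs.
same = [0.80, 0.85, 0.90, 0.88]
diff = [0.20, 0.30, 0.25, 0.35]
print(score_likelihood_ratio(0.86, same, diff))  # far above 1
print(score_likelihood_ratio(0.27, same, diff))  # far below 1
```

In this framing the examiner’s comparison still supplies the observation, and the statistics only report how much weight it carries, in line with the “trust but verify” approach described below.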

“We don’t want to totally shake up the method that is known and found to be comfortable by the thousands of fingerprint practitioners around the country, around the world,” he says. “We’re gonna trust what you believe but we’re going to verify it in a way with the quantitative analysis as a last step.”

Similar efforts are underway to build digital tools for analyzing other types of evidence, including bullets and footwear, Swofford says.

Some forensic disciplines have fallen from favor altogether amid scrutiny: Back in 2005, the FBI announced it would stop using bullet lead analysis, a technique that attempted to determine if bullets came from the same source based on trace element concentrations within them. Another National Academy of Sciences report had called into question how useful such evidence was, since boxes of ammunition could contain non-matching bullets while very similar bullets could end up in different batches.

“You’re not going to be able to say these bullets came from the same box,” says Karen Kafadar, who served on the NAS committee studying the issue and is now chair of the University of Virginia Department of Statistics.

Other techniques, including blood-spatter analysis, shoe print comparisons, and bite mark analysis, have also come under serious scrutiny in recent years. Beginning in 2013, inspired in part by the NAS work, the Justice Department and the National Institute of Standards and Technology launched a National Commission on Forensic Science, designed to “strengthen and enhance the practice of forensic science.” In 2017, then-Attorney General Jeff Sessions declined to keep the commission in place as it reached the end of its second two-year term, but other scientific organizations have continued to support advances and greater rigor in forensics.

At the start of this year, the American Statistical Association issued its own recommendations on how forensic analysts should use statistics and be clear about any limitations of their methods.

“A comprehensive report by the forensic scientist should report the limitations and uncertainty associated with measurements, and the inferences that could be drawn from them,” according to the ASA report.

Another NIST-linked group, called the Organization of Scientific Area Committees (OSAC) for Forensic Science, continues to operate and is beginning to roll out voluntary scientific standards for forensic operations, with a dozen published so far and more than 200 in the works. So far, the Houston Forensic Science Center and state labs in Kentucky and Georgia have said they’ll adopt its standards.