Data science secrets in finance and media

On Aug 8, 2018

Video: Data science and machine learning in marketing, media, and finance

Data is the foundation of AI and thus central to many of the most important techniques and trends in computing today. From marketing to accounting to photography and many other domains, data is the lifeblood of advanced software. At the same time, data science remains a black box in many companies.

For this reason, I invited two world-class data experts to take part in episode 294 of the CXOTalk series of conversations with the world’s top innovators.

Matt Marolda is the chief analytics officer of Legendary Entertainment, a major Hollywood movie and game studio. Matt comes from the world of professional sports and Moneyball and now applies those data techniques to media and entertainment, with large-scale movies and television shows.

Anthony Scriffignano is the chief data scientist at Dun & Bradstreet, a major financial industry data company. He leads innovation around advanced data science topics at D&B and works with regulators around the world on these issues. He started his career doing physics for construction cranes, offshore oil rigs, and nuclear power plants.

This CXOTalk episode offers an unusual glimpse deep inside data science from two highly articulate, expert practitioners.

Watch the entire conversation in the video embedded above, or read the complete transcript. You can also review the edited summary comments below.

Matt, what kinds of problems do you work on in media and entertainment?

Matthew Marolda: We live in this unusual place where we have these very large, binary outcomes, meaning we have a movie that we’re going to release, say Godzilla or Kong, movies of that kind of scale. There’s only really one world we can live in, which is the world where that movie is released, which means we can’t run tests. We can’t do a lot of things that a lot of people in data science would like to be able to do where you have controls.

We can do that within the campaign and within very small windows, but it’s very hard to iterate and adjust over long periods of time. We have to thread the needle and learn as much as we can, as quickly as we can, in ambiguous environments where the correlation between the data we have and the outcome isn’t perfect. We don’t have these direct correlations. That forces us to look at all different kinds of data and pull it from lots of different places.

We’re very audience driven. Meaning, we need to understand audiences and people at a very specific level.

That starts all the way at the beginning. Is there an audience for this movie or TV show? Does that audience have enough scale to support the budget we might have for it? Those [are] the kinds of questions.

We then want to understand what the audience likes and how they might respond to different elements or aspects of the movie.

Then, ultimately, when you get close to marketing, this is where it kind of escalates. We want to understand: How do we reach that audience? How do we persuade them? Which creative materials, meaning the trailers or the ads or the TV spots, could we show them, and how will those affect their desire to watch the movie?

We’re just trying to dial it up. We’re just trying to shift the odds to make success more likely. We can’t guarantee an outcome, but we’re working on that. It’s all very much at the individual level.

Anthony, describe the kinds of business problems you look at with data.

Anthony Scriffignano: The types of problems that I’m working on are very similar, believe it or not, to the types of problems that Matt just described, but in a very different way. If you think about our customers, they’re trying to solve a problem that’s somewhere in the category of either total risk or total opportunity. What’s the white space? What could I possibly do if I penetrated this market? If I went into this country, can you help me find more companies that look like my best customers or don’t look like my best customers?

Then, on the risk side, are they going to pay me? Are they fraudulent? Are they going to go out of business? Those are the problem spaces.

But, I have the same edge of the possible that Matt just described. The unstructured data, the data we’ve never seen before. Everyone is really good at what’s called supervised learning right now, looking at structured, longitudinal data that’s been around for a long time and building, basically, regressive relationships and then saying, “Here’s what I think is going to happen,” assuming the future looks something like this past set of data that you’ve trained on.

The problem is, the future doesn’t look like that set of data. The future is ambiguous. The data in the future has never been seen before. Now, recently, some of it you can’t use because of those different regulations, so you have to unlearn things.
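To make that point concrete, here is a minimal sketch in Python. It is not anything D&B uses: the numbers are invented and the helper names (`fit_line`, `mean_abs_error`) are hypothetical. It fits a simple regression to "past" data and then scores it on a "future" period where the underlying relationship has changed.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's predictions and the actual values."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


# Invented data for illustration only (assumption, not from the interview).
past_x = list(range(10))
past_y = [2 * x + 1 for x in past_x]          # the relationship the model learns

future_x = list(range(10, 20))
future_y = [0.5 * x + 16 for x in future_x]   # the relationship has since changed

model = fit_line(past_x, past_y)
past_error = mean_abs_error(model, past_x, past_y)
future_error = mean_abs_error(model, future_x, future_y)

print(f"error on past data:   {past_error:.2f}")
print(f"error on future data: {future_error:.2f}")
```

The model looks perfect on the data it was trained on and degrades as soon as the future stops resembling the past, which is the gap Scriffignano describes.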

The problems of understanding things we’ve never looked at before in ways that are changing while we’re looking at them are the same. This tale of two cities that we’re telling, it’s the same set of problems. It’s just a different use case at the end.

[Laughter] There is something that we work on that I call a Black Cat Problem where you’re looking for something that may not be there in a place that’s inherently hard to look. In our case, think about fraud, or think about maybe some other type of bad behavior, malfeasance. If you try to model your way out of finding things like that by looking at all the previous bad stuff, the best bad guys, when they know they’re being watched, they change their behavior, so you’ll model how the best ones are no longer behaving.
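A toy illustration of that trap, again with invented numbers and hypothetical names (`learn_threshold`, `recall`, and a made-up "transfers per day" feature): a detector learns a cutoff from historical fraud, then the fraudsters who know they are being watched move under it.

```python
def learn_threshold(history):
    """Learn a cutoff halfway between the busiest legitimate account
    and the quietest known-fraudulent account in the history."""
    legit = [value for value, is_fraud in history if not is_fraud]
    fraud = [value for value, is_fraud in history if is_fraud]
    return (max(legit) + min(fraud)) / 2


def recall(threshold, cases):
    """Share of the fraudulent cases that the cutoff actually flags."""
    fraud = [value for value, is_fraud in cases if is_fraud]
    caught = [value for value in fraud if value >= threshold]
    return len(caught) / len(fraud)


# Each case is (transfers per day, is_fraud). Invented data, for illustration only.
history = [(3, False), (5, False), (8, False), (40, True), (55, True), (70, True)]

# Later cases: the more careful fraudsters now keep their activity low.
adapted = [(4, False), (6, False), (7, True), (9, True), (60, True)]

threshold = learn_threshold(history)
recall_history = recall(threshold, history)
recall_adapted = recall(threshold, adapted)

print(f"learned cutoff:              {threshold}")
print(f"fraud caught in old data:    {recall_history:.0%}")
print(f"fraud caught after adapting: {recall_adapted:.0%}")
```

The cutoff still catches the fraudster who did not change, but it misses the ones who did, so the model ends up describing how the best ones are no longer behaving.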