How Rossum is using deep learning to extract data from any document

The equivalent of more than 100 human lifetimes is spent globally each day on data entry from invoices alone, according to Rossum. And so the Czech AI startup is using deep learning to help companies ditch manual data entry, freeing up humans to focus on more complex or creative tasks instead.

The problem that Rossum is looking to fix is this: an estimated 550 billion invoices are exchanged each year, coming in all shapes, sizes, and formats. Extracting key information from these invoices has traditionally been a labor-intensive manual process, but automated tools have increasingly entered the fray. Unlike many traditional optical character recognition (OCR) data extraction tools, however, Rossum is using “cognitive data capture,” which involves pre-training machines to understand documents in much the same way a human does.

OCR tools rely on different sets of rules and templates covering every type of invoice they may come across, which can make for a slow training process given that a company may need to create hundreds of new templates and rule-sets. Rossum, on the other hand, said that its cloud-based software requires minimal effort to set up, after which it can peruse a document like a human does, irrespective of style or formatting, and that it doesn't rely on fully structured data to extract the content companies need.

The company also claims that it can extract data six times faster than manual entry, while saving companies up to 80 percent in costs.

Rossum was founded out of Prague in early 2017 by former AI PhD students Tomas Gogar, Tomas Tunys, and Petr Baudis. In its three years so far, the startup has secured some big-name clients on every continent, including Siemens, Nvidia, IBM, Box, and Bloomberg, and today Rossum announced that it has raised $4.5 million since its inception. This includes $1 million in pre-seed funding to develop a minimum viable product between 2017 and 2018, in addition to a $3.5 million seed round which closed last month.

The funding round was co-led by U.K.-based seed investors LocalGlobe and Seedcamp, with participation from some notable angel investors including Flexport CEO and founder Ryan Petersen, and Elad Gil, who sold his startup Mixer Labs to Twitter in 2009 before becoming an investor in Airbnb, Instacart, Pinterest, Square, and Stripe, among others.

“Invoice data management is a huge unsolved problem,” Gil noted. “Rossum's traction shows that the company is in a great position to solve this problem as well as tackle many other data entry tasks using its highly versatile platform.”

Lay of the land

Cognitive data capture isn't a new concept, and it's something that IBM has touted for a number of years as a means of extracting data from “never seen before” documents. There are a number of other organizations operating in this space too, including long-established companies such as Kofax and Abbyy, while newer entrants include the likes of VC-backed HyperScience and Ephesoft.

Rossum is continuing on a similar trajectory, though it pitches its “cloud”- and “machine learning”-only approach as a major differentiator from many of the more-established platforms. Moreover, the company touts its friction-free signup, which includes a free trial period to demonstrate how the product works.

“Staying in the cloud means that we don't have to take care of on-premises installations, and we have only one platform to take care of,” CEO Tomas Gogar told VentureBeat. “Therefore all the engineering and research resources we spend goes to all the clients equally.”

Gogar likens Rossum's data extraction approach to what Salesforce did two decades ago, vis-à-vis how it applied a software-as-a-service (SaaS) business model to customer relationship management (CRM).

“In 1999 Salesforce said, ‘we will do cloud-only CRM because it's how to build the best product’,” Gogar said. “No one believed them, and now they are the best ones by far.”

Rossum's pre-trained AI engine can be tried and tested within a couple of minutes of integrating its REST API. And as with any self-respecting machine learning system, Rossum adapts as it learns from customers' data. Rossum claims an average accuracy rate of around 95%, and in situations where it can't identify the correct data fields, it asks a human operator for feedback, which it then learns from.
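The human-in-the-loop step described above can be illustrated with a minimal Python sketch. This is not Rossum's API or data model: the `ExtractedField` type, the per-field confidence score, the 0.9 threshold, and both function names are assumptions made purely for illustration. The idea is simply that fields the model is unsure about go to a human operator, and the operator's corrections are kept as feedback for later training.

```python
from dataclasses import dataclass


# Hypothetical data model, NOT Rossum's actual schema.
@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed model confidence in the range [0, 1]


# Assumed cut-off, chosen only for illustration.
REVIEW_THRESHOLD = 0.9


def split_for_review(fields, threshold=REVIEW_THRESHOLD):
    """Split fields into (accepted, needs_review) by confidence."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


def record_correction(field, corrected_value):
    """Pair the model's guess with the operator's fix as a feedback example."""
    return {
        "field": field.name,
        "predicted": field.value,
        "corrected": corrected_value,
    }


fields = [
    ExtractedField("invoice_number", "2020-0417", 0.99),
    ExtractedField("total_amount", "1,250.00", 0.97),
    ExtractedField("due_date", "03/04/2020", 0.62),  # ambiguous date format
]

accepted, needs_review = split_for_review(fields)
feedback = [record_correction(f, "2020-04-03") for f in needs_review]
```

Here only the ambiguous `due_date` field would be sent to an operator, while the other two fields pass straight through.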

Rossum claims 30 full-time employees, plus a further 20 who work as “AI teachers” for Rossum's AI engine. And with a fresh $3.5 million in the bank, it said that it plans to expand globally, which will include opening a new office in the U.S., and to target its technology at more sectors.

Indeed, while Rossum's clients are mostly using the platform for processing invoices and similar documents such as delivery notes, it can be applied to many different kinds of documents across industries, including accounting, logistics, insurance, and real estate management.

“Technology should make data entry easier and cheaper but businesses have become too reliant on using old systems that no longer meet their needs,” Gogar noted. “Rossum solves these problems without complicated, clunky integrations, without teams of developers, and without high costs. Our solution is smart enough to be tailored to suit any type of business and it's scalable to work with even the largest of firms.”

In terms of costs, prices can vary greatly depending on the volume of documents that need to be processed and on specific requirements, but Gogar said that subscriptions start at around $800 per month, and the platform is a “good fit” for companies that process more than 5,000 documents during that period.
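For a rough sense of scale, the two figures quoted above can be combined into a per-document cost. Note the assumption: the article does not say that the roughly $800 entry price covers 5,000 documents, so this is a back-of-the-envelope illustration rather than Rossum's actual pricing.

```python
# Figures quoted in the article.
MONTHLY_PRICE_USD = 800
DOCS_PER_MONTH = 5_000


def cost_per_document(monthly_price, docs_per_month):
    """Average cost per document, ASSUMING a flat monthly fee covers all documents."""
    return monthly_price / docs_per_month


per_doc = cost_per_document(MONTHLY_PRICE_USD, DOCS_PER_MONTH)
print(f"${per_doc:.2f} per document")
```

Under that assumption the cost works out to about 16 cents per document, and it would fall further as volume rises above 5,000 at the same price.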


It's worth noting the inherent synergies here between Rossum et al. and the myriad robotic process automation (RPA) platforms that are out there. RPA, for the uninitiated, is software that companies install on machines to automate laborious, repetitive tasks: it learns from human activity using computer vision and rule-based processes, and then copies that activity.

A lot of money is flying around the RPA realm at the moment, with Automation Anywhere recently raising $290 million at a $6.8 billion valuation and UiPath closing a whopping $568 million funding round at a $7 billion valuation. A slew of big-name backers have invested in both these companies, including Salesforce, Alphabet, SoftBank, Goldman Sachs, Sequoia, and Accel.

Image: UiPath Studio for designing processes

So what is the difference between what Rossum is doing and RPA? According to Gogar, they are complementary services more than anything, as RPA is generally better suited to structured data. As such, Rossum is actually a technology partner of RPA companies including Blue Prism and UiPath (side-point: Rossum investor Seedcamp is also an investor in UiPath).

“Rossum is very often an important piece of any automation project and it works very well with major RPA platforms,” Gogar said. “RPA platforms are great at automating processes with structured data. Rossum is a gateway that converts unstructured data into a structured form. Therefore it allows automation in processes where it was not possible before.”


With countless headlines proclaiming that AI is here to steal human jobs, companies are naturally sensitive to public apprehension when they develop automated technologies that will impact employment. And that is why those same companies are increasingly looking to offset any criticism that may come their way, often preemptively explaining that they're not trying to replace humans, but rather to augment their jobs so they can do other, more interesting tasks instead.

This was echoed in a recent report commissioned by IBM, which found that while AI and automation would likely change how every job is performed, it would ultimately lead to an increased demand for creative skills. And this is a sentiment that Rossum is very much in tune with.

“Rather than replacing employees, Rossum's aim is to speed up human operators, giving businesses more flexibility and reliability for their customers, and helping employees focus their attention on more complex tasks or tasks that require creativity,” the company said.
