10 Hot Data Science Companies with Berkeley Roots

Berkeley innovators are developing new and creative ways to capture, clean and analyze ever-increasing amounts of data. From identifying patterns in unstructured human writing to helping government agencies track inflation in real time, Cal founders are making it easier than ever to store, understand, predict and act on the information all around us.

Captricity - Kuang Chen PhD ’11 landed on the idea for Captricity while conducting research for his PhD dissertation. Interested in developing data management systems to help low-resource organizations better serve their disadvantaged clients, Chen began developing algorithms to speed up the process of converting paper data into electronic records. From there, Captricity was born: a cloud service that extracts and integrates structured data from paper forms with 99.99%+ accuracy. The company was initially founded to serve public service organizations and rural health clinics. A $10 million funding round in mid-July has enabled it to extend service to large enterprise and government clients, from state and federal agencies including the FDA to large companies including Dell, Symantec, True Value, and GrubHub.

Cloudera - Founded by Mike Olson ’91, MS ’92, and led by CEO Tom Reilly ’85, Cloudera is one of the brightest stars in the big data landscape. Founded in 2008, Cloudera was the first to offer an enterprise-class data management infrastructure built on open-source Apache Hadoop. The company recently took in almost a billion dollars in investments from Intel, T. Rowe Price, and others, and regularly partners with other key players in the space like MongoDB, Trifacta, and Databricks.

Databricks - Born from a research project in Berkeley’s AMPLab, Databricks offers an interactive platform for developing and deploying Apache Spark applications on a Databricks-owned cloud. While the Databricks Platform makes it easier for users to launch, manage, and integrate clusters of data applications, the Databricks Workspace allows users to interactively query and visualize data. Led by CEO Ion Stoica, one of Spark’s initial developers and a Professor of Computer Science at UC Berkeley, Databricks completed a $33 million Series B funding round and launched its product in late June.

Infer - Vik Singh ’07 (CEO), Chung Wu ’05 and Yang Zhang ’05 put their Berkeley degrees together to create Infer, a platform that helps companies improve their sales leads through predictive analytics. Customers include Cloudera, Box, Zendesk, AdRoll, and Optimizely. The company has received over $35M in two rounds of funding from Redpoint Ventures, Andreessen Horowitz, The Social+Capital Partnership, Sutter Hill Ventures and Nexus Venture Partners.

MongoDB - CEO Max Schireson entered Berkeley as a 15-year-old. After five years as a math major and a two-year stint in a local record store, Max left Berkeley to launch a long and successful career in the tech industry. Today, he leads MongoDB, a document-model database built to accommodate the variety of data used by major web companies including Foursquare, Shutterfly, and Craigslist. Investors include Altimeter Capital, Fidelity Investments, Intel Capital, Salesforce.com, Sequoia Capital, and T. Rowe Price.

Recommind - In the late 1990s, UC Berkeley postdoc Jan Puzicha realized that existing information retrieval technologies would not be able to handle the ever-increasing amounts of data organizations would need to process in the years to come. He developed Recommind’s CORE technology as a research fellow at Cal. In 2000, Jan teamed up with Derek Schueren and Bob Tennant MBA ’03 to found Recommind, which uses the CORE technology to identify and analyze patterns in massive amounts of human-generated, unstructured data (email, word processing, social media, etc.). An industry leader in unstructured data management, Recommind took in another $15 million in Series C funding last September.

PipelineDB - Co-founded in December 2013 by Derek Nelson ’11 and UCSB alum Jeff Ferguson, Y Combinator-backed PipelineDB completed a seed funding round in May 2014, drawing investments from SV Angel, Susa Ventures, Data Collective, Paul Buchheit, and more. The product offers real-time data analytics, running SQL queries continuously on incoming data to update results incrementally as new data arrives. Not yet one year old, PipelineDB is among the youngest companies on our list, and we’re excited to watch them grow.
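
To give a rough sense of what "updating results incrementally" can mean, here is a minimal Python sketch. It is not PipelineDB's code or API; the class name `ContinuousAverage`, the event keys, and the sample values are all made up for illustration. The idea it shows is that each arriving row is folded into a small running aggregate, so the current answer is always available without rescanning older data.

```python
from collections import defaultdict


class ContinuousAverage:
    """Hypothetical stand-in for a continuously running query such as
    SELECT key, COUNT(*), AVG(value) FROM stream GROUP BY key."""

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def on_event(self, key, value):
        # Fold one incoming row into the running aggregates.
        # Earlier rows are never stored or rescanned.
        self._count[key] += 1
        self._total[key] += value

    def result(self):
        # The current query result, derived only from the running aggregates.
        return {
            key: (self._count[key], self._total[key] / self._count[key])
            for key in self._count
        }


# Illustrative events (assumed data): (event type, response time in ms).
view = ContinuousAverage()
for key, value in [("checkout", 120.0), ("search", 30.0), ("checkout", 80.0)]:
    view.on_event(key, value)

print(view.result())
```

Because only a count and a sum are kept per key, memory use depends on the number of distinct keys rather than on how many rows have arrived.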

Premise Data Corporation - Another newbie in the big data arena, Premise tracks macroeconomic and human development in real time, combining online e-commerce data with information collected by a global street team to keep track of what’s on sale, how it looks, and how much it costs. The company then sells that information to financial institutions, government agencies, and packaged goods companies to help them keep track of product availability and economic inflation, bridging the discrepancy between official reporting on consumer prices and real-time fluctuations. Co-founded by Berkeley PhD candidate David Soloff MA ’02 and Joseph Reisinger, the company is backed by Google Ventures, Andreessen Horowitz, Harrison Metal, and others.

Trifacta - Founded by alumnus and Computer Science Professor Joseph Hellerstein, Trifacta specializes in data transformation: discovering, structuring, and cleaning data so it’s ready for analysis. Investors include Accel, Dave Goldberg, and others. The company recently closed a $25 million investment round, and Hellerstein has moved from the CEO role back to teaching.

Wise.io - Founded in 2012 by a team of Berkeley astrophysics professors and researchers, Wise.io grew out of a decade-long academic effort to develop predictive machine learning models that would help astronomers discover and categorize new stars. When enterprise companies began inquiring about using the models on their own datasets, Professor Joshua Bloom realized the technology could have commercial value. Today, with $2.6 million in funding, Wise.io offers applications that use machine learning technologies to generate accurate predictions for use in a variety of business processes.

And Leada, a startup currently in residence at Berkeley’s SkyDeck incubator, is helping data science companies fill their pipeline. Founded by a team of Cal students and recent graduates, Leada partners with companies looking to employ data scientists, creating challenges for job seekers to complete, then sending top submissions directly to hiring managers. As job seekers complete challenges, they can choose to display their portfolio of work on a platform visible to recruiters at all client companies. Current clients include Cal-founded MightyHive and Statricks, as well as eBay, the Republican National Committee, and others.