Databricks, a startup that provides support for the open-source Apache Spark project, has raised $33 million in a new funding round.
Databricks also announced the launch of Databricks Cloud, a platform for streaming data analysis that puts the company in competition with Google's similar DataFlow service, which was revealed just last week.
Databricks made the announcements at the second annual Spark Summit.
The $33 million is a Series B venture round led by NEA. Existing Databricks investor Andreessen Horowitz also participated, following up on its $14 million Series A investment in September of last year.
The new cloud service, however, has drawn more attention than the sizable infusion of capital.
Databricks Cloud will let companies run Spark-based applications for purposes such as business intelligence. The goal is to eliminate the need for a variety of separate tools to clean, process and analyze data, said Ion Stoica, chief executive of Databricks.
"Our goal is to say we want to make it as easy as possible for people to use it," Stoica said in an interview, referring to Apache Spark.
Databricks Cloud initially runs on Amazon Web Services, but Stoica said it will soon be available on other clouds, such as those from Microsoft and Google.
Stoica distinguishes Databricks Cloud from Google DataFlow by saying the two services target different markets.
"Google DataFlow is really targeted to developers. We also have higher-level interfaces for data scientists and data engineers," Stoica said.
Data experts see Apache Spark as the successor to MapReduce, the first programming model developed for the Hadoop ecosystem, a collection of open-source programs for data analysis.
Spark users tout its large performance gains over MapReduce, which come from more efficient use of computing resources, chiefly by keeping intermediate data in memory instead of writing it to disk between processing steps. Compared with MapReduce, Spark programs can run up to 100 times faster in memory and up to 10 times faster on disk.
Databricks aims to expand the community of Spark users by certifying more Spark-based applications and by offering Spark tutorials on massive open online course (MOOC) websites.
Databricks is also working with Hortonworks to ensure that new Spark-based tools and applications will be compatible with all Spark implementations. Hortonworks, for its part, announced that Apache Spark is now fully enabled for YARN, Hadoop's resource management technology.