Imply Advances Apache Druid Real-Time Analytics Database
The Apache Druid open source real-time analytics database now includes a multi-stage query engine that commercial database vendor Imply built as part of its Project Shapeshift effort. The new Druid features are generally available today.
Based in Burlingame, CA, Imply raised $100 million in a Series D funding round in May with the aim of continuing to develop its Imply Polaris cloud database service, which is based on Apache Druid.
Apache Druid provides an online analytics processing database that competes with a number of open and closed source systems. On the closed source side, its competitors include Rockset and Aerospike. In open source, Druid competes with Apache Pinot, which has a commercial offering led by database vendor StarTree.
Apache Druid now features a multi-stage query execution engine that loads data faster than previous versions of the database.
The new engine also allows users to perform data transformations using SQL. The new Druid features are coming to the cloud via Imply's Polaris database as a service, which is also being updated with new data visualizations to make it easier for developers to use. Imply has been working for most of this year to make Druid more powerful in an effort codenamed Project Shapeshift.
What multi-stage queries bring to the Druid real-time analytics database
The multi-stage query engine for Druid, with SQL-based ingestion and in-database transformation, is a step forward, said IDC analyst Amy Machado.
Using SQL for ingestion not only makes things easier for developers but also speeds up queries, Machado noted. The same goes for in-database transformations, which remove an extra step that might otherwise have taken longer.
“Complex queries for real-time applications and data analytics require a database that can handle high concurrency with very low latency, and the ability to join with historical data to ensure that data is always in context,” Machado said. “Imply contributes to the Apache Druid project to help ensure these requirements.”
Workflow optimization for real-time analysis
Multi-stage queries will help optimize the overall workflow for Druid users, said Fangjin Yang, co-founder and CEO of Imply.
The ability to load data and then transform it minimizes the need for organizations to build an external data pipeline. Without the multi-stage engine, an organization might have needed multiple tools to load and transform data before importing it into Druid.
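As a rough illustration of what combined loading and transformation looks like, the sketch below follows the SQL-based ingestion style that Druid's multi-stage engine documents: a single INSERT statement reads from an external source, transforms rows on the way in, and partitions the result. The table name, source URL, and columns here are illustrative, not taken from the article.

```sql
-- Illustrative Druid SQL ingestion: load, transform, and partition in one statement.
-- Table, URL, and schema are hypothetical examples.
INSERT INTO "pageviews"
SELECT
  TIME_PARSE("timestamp") AS "__time",   -- transform: parse strings into Druid's time column
  "user",
  LOWER("url") AS "url"                  -- transform: normalize during ingestion
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://example.com/events.json"]}',
    '{"type":"json"}',
    '[{"name":"timestamp","type":"string"},{"name":"user","type":"string"},{"name":"url","type":"string"}]'
  )
)
PARTITIONED BY DAY
```

Because the SELECT clause runs as part of ingestion, the cleanup work that previously lived in a separate pipeline happens inside the database itself.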
The data transformation technology in Druid can also load so-called nested data structures, which can otherwise be difficult for users to manipulate directly.
Prior to the Druid update, users had to first flatten nested data structures or use some sort of external data pipeline to manipulate that data before it could be used in the database, Yang said. This is no longer the case, as the multi-stage engine can load and transform nested data directly.
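To make the flattening step concrete, here is a minimal sketch (not taken from Druid's codebase) of the kind of preprocessing users previously ran outside the database: nested JSON fields are collapsed into dotted top-level columns before loading.

```python
def flatten(record, prefix=""):
    """Recursively flatten nested dicts into dotted top-level keys,
    the sort of preprocessing nested events needed before Druid's
    multi-stage engine could ingest nested data directly."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical event with nested user metadata
event = {"user": {"id": 42, "geo": {"country": "US"}}, "url": "/home"}
print(flatten(event))
# {'user.id': 42, 'user.geo.country': 'US', 'url': '/home'}
```

With nested ingestion support, this extra pass, and the pipeline that hosted it, can be dropped.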
Yang noted that Druid does not compete with data transformation technologies such as the open source dbt project. dbt enables workflow automation for data transformation, with the ability to schedule and repeat operations, capabilities that Druid and Imply Polaris do not intend to offer, he said.
Looking ahead, Yang said Imply is working to improve the usability of Druid and its Polaris DBaaS.
“Some of the things we’re working on essentially remove the need for tuning and configuration, even if you’re just deploying from open source,” Yang said. “Basically, we just want it to work without you having to think too much.”