How Reverse ETL Can Lighten Your Data Load
Moving data between applications and warehousing data for analysis are common issues for application builders, data engineers, and IT teams. But we all know our businesses can benefit significantly if we’re smart with our data.
There are many options for moving data now. Some have been around for years and have evolved, like ETL (extract, transform, load) and custom integrations. Others were created out of necessity, such as ELT (extract, load, transform) and event streaming. The requirements and use cases for these data pipeline tools have become more advanced and demanding. From these ever-increasing demands, a new (but sane) use case has emerged in recent years: moving data from your data warehouse to the cloud applications your business uses. And a new category of data pipeline has emerged to satisfy it: reverse ETL.
What is reverse ETL?
Reverse ETL is simple in its function: it moves data from your data warehouse to your cloud applications. Reverse ETL tools synchronize data on a recurring schedule (configurable from a few minutes to 24 hours or more), when triggered by a call to an API (application programming interface) endpoint that the reverse ETL tool exposes, or through integrations with tools such as Airflow and dbt.
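To make the two trigger styles concrete, here is a minimal Python sketch of a sync job that runs either when its interval has elapsed or when it is forced, the way an API call or an orchestrator might force it. The `SyncJob` class and its method names are hypothetical and do not reflect any vendor's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SyncJob:
    """Hypothetical sync job: runs on an interval or on demand."""
    name: str
    interval: timedelta
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # A job that has never run is always due.
        if self.last_run is None:
            return True
        return now - self.last_run >= self.interval

    def run(self, now: datetime, force: bool = False) -> bool:
        """Run if the schedule says so, or if forced (e.g. by an API trigger).

        Returns True when a sync actually happened.
        """
        if force or self.is_due(now):
            # A real tool would query the warehouse and call the
            # destination's API here; this sketch only records the run.
            self.last_run = now
            return True
        return False
```

In this sketch a forced run also resets the schedule clock, so the next scheduled sync happens one full interval after the manual one. Real tools may behave differently.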
What can I do with reverse ETL?
Reverse ETL tools allow you to realize a lot of the promises of data science. The complex and valuable modeling and analysis that your data teams produce lives in your data warehouse. Being able to use this enriched post-analysis data and automate the update of your business applications makes the work of your data scientists more valuable. It also ensures that their work delivers more real-time value compared to the manual processes that exist in most businesses today.
Reverse ETL tools focus on customer data and are best used to solve problems that require combining data across your websites, digital products, and whatever cloud applications you use. The specific use cases that reverse ETL tools address best are:
- Build better and more complete customer profiles (sometimes called “customer 360”)
- Create more specific and granular audience segments
- Score prospects based on your unique and business-specific criteria
- Identify customers “at risk” or likely to unsubscribe
- Provide data to cloud applications for better reporting
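As an illustration of the scoring and "at risk" use cases in the list above, the following Python sketch applies a few business rules to a prospect record. The fields, weights, and thresholds are all invented for the example; in practice they would come from your own data team's models.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    # Hypothetical fields; real profiles would combine many more signals.
    email: str
    visited_pricing_page: bool
    seats_requested: int
    days_since_last_login: int


def score_prospect(p: Prospect) -> int:
    """Add up points from a few illustrative, business-specific rules."""
    score = 0
    if p.visited_pricing_page:
        score += 30
    # 5 points per requested seat, capped at 10 seats.
    score += min(p.seats_requested, 10) * 5
    if p.days_since_last_login <= 7:
        score += 20
    return score


def is_at_risk(p: Prospect, inactivity_threshold_days: int = 30) -> bool:
    """Flag customers who have been inactive longer than the threshold."""
    return p.days_since_last_login > inactivity_threshold_days
```

A reverse ETL sync would then push values like these into the matching fields of your CRM or marketing tool.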
Who are the major reverse ETL providers?
There are several reverse ETL tools available, and they all work in much the same way. You set up a source connection to your data warehouse, set up a destination connection to a cloud application, write a SQL (structured query language) statement (or choose a table) to select the data you want to sync, choose your field mappings, and set a synchronization schedule.
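The workflow described here (source, query, mappings, destination) can be sketched in a few lines of Python. This is only an illustration: an in-memory SQLite database stands in for the data warehouse, a plain list stands in for the cloud application, and the table, column, and field names are made up.

```python
import sqlite3
from typing import Any


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory SQLite database standing in for a warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (email TEXT, plan TEXT, lifetime_value REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("ana@example.com", "pro", 1200.0),
            ("bo@example.com", "free", 0.0),
            ("cy@example.com", "pro", 300.0),
        ],
    )
    return conn


# The SQL statement that selects the data to sync.
SYNC_QUERY = (
    "SELECT email, plan, lifetime_value FROM customers "
    "WHERE plan = 'pro' ORDER BY email"
)

# Hypothetical mapping: warehouse column -> destination field.
FIELD_MAPPING = {
    "email": "Email",
    "plan": "Plan",
    "lifetime_value": "Lifetime Value",
}


def extract_records(
    conn: sqlite3.Connection, query: str, mapping: dict[str, str]
) -> list[dict[str, Any]]:
    """Run the query and rename each column to its destination field."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    return [
        {mapping[col]: value for col, value in zip(columns, row) if col in mapping}
        for row in cursor
    ]


def push_to_destination(records: list[dict[str, Any]], sink: list) -> int:
    """Stand-in for the destination API call; returns records sent."""
    sink.extend(records)
    return len(records)
```

Running `extract_records(build_demo_warehouse(), SYNC_QUERY, FIELD_MAPPING)` yields one dictionary per "pro" customer, keyed by destination field name. A real tool adds authentication, batching, rate limiting, and change detection on top of this basic shape.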
Despite the similar functionality of reverse ETL tools, three vendors stand out:
Hightouch believes your data warehouse is your source of truth for customer data. The company makes it easy to sync this data with the cloud tools your business uses. Hightouch stands out because its tool is mature and has more source and destination integrations than any other pure-play reverse ETL tool. The company has also grown its integration library faster than Census (see below) over the past six to 12 months. This is important because integrations dictate the level of flexibility your business can have with its selection of tools. More integrations are better for reverse ETL.
Hightouch customers include Grafana, Plaid, Zeplin, and Mattermost.
If there were an industry standard for reverse ETL, it would probably be Census. Census hasn’t been around much longer than Hightouch, but it gained ground first and has an impressive customer base. It’s a mature tool and has a lot of integrations, but fewer than Hightouch.
Census customers include Fivetran, dbt, Netlify, and Notion.
If you are choosing between pure-play reverse ETL tools, you are probably choosing between Hightouch and Census. Your decision will depend on the integrations available and on pricing, as Hightouch and Census have different pricing models. Hightouch prices based on the monthly volume of synchronized records, while Census prices based on the number of data synchronization workflows you run.
RudderStack is not a pure-play reverse ETL tool; it is an event streaming platform. The company made a name for itself and expanded its customer base by being the open source alternative to Segment. Earlier this year, RudderStack released reverse ETL and ETL features that made it a competitor in the reverse ETL space.
This combination of features makes sense because reverse ETL relies on event streaming or event collection tools (frequently Segment, Snowplow, or RudderStack) and ETL tools to import data into the warehouse. RudderStack is the only reverse ETL tool that can also bring the necessary customer data into your warehouse. The company also offers a lot more destination integrations than Hightouch or Census. This is because it is an event streaming tool, and these tools require extensive integration libraries to be competitive.
RudderStack customers include Crate & Barrel, Priceline, Acorns, and Hinge.
Segment also has reverse ETL functionality, but the company does not label it as such. Its Personas SQL Traits feature allows you to sync data from your warehouse to your cloud apps, but the data must go through Segment’s Personas audience builder.
Segment introduced a new feature late last year, Segment Data Lakes, which creates a customer data lake for you. This reduces the importance of the company’s reverse ETL functionality.
Alternatives to reverse ETL
Reverse ETL is ideal for tasks such as building customer profiles, segmenting audiences, and other customer-centric processes. The real-time requirement for these processes is not strict, which makes a lot of sense because loading and analyzing data in your data warehouse in real time is not a good architectural model. Data warehouses and OLAP (Online Analytical Processing) databases can quickly run complex analytical models and queries, but they are not designed for real-time application response.
The emerging solution to these real-time requirements is to use tools such as Rockset to provide real-time analytics to your applications. In terms of function, Rockset is similar to Elasticsearch, but Rockset is built natively in the cloud and emphasizes SQL compatibility. This means that you will be able to scale beyond what Elasticsearch supports and perform basic SQL operations, such as joins, that Elasticsearch does not support.
An example use case for Rockset is providing data to a constantly updated leaderboard in a massively multiplayer online game. If you have millions of concurrent players, it’s incredibly difficult to ingest events, calculate millions of independent scores, and sort that list in real time, but it’s a practical use case for tools like Rockset.
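To show the shape of the leaderboard computation, here is a small Python sketch that aggregates score events per player and picks the top entries. It is not how Rockset works internally, and the event format is an assumption. It only illustrates the work that has to be repeated continuously, at far larger scale, in the real use case.

```python
import heapq
from collections import defaultdict
from typing import Iterable


def top_players(
    events: Iterable[tuple[str, int]], n: int
) -> list[tuple[str, int]]:
    """Sum points per player, then return the n highest totals.

    Each event is a hypothetical (player_id, points) pair.
    """
    totals: dict[str, int] = defaultdict(int)
    for player_id, points in events:
        totals[player_id] += points
    # nlargest avoids sorting the full list when n is small.
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])
```

With a handful of events this is trivial. The difficulty the article describes comes from doing the same aggregation and ranking over millions of players while new events keep arriving.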