Amazon RDS vs Tonic Ephemeral for test databases in the cloud

Author
Andrew Colombi, PhD
May 1, 2024

    In this article I’d like to compare Tonic Ephemeral to Amazon RDS.

    Ephemeral and RDS are both services that make it easy to run a database in the cloud; however, they have different priorities. While building Ephemeral, we optimized it for software development use cases. In contrast, Amazon optimized RDS for production use cases. For example, where RDS prioritizes deploying a database across multiple availability zones for redundancy, Ephemeral provides lightning-fast deploy times suitable for CI/CD pipelines.

    The TLDR version is this table. 

    A table comparing the characteristics and capabilities of Tonic Ephemeral and Amazon RDS

    1 Instead, RDS has automatic DB resurrection! More on this below.

    We optimized Ephemeral to be fast, cheap, and easy for testing, and that’s simply not what RDS is optimized for. To understand why we took this approach, let me start by explaining the workflows that our users need.

    Use cases for ephemeral databases in the cloud

    The developers our solutions support have two primary use cases for instantiating databases in the cloud: automated testing and isolated development. 

    Automated testing

    Automated tests have two principal needs of their test databases: speedy startup times and isolation from other tests. These two needs are at odds with each other. You can provide quick startup times by sharing the database, but if you share the database, one test run may interfere with another. To provide isolation, you can spin up a database for each test run, but this adds the database start time to every run of your automation.

    Tonic Ephemeral solves these problems by ensuring fast spin up times for databases of any size. Once you solve the startup time problem, isolation becomes easy because you can start independent databases with every run of your automation.
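To make the isolation point concrete, here is a minimal sketch of the "one database per test run" pattern. It uses Python's built-in sqlite3 in-memory databases purely as a stand-in for a per-run cloud database. The `isolated_database` helper, the `users` table, and the seed rows are hypothetical names chosen for illustration, not part of Ephemeral's API.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def isolated_database(seed_names):
    """Yield a fresh, seeded database for one test run, then discard it.

    Assumption: sqlite3 ":memory:" stands in for a per-run cloud database.
    """
    conn = sqlite3.connect(":memory:")  # each connection is its own database
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO users (name) VALUES (?)", [(n,) for n in seed_names]
        )
        conn.commit()
        yield conn
    finally:
        conn.close()  # the database disappears with the connection


def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def run_two_isolated_tests():
    seed = ["ada", "grace"]
    with isolated_database(seed) as run_a, isolated_database(seed) as run_b:
        run_a.execute("DELETE FROM users")  # a destructive test in run A
        # Run B still sees its own untouched copy of the seed data.
        return count_users(run_a), count_users(run_b)
```

Because each run gets its own database, a destructive test in one run cannot affect another. The cost of this pattern is that setup happens on every run, which is why startup time matters so much.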

    Developer productivity

    Developers occasionally need temporary data infrastructure during development. It’s often convenient to run temporary databases on your local developer hardware, i.e. your laptop, but that’s not always possible:

    • the database might be too big to run locally,
    • the database may not run on your hardware (here’s looking at you, Mac M-series 😬),
    • the project may require sharing the infrastructure with other teammates, or
    • the project may require connecting to the database from other infrastructure in your VPC.

    In these cases, using a cloud database can be really convenient, but it comes at a cost, literally. Cloud databases are notoriously expensive and easy to accidentally leave on after you’re done with them. In fact, with RDS you cannot shut down a DB indefinitely to save on costs. You can only stop a DB temporarily, and RDS will automatically restart it after one week. Here’s the actual dialog you must confirm when “shutting down” an RDS database.

    A screenshot of the dialog modal in Amazon RDS that must be confirmed when "shutting down" an RDS database.

    In my experience, many organizations won’t allow their developers to start cloud databases for fear of the bill at the end of the month.

    The benefits of Ephemeral’s approach

    From the outset we knew we needed Ephemeral to be fast and cheap to enable the workflows we wanted to support. Ephemeral distinguishes itself from RDS in a few ways. First let’s look at cost. 

    Optimizing cost

    The first thing we did to optimize cost was make it easy to avoid paying for infrastructure you’re not even using! Ephemeral implements customizable database expiration.

    A screenshot of Tonic Ephemeral's UI for setting a database expiration timer.

    And because Ephemeral is monitoring usage of the database, it knows to not shut down a database that’s actively being used. Instead, it’ll wait for at least an hour to pass since the last activity before shutting down a database.
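The expiration rule described above can be sketched as a small predicate. This is only an illustration of the stated behavior (an expiration timer plus at least an hour of inactivity), not Ephemeral's actual implementation. The function name, parameters, and the `IDLE_GRACE` constant are assumptions made for the example.

```python
from datetime import datetime, timedelta

# From the article: wait at least an hour since the last activity.
IDLE_GRACE = timedelta(hours=1)


def should_shut_down(now, expires_at, last_activity):
    """Return True only if the timer has expired AND the database is idle.

    All three arguments are datetimes; the names are hypothetical.
    """
    timer_expired = now >= expires_at
    idle_long_enough = (now - last_activity) >= IDLE_GRACE
    return timer_expired and idle_long_enough


expires = datetime(2024, 5, 1, 12, 0)

# Timer expired, but someone ran a query 20 minutes ago: keep it running.
still_in_use = should_shut_down(
    datetime(2024, 5, 1, 12, 30), expires, datetime(2024, 5, 1, 12, 10)
)

# Timer expired and the database has been idle for 90 minutes: shut it down.
idle = should_shut_down(
    datetime(2024, 5, 1, 13, 0), expires, datetime(2024, 5, 1, 11, 30)
)
```

Requiring both conditions means an expired timer never interrupts someone who is still working.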

    The second thing we did is recognize that ephemeral databases are not meant to be production ready. Ephemeral databases are simple. No multi-cluster deployment. No read replicas. No production monitoring. No automatic maintenance with a blue-green deployment. RDS’s smallest instance allocates a minimum of 2 vCPUs. On Ephemeral we let you go all the way down to 1/8th of a vCPU. The needs of a database you’re going to toss in a few hours are just very, very different. By omitting all these extraneous features, we can offer a far more competitive price.

    Optimizing speed

    From the start, we knew Ephemeral needed to be fast. To accomplish this, Ephemeral stores and restores data in the format native to the database. There’s no way to import hundreds of GB into a database in under a minute; even Tonic Structural, which is an incredibly efficient data loader, takes ~30 minutes to load 100GB of data. Therefore, Ephemeral begins by working with all its data in the data’s native form. This gives near-instant data load times once the data volume is ready. The next problem to solve is getting the data volumes ready quickly.
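A quick back-of-the-envelope calculation shows why importing is a dead end. It uses only the figures quoted above (100GB in about 30 minutes) and assumes decimal units (1 GB = 1000 MB). The function name is made up for the example.

```python
def throughput_mb_per_s(size_gb, seconds):
    """Average throughput in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / seconds


# Figure from the article: ~30 minutes to load 100GB.
loader_rate = throughput_mb_per_s(100, 30 * 60)  # about 56 MB/s

# What "under a minute" would require for the same 100GB.
target_rate = throughput_mb_per_s(100, 60)  # about 1,667 MB/s

# How much faster the import would have to be.
required_speedup = target_rate / loader_rate  # about 30x
```

Closing a gap of roughly 30x on the import path is not realistic, which is why skipping the import entirely is the more promising route.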

    To address data volume readiness, we customized Ephemeral’s management of data volumes to each cloud environment we deploy in, e.g. AWS, Azure, and GCP. For example, in AWS, Ephemeral takes advantage of EBS Volume Snapshots. EBS has a neat trick it can do with Volume Snapshots that makes loading them much quicker. While an EBS volume is being loaded from its snapshot, requests for blocks that have not yet been loaded are served directly from the snapshot in S3. This means the EBS volume is essentially ready instantly.
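The lazy-loading idea can be illustrated with a toy model. This is not how EBS or Ephemeral is implemented; it is a sketch of the general technique, in which a read of a block that has not been copied yet is served from the snapshot on demand while a background process copies the rest. The `LazyVolume` class and its methods are hypothetical.

```python
class LazyVolume:
    """Toy model of a volume that is restored lazily from a snapshot."""

    def __init__(self, snapshot_blocks):
        self._snapshot = snapshot_blocks  # stands in for the snapshot in S3
        self._local = {}                  # blocks already copied to the volume
        self.snapshot_fetches = 0         # how many blocks came from the snapshot

    def _fetch(self, block_id):
        self.snapshot_fetches += 1
        self._local[block_id] = self._snapshot[block_id]

    def read(self, block_id):
        """Serve a read immediately, fetching the block on demand if needed."""
        if block_id not in self._local:
            self._fetch(block_id)
        return self._local[block_id]

    def hydrate_next(self):
        """Background step: copy one missing block. Returns False when done."""
        for block_id in self._snapshot:
            if block_id not in self._local:
                self._fetch(block_id)
                return True
        return False


volume = LazyVolume({0: b"header", 1: b"index", 2: b"rows"})

# The volume is usable right away: this read does not wait for blocks 0 and 1.
first_read = volume.read(2)

# Background hydration fills in the remaining blocks over time.
while volume.hydrate_next():
    pass
```

The consumer never waits for the full copy to finish. It pays the fetch cost only for the blocks it touches before hydration reaches them.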

    At this point you might be wondering what makes this faster than RDS. After all, surely RDS can do many of the same tricks? There are two main reasons RDS ends up being slower. First, instantiating the RDS instance, i.e. starting the EC2 instances that will host the database, is slow. Even with the fanciest of tricks to speed up the data load, the overall time will be limited by the time it takes to instantiate the infrastructure.

    The second, more subtle reason RDS ends up being slow is that RDS is expensive even for a small node. Because it’s expensive, organizations tend to share RDS instances among multiple test runs. When loading data with a snapshot, the way Ephemeral does, you’re affecting the whole instance’s data volume. This means you cannot load a single test run’s data via a snapshot without disrupting all the other concurrent test runs. Thus, sharing an RDS instance among multiple test runs, which is the only economical way to use RDS for ephemeral test infrastructure, means you cannot take advantage of native data formats or fancy snapshot loading. You’re left with loading data using SQL scripts and bulk loading CSVs, which is much, much slower.

    A doodle comparing the number of volumes used by Tonic Ephemeral vs Amazon RDS when running tests. Ephemeral uses one volume per test run. RDS runs all tests in one volume.

    In summary

    Tonic Ephemeral and Amazon RDS both give organizations ways to instantiate databases in the cloud, but only Tonic Ephemeral is well-suited to the specific needs of ephemeral test infrastructure. With Ephemeral, you can quickly create isolated test databases, filled with your test data, for CI/CD or developer needs. Sign up for a free trial to spin up your own ephemeral databases today, or book a live demo with our engineering team to learn more.

    Andrew Colombi, PhD
    Co-Founder & CTO
    Andrew is the Co-founder and CTO of Tonic. An early employee at Palantir, he led the team of engineers that launched the company into the commercial sector and later began Palantir’s Foundry product. Today, he is building Tonic’s platform to engineer data that parallels the complex realities of modern data ecosystems and supply development teams with the resource-saving tools they most need.
