Recently, I had the pleasure of being in Chicago for QA Financial, meeting the QA leaders of Bank of America, Discover, BNY Mellon, Goldman Sachs, and more. The trip also overlapped with the much-heralded 2024 solar eclipse, which reached a whopping 96% coverage in Chicago. The anticipation was immense. I thought that with 96% of the sun covered, I would sink into a notable darkness. Surely that last 4% couldn’t make a huge difference. Well, I was in for quite a shock.
Instead of the dramatic twilight I expected, there was only a subtle dimming, as if someone had turned down the sun’s dimmer switch just a tad, or nudged the saturation on your TV so the colors were slightly less vibrant. This got me thinking about the parallels to test data management.
In the world of software development, having test data that mimics real-world scenarios is crucial. But here's the kicker: just like my Chicago eclipse experience, having 96% of your test data environment set up correctly might sound great, but it can still leave you short of what you actually need, which is total coverage. Likewise, you can have 96% of the process optimized, but that last 4% can significantly impede the quality and efficiency of your overall testing process.
When your data is only “close enough”
Let's paint a picture here. Imagine you're at 96% test data coverage. Things look pretty good. You've captured most of the scenarios, your data looks solid, and your tests are running smoothly. But it’s that elusive 4% that can harbor the most critical, show-stopping bugs. Just as the 4% of the sun that peeked out from behind the moon dramatically diminished the eclipse experience, even a tiny slice of insufficient test data can lead to a less-than-stellar release. So how do you get that last 4%?
There seem to be two ways to do this. The first is to manually craft every data scenario needed to capture your edge cases. This requires heavy investment in the QA planning process and can significantly slow down development for marginal returns. It’s easy to come up with the first few scenarios, but much harder to think of the less common ones. Not only is that method slow, but humans are notoriously difficult to predict and model. At Tonic.ai, we think the second way is better, faster, and more reliable: rather than building scenarios from scratch, start from your production data, masking it for compliant use while preserving the relationships it contains, so that you can leverage all of its real-world structural variants.
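To make "masking while preserving relationships" a little more concrete, here is a minimal Python sketch of one common technique: deterministic pseudonymization with a keyed hash (HMAC-SHA256). This is not Tonic.ai's implementation. The key, the `pseudonymize` and `mask_rows` helpers, and the toy `customers`/`orders` tables are all hypothetical. The idea it shows is that if the same identifier always masks to the same value, a customer ID masked in one table still matches that ID masked in another table, so joins keep working.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real pipeline would load this
# from a secrets manager rather than hard-coding it.
SECRET_KEY = b"example-masking-key"


def pseudonymize(value: str, prefix: str) -> str:
    """Deterministically replace a value with a keyed hash.

    The same input (under the same key) always yields the same output, so an
    ID masked in one table still matches the same ID masked in another.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def mask_rows(rows, columns):
    """Return copies of `rows` with each listed column pseudonymized.

    `columns` maps a column name to the prefix used for its masked values.
    """
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the source rows are left untouched
        for column, prefix in columns.items():
            if new_row.get(column) is not None:
                new_row[column] = pseudonymize(str(new_row[column]), prefix)
        masked.append(new_row)
    return masked


# Toy "production" tables (hypothetical data).
customers = [
    {"customer_id": "C-1001", "name": "Ada Lovelace"},
    {"customer_id": "C-1002", "name": "Grace Hopper"},
]
orders = [
    {"order_id": 1, "customer_id": "C-1001", "total": 250.00},
    {"order_id": 2, "customer_id": "C-1002", "total": 99.50},
    {"order_id": 3, "customer_id": "C-1001", "total": 12.00},
]

# Mask the shared key with the same prefix in both tables so joins line up.
masked_customers = mask_rows(customers, {"customer_id": "cust", "name": "name"})
masked_orders = mask_rows(orders, {"customer_id": "cust"})

# Every masked order still points at a masked customer: referential
# integrity survives masking.
masked_ids = {c["customer_id"] for c in masked_customers}
assert all(o["customer_id"] in masked_ids for o in masked_orders)
```

A keyed hash is used here instead of random replacement because random values would break the link between `orders` and `customers`. Compliant masking in practice usually goes further, for example generating realistic, format-preserving replacement values rather than opaque hashes.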
In Indianapolis, a short drive away, they experienced the totality of the eclipse. The difference was night and day, literally. The streetlights came on, stars appeared, and crickets started chirping. My friend who was in Indianapolis said it was almost a spiritual experience. That's what 100% test data coverage can do for you. It brings out the bugs hiding in the metaphorical shadows, allowing you to address them before they become real problems and drastically reducing your change failure rate.
When your process is only “close enough”
Alternatively, let’s say that your CI/CD process is humming along. However, to get refreshed test data that’s referentially intact, you need to file a Jira approval ticket and find someone who has access to production. That one step can add a tremendous amount of lag to the testing process. A process that is 96% automated but still depends on a human approval step from a different team loses much of its magic.
We all want to increase our deployment frequency, and it can be shocking how much friction each manual step adds, slowing down the process and, in turn, delaying when we can get our product to market.
Striving for totality in test data management
So, how do we move from a pretty-good 96% to a stellar 100%?
- Embrace the shadows: Just as astronomers relish the brief totality of an eclipse to study the sun’s corona, delve into the less-traveled parts of your application. Make use of the edge cases in your production data by safely and realistically replicating them in your test data.
- Use the right tools: To look at an eclipse, you need special eclipse glasses. However, those glasses don't work for cell phone cameras, as I learned when I tried to take a picture. Likewise, your test data pipeline needs tooling suited to the job: the right tool should make generating and refreshing test data quick and simple, with rapid data provisioning to boot.
- Get to the right spots: Eclipse chasers travel the world to follow the path of totality, because location and timing matter. The same holds for test data, so regularly revisit and revise your test data strategies and pipelines. What worked last year may not cover you this year. Does your test data infrastructure connect across data sources to give you comprehensive coverage, even across different database systems? Does it output your test data to containers or ephemeral data environments for ease of access and management? Generating data is only half the battle. Where you can generate data from, and where you can deliver it to, are critical considerations for efficiency.
Conclusion
While 96% might sound like you're almost there, the truth is that in both solar eclipses and test data management, that last 4% can make a world of difference. Striving for total coverage ensures that when your product goes live, it's not just functioning; it's flourishing.
In the realm of test data, don't settle for a slight dimming when you can have full coverage. Aim for the stars, quite literally, and ensure your software shines even in the darkest of times. Let’s bring the totality of our efforts to every test, and trust me, the results will be as striking as the difference between a near-total eclipse and the real, breathtaking deal.