As organizations rely ever more heavily on data, protecting sensitive information while preserving its utility is paramount. The strategic partnership between Tonic.ai and Databricks marks a significant milestone in achieving this balance. Tonic's approach to data synthesis is now integrated with Databricks, offering a joint solution that is both powerful and privacy-enhancing.
The Tonic + Databricks connector
The Tonic-Databricks connector reflects both companies' commitment to democratizing data access without compromising privacy. Tonic.ai's platform generates safe, high-fidelity versions of production data that are free of sensitive information, so organizations keep the value of their data for everything from engineering to analytics.
For Databricks customers, this partnership means that they can now harness the full potential of their data without leaving the Databricks ecosystem. The connector uses Databricks APIs, with support for Unity Catalog, Delta tables, and Delta Sharing, to provide a comprehensive solution for both data protection and data utility. Tonic helps strengthen Databricks as the Data Intelligence Platform by providing better, more efficient ways to share and test data as companies work to incorporate AI into their organizations.
Why Tonic
The Tonic platform enables organizations to provide their teams with high-quality, secure test data, accelerating development cycles and reducing time-to-market. Tonic.ai's additional offerings extend these capabilities further: Tonic Textual handles the redaction and synthesis of free-text data, and Tonic Validate monitors the performance of RAG (retrieval-augmented generation) applications.
The effectiveness of Tonic’s native database connectors is underscored by success stories from top companies such as eBay, Walgreens, CVS Health, Autodesk, Philips, the NHL, and Plume Design. These organizations have experienced firsthand the benefits of integrating Tonic.ai's synthetic data capabilities into their existing testing and development workflows.
Why Tonic and Databricks
The collaboration between Tonic.ai and Databricks is not just a technical integration; it is a strategic enhancement to the data ecosystem. The combination pairs Tonic's data privacy capabilities with Databricks' open, unified platform for data and AI in a single solution. Because Tonic performs its data transformations directly in Databricks clusters, the experience is seamless and flexible for users. This synergy enables Databricks customers to:
- Securely retain data for extended periods.
- Protect data at the earliest stage in the data lifecycle.
- Utilize de-identified data for AI/ML development.
- Integrate Tonic deeply into data applications with the Databricks SDK.
- Share de-identified data with partners through Delta Sharing.
- List de-identified data on the Databricks Marketplace.
Step-by-step guide: Use the Tonic-Databricks connector
So how do you take advantage of the connector? Here's an overview of the basic steps to connect Tonic to your Databricks catalog, configure and run data generation, and verify and use the resulting synthetic data.
Before you begin, make sure that you have a Databricks workspace ready and that you possess the necessary permissions to manage data connections.
Step 1: Establish the connection from Tonic to Databricks
Log in to your Tonic instance and create a Tonic workspace.
Under Connection Type, choose Databricks.
Provide your Databricks API token and the other connection details. First, provide the connection information for your source data.
Then, indicate where Tonic should write the synthesized data.
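Tonic collects these settings through its UI. As a purely illustrative sketch, the Python below models the two halves of the connection (source and destination) as a small configuration object. The class and field names are hypothetical and are not Tonic's API. The `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variable names follow the convention used by Databricks tooling, and the check that the destination differs from the source is an assumed safeguard, not a documented Tonic rule.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabricksLocation:
    """A Unity Catalog location: catalog plus schema (hypothetical model)."""
    catalog: str
    schema: str

    def qualified(self) -> str:
        return f"{self.catalog}.{self.schema}"


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical connection settings; Tonic's actual field names may differ."""
    host: str                        # Databricks workspace URL
    token: str                       # Databricks API (personal access) token
    source: DatabricksLocation       # where the production data lives
    destination: DatabricksLocation  # where Tonic writes the synthesized data

    def validate(self) -> None:
        if not self.host or not self.token:
            raise ValueError("a workspace host and an API token are required")
        # Assumed safeguard: never write synthetic data over the source.
        if self.source == self.destination:
            raise ValueError("destination must differ from the source")


def settings_from_env(source: DatabricksLocation,
                      destination: DatabricksLocation) -> ConnectionSettings:
    """Build settings, reading secrets from the environment instead of code."""
    settings = ConnectionSettings(
        host=os.environ.get("DATABRICKS_HOST", ""),
        token=os.environ.get("DATABRICKS_TOKEN", ""),
        source=source,
        destination=destination,
    )
    settings.validate()
    return settings
```

For example, `settings_from_env(DatabricksLocation("prod", "sales"), DatabricksLocation("dev", "sales_synthetic"))` succeeds when both environment variables are set, and it raises `ValueError` if the two locations are identical. Keeping the token in the environment avoids committing credentials to source control.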
Step 2: Configure data mapping and generation
Use Privacy Hub to view the fields where Tonic detected sensitive information.
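Tonic's own detection logic is not described here. To give a rough sense of what flagging sensitive fields involves, the sketch below uses two simple, hypothetical heuristics: matching column names against common PII terms, and checking whether every sampled value matches an email or US Social Security number pattern. The names `NAME_HINTS`, `VALUE_PATTERNS`, and `flag_sensitive_columns` are illustrative only.

```python
import re

# Hypothetical heuristics for illustration only; not Tonic's detection logic.
NAME_HINTS = re.compile(r"email|ssn|phone|first_?name|last_?name|address|dob",
                        re.IGNORECASE)
VALUE_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def flag_sensitive_columns(samples: dict[str, list[str]]) -> dict[str, str]:
    """Return {column: reason} for every column that looks sensitive.

    `samples` maps each column name to a few sampled values from that column.
    """
    flagged: dict[str, str] = {}
    for column, values in samples.items():
        # Heuristic 1: the column name itself suggests PII.
        if NAME_HINTS.search(column):
            flagged[column] = "column name"
            continue
        # Heuristic 2: every sampled value matches a known PII pattern.
        for label, pattern in VALUE_PATTERNS.items():
            if values and all(pattern.fullmatch(v) for v in values):
                flagged[column] = label
                break
    return flagged
```

A column such as `contact` with values like `x@y.org` is flagged as `email` even though its name gives no hint, which is why value-based checks matter. Real detection needs far more than a couple of regular expressions, so treat this only as an outline of the idea.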
Indicate which tables in your Databricks source data to include in the data generation. By default, the data generation includes all of the tables. To exclude a table's data from the generation, set that table's mode to Truncate.
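The include-by-default behavior, with truncation as the way to leave a table's data out, can be modeled in a few lines. This is a minimal sketch under stated assumptions: the `TableMode` enum and `tables_to_generate` function are hypothetical, cover only two modes, and do not reflect how Tonic stores its configuration.

```python
from enum import Enum


class TableMode(Enum):
    """Simplified, hypothetical subset of table modes."""
    DE_IDENTIFY = "de-identify"  # default: rows are copied with generators applied
    TRUNCATE = "truncate"        # the table's data is left out of the output


def tables_to_generate(all_tables: list[str],
                       modes: dict[str, TableMode]) -> list[str]:
    """Return the tables whose data will be generated.

    Every table is included by default; only tables explicitly set to
    TRUNCATE are skipped.
    """
    return [
        table for table in all_tables
        if modes.get(table, TableMode.DE_IDENTIFY) is not TableMode.TRUNCATE
    ]
```

Modeling exclusion as an explicit override, rather than requiring users to list every table to include, matches the default described above: a new table that nobody has configured is still picked up by the next generation run.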
Use the intuitive Tonic UI to assign data generators to table fields. Tonic data generators indicate how to transform the field values. A generator might scramble characters or produce a safe but realistic value that mimics the original data.
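The two kinds of generator described above can be sketched in Python. These functions are hypothetical stand-ins, not Tonic's generators. `scramble` replaces each digit and each uppercase or lowercase letter with a random ASCII character of the same kind while keeping punctuation and length. `fake_first_name` replaces a name with one from a small, made-up list. Hashing the input with SHA-256 makes that replacement deterministic, so the same real name always maps to the same fake one; this consistency is a design choice made for the example.

```python
import hashlib
import random
import string
from typing import Callable

FAKE_FIRST_NAMES = ["Avery", "Jordan", "Riley", "Morgan", "Casey", "Quinn"]


def scramble(value: str, rng: random.Random) -> str:
    """Replace each digit, uppercase letter, and lowercase letter with a
    random ASCII character of the same kind.

    Everything else (punctuation, spaces) is kept, so length and format
    such as "AB-12cd" are preserved.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


def fake_first_name(value: str) -> str:
    """Map a real first name to a realistic stand-in.

    SHA-256 makes the mapping deterministic: the same input always yields
    the same fake name, across runs and processes.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return FAKE_FIRST_NAMES[digest[0] % len(FAKE_FIRST_NAMES)]


def apply_generators(row: dict[str, str],
                     assignments: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Return a transformed copy of `row`; unassigned columns pass through."""
    out = dict(row)
    for column, generator in assignments.items():
        out[column] = generator(row[column])
    return out
```

As a usage example, `apply_generators({"id": "7", "first_name": "Alice"}, {"first_name": fake_first_name})` leaves `id` untouched and swaps `Alice` for a name from the list. Python's built-in `hash()` is avoided on purpose, because string hashing is randomized per process and would break consistency between runs.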
Step 3: Generate synthetic data
Click the green button to initiate a data generation job within Tonic and monitor its progress.
After the job completes, validate the synthetic data in Databricks and put it to use.
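What validation means depends on your use case. One minimal sketch, assuming you have already loaded comparable rows from the source and destination tables into Python (for example, by collecting them from `spark.table(...)` in a notebook), checks two things. First, that row counts match, which only holds when the table was not subset. Second, that no original value of a sensitive column appears verbatim in the output. The function name `validate_synthetic` and both checks are illustrative assumptions, not a Tonic feature.

```python
def validate_synthetic(source_rows: list[dict[str, str]],
                       synthetic_rows: list[dict[str, str]],
                       sensitive_columns: list[str]) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: list[str] = []
    # Check 1: row counts match (only meaningful if the table was not subset).
    if len(source_rows) != len(synthetic_rows):
        problems.append(
            f"row count mismatch: {len(source_rows)} source vs "
            f"{len(synthetic_rows)} synthetic"
        )
    # Check 2: no original value of a sensitive column survives verbatim.
    for column in sensitive_columns:
        original = {row[column] for row in source_rows}
        output = {row[column] for row in synthetic_rows}
        leaked = original & output
        if leaked:
            problems.append(
                f"{column}: {len(leaked)} original value(s) found in output"
            )
    return problems
```

An exact-match leak check is coarse. A realistic-value generator can legitimately produce a value that also exists in the source (a common first name, for instance), so treat any reported overlap as a prompt for review rather than proof of a leak.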
Use the generated synthetic data for development, testing, QA, and AI/ML model training. Share the data with Delta Sharing, and consider listing it on the Databricks Marketplace. Integrate the data into your workflows to experience the power of Tonic + Databricks.
Conclusion
The Tonic.ai and Databricks partnership is a powerful combination that promises to revolutionize how organizations approach data privacy and utility. By providing a secure, efficient, and user-friendly platform for data masking, subsetting, and synthesis, this alliance empowers organizations to innovate with confidence.
As we look to the future, the partnership between Tonic.ai and Databricks is poised to set new standards in data privacy and utility, enabling organizations to unlock strategic data assets safely and efficiently. It's an exciting time for data-driven companies, and the opportunities for growth and innovation are boundless. Start a Free Trial to try the Tonic + Databricks connector today.