Tonic Validate is a free, open-source library for evaluating RAG and LLM-based applications. It makes it easy to understand how the answer quality of your application changes over time as parameters are tuned, components are changed, and data sources are refreshed. Today, we’re happy to announce something that will make the continuous improvement of your RAG systems even easier!
Tonic Validate can be run entirely as a GitHub Action. We recently wrote a blog post about using GitHub Actions for integration testing of your RAG system. The post received a lot of love from the community, so we decided to make the action available for quick deployment on the GitHub Marketplace.
The integration allows you to set up CI/CD processes for LLM evaluation with just a few clicks. To get started, you can follow the instructions found in the marketplace listing. I’ll also give a brief demonstration of how to set it up below.
Setting up Tonic Validate as a GitHub Action
The action can be used in your own GitHub repositories by adding a workflow file to your repository's .github/workflows directory. The workflow file will look something like:
You have to provide just two pieces of information.
API Key
Validate uses an LLM to assist in the evaluation of your application. By default, we use GPT-4 Turbo, available from OpenAI. To use it, you just need to provide your OpenAI API Key. If you prefer to use Azure OpenAI, you can instead provide the equivalent Azure API Key together with the Azure OpenAI Endpoint URL.
Today, Validate supports either OpenAI or Azure OpenAI models, but if there are other LLMs you think we should be using for evaluation, you can file an issue on our main repository.
Questions
Validate takes a set of user-provided questions on which to perform the evaluation. Each question is represented by a JSON object that looks like the following:
The optional fields shown above allow Validate to compute additional metrics. If they are not provided, Validate computes only the metrics that do not require the missing fields.
The file should ultimately look something like:
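As a minimal sketch of preparing such a file, the Python below checks a list of question entries and writes them out as JSON. The field names (`question`, `reference_answer`, `retrieved_context`), the top-level `questions` key, and the file name are all assumptions made for illustration; the marketplace listing remains the authority on the exact schema the action expects.

```python
import json
import tempfile
from pathlib import Path

# NOTE: every field name here is an assumption for illustration only;
# consult the marketplace listing for the schema the action expects.
REQUIRED_FIELD = "question"
OPTIONAL_FIELDS = ("reference_answer", "retrieved_context")


def missing_optional_fields(entry: dict) -> list[str]:
    """Return the (assumed) optional fields that an entry does not provide."""
    return [name for name in OPTIONAL_FIELDS if name not in entry]


def write_questions(entries: list[dict], path: Path) -> None:
    """Check that each entry has question text, then write all entries as JSON."""
    for index, entry in enumerate(entries):
        question = entry.get(REQUIRED_FIELD)
        if not isinstance(question, str) or not question.strip():
            raise ValueError(f"entry {index} has no question text")
    payload = {"questions": entries}  # top-level key is hypothetical
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


if __name__ == "__main__":
    sample = [
        {
            "question": "What is the capital of France?",
            "reference_answer": "Paris",
        },
        {"question": "Who wrote Hamlet?"},
    ]
    # Write to a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "questions.json"
        write_questions(sample, target)
        print(target.read_text(encoding="utf-8"))
    for entry in sample:
        print(entry["question"], "is missing:", missing_optional_fields(entry))
```

Checking the entries before committing the file catches an empty question locally instead of in a failed CI run, and listing the missing optional fields gives a quick hint about which metrics may be skipped for a given entry.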
Running your action
Once this setup is complete, the action triggers each time a PR is created and on each subsequent commit to the PR. When the evaluation completes, a comment is added to the PR with a breakdown of the scores from Tonic Validate, to help you understand how the code change affects the answer quality of your RAG or LLM application. Here is a sample PR comment so you can see what it looks like:
Each time the action runs on a given PR, the existing comment is updated rather than a new comment being created for each commit.
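The update-in-place behavior can be pictured with a small sketch. This is not the action's actual implementation; it assumes, hypothetically, that the results comment carries a hidden marker string and that the marker is looked up among the PR's existing comments. The `MARKER` value and the `upsert_comment` helper are invented for illustration, and comments are modeled as plain dictionaries rather than fetched from the GitHub API.

```python
# Hypothetical hidden marker used to recognize the results comment.
MARKER = "<!-- tonic-validate-results -->"


def upsert_comment(comments: list[dict], body: str) -> list[dict]:
    """Return a new comment list with the results comment updated or appended.

    If a comment starting with MARKER already exists, its body is replaced;
    otherwise a new tagged comment is added at the end.
    """
    tagged = f"{MARKER}\n{body}"
    updated = []
    found = False
    for comment in comments:
        if not found and comment["body"].startswith(MARKER):
            updated.append({**comment, "body": tagged})
            found = True
        else:
            updated.append(comment)
    if not found:
        updated.append({"body": tagged})
    return updated


if __name__ == "__main__":
    thread = [{"body": "Looks good to me"}]
    thread = upsert_comment(thread, "Scores for commit 1")
    thread = upsert_comment(thread, "Scores for commit 2")
    # The thread still holds two comments; only the tagged one changed.
    print(len(thread), thread[-1]["body"])
```

Keeping a single comment per PR means reviewers always see the scores for the latest commit without scrolling through one comment per push.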
Next Steps
If you have any feedback, or need a feature that is missing, reach out to us. The easiest way to reach us is to file an issue on either the GitHub Action repo or the main Tonic Validate repo. Alternatively, you can check out our offering for businesses, which provides a nice UI experience on top of our open-source offering.
We'll be back in a few days with another post about an additional listing we'll be adding that enables you to run Tonic Validate on GitHub merges. Stay tuned for that!