De-identify sensitive free-text data before it reaches your RAG system, so you can harness the power of retrieval-augmented generation while protecting privacy.
Automatically detect and de-identify dozens of sensitive entity types in free-text data to keep private information out of your RAG system.
Extract complex, messy data from PDFs, images, CSVs, and more into a standardized, easy-to-develop-with markdown format.
With reversible tokens, your RAG system can display the original text to users while ensuring the LLM processes only the redacted data.
Substitute sensitive information with reversible or non-reversible tokens to maintain data consistency across your dataset.
Extract data from messy, complex formats, such as PDFs of clinical notes, into a standard format convenient for RAG ingestion. Support for TXT, DOCX, PDF, CSV, XLSX, TIFF, XML, PNG, JPEG, JSON, and more.
Automatically update your RAG system with new and modified files each time the pipeline runs to keep your application current.
Automatically identify dozens of sensitive entity types in free-text data with Textual’s proprietary, best-in-class multilingual machine learning models for named entity recognition (NER).
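To make the reversible-token idea concrete, here is a minimal Python sketch of how consistent, reversible substitution can work in principle. It is an illustration only and does not use or represent Textual's API: the `ReversibleRedactor` class, the `[EMAIL_n]` token format, and the email regex (a stand-in for real NER detection) are all assumptions made for this example.

```python
import re

# Assumption: a simple email regex stands in for a real NER model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class ReversibleRedactor:
    """Hypothetical helper: swaps emails for tokens and can swap them back."""

    def __init__(self):
        # Two-way mapping, kept outside the LLM so only tokens are exposed to it.
        self._token_for_value = {}
        self._value_for_token = {}

    def redact(self, text):
        def _swap(match):
            value = match.group(0)
            token = self._token_for_value.get(value)
            if token is None:
                # The same value always maps to the same token (consistency).
                token = f"[EMAIL_{len(self._token_for_value) + 1}]"
                self._token_for_value[value] = token
                self._value_for_token[token] = value
            return token

        return EMAIL_RE.sub(_swap, text)

    def restore(self, text):
        # Replace each known token with its original value before display.
        for token, value in self._value_for_token.items():
            text = text.replace(token, value)
        return text


redactor = ReversibleRedactor()
redacted = redactor.redact(
    "Contact ana@example.com or bo@example.org; ana@example.com prefers email."
)
# The LLM would see only `redacted`; users see the restored text.
shown_to_user = redactor.restore("Reply to [EMAIL_1].")
```

In this sketch the mapping lives only in the application layer. A non-reversible variant would simply discard `_value_for_token` after redaction, so the tokens stay consistent but can never be resolved back to the original values.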
For structured and semi-structured data de-identification
For unstructured, free-text data de-identification
For ephemeral data environments