Added the Tonic NER model version to the model information. The API endpoint /api/environment/models reports version strings for NER models.
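As a rough sketch, the following shows one way to query this endpoint from Python. Only the /api/environment/models path comes from the note above. The base URL, the Authorization header format, and the helper names build_models_request and fetch_model_info are assumptions for illustration, and the shape of the response is not documented here.

```python
import json
import urllib.request

MODELS_PATH = "/api/environment/models"  # endpoint path from the release note


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the model information."""
    url = base_url.rstrip("/") + MODELS_PATH
    # Assumption: the API key is sent as-is in an Authorization header.
    return urllib.request.Request(
        url, headers={"Authorization": api_key}, method="GET"
    )


def fetch_model_info(base_url: str, api_key: str):
    """Send the request and decode the JSON body (response shape is not assumed)."""
    with urllib.request.urlopen(build_models_request(base_url, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and headers before you point the code at a real instance.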
Entities Manager for entity types - The new Entities Manager allows you to view all of the occurrences of each entity type in a dataset. It displays the original value, the context in the original file, and the context in the transformed file. To view the Entities Manager, from the entity value preview list, click Open Entities Manager. By default, for the NUMERIC_VALUE entity type, Textual only provides context information for the first 20 occurrences. To change this, set the SOLAR_NER_OCCURRENCE_IGNORE_NUMERIC_VALUE environment variable to false.
Bug fixes and other internal updates.
Improved detection of names, particularly in ASR transcripts.
Added an optional jsonpath_allow_lists parameter to redact_json. You use jsonpath_allow_lists to override NER results at specific JSON Path expressions.
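A minimal sketch of how jsonpath_allow_lists might be passed to redact_json is shown below. It assumes that the argument maps an entity type to a list of JSON Path expressions, and that redact_json accepts the JSON payload as its first argument. Neither detail is confirmed by this note. The helper name redact_with_allow_lists and the sample paths are hypothetical, and client stands in for whatever SDK client object exposes redact_json.

```python
from typing import Any, Dict, List


def redact_with_allow_lists(client: Any, payload: Dict[str, Any]) -> Any:
    """Call redact_json with a JSON Path allow list.

    `client` is any object that exposes a redact_json method,
    such as a Textual SDK client.
    """
    # Assumption: keys are entity types, values are the JSON Path
    # expressions whose values should always be treated as that type.
    allow_lists: Dict[str, List[str]] = {
        "PHONE_NUMBER": ["$.customer.phone"],
        "LOCATION_ZIP": ["$.customer.address.zip"],
    }
    return client.redact_json(payload, jsonpath_allow_lists=allow_lists)
```

Passing the client in as a parameter keeps the sketch independent of how the client is constructed and authenticated.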
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Textual can now redact images in .docx files.
Fixed a rare issue where Azure OCR returned a 400 response when the file upload stream contained corrupted data.
Improved synthesis on days of the week and ordinal numbers that are flagged as DATE_TIME.
Textual now only disables a numeric span when it overlaps one of the following disabled types: DATE_TIME, DOB, LOCATION, LOCATION_ADDRESS, LOCATION_ZIP, MONEY, CREDIT_CARD, PHONE_NUMBER.
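The overlap rule above can be illustrated with a short sketch. This is not Textual's implementation. It assumes that spans use half-open character offsets and that "disabled types" means the entity types that are turned off for the dataset. The Span type and the function names are hypothetical.

```python
from typing import Iterable, NamedTuple

# Entity types listed in the release note.
BLOCKING_TYPES = frozenset({
    "DATE_TIME", "DOB", "LOCATION", "LOCATION_ADDRESS",
    "LOCATION_ZIP", "MONEY", "CREDIT_CARD", "PHONE_NUMBER",
})


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open spans share at least one character."""
    return a.start < b.end and b.start < a.end


def should_disable_numeric(
    numeric: Span, others: Iterable[Span], disabled: Iterable[str]
) -> bool:
    """Disable the numeric span only if it overlaps a disabled span of a listed type."""
    disabled_listed = set(disabled) & BLOCKING_TYPES
    return any(
        o.label in disabled_listed and overlaps(numeric, o) for o in others
    )
```

Under these assumptions, a numeric span inside a disabled DATE_TIME span is disabled, while a numeric span that overlaps a disabled type outside the list is left alone.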
Textual now allows you to parse EML and MSG files.
You can now use the Python SDK to configure Azure pipelines.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
You can now use the Python SDK to configure Amazon S3 pipelines.
Amazon Textract can now be used to process dataset files.
On the Python SDK, added parameters for pipeline creation, including the file location, the connection credentials, and whether to also generate redacted files.
Improved the Textual NER model throughput on long strings that contain a large number of numeric characters.
Added the redact_html function to the SDK, which allows you to redact sensitive values from HTML strings.
Improved detection of names and organizations.
Disabled auxiliary model detection of WORK_OF_ART.
Improved the Textual NER model throughput on long strings that contain a large number of detected entities.
Added support to store dataset files in a specified S3 bucket, instead of in the Textual application database.
When Textual replaces first name values, it now attempts to use a name with the same gender.
For the DOB (date of birth) entity type, you can now configure synthesis options. You can set how to shift the date.
Bug fixes and other internal updates.
Improved the synthesized values for the PERSON_AGE entity type.
You can now configure the entity type handling for a dataset before you upload the dataset files.
You can now provide added and excluded entity values when you use the SDK to redact individual strings and files.
Added a new method to the SDK. redact_xml works similarly to redact_json. To produce a redacted output, you pass in an XML string.
Improved the pipeline UI to include better Python SDK code snippets.
Improved the user experience when you load a large number of files to a dataset.
Updated the UI for adding and excluding values for entity types. Changed the tab labels to Add to detection and Exclude from detection, and removed the requirement to click the edit icon for the entries.
Added support in the SDK to create dataset include lists to define additional values for an entity type.
Removed support for the en_core_web_trf and en_core_web_lg auxiliary models. Disabled model inference for the ORGANIZATION, PERSON, LOCATION, and MONEY entity types. Updated the auxiliary model configuration environment variables to have new default values:
TEXTUAL_AUX_MODEL_GPU: false
TEXTUAL_AUX_MODEL: en_core_web_sm
Fixed a redaction issue that was caused by a regression from v140.
Improved the Textual NER model, specifically for datetime values and electronic health records.
Fixed an issue that prevented files from being correctly re-synthesized as part of pipelines.
When you call the dataset.add_file method in the Textual SDK, you can now pass in IO bytes.
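As an illustration, in-memory bytes might be passed to add_file roughly as follows. The keyword names file_name and file are assumptions and are not confirmed by this note. The helper add_bytes_to_dataset is hypothetical, and dataset stands in for a dataset object obtained from the SDK.

```python
import io
from typing import Any


def add_bytes_to_dataset(dataset: Any, file_name: str, content: bytes) -> Any:
    """Wrap raw bytes in a binary stream and hand it to dataset.add_file."""
    buffer = io.BytesIO(content)
    # Assumption: add_file takes the stream and a display name as keywords.
    return dataset.add_file(file_name=file_name, file=buffer)
```

This avoids writing a temporary file to disk when the content is already in memory.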
You can now specify a list of additional values to include for each entity type in a dataset. This allows Textual to identify values that it might not otherwise identify because they are specific to your organization or industry. The list can contain both specific values and regular expressions.
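To make the idea of a list that mixes specific values and regular expressions concrete, here is a small standalone sketch. It does not reflect how Textual evaluates the list internally. The function name find_included_values and the sample value and pattern are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def find_included_values(
    text: str, values: Iterable[str], patterns: Iterable[str]
) -> List[Tuple[int, int, str]]:
    """Return (start, end, match) for every literal value or regex hit in text."""
    # Literal values are escaped so that they match exactly as written.
    compiled = [re.compile(re.escape(v)) for v in values]
    compiled += [re.compile(p) for p in patterns]
    hits = set()
    for rx in compiled:
        for m in rx.finditer(text):
            hits.add((m.start(), m.end(), m.group()))
    return sorted(hits)


if __name__ == "__main__":
    sample = "Ticket ACME-1234 for Project Falcon"
    for hit in find_included_values(sample, ["Project Falcon"], [r"ACME-\d{4}"]):
        print(hit)
```

A literal entry catches a known term such as an internal project name, while a pattern catches a family of values such as ticket identifiers.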
Improved the file list display for datasets to better accommodate longer file names.
For an uploaded file pipeline, added a Back to Files breadcrumb to return the user to the main file list.
On the dataset details page, the bulk edit function for entity type handling is now a dropdown instead of separate buttons.
You can now use the Python SDK to delete files from a dataset.
To improve performance, enabled date synthesis inference on GPU. Added the environment variable TEXTUAL_DATE_SYNTH_GPU to manage whether to use it.
Renamed the following environment variables:
SOLAR_PREFER_GPU to TEXTUAL_AUX_MODEL_GPU
SOLAR_AUX_MODEL to TEXTUAL_AUX_MODEL
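If a deployment script still sets the old names, a small migration helper along the following lines could translate them. This is a sketch only. The function name migrate_env is hypothetical, and the choice to prefer an already-set new-style value is an assumption about the desired precedence.

```python
from typing import Dict, Mapping

# Old name -> new name, taken from the rename list above.
RENAMED = {
    "SOLAR_PREFER_GPU": "TEXTUAL_AUX_MODEL_GPU",
    "SOLAR_AUX_MODEL": "TEXTUAL_AUX_MODEL",
}


def migrate_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with the old variable names replaced by the new ones."""
    migrated = dict(env)
    for old, new in RENAMED.items():
        if old in migrated:
            value = migrated.pop(old)
            # Assumption: an explicitly set new-style value wins over the old one.
            migrated.setdefault(new, value)
    return migrated
```

For example, you could call migrate_env(os.environ) before writing out the environment block for the upgraded instance.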
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
You can now create pipelines that use files from Azure Blob Storage.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Improved performance for previewing PDF files.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
The responses for textual.redact and textual.llm_synthesis now include:
The language (language) for each entity value.
The start position (new_start) of each entity value in the redacted content.
The end position (new_end) of each entity value in the redacted string.
For uploaded file pipelines that also generate redacted files, you can now configure the handling option for each entity type.
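For the new_start and new_end response fields mentioned above, the following sketch shows how they might be used to pull each replacement out of the redacted text. It assumes that the offsets are character positions with an exclusive end and that each entity can be read like a dictionary. Neither assumption is confirmed by this note, and the helper name redacted_value is hypothetical.

```python
from typing import Any, Mapping


def redacted_value(redacted_text: str, entity: Mapping[str, Any]) -> str:
    """Slice the replacement for one entity out of the redacted text."""
    # Assumption: new_start is inclusive and new_end is exclusive.
    return redacted_text[entity["new_start"]:entity["new_end"]]
```

If the SDK returns entity objects rather than dictionaries, the same slice applies with attribute access instead of key lookup.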
Bug fixes and other internal updates.
Added a HEALTHCARE_ID entity type for identifiers associated with health care.
Removed the right-hand panels from the Home page. Added the API Keys panel to the dataset details page to accompany the code snippets.
Bug fixes and other internal updates.
On the Playground, LLM Synthesis is now turned off by default.
Improved synthesis for DATE_TIME entities by recognizing non-standard date formats.
Added a forgot password option to allow Textual Cloud users to reset their password.
Improved our detection of date values.
Removed the US_DRIVER_LICENSE entity type.
Introduced a new model that improves performance for non-English languages.
The responses for textual.redact and textual.llm_synthesis now include the redacted or synthesized value.
Bug fixes and other internal updates.
When Textual synthesizes values, it now matches the capitalization of the original value.
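One simple way to picture capitalization matching is the sketch below. It is an illustration of the general idea, not Textual's implementation, and the function name match_capitalization is hypothetical. It handles only the all-uppercase, all-lowercase, and title-case patterns and leaves mixed-case originals alone.

```python
def match_capitalization(original: str, replacement: str) -> str:
    """Apply the casing pattern of the original value to the replacement."""
    if original.isupper():
        return replacement.upper()
    if original.islower():
        return replacement.lower()
    if original.istitle():
        return replacement.title()
    # Mixed or irregular casing: keep the replacement unchanged.
    return replacement
```

With this approach, an original value written in capitals yields a synthesized value in capitals, so the replacement does not stand out in the surrounding text.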
For PDF and image files, the pipeline file details now include any tables and key-value pairs that are in the file.
The Pipelines and Datasets pages now display lists instead of cards.
You can now set up a pipeline to connect to a Databricks workspace. You can also configure a Databricks pipeline to generate redacted files in addition to the JSON output.
Textual can now detect values in multiple languages. Textual Cloud supports a set of non-English languages. For a self-hosted instance, you must enable multi-language support and provide the language models for Textual to use.
Textual no longer includes the option to create custom models to use for datasets.
Textual now supports a pay-as-you-go option for Textual Cloud. When you set up a pay-as-you-go subscription, you provide a credit card that is automatically billed each month for your Textual usage.
Bug fixes and other internal updates.
Redesigned the structure of the JSON output.
Bug fixes and other internal updates.
NOTE: This version was removed because of a regression.
Fixed an issue where pipelines failed to run.
Updated the Python SDK to add options to create pipelines, delete pipelines, upload pipeline files, and download pipeline results.
Disabled the Run Pipeline option when a pipeline does not have a configured output path.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
The user menu now displays the current Textual version.
Fixed an issue where you could not upload a file that had the same name as a file that was deleted.
Added entity types for money, a person's age, date of birth, organizations, and countries. Improved the detection of occupation.
Updated the display of JSON output to add collapse/expand functionality.
The Snowflake Native App now supports JSON files.
Added a new Next Steps panel for pipelines. The panel includes copyable code snippets.
Bug fixes and other internal updates.
Updated the getting started flow to focus on pipelines.
Bug fixes and other internal updates.
You can now configure an Amazon S3 pipeline to also generate redacted versions of the original files.
For self-hosted instances, Textual no longer requires you to input AWS credentials if they are available from the environment.
The Snowflake Native App now supports parsing files to produce JSON output.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Added support to provide AWS session tokens for pipelines.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
The Entities tab of the pipeline file details no longer displays the tabular list of entities. It only displays the detected entities in context.
For uploaded file pipelines, removed the run option and list of runs. Uploaded file pipelines now run automatically to process each new file as it is added.
Bug fixes and other internal updates.
For Amazon S3 pipelines, the file selection now highlights buckets that contain selected files or folders.
On the file details page for a pipeline file, changed the Redactions tab to Entities.
Added an option to copy the identifier of a pipeline run.
Updated the onboarding flow and redesigned the Home page.
Added a confirmation modal for canceling a pipeline run.
Fixed a display issue with pipeline titles.
You can now delete a newly uploaded file without having to refresh the page.
Fixed an issue with the status information for pipeline files.
For plain text pipeline files, removed the option to toggle between Markdown and the original content.
Fixed an issue with previewing PDF files.
The file list for a pipeline run no longer displays while the run is queued.
Bug fixes and other internal updates.
Added support for Amazon S3 files in pipelines.
Added support for image files.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Added support for .xlsx files.
Added support for headers, footers, footnotes, and endnotes in .docx files.
Textual can now detect gender labels and occupations.
Fixed an issue with .docx files that started with an empty paragraph.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Improved handling of emojis.
Bug fixes and other internal updates.
Bug fixes and other internal updates.
Fixed an issue with the preview of .docx files.
Added support for image files.
Added an LLM synthesis option to the Playground.
Updated the SDK to support redacting individual text and .csv files.