You can now configure the entity type handling for a dataset before you upload the dataset files.
You can now provide added and excluded entity values when you use the SDK to redact individual strings and files.
Added a new method to the SDK. redact_xml works similarly to redact_json. To produce redacted output, you pass in an XML string.
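As a rough illustration of what redacting an XML string involves, the following sketch uses only the Python standard library. It is not the SDK's implementation and does not call redact_xml. The redact_text helper and its email pattern are hypothetical stand-ins for Textual's entity detection.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for entity detection: masks anything that looks like an email.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_text(value: str) -> str:
    return EMAIL.sub("[EMAIL_ADDRESS]", value)


def redact_xml_sketch(xml_data: str) -> str:
    """Redact text nodes and attribute values while keeping the XML structure."""
    root = ET.fromstring(xml_data)
    for element in root.iter():
        if element.text:
            element.text = redact_text(element.text)
        if element.tail:
            element.tail = redact_text(element.tail)
        # Copy the items first so the attributes can be updated safely.
        for name, value in list(element.attrib.items()):
            element.set(name, redact_text(value))
    return ET.tostring(root, encoding="unicode")


sample = '<user id="7"><contact>jane@example.com</contact></user>'
redacted = redact_xml_sketch(sample)
print(redacted)
```

The point of the sketch is that the element names and nesting survive unchanged, and only the values are replaced.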
Removed support for the en_core_web_trf and en_core_web_lg auxiliary models. Disabled model inference for the ORGANIZATION, PERSON, LOCATION, and MONEY entity types. Updated the auxiliary model configuration environment variables to have new default values:
TEXTUAL_AUX_MODEL_GPU: false
TEXTUAL_AUX_MODEL: en_core_web_sm
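If you generate deployment configuration from a script, the new defaults can be mirrored as in the sketch below. How Textual itself reads these variables is not described here. The resolve_aux_model_settings helper is hypothetical and only shows the fallback behavior implied by the defaults listed above.

```python
import os

# Default values as listed in the release note above.
AUX_MODEL_DEFAULTS = {
    "TEXTUAL_AUX_MODEL_GPU": "false",
    "TEXTUAL_AUX_MODEL": "en_core_web_sm",
}


def resolve_aux_model_settings(environ=None):
    """Return the effective settings: explicit values win, otherwise the defaults."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in AUX_MODEL_DEFAULTS.items()}


# With nothing set, the documented defaults apply.
settings = resolve_aux_model_settings({})
# An explicit value overrides the default for that variable only.
gpu_settings = resolve_aux_model_settings({"TEXTUAL_AUX_MODEL_GPU": "true"})
```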
Fixed a redaction issue that was caused by a regression from v140.
Improved the Textual NER model, specifically for datetime values and electronic health records.
Fixed an issue that prevented files from being correctly re-synthesized as part of pipelines.
When you call the dataset.add_file method in the Textual SDK, you can now pass in IO bytes.
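The sketch below shows the general shape of passing an in-memory byte stream instead of a file path. The keyword names file_name and file are assumptions, so check the SDK reference for the actual signature. FakeDataset is a hypothetical stand-in that lets the example run without a Textual server.

```python
import io


class FakeDataset:
    """Hypothetical stand-in for an SDK dataset object, used only so this runs offline."""

    def __init__(self):
        self.files = {}

    def add_file(self, file_name, file):
        # Assumed keyword names; the real SDK method may differ.
        self.files[file_name] = file.read()


dataset = FakeDataset()

# Build the file content in memory rather than writing it to disk first.
payload = io.BytesIO(b"Patient: Jane Doe, seen on 2024-03-01.")
dataset.add_file(file_name="note.txt", file=payload)
```

Passing a stream is useful when the content comes from another service or is generated on the fly, because no temporary file is needed.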
You can now specify a list of additional values to include for each entity type in a dataset. This allows Textual to identify values that it might otherwise miss because they are specific to your organization or industry. The list can contain both specific values and regular expressions.
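To make the mix of specific values and regular expressions concrete, here is a minimal sketch of how such a list could be matched against text. It is an illustration only, not how Textual evaluates the list. The find_added_values helper and the sample values are hypothetical.

```python
import re


def find_added_values(text, patterns):
    """Return sorted (start, end, matched_text) spans for each pattern in the list."""
    spans = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), match.group()))
    return sorted(spans)


# Hypothetical organization-specific values: one literal (escaped) and one regex.
project_patterns = [re.escape("Project Bluebird"), r"EMP-\d{4}"]

text = "EMP-1042 was assigned to Project Bluebird."
spans = find_added_values(text, project_patterns)
matched = [value for _, _, value in spans]
```

Escaping the literal value keeps characters such as periods or parentheses from being interpreted as regular expression syntax.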
Improved the file list display for datasets to better accommodate longer file names.
For an uploaded file pipeline, added a Back to Files breadcrumb to return the user to the main file list.
On the dataset details page, the bulk edit function for entity type handling is now a dropdown instead of separate buttons.
You can now use the Python SDK to delete files from a dataset.
To improve performance, enabled date synthesis inference on GPU. Added the environment variable TEXTUAL_DATE_SYNTH_GPU to control whether date synthesis uses the GPU.
Renamed the following environment variables:
SOLAR_PREFER_GPU to TEXTUAL_AUX_MODEL_GPU
SOLAR_AUX_MODEL to TEXTUAL_AUX_MODEL
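If your existing configuration still uses the old names, a small migration step like the following sketch can rewrite it. The migrate_environment helper is hypothetical. It assumes that when both the old and new name are present, the new name should take precedence.

```python
# Old name -> new name, as listed in the release note above.
RENAMED_VARIABLES = {
    "SOLAR_PREFER_GPU": "TEXTUAL_AUX_MODEL_GPU",
    "SOLAR_AUX_MODEL": "TEXTUAL_AUX_MODEL",
}


def migrate_environment(env):
    """Return a copy of env with old names replaced; new names win if both are set."""
    migrated = {key: value for key, value in env.items() if key not in RENAMED_VARIABLES}
    for old_name, new_name in RENAMED_VARIABLES.items():
        if old_name in env and new_name not in migrated:
            migrated[new_name] = env[old_name]
    return migrated


old_env = {"SOLAR_PREFER_GPU": "true", "OTHER": "1"}
new_env = migrate_environment(old_env)
```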