Custom entity types - You can now define custom entity types to identify sets of values that are unique to your industry or organization. For each custom entity type, you provide one or more regular expressions that define the matching values. You can then enable or disable the entity type for each dataset and pipeline.
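As a rough illustration of the idea, the following sketch models a custom entity type as a name plus a list of regular expressions. The `CustomEntityType` class, the `CLAIM_ID` type, and its patterns are hypothetical and are not part of the product API; they only show how one or more expressions can define the matching values and how a disabled type finds nothing.

```python
import re
from dataclasses import dataclass


@dataclass
class CustomEntityType:
    """Hypothetical model of a custom entity type (not the product's API)."""
    name: str
    patterns: list[str]   # one or more regular expressions
    enabled: bool = True  # stands in for the per-dataset or per-pipeline toggle

    def find_values(self, text: str) -> list[str]:
        """Return every substring matched by any pattern, grouped by pattern order."""
        if not self.enabled:
            return []
        matches = []
        for pattern in self.patterns:
            matches.extend(m.group(0) for m in re.finditer(pattern, text))
        return matches


# Assumed example: an insurer's claim identifiers in two formats.
claim_id = CustomEntityType(
    name="CLAIM_ID",
    patterns=[r"\bCLM-\d{6}\b", r"\bCL\d{8}\b"],
)
found = claim_id.find_values("See CLM-004217 and CL20240031 for details.")
```

Here `found` holds both identifiers, one from each pattern. Setting `enabled=False` makes `find_values` return an empty list, which mirrors disabling the type for a given dataset or pipeline.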
Fixed an issue that caused slow load times for customers with large datasets. Calls to GET /api/dataset and GET /api/dataset/{datasetid} no longer return entity information for the dataset files. Instead, the new GET /api/dataset/{datasetid}/pii_info endpoint returns the entity information for a dataset’s files.
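For clients that previously read entity information from the dataset calls, the change amounts to one additional request. The sketch below is a minimal example that uses only the Python standard library. The base URL, the `Authorization` header format, and the helper names are assumptions for illustration; only the endpoint paths come from this note, and the response shape is not shown here.

```python
import json
import urllib.request
from urllib.parse import quote


def dataset_url(base_url: str, dataset_id: str) -> str:
    """Build the URL for GET /api/dataset/{datasetid}."""
    return f"{base_url.rstrip('/')}/api/dataset/{quote(dataset_id, safe='')}"


def pii_info_url(base_url: str, dataset_id: str) -> str:
    """Build the URL for the new GET /api/dataset/{datasetid}/pii_info endpoint."""
    return dataset_url(base_url, dataset_id) + "/pii_info"


def get_json(url: str, api_key: str, opener=urllib.request.urlopen):
    """Send a GET request and parse the JSON body.

    The Authorization header format is an assumption; check your
    instance's authentication requirements.
    """
    request = urllib.request.Request(url, headers={"Authorization": api_key})
    with opener(request) as response:
        return json.load(response)


# Hypothetical usage (host and key are placeholders):
# dataset = get_json(dataset_url("https://textual.example.com", "abc123"), "MY_KEY")
# entities = get_json(pii_info_url("https://textual.example.com", "abc123"), "MY_KEY")
```

Keeping the entity lookup in a separate call means that code that only lists datasets no longer pays for entity information it does not use.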
A new dataset settings option controls the output in .docx tables. By default, table content goes through the regular scan and redaction process, and detected entity values are handled based on the dataset’s entity type handling configuration. You can also choose to completely block out all table cells, in which case each table cell is covered by a black box.
File statistics for pipelines - The pipeline details page now displays a summary of information about the pipeline’s files, including the number of files, the number of words in the files, the number of detected entity types, and the number of detected topics. For entity types, the display includes the number of detected values for each type. For topics, the display includes the number of files that involve each topic.
On the dataset details page, the preview count for each entity type now reflects the count of values that are assigned that type in the output files. Previously, values that matched multiple entity types were included in the preview count for all of the matching types.
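The difference between the two counting rules can be shown with a small sketch. The detection records below, including the `matching_types` and `assigned_type` field names and the entity type labels, are made up for illustration and do not reflect the product's data model or how it chooses the assigned type.

```python
from collections import Counter

# Hypothetical detections: each value may match several entity types,
# but is assigned exactly one type in the output files.
detections = [
    {"value": "Jordan", "matching_types": ["NAME_GIVEN", "LOCATION"], "assigned_type": "NAME_GIVEN"},
    {"value": "Paris", "matching_types": ["LOCATION"], "assigned_type": "LOCATION"},
    {"value": "Georgia", "matching_types": ["NAME_GIVEN", "LOCATION"], "assigned_type": "LOCATION"},
]

# Previous behavior: a value counted toward every type that it matched.
previous_counts = Counter(t for d in detections for t in d["matching_types"])

# Current behavior: a value counts only toward its assigned type.
current_counts = Counter(d["assigned_type"] for d in detections)
```

With this sample data, the previous rule reports five values in total for three detections, while the current rule reports exactly three, one per detected value.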