Trainable Classifier Lab

What trainable classifiers are, when to use one instead of a sensitive information type (SIT), exact data match (EDM), or document fingerprint, and what building your own really takes.

Pattern matching vs machine learning

Purview classifies content in two fundamentally different ways. Knowing which problem you have is most of the decision.

SIT

Sensitive Information Types

Pattern matching: a regex or keyword finds the data, and nearby evidence confirms the context. If you can describe the data as a pattern, this is your tool.

Detects: credit cards, IDs, account numbers, codenames
You control: every part of the pattern
Feedback loop: instant, test and adjust in minutes

Build one in the Custom SIT Builder →
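
To make the pattern-plus-evidence idea concrete, here is a minimal Python sketch. It is an illustration only, not how Purview implements SITs: the regex, the keyword list, the 300-character window, and the function name are all assumptions chosen for the example.

```python
import re

# Hypothetical pattern: 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")
# Hypothetical supporting keywords that count as nearby evidence.
EVIDENCE_KEYWORDS = ("visa", "mastercard", "card number", "expiry")
# Assumed proximity window, in characters either side of the match.
WINDOW = 300


def find_card_like_numbers(text: str) -> list[dict]:
    """Return each pattern match, graded by whether evidence appears nearby."""
    results = []
    for match in CARD_PATTERN.finditer(text):
        start = max(0, match.start() - WINDOW)
        end = min(len(text), match.end() + WINDOW)
        nearby = text[start:end].lower()
        has_evidence = any(keyword in nearby for keyword in EVIDENCE_KEYWORDS)
        results.append({
            "match": match.group(),
            "confidence": "high" if has_evidence else "low",
        })
    return results
```

The same number scores differently depending on what surrounds it, which is the point of supporting evidence: the pattern finds candidates, and the context decides how much to trust them.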

Trainable

Trainable Classifiers

Machine learning: the classifier learns a category from hundreds of examples and recognises a contract or a CV the way a person does, by the content as a whole.

Detects: contracts, source code, financials, resumes
You control: the examples it learns from
Feedback loop: slow, training cycles take days

Browse the pretrained catalog →

The custom classifier lifecycle

Pretrained classifiers work immediately. A custom one is a small project, and most of it is collecting good examples.

Check the built-in catalog

Minutes

Purview comes with pretrained classifiers (source code, resumes, financial statements and more) that work straight away. If one covers your category, stop here: custom classifiers are real work.

Collect seed content

Days to weeks

Provide 50 to 500 positive samples that clearly belong in the category and 150 to 1,500 negative samples that clearly do not. A human picks these, and their quality decides the project. If you supply more, only the 2,000 most recently created samples are processed.
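
A quick pre-flight check on the seed sets can catch count problems before anything is uploaded. The sketch below assumes you have already counted your files and, for the cap, that you hold a list of (name, created timestamp) pairs. The function names are hypothetical, and whether the 2,000 cap applies per folder or in total is not stated here, so the helper simply trims whatever list it is given.

```python
from datetime import datetime

POSITIVE_RANGE = (50, 500)     # samples that clearly belong in the category
NEGATIVE_RANGE = (150, 1500)   # samples that clearly do not
MAX_PROCESSED = 2000           # only the newest samples up to this cap are processed


def check_seed_counts(positives: int, negatives: int) -> list[str]:
    """Return a list of problems; an empty list means both counts are in range."""
    problems = []
    for label, count, (low, high) in (
        ("positive", positives, POSITIVE_RANGE),
        ("negative", negatives, NEGATIVE_RANGE),
    ):
        if not low <= count <= high:
            problems.append(f"{label}: {count} samples, expected {low} to {high}")
    return problems


def newest_samples(samples: list[tuple[str, datetime]], limit: int = MAX_PROCESSED):
    """Return up to `limit` samples, newest first, by their timestamp."""
    return sorted(samples, key=lambda s: s[1], reverse=True)[:limit]
```

Running `newest_samples` over a folder listing shows which files would fall outside the cap, so older but important examples can be re-saved or swapped in deliberately.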

Stage in SharePoint

Hours

Positive and negative samples go in separate, dedicated SharePoint folders containing nothing else. Use a Communication site, not a Teams folder, and allow an hour for indexing if the folders are new.

Create and train

Up to 24 hours

Point the portal at the positive folder, then the negative one. The model builds within 24 hours, and automated testing (in preview) has cut the whole workflow from around 12 days to about two.

Review predictions

Days

Work through the test results, confirming or rejecting each prediction. Poor accuracy is cheap to fix at this stage: add seed data and retrain. After publishing, it is not.
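
If you keep your own notes while reviewing, a small tally helps decide whether the classifier is ready. This sketch assumes each review is recorded as a pair (what the classifier predicted, what the reviewer decided). The function name and record shape are hypothetical, and the portal reports its own figures, which this does not replace.

```python
def review_summary(reviews: list[tuple[bool, bool]]) -> dict[str, float]:
    """Summarise (predicted_match, reviewer_says_match) pairs as simple ratios."""
    tp = sum(1 for predicted, actual in reviews if predicted and actual)
    fp = sum(1 for predicted, actual in reviews if predicted and not actual)
    fn = sum(1 for predicted, actual in reviews if not predicted and actual)
    tn = sum(1 for predicted, actual in reviews if not predicted and not actual)
    total = len(reviews)
    return {
        # Share of all predictions the reviewer agreed with.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the items flagged as matches, how many really were.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the real matches, how many the classifier found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Low precision suggests the negative set needs more near-miss examples. Low recall suggests the positive set does not cover the category's variety.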

Publish and monitor

Ongoing

Published classifiers become conditions in auto-labelling, auto-apply retention, and DLP. The catch: a published classifier cannot be retrained. To improve one, delete it and rebuild with bigger sample sets.

Where classifiers can be used

Trainable classifiers are conditions, the same as SITs: policies reference them. One gap catches people out.

Solution	Built-in classifiers	Custom classifiers
Auto-labelling with sensitivity labels	✓	✓
Auto-apply retention label policies	✓	✓
Data Loss Prevention policies	✓	✓
Communication Compliance	✓	✗

Communication Compliance accepts Microsoft-provided classifiers only. Custom trainable classifiers are not supported.
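
The support matrix can be encoded as a lookup to sanity-check a project plan before any training starts. This is a sketch: the keys and the function name are hypothetical, and the values simply mirror the table above.

```python
# Mirrors the table above; keys are illustrative labels, not Purview identifiers.
SUPPORT = {
    "auto-labelling": {"built-in": True, "custom": True},
    "auto-apply retention": {"built-in": True, "custom": True},
    "dlp": {"built-in": True, "custom": True},
    "communication compliance": {"built-in": True, "custom": False},
}


def unsupported_uses(planned_solutions: list[str], classifier_kind: str) -> list[str]:
    """Return the planned solutions that cannot use this kind of classifier."""
    return [s for s in planned_solutions if not SUPPORT[s][classifier_kind]]
```

For example, `unsupported_uses(["dlp", "communication compliance"], "custom")` flags the Communication Compliance gap before anyone collects seed data for it.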

The constraints that shape projects

Five facts worth knowing before anyone commits to a custom classifier. Each one has derailed a real project.

Custom classifiers are English only

Built-in classifiers evaluate multiple languages, but custom trainable classifiers only support English content.

Encrypted items are invisible

Classifiers only work with items that are not encrypted. Content protected with encrypting sensitivity labels will not be evaluated.

No retraining after publish

Retraining a published custom classifier is not supported. If accuracy is poor in production, you remove the classifier and rebuild it with larger, better sample sets. Get it right before you publish.

Creator-only by default

By default, only the account that creates a custom classifier can train it and review its predictions. Pick the owner deliberately, not whoever happened to be logged in.

E5-level licensing

Trainable classifiers sit in the Microsoft 365 E5 / E5 Compliance feature set. Check the Microsoft 365 licensing guidance for security and compliance for specifics.