SIT Recipe Library

Proven custom SIT patterns for detection needs that the built-in SITs don't cover. Every recipe is tested against the same engine that powers the Custom SIT Builder and opens there with sample text loaded, so you can see it match before adapting it to your tenant.

Got a recipe that works in production? Share it in the Discord and it can join the library.

Credential lists in spreadsheets

The built-in General Password and User Login Credentials SITs match key-value shapes like password=X in code, config, and logs. A spreadsheet with a username column and a password column never matches that shape, because the extracted text puts the headers in one row and the values in others. This recipe flags the file by its headers instead.

High: keywords password, passwords, pwd, passphrase + username, usernames, user name, login, user id, account name within 50 chars
Medium: keywords password, passwords, pwd, passphrase + username, usernames, user name, login, user id, account name within 300 chars
  • Keyword lists are word matches, so "password" does not match inside "passwords". The list carries both forms.
  • Headers appear once per file, so keep any DLP rule on this SIT at a low instance count. Instance counts also dedupe by value: the same word twice counts once.
  • Keep using the built-in credential SITs for code and config. This recipe only covers the file shape they miss.
Sample that matches: username passwords
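
To see how the two confidence tiers and the instance count interact, here is a rough Python sketch. It is not Purview's engine: whole-word matching is approximated with \b boundaries, the proximity window is assumed to extend the stated number of characters on either side of a primary keyword, and deduplication is assumed to ignore case. The helper names hits, best_confidence and evaluate are illustrative only.

```python
import re

# Keyword lists from the recipe. Whole-word matching is approximated with \b,
# so "password" does not fire inside "passwords"; both forms are listed.
PRIMARY = ["password", "passwords", "pwd", "passphrase"]
SUPPORTING = ["username", "usernames", "user name", "login", "user id", "account name"]


def hits(text, words):
    """Return (start, end, matched_text) for each whole-word, case-insensitive hit."""
    found = []
    for word in words:
        for m in re.finditer(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE):
            found.append((m.start(), m.end(), m.group(0)))
    return found


def best_confidence(primaries, supporting):
    """'high' if a supporting hit is within 50 chars of a primary hit, 'medium' within 300."""
    level = None
    for p_start, p_end, _ in primaries:
        for s_start, s_end, _ in supporting:
            # Characters between the two hits (0 if they touch or overlap).
            gap = max(s_start - p_end, p_start - s_end, 0)
            if gap <= 50:
                return "high"
            if gap <= 300:
                level = "medium"
    return level


def evaluate(text):
    """Return (confidence, instance_count) for the header-based credential recipe."""
    primaries = hits(text, PRIMARY)
    level = best_confidence(primaries, hits(text, SUPPORTING))
    # Assumed dedupe: repeated header words count once, ignoring case.
    count = len({matched.lower() for _, _, matched in primaries}) if level else 0
    return level, count


print(evaluate("username passwords"))  # ('high', 1)
print(evaluate("Quarterly report"))    # (None, 0)
```

A sheet that repeats the header row still reports one instance, which is why a low instance count threshold matters for this SIT.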

Connection strings with embedded passwords

Database connection strings with a plain-text Password= parameter, sitting in runbooks, wiki exports, and handover docs. The built-in credential SITs cover the big cloud formats but cannot be tuned; this catches generic on-prem strings and is fully yours to adjust.

High: regex (?:Server|Data Source|Host)\s*=[^;\n]{1,100};[^\n]{0,200}?Password\s*=\s*[^;\s]{4,} + database, user id, initial catalog, integrated security, connection string, port within 100 chars, 1 check
  • The pattern needs the server and password parts in the same string, which keeps it quiet on prose that merely mentions servers and passwords.
  • Add "Exclude specific matches" entries for known placeholder values like Password=changeme in your template docs.
  • Works on text the way Purview sees it after extraction, so it catches strings pasted into Word and OneNote too.
Sample that matches: Connect with: Server=sql01.internal;Database=crm;User Id=svc_crm;Password=Pr0d!2024;
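
A quick way to try the pattern outside the portal is Python's re module, which accepts this syntax; Purview runs its own regex engine, so treat the results as indicative only. The sketch adds a hypothetical named group, pwd, around the password value (a capture group does not change what matches), and the PLACEHOLDERS set is a simplified stand-in for "Exclude specific matches" that checks only the password value.

```python
import re

# The recipe's pattern, with a named group around the password value for inspection.
CONN_RE = re.compile(
    r"(?:Server|Data Source|Host)\s*=[^;\n]{1,100};"
    r"[^\n]{0,200}?Password\s*=\s*(?P<pwd>[^;\s]{4,})"
)

# Hypothetical placeholder values, approximating "Exclude specific matches" entries.
PLACEHOLDERS = {"changeme"}


def find_connection_strings(text):
    """Return matched connection strings whose password is not a known placeholder."""
    return [
        m.group(0)
        for m in CONN_RE.finditer(text)
        if m.group("pwd") not in PLACEHOLDERS
    ]


sample = "Connect with: Server=sql01.internal;Database=crm;User Id=svc_crm;Password=Pr0d!2024;"
template = "Server=sqlXX;Database=app;Password=changeme;"
prose = "The server password is kept in the vault."

print(find_connection_strings(sample))
# ['Server=sql01.internal;Database=crm;User Id=svc_crm;Password=Pr0d!2024']
print(find_connection_strings(template))  # []
print(find_connection_strings(prose))     # []
```

The prose line shows why the server-then-password structure keeps the pattern quiet: mentioning both words is not enough without the key=value shape.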

Internal API tokens and service keys

Microsoft's credential pack covers the big vendors. Your internal gateway tokens, service keys, and signing secrets follow whatever format your platform team picked, and only a custom SIT will find them. Adapt the prefix and length to your format.

Medium: regex \b(?:tok|key|svc)_[A-Za-z0-9]{24,40}\b
High: regex \b(?:tok|key|svc)_[A-Za-z0-9]{24,40}\b + api key, token, secret, bearer, credential, rotate within 100 chars
  • Change the tok_, key_, svc_ prefixes to whatever your tokens actually start with. The fixed prefix is what keeps false positives near zero.
  • The bare format sits at medium; the high pattern wants a nearby keyword like "api key" or "secret". Route blocking rules at high and monitoring at medium.
  • If your tokens have no prefix at all, expect noise: a bare 32-character alphanumeric regex matches GUIDs written without hyphens, hashes, and file names. Add context keywords before trusting it.
Sample that matches: Please rotate the gateway api key tok_9f2Kb7Qp4Lm8Rx3Tz6Vw1Ny5Jc8H before Friday.
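
The medium/high split can be sketched in Python as below. The 100-character window is assumed to extend on either side of the token, context words are matched as whole words ignoring case, and classify_tokens is a hypothetical helper, not part of any Purview API.

```python
import re

# The recipe's token shape; swap tok|key|svc for your platform's real prefixes.
TOKEN_RE = re.compile(r"\b(?:tok|key|svc)_[A-Za-z0-9]{24,40}\b")
CONTEXT = ["api key", "token", "secret", "bearer", "credential", "rotate"]
WINDOW = 100  # assumed to extend this many characters either side of the token


def classify_tokens(text):
    """Return (token, confidence): 'high' with a context keyword nearby, else 'medium'."""
    results = []
    for m in TOKEN_RE.finditer(text):
        window = text[max(0, m.start() - WINDOW):m.end() + WINDOW]
        nearby = any(
            re.search(r"\b" + re.escape(word) + r"\b", window, re.IGNORECASE)
            for word in CONTEXT
        )
        results.append((m.group(0), "high" if nearby else "medium"))
    return results


print(classify_tokens("Please rotate the gateway api key tok_9f2Kb7Qp4Lm8Rx3Tz6Vw1Ny5Jc8H before Friday."))
# [('tok_9f2Kb7Qp4Lm8Rx3Tz6Vw1Ny5Jc8H', 'high')]
print(classify_tokens("Build artifact tok_9f2Kb7Qp4Lm8Rx3Tz6Vw1Ny5Jc8H uploaded."))
# [('tok_9f2Kb7Qp4Lm8Rx3Tz6Vw1Ny5Jc8H', 'medium')]
```

The leading \b also means a prefix buried inside a longer word, such as mytok_..., does not match.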

Customer account numbers

Eight digits is a noisy shape: dates, order IDs, and phone fragments all collide with it. This recipe shows how additional checks rescue a weak format: constrain the leading digit, reject repeated-digit strings, and require account context for high confidence.

Low: regex \b\d{8}\b, 2 checks
High: regex \b\d{8}\b + account number, account no, customer account, acct within 100 chars, 2 checks
  • Set "Starts with" to the ranges your account numbers actually use. Every excluded leading digit removes a slice of false positives.
  • "Exclude duplicate characters" kills 11111111-style test data, which is the most common false positive in real tenants.
  • If your account numbers carry a check digit, validate the format in the portal with the checksum validator. The Builder flags where checks like these belong.
Sample that matches: Refund approved for customer account number 48217634.
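
The two checks and the context requirement can be approximated in Python like this. ALLOWED_FIRST_DIGITS is a hypothetical "Starts with" setting, the duplicate check treats any string made of one repeated digit as excluded, and the 100-character window is assumed to extend on either side of the match.

```python
import re

ACCOUNT_RE = re.compile(r"\b\d{8}\b")
# Hypothetical "Starts with" setting: replace with the leading digits your accounts use.
ALLOWED_FIRST_DIGITS = set("4567")
CONTEXT = ["account number", "account no", "customer account", "acct"]
WINDOW = 100  # assumed to extend this many characters either side of the match


def passes_checks(value):
    """The two additional checks: allowed leading digit, and not one repeated digit."""
    return value[0] in ALLOWED_FIRST_DIGITS and len(set(value)) > 1


def has_context(text, start, end):
    """True if a context keyword (whole word, any case) sits inside the window."""
    window = text[max(0, start - WINDOW):end + WINDOW]
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", window, re.IGNORECASE)
        for word in CONTEXT
    )


def classify_accounts(text):
    """Return (value, confidence) for each eight-digit match that survives the checks."""
    results = []
    for m in ACCOUNT_RE.finditer(text):
        if not passes_checks(m.group(0)):
            continue  # rejected by a check, so not reported at any confidence level
        level = "high" if has_context(text, m.start(), m.end()) else "low"
        results.append((m.group(0), level))
    return results


print(classify_accounts("Refund approved for customer account number 48217634."))
# [('48217634', 'high')]
print(classify_accounts("Invoice date 20240115"))  # []
```

The date is rejected only because this hypothetical setting disallows a leading 2; with your real ranges, some dates may still get through at low.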

Board and executive material

Board packs leak through ordinary sharing, not hacking. A keyword-primary SIT that recognises the standing vocabulary of board material gives DLP and auto-labeling something to anchor on without any regex at all.

Medium: keywords board pack, board minutes, remuneration committee, audit committee, executive session
High: keywords board pack, board minutes, remuneration committee, audit committee, executive session + confidential, restricted, not for distribution, draft within 100 chars
  • Replace the keyword list with the headings your own board template actually uses. The narrower the vocabulary, the cleaner the matches.
  • Keywords are the primary element here, so one phrase anywhere in a document is a match at medium. That is deliberately broad: route it to monitoring, not blocking.
  • For enforcement, prefer labeling the board pack template itself, or document fingerprinting if the format is stable. This SIT is the detection net underneath.
Sample that matches: Confidential: the Q3 board pack is attached for review ahead of Thursday.
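
A sketch of the keyword-primary logic in Python: any primary phrase yields medium on its own, and a supporting word inside the window lifts it to high. Matching is whole-word and case-insensitive, the 100-character window is assumed to extend on either side of the primary phrase, and board_confidence is an illustrative name rather than anything Purview exposes.

```python
import re

PRIMARY = ["board pack", "board minutes", "remuneration committee",
           "audit committee", "executive session"]
SUPPORTING = ["confidential", "restricted", "not for distribution", "draft"]
WINDOW = 100  # assumed to extend this many characters either side of the primary hit


def spans(text, words):
    """(start, end) of each whole-word, case-insensitive hit from a keyword list."""
    return [
        m.span()
        for word in words
        for m in re.finditer(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
    ]


def board_confidence(text):
    """'medium' for any primary phrase, 'high' if a supporting word is within WINDOW."""
    primaries = spans(text, PRIMARY)
    if not primaries:
        return None
    supporting = spans(text, SUPPORTING)
    for p_start, p_end in primaries:
        for s_start, s_end in supporting:
            # Characters between the two hits (0 if they touch or overlap).
            if max(s_start - p_end, p_start - s_end, 0) <= WINDOW:
                return "high"
    return "medium"


print(board_confidence("Confidential: the Q3 board pack is attached for review ahead of Thursday."))  # high
print(board_confidence("The audit committee meets on Tuesday."))  # medium
print(board_confidence("Quarterly sales figures attached."))      # None
```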

Recipes simulate Purview's evaluation in your browser. Build the final version in the portal and confirm with its SIT test function before enforcing anything.