Large language models (LLMs) can be trained or fine-tuned on data obtained without the owner’s consent. Verifying whether a specific LLM was trained on particular data instances—or on an entire dataset—is extremely challenging. Dataset watermarking addresses this by embedding identifiable modifications in training data to detect unauthorized use. However, existing methods often lack stealth, making them relatively easy to detect and remove.
We propose LexiMark, a novel watermarking technique for text and documents that embeds synonym substitutions for carefully selected high-entropy words. The substitutions strengthen an LLM's memorization of the watermarked text without altering its semantic integrity. As a result, the watermark blends seamlessly into the text, with no visible markers, and resists both automated and manual removal.
We evaluate LexiMark on baseline datasets from recent studies and seven open-source models: LLaMA-1 7B, LLaMA-3 8B, Mistral 7B, Pythia 6.9B, and three smaller Pythia variants (160 M, 410 M, 1 B). Our experiments cover continued-pretraining and fine-tuning scenarios. LexiMark consistently achieves higher AUROC scores than prior methods, demonstrating its effectiveness in reliably detecting unauthorized use of watermarked data in LLM training.
LexiMark operates in two phases (sketched in the code below):

1. Word selection: in each sentence, identify the highest-entropy (rarest, most surprising) words, since these are the words an LLM is most likely to memorize.
2. Synonym substitution: replace each selected word with a meaning-preserving synonym of higher entropy, further strengthening memorization while keeping the text natural.

The approach is stealthy (no unnatural glyphs), robust (survives post-training and mild edits), and portable (no auxiliary models required).
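The two phases map naturally onto a short script. The sketch below is illustrative only: it approximates per-word entropy with negative log unigram frequency from the `wordfreq` package and draws candidate synonyms from WordNet via NLTK; the paper's actual entropy estimate, synonym filtering, and semantic checks may differ.

```python
# Hypothetical sketch of LexiMark-style embedding. The entropy proxy
# (negative log unigram frequency) and the WordNet synonym source are
# assumptions, not necessarily the paper's exact choices.
import math
import re

from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")
from wordfreq import word_frequency     # English unigram frequencies


def word_entropy(word: str) -> float:
    """Surprisal proxy: rarer words get higher values."""
    freq = word_frequency(word.lower(), "en")
    return -math.log(freq) if freq > 0 else float("inf")


def synonyms(word: str) -> set[str]:
    """Single-word WordNet synonyms, excluding the word itself."""
    cands = {
        lemma.name().replace("_", " ")
        for syn in wn.synsets(word)
        for lemma in syn.lemmas()
    }
    return {c for c in cands if c.lower() != word.lower() and " " not in c}


def watermark_sentence(sentence: str, k: int = 2) -> str:
    """Phase 1: pick the k highest-entropy words.
    Phase 2: swap each for a higher-entropy synonym, if one exists."""
    tokens = re.findall(r"\w+|\W+", sentence)  # keep punctuation and spacing
    words = [(i, t) for i, t in enumerate(tokens) if t.isalpha()]
    targets = sorted(words, key=lambda it: word_entropy(it[1]), reverse=True)[:k]

    for i, word in targets:
        better = [s for s in synonyms(word) if word_entropy(s) > word_entropy(word)]
        if better:
            tokens[i] = max(better, key=word_entropy)  # rarest candidate synonym
    return "".join(tokens)


if __name__ == "__main__":
    print(watermark_sentence("The lawyer signed the agreement before the meeting."))
```

In practice the substitution step would also filter candidates for part-of-speech and contextual fit, so that the watermarked sentence stays grammatical and semantically faithful to the original.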
| Dataset | Metric | Pythia 6.9 B (None) | Pythia 6.9 B (LexiMark) | LLaMA-1 7 B (None) | LLaMA-1 7 B (LexiMark) | LLaMA-3 8 B (None) | LLaMA-3 8 B (LexiMark) | Mistral 7 B (None) | Mistral 7 B (LexiMark) |
|---|---|---|---|---|---|---|---|---|---|
| BookMIA | AUROC | 69.1 | 94.8 | 73.2 | 95.9 | 79.0 | 96.9 | 84.7 | 96.7 |
| BookMIA | TPR @ 5 % FPR | 13.5 | 79.1 | 18.3 | 84.3 | 24.3 | 84.4 | 30.2 | 90.9 |
| Enron Emails | AUROC | 65.6 | 72.3 | 65.6 | 69.8 | 71.3 | 75.3 | 78.1 | 81.6 |
| Enron Emails | TPR @ 5 % FPR | 11.0 | 23.8 | 11.0 | 19.4 | 12.4 | 21.3 | 27.7 | 31.2 |
| PubMed Abstracts | AUROC | 68.7 | 76.0 | 72.2 | 80.7 | 78.4 | 83.3 | 83.8 | 88.7 |
| PubMed Abstracts | TPR @ 5 % FPR | 17.9 | 25.0 | 23.6 | 35.0 | 35.4 | 41.5 | 48.4 | 58.4 |
| Wikipedia (en) | AUROC | 65.5 | 74.5 | 63.1 | 73.0 | 70.8 | 78.9 | 77.2 | 84.6 |
| Wikipedia (en) | TPR @ 5 % FPR | 10.2 | 16.6 | 12.4 | 19.7 | 14.2 | 22.8 | 18.1 | 31.7 |
| PILE – FreeLaw | AUROC | 67.7 | 83.3 | 57.2 | 61.6 | 70.9 | 87.0 | 80.1 | 92.0 |
| PILE – FreeLaw | TPR @ 5 % FPR | 10.0 | 37.0 | 9.8 | 23.7 | 11.8 | 42.6 | 18.5 | 67.1 |
| USPTO Backgrounds | AUROC | 63.4 | 76.1 | 65.0 | 78.5 | 72.4 | 82.5 | 82.0 | 89.8 |
| USPTO Backgrounds | TPR @ 5 % FPR | 9.2 | 22.7 | 14.5 | 28.3 | 21.1 | 35.2 | 39.6 | 60.6 |
| Dataset | Metric | Pythia 160 M (None) | Pythia 160 M (LexiMark) | Pythia 410 M (None) | Pythia 410 M (LexiMark) | Pythia 1 B (None) | Pythia 1 B (LexiMark) |
|---|---|---|---|---|---|---|---|
| BookMIA | AUROC | 77.5 | 95.0 | 87.3 | 97.0 | 88.1 | 96.2 |
| BookMIA | TPR @ 5 % FPR | 18.0 | 89.5 | 25.0 | 95.9 | 24.5 | 95.0 |
| Enron Emails | AUROC | 79.1 | 85.2 | 84.6 | 87.6 | 85.8 | 89.0 |
| Enron Emails | TPR @ 5 % FPR | 26.8 | 51.3 | 31.0 | 59.2 | 48.0 | 68.4 |
| PubMed Abstracts | AUROC | 69.9 | 77.9 | 86.5 | 89.0 | 93.8 | 96.5 |
| PubMed Abstracts | TPR @ 5 % FPR | 17.9 | 26.8 | 52.9 | 60.2 | 82.7 | 89.8 |
| Wikipedia (en) | AUROC | 68.4 | 74.5 | 76.8 | 84.5 | 80.2 | 87.9 |
| Wikipedia (en) | TPR @ 5 % FPR | 10.0 | 17.0 | 18.1 | 37.9 | 33.4 | 57.1 |
| PILE – FreeLaw | AUROC | 67.2 | 79.8 | 73.9 | 87.1 | 78.1 | 91.4 |
| PILE – FreeLaw | TPR @ 5 % FPR | 13.5 | 34.5 | 18.8 | 46.9 | 23.4 | 64.3 |
| USPTO Backgrounds | AUROC | 69.5 | 79.4 | 80.5 | 89.8 | 83.5 | 92.0 |
| USPTO Backgrounds | TPR @ 5 % FPR | 17.4 | 29.5 | 41.5 | 61.2 | 54.1 | 73.2 |
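For reference, the AUROC and TPR @ 5 % FPR values above can be computed from per-document membership scores with standard tooling. The sketch below assumes scores produced by some membership-inference procedure (higher meaning "more likely seen during training") and uses synthetic data for illustration; it is not the paper's evaluation code.

```python
# Minimal sketch: AUROC and TPR at a fixed false-positive rate from
# per-document membership scores (labels: 1 = member, 0 = non-member).
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def auroc_and_tpr_at_fpr(labels: np.ndarray, scores: np.ndarray,
                         max_fpr: float = 0.05) -> tuple[float, float]:
    """Return (AUROC, TPR at the largest FPR <= max_fpr)."""
    auroc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    tpr_at = tpr[fpr <= max_fpr].max() if np.any(fpr <= max_fpr) else 0.0
    return auroc, tpr_at


# Synthetic example: members tend to score higher than non-members.
rng = np.random.default_rng(0)
labels = np.array([1] * 500 + [0] * 500)
scores = np.concatenate([rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500)])
print(auroc_and_tpr_at_fpr(labels, scores))
```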
@misc{german2025leximarkrobustwatermarkinglexical,
title = {LexiMark: Robust Watermarking via Lexical Substitutions to Enhance Membership Verification of an LLM's Textual Training Data},
author = {Eyal German and Sagiv Antebi and Edan Habler and Asaf Shabtai and Yuval Elovici},
year = {2025},
eprint = {2506.14474},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2506.14474}
}