LexiMark: Robust Watermarking via
Lexical Substitutions to Enhance Membership
Verification of an LLM’s Textual Training Data

Ben-Gurion University of the Negev
[Figure] Overview of the LexiMark watermark-embedding and MIA-based detection workflow.

Abstract

Large language models (LLMs) can be trained or fine-tuned on data obtained without the owner’s consent. Verifying whether a specific LLM was trained on particular data instances—or on an entire dataset—is extremely challenging. Dataset watermarking addresses this by embedding identifiable modifications in training data to detect unauthorized use. However, existing methods often lack stealth, making them relatively easy to detect and remove.

We propose LexiMark, a novel watermarking technique for text and documents that substitutes carefully selected high-entropy words with synonyms. The substitutions boost an LLM's memorization of the watermarked text without altering its semantic integrity. As a result, the watermark blends seamlessly into the text, with no visible markers, while remaining resistant to automated or manual removal.

We evaluate LexiMark on baseline datasets from recent studies and seven open-source models: LLaMA-1 7B, LLaMA-3 8B, Mistral 7B, Pythia 6.9B, and three smaller Pythia variants (160M, 410M, and 1B). Our experiments cover both continued-pretraining and fine-tuning scenarios. LexiMark consistently improves AUROC over prior methods, demonstrating its effectiveness in reliably detecting unauthorized use of watermarked data in LLM training.

Method

LexiMark operates in two phases:

  • Watermark Embedding. For every sentence we (i) rank words by corpus entropy, (ii) select the top-k, and (iii) substitute each with a synonym that yields even higher entropy while retaining syntactic correctness. Stop words and named entities are never substituted (see the embedding sketch after this list).
  • Watermark Detection. After the suspect model is trained, we run any standard black-box MIA (e.g., Min-K%++ with k = 20 %) on the watermarked tokens only. The resulting likelihood gap yields state-of-the-art AUROC with as few as six records (see the detection sketch below).
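
A minimal Python sketch of the embedding phase follows. It approximates word entropy by self-information (-log2 of corpus frequency) and draws candidate synonyms from WordNet; the paper's exact entropy estimate, its concatenation-based synonym selection, and its named-entity filtering are not reproduced here, and all helper names are illustrative.

# Sketch of LexiMark-style watermark embedding (illustrative only).
# "Entropy" is approximated by self-information, -log2 p(w), from a unigram
# frequency table; synonyms come from WordNet. Named-entity filtering and the
# paper's concatenation-based synonym selection are omitted for brevity.
import math
import re
from collections import Counter

from nltk.corpus import stopwords, wordnet as wn  # requires the NLTK corpora to be downloaded

STOP = set(stopwords.words("english"))

def build_freqs(corpus_texts):
    """Unigram relative frequencies used to approximate per-word entropy."""
    counts = Counter(w.lower() for t in corpus_texts for w in re.findall(r"[a-zA-Z]+", t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def self_information(word, freqs, floor=1e-9):
    """Rarer words get a higher score (treated here as higher entropy)."""
    return -math.log2(freqs.get(word.lower(), floor))

def candidate_synonyms(word):
    """Single-token WordNet lemmas that share a synset with `word`."""
    lemmas = {l.name() for s in wn.synsets(word) for l in s.lemmas()}
    return {l for l in lemmas if "_" not in l and l.lower() != word.lower()}

def watermark_sentence(sentence, freqs, k=5):
    """Replace up to k highest-entropy words with a higher-entropy synonym."""
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w.isalpha() and w.lower() not in STOP]
    # Rank candidate positions by (approximate) entropy, highest first.
    candidates.sort(key=lambda i: self_information(words[i], freqs), reverse=True)
    for i in candidates[:k]:
        better = [s for s in candidate_synonyms(words[i])
                  if self_information(s, freqs) > self_information(words[i], freqs)]
        if better:
            # Keep the rarest admissible synonym, i.e. the largest entropy gain.
            words[i] = max(better, key=lambda s: self_information(s, freqs))
    return " ".join(words)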

The approach is stealthy (no unnatural glyphs), robust (it survives post-training and mild edits), and portable (no auxiliary models are required).
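
To make the detection phase concrete, here is a minimal sketch of a Min-K%++-style membership score computed with Hugging Face Transformers. It is a simplification: it scores every token of the input rather than only the watermarked positions, and the checkpoint name in the usage comment is just an example. Scores for watermarked records and for held-out records can then be fed to a standard AUROC computation, as in the tables below.

# Sketch of Min-K%++-style scoring for detection (illustrative only).
# Higher scores on watermarked records than on held-out records suggest the
# suspect model was trained on the watermarked data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_pp_score(model, tokenizer, text, k_frac=0.20, device="cpu"):
    """Mean of the lowest k% normalized per-token log-likelihood scores."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**enc).logits                               # (1, T, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)          # next-token distributions
    targets = enc["input_ids"][:, 1:]                              # observed next tokens
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    mu = (log_probs.exp() * log_probs).sum(-1)                     # E[log p] at each position
    var = (log_probs.exp() * log_probs.pow(2)).sum(-1) - mu.pow(2)
    scores = (token_lp - mu) / var.clamp_min(1e-6).sqrt()          # normalized token scores
    k = max(1, int(k_frac * scores.shape[1]))
    return scores.topk(k, largest=False).values.mean().item()      # average of the lowest k%

# Example usage (any causal-LM checkpoint works; Pythia 6.9B shown here):
# tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-6.9b")
# model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-6.9b")
# print(min_k_pp_score(model, tok, watermarked_text))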

Fine-Tuned LLMs

Comparison of watermarked (LexiMark) and non-watermarked (None) training data across datasets and models (AUROC / TPR @ 5 % FPR).
Settings: k = 5, concatenation-based synonym selection, Min-K%++ 20 % MIA.

Dataset              Metric           Pythia 6.9B       LLaMA-1 7B        LLaMA-3 8B        Mistral 7B
                                      None   LexiMark   None   LexiMark   None   LexiMark   None   LexiMark
BookMIA              AUROC            69.1   94.8       73.2   95.9       79.0   96.9       84.7   96.7
                     TPR @ 5 % FPR    13.5   79.1       18.3   84.3       24.3   84.4       30.2   90.9
Enron Emails         AUROC            65.6   72.3       65.6   69.8       71.3   75.3       78.1   81.6
                     TPR @ 5 % FPR    11.0   23.8       11.0   19.4       12.4   21.3       27.7   31.2
PubMed Abstracts     AUROC            68.7   76.0       72.2   80.7       78.4   83.3       83.8   88.7
                     TPR @ 5 % FPR    17.9   25.0       23.6   35.0       35.4   41.5       48.4   58.4
Wikipedia (en)       AUROC            65.5   74.5       63.1   73.0       70.8   78.9       77.2   84.6
                     TPR @ 5 % FPR    10.2   16.6       12.4   19.7       14.2   22.8       18.1   31.7
PILE – FreeLaw       AUROC            67.7   83.3       57.2   61.6       70.9   87.0       80.1   92.0
                     TPR @ 5 % FPR    10.0   37.0        9.8   23.7       11.8   42.6       18.5   67.1
USPTO Backgrounds    AUROC            63.4   76.1       65.0   78.5       72.4   82.5       82.0   89.8
                     TPR @ 5 % FPR     9.2   22.7       14.5   28.3       21.1   35.2       39.6   60.6

Continued Pretraining

Comparison of watermarked (LexiMark) and non-watermarked (None) training data across datasets and models (AUROC / TPR @ 5 % FPR).
Settings: k = 5, concatenation-based synonym selection, Min-K%++ 20 % MIA.

Dataset              Metric           Pythia 160M       Pythia 410M       Pythia 1B
                                      None   LexiMark   None   LexiMark   None   LexiMark
BookMIA              AUROC            77.5   95.0       87.3   97.0       88.1   96.2
                     TPR @ 5 % FPR    18.0   89.5       25.0   95.9       24.5   95.0
Enron Emails         AUROC            79.1   85.2       84.6   87.6       85.8   89.0
                     TPR @ 5 % FPR    26.8   51.3       31.0   59.2       48.0   68.4
PubMed Abstracts     AUROC            69.9   77.9       86.5   89.0       93.8   96.5
                     TPR @ 5 % FPR    17.9   26.8       52.9   60.2       82.7   89.8
Wikipedia (en)       AUROC            68.4   74.5       76.8   84.5       80.2   87.9
                     TPR @ 5 % FPR    10.0   17.0       18.1   37.9       33.4   57.1
PILE – FreeLaw       AUROC            67.2   79.8       73.9   87.1       78.1   91.4
                     TPR @ 5 % FPR    13.5   34.5       18.8   46.9       23.4   64.3
USPTO Backgrounds    AUROC            69.5   79.4       80.5   89.8       83.5   92.0
                     TPR @ 5 % FPR    17.4   29.5       41.5   61.2       54.1   73.2

BibTeX

@misc{german2025leximarkrobustwatermarkinglexical,
  title        = {LexiMark: Robust Watermarking via Lexical Substitutions to Enhance Membership Verification of an LLM's Textual Training Data},
  author       = {Eyal German and Sagiv Antebi and Edan Habler and Asaf Shabtai and Yuval Elovici},
  year         = {2025},
  eprint       = {2506.14474},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2506.14474}
}