LexiMark: Robust Watermarking via
Lexical Substitutions to Enhance Membership
Verification of an LLM’s Textual Training Data

Ben-Gurion University of the Negev
[Figure] Overview of the LexiMark watermark-embedding and MIA-based detection workflow.

Abstract

Large language models (LLMs) can be trained or fine-tuned on data obtained without the owner’s consent. Verifying whether a specific LLM was trained on particular data instances—or on an entire dataset—is extremely challenging. Dataset watermarking addresses this by embedding identifiable modifications in training data to detect unauthorized use. However, existing methods often lack stealth, making them relatively easy to detect and remove.

We propose LexiMark, a novel watermarking technique for text and documents that substitutes carefully selected high-entropy words with synonyms. The substitutions boost an LLM's memorization of the watermarked text without altering its semantic integrity. As a result, the watermark blends seamlessly into the text, with no visible markers, while remaining resistant to automated or manual removal.

We evaluate LexiMark on baseline datasets from recent studies and seven open-source models: LLaMA-1 7B, LLaMA-3 8B, Mistral 7B, Pythia 6.9B, and three smaller Pythia variants (160M, 410M, and 1B). Our experiments cover both continued-pretraining and fine-tuning scenarios. LexiMark consistently improves AUROC over prior methods, demonstrating its effectiveness in reliably detecting unauthorized use of watermarked data in LLM training.

Method

LexiMark operates in two phases:

  • Watermark Embedding. For every sentence we (i) rank words by corpus entropy, (ii) select the top-k, and (iii) substitute each with a synonym that yields even higher entropy while retaining syntactic correctness. Stop words and named entities are never substituted (see the embedding sketch after this list).
  • Watermark Detection. After the suspect model is trained, we run any standard black-box MIA (e.g., Min-K%++ with k = 20 %) on the watermarked tokens only. The resulting likelihood gap yields state-of-the-art AUROC with as few as six records (see the detection sketch below).
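
A minimal Python sketch of the embedding phase follows. It approximates word entropy by self-information (-log2 of corpus frequency) and draws candidate synonyms from WordNet; the paper's exact entropy estimate, its concatenation-based synonym selection, and its named-entity filtering are not reproduced here, and all helper names are illustrative.

# Sketch of LexiMark-style watermark embedding (illustrative only).
# "Entropy" is approximated by self-information, -log2 p(w), from a unigram
# frequency table; synonyms come from WordNet. Named-entity filtering and the
# paper's concatenation-based synonym selection are omitted for brevity.
import math
import re
from collections import Counter

from nltk.corpus import stopwords, wordnet as wn  # requires the NLTK corpora to be downloaded

STOP = set(stopwords.words("english"))

def build_freqs(corpus_texts):
    """Unigram relative frequencies used to approximate per-word entropy."""
    counts = Counter(w.lower() for t in corpus_texts for w in re.findall(r"[a-zA-Z]+", t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def self_information(word, freqs, floor=1e-9):
    """Rarer words get a higher score (treated here as higher entropy)."""
    return -math.log2(freqs.get(word.lower(), floor))

def candidate_synonyms(word):
    """Single-token WordNet lemmas that share a synset with `word`."""
    lemmas = {l.name() for s in wn.synsets(word) for l in s.lemmas()}
    return {l for l in lemmas if "_" not in l and l.lower() != word.lower()}

def watermark_sentence(sentence, freqs, k=5):
    """Replace up to k highest-entropy words with a higher-entropy synonym."""
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w.isalpha() and w.lower() not in STOP]
    # Rank candidate positions by (approximate) entropy, highest first.
    candidates.sort(key=lambda i: self_information(words[i], freqs), reverse=True)
    for i in candidates[:k]:
        better = [s for s in candidate_synonyms(words[i])
                  if self_information(s, freqs) > self_information(words[i], freqs)]
        if better:
            # Keep the rarest admissible synonym, i.e. the largest entropy gain.
            words[i] = max(better, key=lambda s: self_information(s, freqs))
    return " ".join(words)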

The approach is stealthy (no unnatural glyphs), robust (it survives post-training and mild edits), and portable (no auxiliary models are required).
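
To make the detection phase concrete, here is a minimal sketch of a Min-K%++-style membership score computed with Hugging Face Transformers. It is a simplification: it scores every token of the input rather than only the watermarked positions, and the checkpoint name in the usage comment is just an example. Scores for watermarked records and for held-out records can then be fed to a standard AUROC computation, as in the tables below.

# Sketch of Min-K%++-style scoring for detection (illustrative only).
# Higher scores on watermarked records than on held-out records suggest the
# suspect model was trained on the watermarked data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_pp_score(model, tokenizer, text, k_frac=0.20, device="cpu"):
    """Mean of the lowest k% normalized per-token log-likelihood scores."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**enc).logits                               # (1, T, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)          # next-token distributions
    targets = enc["input_ids"][:, 1:]                              # observed next tokens
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    mu = (log_probs.exp() * log_probs).sum(-1)                     # E[log p] at each position
    var = (log_probs.exp() * log_probs.pow(2)).sum(-1) - mu.pow(2)
    scores = (token_lp - mu) / var.clamp_min(1e-6).sqrt()          # normalized token scores
    k = max(1, int(k_frac * scores.shape[1]))
    return scores.topk(k, largest=False).values.mean().item()      # average of the lowest k%

# Example usage (any causal-LM checkpoint works; Pythia 6.9B shown here):
# tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-6.9b")
# model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-6.9b")
# print(min_k_pp_score(model, tok, watermarked_text))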

Fine-Tuned LLMs

Comparison of watermarked (LexiMark) and non-watermarked (None) training data across datasets and models (AUROC / TPR @ 5 % FPR).
Settings: k = 5, concatenation-based synonym selection, Min-K%++ 20 % MIA.

Dataset              Metric           Pythia 6.9B       LLaMA-1 7B        LLaMA-3 8B        Mistral 7B
                                      None   LexiMark   None   LexiMark   None   LexiMark   None   LexiMark
BookMIA              AUROC            69.1   94.8       73.2   95.9       79.0   96.9       84.7   96.7
                     TPR @ 5 % FPR    13.5   79.1       18.3   84.3       24.3   84.4       30.2   90.9
Enron Emails         AUROC            65.6   72.3       65.6   69.8       71.3   75.3       78.1   81.6
                     TPR @ 5 % FPR    11.0   23.8       11.0   19.4       12.4   21.3       27.7   31.2
PubMed Abstracts     AUROC            68.7   76.0       72.2   80.7       78.4   83.3       83.8   88.7
                     TPR @ 5 % FPR    17.9   25.0       23.6   35.0       35.4   41.5       48.4   58.4
Wikipedia (en)       AUROC            65.5   74.5       63.1   73.0       70.8   78.9       77.2   84.6
                     TPR @ 5 % FPR    10.2   16.6       12.4   19.7       14.2   22.8       18.1   31.7
PILE – FreeLaw       AUROC            67.7   83.3       57.2   61.6       70.9   87.0       80.1   92.0
                     TPR @ 5 % FPR    10.0   37.0        9.8   23.7       11.8   42.6       18.5   67.1
USPTO Backgrounds    AUROC            63.4   76.1       65.0   78.5       72.4   82.5       82.0   89.8
                     TPR @ 5 % FPR     9.2   22.7       14.5   28.3       21.1   35.2       39.6   60.6

Continued Pretraining

Comparison of watermarked (LexiMark) and non-watermarked (None) training data across datasets and models (AUROC / TPR @ 5 % FPR).
Settings: k = 5, concatenation-based synonym selection, Min-K%++ 20 % MIA.

Dataset              Metric           Pythia 160M       Pythia 410M       Pythia 1B
                                      None   LexiMark   None   LexiMark   None   LexiMark
BookMIA              AUROC            77.5   95.0       87.3   97.0       88.1   96.2
                     TPR @ 5 % FPR    18.0   89.5       25.0   95.9       24.5   95.0
Enron Emails         AUROC            79.1   85.2       84.6   87.6       85.8   89.0
                     TPR @ 5 % FPR    26.8   51.3       31.0   59.2       48.0   68.4
PubMed Abstracts     AUROC            69.9   77.9       86.5   89.0       93.8   96.5
                     TPR @ 5 % FPR    17.9   26.8       52.9   60.2       82.7   89.8
Wikipedia (en)       AUROC            68.4   74.5       76.8   84.5       80.2   87.9
                     TPR @ 5 % FPR    10.0   17.0       18.1   37.9       33.4   57.1
PILE – FreeLaw       AUROC            67.2   79.8       73.9   87.1       78.1   91.4
                     TPR @ 5 % FPR    13.5   34.5       18.8   46.9       23.4   64.3
USPTO Backgrounds    AUROC            69.5   79.4       80.5   89.8       83.5   92.0
                     TPR @ 5 % FPR    17.4   29.5       41.5   61.2       54.1   73.2

BibTeX

@misc{german2025leximarkrobustwatermarkinglexical,
  title        = {LexiMark: Robust Watermarking via Lexical Substitutions to Enhance Membership Verification of an LLM's Textual Training Data},
  author       = {Eyal German and Sagiv Antebi and Edan Habler and Asaf Shabtai and Yuval Elovici},
  year         = {2025},
  eprint       = {2506.14474},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  url          = {https://arxiv.org/abs/2506.14474}
}