Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs

Ben-Gurion University of the Negev

Figure: The same table serialized into the six encoding formats used in Tab-MIA.

Abstract

Tab-MIA introduces the first dedicated benchmark for measuring how susceptible large language models (LLMs) are to membership inference attacks (MIAs) when they are trained on tabular data. The benchmark bundles five real-world table collections, ranging from census records to Wikipedia QA tables, and serializes each one into six popular text-based formats (JSON, HTML, Markdown, Key-Value, Key-is-Value, and Line-Separated).

Using Tab-MIA, we evaluate three state-of-the-art black-box MIAs (LOSS, Min-K%, and Min-K%++) against four open-weight LLMs (LLaMA-3.1 8B, LLaMA-3.2 3B, Mistral 7B, and Gemma-3 4B). Even after as few as three epochs of fine-tuning, attacks achieve AUROC scores above 90% on most short-context datasets, revealing severe privacy risks. We further show that encoding choices matter: flat, row-oriented serializations (Line-Separated, Key-Value) amplify memorization, whereas tag-heavy encodings such as HTML dilute it.

Tab-MIA is released on HuggingFace together with evaluation scripts to catalyze future research on privacy-preserving training and defenses for tabular data in LLMs.

Benchmark Overview

Each dataset in Tab‑MIA undergoes a three‑step pipeline: (i) deduplication and filtering, (ii) optional row‑chunking for long tables, and (iii) six‑fold serialization. The result is a set of JSONL files whose lines correspond to individual tables (or table chunks), making it simple to stream data during fine‑tuning.
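As a concrete illustration of step (iii), the sketch below renders a single record in each of the six formats. The exact templates (delimiters, tags, and row handling) are assumptions made for illustration; the benchmark's released preprocessing scripts define the precise formatting.

# Minimal sketch of the six Tab-MIA text serializations for one record.
# The exact delimiters and templates are assumptions, not the benchmark's code.
import json

def serialize(row: dict, fmt: str) -> str:
    if fmt == "json":
        return json.dumps(row)
    if fmt == "html":
        head = "".join(f"<th>{k}</th>" for k in row)
        body = "".join(f"<td>{v}</td>" for v in row.values())
        return f"<table><tr>{head}</tr><tr>{body}</tr></table>"
    if fmt == "markdown":
        keys = list(row)
        return "\n".join([
            "| " + " | ".join(keys) + " |",
            "|" + "---|" * len(keys),
            "| " + " | ".join(str(row[k]) for k in keys) + " |",
        ])
    if fmt == "key_value":
        return ", ".join(f"{k}: {v}" for k, v in row.items())
    if fmt == "key_is_value":
        return ", ".join(f"{k} is {v}" for k, v in row.items())
    if fmt == "line_separated":
        return "\n".join(f"{k}: {v}" for k, v in row.items())
    raise ValueError(f"unknown format: {fmt}")

record = {"age": 39, "workclass": "State-gov", "education": "Bachelors"}
for fmt in ("json", "html", "markdown", "key_value", "key_is_value", "line_separated"):
    print(f"--- {fmt} ---\n{serialize(record, fmt)}\n")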

The table below summarizes the included collections.

Datasets in Tab-MIA

Collection           Domain             Context  # Records  # Features
WikiTableQuestions   Wikipedia          Short    1,290      ≥ 5
WikiSQL              Wikipedia          Short    17,900     ≥ 5
TabFact              Wikipedia          Short    13,100     ≥ 5
Adult (Census)       Income Prediction  Long     2,440      15
California Housing   Housing Prices     Long     1,030      10
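Since each split ships as a JSONL file, records can be streamed with standard tooling. Below is a minimal sketch using the Hugging Face datasets library; the file name is a hypothetical placeholder, not a path shipped with the benchmark.

# Stream records from a Tab-MIA JSONL split with the `datasets` library.
# "wtq_line_separated.jsonl" is a hypothetical file name used for illustration.
from datasets import load_dataset

ds = load_dataset("json", data_files="wtq_line_separated.jsonl", split="train")
for record in ds.select(range(3)):  # peek at the first three serialized tables
    print(record)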

Effect of Encoding Format

AUROC (%) on the WTQ (WikiTableQuestions) dataset for the Min-K%++ attack (k = 20%) across encoding formats and models. Mistral 7B attains the highest score in every row.

Encoding         Llama-3.2 3B  Mistral 7B  Gemma-3 4B
Markdown         85.3          94.2        86.7
JSON             79.8          82.7        79.2
HTML             79.7          92.9        83.3
Key-Value Pair   83.5          94.9        85.5
Key-is-Value     83.7          89.7        85.0
Line-Separated   89.7          97.7        89.6
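For context on how these scores are produced, the sketch below computes a plain Min-K% membership score with k = 20% under a causal LM: average the log-probabilities of the 20% least-likely tokens, where a higher average suggests the text was seen during training. Min-K%++ further normalizes each token's log-probability by the mean and standard deviation of the model's distribution at that position. The checkpoint name is only an example; the benchmark's evaluation scripts may differ.

# Minimal Min-K% scoring sketch (k = 0.2). Higher score => likelier member.
# The checkpoint name below is an example, not the benchmark's fixed choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_score(text: str, model, tokenizer, k: float = 0.2) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: [1, seq_len, vocab]
    # Log-probability assigned to each actual next token.
    logprobs = torch.log_softmax(logits[0, :-1].float(), dim=-1)
    token_lp = logprobs.gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    # Average the k% lowest token log-probabilities.
    n = max(1, int(k * token_lp.numel()))
    return token_lp.topk(n, largest=False).values.mean().item()

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # example
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
print(min_k_score("age: 39, workclass: State-gov, education: Bachelors", model, tokenizer))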

Effect of Fine-Tuning Epochs

AUROC (%) vs. number of fine-tuning epochs across datasets (Line-Separated encoding, Min-K%++ attack, k = 20%).

Model          # Epochs  Adult  California  WTQ   WikiSQL  TabFact
LLaMA-3.1 8B   1         55.1   59.0        61.6  64.5     64.9
               2         60.0   72.8        80.8  78.6     79.6
               3         71.1   87.8        93.6  88.9     89.9
LLaMA-3.2 3B   1         54.1   57.7        57.6  61.5     61.5
               2         58.0   66.8        74.8  73.6     73.4
               3         64.4   77.2        89.7  83.2     80.4
Mistral 7B     1         54.6   57.8        69.7  67.5     68.5
               2         58.9   70.3        88.4  80.0     81.2
               3         71.5   86.8        97.7  87.8     89.9
Gemma-3 4B     1         53.9   54.3        59.3  62.6     63.3
               2         58.9   62.5        77.0  76.6     77.9
               3         67.7   73.8        89.6  86.1     87.4
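The AUROC values above quantify how well the attack score separates fine-tuning members from held-out non-members. A toy computation with scikit-learn is shown below; the score arrays are made-up values for illustration, not results from the paper.

# AUROC of an attack score: members labeled 1, non-members labeled 0.
# Scores below are toy values, not results from the paper.
import numpy as np
from sklearn.metrics import roc_auc_score

member_scores = np.array([-0.8, -0.5, -1.0, -0.6])     # e.g. Min-K% scores on fine-tuning tables
nonmember_scores = np.array([-2.1, -1.9, -1.4, -2.5])  # scores on unseen tables
labels = np.r_[np.ones(len(member_scores)), np.zeros(len(nonmember_scores))]
print(f"AUROC: {roc_auc_score(labels, np.r_[member_scores, nonmember_scores]):.3f}")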

BibTeX

@misc{german2025tabmiabenchmarkdatasetmembership,
  title        = {Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs},
  author       = {Eyal German and Sagiv Antebi and Daniel Samira and Asaf Shabtai and Yuval Elovici},
  year         = {2025},
  eprint       = {2507.17259},
  archivePrefix = {arXiv},
  primaryClass = {cs.CR},
  url          = {https://arxiv.org/abs/2507.17259}
}