Tab‑MIA introduces the first dedicated benchmark for measuring how susceptible large language models (LLMs) are to membership‑inference attacks (MIAs) when they are trained on tabular data. The benchmark bundles five real‑world table collections — ranging from census records to Wikipedia QA tables — and serialises each one into six popular text‑based formats (JSON, HTML, Markdown, Key‑Value, Key‑is‑Value and Line‑Separated).
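To make the six encodings concrete, here is a minimal sketch of how a single record could be rendered in each of them. The helper and the example record are illustrative; this is not the benchmark's actual serialisation code.

```python
import json

# One census-style record (illustrative values, not benchmark data).
row = {"age": 39, "workclass": "State-gov", "income": "<=50K"}

def serialise(row: dict, fmt: str) -> str:
    """Render a single record in one of the six text encodings."""
    if fmt == "json":
        return json.dumps(row)
    if fmt == "html":
        head = "".join(f"<th>{k}</th>" for k in row)
        body = "".join(f"<td>{v}</td>" for v in row.values())
        return f"<table><tr>{head}</tr><tr>{body}</tr></table>"
    if fmt == "markdown":
        return (
            "| " + " | ".join(row) + " |\n"
            + "| " + " | ".join("---" for _ in row) + " |\n"
            + "| " + " | ".join(str(v) for v in row.values()) + " |"
        )
    if fmt == "key_value":      # "age: 39, workclass: State-gov, ..."
        return ", ".join(f"{k}: {v}" for k, v in row.items())
    if fmt == "key_is_value":   # "age is 39, workclass is State-gov, ..."
        return ", ".join(f"{k} is {v}" for k, v in row.items())
    if fmt == "line_separated": # one "key: value" pair per line
        return "\n".join(f"{k}: {v}" for k, v in row.items())
    raise ValueError(f"unknown format: {fmt}")
```

The flat encodings (Key‑Value, Line‑Separated) put attribute values in fixed, easily repeated positions, which is one plausible reason they amplify memorisation relative to tag‑heavy HTML.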
Using Tab‑MIA, we evaluate three state‑of‑the‑art black‑box MIAs (LOSS, Min‑K%, and Min‑K%++) against four open‑weight LLMs (LLaMA‑3.1 8B, LLaMA‑3.2 3B, Mistral 7B, and Gemma‑3 4B). Even with as few as three epochs of fine‑tuning, attacks reach AUROC scores approaching or exceeding 90% on most short‑context datasets, revealing severe privacy risks. We further show that encoding choices matter: flat row‑oriented serialisations (Line‑Separated, Key‑Value) amplify memorisation, whereas tag‑heavy encodings such as HTML dilute it.
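The Min‑K% family of attacks can be sketched in a few lines. The following is the generic formulation, assuming access to per‑token log‑probabilities from the target model; it is not the benchmark's exact implementation.

```python
def min_k_score(token_log_probs: list[float], k: float = 0.2) -> float:
    """Min-K% membership score: the mean of the lowest k-fraction of
    per-token log-probabilities. Training members tend to contain fewer
    surprising tokens, so their score is higher (less negative)."""
    n = max(1, int(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n

# Toy log-probs for a memorised record vs. an unseen one (illustrative).
member = [-0.1, -0.2, -0.3, -0.5, -0.4]
non_member = [-1.5, -2.0, -0.3, -3.1, -0.9]
```

Thresholding this score yields the membership decision; sweeping the threshold produces the AUROC values reported below. The LOSS attack is the special case that averages over all tokens, and Min‑K%++ additionally normalises each token's log‑probability by the model's vocabulary‑level statistics.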
Tab‑MIA is released on Hugging Face together with evaluation scripts to catalyse future research on privacy‑preserving training and defences for tabular data in LLMs.
Each dataset in Tab‑MIA undergoes a three‑step pipeline: (i) deduplication and filtering, (ii) optional row‑chunking for long tables, and (iii) six‑fold serialisation. The result is a set of JSONL files whose lines correspond to individual tables (or table chunks), making it simple to stream data during fine‑tuning.
The table below summarises the included collections.
| Collection | Domain | Context | # Records | # Features |
|---|---|---|---|---|
| WikiTableQuestions | Wikipedia | Short | 1,290 | ≥ 5 |
| WikiSQL | Wikipedia | Short | 17,900 | ≥ 5 |
| TabFact | Wikipedia | Short | 13,100 | ≥ 5 |
| Adult (Census) | Income prediction | Long | 2,440 | 15 |
| California Housing | Housing prices | Long | 1,030 | 10 |
Attack AUROC (%) by serialisation format (WikiTableQuestions, three fine‑tuning epochs):

| Encoding | LLaMA-3.2 3B | Mistral 7B | Gemma-3 4B |
|---|---|---|---|
| Markdown | 85.3 | 94.2 | 86.7 |
| JSON | 79.8 | 82.7 | 79.2 |
| HTML | 79.7 | 92.9 | 83.3 |
| Key-Value Pair | 83.5 | 94.9 | 85.5 |
| Key-is-Value | 83.7 | 89.7 | 85.0 |
| Line-Separated | 89.7 | 97.7 | 89.6 |
Attack AUROC (%) by model and number of fine‑tuning epochs:

| Model | # Epochs | Adult | California | WTQ | WikiSQL | TabFact |
|---|---|---|---|---|---|---|
| LLaMA-3.1 8B | 1 | 55.1 | 59.0 | 61.6 | 64.5 | 64.9 |
| | 2 | 60.0 | 72.8 | 80.8 | 78.6 | 79.6 |
| | 3 | 71.1 | 87.8 | 93.6 | 88.9 | 89.9 |
| LLaMA-3.2 3B | 1 | 54.1 | 57.7 | 57.6 | 61.5 | 61.5 |
| | 2 | 58.0 | 66.8 | 74.8 | 73.6 | 73.4 |
| | 3 | 64.4 | 77.2 | 89.7 | 83.2 | 80.4 |
| Mistral 7B | 1 | 54.6 | 57.8 | 69.7 | 67.5 | 68.5 |
| | 2 | 58.9 | 70.3 | 88.4 | 80.0 | 81.2 |
| | 3 | 71.5 | 86.8 | 97.7 | 87.8 | 89.9 |
| Gemma-3 4B | 1 | 53.9 | 54.3 | 59.3 | 62.6 | 63.3 |
| | 2 | 58.9 | 62.5 | 77.0 | 76.6 | 77.9 |
| | 3 | 67.7 | 73.8 | 89.6 | 86.1 | 87.4 |
If you use Tab‑MIA, please cite:

@misc{german2025tabmiabenchmarkdatasetmembership,
title = {Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs},
author = {Eyal German and Sagiv Antebi and Daniel Samira and Asaf Shabtai and Yuval Elovici},
year = {2025},
eprint = {2507.17259},
archivePrefix= {arXiv},
primaryClass = {cs.CR},
url = {https://arxiv.org/abs/2507.17259}
}