Stylometry recognizes human and LLM-generated texts in short samples

Przystalski, Karol; Argasiński, Jan; Grabska-Gradzińska, Iwona; Ochab, Jeremi

doi:10.1016/j.eswa.2025.129001

2025

journal article

article

10.1016/j.eswa.2025.129001

Journal

Expert Systems with Applications

200

Author

Przystalski Karol

Argasiński Jan

Grabska-Gradzińska Iwona

Ochab Jeremi

Volume

296, Part B

Article ID

129001

ISSN

0957-4174

Keywords in English

stylometry

large language models

machine-generated text detection

AI detection

benchmark dataset

URL

https://www.sciencedirect.com/science/article/pii/S0957417425026181

Access date

2025-07-21

Language

English

Journal language

English

Abstract in English

The paper explores stylometry as a method to distinguish between texts created by Large Language Models (LLMs) and humans, addressing issues of model attribution, intellectual property, and ethical AI use. Stylometry has been used extensively to characterise the style and attribute authorship of texts. By applying it to LLM-generated texts, we identify their emergent writing patterns. The paper involves creating a benchmark dataset based on Wikipedia, with (a) human-written term summaries, (b) texts generated purely by LLMs (GPT-3.5/4, LLaMa 2/3, Orca, and Falcon), (c) texts processed through multiple text summarisation methods (T5, BART, Gensim, and Sumy), and (d) texts processed through rephrasing methods (Dipper, T5). The 10-sentence-long texts were classified by tree-based models (decision trees and LightGBM) using human-designed (StyloMetrix) and n-gram-based (our own pipeline) stylometric features that encode lexical, grammatical, syntactic, and punctuation patterns. The cross-validated results reached a performance of up to 0.87 Matthews correlation coefficient in the multiclass scenario with 7 classes, and accuracy between 0.79 and 1.0 in binary classification, with the particular example of Wikipedia and GPT-4 reaching up to 0.98 accuracy on a balanced dataset. Shapley Additive Explanations pinpointed features characteristic of the encyclopaedic text type, individual overused words, as well as a greater grammatical standardisation of LLMs with respect to human-written texts. These results show – crucially, in the context of the increasingly sophisticated LLMs – that it is possible to distinguish machine- from human-generated texts at least for a well-defined text type.
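
The headline multiclass figure in the abstract is a Matthews correlation coefficient (MCC). As a rough illustration of what that score measures, the sketch below computes the standard multiclass generalisation of MCC from two label sequences. It is not taken from the paper or its pipeline: the function name `multiclass_mcc` and the toy labels are hypothetical, and returning 0.0 when the denominator vanishes is an assumed convention.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient generalised to K classes.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")

    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k = Counter(y_true)  # how often each class truly occurs
    p_k = Counter(y_pred)  # how often each class is predicted

    # Covariance-style terms of the multiclass MCC formula.
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())

    denom = sqrt(cov_tt * cov_pp)
    # Assumed convention: score 0.0 when only one class is present or predicted.
    return cov_tp / denom if denom else 0.0


# Toy labels (made up): two classes standing in for "human" (0) and "LLM" (1).
score = multiclass_mcc([1, 1, 0, 0], [1, 0, 0, 0])
print(round(score, 3))
```

Unlike plain accuracy, the score stays near zero for a classifier that always predicts the same class, which is one reason it is a common choice for multiclass comparisons.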

dc.abstract.en	The paper explores stylometry as a method to distinguish between texts created by Large Language Models (LLMs) and humans, addressing issues of model attribution, intellectual property, and ethical AI use. Stylometry has been used extensively to characterise the style and attribute authorship of texts. By applying it to LLM-generated texts, we identify their emergent writing patterns. The paper involves creating a benchmark dataset based on Wikipedia, with (a) human-written term summaries, (b) texts generated purely by LLMs (GPT-3.5/4, LLaMa 2/3, Orca, and Falcon), (c) texts processed through multiple text summarisation methods (T5, BART, Gensim, and Sumy), and (d) texts processed through rephrasing methods (Dipper, T5). The 10-sentence-long texts were classified by tree-based models (decision trees and LightGBM) using human-designed (StyloMetrix) and n-gram-based (our own pipeline) stylometric features that encode lexical, grammatical, syntactic, and punctuation patterns. The cross-validated results reached a performance of up to 0.87 Matthews correlation coefficient in the multiclass scenario with 7 classes, and accuracy between 0.79 and 1.0 in binary classification, with the particular example of Wikipedia and GPT-4 reaching up to 0.98 accuracy on a balanced dataset. Shapley Additive Explanations pinpointed features characteristic of the encyclopaedic text type, individual overused words, as well as a greater grammatical standardisation of LLMs with respect to human-written texts. These results show – crucially, in the context of the increasingly sophisticated LLMs – that it is possible to distinguish machine- from human-generated texts at least for a well-defined text type.
dc.affiliation	Faculty of Physics, Astronomy and Applied Computer Science : Institute of Applied Computer Science
dc.affiliation	Faculty of Physics, Astronomy and Applied Computer Science : Institute of Theoretical Physics
dc.contributor.author	Przystalski, Karol - 126070
dc.contributor.author	Argasiński, Jan - 105948
dc.contributor.author	Grabska-Gradzińska, Iwona - 121296
dc.contributor.author	Ochab, Jeremi - 122224
dc.date.accession	2025-07-21
dc.date.accessioned	2025-07-24T12:54:11Z
dc.date.available	2025-07-24T12:54:11Z
dc.date.createdat	2025-07-21T07:37:40Z	en
dc.date.issued	2025
dc.date.openaccess	0
dc.description.accesstime	at the time of publication
dc.description.version	publisher's final version
dc.description.volume	296, Part B
dc.identifier.articleid	129001
dc.identifier.doi	10.1016/j.eswa.2025.129001
dc.identifier.issn	0957-4174
dc.identifier.project	DRC AI
dc.identifier.uri	https://ruj.uj.edu.pl/handle/item/558168
dc.identifier.weblink	https://www.sciencedirect.com/science/article/pii/S0957417425026181
dc.language	eng
dc.language.container	eng
dc.rights	I grant a licence. Attribution 4.0 International
dc.rights.licence	CC-BY
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/legalcode.pl
dc.share.type	other
dc.subject.en	stylometry
dc.subject.en	large language models
dc.subject.en	machine-generated text detection
dc.subject.en	AI detection
dc.subject.en	benchmark dataset
dc.subtype	Article
dc.title	Stylometry recognizes human and LLM-generated texts in short samples
dc.title.journal	Expert Systems with Applications
dc.type	JournalArticle
dspace.entity.type	Publication	en

Affiliations

Faculty of Physics, Astronomy and Applied Computer Science

Przystalski, Karol

Argasiński, Jan

Grabska-Gradzińska, Iwona

Ochab, Jeremi

Open Access

Files

przystalski_et-al_stylometry_recognizes_human_and_llm-generated_2025.pdf (PDF, 6.19 MB)

License

Except as otherwise noted, this item is licensed under: Attribution 4.0 International

Collections

Research publications

DRC AI

Exact sciences