Jagiellonian University Repository

Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm



dc.contributor.author Franzini, Greta pl
dc.contributor.author Kestemont, Mike pl
dc.contributor.author Rotari, Gabriela pl
dc.contributor.author Jander, Melina pl
dc.contributor.author Ochab, Jeremi [SAP14013682] pl
dc.contributor.author Franzini, Emily pl
dc.contributor.author Byszuk, Joanna pl
dc.contributor.author Rybicki, Jan [SAP13006621] pl
dc.date.accessioned 2018-10-24T12:02:25Z
dc.date.available 2018-10-24T12:02:25Z
dc.date.issued 2018 pl
dc.identifier.uri https://ruj.uj.edu.pl/xmlui/handle/item/58613
dc.language eng pl
dc.rights Licence granted. Attribution 4.0 International (CC BY 4.0) *
dc.rights.uri http://creativecommons.org/licenses/by/4.0/pl/legalcode *
dc.title Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm pl
dc.type JournalArticle pl
dc.abstract.en This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies on computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), and reports on the effect this noise has on the analyses needed to computationally identify the different writing styles of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when the training and test sets come from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases the risk of text misclassification compared to OCR, a cleanliness above ≈ 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution. pl
dc.description.volume 5 pl
dc.identifier.doi 10.3389/fdigh.2018.00004 pl
dc.identifier.eissn 2297-2668 pl
dc.title.journal Frontiers in Digital Humanities pl
dc.language.container eng pl
dc.affiliation Faculty of Philology : Institute of English Philology pl
dc.affiliation Faculty of Physics, Astronomy and Applied Computer Science : Marian Smoluchowski Institute of Physics pl
dc.subtype Article pl
dc.identifier.articleid 4 pl
dc.rights.original CC-BY; open-access journal; final publisher's version; at the time of publication; 0 pl
dc.identifier.project ROD UJ / OP pl



Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0).