Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm

2018
journal article
article
dc.abstract.en
This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies in computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), reporting on the effect this noise has on the analyses necessary to computationally identify the distinct writing styles of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when using training and test sets from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases the risk of text misclassification when compared to OCR, a cleanliness above approximately 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution.
dc.affiliation
Faculty of Philology : Institute of English Studies
dc.affiliation
Faculty of Physics, Astronomy and Applied Computer Science : Marian Smoluchowski Institute of Physics
dc.contributor.author
Franzini, Greta
dc.contributor.author
Kestemont, Mike
dc.contributor.author
Rotari, Gabriela
dc.contributor.author
Jander, Melina
dc.contributor.author
Ochab, Jeremi - 122224
dc.contributor.author
Franzini, Emily
dc.contributor.author
Byszuk, Joanna
dc.contributor.author
Rybicki, Jan - 214316
dc.date.accessioned
2018-10-24T12:02:25Z
dc.date.available
2018-10-24T12:02:25Z
dc.date.issued
2018
dc.date.openaccess
0
dc.description.accesstime
at the time of publication
dc.description.additional
Hungarian-language publication: Szerzőazonosítás Jacob és Wilhelm Grimm zajos, digitalizált levelezésében, Digitális Bölcsészet vol. 5 (2021). DOI: 10.31400/dh-hun.2021.5.3144, https://ruj.uj.edu.pl/xmlui/handle/item/288895
dc.description.version
final publisher's version
dc.description.volume
5
dc.identifier.articleid
4
dc.identifier.doi
10.3389/fdigh.2018.00004
dc.identifier.eissn
2297-2668
dc.identifier.project
ROD UJ / OP
dc.identifier.uri
https://ruj.uj.edu.pl/xmlui/handle/item/58613
dc.language
eng
dc.language.container
eng
dc.rights
I grant a licence. Attribution 4.0 International
dc.rights.licence
CC-BY
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/legalcode.pl
dc.share.type
open-access journal
dc.subtype
Article
dc.title
Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm
dc.title.journal
Frontiers in Digital Humanities
dc.type
JournalArticle
dspace.entity.type
Publication