Szerzőazonosítás Jacob és Wilhelm Grimm zajos, digitalizált levelezésében

2021
journal article
translation
dc.abstract.en [pl]: This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies in computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), reporting on the effect this noise has on the analyses necessary to computationally identify the different writing style of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when using training and test sets from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases risk of text misclassification when compared to OCR, a cleanliness above 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution.
dc.affiliation [pl]: Wydział Filologiczny : Instytut Filologii Angielskiej
dc.affiliation [pl]: Wydział Fizyki, Astronomii i Informatyki Stosowanej : Zespół Zakładów Fizyki Teoretycznej
dc.contributor.author [pl]: Franzini, Greta
dc.contributor.author [pl]: Kestemont, Mike
dc.contributor.author [pl]: Rotari, Gabriela
dc.contributor.author [pl]: Jander, Melina
dc.contributor.author [pl]: Ochab, Jeremi - 122224
dc.contributor.author [pl]: Franzini, Emily
dc.contributor.author [pl]: Byszuk, Joanna
dc.contributor.author [pl]: Rybicki, Jan - 214316
dc.contributor.translator [pl]: Kustos, Júlia
dc.date.accessioned: 2022-03-08T15:57:49Z
dc.date.available: 2022-03-08T15:57:49Z
dc.date.issued [pl]: 2021
dc.date.openaccess: 0
dc.description.accesstime: at the time of publication
dc.description.additional [pl]: Original publication: Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm, Frontiers in Digital Humanities vol. 5 (2018). DOI: 10.3389/fdigh.2018.00004, https://ruj.uj.edu.pl/xmlui/handle/item/58613
dc.description.number [pl]: 5
dc.description.physical [pl]: 39-68
dc.description.version: final publisher's version
dc.identifier.doi [pl]: 10.31400/dh-hun.2021.5.3144
dc.identifier.eissn [pl]: 2630-9696
dc.identifier.uri: https://ruj.uj.edu.pl/xmlui/handle/item/288895
dc.language [pl]: hun
dc.language.container [pl]: hun
dc.language.original [pl]: eng
dc.rights [*]: I grant a license. Attribution - NonCommercial - ShareAlike 4.0 International
dc.rights.licence: CC-BY-NC-SA
dc.rights.uri [*]: http://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.pl
dc.share.type: open journal
dc.subtype [pl]: Translation
dc.title [pl]: Szerzőazonosítás Jacob és Wilhelm Grimm zajos, digitalizált levelezésében
dc.title.alternative [pl]: Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm
dc.title.journal [pl]: Digitális Bölcsészet
dc.title.volume [pl]: A krakkói Computational Stylistics Group (Különszám)
dc.type [pl]: JournalArticle
dspace.entity.type: Publication