Szerzőazonosítás Jacob és Wilhelm Grimm zajos, digitalizált levelezésében
Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm
Original publication: Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm, Frontiers in Digital Humanities vol. 5 (2018). DOI: 10.3389/fdigh.2018.00004, https://ruj.uj.edu.pl/xmlui/handle/item/58613
This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies in computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), reporting on the effect this noise has on the analyses necessary to computationally identify the different writing styles of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when using training and test sets from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases the risk of text misclassification when compared to OCR, a cleanliness above 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution.
dc.abstract.en | This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies in computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), reporting on the effect this noise has on the analyses necessary to computationally identify the different writing styles of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when using training and test sets from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases the risk of text misclassification when compared to OCR, a cleanliness above 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution. | pl |
dc.affiliation | Wydział Filologiczny : Instytut Filologii Angielskiej | pl |
dc.affiliation | Wydział Fizyki, Astronomii i Informatyki Stosowanej : Zespół Zakładów Fizyki Teoretycznej | pl |
dc.contributor.author | Franzini, Greta | pl |
dc.contributor.author | Kestemont, Mike | pl |
dc.contributor.author | Rotari, Gabriela | pl |
dc.contributor.author | Jander, Melina | pl |
dc.contributor.author | Ochab, Jeremi - 122224 | pl |
dc.contributor.author | Franzini, Emily | pl |
dc.contributor.author | Byszuk, Joanna | pl |
dc.contributor.author | Rybicki, Jan - 214316 | pl |
dc.contributor.translator | Kustos, Júlia | pl |
dc.date.accessioned | 2022-03-08T15:57:49Z | |
dc.date.available | 2022-03-08T15:57:49Z | |
dc.date.issued | 2021 | pl |
dc.date.openaccess | 0 | |
dc.description.accesstime | at the time of publication | |
dc.description.additional | Original publication: Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm, Frontiers in Digital Humanities vol. 5 (2018). DOI: 10.3389/fdigh.2018.00004, https://ruj.uj.edu.pl/xmlui/handle/item/58613 | pl |
dc.description.number | 5 | pl |
dc.description.physical | 39-68 | pl |
dc.description.version | final publisher's version | |
dc.identifier.doi | 10.31400/dh-hun.2021.5.3144 | pl |
dc.identifier.eissn | 2630-9696 | pl |
dc.identifier.uri | https://ruj.uj.edu.pl/xmlui/handle/item/288895 | |
dc.language | hun | pl |
dc.language.container | hun | pl |
dc.language.original | eng | pl |
dc.rights | I grant a license. Attribution - NonCommercial - ShareAlike 4.0 International | * |
dc.rights.licence | CC-BY-NC-SA | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.pl | * |
dc.share.type | open journal | |
dc.subtype | Translation | pl |
dc.title | Szerzőazonosítás Jacob és Wilhelm Grimm zajos, digitalizált levelezésében | pl |
dc.title.alternative | Attributing authorship in the noisy digitized correspondence of Jacob and Wilhelm Grimm | pl |
dc.title.journal | Digitális Bölcsészet | pl |
dc.title.volume | A krakkói Computational Stylistics Group (Különszám) | pl |
dc.type | JournalArticle | pl |
dspace.entity.type | Publication |