Active annotation in evaluating the credibility of web-based medical information : guidelines for creating training data sets for machine learning

2021
journal article
cris.lastimport.wos: 2024-04-09T23:36:08Z
dc.abstract.en:
Background: The spread of false medical information on the web is rapidly accelerating. Establishing the credibility of web-based medical information has become a pressing necessity. Machine learning offers a solution that, when properly deployed, can be an effective tool in fighting medical misinformation on the web.
Objective: The aim of this study is to present a comprehensive framework for designing and curating machine learning training data sets for web-based medical information credibility assessment. We show how to construct the annotation process. Our main objective is to support researchers from the medical and computer science communities. We offer guidelines on the preparation of data sets for machine learning models that can fight medical misinformation.
Methods: We begin by providing the annotation protocol for medical experts involved in medical sentence credibility evaluation. The protocol is based on a qualitative study of our experimental data. To address the problem of insufficient initial labels, we propose a preprocessing pipeline for the batch of sentences to be assessed. It consists of representation learning, clustering, and reranking. We call this process active annotation.
Results: We collected more than 10,000 annotations of statements related to selected medical subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, and food allergy testing) for less than US $7000 by employing 9 highly qualified annotators (certified medical professionals), and we release this data set to the general public. We developed an active annotation framework for more efficient annotation of noncredible medical statements. The application of qualitative analysis resulted in a better annotation protocol for our future efforts in data set creation.
Conclusions: The results of the qualitative analysis support our claims of the efficacy of the presented method.
dc.contributor.author: Nabożny, Aleksandra
dc.contributor.author: Balcerzak, Bartłomiej
dc.contributor.author: Wierzbicki, Adam
dc.contributor.author: Morzy, Mikołaj
dc.contributor.author: Chlabicz, Małgorzata
dc.date.accession: 2022-01-31
dc.date.accessioned: 2022-01-31T09:01:45Z
dc.date.available: 2022-01-31T09:01:45Z
dc.date.issued: 2021
dc.date.openaccess: 0
dc.description.accesstime: at the time of publication
dc.description.number: 11
dc.description.version: final publisher's version
dc.description.volume: 9
dc.identifier.articleid: e26065
dc.identifier.doi: 10.2196/26065
dc.identifier.issn: 2291-9694
dc.identifier.project: 2019/35/J/HS6/03498
dc.identifier.uri: https://ruj.uj.edu.pl/xmlui/handle/item/287483
dc.identifier.weblink: https://medinform.jmir.org/2021/11/e26065/
dc.language: eng
dc.language.container: eng
dc.rights: I grant a license. Attribution 4.0 International
dc.rights.licence: CC-BY
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/legalcode.pl
dc.share.type: open journal
dc.subject.en: active annotation
dc.subject.en: credibility
dc.subject.en: web-based medical information
dc.subject.en: fake news
dc.subtype: Article
dc.title: Active annotation in evaluating the credibility of web-based medical information : guidelines for creating training data sets for machine learning
dc.title.journal: JMIR Medical Informatics
dc.type: JournalArticle
dspace.entity.type: Publication
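The Methods part of the abstract names a three-stage preprocessing pipeline (representation learning, clustering, and reranking) that the authors call active annotation, but this record does not describe the individual stages. As a rough illustration only, the sketch below assumes bag-of-words vectors in place of learned representations, plain k-means for clustering, and a hypothetical reranking rule: sentences nearest their cluster centroid come first, and clusters are interleaved round-robin so each annotation batch spans several topics. None of these choices, nor the function names `embed`, `kmeans`, `rerank`, and `active_annotation_order`, come from the paper.

```python
import math
import re
from collections import Counter


def embed(sentences):
    # Stand-in for representation learning (assumption): L2-normalised
    # bag-of-words count vectors over a shared vocabulary.
    tokenised = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenised:
        v = [0.0] * len(vocab)
        for w, count in Counter(toks).items():
            v[index[w]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    # Plain k-means; the first k vectors seed the centroids so runs are deterministic.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def rerank(vectors, labels, centroids):
    # Hypothetical reranking: inside each cluster, sentences closest to the
    # centroid come first; clusters are then interleaved round-robin.
    queues = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: dist2(vectors[i], centroid))
        queues.append(members)
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.pop(0))
    return order


def active_annotation_order(sentences, k):
    # Representation -> clustering -> reranking; returns sentences in annotation order.
    if not sentences:
        return []
    k = max(1, min(k, len(sentences)))
    vectors = embed(sentences)
    labels, centroids = kmeans(vectors, k)
    return [sentences[i] for i in rerank(vectors, labels, centroids)]


statements = [
    "Vaccines cause autism in children.",
    "Vaccines are tested in clinical trials.",
    "Statins lower cholesterol levels.",
    "High cholesterol is caused only by diet.",
]
for s in active_annotation_order(statements, k=2):
    print(s)
```

Running it prints the same four statements reordered so that consecutive items are drawn from different clusters where possible. A real pipeline would replace `embed` with a trained sentence encoder; the round-robin step is just one simple way to keep a small first batch of expert labels diverse.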