Results of the PolEval 2019 Shared Task 6 : first dataset and Open Shared Task for automatic cyberbullying detection in Polish Twitter

Ptaszynski, Michal; Pieciukiewicz, Agata; Dybała, Paweł

2019

book section

conference proceedings

Publisher

Polska Akademia Nauk

Author

Ptaszynski Michal

Pieciukiewicz Agata

Dybała Paweł

Editor

Ogrodniczuk Maciej

Kobyliński Łukasz

Book title / Journal title

Proceedings of the PolEval 2019 Workshop

Place of publication: Publisher

Warszawa : Institute of Computer Sciences. Polish Academy of Sciences

Pages

89-110

ISBN

978-83-63159-28-3

Keywords in English

cyberbullying

automatic cyberbullying detection

hate-speech

natural language processing

machine learning

URL

http://2019.poleval.pl/files/poleval2019.pdf

Date accessed

2020-03-23

Notes

Footnotes. Bibliography pp. 108-110

Language

English

Book language / Journal language

English

Abstract in English

In this paper we describe the first dataset for the Polish language containing annotations of harmful and toxic language. The dataset was created to study harmful Internet phenomena such as cyberbullying and hate speech, which have recently grown dramatically in number on the Polish Internet as well as worldwide. The dataset was automatically collected from Polish Twitter accounts and annotated by layperson volunteers under the supervision of a cyberbullying and hate-speech expert. Together with the dataset we propose the first open shared task for Polish, which uses the dataset to classify such harmful phenomena. In particular, we propose two subtasks: 1) binary classification of harmful and non-harmful tweets, and 2) multiclass classification among two types of harmful information (cyberbullying and hate speech) and other. The first installment of the shared task was a success, attracting fourteen submissions overall and demonstrating high demand for research applying such data.

dc.abstract.en	In this paper we describe the first dataset for the Polish language containing annotations of harmful and toxic language. The dataset was created to study harmful Internet phenomena such as cyberbullying and hate speech, which have recently grown dramatically in number on the Polish Internet as well as worldwide. The dataset was automatically collected from Polish Twitter accounts and annotated by layperson volunteers under the supervision of a cyberbullying and hate-speech expert. Together with the dataset we propose the first open shared task for Polish, which uses the dataset to classify such harmful phenomena. In particular, we propose two subtasks: 1) binary classification of harmful and non-harmful tweets, and 2) multiclass classification among two types of harmful information (cyberbullying and hate speech) and other. The first installment of the shared task was a success, attracting fourteen submissions overall and demonstrating high demand for research applying such data.	pl
dc.affiliation	Wydział Studiów Międzynarodowych i Politycznych : Instytut Bliskiego i Dalekiego Wschodu	pl
dc.conference	PolEval 2019 Workshop
dc.conference.city	Warszawa
dc.conference.country	Poland
dc.conference.datefinish	2019-05-31
dc.conference.datestart	2019-05-31
dc.conference.weblink	http://2019.poleval.pl/index.php/publication/	pl
dc.contributor.author	Ptaszynski, Michal	pl
dc.contributor.author	Pieciukiewicz, Agata	pl
dc.contributor.author	Dybała, Paweł - 242662	pl
dc.contributor.editor	Ogrodniczuk, Maciej	pl
dc.contributor.editor	Kobyliński, Łukasz	pl
dc.date.accession	2020-03-23	pl
dc.date.accessioned	2020-03-23T18:15:51Z
dc.date.available	2020-03-23T18:15:51Z
dc.date.issued	2019	pl
dc.date.openaccess	0
dc.description.accesstime	at the time of publication
dc.description.additional	Footnotes. Bibliography pp. 108-110	pl
dc.description.conftype	international	pl
dc.description.physical	89-110	pl
dc.description.publication	1,48	pl
dc.description.version	final publisher's version
dc.identifier.isbn	978-83-63159-28-3	pl
dc.identifier.project	ROD UJ / OP	pl
dc.identifier.uri	https://ruj.uj.edu.pl/xmlui/handle/item/152265
dc.identifier.weblink	http://2019.poleval.pl/files/poleval2019.pdf	pl
dc.language	eng	pl
dc.language.container	eng	pl
dc.pubinfo	Warszawa : Institute of Computer Sciences. Polish Academy of Sciences	pl
dc.publisher.ministerial	Polska Akademia Nauk	pl
dc.rights	I grant a license. Attribution 4.0 International	*
dc.rights.licence	CC-BY
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/legalcode.pl	*
dc.share.type	other
dc.sourceinfo	number of authors 32; number of pages 163; number of publisher's sheets 10;	pl
dc.subject.en	cyberbullying	pl
dc.subject.en	automatic cyberbullying detection	pl
dc.subject.en	hate-speech	pl
dc.subject.en	natural language processing	pl
dc.subject.en	machine learning	pl
dc.subtype	ConferenceProceedings	pl
dc.title	Results of the PolEval 2019 Shared Task 6 : first dataset and Open Shared Task for automatic cyberbullying detection in Polish Twitter	pl
dc.title.container	Proceedings of the PolEval 2019 Workshop	pl
dc.type	BookSection	pl
dspace.entity.type	Publication

Affiliations

Wydział Studiów Międzynarodowych i Politycznych

Dybała, Paweł

No affiliation

Ptaszynski, Michal

Pieciukiewicz, Agata

Ogrodniczuk, Maciej

Kobyliński, Łukasz

Open Access

Files

ptaszynski_pieciukiewicz_dybala_results_of_the_poleval_2019.pdf (PDF, 251.56 KB)

ptaszynski_pieciukiewicz_dybala_results_of_the_poleval_2019.odt (ODT, 88.77 KB)

License

Except as otherwise noted, this item is licensed under: Creative Commons Attribution 4.0 International (CC BY 4.0)

Collections

Research publications

ROD UJ

Social sciences