Jagiellonian University Repository

Efficient mixture model for clustering of sparse high dimensional binary data

dc.contributor.author Śmieja, Marek [SAP14005333] pl
dc.contributor.author Hajto, Krzysztof [USOS114286] pl
dc.contributor.author Tabor, Jacek [SAP11017416] pl
dc.date.accessioned 2020-01-28T09:16:55Z
dc.date.available 2020-01-28T09:16:55Z
dc.date.issued 2019 pl
dc.identifier.issn 1384-5810 pl
dc.identifier.uri https://ruj.uj.edu.pl/xmlui/handle/item/147662
dc.language eng pl
dc.rights I grant a license. Attribution 4.0 International (CC BY 4.0) *
dc.rights.uri http://creativecommons.org/licenses/by/4.0/pl/legalcode *
dc.title Efficient mixture model for clustering of sparse high dimensional binary data pl
dc.type JournalArticle pl
dc.description.physical 1583-1624 pl
dc.abstract.en Clustering is one of the fundamental tools for the preliminary analysis of data. While most clustering methods are designed for continuous data, sparse high-dimensional binary representations have become very popular in various domains such as text mining and cheminformatics. Applying classical clustering tools to this type of data usually proves very inefficient, both in terms of computational complexity and in terms of the utility of the results. In this paper we propose a mixture model, SparseMix, for clustering sparse high-dimensional binary data, which connects model-based and centroid-based clustering. Every group is described by a representative and a probability distribution modeling dispersion from this representative. In contrast to classical mixture models based on the EM algorithm, SparseMix is specially designed for processing sparse data, can be efficiently realized by an on-line Hartigan optimization algorithm, and describes every cluster by its most representative vector. We have performed extensive experimental studies on various types of data, which confirmed that SparseMix builds partitions more compatible with the reference grouping than related methods. Moreover, the constructed representatives often better reveal the internal structure of the data. pl
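The abstract describes clusters that are each summarized by a representative vector and optimized with an on-line Hartigan algorithm, which moves one point at a time whenever that lowers the total cost. The sketch below illustrates only that general scheme under simplifying assumptions; it is not SparseMix. In particular, the cost used here (Hamming mismatches to a majority-vote binary representative) is a stand-in chosen for illustration, whereas SparseMix models dispersion from the representative with a probability distribution defined in the paper. The function names `cluster_cost`, `representative`, and `hartigan_binary` are hypothetical. Points are stored sparsely as sets of active attribute indices, and cluster costs are computed from per-cluster attribute counts, so attributes that no member sets are never visited.

```python
from collections import Counter
import random


def cluster_cost(counts, size):
    """Hamming mismatches between a cluster's members and its best binary representative.

    `counts[j]` is the number of members with attribute j set. Choosing the
    representative bit for j optimally costs min(c_j, size - c_j) mismatches;
    attributes that no member sets cost 0, so only stored counts are visited.
    (Illustrative stand-in cost, not the SparseMix cost function.)
    """
    return sum(min(c, size - c) for c in counts.values())


def representative(counts, size):
    """Majority-vote representative: attributes set in more than half of the members."""
    return {j for j, c in counts.items() if 2 * c > size}


def hartigan_binary(points, k, init=None, max_sweeps=100, seed=0):
    """Online (Hartigan-style) clustering of sparse binary rows.

    `points` is a list of sets of active attribute indices. A point is moved
    only when the move strictly lowers the (integer) total cost, so the
    sweeps eventually stop changing anything.
    """
    rng = random.Random(seed)
    labels = list(init) if init is not None else [rng.randrange(k) for _ in points]
    counts = [Counter() for _ in range(k)]
    sizes = [0] * k
    for x, a in zip(points, labels):
        counts[a].update(x)
        sizes[a] += 1

    for _ in range(max_sweeps):
        moved = False
        for i, x in enumerate(points):
            a = labels[i]
            # Saving obtained by taking x out of its current cluster a.
            old_a = cluster_cost(counts[a], sizes[a])
            counts[a].subtract(x)
            saving = old_a - cluster_cost(counts[a], sizes[a] - 1)

            # Pick the cluster with the largest strict net improvement, if any.
            best_b, best_gain = a, 0
            x_counts = Counter(x)
            for b in range(k):
                if b == a:
                    continue
                extra = (cluster_cost(counts[b] + x_counts, sizes[b] + 1)
                         - cluster_cost(counts[b], sizes[b]))
                if saving - extra > best_gain:
                    best_b, best_gain = b, saving - extra

            # Re-insert x: into best_b, or back into a if no move improves the cost.
            counts[best_b].update(x)
            sizes[a] -= 1
            sizes[best_b] += 1
            if best_b != a:
                labels[i] = best_b
                moved = True
        if not moved:
            break

    reps = [representative(counts[c], sizes[c]) for c in range(k)]
    return labels, reps
```

For example, `hartigan_binary([{0}, {0}, {1}, {1}], k=2, init=[0, 1, 0, 1])` returns `([1, 1, 0, 0], [{1}, {0}])`: starting from two mixed clusters, two single-point moves produce pure clusters with representatives `{1}` and `{0}`. For clarity the sketch recomputes each candidate cluster's cost from its counts; a practical implementation would update costs incrementally.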
dc.description.volume 33 pl
dc.identifier.doi 10.1007/s10618-019-00635-1 pl
dc.identifier.eissn 1573-756X pl
dc.title.journal Data Mining and Knowledge Discovery pl
dc.language.container eng pl
dc.affiliation Faculty of Mathematics and Computer Science : Institute of Computer Science and Computational Mathematics pl
dc.subtype Article pl
dc.rights.original CC-BY; other; final publisher's version; at the time of publication; 0 pl
dc.identifier.project 2016/21/D/ST6/00980 pl
dc.identifier.project 2017/25/B/ST6/01271 pl
dc.identifier.project ROD UJ / OP pl
.pointsMNiSW [2019 A]: 140


Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International (CC BY 4.0).