Jagiellonian University Repository

A multidimensional analysis of machine learning methods performance in the classification of bioactive compounds


dc.contributor.author Podlewska, Sabina [USOS78046] pl
dc.contributor.author Kurczab, Rafał pl
dc.contributor.author Bojarski, Andrzej J. pl
dc.date.accessioned 2015-01-27T09:39:16Z
dc.date.available 2015-01-27T09:39:16Z
dc.date.issued 2013 pl
dc.identifier.issn 0169-7439 pl
dc.identifier.uri http://ruj.uj.edu.pl/xmlui/handle/item/2764
dc.language eng pl
dc.title A multidimensional analysis of machine learning methods performance in the classification of bioactive compounds pl
dc.type JournalArticle pl
dc.description.physical 89-100 pl
dc.description.additional In the publication, the author Sabina Podlewska is credited as Sabina Smusz. pl
dc.abstract.en A multidimensional analysis of the performance of machine learning methods in the classification of bioactive compounds was carried out. Eleven learning algorithms (including 4 meta-classifiers): J48, RandomForest, NaïveBayes, PART, Hyperpipes, SMO, Ibk, MultiBoostAB, Decorate, FilteredClassifier and Bagging, as implemented in the WEKA package, were evaluated in the classification of ligands of 5 protein targets (cyclooxygenase-2, HIV-1 protease and metalloproteinase inhibitors, M1 and 5-HT1A agonists), using 8 different fingerprints for molecular representation (EStateFP, FP, ExtFP, GraphFP, KlekFP, MACCSFP, PubChemFP, and SubFP). The influence of the number of actives in the training data, as well as the computational cost, expressed as the time required to build a predictive model, was also taken into account. Tests were performed for sets containing a similar number of actives and inactives and also for datasets recreating virtual screening conditions. To facilitate the interpretation of the results, the values of the evaluation parameters (recall, precision, and MCC) were presented in the form of heat maps. The classification of cyclooxygenase-2 inhibitors was almost perfect regardless of the conditions, yet the results for the other targets varied between experiments. The performance of the machine learning methods improved as the number of actives in the training data increased; however, moving to virtual screening conditions was generally accompanied by a significant fall in precision. Some methods, e.g. SMO, Bagging, Decorate and MultiBoostAB, were more stable with respect to changes in classification conditions, whereas for others, such as NaïveBayes, J48 or Hyperpipes, the performance varied strongly between datasets, fingerprints and targets. The application of meta-learning led to an increase in the values of the evaluation parameters. KlekFP was the fingerprint that yielded the best results, although its use came at a high computational cost. On the other hand, EStateFP and SubFP gave worse results, especially in virtual screening-like conditions. pl
dc.subject.en machine learning pl
dc.subject.en virtual screening pl
dc.subject.en classification pl
dc.subject.en drug design pl
dc.description.volume 128 pl
dc.description.points 35 pl
dc.identifier.doi 10.1016/j.chemolab.2013.08.003 pl
dc.identifier.eissn 1873-3239 pl
dc.title.journal Chemometrics and Intelligent Laboratory Systems pl
dc.language.container eng pl
dc.affiliation Faculty of Chemistry : Department of Crystal Chemistry and Crystal Physics pl
dc.subtype Article pl
dc.rights.original no license pl
.pointsMNiSW [2013 A]: 35
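
The evaluation parameters named in the abstract (recall, precision, and MCC) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' code: the function name `classification_metrics` and all counts are hypothetical and chosen only for illustration. It shows why precision can fall when moving from a balanced set to a virtual screening-like set even if recall and the false positive rate stay the same.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return recall, precision and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; zero denominators yield 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "mcc": mcc}


# Illustrative counts only (not data from the paper).
# Balanced set: 100 actives, 100 inactives, 10% false positive rate.
balanced = classification_metrics(tp=90, fp=10, tn=90, fn=10)

# Screening-like set: 100 actives, 10,000 inactives, same 10% false positive rate.
screening = classification_metrics(tp=90, fp=1000, tn=9000, fn=10)

for name, metrics in (("balanced", balanced), ("screening", screening)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this sketch both sets have the same recall, but the screening-like set contains far more inactives, so the same false positive rate produces many more false positives and a much lower precision and MCC.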


