A multidimensional analysis of machine learning methods performance in the classification of bioactive compounds

2013
journal article
article
28
cris.lastimport.wos: 2024-04-09T18:44:23Z
dc.abstract.en: A multidimensional analysis of machine learning methods performance in the classification of bioactive compounds was carried out. Eleven learning algorithms (including 4 meta-classifiers): J48, RandomForest, NaïveBayes, PART, Hyperpipes, SMO, Ibk, MultiBoostAB, Decorate, FilteredClassifier and Bagging, implemented in the WEKA package, were evaluated in the classification of ligands of 5 protein targets (cyclooxygenase-2, HIV-1 protease and metalloproteinase inhibitors, M1 and 5-HT1A agonists), using 8 different fingerprints for molecular representation (EStateFP, FP, ExtFP, GraphFP, KlekFP, MACCSFP, PubChemFP, and SubFP). The influence of the number of actives in the training data, as well as the computational expense expressed as the time required to build a predictive model, was also taken into account. Tests were performed for sets containing a similar number of actives and inactives and also for datasets recreating virtual screening conditions. To facilitate the interpretation of results, the values of the evaluation parameters (recall, precision, and MCC) were presented in the form of heat maps. The classification of cyclooxygenase-2 inhibitors was almost perfect regardless of the conditions, yet the results for the rest of the targets varied between experiments. The performance of machine learning methods was improved by increasing the number of actives in the training data; however, moving to virtual screening conditions was generally associated with a significant fall in precision. Some methods, e.g. SMO, Bagging, Decorate and MultiBoostAB, were more stable under changes in classification conditions, whereas the performance of others, such as NaïveBayes, J48 or Hyperpipes, varied strongly between datasets, fingerprints and targets. The application of meta-learning led to an increase in the values of the evaluation parameters. KlekFP was the fingerprint that yielded the best results, although its use came at a great computational expense. On the other hand, EStateFP and SubFP gave worse results, especially in virtual screening-like conditions.
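The three evaluation parameters named in the abstract (recall, precision, and MCC) can all be derived from a confusion matrix. The following is a minimal sketch, not the authors' WEKA workflow: the function name `classification_metrics` and all counts are hypothetical, chosen only to illustrate how precision can drop when the same recall and false-positive rate are applied to a set with many more inactives than actives, as in virtual screening-like conditions.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (recall, precision, MCC) from confusion-matrix counts.

    Hypothetical helper for illustration; zero denominators yield 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return recall, precision, mcc


# Made-up counts, NOT taken from the study.
# Balanced set: 100 actives, 100 inactives, 10% false-positive rate.
balanced = classification_metrics(tp=90, fp=10, tn=90, fn=10)

# Screening-like set: 100 actives, 10,000 inactives, same 10% false-positive rate.
screening = classification_metrics(tp=90, fp=1000, tn=9000, fn=10)

for name, (recall, precision, mcc) in [("balanced", balanced), ("screening", screening)]:
    print(f"{name}: recall={recall:.3f} precision={precision:.3f} MCC={mcc:.3f}")
```

In this sketch recall is identical in both cases, while precision and MCC are lower for the imbalanced set, because the false positives drawn from the large pool of inactives outnumber the true positives.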
dc.affiliation: Wydział Chemii : Zakład Krystalochemii i Krystalofizyki
dc.contributor.author: Podlewska, Sabina - 149058
dc.contributor.author: Kurczab, Rafał
dc.contributor.author: Bojarski, Andrzej J.
dc.date.accessioned: 2015-01-27T09:39:16Z
dc.date.available: 2015-01-27T09:39:16Z
dc.date.issued: 2013
dc.description.additional: In the publication, the author Sabina Podlewska is credited as Sabina Smusz.
dc.description.physical: 89-100
dc.description.points: 35
dc.description.volume: 128
dc.identifier.doi: 10.1016/j.chemolab.2013.08.003
dc.identifier.eissn: 1873-3239
dc.identifier.issn: 0169-7439
dc.identifier.uri: http://ruj.uj.edu.pl/xmlui/handle/item/2764
dc.language: eng
dc.language.container: eng
dc.rights.licence: no licence
dc.subject.en: machine learning
dc.subject.en: virtual screening
dc.subject.en: classification
dc.subject.en: drug design
dc.subtype: Article
dc.title: A multidimensional analysis of machine learning methods performance in the classification of bioactive compounds
dc.title.journal: Chemometrics and Intelligent Laboratory Systems
dc.type: JournalArticle
dspace.entity.type: Publication