Wybrane metody klasyfikacyjne

Kozakiewicz, Bartłomiej

Wybrane metody klasyfikacyjne

master

Alternative title

Some methods of classification

Author

Kozakiewicz Bartłomiej

Reviewer

Kościelniak Piotr

Mazur Marcin

Advisor

Kościelniak Piotr

Date of defence

2017-12-12

Keywords in Polish

metody klasyfikacyjne, drzewa klasyfikacyjne, statystyka, AdaBoost, lasy losowe, klasyfikacja, klasyfikator Bayesa

Keywords in English

classification, Naive Bayes, random forest, classification trees, AdaBoost algorithm

Language

Polish

Abstract in Polish

Informatyzacja oraz cyfryzacja życia codziennego, które dokonały się pod koniec XX i na początku XXI wieku, spowodowały powstanie olbrzymich zbiorów danych (populacja klientów, treści wyszukiwane, przesyłane w sieci). Bardzo często obecne systemy informatyczne (programiści) stają przed problemem klasyfikacji owych danych (przyporządkowanie ich do właściwej kategorii). Owe problemy wymusiły rozwój metod klasyfikacji przy użyciu statystyki. Klasyfikacja jest dziś szeroko wykorzystywana w świecie nauki, biznesu, przemysłu czy medycyny. Klasyfikacja polega na możliwie najlepszym rozdzieleniu obserwacji z różnych populacji. W pracy zostaną zaprezentowane teoretyczne podstawy pięciu metod klasyfikacji. Rozdział pierwszy został oparty na notatkach własnych z wykładów oraz książce "Rachunek prawdopodobieństwa dla (prawie) każdego" autorów J. Jakubowski, R. Sztencel. Przedstawione zostały podstawowe zagadnienia rachunku prawdopodobieństwa i statystyki, niezbędne do zrozumienia problemu klasyfikacji. Kolejna część to wprowadzenie do metod klasyfikacji, w której przedstawione zostało zagadnienie klasyfikacji, problem budowy klasyfikatora oraz kryteria porównawcze różnych metod klasyfikacyjnych. Pierwszym z opisanych klasyfikatorów jest naiwny klasyfikator Bayesa, najprostsza z przedstawionych w pracy metod. Został on opisany w rozdziale trzecim, który bazuje na wykładzie dra hab. Tadeusza Pankowskiego dotyczącym klasyfikacji metodą Bayesa. Rozdział czwarty przedstawia metodę opartą na regresji logistycznej. Do jej przedstawienia najpierw zdefiniowano uogólniony model liniowy (książka J. J. Faraway "Extending the linear model with R. Generalized linear, mixed effects and nonparametric regression models"). Następnie, ustalając pewne założenia, został przedstawiony szczególny przypadek powyższego modelu, tj. model regresji logistycznej. Część opisująca metodę drzew klasyfikacyjnych powstała w znacznej mierze w oparciu o pozycję "Classification and regression trees" autorów L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone. Warto wspomnieć, iż jeden z autorów, tj. Leo Breiman, wniósł olbrzymi wkład w dziedzinę uczenia maszynowego oraz rozwój metod klasyfikacji. Ponadto wykorzystano materiały zawarte w książkach P. Cichosza "Systemy uczące się" oraz "Statystyczne systemy uczące się" autorów J. Koronacki, J. Ćwik. Powyższe dwie pozycje są chyba najlepszymi źródłami wiedzy na temat metod klasyfikacji dostępnymi w języku polskim. Podobnie dwa ostatnie rozdziały opierają się na trzech wcześniej przedstawionych pozycjach [3], [12], [2]. Przedstawiony został w nich problem utworzenia najlepszej rodziny klasyfikatorów z wcześniej już zbudowanych (np. metoda lasów losowych korzysta z drzew klasyfikacyjnych). Ponadto w części poświęconej algorytmowi AdaBoost posiłkowano się pracą R. E. Schapire'a "A brief introduction to boosting".

Abstract in English

The computerization and digitization of everyday life that took place at the end of the 20th and the beginning of the 21st century led to the emergence of huge data sets (customer populations, content searched for and transmitted over the network). Today's information systems (and their programmers) very often face the problem of classifying such data, that is, assigning each item to the correct category. These problems drove the development of statistical classification methods. Classification is now widely used in science, business, industry and medicine. It consists in separating observations from different populations as well as possible. This thesis presents the theoretical foundations of five classification methods. The first chapter is based on the author's own lecture notes and on the book "Rachunek prawdopodobieństwa dla (prawie) każdego" ("Probability theory for (almost) everyone") by J. Jakubowski and R. Sztencel. It presents the basic concepts of probability theory and statistics needed to understand the classification problem. The next part is an introduction to classification methods; it presents the classification task, the problem of building a classifier, and criteria for comparing different classification methods. The first classifier described is the naive Bayes classifier, the simplest of the methods presented in the thesis. It is described in the third chapter, which is based on a lecture by dr hab. Tadeusz Pankowski on Bayesian classification. The fourth chapter presents a method based on logistic regression. To introduce it, the generalized linear model is defined first (following the book by J. J. Faraway, "Extending the linear model with R. Generalized linear, mixed effects and nonparametric regression models"). Then, under certain assumptions, a special case of that model, the logistic regression model, is presented. The part describing classification trees is largely based on "Classification and regression trees" by L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone. It is worth mentioning that one of the authors, Leo Breiman, made a tremendous contribution to machine learning and to the development of classification methods. Material from the books "Systemy uczące się" ("Learning systems") by P. Cichosz and "Statystyczne systemy uczące się" ("Statistical learning systems") by J. Koronacki and J. Ćwik was also used. These two books are probably the best sources on classification methods available in Polish. Similarly, the last two chapters are based on the three previously mentioned sources [3], [12], [2]. They present the problem of building the best family of classifiers from classifiers already constructed (for example, the random forest method uses classification trees). In addition, the part devoted to the AdaBoost algorithm draws on R. E. Schapire's "A brief introduction to boosting".

dc.affiliation	Wydział Matematyki i Informatyki	pl
dc.area	obszar nauk ścisłych (exact sciences)	pl
dc.contributor.advisor	Kościelniak, Piotr - 129220	pl
dc.contributor.author	Kozakiewicz, Bartłomiej	pl
dc.contributor.departmentbycode	UJK/WMI2	pl
dc.contributor.reviewer	Kościelniak, Piotr - 129220	pl
dc.contributor.reviewer	Mazur, Marcin - 130444	pl
dc.date.accessioned	2020-07-27T12:01:27Z
dc.date.available	2020-07-27T12:01:27Z
dc.date.submitted	2017-12-12	pl
dc.fieldofstudy	matematyka finansowa (financial mathematics)	pl
dc.identifier.apd	diploma-119479-183103	pl
dc.identifier.project	APD / O	pl
dc.identifier.uri	https://ruj.uj.edu.pl/xmlui/handle/item/224470
dc.language	pol	pl
dc.subject.en	classification, Naive Bayes, random forest, classification trees, AdaBoost algorithm	pl
dc.subject.pl	metody klasyfikacyjne, drzewa klasyfikacyjne, statystyka, AdaBoost, lasy losowe, klasyfikacja, klasyfikator Bayesa	pl
dc.title	Wybrane metody klasyfikacyjne	pl
dc.title.alternative	Some methods of classification	pl
dc.type	master	pl
dspace.entity.type	Publication

No access

Collections

Masters theses
