Correlation analysis of multi-layer perceptron learning process

Lewandowska, Maria

master

Alternative title

Analiza korelacji procesu uczenia wielowarstwowego perceptronu

Author

Lewandowska Maria

Reviewer

Ochab Jeremi

Janik Romuald

Advisor

Ochab Jeremi

Date of defence

2021-10-12

Keywords in Polish

sztuczne sieci neuronowe, wielowarstwowy perceptron, dynamika sieci neuronowych, głębokie uczenie, korelacje czasowe, korelacje przestrzenne

Keywords in English

artificial neural networks, multi-layer perceptron, neural network dynamics, deep learning, temporal correlations, spatial correlations

Language

English

Abstract in Polish (English translation)

This research was carried out to investigate the learning dynamics of a multi-layer perceptron (MLP). Based on Saxe et al. (2013), an exact analytical formula for the mean squared error of linear MLPs was determined, the resulting equations of motion were derived, and the results were compared with numerical simulations. The presented analytical model correctly predicts the equilibrium state of trained MLPs but does not describe the spatial and temporal correlations within the weights and biases. In the remainder of the work we carried out a qualitative analysis of spatial and temporal correlations in a broad class of linear and nonlinear MLPs. We observed that the feature determining the final performance of the model is mainly the weight initialization, whereas the dynamics of the model parameters is shaped by both the weight initialization and the choice of the optimization criterion. The architecture and size of the model have a negligible effect on both of the above criteria. In well-trained models, the spectra of the autocorrelation and cross-correlation matrices computed for the weights and biases in consecutive layers coincide and are largely equal to numerical zero. In overfitting models, the spectra of the autocorrelation and cross-correlation matrices computed for the weights and biases in consecutive layers are shifted relative to each other, and a significant fraction of the eigenvalues is nonzero. The spectrum of the autocorrelation and cross-correlation matrices and the distribution of the loss function reflect how good a given model is, so they can be used for model evaluation.

Abstract in English

This study investigates the learning dynamics of a multi-layer perceptron (MLP). Building on Saxe et al. (2013), we determined an exact analytical formula for the mean squared error loss function of linear MLPs, derived the resulting equations of motion, and compared the results with numerical simulations. The analytical model correctly predicts the equilibrium state of trained MLPs but fails to describe the spatial and temporal correlations within the weights and biases. We then carried out a qualitative analysis of spatial and temporal correlations in a broad class of linear and nonlinear MLPs. The following effects were observed: the main feature determining the final model performance is the weight initialization, whereas the dynamics of the model parameters is shaped by both the weight initialization and the choice of the optimization criterion. The architecture and the size of the model have an almost negligible influence on both. In well-trained models, the eigenspectra of the autocorrelation and cross-correlation matrices calculated for the weights and biases in consecutive layers overlap and are mostly equal to numerical zero. In contrast, in models that overfit, the eigenvalues are larger and the eigenspectra of the autocorrelation and cross-correlation matrices calculated for the weights and biases in consecutive layers are shifted relative to each other. Finally, the spectrum of the autocorrelation and cross-correlation matrices and the loss function landscape can be used for model evaluation, as they reflect the quality of an MLP model.
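The record does not say how the correlation matrices were constructed, so the following is only a minimal sketch of one plausible reading, not the thesis's procedure. It assumes that each parameter of a layer is recorded at every training step, that the "autocorrelation" matrix is the Pearson correlation matrix of those trajectories within one layer, and that the cross-correlation matrix relates two consecutive layers. Because the cross-correlation matrix is generally rectangular, the sketch reports its singular values rather than eigenvalues. All function names and the toy random-walk data are hypothetical.

```python
import numpy as np


def _standardize(trajectory):
    """Center each parameter trajectory and scale it to unit variance."""
    x = np.asarray(trajectory, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant parameters at zero instead of dividing by 0
    return x / std


def correlation_spectrum(trajectory):
    """Eigenvalues (ascending) of the correlation matrix of one layer.

    trajectory has shape (T, N): N parameters recorded at T training steps.
    """
    z = _standardize(trajectory)
    corr = z.T @ z / z.shape[0]  # symmetric N x N matrix
    return np.linalg.eigvalsh(corr)


def cross_correlation_singular_values(traj_a, traj_b):
    """Singular values (descending) of the cross-correlation of two layers."""
    za = _standardize(traj_a)
    zb = _standardize(traj_b)
    cross = za.T @ zb / za.shape[0]  # N_a x N_b matrix, generally rectangular
    return np.linalg.svd(cross, compute_uv=False)


# Toy data (an assumption): random walks standing in for recorded weights.
rng = np.random.default_rng(0)
layer1 = rng.normal(size=(5, 8)).cumsum(axis=0)  # 5 steps, 8 parameters
layer2 = rng.normal(size=(5, 3)).cumsum(axis=0)  # 5 steps, 3 parameters

spectrum = correlation_spectrum(layer1)
singular_values = cross_correlation_singular_values(layer1, layer2)
print(spectrum)
print(singular_values)
```

With fewer recorded steps than parameters, as in this toy case, the correlation matrix is rank deficient and several eigenvalues are numerically zero for purely algebraic reasons. Any comparison of spectra between models would therefore need trajectories of comparable length.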

dc.abstract.en	This study investigates the learning dynamics of a multi-layer perceptron (MLP). Building on Saxe et al. (2013), we determined an exact analytical formula for the mean squared error loss function of linear MLPs, derived the resulting equations of motion, and compared the results with numerical simulations. The analytical model correctly predicts the equilibrium state of trained MLPs but fails to describe the spatial and temporal correlations within the weights and biases. We then carried out a qualitative analysis of spatial and temporal correlations in a broad class of linear and nonlinear MLPs. The following effects were observed: the main feature determining the final model performance is the weight initialization, whereas the dynamics of the model parameters is shaped by both the weight initialization and the choice of the optimization criterion. The architecture and the size of the model have an almost negligible influence on both. In well-trained models, the eigenspectra of the autocorrelation and cross-correlation matrices calculated for the weights and biases in consecutive layers overlap and are mostly equal to numerical zero. In contrast, in models that overfit, the eigenvalues are larger and the eigenspectra of the autocorrelation and cross-correlation matrices calculated for the weights and biases in consecutive layers are shifted relative to each other. Finally, the spectrum of the autocorrelation and cross-correlation matrices and the loss function landscape can be used for model evaluation, as they reflect the quality of an MLP model.	pl
dc.abstract.pl	Niniejsza praca badawcza została przeprowadzona w celu zbadania dynamiki uczenia się perceptronu wielowarstwowego (MLP). Na podstawie (Saxe \textit{et al}., 2013) wyznaczono dokładny wzór analityczny na błąd średniokwadratowy dla liniowych MLP, wyprowadzono wynikające z niego równania ruchu oraz porównano otrzymane wyniki z symulacjami numerycznymi. Przedstawiony model analityczny poprawnie przewiduje stan równowagi dla wytrenowanych MLP, ale nie opisuje korelacji przestrzennych i czasowych wewnątrz wag i biasów. W dalszej części pracy przeprowadziliśmy jakościową analizę korelacji przestrzennych i czasowych w szerokiej klasie liniowych i nieliniowych MLP. Zaobserwowano, że cechą determinującą końcową skuteczność modelu jest w głównej mierze inicjalizacja wag, natomiast dynamika parametrów modelu jest kształtowana zarówno przez inicjalizację wag, jak i wybór kryterium optymalizacji. Architektura i rozmiar modelu mają znikomy wpływ na oba powyższe kryteria. W dobrze wytrenowanych modelach widma macierzy autokorelacji i korelacji krzyżowej obliczone dla wag i biasów w kolejnych warstwach pokrywają się ze sobą i są w dużej mierze równe zeru numerycznemu; W modelach overfitujących widma macierzy autokorelacji i korelacji krzyżowej obliczone dla wag i biasów w kolejnych warstwach są przesunięte względem siebie, zaś znaczna część wartości własnych jest różna od zera. Widmo macierzy autokorelacji i korelacji krzyżowej oraz rozkład funkcji straty odzwierciedlają jak dobry jest dany model, więc mogą zostać wykorzystane do oceny modelu.	pl
dc.affiliation	Wydział Fizyki, Astronomii i Informatyki Stosowanej	pl
dc.area	obszar nauk ścisłych	pl
dc.contributor.advisor	Ochab, Jeremi	pl
dc.contributor.author	Lewandowska, Maria	pl
dc.contributor.departmentbycode	UJK/WFAIS	pl
dc.contributor.reviewer	Ochab, Jeremi	pl
dc.contributor.reviewer	Janik, Romuald - 100502	pl
dc.date.accessioned	2021-10-14T21:37:46Z
dc.date.available	2021-10-14T21:37:46Z
dc.date.submitted	2021-10-12	pl
dc.fieldofstudy	fizyka	pl
dc.identifier.apd	diploma-150456-228950	pl
dc.identifier.project	APD / O	pl
dc.identifier.uri	https://ruj.uj.edu.pl/xmlui/handle/item/280468
dc.language	eng	pl
dc.subject.en	artificial neural networks, multi-layer perceptron, neural network dynamics, deep learning, temporal correlations, spatial correlations	pl
dc.subject.pl	sztuczne sieci neuronowe, wielowarstwowy perceptron, dynamika sieci neuronowych, głębokie uczenie, korelacje czasowe, korelacje przestrzenne	pl
dc.title	Correlation analysis of multi-layer perceptron learning process	pl
dc.title.alternative	Analiza korelacji procesu uczenia wielowarstwowego perceptronu	pl
dc.type	master	pl
dspace.entity.type	Publication

Affiliations

No affiliation

Lewandowska, Maria

No access

Collections

Masters theses

ROD UJ