Accidental exploration through value predictors

2018
journal article
dc.abstract.en
Infinite trajectory length is an almost universal assumption in the theoretical foundations of reinforcement learning. In practice, learning occurs on finite trajectories. In this paper we examine a specific consequence of this disparity, namely a strong bias of the time-bounded every-visit Monte Carlo value estimator. This bias manifests as vastly different learning dynamics for algorithms that use value predictors, and it can encourage or discourage exploration. We investigate these claims theoretically for a one-dimensional random walk and empirically on a number of simple environments. We use GAE as an algorithm involving a value predictor and evolution strategies as a reference point.
dc.affiliation
Faculty of Mathematics and Computer Science (Wydział Matematyki i Informatyki)
dc.contributor.author
Leśniak, Damian - 165389
dc.contributor.author
Kisielewski, Tomasz - 175553
dc.date.accession
2024-02-19
dc.date.accessioned
2021-10-04T16:34:34Z
dc.date.available
2021-10-04T16:34:34Z
dc.date.issued
2018
dc.date.openaccess
0
dc.description.accesstime
at the time of publication
dc.description.additional
Article DOI: 10.4467/20838476SI.18.009.10414 (not active)
dc.description.physical
107-127
dc.description.version
publisher's final version
dc.description.volume
27
dc.identifier.doi
10.4467/20838476SI.18.009.10414
dc.identifier.eissn
2083-8476
dc.identifier.issn
1732-3916
dc.identifier.project
ROD UJ / O
dc.identifier.uri
https://ruj.uj.edu.pl/xmlui/handle/item/279475
dc.identifier.weblink
https://www.ejournals.eu/Schedae-Informaticae/2018/Volume-27/art/13932/
dc.language
eng
dc.language.container
eng
dc.rights
I am adding only a bibliographic description
dc.rights.licence
CC-BY-NC-ND
dc.share.type
open journal
dc.subject.en
reinforcement learning
dc.subject.en
value predictors
dc.subject.en
exploration
dc.subtype
Article
dc.title
Accidental exploration through value predictors
dc.title.journal
Schedae Informaticae
dc.type
JournalArticle
dspace.entity.type
Publication