Deep-Hedge MCCFR: An algorithm for solving imperfect information games.

Ziemiński, Marcin

Simple view

Full metadata view

Authors

Statistics

Deep-Hedge MCCFR: An algorithm for solving imperfect information games.

master

Alternative title

Deep-Hedge MCCFR: Algorytm rozwiązywania gier z niepełną informacją

Author

Ziemiński Marcin

Reviewer

Wrona Michał

Walczak Bartosz

Advisor

Walczak Bartosz

Date of defence

2020-10-29

Keywords in Polish

teoria gier, reinforcement learning, counterfactual regret minimization, poker, deep learning

Keywords in English

game theory, reinforcement learning, counterfactual regret minimization, poker, deep learning

Language

English

Abstract in Polish

W ostatnim czasie poczyniono znaczne postępy w dziedzinie algorytmów uczących się rozwiązywać trudne problemy decyzyjne. Dominujące rozwiązania oparte o deep reinforcement learning (Deep RL) okazały się niezwykle skuteczne, pokonując najlepszych ludzkich graczy w gry takie jak Go, StarCraft II lub te pochodzące z Atari. Algorytmy te dostosowują swoje zachowania na podstawie zgromadzonego doświadczenia i otrzymywanych sygnałów.Jednak z drugiej strony, metody te nie są dobrze dostosowane do środowisk wieloagentowych, które z natury są niestacjonarne. Agenci funkcjonujący w takim kontekście równolegle kształtują swoje strategie, wpływając na zmienny charakter całego środowiska. Ponadto, działają oni w realiach niepełnej informacji, gdyż nie mają wglądu w wiedzę, do której dostęp mają wyłącznie ich przeciwnicy. To wszystko sprawia, że problem optymalizacji jest trudniejszy, ale przy tym podatny na analizę z punktu widzenia teorii gier.Teoria gier znajduje się u podstaw dominujących algorytmów dla gier o niepełnej informacji. Program o nazwie Pluribus, bazujący na algorytmie Counterfactual Regret Minimization (CFR), pokonał czołowych profesjonalnych graczy w sześcioosobowym No-Limit Texas Hold'em. CFR to adaptacyjny algorytm, który agreguje informacje dla każdego stanu gry w sposób tabelaryczny, co służy do usprawnienia strategii w następujących po sobie iteracjach. Jednak zastosowanie tego algorytmu w przypadku gier o dużym rozmiarze wymaga szerokiej wiedzy eksperckiej i przeszukiwania drzewa gry w czasie wykonania.W tej pracy staramy się połączyć dwa powyższe paradygmaty. Proponujemy rozwiązanie oparte o CFR, które nawiązuje do rozwoju dokonanego w dziedzinie, którą jest deep learning. Nasz algorytm, który nazywamy Deep-Hedge MCCFR, modeluje strategie poszczególnych agentów przy użyciu sieci neuronowych. Strategie te są ulepszane dzięki doświadczeniu zdobytemu podczas symulowanej gry. Pokazujemy, że proponowany algorytm osiąga w praktyce dobre wyniki dla różnych gier, nie wymagając przy tym wiedzy eksperckiej.

Abstract in English

There have been much progress in the area of training agents to tackle difficult sequential decision-making problems. The dominating solutions from the realm of deep reinforcement learning (Deep RL) proved to be extremely successful surpassing the top human players at Atari games or Go. These algorithms adapt their behaviours directly from their experience and received rewards. But on the downside, the Deep RL methods are not well suited to multi-agent environments, which are non-stationary by nature, as the probability distribution over possible outcomes may change over time. The agents within the same environment learn their strategies in parallel and modify their behaviour. What is more, they may have access to private information, concealed from the others. This makes the optimization problem more difficult, but at the same time amenable to game-theoretic analysis. Game theory is at the centre of the prevailing approaches for solving imperfect information competitive games. The Counterfactual Regret Minimization (CFR) algorithm was the base for the poker AI called Pluribus, which defeated the best human professionals in six-player no-limit Texas hold'em. CFR is an adaptive tabular algorithm, which means that it aggregates information per each state of the game individually and uses it to iteratively adapt the strategy. However, in order to be feasible for large games, it requires a significant portion of domain-specific knowledge and a look-ahead search during the execution. In this work we try to combine the two aforementioned paradigms. We propose an algorithm founded on Counterfactual Regret Minimization, which utilizes the advancements of deep learning. Our method, called Deep-Hedge MCCFR, uses neural networks to model the agents' strategies. The strategies are improved through the experience gathered during self-play. We show that the algorithm does not require domain expertise and is applicable to various scenarios in unmodified form.

dc.abstract.en	There have been much progress in the area of training agents to tackle difficult sequential decision-making problems. The dominating solutions from the realm of deep reinforcement learning (Deep RL) proved to be extremely successful surpassing the top human players at Atari games or Go. These algorithms adapt their behaviours directly from their experience and received rewards. But on the downside, the Deep RL methods are not well suited to multi-agent environments, which are non-stationary by nature, as the probability distribution over possible outcomes may change over time. The agents within the same environment learn their strategies in parallel and modify their behaviour. What is more, they may have access to private information, concealed from the others. This makes the optimization problem more difficult, but at the same time amenable to game-theoretic analysis. Game theory is at the centre of the prevailing approaches for solving imperfect information competitive games. The Counterfactual Regret Minimization (CFR) algorithm was the base for the poker AI called Pluribus, which defeated the best human professionals in six-player no-limit Texas hold'em. CFR is an adaptive tabular algorithm, which means that it aggregates information per each state of the game individually and uses it to iteratively adapt the strategy. However, in order to be feasible for large games, it requires a significant portion of domain-specific knowledge and a look-ahead search during the execution. In this work we try to combine the two aforementioned paradigms. We propose an algorithm founded on Counterfactual Regret Minimization, which utilizes the advancements of deep learning. Our method, called Deep-Hedge MCCFR, uses neural networks to model the agents' strategies. The strategies are improved through the experience gathered during self-play. We show that the algorithm does not require domain expertise and is applicable to various scenarios in unmodified form.	pl
dc.abstract.pl	W ostatnim czasie poczyniono znaczne postępy w dziedzinie algorytmów uczących się rozwiązywać trudne problemy decyzyjne. Dominujące rozwiązania oparte o deep reinforcement learning (Deep RL) okazały się niezwykle skuteczne, pokonując najlepszych ludzkich graczy w gry takie jak Go, StarCraft II lub te pochodzące z Atari. Algorytmy te dostosowują swoje zachowania na podstawie zgromadzonego doświadczenia i otrzymywanych sygnałów.Jednak z drugiej strony, metody te nie są dobrze dostosowane do środowisk wieloagentowych, które z natury są niestacjonarne. Agenci funkcjonujący w takim kontekście równolegle kształtują swoje strategie, wpływając na zmienny charakter całego środowiska. Ponadto, działają oni w realiach niepełnej informacji, gdyż nie mają wglądu w wiedzę, do której dostęp mają wyłącznie ich przeciwnicy. To wszystko sprawia, że problem optymalizacji jest trudniejszy, ale przy tym podatny na analizę z punktu widzenia teorii gier.Teoria gier znajduje się u podstaw dominujących algorytmów dla gier o niepełnej informacji. Program o nazwie Pluribus, bazujący na algorytmie Counterfactual Regret Minimization (CFR), pokonał czołowych profesjonalnych graczy w sześcioosobowym No-Limit Texas Hold'em. CFR to adaptacyjny algorytm, który agreguje informacje dla każdego stanu gry w sposób tabelaryczny, co służy do usprawnienia strategii w następujących po sobie iteracjach. Jednak zastosowanie tego algorytmu w przypadku gier o dużym rozmiarze wymaga szerokiej wiedzy eksperckiej i przeszukiwania drzewa gry w czasie wykonania.W tej pracy staramy się połączyć dwa powyższe paradygmaty. Proponujemy rozwiązanie oparte o CFR, które nawiązuje do rozwoju dokonanego w dziedzinie, którą jest deep learning. Nasz algorytm, który nazywamy Deep-Hedge MCCFR, modeluje strategie poszczególnych agentów przy użyciu sieci neuronowych. Strategie te są ulepszane dzięki doświadczeniu zdobytemu podczas symulowanej gry. Pokazujemy, że proponowany algorytm osiąga w praktyce dobre wyniki dla różnych gier, nie wymagając przy tym wiedzy eksperckiej.	pl
dc.affiliation	Wydział Matematyki i Informatyki	pl
dc.area	obszar nauk ścisłych	pl
dc.contributor.advisor	Walczak, Bartosz	pl
dc.contributor.author	Ziemiński, Marcin	pl
dc.contributor.departmentbycode	UJK/WMI2	pl
dc.contributor.reviewer	Wrona, Michał	pl
dc.contributor.reviewer	Walczak, Bartosz	pl
dc.date.accessioned	2020-11-08T23:33:28Z
dc.date.available	2020-11-08T23:33:28Z
dc.date.submitted	2020-10-29	pl
dc.fieldofstudy	informatyka analityczna	pl
dc.identifier.apd	diploma-145774-143879	pl
dc.identifier.project	APD / O	pl
dc.identifier.uri	https://ruj.uj.edu.pl/xmlui/handle/item/253187
dc.language	eng	pl
dc.subject.en	game theory, reinforcement learning, counterfactual regret minimization, poker, deep learning	pl
dc.subject.pl	teoria gier, reinforcement learning, counterfactual regret minimization, poker, deep learning	pl
dc.title	Deep-Hedge MCCFR: An algorithm for solving imperfect information games.	pl
dc.title.alternative	Deep-Hedge MCCFR: Algorytm rozwiązywania gier z niepełną informacją	pl
dc.type	master	pl
dspace.entity.type	Publication

dc.abstract.enpl

There have been much progress in the area of training agents to tackle difficult sequential decision-making problems. The dominating solutions from the realm of deep reinforcement learning (Deep RL) proved to be extremely successful surpassing the top human players at Atari games or Go. These algorithms adapt their behaviours directly from their experience and received rewards. But on the downside, the Deep RL methods are not well suited to multi-agent environments, which are non-stationary by nature, as the probability distribution over possible outcomes may change over time. The agents within the same environment learn their strategies in parallel and modify their behaviour. What is more, they may have access to private information, concealed from the others. This makes the optimization problem more difficult, but at the same time amenable to game-theoretic analysis. Game theory is at the centre of the prevailing approaches for solving imperfect information competitive games. The Counterfactual Regret Minimization (CFR) algorithm was the base for the poker AI called Pluribus, which defeated the best human professionals in six-player no-limit Texas hold'em. CFR is an adaptive tabular algorithm, which means that it aggregates information per each state of the game individually and uses it to iteratively adapt the strategy. However, in order to be feasible for large games, it requires a significant portion of domain-specific knowledge and a look-ahead search during the execution. In this work we try to combine the two aforementioned paradigms. We propose an algorithm founded on Counterfactual Regret Minimization, which utilizes the advancements of deep learning. Our method, called Deep-Hedge MCCFR, uses neural networks to model the agents' strategies. The strategies are improved through the experience gathered during self-play. We show that the algorithm does not require domain expertise and is applicable to various scenarios in unmodified form.

dc.abstract.plpl

W ostatnim czasie poczyniono znaczne postępy w dziedzinie algorytmów uczących się rozwiązywać trudne problemy decyzyjne. Dominujące rozwiązania oparte o deep reinforcement learning (Deep RL) okazały się niezwykle skuteczne, pokonując najlepszych ludzkich graczy w gry takie jak Go, StarCraft II lub te pochodzące z Atari. Algorytmy te dostosowują swoje zachowania na podstawie zgromadzonego doświadczenia i otrzymywanych sygnałów.Jednak z drugiej strony, metody te nie są dobrze dostosowane do środowisk wieloagentowych, które z natury są niestacjonarne. Agenci funkcjonujący w takim kontekście równolegle kształtują swoje strategie, wpływając na zmienny charakter całego środowiska. Ponadto, działają oni w realiach niepełnej informacji, gdyż nie mają wglądu w wiedzę, do której dostęp mają wyłącznie ich przeciwnicy. To wszystko sprawia, że problem optymalizacji jest trudniejszy, ale przy tym podatny na analizę z punktu widzenia teorii gier.Teoria gier znajduje się u podstaw dominujących algorytmów dla gier o niepełnej informacji. Program o nazwie Pluribus, bazujący na algorytmie Counterfactual Regret Minimization (CFR), pokonał czołowych profesjonalnych graczy w sześcioosobowym No-Limit Texas Hold'em. CFR to adaptacyjny algorytm, który agreguje informacje dla każdego stanu gry w sposób tabelaryczny, co służy do usprawnienia strategii w następujących po sobie iteracjach. Jednak zastosowanie tego algorytmu w przypadku gier o dużym rozmiarze wymaga szerokiej wiedzy eksperckiej i przeszukiwania drzewa gry w czasie wykonania.W tej pracy staramy się połączyć dwa powyższe paradygmaty. Proponujemy rozwiązanie oparte o CFR, które nawiązuje do rozwoju dokonanego w dziedzinie, którą jest deep learning. Nasz algorytm, który nazywamy Deep-Hedge MCCFR, modeluje strategie poszczególnych agentów przy użyciu sieci neuronowych. Strategie te są ulepszane dzięki doświadczeniu zdobytemu podczas symulowanej gry. Pokazujemy, że proponowany algorytm osiąga w praktyce dobre wyniki dla różnych gier, nie wymagając przy tym wiedzy eksperckiej.

dc.affiliationpl

Wydział Matematyki i Informatyki

dc.areapl

obszar nauk ścisłych

dc.contributor.advisorpl

Walczak, Bartosz

dc.contributor.authorpl

Ziemiński, Marcin

dc.contributor.departmentbycodepl

UJK/WMI2

dc.contributor.reviewerpl

Wrona, Michał

dc.contributor.reviewerpl

Walczak, Bartosz

dc.date.accessioned

2020-11-08T23:33:28Z

dc.date.available

2020-11-08T23:33:28Z

dc.date.submittedpl

2020-10-29

dc.fieldofstudypl

informatyka analityczna

dc.identifier.apdpl

diploma-145774-143879

dc.identifier.projectpl

APD / O

dc.identifier.uri

https://ruj.uj.edu.pl/xmlui/handle/item/253187

dc.languagepl

eng

dc.subject.enpl

game theory, reinforcement learning, counterfactual regret minimization, poker, deep learning

dc.subject.plpl

teoria gier, reinforcement learning, counterfactual regret minimization, poker, deep learning

dc.titlepl

Deep-Hedge MCCFR: An algorithm for solving imperfect information games.

dc.title.alternativepl

Deep-Hedge MCCFR: Algorytm rozwiązywania gier z niepełną informacją

dc.typepl

master

dspace.entity.type

Publication

Affiliations

No affiliation

Ziemiński, Marcin

Wrona, Michał

Walczak, Bartosz

* The migration of download and view statistics prior to the date of April 8, 2024 is in progress.

Views

46 Views per month

Views per city

Wroclaw

9

Nomimachi

6

Brasov

3

Dublin

1

Edinburgh

1

Edmonton

1

Falkenstein

1

Izumi-honcho

1

Krakow

1

La Boissière-des-Landes

1

No access

Collections

Masters theses

ROD UJ