Mathematical properties of the Stochastic Approximation and the Multi-Armed Bandit problem

Postgraduate Thesis uoadl:2944070

Unit:
Specialization in Statistics and Operational Research
Library of the School of Science
Deposit date:
2021-04-26
Year:
2021
Author:
Stamatis Nikolaos
Supervisors info:
Apostolos Burnetas (supervisor), Professor, Department of Mathematics, National and Kapodistrian University of Athens
Konstantinos Milolidakis, Associate Professor, Department of Mathematics, National and Kapodistrian University of Athens
Antonis Economou, Professor, Department of Mathematics, National and Kapodistrian University of Athens
Original Title:
Mathematical properties of the Stochastic Approximation and the Multi-Armed Bandit problem
Languages:
English
Translated title:
Mathematical properties of the Stochastic Approximation and the Multi-Armed Bandit problem
Summary:
Although neural networks have been used extensively for decades, a theoretical explanation of their success remained elusive until recently. In Chapter 2, we present the main results that settled this question, developed mostly in the early 1990s. We prove Cybenko's theorem, which states that feedforward networks with a single hidden layer and a continuous sigmoidal activation function are universal approximators, and we also study some extensions of this result.
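For reference, Cybenko's theorem can be stated as follows (a standard formulation; the thesis's exact notation may differ):

```latex
Let $\sigma$ be a continuous sigmoidal function, i.e. $\sigma(t)\to 1$ as
$t\to+\infty$ and $\sigma(t)\to 0$ as $t\to-\infty$. Then finite sums of the form
\[
  G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^{\top} x + \theta_j\right),
  \qquad \alpha_j, \theta_j \in \mathbb{R},\; w_j \in \mathbb{R}^n,
\]
are dense in $C([0,1]^n)$: for every $f \in C([0,1]^n)$ and every
$\varepsilon > 0$ there exists such a $G$ with
$\sup_{x \in [0,1]^n} \lvert G(x) - f(x) \rvert < \varepsilon$.
```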
Chapter 3 is devoted to stochastic approximation algorithms. These algorithms aim to determine the fixed point of an operator whose values are not known directly but are only revealed perturbed by noise. Building on this theory, we present a proof of convergence for the Q-Learning algorithm. Q-Learning generalizes the successive approximation method of classical dynamic programming to the setting where we have no prior information on the underlying process (transition probabilities and cost functions) and can only draw and observe values from it.
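As a minimal illustration of the stochastic approximation idea (a sketch, not code from the thesis), the Robbins-Monro iteration below estimates the fixed point of the constant operator T(x) = E[X], i.e. the mean of a distribution, from noisy samples alone, using the classic step sizes a_n = 1/n:

```python
import random

def robbins_monro(sample, x0=0.0, n_iter=10_000):
    """Estimate the fixed point of T(x) = E[sample()] via the iteration
    x_{n+1} = x_n + a_n * (X_n - x_n), with step sizes a_n = 1/n."""
    x = x0
    for n in range(1, n_iter + 1):
        noisy_value = sample()       # T(x) observed only through noise
        x += (noisy_value - x) / n   # a_n = 1/n satisfies the Robbins-Monro conditions
    return x

random.seed(0)
# Noisy observations X_n ~ N(2.0, 1.0), so the fixed point is E[X] = 2.0.
estimate = robbins_monro(lambda: random.gauss(2.0, 1.0))
```

With a_n = 1/n this iteration reduces to the running sample average, which converges to the true mean; general step sizes need only satisfy the conditions sum a_n = infinity and sum a_n^2 < infinity.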
In the final chapter, we study the multi-armed bandit problem, a subfield of reinforcement learning in which the goal is to identify the most profitable action among a given set while simultaneously maximizing one's profit. We prove the Lai-Robbins lower bound, which shows that for a certain class of reward distributions there are limits to how quickly one can approach the maximum profit, and we present an algorithm that attains this bound. We conclude the chapter by studying the upper confidence bound algorithm, introduced by Auer et al., which resolves several issues of the Lai-Robbins approach.
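For illustration, here is a minimal sketch of the UCB1 index rule of Auer et al. on two Bernoulli arms (the arm probabilities and horizon are hypothetical, not an experiment from the thesis):

```python
import math
import random

def ucb1(arm_probs, horizon):
    """Run UCB1: pull each arm once, then always pull the arm maximizing
    empirical mean + sqrt(2 ln t / n_i). Returns pull counts per arm."""
    k = len(arm_probs)
    counts = [0] * k    # number of pulls per arm
    means = [0.0] * k   # empirical mean reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts

random.seed(1)
counts = ucb1([0.2, 0.8], horizon=5000)
```

The confidence radius sqrt(2 ln t / n_i) shrinks as an arm is sampled, so suboptimal arms are pulled only logarithmically often, matching the order of the Lai-Robbins lower bound.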
Main subject category:
Science
Keywords:
universal approximation, neural networks, Cybenko's Theorem, stochastic approximation, Robbins-Monro approximation, dynamic programming, Q-Learning, multi-armed bandits, Lai-Robbins bounds
Index:
Yes
Number of index pages:
2
Contains images:
Yes
Number of references:
52
Number of pages:
161
Nikos Stamatis - Thesis (library).pdf (1023 KB)