Mathematical properties of the Stochastic Approximation and the Multi-Armed Bandit problem

Postgraduate Thesis uoadl:2944070

Unit:
Specialization in Statistics and Operational Research
Library of the School of Science
Deposit date:
2021-04-26
Year:
2021
Author:
Stamatis Nikolaos
Supervisors info:
Apostolos Burnetas (supervisor), Professor, Department of Mathematics, National and Kapodistrian University of Athens
Konstantinos Milolidakis, Associate Professor, Department of Mathematics, National and Kapodistrian University of Athens
Antonis Economou, Professor, Department of Mathematics, National and Kapodistrian University of Athens
Original Title:
Mathematical properties of the Stochastic Approximation and the Multi-Armed Bandit problem
Languages:
English
Translated title:
Mathematical properties of the Stochastic Approximation and the Multi-Armed Bandit problem
Summary:
Although neural networks have been used extensively for decades, a theoretical explanation of their success remained elusive until recently. In Chapter 2, we present the main results that settled this question, developed mostly in the early 1990s. We prove Cybenko's theorem, which states that feedforward networks with a single hidden layer and a continuous sigmoidal activation function are universal approximators, and we also study some extensions of this result.
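For reference, Cybenko's theorem can be stated as follows (a standard formulation; the thesis's exact notation may differ):

```latex
Let $\sigma$ be a continuous sigmoidal function, i.e. $\sigma(t)\to 1$ as
$t\to+\infty$ and $\sigma(t)\to 0$ as $t\to-\infty$. Then finite sums of the form
\[
  G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^{\top} x + \theta_j\right),
  \qquad \alpha_j, \theta_j \in \mathbb{R},\; w_j \in \mathbb{R}^n,
\]
are dense in $C([0,1]^n)$: for every $f \in C([0,1]^n)$ and every
$\varepsilon > 0$ there exists such a $G$ with
$\sup_{x \in [0,1]^n} \lvert G(x) - f(x) \rvert < \varepsilon$.
```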
Chapter 3 is devoted to stochastic approximation algorithms. These algorithms aim to determine the fixed point of an operator whose values are not known directly but are only revealed perturbed by noise. Building on this theory, we present a proof of convergence for the Q-Learning algorithm. Q-Learning generalizes the successive approximation method of classical dynamic programming to the setting where we have no prior information on the underlying process (transition probabilities and cost functions) and can only draw and observe values from it.
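As a minimal illustration of the stochastic approximation idea (a sketch, not code from the thesis), the Robbins-Monro iteration below estimates the fixed point of the constant operator T(x) = E[X], i.e. the mean of a distribution, from noisy samples alone, using the classic step sizes a_n = 1/n:

```python
import random

def robbins_monro(sample, x0=0.0, n_iter=10_000):
    """Estimate the fixed point of T(x) = E[sample()] via the iteration
    x_{n+1} = x_n + a_n * (X_n - x_n), with step sizes a_n = 1/n."""
    x = x0
    for n in range(1, n_iter + 1):
        noisy_value = sample()       # T(x) observed only through noise
        x += (noisy_value - x) / n   # a_n = 1/n satisfies the Robbins-Monro conditions
    return x

random.seed(0)
# Noisy observations X_n ~ N(2.0, 1.0), so the fixed point is E[X] = 2.0.
estimate = robbins_monro(lambda: random.gauss(2.0, 1.0))
```

With a_n = 1/n this iteration reduces to the running sample average, which converges to the true mean; general step sizes need only satisfy the conditions sum a_n = infinity and sum a_n^2 < infinity.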
In the final chapter, we study the multi-armed bandit problem, a subfield of reinforcement learning in which the goal is to identify the most profitable action among a given set while simultaneously maximizing one's profit. We prove the Lai-Robbins lower bound, which shows that for a certain class of reward distributions there are limits to how quickly one can approach the maximum profit, and we present an algorithm that attains this bound. We conclude the chapter by studying the upper confidence bound algorithm, introduced by Auer et al., which resolves several issues of the Lai-Robbins approach.
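For illustration, here is a minimal sketch of the UCB1 index rule of Auer et al. on two Bernoulli arms (the arm probabilities and horizon are hypothetical, not an experiment from the thesis):

```python
import math
import random

def ucb1(arm_probs, horizon):
    """Run UCB1: pull each arm once, then always pull the arm maximizing
    empirical mean + sqrt(2 ln t / n_i). Returns pull counts per arm."""
    k = len(arm_probs)
    counts = [0] * k    # number of pulls per arm
    means = [0.0] * k   # empirical mean reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts

random.seed(1)
counts = ucb1([0.2, 0.8], horizon=5000)
```

The confidence radius sqrt(2 ln t / n_i) shrinks as an arm is sampled, so suboptimal arms are pulled only logarithmically often, matching the order of the Lai-Robbins lower bound.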
Main subject category:
Science
Keywords:
universal approximation, neural networks, Cybenko's Theorem, stochastic approximation, Robbins-Monro approximation, dynamic programming, Q-Learning, multi-armed bandits, Lai-Robbins bounds
Index:
Yes
Number of index pages:
2
Contains images:
Yes
Number of references:
52
Number of pages:
161
Nikos Stamatis - Thesis (library).pdf (1023 KB)