Mathematical Models and Algorithms for Contextual Multi-armed Bandit Problems

Postgraduate Thesis uoadl:3356103

Unit:
Specialization in Statistics and Operations Research
Library of the School of Science
Deposit date:
2023-09-18
Year:
2023
Author:
Zacharis Dimitrios
Supervisors info:
Apostolos Burnetas, Professor, Department of Mathematics, NKUA,
Panagiotis Mertikopoulos, Professor, Department of Mathematics, NKUA,
Antonis Economou, Professor, Department of Mathematics, NKUA
Original Title:
Mathematical Models and Algorithms for Contextual Multi-armed Bandit Problems
Languages:
English
Translated title:
Mathematical Models and Algorithms for Contextual Multi-armed Bandit Problems
Summary:
This thesis studies a special class of bandit problems, contextual bandits, and algorithms for the associated learning problems. Contextual bandits belong to the field of reinforcement learning: at each round the algorithm must choose an action based on a context, which encodes information about the current state of the system and possibly observations collected in previous rounds. The goal of the algorithm is to learn a policy that, over time, selects the actions with the greatest expected payoff.
The thesis first presents basic concepts, definitions, examples, and applications related to bandit problems. After reviewing the main results on stochastic and adversarial bandits, it develops the theory and algorithms for contextual bandits, with a focus on the Thompson Sampling algorithm. Simulation studies are also designed to evaluate the Thompson Sampling and LinUCB algorithms on different classes of reinforcement learning problems.
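To illustrate the Thompson Sampling algorithm named in the summary, here is a minimal sketch for the simplest (non-contextual) Bernoulli bandit with Beta priors; the arm success probabilities, horizon, and random seed are illustrative assumptions, not values from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed Bernoulli bandit; arm 2 has the highest mean reward.
true_probs = [0.3, 0.5, 0.8]
n_rounds = 2000

# Beta(1, 1) prior per arm, tracked as (successes + 1, failures + 1).
successes = np.ones(len(true_probs))
failures = np.ones(len(true_probs))

for _ in range(n_rounds):
    # Sample a plausible mean reward for each arm from its posterior
    # and play the arm whose sampled value is largest.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    # Observe a Bernoulli reward and update that arm's posterior.
    reward = rng.random() < true_probs[arm]
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1

# Pull counts per arm (subtracting the prior pseudo-counts).
pulls = successes + failures - 2
print("pulls per arm:", pulls.astype(int))
```

Over time the posterior of the best arm concentrates, so most pulls go to arm 2; the contextual variants studied in the thesis replace the Beta posteriors with posteriors over the parameters of a reward model that depends on the observed context.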
Main subject category:
Science
Keywords:
Bandit problems, regret, contextual bandits, contexts, learner, reward, environment, stochastic bandits, LinUCB, Thompson Sampling, adversarial bandits
Index:
No
Number of index pages:
0
Contains images:
Yes
Number of references:
15
Number of pages:
52
Diploma thesis.pdf (1 MB)