Evaluating the capabilities of LLMs in geospatial question answering and geospatial reasoning

Postgraduate Thesis uoadl:3463272

Unit:
Specialization: Information and Communication Technologies (ICT)
Informatics
Deposit date:
2025-02-09
Year:
2025
Author:
Karagiannis Evangelos-Emmanouil
Supervisors info:
Manolis Koubarakis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens

Efstathios Hadjiefthymiades, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens

Yannis Panagakis, Associate Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
Evaluating the capabilities of LLMs in geospatial question answering and geospatial reasoning
Languages:
Greek
Translated title:
Evaluating the capabilities of LLMs in geospatial question answering and geospatial reasoning
Summary:
In this thesis, we evaluate LLMs in terms of their ability to answer geospatial
questions correctly. For this evaluation, we used the GeoQuestions1089 benchmark,
which pairs geospatial questions with their ground-truth answers and the SPARQL or
GeoSPARQL queries that produce them. When the LLM's answer is close to identical
to the ground-truth answer, we classify it as (mostly) correct. If the two answers are
not identical but share some common elements, we consider the LLM's answer partially
correct. If they share no common elements, the LLM's answer is classified as incorrect.
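A minimal sketch of this three-way grading scheme, assuming both answers can be compared as sets of normalized elements; the function name and the overlap threshold are illustrative assumptions, not taken from the thesis:

    def classify_answer(llm_answer, gold_answer, high_overlap=0.9):
        """Grade an LLM answer against the benchmark's gold answer.

        Both answers are treated as collections of elements (e.g. entity
        names); the fraction of gold elements the LLM recovered decides
        the label. The 0.9 threshold is an assumption, not from the thesis.
        """
        llm_set = {item.strip().lower() for item in llm_answer}
        gold_set = {item.strip().lower() for item in gold_answer}
        if not gold_set:
            return "incorrect"
        overlap = len(llm_set & gold_set) / len(gold_set)
        if overlap >= high_overlap:   # close to identical
            return "(mostly) correct"
        if overlap > 0.0:             # some common elements
            return "partially correct"
        return "incorrect"            # no common elements

    # Example: the gold answer lists three rivers, the model names two.
    print(classify_answer(["Acheloos", "Pineios"],
                          ["Acheloos", "Pineios", "Evros"]))
    # -> "partially correct"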
The second part of this work concerns the ability of LLMs to understand and process
spatial information, and to deduce relations between objects in two-dimensional space.
More specifically, given certain graphs, we provide the models with the spatial
relations between some of their nodes as input and investigate whether the models can
infer the remaining relations.
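As a toy illustration of this inference task (the thesis does not fix the relation vocabulary here; the cardinal-direction relation and the transitive-closure check below are illustrative assumptions):

    # Two relations are given as input; a third follows by transitivity.
    given = {("A", "B"): "north_of", ("B", "C"): "north_of"}

    def infer(relations):
        """Close the relation set under transitivity of 'north_of'."""
        inferred = dict(relations)
        changed = True
        while changed:
            changed = False
            for (x, y), r1 in list(inferred.items()):
                for (y2, z), r2 in list(inferred.items()):
                    if y == y2 and r1 == r2 == "north_of":
                        if (x, z) not in inferred:
                            inferred[(x, z)] = "north_of"
                            changed = True
        return inferred

    print(infer(given))  # also contains ("A", "C"): "north_of"

A model given only the two input relations can then be asked for the relation between A and C, and its answer checked against this closure.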
After extracting the useful information from GeoQuestions1089 and removing the
questions that involve polygons, we categorize the remaining questions into Binary,
Descriptive, and Quantitative types. This categorization is necessary because each
question type requires a different evaluation methodology, as sketched below.
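To make the distinction concrete, here is a hypothetical per-type dispatch; the individual checks (exact match, numeric tolerance, word overlap) are illustrative stand-ins for the type-specific methodology, not code from the thesis:

    def evaluate(question_type, llm_answer, gold_answer):
        """Dispatch an answer to a type-specific check (illustrative only)."""
        if question_type == "binary":
            # Yes/No questions: normalized exact match.
            return llm_answer.strip().lower() == gold_answer.strip().lower()
        if question_type == "quantitative":
            # Numeric answers: equal within a small relative tolerance.
            gold = float(gold_answer)
            return abs(float(llm_answer) - gold) <= 0.01 * abs(gold)
        if question_type == "descriptive":
            # Free-text answers: require some word overlap with the gold text.
            return bool(set(llm_answer.lower().split()) &
                        set(gold_answer.lower().split()))
        raise ValueError(f"unknown question type: {question_type}")

    print(evaluate("binary", "Yes", " yes "))         # True
    print(evaluate("quantitative", "10.0", "10.05"))  # True, within 1%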
The results show that the models perform well on the first and third question
categories, which are considered easier, but struggle with the second. Specifically,
large models are capable of addressing second-category questions, though not always
successfully. Smaller models have greater difficulty with this category and are more
inconsistent, meaning they may give different answers to the same question.
Main subject category:
Technology - Computer science
Keywords:
Deep Learning, SPARQL, GeoSPARQL, Knowledge Graphs, Large Language Models
Index:
Yes
Number of index pages:
2
Contains images:
Yes
Number of references:
29
Number of pages:
80
Karagiannis_Evangelos_Emmanouil_MSc.pdf (1 MB)