Unit:
Specialty Language Technology, Informatics
Supervisors info:
SUPERVISOR: Αικατερίνη Γκίρτζου, Research Associate, Institute for Language and Speech Processing
EXAMINATION COMMITTEE: Αθανάσιος Κατσαμάνης, Research Director, Institute for Language and Speech Processing
Δημήτριος Γαλάνης, Principal Researcher, Institute for Language and Speech Processing
Original Title:
The Best of Both Worlds? Exploring a Hybrid Approach in RAG
Translated title:
The Best of Both Worlds? Exploring a Hybrid Approach in RAG
Summary:
This study investigates the impact of implementing structured and unstructured data retrieval in Retrieval-Augmented Generation (RAG) systems for Large Language Models (LLMs). The primary objective is to assess whether a HybridRAG approach, which combines DocumentRAG (vector-based retrieval) and GraphRAG (structured knowledge retrieval), can outperform either approach on its own. To this end, a dataset of news articles was collected and processed into both textual and graph-based representations. The study systematically compares three retrieval setups (DocumentRAG, GraphRAG, and HybridRAG), with the standalone LLM, Mistral-7B-v0.3, serving as the baseline. The approaches are evaluated with standard metrics (ROUGE-2, ROUGE-L, and BERTScore). Findings indicate that DocumentRAG consistently outperforms GraphRAG, while HybridRAG shows no significant improvement over DocumentRAG, despite its theoretical advantages. To investigate these findings further, a second experiment was conducted on a selected subset of the data, incorporating graph augmentation and more refined entity extraction, with the aim of improving GraphRAG and, consequently, HybridRAG. Although these modifications led to some improvement in GraphRAG’s performance, HybridRAG still showed no clear advantage over DocumentRAG. The study concludes that while HybridRAG holds promise, the entire pipeline must be further optimized and refined to enhance the LLM’s question-answering capabilities, and that further research is required to achieve the desired improvements. Nevertheless, the experimentation process was both insightful and enjoyable, providing a deeper understanding of RAG in general.
Main subject category:
Technology - Computer science
Keywords:
document rag, graph rag, hybrid rag, question answering, knowledge graphs, graph augmentation, entity extraction
File:
File access is restricted only to the intranet of UoA.
Masters_thesis_eleni_mpatsi.pdf
2 MB