Summary:
Although data guides and influences many human activities, the tools needed to retrieve it, such as Structured Query Language (SQL), pose barriers that make data inaccessible to many users. To lift these barriers, researchers have been working on natural language interfaces that would allow users to access databases solely through natural language.
Natural language interfaces employ Text-to-SQL systems, which translate a user's natural language question into an SQL query that retrieves the data they need. Recent Text-to-SQL systems have adopted deep learning methods with very promising results. At the same time, several challenges remain open, making this an active and flourishing area of research and development. To make real progress in building Text-to-SQL systems, we need to demystify what has been done, understand how and when each approach can be used, and, finally, identify the research challenges ahead of us. We present a detailed taxonomy of neural Text-to-SQL systems that enables a deeper study of every part of such a system. This taxonomy allows a better comparison between different approaches and highlights the specific challenges in each step of the process, thus enabling researchers to better strategize their quest towards the "holy grail" of database accessibility.
However, how can users verify that the generated SQL query matches their intent if they are not familiar with SQL? To tackle this problem, we need a system that can translate the SQL query back to natural language, also known as an SQL-to-Text system. We explore the SQL-to-Text problem, examine its challenges and peculiarities, and present a Transformer-based model that can generate fluent query explanations. Additionally, we look into the difficulties of automatically evaluating the performance of such a system and examine how different metrics behave in the SQL-to-Text setting.
Keywords:
Semantic Parsing, Natural Language Generation, Databases, Deep Learning, Metric Learning, Machine Translation