Columns Occurrences Graph to Improve Column Prediction in Deep Learning NLIDB

Abbas, Shanza; Khan, Muhammad Umair; Lee, Scott Uk-Jin; Abbas, Asad

doi:10.3390/app112412116

Full metadata record

DC Field	Value	Language
dc.contributor.author	Abbas, Shanza	-
dc.contributor.author	Khan, Muhammad Umair	-
dc.contributor.author	Lee, Scott Uk-Jin	-
dc.contributor.author	Abbas, Asad	-
dc.date.accessioned	2022-07-18T01:27:31Z	-
dc.date.available	2022-07-18T01:27:31Z	-
dc.date.issued	2021-12	-
dc.identifier.issn	2076-3417	-
dc.identifier.uri	https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/108067	-
dc.description.abstract	Natural language interfaces to databases (NLIDB) have been a research topic for a decade. Significant data collections are available in the form of databases. To utilize them for research purposes, a system that can translate a natural language query into a structured one can make a huge difference. Efforts toward such systems have been made with pipelining methods for more than a decade. Pipelined NLIDB systems integrate natural language processing techniques with data science methods. With significant advancements in machine learning and natural language processing, NLIDB with deep learning has emerged as a new research trend in this area. Deep learning has shown potential for rapid growth and improvement in text-to-SQL tasks. In deep learning NLIDB, closing the semantic gap in predicting users' intended columns has arisen as one of the critical and fundamental problems in this research field. Contributions toward this issue have consisted of preprocessing input features and encoding schema elements ahead of the target model so that they have more impact on it. Despite significant work on this problem, it remains one of the critical issues in developing NLIDB. Working towards closing the semantic gap between user intention and predicted columns, we present an approach for deep learning text-to-SQL tasks that includes occurrence scores of previously used columns as an additional input feature. Because overall exact match accuracy depends significantly on column prediction, improving column prediction accuracy can also improve it. For this purpose, we extract query fragments from previous queries and obtain the columns' occurrence and co-occurrence scores. These column occurrence and co-occurrence scores are processed as input features for the encoder-decoder-based text-to-SQL model. These scores contribute, as a factor, the probability that columns and tables have already been used together in the query history. We experimented with our approach on the currently popular text-to-SQL dataset Spider. Spider is a complex dataset containing multiple databases. This dataset includes query-question pairs along with schema information. We compared our exact match accuracy with that of a base model using the same training and test data splits. Our approach outperformed the base model's accuracy, and accuracy was further boosted in experiments with the pretrained language model BERT.	-
dc.format.extent	14	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	MDPI	-
dc.title	Columns Occurrences Graph to Improve Column Prediction in Deep Learning NLIDB	-
dc.type	Article	-
dc.publisher.location	Switzerland	-
dc.identifier.doi	10.3390/app112412116	-
dc.identifier.scopusid	2-s2.0-85121745494	-
dc.identifier.wosid	000735495600001	-
dc.identifier.bibliographicCitation	Applied Sciences-Basel, v.11, no.24, pp 1 - 14	-
dc.citation.title	Applied Sciences-Basel	-
dc.citation.volume	11	-
dc.citation.number	24	-
dc.citation.startPage	1	-
dc.citation.endPage	14	-
dc.type.docType	Article	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Chemistry	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Materials Science	-
dc.relation.journalResearchArea	Physics	-
dc.relation.journalWebOfScienceCategory	Chemistry, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Engineering, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Materials Science, Multidisciplinary	-
dc.relation.journalWebOfScienceCategory	Physics, Applied	-
dc.subject.keywordAuthor	deep learning	-
dc.subject.keywordAuthor	text-to-SQL	-
dc.subject.keywordAuthor	natural language processing	-
dc.subject.keywordAuthor	database	-
dc.subject.keywordAuthor	machine learning	-
dc.subject.keywordAuthor	machine translation	-
dc.identifier.url	https://www.mdpi.com/2076-3417/11/24/12116	-
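
The abstract describes extracting query fragments from previous queries and deriving column occurrence and co-occurrence scores. The following is a minimal sketch of that idea, not the authors' implementation: the function name `column_scores`, the regex-based tokenization, the normalization by the number of queries, and the sample query history are all assumptions made for illustration. The pair-keyed dictionary can be read as the weighted edges of a column occurrences graph.

```python
import re
from collections import Counter
from itertools import combinations


def column_scores(queries, schema_columns):
    """Score columns by how often they occur, alone and in pairs, in past queries.

    Hypothetical helper: the scoring scheme (fraction of queries that use a
    column or a column pair) is an assumption, not taken from the paper.
    """
    known = {c.lower() for c in schema_columns}
    occurrences = Counter()
    co_occurrences = Counter()
    for query in queries:
        # Crude fragment extraction: identifier-like tokens that name a known column.
        tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", query.lower()))
        used = sorted(tokens & known)
        occurrences.update(used)
        # Each unordered pair of columns used in the same query is one graph edge.
        co_occurrences.update(combinations(used, 2))
    total = len(queries) or 1  # avoid division by zero on an empty history
    occ_scores = {col: n / total for col, n in occurrences.items()}
    co_scores = {pair: n / total for pair, n in co_occurrences.items()}
    return occ_scores, co_scores


# Illustrative query history (invented for this sketch).
history = [
    "SELECT name, age FROM singer WHERE age > 30",
    "SELECT name FROM singer ORDER BY age",
    "SELECT country FROM singer",
]
occ, co = column_scores(history, ["name", "age", "country"])
```

In this sketch a column counts at most once per query, and pairs are stored in sorted order so that `("age", "name")` and `("name", "age")` map to the same edge. How such scores are actually fed to the encoder-decoder model is not specified in this record.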

Appears in Collections: COLLEGE OF COMPUTING > ERICA School of Computer Science > 1. Journal Articles

Related Researcher

Lee, Scott Uk Jin: ERICA College of Computing (ERICA School of Computer Science)