Detailed Information

Cited 4 times in Web of Science · Cited 7 times in Scopus

A Deep-Learned Embedding Technique for Categorical Features Encoding

Full metadata record
dc.contributor.author: Dahouda, Mwamba Kasongo
dc.contributor.author: Joe, Inwhee
dc.date.accessioned: 2022-07-06T16:01:47Z
dc.date.available: 2022-07-06T16:01:47Z
dc.date.created: 2021-11-22
dc.date.issued: 2021-08
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/141386
dc.description.abstract: Many machine learning algorithms and almost all deep learning architectures are incapable of processing plain text in its raw form: their inputs must be numerical in order to solve classification or regression problems. Hence, categorical variables must be encoded into numerical values using encoding techniques. Categorical features are common and often of high cardinality, and one-hot encoding in such circumstances yields very high-dimensional vector representations, raising memory and computability concerns for machine learning models. This paper proposes a deep-learned embedding technique for encoding categorical features on categorical datasets. Our technique is a distributed representation for categorical features in which each category is mapped to a distinct vector whose properties are learned while training a neural network. First, we create a data vocabulary that includes only categorical data; then we use word tokenization to treat each categorical value as a single word. After that, feature learning is introduced to map all of the categorical data from the vocabulary to word vectors. Three different datasets provided by the University of California, Irvine (UCI) are used for training. The experimental results show that the proposed technique achieves an F1 score of 89%, compared with 71% for one-hot encoding, with a long short-term memory (LSTM) model. Moreover, the deep-learned embedding technique uses less memory and generates fewer features than one-hot encoding.
dc.language: English
dc.language.iso: en
dc.publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.title: A Deep-Learned Embedding Technique for Categorical Features Encoding
dc.type: Article
dc.contributor.affiliatedAuthor: Joe, Inwhee
dc.identifier.doi: 10.1109/ACCESS.2021.3104357
dc.identifier.scopusid: 2-s2.0-85113332296
dc.identifier.wosid: 000686754800001
dc.identifier.bibliographicCitation: IEEE ACCESS, v.9, pp.114381 - 114391
dc.relation.isPartOf: IEEE ACCESS
dc.citation.title: IEEE ACCESS
dc.citation.volume: 9
dc.citation.startPage: 114381
dc.citation.endPage: 114391
dc.type.rims: ART
dc.type.docType: Article
dc.description.journalClass: 1
dc.description.isOpenAccess: Y
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalResearchArea: Engineering
dc.relation.journalResearchArea: Telecommunications
dc.relation.journalWebOfScienceCategory: Computer Science, Information Systems
dc.relation.journalWebOfScienceCategory: Engineering, Electrical & Electronic
dc.relation.journalWebOfScienceCategory: Telecommunications
dc.subject.keywordPlus: Clustering algorithms
dc.subject.keywordPlus: Deep learning
dc.subject.keywordPlus: Embeddings
dc.subject.keywordPlus: Encoding (symbols)
dc.subject.keywordPlus: Learning systems
dc.subject.keywordPlus: Long short-term memory
dc.subject.keywordPlus: Signal encoding
dc.subject.keywordPlus: State assignment
dc.subject.keywordPlus: Vectors
dc.subject.keywordPlus: Categorical datasets
dc.subject.keywordPlus: Categorical features
dc.subject.keywordPlus: Categorical variables
dc.subject.keywordPlus: Distributed representation
dc.subject.keywordPlus: Embedding technique
dc.subject.keywordPlus: Learning architectures
dc.subject.keywordPlus: Machine learning models
dc.subject.keywordPlus: University of California
dc.subject.keywordPlus: Learning algorithms
dc.subject.keywordAuthor: Encoding
dc.subject.keywordAuthor: Numerical models
dc.subject.keywordAuthor: Machine learning
dc.subject.keywordAuthor: Data models
dc.subject.keywordAuthor: Training
dc.subject.keywordAuthor: Biological neural networks
dc.subject.keywordAuthor: Computational modeling
dc.subject.keywordAuthor: Data preprocessing
dc.subject.keywordAuthor: categorical variables
dc.subject.keywordAuthor: natural language processing
dc.subject.keywordAuthor: machine learning
dc.identifier.url: https://ieeexplore.ieee.org/document/9512057
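The pipeline described in the abstract (build a vocabulary of categorical values, tokenize each value to an integer id, then learn an embedding vector for each category while training a neural model) can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the data, embedding dimension, and training loop are assumptions, and plain NumPy stands in for the deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical categorical feature and binary labels (the paper uses UCI datasets).
colors = ["red", "green", "blue", "green", "red", "blue", "red", "green"]
labels = np.array([0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0])

# Steps 1-2: build a vocabulary of categorical values and tokenize each one
# to an integer id, treating each categorical value as a single "word".
vocab = {c: i for i, c in enumerate(sorted(set(colors)))}
ids = np.array([vocab[c] for c in colors])

# Step 3: learn a 2-dim embedding per category jointly with a logistic classifier.
emb = rng.normal(scale=0.1, size=(len(vocab), 2))  # embedding table (learned)
w = rng.normal(scale=0.1, size=2)                  # classifier weights
lr = 0.5
for _ in range(500):
    x = emb[ids]                              # embedding lookup per sample
    p = 1.0 / (1.0 + np.exp(-(x @ w)))        # sigmoid prediction
    grad = p - labels                         # d(BCE loss)/d(logit)
    w -= lr * (x.T @ grad) / len(ids)
    np.add.at(emb, ids, -lr * np.outer(grad, w) / len(ids))

# Each category now maps to a learned dense vector: 3 categories x 2 dims,
# far smaller than one-hot vectors when the cardinality is high.
print(emb.shape)  # → (3, 2)
```

The key contrast with one-hot encoding is that the table `emb` has a fixed, small width regardless of cardinality, and its entries are optimized for the downstream task rather than fixed at 0/1.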
Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Joe, Inwhee
COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
