Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning

Kim, Minji; Cho, Whanhee; Kim, Soohyeong; Choi, Yong Suk

doi:10.1002/aisy.202300717

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Minji	-
dc.contributor.author	Cho, Whanhee	-
dc.contributor.author	Kim, Soohyeong	-
dc.contributor.author	Choi, Yong Suk	-
dc.date.accessioned	2026-04-07T05:30:20Z	-
dc.date.available	2026-04-07T05:30:20Z	-
dc.date.issued	2024-08	-
dc.identifier.issn	2640-4567	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212082	-
dc.description.abstract	Contrastive learning of sentence representations has achieved great improvements in several natural language processing tasks. However, a supervised contrastive learning model trained on the natural language inference (NLI) dataset captures sentence semantics insufficiently, since it is prone to making predictions based on heuristics. Herein, using ParsEVAL and the word overlap metric, it is shown that sentence pairs in the NLI dataset have strong syntactic similarity, and a framework is proposed to compensate for this problem in two ways: 1) applying simple syntactic transformations to the hypothesis and 2) expanding the objective to SupCon loss to leverage variants of sentences. The method is evaluated on semantic textual similarity (STS) tasks and transfer tasks. The proposed methods improve the performance of the BERT-based baseline on STS Benchmark and SICK Relatedness by 1.48% and 2.2%, respectively. Furthermore, the model achieves 82.65% on the HANS benchmark dataset, which is, to the best of our knowledge, state-of-the-art performance, demonstrating that the approach is effective in grasping semantics without relying on heuristics in the NLI dataset during supervised contrastive learning. The code is available at https://github.com/whnhch/Break-the-Similarity.	-
dc.format.extent	10	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Wiley	-
dc.title	Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1002/aisy.202300717	-
dc.identifier.scopusid	2-s2.0-85198393624	-
dc.identifier.wosid	001271034400001	-
dc.identifier.bibliographicCitation	Advanced Intelligent Systems, v.6, no.8, pp 1 - 10	-
dc.citation.title	Advanced Intelligent Systems	-
dc.citation.volume	6	-
dc.citation.number	8	-
dc.citation.startPage	1	-
dc.citation.endPage	10	-
dc.type.docType	Article; Early Access	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Automation & Control Systems	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Robotics	-
dc.relation.journalWebOfScienceCategory	Automation & Control Systems	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Robotics	-
dc.subject.keywordPlus	Benchmarking	-
dc.subject.keywordPlus	Embeddings	-
dc.subject.keywordPlus	Intelligent systems	-
dc.subject.keywordPlus	Metadata	-
dc.subject.keywordPlus	Natural language processing systems	-
dc.subject.keywordPlus	Syntactics	-
dc.subject.keywordAuthor	contrastive learning	-
dc.subject.keywordAuthor	sentence embedding	-
dc.subject.keywordAuthor	syntactic transformation	-
dc.identifier.url	https://onlinelibrary.wiley.com/doi/10.1002/aisy.202300717	-
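
The abstract mentions expanding the training objective to SupCon loss so that several variants of a sentence can act as positives for one another. As a rough illustration only, the following is a minimal pure-Python sketch of the general supervised contrastive loss. It is not the authors' implementation (see their repository for that). The function names, the use of cosine similarity, the temperature value of 0.05, and the idea of giving a premise, its hypothesis, and a transformed hypothesis the same group label are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supcon_loss(embeddings, labels, temperature=0.05):
    """Supervised contrastive loss averaged over anchors that have positives.

    embeddings: list of vectors (lists of floats), one per sentence.
    labels: hypothetical group ids; items sharing an id are positives.
    temperature: softmax temperature (0.05 is an assumed value).
    """
    n = len(embeddings)
    per_anchor = []
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Scaled similarities between the anchor and every other item.
        logits = {
            a: cosine(embeddings[i], embeddings[a]) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits.values())
        )
        # Mean negative log-probability of the positives for this anchor.
        loss_i = -sum(logits[p] - log_denom for p in positives) / len(positives)
        per_anchor.append(loss_i)
    if not per_anchor:
        raise ValueError("no anchor has a positive; check the labels")
    return sum(per_anchor) / len(per_anchor)


# Toy usage: two groups of two-dimensional "sentence embeddings".
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]
print(supcon_loss(toy_embeddings, toy_labels))
```

With more than one positive per anchor, each positive contributes its own log-probability term, which is what allows additional sentence variants to be used within a single objective.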

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Related Researcher

Choi, Yong Suk: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
