D-former: a U-shaped Dilated Transformer for 3D medical image segmentation

Wu, Yixuan; Liao, Kuanlun; Chen, Jintai; Wang, Jinhong; Chen, Danny Z.; Gao, Honghao; Wu, Jian

Detailed Information

Cited 8 times in Web of Science

Cited 12 times in Scopus

Authors: Wu, Yixuan; Liao, Kuanlun; Chen, Jintai; Wang, Jinhong; Chen, Danny Z.; Gao, Honghao; Wu, Jian

Issue Date: Jan-2023

Publisher: Springer London Ltd

Keywords: Medical image analysis; Segmentation; Transformer; Long-range dependency; Position encoding

Citation: Neural Computing and Applications, vol. 35, no. 2, pp. 1931-1944

Journal Title: Neural Computing and Applications

Volume: 35

Number: 2

Start Page: 1931

End Page: 1944

URI: https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/86799

DOI: 10.1007/s00521-022-07859-1

ISSN: 0941-0643

Abstract: Computer-aided medical image segmentation has been widely applied in diagnosis and treatment to obtain clinically useful information about the shapes and volumes of target organs and tissues. In the past several years, convolutional neural network (CNN)-based methods (e.g., U-Net) have dominated this area, but they still capture long-range information inadequately. Hence, recent work has applied computer vision Transformer variants to medical image segmentation tasks and obtained promising performance. Such Transformers model long-range dependency by computing pair-wise patch relations. However, they incur prohibitive computational costs, especially on 3D medical images (e.g., CT and MRI). In this paper, we propose a new method called Dilated Transformer, which performs self-attention alternately in local and global scopes to capture pair-wise patch relations. Inspired by dilated convolution kernels, we conduct the global self-attention in a dilated manner, enlarging receptive fields without increasing the number of patches involved and thus reducing computational costs. Based on this Dilated Transformer design, we construct a U-shaped encoder-decoder hierarchical architecture called D-Former for 3D medical image segmentation. Experiments on the Synapse and ACDC datasets show that our D-Former model, trained from scratch, outperforms various competitive CNN-based and Transformer-based segmentation models at a low computational cost, without a time-consuming pre-training process.
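The local/dilated alternation described in the abstract can be illustrated by how tokens are grouped before attention is computed inside each group. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: it assumes a 3D token grid whose sides are divisible by a per-axis group size `g`, and it assumes dilated groups are formed by sampling tokens at a fixed stride (`side // g`) along each axis. The helpers `local_groups`, `dilated_groups`, `attention_pairs`, and `span` are hypothetical names for illustration. The sketch only counts query-key pairs instead of computing attention.

```python
from collections import defaultdict
from itertools import product


def local_groups(shape, g):
    """Hypothetical helper: split the token grid into non-overlapping g*g*g windows (local scope)."""
    assert all(n % g == 0 for n in shape), "grid sides must be divisible by g"
    groups = defaultdict(list)
    for d, h, w in product(*(range(n) for n in shape)):
        groups[(d // g, h // g, w // g)].append((d, h, w))
    return list(groups.values())


def dilated_groups(shape, g):
    """Hypothetical helper: form groups of g*g*g tokens sampled with a fixed stride
    (the assumed dilation rate) per axis, so every group spans the whole volume."""
    assert all(n % g == 0 for n in shape), "grid sides must be divisible by g"
    strides = [n // g for n in shape]
    groups = defaultdict(list)
    for d, h, w in product(*(range(n) for n in shape)):
        groups[(d % strides[0], h % strides[1], w % strides[2])].append((d, h, w))
    return list(groups.values())


def attention_pairs(groups):
    """Query-key pairs when self-attention is restricted to each group."""
    return sum(len(grp) ** 2 for grp in groups)


def span(group):
    """Largest coordinate extent of a group along any axis (a rough receptive-field proxy)."""
    return max(max(c[i] for c in group) - min(c[i] for c in group) for i in range(3))


shape, g = (8, 8, 8), 2
n_tokens = shape[0] * shape[1] * shape[2]
loc = local_groups(shape, g)
dil = dilated_groups(shape, g)

print(len(loc), len(dil))  # 64 64 (groups of 8 tokens each)
print(attention_pairs(loc), attention_pairs(dil), n_tokens ** 2)  # 4096 4096 262144
print(max(span(grp) for grp in loc), min(span(grp) for grp in dil))  # 1 4
```

In this toy setting, both groupings involve the same number of tokens per group and the same attention cost (4,096 pairs versus 262,144 for full self-attention over all 512 tokens). Each dilated group, however, spans the full grid rather than a 2-token window per axis, which mirrors the abstract's claim of a larger receptive field without more patches per attention computation.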

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.