DFT-based transformation invariant pooling layer for visual classification
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Ryu, Jongbin | - |
| dc.contributor.author | Yang, Ming-Hsuan | - |
| dc.contributor.author | Lim, Jongwoo | - |
| dc.date.accessioned | 2022-07-11T09:28:33Z | - |
| dc.date.available | 2022-07-11T09:28:33Z | - |
| dc.date.issued | 2018-10 | - |
| dc.identifier.issn | 0302-9743 | - |
| dc.identifier.issn | 1611-3349 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/149307 | - |
| dc.description.abstract | We propose a novel discrete Fourier transform-based pooling layer for convolutional neural networks. The DFT magnitude pooling replaces the traditional max/average pooling layer between the convolution and fully-connected layers to retain translation invariance and shape preserving (aware of shape difference) properties based on the shift theorem of the Fourier transform. Thanks to the ability to handle image misalignment while keeping important structural information in the pooling stage, the DFT magnitude pooling improves the classification accuracy significantly. In addition, we propose the DFT+ method for ensemble networks using the middle convolution layer outputs. The proposed methods are extensively evaluated on various classification tasks using the ImageNet, CUB 2010-2011, MIT Indoors, Caltech 101, FMD and DTD datasets. The AlexNet, VGG-VD 16, Inception-v3, and ResNet are used as the base networks, upon which DFT and DFT+ methods are implemented. Experimental results show that the proposed methods improve the classification performance in all networks and datasets. | - |
| dc.format.extent | 16 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Springer Verlag | - |
| dc.title | DFT-based transformation invariant pooling layer for visual classification | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1007/978-3-030-01264-9_6 | - |
| dc.identifier.scopusid | 2-s2.0-85055696119 | - |
| dc.identifier.bibliographicCitation | Lecture Notes in Computer Science, v.11218 LNCS, pp 89 - 104 | - |
| dc.citation.title | Lecture Notes in Computer Science | - |
| dc.citation.volume | 11218 LNCS | - |
| dc.citation.startPage | 89 | - |
| dc.citation.endPage | 104 | - |
| dc.type.docType | Conference Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Computer vision | - |
| dc.subject.keywordPlus | Convolution | - |
| dc.subject.keywordPlus | Discrete Fourier transforms | - |
| dc.subject.keywordPlus | Image enhancement | - |
| dc.subject.keywordPlus | Neural networks | - |
| dc.subject.keywordPlus | Classification accuracy | - |
| dc.subject.keywordPlus | Classification performance | - |
| dc.subject.keywordPlus | Convolutional neural network | - |
| dc.subject.keywordPlus | Fully-connected layers | - |
| dc.subject.keywordPlus | Structural information | - |
| dc.subject.keywordPlus | Transformation invariants | - |
| dc.subject.keywordPlus | Translation invariance | - |
| dc.subject.keywordPlus | Visual classification | - |
| dc.subject.keywordPlus | Classification (of information) | - |
| dc.identifier.url | https://link.springer.com/chapter/10.1007/978-3-030-01264-9_6 | - |
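
The abstract attributes the translation invariance of DFT magnitude pooling to the shift theorem of the Fourier transform. The following is a minimal sketch of that property only, not the authors' implementation: it computes a naive 2-D DFT magnitude of a small, made-up feature map and checks that a circular shift leaves the magnitude unchanged. The function names (`dft2_magnitude`, `circular_shift`, `max_abs_diff`) and the 4x4 values are assumptions chosen for illustration.

```python
import cmath

def dft2_magnitude(x):
    """Return the magnitude of the 2-D DFT of a list-of-lists of real numbers.

    Naive O(N^4) transform, adequate only for tiny illustrative inputs.
    """
    h, w = len(x), len(x[0])
    out = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for m in range(h):
                for n in range(w):
                    angle = -2 * cmath.pi * (u * m / h + v * n / w)
                    acc += x[m][n] * cmath.exp(1j * angle)
            row.append(abs(acc))
        out.append(row)
    return out

def circular_shift(x, dy, dx):
    """Shift rows down by dy and columns right by dx, wrapping around."""
    h, w = len(x), len(x[0])
    return [[x[(m - dy) % h][(n - dx) % w] for n in range(w)] for m in range(h)]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2-D lists."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Hypothetical 4x4 feature map (values are illustrative, not from the paper).
feature_map = [
    [0.0, 1.0, 0.0, 0.0],
    [2.0, 3.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

shifted = circular_shift(feature_map, 1, 2)
mag_a = dft2_magnitude(feature_map)
mag_b = dft2_magnitude(shifted)

# A shift changes only the phase of each DFT coefficient, so the
# magnitudes agree up to floating-point error.
diff = max_abs_diff(mag_a, mag_b)
print(diff < 1e-9)
```

The invariance is exact only for circular shifts. For ordinary image translations, where content enters or leaves at the borders, the magnitudes match only approximately.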
