
Fractional fourier image transformer for multimodal remote sensing data classification

Abstract
With the recent development of joint classification of hyperspectral image (HSI) and light detection and ranging (LiDAR) data, deep learning methods have achieved promising performance owing to their ability to extract local semantic features. Nonetheless, the limited receptive field restricts convolutional neural networks (CNNs) from representing global contextual and sequential attributes, while visual image transformers (ViTs) lose local semantic information. To address these issues, we propose a fractional Fourier image transformer (FrIT) as a backbone network that extracts both global and local contexts effectively. In the proposed FrIT framework, HSI and LiDAR data are first fused at the pixel level, and both multisource and HSI feature extractors are used to capture local contexts. Then, the plug-and-play image transformer FrIT is applied for global contextual and sequential feature extraction. Unlike the attention-based representations in the classic ViT, FrIT substantially speeds up the transformer architecture while learning valuable contextual information effectively and efficiently. More significantly, to reduce redundancy and information loss from shallow to deep layers, FrIT is devised to connect contextual features in multiple fractional domains. Five HSI and LiDAR scenes, including one newly labeled benchmark, are used for extensive experiments, showing improvement over both CNNs and ViTs.
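
The abstract describes replacing attention-based mixing with transforms taken in fractional Fourier domains. As a minimal numerical sketch of that idea only (not the authors' released implementation), the Python snippet below builds a discrete fractional Fourier transform as a matrix fractional power of the unitary DFT and uses it for attention-free global token mixing; the names dfrft_matrix and frit_mixing_block, and the order a = 0.5, are illustrative assumptions, and the paper's exact DFrFT discretization and block design may differ.

import numpy as np

def dfrft_matrix(n: int, a: float) -> np.ndarray:
    # n x n discrete fractional Fourier transform of order a
    # (a = 0 gives the identity, a = 1 the ordinary DFT), built as a
    # matrix fractional power of the unitary DFT matrix. This is one
    # common construction, not necessarily the one used in the paper.
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT matrix
    evals, evecs = np.linalg.eig(F)          # F is normal, hence diagonalizable
    frac = evals ** a                        # principal-branch fractional eigenvalues
    return evecs @ np.diag(frac) @ np.linalg.inv(evecs)

def frit_mixing_block(tokens: np.ndarray, a: float = 0.5) -> np.ndarray:
    # Attention-free global token mixing in a fractional Fourier domain,
    # in the spirit of FrIT (sketch only): transform along the token axis,
    # keep the real part, and add a transformer-style residual connection.
    T = dfrft_matrix(tokens.shape[0], a)
    mixed = (T @ tokens.astype(complex)).real
    return tokens + mixed

# Usage: 16 tokens with 8-dimensional embeddings, fractional order 0.5.
x = np.random.default_rng(0).standard_normal((16, 8))
print(frit_mixing_block(x, a=0.5).shape)  # (16, 8)

Because the fractional order a interpolates between the identity (a = 0) and the full DFT (a = 1), stacking such blocks with different orders is one plausible way to "connect contextual features in multiple fractional domains" as the abstract puts it.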
Keywords
Artificial Intelligence, Computer Networks and Communications, Computer Science Applications, Software, Feature extraction, Transformers, Laser radar, Data mining, Discrete Fourier transforms, Visualization, Semantics, Fractional Fourier image transformer (FrIT), hyperspectral image (HSI), light detection and ranging (LiDAR), multimodal data, FEATURE-EXTRACTION, LIDAR, FUSION, MULTISOURCE

Downloads

  • (...).pdf — full text (Published version) | UGent only | PDF | 7.73 MB

Citation

Please use this URL to cite or link to this publication:

MLA
Zhao, Xudong, et al. “Fractional Fourier Image Transformer for Multimodal Remote Sensing Data Classification.” IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, vol. 35, no. 2, 2024, pp. 2314–26, doi:10.1109/tnnls.2022.3189994.
APA
Zhao, X., Zhang, M., Tao, R., Li, W., Liao, W., Tian, L., & Philips, W. (2024). Fractional Fourier image transformer for multimodal remote sensing data classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 35(2), 2314–2326. https://doi.org/10.1109/tnnls.2022.3189994
Chicago author-date
Zhao, Xudong, Mengmeng Zhang, Ran Tao, Wei Li, Wenzhi Liao, Lianfang Tian, and Wilfried Philips. 2024. “Fractional Fourier Image Transformer for Multimodal Remote Sensing Data Classification.” IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 35 (2): 2314–26. https://doi.org/10.1109/tnnls.2022.3189994.
Chicago author-date (all authors)
Zhao, Xudong, Mengmeng Zhang, Ran Tao, Wei Li, Wenzhi Liao, Lianfang Tian, and Wilfried Philips. 2024. “Fractional Fourier Image Transformer for Multimodal Remote Sensing Data Classification.” IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 35 (2): 2314–2326. doi:10.1109/tnnls.2022.3189994.
Vancouver
1. Zhao X, Zhang M, Tao R, Li W, Liao W, Tian L, et al. Fractional Fourier image transformer for multimodal remote sensing data classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. 2024;35(2):2314–26.
IEEE
[1] X. Zhao et al., “Fractional Fourier image transformer for multimodal remote sensing data classification,” IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, vol. 35, no. 2, pp. 2314–2326, 2024.
@article{8767155,
  abstract     = {{With the recent development of joint classification of hyperspectral image (HSI) and light detection and ranging (LiDAR) data, deep learning methods have achieved promising performance owing to their ability to extract local semantic features. Nonetheless, the limited receptive field restricts convolutional neural networks (CNNs) from representing global contextual and sequential attributes, while visual image transformers (ViTs) lose local semantic information. To address these issues, we propose a fractional Fourier image transformer (FrIT) as a backbone network that extracts both global and local contexts effectively. In the proposed FrIT framework, HSI and LiDAR data are first fused at the pixel level, and both multisource and HSI feature extractors are used to capture local contexts. Then, the plug-and-play image transformer FrIT is applied for global contextual and sequential feature extraction. Unlike the attention-based representations in the classic ViT, FrIT substantially speeds up the transformer architecture while learning valuable contextual information effectively and efficiently. More significantly, to reduce redundancy and information loss from shallow to deep layers, FrIT is devised to connect contextual features in multiple fractional domains. Five HSI and LiDAR scenes, including one newly labeled benchmark, are used for extensive experiments, showing improvement over both CNNs and ViTs.}},
  author       = {{Zhao, Xudong and Zhang, Mengmeng and Tao, Ran and Li, Wei and Liao, Wenzhi and Tian, Lianfang and Philips, Wilfried}},
  issn         = {{2162-237X}},
  journal      = {{IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS}},
  keywords     = {{Artificial Intelligence,Computer Networks and Communications,Computer Science Applications,Software,Feature extraction,Transformers,Laser radar,Data mining,Discrete Fourier transforms,Visualization,Semantics,Fractional Fourier image transformer (FrIT),hyperspectral image (HSI),light detection and ranging (LiDAR),multimodal data,FEATURE-EXTRACTION,LIDAR,FUSION,MULTISOURCE}},
  language     = {{eng}},
  number       = {{2}},
  pages        = {{2314--2326}},
  title        = {{Fractional Fourier image transformer for multimodal remote sensing data classification}},
  url          = {{https://doi.org/10.1109/tnnls.2022.3189994}},
  volume       = {{35}},
  year         = {{2024}},
}
