Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

An Innovative n-Shifted Sigmoid Channel and Spatial Attention Module for Efficient 3D Scene Object Detection

Version 1 : Received: 28 May 2024 / Approved: 29 May 2024 / Online: 29 May 2024 (13:50:59 CEST)

How to cite: Du, S.; Burume, D. M.; Liu, Q. An Innovative n-Shifted Sigmoid Channel and Spatial Attention Module for Efficient 3D Scene Object Detection. Preprints 2024, 2024051941. https://doi.org/10.20944/preprints202405.1941.v1

Abstract

Recently, attention mechanisms have become an important tool for improving the performance of deep neural networks. In computer vision, attention mechanisms are generally divided into two main branches: spatial attention and channel attention. Each category has its own advantages, and fusing the two achieves higher performance, at the cost of additional computational load. This paper introduces an innovative, lighter n-shifted sigmoid channel and spatial attention (CSA) module that reduces this computational cost and improves the selection of relevant features in 3D scenes. To validate the proposed attention module, 3D scene object detection with deep Hough voting on point sets is used as the test application. With its piecewise n-shifted sigmoid activation function, the proposed attention module improves the network's learning and generalization capacity, allowing it to predict bounding box parameters directly from 3D scenes and detect objects more accurately by selectively attending to the most relevant features of the input data. When used in the deep Hough voting point-set network, the proposed attention module outperforms state-of-the-art 3D detection methods on the sizable SUNRGBD dataset. Experiments showed an increase of 12.02 in mean average precision (mAP) compared to the well-known VoteNet (without attention), 9.92 mAP over MLVCNet, and 10.32 mAP over the Point Transformer. The proposed model not only mitigates the sigmoid vanishing-gradient problem but also brings out valuable features by fusing channel-wise and spatial information, improving accuracy in 3D object detection.
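The abstract does not define the piecewise n-shifted sigmoid itself, so the sketch below is only a hypothetical illustration of the idea it describes: a sigmoid-like gate whose saturated tails are replaced beyond a shift point n, so the gradient never fully vanishes. The function name, the parameter n, and the linearized-tail form are assumptions for illustration, not the authors' actual definition.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def n_shifted_sigmoid(x: float, n: float = 2.0) -> float:
    """Hypothetical piecewise sketch (NOT the paper's definition):
    inside [-n, n] it follows the ordinary sigmoid; outside, the
    saturated tails are replaced by linear pieces whose slope equals
    the sigmoid's derivative at +/-n, so the gradient never reaches
    zero and the vanishing-gradient effect is reduced."""
    if x > n:
        slope = sigmoid(n) * (1.0 - sigmoid(n))   # sigmoid'(n)
        return sigmoid(n) + (x - n) * slope       # linear tail, matches value and slope at n
    if x < -n:
        slope = sigmoid(-n) * (1.0 - sigmoid(-n))  # sigmoid'(-n)
        return sigmoid(-n) + (x + n) * slope
    return sigmoid(x)
```

In a channel or spatial attention branch, such a gate would replace the final sigmoid that rescales feature maps; the linearized tails keep gradients flowing for strongly activated or suppressed channels.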

Keywords

attention mechanism; Hough voting; point clouds; activation function

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning


