TY - JOUR
T1 - Video saliency detection via combining temporal difference and pixel gradient
AU - Lu, Xiangwei
AU - Jian, Muwei
AU - Wang, Rui
AU - Liu, Xiangyu
AU - Lin, Peiguang
AU - Yu, Hui
N1 - Funding Information:
This work was supported by National Natural Science Foundation of China (NSFC) (61976123, 61601427, 61876098); the Taishan Young Scholars Program of Shandong Province; and Key Development Program for Basic Research of Shandong Province (ZR2020ZD44).
Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/10/2
Y1 - 2023/10/2
N2 - Even though temporal information matters for the quality of video saliency detection, many problems still arise/emerge in present network frameworks, such as bad performance in time-space coherence and edge continuity. In order to solve these problems, this paper proposes a full convolutional neural network, which integrates temporal differential and pixel gradient to fine tune the edges of salient targets. Considering the features of neighboring frames are highly relevant because of their proximity in location, a co-attention mechanism is used to put pixel-wise weight on the saliency probability map after features extraction with multi-scale pooling so that attention can be paid on both the edge and central of images. And the changes of pixel gradients of original images are used to recursively improve the continuity of target edges and details of central areas. In addition, residual networks are utilized to integrate information between modules, ensuring stable connections between the backbone network and modules and propagation of pixel gradient changes. In addition, a self-adjustment strategy for loss functions is presented to solve the problem of overfitting in experiments. The method presented in the paper has been tested with three available public datasets and its effectiveness has been proved after comparing with 6 other typically stat-of-the-art methods.
AB - Even though temporal information matters for the quality of video saliency detection, many problems still arise/emerge in present network frameworks, such as bad performance in time-space coherence and edge continuity. In order to solve these problems, this paper proposes a full convolutional neural network, which integrates temporal differential and pixel gradient to fine tune the edges of salient targets. Considering the features of neighboring frames are highly relevant because of their proximity in location, a co-attention mechanism is used to put pixel-wise weight on the saliency probability map after features extraction with multi-scale pooling so that attention can be paid on both the edge and central of images. And the changes of pixel gradients of original images are used to recursively improve the continuity of target edges and details of central areas. In addition, residual networks are utilized to integrate information between modules, ensuring stable connections between the backbone network and modules and propagation of pixel gradient changes. In addition, a self-adjustment strategy for loss functions is presented to solve the problem of overfitting in experiments. The method presented in the paper has been tested with three available public datasets and its effectiveness has been proved after comparing with 6 other typically stat-of-the-art methods.
KW - Co-Attention
KW - Edge refinement
KW - Pixels gradient
KW - Temporal difference
KW - Video saliency detection
UR - http://www.scopus.com/inward/record.url?scp=85173125951&partnerID=8YFLogxK
U2 - 10.1007/s11042-023-17128-5
DO - 10.1007/s11042-023-17128-5
M3 - Article
AN - SCOPUS:85173125951
SN - 1380-7501
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
ER -