
Released

Journal Article

Multi-modal person re-identification based on transformer relational regularization

Authors

Zheng, Xiangtian
External Organizations

Huang, Xiaohua
External Organizations

Ji, Chen
External Organizations

Yang, Xiaolin
External Organizations


Sha, Pengcheng
1.4 Remote Sensing, 1.0 Geodesy, Departments, GFZ Publication Database, Deutsches GeoForschungsZentrum

Cheng, Liang
External Organizations

External Resource
No external resources are shared
Fulltext (public)
There are no public fulltexts stored in GFZpublic
Supplementary Material (public)
There is no public supplementary material available
Citation

Zheng, X., Huang, X., Ji, C., Yang, X., Sha, P., Cheng, L. (2024): Multi-modal person re-identification based on transformer relational regularization. - Information Fusion, 103, 102128.
https://doi.org/10.1016/j.inffus.2023.102128


Cite as: https://gfzpublic.gfz-potsdam.de/pubman/item/item_5025137
Abstract
For robust multi-modal person re-identification (re-ID) models, it is crucial to effectively exploit the complementary information and constraint relationships among different modalities. However, current multi-modal methods often overlook the correlation between modalities at the feature fusion stage. To address this issue, we propose a novel multi-modal person re-ID method called Transformer Relational Regularization (TRR). First, we introduce an adaptive collaborative matching module that exchanges useful information by mining feature correspondences between modalities; integrating this complementary information enhances re-ID performance. Second, we propose an enhanced embedding module that corrects general information using discriminative information within each modality, improving the model's stability in challenging multi-modal environments. Finally, we propose an adaptive triplet loss that improves sample utilization efficiency and mitigates the inconsistent representation of multi-modal samples, strengthening the model's ability to distinguish between individuals and thus improving re-ID accuracy. Experimental results on several challenging visible-infrared person re-ID benchmark datasets demonstrate that TRR achieves state-of-the-art performance, and extensive ablation studies validate the contribution of each component. In summary, TRR effectively leverages complementary information, models the correlation between modalities, and improves re-ID performance in multi-modal scenarios.
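
The abstract describes TRR only at a high level, so the following is a minimal, hypothetical PyTorch sketch of the two generic ingredients it names, not the paper's implementation: cross-modal feature exchange via transformer cross-attention (one plausible reading of the adaptive collaborative matching module) and a plain margin-based triplet loss standing in for the adaptive triplet loss. All class names, dimensions, and the batch-shift negative mining are illustrative assumptions.

# Hypothetical sketch only: names, shapes, and the mining strategy are
# assumptions; the paper's actual modules are not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalMatching(nn.Module):
    """Exchange information between visible and infrared token features
    with multi-head cross-attention (an illustrative stand-in for the
    adaptive collaborative matching idea)."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn_v2i = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_i2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_i = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, ir: torch.Tensor):
        # vis, ir: (batch, tokens, dim) features from each modality.
        vis_enh, _ = self.attn_v2i(vis, ir, ir)  # visible queries infrared
        ir_enh, _ = self.attn_i2v(ir, vis, vis)  # infrared queries visible
        # Residual connections preserve each modality's own information.
        return self.norm_v(vis + vis_enh), self.norm_i(ir + ir_enh)

def triplet_loss(anchor, positive, negative, margin: float = 0.3):
    """Plain margin-based triplet loss; the paper's adaptive variant
    presumably adapts the margin or mining, which is not shown here."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()

if __name__ == "__main__":
    fuse = CrossModalMatching()
    vis = torch.randn(8, 16, 256)  # 8 identities, 16 tokens, 256-d features
    ir = torch.randn(8, 16, 256)
    vis_f, ir_f = fuse(vis, ir)
    # Pool tokens to one embedding per image, then apply the triplet loss
    # across modalities (anchor: visible; positive/negative: infrared).
    a = vis_f.mean(dim=1)
    p = ir_f.mean(dim=1)
    n = ir_f.roll(1, dims=0).mean(dim=1)  # shifted batch as crude negatives
    print(triplet_loss(a, p, n))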