IEEE TMM Article

You are here

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.

IEEE TMM Article

The IEEE Transactions on Multimedia is now accepting nominations for the 2024 Multimedia Prize Paper Award. Nominations are due no later than 15 April 2024.

The IEEE Transactions on Multimedia is now accepting nominations for the 2024 Multimedia Prize Paper Award.  Nominations are due no later than 31 March 2024.

The IEEE Transactions on Multimedia is now accepting nominations for the 2024 Multimedia Prize Paper Award.  Nominations are due no later than 31 March 2024.

Instance-level human parsing is aimed at separately partitioning the human body into different semantic parts for each individual, which remains a challenging task due to human appearance/pose variation, occlusion and complex backgrounds. Most state-of-the-art methods follow the “parsing-by-detection” paradigm, which relies on a trained detector to localize persons and then sequentially performs single-person parsing for each person. However, this paradigm is closely related to the detector, and the runtime is proportional to the number of persons in an image.

Pedestrian attribute recognition (PAR) aims to generate a structured description of pedestrians and plays an important role in surveillance. Current work focusing on 2D images can achieve decent performance when there is no variation in the captured pedestrian orientation. However, the performance of these works cannot be maintained in scenarios when the orientation of pedestrians is ignored. 

Zero-shot learning (ZSL) has received extensive attention recently especially in areas of fine-grained object recognition, retrieval, and image captioning. Due to the complete lack of training samples and high requirement of defense transferability, the ZSL model learned is particularly vulnerable against adversarial attacks. Recent work also showed adversarially robust generalization requires more data.

Domain generalization aims to reduce the vulnerability of deep neural networks in the out-of-domain distribution scenario. With the recent and increasing data privacy concerns, federated domain generalization, where multiple domains are distributed on different local clients, has become an important research problem and brings new challenges for learning domain-invariant information from separated domains. 

Image-text matching, as a fundamental cross-modal task, bridges the gap between vision and language. The core is to accurately learn semantic alignment to find relevant shared semantics in image and text. Existing methods typically attend to all fragments with word-region similarity greater than empirical threshold zero as relevant shared semantics, e.g. , via a ReLU operation that forces the negative to zero and maintains the positive.

Recent advances in unsupervised domain adaptation (UDA) techniques have witnessed great success in cross-domain computer vision tasks, enhancing the generalization ability of data-driven deep learning architectures by bridging the domain distribution gaps.

Despite the development of computer vision techniques, the micro-expression (ME) recognition task still remains a great challenge because MEs have very low intensity and short duration. However, the ME recognition is of great significance since it provides important clues for real affective states detection. This paper proposes a novel Block Division Convolutional Network (BDCNN) with the implicit deep features augmentation. 

Pages

SPS Social Media

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel