TMM Featured Articles

The video captioning task aims to describe video content using several natural-language sentences. Although one-step encoder-decoder models have achieved promising progress, the generated captions often contain many errors, caused mainly by the large semantic gap between the visual and language domains and by the difficulty of long-sequence generation.

The prevailing use of both images and text to express opinions on the web leads to the need for multimodal sentiment recognition. Some commonly used social media data containing short text and few images, such as tweets and product reviews, have been well studied. However, it is still challenging to predict the readers’ sentiment after reading online news articles, since news articles often have more complicated structures, e.g., longer text and more images.

Recently, dense video captioning has made notable progress in detecting and captioning all events in a long untrimmed video. Although promising results have been achieved, most existing methods do not sufficiently explore the scene evolution within an event temporal proposal for captioning, and therefore perform less satisfactorily when the scenes and objects change over a relatively long proposal. To address this problem, we propose a two-stage graph-based partition-and-summarization (GPaS) framework for dense video captioning.

Benefiting from the powerful discriminative feature learning capability of convolutional neural networks (CNNs), deep learning techniques have achieved remarkable performance improvement for the task of salient object detection (SOD) in recent years.

While current research on multimedia essentially deals with information derived from our observations of the world, internal activities inside human brains, such as imaginations and memories of past events, could become a brand-new concept of multimedia, which we term “brain-media”.

JPEG is a lossy still-image compression algorithm that is widely used across major network media. However, the quality of its compressed images is unsatisfactory at low bit rates. The objective of this paper is to improve the quality of compressed images and suppress blocking artifacts by improving the JPEG image compression model at low bit rates.

We have recently seen great progress in image classification due to the success of deep convolutional neural networks and the availability of large-scale datasets. Most existing work focuses on single-label image classification. However, an image usually has multiple tags associated with it. Existing work on multi-label classification is mainly based on lab-curated labels.

Generating images via a generative adversarial network (GAN) has attracted much attention recently. However, most of the existing GAN-based methods can only produce low-resolution images of limited quality. Directly generating high-resolution images using GANs is nontrivial, and often produces problematic images with incomplete objects.

The scalable video coding extensions of the High Efficiency Video Coding (HEVC) standard (SHVC) have adopted a new quadtree-structured coding unit (CU). The SHVC test model (SHM) needs to test seven intermode sizes and one intramode size at depth levels of “0,” “1,” and “2,” and four intermode sizes and two intramode sizes at a depth level of “3” for interframe CUs.

Using deep convolutional neural networks (CNNs) to predict depth from a single image has received considerable attention in recent years due to its impressive performance. However, existing methods process each image independently, without leveraging the multiview information available in video sequences in practical scenarios.
