IEEE Transactions on Multimedia

Screen content coding (SCC) is the extension to high-efficiency video coding (HEVC) for compressing screen content videos. Two new coding tools, the intra block copy (IBC) and palette (PLT) modes, are introduced to encode screen content (SC) such as text and graphics. The IBC mode encodes repeating patterns by performing block matching within the same frame, while the PLT mode is designed for SC with few distinct colors and codes the major colors together with their locations using an index map.
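The palette idea described above can be illustrated with a minimal sketch. This is not the HEVC-SCC syntax or the authors' method; it only shows the principle of coding a block as a short list of major colors plus an index map. The function names, the `max_colors` parameter, and the `ESCAPE` marker for pixels outside the palette are assumptions made for illustration.

```python
from collections import Counter

ESCAPE = -1  # hypothetical marker: pixel color is not in the palette


def palette_encode(block, max_colors=4):
    """Split a 2-D block of colors into (palette, index_map, escapes).

    palette   : the most frequent colors in the block (at most max_colors)
    index_map : same shape as block, each entry a palette index or ESCAPE
    escapes   : colors of the ESCAPE pixels, in raster-scan order
    """
    counts = Counter(pixel for row in block for pixel in row)
    palette = [color for color, _ in counts.most_common(max_colors)]
    lookup = {color: i for i, color in enumerate(palette)}
    index_map = [[lookup.get(pixel, ESCAPE) for pixel in row] for row in block]
    escapes = [pixel for row in block for pixel in row if pixel not in lookup]
    return palette, index_map, escapes


def palette_decode(palette, index_map, escapes):
    """Rebuild the block from the palette, the index map, and escape colors."""
    remaining = iter(escapes)
    return [
        [palette[i] if i != ESCAPE else next(remaining) for i in row]
        for row in index_map
    ]


if __name__ == "__main__":
    white, black, odd = (255, 255, 255), (0, 0, 0), (10, 20, 30)
    block = [[white, white, black], [white, black, odd]]
    palette, index_map, escapes = palette_encode(block, max_colors=2)
    print(palette, index_map, escapes)
    assert palette_decode(palette, index_map, escapes) == block
```

For a block with few distinct colors, the index map holds small integers that compress far better than raw color values, which is why this representation suits text and graphics.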

In past years, various encryption algorithms have been proposed to fully or partially protect multimedia content in view of practical applications. In the context of digital TV broadcasting, transparent encryption protects only part of the content and fulfills both security and quality requirements.

This paper presents a new method of secret three-dimensional object sharing (S3DOS), which allows sharing of three-dimensional (3-D) objects while preserving their file format, by selectively encrypting a 3-D object in order to sufficiently protect the visual nature of the content.

This paper addresses the problem of encoding the video generated by the screen of an airplane cockpit. Like other computer screens, cockpit screens consist of computer-generated graphics, often atop a natural background. Existing screen content coding schemes notably fail to preserve the readability of textual information at the low bitrates required in avionic applications.

In this paper, we propose a coding tree unit (CTU)-level rate control scheme from the perspective of SSIM-based rate-distortion optimization to improve the coding efficiency. First, we establish the SSIM-based rate-distortion model based on the divisive normalization scheme, which characterizes the relationship between the local visual quality and the coding bits.

With the rapid popularization of mobile intelligent terminals, mobile video and cloud service applications are widely used in people's lives. However, the resource constraints of the terminals and the enormous amount of video information make efficient terminal-to-cloud data upload a challenge.

Multimedia streams account for a significant share of consumer Internet traffic and will continue to do so due to the ever-increasing connection among people, businesses, and industries. To cope with the deviation from the Internet's intended use, unreliable underlying infrastructure, and best-effort protocols while leveraging existing technologies...

Saliency detection technologies are very useful for analyzing and extracting important information from given multimedia data, and have already been used extensively in many multimedia applications. Past studies have revealed that utilizing global cues is effective in saliency detection. Nevertheless, most prior works considered only single-scale segmentation when employing global cues. In this paper, we attempt to incorporate multi-scale global cues into the saliency detection problem.

With the development of video coding technology, high-efficiency video coding (HEVC) has become a promising alternative to previous coding standards such as H.264. In general, H.264 to HEVC transcoding can be accomplished by full H.264 decoding followed by full HEVC encoding, which is time-consuming because of the brute-force search over HEVC coding tree unit (CTU) partitions for rate-distortion optimization (RDO).

Predicting articulatory movements from audio or text has diverse applications, such as speech visualization. Various approaches have been proposed to solve the acoustic-articulatory mapping problem. However, their precision is not high enough when only acoustic features are available. Recently, deep neural networks (DNNs) have brought tremendous success in various fields, such as speech recognition and image processing.
