IEEE Transactions on Multimedia

Multi-Task Learning for Acoustic Event Detection Using Event and Frame Position Information

Acoustic event detection analyzes acoustic signals to determine the sound type and to estimate the boundaries of each audio event. Approaches based on multi-label classification are commonly used to detect frame-wise event types, with a median filter applied to determine which acoustic events are occurring. However, the multi-label classifiers are trained only on the acoustic event types and ignore the frame position within the audio events.
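The median-filter post-processing mentioned above can be illustrated with a minimal sketch. It assumes binary frame-wise decisions for a single event class; the function names, the window width, and the tie-breaking rule are illustrative assumptions, not details taken from the paper.

```python
from statistics import median

def median_filter(frames, width=3):
    """Smooth a binary frame-wise activity sequence with a sliding median.

    The window is truncated at the sequence edges; a median of exactly 0.5
    (possible for even-length edge windows) is counted as active.
    """
    half = width // 2
    smoothed = []
    for i in range(len(frames)):
        window = frames[max(0, i - half): i + half + 1]
        smoothed.append(int(median(window) >= 0.5))
    return smoothed

def frames_to_events(frames):
    """Return (onset, offset) frame indices for each active run (offset exclusive)."""
    events, start = [], None
    for i, active in enumerate(frames):
        if active and start is None:
            start = i
        elif not active and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(frames)))
    return events

# Hypothetical frame-wise classifier output for one event class.
raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
smoothed = median_filter(raw, width=3)
events = frames_to_events(smoothed)
```

In this toy input the filter removes the isolated active frame and bridges the one-frame gap, so a single event remains. Note that nothing in this step uses the position of a frame within its event, which is the information the abstract says is ignored.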

Radiance–Reflectance Combined Optimization and Structure-Guided ℓ0-Norm for Single Image Dehazing

TMM Volume 22 Issue 1

Outdoor images suffer from degraded contrast and color because atmospheric particles scatter the light arriving at the camera. Existing dehazing methods that rely on haze models cannot avoid dehazing artifacts. These artifacts include color distortion and overenhancement around object boundaries, caused by incorrect transmission estimation from a depth error in the skyline and by wrong haze information, especially in bright objects.

Image Vectorization With Real-Time Thin-Plate Spline

TMM Volume 22 Issue 1

The popularity of vector graphics with gradient meshes can be attributed to their compactness and scalability; however, they tend to fall short in real-time editing because they lack real-time rasterization and an efficient editing tool for image details. In this paper, we encode global manipulation geometries and local image details within a hybrid vector structure, using parametric patches and detailed features for localized and parallelized thin-plate spline interpolation in order to achieve good compressibility, interactive expressibility, and editability.

Generative Model Driven Representation Learning in a Hybrid Framework for Environmental Audio Scene and Sound Event Recognition

TMM Volume 22 Issue 1

The analysis of sound information is helpful for audio surveillance, multimedia information retrieval, audio tagging, and forensic applications. Environmental audio scene recognition (EASR) and sound event recognition (SER) for audio surveillance are challenging tasks due to the presence of multiple sound sources, background noises, and the existence of overlapping or polyphonic contexts.

Deep Dual-Channel Neural Network for Image-Based Smoke Detection

TMM Volume 22 Issue 2

Smoke detection plays an important role in industrial safety warning systems and fire prevention. Because the shape, texture, and color of smoke change in complicated ways, identifying smoke in a given image remains a substantial challenge, one that has attracted considerable research attention recently.

Style-Controlled Synthesis of Clothing Segments for Fashion Image Manipulation

TMM Volume 22 Issue 2

We propose an approach for digitally altering people's outfits in images. Given images of a person and a desired clothing style, our method generates a new clothing item image. The new item displays the color and pattern of the desired style while geometrically mimicking the person's original item. Through superimposition, the altered image is made to look as if the person is wearing the new item.

Multi-Focus Image Fusion by Hessian Matrix Based Decomposition

TMM Volume 22 Issue 2

In this paper, a Hessian matrix based multi-focus image fusion method is proposed. First, the integral map is introduced to quickly compute the Hessian matrix of the source images at different scales, yielding the multi-scale Hessian matrix of each source image. Second, the multi-scale Hessian matrix is used to decompose each source image into two kinds of regions: the feature and background regions.
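As a rough illustration of why an integral map speeds up multi-scale computation, the sketch below builds a summed-area table and uses it to sum any rectangular region with four lookups, regardless of the region's size. It assumes the "integral map" refers to the standard integral image; how the paper turns such sums into Hessian entries is not stated here, and the function names are hypothetical.

```python
def integral_image(img):
    """Build a summed-area table with one extra leading row and column of zeros.

    ii[y][x] holds the sum of img over rows [0, y) and columns [0, x).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img over rows [top, bottom) and columns [left, right)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

# Toy 3x3 image; values are arbitrary.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)
lower_right = box_sum(ii, 1, 1, 3, 3)
```

Because each box sum costs the same four lookups at any size, box-shaped filters can be evaluated at many scales without rescanning the image.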

A Multi-Grained Parallel Solution for HEVC Encoding on Heterogeneous Platforms

TMM Volume 21 Issue 12

To improve the parallel processing capability of video coding, the emerging high efficiency video coding (HEVC) standard introduces two parallel techniques, i.e., Wavefront Parallel Processing (WPP) and Tiles, to make it much more parallel-friendly than its predecessors. However, these two techniques are designed to exploit coarse-grained parallelism in HEVC encoding on multicore Central Processing Unit (CPU) platforms.

Learning Semantic Text Features for Web Text-Aided Image Classification

TMM Volume 21 Issue 12

The good generalization performance of conventional pattern classifiers often relies on the size of training data labeled by costly human labor. These days, publicly available web resources grow explosively, and this allows us to easily obtain abundant and cheap web data. Yet, web data are usually not as cooperative as human labeled data. In this paper, we explore the use of web text data to aid image classification.

Joint Texture/Depth Power Allocation for 3-D Video SoftCast

TMM Volume 21 Issue 12

Recently, a novel uncoded (pseudoanalog) scheme called SoftCast was proposed for wireless video transmission. It eliminates the cliff effect of state-of-the-art schemes based on source-channel coding and achieves a linear quality transition over a wide range of channel signal-to-noise ratios. As a result, SoftCast-like uncoded and hybrid transmission has become an attractive research topic for natural 2-D video. However, very few studies currently focus on SoftCast-based wireless transmission of 3-D video (3DV).
