SPL Volume 27 | 2020

Simple Yet Effective Way for Improving the Performance of Depth Map Super-Resolution

In depth map super-resolution (SR), a high-resolution color image plays an important role as guidance for preventing blurry depth boundaries. However, using color image features too heavily often causes texture copying in flat areas, while using them too little causes edge smoothing at depth boundaries. To alleviate these problems, this letter presents a simple yet effective method for improving depth SR performance without requiring significant modifications to the original SR network. To this end, we present a self-selective concatenation (SSC), which replaces the conventional feature concatenation. In the upsampling layers of the SR network, the SSC extracts spatial and channel attention from both color and depth features so that color features can be used selectively for depth SR.
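
The idea of gating color features before concatenation can be illustrated with a toy example. What follows is a minimal, hypothetical sketch and not the letter's SSC module. The real SSC learns its attention, while this version has no learned parameters: it derives a per-channel weight and a per-pixel weight in (0, 1) from pooled color and depth statistics through a plain sigmoid. Feature maps are assumed to be lists of H x W channels, and the function name `self_selective_concat_sketch` is invented for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_mean(channel):
    """Global average pooling of one H x W feature map."""
    return sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))


def self_selective_concat_sketch(depth_feats, color_feats):
    """Hypothetical, parameter-free illustration of attention-gated concatenation.

    depth_feats, color_feats: lists of channels, each an H x W list of floats.
    Returns the depth channels followed by the color channels, where each color
    value is scaled by a channel weight and a spatial weight, both in (0, 1).
    """
    h, w = len(depth_feats[0]), len(depth_feats[0][0])
    # Channel attention (assumed form): one weight per color channel from its
    # pooled value plus the pooled depth context.
    depth_context = sum(channel_mean(c) for c in depth_feats) / len(depth_feats)
    channel_w = [sigmoid(channel_mean(c) + depth_context) for c in color_feats]
    # Spatial attention (assumed form): one weight per pixel from the
    # cross-channel means of both depth and color features.
    spatial_w = [
        [
            sigmoid(
                sum(c[i][j] for c in depth_feats) / len(depth_feats)
                + sum(c[i][j] for c in color_feats) / len(color_feats)
            )
            for j in range(w)
        ]
        for i in range(h)
    ]
    gated_color = [
        [[c[i][j] * a * spatial_w[i][j] for j in range(w)] for i in range(h)]
        for c, a in zip(color_feats, channel_w)
    ]
    # Concatenate along the channel axis, as a plain concat would,
    # but with the gated color features in place of the raw ones.
    return depth_feats + gated_color
```

Keeping the concatenation as the final step means the downstream layers see the same channel count as before. That matches the letter's framing of the SSC as a drop-in substitute rather than a redesign of the network.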

A Multifamily GLRT for CFAR Detection of Signals in a Union of Subspaces

We consider the problem of detecting an unknown signal that lies in a union of subspaces (UoS) and that is observed in additive white Gaussian noise with unknown variance. The main contribution of this letter is the derivation of a detector that can accommodate a union made of nested subspaces. This detector includes the generalized likelihood ratio test (GLRT) as a special case when the subspace dimensions are all identical. It relies on the framework of multifamily likelihood ratio tests (MFLRT) and is shown by numerical examples to achieve better performance than existing detectors.
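
The equal-dimension special case mentioned above can be made concrete. For a single known subspace in white Gaussian noise of unknown variance, the GLRT reduces to a monotone function of the fraction of observation energy captured by the subspace, ||P y||^2 / ||y||^2 (the classic matched subspace detector). When every subspace in the union has the same dimension, maximizing over the subspace index gives the largest such fraction. The sketch below covers only that case and assumes each basis is given as a list of orthonormal vectors. It does not implement the letter's multifamily detector for nested subspaces of different dimensions, and the threshold passed to `detect` is a hypothetical value that would in practice be set for a target false-alarm rate.

```python
def norm_sq(v):
    return sum(x * x for x in v)


def projected_energy(y, basis):
    """||U^T y||^2, which equals ||P y||^2 when the basis vectors are orthonormal."""
    return sum(sum(u_i * y_i for u_i, y_i in zip(u, y)) ** 2 for u in basis)


def uos_glrt_statistic(y, bases):
    """Largest fraction of ||y||^2 captured by any subspace in the union.

    Valid as a GLRT statistic only when all subspaces share one dimension.
    Dividing by ||y||^2 makes the statistic invariant to the unknown noise
    scale, which is what gives a constant false-alarm rate (CFAR).
    """
    total = norm_sq(y)
    if total == 0.0:
        raise ValueError("observation must be nonzero")
    return max(projected_energy(y, basis) / total for basis in bases)


def detect(y, bases, threshold):
    """Declare 'signal present' when the statistic exceeds the threshold."""
    return uos_glrt_statistic(y, bases) > threshold
```

Because the statistic is a ratio, scaling `y` by any nonzero constant leaves it unchanged. This is why the threshold does not depend on the unknown noise variance.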

Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the main weaknesses of large audio datasets and one of the most conspicuous issues in AudioSet. We propose a simple, model-agnostic method based on a teacher-student framework with loss masking that first identifies the most critical missing-label candidates and then ignores their contribution during learning.
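
The loss-masking step can be sketched for a single multi-label clip. This is an illustration under assumptions, not the paper's exact procedure. It assumes a trained teacher provides per-class probabilities, and it treats a class as a missing-label candidate when its label is negative but the teacher score exceeds a hypothetical `threshold`. The paper's actual rule for selecting candidates may differ, for example ranking by teacher score. Masked classes contribute nothing to the student's binary cross-entropy, and averaging over the remaining classes is also an assumption.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one class, with the probability clamped for safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def masked_bce(student_probs, labels, teacher_probs, threshold=0.5):
    """Mean BCE over classes, skipping likely missing labels.

    A class is masked when its label is 0 (negative) but the teacher's score
    exceeds `threshold` (a hypothetical rule), i.e. the teacher believes the
    sound is present even though the annotation says it is not.
    """
    kept = [
        bce(p, y)
        for p, y, t in zip(student_probs, labels, teacher_probs)
        if not (y == 0 and t > threshold)
    ]
    return sum(kept) / len(kept) if kept else 0.0
```

For labels `[1, 0, 0]`, student scores `[0.9, 0.2, 0.7]`, and teacher scores `[0.95, 0.1, 0.9]`, the third class is masked. The loss therefore averages only the first two terms, and the student is not penalized for predicting a sound that is probably present but unlabeled. The procedure uses only per-class scores and labels, so it makes no assumptions about the student architecture, which is consistent with the model-agnostic claim.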

A Novel Modified Mel-DCT Filter Bank Structure With Application to Voice Activity Detection

We propose a novel modified Mel-discrete cosine transform (MMD) filter bank structure, which restricts the overlap of each filter response to its immediate neighbors. In contrast to the well-known triangular filters employed in the extraction of Mel-frequency cepstral coefficients (MFCC), the proposed filter structure has a smoother response and performs the discrete cosine transform and Mel-scale filtering in a single operation.
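
As a reference point, the conventional MFCC front end that the MMD structure is contrasted with can be sketched as two separate steps: triangular Mel-scale filtering of a power spectrum, then a DCT-II of the log filter-bank energies. The sketch below shows that baseline only. It is not the proposed MMD filter bank, which fuses these two operations. It uses the common Mel mapping mel = 2595 log10(1 + f / 700) and an unnormalized DCT-II. Filter count, FFT size, and sample rate are caller-chosen parameters.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_mel_filterbank(num_filters, n_fft, sample_rate):
    """Conventional triangular filters; each spans its two neighbors' centers."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 points equally spaced on the Mel scale (edges plus centers).
    mel_points = [low + (high - low) * i / (num_filters + 1) for i in range(num_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))  # rising edge
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct_ii(x):
    """Unnormalized DCT-II."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def mfcc_from_power_spectrum(power, bank, num_ceps):
    """Mel filtering, log compression, then DCT: the two-step baseline pipeline."""
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct_ii(log_energies)[:num_ceps]
```

For example, `triangular_mel_filterbank(10, 512, 16000)` returns 10 rows of 257 weights each. Filtering and the DCT are applied one after the other here, and the piecewise-linear triangles have corners at their edges and peaks. Those are the two aspects the MMD design addresses by merging the operations and smoothing the response.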

Learning With Learned Loss Function: Speech Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality

Using a human-perception-related objective function to train a speech enhancement model has recently become a popular topic. The main reason is that the conventional mean squared error (MSE) loss does not represent auditory perception well. One typical human-perception-related metric, the perceptual evaluation of speech quality (PESQ), has been shown to correlate highly with quality scores rated by humans.
