SPL Volume 27 | 2020

January, 2021

In depth map super-resolution (SR), a high-resolution color image plays an important role as guidance for preventing blurry depth boundaries. However, excessive use of the color image features often causes texture copying in flat areas, while deficient use causes edge smoothing in boundary areas; both degrade performance. To alleviate these problems, this letter presents a simple yet effective method for enhancing SR performance without requiring significant modifications to the original SR network. To this end, we present a self-selective concatenation (SSC), which is a substitute for the conventional feature concatenation. In the upsampling layers of the SR network, the SSC extracts spatial and channel attention from both color and depth features such that color features can be selectively used for depth SR.

We consider the problem of detecting an unknown signal that lies in a union of subspaces (UoS) and that is observed in additive white Gaussian noise with unknown variance. The main contribution of this letter is the derivation of a detector that can accommodate a union made of nested subspaces. This detector includes the generalized likelihood ratio test (GLRT) as a special case when the subspace dimensions are all identical. It relies on the framework of multifamily likelihood ratio tests (MFLRT) and is shown by numerical examples to achieve better performance than existing detectors.
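As a rough illustration of the special case mentioned above (all subspaces of identical dimension), the following minimal sketch computes a GLRT-style statistic for a union of subspaces under unknown noise variance. It assumes each subspace is given by an orthonormal basis, and the function names are hypothetical. It does not implement the MFLRT-based detector for nested subspaces.

```python
def projected_energy(x, basis):
    """Energy of x inside the subspace spanned by an orthonormal basis."""
    return sum(sum(b_i * x_i for b_i, x_i in zip(b, x)) ** 2 for b in basis)


def uos_glrt_statistic(x, bases):
    """Ratio of best in-subspace energy to residual energy.

    Assumes every basis in `bases` is orthonormal and all subspaces share
    the same dimension, so that taking the maximum over subspaces is fair.
    """
    total = sum(v * v for v in x)
    best = max(projected_energy(x, basis) for basis in bases)
    residual = total - best
    return best / residual if residual > 0 else float("inf")


# Two one-dimensional subspaces of R^3 (hypothetical example data).
bases = [[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]]
statistic = uos_glrt_statistic([3.0, 0.0, 4.0], bases)
```

The statistic would then be compared with a threshold chosen for the desired false-alarm rate.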

The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the major weaknesses of large audio datasets, and one of the most conspicuous issues for AudioSet. We propose a simple and model-agnostic method based on a teacher-student framework with loss masking to first identify the most critical missing label candidates, and then ignore their contribution during the learning process.
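The loss-masking idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rule for flagging a missing-label candidate (dataset label 0 but teacher probability at or above a threshold) and the name `masked_bce` are assumptions made for the example.

```python
import math


def masked_bce(targets, student_probs, teacher_probs, threshold=0.9, eps=1e-7):
    """Mean binary cross-entropy over the label entries that are kept.

    An entry is ignored when the dataset says the class is absent (0) but
    the teacher predicts it with probability >= threshold (assumed rule
    for a likely missing label).
    """
    total, kept = 0.0, 0
    for y, p, t in zip(targets, student_probs, teacher_probs):
        if y == 0 and t >= threshold:
            continue  # masked: contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        kept += 1
    return total / kept if kept else 0.0


# Second class: labeled absent, but the teacher is confident it is present.
loss = masked_bce([1, 0, 0], [0.8, 0.9, 0.2], [0.9, 0.95, 0.1])
```

Without the mask, the second entry would penalize the student heavily for predicting a class that is probably present but unlabeled.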

We propose a novel modified Mel-discrete cosine transform (MMD) filter bank structure, which restricts the overlap of each filter response to its immediate neighbor. In contrast to the well-known triangular filters employed in the extraction of the Mel-frequency cepstral coefficients (MFCC), the proposed filter structure has a smoother response and offers discrete cosine transformation and Mel-scale filtering in a single operation.
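For reference, the conventional triangular Mel filter bank that the proposed structure is contrasted with can be sketched as below. This shows the baseline only, not the MMD filter bank. The Mel-scale formula with constants 2595 and 700 is one common convention and is an assumption here, as are the function names.

```python
import math


def hz_to_mel(f):
    """Hz to Mel, using one common convention (assumed)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters rows of triangular weights over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points equally spaced on the Mel scale: edges and centers.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


bank = triangular_mel_filterbank(n_filters=10, n_fft=512, sample_rate=16000)
```

In the usual MFCC pipeline, these weights are applied to the power spectrum, followed by a logarithm and a separate discrete cosine transform.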

Automatic modulation classification facilitates many important signal processing applications. Recently, deep learning models have been adopted in modulation recognition, which outperform traditional machine learning techniques based on hand-crafted features. However, automatic modulation classification is still challenging due to the following reasons.

Previous wake-up word detection (WWD) methods have focused on finding a word representation that expresses the characteristics of a word well. However, obstacles such as noise and reverberation make detection difficult in the real-world environments where WWD operates.

In this letter, we propose a new approach to tracking a target that maneuvers based on the multiple-constant-turns model. Usually, the interactive-multiple-model (IMM) algorithm based on the extended Kalman filter (IMM-EKF) is employed for this problem with successful tracking performance.
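To make the motion model concrete, here is a minimal sketch of one step of the standard two-dimensional constant-turn (coordinated-turn) model with a known turn rate. The state ordering (x, vx, y, vy), the function name, and the example turn rates are assumptions for illustration; the sketch covers only the motion model, not the IMM-EKF or the proposed tracker.

```python
import math


def constant_turn_step(state, omega, dt):
    """Propagate (x, vx, y, vy) by dt seconds at turn rate omega (rad/s)."""
    x, vx, y, vy = state
    if abs(omega) < 1e-12:
        # Limit of zero turn rate: constant-velocity motion.
        return (x + vx * dt, vx, y + vy * dt, vy)
    s, c = math.sin(omega * dt), math.cos(omega * dt)
    return (
        x + (s / omega) * vx - ((1.0 - c) / omega) * vy,
        c * vx - s * vy,
        y + ((1.0 - c) / omega) * vx + (s / omega) * vy,
        s * vx + c * vy,
    )


# A multiple-constant-turns model uses several candidate turn rates
# (hypothetical values here), one motion model per rate.
turn_rates = [-0.2, 0.0, 0.2]
predictions = [constant_turn_step((0.0, 1.0, 0.0, 0.0), w, 1.0) for w in turn_rates]
```

Each step rotates the velocity vector by omega * dt, so the speed is preserved while the heading changes.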

Developing Semi-Supervised Seq2Seq (S4) learning for sequence transduction tasks in natural language processing (NLP), e.g., semantic parsing, is challenging, since both the input and the output sequences are discrete. This discrete nature poses difficulties for methods that need gradients from either the input space or the output space.

Utilizing a human-perception-related objective function to train a speech enhancement model has become a popular topic recently. The main reason is that the conventional mean squared error (MSE) loss cannot represent auditory perception well. One of the typical human-perception-related metrics, which is the perceptual evaluation of speech quality (PESQ), has been proven to provide a high correlation to the quality scores rated by humans.

This correspondence proposes the use of a real-only equalizer (ROE), which acts on real signals derived from the received offset quadrature amplitude modulation (OQAM) symbols. For the same fading channel, we prove that both ROE and the widely linear equalizer (WLE) yield equivalent outputs.


