Skip to main content

JSTSP Volume 13 Issue 2

Polyphonic Sound Event Detection by Using Capsule Neural Networks

Given the recent surge in developments of deep learning, this paper provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross fertilization between areas.

Read more

Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets

We address voice activity detection in acoustic environments of transients and stationary noises, which often occur in real-life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This process is done through a deep encoder-decoder-based neural network architecture.

Read more

Deep Learning for Audio Signal Processing

Given the recent surge in developments of deep learning, this paper provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross fertilization between areas.

Read more