Fast Retinomorphic Event-Driven Representations for Video Gameplay and Action Recognition

Good temporal representations are crucial for video understanding, and the state-of-the-art video recognition framework is based on two-stream networks. In such a framework, besides the regular ConvNets responsible for RGB frame inputs, a second network is introduced to handle the temporal representation, usually the optical flow (OF).

Tensor Representation for Three-Dimensional Radar Target Imaging With Sparsely Sampled Data

Three-dimensional (3-D) radar imaging can provide additional information about the target along the elevation dimension compared with conventional 2-D radar imaging, but it usually requires a huge amount of data collected over the 3-D frequency-azimuth-elevation space, which motivates us to perform 3-D imaging using sparsely sampled data. Traditional compressive sensing (CS) based 3-D imaging methods with sparse data convert the 3-D data into a long vector and then complete the sensing and recovery steps.
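To make the vectorization step concrete, here is a minimal sketch of flattening a frequency-azimuth-elevation data cube into one long vector. The size comparison at the end rests on an assumption not stated in the abstract: that a tensor-style approach applies one small operator per dimension instead of a single dense operator on the flattened vector. All function names are hypothetical.

```python
def flatten_3d(cube):
    """Row-major vectorization of a nested list indexed [freq][azimuth][elevation]."""
    return [value for plane in cube for row in plane for value in row]


def flat_index(f, a, e, shape):
    """Position of element (f, a, e) inside the row-major flattened vector."""
    _, n_a, n_e = shape
    return (f * n_a + a) * n_e + e


def dense_operator_entries(full_shape, sampled_shape):
    """Entries of one dense sensing matrix acting on the flattened vector."""
    n = full_shape[0] * full_shape[1] * full_shape[2]
    m = sampled_shape[0] * sampled_shape[1] * sampled_shape[2]
    return m * n


def per_mode_operator_entries(full_shape, sampled_shape):
    """Entries of three per-dimension matrices (assumed tensor-style alternative)."""
    return sum(m * n for m, n in zip(sampled_shape, full_shape))


if __name__ == "__main__":
    full, sampled = (64, 64, 64), (16, 16, 16)
    print(dense_operator_entries(full, sampled))     # dense operator on the long vector
    print(per_mode_operator_entries(full, sampled))  # three small per-mode operators
```

Under these assumptions, the operator acting on the long vector grows with the product of the three dimensions, while the per-mode operators grow only with their sum.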

Video Coding System Architect*

WE HAVE AN IMMEDIATE OPENING IN OUR »AUDIO AND MEDIA TECHNOLOGIES« DIVISION OF FRAUNHOFER IIS IN ERLANGEN, GERMANY, FOR A VIDEO CODING SYSTEM ARCHITECT*

You have a vision about future directions in 2D video coding? You like to design technologies that are ready to conquer the world?

Then join our institute and contribute to the growth of our video activities

Video Coding Application Engineer & Product Manager

WE HAVE AN IMMEDIATE OPENING IN OUR »AUDIO AND MEDIA TECHNOLOGIES« DIVISION OF FRAUNHOFER IIS IN ERLANGEN, GERMANY, FOR A

VIDEO CODING APPLICATION ENGINEER & PRODUCT MANAGER*

Your passion is to bring innovative technologies to market? You like communicating with potential customers, identifying their needs and proposing suitable solutions to them?

Then join our institute and contribute to the growth of our video activities

Reflection Assisted Sound Source Localization Through a Harmonic Domain MUSIC Framework

This work presents a method that persuades acoustic reflections to be a favorable property for sound source localization. Whilst most real-world spatial audio applications utilize prior knowledge of sound source position, estimating such positions in reverberant environments is still considered to be a difficult problem due to acoustic reflections.

Design of Planar Differential Microphone Arrays With Fractional Orders

Differential microphone arrays (DMAs) often encounter white noise amplification, especially at low frequencies. If the array geometry and the number of microphones are fixed, one can mitigate the white noise amplification problem by reducing the DMA order. With the existing differential beamforming methods, the DMA order can only be a positive integer.

A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis

Recurrent neural networks (RNNs) can predict fundamental frequency (F0) for statistical parametric speech synthesis systems, given linguistic features as input. However, these models assume conditional independence between consecutive F0 values, given the RNN state. In a previous study, we proposed autoregressive (AR) neural F0 models to capture the causal dependency of successive F0 values.

Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation

This article addresses the problem of distance estimation using binaural hearing aid microphones in reverberant rooms. Among several distance indicators, the direct-to-reverberant energy ratio (DRR) has been shown to be more effective than other features. Therefore, we present two novel approaches to estimate the DRR of binaural signals.
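
For readers unfamiliar with the quantity, the DRR is conventionally defined as the ratio of the energy in the direct-path part of a room impulse response to the energy in the part that follows it. The minimal sketch below computes that textbook definition from a known impulse response. It is not either of the article's two estimators, which operate on binaural signals. The function name, the 2.5 ms direct-path window, and the peak-picking rule are assumptions chosen for illustration.

```python
import math


def direct_to_reverberant_ratio_db(rir, fs, direct_window_ms=2.5):
    """DRR in dB from a known room impulse response (sequence of floats).

    Assumption: the direct path is the largest-magnitude sample, and the
    direct energy lies within +/- direct_window_ms of that sample.
    """
    # Locate the direct-path peak.
    peak = max(range(len(rir)), key=lambda n: abs(rir[n]))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start = max(0, peak - half)
    stop = min(len(rir), peak + half + 1)

    # Energy inside the window counts as direct, everything after as reverberant.
    direct = sum(x * x for x in rir[start:stop])
    reverberant = sum(x * x for x in rir[stop:])
    if reverberant == 0.0:
        raise ValueError("no reverberant energy after the direct-path window")
    return 10.0 * math.log10(direct / reverberant)


if __name__ == "__main__":
    # Synthetic response: unit direct impulse, a gap, then a flat reverberant tail.
    rir = [0.0] * 10 + [1.0] + [0.0] * 100 + [0.1] * 100
    print(direct_to_reverberant_ratio_db(rir, fs=16000))
```

A larger DRR generally indicates a closer source, since direct energy falls with distance while reverberant energy stays roughly constant across the room, which is why the DRR can serve as a distance indicator.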