PANNs: Large-scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Audio pattern recognition is an important research topic in machine learning and includes several tasks, such as audio tagging, acoustic scene classification, music classification, speech emotion classification and sound event detection. In this blog post, we introduce pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset. These PANNs are transferred to other audio-related tasks. We investigate the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks. We propose an architecture called Wavegram-Logmel-CNN that uses both the log-mel spectrogram and the waveform as input features.
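The log-mel half of that input can be sketched in plain NumPy. This is a minimal illustration, not the PANNs feature pipeline: it assumes a mono waveform, an HTK-style mel scale, a Hann window and illustrative parameter values (32 kHz sampling, 1024-point FFT, hop of 320 samples, 64 mel bands). The function names are hypothetical, and the learned Wavegram branch is not shown.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters mapping n_fft//2 + 1 FFT bins onto n_mels mel bands."""
    fmax = sr / 2 if fmax is None else fmax
    # n_mels + 2 points, equally spaced on the mel scale, give each triangle's edges
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)  # centre frequency of each rfft bin
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(x, sr, n_fft=1024, hop=320, n_mels=64, eps=1e-10):
    """Return a (frames, n_mels) log-mel spectrogram of the 1-D waveform x."""
    window = np.hanning(n_fft)
    # reflect-pad half a window on each side so frames are centred on the samples
    x = np.pad(x, (n_fft // 2, n_fft // 2), mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (frames, n_mels)
    return np.log(mel + eps)                              # eps avoids log(0)
```

For one second of a 1 kHz tone at 32 kHz this yields 101 frames of 64 bands, with the most energetic band centred near 1 kHz. Libraries such as librosa and torchaudio provide tested implementations of this feature; the hand-rolled version only makes each step explicit.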
Frontal-Centers Guided Face: Boosting Face Recognition by Learning Pose-Invariant Features
In recent years, face recognition has made remarkable breakthroughs thanks to the emergence of deep learning. However, many deep face recognition models still suffer serious performance degradation on profile faces compared with frontal faces. To address this issue, we propose a novel Frontal-Centers Guided Loss (FCGFace) to obtain highly discriminative features for face recognition. Most existing discriminative feature learning approaches project features from the same class into a separate latent subspace.
Recent Advances of Deep Learning within X-ray Security Imaging
This blog post explores modern deep learning applications as well as traditional machine learning techniques for automated X-ray security imaging.
Reconfigurable Intelligent Surfaces Aided Robust Systems
A robust transmission design framework for reconfigurable intelligent surface (RIS) aided systems is proposed to address the problem of imperfect cascaded channel state information.
Advancing Technological Equity in Speech and Language Processing: Aspects, Challenges, Successes, and Future Actions
Recent years have seen great strides in the research and development (R&D) of speech and language technologies. As these technologies continue to permeate our daily lives, they need to support diverse users and usage contexts, especially those with inputs that deviate from the mainstream.
Model-Driven Deep Learning for MIMO Detection
In this blog post, we investigate model-driven deep learning for multiple-input multiple-output (MIMO) detection. In particular, the MIMO detector is designed by unfolding an iterative algorithm and adding a small number of trainable parameters.
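The unfolding idea can be illustrated with a generic example: each iteration of projected gradient descent on the least-squares cost becomes one "layer" with its own step size and projection sharpness. This is a minimal sketch, not the detector from the post; it assumes a real-valued channel, BPSK symbols in {-1, +1} and a soft `tanh` projection, and the function and parameter names are hypothetical. The per-layer parameters are passed in by hand here, whereas training would learn them.

```python
import numpy as np


def unfolded_detector(y, H, step_sizes, thetas):
    """Forward pass of projected gradient descent unfolded into len(step_sizes) layers.

    Layer t computes x <- tanh(theta_t * (x + gamma_t * H^T (y - H x))).
    gamma_t (step size) and theta_t (projection sharpness) are the per-layer
    parameters a trained network would learn; here they are supplied directly.
    """
    x = np.zeros(H.shape[1])
    for gamma, theta in zip(step_sizes, thetas):
        residual = y - H @ x               # data-consistency error
        z = x + gamma * H.T @ residual     # gradient step on ||y - Hx||^2 / 2
        x = np.tanh(theta * z)             # soft projection onto [-1, 1] (BPSK)
    return x


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 4)))   # 8x4 channel with orthonormal columns
x_true = np.array([1.0, -1.0, 1.0, -1.0])
y = Q @ x_true + 0.05 * rng.standard_normal(8)     # noisy received vector
x_hat = unfolded_detector(y, Q, step_sizes=[1.0] * 5, thetas=[3.0] * 5)
decisions = np.sign(x_hat)                          # hard BPSK decisions
```

With orthonormal channel columns and a unit step size every layer maps back to the least-squares estimate, so the example only checks the mechanics. For general channels, tuning each layer's parameters is what turns a fixed, small number of iterations into a trainable network.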
Estimation in Multi-Object State Space Model
A brief introduction to state estimation in multi-object systems, which arise in applications where the number of objects and their states are unknown and vary randomly with time. The state space model (SSM) is a fundamental concept in system theory that has permeated many fields of study.
Empirical Wavelets
We design a data-driven wavelet transform, called the empirical wavelet transform, which can extract highly accurate time-frequency information from signals, or features from images.
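The data-driven part is that the filter bank adapts to the signal's own spectrum. The sketch below is a deliberately simplified variant, not the full transform: it places band boundaries midway between the largest local maxima of the magnitude spectrum and then uses ideal (brick-wall) band-pass filters instead of the smooth wavelet filters of the actual construction. The function name is hypothetical.

```python
import numpy as np


def empirical_modes(x, n_modes):
    """Split x into at most n_modes components with spectrum-adaptive bands.

    Simplifications (assumptions): boundaries sit halfway between the n_modes
    largest local maxima of |FFT|, and each band uses an ideal mask rather
    than a smooth wavelet filter.
    """
    X = np.fft.rfft(x)
    mag = np.abs(X)
    # interior local maxima of the magnitude spectrum
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]
    # keep the n_modes largest peaks, then sort them by frequency
    peaks = sorted(sorted(peaks, key=lambda k: mag[k], reverse=True)[:n_modes])
    # band edges: DC, midpoints between consecutive peaks, Nyquist
    bounds = [0] + [(a + b) // 2 for a, b in zip(peaks[:-1], peaks[1:])] + [len(mag)]
    modes = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = np.zeros(len(X))
        mask[lo:hi] = 1.0                  # ideal band-pass for this segment
        modes.append(np.fft.irfft(X * mask, n=len(x)))
    return np.array(modes)


fs = 1000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 50 * t)
high = 0.5 * np.sin(2 * np.pi * 200 * t)
modes = empirical_modes(low + high, n_modes=2)
```

Because the ideal masks partition the spectrum, the modes always sum back to the input; in this two-tone example each mode isolates one tone. The smooth filters of the real transform trade that sharp separation for better time localization.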
Facial Expression Analysis with Attention Mechanism
We develop algorithms for analyzing facial expressions by learning from data. Since local characteristics of muscle movements play an important role when machines distinguish facial expressions, we explore these local characteristics by introducing the attention mechanism in both supervised and self-supervised manners. Our methods are experimentally shown to be effective for facial expression recognition under occlusion and for facial action unit detection.
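A generic form of attention over local regions is softmax-weighted pooling: each region's feature vector gets a relevance score, the scores are normalized into weights, and the weighted sum becomes the face descriptor. This is a minimal sketch, not the architecture from the post; the linear scoring vector `w` and the function name are hypothetical, and in a trained model the scorer would be learned.

```python
import numpy as np


def attention_pool(features, w):
    """Softmax attention pooling over N local feature vectors.

    features: (N, D) array, one row per facial region or CNN grid cell.
    w: (D,) scoring vector (hypothetical; a real model would learn it).
    Returns the (D,) pooled descriptor and the (N,) attention weights.
    """
    scores = features @ w                   # one relevance score per region
    scores = scores - scores.max()          # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # weights sum to 1
    pooled = alpha @ features               # attention-weighted sum of regions
    return pooled, alpha


regions = np.array([[1.0, 0.0],    # weakly informative region
                    [0.0, 1.0],    # e.g. an occluded region, scored 0
                    [5.0, 0.0]])   # strongly informative region
pooled, alpha = attention_pool(regions, w=np.array([1.0, 0.0]))
```

Regions with low scores, such as occluded ones, contribute little to `pooled`, which is one way attention can focus recognition on the informative local areas.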
When Quantum Signal Processing and Communications Meet
Quantum search algorithms can efficiently solve large-scale quantum computing and signal processing problems, but their operation is degraded by the decoherence of quantum circuits, which may be mitigated by quantum codes. Secure quantum key distribution (QKD) was already a commercial reality in 2021.
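As a concrete instance of quantum search, Grover's algorithm can be simulated classically on a state vector. This sketch assumes Grover's algorithm as the representative search algorithm, a single marked item and ideal noise-free gates, so decoherence is not modeled; the function name is hypothetical. It finds the marked item among N entries with about (π/4)√N oracle queries, versus on the order of N for unstructured classical search.

```python
import numpy as np


def grover_search(n_qubits, marked):
    """Ideal state-vector simulation of Grover's algorithm with one marked index.

    Returns the measurement probabilities after floor(pi/4 * sqrt(N)) iterations.
    """
    N = 2 ** n_qubits
    state = np.full(N, 1 / np.sqrt(N))      # uniform superposition (Hadamards on |0...0>)
    iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
    for _ in range(iterations):
        state[marked] *= -1                 # oracle: phase-flip the marked item
        state = 2 * state.mean() - state    # diffusion: inversion about the mean
    return np.abs(state) ** 2


probs = grover_search(n_qubits=8, marked=42)   # N = 256, 12 iterations
```

After 12 iterations the marked index carries nearly all of the probability. On real hardware, decoherence erodes this amplitude build-up, which is why error mitigation such as quantum codes matters.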

