PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Pretrained audio neural networks (PANNs) are trained on 5800 hours of AudioSet data and can be used to recognize hundreds of sound types in the natural world.
Deep Learning for All-in-Focus Imaging
Focus stacking is an effective approach to extending the depth of field of a camera, yet is challenging with regard to 1) controlling focal planes in forming a stack and 2) fusing the focal stack into composites free from defocusing, i.e., all-in-focus. We propose a deep learning all-in-focus imaging pipeline as a novel solution for focus stacking.
Underwater Image Enhancement via a Fast yet Effective Traditional Method
Addressing underwater image challenges, our method MLLE enhances color, contrast, and details efficiently. Outperforming competitors, it processes 1024×1024×3 images in under 1s on a single CPU. Experiments show improved underwater image segmentation, keypoint detection, and saliency detection.
An Echo in Time: Tracing the Evolution of Beamforming Algorithms
Beamforming is a widely used signal processing technique that uses an array of sensors to steer, shape, and focus an electromagnetic wave toward a desired direction.
Deep CNN-Based Channel Estimation Using 3D Channel Correlation
Millimeter wave (mmWave) communications offer a promising way to meet the growing demand for high data rates thanks to their large bandwidth. The fifth generation communication system (5G) now being rapidly deployed has not yet touched the dominant mmWave frequency band and thus can hardly benefit from its potential to dramatically boost transmission rates, which motivates our research on the ultimate implementation of mmWave communications.
Coarse-to-Fine CNN for Image Super-Resolution
A coarse-to-fine SR CNN (CFSRCNN) consisting of a stack of feature extraction blocks (FEBs), an enhancement block (EB), a construction block (CB), and a feature refinement block (FRB) is proposed to learn a robust SR model.
Collaborative Cloud and Edge Mobile Computing in C-RAN Systems
To handle the various types of tasks in upcoming cellular services, we can design the system with both cloud and edge computing capabilities, where computational tasks can be partially offloaded to the edge nodes (ENs) and the central processor (CP).
Revitalizing Underwater Image Enhancement in the Deep Learning Era
Underwater image enhancement has drawn considerable attention in both image processing and underwater vision. Due to the complicated underwater environment and lighting conditions, enhancing underwater images is a challenging problem.
How can we make cameras smarter to better analyze humans?
This blog describes four computer vision algorithms for better human analysis that understand human hands, gestures, poses, and actions from various input modalities.
Deep-learning-based audio-visual speech enhancement
We have all experienced the discomfort of communicating with our friends at a cocktail party or in a pub with loud background music. When difficult acoustic scenarios like these occur, we tend to rely on several visual cues, such as the lip and mouth movements of the speaker, in order to understand the speech of interest.

