1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Remote photoplethysmography (rPPG) has attracted increasing attention because of its potential in a wide range of applications, such as physical training, clinical monitoring, and face anti-spoofing. Beyond conventional solutions, deep-learning approaches have begun to dominate rPPG estimation and achieve top-level performance.
Cross-component prediction is an important intra-prediction tool in modern video coders. Existing methods that exploit cross-component correlation include the cross-component linear model and its extension, the multi-model linear model. These models are designed for camera-captured content. For screen content coding, where videos exhibit different signal characteristics, a cross-component prediction model tailored to those characteristics is desirable.
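As a rough illustration of the cross-component linear model idea mentioned above, the sketch below predicts chroma samples from co-located luma samples as chroma ≈ alpha * luma + beta. It assumes a plain floating-point least-squares fit over neighbouring reconstructed samples; the function names and the sample values are hypothetical, and practical codecs generally use integer arithmetic and simplified parameter derivation instead.

```python
def fit_linear_model(luma, chroma):
    """Least-squares fit of chroma ~= alpha * luma + beta.

    `luma` and `chroma` are equal-length lists of neighbouring
    reconstructed samples (hypothetical inputs for illustration).
    """
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    var_l = sum((l - mean_l) ** 2 for l in luma)
    if var_l == 0:
        # Flat luma neighbourhood: fall back to predicting the mean chroma.
        return 0.0, mean_c
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    alpha = cov / var_l
    beta = mean_c - alpha * mean_l
    return alpha, beta


def predict_chroma(luma_block, alpha, beta):
    """Apply the fitted linear model to every luma sample in a 2-D block."""
    return [[alpha * l + beta for l in row] for row in luma_block]


# Usage: fit on neighbouring samples, then predict the current block.
alpha, beta = fit_linear_model([10, 20, 30, 40], [10, 15, 20, 25])
predicted = predict_chroma([[12, 18], [26, 34]], alpha, beta)
```

A multi-model variant would, under the same assumptions, split the neighbouring samples into groups (for example by a luma threshold) and fit one (alpha, beta) pair per group.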
Graph Convolutional Networks (GCNs), which typically follow a neural message-passing framework to model dependencies among skeletal joints, have achieved considerable success in skeleton-based human motion prediction. Nevertheless, how to construct a graph from a skeleton sequence and how to perform message passing on the graph remain open problems that severely affect GCN performance.
We provide an expressive framework that allows analyzing and generating provably secure, state-of-the-art Byzantine fault-tolerant (BFT) protocols over a graph of nodes, a notion formalized in the HotStuff protocol. Our framework is hierarchical and comprises three layers. The top layer is used to model the message pattern and abstract core functions on which BFT algorithms can be built.
Website Fingerprinting (WF) is a network traffic mining technique for anonymous traffic identification, which enables a local adversary to identify the target website that an anonymous network user is browsing. WF attacks based on deep convolutional neural networks (CNNs) achieve state-of-the-art anonymous traffic classification performance. However, due to the locality restriction of the CNN architecture for feature extraction on sequence data, these methods ignore temporal feature extraction in anonymous traffic analysis.
Since the generative adversarial network (GAN) was proposed by Ian Goodfellow et al. in 2014, it has been widely used in various fields. However, only a few works so far relate to image steganography. Existing GAN-based steganographic methods mainly focus on the design of the generator and assign only a relatively weak steganalyzer to the discriminator, which inevitably limits the performance of their models.
Compared to gait recognition, Gait Attribute Recognition (GAR) is a seldom-investigated problem. However, since gait attribute recognition can provide richer and finer semantic descriptions, it is an indispensable part of building intelligent gait analysis systems. Nonetheless, the types of attributes considered in the existing datasets are very limited.
Indirect Time-of-Flight (iToF) sensors measure the received signal's phase shift or time delay to calculate depth. In realistic conditions, however, recovering depth is challenging as reflections from secondary scattering areas or translucent objects may interfere with the direct reflection, resulting in inaccurate 3D estimates.
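The phase-to-depth relationship described above can be sketched as follows. This assumes a continuous-wave sensor with a single modulation frequency, for which depth = c * phase / (4 * pi * f); the function names and the 20 MHz example frequency are illustrative assumptions, not taken from the paper. The last function shows, under a simple two-phasor model, how a secondary reflection shifts the measured phase and therefore biases the depth.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def phase_to_depth(phase_rad, mod_freq_hz):
    """Depth in metres from a measured phase shift (radians).

    The light travels to the object and back, hence the factor 4*pi
    rather than 2*pi.
    """
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * mod_freq_hz)


def unambiguous_range(mod_freq_hz):
    """Largest depth measurable before the phase wraps past 2*pi."""
    return SPEED_OF_LIGHT / (2.0 * mod_freq_hz)


def mixed_phase(amplitudes, phases):
    """Phase the sensor would observe when several return paths overlap.

    Each path is modelled as a phasor amplitude * exp(j * phase); the
    sensor sees their sum, so weaker secondary paths pull the phase away
    from the direct-path value.
    """
    total = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))
    return cmath.phase(total) % (2.0 * math.pi)


# Usage with an assumed 20 MHz modulation frequency.
f_mod = 20e6
direct_depth = phase_to_depth(1.0, f_mod)
biased_depth = phase_to_depth(mixed_phase([1.0, 0.3], [1.0, 2.0]), f_mod)
```

With these example values the biased depth is larger than the direct-path depth, because the secondary path has a longer travel distance and therefore a larger phase.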
We propose a differentiable imaging framework to address uncertainty in measurement coordinates such as sensor locations and projection angles. We formulate the problem as measurement interpolation at unknown nodes supervised through the forward operator. To solve it we apply implicit neural networks, also known as neural fields, which are naturally differentiable with respect to the input coordinates. We also develop differentiable spline interpolators which perform as well as neural networks, require less time to optimize and have well-understood properties.
Human speech can be characterized by different components, including semantic content, speaker identity, and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in speech recognition and speaker verification tasks, respectively. However, extracting prosodic information remains an open and challenging problem because of the intrinsic association of different attributes, such as timbre and rhythm, and because of the need for supervised training schemes to achieve robust speech recognition.
Speech applications in far-field, real-world settings often deal with signals that are corrupted by reverberation. Dereverberation is an important step for improving audible quality and reducing error rates in applications such as automatic speech recognition (ASR). We propose a unified speech dereverberation framework that improves both speech quality and ASR performance using envelope-carrier decomposition provided by an autoregressive (AR) model.
Question answering requiring numerical reasoning, which generally involves symbolic operations such as sorting, counting, and addition, is a challenging task. To address this problem, existing mixture-of-experts (MoE)-based methods design several specific answer predictors to handle different types of questions and achieve promising performance. However, they ignore the modeling and exploitation of fine-grained reasoning-related operations to support numerical reasoning, which leaves them lacking in reasoning capability and interpretability.
The speaker recognition evaluation is conducted in a framework that employs three score distributions and two decision thresholds, and the statistic of interest is the average of two weighted sums of the probabilities of type I and type II errors, one at each threshold. Moreover, because resources are limited, the same subjects are used multiple times to generate more samples, so data dependence is ubiquitous.
Adaptive importance sampling (AIS) methods provide a useful alternative to Markov Chain Monte Carlo (MCMC) algorithms for performing inference of intractable distributions. Population Monte Carlo (PMC) algorithms constitute a family of AIS approaches which adapt the proposal distributions iteratively to improve the approximation of the target distribution.
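To make the iterative adaptation concrete, here is a minimal sketch of a basic population Monte Carlo loop: draw one sample per proposal, weight each sample by target over proposal, then resample the proposal means according to the weights. It assumes a one-dimensional target, Gaussian proposals with a fixed width, and multinomial resampling; the target, function names, and parameter values are illustrative assumptions and do not represent the specific method of the paper.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of an example target, N(3, 1)."""
    return -0.5 * (x - 3.0) ** 2


def log_gauss(x, mean, sigma):
    """Log density of a Gaussian proposal."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def pmc_mean_estimate(log_target, n_proposals=200, n_iters=30, sigma=1.0, seed=0):
    """Estimate the target mean with a basic PMC scheme."""
    rng = random.Random(seed)
    # Start with proposal means spread widely over the real line.
    means = [rng.uniform(-10.0, 10.0) for _ in range(n_proposals)]
    estimate = 0.0
    for _ in range(n_iters):
        # 1. Draw one sample from each proposal.
        samples = [rng.gauss(m, sigma) for m in means]
        # 2. Importance weights: target density over own-proposal density.
        log_w = [log_target(x) - log_gauss(x, m, sigma)
                 for x, m in zip(samples, means)]
        max_lw = max(log_w)  # subtract the max for numerical stability
        w = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # 3. Self-normalized estimate of the target mean at this iteration.
        estimate = sum(wi * x for wi, x in zip(w, samples))
        # 4. Adapt: resample the proposal means in proportion to the weights.
        means = rng.choices(samples, weights=w, k=n_proposals)
    return estimate


# Usage: the estimate should land near the true mean of the example target.
estimate = pmc_mean_estimate(log_target)
```

Resampling is only one possible adaptation rule; other PMC variants also adapt proposal widths or use different weighting schemes.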
Vision Transformer (ViT)-based image super-resolution (SR) methods have achieved impressive performance and surpassed CNN-based SR methods by using Multi-Head Self-Attention (MHSA) to model long-range dependencies. However, the quadratic complexity of MHSA and the inefficiency of non-parallelized window partitioning seriously reduce inference speed, which hinders the use of these SR methods in applications that require both speed and quality.
Learning-based approaches inspired by the scattering model for enhancing underwater imagery have gained prominence. Nevertheless, these methods are often time-consuming because of their large model sizes. Moreover, they face challenges in adapting to unknown scenes, primarily because the scattering model was originally designed for atmospheric rather than marine conditions.
Decoding silent-reading electroencephalography (EEG) signals is challenging because of their low signal-to-noise ratio. In addition, EEG signals typically have a non-Euclidean structure, so a two-dimensional matrix that merely represents how each channel's sampling points vary over time cannot fully capture the spatial connections between channels.
Date: 30 January 2024
Time: 8:00 AM ET (New York Time)
Presenter(s): Dr. Congli Wang, Dr. Wolfgang Heidrich
Manuscript Due: 22 August 2024
Publication Date: March 2025
Date: 6 March 2024
Time: 10:00 AM ET (New York Time)
Presenter(s): Javier Escudero