This paper considers the problem of decentralized consensus optimization over a network, where each node holds a strongly convex and twice-differentiable local objective function. Our goal is to minimize the sum of the local objective functions and find the exact optimal solution using only local computation and neighboring communication.
Novel Monte Carlo estimators are proposed to solve both the Tikhonov regularization (TR) and the interpolation problems on graphs. These estimators are based on random spanning forests (RSF), whose theoretical properties make it possible to analyze the estimators’ mean and variance.
The video captioning task aims to describe video content using several natural-language sentences. Although one-step encoder-decoder models have achieved promising progress, the generated captions still contain many errors, caused mainly by the large semantic gap between the visual and language domains and by the difficulty of long-sequence generation.
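As a point of reference for the Tikhonov regularization problem mentioned above, the following sketch solves a small instance deterministically; it does not implement the RSF-based Monte Carlo estimators, which the abstract does not detail. It assumes the common formulation of minimizing q‖y − x‖² + xᵀLx, whose solution satisfies (L + qI)x = qy, with L the graph Laplacian. The path graph, the signal values, the parameter `q`, and the function name `tikhonov_smooth` are illustrative assumptions.

```python
def tikhonov_smooth(adjacency, y, q, iterations=500):
    """Solve (L + q I) x = q y by Jacobi iteration.

    adjacency: dict mapping node -> list of neighbours (unweighted, undirected)
    y: dict mapping node -> noisy measurement
    q: positive regularization parameter (assumed formulation, see lead-in)
    """
    x = dict(y)  # start from the measurements
    for _ in range(iterations):
        new_x = {}
        for node, neighbours in adjacency.items():
            # Row i of (L + qI)x = qy reads: (deg_i + q) x_i - sum_j x_j = q y_i
            neighbour_sum = sum(x[j] for j in neighbours)
            new_x[node] = (q * y[node] + neighbour_sum) / (len(neighbours) + q)
        x = new_x
    return x


# Hypothetical example: a 5-node path graph carrying a noisy step signal.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
noisy = {0: 0.1, 1: -0.2, 2: 0.4, 3: 1.1, 4: 0.9}
smoothed = tikhonov_smooth(path, noisy, q=1.0)
```

Jacobi iteration converges here because L + qI is strictly diagonally dominant for any q > 0. A direct solver would serve equally well on a graph this small.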
The prevailing use of both images and text to express opinions on the web leads to the need for multimodal sentiment recognition. Some commonly used social media data containing short text and few images, such as tweets and product reviews, have been well studied. However, it is still challenging to predict the readers’ sentiment after reading online news articles, since news articles often have more complicated structures, e.g., longer text and more images.
Recently, dense video captioning has made attractive progress in detecting and captioning all events in a long untrimmed video. Although promising results have been achieved, most existing methods do not sufficiently explore the scene evolution within an event temporal proposal for captioning, and therefore perform less satisfactorily when the scenes and objects change over a relatively long proposal. To address this problem, we propose a two-stage graph-based partition-and-summarization (GPaS) framework for dense video captioning.
Diversity “multiple description” (MD) source coding promises graceful degradation in the presence of an a priori unknown number of erased packets in the channel. A simple coding scheme for the case of two packets consists of oversampling the source by a factor of two and delta-sigma quantization. This approach was applied successfully to JPEG-based image coding over a lossy packet network, where the interpolation and splitting into two descriptions are done in the discrete cosine transform (DCT) domain.
Video surveillance and its applications have become increasingly ubiquitous in modern daily life. In video surveillance systems, video coding, as a critical enabling technology, determines how effectively surveillance videos are transmitted and stored. To meet the real-time or time-critical transmission requirements of video surveillance systems, the low-delay (LD) configuration of the advanced high efficiency video coding (HEVC) standard is usually used to encode surveillance videos.
RGB-thermal salient object detection (SOD), which we call RGBT SOD, aims to segment the common prominent regions of a visible image and the corresponding thermal infrared image. Existing methods do not fully explore and exploit the complementarity of the different modalities and the multi-type cues of image contents, which play a vital role in achieving accurate results.
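The two-packet scheme described above (oversample by two, delta-sigma quantize, split into two descriptions) can be illustrated with a toy time-domain sketch. This is not the DCT-domain image coder from the abstract: it assumes sample repetition as the interpolation, a first-order error-feedback quantizer, and an even/odd split, and every function name is hypothetical.

```python
def delta_sigma_quantize(samples, step):
    """First-order error-feedback quantizer: y[n] = x[n] + e[n] - e[n-1]."""
    quantized = []
    prev_error = 0.0
    for x in samples:
        v = x - prev_error              # subtract the previous quantization error
        y = round(v / step) * step      # uniform quantizer with the given step
        prev_error = y - v              # error fed back at the next sample
        quantized.append(y)
    return quantized


def encode_two_descriptions(source, step):
    """Oversample by two (sample repetition), quantize, split even/odd."""
    oversampled = [s for s in source for _ in range(2)]
    y = delta_sigma_quantize(oversampled, step)
    return y[0::2], y[1::2]


def decode(desc_a=None, desc_b=None):
    """Average both descriptions if present, else use the one that arrived."""
    if desc_a is not None and desc_b is not None:
        return [(a + b) / 2 for a, b in zip(desc_a, desc_b)]
    return list(desc_a if desc_a is not None else desc_b)


# Hypothetical source samples and quantizer step.
source = [0.12, 0.37, -0.81, 0.55, 0.03, -0.26]
step = 0.25
desc_a, desc_b = encode_two_descriptions(source, step)
both = decode(desc_a, desc_b)      # both packets arrived
only_a = decode(desc_a=desc_a)     # second packet erased
```

In this toy setup each description alone reconstructs every sample to within one quantizer step, and averaging the two tightens the bound to half a step, which is the graceful-degradation behaviour the abstract refers to.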
We propose a neural network model to estimate the current frame from two reference frames, using affine transformation and adaptive spatially-varying filters. The estimated affine transformation allows for using shorter filters compared to existing approaches for deep frame prediction. The predicted frame is used as a reference for coding the current frame.
Machine learning techniques have been widely applied to various applications. However, they are potentially vulnerable to data poisoning attacks, where sophisticated attackers can disrupt the learning procedure by injecting a fraction of malicious samples into the training dataset. Existing defense techniques against poisoning attacks are largely attack-specific: they are designed for one specific type of attack but do not work for other types, mainly due to the distinct principles they follow.
This paper presents a signal processing and machine learning (ML) based methodology that leverages electromagnetic (EM) emissions from an embedded device to remotely detect a malicious application running on the device and classify the application into a malware family. We develop Fast Fourier Transform (FFT) based feature extraction followed by Support Vector Machine (SVM) and Random Forest (RF) based ML models to detect malware.
Deep learning-based person re-identification (Re-ID) has made great progress and achieved high performance recently. In this paper, we make the first attempt to examine the vulnerability of current person Re-ID models against a dangerous attack method, i.e., the universal adversarial perturbation (UAP) attack, which has been shown to fool classification models with little overhead.
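To make the idea of frequency-domain feature extraction concrete, here is a minimal sketch that turns a sampled trace into band-averaged spectral magnitudes. It is an assumption-laden illustration, not the paper's pipeline: a naive DFT stands in for an FFT routine (an FFT computes the same spectrum faster), the band layout and the names `dft_magnitudes` and `band_features` are hypothetical, and the SVM/RF classification stage is not shown.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude spectrum of a real signal for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags


def band_features(signal, num_bands):
    """Average the magnitude spectrum over equal-width bands (DC bin dropped)."""
    mags = dft_magnitudes(signal)[1:]
    width = len(mags) // num_bands
    return [sum(mags[b * width:(b + 1) * width]) / width for b in range(num_bands)]


# Hypothetical trace: a single tone at bin 5 of a 64-sample window.
n = 64
trace = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
features = band_features(trace, num_bands=4)
```

For this synthetic tone the energy falls almost entirely in the first band. A feature vector of this kind is what a downstream classifier would consume.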
Given a spectral library, sparse unmixing aims to estimate the fractional proportions in each pixel of a hyperspectral image scene. However, the ever-growing dimensionality of spectral dictionaries strongly limits the performance of sparse unmixing algorithms. In this study, we propose a novel dictionary pruning (DP) approach to improve the performance of sparse unmixing algorithms, making them more accurate and time-efficient.
In cell and molecular biology, the fusion of green fluorescent protein (GFP) and phase contrast (PC) images aims to generate a composite image that simultaneously displays the functional information in the GFP image, which relates to the molecular distribution of living cells, and the structural information in the PC image, such as the nucleus and mitochondria. In this paper, we propose a detail preserving cross network (DPCN), which consists of a structural-guided functional feature extraction branch (SFFEB), a functional-guided structural feature extraction branch (FSFEB), and a detail preserving module (DPM), to address the GFP and PC image fusion problem.
Automatic speech recognition (ASR) technologies have advanced significantly in the past few decades. However, recognition of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in current ASR systems.
In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated. This leads to additional challenges in the source separation problem.
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences. Speaker adaptation techniques play a vital role in reducing this mismatch. Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
Point clouds (PCs) have recently been adopted as the preferred data structure for representing 3D visual content. PC applications range from 3D representations of small objects to large scenes, either still or dynamic in time. PC adoption has triggered the development of new coding, transmission, and display methodologies that culminated in new international standards for PC compression.
Video inpainting aims to fill missing regions with plausible content in a video sequence. Deep learning-based video inpainting methods have made promising progress over the past few years. However, these methods tend to generate degraded completion content, such as missing textural details.
Quadrature spatial modulation (QSM) is a recently proposed multiple-input multiple-output (MIMO) wireless transmission paradigm that has garnered considerable research interest owing to its relatively high spectral efficiency. QSM essentially enhances the spatial multiplexing gain while maintaining all the inherent advantages of spatial modulation (SM).