SPS Feed

The Latest News, Articles, and Events in Signal Processing

IEEE/ACM Transactions on Audio, Speech, and Language Processing

This paper introduces a new framework for non-parallel emotion conversion in speech. Our framework is based on two key contributions. First, we propose a stochastic version of the popular Cycle-GAN model. Our modified loss function introduces a Kullback–Leibler (KL) divergence term that aligns the source and target data distributions learned by the generators, thus overcoming the limitations of sample-wise generation. By using a variational approximation to this stochastic loss function, we show that our KL divergence term can be implemented via a paired density discriminator.

IEEE/ACM Transactions on Audio, Speech, and Language Processing

In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems. Given this success, adopting discriminative criteria is a promising way to boost the performance of end-to-end (E2E) ASR systems. With this motivation, previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems. However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding.

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Emotional voice conversion (VC) aims to convert a neutral voice to an emotional one while retaining the linguistic information and speaker identity. We note that decoupling emotional features from other speech information (such as content and speaker identity) is the key to achieving promising performance. Some recent attempts at speech representation decoupling on neutral speech do not work well on emotional speech, owing to the more complex entanglement of acoustic properties in the latter.

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Detection of speech and music signals in isolated and overlapped conditions is an essential preprocessing step for many audio applications. Speech signals have wavy and continuous harmonics, while music signals exhibit horizontally linear and discontinuous harmonic patterns. Music signals also contain more percussive components than speech signals, manifested as vertical striations in the spectrograms.

IEEE Signal Processing Letters

A key challenge in image splicing detection is how to localize complete tampered regions without false alarms. Although current forgery detection approaches have achieved promising performance, the completeness of the localized regions and the false alarm rate are often overlooked. In this paper, we argue that insufficient use of the splicing boundary is a main reason for poor accuracy. To tackle this problem, we propose an Edge-enhanced Transformer (ET) for tampered region localization. Specifically, to capture rich tampering traces, a two-branch edge-aware transformer is built to integrate the splicing edge clues into the forgery localization network, generating forgery features and edge features.

IEEE Signal Processing Letters

In this letter, we propose a novel solution to the problem of single image super-resolution at multiple scaling factors, with a single network architecture. In applications where only a detail needs to be super-resolved, traditional solutions must choose to use as input either the low-resolution detail, thus losing the information about the context, or the whole low-resolution image and then crop the desired output detail, which is quite wasteful in terms of computations and storage. 

IEEE Signal Processing Letters

Active reconfigurable intelligent surfaces (RISs) are a novel and promising technology that allows controlling the radio propagation environment while compensating for the product path loss along the RIS-assisted path. In this letter, we consider the classical radar detection problem and propose to use an active RIS to get a second independent look at a prospective target illuminated by the radar transmitter.

IEEE Open Journal of Signal Processing

Model selection is an omnipresent problem in signal processing applications. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are the most commonly used solutions to this problem. These criteria have been found to perform satisfactorily in many cases and have had a dominant role in the model selection literature since their introduction several decades ago, despite numerous attempts to dethrone them. Model selection can be viewed as a multiple hypothesis testing problem.
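As a rough illustration of how AIC and BIC are applied, the sketch below scores two candidate models on a small data set and picks the one with the lowest score. It uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of fitted parameters, n the number of samples, and L the maximized likelihood. The data values, the Gaussian noise assumption, and all function names are illustrative assumptions, not taken from the paper.

```python
import math

def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood, with the noise variance set to its ML estimate."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

def residuals_constant(xs, ys):
    """Residuals of the least-squares constant model y = c."""
    mean = sum(ys) / len(ys)
    return [y - mean for y in ys]

def residuals_line(xs, ys):
    """Residuals of the least-squares line y = a + b x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up, roughly linear samples (hypothetical data for illustration only).
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.1, 1.2, 1.9, 3.1, 3.8, 5.2, 5.9, 7.1]
n = len(xs)

# Parameter counts include the noise variance: constant -> 2, line -> 3.
candidates = {
    "constant": (residuals_constant(xs, ys), 2),
    "line": (residuals_line(xs, ys), 3),
}

aic_scores = {}
bic_scores = {}
for name, (res, k) in candidates.items():
    log_lik = gaussian_log_likelihood(res)
    aic_scores[name] = aic(log_lik, k)
    bic_scores[name] = bic(log_lik, k, n)

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print("AIC picks:", best_aic, "| BIC picks:", best_bic)
```

Both criteria trade goodness of fit against model size; BIC's penalty grows with ln n, so it tends to favor smaller models than AIC once n exceeds about 7 samples.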

IEEE Open Journal of Signal Processing

The algorithms based on the technique of optimal k-thresholding (OT) were recently proposed for signal recovery, and they are very different from the traditional family of hard thresholding methods. However, the computational cost for OT-based algorithms remains high at the current stage of their development. This stimulates the development of the so-called natural thresholding (NT) algorithm and its variants in this paper. The family of NT algorithms is developed through the first-order approximation of the so-called regularized optimal k-thresholding model, and thus the computational cost for this family of algorithms is significantly lower than that of the OT-based algorithms.
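For context, the traditional hard thresholding operator that the abstract contrasts with can be sketched as follows: it keeps the k entries of a vector with the largest magnitude and sets the rest to zero. This is only the classical operator, not the OT or NT algorithms of the paper; the function name `hard_threshold` is a hypothetical choice for this illustration.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the others."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices sorted by decreasing magnitude; ties keep their original order.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]

# Usage: retain the two dominant coefficients of a short vector.
sparse = hard_threshold([0.5, -3.0, 1.0, 2.0], 2)
print(sparse)
```

The operator selects entries purely by magnitude, without checking how the selection affects the recovery residual.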

IEEE Open Journal of Signal Processing

Mask-based lensless cameras offer a novel design for imaging systems by replacing the lens in a conventional camera with a coded mask layer. Each pixel of the lensless camera encodes the information of the entire 3D scene. Existing methods for 3D reconstruction from lensless measurements suffer from poor spatial and depth resolution.

IEEE Journal of Selected Topics in Signal Processing

Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased attention in the automatic speech recognition (ASR) community. Typical SSL methods include autoregressive predictive coding (APC), Wav2vec2.0, and hidden unit BERT (HuBERT). However, SSL models are biased toward the pretraining data. When SSL models are finetuned with data from another domain, domain shift occurs and might limit knowledge transfer to downstream tasks.

IEEE Journal of Selected Topics in Signal Processing

Speech self-supervised learning has attracted much attention due to its promising performance in multiple downstream tasks, and has become a new growth engine for speech recognition in low-resource languages. In this paper, we exploit and analyze a series of wav2vec pre-trained models for speech recognition in 15 low-resource languages in the OpenASR21 Challenge.

IEEE Journal of Selected Topics in Signal Processing

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. 

IEEE Journal of Selected Topics in Signal Processing

The papers in this special section focus on self-supervised learning for speech and audio processing. A current trend in the machine learning community is the adoption of self-supervised approaches to pretrain deep networks. Self-supervised learning uses proxy supervised learning tasks (or pretext tasks), such as distinguishing parts of the input signal from distractors or reconstructing masked input segments conditioned on unmasked segments, to obtain training data from unlabeled corpora.
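A minimal sketch of the masked-reconstruction pretext task mentioned above: random spans of an unlabeled frame sequence are masked, and the original values at the masked positions become the training targets, so no human labels are needed. The function `mask_spans`, its parameters, and the use of scalar frames with a zero mask value are simplifying assumptions for illustration; real systems operate on feature vectors and typically use a learned mask embedding.

```python
import random

def mask_spans(frames, span_len, num_spans, mask_value=0.0, seed=0):
    """Mask random spans of frames; return (masked input, {index: target})."""
    if span_len > len(frames):
        raise ValueError("span_len must not exceed the number of frames")
    rng = random.Random(seed)  # seeded for reproducibility
    masked = list(frames)
    masked_idx = set()
    for _ in range(num_spans):
        # Spans may overlap; each start is drawn uniformly at random.
        start = rng.randrange(0, len(frames) - span_len + 1)
        for i in range(start, start + span_len):
            masked[i] = mask_value
            masked_idx.add(i)
    # Targets are the original frames at the masked positions.
    targets = {i: frames[i] for i in sorted(masked_idx)}
    return masked, targets

# Usage: ten scalar "frames" standing in for unlabeled audio features.
frames = [float(i) for i in range(1, 11)]
masked, targets = mask_spans(frames, span_len=3, num_spans=2)
print(masked)
print(targets)
```

A model trained on this task would receive `masked` as input and be penalized for its reconstruction error on the positions in `targets`.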

The IEEE SPS congratulates the following SPS members who will receive the Society’s prestigious awards during ICASSP 2023 in Greece.

Novel computational signal and image analysis approaches based on feature-rich mathematical/computational frameworks continue to push the limits of the technological envelope, thus providing optimized and efficient solutions.

IEEE SPS has built a streamlined mechanism for employers to post a job announcement by filling in a job opportunity submission web form for a particular TC field. To submit a job announcement for a particular Technical Committee, visit the page below and select that TC.

The Signal Processing Society (SPS) has 12 Technical Committees that support a broad selection of signal processing-related activities defined by the scope of the Society.

Each year, the IEEE Board of Directors confers the grade of Fellow on up to one-tenth of one percent of the voting members. To qualify for consideration, an individual must have been a Member, normally for five years or more, and a Senior Member at the time of nomination to Fellow. The grade of Fellow recognizes unusual distinction in IEEE’s designated fields.
