1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
In past years, various encryption algorithms have been proposed to fully or partially protect multimedia content in practical applications. In the context of digital TV broadcasting, transparent encryption protects only part of the content and fulfills both security and quality requirements.
Visual attention is an important mechanism in the human visual system (HVS), and numerous saliency detection algorithms have recently been designed for 2D images and video. However, research on fixation detection for stereoscopic video is still limited and challenging because of the complicated depth and motion information.
In this paper, a self-guiding multimodal LSTM (sgLSTM) image captioning model is proposed to handle an uncontrolled, imbalanced, real-world image-sentence dataset. We collect the FlickrNYC dataset from Flickr as our testbed; it contains 306,165 images, and the original text descriptions uploaded by users serve as the ground truth for training.
This paper presents a new method of secret three-dimensional object sharing (S3DOS), which allows three-dimensional (3-D) objects to be shared while preserving their file format, by selectively encrypting a 3-D object so as to sufficiently protect the visual nature of the content.
We present a novel global non-rigid registration method for dynamic 3D objects. Our method allows objects to undergo large non-rigid deformations and achieves high-quality results even with substantial pose change or camera motion between views. In addition, our method does not require a template prior and uses less raw data than tracking-based methods since only a sparse set of scans is needed.
This paper addresses the problem of encoding the video generated by the screen of an airplane cockpit. Like other computer screens, cockpit screens consist of computer-generated graphics, often atop a natural background. Existing screen content coding schemes notably fail to preserve the readability of textual information at the low bitrates required in avionic applications.
We study the problem of image alignment for panoramic stitching. Unlike most existing approaches, which are feature-based, our algorithm works on pixels directly and accounts for errors globally across entire images. Technically, we formulate the alignment problem as rank-1 and sparse matrix decomposition over transformed images, and develop an efficient algorithm for solving this challenging non-convex optimization problem.
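A minimal sketch of the rank-1 plus sparse split, assuming the images have already been warped into rough alignment and stacked as the columns of a matrix D (the paper also optimizes the transformations, which is omitted here). The alternating SVD/soft-thresholding scheme and the names rank1_plus_sparse and lam are illustrative choices, not the authors' algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rank1_plus_sparse(D, lam=0.1, n_iter=100):
    """Alternately minimize 0.5*||D - L - S||_F^2 + lam*||S||_1 subject to rank(L) <= 1."""
    S = np.zeros_like(D)
    for _ in range(n_iter):
        # L-step: best rank-1 approximation of D - S (keep the top singular triplet).
        U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = s[0] * np.outer(U[:, 0], Vt[0])
        # S-step: soft-threshold the residual so only large misfits survive as sparse errors.
        S = soft_threshold(D - L, lam)
    return L, S

# Hypothetical data: 8 "aligned images" of 50 pixels each, stacked as columns,
# sharing one rank-1 pattern and corrupted by a few large sparse errors.
rng = np.random.default_rng(0)
D = np.outer(rng.normal(size=50), rng.normal(size=8))
D[rng.integers(0, 50, 10), rng.integers(0, 8, 10)] += 5.0
L, S = rank1_plus_sparse(D)
```

Each step exactly minimizes the objective over one block of variables, so the objective never increases; lam sets how large a misfit must be before it is treated as a sparse error rather than absorbed into the rank-1 part.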
Image classification is an essential and challenging task in computer vision. Despite its prevalence, the combination of the deep convolutional neural network (DCNN) and the Fisher vector (FV) encoding method has limited performance since the class-irrelevant background used in the traditional FV encoding may result in less discriminative image features.
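For readers unfamiliar with FV encoding, the sketch below computes a standard improved Fisher vector (first- and second-order gradients of a diagonal GMM, followed by power and L2 normalization), assuming the GMM has already been fitted to local descriptors such as DCNN activations. The optional weights a are a hypothetical hook showing where class-irrelevant background descriptors could be down-weighted; they are not the paper's method.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma, a=None):
    """Improved Fisher vector of local descriptors X (T x D) under a diagonal GMM with
    weights w (K,), means mu (K x D) and standard deviations sigma (K x D).
    `a` (T,) holds optional per-descriptor weights: a hypothetical hook for
    down-weighting background descriptors (uniform by default)."""
    T, D = X.shape
    a = np.ones(T) if a is None else np.asarray(a, dtype=float)
    a = a / a.sum()
    z = (X[:, None, :] - mu[None]) / sigma[None]             # T x K x D normalized offsets
    # Log of w_k * N(x_t; mu_k, sigma_k); posteriors via max-subtraction for stability.
    log_p = (np.log(w) - np.log(sigma).sum(axis=1)
             - 0.5 * (z ** 2).sum(axis=2) - 0.5 * D * np.log(2 * np.pi))
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                # gamma[t, k] = p(k | x_t)
    g = (a[:, None] * gamma)[:, :, None]                     # T x K x 1 weighted posteriors
    G_mu = (g * z).sum(axis=0) / np.sqrt(w)[:, None]         # gradient w.r.t. the means
    G_sigma = (g * (z ** 2 - 1)).sum(axis=0) / np.sqrt(2 * w)[:, None]  # w.r.t. std devs
    fv = np.concatenate([G_mu.ravel(), G_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                   # power normalization
    return fv / np.linalg.norm(fv)                           # L2 normalization

rng = np.random.default_rng(0)
K, D = 4, 8
w, mu, sigma = np.full(K, 1.0 / K), rng.normal(size=(K, D)), np.ones((K, D))
X = rng.normal(size=(100, D))            # stand-in for 100 local descriptors
fv = fisher_vector(X, w, mu, sigma)      # length 2 * K * D = 64
fv_fg = fisher_vector(X, w, mu, sigma, a=np.r_[np.ones(50), np.zeros(50)])  # ignore last 50
```

Setting a descriptor's weight to zero removes it from the encoding entirely, so fv_fg equals the Fisher vector of the first 50 descriptors alone.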
Resolution enhancements are often desired in imaging applications where high-resolution sensor arrays are difficult to obtain. Many computational imaging methods have been proposed to encode high-resolution scene information on low-resolution sensors by cleverly modulating light from the scene before it hits the sensor.
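To make the encoding idea concrete, here is a minimal 1-D sketch under simplifying assumptions, all hypothetical: a noise-free scene, a sensor whose pixels each integrate factor adjacent high-resolution samples, and several known grayscale modulation masks applied before the light reaches the sensor. Stacking the coded snapshots gives a linear system that least squares can invert.

```python
import numpy as np

def binning_matrix(n_hr, factor):
    """Low-resolution sensor model: each LR pixel integrates `factor` adjacent HR samples."""
    n_lr = n_hr // factor
    B = np.zeros((n_lr, n_hr))
    for i in range(n_lr):
        B[i, i * factor:(i + 1) * factor] = 1.0
    return B

def coded_system(n_hr, masks, factor):
    """Stack one snapshot per known modulation mask: y_k = B @ (m_k * x)."""
    B = binning_matrix(n_hr, factor)
    return np.vstack([B * m for m in masks])  # B * m == B @ np.diag(m)

rng = np.random.default_rng(1)
factor, n_hr = 4, 64
x = rng.random(n_hr)               # hypothetical 1-D high-resolution scene
masks = rng.random((8, n_hr))      # 8 known grayscale modulation patterns (assumed)
A = coded_system(n_hr, masks, factor)
y = A @ x                          # noise-free low-resolution measurements
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

With a single unmodulated snapshot the system matrix is just the 16 x 64 binning matrix, so the high-resolution detail is unrecoverable; each additional distinct mask adds 16 equations, and here 8 masks give 128 equations for 64 unknowns.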
Dictionary learning for sparse representations is generally conducted in two alternating steps: sparse coding and dictionary updating. In this paper, a new approach to solving the sparse coding step is proposed. Because this step involves an
Coded illumination can enable quantitative phase microscopy of transparent samples with minimal hardware requirements. Intensity images are captured with different source patterns, then a nonlinear phase retrieval optimization reconstructs the image. The nonlinear nature of the processing makes optimizing the illumination pattern designs complicated.
Short-duration text-independent speaker verification has remained an active research topic in recent years, and deep neural network based embeddings have shown impressive results under such conditions. Good speaker embeddings require both small intra-class variation and large inter-class difference, properties that are critical for discrimination and generalization.
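The two properties can be measured directly. The sketch below uses synthetic vectors as stand-ins for DNN speaker embeddings, compares the mean cosine similarity of same-speaker and different-speaker pairs, and applies cosine scoring with a fixed threshold. The threshold of 0.5 and all data are hypothetical, and cosine scoring is only one common back-end, not necessarily the one used in the paper.

```python
import numpy as np
from itertools import combinations

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def intra_inter_similarity(emb, labels):
    """Mean cosine similarity over same-speaker pairs and over different-speaker pairs."""
    same, diff = [], []
    for i, j in combinations(range(len(labels)), 2):
        (same if labels[i] == labels[j] else diff).append(cosine(emb[i], emb[j]))
    return float(np.mean(same)), float(np.mean(diff))

def verify(enroll, test, threshold=0.5):
    """Accept the claimed identity if the cosine score reaches a (hypothetical) threshold."""
    return cosine(enroll, test) >= threshold

# Synthetic stand-ins for embeddings: 3 speakers x 4 utterances in 128-D,
# each utterance = speaker center + small within-speaker variation.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 128))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
labels = np.repeat(np.arange(3), 4)
emb = centers[labels] + 0.03 * rng.normal(size=(12, 128))
intra, inter = intra_inter_similarity(emb, labels)
```

A large gap between intra and inter is what makes a single global threshold workable; if within-speaker variation grows toward the spread between speakers, no threshold separates the two pair types well.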
Automatic speech emotion recognition has been an active research area in human-computer interaction over the past decade. However, because the inherent temporal relationships in the speech waveform have received little study, current recognition accuracy still needs improvement.
Representation learning is the foundation of machine reading comprehension and inference. In state-of-the-art models, character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, the character itself is not a natural minimal linguistic unit for representation or word embedding composition, because it ignores the linguistic coherence of consecutive characters inside a word.
In recent years, several stationarity tests have been proposed. One of these methods uses time-frequency representations and stationarized replicas of the signal (known as surrogates) to test for wide-sense stationarity. In this letter, we propose a procedure to improve the original surrogate test.
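The surrogate idea can be sketched as follows. Surrogates keep the magnitude spectrum of the signal and randomize its Fourier phases, which removes time structure while preserving the power spectrum; the observed statistic is then compared against its distribution over the surrogates. For brevity, the statistic here (variance of short-frame energies) is a simplified stand-in for a statistic computed from time-frequency representations, and the frame length, number of surrogates, and significance level are assumptions.

```python
import numpy as np

def surrogate(x, rng):
    """Stationarized replica: keep |FFT(x)|, draw the phases uniformly at random."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, X.size)
    phases[0] = np.angle(X[0])            # the DC bin must stay real
    if x.size % 2 == 0:
        phases[-1] = np.angle(X[-1])      # so must the Nyquist bin for even lengths
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=x.size)

def energy_spread(x, frame=64):
    """Crude time-frequency feature: variance over time of short-frame energies."""
    n = x.size // frame
    return (x[: n * frame].reshape(n, frame) ** 2).mean(axis=1).var()

def surrogate_test(x, n_surrogates=99, alpha=0.05, seed=0):
    """Returns True when wide-sense stationarity is rejected at level alpha."""
    rng = np.random.default_rng(seed)
    null = [energy_spread(surrogate(x, rng)) for _ in range(n_surrogates)]
    return energy_spread(x) > np.quantile(null, 1.0 - alpha)

# Hypothetical nonstationary signal: white noise whose amplitude jumps halfway through.
rng = np.random.default_rng(2)
n = 4096
x_ns = rng.normal(size=n) * np.where(np.arange(n) < n // 2, 0.1, 3.0)
```

Because every surrogate shares the signal's power spectrum, any excess of the observed statistic over the surrogate distribution can be attributed to time variation rather than to spectral shape.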
In this letter, we propose a heuristic method for sensor bias estimation to improve track-to-track association accuracy. A novel multi-parameter cost function is derived from a rigid transformation function and minimized by the covariance matrix adaptation evolution strategy (CMA-ES) algorithm.
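A minimal sketch of the two ingredients, assuming 2-D target positions, already-paired tracks from sensors A and B, and a bias on sensor B modeled as a rotation plus a translation. The cost below is a plausible rigid-transformation misfit, not the letter's multi-parameter cost function, and a (1+1) evolution strategy with the 1/5 success rule stands in for CMA-ES to keep the code short.

```python
import numpy as np

def rigid(points, theta, t):
    """Apply a 2-D rotation by theta followed by a translation t to row-vector points."""
    c, s = np.cos(theta), np.sin(theta)
    return points @ np.array([[c, -s], [s, c]]).T + t

def bias_cost(params, tracks_a, tracks_b):
    """Squared misfit between sensor A's tracks and sensor B's tracks after correcting
    B with a hypothesized rigid transformation (theta, tx, ty)."""
    theta, tx, ty = params
    return float(np.sum((rigid(tracks_b, theta, np.array([tx, ty])) - tracks_a) ** 2))

def one_plus_one_es(cost, x0, sigma=0.5, n_iter=3000, seed=0):
    """(1+1)-ES with the 1/5 success rule: a much simpler stand-in for CMA-ES."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = cost(x)
    for _ in range(n_iter):
        y = x + sigma * rng.normal(size=x.size)
        fy = cost(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= 1.5            # success: enlarge the step
        else:
            sigma *= 1.5 ** -0.25   # failure: shrink; about 1/5 successes keeps sigma steady
    return x, fx

# Hypothetical scenario in normalized units: sensor B sees the same 20 targets
# as sensor A, but rotated by 0.05 rad and shifted by (0.1, -0.2).
rng = np.random.default_rng(3)
tracks_a = rng.uniform(-1.0, 1.0, size=(20, 2))
theta_b, t_b = 0.05, np.array([0.1, -0.2])
tracks_b = rigid(tracks_a, theta_b, t_b)
est, fx = one_plus_one_es(lambda p: bias_cost(p, tracks_a, tracks_b), [0.0, 0.0, 0.0])
```

The estimate converges to the inverse of sensor B's bias (a rotation of about -0.05 rad). Because this simple strategy uses a single isotropic step size, the example works in normalized units; CMA-ES instead adapts a full covariance matrix, which lets it cope with parameters of very different scales, such as a rotation angle and a translation in kilometers.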
Diacritics restoration is a necessary component for developing Arabic text-to-speech systems. When diacritics are present, the phonetic transcription algorithm can be implemented with a few rules. Restoring Arabic diacritics based on language model scoring is the dominant approach, and a fixed vocabulary is usually used to build the scoring language model.
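Language model scoring can be sketched as a Viterbi search over candidate diacritizations of each word. The lexicon, the Buckwalter-style transliteration, and the bigram log-probabilities below are toy, hypothetical data; the pass-through for unknown words illustrates the limitation of a fixed vocabulary.

```python
def restore(words, lexicon, bigram_logp, unseen=-10.0):
    """Choose the diacritized candidate sequence with the best bigram LM score (Viterbi).
    `lexicon` maps an undiacritized word to its candidate diacritizations."""
    best = {"<s>": (0.0, [])}  # last candidate -> (best log-score, best path so far)
    for w in words:
        new_best = {}
        for cand in lexicon.get(w, [w]):  # out-of-vocabulary words pass through unchanged
            new_best[cand] = max(
                (score + bigram_logp.get((prev, cand), unseen), path + [cand])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]

# Toy, hypothetical data in Buckwalter-style transliteration; log-probabilities are invented.
lexicon = {"ktb": ["kataba", "kutiba", "kutub"],
           "Alwld": ["Alwaladu", "Alwalada"],
           "Aldrs": ["Aldarsa", "Aldarsu"]}
bigram_logp = {("<s>", "kataba"): -1.0, ("<s>", "kutiba"): -2.5, ("<s>", "kutub"): -2.0,
               ("kataba", "Alwaladu"): -0.5, ("kutiba", "Alwaladu"): -3.0,
               ("Alwaladu", "Aldarsa"): -0.7, ("Alwaladu", "Aldarsu"): -2.0}
print(restore(["ktb", "Alwld", "Aldrs"], lexicon, bigram_logp))
# -> ['kataba', 'Alwaladu', 'Aldarsa']
```

Keeping only the best path per candidate keeps the search linear in sentence length for a bigram model, instead of enumerating every combination of candidates.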
We study the problem of distributed filtering for state-space models over networks, which aims to estimate the states collaboratively with a network of nodes. Most existing works on this problem assume that both the process and measurement noises are Gaussian and that their covariances are known in advance. In some cases, however, this assumption does not hold.
Expander recovery is an iterative algorithm designed to recover, with linear complexity, sparse signals measured with binary matrices. In this paper, we study the expander recovery performance of bipartite graphs with girth greater than 4, which can be associated with binary matrices whose column correlations are either 0 or 1.
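The following simplified sketch illustrates the iteration: at each step it looks for a column whose measurement neighbors mostly agree on the same nonzero residual gap g and adds g to that coordinate. It assumes a strict-majority rule (more than d/2 of the d neighbors), uses the incidence matrix of the affine plane AG(2, 3) as an example of a binary matrix whose column inner products are 0 or 1, and rescans every column in each iteration, so unlike the algorithm described above it does not run in linear time.

```python
import numpy as np
from collections import Counter

def affine_plane_matrix(q=3):
    """Incidence matrix of the affine plane AG(2, q) for prime q (rows: points, columns: lines).
    Two lines share at most one point, so column inner products are 0 or 1 (girth > 4)."""
    lines = [[q * x + (m * x + c) % q for x in range(q)] for m in range(q) for c in range(q)]
    lines += [[q * c + y for y in range(q)] for c in range(q)]  # vertical lines x = c
    A = np.zeros((q * q, len(lines)), dtype=int)
    for j, points in enumerate(lines):
        A[points, j] = 1
    return A

def expander_recovery(A, y, max_iter=100):
    """Simplified expander recovery with a strict-majority rule (assumed threshold d/2)."""
    n = A.shape[1]
    x_hat = np.zeros(n)
    neighbors = [np.flatnonzero(A[:, j]) for j in range(n)]
    for _ in range(max_iter):
        gap = y - A @ x_hat
        if np.allclose(gap, 0.0):
            break
        for j in range(n):
            g, count = Counter(gap[neighbors[j]].tolist()).most_common(1)[0]
            if g != 0 and count > len(neighbors[j]) / 2:
                x_hat[j] += g  # most neighbors agree: attribute the gap g to coordinate j
                break
        else:
            break  # no column satisfies the majority rule; stop without converging
    return x_hat

A = affine_plane_matrix()        # 9 x 12 binary matrix, every column has weight 3
x = np.zeros(12)
x[2], x[7] = 2.0, -1.5           # 2-sparse signal with distinct nonzero values
x_hat = expander_recovery(A, A @ x)
print(np.array_equal(x_hat, x))  # -> True
```

Because any two columns share at most one row, a column outside the support can see at most one row from each active column, so with distinct nonzero values it never collects a majority of equal gaps in this example.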
A key challenge in designing distributed particle filters is to minimize the communication overhead without compromising tracking performance. In this paper, we present two distributed particle filters that achieve robust performance with low communication overhead.