Noise-Resilient Training Method for Face Landmark Generation From Speech
Visual cues such as lip movements, when available, play an important role in speech communication. They are especially helpful for the hearing…
The acoustic-to-word model based on the Connectionist Temporal Classification (CTC) criterion is a natural end-to-end (E2E) system directly targeting…
Sequence generation tasks, such as neural machine translation (NMT) and abstractive summarization, usually suffer from exposure bias as well as the…
A sound field reproduction method based on the spherical wavefunction expansion of sound fields is proposed, which can be flexibly applied to various…
Short-duration text-independent speaker verification has remained a hot research topic in recent years, and deep neural network based embeddings have…
Automatic speech emotion recognition has been a research hotspot in the field of human-computer interaction over the past decade. However, due to the…
Representation learning is the foundation of machine reading comprehension and inference. In state-of-the-art models, character-level representations…
Constrained image splicing detection and localization (CISDL), which investigates two suspected input images and identifies whether one image has…
Sparse coding-based anomaly detection has shown promising performance; its key components are feature learning, sparse representation, and dictionary…
The importance of normalizing biometric features or matching scores is understood in the multimodal biometric case, but there is less attention to…