Vision and Language: Bridging Vision and Language with Deep Learning [Part 1 of 2]

Pricing
SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Abstract: Recognition of visual content has been a fundamental challenge in computer vision for decades, with previous research predominantly focused on understanding visual content using a predefined yet limited vocabulary. Thanks to recent developments in deep learning, researchers in both the computer vision and multimedia communities are now striving to bridge vision with natural language, which can be regarded as the ultimate goal of visual understanding. We will present recent advances in exploring the synergy of visual understanding and language processing techniques, including vision-language alignment, visual captioning and commenting, visual emotion analysis, visual question answering, and visual storytelling, as well as open issues for this emerging research area.
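
To make the captioning topic above concrete, here is a minimal sketch (not from the tutorial) of the standard CNN-encoder / RNN-decoder pattern behind visual captioning. The weights are random and the vocabulary is a hypothetical toy one; the sketch only illustrates the data flow of greedy decoding.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["<bos>", "<eos>", "a", "dog", "runs"]   # hypothetical toy vocabulary
    V, D, H = len(vocab), 8, 16                      # vocab, embedding, hidden sizes

    img_feat = rng.standard_normal(512)              # stand-in for CNN image features
    W_img = 0.01 * rng.standard_normal((H, 512))     # projects image into initial state
    W_emb = 0.01 * rng.standard_normal((V, D))       # word embeddings
    W_xh  = 0.01 * rng.standard_normal((H, D))       # input-to-hidden weights
    W_hh  = 0.01 * rng.standard_normal((H, H))       # hidden-to-hidden weights
    W_out = 0.01 * rng.standard_normal((V, H))       # hidden-to-vocabulary logits

    h = np.tanh(W_img @ img_feat)                    # initialize decoder state from the image
    word = vocab.index("<bos>")
    caption = []
    for _ in range(10):                              # greedy decoding, capped length
        h = np.tanh(W_xh @ W_emb[word] + W_hh @ h)   # vanilla RNN step
        word = int(np.argmax(W_out @ h))             # most likely next word
        if vocab[word] == "<eos>":
            break
        caption.append(vocab[word])
    print(" ".join(caption))
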
Duration
1:16:00

Scalable Deep Learning for Image Processing with Microsoft Cognitive Toolkit [Part 2 of 2]

Pricing
SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Abstract: Deep learning has become the de facto standard method for most image processing problems. In the past few years, deep learning algorithms have met and exceeded human-level performance in image recognition. Nevertheless, training deep learning networks on large data sets remains very challenging: the sheer amount of computation needed to train a convolutional neural network means training can take months on large data sets. Combined with the black art of hyper-parameter tuning, this leaves the community in desperate need of tools to help train deep learning networks on multiple servers with multiple GPUs. This tutorial introduces Microsoft's Cognitive Toolkit, also known as CNTK (https://github.com/Microsoft/CNTK), to the image processing community. Various algorithms supported by the toolkit will be presented, and the benefits of CNTK in terms of speed and scalability relative to existing toolkits will be described.
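
As a taste of what the toolkit looks like in practice, below is a minimal, hedged sketch of a training loop using the CNTK 2.x Python API. The layer, loss, and learner names follow the public API, but the toy network shape and random minibatches are illustrative assumptions, not the tutorial's own example.

    import numpy as np
    import cntk as C

    x = C.input_variable(784)                          # e.g. a flattened 28x28 image
    y = C.input_variable(10)                           # one-hot class label
    model = C.layers.Sequential([
        C.layers.Dense(200, activation=C.relu),
        C.layers.Dense(10)
    ])(x)

    loss  = C.cross_entropy_with_softmax(model, y)
    error = C.classification_error(model, y)
    lr = C.learning_rate_schedule(0.1, C.UnitType.minibatch)
    learner = C.sgd(model.parameters, lr)
    # For the multi-server / multi-GPU setting the tutorial targets, CNTK wraps
    # the learner in a distributed learner and the script is launched under MPI.
    trainer = C.Trainer(model, (loss, error), [learner])

    for _ in range(100):                               # toy random minibatches
        feats = np.random.rand(64, 784).astype(np.float32)
        labels = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, 64)]
        trainer.train_minibatch({x: feats, y: labels})
    print("last minibatch loss:", trainer.previous_minibatch_loss_average)
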
Duration
1:01:51

Scalable Deep Learning for Image Processing with Microsoft Cognitive Toolkit [Part 1 of 2]

Pricing
SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Abstract: Deep learning has become the de facto standard method for most image processing problems. In the past few years, deep learning algorithms have met and exceeded human-level performance in image recognition. Nevertheless, training deep learning networks on large data sets remains very challenging: the sheer amount of computation needed to train a convolutional neural network means training can take months on large data sets. Combined with the black art of hyper-parameter tuning, this leaves the community in desperate need of tools to help train deep learning networks on multiple servers with multiple GPUs. This tutorial introduces Microsoft's Cognitive Toolkit, also known as CNTK (https://github.com/Microsoft/CNTK), to the image processing community. Various algorithms supported by the toolkit will be presented, and the benefits of CNTK in terms of speed and scalability relative to existing toolkits will be described.
Duration
0:59:20

Vision and Language: Bridging Vision and Language with Deep Learning [Part 2 of 2]

Pricing
SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Abstract: Recognition of visual content has been a fundamental challenge in computer vision for decades, with previous research predominantly focused on understanding visual content using a predefined yet limited vocabulary. Thanks to recent developments in deep learning, researchers in both the computer vision and multimedia communities are now striving to bridge vision with natural language, which can be regarded as the ultimate goal of visual understanding. We will present recent advances in exploring the synergy of visual understanding and language processing techniques, including vision-language alignment, visual captioning and commenting, visual emotion analysis, visual question answering, and visual storytelling, as well as open issues for this emerging research area.
Duration
1:28:02

Future Video Coding: Coding Tools and Developments beyond HEVC [Part 2 of 2]

Pricing
SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Abstract: While HEVC is the state-of-the-art video compression standard, with profiles addressing virtually all video-related products of today, recent developments suggest significant performance improvements relative to this established technology. At the same time, the target application space is evolving further towards higher picture resolutions, higher dynamic range, fast motion capture, and previously unaddressed formats such as 360° video. The signal properties of this content open the door to different designs of established coding tools as well as the introduction of new algorithmic concepts that have not been applied to video coding before. Specifically, the required ultra-high picture resolutions and the projection operations involved in processing 360° video provide exciting options for new developments. This type of content also changes how video is consumed (enabling the use of head-mounted displays) as well as how video content is created and produced. This tutorial provides a comprehensive overview of recent developments and perspectives in the area of video coding. As a central element, the work performed in the Joint Video Exploration Team (JVET) of ITU-T SG16/Q6 (VCEG) and ISO/IEC JTC1/SC29/WG11 (MPEG) is covered, as well as trends outside the tracks of the standardization bodies. The focus of the presentation is on algorithms, tools, and concepts with potential for competitive future video compression technology. In this context, the potential of methods related to perceptual models, synthesis of perceptually equivalent content, and deep-learning-based approaches will also be discussed.
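
As a small illustration of the projection operations the abstract refers to, the sketch below maps a 3D viewing direction to pixel coordinates in an equirectangular 360° frame. The axis convention and frame size are assumptions made for the example, not taken from the tutorial.

    import math

    def direction_to_equirect(dx, dy, dz, width=3840, height=1920):
        """Map a unit 3D viewing direction to (u, v) pixel coordinates
        in an equirectangular frame (assumed axes: z forward, y up)."""
        lon = math.atan2(dx, dz)                       # longitude in [-pi, pi]
        lat = math.asin(max(-1.0, min(1.0, dy)))       # latitude in [-pi/2, pi/2]
        u = (lon / (2 * math.pi) + 0.5) * width        # wrap longitude across width
        v = (0.5 - lat / math.pi) * height             # map latitude top-to-bottom
        return u, v

    print(direction_to_equirect(0.0, 0.0, 1.0))        # straight ahead -> frame centre
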
Duration
0:34:31

Multimedia and Autism

Pricing
SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Duration
1:05:58

A Tale of Three Families: Descriptive, Generative and Discriminative Models

Pricing
SPS Members $0.00
IEEE Members $11.00
Non-members $15.00

Abstract: Representations of images generally belong to three probabilistic families, developed for different regimes of data and tasks. (i) Descriptive models, originating from statistical physics, reproduce certain statistical regularities in data and are often suitable for patterns in the high-entropy regime; examples include MRF, Gibbs, and FRAME models. (ii) Generative models, originating from harmonic analysis, seek latent variables and dictionaries that explain data in parsimonious representations and are often more effective in the low-entropy regime; examples include sparse models and auto-encoders. (iii) Discriminative models are often trained by statistical regression for classification tasks. This talk starts with Julesz's quest for texture and texton representations in the 1960s, and then reviews the development, interaction, and integration of these model families in the recent deep learning era, such as adversarial and cooperative models. The talk then draws a unification of these models on a continuous entropy spectrum in terms of information scaling. Finally, the talk discusses future directions in developing cognitive models for representations beyond deep learning, i.e., modeling task-oriented cognitive aspects such as functionality, physics, intent, and causality, which are the invisible “dark matter”, by analogy to cosmology, of human intelligence.
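
For orientation, the canonical forms the abstract alludes to can be summarized as follows (a hedged summary; the notation is assumed here: image I, feature statistics H_k, dictionary elements w_i, label y):

    % Descriptive (e.g. FRAME / Gibbs): maximum entropy subject to feature statistics
    p(I;\lambda) = \frac{1}{Z(\lambda)} \exp\Big\{ \sum_k \langle \lambda_k, H_k(I) \rangle \Big\}

    % Generative (e.g. sparse coding, auto-encoders): latent variables explain the image
    I = \sum_i \alpha_i w_i + \epsilon, \qquad \alpha \ \text{sparse}

    % Discriminative: regress the label directly from the image
    p(y = c \mid I) \propto \exp\big( f_c(I;\theta) \big)
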
Duration
1:06:44
