Deep Learning for Visual Understanding: Part 2

January 2018

By:

Fatih Porikli; Shiguang Shan; Cees Snoek; Rahul Sukthankar; Xiaogang Wang

Visual perception is one of our most essential and fundamental abilities; it enables us to make sense of what our eyes see and to interpret the world that surrounds us. It allows us to function and, thus, our civilization to survive. No sensory loss is more debilitating than blindness because we are, above all, visual beings. Close your eyes for a moment after reading this sentence and try grabbing something in front of you, navigating your environment, walking straight, reading a book, playing a game, or perhaps learning something new. Of course, please do not attempt to drive a vehicle. As you will realize again and appreciate profoundly, we owe so much to this amazing faculty. It is no coincidence that most of the electrical activity in the human brain and most of its cerebral cortex are associated with visual understanding.

Computer vision is the field of study that develops solutions for visual perception. In other words, it aims to make computers understand visual data in the same way that human vision does. It draws on several scientific disciplines, such as signal processing, machine learning, applied mathematics, sensing, geometry, optimization, statistics, and data science, to name a few. It is concerned with the extraction, modeling, analysis, and use of information from a single image or a sequence of images across a spectrum of modalities for building intelligent systems.

Because our visual perception of the world is reflected in our ability to make decisions through what we see, providing such analytical capabilities to computers makes it possible to design remarkable applications that enhance our lives. Computer vision solutions are at work everywhere, including in our:

computer mouse, determining its motion
phones, reading our fingerprints
cameras, controlling lenses
mail centers, sorting parcels
warehouse robots, retrieving packages
gateways, recognizing faces
vehicles, assisting drivers
hospitals, diagnosing medical problems
factories, performing inspections
farmlands, harvesting produce
dressers, checking the style of our outfits.

As well as revolutionizing technologies for autonomous vehicles and virtual reality devices, computer vision will soon have a transformative and disruptive impact on our culture and economy.

On the journey toward algorithms that can match human visual perception, most of the progress has happened within the last decade with the rebirth of artificial neural networks in computer vision, in particular convolutional architectures. Owing to their complex and layered structures, the broader family of data-driven machine-learning methods based on neural network models is today called deep learning. An illustration of common deep-learning networks such as convolutional neural networks, autoencoders, and generative adversarial networks (GANs) can be seen in Figure 1, and a comprehensive discussion of different deep-learning techniques for visual understanding can be found in the tutorial articles in the first part of this special issue, published in the November 2017 issue of IEEE Signal Processing Magazine (SPM).

Deep-learning methods have many compelling advantages. With cascaded layers that can contain hundreds of millions of parameters, they can model highly nonlinear functions. With pooling layers that generate multiple levels of representation corresponding to different levels of abstraction, they can coalesce information from local and global receptive fields. Because of their feed-forward structure, they can run efficiently on parallel processors. Since they automatically learn from training samples which parts of the data are relevant and discriminative, they are not limited to handcrafted descriptors and manually defined transformations. Most importantly, they can learn from their mistakes when provided with such cases, and they improve as the amount of training data increases. The success of deep-learning methods is also reflected in the volume of scientific publications. The number of deep-learning-related articles in the main computer vision venues rose from fewer than 100 in 2012 to an astounding level of more than 1,000 in 2017.
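To make the remark about local and global receptive fields concrete, the following is a minimal sketch, not taken from the article, of the standard receptive-field arithmetic for a feed-forward stack of convolution and pooling layers. The function name `receptive_fields` and the five-layer `toy_stack` are hypothetical and chosen only for illustration; padding and dilation are ignored.

```python
def receptive_fields(layers):
    """Return the receptive-field size, in input pixels, after each layer.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    Assumes square kernels, no dilation, and ignores padding effects.
    """
    size = 1  # one input pixel "sees" only itself
    jump = 1  # spacing, in input pixels, between neighboring outputs
    sizes = []
    for kernel, stride in layers:
        # A kernel spanning `kernel` outputs of the previous layer widens
        # the view by (kernel - 1) steps of the current spacing.
        size += (kernel - 1) * jump
        # Striding (e.g., in pooling) spreads later outputs further apart.
        jump *= stride
        sizes.append(size)
    return sizes


# Hypothetical toy stack: 3x3 conv, 2x2 pool, 3x3 conv, 2x2 pool, 3x3 conv.
toy_stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
fields = receptive_fields(toy_stack)
print(fields)  # [3, 4, 8, 10, 18]
```

In this toy stack, each pooling layer doubles the spacing between outputs, so the same 3x3 convolution covers a progressively larger patch of the input: 3 pixels at the first layer and 18 at the last. This is one simple way to see how deeper layers combine local detail with wider context.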

The November 2017 special issue of SPM on deep learning for visual understanding surveyed deep-learning solutions under reinforcement, weakly supervised, and multimodal settings; investigated their robustness; and presented overviews of their applications in domain adaptation, hashing, semantic segmentation, metric learning, inverse problems in imaging, image-to-text generation, and picture-quality assessment.
