SPS-ASI Webinar: Midlevel Representations and Domain Specific Geometric Estimation in Computer Vision

Date: 27 February 2024
Time: 4:00 PM PST (CET)
Presenter(s): Dr. Magnus Oskarsson

The ASI Webinar Series is an event initiated by the Autonomous System Initiative (ASI) of the IEEE Signal Processing (SP) Society. The goal is to offer the SP community free webinars that look into the future of autonomous systems. These monthly webinars are hosted on Zoom, and recordings are made available on the IEEE ASI YouTube channel after the live events.

Abstract

The world is a complex place. To make sense of it, we as humans typically simplify in order to understand and interpret our surroundings. The use of geometric primitives to represent and build up the world has long been advocated, both from a computer vision perspective, via the influential ideas of Marr, and from a psychology perspective, via the recognition-by-components (RBC) theory of Biederman. These ideas were put forward mostly in the context of object recognition. Today, applications that extract semantic information from images and scenes are typically highly data driven, with representations largely learned and coded implicitly in neural network architectures. Geometric estimation in Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM) applications is, on the other hand, often based on explicit representations (sparse 3D points and camera matrices), even if the image feature representations are learned. Mature methods now exist that perform SfM and SLAM efficiently. Sparse point clouds are well suited for matching, camera geometry estimation, and optimization. However, they do not scale well to very large scenes, they are often not stable over time, and they are poorly suited to downstream tasks such as interpretation and recognition. For this reason, we are interested in investigating mid-level representations that carry more semantic meaning than simple points do. One example of such a primitive shape is the cylinder. Cylindrical structures are quite common, e.g. trees, lampposts, pillars, and furniture legs. Traditionally, the projections of the center lines of such cylinders have been considered and used in computer vision. Here, we demonstrate that the apparent width of the cylinders also contains useful information for structure and motion estimation.
We will describe robust methods that use such structures to estimate camera pose and scene structure simultaneously from silhouette lines. These methods are tested on real-world use cases in forestry applications.

Biography

Magnus Oskarsson received the M.Sc. degree in physics engineering and the Ph.D. degree in mathematics from the University of Lund, Sweden, in 1997 and 2002, respectively. His thesis work was devoted to computer vision with applications to autonomous vehicles. He is currently an Associate Professor with the Centre for Mathematical Sciences, Lund University, where his teaching includes undergraduate and graduate courses in mathematics and image analysis. He has authored or coauthored several papers in international journals and conference proceedings on geometry, algebra, and optimization, with applications in computer vision, cognitive vision, and image enhancement.