10 years of news and resources for members of the IEEE Signal Processing Society
Li, Kang (Northeastern University) “Video event recognition and prediction based on temporal structure analysis”, Advisor: Fu, Yun
The growing ubiquity of multimedia information has made video a favored information vehicle and has given rise to an enormous volume of social media and surveillance footage. Consumer-grade video is abundant on the Internet, and it is now easier than ever to download multimedia material of any kind and quality. This creates a demand for automatic video understanding, which has motivated the research community to work toward such capabilities. As a result, current trends in cognitive vision promise to recognize complex events and adapt to different environments, while managing and integrating several types of knowledge.
One important problem whose solution would significantly enhance semantic-level video analysis is activity and event understanding, which aims to describe video content accurately using key semantic elements such as activities and events. A well-known challenge is the long-standing semantic gap between computable low-level features and the semantic information they encode. This thesis presents several studies of high-level video content understanding that address these difficulties and narrow the semantic gap effectively. In particular, the authors focus on two types of video: human activity video and unconstrained consumer video. The proposed temporal structure analysis frameworks significantly extend the domains of video that can be understood by machine vision systems.
For human activity recognition, the authors observe that no existing work uses the temporal structure of videos for early prediction of an ongoing human activity when a time-critical decision is needed. They therefore present a general activity prediction framework in which human activities are characterized by a complex temporal composition of constituent simple actions and interacting objects. They then extend this work to 3D action prediction, motivated by the recent advent of cost-effective sensors such as the Kinect depth camera. Treating 3D action data as multivariate time series (m.t.s.) synchronized to a shared clock (frames), the authors propose a model based on a stochastic process called the Marked Point Process (MPP), which represents a 3D action as temporal dynamic patterns that capture both timing and strength information.
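As a rough illustration of what it can mean to view a multivariate time series as marked events, the sketch below converts per-frame channel values into (frame, channel, mark) triples. It is not the thesis's MPP model: the event rule (a frame-to-frame change larger than a threshold), the function name `to_marked_events`, and the toy data are all assumptions made for this example.

```python
from typing import List, Tuple

# (frame index, channel index, mark). The mark carries the "strength".
Event = Tuple[int, int, float]


def to_marked_events(series: List[List[float]], threshold: float) -> List[Event]:
    """Turn a multivariate time series into a list of marked events.

    series[t][c] is the value of channel c at frame t. As an assumed,
    illustrative rule, an event fires at frame t on channel c when the
    absolute change from frame t-1 exceeds `threshold`; the signed
    change is stored as the mark.
    """
    events: List[Event] = []
    for t in range(1, len(series)):
        for c, (prev, cur) in enumerate(zip(series[t - 1], series[t])):
            delta = cur - prev
            if abs(delta) > threshold:
                events.append((t, c, delta))
    return events


# Toy data: 4 frames, 2 channels (hypothetical joint coordinates).
frames = [
    [0.0, 0.0],
    [0.25, 0.0],
    [1.0, 0.0],
    [1.0, -0.75],
]
events = to_marked_events(frames, threshold=0.5)
```

Each resulting triple records when something happened (the frame), where (the channel), and how strongly (the mark), which is the kind of timing and strength information the abstract refers to.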
For unconstrained consumer video understanding, the authors also focus on the temporal structure of the video content, using a semantic-segment-based design in which each video clip is represented as a series of varying videography words. Unique videography signatures of different events can then be identified automatically using statistical analysis methods. The authors explore videography analysis in several applications, including content-based video retrieval, video summarization (both visual and textual), and videography-based feature pooling.
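To make the idea of a clip as a series of videography words more concrete, here is a minimal sketch that counts words and word-to-word transitions in one clip. The abstract does not define the vocabulary or the statistical analysis used, so the word labels ("static", "pan", "zoom"), the function name `describe_clip`, and the choice of unigram and bigram counts are assumptions for illustration only.

```python
from collections import Counter
from typing import List, Tuple


def describe_clip(words: List[str]) -> Tuple[Counter, Counter]:
    """Summarize a clip given as an ordered list of videography words.

    Returns two counters: how often each word occurs, and how often each
    ordered pair of consecutive words occurs. The pair counts keep some
    of the clip's temporal structure, which plain word counts discard.
    """
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams


# Hypothetical clip: one word per semantic segment, in temporal order.
clip = ["static", "pan", "pan", "zoom", "pan"]
unigrams, bigrams = describe_clip(clip)
```

Counts like these, collected over many clips of the same event, are one simple input to which a statistical comparison across events could be applied.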
For details, please visit the thesis page.