Unsupervised Video Summarization With Cycle-Consistent Adversarial LSTM Networks

You are here

IEEE Transactions on Multimedia

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.

Unsupervised Video Summarization With Cycle-Consistent Adversarial LSTM Networks

Li Yuan; Francis Eng Hock Tay; Ping Li; Jiashi Feng

Video summarization is an important technique to browse, manage and retrieve a large amount of videos efficiently. The main objective of video summarization is to minimize the information loss when selecting a subset of video frames from the original video, hence the summary video can faithfully represent the overall story of the original video. Recently developed unsupervised video summarization approaches are free of requiring tedious annotation on important frames to train a video summarization model and thus are practically attractive. However, their performance is still limited due to the difficulty of minimizing information loss between the summary and original videos. In this paper, we address unsupervised video summarization by developing a novel Cycle-consistent Adversarial LSTM architecture to effectively reduce the information loss in the summary video. The proposed model, named Cycle-SUM, consists of a frame selector and a cycle-consistent learning based evaluator. The selector is a bi-directional LSTM network to capture the long-range relationship between video frames. To overcome the difficulty of specifying a suitable information preserving metric between original video and summary video, the evaluator is introduced to “supervise” selector to improve the video summarization quality. Specifically, the evaluator is composed of two generative adversarial networks (GANs), in which the forward GAN component is learned to reconstruct the original video from summary video, while the backward GAN learns to invert the process. We establish the relation between mutual information maximization and such cycle learning procedure and further introduce cycle-consistent loss to regularize the summarization. Extensive experiments on three video summarization benchmark datasets demonstrate a state-of-the-art performance, and show the superiority of the Cycle-SUM model compared with other unsupervised approaches.

SPS on Twitter

  • DEADLINE EXTENDED: The 2023 IEEE International Workshop on Machine Learning for Signal Processing is now accepting… https://t.co/NLH2u19a3y
  • ONE MONTH OUT! We are celebrating the inaugural SPS Day on 2 June, honoring the date the Society was established in… https://t.co/V6Z3wKGK1O
  • The new SPS Scholarship Program welcomes applications from students interested in pursuing signal processing educat… https://t.co/0aYPMDSWDj
  • CALL FOR PAPERS: The IEEE Journal of Selected Topics in Signal Processing is now seeking submissions for a Special… https://t.co/NPCGrSjQbh
  • Test your knowledge of signal processing history with our April trivia! Our 75th anniversary celebration continues:… https://t.co/4xal7voFER

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel