JDNet: A Joint-Learning Distilled Network for Mobile Visual Food Recognition

By: Heng Zhao; Kim-Hui Yap; Alex Chichung Kot; Lingyu Duan

Visual food recognition on mobile devices has attracted increasing attention in recent years because of its role in individual diet monitoring and in social health management and analysis. Existing visual food recognition approaches usually rely on large server-based networks to achieve high accuracy, but these networks are not compact enough to be deployed on mobile devices. Although some compact architectures have been proposed, most of them cannot match the performance of full-size networks. In view of this, this paper proposes a Joint-learning Distilled Network (JDNet), which aims to give a compact student network high food recognition accuracy by learning from a large teacher network while keeping the student network small. In contrast to conventional one-directional knowledge distillation methods, JDNet uses a novel joint-learning framework in which the large teacher network and the small student network are trained simultaneously, leveraging different intermediate-layer features in both networks. JDNet introduces a new Multi-Stage Knowledge Distillation (MSKD) for simultaneous student-teacher training at different levels of abstraction. A new Instance Activation Learning (IAL) is also proposed to jointly train the student and the teacher on the instance activation map of each training sample. Experimental results show that the trained student model achieves state-of-the-art Top-1 recognition accuracy of 84.0% on the benchmark UECFood-256 dataset and 91.2% on Food-101, while retaining a 4x smaller network size for mobile deployment.
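To make the contrast between one-directional and joint training concrete, the following is a minimal sketch of a bidirectional distillation loss on output logits only. It is not the paper's MSKD or IAL formulation, which the abstract does not specify and which operate on intermediate-layer features and instance activation maps. The function names, the temperature of 4.0, and the weighting `alpha` are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label] + eps)

def joint_distillation_losses(student_logits, teacher_logits, label,
                              temperature=4.0, alpha=0.5):
    """Return (student_loss, teacher_loss) for one sample.

    Hypothetical joint scheme: each network is penalised both for
    misclassifying the sample and for diverging from the other
    network's softened prediction. In one-directional distillation
    only the student would receive the KL term.
    """
    s_soft = softmax(student_logits, temperature)
    t_soft = softmax(teacher_logits, temperature)
    t2 = temperature ** 2  # usual rescaling of the softened-target term

    student_loss = ((1 - alpha) * cross_entropy(softmax(student_logits), label)
                    + alpha * t2 * kl_divergence(t_soft, s_soft))
    teacher_loss = ((1 - alpha) * cross_entropy(softmax(teacher_logits), label)
                    + alpha * t2 * kl_divergence(s_soft, t_soft))
    return student_loss, teacher_loss

if __name__ == "__main__":
    s_loss, t_loss = joint_distillation_losses([0.1, 1.0, 2.0], [2.0, 1.0, 0.1], label=0)
    print(f"student loss: {s_loss:.4f}, teacher loss: {t_loss:.4f}")
```

In a real training loop both losses would be backpropagated in the same step, so the teacher also adapts to the student. A multi-stage variant would presumably add similar terms at several intermediate layers, but that detail is not given in the abstract.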
