Sounds carry a large amount of information about our everyday environment and the physical events that take place in it. We can perceive the sound scene we are in (busy street, office, etc.) and recognize individual sound sources (a car passing by, footsteps, etc.). Signal processing methods that extract this information automatically have great potential in many applications. Examples include searching multimedia by its audio content, building context-aware mobile devices, robots, and cars, and intelligent monitoring systems that recognize activities in their environment from acoustic information. However, much research is still needed to reliably recognize sound scenes and individual sound sources in realistic soundscapes, where multiple sounds are present, often simultaneously, and are distorted by the environment.
Building on the success of the previous editions (DCASE 2013 and DCASE 2016, both supported by the IEEE AASP TC), this year's challenge continues to support the development of computational scene and event analysis methods by comparing different approaches on common, publicly available datasets. This ongoing effort marks milestones in the field's development and establishes current performance as a reference for future work.
The tasks proposed for the DCASE 2017 Challenge are the following:
Task 1: Acoustic scene classification. The goal of acoustic scene classification is to classify a test recording into one of the predefined classes that characterize the environment in which it was recorded, for example "park", "street", or "office". The acoustic data will include recordings from 15 sound scenes with high acoustic diversity.
Task 2: Detection of rare sound events. This task focuses on the detection of rare sound events in artificially created mixtures. Using synthetic mixtures makes it possible to combine everyday audio with sound events of interest at different event-to-background ratios, providing many more training conditions than would be available in real recordings.
Task 3: Sound event detection in real life audio. This task evaluates the performance of sound event detection systems in multisource conditions, using audio recorded in street environments. In real-life environments sound sources are rarely heard in isolation, and in this task there is no control over the number of overlapping sound events at any time, in either the training or the testing audio. Event activities are annotated manually and can therefore be somewhat subjective.
Task 4: Large-scale weakly supervised sound event detection for smart cars. This task evaluates systems for large-scale detection of sound events using weakly labeled training data. The data are excerpts of web videos focusing on transportation and warning sounds, chosen for their industrial relevance and because audio is underused in this context. The task has two subtasks, audio tagging and sound event detection, reflecting the need for systems to produce labels without and with timestamps, respectively.
Each task is based on a specific dataset containing audio and reference annotations. For each task, a development dataset and a baseline system are provided. Each development dataset also includes a cross-validation setup, in the form of train/test splits or train/development subsets, so that submissions can be compared directly. All development datasets are currently publicly available.
The challenge evaluation will use an evaluation dataset that will be published without reference annotations one month before the challenge deadline. Participants will run their systems on the evaluation data and submit the outputs, which the organizers will then evaluate.
Task 1 uses the TUT Acoustic Scenes 2017 dataset, which consists of the development and evaluation sets of TUT Acoustic Scenes 2016 split into 10-second segments. The development set contains 312 segments per class and is provided together with a cross-validation setup. The evaluation data consist of newly recorded audio material and will contain at least 70 segments per class.
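As a rough illustration of the segmentation and of its scale (312 segments for each of the 15 classes is 4,680 segments, or 13 hours of development audio), the Python sketch below cuts a mono recording into non-overlapping 10-second segments. The function name `split_into_segments` is hypothetical, and dropping a trailing partial segment is an assumption, not a description of how the dataset was actually prepared.

```python
import numpy as np


def split_into_segments(signal, sample_rate, segment_seconds=10.0):
    """Cut a mono signal into consecutive, non-overlapping fixed-length segments.

    Assumption (illustrative only): a trailing remainder shorter than one
    segment is dropped. Returns an array of shape (n_segments, samples_per_segment).
    """
    samples_per_segment = int(round(segment_seconds * sample_rate))
    if samples_per_segment <= 0:
        raise ValueError("segment length must be positive")
    n_segments = len(signal) // samples_per_segment
    # Trim the incomplete tail, then view the rest as one row per segment
    trimmed = np.asarray(signal)[: n_segments * samples_per_segment]
    return trimmed.reshape(n_segments, samples_per_segment)
```

For example, a 35-second recording at 44.1 kHz yields an array of shape `(3, 441000)`, and the last 5 seconds are discarded.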
Task 2 uses the TUT Rare Sound Events 2017 dataset, which consists of isolated sound events for three target classes (baby crying, glass breaking, and gunshot; approximately 100 examples per class) and recordings of everyday acoustic scenes used as background. The isolated sound examples were collected from freesound.org, and the background audio comes from TUT Acoustic Scenes 2016. Source code for creating mixtures at different event-to-background ratios is also provided, along with a set of pre-generated mixtures. For each target class, the evaluation data will consist of 500 mixtures.
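The event-to-background ratio (EBR) can be read as the level difference, in decibels, between an added event and the background it is mixed into. The Python sketch below mixes an event into a background at a target EBR, assuming EBR is measured as the ratio of RMS levels over the span the event occupies. The provided mixture-generation code may measure energy differently, and the names `mix_at_ebr` and `rms` are hypothetical.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_ebr(background, event, ebr_db, offset):
    """Add `event` to `background` at sample `offset` with the given EBR in dB.

    Assumption (illustrative only): EBR = 20*log10(rms(scaled event) /
    rms(background span the event overlaps)). The official generator may differ.
    Returns the mixture (a new float64 array) and the gain applied to the event.
    """
    end = offset + len(event)
    if offset < 0 or end > len(background):
        raise ValueError("event does not fit into the background at this offset")
    event_level = rms(event)
    background_level = rms(background[offset:end])
    if event_level == 0.0:
        raise ValueError("event is silent")
    # Choose the gain so that 20*log10(gain * event_level / background_level) == ebr_db
    gain = (background_level / event_level) * 10.0 ** (ebr_db / 20.0)
    mixture = background.astype(np.float64)  # astype returns a copy
    mixture[offset:end] += gain * event
    return mixture, gain
```

With `ebr_db=0` the event and the overlapped background have equal RMS level; negative values make the event quieter than the background.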
Task 3 uses the TUT Sound Events 2017 dataset, a subset of TUT Acoustic Scenes 2017. The audio consists of recordings of street scenes with various levels of traffic and other activity. The street scene was selected as an environment of interest for detecting sound events related to human activities and hazardous situations. The target sound event classes were selected to represent common sounds related to human presence and traffic: brakes squeaking, car, children, large vehicle, people speaking, and people walking.
Task 4 uses a subset of AudioSet, with 17 selected target classes belonging to two categories: “Warning” and “Vehicle”. The training set has weak labels that denote the presence of a given sound event in a video’s soundtrack, without timestamps. For testing and evaluation, strong labels with timestamps are provided so that sound event detection performance can be evaluated, while audio tagging will be evaluated only on the produced labels.
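The difference between the two label types can be sketched as follows: a strong label carries onset and offset times, and the corresponding weak label keeps only the fact that a class occurs somewhere in the clip. The Python below is illustrative only; the tuple layout, the function names, and the class names used in the example are assumptions, not the dataset's annotation format.

```python
from typing import List, Set, Tuple

# Hypothetical annotation layout: (onset_seconds, offset_seconds, class_name)
StrongLabel = Tuple[float, float, str]


def to_weak_labels(strong: List[StrongLabel]) -> Set[str]:
    """Keep only which classes occur in the clip; discard all timestamps."""
    return {name for _onset, _offset, name in strong}


def to_tag_vector(weak: Set[str], classes: List[str]) -> List[int]:
    """Multi-hot clip-level target in a fixed class order."""
    return [1 if name in weak else 0 for name in classes]
```

For a clip annotated with `(1.0, 4.5, "car")`, `(3.0, 9.0, "siren")`, and `(7.0, 8.0, "car")`, the weak labels are `{"car", "siren"}`: the repeated event and all timing information collapse into a single presence flag per class.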
Task 1 will be scored using classification accuracy. Task 2 will be scored using the event-based error rate with a 500 ms collar; the event-based F-score will also be calculated. Task 3 will be scored using the segment-based error rate computed over one-second segments; the segment-based F-score will also be calculated. Task 4 will be scored in two settings: the F-score for detecting sound events within each 10-second clip (weak labels, audio tagging) and the segment-based error rate over one-second segments (strong labels, sound event detection).
Participation from both academia and industry is expected at a level similar to the previous edition (DCASE 2016 received 87 submissions in total).
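For reference, segment-based scores are commonly defined by counting true positives (TP), false positives (FP), and false negatives (FN) over the classes in each one-second segment. These counts combine into substitutions S = min(FN, FP), deletions D = max(0, FN - FP), and insertions I = max(0, FP - FN). The scores are then ER = (ΣS + ΣD + ΣI) / ΣN and F = 2ΣTP / (2ΣTP + ΣFP + ΣFN), where N is the number of active reference events in a segment. The Python sketch below assumes these definitions and also assumes that an event marks every segment it overlaps. The function names are hypothetical, and reportable scores should come from the organizers' own evaluation code.

```python
import math

import numpy as np


def activity_matrix(events, classes, duration, segment=1.0):
    """Boolean matrix [n_segments, n_classes]: True where a class is active.

    Assumption (illustrative only): an event marks every fixed-length segment
    it overlaps, even partially. Events are (onset, offset, label) tuples.
    """
    n_segments = int(math.ceil(duration / segment))
    column = {name: i for i, name in enumerate(classes)}
    active = np.zeros((n_segments, len(classes)), dtype=bool)
    for onset, offset, label in events:
        first = int(math.floor(onset / segment))
        last = int(math.ceil(offset / segment))  # exclusive upper bound
        active[first:min(last, n_segments), column[label]] = True
    return active


def segment_based_scores(reference, estimated):
    """Return (error_rate, f_score) from two activity matrices of equal shape."""
    tp = np.logical_and(reference, estimated).sum(axis=1)
    fp = np.logical_and(~reference, estimated).sum(axis=1)
    fn = np.logical_and(reference, ~estimated).sum(axis=1)
    n_ref = int(reference.sum())
    if n_ref == 0:
        raise ValueError("reference contains no active events")
    # Per-segment substitutions, deletions and insertions, summed over segments
    substitutions = np.minimum(fn, fp).sum()
    deletions = np.maximum(0, fn - fp).sum()
    insertions = np.maximum(0, fp - fn).sum()
    error_rate = (substitutions + deletions + insertions) / n_ref
    # Denominator is positive because sum(tp) + sum(fn) == n_ref > 0
    f_score = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    return float(error_rate), float(f_score)
```

Consider a 4-second toy clip with classes `["car", "people walking"]`. The reference events are `(0.0, 2.0, "car")` and `(1.5, 3.0, "people walking")`, and the estimated events are `(0.2, 1.0, "car")` and `(2.1, 3.9, "people walking")`. This gives two deletions in the second segment and one insertion in the fourth, over four active reference entries, so ER = 3/4 = 0.75 and F = 4/7 ≈ 0.57.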
Important dates:
21 March 2017: Publication of datasets and baseline systems
30 June 2017: Publication of evaluation datasets
31 July 2017: Submission of results and technical report
15 October 2017: Publication of evaluation results on the challenge website
16-17 November 2017: DCASE 2017 Workshop
Complete information is available at http://www.cs.tut.fi/sgn/arg/dcase2017/.
The website contains all the necessary details, including task and dataset descriptions, baseline system implementations and their performance on the development datasets, the submission procedure, the challenge rules, and contact information for the task coordinators.