This is a classification problem: ‘1’ indicates that the current sample contains the wake-up word, and ‘0’ indicates the opposite. We use the false reject rate (FRR) and false alarm rate (FAR) on the evaluation set as the criteria for WWS performance. Suppose the test set consists of Nwake examples containing the wake word and Nnon-wake examples without it; FRR and FAR are then defined as follows:
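\begin{equation}
  \mathrm{FRR} = \frac{N_{FR}}{N_{wake}}, \qquad \mathrm{FAR} = \frac{N_{FA}}{N_{non\text{-}wake}}
\end{equation}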
where NFR denotes the number of examples that contain the wake-up word but for which the WWS system gives a negative decision, and NFA is the number of examples without the wake-up word for which the WWS system gives a positive decision. The final score of WWS is defined as:
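\begin{equation}
  \mathrm{Score}_{WWS} = \mathrm{FRR} + \mathrm{FAR}
\end{equation}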
FRR and FAR are calculated over all samples in the evaluation set, and the final ranking is based on ScoreWWS: systems with a lower ScoreWWS are ranked higher. For the training and development sets, we will prepare a scoring script which will be released together with the baseline. For the evaluation test set, participants should submit a text file to the Kaggle platform containing the classification results for all utterances; ScoreWWS will then be calculated and updated on the leaderboard. Each line of the file should be in the form < Utterance ID > < 0 or 1 >. Utterance IDs will be provided by the organizers. ‘0’ and ‘1’ represent the prediction result for the corresponding sample.
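As an illustration only (not the official scoring script), the sketch below writes a submission file in this format and computes FRR, FAR, and ScoreWWS against reference labels; the file name and the dict-based in-memory representation are assumptions.

```python
# Minimal sketch (not the official scoring script): write a submission file in the
# "<Utterance ID> <0 or 1>" format and compute ScoreWWS = FRR + FAR against
# reference labels. The file name and dict layout are assumptions.

def write_submission(predictions, path="submission.txt"):
    """predictions: dict mapping utterance ID -> 0/1 decision."""
    with open(path, "w") as f:
        for utt_id, decision in predictions.items():
            f.write(f"{utt_id} {int(decision)}\n")

def score_wws(predictions, references):
    """references: dict mapping utterance ID -> ground-truth 0/1 label."""
    n_wake = sum(1 for label in references.values() if label == 1)
    n_non_wake = len(references) - n_wake
    # A missing prediction is treated as a negative decision here.
    n_fr = sum(1 for utt, label in references.items()
               if label == 1 and predictions.get(utt, 0) == 0)
    n_fa = sum(1 for utt, label in references.items()
               if label == 0 and predictions.get(utt, 0) == 1)
    frr = n_fr / n_wake if n_wake else 0.0
    far = n_fa / n_non_wake if n_non_wake else 0.0
    return frr, far, frr + far

if __name__ == "__main__":
    preds = {"utt_0001": 1, "utt_0002": 0, "utt_0003": 1}
    refs = {"utt_0001": 1, "utt_0002": 1, "utt_0003": 0}
    write_submission(preds)
    frr, far, score = score_wws(preds, refs)
    print(f"FRR={frr:.3f} FAR={far:.3f} ScoreWWS={score:.3f}")
```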
The use of any external audio data not provided by the organizers (except for RIRs) is strictly prohibited. However, the use of external silent video data is allowed for pretrained model training under the following conditions:
It is allowed to use the development set to train the WWS model. The exploration of different data augmentation and simulation methods is encouraged so that participants can train better models (a simple RIR-based simulation sketch is given after the data rules below). In addition, you can use the following annotations for training and development:
Manual modification of the data is forbidden. All parameters should be tuned on the training set or the development set. Modifications of the development set are allowed, provided that its size remains unchanged and these modifications do not induce the risk of inadvertently biasing the development set toward the particular speakers or acoustic conditions in the evaluation set. For instance, enhancing the signals, applying “unbiased” transformations or automatically refining the utterance start and end times is allowed. Augmenting the development set by applying biased signal transformations (e.g., systematically increasing intensity/pitch), or selecting a subset of the development set is forbidden. In case of doubt, please ask us ahead of the submission deadline.
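As a purely illustrative example (an assumption on our part, not part of the official baseline), the following sketch simulates reverberant training data by convolving a clean mono waveform with a room impulse response, the only external audio permitted; the file names are hypothetical.

```python
# Minimal sketch (not part of the official baseline): simulate reverberant training
# data by convolving a clean mono waveform with a room impulse response (RIR).
# File names are hypothetical.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

def apply_rir(clean, rir):
    """Convolve a clean mono waveform with an RIR and rescale to the original peak level."""
    reverberant = fftconvolve(clean, rir, mode="full")[: len(clean)]
    peak = np.max(np.abs(reverberant))
    if peak > 0:
        reverberant = reverberant * (np.max(np.abs(clean)) / peak)
    return reverberant

if __name__ == "__main__":
    clean, sr = sf.read("clean_utterance.wav")        # hypothetical file names
    rir, _ = sf.read("room_impulse_response.wav")
    sf.write("reverberant_utterance.wav", apply_rir(clean, rir), sr)
```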
There is also no limitation on the WWS model structure or the model training techniques used by participants. Again, you are entirely free in the development of your system. In particular, you can:
For every tested system, you should report ScoreWWS on both the development and evaluation sets.