News

The MISP 2025 Challenge has been accepted as an Interspeech 2025 Grand Challenge!

Introduction

In recent years, the proliferation of speech-enabled applications has led to increasingly complex usage scenarios, such as home environments and meetings. Previous multimodal information-based speech processing (MISP) challenges in 2021, 2022, and 2023 targeted the home scenario, where several people converse in Chinese while watching TV in a living room. A large-scale audio-visual Chinese home conversational corpus was released to support multiple audio-visual speech processing tasks, including wake word spotting, target speaker extraction, speaker diarization, and speech recognition. These MISP challenges attracted extensive participation: over 150 teams downloaded the dataset, more than 60 teams actively submitted results, and 15 research papers were presented at ICASSP 2022, 2023, and 2024.

Meetings are among the most valuable yet challenging contexts for speech applications due to their rich information exchange and decision-making processes. Accurate transcription and analysis are crucial for enhancing productivity and preserving insights, but the task is difficult due to varied speech styles and complex acoustic conditions. Current state-of-the-art audio-only techniques are hitting performance plateaus. For example, on the AliMeeting benchmark, the best systems achieve a character error rate (CER) of approximately 20% (see the CER sketch after the task list below), which is inadequate for many real-world applications. The McGurk effect and subsequent studies have shown that visual cues can improve speech perception in noisy environments. Thus, the MISP 2025 challenge aims to advance meeting transcription by incorporating multimodal information, such as video. The specific tasks are as follows:

1) Audio-Visual Speaker Diarization.

2) Audio-Visual Speech Recognition.

3) Audio-Visual Diarization and Recognition.
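
For reference, the CER quoted above is the character-level Levenshtein (edit) distance between a hypothesis transcript and the reference, normalized by the reference length. The Python sketch below illustrates that standard definition only; it is not the official challenge scoring script, which may additionally apply text normalization or speaker-permutation handling, and the function name cer is our own.

    def cer(reference: str, hypothesis: str) -> float:
        """Character error rate: edit distance / reference length (illustrative only)."""
        ref, hyp = list(reference), list(hypothesis)
        # Dynamic-programming (Levenshtein) edit distance over characters.
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            curr = [i] + [0] * len(hyp)
            for j, h in enumerate(hyp, start=1):
                cost = 0 if r == h else 1
                curr[j] = min(prev[j] + 1,         # deletion
                              curr[j - 1] + 1,     # insertion
                              prev[j - 1] + cost)  # substitution or match
            prev = curr
        return prev[-1] / max(len(ref), 1)

    # Toy example: one inserted character against a 6-character reference -> CER ~ 0.167.
    print(cer("今天开会讨论", "今天开会讨论了"))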

The following resources will be provided:

  • A large-scale audio-visual meeting corpus, MISP-Meeting, and a comprehensive baseline system.
  • A public benchmark for fair comparisons.
  • A challenge session to foster communication.
  • A summary publication highlighting the most effective techniques and promising directions.

Planned Schedule (AoE Time)

  • Registration opens and training set release: November 17, 2024
  • Development set release: November 20, 2024
  • Baseline release: November 24, 2024
  • Registration closes, evaluation set release, and evaluation-set leaderboard update: January 3, 2025
  • Leaderboard freeze: January 24, 2025
  • System report submission: February 10, 2025
  • Final paper submission: February 12, 2025

Organizers

  • Jun Du, University of Science and Technology of China
  • Chin-Hui Lee, Georgia Institute of Technology
  • Jingdong Chen, Northwestern Polytechnical University
  • Shinji Watanabe, Carnegie Mellon University
  • Sabato Marco Siniscalchi, University of Palermo
  • Odette Scharenborg, Delft University of Technology
  • Hang Chen, University of Science and Technology of China

Contact Us

For additional information, please email us at mispchallenge@gmail.com.