MISP Challenge-Task2 Data

MISP-Meeting contains 125 hours of audio and video data in total. The dataset is divided into 119 hours for training (Train), 3 hours for development (Dev) and 3 hours for evaluation (Eval), the last of which is used for challenge scoring and ranking. The training, development and evaluation sets contain 72, 9 and 9 sessions, respectively. Speakers and recording rooms do not overlap across subsets. Each session consists of a discussion involving 4-8 participants. The duration of these discussions varies: in the training set, each session lasts for 2 hours, while in the development and evaluation sets, sessions are 20 minutes long. Consequently, a training session encompasses multiple topic transitions.

The total number of participants in the training, development, and evaluation sets is 233, 15, and 15, respectively, with balanced gender representation. All participants' professions or areas of study (for those who are students) are related to the meeting topics. This real-world relevance not only enhances the authenticity of the setting but also helps to minimize extended silent periods during the discussions. In the training set, speech segments containing overlap account for 57.30% of all speech segments.

One of the advantages of the MISP-Meeting corpus compared to other meeting corpora is the diversity of its meeting rooms. As shown in Table 2, the 23 meeting rooms are categorized into four size groups: tiny, small, medium, and large, ranging from 8.79 to 117.6 square meters. Each subset includes meeting rooms of all sizes, offering a broad spectrum of acoustic properties and layouts. The meeting rooms feature various wall materials, including cement and glass, and are equipped with furnishings such as sofas, TVs, blackboards, fans, air conditioners, and plants. Detailed parameters of each meeting venue will be released with the training data, providing a comprehensive resource for acoustic analysis.

Dataset	Train	Total
Duration (h)	119	125
Session	72	90
Room	15	23
Participant	233	263
-Male	115	130
-Female	118	133
Overlap Ratio	57.30%
Tab.1. Details of MISP-Meeting corpus
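
The per-subset figures quoted in the text can be cross-checked against the Total column of Table 1. The snippet below is a minimal sketch of that arithmetic; the variable and function names are illustrative and not part of any official MISP tooling, and the Dev/Eval values are taken from the description above rather than from the table.

```python
# Per-subset figures quoted in the description, ordered (Train, Dev, Eval).
SUBSETS = {
    "duration_h": (119, 3, 3),
    "sessions": (72, 9, 9),
    "participants": (233, 15, 15),
}

# Values from the "Total" column of Table 1.
TOTALS = {"duration_h": 125, "sessions": 90, "participants": 263}

# (male, female, participants) for the two columns shown in Table 1.
GENDER = {"Train": (115, 118, 233), "Total": (130, 133, 263)}


def check_totals(subsets, totals):
    """Map each field to True if its subset values sum to the stated total."""
    return {name: sum(values) == totals[name] for name, values in subsets.items()}


totals_ok = check_totals(SUBSETS, TOTALS)
gender_ok = {col: male + female == n for col, (male, female, n) in GENDER.items()}

print(totals_ok)
print(gender_ok)
```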

Size	Tiny	Small	Medium	Large	Total
Area (in m²)	0-15	15-35	35-60	60-∞	0-∞
Train	5	5	3	2	15
Dev
Eval
Tab.2. Statistics of meeting rooms
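
The size groups in Table 2 can be expressed as a simple lookup on floor area. The following is a sketch only: the function name is hypothetical, and the treatment of boundary values (15, 35 and 60 m² are assigned to the larger group) is an assumption, since the table does not say which group a boundary area belongs to.

```python
import math

# Upper bounds in square meters for each size group, taken from Table 2.
# Assumption: each range is half-open, [lower, upper).
SIZE_BINS = [
    ("tiny", 15.0),
    ("small", 35.0),
    ("medium", 60.0),
    ("large", math.inf),
]


def room_size_group(area_m2: float) -> str:
    """Return the Table 2 size group for a room with the given floor area."""
    if not math.isfinite(area_m2) or area_m2 < 0:
        raise ValueError(f"area must be a finite, non-negative number: {area_m2!r}")
    for name, upper in SIZE_BINS:
        if area_m2 < upper:
            return name
    # Not reachable for finite input, because the last bound is infinite.
    raise AssertionError("unreachable")


# The smallest and largest room areas mentioned in the description.
print(room_size_group(8.79))
print(room_size_group(117.6))
```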

Downloads

This dataset is available under the specified license. Before using the corpus, please navigate to the Registration page to sign up. After registering, you will receive a download password.

download link

Multimodal Information Based Speech Processing (MISP) 2025 Challenge
