MeetCLR contains 125 hours of video data in total. The dataset is divided into 119 hours for training (Train), 3 hours for development (Dev), and 3 hours for evaluation (Eval), the last of which is used for challenge scoring and ranking. The training, development, and evaluation sets contain 72, 9, and 9 sessions, respectively. No speaker or recording room appears in more than one subset. Each session is a discussion among 4 to 8 participants. Session length differs by subset: training sessions last about 2 hours each, while development and evaluation sessions are 20 minutes long. Consequently, a training session spans multiple topic transitions. The training, development, and evaluation sets include 233, 15, and 15 participants, respectively, with balanced gender representation. All participants' professions or, for students, areas of study are related to the meeting topics. This real-world relevance makes the setting more authentic and helps minimize extended silent periods during the discussions.
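
As a quick sanity check on the figures quoted above, the following Python sketch adds up the per-subset counts and compares them with the stated totals. The dictionary layout and the function names (`column_totals`, `mean_session_minutes`) are illustrative assumptions and not part of any MeetCLR tooling; only the numbers come from the description.

```python
# Per-subset figures copied from the corpus description.
# The structure and names below are hypothetical, chosen for illustration.
SPLITS = {
    "Train": {"hours": 119, "sessions": 72, "participants": 233},
    "Dev": {"hours": 3, "sessions": 9, "participants": 15},
    "Eval": {"hours": 3, "sessions": 9, "participants": 15},
}

# Totals as reported for the whole corpus.
REPORTED_TOTALS = {"hours": 125, "sessions": 90, "participants": 263}


def column_totals(splits):
    """Sum each statistic over all subsets."""
    totals = {}
    for stats in splits.values():
        for key, value in stats.items():
            totals[key] = totals.get(key, 0) + value
    return totals


def mean_session_minutes(stats):
    """Average session length in minutes, from rounded hour totals."""
    return 60 * stats["hours"] / stats["sessions"]


if __name__ == "__main__":
    # The subsets should add up to the reported corpus totals.
    assert column_totals(SPLITS) == REPORTED_TOTALS

    for name, stats in SPLITS.items():
        minutes = mean_session_minutes(stats)
        print(f"{name}: about {minutes:.0f} min per session on average")
```

Because the published durations are rounded to whole hours, the per-session averages computed this way are only approximate.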

Dataset        Train   Dev   Eval   Total
Duration (h)     119     3      3     125
Session           72     9      9      90
Room              15     -      -      23
Participant      233    15     15     263
-Male            115     -      -     130
-Female          118     -      -     133
Tab.1. Details of MeetCLR corpus

Size           Tiny   Small   Middle   Large   Total
Area (in m²)   0-15   15-35   35-60    60-∞    0-∞
Train             5       5        3       2      15
Dev               -       -        -       -       -
Eval              -       -        -       -       -
Tab.2. Statistics of meeting rooms

Downloads

This dataset is available under the specified license. Before using the corpus, please navigate to the Registration page to sign up. After registering, you will receive a download password.

download link