Description
In everyday communication, English words are frequently mixed into Chinese speech, a phenomenon known as code-switching. Although code-switching is common, it is one of the most difficult challenges for current automatic speech recognition (ASR) technology. Recognizing speech that mixes languages is hard for three main reasons. First, speakers' native-language accent means the embedded English words often carry a strong non-native accent. Second, differences between the phoneme inventories of the languages make hybrid acoustic modeling very difficult. Third, annotated code-switching training data is extremely scarce on the market. To address these difficulties, this competition offers participants three tracks:
- Track1: traditional speech recognition with fixed language model;
- Track2: traditional speech recognition with open language model;
- Track3: end-to-end speech recognition.
All the data for this competition will be provided by Datatang.
Data Introduction
Datatang will provide [500 Hours Mandarin Speech Data] and [200 Hours of Chinese & English Mixed Speaking Speech Data]; English-only data will be taken from LibriSpeech. Participants may use only these speech data for model training, system setup and data augmentation. No other data may be used in this competition.
Specification of [500 Hours Mandarin Speech Data]
- Data Size: 500 Hours
- Format: 16 kHz, 16-bit, WAV, mono-channel
- Recording Environment: Quiet indoor environment; any background noise does not affect speech recognition
- Recording Content: Everyday spoken sentences
- Population: Equal numbers of male and female speakers; 23% are younger than 20, 70% are aged 21 to 30, 4% are aged 31 to 40, and 3% are older than 40; speakers come from 33 provinces, including Guangdong, Fujian, Shandong, Jiangsu, Beijing, Hunan and Jiangxi
- Device: Android and iOS devices at a ratio of 9:1
- Language: Standard Mandarin and accented Mandarin
- Application Scenario: speech recognition; machine translation; voiceprint recognition
- Annotation accuracy rate: Above 97%
Specification of [200 Hours of Chinese & English Mixed Speaking Speech Data]
- Data Size: 200 Hours
- Data Format: 16 kHz, 16-bit, mono-channel, uncompressed WAV
- Recording Environment: Relatively quiet indoor environment without echo
- Recording Content: Everyday spoken sentences (interactive category)
- Population: Equal numbers of male and female speakers; 67% are younger than 25, 25% are aged 26 to 40, and 7% are older than 40; speakers cover seven Chinese dialect areas, including the Northern Mandarin, Wuyu, Cantonese, Yiyu, Xiangyu and Ganyu areas
- Device: Android, iOS
- Language: Mandarin mixed with English
- Application Scenario: Speech recognition; machine translation; voiceprint recognition
- Annotation accuracy rate: Above 97%
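Both Datatang corpora are specified as 16 kHz, 16-bit, mono WAV audio. Before training, it may be worth confirming that every file actually matches that specification. The sketch below uses only Python's standard-library `wave` module. The function name `check_wav_spec` and the constant names are hypothetical, and the check covers only sample rate, sample width and channel count.

```python
import wave

# Expected format, taken from the dataset specifications above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
EXPECTED_CHANNELS = 1  # mono


def check_wav_spec(path) -> list:
    """Return mismatches between a WAV file and the stated spec (empty list if it conforms).

    `path` may be a file name or a binary file-like object, as accepted by wave.open.
    """
    problems = []
    # The standard-library wave module reads uncompressed PCM WAV only;
    # it raises wave.Error for other encodings.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate {wav.getframerate()} Hz, expected {EXPECTED_RATE_HZ} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width {8 * wav.getsampwidth()} bit, expected 16 bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems
```

For example, `check_wav_spec("utt_0001.wav")` (a hypothetical file) returns an empty list for a conforming file and returns human-readable mismatch messages otherwise. The check applies to the Datatang WAV files only. LibriSpeech audio is distributed as FLAC and would need conversion or a different reader.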
Awards
Each track awards a first, second and third prize, and each prize goes to a single winner.
- First Prize: 5,000 RMB
- Second Prize: 3,000 RMB
- Third Prize: 2,000 RMB
Notifications
- All the prize amounts mentioned above are tax included.
- For award assessment, participants must provide the required materials and team member information
Contest Steering Committee
- Xie Lei, Northwestern Polytechnical University
- Jia Lei, Baidu Voice Technology Department
- Chen Wei, Sogou Voice Interaction Technology Center
- Zhang Shiliang, Alibaba DAMO Academy
- Wang Dong, Tsinghua University
- Hong Qingyang, Xiamen University
- Qian Yanzhen, Shanghai Jiao Tong University
- Xu Haihua, Nanyang Technological University
- Feng Qiangze, Datatang (Beijing) Technology Co., Ltd
- Wang Daliang, Datatang (Beijing) Technology Co., Ltd
Evaluation & Ranking
Track1 Results:
Rank | Team | CER (Chinese) | WER (English) | MER (Overall) |
1st | MobvoiASR | 4.04% | 12.33% | 4.94% |
2nd | Qdreamer | 3.85% | 14.88% | 5.05% |
3rd | 西南小宇宙 | 4.05% | 15.43% | 5.28% |
4th | 苏州上下文人工智能技术研发团队 | 4.61% | 14.33% | 5.66% |
5th | 京蓉语音 | 4.60% | 15.06% | 5.74% |
6th | vivo语音识别 | 4.50% | 16.63% | 5.81% |
7th | I2R | 4.95% | 14.32% | 5.97% |
8th | Paopao | 5.22% | 14.14% | 6.19% |
9th | SCUT-ASR | 5.43% | 15.99% | 6.57% |
10th | royalflush | 5.18% | 18.37% | 6.61% |
Track2 Results:
Rank | Team | CER (Chinese) | WER (English) | MER (Overall) |
1st | MobvoiASR | 3.74% | 12.76% | 4.72% |
2nd | Qdreamer | 3.89% | 19.95% | 5.64% |
3rd | 京蓉语音 | 4.56% | 16.00% | 5.80% |
4th | royalflush | 4.50% | 17.29% | 5.88% |
5th | vivo语音识别 | 4.57% | 16.96% | 5.91% |
6th | I2R | 5.16% | 15.22% | 6.25% |
7th | Aisg-xju | 5.77% | 13.90% | 6.65% |
8th | NUS-HLT | 5.58% | 15.85% | 6.69% |
9th | SCUT-ASR | 5.88% | 13.75% | 6.74% |
10th | xmuspeech | 5.68% | 17.85% | 7.01% |
Track3 Results:
Rank | Team | CER (Chinese) | WER (English) | MER (Overall) |
1st | WYHZ | 4.33% | 18.95% | 5.91% |
2nd | SJTU SpeechLab | 6.93% | 24.35% | 8.82% |
3rd | royalflush | 7.49% | 21.40% | 9.00% |
4th | code-switcher | 7.38% | 25.69% | 9.37% |
5th | ZFZ | 8.49% | 24.56% | 10.24% |
6th | Qdreamer | 8.23% | 33.32% | 10.96% |
7th | vivo语音识别 | 9.05% | 32.21% | 11.57% |
8th | UVoice | 8.94% | 41.60% | 12.48% |
9th | xmuspeech | 9.59% | 37.20% | 12.59% |
10th | Aisg-xju | 10.44% | 31.60% | 12.74% |
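In all three tracks, teams are ranked by MER (mixed error rate). This page does not define how MER is computed. The sketch below assumes a common convention for Mandarin-English code-switching. Under that convention, Chinese is scored per character and English per word. The total substitutions, deletions and insertions are then divided by the number of reference tokens.

The sketch makes further simplifying assumptions, none of them from the competition rules:

- Only ideographs in the basic CJK block (U+4E00 to U+9FFF) count as Chinese tokens.
- Punctuation and digits are ignored.
- English is lowercased.

The names `tokenize`, `edit_distance` and `mixed_error_rate` are hypothetical.

```python
import re

# Assumed tokenization: each CJK ideograph (basic block) is one token, and each
# run of Latin letters or apostrophes (an English word) is one token.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize(text: str) -> list:
    """Split mixed Chinese/English text into character and word tokens."""
    return [t.lower() for t in _TOKEN_RE.findall(text)]


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) over tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1]


def mixed_error_rate(refs: list, hyps: list) -> float:
    """Corpus-level MER: total token edits divided by total reference tokens."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return errors / total  # assumes at least one non-empty reference
```

For example, the reference "我想听一首taylor swift的歌" yields nine tokens: seven Chinese characters and two English words. The hypothesis "我想听一首taylor swiss的歌" differs by one substitution, so `mixed_error_rate` returns 1/9, about 11.1%. Errors are summed over the whole corpus before dividing, rather than averaged per utterance. This way, long utterances weigh proportionally more, as in the usual corpus-level WER.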