ASRU 2019 Code-Switching Challenge

Description

In daily communication, Chinese speakers often mix English words into Chinese sentences, a phenomenon known as code-switching. Code-switching is common, but it remains one of the most difficult challenges for current automatic speech recognition (ASR) technology. The main difficulties are as follows: because of the speaker's mother-tongue accent, the embedded words are often pronounced with a strong non-native accent; differences between the phoneme inventories of the languages make hybrid acoustic modeling difficult; and annotated code-switching audio for training is extremely scarce. To address these difficulties, this competition offers three tracks to all participants.

Track1: traditional speech recognition with fixed language model;
Track2: traditional speech recognition with open language model;
Track3: end-to-end speech recognition.

All data for this competition is provided by Datatang.

Data Introduction

Datatang will provide [500 Hours Mandarin Speech Data] and [200 Hours of Chinese & English Mixed Speaking Speech Data]. English-only data will come from LibriSpeech. Participants may use only these speech data for model training, system setup and data augmentation. No other data may be used in this competition.

Specification of [500 Hours Mandarin Speech Data]

Data Size: 500 Hours
Format: 16 kHz, 16-bit, WAV, mono channel
Recording Environment: Quiet indoor environment, including background noise that does not affect speech recognition
Recording Content: Daily speaking sentences
Population: Equal distribution of male and female speakers; 23% are younger than 20, 70% are aged 21~30, 4% are aged 31~40, 3% are older than 40; covering 33 provinces including Guangdong, Fujian, Shandong, Jiangsu, Beijing, Hunan, Jiangxi, etc.
Device: Android and iOS at a ratio of 9:1
Language: Mandarin, accented Mandarin
Application Scenario: speech recognition; machine translation; voiceprint recognition
Annotation accuracy rate: Above 97%

Specification of [200 Hours of Chinese & English Mixed Speaking Speech Data]

Data Size: 200 Hours
Data Format: 16 kHz, 16-bit, mono channel, uncompressed WAV
Recording Environment: Relatively quiet indoor environment without echo
Recording Content: Daily speaking sentences, interaction category
Population: Equal distribution of male and female speakers; 67% are younger than 25, 25% are aged 26~40, 7% are older than 40; covering seven Chinese dialect areas, including the Northern Mandarin area, Wuyu area, Cantonese area, Yiyu area, Xiangyu area and Ganyu area.
Device: Android, iOS
Language: Mandarin
Application Scenario: Speech recognition; machine translation; voiceprint recognition
Annotation accuracy rate: Above 97%
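
Both specifications above describe 16 kHz, 16-bit, mono WAV audio. As a minimal sketch, a participant might check a file against that format before training using Python's standard `wave` module. The function name `check_wav_format` and its parameters are hypothetical and are not part of any official challenge tooling.

```python
import wave


def check_wav_format(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return a list of mismatches between a WAV file and the expected format.

    The defaults mirror the specification above (16 kHz, 16-bit, mono).
    An empty list means the file matches.
    """
    problems = []
    # The wave module only reads uncompressed PCM, so a compressed file
    # raises wave.Error here instead of producing a mismatch entry.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz"
            )
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, "
                f"expected {sample_width_bytes} bytes"
            )
        if wav.getnchannels() != channels:
            problems.append(
                f"channel count is {wav.getnchannels()}, expected {channels}"
            )
    return problems
```

Calling `check_wav_format("utt.wav")` on a conforming file (the file name is only an example) returns an empty list.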

Awards

Each track awards a first, second and third prize, and each prize goes to a single winner.

First Prize: 5,000 RMB
Second Prize: 3,000 RMB
Third Prize: 2,000 RMB

Notifications

All prize amounts listed above include tax.
For award assessment, participants must provide the required materials and team member information.

Contest Steering Committee

Xie Lei, Northwestern Polytechnical University
Jia Lei, Baidu Voice Technology Department
Chen Wei, Sogou Voice Interaction Technology Center
Zhang Shiliang, Alibaba DAMO Academy
Wang Dong, Tsinghua University
Hong Qingyang, Xiamen University
Qian Yanzhen, Shanghai Jiao Tong University
Xu Haihua, Nanyang Technological University
Feng Qiangze, Datatang (Beijing) Technology Co., Ltd
Wang Daliang, Datatang (Beijing) Technology Co., Ltd

Evaluation & Ranking

Track1 Result:

Rank	Team Name	CER in CH / WER in EN	MER in ALL
1	MobvoiASR	4.04% / 12.33%	4.94%
2	Qdreamer	3.85% / 14.88%	5.05%
3	西南小宇宙	4.05% / 15.43%	5.28%
4	苏州上下文人工智能技术研发团队	4.61% / 14.33%	5.66%
5	京蓉语音	4.60% / 15.06%	5.74%
6	vivo语音识别	4.50% / 16.63%	5.81%
7	I2R	4.95% / 14.32%	5.97%
8	Paopao	5.22% / 14.14%	6.19%
9	SCUT-ASR	5.43% / 15.99%	6.57%
10	royalflush	5.18% / 18.37%	6.61%

Track2 Result:

Rank	Team Name	CER in CH / WER in EN	MER in ALL
1	MobvoiASR	3.74% / 12.76%	4.72%
2	Qdreamer	3.89% / 19.95%	5.64%
3	京蓉语音	4.56% / 16.00%	5.80%
4	royalflush	4.50% / 17.29%	5.88%
5	vivo语音识别	4.57% / 16.96%	5.91%
6	I2R	5.16% / 15.22%	6.25%
7	Aisg-xju	5.77% / 13.90%	6.65%
8	NUS-HLT	5.58% / 15.85%	6.69%
9	SCUT-ASR	5.88% / 13.75%	6.74%
10	xmuspeech	5.68% / 17.85%	7.01%

Track3 Result:

Rank	Team Name	CER in CH / WER in EN	MER in ALL
1	WYHZ	4.33% / 18.95%	5.91%
2	SJTU SpeechLab	6.93% / 24.35%	8.82%
3	royalflush	7.49% / 21.40%	9.00%
4	code-switcher	7.38% / 25.69%	9.37%
5	ZFZ	8.49% / 24.56%	10.24%
6	Qdreamer	8.23% / 33.32%	10.96%
7	vivo语音识别	9.05% / 32.21%	11.57%
8	UVoice	8.94% / 41.60%	12.48%
9	xmuspeech	9.59% / 37.20%	12.59%
10	Aisg-xju	10.44% / 31.60%	12.74%
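
The tables above report a character error rate (CER) for Chinese, a word error rate (WER) for English, and an overall MER. This page does not define how MER is computed. The sketch below assumes a common convention for mixed Mandarin-English text, in which each Chinese character and each English word counts as one token and the error rate is the edit distance divided by the number of reference tokens. The function names and the tokenization rule are illustrative assumptions, not the official scoring script.

```python
import re

# Assumption: one token per CJK ideograph, one token per run of Latin letters.
# Digits and punctuation are ignored in this sketch.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize(text):
    """Split mixed text into Chinese characters and lowercased English words."""
    return TOKEN_RE.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def mixed_error_rate(pairs):
    """Total token errors over total reference tokens for (ref, hyp) pairs."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in pairs)
    total = sum(len(tokenize(r)) for r, _ in pairs)
    return errors / total
```

For example, the reference "我想听 taylor swift 的歌" has seven tokens under this rule. A hypothesis that gets only "taylor" wrong has one substitution, so the rate is 1/7.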