ASRU 2019 Code-Switching Challenge

Description

In everyday communication, Chinese speakers often mix English words into Chinese sentences, a phenomenon known as code-switching. Code-switching is common, but it remains one of the most difficult challenges for current automatic speech recognition (ASR) technology. The main difficulties are as follows: owing to the speaker's mother-tongue accent, the embedded words are often pronounced with a strong non-native accent; differences between the phoneme inventories of the two languages make hybrid acoustic modeling difficult; and annotated code-switched training audio is extremely scarce. To address these difficulties, this competition offers three tracks to all participants.

  • Track1: traditional speech recognition with fixed language model;
  • Track2: traditional speech recognition with open language model;
  • Track3: end-to-end speech recognition.

All data for this competition will be provided by Datatang.

Data Introduction

Datatang will provide [500 Hours Mandarin Speech Data] and [200 Hours of Chinese & English Mixed Speaking Speech Data]. English-only data will come from Librispeech. Participants may use only these speech data for model training, system setup, and data augmentation. No other data may be used in this competition.

Specification of [500 Hours Mandarin Speech Data]

  • Data Size: 500 Hours
  • Format: 16 kHz, 16-bit, mono channel, WAV
  • Recording Environment: Quiet indoor environment, including background noise that does not affect speech recognition
  • Recording Content: Daily speaking sentences
  • Population: Equal numbers of male and female speakers; 23% are younger than 20, 70% are aged 21 to 30, 4% are aged 31 to 40, and 3% are older than 40; speakers come from 33 provinces, including Guangdong, Fujian, Shandong, Jiangsu, Beijing, Hunan, and Jiangxi.
  • Device: Android and iOS in a 9:1 ratio
  • Language: Mandarin, Mandarin with accent.
  • Application Scenario: speech recognition; machine translation; voiceprint recognition
  • Annotation accuracy rate: Above 97%

Specification of [200 Hours of Chinese & English Mixed Speaking Speech Data]

  • Data Size: 200 Hours
  • Data Format: 16 kHz, 16-bit, mono channel, uncompressed WAV
  • Recording Environment: Relatively quiet indoor environment without echo
  • Recording Content: Daily speaking sentences, interaction category
  • Population: Equal numbers of male and female speakers; 67% are younger than 25, 25% are aged 26 to 40, and 7% are older than 40; speakers cover seven Chinese dialect areas, including the Northern Mandarin, Wuyu, Cantonese, Yiyu, Xiangyu, and Ganyu areas.
  • Device: Android, iOS
  • Language: Mandarin
  • Application Scenario: Speech recognition; machine translation; voiceprint recognition
  • Annotation accuracy rate: Above 97%
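
Both corpora are specified as 16 kHz, 16-bit, mono WAV audio. As an illustration, the following minimal sketch checks a file against that format before it enters a training pipeline. It uses only Python's standard `wave` module; the function name `check_wav_format` and the constant names are hypothetical and are not part of any official challenge tooling.

```python
import wave

# Expected format, taken from the data specifications above.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches against the expected format.

    An empty list means the file conforms. The standard `wave` module
    reads PCM WAV only and raises `wave.Error` for other encodings.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width} bytes, expected {EXPECTED_SAMPLE_WIDTH_BYTES}")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"channel count is {channels}, expected {EXPECTED_CHANNELS}")
    return problems
```

Calling `check_wav_format("utterance.wav")` on a hypothetical file returns an empty list when the file matches the specification, so the result can be used directly in a filter over a data directory.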

Awards

Each track awards a first, second, and third prize, and each prize goes to a single winner.

  • First Prize: 5,000 RMB
  • Second Prize: 3,000 RMB
  • Third Prize: 2,000 RMB

Notifications

  1. All prize amounts listed above include tax.
  2. To be assessed for an award, participants must provide the requested materials and team member information.

Contest Steering Committee

  • Xielei, Northwestern Polytechnical University
  • Jialei, Baidu Voice Technology Department
  • Chenwei, Sogou Voice Interaction Technology Center
  • Zhang Shiliang, Alibaba DAMO Academy
  • Wangdong, Tsinghua University
  • Hong Qingyang, Xiamen University
  • Qian Yanzhen, Shanghai Jiao Tong University
  • Xu Haihua, Nanyang Technological University
  • Feng Qiangze, Datatang(Beijing) Technology Co., Ltd
  • Wang Daliang, Datatang(Beijing) Technology Co., Ltd

Evaluation & Ranking

Track1 Result:

  Rank  Team                            CER in CH  WER in EN  MER in ALL
  1     MobvoiASR                       4.04%      12.33%     4.94%
  2     Qdreamer                        3.85%      14.88%     5.05%
  3     西南小宇宙                      4.05%      15.43%     5.28%
  4     苏州上下文人工智能技术研发团队  4.61%      14.33%     5.66%
  5     京蓉语音                        4.60%      15.06%     5.74%
  6     vivo语音识别                    4.50%      16.63%     5.81%
  7     I2R                             4.95%      14.32%     5.97%
  8     Paopao                          5.22%      14.14%     6.19%
  9     SCUT-ASR                        5.43%      15.99%     6.57%
  10    royalflush                      5.18%      18.37%     6.61%

Track2 Result:

  Rank  Team                            CER in CH  WER in EN  MER in ALL
  1     MobvoiASR                       3.74%      12.76%     4.72%
  2     Qdreamer                        3.89%      19.95%     5.64%
  3     京蓉语音                        4.56%      16.00%     5.80%
  4     royalflush                      4.50%      17.29%     5.88%
  5     vivo语音识别                    4.57%      16.96%     5.91%
  6     I2R                             5.16%      15.22%     6.25%
  7     Aisg-xju                        5.77%      13.90%     6.65%
  8     NUS-HLT                         5.58%      15.85%     6.69%
  9     SCUT-ASR                        5.88%      13.75%     6.74%
  10    xmuspeech                       5.68%      17.85%     7.01%

Track3 Result:

  Rank  Team                            CER in CH  WER in EN  MER in ALL
  1     WYHZ                            4.33%      18.95%     5.91%
  2     SJTU SpeechLab                  6.93%      24.35%     8.82%
  3     royalflush                      7.49%      21.40%     9.00%
  4     code-switcher                   7.38%      25.69%     9.37%
  5     ZFZ                             8.49%      24.56%     10.24%
  6     Qdreamer                        8.23%      33.32%     10.96%
  7     vivo语音识别                    9.05%      32.21%     11.57%
  8     UVoice                          8.94%      41.60%     12.48%
  9     xmuspeech                       9.59%      37.20%     12.59%
  10    Aisg-xju                        10.44%     31.60%     12.74%
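
The results above report a character error rate on the Chinese portion (CER in CH), a word error rate on the English portion (WER in EN), and a mixed error rate over all tokens (MER in ALL). The following is a minimal sketch of how a mixed error rate can be computed, assuming each Chinese character and each English word counts as one token and that errors are counted with a standard edit distance. The tokenization rule, the text normalization, and the function names are assumptions made for illustration; the organizers' actual scoring script is not described in this document and may differ.

```python
import re

# Assumption: one token per CJK character, one token per run of
# Latin letters, digits, or apostrophes. Punctuation is ignored.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize_mixed(text):
    """Split mixed Chinese-English text into characters and words."""
    return _TOKEN_RE.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(references, hypotheses):
    """Total edit errors divided by total reference tokens."""
    total_errors = 0
    total_ref_tokens = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        ref = tokenize_mixed(ref_text)
        hyp = tokenize_mixed(hyp_text)
        total_errors += edit_distance(ref, hyp)
        total_ref_tokens += len(ref)
    if total_ref_tokens == 0:
        raise ValueError("references contain no tokens")
    return total_errors / total_ref_tokens


# The reference has 7 tokens; the hypothesis drops one English word.
mer = mixed_error_rate(["我想听 Taylor Swift 的歌"], ["我想听 Taylor 的歌"])
print(f"{mer:.2%}")  # prints 14.29%
```

Reporting separate per-language rates would additionally require attributing each alignment error to a Chinese or an English token, which this sketch does not attempt.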