Accepted Papers

  • Parnia Bahar, Tobias Bieschke, Hermann Ney, “A comparative study on end-to-end speech to text translation”
  • Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, “A comparative study on transformer vs RNN in speech applications”
  • Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Hagen Soltau, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu, “A comparison of end-to-end models for long-form speech recognition”
  • Albert Zeyer, Parnia Bahar, Kazuki Irie, Ralf Schlüter, Hermann Ney, “A comparison of transformer and LSTM encoder decoder models for ASR”
  • Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain, “A cross-corpus study on speech emotion recognition”
  • Erik McDermott, Hasim Sak, Ehsan Variani, “A density ratio approach to language model fusion in end-to-end automatic speech recognition”
  • Jiayi Fu, Kuang Ru, “A dropout-based single model committee approach for active learning in ASR”
  • Yi Zhou, Xiaohai Tian, Emre Yilmaz, Rohan Kumar Das, Haizhou Li, “A modularized neural network with language-specific output layers for cross-lingual voice conversion”
  • Hossein Zeinali, Lukas Burget, Jan Cernocky, “A multi-purpose and large-scale speech corpus in Persian and English for speaker and speech recognition: the DeepMine database”
  • Shuo-Yiin Chang, Bo Li, Gabor Simko, “A unified endpointer using multitask and multidomain training”
  • Joachim Fainberg, Ondrej Klejch, Erfan Loweimi, Peter Bell, Steve Renals, “Acoustic model adaptation from raw waveforms with SincNet”
  • Chao-Wei Huang, Yun-Nung Chen, “Adapting pretrained transformer to lattices for spoken language understanding”
  • Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim, “Additional shared decoder on Siamese multi-view encoders for learning acoustic word embeddings”
  • Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou, “Advances in online audio-visual meeting transcription”
  • Songxiang Liu, Haibin Wu, Hung-yi Lee, Helen Meng, “Adversarial attacks on spoofing countermeasures of automatic speaker verification”
  • Catalin Zorila, Christoph Boeddeker, Rama Doddipatla, Reinhold Haeb-Umbach, “An investigation into the effectiveness of enhancement in ASR training and test for CHiME-5 dinner party transcription”
  • Tirusha Mandava, Ravi Kumar Vuddagiri, Hari Krishna Vydana, Anil Kumar Vuppala, “An investigation of LSTM-CTC based joint acoustic model for Indian language identification”
  • Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John Hansen, “Analyzing large receptive field convolutional networks for distant speech recognition”
  • Kwangyoun Kim, Kyungmin Lee, Dhananjaya Gowda, Junmo Park, Sungsoo Kim, Sichen Jin, Young-Yoon Lee, Jinsu Yeo, Daehyun Kim, Seokyeong Jung, Jungin Lee, Myoungji Han, Chanwoo Kim, “Attention based on-device streaming speech recognition with large speech corpus”
  • Osamu Segawa, Tomoki Hayashi, Kazuya Takeda, “Attention-based speech recognition using gaze information”
  • Jen-Tzung Chien, Chun-Lin Kuo, “Bayesian adversarial learning for speaker recognition”
  • Hieu-Thi Luong, Junichi Yamagishi, “Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech”
  • Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong, “Character-aware attention-based end-to-end speech recognition”
  • Tianyan Zhou, Yong Zhao, Jinyu Li, Yifan Gong, Jian Wu, “CNN with phonetic attention for text-independent speaker verification”
  • Xiaolian Zhu, Shan Yang, Geng Yang, Lei Xie, “Controlling emotion strength with relative attribute for end-to-end speech synthesis”
  • Tohru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, “Data augmentation based on vowel stretch for improving children’s speech recognition”
  • Daniel Kopev, Ahmed Ali, Ivan Koychev, Preslav Nakov, “Detecting deception in political debates using acoustic and textual features”
  • Joao Monteiro, Jahangir Alam, “Development of voice spoofing detection systems for 2019 edition of automatic speaker verification and countermeasures challenge”
  • Yu-An Wang, Yun-Nung Chen, “Dialogue environments are different from games: investigating variants of deep q-networks for dialogue policy”
  • Zhong Meng, Jinyu Li, Yashesh Gaur, Yifan Gong, “Domain adaptation via teacher-student learning for end-to-end speech recognition”
  • Shahram Ghorbani, Soheil Khorram, John H. L. Hansen, “Domain expansion in DNN-based acoustic models for robust speech recognition”
  • Andreas Stolcke, Takuya Yoshioka, “DOVER: a method for combining diarization outputs”
  • Tomohiro Tanaka, Takahiro Shinozaki, “Efficient free keyword detection based on CNN and end-to-end continuous DP-matching”
  • Eunah Cho, He Xie, John Lalor, Varun Kumar, William M. Campbell, “Efficient semi-supervised learning for natural language understanding by optimizing diversity”
  • Joanna Rownicka, Peter Bell, Steve Renals, “Embeddings for DNN speaker adaptive training”
  • Chirag Singh, Abhay Kumar, Ajay Nagar, Suraj Tripathi, Promod Yenigalla, “Emoception: an inception inspired efficient speech emotion recognition network”
  • Xianghu Yue, Grandee Lee, Emre Yilmaz, Fang Deng, Haizhou Li, “End-to-end code-switching ASR for low-resourced language pairs”
  • Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, “End-to-end neural speaker diarization with self-attention”
  • Wangyou Zhang, Man Sun, Lan Wang, Yanmin Qian, “End-to-end overlapped speech detection and speaker counting with raw waveform”
  • Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda, “End-to-end training of a large vocabulary end-to-end speech recognition system”
  • Hsiao-Yun Lin, Tien-Hong Lo, Berlin Chen, “Enhanced BERT-based ranking models for spoken document retrieval”
  • Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur, “Espresso: a fast end-to-end neural speech recognition toolkit”
  • Jennifer Drexler, James Glass, “Explicit alignment of text and speech encodings for attention-based end-to-end speech recognition”
  • Chien-Lin Huang, “Exploring effective data augmentation with TDNN-LSTM neural network embedding for speaker recognition”
  • Mingkun Huang, YiZhou Lu, Lan Wang, Yanmin Qian, Kai Yu, “Exploring model units and training strategies for end-to-end speech recognition”
  • Yi Luo, Cong Han, Nima Mesgarani, Enea Ceolini, Shih-Chii Liu, “FaSNet: low-latency adaptive beamforming for multi-microphone audio processing”
  • Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer, “From senones to chenones: tied context-dependent graphemes for hybrid speech recognition”
  • Peiyao Sheng, Zhuolin Yang, Yanmin Qian, “GANs for children: a generative data augmentation strategy for children speech recognition”
  • Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Itsumi Saito, Kyosuke Nishida, Takanobu Oba, “Generalized large-context language models based on forward-backward hierarchical recurrent encoder-decoder models”
  • Raghavendra Pappagari, Piotr Zelasko, Jesus Villalba, Yishay Carmiel, Najim Dehak, “Hierarchical transformers for long document classification”
  • Rao Ma, Qi Liu, Kai Yu, “Highly efficient neural network language model compression using soft binarization training”
  • Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim, “Improved multi-stage training of online attention-based encoder-decoder models”
  • Lorenz Diener, Tejas Umesh, Tanja Schultz, “Improving fundamental frequency generation in EMG-to-speech conversion using a quantization approach”
  • Abhishek Niranjan, Mahaboob Ali Basha Shaik, “Improving grapheme-to-phoneme conversion by investigating copying mechanism in recurrent architectures”
  • Fengyu Yang, Shan Yang, Pengcheng Zhu, Pengju Yan, Lei Xie, “Improving Mandarin end-to-end speech synthesis by self-attention and learnable Gaussian bias”
  • Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong, “Improving RNN transducer modeling for end-to-end speech recognition”
  • Bo Wu, Meng Yu, Lianwu Chen, Mingjie Jin, Dan Su, Dong Yu, “Improving speech enhancement with phonetic embedding features”
  • Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Takanobu Oba, Ryuichiro Higashinaka, “Improving speech-based end-of-turn detection via cross-modal representation learning with punctuated text data”
  • Tsun-Yat Leung, Lahiru Samarakoon, Albert Y.S. Lam, “Incorporating prior knowledge into speaker diarization and linking for identifying common speaker”
  • Zhehuai Chen, Mahsa Yarmohammadi, Hainan Xu, Hang Lv, Lei Xie, Daniel Povey, Sanjeev Khudanpur, “Incremental lattice determinization for WFST decoders”
  • Qiujia Li, Chao Zhang, Phil Woodland, “Integrating source-channel model with attention-based sequence-to-sequence models for speech recognition”
  • Joana Correia, Isabel Trancoso, Bhiksha Raj, “In-the-wild end-to-end detection of speech affecting diseases”
  • Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda, “Investigation of shallow WaveNet vocoder with Laplacian distribution output”
  • Mahesh Kumar Chelimilla, Shashi Kumar, Shakti P. Rath, “Joint distribution learning in the framework of variational autoencoders for far-field speech enhancement”
  • Jiewen Wu, Luis Fernando D’Haro, Nancy F. Chen, Pavitra Krishnaswamy, Rafael Banchs, “Joint learning of word and label embeddings for sequence labelling in spoken language understanding”
  • Zhiming Wang, Kaisheng Yao, Shuo Fang, Xiaolong Li, “Joint optimization of classification and clustering for deep speaker embedding”
  • Hao Sun, Xu Tan, Jun-Wei Gan, Sheng Zhao, Dongxu Han, Hongzhi Liu, Tao Qin, Tie-Yan Liu, “Knowledge distillation from BERT in pre-training and fine-tuning for polyphone disambiguation”
  • Surabhi Punjabi, Harish Arsikere, Sri Garimella, “Language model bootstrapping using neural machine translation for conversational speech recognition”
  • Kin Wai Cheuk, Balamurali B T, Gemma Roig, Dorien Herremans, “Latent space representation for multi-target speaker detection and identification with a sparse dataset using triplet neural networks”
  • Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze, “Lead2Gold: towards exploiting the full potential of noisy transcriptions for speech recognition”
  • Jeremy Heng Meng Wong, Mark John Francis Gales, Yu Wang, “Learning between different teacher and student models in ASR”
  • Xiaochun An, Yuxuan Wang, Shan Yang, Zejun Ma, Lei Xie, “Learning hierarchical representations for expressive speaking style in end-to-end speech synthesis”
  • Austin Waters, Neeraj Gaur, Parisa Haghani, Pedro Moreno, Zhongdi Qu, “Leveraging language ID in multilingual end-to-end speech recognition”
  • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, “Listening while speaking and visualizing: improving ASR through multimodal chain”
  • Junyi Peng, Rongzhi Gu, Yuexian Zou, “Logistic similarity metric learning via affinity matrix for text-independent speaker verification”
  • Rohan Kumar Das, Jichen Yang, Haizhou Li, “Long range acoustic and deep features perspective on ASVspoof 2019”
  • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak, “Low-resource domain adaptation for speaker recognition using Cycle-GANs”
  • Jen-Tzung Chien, Che-Yu Kuo, “Markov recurrent neural network language model”
  • Ahmed Ali, Suwon Shon, Younes Samih, Hamdy Mubarak, Ahmed Abdelali, James Glass, Steve Renals, Khalid Choukri, “MGB-5: Arabic dialect identification across 17 dialects and Moroccan speech recognition”
  • Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe, “MIMO-Speech: end-to-end multi-channel multi-speaker speech recognition”
  • Takashi Fukuda, Samuel Thomas, “Mixed bandwidth acoustic modeling leveraging knowledge distillation”
  • Anshuman Tripathi, Han Lu, Hasim Sak, Hagen Soltau, “Monotonic recurrent neural network transducer and decoding strategies”
  • Dhananjay Ram, Lesly Miculicich, Hervé Bourlard, “Multilingual bottleneck features for query by example spoken term detection”
  • Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe, “Multilingual end-to-end speech translation”
  • Rutuja Ubale, Vikram Ramanarayanan, Yao Qian, Keelan Evanini, Chee Wee Leong, Chong Min Lee, “Native language identification from raw waveforms using deep convolutional neural networks with attentive pooling”
  • Takatomo Kano, Sakriani Sakti, Satoshi Nakamura, “Neural machine translation with acoustic embedding”
  • Rajul Acharya, Hemant A. Patil, Harsh Kotta, “Novel enhanced Teager energy based cepstral coefficients for replay spoof detection”
  • Timo Lohrenz, Maximilian Strake, Tim Fingscheidt, “On temporal context information for hybrid BLSTM-based phoneme recognition”
  • Berrak Sisman, Mingyang Zhang, Minghui Dong, Haizhou Li, “On the study of generative adversarial networks for cross-lingual voice conversion”
  • Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi, “One-to-many multilingual end-to-end speech translation”
  • Franco Mana, Felix Weninger, Roberto Gemello, Puming Zhan, “Online batch normalization adaptation for automatic speech recognition”
  • Hira Dhamyal, Tianyan Zhou, Bhiksha Raj, Rita Singh, “Optimizing neural network embeddings using pair-wise loss for text-independent speaker verification”
  • Mingu Lee, Jinkyu Lee, Hye Jin Jang, Byeonggeun Kim, Wonil Chang, Kyuwoong Hwang, “Orthogonality constrained multi-head attention for keyword spotting”
  • Lohith Ravuru, Hyungtak Choi, Siddarth K.M., Hojung Lee, Inchul Hwang, “Paraphrase generation based on VAE and pointer-generator networks”
  • Khe Chai Sim, Francoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou, “Personalization of end-to-end speech recognition on mobile devices for named entities”
  • Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda, “Power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition”
  • Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur, “Probing the information encoded in x-vectors”
  • Byeonggeun Kim, Mingu Lee, Jinkyu Lee, Yeonseok Kim, Kyuwoong Hwang, “Query-by-example on-device keyword spotting”
  • Hany Ahmed, Hazem Mamdouh, Salah Ashraf, Ali Ramadan, Mohsen Rashwan, “RDI-CU system for the 2019 Arabic multi-genre broadcast challenge”
  • Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara Sainath, Trevor Strohman, “Recognizing long-form speech using streaming end-to-end models”
  • Takaki Makino, Hank Liao, Yannis Assael, Brendan Shillingford, Basilio Garcia, Otavio Braga, Olivier Siohan, “Recurrent neural network transducer for audio-visual speech recognition”
  • Fotios Lygerakis, Vassilios Diakoloulas, Michail Lagoudakis, Kotti Margarita, “Robust belief state space representation for statistical dialogue managers using deep autoencoders”
  • Vevake Balaraman, Bernardo Magnini, “Scalable neural dialogue state tracking”
  • Kiran Praveen, Anshul Gupta, Akshara Soman, Sriram Ganapathy, “Second language transfer learning in humans and machines using image supervision”
  • Youngmoon Jung, Yeunju Choi, Hoirin Kim, “Self-adaptive soft voice activity detection using deep neural networks for robust speaker verification”
  • Yinghui Huang, Samuel Thomas, Masayuki Suzuki, Zoltan Tuske, Larry Sansone, Michael Picheny, “Semi-supervised training and data augmentation for adaptation of automatic broadcast news captioning systems”
  • Jee-weon Jung, Hee-Soo Heo, Hye-jin Shim, Ha-Jin Yu, “Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings”
  • Lukas Lee, Jinhwan Park, Wonyong Sung, “Simple gated convnet for small footprint acoustic modeling”
  • George Saon, Zoltan Tuske, Kartik Audhkhasi, Brian Kingsbury, Michael Picheny, Samuel Thomas, “Simplified LSTMs for speech recognition”
  • Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, “Simultaneous speech recognition and speaker diarization for monaural dialogue recordings with target-speaker acoustic models”
  • Thierry Desot, François Portet, Michel Vacher, “SLU for voice command in smart home: comparison of pipeline and end-to-end approaches”
  • Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei, “Small-footprint keyword spotting with graph convolutional network”
  • Md Asif Jalal, Roger K Moore, Thomas Hain, “Spatio-temporal context modelling for speech emotion classification”
  • Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals, “Speaker adaptive training using model agnostic meta-learning”
  • Shubham Bansal, Karan Malhotra, Sriram Ganapathy, “Speaker and language aware training for end-to-end ASR”
  • Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Lukáš Burget, Jan Černocký, “Speaker verification with application-aware beamforming”
  • Zhiyun Fan, Jie Li, Shiyu Zhou, Bo Xu, “Speaker-aware speech-transformer”
  • Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu, “Speech recognition with augmented synthesized speech”
  • Jochen Weiner, Claudia Frankenberg, Johannes Schröder, Tanja Schultz, “Speech reveals future risk of developing dementia: predictive dementia screening from biographic interviews”
  • Peidong Wang, Zhuo Chen, Xiong Xiao, Zhong Meng, Takuya Yoshioka, Tianyan Zhou, Liang Lu, Jinyu Li, “Speech separation using speaker inventory”
  • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, “Speech-to-speech translation between untranscribed unknown languages”
  • Tuomas Kaseva, Aku Rouhe, Mikko Kurimo, “SphereDiar: an effective speaker diarization system for meeting data”
  • Muralikrishna H, Pulkit Sapra, Anuksha Jain, Dileep Aroor Dinesh, “Spoken language identification using bidirectional LSTM based LID sequential senones”
  • Shang-Bao Luo, Hung-Shin Lee, Kuan-Yu Chen, Hsin-Min Wang, “Spoken multiple-choice question answering using multimodal convolutional neural networks”
  • Mari Ganesh Kumar, Suvidha Rupesh Kumar, Saranya M S, Bharathi B, Hema A Murthy, “Spoof detection using time-delay shallow neural network and feature switching”
  • Kyu Han, Ramon Prieto, Tao Ma, “State-of-the-art speech recognition using multi-stream self-attention with dilated 1D convolutions”
  • Niko Moritz, Takaaki Hori, Jonathan Le Roux, “Streaming end-to-end speech recognition with joint CTC-attention based models”
  • Junyi Peng, Yuexian Zou, Na Li, Deyi Tuo, Dan Su, Meng Yu, Chunlei Zhang, Dong Yu, “Syllable-dependent discriminative learning for small footprint text-dependent speaker verification”
  • Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, “Tacotron-based acoustic model using phoneme alignment for practical neural text-to-speech systems”
  • Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu, “Time domain audio visual speech separation”
  • Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li, “Time-domain speaker extraction network”
  • Zhengyuan Liu, Angela Ng, Sheldon Lee, Ai Ti Aw, Nancy Chen, “Topic-aware pointer-generator networks for summarizing spoken conversations”
  • Rosa González Hautamäki, Tomi H. Kinnunen, “Towards controlling false alarm-miss trade-off in perceptual speaker comparison via non-neutral listening task framing”
  • Peter Plantinga, Eric Fosler-Lussier, “Towards real-time mispronunciation detection in kids’ speech”
  • Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney, “Training language models for long-span cross-sentence evaluation”
  • Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu, “Transfer learning for context-aware spoken language understanding”
  • Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe, “Transformer ASR with contextual block processing”
  • Hardik Sailor, Salil Deena, Md Asif Jalal, Rasa Lileikyte, Thomas Hain, “Unsupervised adaptation of acoustic models for ASR using utterance-level embeddings from squeeze and excitation networks”
  • Xinhao Wang, Keelan Evanini, Yao Qian, Klaus Zechner, “Using very deep convolutional neural networks to automatically detect plagiarized spoken responses”
  • Yougen Yuan, Zhiqiang Lv, Shen Huang, Lei Xie, “Verifying deep keyword spotting detection with acoustic word embeddings”
  • Xiong Wang, Sining Sun, Lei Xie, “Virtual adversarial training for DS-CNN based small-footprint keyword spotting”
  • Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li, “WaveNet factorization with singular value decomposition for voice conversion”
  • Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura, “Zero-shot code-switching ASR and TTS with multilingual machine speech chain”
  • Matthew Wiesner, Oliver Adams, David Yarowsky, Jan Trmal, Sanjeev Khudanpur, “Zero-shot pronunciation lexicons for cross-language acoustic model transfer”

Demo Papers

  • Richeng Duan, Yi Ren Leng, Siti Umairah Md Salleh, Nur Farah Ain Binte Suhaimi, Nancy F. Chen, “Towards Automatic Speech Evaluation for Multilingual Societies: Prototype System for Singapore Children Learning Malay”
  • Jiadong Wang, Zihan Pan, Jibin Wu, Malu Zhang, Qing Cai, “A Sound Tracking Robot on Bio-Inspired Algorithm”
  • Paul A. Crook, Shivani Poddar, Ankita De, Semir Shafi, David Whitney, Alborz Geramifard, Rajen Subba, “SIMMC: Situated Interactive Multi-Modal Conversational Data Collection and Evaluation Platform”
  • Chitralekha Gupta, Haizhou Li, “MUSIGPRO: Automatic Leaderboard of Singers using Reference Independent Evaluation”