About
I am a Machine Learning Researcher at Apple MLR, specializing in Natural Language Processing (NLP) and speech technologies.
I completed my PhD in Computer Science at the University of Waterloo, with a dissertation titled “Novel Methods for Natural Language Modeling and Pretraining”, under the supervision of Professor Ming Li. Prior to my doctoral studies, I obtained a Master’s degree from the Chinese Academy of Sciences, where I began my NLP research journey under the mentorship of Professor Chengqing Zong.
Academic Service
- Area Chair for ACL Rolling Review
- Workshop Organizer: Embodied AI Workshop at CVPR (2024, 2025), WideningNLP at EMNLP 2025, VLM4RWD at NeurIPS 2025
- Outstanding Reviewer Award at ICML 2022
- Reviewer for ACL, EMNLP, ICML, NeurIPS, ICLR, and COLM
Research Interests
My research spans a broad range of Natural Language Processing topics, including language model pretraining, text-speech joint modeling, sentence representation learning, text summarization, machine translation, spoken language understanding, multilingual NLP, and information retrieval.
Currently, my work focuses on advancing audio and text sequence modeling.
Selected Publications
Santiago Cuervo, Skyler Seto, Maureen de Seyssel, He Bai, Zijin Gu, Tatiana Likhomanenko, Navdeep Jaitly, Zakaria Aldeneh. Closing the Gap Between Text and Speech Understanding in LLMs. ICLR 2026.
He Bai, Zijin Gu, Tatiana Likhomanenko, Navdeep Jaitly. SpeakStream: Streaming Text-to-Speech with Interleaved Data. Preprint, 2025.
Tatiana Likhomanenko*, Luke Carlson*, He Bai*, Zijin Gu*, Han Tran*, Zakaria Aldeneh*, Yizhe Zhang, Ruixiang Zhang, Huangjie Zheng, Navdeep Jaitly. ChipChat: Low-Latency Cascaded Conversational Agent in MLX. IEEE ASRU 2025 (Best Demo). (*equal)
Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, Navdeep Jaitly. Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling. ACL 2024.
Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, V.G. Vydiswaran, Navdeep Jaitly, Yizhe Zhang. Divide-or-Conquer? Which Part Should You Distill Your LLM? EMNLP 2024.
Yizhe Zhang*, He Bai*, Ruixiang Zhang*, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly. How Far Are We from Intelligent Visual Deductive Reasoning? COLM 2024. [pdf] [code] (*equal)
He Bai*, Tatiana Likhomanenko*, Ruixiang Zhang, Zijin Gu, Zakaria Aldeneh, Navdeep Jaitly. dMel: Speech Tokenization Made Simple. Preprint, 2024. [pdf] (*equal)
He Bai*, Renjie Zheng*, Junkun Chen, Mingbo Ma, Xintong Li, Liang Huang. A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing. ICML 2022 (spotlight). [pdf] [code] (*equal)
He Bai, Tong Wang, Alessandro Sordoni, Peng Shi. Better Language Model with Hypernym Class Prediction. ACL 2022. [pdf] [code]
He Bai, Peng Shi, Jimmy Lin, Luchen Tan, Kun Xiong, Wen Gao, Jie Liu, Ming Li. Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation. ACL 2021. [pdf] [code]
He Bai, Peng Shi, Jimmy Lin, Yuqing Xie, Luchen Tan, Kun Xiong, Wen Gao, Ming Li. Segatron: Segment-aware Transformer for Language Modeling and Understanding. AAAI 2021. [pdf] [code]
He Bai, Yu Zhou, Jiajun Zhang, Chengqing Zong. Memory Consolidation for Contextual Spoken Language Understanding with Dialogue Logistic Inference. ACL 2019. [pdf] [code]