I am currently a Researcher at ByteDance Seed. Prior to this, I received my Ph.D. from the University of Science and Technology of China (USTC), where I was advised by Prof. Linli Xu. My research interests include:

Unified Models for Understanding and Generation
LLM-based Interactive Systems
Multimodal Learning

Publications

2025

Heptapod: Language Modeling on Visual Signals.
Yongxin Zhu, Jiawei Chen, Yuanzhe Chen, Zhuo Chen, Dongya Jia, Jian Cong, Xiaobin Zhuang, Yuping Wang, Yuxuan Wang.
[PDF]
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer. [code]
Yongxin Zhu, Bocheng Li, Yifei Xin, Zhihua Xia, Linli Xu.
In International Conference on Computer Vision (ICCV 2025). [PDF] (Recommended by Jianlin Su) (Integrated into lucidrains’s vector-quantize-pytorch repo)
Scalable Data Synthesis through Human-like Cognitive Imitation and Data Recombination.
Zhongyi Ye, Weitai Zhang, Xinyuan Zhou, Yongxin Zhu, Ninghui Rao, Enhong Chen.
In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP-25).
Dynamic Prefix as Instructor for Incremental Named Entity Recognition: A Unified Seq2Seq Generation Framework.
Zihao Wu, YongXiang Hua, Yongxin Zhu, Fang Zhang, Linli Xu.
In Findings of the Association for Computational Linguistics: ACL 2025 (ACL-25).

2024

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective. [code]
Yongxin Zhu, Bocheng Li, Hang Zhang, Xin Li, Linli Xu, Lidong Bing.
In Proceedings of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24). [PDF]
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer. [code]
Yongxin Zhu, Dan Su, Liqiang He, Linli Xu, Dong Yu.
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL-24). [PDF] (Adopted by Moshi)
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction. [code]
Haoqiu Yan^#, Yongxin Zhu^#, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu.
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL-24). (Oral) [PDF]
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. [code]
Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing.
ArXiv [PDF]
Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation. [code]
Zhujin Gao, Junliang Guo, Xu Tan, Yongxin Zhu, Fang Zhang, Jiang Bian, Linli Xu.
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-24). [PDF]
Few-shot Temporal Pruning Accelerates Diffusion Models for Text Generation. [code]
Bocheng Li, Zhujin Gao, Yongxin Zhu, Kun Yin, Haoyu Cao, Deqiang Jiang, Linli Xu.
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (COLING-24). [PDF]
Visual Hallucination Elevates Audio Speech Recognition.
Fang Zhang, Yongxin Zhu, Xiangxiang Wang, Huang Chen, Xing Sun, Linli Xu.
In Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-24). [PDF]
Summarizing Like Human: Edit-Based Text Summarization with Keywords.
Yukang Liang, Junliang Guo, Yongxin Zhu, Linli Xu.
In 33rd International Conference on Artificial Neural Networks (ICANN-24). [PDF]

2023

DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation.
Yongxin Zhu, Zhujin Gao, Xinyuan Zhou, Zhongyi Ye, Linli Xu.
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP-23). [PDF]
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA.
Yongxin Zhu, Zhen Liu, Yukang Liang, Xin Li, Hao Liu, Changcun Bao, Linli Xu.
In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI-23). (Oral) [PDF]
Span-level Aspect-based Sentiment Analysis via Table Filling.
Mao Zhang, Yongxin Zhu, Zhen Liu, Zhimin Bao, Yunfei Wu, Xing Sun, Linli Xu.
In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL-23). [PDF]
ItrievalKD: An Iterative Retrieval Framework Assisted with Knowledge Distillation for Noisy Text-to-Image Retrieval.
Zhen Liu, Yongxin Zhu, Zhujin Gao, Xin Sheng, Linli Xu.
In the 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-23). [PDF]

2022

Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation.
Jiquan Li, Junliang Guo, Yongxin Zhu, Xin Sheng, Deqiang Jiang, Bo Ren, Linli Xu.
In Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI-22). [PDF]

Awards

2025 National Scholarship
2024 NIO Scholarship
2024 NeurIPS Scholar Award

Education

2020.09 - 2025.11, Ph.D. in Data Science, University of Science and Technology of China, Hefei.
2016.09 - 2020.07, B.Sc. in Statistics, University of Science and Technology of China, Hefei.

Internships

2024.12 - 2025.11, ByteDance Top Seed Intern, Shanghai.
2024.01 - 2024.06, Alibaba DAMO, Hangzhou.
2023.07 - 2023.12, Tencent AI Lab, Beijing.
2023.03 - 2023.06, iFlytek Research, Hefei.
2021.12 - 2022.08, Tencent Youtu Lab, Hefei.

Yongxin Zhu