I am a Ph.D. student at the University of Science and Technology of China (USTC), supervised by Prof. Linli Xu. I am currently a research intern at Alibaba DAMO, working in the field of Multi-Modal Learning. My research interests include:
- Multi-Modal LLM
- SNLP
Publications
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer, Yongxin Zhu, Bocheng Li, Yifei Xin, Linli Xu. ArXiv
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective, Yongxin Zhu, Bocheng Li, Hang Zhang, Xin Li, Linli Xu, Lidong Bing. NeurIPS 2024
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs, Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing. ArXiv
Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer, Yongxin Zhu, Dan Su, Liqiang He, Linli Xu, Dong Yu. ACL 2024
(Oral) Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction, Haoqiu Yan#, Yongxin Zhu#, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu. ACL 2024
Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation, Zhujin Gao, Junliang Guo, Xu Tan, Yongxin Zhu, Fang Zhang, Jiang Bian, Linli Xu. NAACL 2024
Few-shot Temporal Pruning Accelerates Diffusion Models for Text Generation, Bocheng Li, Zhujin Gao, Yongxin Zhu, Kun Yin, Haoyu Cao, Deqiang Jiang, Linli Xu. COLING 2024
Visual Hallucination Elevates Audio Speech Recognition, Fang Zhang, Yongxin Zhu, Xiangxiang Wang, Huang Chen, Xing Sun, Linli Xu. AAAI 2024
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation, Yongxin Zhu, Zhujin Gao, Xinyuan Zhou, Zhongyi Ye, Linli Xu. EMNLP 2023
Span-level Aspect-based Sentiment Analysis via Table Filling, Mao Zhang, Yongxin Zhu, Zhen Liu, Zhimin Bao, Yunfei Wu, Xing Sun, Linli Xu. ACL 2023
ItrievalKD: An Iterative Retrieval Framework Assisted with Knowledge Distillation for Noisy Text-to-Image Retrieval, Zhen Liu, Yongxin Zhu, Zhujin Gao, Xin Sheng, Linli Xu. PAKDD 2023
(Oral) Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA, Yongxin Zhu, Zhen Liu, Yukang Liang, Xin Li, Hao Liu, Changcun Bao, Linli Xu. AAAI 2023
Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation, Jiquan Li, Junliang Guo, Yongxin Zhu, Xin Sheng, Deqiang Jiang, Bo Ren, Linli Xu. AAAI 2022
Education
- 2020.09 - present, Ph.D. in Data Science, University of Science and Technology of China, Hefei.
- 2016.09 - 2020.07, B.Sc. in Statistics, University of Science and Technology of China, Hefei.
Internships
- 2024.01 - present, Alibaba DAMO, Hangzhou.
- 2023.07 - 2023.12, Tencent AI Lab, Beijing.
- 2023.03 - 2023.06, iFlytek Research, Hefei.
- 2021.12 - 2022.08, Tencent Youtu Lab, Hefei.