Yuxiang Wei

About

I am a PhD candidate at UIUC CS, advised by Lingming Zhang. I am also a part-time researcher at Meta FAIR, working with Sida Wang, Rishabh Singh, Daniel Fried, and Gabriel Synnaeve.

I work on training LLMs for code and software engineering, with published results spanning pretraining, midtraining, and posttraining via synthetic data and reinforcement learning.

Research impact: I led Magicoder (ICML’24) and SelfCodeAlign (NeurIPS’24), projects with 400k+ downloads and 2.2k+ GitHub stars. Their techniques have been adopted in leading industry language models, including Meta Llama 3, Google CodeGemma, and the IBM Granite code models.

News

Publications

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Wei, Yuxiang, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, Lingming Zhang, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, and Sida I. Wang. arXiv Preprint arXiv:2502.18449, 2025. https://arxiv.org/abs/2502.18449.
[NeurIPS’24] SelfCodeAlign: Self-Alignment for Code Generation
Wei, Yuxiang, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Zachary Mueller, Harm de Vries, Leandro Von Werra, Arjun Guha, and Lingming Zhang. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems, 2024. https://openreview.net/forum?id=xXRnUU7xTL.
[DL4C@ICLR’25] Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining
Wei, Yuxiang, Hojae Han, and Rajhans Samdani. In ICLR 2025 Third Workshop on Deep Learning for Code, 2025. https://openreview.net/forum?id=lP44oj9cWU.
[COLM’24] Evaluating Language Models for Efficient Code Generation
Liu, Jiawei, Songrun Xie, Junhao Wang, Yuxiang Wei, Yifeng Ding, and Lingming Zhang. In First Conference on Language Modeling, 2024. https://openreview.net/forum?id=IBCBMeAhmC.
[LCFM@ICML’24] RepoQA: Evaluating Long Context Code Understanding
Liu, Jiawei, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding, Yuhan Katherine Wang, Jun Yang, and Lingming Zhang. In First Workshop on Long-Context Foundation Models @ ICML 2024, 2024. https://openreview.net/forum?id=hK9YSrFuGf.
StarCoder 2 and the Stack V2: The Next Generation
Lozhkov, Anton, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, et al. arXiv Preprint arXiv:2402.19173, 2024. https://arxiv.org/abs/2402.19173.
[ACL’24] XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
Ding, Yifeng, Jiawei Liu, Yuxiang Wei, and Lingming Zhang. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 12941–55. Bangkok, Thailand: Association for Computational Linguistics, 2024. https://aclanthology.org/2024.acl-long.699.
[ICML’24] Magicoder: Empowering Code Generation with OSS-Instruct
Wei, Yuxiang, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang. In Proceedings of the 41st International Conference on Machine Learning, 235:52632–57. Proceedings of Machine Learning Research. PMLR, 2024. https://proceedings.mlr.press/v235/wei24h.html.
[ESEC/FSE’23] Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair
Wei, Yuxiang, Chunqiu Steven Xia, and Lingming Zhang. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 172–84. New York, NY, USA: Association for Computing Machinery, 2023. https://doi.org/10.1145/3611643.3616271.
[ICSE’23] Automated Program Repair in the Era of Large Pre-Trained Language Models
Xia, Chunqiu Steven, Yuxiang Wei, and Lingming Zhang. In Proceedings of the 45th International Conference on Software Engineering, 1482–94. ICSE ’23. Melbourne, Victoria, Australia: IEEE Press, 2023. https://doi.org/10.1109/ICSE48619.2023.00129.
[OOPSLA’22] Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation
Liu, Jiawei, Yuxiang Wei, Sen Yang, Yinlin Deng, and Lingming Zhang. Proc. ACM Program. Lang. 6, no. OOPSLA1 (April 2022). https://doi.org/10.1145/3527317.

Awards and Honors

Services

Invited Talks