Yuxiang Wei

About

I am a third-year CS PhD student at UIUC, advised by Professor Lingming Zhang, and a part-time researcher at Meta FAIR. I work on building code intelligence with large language models.

Notably, I lead the development of Magicoder and StarCoder2-Instruct, projects that have garnered over 405k downloads and 2.1k GitHub stars. The core techniques and datasets from these projects have been adopted by leading industry language models, including Meta Llama 3.1, Google CodeGemma, and IBM Granite code models.

Publications

[NeurIPS’24] Fully Transparent Self-Alignment for Code Generation
Wei, Yuxiang, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Zachary Mueller, Harm de Vries, Leandro von Werra, Arjun Guha, and Lingming Zhang. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems, 2024. https://openreview.net/forum?id=xXRnUU7xTL.
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining
Wei, Yuxiang, Hojae Han, and Rajhans Samdani. arXiv preprint arXiv:2409.02326, 2024. https://arxiv.org/abs/2409.02326.
[COLM’24] Evaluating Language Models for Efficient Code Generation
Liu, Jiawei, Songrun Xie, Junhao Wang, Yuxiang Wei, Yifeng Ding, and Lingming Zhang. In First Conference on Language Modeling, 2024. https://openreview.net/forum?id=IBCBMeAhmC.
[LCFM@ICML’24] RepoQA: Evaluating Long Context Code Understanding
Liu, Jiawei, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding, Yuhan Katherine Wang, Jun Yang, and Lingming Zhang. In First Workshop on Long-Context Foundation Models @ ICML 2024, 2024. https://openreview.net/forum?id=hK9YSrFuGf.
StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation
Wei, Yuxiang, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Harm de Vries, Leandro von Werra, Arjun Guha, and Lingming Zhang. Hugging Face blog, 2024. https://huggingface.co/blog/sc2-instruct.
StarCoder 2 and The Stack v2: The Next Generation
Lozhkov, Anton, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, et al. arXiv preprint arXiv:2402.19173, 2024.
[ACL’24] XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
Ding, Yifeng, Jiawei Liu, Yuxiang Wei, and Lingming Zhang. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 12941–55. Bangkok, Thailand: Association for Computational Linguistics, 2024. https://aclanthology.org/2024.acl-long.699.
[ICML’24] Magicoder: Empowering Code Generation with OSS-Instruct
Wei, Yuxiang, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang. In Proceedings of the 41st International Conference on Machine Learning, 235:52632–57. Proceedings of Machine Learning Research. PMLR, 2024. https://proceedings.mlr.press/v235/wei24h.html.
[ESEC/FSE’23] Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair
Wei, Yuxiang, Chunqiu Steven Xia, and Lingming Zhang. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 172–84. New York, NY, USA: Association for Computing Machinery, 2023. https://doi.org/10.1145/3611643.3616271.
[ICSE’23] Automated Program Repair in the Era of Large Pre-Trained Language Models
Xia, Chunqiu Steven, Yuxiang Wei, and Lingming Zhang. In Proceedings of the 45th International Conference on Software Engineering, 1482–94. ICSE ’23. Melbourne, Victoria, Australia: IEEE Press, 2023. https://doi.org/10.1109/ICSE48619.2023.00129.
[OOPSLA’22] Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation
Liu, Jiawei, Yuxiang Wei, Sen Yang, Yinlin Deng, and Lingming Zhang. Proc. ACM Program. Lang. 6, no. OOPSLA1 (April 2022). https://doi.org/10.1145/3527317.
