I’m a Ph.D. student in the Sky Computing Lab at UC Berkeley, advised by Prof. Ion Stoica. I received my B.Eng. in Computer Science from the ACM Honor Class at Shanghai Jiao Tong University. During my senior year, I was fortunate to work with Prof. Baris Kasikci at the University of Washington.
My research interests lie in building efficient system support for AI applications, drawing insights from both the systems and algorithm sides.
Publications
* means equal contribution
- NanoFlow: Towards Optimal Large Language Model Serving Throughput
  Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tian Tang, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci
- Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
  Jiaming Tang*, Yilong Zhao*, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han
- Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving
  Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci