publications
A list of academic publications on machine learning systems, on-device AI, and high-performance inference engine design.
2025
- MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection. 2025
- MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference. 2025
2024
- MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices. In Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops, Dec 2024
2022
- Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning. Dec 2022