|
Minghua He
I am currently a graduate student at the National Engineering Research Center for Software Engineering, Peking University.
I've also had great experiences working at Tencent, Microsoft Research Asia (MSRA), and Alibaba Group.
My research interests are broadly in Foundational LLMs and Reliable AI/Systems.
I am dedicated to bridging the gap between AI and systems in the real world, and making AI and systems more reliable.
Email / Google Scholar / GitHub
|
|
News
- [2025/10] I was selected as an in-person volunteer for EMNLP 2025!
- [2025/09] Two papers accepted to ASE 2025!
- [2025/08] One paper accepted to ASE 2025 (Direct Accept, Top 9.5%), see you in Seoul, South Korea!
- [2025/08] One paper accepted to EMNLP 2025 (Oral Presentation, Top 4.3%), see you in Suzhou, China!
- [2025/08] One paper accepted to ASE-NIER 2025!
- [2025/07] Two papers accepted to ISSRE 2025!
- [2025/03] Three papers accepted to FSE 2025, see you in Trondheim, Norway!
- [2025/01] One paper accepted to ICSE 2025, see you in Ottawa, Canada!
- [2024/08] One paper accepted to ISSRE 2024, see you in Tsukuba, Japan!
|
Research
I'm open to collaborations on related projects; feel free to contact me!
Email: hemh2120 [at] gmail dot com
Papers sorted by recency. * indicates equal contribution.
|
 |
A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models
Lingzhe Zhang*, Liancheng Fang*, Chiming Duan*, Minghua He*, Leyi Pan*, Pei Xiao, Shiyu Huang, Yunpeng Zhai, Xuming Hu, Philip S. Yu, Aiwei Liu
Paper / Code
Preprints
TL;DR: A comprehensive survey of parallel text generation techniques, from parallel decoding to the latest diffusion language models.
|
 |
MicroRemed: Benchmarking LLMs in Microservices Remediation
Lingzhe Zhang, Yunpeng Zhai, Tong Jia, Chiming Duan, Minghua He, Leyi Pan, Zhaoyang Liu, Bolin Ding, Ying Li
Paper / Code
Preprints
TL;DR: We introduce MicroRemed, the first benchmark for evaluating LLMs in end-to-end microservice remediation, from diagnosis reports directly to executable Ansible playbooks.
|
 |
CodeAD: Synthesize Code of Rules for Log-based Anomaly Detection with LLMs
Junjie Huang, Minghua He, Jinyang Liu, Yintong Huo, Domenico Bianculli, Michael R. Lyu
Paper / Code
Preprints
TL;DR: We present CodeAD, a novel framework that automatically synthesizes lightweight Python rule functions for LogAD using LLMs, achieving 3.6% F1 improvement while processing datasets 4x faster at a fraction of the cost.
|
|
Duet: Joint Exploration of User-Item Profiles
Yue Chen, Lu Wang, Minjie Hong, Pu Zhao, Fangkai Yang, Yifei Dong, Minghua He, Nan Hu, Jianjin Zhang, Zhiwei Dai, Yuefeng Zhan, Weihao Han, Hao Sun, Qingwei Lin, Weiwei Deng, Feng Sun, Qi Zhang, Saravan Rajmohan, Dongmei Zhang
Project / Paper / Code
Preprints
TL;DR: We propose DUET, a closed-loop framework for joint exploration of user-item textual profiles in recommendation systems. It distills raw data into concise cues, expands them into rich profiles via self-prompt construction, and optimizes the profiles jointly with reinforcement learning using downstream recommendation feedback, while enabling interpretable, LLM-compatible representations.
|
 |
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
Jiehui Luo, Yuguo Yin, Yuxin Xie, Jinghan Ru, Xianwei Zhuang, Minghua He, Aofan Liu, Zihan Xiong, Dongchao Yang
Paper / Code
Preprints
TL;DR: We propose Support Vector Regularization (SVR) to control optimization trajectory drift in contrastive language-audio pretraining by using an auxiliary support vector to harness rich information from negative samples while improving training stability.
|
 |
Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought
Lingzhe Zhang, Tong Jia, Kangjin Wang, Weijie Hong, Chiming Duan, Minghua He, Ying Li
Paper / Code
Preprints
TL;DR: Inspired by how human SREs operate, we introduce RCLAgent, a multi-agent recursion-of-thought framework that accurately localizes the root cause of failures in microservice systems using only a single request.
|
 |
WarriorMath: Enhancing the Mathematical Ability of Large Language Models with a Defect-aware Framework
Yue Chen*, Minghua He*, Fangkai Yang, Pu Zhao, Lu Wang, Yu Kang, Yifei Dong, Yuefeng Zhan, Hao Sun, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
Paper / Code
Preprints
TL;DR: We propose WarriorMath, a defect-aware framework that improves LLM mathematical reasoning through targeted data synthesis and progressive training.
|
 |
FusionLog: Cross-System Log-based Anomaly Detection via Fusion of General and Proprietary Knowledge
Xinlong Zhao, Tong Jia, Minghua He, Xixuan Yang, Ying Li
Paper / Code
Preprints
TL;DR: FusionLog routes unlabeled target logs by semantic similarity, applies meta-learned small models to general logs, and distills LLM-guided pseudo labels for proprietary logs to fuse knowledge without target labels.
|
 |
Generality Is Not Enough: Zero-Label Cross-System Log-Based Anomaly Detection via Knowledge-Level Collaboration
Xinlong Zhao, Tong Jia, Minghua He, Ying Li
Paper / Code
Preprints
TL;DR: GeneralLog collaborates LLMs and small models at the knowledge level, routing proprietary logs to LLMs and general logs to small models to reach 90%+ F1 under fully zero-label settings.
|
 |
LogAction: Consistent Cross-system Anomaly Detection through Logs via Active Domain Adaptation
Chiming Duan*, Minghua He*, Pei Xiao, Tong Jia, Xin Zhang, Zhewei Zhong, Xiang Luo, Yan Niu, Lingzhe Zhang, Yifan Wu, Siyu Yu, Weijie Hong, Ying Li, Gang Huang
Project / Paper / Code
CCF-A
ASE 2025
IEEE/ACM International Conference on Automated Software Engineering
TL;DR: We propose LogAction, a framework that integrates transfer and active learning to achieve high-performance cross-system anomaly detection with minimal labeling effort.
|
 |
CoorLog: Efficient-Generalizable Log Anomaly Detection via Adaptive Coordinator in Software Evolution
Pei Xiao*, Chiming Duan*, Minghua He*, Tong Jia, Yifan Wu, Jing Xu, Gege Gao, Lingzhe Zhang, Weijie Hong, Ying Li, Gang Huang
Paper / Code
CCF-A
ASE 2025
IEEE/ACM International Conference on Automated Software Engineering
TL;DR: We propose CoorLog, a framework using an adaptive coordinator for efficient and generalizable log anomaly detection, especially in evolving software systems.
|
 |
United We Stand: Towards End-to-End Log-based Fault Diagnosis via Interactive Multi-Task Learning
Minghua He*, Chiming Duan*, Pei Xiao*, Tong Jia, Siyu Yu, Lingzhe Zhang, Weijie Hong, Jing Han, Yifan Wu, Ying Li, Gang Huang
Paper / Code
TL;DR: We propose Chimera, an end-to-end framework that unifies anomaly detection and root cause localization through interactive multi-task learning and bidirectional knowledge transfer.
|
 |
Walk the Talk: Is Your Log-based Software Reliability Maintenance System Really Reliable?
Minghua He, Tong Jia, Chiming Duan, Pei Xiao, Lingzhe Zhang, Kangjin Wang, Yifan Wu, Ying Li, Gang Huang
Paper / Code
CCF-A
ASE-NIER 2025
IEEE/ACM International Conference on Automated Software Engineering
TL;DR: We introduce 'diagnostic faithfulness' as a key metric and propose FaithLog, a system that enhances model trustworthiness via a causality-guided attention mechanism.
|
 |
ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation
Minghua He*, Yue Chen*, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
Project / Paper / Code
TL;DR: ExeCoder enhances LLM-based code translation by incorporating executability representations like syntax and semantics.
|
 |
ZeroLog: Zero-Label Generalizable Cross-System Log-based Anomaly Detection
Xinlong Zhao, Tong Jia, Minghua He, Ying Li, Gang Huang
Paper / Code
CCF-B
ISSRE 2025
International Symposium on Software Reliability Engineering
TL;DR: We introduce ZeroLog, a framework for zero-label, generalizable cross-system anomaly detection using logs.
|
 |
CSLParser: A Collaborative Framework Using Small and Large Language Models for Log Parsing
Weijie Hong, Yifan Wu, Lingzhe Zhang, Chiming Duan, Pei Xiao, Minghua He, Xixuan Yang, Ying Li
Paper / Code
CCF-B
ISSRE 2025
International Symposium on Software Reliability Engineering
TL;DR: CSLParser presents a collaborative framework where small and large language models work together for efficient log parsing.
|
 |
From Few-Label to Zero-Label: An Approach for Cross-System Log-Based Anomaly Detection with Meta-Learning
Xinlong Zhao, Tong Jia, Minghua He, Yihan Wu, Ying Li, Gang Huang
Paper / Code
CCF-A
FSE-IVR 2025
ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
TL;DR: We propose FreeLog, a system-agnostic meta-learning approach for cross-system log anomaly detection that requires no labeled data from the target system.
|
 |
Exploring Variable Potential for LLM-based Log Parsing Efficiency and Reduced Costs
Jinrui Sun, Tong Jia, Minghua He, Yihan Wu, Ying Li, Gang Huang
Paper / Code
CCF-A
FSE-IVR 2025
ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
TL;DR: We propose VISTA, a variable-centric strategy that improves the efficiency and reduces the cost of LLM-based log parsing through novel sampling, caching, and ICL techniques.
|
 |
CLSLog: Collaborating Large and Small Models for Log-based Anomaly Detection
Pei Xiao, Tong Jia, Chiming Duan, Minghua He, Weijie Hong, Xixuan Yang, Yihan Wu, Ying Li, Gang Huang
Paper / Code
CCF-A
FSE-IVR 2025
ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
TL;DR: We propose CLSLog, a collaborative scheme combining LLM generalization and small model efficiency to effectively handle evolutionary logs in anomaly detection.
|
 |
Weakly-supervised Log-based Anomaly Detection with Inexact Labels via Multi-instance Learning
Minghua He, Tong Jia, Chiming Duan, Huaqian Cai, Ying Li, Gang Huang
Paper / Code
CCF-A
ICSE 2025
IEEE/ACM International Conference on Software Engineering
TL;DR: We propose MIDLog, a weakly-supervised method using multi-instance learning to enable log anomaly detection with inexact, bag-level labels instead of fine-grained annotation.
|
 |
LLMeLog: An Approach for Anomaly Detection based on LLM-enriched Log Events
Minghua He, Tong Jia, Chiming Duan, Huaqian Cai, Ying Li, Gang Huang
Paper / Code
CCF-B
ISSRE 2024
International Symposium on Software Reliability Engineering
TL;DR: We propose LLMeLog, which leverages LLMs to enrich log event semantics and fine-tunes a BERT model on the enriched data, significantly boosting anomaly detection accuracy.
|
Miscellaneous
Outside of research, I recharge through travel, fitness, and landscape photography, using my camera to explore the world, from cherry-blossom-laden Tokyo to culture-steeped Paris.
I've chased auroras at the edge of the earth and I'm already dreaming of the next adventure.
Tokyo
|
Bergen
|
Paris
|
Murmansk
|
|
Thank you for visiting! Feel free to contact me if you have any questions.
The design of this website is based on Jon Barron's website.
Last Update: October, 2025
|
|