MUST Ph.D. Student Unveils Breakthrough Research at KDD 2026: Proposing ToxiMol, the World's First Molecular Toxicity Repair Benchmark
Lin Fei, a third-year doctoral student in the Intelligent Science and Systems program at the Faculty of Innovation Engineering (FIE), Macau University of Science and Technology (MUST), has made a significant mark at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026). As first author, Lin presented the groundbreaking paper “Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?”, which was selected for an Oral Presentation, an honor reserved for the top 20% of accepted papers. The study was directed by corresponding author Professor Fei-Yue Wang of FIE, MUST, and was conducted in collaboration with leading institutions including Shanghai Jiao Tong University, the Chinese Academy of Sciences (Institute of Automation and Institute of Process Engineering), the Shanghai Artificial Intelligence Laboratory, and Ningbo University.

The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) is a globally recognized premier venue, rated Tier A by the China Computer Federation (CCF), and it consistently ranks among the top venues in Google Scholar Metrics for data mining and artificial intelligence. Securing an Oral Presentation in the highly competitive “AI for Sciences” Track is a major milestone. Notably, this is the first paper with MUST as the lead institution to be published at KDD, underscoring the university’s growing research strength in the interdisciplinary field of AI for Science.
In drug discovery, researchers often face a familiar challenge: a candidate molecule may show strong therapeutic promise while also carrying risks such as liver toxicity, cardiotoxicity, or mutagenicity. For medicinal chemists, the ideal solution is usually not to discard the original molecule and start over. Instead, they aim to make precise changes to the structural fragments that may cause toxicity while preserving the molecule’s core structure and drug-development potential. This may sound like simply “editing a molecule,” but in practice it requires deep expertise, careful judgment, and repeated experimentation. Researchers must understand why a molecule may be toxic, whether the modified molecule remains drug-like, whether it can be synthesized, and whether fixing one toxicity problem may introduce new barriers in the drug-development process. For these reasons, molecular toxicity repair is one of the most challenging tasks in drug discovery. Traditional molecular detoxification also depends heavily on iterative trial and error by experienced medicinal chemists, which keeps costs high.
In recent years, multimodal large language models (MLLMs) have advanced quickly in image understanding, text-based reasoning, and complex generation tasks. They are also beginning to be explored for scientific problems such as molecular design and drug discovery. This raises an important question that has not yet been systematically tested: when a model is shown a toxic molecule and is asked to reduce a specific type of toxicity, can it make reasonable structure-level modifications the way a medicinal chemist would? Is the model truly learning the relationship between molecular structure and toxicity, or is it merely generating an answer that looks like a molecule?
To answer this question, the research team introduced ToxiMol, the world’s first benchmark task designed specifically to evaluate the ability of general-purpose MLLMs to repair molecular toxicity. Unlike traditional tasks that focus on predicting whether a molecule is toxic, ToxiMol addresses a more advanced structure-level repair problem: given a real toxic molecule and its corresponding toxicity type, the model is asked to generate a new molecular structure that reduces the target toxicity while preserving as much of the original molecule’s core properties and drug-development potential as possible. ToxiMol includes 11 major toxicity-repair tasks, covering representative toxicity endpoints such as LD50 (median lethal dose), DILI (drug-induced liver injury), and AMES mutagenicity. The benchmark is built on 660 real toxic molecules with high structural complexity and diverse toxicity mechanisms. It evaluates not only whether a model can generate a chemically valid molecule, but also whether it can balance several competing objectives at once, as medicinal chemists must in real-world drug discovery.

To evaluate the generated molecules more rigorously, the team also developed ToxiEval, a multi-criteria evaluation framework. ToxiEval uses a strict “all-constraints-pass” strategy: a repair is considered successful only if it satisfies multiple requirements at the same time, including structural validity, safety score, drug-likeness (QED), synthetic accessibility (SAS), and structural similarity to the original molecule. Together, ToxiMol and ToxiEval establish the first standardized evaluation system for structure-level molecular toxicity repair, providing important infrastructure for future research in this area.

Figure: ToxiMol molecular toxicity repair tasks and the ToxiEval multi-criteria evaluation chain.
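The “all-constraints-pass” rule can be illustrated with a short Python sketch. It assumes each candidate’s scores have already been computed (in practice, QED, SAS and fingerprints would typically come from a cheminformatics toolkit such as RDKit) and uses hypothetical threshold values and a hypothetical `Candidate` schema chosen only for illustration; ToxiEval’s actual cutoffs and safety check are defined in the paper and the project repository, not here. Structural similarity is computed as Tanimoto similarity over sets of fingerprint bits.

```python
from dataclasses import dataclass

# HYPOTHETICAL thresholds for illustration only; not ToxiEval's actual cutoffs.
QED_MIN = 0.5         # drug-likeness (0..1): higher is more drug-like
SAS_MAX = 6.0         # synthetic accessibility (1 easy .. 10 hard): lower is better
SIMILARITY_MIN = 0.4  # minimum Tanimoto similarity to the original molecule


@dataclass
class Candidate:
    """Precomputed scores for one repaired molecule (hypothetical schema)."""
    is_valid: bool               # structure parses as a chemically valid molecule
    passes_safety: bool          # target-toxicity check judges the toxicity reduced
    qed: float
    sas: float
    fingerprint: frozenset[int]  # indices of "on" bits in a structural fingerprint


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def failed_constraints(cand: Candidate, original_fp: frozenset[int]) -> list[str]:
    """Return the name of every constraint the candidate violates."""
    failures = []
    if not cand.is_valid:
        failures.append("validity")
    if not cand.passes_safety:
        failures.append("safety")
    if cand.qed < QED_MIN:
        failures.append("qed")
    if cand.sas > SAS_MAX:
        failures.append("sas")
    if tanimoto(cand.fingerprint, original_fp) < SIMILARITY_MIN:
        failures.append("similarity")
    return failures


def is_successful_repair(cand: Candidate, original_fp: frozenset[int]) -> bool:
    # All-constraints-pass: failing any single criterion means the repair fails.
    return not failed_constraints(cand, original_fp)


def success_rate(cands: list[Candidate], original_fp: frozenset[int]) -> float:
    """Fraction of candidates that pass every constraint."""
    if not cands:
        return 0.0
    return sum(is_successful_repair(c, original_fp) for c in cands) / len(cands)


if __name__ == "__main__":
    original = frozenset({1, 2, 3, 4, 5})
    good = Candidate(True, True, 0.62, 3.1, frozenset({1, 2, 3, 4, 6}))
    weak = Candidate(True, True, 0.31, 3.1, frozenset({1, 2, 3, 4, 6}))
    print(failed_constraints(weak, original))     # ['qed']
    print(success_rate([good, weak], original))   # 0.5
```

The design choice this mirrors is that a strong score on one criterion cannot compensate for a failure on another: the second candidate is valid, safe, easy to synthesize and similar to the original, yet it still counts as a failed repair because its drug-likeness falls below the (hypothetical) QED threshold.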

This publication highlights MUST's continuous innovation and its competitive edge on the global AI for Science stage. Moving forward, MUST remains committed to supporting cutting-edge research, driving the integration of AI with healthcare and the life sciences, and contributing to global technological advancements for a smarter society.
Paper Links

Paper (arXiv): https://arxiv.org/abs/2506.10912
GitHub Project: https://github.com/HydroSophy/ToxiMol
Dataset: https://huggingface.co/datasets/HydroSophyTech/ToxiMol-benchmark