Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System

¹Shanghai Artificial Intelligence Laboratory
²Shanghai Innovation Institute
³Department of Engineering Science, University of Oxford
⁴Shanghai Institute for Science of Science
⁵Department of Information Engineering, Chinese University of Hong Kong
⁶Department of Electronic Engineering, Tsinghua University
#Co-first Authors
※Corresponding Authors
*Project Lead


Figure 1. The proposed LLM-based multi-agent system, VirSci, consists of five key steps: Collaborator Selection, where a research team is assembled; Topic Discussion, where the research topic is determined; Idea Generation, where team members propose and refine ideas; Novelty Assessment, where ideas are evaluated and voted on to select the best one; and Abstract Generation, where the selected idea is developed into a complete abstract.
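The five steps in Figure 1 can be read as a sequential pipeline over shared team state. The sketch below is only an illustration under stated assumptions, not VirSci's actual code: `TeamState`, `run_pipeline`, and the four callables (`propose_topic`, `propose_idea`, `vote`, `write_abstract`) are hypothetical names standing in for LLM-backed agents, and picking the first candidates as collaborators is a placeholder for the real selection process.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TeamState:
    """Shared state passed through the five steps (hypothetical structure)."""
    leader: str
    members: List[str] = field(default_factory=list)
    topic: str = ""
    ideas: List[str] = field(default_factory=list)
    selected_idea: str = ""
    abstract: str = ""


def run_pipeline(
    leader: str,
    candidates: List[str],
    team_size: int,
    propose_topic: Callable[[List[str]], str],
    propose_idea: Callable[[str, str], str],
    vote: Callable[[str, List[str]], int],
    write_abstract: Callable[[str], str],
) -> TeamState:
    state = TeamState(leader=leader)

    # 1. Collaborator Selection: the leader assembles a team.
    #    Placeholder rule (assumption): take the first candidates in order.
    state.members = [leader] + candidates[: team_size - 1]

    # 2. Topic Discussion: the team settles on a research topic.
    state.topic = propose_topic(state.members)

    # 3. Idea Generation: each member contributes one idea on the topic.
    state.ideas = [propose_idea(m, state.topic) for m in state.members]

    # 4. Novelty Assessment: each member votes for an idea by index,
    #    and the idea with the most votes is selected.
    ballots = Counter(vote(m, state.ideas) for m in state.members)
    winner_index, _ = ballots.most_common(1)[0]
    state.selected_idea = state.ideas[winner_index]

    # 5. Abstract Generation: the selected idea is expanded into an abstract.
    state.abstract = write_abstract(state.selected_idea)
    return state
```

In practice each callable would wrap an LLM call conditioned on an agent's profile; passing them in as parameters keeps the control flow testable with simple stubs.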

⭐ Highlights

To the best of our knowledge, VirSci is the first multi-agent system with a scientific research ecosystem for conducting and benchmarking scientific collaborations; it uses real data for both role-playing and objective evaluation.
To simulate a reliable scientific collaboration process, we propose an end-to-end pipeline that spans team organization to idea generation. A novel inter- and intra-team discussion mechanism is introduced to enrich the communication topology and make the simulation more realistic.
Extensive experiments demonstrate that multi-agent collaboration improves outcome quality, surpassing existing methods. We also conduct systematic experiments to investigate the factors that influence the system. The findings align with established principles from the Science of Science, such as the tendency of fresh teams to generate more innovative research, which supports VirSci's reliability as a tool for autonomous scientific discovery.

👀 Abstract

The rapid pace of scientific progress requires innovative tools that can accelerate knowledge discovery. Although recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practice, where diverse experts work together in teams to tackle complex problems. To address these limitations, we propose an LLM-based multi-agent system, Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas. We further investigate the collaboration mechanisms that contribute to its tendency to produce ideas with higher novelty, offering insights to guide future research and pointing toward a robust system for autonomous scientific discovery.

⚙️ Pipeline

Figure 2. Key components of the proposed system. The left section illustrates the collaborator selection process, where the team leader forms a research team. The middle section highlights the discussion routine, a fundamental part of every step in the system, where the team engages in collaborative dialogue to progress through tasks. The right section depicts the architecture of the author knowledge bank and paper database, which provide critical information used throughout the collaboration process.
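The discussion routine described for Figure 2 can be pictured as a multi-turn loop in which every member speaks once per turn and sees the dialogue so far. The following is a minimal sketch under that assumption; the round-robin speaking order, the `discuss` function, and the `respond` callable (a stand-in for an LLM-backed agent) are hypothetical and not taken from the VirSci implementation.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (speaker, text)


def discuss(
    members: List[str],
    turns: int,
    respond: Callable[[str, List[Message]], str],
    task: str,
) -> List[Message]:
    """Run a round-robin discussion and return the full dialogue history."""
    # The task prompt opens the history so every agent can see it.
    history: List[Message] = [("system", task)]
    for _ in range(turns):
        for member in members:
            # Each agent replies with the whole history so far as context.
            reply = respond(member, history)
            history.append((member, reply))
    return history
```

With this structure the number of agent calls is `len(members) * turns`, so longer discussions and larger teams both increase the amount of generated dialogue.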

📦 Experiments

Table 1. Comparisons with AI Scientist. Results show that our multi-agent system outperforms the AI Scientist across all metrics, with GPT-4o achieving the highest performance.
¹ GPT-4o API is “gpt-4o-2024-08-06”.

Figure 3. Effects of team size and discussion turns on novelty scores. Peak innovation occurs with 8 members and 5 turns, while larger teams or excessive turns hinder creativity. “Inference Cost” is the product of team size and turns.
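Since the caption defines "Inference Cost" as the product of team size and discussion turns, the cost of any configuration is simple to tabulate. The sketch below does this for a small grid; the function names and the grid values passed in the usage example are illustrative assumptions, and only the 8-member, 5-turn setting comes from the caption.

```python
from typing import Dict, Iterable, Tuple


def inference_cost(team_size: int, turns: int) -> int:
    """Inference cost as defined in the Figure 3 caption: team size x turns."""
    return team_size * turns


def cost_grid(
    team_sizes: Iterable[int], turn_counts: Iterable[int]
) -> Dict[Tuple[int, int], int]:
    """Map every (team_size, turns) pair to its inference cost."""
    turn_list = list(turn_counts)
    return {
        (size, turns): inference_cost(size, turns)
        for size in team_sizes
        for turns in turn_list
    }


# Hypothetical grid; the reported peak setting (8 members, 5 turns) costs 40.
grid = cost_grid([4, 8], [3, 5])
print(grid[(8, 5)])
```

Reading novelty against this cost makes it easier to see that the configuration with the highest cost is not necessarily the most innovative one.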

Figure 4. Effects of team freshness on novelty. The balance of new and returning collaborators in the team has a notable impact on novelty, with 50% freshness yielding the highest historical dissimilarity and overall novelty, particularly in larger teams (size 8).
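To make "freshness" concrete, one possible way to compute it is as the fraction of team members with no prior collaboration with anyone else on the team. This definition is an assumption borrowed from common usage in the Science of Science literature, not one stated on this page, and `team_freshness` and the pair-set representation of past collaborations are hypothetical.

```python
from typing import FrozenSet, List, Set


def team_freshness(team: List[str], past_pairs: Set[FrozenSet[str]]) -> float:
    """Fraction of members who never collaborated with any teammate before.

    `past_pairs` holds unordered author pairs with a prior collaboration
    (assumed representation).
    """
    if not team:
        raise ValueError("team must not be empty")
    fresh = 0
    for member in team:
        teammates = [other for other in team if other != member]
        has_history = any(
            frozenset((member, other)) in past_pairs for other in teammates
        )
        if not has_history:
            fresh += 1
    return fresh / len(team)


# Example: a and b have worked together before, c and d are new to the team.
past = {frozenset(("a", "b"))}
print(team_freshness(["a", "b", "c", "d"], past))  # 0.5
```

Under this definition, the 50% setting corresponds to a team in which half of the members are new to everyone else.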

Figure 5. Effects of team diversity on novelty. The optimal diversity level appears to be 50%, which maximizes novelty and impact across team sizes.

📌 Citation


@article{su2024two,
  title={Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation},
  author={Su, Haoyang and Chen, Renqi and Tang, Shixiang and Zheng, Xinzhe and Li, Jingzhe and Yin, Zhenfei and Ouyang, Wanli and Dong, Nanqing},
  journal={arXiv preprint arXiv:2410.09403},
  year={2024}
}