diff --git a/MultiAgentEbook/README.md b/MultiAgentEbook/README.md
new file mode 100644
index 000000000..799253491
--- /dev/null
+++ b/MultiAgentEbook/README.md
@@ -0,0 +1,31 @@
+
+Multi-Agent Ebook
+
+【🏄 Go to the Website | 📚 Read the Chapters | 🧐 Learn More about our Research】
+
+
+## Multi-Agent Ebook
+
+- **Multi-Agent Ebook** presents an interactive eBook that compiles an extensive collection of research papers on large language model (LLM)-based multi-agent systems. Organized into multiple chapters and continuously updated with significant research, it strives to provide a comprehensive outline for both researchers and enthusiasts in the field. We welcome ongoing contributions to expand and enhance this resource. We thank the authors of the open-source templates used to build this website ([sparshcodes/bookmark-landing-page](https://github.com/sparshcodes/bookmark-landing-page) and [fchavonet/web-flip_book](https://github.com/fchavonet/web-flip_book)).
+

+ +

+
+## How to Contribute
+
+- **Multi-Agent Ebook** is fully open-source and we welcome everyone to collaboratively build and enhance this repository. You can add a new page to the Ebook by creating an issue! Please follow the format below to submit an issue for adding a paper related to LLM Multi-Agent to the Ebook, and we will process and merge it as soon as possible!
+
+  ```
+  Issue Title: [Ebook New Paper] {Paper Title}
+
+  Title: {Title of the Paper}
+  Authors: {All Authors of the Paper, separated by commas}
+  Date: {Paper Submission Date for the first version}
+  Abstract: {Abstract of the Paper}
+  Url: {Url of the Paper}
+  Affiliation: {Affiliations of All Authors, separated by commas}
+  ```
diff --git a/MultiAgentEbook/book_communication/data.csv b/MultiAgentEbook/book_communication/data.csv
new file mode 100755
index 000000000..a0705536d
--- /dev/null
+++ b/MultiAgentEbook/book_communication/data.csv
@@ -0,0 +1,29 @@
+,image_path,title,author,summary,affiliation
+0,./images/1d.png,AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems,"Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen","Recently, there has been an emergence of employing LLM-powered agents as believable human proxies, based on their remarkable decision-making capability. However, existing studies mainly focus on simulating human dialogue. Human non-verbal behaviors, such as item clicking in recommender systems, although implicitly exhibiting user preferences and could enhance the modeling of users, have not been deeply explored. The main reasons lie in the gap between language modeling and behavior modeling, as well as the incomprehension of LLMs about user-item relations. To address this issue, we propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering. We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Specifically, at each time step, we first prompt the user and item agents to interact autonomously. Then, based on the disparities between the agents’ decisions and real-world interaction records, user and item agents are prompted to reflect on and adjust the misleading simulations collaboratively, thereby modeling their two-sided relations. The optimized agents can also propagate their preferences to other agents in subsequent interactions, implicitly capturing the collaborative filtering idea. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions. The results show that these agents can demonstrate personalized behaviors akin to those of real-world individuals, sparking the development of next-generation user behavior simulation.","Renmin University of China, UC San Diego, Tencent"
+1,./images/agentcf_collaborative_learning_with_20231013.png,AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors,"Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou","Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. 
However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework AGENTVERSE that can effectively orchestrate a collaborative group of expert agents as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that AGENTVERSE can proficiently deploy multi-agent groups that outperform a single agent. Extensive experiments on text understanding, reasoning, coding, tool utilization, and embodied AI confirm the effectiveness of AGENTVERSE. Moreover, our analysis of agent interactions within AGENTVERSE reveals the emergence of specific collaborative behaviors, contributing to heightened group efficiency. Our code has been released at https://github.com/OpenBMB/AgentVerse/.","Tsinghua University, Beijing University of Posts and Telecommunications, Tencent Inc."
+2,./images/agentverse_facilitating_multi-agent_collaboration_20230821.png,Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates,"Haotian Wang, Xiyuan Du, Weijiang Yu, Qianglong Chen, Kun Zhu, Zheng Chu, Lian Yan, Yi Guan","Multi-agent debate systems are designed to derive accurate and consistent conclusions through adversarial interactions among agents. However, these systems often encounter challenges due to cognitive constraints, manifesting as (1) agents' obstinate adherence to incorrect viewpoints and (2) their propensity to abandon correct viewpoints. These issues are primarily responsible for the ineffectiveness of such debates. Addressing the challenge of cognitive constraints, we introduce a novel framework, the Multi-Agent Debate with Retrieval Augmented (MADRA). MADRA incorporates retrieval of prior knowledge into the debate process, effectively breaking cognitive constraints and enhancing the agents' reasoning capabilities. Furthermore, we have developed a self-selection module within this framework, enabling agents to autonomously select pertinent evidence, thereby minimizing the impact of irrelevant or noisy data. We have comprehensively tested and analyzed MADRA across six diverse datasets. The experimental results demonstrate that our approach significantly enhances performance across various tasks, proving the effectiveness of our proposed method.","Harbin Institute of Technology, Sun Yat-sen University, Zhejiang University"
+3,./images/apollo's_oracle_retrieval-augmented_reasoning_20231208.png,ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator,"Junda Zhu, Lingyong Yan, Haibo Shi, Dawei Yin, Lei Sha","Large language models (LLMs) are proven to benefit a lot from retrieval-augmented generation (RAG) in alleviating hallucinations confronted with knowledge-intensive questions. RAG adopts information retrieval techniques to inject external knowledge from semantic-relevant documents as input contexts. However, due to today’s Internet being flooded with numerous noisy and fabricating content, it is inevitable that RAG systems are vulnerable to these noises and prone to respond incorrectly. To this end, we propose to optimize the retrieval-augmented GENERATOR with an Adversarial Tuning Multi-agent system (ATM). The ATM steers the GENERATOR to have a robust perspective of useful documents for question answering with the help of an auxiliary ATTACKER agent. The GENERATOR and the ATTACKER are tuned adversarially for several iterations. 
After rounds of multi-agent iterative tuning, the GENERATOR can eventually better discriminate useful documents amongst fabrications. The experimental results verify the effectiveness of ATM and we also observe that the GENERATOR can achieve better performance compared to state-of-the-art baselines.","Beihang University, Baidu Inc."
+4,./images/atm_adversarial_tuning_multi-agent_20240528.png,Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions,"Ruochen Zhao, Wenxuan Zhang, Yew Ken Chia, Deli Zhao, Lidong Bing","As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation method that can provide robust evaluation results in a timely fashion. Currently, as static benchmarks are prone to contamination concerns, users tend to trust human voting platforms, such as Chatbot Arena. However, human annotations require extensive manual efforts. To provide an automatic, robust, and trustworthy evaluation framework, we innovatively propose the Auto-Arena of LLMs, which automates the entire evaluation process with LLM agents. Firstly, an examiner LLM devises queries. Then, a pair of candidate LLMs engage in a multi-round peer-battle around the query, during which the LLM’s true performance gaps become visible. Finally, a committee of LLM judges collectively discuss and determine the winner, which alleviates bias and promotes fairness. In our extensive experiment on the 17 newest LLMs, Auto-Arena shows the highest correlation with human preferences, providing a promising alternative to human evaluation platforms.","Nanyang Technological University, Alibaba Group, Singapore University of Technology and Design"
+5,./images/auto_arena_of_llms_20240530.png,Autonomous Agents for Collaborative Task under Information Asymmetry,"Wei Liu, Chenxi Wang, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian","Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great progress in solving complex tasks. It performs communication among agents within the system to collaboratively solve tasks, under the premise of shared information. However, when agents’ communication is leveraged to enhance human cooperation, a new challenge arises due to information asymmetry, since each agent can only access the information of its human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents’ communication towards effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents’ task-solving ability under information asymmetry. 
Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.","Tsinghua University, Beijing University of Posts and Telecommunications"
+6,./images/autonomous_agents_for_collaborative_20240621.png,Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation,"Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang","Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs’ potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a “Game-of-Thoughts”. Inspired by the efficacy of humans’ recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs’ ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others’ mental states, and the second-order involves understanding how others perceive the agent’s mental state.......","Tsinghua University, BIGAI, Technical University of Munich"
+7,./images/avalon's_game_of_thoughts_20231002.png,Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication,"Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun","Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7\% improvement in reasoning efficiency for different LLMs, and up to a 72.7\% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. 
Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication.","Tsinghua University, Tencent, Beijing University of Posts and Telecommunications"
+8,./images/beyond_natural_language_llms_20240228.png,Building Cooperative Embodied Agents Modularly with Large Language Models,"Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan","In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. While previous research either presupposes a cost-free communication channel or relies on a centralized controller with shared observations, we harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework that integrates with perception, memory, and execution. Thus building a Cooperative Embodied Language Agent CoELA, who can plan, communicate, and cooperate with others to accomplish long-horizon tasks efficiently. Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication. Though current Open LMs like LLAMA-2 still underperform, we fine-tune a CoLLAMA with data collected with our agents and show how they can achieve promising performance. We also conducted a user study for human-agent interaction and discovered that CoELA communicating in natural language can earn more trust and cooperate more effectively with humans. Our research underscores the potential of LLMs for future research in multi-agent cooperation. Videos can be found on the project website https://vis-www.cs.umass.edu/Co-LLM-Agents/.","University of Massachusetts Amherst, Tsinghua University, Shanghai Jiao Tong University, MIT, MIT-IBM Watson AI Lab"
+9,./images/building_cooperative_embodied_agents_20230705.png,"CAMEL: Communicative Agents for ""Mind"" Exploration of Large Language Model Society","Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem","The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their “cognitive” processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. 
In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond: https://github.com/camel-ai/camel.",King Abdullah University of Science and Technology
+10,./images/camel_communicative_agents_for_20230331.png,ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun","Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev.","Tsinghua University, The University of Sydney, BUPT, Modelbest Inc."
+11,./images/chatdev_communicative_agents_for_20230716.png,Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate,"Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi","Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives the research on cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to refine the solution with the feedback generated by itself iteratively. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of “tit for tat” and a judge manages the debate process to obtain a final solution. Clearly, our MAD framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation. 
Experiment results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that the adaptive break of debate and the modest level of “tit for tat” state are required for MAD to obtain good performance. Moreover, we find that LLMs might not be a fair judge if different LLMs are used for agents. Code is available at https://github.com/Skytliang/Multi-Agents-Debate.","Tsinghua University, Shanghai Jiao Tong University, Tencent AI Lab"
+12,./images/encouraging_divergent_thinking_in_20230530.png,Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate,"Kai Xiong, Xiao Ding, Yixin Cao, Ting Liu, Bing Qin","Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues. Existing works primarily focus on the inconsistency issues within a single LLM, while we complementarily explore the inter-consistency among multiple LLMs for collaboration. To examine whether LLMs can collaborate effectively to achieve a consensus for a shared goal, we focus on commonsense reasoning, and introduce a formal debate framework (FORD) to conduct a three-stage debate among LLMs with real-world scenarios alignment: fair debate, mismatched debate, and roundtable debate. Through extensive experiments on various datasets, LLMs can effectively collaborate to reach a consensus despite noticeable inter-inconsistencies, but imbalances in their abilities can lead to domination by superior LLMs. Leveraging a more advanced LLM like GPT-4 as an authoritative judge can boost collaboration performance. Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods. Codes and data are available at https://github.com/Waste-Wood/FORD.","Harbin Institute of Technology, Singapore Management University"
+13,./images/examining_inter-consistency_of_large_20230519.png,Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf,"Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu","Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence. In this work, we explore the problem of how to engage large language models (LLMs) in communication games, and in response, propose a tuning-free framework. Our approach keeps LLMs frozen, and relies on the retrieval and reflection on past communications and experiences for improvement. An empirical study on the representative and widely-studied communication game, “Werewolf”, demonstrates that our framework can effectively play Werewolf game without tuning the parameters of the LLMs. More importantly, strategic behaviors begin to emerge in our experiments, suggesting that it will be a fruitful journey to engage LLMs in communication games and associated domains.","Tsinghua University, Zhongguancun Laboratory"
+14,./images/exploring_large_language_models_20230909.png,Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. 
Bernstein","Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.","Stanford University, Google Research, Google DeepMind" +15,./images/generative_agents_interactive_simulacra_20230407.png,Improving Factuality and Reasoning in Language Models through Multiagent Debate,"Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch","Large language models (LLMs) have demonstrated remarkable capabilities inlanguage generation, understanding, and few-shot learning in recent years. Anextensive body of work has explored how their performance may be further im-proved through the tools of prompting, ranging from verification, self-consistency,or intermediate scratchpads. In this paper, we present a complementary approachto improve language responses where multiple language model instances proposeand debate their individual responses and reasoning processes over multiple roundsto arrive at a common final answer. Our findings indicate that this approachsignificantly enhances mathematical and strategic reasoning across a number oftasks. We also demonstrate that our approach improves the factual validity ofgenerated content, reducing fallacious answers and hallucinations that contem-porary models are prone to. Our approach may be directly applied to existingblack-box models and uses identical procedure and prompts for all tasks we inves-tigate. Overall, our findings suggest that such ""society of minds"" approach has thepotential to significantly advance the capabilities of LLMs and pave the way forfurther breakthroughs in language generation and understanding. 
Project websiteat https://composable-models.github.io/llm_debate/.","MIT CSAIL, Google Brain" +16,./images/improving_factuality_and_reasoning_20230523.png,Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback,"Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata","We study whether multiple large language models (LLMs) can autonomouslyimprove each other in a negotiation game by playing, reflecting, and criticizing.We are interested in this question because if LLMs were able to improve eachother, it would imply the possibility of creating strong AI agents with minimalhuman intervention. We ask two LLMs to negotiate with each other, playingthe roles of a buyer and a seller, respectively. They aim to reach a deal withthe buyer targeting a lower price and the seller a higher one. A third languagemodel, playing the critic, provides feedback to a player to improve the player’snegotiation strategies. We let the two agents play multiple rounds, using previousnegotiation history and AI feedback as in-context demonstrations to improve themodel’s negotiation strategy iteratively. We use different LLMs (GPT and Claude)for different roles and use the deal price as the evaluation metric. Our experimentsreveal multiple intriguing findings: (","University of Edinburgh, Allen Institute for AI, University of Edinburgh" +17,./images/improving_language_model_negotiation_20230517.png,Improving Multi-Agent Debate with Sparse Communication Topology,"Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie","Multi-agent debate has proven effective in im-proving large language models quality for rea-soning and factuality tasks. While various role-playing strategies in multi-agent debates havebeen explored, in terms of the communica-tion among agents, existing approaches adopta brute force algorithm – each agent can com-municate with all other agents. In this paper,we systematically investigate the effect of com-munication connectivity in multi-agent systems.Our experiments on GPT and Mistral models re-veal that multi-agent debates leveraging sparsecommunication topology can achieve compara-ble or superior performance while significantlyreducing computational costs. Furthermore, weextend the multi-agent debate framework tomultimodal reasoning and alignment labelingtasks, showcasing its broad applicability andeffectiveness. Our findings underscore the im-portance of communication connectivity on en-hancing the efficiency and effectiveness of the“society of minds” approach.","Google, Google DeepMind" +18,./images/improving_multi-agent_debate_with_20240617.png,LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay,"Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang","This paper explores the open research prob-lem of understanding the social behaviors ofLLM-based agents. Using Avalon as a testbed,we employ system prompts to guide LLMagents in gameplay. While previous studieshave touched on gameplay with LLM agents,research on their social behaviors is lacking.We propose a novel framework, tailored forAvalon, features a multi-agent system facil-itating efficient communication and interac-tion. We evaluate its performance based ongame success and analyze LLM agents’ so-cial behaviors. Results affirm the framework’seffectiveness in creating adaptive agents andsuggest LLM-based agents’ potential in nav-igating dynamic social interactions. 
By ex-amining collaboration and confrontation be-haviors, we offer insights into this field’s re-search and applications.Our code is pub-licly available at https://github.com/3DAgentWorld/LLM-Game-Agent","The Hong Kong University of Science and Technology (Guangzhou), Singapore University of Technology and Design, Singapore Management University, Verily Life Sciences, Tencent" +19,./images/llm-based_agent_society_investigation_20231023.png,LM vs LM: Detecting Factual Errors via Cross Examination,"Roi Cohen, May Hamri, Mor Geva, Amir Globerson","A prominent weakness of modern languagemodels (LMs) is their tendency to generate fac-tually incorrect text, which hinders their us-ability. A natural question is whether such fac-tual errors can be detected automatically. In-spired by truth-seeking mechanisms in law, wepropose a factuality evaluation framework forLMs that is based on cross-examination. Ourkey idea is that an incorrect claim is likely toresult in inconsistency with other claims thatthe model generates. To discover such incon-sistencies, we facilitate a multi-turn interactionbetween the LM that generated the claim andanother LM (acting as an examiner) which in-troduces questions to discover inconsistencies.We empirically evaluate our method on factualclaims made by multiple recent LMs on fourbenchmarks, finding that it outperforms exist-ing methods and baselines, often by a largegap. Our results demonstrate the potential ofusing interacting LMs to capture factual errors.","Tel Aviv University, Google DeepMind, Google Research" +20,./images/lm_vs_lm_detecting_20230522.png,PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games,"Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He","We propose PLAYER*, a novel framework that addresses the limitations of existing agent-based approaches built on Large Language Models (LLMs) in handling complex questions and understanding interpersonal relationships in dynamic environments. PLAYER* enhances path planning in Murder Mystery Games (MMGs) using an anytime sampling-based planner and a questioning-driven search framework. By equipping agents with a set of sensors, PLAYER* eliminates the need for pre-defined questions and enables agents to navigate complex social interactions. We additionally make a contribution by introducing a quantifiable evaluation method using multiple-choice questions and present WellPlay, a dataset containing 1,482 question-answer pairs. Experimental results demonstrate PLAYER*'s superiority over existing multi-agent methods, enhancing the generalisability and adaptability of agents in MMGs and paving the way for more effective multi-agent interactions.","King’s College London, Huawei London Research Centre, The Alan Turing Institute" +21,./images/player_enhancing_llm-based_multi-agent_20240426.png,RoCo: Dialectic Multi-Robot Collaboration with Large Language Models,"Zhao Mandi, Shreeya Jain, Shuran Song",": We propose a novel approach to multi-robot collaboration that har-nesses the power of pre-trained large language models (LLMs) for both high-levelcommunication and low-level path planning. Robots are equipped with LLMs todiscuss and collectively reason task strategies. They then generate sub-task plansand task space waypoint paths, which are used by a multi-arm motion planner toaccelerate trajectory planning. We also provide feedback from the environment,such as collision checking, and prompt the LLM agents to improve their plan andwaypoints in-context. 
For evaluation, we introduce RoCoBench, a 6-task bench-mark covering a wide range of multi-robot collaboration scenarios, accompaniedby a text-only dataset for agent representation and reasoning. We experimentallydemonstrate the effectiveness of our approach – it achieves high success ratesacross all tasks in RoCoBench and adapts to variations in task semantics. Our di-alog setup offers high interpretability and flexibility – in real world experiments,we show RoCo easily incorporates human-in-the-loop, where a user can commu-nicate and collaborate with a robot agent to complete tasks together. See projectwebsite project-roco.github.io for videos and code.",Columbia University +22,./images/roco_dialectic_multi-robot_collaboration_20230710.png,Scaling Large-Language-Model-based Multi-Agent Collaboration,"Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun","Pioneering advancements in large languagemodel-powered agents have underscored thedesign pattern of multi-agent collaboration,demonstrating that collective intelligence cansurpass the capabilities of each individual. In-spired by the neural scaling law, which positsthat increasing neurons leads to emergent abil-ities, this study investigates whether a simi-lar principle applies to increasing agents inmulti-agent collaboration.Technically, wepropose ::multi-agent:collaboration::networks(MACNET), which utilize directed acyclicgraphs to organize agents and streamline theirinteractive reasoning via topological ordering,with solutions derived from their dialogues.Extensive experiments show that MACNETconsistently outperforms baseline models, en-abling effective agent collaboration across var-ious network topologies and supporting coop-eration among more than a thousand agents.Notably, we observed a small-world collabo-ration phenomenon, where topologies resem-bling small-world properties achieved supe-rior performance. Additionally, we identifieda collaborative scaling law, indicating thatnormalized solution quality follows a logisticgrowth pattern as scaling agents, with collabo-rative emergence occurring much earlier thanpreviously observed instances of neural emer-gence. The code and data will be available athttps://github.com/OpenBMB/ChatDev.","Tsinghua University, Beijing University of Posts and Telecommunications" +23,./images/scaling_large-language-model-based_multi-agent_collaboration_20240611.png,The Impact of Language on Arithmetic Proficiency- A Multilingual Investigation with Cross-Agent Checking Computation,"Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, Yusuke Miyao","This paper critically examines the arithmetic capabilities of Large Language Models (LLMs), uncovering significant limitations in their performance. Our research reveals a notable decline in accuracy for complex calculations involving large numbers, with addition and subtraction tasks showing varying degrees of proficiency. Additionally, we challenge the notion that arithmetic is language-independent, finding up to a 10% difference in performance across twenty languages. The study also compares self-verification methods with cross-agent collaborations, showing that a single model often outperforms collaborative approaches in basic arithmetic tasks. 
These findings suggest a need to reassess the effectiveness of LLMs in tasks requiring numerical accuracy and precision.","AIST, University of Tokyo" +24,./images/the_impact_of_language_20240616.png,Theory of Mind for Multi-Agent Collaboration via Large Language Models,"Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, Katia Sycara","While Large Language Models (LLMs) havedemonstrated impressive accomplishments inboth reasoning and planning, their abilitiesin multi-agent collaborations remains largelyunexplored.This study evaluates LLM-based agents in a multi-agent cooperative textgame with Theory of Mind (ToM) inferencetasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) andplanning-based baselines. We observed evi-dence of emergent collaborative behaviors andhigh-order Theory of Mind capabilities amongLLM-based agents. Our results reveal limi-tations in LLM-based agents’ planning opti-mization due to systematic failures in managinglong-horizon contexts and hallucination aboutthe task state. We explore the use of explicitbelief state representations to mitigate these is-sues, finding that it enhances task performanceand the accuracy of ToM inferences for LLM-based agents.","University of Pittsburgh, Carnegie Mellon University" +25,./images/theory_of_mind_for_20231016.png,Toward Optimal LLM Alignments Using Two-Player Games,"Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu","Alignment of large language models is a critical process designed to ensure thatthe model’s responses to user prompts accurately reflect human intentions andadhere to societal values. The standard Reinforcement Learning from HumanFeedback (RLHF) framework primarily focuses on optimizing the performance oflarge language models using pre-collected prompts. However, collecting promptsthat provide comprehensive coverage is both tedious and challenging, and oftenfails to include scenarios that LLMs need to improve on the most. In this paper,we investigate alignment through the lens of two-agent games, involving iterativeinteractions between an adversarial and a defensive agent. The adversarial agent’stask at each step is to generate prompts that expose the weakness of the defensiveagent. In return, the defensive agent seeks to improve its responses to these newlyidentified prompts it “struggled"" with, based on feedback from the reward model.We theoretically demonstrate that this iterative reinforcement learning optimizationconverges to a Nash Equilibrium for the game induced by the agents. Experi-mental results in safety scenarios demonstrate that learning in such a competitiveenvironment not only fully trains agents but also leads to policies with enhancedgeneralization capabilities for both adversarial and defensive agents. Our code isreleased at https://github.com/ruizheng20/gpo.","Fudan University, Northwestern University, ByteDance Research" +26,./images/toward_optimal_llm_alignments_20240616.png,Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework,"Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan","The advent of large language models (LLMs)has facilitated the development of natural lan-guage text generation. It also poses unprece-dented challenges, with content hallucinationemerging as a significant concern. Existingsolutions often involve expensive and complexinterventions during the training process. 
More-over, some approaches emphasize problem dis-assembly while neglecting the crucial valida-tion process, leading to performance degrada-tion or limited applications. To overcome theselimitations, we propose a Markov Chain-basedmulti-agent debate verification framework toenhance hallucination detection accuracy inconcise claims. Our method integrates the fact-checking process, including claim detection,evidence retrieval, and multi-agent verification.In the verification stage, we deploy multipleagents through flexible Markov Chain-baseddebates to validate individual claims, ensuringmeticulous verification outcomes. Experimen-tal results across three generative tasks demon-strate that our approach achieves significantimprovements over baselines.","Peking University, Renmin University of China" +27,./images/towards_detecting_llms_hallucination_20240605.png,To be Continued...,Your Contributions are Welcome!,, diff --git a/MultiAgentEbook/book_communication/script.js b/MultiAgentEbook/book_communication/script.js new file mode 100755 index 000000000..566c6648a --- /dev/null +++ b/MultiAgentEbook/book_communication/script.js @@ -0,0 +1,94 @@ +document.addEventListener("DOMContentLoaded", function() { + + const csvFilePath = './book_communication/data.csv'; + + + function loadCSV(filePath) { + return fetch(filePath) + .then(response => response.text()) + .then(text => Papa.parse(text, { header: true }).data); + } + + + function createFlipBook(pages) { + const container = document.getElementById('flip_book_container'); + const numPages = pages.length; + + let flipBookHTML = ''; + let style = document.createElement('style'); + let css = ''; + + + flipBookHTML += `\n`; + for (let i = 0; i < numPages - 1; i++) { + flipBookHTML += `\n`; + } + + flipBookHTML += `
\n`; + + flipBookHTML += `
+ +
` + + + for (let i = 0; i < numPages - 1; i++) { + console.log(i) + const page = pages[i]; + const pageIndex = i + 1; + + flipBookHTML += ` +
+
+ + Back content +
+
+ + Back page edge shading +
+

${page.title}

+

${page.author}

+

${page.affiliation}

+

${page.summary}

+
+
+
\n`; + + + css += ` + #page${pageIndex} { + z-index: ${numPages - i}; + } + + #page${pageIndex}_checkbox:checked~#flip_book #page${pageIndex} { + transform: rotateY(-180deg); + z-index: ${i + 1}; + }\n`; + } + + flipBookHTML += `
+ Back Cover +
`; + + + container.innerHTML = flipBookHTML; + + + style.innerHTML = css; + document.head.appendChild(style); + + + const md = window.markdownit(); + const summaryElements = document.querySelectorAll('.summary'); + summaryElements.forEach(el => { + el.innerHTML = md.render(el.textContent); + }); + } + + + loadCSV(csvFilePath).then(pages => { + createFlipBook(pages); + }); +}); diff --git a/MultiAgentEbook/book_communication_index.html b/MultiAgentEbook/book_communication_index.html new file mode 100755 index 000000000..a974a9858 --- /dev/null +++ b/MultiAgentEbook/book_communication_index.html @@ -0,0 +1,23 @@ + + + + + + + Flip Book + + + + + +
+ + + + + + \ No newline at end of file
diff --git a/MultiAgentEbook/book_evolution/data.csv b/MultiAgentEbook/book_evolution/data.csv
new file mode 100755
index 000000000..870dede00
--- /dev/null
+++ b/MultiAgentEbook/book_evolution/data.csv
@@ -0,0 +1,11 @@
+,image_path,title,author,summary,affiliation
+0,./images/3d.png,360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System,"Shen Gao, Hao Li, Zhengliang Shi, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang","Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360° Assessment (360°REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360° performance assessment method for multi-perspective performance evaluation with fine-grained assessment. To enhance the capability of agents in addressing complex tasks, we introduce dual-level experience pool for agents to accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360°REA.","University of Electronic Science and Technology of China, Shandong University, Renmin University of China, National University of Defense Technology, Tsinghua University"
+1,./images/360°rea_towards_a_reusable_20240408.png,Affordable Generative Agents,"Yangbin Yu, Qin Zhang, Junyou Li, Qiang Fu, Deheng Ye","The emergence of large language models (LLMs) has significantly advanced the simulation of believable interactive agents. However, the substantial cost on maintaining the prolonged agent interactions poses challenge over the deployment of believable LLM-based agents. Therefore, in this paper, we develop Affordable Generative Agents (AGA), a framework for enabling the generation of believable and low-cost interactions on both agent-environment and inter-agents levels. Specifically, for agent-environment interactions, we substitute repetitive LLM inferences with learned policies; while for inter-agent interactions, we model the social relationships between agents and compress auxiliary dialogue information. Extensive experiments on multiple environments show the effectiveness and efficiency of our proposed framework. Also, we delve into the mechanisms of emergent believable behaviors lying in LLM agents, demonstrating that agents can only generate finite behaviors in fixed environments, based upon which, we understand ways to facilitate emergent interaction behaviors. Our code is publicly available at: https://github.com/AffordableGenerativeAgents/Affordable-Generative-Agents.",Tencent Inc.
+2,./images/affordable_generative_agents_20240203.png,Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu","In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). 
Our central goal is to enable a doctor agent to learn how to treat illnesswithin the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum cansimulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keepaccumulating experience from both successful and unsuccessful cases. Simulation experiments show thatthe treatment performance of doctor agents consistently improves on various tasks. More interestingly,the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicarebenchmarks. After treating around ten thousand patients (real-world doctors may take over two years),the evolved doctor agent achieves a state-of-the-art accuracy of 9",Tsinghua University +3,./images/agent_hospital_a_simulacrum_20240505.png,Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication,"Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun","Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7\% improvement in reasoning efficiency for different LLMs, and up to a 72.7\% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication.","Tsinghua University, Tencent, Beijing University of Posts and Telecommunications" +4,./images/beyond_natural_language_llms_20240228.png,Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization,"Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang","Large language model (LLM) agents have been shown effective on a wide rangeof tasks, and by ensembling multiple LLM agents, their performances could befurther improved. Existing approaches employ a fixed set of agents to interactwith each other in a static architecture, which limits their generalizability to vari-ous tasks and requires strong human prior in designing these agents. In this work,we propose to construct a strategic team of agents communicating in a dynamicinteraction architecture based on the task query. Specifically, we build a frame-work named Dynamic LLM-Agent Network (DyLAN) for LLM-agent collabora-tion on complicated tasks like reasoning and code generation. DyLAN enablesagents to interact for multiple rounds in a dynamic architecture with inference-time agent selection and an early-stopping mechanism to improve performanceand efficiency. 
We further design an automatic agent team optimization algorithmbased on an unsupervised metric termed Agent Importance Score, enabling theselection of best agents based on the contribution each agent makes. Empirically,we demonstrate that DyLAN performs well in both reasoning and code generationtasks with reasonable computational cost. DyLAN achieves 1","Tsinghua University, Georgia Tech, Stanford University" +5,./images/dynamic_llm-agent_network_an_20231003.png,Experiential Co-Learning of Software-Developing Agents,"Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun","Recent advancements in large language mod-els (LLMs) have brought significant changesto various domains, especially through LLM-driven autonomous agents. A representativescenario is in software development, whereLLM agents demonstrate efficient collabora-tion, task division, and assurance of softwarequality, markedly reducing the need for man-ual involvement. However, these agents fre-quently perform a variety of tasks indepen-dently, without benefiting from past experi-ences, which leads to repeated mistakes andinefficient attempts in multi-step task execu-tion. To this end, we introduce Experiential Co-Learning, a novel LLM-agent learning frame-work in which instructor and assistant agentsgather shortcut-oriented experiences from theirhistorical trajectories and use these past expe-riences for future task execution. The exten-sive experiments demonstrate that the frame-work enables agents to tackle unseen software-developing tasks more effectively. We antici-pate that our insights will guide LLM agentstowards enhanced autonomy and contributeto their evolutionary growth in cooperativelearning. The code and data are available athttps://github.com/OpenBMB/ChatDev.","Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens" +6,./images/experiential_co-learning_of_software-developing_20231228.png,Iterative Experience Refinement of Software-Developing Agents,"Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun","Autonomous agents powered by large languagemodels (LLMs) show significant potential forachieving high autonomy in various scenar-ios such as software development. Recent re-search has shown that LLM agents can lever-age past experiences to reduce errors and en-hance efficiency. However, the static experi-ence paradigm, reliant on a fixed collection ofpast experiences acquired heuristically, lacksiterative refinement and thus hampers agents’adaptability. In this paper, we introduce the It-erative Experience Refinement framework, en-abling LLM agents to refine experiences itera-tively during task execution. We propose twofundamental patterns: the successive pattern,refining based on nearest experiences within atask batch, and the cumulative pattern, acquir-ing experiences across all previous task batches.Augmented with our heuristic experience elim-ination, the method prioritizes high-quality andfrequently-used experiences, effectively man-aging the experience space and enhancing effi-ciency. 
Extensive experiments show that whilethe successive pattern may yield superior re-sults, the cumulative pattern provides more sta-ble performance......","Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens" +7,./images/iterative_experience_refinement_of_20240507.png,Language Agents as Optimizable Graphs,"Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber","Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. ","King Abdullah University of Science and Technology, The Swiss AI Lab IDSIA, USI, SUPSI" +8,./images/language_agents_as_optimizable_20240226.png,Lyfe Agents: Generative agents for low-cost real-time social interactions,"Zhao Kaiya, Michelangelo Naim, Jovana Kondic, Manuel Cortes, Jiaxin Ge, Shuying Luo, Guangyu Robert Yang, Andrew Ahn","Highly autonomous generative agents powered by large language models promise to simulate intricate social behaviors in virtual societies. However, achieving real-time interactions with humans at a low computational cost remains challenging. Here, we introduce Lyfe Agents. They combine low-cost with real-time responsiveness, all while remaining intelligent and goal-oriented. Key innovations include: (1) an option-action framework, reducing the cost of high-level decisions; (2) asynchronous self-monitoring for better self-consistency; and (3) a Summarize-and-Forget memory mechanism, prioritizing critical memory items at a low cost. We evaluate Lyfe Agents' self-motivation and sociability across several multi-agent scenarios in our custom LyfeGame 3D virtual environment platform. When equipped with our brain-inspired techniques, Lyfe Agents can exhibit human-like self-motivated social reasoning. For example, the agents can solve a crime (a murder mystery) through autonomous collaboration and information exchange. Meanwhile, our techniques enabled Lyfe Agents to operate at a computational cost 10-100 times lower than existing alternatives. 
Our findings underscore the transformative potential of autonomous generative agents to enrich human social experiences in virtual worlds.","Massachusetts Institute of Technology, Peking University, LyfeAL" +9,./images/lyfe_agents_generative_agents_20231003.png,To be Continued...,Your Contributions are Welcome!,, diff --git a/MultiAgentEbook/book_evolution/script.js b/MultiAgentEbook/book_evolution/script.js new file mode 100755 index 000000000..b10dd0587 --- /dev/null +++ b/MultiAgentEbook/book_evolution/script.js @@ -0,0 +1,94 @@ +document.addEventListener("DOMContentLoaded", function() { + + const csvFilePath = './book_evolution/data.csv'; + + + function loadCSV(filePath) { + return fetch(filePath) + .then(response => response.text()) + .then(text => Papa.parse(text, { header: true }).data); + } + + + function createFlipBook(pages) { + const container = document.getElementById('flip_book_container'); + const numPages = pages.length; + + let flipBookHTML = ''; + let style = document.createElement('style'); + let css = ''; + + + flipBookHTML += `\n`; + for (let i = 0; i < numPages - 1; i++) { + flipBookHTML += `\n`; + } + + flipBookHTML += `
\n`; + + flipBookHTML += `
+ +
` + + + for (let i = 0; i < numPages - 1; i++) { + const page = pages[i]; + const pageIndex = i + 1; + + flipBookHTML += ` +
+
+ + Back content +
+
+ + Back page edge shading +
+

${page.title}

+

${page.author}

+

${page.affiliation}

+

${page.summary}

+
+
+
\n`; + + + css += ` + #page${pageIndex} { + z-index: ${numPages - i}; + } + + #page${pageIndex}_checkbox:checked~#flip_book #page${pageIndex} { + transform: rotateY(-180deg); + z-index: ${i + 1}; + }\n`; + } + + flipBookHTML += `
+ Back Cover +
`; + + + container.innerHTML = flipBookHTML; + + + style.innerHTML = css; + document.head.appendChild(style); + + + const md = window.markdownit(); + const summaryElements = document.querySelectorAll('.summary'); + summaryElements.forEach(el => { + el.innerHTML = md.render(el.textContent); + }); + } + + + loadCSV(csvFilePath).then(pages => { + createFlipBook(pages); + }); +}); diff --git a/MultiAgentEbook/book_evolution_index.html b/MultiAgentEbook/book_evolution_index.html new file mode 100755 index 000000000..6af9e077f --- /dev/null +++ b/MultiAgentEbook/book_evolution_index.html @@ -0,0 +1,23 @@ + + + + + + + Flip Book + + + + + +
+ + + + + + \ No newline at end of file diff --git a/MultiAgentEbook/book_organization/data.csv b/MultiAgentEbook/book_organization/data.csv new file mode 100755 index 000000000..66705d360 --- /dev/null +++ b/MultiAgentEbook/book_organization/data.csv @@ -0,0 +1,42 @@ +,image_path,title,author,summary,affiliation +0,./images/2d.png,(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts,"Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang","Recent advancements in machine translation (MT) have significantly enhancedtranslation quality across various domains. However, the translation of literarytexts remains a formidable challenge due to their complex language, figurative ex-pressions, and cultural nuances. In this work, we introduce a novel multi-agentframework based on large language models (LLMs) for literary translation, im-plemented as a company called TRANSAGENTS, which mirrors traditional trans-lation publication process by leveraging the collective capabilities of multipleagents, to address the intricate demands of translating literary works. To evaluatethe effectiveness of our system, we propose two innovative evaluation strategies:Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).MHP assesses translations from the perspective of monolingual readers of the tar-get language, while BLP uses advanced LLMs to compare translations directlywith the original texts. Empirical findings indicate that despite lower d-BLEUscores, translations from TRANSAGENTS are preferred by both human evalua-tors and LLMs over human-written references, particularly in genres requiringdomain-specific knowledge. We also highlight the strengths and limitations ofTRANSAGENTS through case studies and suggests directions for future research.","Monash University, University of Macau, Tencent AI Lab" +1,./images/(perhaps)_beyond_human_translation_20240520.png,Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu","In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates theentire process of treating illness. All patients, nurses, and doctors are autonomous agents powered bylarge language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illnesswithin the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum cansimulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keepaccumulating experience from both successful and unsuccessful cases. Simulation experiments show thatthe treatment performance of doctor agents consistently improves on various tasks. More interestingly,the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicarebenchmarks. 
After treating around ten thousand patients (real-world doctors may take over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 9",Tsinghua University +2,./images/agent_hospital_a_simulacrum_20240505.png,AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,"Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang","AutoGen2 is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic framework for building diverse applications of various complexities and LLM capacities. Empirical studies demonstrate the effectiveness of the framework in many example applications, with domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc.","Microsoft Research, Pennsylvania State University, University of Washington, Xidian University" +3,./images/autogen_enabling_next-gen_llm_20230816.png,Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation,"Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang","Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs’ potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a “Game-of-Thoughts”. Inspired by the efficacy of humans’ recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs’ ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others’ mental states, and the second-order involves understanding how others perceive the agent’s mental state.......","Tsinghua University, BIGAI, Technical University of Munich" +4,./images/avalon's_game_of_thoughts_20231002.png,Chain of Agents: Large Language Models Collaborating on Long-Context Tasks,"Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Ö. Arik","Addressing the challenge of effectively processing long contexts has become a critical issue for Large Language Models (LLMs).
Two common strategies have emerged: 1) reducing the input length, such as retrieving relevant chunks by Retrieval-Augmented Generation (RAG), and 2) expanding the context window limit of LLMs. However, both strategies have drawbacks: input reduction has no guarantee of covering the part with needed information, while window extension struggles with focusing on the pertinent information for solving the task. To mitigate these limitations, we propose Chain-of-Agents (CoA), a novel framework that harnesses multi-agent collaboration through natural language to enable information aggregation and context reasoning across various LLMs over long-context tasks. CoA consists of multiple worker agents who sequentially communicate to handle different segmented portions of the text, followed by a manager agent who synthesizes these contributions into a coherent final output. CoA processes the entire input by interleaving reading and reasoning, and it mitigates long context focus issues by assigning each agent a short context. We perform comprehensive evaluation of CoA on a wide range of long-context tasks in question answering, summarization, and code completion, demonstrating significant improvements by up to 10% over strong baselines of RAG, Full-Context, and multi-agent LLMs.","Penn State University, Google Cloud AI Research" +5,./images/chain_of_agents_large_20240604.png,ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation,"Zejun Wang, Jia Li, Ge Li, Zhi Jin","Large language models have shown good performances in generating code to meet human requirements. However, human requirements expressed in natural languages can be vague, incomplete, and ambiguous, leading large language models to misunderstand human requirements and make mistakes. Worse, it is difficult for a human user to refine the requirement. To help human users refine their requirements and improve large language models’ code generation performances, we propose ChatCoder: a method to refine the requirements via chatting with large language models. We design a chat scheme in which the large language models will guide the human users to refine their expression of requirements to be more precise, unambiguous, and complete than before. Experiments show that ChatCoder has improved existing large language models’ performance by a large margin. Besides, ChatCoder has the advantage over refine-based methods and LLMs fine-tuned via human response.",Peking University +6,./images/chatcoder_chat-based_refine_requirement_20231101.png,ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun","Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination).
These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev.","Tsinghua University, The University of Sydney, BUPT, Modelbest Inc." +7,./images/chatdev_communicative_agents_for_20230716.png,ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate,"Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu","Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost. With the emergence of large language models (LLMs), researchers have explored LLMs’ potential as alternatives for human evaluation. While these single-agent-based approaches show promise, experimental results suggest that further advancements are needed to bridge the gap between their current effectiveness and human-level evaluation quality. Recognizing that best practices of human evaluation processes often involve multiple human annotators collaborating in the evaluation, we resort to a multi-agent debate framework, moving beyond single-agent prompting strategies. The multi-agent-based approach enables a group of LLMs to synergize with an array of intelligent counterparts, harnessing their distinct capabilities and expertise to enhance efficiency and effectiveness in handling intricate tasks. In this paper, we construct a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models on open-ended questions and traditional natural language generation (NLG) tasks. We derive insights and lessons from practical scenarios where humans instigate group discussions for brainstorming and propose different communication strategies within ChatEval......","Tsinghua University, Hong Kong University of Science and Technology, Peking University" +8,./images/chateval_towards_better_llm-based_20230814.png,"CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving","Pei Chen, Boran Han, Shuai Zhang","Large Language Models (LLMs) have shown great ability in solving traditional natural language tasks and elementary reasoning tasks with appropriate prompting techniques. However, their ability is still limited in solving complicated science problems. In this work, we aim to push the upper bound of the reasoning capability of LLMs by proposing a collaborative multi-agent, multi-reasoning-path (CoMM) prompting framework. Specifically, we prompt LLMs to play different roles in a problem-solving team, and encourage different role-play agents to collaboratively solve the target task. In particular, we discover that applying different reasoning paths for different roles is an effective strategy to implement few-shot prompting approaches in the multi-agent scenarios. Empirical results demonstrate the effectiveness of the proposed methods on two college-level science problems over competitive baselines. Our further analysis shows the necessity of prompting LLMs to play different roles or experts independently.
We release the code at: https://github.com/amazon-science/comm-prompt.","Texas A&M University, Amazon Web Services" +9,"./images/comm_collaborative_multi-agent,_multi-reasoning-path_20240426.png","Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents","Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang","We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easy the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose ""Describe, Explain, Plan and Select"" (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on initial LLM-generated plan by integrating description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, which is a trainable module that ranks parallel candidate sub-goals based on the estimated steps of completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performances. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the 𝙾𝚋𝚝𝚊𝚒𝚗𝙳𝚒𝚊𝚖𝚘𝚗𝚍 grand challenge with our approach.","Peking University, University of California Los Angeles, Beijing Institute for General Artificial Intelligence" +10,"./images/describe,_explain,_plan_and_20230203.png",Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization,"Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang","Large language model (LLM) agents have been shown effective on a wide range of tasks, and by ensembling multiple LLM agents, their performances could be further improved. Existing approaches employ a fixed set of agents to interact with each other in a static architecture, which limits their generalizability to various tasks and requires strong human prior in designing these agents. In this work, we propose to construct a strategic team of agents communicating in a dynamic interaction architecture based on the task query. Specifically, we build a framework named Dynamic LLM-Agent Network (DyLAN) for LLM-agent collaboration on complicated tasks like reasoning and code generation. DyLAN enables agents to interact for multiple rounds in a dynamic architecture with inference-time agent selection and an early-stopping mechanism to improve performance and efficiency. We further design an automatic agent team optimization algorithm based on an unsupervised metric termed Agent Importance Score, enabling the selection of best agents based on the contribution each agent makes. Empirically, we demonstrate that DyLAN performs well in both reasoning and code generation tasks with reasonable computational cost.
DyLAN achieves 1","Tsinghua University, Georgia Tech, Stanford University" +11,./images/dynamic_llm-agent_network_an_20231003.png,EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities,"Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao","The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics. Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors are often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents’ decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent.",Tsinghua University +12,./images/econagent_large_language_model-empowered_20231016.png,Experiential Co-Learning of Software-Developing Agents,"Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun","Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents. A representative scenario is in software development, where LLM agents demonstrate efficient collaboration, task division, and assurance of software quality, markedly reducing the need for manual involvement. However, these agents frequently perform a variety of tasks independently, without benefiting from past experiences, which leads to repeated mistakes and inefficient attempts in multi-step task execution. To this end, we introduce Experiential Co-Learning, a novel LLM-agent learning framework in which instructor and assistant agents gather shortcut-oriented experiences from their historical trajectories and use these past experiences for future task execution. The extensive experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively. We anticipate that our insights will guide LLM agents towards enhanced autonomy and contribute to their evolutionary growth in cooperative learning.
The code and data are available at https://github.com/OpenBMB/ChatDev.","Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens" +13,./images/experiential_co-learning_of_software-developing_20231228.png,Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf,"Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu","Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence. In this work, we explore the problem of how to engage large language models (LLMs) in communication games, and in response, propose a tuning-free framework. Our approach keeps LLMs frozen, and relies on the retrieval and reflection on past communications and experiences for improvement. An empirical study on the representative and widely-studied communication game, “Werewolf”, demonstrates that our framework can effectively play Werewolf game without tuning the parameters of the LLMs. More importantly, strategic behaviors begin to emerge in our experiments, suggesting that it will be a fruitful journey to engage LLMs in communication games and associated domains.","Tsinghua University, Zhongguancun Laboratory" +14,./images/exploring_large_language_models_20230909.png,Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting,"Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan","The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-playing capabilities of Large Language Models (LLMs), we propose to introduce a mock interview process between LLM-played interviewers and candidates. The mock interview conversations can provide additional evidence for candidate evaluation, thereby augmenting traditional person-job fitting based solely on resumes and job descriptions. However, characterizing these two roles in online recruitment still presents several challenges, such as developing the skills to raise interview questions, formulating appropriate answers, and evaluating two-sided fitness. To this end, we propose MockLLM, a novel applicable framework that divides the person-job matching process into two modules: mock interview generation and two-sided evaluation in handshake protocol, jointly enhancing their performance through collaborative behaviors between interviewers and candidates.
We design a role-playing framework as a multi-role and multi-behavior paradigm to enable a single LLM agent to effectively behave with multiple functions for both parties......","Renmin University of China, BOSS Zhipin, King Abdullah University of Science and Technology, University of Electronic Science and Technology of China" +15,./images/facilitating_multi-role_and_multi-behavior_20240528.png,GameGPT: Multi-agent Collaborative Framework for Game Development,"Dake Chen, Hanbin Wang, Yunhao Huo, Yuzhao Li, Haoyang Zhang","The large language model (LLM) based agents have demonstrated their capacity to automate and expedite software development processes. In this paper, we focus on game development and propose a multi-agent collaborative framework, dubbed GameGPT, to automate game development. While many studies have pinpointed hallucination as a primary roadblock for deploying LLMs in production, we identify another concern: redundancy. Our framework presents a series of methods to mitigate both concerns. These methods include dual collaboration and layered approaches with several in-house lexicons, to mitigate the hallucination and redundancy in the planning, task identification, and implementation phases. Furthermore, a decoupling approach is also introduced to achieve code generation with better precision.","AutoGame Research, X-Institute, University of Southern California" +16,./images/gamegpt_multi-agent_collaborative_framework_20231012.png,Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein","Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior.
By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.","Stanford University, Google Research, Google DeepMind" +17,./images/generative_agents_interactive_simulacra_20230407.png,Improving Multi-Agent Debate with Sparse Communication Topology,"Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie","Multi-agent debate has proven effective in improving large language models quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute force algorithm – each agent can communicate with all other agents. In this paper, we systematically investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance while significantly reducing computational costs. Furthermore, we extend the multi-agent debate framework to multimodal reasoning and alignment labeling tasks, showcasing its broad applicability and effectiveness. Our findings underscore the importance of communication connectivity on enhancing the efficiency and effectiveness of the “society of minds” approach.","Google, Google DeepMind" +18,./images/improving_multi-agent_debate_with_20240617.png,Iterative Experience Refinement of Software-Developing Agents,"Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun","Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents’ adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance......","Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens" +19,./images/iterative_experience_refinement_of_20240507.png,Language Agents as Optimizable Graphs,"Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber","Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs.
The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. ","King Abdullah University of Science and Technology, The Swiss AI Lab IDSIA, USI, SUPSI" +20,./images/language_agents_as_optimizable_20240226.png,Large Language Models are Diverse Role-Players for Summarization Evaluation,"Ning Wu, Ming Gong, Linjun Shou, Shining Liang, Daxin Jiang","Text summarization has a wide range of applications in many scenarios. The evaluation of the quality of the generated text is a complex problem. A big challenge to language evaluation is that there is a clear divergence between existing metrics and human evaluation. A document summary’s quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most of the automatic evaluation methods like BLUE/ROUGE may be not able to adequately capture the above dimensions. In this paper, we propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects. First, we propose to model objective and subjective dimensions of generated text based on roleplayers prompting mechanism. Furthermore, we introduce a context-based prompting mechanism that is able to generate dynamic roleplayer profiles based on input context. Finally, we design a multi-roleplayer prompting technology based on batch prompting and integrate multiple outputs into the final evaluation results. Experimental results on three real datasets for summarization show that our model is highly competitive and has a very high consistency with human annotators.",Microsoft +21,./images/large_language_models_are_20230327.png,Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game,"Qianqiao Xu, Zhiliang Tian, Hongyan Wu, Zhen Huang, Yiping Song, Feng Liu, Dongsheng Li","With the enhanced performance of large models on natural language processing tasks, potential moral and ethical issues of large models arise. There exist malicious attackers who induce large models to jailbreak and generate information containing illegal, privacy-invasive information through techniques such as prompt engineering. As a result, large models counter malicious attackers’ attacks using techniques such as safety alignment. However, the strong defense mechanism of the large model through rejection replies is easily identified by attackers and used to strengthen attackers’ capabilities. In this paper, we propose a multi-agent attacker-disguiser game approach to achieve a weak defense mechanism that allows the large model to both safely reply to the attacker and hide the defense intent. First, we construct a multi-agent framework to simulate attack and defense scenarios, playing different roles to be responsible for attack, disguise, safety evaluation, and disguise evaluation tasks.
After that, we design attack and disguise game algorithms to optimize the game strategies of the attacker and the disguiser and use the curriculum learning process to strengthen the capabilities of the agents. The experiments verify that the method in this paper is more effective in strengthening the model’s ability to disguise the defense intent compared with other methods. Moreover, our approach can adapt any black-box large model to assist the model in defense and does not suffer from model version iterations.","National University of Defense Technology, Guangdong University of Foreign Studies, " +22,./images/learn_to_disguise_avoid_20240403.png,Leveraging Large Language Models for Collective Decision-Making,"Marios Papachristou, Longqi Yang, Chin-Chia Hsu","In various work contexts, such as meeting scheduling, collaborating, and project planning, collective decision-making is essential but often challenging due to diverse individual preferences, varying work focuses, and power dynamics among members. To address this, we propose a system leveraging Large Language Models (LLMs) to facilitate group decision-making by managing conversations and balancing preferences among individuals. Our system aims to extract individual preferences from conversations and suggest options that satisfy the preferences of the members. We specifically apply this system to corporate meeting scheduling. We create synthetic employee profiles and simulate conversations at scale, leveraging LLMs to evaluate the system performance as a novel approach to conducting a user study. Our results indicate efficient coordination with reduced interactions between the members and the LLM-based system. The system refines and improves its proposed options over time, ensuring that many of the members' individual preferences are satisfied in an equitable way. Finally, we conduct a survey study involving human participants to assess our system's ability to aggregate preferences and reasoning about them. Our findings show that the system exhibits strong performance in both dimensions","Cornell University, Microsoft" +23,./images/leveraging_large_language_models_20231103.png,LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay,"Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang","This paper explores the open research problem of understanding the social behaviors of LLM-based agents. Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay. While previous studies have touched on gameplay with LLM agents, research on their social behaviors is lacking. We propose a novel framework, tailored for Avalon, features a multi-agent system facilitating efficient communication and interaction. We evaluate its performance based on game success and analyze LLM agents’ social behaviors. Results affirm the framework’s effectiveness in creating adaptive agents and suggest LLM-based agents’ potential in navigating dynamic social interactions.
By examining collaboration and confrontation behaviors, we offer insights into this field’s research and applications. Our code is publicly available at https://github.com/3DAgentWorld/LLM-Game-Agent","The Hong Kong University of Science and Technology (Guangzhou), Singapore University of Technology and Design, Singapore Management University, Verily Life Sciences, Tencent" +24,./images/llm-based_agent_society_investigation_20231023.png,LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration,"Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang","Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows have been notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as lost in the middle. In this paper, we propose LONGAGENT, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority in long-text processing compared to GPT-",Fudan University +25,./images/longagent_scaling_language_models_20240218.png,MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents,"Yuan Li, Yixuan Zhang, Lichao Sun","Significant advancements have occurred in the application of Large Language Models (LLMs) for various tasks and social simulations. Despite this, their capacities to coordinate within task-oriented social contexts are under-explored. Such capabilities are crucial if LLMs are to effectively mimic human-like social behavior and produce meaningful results. To bridge this gap, we introduce collaborative generative agents, endowing LLM-based Agents with consistent behavior patterns and task-solving abilities. We situate these agents in a simulated job fair environment as a case study to scrutinize their coordination skills. We propose a novel framework that equips collaborative generative agents with human-like reasoning abilities and specialized skills. Our evaluation demonstrates that these agents show promising performance. However, we also uncover limitations that hinder their effectiveness in more complex coordination tasks. Our work provides valuable insights into the role and evolution of LLMs in task-oriented social simulations.","University of Cambridge, William & Mary, Lehigh University" +26,./images/metaagents_simulating_interactions_of_20231010.png,MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework,"Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, Jürgen Schmidhuber","Remarkable progress has been made on automated problem solving through societies of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs.
Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. On collaborative software engineering benchmarks, MetaGPT generates more coherent solutions than previous chat-based multi-agent systems. Our project can be found at https://github.com/geekan/MetaGPT","DeepWisdom, King Abdullah University of Science and Technology, Xiamen University, The Chinese University of Hong Kong (Shenzhen), Nanjing University, University of Pennsylvania, University of California, Berkeley, The Swiss AI Lab IDSIA/USI/SUPSI" +27,./images/metagpt_meta_programming_for_20230801.png,Mora: Enabling Generalist Video Generation via A Multi-Agent Framework,"Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun","Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled Sora's performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority being closed-source. To address this gap, this paper proposes a new multi-agent framework Mora, which incorporates several advanced visual AI agents to replicate generalist video generation demonstrated by Sora. In particular, Mora can utilize multiple visual agents and successfully mimic Sora's video generation capabilities in various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extend generated videos, (4) video-to-video editing, (5) connect videos and (6) simulate digital worlds. Our extensive experimental results show that Mora achieves performance that is proximate to that of Sora in various tasks. However, there exists an obvious performance gap between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents.","Lehigh University, Microsoft Research" +28,./images/mora_enabling_generalist_video_20240320.png,Multi-Agent Software Development through Cross-Team Collaboration,"Zhuoyun Du, Chen Qian, Wei Liu, Zihao Xie, Yifei Wang, Yufan Dang, Weize Chen, Cheng Yang","The latest breakthroughs in Large Language Models (LLMs), e.g., ChatDev, have catalyzed profound transformations, particularly through multi-agent collaboration for software development. LLM agents can collaborate in teams like humans, and follow the waterfall model to sequentially work on requirements analysis, development, review, testing, and other phases to perform autonomous software generation. However, for an agent team, each phase in a single development process yields only one possible outcome. This results in the completion of only one development chain, thereby losing the opportunity to explore multiple potential decision paths within the solution space. Consequently, this may lead to obtaining suboptimal results.
To address this challenge, we introduce Cross-Team Collaboration (CTC), a scalable multi-team framework that enables orchestrated teams to jointly propose various decisions and communicate with their insights in a cross-team collaboration environment for superior content generation. Experimental results in software development reveal a notable increase in quality compared to state-of-the-art baselines, underscoring the efficacy of our framework. The significant improvements in story generation demonstrate the promising generalization ability of our framework across various domains. We anticipate that our work will guide LLM agents towards a cross-team paradigm and contribute to their significant growth in but not limited to software development. The code and data will be available at https://github.com/OpenBMB/ChatDev.","Zhejiang University, Tsinghua University, Beijing University of Posts and Telecommunications" +29,./images/multi-agent_software_development_through_20240613.png,MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate,"Alfonso Amayuelas, Xianjun Yang, Antonis Antoniades, Wenyue Hua, Liangming Pan, William Wang","Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents, enabling interactions among multiple models to execute complex tasks. Such collaborations offer several advantages, including the use of specialized models (e.g. coding), improved confidence through multiple computations, and enhanced divergent thinking, leading to more diverse outputs. Thus, the collaborative use of language models is expected to grow significantly in the coming years. In this work, we evaluate the behavior of a network of models collaborating through debate under the influence of an adversary. We introduce pertinent metrics to assess the adversary’s effectiveness, focusing on system accuracy and model agreement. Our findings highlight the importance of a model’s persuasive ability in influencing others. Additionally, we explore inference-time methods to generate more compelling arguments and evaluate the potential of prompt-based mitigation as a defensive strategy.","UC Santa Barbara, Rutgers University" +30,./images/multiagent_collaboration_attack_investigating_20240620.png,ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs,"Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal","Large Language Models (LLMs) still struggle with natural language reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. ReConcile enhances collaborative reasoning between LLM agents via multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism that leads to a better consensus. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents.
Experiments on seven benchmarks demonstrate that ReConcile significantly improves LLMs' reasoning -- both individually and as a team -- surpassing prior single-agent and multi-agent baselines by up to 11.4% and even outperforming GPT-4 on three datasets. ReConcile also flexibly incorporates different combinations of agents, including API-based, open-source, and domain-specific models, leading to an 8% improvement on MATH. Finally, we analyze the individual components of ReConcile, demonstrating that the diversity originating from different models is critical to its superior performance.",UNC Chapel Hill +31,./images/reconcile_round-table_conference_improves_20230922.png,Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?,"Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, Yangqiu Song","Recent progress in LLMs discussion suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, where we propose a novel group discussion framework to enrich the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with strong prompts can achieve almost the same performance as the best existing discussion approach on a wide range of reasoning tasks and backbone LLMs. We observe that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt. Further study reveals the common interaction mechanisms of LLMs during the discussion.","Zhejiang University, HKUST, UIUC" +32,./images/rethinking_the_bounds_of_20240228.png,Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?,"Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, Chuchu Fan","A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered. ","Massachusetts Institute of Technology, Harvard University, MIT-IBM Watson AI Lab." +33,./images/scalable_multi-robot_collaboration_with_20230927.png,Scaling Large-Language-Model-based Multi-Agent Collaboration,"Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun","Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MACNET), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MACNET consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies resembling small-world properties achieved superior performance. Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as scaling agents, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev.","Tsinghua University, Beijing University of Posts and Telecommunications" +34,./images/scaling_large-language-model-based_multi-agent_collaboration_20240611.png,Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization,"Yoichi Ishibashi, Yoshimasa Nishimura","Recent advancements in automatic code generation using large language model (LLM) agent have brought us closer to the future of automated software development. However, existing single-agent approaches face limitations in generating and improving large-scale, complex codebases due to constraints in context length. To tackle this challenge, we propose Self-Organized multi-Agent framework (SoA), a novel multi-agent framework that enables the scalable and efficient generation and optimization of large-scale code. In SoA, self-organized agents operate independently to generate and modify code components while seamlessly collaborating to construct the overall codebase. A key feature of our framework is the automatic multiplication of agents based on problem complexity, allowing for dynamic scalability. This enables the overall code volume to be increased indefinitely according to the number of agents, while the amount of code managed by each agent remains constant. We evaluate SoA on the HumanEval benchmark and demonstrate that, compared to a single-agent system, each agent in SoA handles significantly less code, yet the overall generated code is substantially greater.
Moreover, SoA surpasses the powerful single-agent baseline by 5%......",TsukushiAI +35,./images/self-organized_agents_a_llm_20240402.png,"StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving","Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, Wai Lam","Most existing prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other instances and lack task-level consistency across the selected few-shot examples. To address these limitations, we propose a comprehensive framework, StrategyLLM, allowing LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts. It employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task. Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2\% → 38.8\%), commonsense reasoning (70.3\% → 72.5\%), algorithmic reasoning (73.7\% → 85.0\%), and symbolic reasoning (30.0\% → 79.2\%). Further analysis reveals that StrategyLLM is applicable to various LLMs and demonstrates advantages across numerous scenarios.","The Chinese University of Hong Kong, Sun Yat-sen University, Tencent AI Lab" +36,./images/strategyllm_large_language_models_20231115.png,TraveLER: A Multi-LMM Agent Framework for Video Question-Answering,"Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig","Recently, Large Multimodal Models (LMMs) have made significant progress in video question-answering using a frame-wise approach by leveraging large-scale, image-based pretraining in a zero-shot manner. While image-based methods for videos have shown impressive performance, a current limitation is that they often overlook how key timestamps are selected and cannot adjust when incorrect timestamps are identified. Moreover, they are unable to extract details relevant to the question, instead providing general descriptions of the frame. To overcome this, we design a multi-LMM agent framework that travels along the video, iteratively collecting relevant information from keyframes through interactive question-asking until there is sufficient information to answer the question. Specifically, we propose TraveLER, a model that can create a plan to “Traverse” through the video, ask questions about individual frames to “Locate” and store key information, and then “Evaluate” if there is enough information to answer the question. Finally, if there is not enough information, our method is able to “Replan” based on its collected knowledge.
Through extensive experiments, we find that the proposed TraveLER approach improves performance on several video question-answering benchmarks, such as NExT-QA, STAR, and Perception Test, without the need to fine-tune on specific datasets.","University of California, Berkeley" +37,./images/traveler_a_multi-lmm_agent_20240401.png,Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration,"Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji","Human intelligence thrives on cognitive synergy, where collaboration among different minds yield superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds’ strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination, and maintains strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models, such as GPT-","University of Illinois Urbana-Champaign, Microsoft Research Asia" +38,./images/unleashing_the_emergent_cognitive_20230711.png,User Behavior Simulation with Large Language Model based Agents,"Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen","Simulating high quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of human decision process. Recently, substantial evidences have suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence. We believe these models can provide significant opportunities to more believable user behavior simulation. To inspire such direction, we propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans. Concerning potential applications, we simulate and study two social phenomenons including (1) information cocoons and (2) user conformity behaviors.
This research provides novel simulation paradigms for human-centered applications.","Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, University College London" +39,./images/user_behavior_simulation_with_20230605.png,War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars,"Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang","Can we avoid wars at the crossroads of history? This question has been pursued byindividuals, scholars, policymakers, and organizations throughout human history.In this research, we attempt to answer the question based on the recent advancesof Artificial Intelligence (AI) and Large Language Models (LLMs). We proposeWarAgent, an LLM-powered multi-agent AI system, to simulate the participatingcountries, their decisions, and the consequences, in historical international conflicts,including the World War I (WWI), the World War II (WWII), and the WarringStates Period (WSP) in Ancient China. By evaluating the simulation effectiveness,we examine the advancements and limitations of cutting-edge AI systems’ abilitiesin studying complex collective human behaviors such as international conflictsunder diverse settings. In these simulations, the emergent interactions amongagents also offer a novel perspective for examining the triggers and conditions thatlead to war. Our findings offer data-driven and AI-augmented insights that canredefine how we approach conflict resolution and peacekeeping strategies. Theimplications stretch beyond historical analysis, offering a blueprint for using AI tounderstand human history and possibly prevent future international conflicts. Codeand data are available at https://github.com/agiresearch/WarAgent.",Rutgers University +40,./images/war_and_peace_(waragent)_20231128.png,To be Continued...,Your Contributions are Welcome!,, diff --git a/MultiAgentEbook/book_organization/script.js b/MultiAgentEbook/book_organization/script.js new file mode 100755 index 000000000..c5d61d710 --- /dev/null +++ b/MultiAgentEbook/book_organization/script.js @@ -0,0 +1,94 @@ +document.addEventListener("DOMContentLoaded", function() { + + const csvFilePath = './book_organization/data.csv'; + + + function loadCSV(filePath) { + return fetch(filePath) + .then(response => response.text()) + .then(text => Papa.parse(text, { header: true }).data); + } + + + function createFlipBook(pages) { + const container = document.getElementById('flip_book_container'); + const numPages = pages.length; + + let flipBookHTML = ''; + let style = document.createElement('style'); + let css = ''; + + + flipBookHTML += `\n`; + for (let i = 0; i < numPages - 1; i++) { + flipBookHTML += `\n`; + } + + flipBookHTML += `
\n`; + + flipBookHTML += `
+ +
` + + + for (let i = 0; i < numPages - 1; i++) { + console.log(i) + const page = pages[i]; + const pageIndex = i + 1; + + flipBookHTML += ` +
+
+ + Back content +
+
+ + Back page edge shading +
+

${page.title}

+

${page.author}

+

${page.affiliation}

+

${page.summary}

+
+
+
\n`; + + + css += ` + #page${pageIndex} { + z-index: ${numPages - i}; + } + + #page${pageIndex}_checkbox:checked~#flip_book #page${pageIndex} { + transform: rotateY(-180deg); + z-index: ${i + 1}; + }\n`; + } + + flipBookHTML += `
+ Back Cover +
`; + + + container.innerHTML = flipBookHTML; + + + style.innerHTML = css; + document.head.appendChild(style); + + + const md = window.markdownit(); + const summaryElements = document.querySelectorAll('.summary'); + summaryElements.forEach(el => { + el.innerHTML = md.render(el.textContent); + }); + } + + + loadCSV(csvFilePath).then(pages => { + createFlipBook(pages); + }); +}); diff --git a/MultiAgentEbook/book_organization_index.html b/MultiAgentEbook/book_organization_index.html new file mode 100755 index 000000000..e03105839 --- /dev/null +++ b/MultiAgentEbook/book_organization_index.html @@ -0,0 +1,23 @@ + + + + + + + Flip Book + + + + + +
+ + + + + + \ No newline at end of file diff --git a/MultiAgentEbook/book_simulation/data.csv b/MultiAgentEbook/book_simulation/data.csv new file mode 100755 index 000000000..697a7ac4e --- /dev/null +++ b/MultiAgentEbook/book_simulation/data.csv @@ -0,0 +1,34 @@ +,image_path,title,author,summary,affiliation +0,./images/4d.png,(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts,"Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang","Recent advancements in machine translation (MT) have significantly enhancedtranslation quality across various domains. However, the translation of literarytexts remains a formidable challenge due to their complex language, figurative ex-pressions, and cultural nuances. In this work, we introduce a novel multi-agentframework based on large language models (LLMs) for literary translation, im-plemented as a company called TRANSAGENTS, which mirrors traditional trans-lation publication process by leveraging the collective capabilities of multipleagents, to address the intricate demands of translating literary works. To evaluatethe effectiveness of our system, we propose two innovative evaluation strategies:Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).MHP assesses translations from the perspective of monolingual readers of the tar-get language, while BLP uses advanced LLMs to compare translations directlywith the original texts. Empirical findings indicate that despite lower d-BLEUscores, translations from TRANSAGENTS are preferred by both human evalua-tors and LLMs over human-written references, particularly in genres requiringdomain-specific knowledge. We also highlight the strengths and limitations ofTRANSAGENTS through case studies and suggests directions for future research.","Monash University, University of Macau, Tencent AI Lab" +1,./images/(perhaps)_beyond_human_translation_20240520.png,Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu","In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates theentire process of treating illness. All patients, nurses, and doctors are autonomous agents powered bylarge language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illnesswithin the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum cansimulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keepaccumulating experience from both successful and unsuccessful cases. Simulation experiments show thatthe treatment performance of doctor agents consistently improves on various tasks. More interestingly,the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicarebenchmarks. After treating around ten thousand patients (real-world doctors may take over two years),the evolved doctor agent achieves a state-of-the-art accuracy of 9",Tsinghua University +2,./images/agent_hospital_a_simulacrum_20240505.png,AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems,"Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen","Recently, there has been an emergence of employing LLM-poweredagents as believable human proxies, based on their remarkabledecision-making capability. 
However, existing studies mainly focus on simulating human dialogue. Human non-verbal behaviors, such as item clicking in recommender systems, although implicitly exhibiting user preferences and could enhance the modeling of users, have not been deeply explored. The main reasons lie in the gap between language modeling and behavior modeling, as well as the incomprehension of LLMs about user-item relations. To address this issue, we propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering. We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Specifically, at each time step, we first prompt the user and item agents to interact autonomously. Then, based on the disparities between the agents’ decisions and real-world interaction records, user and item agents are prompted to reflect on and adjust the misleading simulations collaboratively, thereby modeling their two-sided relations. The optimized agents can also propagate their preferences to other agents in subsequent interactions, implicitly capturing the collaborative filtering idea. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions. The results show that these agents can demonstrate personalized behaviors akin to those of real-world individuals, sparking the development of next-generation user behavior simulation.","Renmin University of China, UC San Diego, Tencent" +3,./images/agentcf_collaborative_learning_with_20231013.png,AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors,"Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou","Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework AGENTVERSE that can effectively orchestrate a collaborative group of expert agents as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that AGENTVERSE can proficiently deploy multi-agent groups that outperform a single agent. Extensive experiments on text understanding, reasoning, coding, tool utilization, and embodied AI confirm the effectiveness of AGENTVERSE. Moreover, our analysis of agent interactions within AGENTVERSE reveals the emergence of specific collaborative behaviors, contributing to heightened group efficiency. Our code has been released at https://github.com/OpenBMB/AgentVerse/.","Tsinghua University, Beijing University of Posts and Telecommunications, Tencent Inc." +4,./images/agentverse_facilitating_multi-agent_collaboration_20230821.png,AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis,"Zhihao Fan, Jialong Tang, Wei Chen, Siyuan Wang, Zhongyu Wei, Jun Xi, Fei Huang, Jingren Zhou","The incorporation of Large Language Models (LLMs) in healthcare marks a significant advancement.
However, the application has predominantly been limited to discriminative and question-answering tasks, which does not fully leverage their interactive potential. To address this limitation, our paper presents AI Hospital, a framework designed to build a real-time interactive diagnosis environment. To simulate the procedure, we collect high-quality medical records to create patient, examiner, and medical director agents. AI Hospital is then utilized for the interactive evaluation and collaboration of LLMs. Initially, we create a Multi-View Medical Evaluation (MVME) benchmark where various LLMs serve as intern doctors for interactive diagnosis. Subsequently, to improve diagnostic accuracy, we introduce a collaborative mechanism that involves iterative discussions and a dispute resolution process under the supervision of the medical director. In our experiments, we validate the reliability of AI Hospital. The results not only explore the feasibility of applying LLMs in clinical consultation but also confirm the effectiveness of the dispute resolution focused collaboration method.","Alibaba Inc., Huazhong University of Science and Technology, Fudan University" +5,./images/ai_hospital_interactive_evaluation_20240215.png,Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks,"Siyu Li, Jin Yang, Kui Zhao","As the capabilities of Large Language Models (LLMs) emerge, they not only assist in accomplishing traditional tasks within more efficient paradigms but also stimulate the evolution of social bots. Researchers have begun exploring the implementation of LLMs as the driving core of social bots, enabling more efficient and user-friendly completion of tasks like profile completion, social behavior decision-making, and social content generation. However, there is currently a lack of systematic research on the behavioral characteristics of LLMs-driven social bots and their impact on social networks. We have curated data from Chirper, a Twitter-like social network populated by LLMs-driven social bots and embarked on an exploratory study. Our findings indicate that: (1) LLMs-driven social bots possess enhanced individual-level camouflage while exhibiting certain collective characteristics; (2) these bots have the ability to exert influence on online communities through toxic behaviors; (3) existing detection methods are applicable to the activity environment of LLMs-driven social bots but may be subject to certain limitations in effectiveness. Moreover, we have organized the data collected in our study into the Masquerade-23 dataset, which we have publicly released, thus addressing the data void in the subfield of LLMs-driven social bots behavior datasets. Our research outcomes provide primary insights for the research and governance of LLMs-driven social bots within the research community.",Sichuan University +6,./images/are_you_in_a_20230719.png,BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis,"Shuhang Lin, Wenyue Hua, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang","This paper presents BattleAgent, a detailed emulation demonstration system that combines the Large Vision-Language Model (VLM) and Multi-Agent System (MAS). This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time.
It emulates both the decision-making processes of leaders and the viewpoints of ordinary participants, such as soldiers. The emulation showcases the current capabilities of agents, featuring fine-grained multi-modal interactions between agents and landscapes. It develops customizable agent structures to meet specific situational requirements, for example, a variety of battle-related activities like scouting and trench digging. These components collaborate to recreate historical events in a lively and comprehensive manner while offering insights into the thoughts and feelings of individuals from diverse viewpoints. The technological foundations of BattleAgent establish detailed and immersive settings for historical battles, enabling individual agents to partake in, observe, and dynamically respond to evolving battle scenarios. This methodology holds the potential to substantially deepen our understanding of historical events, particularly through individual accounts. Such initiatives can also aid historical research, as conventional historical narratives often lack documentation and prioritize the perspectives of decision-makers, thereby overlooking the experiences of ordinary individuals. This biased documentation results in a considerable gap in our historical understanding, as many stories remain untold......","Rutgers University, University of Michigan, University of Rochester" +7,./images/battleagent_multi-modal_dynamic_emulation_20240423.png,Can Large Language Model Agents Simulate Human Trust Behaviors?,"Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li","Large Language Model (LLM) agents have been increasingly adopted as simulation tools to model humans in applications such as social science. However, one fundamental question remains: can LLM agents really simulate human behaviors? In this paper, we focus on one of the most critical behaviors in human interactions, trust, and aim to investigate whether or not LLM agents can simulate human trust behaviors. We first find that LLM agents generally exhibit trust behaviors, referred to as agent trust, under the framework of Trust Games, which are widely recognized in behavioral economics. Then, we discover that LLM agents can have high behavioral alignment with humans regarding trust behaviors, particularly for GPT-4, indicating the feasibility to simulate human trust behaviors with LLM agents. In addition, we probe into the biases in agent trust and the differences in agent trust towards agents and humans. We also explore the intrinsic properties of agent trust under conditions including advanced reasoning strategies and external manipulations. We further offer important implications of our discoveries for various scenarios where trust is paramount. Our study provides new insights into the behaviors of LLM agents and the fundamental analogy between LLMs and humans.","KAUST, Illinois Institute of Technology, Pennsylvania State University, The University of Chicago, University of Oxford, California Institute of Technology" +8,./images/can_large_language_model_20240207.png,ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun","Software development is a complex task that necessitates cooperation among multiple members with diverse skills.
Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev.","Tsinghua University, The University of Sydney, BUPT, Modelbest Inc." +9,./images/chatdev_communicative_agents_for_20230716.png,CompeteAI: Understanding the Competition Dynamics in Large Language Model-based Agents,"Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie","Large language models (LLMs) have been widely used as agents to complete different tasks, such as personal assistance or event planning. While most of the work has focused on cooperation and collaboration between agents, little work explores competition, another important mechanism that promotes the development of society and economy. In this paper, we seek to examine the competition dynamics in LLM-based agents. We first propose a general framework for studying the competition between agents. Then, we implement a practical competitive environment using GPT-4 to simulate a virtual town with two types of agents, including restaurant agents and customer agents. Specifically, the restaurant agents compete with each other to attract more customers, where competition encourages them to transform, such as cultivating new operating strategies. Simulation experiments reveal several interesting findings at the micro and macro levels, which align well with existing market and sociological theories. We hope that the framework and environment can be a promising testbed to study the competition that fosters understanding of society. Code is available at: https://github.com/microsoft/competeai.","University of Science and Technology of China, Microsoft Research, William & Mary, Georgia Institute of Technology, Carnegie Mellon University" +10,./images/competeai_understanding_the_competition_20231026.png,EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities,"Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao","The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics.
Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors are often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents’ decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent.",Tsinghua University +11,./images/econagent_large_language_model-empowered_20231016.png,Epidemic Modeling with Generative Agents,"Ross Williams, Niyousha Hosseinichimeh, Aritra Majumdar, Navid Ghaffarzadegan","This study offers a new paradigm of individual-level modeling to address the grand challenge of incorporating human behavior in epidemic models. Using generative artificial intelligence in an agent-based epidemic model, each agent is empowered to make its own reasonings and decisions via connecting to a large language model such as ChatGPT. Through various simulation experiments, we present compelling evidence that generative agents mimic real-world behaviors such as quarantining when sick and self-isolation when cases rise. Collectively, the agents demonstrate patterns akin to multiple waves observed in recent pandemics followed by an endemic period. Moreover, the agents successfully flatten the epidemic curve. This study creates potential to improve dynamic system modeling by offering a way to represent human brain, reasoning, and decision making.",Virginia Tech +12,./images/epidemic_modeling_with_generative_20230711.png,Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View,"Jintian Zhang, Xin Xu, Ningyu Zhang, Ruibo Liu, Bryan Hooi, Shumin Deng","As Natural Language Processing (NLP) systems are increasingly employed in intricate social environments, a pressing query emerges: Can these NLP systems mirror human-esque collaborative intelligence, in a multi-agent society consisting of multiple large language models (LLMs)? This paper probes the collaboration mechanisms among contemporary NLP systems by melding practical experiments with theoretical insights. We fabricate four unique ‘societies’ comprised of LLM agents, where each agent is characterized by a specific ‘trait’ (easy-going or overconfident) and engages in collaboration with a distinct ‘thinking pattern’ (debate or reflection). Through evaluating these multi-agent societies on three benchmark datasets, we discern that certain collaborative strategies not only outshine previous top-tier approaches but also optimize efficiency (using fewer API tokens). Moreover, our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring foundational social psychology theories.
In conclusion, we integrate insights from social psychology to contextualize the collaboration of LLM agents, inspiring further investigations into the collaboration mechanism for LLMs. We have shared our code and datasets, hoping to catalyze further research in this promising avenue.","Zhejiang University, National University of Singapore, NUS-NCS Joint Lab, Google DeepMind" +13,./images/exploring_collaboration_mechanisms_for_20231003.png,Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein","Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.","Stanford University, Google Research, Google DeepMind" +14,./images/generative_agents_interactive_simulacra_20230407.png,Humanoid Agents: Platform for Simulating Human-like Generative Agents,"Zhilin Wang, Yu Ying Chiu, Yu Cheung Chiu","Just as computational simulations of atoms, molecules and cells have shaped the way we study the sciences, true-to-life simulations of human-like agents can be valuable tools for studying human behavior. We propose Humanoid Agents, a system that guides Generative Agents to behave more like humans by introducing three elements of System 1 processing: Basic needs (e.g. hunger, health and energy), Emotion and Closeness in Relationships. Humanoid Agents are able to use these dynamic elements to adapt their daily activities and conversations with other agents, as supported with empirical experiments. Our system is designed to be extensible to various settings, three of which we demonstrate, as well as to other elements influencing human behavior (e.g. empathy, moral values and cultural background).
Our platform also includes a Unity WebGL game interface for visualization and an interactive analytics dashboard to show agent statuses over time.","University of Washington, NVIDIA, The University of Hong Kong" +15,./images/humanoid_agents_platform_for_20231009.png,Language Agents as Digital Representatives in Collective Decision-Making,"Jarrett, Daniel and Pislar, Miruna and Bakker, Michiel A and Tessler, Michael Henry and Koster, Raphael and Balaguer, Jan and Elie, Romuald and Summerfield, Christopher and Tacchetti, Andrea","Consider the process of collective decision-making, in which a group of individuals interactively select a preferred outcome from among a universe of alternatives. In this context, “representation” is the activity of making an individual’s preferences present in the process via participation by a proxy agent—i.e. their “representative”. To this end, learned models of human behavior have the potential to fill this role, with practical implications for multi-agent scenario studies and mechanism design. In this work, we investigate the possibility of training language agents to behave in the capacity of representatives of human agents, appropriately expressing the preferences of those individuals whom they stand for. First, we formalize the setting of collective decision-making—as the episodic process of interaction between a group of agents and a decision mechanism. On this basis, we then formalize the problem of digital representation—as the simulation of an agent’s behavior to yield equivalent outcomes from the mechanism. Finally, we conduct an empirical case study in the setting of consensus-finding among diverse humans, and demonstrate the feasibility of fine-tuning large language models to act as digital representatives.",Google DeepMind +16,./images/language_agents_as_digital_20231108.png,LLM-Driven Agents for Influencer Selection in Digital Advertising Campaigns,"Xiaoqing Zhang, Xiuying Chen, Yuhan Liu, Jianzhou Wang, Zhenxing Hu, Rui Yan","In the digital world, influencers are pivotal as opinion leaders, shaping the views and choices of their influencees. Modern advertising often follows this trend, where marketers choose appropriate influencers for product endorsements, based on thorough market analysis. Previous studies on influencer selection have typically relied on numerical representations of individual opinions and interactions, a method that simplifies the intricacies of social dynamics. With the development of large language models (LLMs), we now have the opportunity to capture the nuanced exchanges of information within social networks. Hence, in this work, we first introduce an Influencer Dynamics Simulator (IDS), helping promoters identify and select the right influencers to market their products, based on LLM simulation. Concretely, we first propose an influencer-influencee engagement-based pre-selection module to screen potential influencer candidates. Subsequently, a simulation is constructed for these candidates and their influencees. Each user is represented as an LLM-based agent, drawing from their interaction history to deduce their profile and interests. The influencee agents will predict their behavior in response to influencer advertising. Finally, we develop a ranking metric designed to pinpoint influencers who are most likely to drive product purchases based on feedback from their influencees.
To evaluate our framework, we collect a real-world advertising network dataset, including social relations, post and comment content, and user behaviors.......","Renmin University of China, King Abdullah University of Science and Technology, Moonshot AI" +17,./images/llm-driven_agents_for_influencer_20240322.png,Lyfe Agents: Generative agents for low-cost real-time social interactions,"Zhao Kaiya, Michelangelo Naim, Jovana Kondic, Manuel Cortes, Jiaxin Ge, Shuying Luo, Guangyu Robert Yang, Andrew Ahn","Highly autonomous generative agents powered by large language models promise to simulate intricate social behaviors in virtual societies. However, achieving real-time interactions with humans at a low computational cost remains challenging. Here, we introduce Lyfe Agents. They combine low-cost with real-time responsiveness, all while remaining intelligent and goal-oriented. Key innovations include: (1) an option-action framework, reducing the cost of high-level decisions; (2) asynchronous self-monitoring for better self-consistency; and (3) a Summarize-and-Forget memory mechanism, prioritizing critical memory items at a low cost. We evaluate Lyfe Agents' self-motivation and sociability across several multi-agent scenarios in our custom LyfeGame 3D virtual environment platform. When equipped with our brain-inspired techniques, Lyfe Agents can exhibit human-like self-motivated social reasoning. For example, the agents can solve a crime (a murder mystery) through autonomous collaboration and information exchange. Meanwhile, our techniques enabled Lyfe Agents to operate at a computational cost 10-100 times lower than existing alternatives. Our findings underscore the transformative potential of autonomous generative agents to enrich human social experiences in virtual worlds.","Massachusetts Institute of Technology, Peking University, LyfeAL" +18,./images/lyfe_agents_generative_agents_20231003.png,MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents,"Yuan Li, Yixuan Zhang, Lichao Sun","Significant advancements have occurred in the application of Large Language Models (LLMs) for various tasks and social simulations. Despite this, their capacities to coordinate within task-oriented social contexts are under-explored. Such capabilities are crucial if LLMs are to effectively mimic human-like social behavior and produce meaningful results. To bridge this gap, we introduce collaborative generative agents, endowing LLM-based Agents with consistent behavior patterns and task-solving abilities. We situate these agents in a simulated job fair environment as a case study to scrutinize their coordination skills. We propose a novel framework that equips collaborative generative agents with human-like reasoning abilities and specialized skills. Our evaluation demonstrates that these agents show promising performance. However, we also uncover limitations that hinder their effectiveness in more complex coordination tasks. Our work provides valuable insights into the role and evolution of LLMs in task-oriented social simulations.","University of Cambridge, William & Mary, Lehigh University" +19,./images/metaagents_simulating_interactions_of_20231010.png,On Generative Agents in Recommendation,"An Zhang, Yuxin Chen, Leheng Sheng, Xiang Wang, Tat-Seng Chua","Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development.
Addressing this challenge, we envision a recommendation simulator, capitalizing on recent breakthroughs in human-level intelligence exhibited by Large Language Models (LLMs). We propose Agent4Rec, a user simulator in recommendation, leveraging LLM-empowered generative agents equipped with user profile, memory, and actions modules specifically tailored for the recommender system. In particular, these agents' profile modules are initialized using real-world datasets (e.g. MovieLens, Steam, Amazon-Book), capturing users' unique tastes and social traits; memory modules log both factual and emotional memories and are integrated with an emotion-driven reflection mechanism; action modules support a wide variety of behaviors, spanning both taste-driven and emotion-driven actions. Each agent interacts with personalized recommender models in a page-by-page manner, relying on a pre-implemented collaborative filtering-based recommendation algorithm. We delve into both the capabilities and limitations of Agent4Rec, aiming to explore an essential research question: ``To what extent can LLM-empowered generative agents faithfully simulate the behavior of real, autonomous humans in recommender systems?'' Extensive and multi-faceted evaluations of Agent4Rec highlight both the alignment and deviation between agents and user-personalized preferences. Beyond mere performance comparison, we explore insightful experiments, such as emulating the filter bubble effect and discovering the underlying causal relationships in recommendation tasks.","National University of Singapore, Tsinghua University, University of Science and Technology of China" +20,./images/on_generative_agents_in_20231016.png,"Out of One, Many: Using Language Models to Simulate Human Samples","Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, David Wingate","We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the ""algorithmic bias"" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property ""algorithmic fidelity"" and explore its extent in GPT-3. We create ""silicon samples"" by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. 
We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.",Brigham Young University +21,./images/out_of_one_many_20220914.png,Quantifying the Impact of Large Language Models on Collective Opinion Dynamics,"Chao Li, Xing Su, Haoying Han, Cong Xue, Chunmo Zheng, Chao Fan","The process of opinion expression and exchange is a critical component of democratic societies. As people interact with large language models (LLMs) in the opinion shaping process different from traditional media, the impacts of LLMs are increasingly recognized and being concerned. However, the knowledge about how LLMs affect the process of opinion expression and exchange of social opinion networks is very limited. Here, we create an opinion network dynamics model to encode the opinions of LLMs, cognitive acceptability and usage strategies of individuals, and simulate the impact of LLMs on opinion dynamics in a variety of scenarios. The outcomes of the simulations inform about effective demand-oriented opinion network interventions. The results from this study suggested that the output opinion of LLMs has a unique and positive effect on the collective opinion difference. The marginal effect of cognitive acceptability on collective opinion formation is nonlinear and shows a decreasing trend. When people partially rely on LLMs, the exchange process of opinion becomes more intense and the diversity of opinion becomes more favorable. In fact, there is 38.6% more opinion diversity when people all partially rely on LLMs, compared to prohibiting the use of LLMs entirely. The optimal diversity of opinion was found when the fractions of people who do not use, partially rely on, and fully rely on LLMs reached roughly 4:12:1. Our experiments also find that introducing extra agents with opposite/neutral/random opinions, we can effectively mitigate the impact of biased/toxic output from LLMs. Our findings provide valuable insights into opinion dynamics in the age of LLMs, highlighting the need for customized interventions tailored to specific scenarios to address the drawbacks of improper output and use of LLMs.","Zhejiang University, Clemson University" +22,./images/quantifying_the_impact_of_20230807.png,S3: Social-network Simulation System with Large Language Model-Empowered Agents,"Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, Yong Li","Simulation plays a crucial role in addressing various challenges within social science. It offers extensive applications such as state prediction, phenomena explanation, and policy-making support, among others. In this work, we harness the human-like capabilities of large language models (LLMs) in sensing, reasoning, and behaving, and utilize these qualities to construct the S3 system (short for Social network Simulation System). Adhering to the widely employed agent-based simulation paradigm, we employ fine-tuning and prompt engineering techniques to ensure that the agent’s behavior closely emulates that of a genuine human within the social network. Specifically, we simulate three pivotal aspects: emotion, attitude, and interaction behaviors. By endowing the agent in the system with the ability to perceive the informational environment and emulate human actions, we observe the emergence of population-level phenomena, including the propagation of information, attitudes, and emotions.
We conduct an evaluation encompassing two levels of simulation, employing real-world social network data. Encouragingly, the results demonstrate promising accuracy. This work represents an initial step in the realm of social network simulation empowered by LLM-based agents. We anticipate that our endeavors will serve as a source of inspiration for the development of simulation systems within, but not limited to, social science.",Tsinghua University +23,./images/s3_social-network_simulation_system_20230727.png,Simulating Opinion Dynamics with Networks of LLM-based Agents,"Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers","Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality. This bias limits their utility for understanding resistance to consensus views on issues like climate change. After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research. These insights highlight the promise and limitations of LLM agents in this domain and suggest a path forward: refining LLMs with real-world discourse to better simulate the evolution of human beliefs.",University of Wisconsin-Madison +24,./images/simulating_opinion_dynamics_with_20231116.png,Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms,"Petter Törnberg, Diliara Valeeva, Justus Uitermark, Christopher Bail","Social media is often criticized for amplifying toxic discourse and discouraging constructive conversations. But designing social media platforms to promote better conversations is inherently challenging. This paper asks whether simulating social media through a combination of Large Language Models (LLM) and Agent-Based Modeling can help researchers study how different news feed algorithms shape the quality of online conversations. We create realistic personas using data from the American National Election Study to populate simulated social media platforms. Next, we prompt the agents to read and share news articles — and like or comment upon each other’s messages — within three platforms that use different news feed algorithms. In the first platform, users see the most liked and commented posts from users whom they follow. In the second, they see posts from all users — even those outside their own network. The third platform employs a novel “bridging” algorithm that highlights posts that are liked by people with opposing political views. We find this bridging algorithm promotes more constructive, non-toxic, conversation across political divides than the other two models.
Though further research is needed to evaluate these findings, we argue that LLMs hold considerable potential to improve simulation research on social media and many other complex social settings.","University of Amsterdam, Duke University" +25,./images/simulating_social_media_using_20231005.png,Social Simulacra: Creating Populated Prototypes for Social Computing Systems,"Joon Sung Park, Lindsay Popowski, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein","Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer’s description of a community’s design—goal, rules, and member personas—and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of “what if?” scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models’ training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.","Stanford University, Google Research" +26,./images/social_simulacra_creating_populated_20220808.png,The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents,"Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers","Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias — a phenomenon known as the “wisdom of partisan crowds.” Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompt and lack of details in personas. Conversely, fine-tuning on human data appears to enhance convergence.
These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence.",University of Wisconsin-Madison +27,./images/the_wisdom_of_partisan_20231116.png,To Infinity and Beyond- SHOW-1 and Showrunner Agents in Multi-Agent Simulations,"Philipp Maas, Frank Carey, Chris Wheeler, Edward Saatchi, Pete Billington, Jessica Yaffa Shamash","In this work we present our approach to generating high-quality episodic content for IP’s (Intellectual Property) using large language models (LLMs), custom state-of-the-art diffusion models and our multi-agent simulation for contextualization, story progression and behavioral control. Powerful LLMs such as GPT-4 were trained on a large corpus of TV show data which lets us believe that with the right guidance users will be able to rewrite entire seasons. ""That Is What Entertainment Will Look Like. Maybe people are still upset about the last season of Game of Thrones. Imagine if you could ask your A.I. to make a new ending that goes a different way and maybe even put yourself in there as a main character or something.”. ",Fable Studio +28,./images/to_infinity_and_beyond_20230724.png,Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation,"Xinyi Mou, Zhongyu Wei, Xuanjing Huang","Social media has emerged as a cornerstone of social movements, wielding significant influence in driving societal change. Simulating the response of the public and forecasting the potential impact has become increasingly important. However, existing methods for simulating such phenomena encounter challenges concerning their efficacy and efficiency in capturing the behaviors of social movement participants. In this paper, we introduce a hybrid framework HiSim for social media user simulation, wherein users are categorized into two types. Core users are driven by Large Language Models, while numerous ordinary users are modeled by deductive agent-based models. We further construct a Twitter-like environment to replicate their response dynamics following trigger events. Subsequently, we develop a multi-faceted benchmark SoMoSiMu-Bench for evaluation and conduct comprehensive experiments across real-world datasets. Experimental results demonstrate the effectiveness and flexibility of our method","Fudan University, Shanghai Collaborative Innovation Center of Intelligent Visual Computing" +29,./images/unveiling_the_truth_and_20240226.png,User Behavior Simulation with Large Language Model based Agents,"Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen","Simulating high quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of human decision process. Recently, substantial evidences have suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence. We believe these models can provide significant opportunities to more believable user behavior simulation. To inspire such direction, we propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans.
Concerning potential applications, we simulate and study two social phenomenons including (1) information cocoons and (2) user conformity behaviors. This research provides novel simulation paradigms for human-centered applications.","Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, University College London" +30,./images/user_behavior_simulation_with_20230605.png,Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies,"Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai","We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as GPT models, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model's simulation of a specific human behavior. Unlike the Turing Test, which involves simulating a single arbitrary individual, a TE requires simulating a representative sample of participants in human subject research. We carry out TEs that attempt to replicate well-established findings from prior studies. We design a methodology for simulating TEs and illustrate its use to compare how well different language models are able to reproduce classic economic, psycholinguistic, and social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing findings were replicated using recent models, while the last TE reveals a ""hyper-accuracy distortion"" present in some language models (including ChatGPT and GPT-4), which could affect downstream applications in education and the arts.","Olin College of Engineering, Georgia Tech, Microsoft Research" +31,./images/using_large_language_models_20220818.png,War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars,"Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang","Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including the World War I (WWI), the World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems’ abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts.
Codeand data are available at https://github.com/agiresearch/WarAgent.",Rutgers University +32,./images/war_and_peace_(waragent)_20231128.png,To be Continued...,Your Contributions are Welcome!,, diff --git a/MultiAgentEbook/book_simulation/script.js b/MultiAgentEbook/book_simulation/script.js new file mode 100755 index 000000000..54da6ca8b --- /dev/null +++ b/MultiAgentEbook/book_simulation/script.js @@ -0,0 +1,94 @@ +document.addEventListener("DOMContentLoaded", function() { + + const csvFilePath = './book_simulation/data.csv'; + + + function loadCSV(filePath) { + return fetch(filePath) + .then(response => response.text()) + .then(text => Papa.parse(text, { header: true }).data); + } + + + function createFlipBook(pages) { + const container = document.getElementById('flip_book_container'); + const numPages = pages.length; + + let flipBookHTML = ''; + let style = document.createElement('style'); + let css = ''; + + + flipBookHTML += `\n`; + for (let i = 0; i < numPages - 1; i++) { + flipBookHTML += `\n`; + } + + flipBookHTML += `
\n`; + + flipBookHTML += `
+ +
` + + + for (let i = 0; i < numPages - 1; i++) { + console.log(i) + const page = pages[i]; + const pageIndex = i + 1; + + flipBookHTML += ` +
+
+ + Back content +
+
+ + Back page edge shading +
+

${page.title}

+

${page.author}

+

${page.affiliation}

+

${page.summary}

+
+
+
\n`; + + + css += ` + #page${pageIndex} { + z-index: ${numPages - i}; + } + + #page${pageIndex}_checkbox:checked~#flip_book #page${pageIndex} { + transform: rotateY(-180deg); + z-index: ${i + 1}; + }\n`; + } + + flipBookHTML += `
+ Back Cover +
`; + + + container.innerHTML = flipBookHTML; + + + style.innerHTML = css; + document.head.appendChild(style); + + + const md = window.markdownit(); + const summaryElements = document.querySelectorAll('.summary'); + summaryElements.forEach(el => { + el.innerHTML = md.render(el.textContent); + }); + } + + + loadCSV(csvFilePath).then(pages => { + createFlipBook(pages); + }); +}); diff --git a/MultiAgentEbook/book_simulation_index.html b/MultiAgentEbook/book_simulation_index.html new file mode 100755 index 000000000..66e2a6e01 --- /dev/null +++ b/MultiAgentEbook/book_simulation_index.html @@ -0,0 +1,23 @@ + + + + + + + Flip Book + + + + + +
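The flip book is generated entirely on the client: `script.js` parses `data.csv` with PapaParse, emits one page element plus one hidden checkbox per row, and appends per-page CSS rules of the form `#pageN_checkbox:checked ~ #flip_book #pageN { transform: rotateY(-180deg); }`, so turning a page is just toggling a checkbox. Below is a minimal sketch of that pattern; the markup, ids, and class names inside the template string are illustrative assumptions rather than the site's exact template strings.

```javascript
// Minimal flip-book sketch. Assumes PapaParse and markdown-it are loaded via
// <script> tags (as on the ebook pages) and that an empty
// <div id="flip_book_container"></div> exists. Markup details are illustrative.
document.addEventListener("DOMContentLoaded", () => {
  fetch("./book_simulation/data.csv")
    .then((response) => response.text())
    .then((text) => Papa.parse(text, { header: true }).data)
    .then((pages) => {
      const container = document.getElementById("flip_book_container");
      let checkboxes = "";
      let pagesHTML = "";
      let css = "";

      pages.forEach((page, i) => {
        const n = i + 1;
        // Hidden checkbox: its :checked state drives the flip via CSS alone.
        checkboxes += `<input type="checkbox" id="page${n}_checkbox">`;
        // One page per CSV row; the label toggles the checkbox on click.
        pagesHTML += `
          <div class="page" id="page${n}">
            <label for="page${n}_checkbox"></label>
            <h1>${page.title}</h1>
            <p class="author">${page.author}</p>
            <p class="summary">${page.summary}</p>
          </div>`;
        // Unflipped pages stack front-to-back; a checked checkbox rotates its
        // page away and moves it behind the pages still to be read.
        css += `
          #page${n} { z-index: ${pages.length - i}; }
          #page${n}_checkbox:checked ~ #flip_book #page${n} {
            transform: rotateY(-180deg);
            z-index: ${n};
          }`;
      });

      // The checkboxes must precede #flip_book so the ~ sibling selector applies.
      container.innerHTML = `${checkboxes}<div id="flip_book">${pagesHTML}</div>`;
      const style = document.createElement("style");
      style.textContent = css;
      document.head.appendChild(style);

      // Summaries are stored as plain text in the CSV and rendered as Markdown.
      const md = window.markdownit();
      document.querySelectorAll(".summary").forEach((el) => {
        el.innerHTML = md.render(el.textContent);
      });
    });
});
```

Because the flip itself is pure CSS (checkbox state plus a transform transition), no JavaScript runs during page turns; the script only builds the markup and the style rules once at load time.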
+ + + + + + \ No newline at end of file diff --git a/MultiAgentEbook/book_style.css b/MultiAgentEbook/book_style.css new file mode 100755 index 000000000..86f7d1a6d --- /dev/null +++ b/MultiAgentEbook/book_style.css @@ -0,0 +1,238 @@ +*, *::before, *::after { + box-sizing: border-box; + margin: 0; + padding: 0; + font-family: "Arial Black"; + color: rgb(25, 25, 25); +} + +body { + display: flex; + flex-direction: column; + justify-content: center; + align-items: center; + width: 100%; + height: 100vh; + background-color: rgb(245, 245, 245); +} + +input { + display: none; +} + +#flip_book { + position: relative; + width: 596px; + height: 840px; + transition-duration: 1s; + perspective: 2000px; +} + +.front_cover, .back_cover { + display: flex; + justify-content: center; + align-items: center; + width: 100%; + height: 100%; + border-radius: 2.5px 5px 5px 2.5px; + background-image: url(./images/flip_book_edge_shading.png); + background-size: cover; + background-position: center; + background-repeat: no-repeat; + background-color: rgb(255, 255, 255); + box-shadow: 0 0 5px 0 rgb(25, 25, 25, 0.25); +} + +.front_cover { + position: absolute; + cursor: pointer; + transform-origin: center left; + transition: transform 0.5s; + z-index: 99; +} + +.front_cover label { + position: absolute; + width: 100%; + height: 100%; + cursor: pointer; +} + +.page { + position: absolute; + top: 10px; + left: 1px; + width: 576px; + height: 820px; + border-radius: 0 5px 5px 0; + background-color: white; + transform-origin: left; + transform-style: preserve-3d; + transform: rotateY(0deg); + transition-duration: 0.5s; +} + +.front_page { + position: absolute; + width: 100%; + height: 100%; + backface-visibility: hidden; +} + +.front_page label { + position: absolute; + width: 100%; + height: 100%; + cursor: pointer; + z-index: 100; +} + +.back_page { + position: absolute; + width: 100%; + height: 100%; + backface-visibility: hidden; + transform: rotateY(180deg); + z-index: 100; +} + +.back_page label { + position: absolute; + width: 100%; + height: 100%; + cursor: pointer; + z-index: 100; +} + +.edge_shading { + position: absolute; + width: 576px; + height: 820px; + z-index: 98; +} + +.front_content { + position: absolute; + top: 1px; + width: 574px; + height: 796px; + border-radius: 0 5px 5px 0; + z-index: 97; +} + +.back_content { + position: absolute; + top: 1px; + left: 1px; + width: 574px; + height: 796px; + border-radius: 5px 0 0 5px; + z-index: 97; +} + +.back_cover { + position: relative; + z-index: -1; +} + + +.welcome_text { + display: flex; + justify-content: center; + align-items: center; + width: 100%; + height: 100%; + font-size: 2em; + color: rgb(25, 25, 25); +} + +.text_content { + display: flex; + flex-direction: column; + justify-content: center; + align-items: justify; + width: 100%; + height: 100%; + padding: 20px; + box-sizing: border-box; + text-align: center; +} + +.text_content h1, .text_content p.author { + margin: 0; + padding: 10px 0; +} + +.text_content p.author { + font-style: italic; + color: #555; +} + +.text_content p.summary { + text-align: justify; + text-align-last: left; + max-width: 800px; + margin: 20px 0; + line-height: 1.6; + overflow-wrap: break-word; + hyphens: auto; + font-size: smaller; +} + +.text_content h1 { + font-size: 24px; +} + + + +.thank_you_text { + display: flex; + justify-content: center; + align-items: center; + width: 100%; + height: 100%; + font-size: 2em; + color: rgb(25, 25, 25); +} + +#cover_checkbox:checked~#flip_book { + transform: translateX(288px) +} + 
+#cover_checkbox:checked~#flip_book .front_cover { + transform: rotateY(-180deg); + transition: transform 1.5s, z-index 0.5s 0.5s; + z-index: 0; +} + +#cover_checkbox:checked~#flip_book { + transform: translateX(288px); +} + +#cover_checkbox:checked~#flip_book .front_cover { + transform: rotateY(-180deg); + transition: transform 1.5s, z-index 0.5s 0.5s; + z-index: 0; +} + + +.cover_image { + width: 100%; + height: 100%; + object-fit: cover; + border-radius: 2.5px 5px 5px 2.5px; +} + + +.back_cover .cover_image { + width: 100%; + height: 100%; + object-fit: cover; + border-radius: 2.5px 5px 5px 2.5px; +} + +.text_content_summary { + text-align: left; + display: inline-block; + width: 100%; +} \ No newline at end of file diff --git a/MultiAgentEbook/communication.html b/MultiAgentEbook/communication.html new file mode 100644 index 000000000..a66f7246e --- /dev/null +++ b/MultiAgentEbook/communication.html @@ -0,0 +1,162 @@ + + + + + + + + + + + §1: Communication + + + + + + + + + + + +
+
+
+
+
+
+
+ ← Back Homepage +
+

§1: Communication

+
+
+

+ Task-oriented agent communication typically focuses on protocol design and knowledge-augmented communication, ensuring more effective information interaction and consensus building. Click on the ebook below to read. +

+
+ +
+ +
+
+
+
+ +
+ + + + + + + + + + + + +
TitleAuthorsAffiliationsLinkDate
+
+
+
+ +
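Each chapter page ships an empty paper table (Title, Authors, Affiliations, Link, Date) that is filled in client-side. A plausible sketch of that population step is shown below, assuming the shared `papers.csv` (with its `Title`, `Authors`, `Date`, `Url`, `AwesomeListCategory`, and `Affiliation` columns) is the data source; the `paper_table_body` id and the `fillPaperTable` helper are hypothetical names used only for illustration.

```javascript
// Hypothetical table filler for a chapter page. Assumes PapaParse is loaded
// and that the page's <tbody> has id="paper_table_body" (illustrative id).
function fillPaperTable(category) {
  fetch("./papers.csv")
    .then((response) => response.text())
    .then((text) => Papa.parse(text, { header: true }).data)
    .then((rows) => {
      const tbody = document.getElementById("paper_table_body");
      rows
        .filter((row) => row.AwesomeListCategory === category)
        .forEach((row) => {
          const tr = document.createElement("tr");
          // Columns match the header row: Title, Authors, Affiliations, Link, Date.
          [row.Title, row.Authors, row.Affiliation, row.Url, row.Date].forEach(
            (value, i) => {
              const td = document.createElement("td");
              if (i === 3) {
                const a = document.createElement("a");
                a.href = value;
                a.textContent = "Paper";
                td.appendChild(a);
              } else {
                td.textContent = value || "";
              }
              tr.appendChild(td);
            }
          );
          tbody.appendChild(tr);
        });
    });
}

// Usage on this page: fillPaperTable("Communication");
```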
+

+ Initiated by the ChatDev Group, Tsinghua + University +
Contact us via qianc62@gmail.com +

+
+ + + + + \ No newline at end of file diff --git a/MultiAgentEbook/evolution.html b/MultiAgentEbook/evolution.html new file mode 100644 index 000000000..621708afd --- /dev/null +++ b/MultiAgentEbook/evolution.html @@ -0,0 +1,159 @@ + + + + + + + + + + + §3: Evolution + + + + + + + + + + + +
+
+
+
+
+
+
+ ← Back Homepage +
+

§3: Evolution

+
+
+

+ The evolution of multi-agent systems focuses on cross-task experience accumulation, enabling agents to enhance their capabilities and adapt to increasingly complex challenges. Click on the ebook below to read. +

+
+ +
+ +
+
+
+
+
+ + + + + + + + + + + + +
TitleAuthorsAffiliationsLinkDate
+
+
+
+ +
+

+ Initiated by the ChatDev Group, Tsinghua + University +
Contact us via qianc62@gmail.com +

+
+ + + + \ No newline at end of file diff --git a/MultiAgentEbook/images/(perhaps)_beyond_human_translation_20240520.png b/MultiAgentEbook/images/(perhaps)_beyond_human_translation_20240520.png new file mode 100644 index 000000000..d5d28d17d Binary files /dev/null and b/MultiAgentEbook/images/(perhaps)_beyond_human_translation_20240520.png differ diff --git a/MultiAgentEbook/images/1.png b/MultiAgentEbook/images/1.png new file mode 100755 index 000000000..c9e45a217 Binary files /dev/null and b/MultiAgentEbook/images/1.png differ diff --git a/MultiAgentEbook/images/1a.png b/MultiAgentEbook/images/1a.png new file mode 100755 index 000000000..7fec8c37f Binary files /dev/null and b/MultiAgentEbook/images/1a.png differ diff --git a/MultiAgentEbook/images/1b.png b/MultiAgentEbook/images/1b.png new file mode 100755 index 000000000..913d1ccaa Binary files /dev/null and b/MultiAgentEbook/images/1b.png differ diff --git a/MultiAgentEbook/images/1d.png b/MultiAgentEbook/images/1d.png new file mode 100755 index 000000000..6ab168c6c Binary files /dev/null and b/MultiAgentEbook/images/1d.png differ diff --git a/MultiAgentEbook/images/1e.png b/MultiAgentEbook/images/1e.png new file mode 100755 index 000000000..2c76e1020 Binary files /dev/null and b/MultiAgentEbook/images/1e.png differ diff --git a/MultiAgentEbook/images/2.png b/MultiAgentEbook/images/2.png new file mode 100755 index 000000000..7809aaa6e Binary files /dev/null and b/MultiAgentEbook/images/2.png differ diff --git a/MultiAgentEbook/images/2a.png b/MultiAgentEbook/images/2a.png new file mode 100755 index 000000000..e658f0536 Binary files /dev/null and b/MultiAgentEbook/images/2a.png differ diff --git a/MultiAgentEbook/images/2b.png b/MultiAgentEbook/images/2b.png new file mode 100755 index 000000000..44df92dbf Binary files /dev/null and b/MultiAgentEbook/images/2b.png differ diff --git a/MultiAgentEbook/images/2d.png b/MultiAgentEbook/images/2d.png new file mode 100755 index 000000000..42406e05c Binary files /dev/null and b/MultiAgentEbook/images/2d.png differ diff --git a/MultiAgentEbook/images/2e.png b/MultiAgentEbook/images/2e.png new file mode 100755 index 000000000..e21c80044 Binary files /dev/null and b/MultiAgentEbook/images/2e.png differ diff --git a/MultiAgentEbook/images/3.png b/MultiAgentEbook/images/3.png new file mode 100644 index 000000000..10ca82c75 Binary files /dev/null and b/MultiAgentEbook/images/3.png differ diff --git "a/MultiAgentEbook/images/360\302\260rea_towards_a_reusable_20240408.png" "b/MultiAgentEbook/images/360\302\260rea_towards_a_reusable_20240408.png" new file mode 100644 index 000000000..4039bc8d1 Binary files /dev/null and "b/MultiAgentEbook/images/360\302\260rea_towards_a_reusable_20240408.png" differ diff --git a/MultiAgentEbook/images/3a.png b/MultiAgentEbook/images/3a.png new file mode 100755 index 000000000..97c5932f3 Binary files /dev/null and b/MultiAgentEbook/images/3a.png differ diff --git a/MultiAgentEbook/images/3b.png b/MultiAgentEbook/images/3b.png new file mode 100644 index 000000000..34cd13768 Binary files /dev/null and b/MultiAgentEbook/images/3b.png differ diff --git a/MultiAgentEbook/images/3d.png b/MultiAgentEbook/images/3d.png new file mode 100644 index 000000000..235728562 Binary files /dev/null and b/MultiAgentEbook/images/3d.png differ diff --git a/MultiAgentEbook/images/3e.png b/MultiAgentEbook/images/3e.png new file mode 100755 index 000000000..d568221af Binary files /dev/null and b/MultiAgentEbook/images/3e.png differ diff --git a/MultiAgentEbook/images/4.png 
b/MultiAgentEbook/images/4.png new file mode 100755 index 000000000..099450873 Binary files /dev/null and b/MultiAgentEbook/images/4.png differ diff --git a/MultiAgentEbook/images/4a.png b/MultiAgentEbook/images/4a.png new file mode 100755 index 000000000..966405897 Binary files /dev/null and b/MultiAgentEbook/images/4a.png differ diff --git a/MultiAgentEbook/images/4b.png b/MultiAgentEbook/images/4b.png new file mode 100755 index 000000000..f2880a1d2 Binary files /dev/null and b/MultiAgentEbook/images/4b.png differ diff --git a/MultiAgentEbook/images/4d.png b/MultiAgentEbook/images/4d.png new file mode 100755 index 000000000..39a229d61 Binary files /dev/null and b/MultiAgentEbook/images/4d.png differ diff --git a/MultiAgentEbook/images/4e.png b/MultiAgentEbook/images/4e.png new file mode 100755 index 000000000..7795bd5a0 Binary files /dev/null and b/MultiAgentEbook/images/4e.png differ diff --git a/MultiAgentEbook/images/affordable_generative_agents_20240203.png b/MultiAgentEbook/images/affordable_generative_agents_20240203.png new file mode 100644 index 000000000..dfa76e619 Binary files /dev/null and b/MultiAgentEbook/images/affordable_generative_agents_20240203.png differ diff --git a/MultiAgentEbook/images/agent_hospital_a_simulacrum_20240505.png b/MultiAgentEbook/images/agent_hospital_a_simulacrum_20240505.png new file mode 100644 index 000000000..63a90d511 Binary files /dev/null and b/MultiAgentEbook/images/agent_hospital_a_simulacrum_20240505.png differ diff --git a/MultiAgentEbook/images/agentcf_collaborative_learning_with_20231013.png b/MultiAgentEbook/images/agentcf_collaborative_learning_with_20231013.png new file mode 100644 index 000000000..d6c03a78e Binary files /dev/null and b/MultiAgentEbook/images/agentcf_collaborative_learning_with_20231013.png differ diff --git a/MultiAgentEbook/images/agentverse_cover.png b/MultiAgentEbook/images/agentverse_cover.png new file mode 100755 index 000000000..a765c0196 Binary files /dev/null and b/MultiAgentEbook/images/agentverse_cover.png differ diff --git a/MultiAgentEbook/images/agentverse_facilitating_multi-agent_collaboration_20230821.png b/MultiAgentEbook/images/agentverse_facilitating_multi-agent_collaboration_20230821.png new file mode 100644 index 000000000..4d64a1eed Binary files /dev/null and b/MultiAgentEbook/images/agentverse_facilitating_multi-agent_collaboration_20230821.png differ diff --git a/MultiAgentEbook/images/ai_hospital_interactive_evaluation_20240215.png b/MultiAgentEbook/images/ai_hospital_interactive_evaluation_20240215.png new file mode 100644 index 000000000..a661925f2 Binary files /dev/null and b/MultiAgentEbook/images/ai_hospital_interactive_evaluation_20240215.png differ diff --git a/MultiAgentEbook/images/apollo's_oracle_retrieval-augmented_reasoning_20231208.png b/MultiAgentEbook/images/apollo's_oracle_retrieval-augmented_reasoning_20231208.png new file mode 100644 index 000000000..6238d4318 Binary files /dev/null and b/MultiAgentEbook/images/apollo's_oracle_retrieval-augmented_reasoning_20231208.png differ diff --git a/MultiAgentEbook/images/application_cover.png b/MultiAgentEbook/images/application_cover.png new file mode 100755 index 000000000..51b7f32b1 Binary files /dev/null and b/MultiAgentEbook/images/application_cover.png differ diff --git a/MultiAgentEbook/images/are_you_in_a_20230719.png b/MultiAgentEbook/images/are_you_in_a_20230719.png new file mode 100644 index 000000000..4af38f572 Binary files /dev/null and b/MultiAgentEbook/images/are_you_in_a_20230719.png differ diff --git 
a/MultiAgentEbook/images/atm_adversarial_tuning_multi-agent_20240528.png b/MultiAgentEbook/images/atm_adversarial_tuning_multi-agent_20240528.png new file mode 100644 index 000000000..5d4240fe8 Binary files /dev/null and b/MultiAgentEbook/images/atm_adversarial_tuning_multi-agent_20240528.png differ diff --git a/MultiAgentEbook/images/auto_arena_of_llms_20240530.png b/MultiAgentEbook/images/auto_arena_of_llms_20240530.png new file mode 100644 index 000000000..093e4902e Binary files /dev/null and b/MultiAgentEbook/images/auto_arena_of_llms_20240530.png differ diff --git a/MultiAgentEbook/images/autoform_cover.png b/MultiAgentEbook/images/autoform_cover.png new file mode 100755 index 000000000..9917a423c Binary files /dev/null and b/MultiAgentEbook/images/autoform_cover.png differ diff --git a/MultiAgentEbook/images/autogen_enabling_next-gen_llm_20230816.png b/MultiAgentEbook/images/autogen_enabling_next-gen_llm_20230816.png new file mode 100644 index 000000000..60691aa7c Binary files /dev/null and b/MultiAgentEbook/images/autogen_enabling_next-gen_llm_20230816.png differ diff --git a/MultiAgentEbook/images/autonomous_agents_for_collaborative_20240621.png b/MultiAgentEbook/images/autonomous_agents_for_collaborative_20240621.png new file mode 100644 index 000000000..dfe61c9dc Binary files /dev/null and b/MultiAgentEbook/images/autonomous_agents_for_collaborative_20240621.png differ diff --git a/MultiAgentEbook/images/avalon's_game_of_thoughts_20231002.png b/MultiAgentEbook/images/avalon's_game_of_thoughts_20231002.png new file mode 100644 index 000000000..c8ab2b534 Binary files /dev/null and b/MultiAgentEbook/images/avalon's_game_of_thoughts_20231002.png differ diff --git a/MultiAgentEbook/images/back_page_edge_shading.png b/MultiAgentEbook/images/back_page_edge_shading.png new file mode 100755 index 000000000..d45f4bbf6 Binary files /dev/null and b/MultiAgentEbook/images/back_page_edge_shading.png differ diff --git a/MultiAgentEbook/images/battleagent_multi-modal_dynamic_emulation_20240423.png b/MultiAgentEbook/images/battleagent_multi-modal_dynamic_emulation_20240423.png new file mode 100644 index 000000000..c2d3ef106 Binary files /dev/null and b/MultiAgentEbook/images/battleagent_multi-modal_dynamic_emulation_20240423.png differ diff --git a/MultiAgentEbook/images/beyond_natural_language_llms_20240228.png b/MultiAgentEbook/images/beyond_natural_language_llms_20240228.png new file mode 100644 index 000000000..22a6dadb0 Binary files /dev/null and b/MultiAgentEbook/images/beyond_natural_language_llms_20240228.png differ diff --git a/MultiAgentEbook/images/bg-pattern.svg b/MultiAgentEbook/images/bg-pattern.svg new file mode 100755 index 000000000..5fc9a3bf6 --- /dev/null +++ b/MultiAgentEbook/images/bg-pattern.svg @@ -0,0 +1,3 @@ + + + diff --git a/MultiAgentEbook/images/building_cooperative_embodied_agents_20230705.png b/MultiAgentEbook/images/building_cooperative_embodied_agents_20230705.png new file mode 100644 index 000000000..8905e02c6 Binary files /dev/null and b/MultiAgentEbook/images/building_cooperative_embodied_agents_20230705.png differ diff --git a/MultiAgentEbook/images/camel_communicative_agents_for_20230331.png b/MultiAgentEbook/images/camel_communicative_agents_for_20230331.png new file mode 100644 index 000000000..c190c8477 Binary files /dev/null and b/MultiAgentEbook/images/camel_communicative_agents_for_20230331.png differ diff --git a/MultiAgentEbook/images/can_large_language_model_20240207.png b/MultiAgentEbook/images/can_large_language_model_20240207.png new file mode 
100644 index 000000000..0ce3d14c8 Binary files /dev/null and b/MultiAgentEbook/images/can_large_language_model_20240207.png differ diff --git a/MultiAgentEbook/images/chain_of_agents_large_20240604.png b/MultiAgentEbook/images/chain_of_agents_large_20240604.png new file mode 100644 index 000000000..97c9232e0 Binary files /dev/null and b/MultiAgentEbook/images/chain_of_agents_large_20240604.png differ diff --git a/MultiAgentEbook/images/chatcoder_chat-based_refine_requirement_20231101.png b/MultiAgentEbook/images/chatcoder_chat-based_refine_requirement_20231101.png new file mode 100644 index 000000000..f7a559c51 Binary files /dev/null and b/MultiAgentEbook/images/chatcoder_chat-based_refine_requirement_20231101.png differ diff --git a/MultiAgentEbook/images/chatdev.png b/MultiAgentEbook/images/chatdev.png new file mode 100755 index 000000000..315b6c071 Binary files /dev/null and b/MultiAgentEbook/images/chatdev.png differ diff --git a/MultiAgentEbook/images/chatdev_communicative_agents_for_20230716.png b/MultiAgentEbook/images/chatdev_communicative_agents_for_20230716.png new file mode 100644 index 000000000..a1ed24783 Binary files /dev/null and b/MultiAgentEbook/images/chatdev_communicative_agents_for_20230716.png differ diff --git a/MultiAgentEbook/images/chatdev_cover.png b/MultiAgentEbook/images/chatdev_cover.png new file mode 100755 index 000000000..bfd87c478 Binary files /dev/null and b/MultiAgentEbook/images/chatdev_cover.png differ diff --git a/MultiAgentEbook/images/chateval_cover.png b/MultiAgentEbook/images/chateval_cover.png new file mode 100755 index 000000000..470a03204 Binary files /dev/null and b/MultiAgentEbook/images/chateval_cover.png differ diff --git a/MultiAgentEbook/images/chateval_towards_better_llm-based_20230814.png b/MultiAgentEbook/images/chateval_towards_better_llm-based_20230814.png new file mode 100644 index 000000000..d55133be2 Binary files /dev/null and b/MultiAgentEbook/images/chateval_towards_better_llm-based_20230814.png differ diff --git a/MultiAgentEbook/images/colearning_cover.png b/MultiAgentEbook/images/colearning_cover.png new file mode 100755 index 000000000..8ddb8e1b6 Binary files /dev/null and b/MultiAgentEbook/images/colearning_cover.png differ diff --git a/MultiAgentEbook/images/comm_collaborative_multi-agent,_multi-reasoning-path_20240426.png b/MultiAgentEbook/images/comm_collaborative_multi-agent,_multi-reasoning-path_20240426.png new file mode 100644 index 000000000..df128660b Binary files /dev/null and b/MultiAgentEbook/images/comm_collaborative_multi-agent,_multi-reasoning-path_20240426.png differ diff --git a/MultiAgentEbook/images/communication_cover.png b/MultiAgentEbook/images/communication_cover.png new file mode 100755 index 000000000..027460ab8 Binary files /dev/null and b/MultiAgentEbook/images/communication_cover.png differ diff --git a/MultiAgentEbook/images/competeai_understanding_the_competition_20231026.png b/MultiAgentEbook/images/competeai_understanding_the_competition_20231026.png new file mode 100644 index 000000000..aaeaebb0f Binary files /dev/null and b/MultiAgentEbook/images/competeai_understanding_the_competition_20231026.png differ diff --git a/MultiAgentEbook/images/cover.png b/MultiAgentEbook/images/cover.png new file mode 100755 index 000000000..1f7ca7668 Binary files /dev/null and b/MultiAgentEbook/images/cover.png differ diff --git a/MultiAgentEbook/images/ctc_cover.png b/MultiAgentEbook/images/ctc_cover.png new file mode 100755 index 000000000..5cd7da599 Binary files /dev/null and 
b/MultiAgentEbook/images/ctc_cover.png differ diff --git a/MultiAgentEbook/images/dataset_cover.png b/MultiAgentEbook/images/dataset_cover.png new file mode 100755 index 000000000..79732f88a Binary files /dev/null and b/MultiAgentEbook/images/dataset_cover.png differ diff --git a/MultiAgentEbook/images/describe,_explain,_plan_and_20230203.png b/MultiAgentEbook/images/describe,_explain,_plan_and_20230203.png new file mode 100644 index 000000000..c71d4acd4 Binary files /dev/null and b/MultiAgentEbook/images/describe,_explain,_plan_and_20230203.png differ diff --git a/MultiAgentEbook/images/dynamic_llm-agent_network_an_20231003.png b/MultiAgentEbook/images/dynamic_llm-agent_network_an_20231003.png new file mode 100644 index 000000000..3b2ea8549 Binary files /dev/null and b/MultiAgentEbook/images/dynamic_llm-agent_network_an_20231003.png differ diff --git a/MultiAgentEbook/images/econagent_large_language_model-empowered_20231016.png b/MultiAgentEbook/images/econagent_large_language_model-empowered_20231016.png new file mode 100644 index 000000000..b5febfa6a Binary files /dev/null and b/MultiAgentEbook/images/econagent_large_language_model-empowered_20231016.png differ diff --git a/MultiAgentEbook/images/ei_cover.png b/MultiAgentEbook/images/ei_cover.png new file mode 100755 index 000000000..4fde06d8a Binary files /dev/null and b/MultiAgentEbook/images/ei_cover.png differ diff --git a/MultiAgentEbook/images/encouraging_divergent_thinking_in_20230530.png b/MultiAgentEbook/images/encouraging_divergent_thinking_in_20230530.png new file mode 100644 index 000000000..9b79d118d Binary files /dev/null and b/MultiAgentEbook/images/encouraging_divergent_thinking_in_20230530.png differ diff --git a/MultiAgentEbook/images/epidemic_modeling_with_generative_20230711.png b/MultiAgentEbook/images/epidemic_modeling_with_generative_20230711.png new file mode 100644 index 000000000..4f02f2c52 Binary files /dev/null and b/MultiAgentEbook/images/epidemic_modeling_with_generative_20230711.png differ diff --git a/MultiAgentEbook/images/evolution_cover.png b/MultiAgentEbook/images/evolution_cover.png new file mode 100644 index 000000000..7a815376a Binary files /dev/null and b/MultiAgentEbook/images/evolution_cover.png differ diff --git a/MultiAgentEbook/images/examining_inter-consistency_of_large_20230519.png b/MultiAgentEbook/images/examining_inter-consistency_of_large_20230519.png new file mode 100644 index 000000000..f284a3771 Binary files /dev/null and b/MultiAgentEbook/images/examining_inter-consistency_of_large_20230519.png differ diff --git a/MultiAgentEbook/images/experiential_co-learning_of_software-developing_20231228.png b/MultiAgentEbook/images/experiential_co-learning_of_software-developing_20231228.png new file mode 100644 index 000000000..2d4909d94 Binary files /dev/null and b/MultiAgentEbook/images/experiential_co-learning_of_software-developing_20231228.png differ diff --git a/MultiAgentEbook/images/exploring_collaboration_mechanisms_for_20231003.png b/MultiAgentEbook/images/exploring_collaboration_mechanisms_for_20231003.png new file mode 100644 index 000000000..f46d3b0c7 Binary files /dev/null and b/MultiAgentEbook/images/exploring_collaboration_mechanisms_for_20231003.png differ diff --git a/MultiAgentEbook/images/exploring_large_language_models_20230909.png b/MultiAgentEbook/images/exploring_large_language_models_20230909.png new file mode 100644 index 000000000..2e8dd4ab1 Binary files /dev/null and b/MultiAgentEbook/images/exploring_large_language_models_20230909.png differ diff --git 
a/MultiAgentEbook/images/facilitating_multi-role_and_multi-behavior_20240528.png b/MultiAgentEbook/images/facilitating_multi-role_and_multi-behavior_20240528.png new file mode 100644 index 000000000..7ea0936e0 Binary files /dev/null and b/MultiAgentEbook/images/facilitating_multi-role_and_multi-behavior_20240528.png differ diff --git a/MultiAgentEbook/images/favicon.png b/MultiAgentEbook/images/favicon.png new file mode 100755 index 000000000..c6c67af8c Binary files /dev/null and b/MultiAgentEbook/images/favicon.png differ diff --git a/MultiAgentEbook/images/flip_book_edge_shading.png b/MultiAgentEbook/images/flip_book_edge_shading.png new file mode 100755 index 000000000..5396cfe23 Binary files /dev/null and b/MultiAgentEbook/images/flip_book_edge_shading.png differ diff --git a/MultiAgentEbook/images/framework_cover.png b/MultiAgentEbook/images/framework_cover.png new file mode 100755 index 000000000..7c7aab1fc Binary files /dev/null and b/MultiAgentEbook/images/framework_cover.png differ diff --git a/MultiAgentEbook/images/gamegpt_multi-agent_collaborative_framework_20231012.png b/MultiAgentEbook/images/gamegpt_multi-agent_collaborative_framework_20231012.png new file mode 100644 index 000000000..081c1bf26 Binary files /dev/null and b/MultiAgentEbook/images/gamegpt_multi-agent_collaborative_framework_20231012.png differ diff --git a/MultiAgentEbook/images/generative_agents_interactive_simulacra_20230407.png b/MultiAgentEbook/images/generative_agents_interactive_simulacra_20230407.png new file mode 100644 index 000000000..e76981f16 Binary files /dev/null and b/MultiAgentEbook/images/generative_agents_interactive_simulacra_20230407.png differ diff --git a/MultiAgentEbook/images/github.png b/MultiAgentEbook/images/github.png new file mode 100755 index 000000000..04c49c387 Binary files /dev/null and b/MultiAgentEbook/images/github.png differ diff --git a/MultiAgentEbook/images/github_normal.png b/MultiAgentEbook/images/github_normal.png new file mode 100755 index 000000000..d5c80e094 Binary files /dev/null and b/MultiAgentEbook/images/github_normal.png differ diff --git a/MultiAgentEbook/images/humanoid_agents_platform_for_20231009.png b/MultiAgentEbook/images/humanoid_agents_platform_for_20231009.png new file mode 100644 index 000000000..96c6ab8ba Binary files /dev/null and b/MultiAgentEbook/images/humanoid_agents_platform_for_20231009.png differ diff --git a/MultiAgentEbook/images/iagents_cover.png b/MultiAgentEbook/images/iagents_cover.png new file mode 100755 index 000000000..f19314270 Binary files /dev/null and b/MultiAgentEbook/images/iagents_cover.png differ diff --git a/MultiAgentEbook/images/icon-close.svg b/MultiAgentEbook/images/icon-close.svg new file mode 100755 index 000000000..0777febe8 --- /dev/null +++ b/MultiAgentEbook/images/icon-close.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/MultiAgentEbook/images/icon-hamburger.svg b/MultiAgentEbook/images/icon-hamburger.svg new file mode 100755 index 000000000..248cb5ccf --- /dev/null +++ b/MultiAgentEbook/images/icon-hamburger.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/MultiAgentEbook/images/improving_factuality_and_reasoning_20230523.png b/MultiAgentEbook/images/improving_factuality_and_reasoning_20230523.png new file mode 100644 index 000000000..c6b253694 Binary files /dev/null and b/MultiAgentEbook/images/improving_factuality_and_reasoning_20230523.png differ diff --git a/MultiAgentEbook/images/improving_language_model_negotiation_20230517.png 
b/MultiAgentEbook/images/improving_language_model_negotiation_20230517.png new file mode 100644 index 000000000..70205cb1b Binary files /dev/null and b/MultiAgentEbook/images/improving_language_model_negotiation_20230517.png differ diff --git a/MultiAgentEbook/images/improving_multi-agent_debate_with_20240617.png b/MultiAgentEbook/images/improving_multi-agent_debate_with_20240617.png new file mode 100644 index 000000000..1203b6301 Binary files /dev/null and b/MultiAgentEbook/images/improving_multi-agent_debate_with_20240617.png differ diff --git a/MultiAgentEbook/images/interaction_welcome.png b/MultiAgentEbook/images/interaction_welcome.png new file mode 100755 index 000000000..4537efea1 Binary files /dev/null and b/MultiAgentEbook/images/interaction_welcome.png differ diff --git a/MultiAgentEbook/images/ioa_cover.png b/MultiAgentEbook/images/ioa_cover.png new file mode 100755 index 000000000..2b268b2ab Binary files /dev/null and b/MultiAgentEbook/images/ioa_cover.png differ diff --git a/MultiAgentEbook/images/iterative_experience_refinement_of_20240507.png b/MultiAgentEbook/images/iterative_experience_refinement_of_20240507.png new file mode 100644 index 000000000..ae3ed7436 Binary files /dev/null and b/MultiAgentEbook/images/iterative_experience_refinement_of_20240507.png differ diff --git a/MultiAgentEbook/images/language_agents_as_digital_20231108.png b/MultiAgentEbook/images/language_agents_as_digital_20231108.png new file mode 100755 index 000000000..11571b163 Binary files /dev/null and b/MultiAgentEbook/images/language_agents_as_digital_20231108.png differ diff --git a/MultiAgentEbook/images/language_agents_as_optimizable_20240226.png b/MultiAgentEbook/images/language_agents_as_optimizable_20240226.png new file mode 100755 index 000000000..ab5fa7d7f Binary files /dev/null and b/MultiAgentEbook/images/language_agents_as_optimizable_20240226.png differ diff --git a/MultiAgentEbook/images/large_language_models_are_20230327.png b/MultiAgentEbook/images/large_language_models_are_20230327.png new file mode 100755 index 000000000..44527a592 Binary files /dev/null and b/MultiAgentEbook/images/large_language_models_are_20230327.png differ diff --git a/MultiAgentEbook/images/learn_to_disguise_avoid_20240403.png b/MultiAgentEbook/images/learn_to_disguise_avoid_20240403.png new file mode 100755 index 000000000..ba0c2719b Binary files /dev/null and b/MultiAgentEbook/images/learn_to_disguise_avoid_20240403.png differ diff --git a/MultiAgentEbook/images/leveraging_large_language_models_20231103.png b/MultiAgentEbook/images/leveraging_large_language_models_20231103.png new file mode 100755 index 000000000..e4606a4df Binary files /dev/null and b/MultiAgentEbook/images/leveraging_large_language_models_20231103.png differ diff --git a/MultiAgentEbook/images/llm-based_agent_society_investigation_20231023.png b/MultiAgentEbook/images/llm-based_agent_society_investigation_20231023.png new file mode 100755 index 000000000..c5def2516 Binary files /dev/null and b/MultiAgentEbook/images/llm-based_agent_society_investigation_20231023.png differ diff --git a/MultiAgentEbook/images/llm-driven_agents_for_influencer_20240322.png b/MultiAgentEbook/images/llm-driven_agents_for_influencer_20240322.png new file mode 100755 index 000000000..4e83770a0 Binary files /dev/null and b/MultiAgentEbook/images/llm-driven_agents_for_influencer_20240322.png differ diff --git a/MultiAgentEbook/images/lm_vs_lm_detecting_20230522.png b/MultiAgentEbook/images/lm_vs_lm_detecting_20230522.png new file mode 100755 index 
000000000..32d632037 Binary files /dev/null and b/MultiAgentEbook/images/lm_vs_lm_detecting_20230522.png differ diff --git a/MultiAgentEbook/images/logo.png b/MultiAgentEbook/images/logo.png new file mode 100755 index 000000000..6058f7146 Binary files /dev/null and b/MultiAgentEbook/images/logo.png differ diff --git a/MultiAgentEbook/images/logo3.png b/MultiAgentEbook/images/logo3.png new file mode 100755 index 000000000..485b99293 Binary files /dev/null and b/MultiAgentEbook/images/logo3.png differ diff --git a/MultiAgentEbook/images/logo4.png b/MultiAgentEbook/images/logo4.png new file mode 100755 index 000000000..9961d3a72 Binary files /dev/null and b/MultiAgentEbook/images/logo4.png differ diff --git a/MultiAgentEbook/images/logo5.png b/MultiAgentEbook/images/logo5.png new file mode 100755 index 000000000..e9627ab57 Binary files /dev/null and b/MultiAgentEbook/images/logo5.png differ diff --git a/MultiAgentEbook/images/longagent_scaling_language_models_20240218.png b/MultiAgentEbook/images/longagent_scaling_language_models_20240218.png new file mode 100755 index 000000000..27a1652dc Binary files /dev/null and b/MultiAgentEbook/images/longagent_scaling_language_models_20240218.png differ diff --git a/MultiAgentEbook/images/lyfe_agents_generative_agents_20231003.png b/MultiAgentEbook/images/lyfe_agents_generative_agents_20231003.png new file mode 100755 index 000000000..cb6828c15 Binary files /dev/null and b/MultiAgentEbook/images/lyfe_agents_generative_agents_20231003.png differ diff --git a/MultiAgentEbook/images/metaagents_simulating_interactions_of_20231010.png b/MultiAgentEbook/images/metaagents_simulating_interactions_of_20231010.png new file mode 100755 index 000000000..78b49fd9a Binary files /dev/null and b/MultiAgentEbook/images/metaagents_simulating_interactions_of_20231010.png differ diff --git a/MultiAgentEbook/images/metagpt_meta_programming_for_20230801.png b/MultiAgentEbook/images/metagpt_meta_programming_for_20230801.png new file mode 100755 index 000000000..c4480b711 Binary files /dev/null and b/MultiAgentEbook/images/metagpt_meta_programming_for_20230801.png differ diff --git a/MultiAgentEbook/images/mora_enabling_generalist_video_20240320.png b/MultiAgentEbook/images/mora_enabling_generalist_video_20240320.png new file mode 100755 index 000000000..7a0c76f7a Binary files /dev/null and b/MultiAgentEbook/images/mora_enabling_generalist_video_20240320.png differ diff --git a/MultiAgentEbook/images/multi-agent_software_development_through_20240613.png b/MultiAgentEbook/images/multi-agent_software_development_through_20240613.png new file mode 100755 index 000000000..80dd82517 Binary files /dev/null and b/MultiAgentEbook/images/multi-agent_software_development_through_20240613.png differ diff --git a/MultiAgentEbook/images/multi_agent_framework.png b/MultiAgentEbook/images/multi_agent_framework.png new file mode 100755 index 000000000..a01f9135b Binary files /dev/null and b/MultiAgentEbook/images/multi_agent_framework.png differ diff --git a/MultiAgentEbook/images/multi_agent_framework_ss.png b/MultiAgentEbook/images/multi_agent_framework_ss.png new file mode 100755 index 000000000..1153d92d3 Binary files /dev/null and b/MultiAgentEbook/images/multi_agent_framework_ss.png differ diff --git a/MultiAgentEbook/images/multi_agent_framework_ts.png b/MultiAgentEbook/images/multi_agent_framework_ts.png new file mode 100755 index 000000000..57b4b262f Binary files /dev/null and b/MultiAgentEbook/images/multi_agent_framework_ts.png differ diff --git 
a/MultiAgentEbook/images/multiagent_collaboration_attack_investigating_20240620.png b/MultiAgentEbook/images/multiagent_collaboration_attack_investigating_20240620.png new file mode 100755 index 000000000..a95fa6a33 Binary files /dev/null and b/MultiAgentEbook/images/multiagent_collaboration_attack_investigating_20240620.png differ diff --git a/MultiAgentEbook/images/on_generative_agents_in_20231016.png b/MultiAgentEbook/images/on_generative_agents_in_20231016.png new file mode 100755 index 000000000..b17501782 Binary files /dev/null and b/MultiAgentEbook/images/on_generative_agents_in_20231016.png differ diff --git a/MultiAgentEbook/images/organization.png b/MultiAgentEbook/images/organization.png new file mode 100755 index 000000000..b6698af00 Binary files /dev/null and b/MultiAgentEbook/images/organization.png differ diff --git a/MultiAgentEbook/images/organization_cover.png b/MultiAgentEbook/images/organization_cover.png new file mode 100755 index 000000000..2f90297f4 Binary files /dev/null and b/MultiAgentEbook/images/organization_cover.png differ diff --git a/MultiAgentEbook/images/out_of_one_many_20220914.png b/MultiAgentEbook/images/out_of_one_many_20220914.png new file mode 100755 index 000000000..3b645a9f9 Binary files /dev/null and b/MultiAgentEbook/images/out_of_one_many_20220914.png differ diff --git a/MultiAgentEbook/images/pdf.png b/MultiAgentEbook/images/pdf.png new file mode 100755 index 000000000..4a8fa8a37 Binary files /dev/null and b/MultiAgentEbook/images/pdf.png differ diff --git a/MultiAgentEbook/images/pdf_normal.png b/MultiAgentEbook/images/pdf_normal.png new file mode 100755 index 000000000..6e4148e4f Binary files /dev/null and b/MultiAgentEbook/images/pdf_normal.png differ diff --git a/MultiAgentEbook/images/player_enhancing_llm-based_multi-agent_20240426.png b/MultiAgentEbook/images/player_enhancing_llm-based_multi-agent_20240426.png new file mode 100755 index 000000000..fc4438438 Binary files /dev/null and b/MultiAgentEbook/images/player_enhancing_llm-based_multi-agent_20240426.png differ diff --git a/MultiAgentEbook/images/quantifying_the_impact_of_20230807.png b/MultiAgentEbook/images/quantifying_the_impact_of_20230807.png new file mode 100755 index 000000000..55804146f Binary files /dev/null and b/MultiAgentEbook/images/quantifying_the_impact_of_20230807.png differ diff --git a/MultiAgentEbook/images/reconcile_round-table_conference_improves_20230922.png b/MultiAgentEbook/images/reconcile_round-table_conference_improves_20230922.png new file mode 100755 index 000000000..264c82d17 Binary files /dev/null and b/MultiAgentEbook/images/reconcile_round-table_conference_improves_20230922.png differ diff --git a/MultiAgentEbook/images/rethinking_the_bounds_of_20240228.png b/MultiAgentEbook/images/rethinking_the_bounds_of_20240228.png new file mode 100755 index 000000000..f544d0d40 Binary files /dev/null and b/MultiAgentEbook/images/rethinking_the_bounds_of_20240228.png differ diff --git a/MultiAgentEbook/images/roco_dialectic_multi-robot_collaboration_20230710.png b/MultiAgentEbook/images/roco_dialectic_multi-robot_collaboration_20230710.png new file mode 100755 index 000000000..941d1936b Binary files /dev/null and b/MultiAgentEbook/images/roco_dialectic_multi-robot_collaboration_20230710.png differ diff --git a/MultiAgentEbook/images/s3_social-network_simulation_system_20230727.png b/MultiAgentEbook/images/s3_social-network_simulation_system_20230727.png new file mode 100755 index 000000000..d7bab0211 Binary files /dev/null and 
b/MultiAgentEbook/images/s3_social-network_simulation_system_20230727.png differ diff --git a/MultiAgentEbook/images/scalable_multi-robot_collaboration_with_20230927.png b/MultiAgentEbook/images/scalable_multi-robot_collaboration_with_20230927.png new file mode 100755 index 000000000..d8d0cd2d9 Binary files /dev/null and b/MultiAgentEbook/images/scalable_multi-robot_collaboration_with_20230927.png differ diff --git a/MultiAgentEbook/images/scaling_large-language-model-based_multi-agent_collaboration_20240611.png b/MultiAgentEbook/images/scaling_large-language-model-based_multi-agent_collaboration_20240611.png new file mode 100755 index 000000000..5fcf618e7 Binary files /dev/null and b/MultiAgentEbook/images/scaling_large-language-model-based_multi-agent_collaboration_20240611.png differ diff --git a/MultiAgentEbook/images/self-organized_agents_a_llm_20240402.png b/MultiAgentEbook/images/self-organized_agents_a_llm_20240402.png new file mode 100755 index 000000000..b43396867 Binary files /dev/null and b/MultiAgentEbook/images/self-organized_agents_a_llm_20240402.png differ diff --git a/MultiAgentEbook/images/simulating_opinion_dynamics_with_20231116.png b/MultiAgentEbook/images/simulating_opinion_dynamics_with_20231116.png new file mode 100755 index 000000000..df45dd008 Binary files /dev/null and b/MultiAgentEbook/images/simulating_opinion_dynamics_with_20231116.png differ diff --git a/MultiAgentEbook/images/simulating_social_media_using_20231005.png b/MultiAgentEbook/images/simulating_social_media_using_20231005.png new file mode 100755 index 000000000..391d114a7 Binary files /dev/null and b/MultiAgentEbook/images/simulating_social_media_using_20231005.png differ diff --git a/MultiAgentEbook/images/simulation.png b/MultiAgentEbook/images/simulation.png new file mode 100755 index 000000000..fa788dd97 Binary files /dev/null and b/MultiAgentEbook/images/simulation.png differ diff --git a/MultiAgentEbook/images/simulation_cover.pdf b/MultiAgentEbook/images/simulation_cover.pdf new file mode 100644 index 000000000..9adafa3d9 Binary files /dev/null and b/MultiAgentEbook/images/simulation_cover.pdf differ diff --git a/MultiAgentEbook/images/simulation_cover.png b/MultiAgentEbook/images/simulation_cover.png new file mode 100755 index 000000000..8732c8ee7 Binary files /dev/null and b/MultiAgentEbook/images/simulation_cover.png differ diff --git a/MultiAgentEbook/images/social_simulacra_creating_populated_20220808.png b/MultiAgentEbook/images/social_simulacra_creating_populated_20220808.png new file mode 100755 index 000000000..3367c3dce Binary files /dev/null and b/MultiAgentEbook/images/social_simulacra_creating_populated_20220808.png differ diff --git a/MultiAgentEbook/images/strategyllm_large_language_models_20231115.png b/MultiAgentEbook/images/strategyllm_large_language_models_20231115.png new file mode 100755 index 000000000..cbeb2311c Binary files /dev/null and b/MultiAgentEbook/images/strategyllm_large_language_models_20231115.png differ diff --git a/MultiAgentEbook/images/the_impact_of_language_20240616.png b/MultiAgentEbook/images/the_impact_of_language_20240616.png new file mode 100755 index 000000000..aecc2c6de Binary files /dev/null and b/MultiAgentEbook/images/the_impact_of_language_20240616.png differ diff --git a/MultiAgentEbook/images/the_wisdom_of_partisan_20231116.png b/MultiAgentEbook/images/the_wisdom_of_partisan_20231116.png new file mode 100755 index 000000000..e424aa61c Binary files /dev/null and b/MultiAgentEbook/images/the_wisdom_of_partisan_20231116.png differ diff --git 
a/MultiAgentEbook/images/theory_of_mind_for_20231016.png b/MultiAgentEbook/images/theory_of_mind_for_20231016.png new file mode 100755 index 000000000..ca778f77c Binary files /dev/null and b/MultiAgentEbook/images/theory_of_mind_for_20231016.png differ diff --git a/MultiAgentEbook/images/tmp.pdf b/MultiAgentEbook/images/tmp.pdf new file mode 100644 index 000000000..96444fd76 Binary files /dev/null and b/MultiAgentEbook/images/tmp.pdf differ diff --git a/MultiAgentEbook/images/to_infinity_and_beyond_20230724.png b/MultiAgentEbook/images/to_infinity_and_beyond_20230724.png new file mode 100755 index 000000000..e142b54e3 Binary files /dev/null and b/MultiAgentEbook/images/to_infinity_and_beyond_20230724.png differ diff --git a/MultiAgentEbook/images/toward_optimal_llm_alignments_20240616.png b/MultiAgentEbook/images/toward_optimal_llm_alignments_20240616.png new file mode 100755 index 000000000..aed071618 Binary files /dev/null and b/MultiAgentEbook/images/toward_optimal_llm_alignments_20240616.png differ diff --git a/MultiAgentEbook/images/towards_detecting_llms_hallucination_20240605.png b/MultiAgentEbook/images/towards_detecting_llms_hallucination_20240605.png new file mode 100755 index 000000000..bfd631828 Binary files /dev/null and b/MultiAgentEbook/images/towards_detecting_llms_hallucination_20240605.png differ diff --git a/MultiAgentEbook/images/traveler_a_multi-lmm_agent_20240401.png b/MultiAgentEbook/images/traveler_a_multi-lmm_agent_20240401.png new file mode 100755 index 000000000..6f7f98abe Binary files /dev/null and b/MultiAgentEbook/images/traveler_a_multi-lmm_agent_20240401.png differ diff --git a/MultiAgentEbook/images/unleashing_the_emergent_cognitive_20230711.png b/MultiAgentEbook/images/unleashing_the_emergent_cognitive_20230711.png new file mode 100755 index 000000000..8e2a829e7 Binary files /dev/null and b/MultiAgentEbook/images/unleashing_the_emergent_cognitive_20230711.png differ diff --git a/MultiAgentEbook/images/unveiling_the_truth_and_20240226.png b/MultiAgentEbook/images/unveiling_the_truth_and_20240226.png new file mode 100755 index 000000000..49aa67a39 Binary files /dev/null and b/MultiAgentEbook/images/unveiling_the_truth_and_20240226.png differ diff --git a/MultiAgentEbook/images/user_behavior_simulation_with_20230605.png b/MultiAgentEbook/images/user_behavior_simulation_with_20230605.png new file mode 100755 index 000000000..73ea49b94 Binary files /dev/null and b/MultiAgentEbook/images/user_behavior_simulation_with_20230605.png differ diff --git a/MultiAgentEbook/images/using_large_language_models_20220818.png b/MultiAgentEbook/images/using_large_language_models_20220818.png new file mode 100755 index 000000000..40f5da75c Binary files /dev/null and b/MultiAgentEbook/images/using_large_language_models_20220818.png differ diff --git a/MultiAgentEbook/images/war_and_peace_(waragent)_20231128.png b/MultiAgentEbook/images/war_and_peace_(waragent)_20231128.png new file mode 100755 index 000000000..f2fa91c33 Binary files /dev/null and b/MultiAgentEbook/images/war_and_peace_(waragent)_20231128.png differ diff --git a/MultiAgentEbook/index.html b/MultiAgentEbook/index.html new file mode 100644 index 000000000..0083c6de0 --- /dev/null +++ b/MultiAgentEbook/index.html @@ -0,0 +1,329 @@ + + + + + + + + + + + + Multi-Agent Research Outline + + + + + + + + +
+
+ + +
+ background-pattern +
+

+ Comprehensive Outline of Large Language Model-based Multi-Agent Research +

+

+ This project presents an interactive eBook that compiles an extensive collection of research papers on + large language model (LLM)-based multi-agent systems. Organized into multiple chapters and + continuously updated with significant research, it strives to provide a comprehensive outline for + both researchers and enthusiasts in the field. We welcome ongoing contributions to expand and enhance + this resource. +

+

Initiated by the ChatDev Group at Tsinghua + University.

+ +
+
+ Cover +
+
+
+
+ + +
+
+ background-pattern +

Multi-Agent Directions

+
+

+ Multi-agent systems are currently classified into two categories based on whether the agents are designed to + achieve specific task goals under external human instructions: task-solving-oriented systems and + social-simulation-oriented systems. +

+
+
+
    +
  • Task Solving
  • +
  • Social Simulation
  • +
+
+
+ Comprehensive Resources +
+
+
+

+ Task solving-oriented multi-agent systems employ autonomous agents working collaboratively to tackle + complex problems. Cutting-edge research in this direction revolves around three primary areas: + facilitating communication among agents, designing effective organizational structures for interaction, + and exploring how agents co-evolve over time. +

+ Dataset cover +
+
+
+
+
+ Community Driven +
+
+
+

+ Social-simulation-oriented multi-agent systems concentrate on modeling and analyzing the social + behaviors of agents, offering valuable insights into human dynamics and enhancing the ability to analyze + or predict social phenomena. +

+ Dataset cover +
+
+
+
+
+
+ +
+
+
Dive into Each Chapter
+
+

+ This ebook contains research papers on the multi-agent layer and above, organized into multiple chapters based + on proposed core technologies. Let's dive into each section. +

+
+
+
+ Systems cover +

§1: Communication

+

facilitating agent communication

+ Read +
+
+ Benchmark cover +

§2: Organization

+

organizing agents effectively

+ Read +
+
+ Dataset cover +

§3: Evolution

+

growing capabilities over time

+ Read +
+
+ Systems cover +

§4: Simulation

+

simulating societal dynamics

+ Read +
+
+
+
+ +
+

Learn More

+
+

+ In addition to the aforementioned resources, we also feature recent research from our lab. If you find our work + of interest, we invite you to read, extend, or collaborate. +

+
+
+
+ +
+ +

iAgents

+

Bijective Social Networks of Humans and Agents +

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

IoA

+

Networking Heterogeneous Agents

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

ChatDev

+

Multi-Agent Collaboration for Software Development

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

AgentVerse

+

General-Purpose Multi-Agent Framework

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

Co-Learning

+

Cross-Task Experience Co-Learning for Mutual Growth

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

Co-Evolving

+

Continuous Experience Refinement over Time

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

MacNet

+

Exploring Collaborative Scaling Law

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

CTC

+

Cross-Team Multi-Agent Orchestration

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

ChatEval

+

Communication for Automated Evaluation

+ PDF IconPaper + GitHub IconCode +
+ +
+ +

AutoForm

+

Finding Effective Communication Protocols

+ PDF IconPaper + GitHub IconCode +
+ +
+
+
+ + +
+
+

Frequently Asked Questions

+
+
+ +

+ This ebook gathers leading research on LLM-powered multi-agent systems since 2023, categorized by key + perspectives in the field. As this area rapidly evolves, updates will be ongoing. +

+
+
+ +

+ We encourage open-source collaboration on this project. You can contribute by submitting a pull request with + detailed metadata for notable papers in the table. +

+
+
+ +

+ You can download all ebook content in CSV format directly from here. +

+
+
+
+
+ +
+

+ Initiated by the ChatDev Group, Tsinghua + University +
Contact us via qianc62@gmail.com +
+ Total PV +

+
+ + + + + \ No newline at end of file diff --git a/MultiAgentEbook/main.js b/MultiAgentEbook/main.js new file mode 100644 index 000000000..b94596195 --- /dev/null +++ b/MultiAgentEbook/main.js @@ -0,0 +1,42 @@ +const hamburger = document.querySelector(".hamburger-container"); +const tabNav = document.querySelector(".tab-nav"); +const tabNavList = document.querySelectorAll(".tab-nav li"); +const tabList = document.querySelectorAll(".tab-body"); +const questions = document.querySelectorAll(".question"); +const logoContainer = document.querySelector('.logo-container'); +let toggle = false; + +hamburger.addEventListener("click", function () { + const hamburger = document.querySelector(".hamburger"); + const navList = document.querySelector(".nav-list"); + toggle = !toggle; + let srcHam = "./images/icon-hamburger.svg"; + let srcClose = "./images/icon-close.svg"; + hamburger.src = toggle ? srcClose : srcHam; + navList.classList.toggle("active"); + logoContainer.classList.toggle('active'); + document.body.style.position = toggle ? 'fixed' : 'static'; +}); + +tabNavList.forEach((item, index, array) => { + item.addEventListener("click", () => { + tabNav.querySelector(".active").classList.remove("active"); + item.classList.add("active"); + + if (item.classList.contains("one")) { + tabList[0].classList.add("active"); + tabList[1].classList.remove("active"); + } + + if (item.classList.contains("two")) { + tabList[1].classList.add("active"); + tabList[0].classList.remove("active"); + } + }); +}); + +questions.forEach((item) => { + item.addEventListener("click", () => { + item.classList.toggle("open"); + }); +}); \ No newline at end of file diff --git a/MultiAgentEbook/organization.html b/MultiAgentEbook/organization.html new file mode 100644 index 000000000..7c3384af9 --- /dev/null +++ b/MultiAgentEbook/organization.html @@ -0,0 +1,159 @@ + + + + + + + + + + + §2: Organization + + + + + + + + + + + +
+
+
+
+
+
+
+ ← Back Homepage +
+

§2: Organization

+
+
+

+ Multi-agent organization emphasizes both the topological structures and workflow orchestration, facilitating enhanced collaboration and improved collective intelligence. Click on the ebook below to read. +

+
+ +
+ +
+
+
+
+
+ + + + + + + + + + + + +
TitleAuthorsAffiliationsLinkDate
+
+
+
+
+

+ Initiated by the ChatDev Group, Tsinghua + University +
Contact us via qianc62@gmail.com +

+
+ + + + \ No newline at end of file diff --git a/MultiAgentEbook/papers.csv b/MultiAgentEbook/papers.csv new file mode 100755 index 000000000..3cbd7158b --- /dev/null +++ b/MultiAgentEbook/papers.csv @@ -0,0 +1,1804 @@ +Title,Authors,Date,Abstract,Url,AwesomeListCategory,Categories,PaperIndex,Affiliation +(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts,"Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang",2024.5.20,"Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TRANSAGENTS, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that despite lower d-BLEU scores, translations from TRANSAGENTS are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge. We also highlight the strengths and limitations of TRANSAGENTS through case studies and suggest directions for future research.",https://arxiv.org/abs/2405.11804,Organization,Computation and Language (cs.CL),(perhaps)_beyond_human_translation_20240520,"Monash University, University of Macau, Tencent AI Lab" +(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts,"Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang",2024.5.20,"Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TRANSAGENTS, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that despite lower d-BLEU scores, translations from TRANSAGENTS are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge.
We also highlight the strengths and limitations of TRANSAGENTS through case studies and suggest directions for future research.",https://arxiv.org/abs/2405.11804,Simulation,Computation and Language (cs.CL),(perhaps)_beyond_human_translation_20240520,"Monash University, University of Macau, Tencent AI Lab" +360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System,"Shen Gao, Hao Li, Zhengliang Shi, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang",2024.4.8,"Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360◦ Assessment (360◦REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360◦ performance assessment method for multi-perspective performance evaluation with fine-grained assessment. To enhance the capability of agents in addressing complex tasks, we introduce a dual-level experience pool for agents to accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360◦REA.",https://arxiv.org/abs/2404.05569,Evolution,Artificial Intelligence (cs.AI),360°rea_towards_a_reusable_20240408,"University of Electronic Science and Technology of China, Shandong University, Renmin University of China, National University of Defense Technology, Tsinghua University" +Affordable Generative Agents,"Yangbin Yu, Qin Zhang, Junyou Li, Qiang Fu, Deheng Ye",2024.2.3,"The emergence of large language models (LLMs) has significantly advanced the simulation of believable interactive agents. However, the substantial cost of maintaining prolonged agent interactions poses a challenge to the deployment of believable LLM-based agents. Therefore, in this paper, we develop Affordable Generative Agents (AGA), a framework for enabling the generation of believable and low-cost interactions at both the agent-environment and inter-agent levels. Specifically, for agent-environment interactions, we substitute repetitive LLM inferences with learned policies; while for inter-agent interactions, we model the social relationships between agents and compress auxiliary dialogue information. Extensive experiments on multiple environments show the effectiveness and efficiency of our proposed framework. Also, we delve into the mechanisms of emergent believable behaviors lying in LLM agents, demonstrating that agents can only generate finite behaviors in fixed environments, based upon which, we understand ways to facilitate emergent interaction behaviors. Our code is publicly available at: https://github.com/AffordableGenerativeAgents/Affordable-Generative-Agents.",https://arxiv.org/abs/2402.02053,Evolution,Artificial Intelligence (cs.AI),affordable_generative_agents_20240203,Tencent Inc.
+Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu",2024.5.5,"In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the +entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by +large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness +within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can +simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep +accumulating experience from both successful and unsuccessful cases. Simulation experiments show that +the treatment performance of doctor agents consistently improves on various tasks. More interestingly, +the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare +benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), +the evolved doctor agent achieves a state-of-the-art accuracy of 9",https://arxiv.org/abs/2405.02957,Evolution,Artificial Intelligence (cs.AI),agent_hospital_a_simulacrum_20240505,Tsinghua University +Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu",2024.5.5,"In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the +entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by +large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness +within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can +simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep +accumulating experience from both successful and unsuccessful cases. Simulation experiments show that +the treatment performance of doctor agents consistently improves on various tasks. More interestingly, +the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare +benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), +the evolved doctor agent achieves a state-of-the-art accuracy of 9",https://arxiv.org/abs/2405.02957,Organization,Artificial Intelligence (cs.AI),agent_hospital_a_simulacrum_20240505,Tsinghua University +Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents,"Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu",2024.5.5,"In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the +entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by +large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness +within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can +simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep +accumulating experience from both successful and unsuccessful cases. Simulation experiments show that +the treatment performance of doctor agents consistently improves on various tasks. 
More interestingly, +the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare +benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), +the evolved doctor agent achieves a state-of-the-art accuracy of 9",https://arxiv.org/abs/2405.02957,Simulation,Artificial Intelligence (cs.AI),agent_hospital_a_simulacrum_20240505,Tsinghua University +AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems,"Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen",2023.10.13,"Recently, there has been an emergence of employing LLM-powered +agents as believable human proxies, based on their remarkable +decision-making capability. However, existing studies mainly focus +on simulating human dialogue. Human non-verbal behaviors, such +as item clicking in recommender systems, although implicitly ex- +hibiting user preferences and could enhance the modeling of users, +have not been deeply explored. The main reasons lie in the gap +between language modeling and behavior modeling, as well as the +incomprehension of LLMs about user-item relations. +To address this issue, we propose AgentCF for simulating user- +item interactions in recommender systems through agent-based +collaborative filtering. We creatively consider not only users but +also items as agents, and develop a collaborative learning approach +that optimizes both kinds of agents together. Specifically, at each +time step, we first prompt the user and item agents to interact au- +tonomously. Then, based on the disparities between the agents’ +decisions and real-world interaction records, user and item agents +are prompted to reflect on and adjust the misleading simulations +collaboratively, thereby modeling their two-sided relations. The op- +timized agents can also propagate their preferences to other agents +in subsequent interactions, implicitly capturing the collaborative fil- +tering idea. Overall, the optimized agents exhibit diverse interaction +behaviors within our framework, including user-item, user-user, +item-item, and collective interactions. The results show that these +agents can demonstrate personalized behaviors akin to those of real- +world individuals, sparking the development of next-generation +user behavior simulation.",https://arxiv.org/abs/2310.09233,Communication,Information Retrieval (cs.IR),agentcf_collaborative_learning_with_20231013,"Renmin University of China, UC San Diego, Tencent" +AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems,"Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen",2023.10.13,"Recently, there has been an emergence of employing LLM-powered +agents as believable human proxies, based on their remarkable +decision-making capability. However, existing studies mainly focus +on simulating human dialogue. Human non-verbal behaviors, such +as item clicking in recommender systems, although implicitly ex- +hibiting user preferences and could enhance the modeling of users, +have not been deeply explored. The main reasons lie in the gap +between language modeling and behavior modeling, as well as the +incomprehension of LLMs about user-item relations. +To address this issue, we propose AgentCF for simulating user- +item interactions in recommender systems through agent-based +collaborative filtering. 
We creatively consider not only users but +also items as agents, and develop a collaborative learning approach +that optimizes both kinds of agents together. Specifically, at each +time step, we first prompt the user and item agents to interact au- +tonomously. Then, based on the disparities between the agents’ +decisions and real-world interaction records, user and item agents +are prompted to reflect on and adjust the misleading simulations +collaboratively, thereby modeling their two-sided relations. The op- +timized agents can also propagate their preferences to other agents +in subsequent interactions, implicitly capturing the collaborative fil- +tering idea. Overall, the optimized agents exhibit diverse interaction +behaviors within our framework, including user-item, user-user, +item-item, and collective interactions. The results show that these +agents can demonstrate personalized behaviors akin to those of real- +world individuals, sparking the development of next-generation +user behavior simulation.",https://arxiv.org/abs/2310.09233,Simulation,Information Retrieval (cs.IR),agentcf_collaborative_learning_with_20231013,"Renmin University of China, UC San Diego, Tencent" +AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors,"Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou",2023.8.21,"Autonomous agents empowered by Large Language Models (LLMs) have under- +gone significant improvements, enabling them to generalize across a broad spec- +trum of tasks. However, in real-world scenarios, cooperation among individuals is +often required to enhance the efficiency and effectiveness of task accomplishment. +Hence, inspired by human group dynamics, we propose a multi-agent framework +AGENTVERSE that can effectively orchestrate a collaborative group of expert agents +as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that +AGENTVERSE can proficiently deploy multi-agent groups that outperform a single +agent. Extensive experiments on text understanding, reasoning, coding, tool utiliza- +tion, and embodied AI confirm the effectiveness of AGENTVERSE. Moreover, our +analysis of agent interactions within AGENTVERSE reveals the emergence of spe- +cific collaborative behaviors, contributing to heightened group efficiency. Our code +has been released at https://github.com/OpenBMB/AgentVerse/.",https://arxiv.org/abs/2308.10848,Communication,Computation and Language (cs.CL),agentverse_facilitating_multi-agent_collaboration_20230821,"Tsinghua University, Beijing University of Posts and Telecommunications, Tencent Inc." +AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors,"Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou",2023.8.21,"Autonomous agents empowered by Large Language Models (LLMs) have under- +gone significant improvements, enabling them to generalize across a broad spec- +trum of tasks. However, in real-world scenarios, cooperation among individuals is +often required to enhance the efficiency and effectiveness of task accomplishment. 
+Hence, inspired by human group dynamics, we propose a multi-agent framework +AGENTVERSE that can effectively orchestrate a collaborative group of expert agents +as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that +AGENTVERSE can proficiently deploy multi-agent groups that outperform a single +agent. Extensive experiments on text understanding, reasoning, coding, tool utiliza- +tion, and embodied AI confirm the effectiveness of AGENTVERSE. Moreover, our +analysis of agent interactions within AGENTVERSE reveals the emergence of spe- +cific collaborative behaviors, contributing to heightened group efficiency. Our code +has been released at https://github.com/OpenBMB/AgentVerse/.",https://arxiv.org/abs/2308.10848,Simulation,Computation and Language (cs.CL),agentverse_facilitating_multi-agent_collaboration_20230821,"Tsinghua University, Beijing University of Posts and Telecommunications, Tencent Inc." +AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis,"Zhihao Fan, Jialong Tang, Wei Chen, Siyuan Wang, Zhongyu Wei, Jun Xi, Fei Huang, Jingren Zhou",2024.2.15,"The incorporation of Large Language Models +(LLMs) in healthcare marks a significant ad- +vancement. However, the application has pre- +dominantly been limited to discriminative and +question-answering tasks, which does not fully +leverage their interactive potential. To address +this limitation, our paper presents AI Hospital, +a framework designed to build a real-time in- +teractive diagnosis environment. To simulate +the procedure, we collect high-quality medical +records to create patient, examiner, and medical +director agents. AI Hospital is then utilized for +the interactive evaluation and collaboration of +LLMs. Initially, we create a Multi-View Medi- +cal Evaluation (MVME) benchmark where vari- +ous LLMs serve as intern doctors for interactive +diagnosis. Subsequently, to improve diagnostic +accuracy, we introduce a collaborative mech- +anism that involves iterative discussions and +a dispute resolution process under the supervi- +sion of the medical director. In our experiments, +we validate the reliability of AI Hospital. The +results not only explore the feasibility of apply +LLMs in clinical consultation but also confirm +the effectiveness of the dispute resolution fo- +cused collaboration method.",https://arxiv.org/abs/2402.09742,Simulation,Computation and Language (cs.CL),ai_hospital_interactive_evaluation_20240215,"Alibaba Inc., Huazhong University of Science and Technology, Fudan University" +Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates,"Haotian Wang, Xiyuan Du, Weijiang Yu, Qianglong Chen, Kun Zhu, Zheng Chu, Lian Yan, Yi Guan",2023.12.8,"Multi-agent debate systems are designed to derive accurate and consistent conclusions through adversarial interactions among agents. However, these systems often encounter challenges due to cognitive constraints, manifesting as (1) agents' obstinate adherence to incorrect viewpoints and (2) their propensity to abandon correct viewpoints. These issues are primarily responsible for the ineffectiveness of such debates. Addressing the challenge of cognitive constraints, we introduce a novel framework, the Multi-Agent Debate with Retrieval Augmented (MADRA). MADRA incorporates retrieval of prior knowledge into the debate process, effectively breaking cognitive constraints and enhancing the agents' reasoning capabilities. 
Furthermore, we have developed a self-selection module within this framework, enabling agents to autonomously select pertinent evidence, thereby minimizing the impact of irrelevant or noisy data. We have comprehensively tested and analyzed MADRA across six diverse datasets. The experimental results demonstrate that our approach significantly enhances performance across various tasks, proving the effectiveness of our proposed method.",https://arxiv.org/abs/2312.04854,Communication,Computation and Language (cs.CL),apollo's_oracle_retrieval-augmented_reasoning_20231208,"Harbin Institute of Technology, Sun Yat-sen University, Zhejiang University" +Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks,"Siyu Li, Jin Yang, Kui Zhao",2023.7.19,"As the capabilities of Large Language Models (LLMs) emerge, they not only assist in accomplishing traditional tasks within more efficient paradigms but also stimulate the evolution of social bots. Researchers have begun exploring the implementation of LLMs as the driving core of social bots, enabling more efficient and user-friendly completion of tasks like profile completion, social behavior decision-making, and social content generation. However, there is currently a lack of systematic research on the behavioral characteristics of LLMs-driven social bots and their impact on social networks. We have curated data from Chirper, a Twitter-like social network populated by LLMs-driven social bots and embarked on an exploratory study. Our findings indicate that: (1) LLMs-driven social bots possess enhanced individual-level camouflage while exhibiting certain collective characteristics; (2) these bots have the ability to exert influence on online communities through toxic behaviors; (3) existing detection methods are applicable to the activity environment of LLMs-driven social bots but may be subject to certain limitations in effectiveness. Moreover, we have organized the data collected in our study into the Masquerade-23 dataset, which we have publicly released, thus addressing the data void in the subfield of LLMs-driven social bots behavior datasets. Our research outcomes provide primary insights for the research and governance of LLMs-driven social bots within the research community.",https://arxiv.org/abs/2307.10337,Simulation,Social and Information Networks (cs.SI),are_you_in_a_20230719,Sichuan University +ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator,"Junda Zhu, Lingyong Yan, Haibo Shi, Dawei Yin, Lei Sha",2024.5.28,"Large language models (LLMs) are proven to +benefit a lot from retrieval-augmented genera- +tion (RAG) in alleviating hallucinations con- +fronted with knowledge-intensive questions. +RAG adopts information retrieval techniques +to inject external knowledge from semantic- +relevant documents as input contexts. How- +ever, due to today’s Internet being flooded with +numerous noisy and fabricating content, it is +inevitable that RAG systems are vulnerable +to these noises and prone to respond incor- +rectly. To this end, we propose to optimize +the retrieval-augmented GENERATOR with a +Adversarial Tuning Multi-agent system (ATM). +The ATM steers the GENERATOR to have a ro- +bust perspective of useful documents for ques- +tion answering with the help of an auxiliary +ATTACKER agent. The GENERATOR and the +ATTACKER are tuned adversarially for several +iterations. 
After rounds of multi-agent itera- +tive tuning, the GENERATOR can eventually +better discriminate useful documents amongst +fabrications. The experimental results verify +the effectiveness of ATM and we also observe +that the GENERATOR can achieve better perfor- +mance compared to state-of-the-art baselines.",https://arxiv.org/abs/2405.18111,Communication,Computation and Language (cs.CL),atm_adversarial_tuning_multi-agent_20240528,"Beihang University, Baidu Inc." +Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions,"Ruochen Zhao, Wenxuan Zhang, Yew Ken Chia, Deli Zhao, Lidong Bing",2024.5.30,"As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation +method that can provide robust evaluation results in a timely fashion. Currently, +as static benchmarks are prone to contamination concerns, users tend to trust +human voting platforms, such as Chatbot Arena. However, human annotations +require extensive manual efforts. To provide an automatic, robust, and trustworthy +evaluation framework, we innovatively propose the Auto-Arena of LLMs, which +automates the entire evaluation process with LLM agents. Firstly, an examiner +LLM devises queries. Then, a pair of candidate LLMs engage in a multi-round peer- +battle around the query, during which the LLM’s true performance gaps become +visible. Finally, a committee of LLM judges collectively discuss and determine the +winner, which alleviates bias and promotes fairness. In our extensive experiment +on the 17 newest LLMs, Auto-Arena shows the highest correlation with human +preferences, providing a promising alternative to human evaluation platforms.",https://arxiv.org/abs/2405.20267,Communication,Computation and Language (cs.CL),auto_arena_of_llms_20240530,"Nanyang Technological University, Alibaba Group, Singapore University of Technology and Design" +AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,"Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang",2023.8.16,"AutoGen2 is an open-source framework that allows developers to build LLM ap- +plications via multiple agents that can converse with each other to accomplish +tasks. AutoGen agents are customizable, conversable, and can operate in vari- +ous modes that employ combinations of LLMs, human inputs, and tools. Using +AutoGen, developers can also flexibly define agent interaction behaviors. Both +natural language and computer code can be used to program flexible conversation +patterns for different applications. AutoGen serves as a generic framework for +building diverse applications of various complexities and LLM capacities. Em- +pirical studies demonstrate the effectiveness of the framework in many example +applications, with domains ranging from mathematics, coding, question answer- +ing, operations research, online decision-making, entertainment, etc.",https://arxiv.org/abs/2308.08155,Organization,Artificial Intelligence (cs.AI),autogen_enabling_next-gen_llm_20230816,"Microsoft Research, Pennsylvania State University, University of Washington, Xidian University" +Autonomous Agents for Collaborative Task under Information Asymmetry,"Wei Liu, Chenxi Wang, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian",2024.6.21,"Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great +progress in solving complex tasks. 
It performs communication among agents within +the system to collaboratively solve tasks, under the premise of shared information. +However, when agents’ communication is leveraged to enhance human cooperation, +a new challenge arises due to information asymmetry, since each agent can only +access the information of its human user. Previous MAS struggle to complete tasks +under this condition. To address this, we propose a new MAS paradigm termed +iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human +social network is mirrored in the agent network, where agents proactively exchange +human information necessary for task resolution, thereby overcoming information +asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to +navigate agents’ communication towards effective information exchange. Together +with InfoNav, iAgents organizes human information in a mixed memory to provide +agents with accurate and comprehensive information for exchange. Additionally, +we introduce InformativeBench, the first benchmark tailored for evaluating LLM +agents’ task-solving ability under information asymmetry. Experimental results +show that iAgents can collaborate within a social network of 140 individuals +and 588 relationships, autonomously communicate over 30 turns, and retrieve +information from nearly 70,000 messages to complete tasks within 3 minutes.",https://arxiv.org/abs/2406.14928,Communication,Artificial Intelligence (cs.AI),autonomous_agents_for_collaborative_20240621,"Tsinghua University, Beijing University of Posts and Telecommunications" +Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation,"Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang",2023.10.2,"Recent breakthroughs in large language models (LLMs) have brought remark- +able success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption +is that the information processed by LLMs is consistently honest, neglecting the +pervasive deceptive or misleading information in human society and AI-generated +content. +This oversight makes LLMs susceptible to malicious manipulations, +potentially resulting in detrimental outcomes. This study utilizes the intricate +Avalon game as a testbed to explore LLMs’ potential in deceptive environments. +Avalon, full of misinformation and requiring sophisticated logic, manifests as a +“Game-of-Thoughts”. Inspired by the efficacy of humans’ recursive thinking and +perspective-taking in the Avalon game, we introduce a novel framework, Recur- +sive Contemplation (ReCon), to enhance LLMs’ ability to identify and counteract +deceptive information. ReCon combines formulation and refinement contempla- +tion processes; formulation contemplation produces initial thoughts and speech, +while refinement contemplation further polishes them. Additionally, we incor- +porate first-order and second-order perspective transitions into these processes +respectively. 
Specifically, the first-order allows an LLM agent to infer others’ +mental states, and the second-order involves understanding how others perceive +the agent’s mental state.......",https://arxiv.org/abs/2310.01320,Communication,Artificial Intelligence (cs.AI),avalon's_game_of_thoughts_20231002,"Tsinghua University, BIGAI, Technical University of Munich" +Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation,"Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang",2023.10.2,"Recent breakthroughs in large language models (LLMs) have brought remark- +able success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption +is that the information processed by LLMs is consistently honest, neglecting the +pervasive deceptive or misleading information in human society and AI-generated +content. +This oversight makes LLMs susceptible to malicious manipulations, +potentially resulting in detrimental outcomes. This study utilizes the intricate +Avalon game as a testbed to explore LLMs’ potential in deceptive environments. +Avalon, full of misinformation and requiring sophisticated logic, manifests as a +“Game-of-Thoughts”. Inspired by the efficacy of humans’ recursive thinking and +perspective-taking in the Avalon game, we introduce a novel framework, Recur- +sive Contemplation (ReCon), to enhance LLMs’ ability to identify and counteract +deceptive information. ReCon combines formulation and refinement contempla- +tion processes; formulation contemplation produces initial thoughts and speech, +while refinement contemplation further polishes them. Additionally, we incor- +porate first-order and second-order perspective transitions into these processes +respectively. Specifically, the first-order allows an LLM agent to infer others’ +mental states, and the second-order involves understanding how others perceive +the agent’s mental state.......",https://arxiv.org/abs/2310.01320,Organization,Artificial Intelligence (cs.AI),avalon's_game_of_thoughts_20231002,"Tsinghua University, BIGAI, Technical University of Munich" +BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis,"Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang",2024.4.23,"This paper presents BattleAgent, a detailed emulation demonstration system that +combines the Large Vision-Language Model (VLM) and Multi-Agent System +(MAS). This novel system aims to simulate complex dynamic interactions among +multiple agents, as well as between agents and their environments, over a period of +time. It emulates both the decision-making processes of leaders and the viewpoints +of ordinary participants, such as soldiers. The emulation showcases the current +capabilities of agents, featuring fine-grained multi-modal interactions between +agents and landscapes. It develops customizable agent structures to meet specific +situational requirements, for example, a variety of battle-related activities like +scouting and trench digging. These components collaborate to recreate historical +events in a lively and comprehensive manner while offering insights into the +thoughts and feelings of individuals from diverse viewpoints. The technological +foundations of BattleAgent establish detailed and immersive settings for historical +battles, enabling individual agents to partake in, observe, and dynamically respond +to evolving battle scenarios. 
This methodology holds the potential to substantially +deepen our understanding of historical events, particularly through individual +accounts. Such initiatives can also aid historical research, as conventional historical +narratives often lack documentation and prioritize the perspectives of decision- +makers, thereby overlooking the experiences of ordinary individuals. This biased +documentation results in a considerable gap in our historical understanding, as many +stories remain untold......",https://arxiv.org/abs/2404.15532,Simulation,Human-Computer Interaction (cs.HC),battleagent_multi-modal_dynamic_emulation_20240423,"Rutgers University, University of Michigan, University of Rochester" +Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication,"Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun",2024.2.28,"Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7\% improvement in reasoning efficiency for different LLMs, and up to a 72.7\% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication.",https://arxiv.org/abs/2402.18439,Communication,Computation and Language (cs.CL),beyond_natural_language_llms_20240228,"Tsinghua University, Tencent, Beijing University of Posts and Telecommunications" +Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication,"Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun",2024.2.28,"Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. 
We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7\% improvement in reasoning efficiency for different LLMs, and up to a 72.7\% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication.",https://arxiv.org/abs/2402.18439,Evolution,Computation and Language (cs.CL),beyond_natural_language_llms_20240228,"Tsinghua University, Tencent, Beijing University of Posts and Telecommunications" +Building Cooperative Embodied Agents Modularly with Large Language Models,"Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan",2023.7.5,"In this work, we address challenging multi-agent cooperation problems with de- +centralized control, raw sensory observations, costly communication, and multi- +objective tasks instantiated in various embodied environments. While previous re- +search either presupposes a cost-free communication channel or relies on a central- +ized controller with shared observations, we harness the commonsense knowledge, +reasoning ability, language comprehension, and text generation prowess of LLMs +and seamlessly incorporate them into a cognitive-inspired modular framework that +integrates with perception, memory, and execution. Thus building a Cooperative +Embodied Language Agent CoELA, who can plan, communicate, and cooperate +with others to accomplish long-horizon tasks efficiently. Our experiments on C- +WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong +planning-based methods and exhibit emergent effective communication. Though +current Open LMs like LLAMA-2 still underperform, we fine-tune a CoLLAMA +with data collected with our agents and show how they can achieve promising +performance. We also conducted a user study for human-agent interaction and +discovered that CoELA communicating in natural language can earn more trust and +cooperate more effectively with humans. Our research underscores the potential of +LLMs for future research in multi-agent cooperation. Videos can be found on the +project website https://vis-www.cs.umass.edu/Co-LLM-Agents/.",https://arxiv.org/abs/2307.02485,Communication,Artificial Intelligence (cs.AI),building_cooperative_embodied_agents_20230705,"University of Massachusetts Amherst, Tsinghua University, Shanghai Jiao Tong University, MIT, MIT-IBM Watson AI Lab" +"CAMEL: Communicative Agents for ""Mind"" Exploration of Large Language Model Society","Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem",2023.3.31,"The rapid advancement of chat-based language models has led to remarkable +progress in complex task-solving. However, their success heavily relies on human +input to guide the conversation, which can be challenging and time-consuming. +This paper explores the potential of building scalable techniques to facilitate au- +tonomous cooperation among communicative agents, and provides insight into +their “cognitive” processes. 
To address the challenges of achieving autonomous +cooperation, we propose a novel communicative agent framework named role- +playing . Our approach involves using inception prompting to guide chat agents +toward task completion while maintaining consistency with human intentions. We +showcase how role-playing can be used to generate conversational data for studying +the behaviors and capabilities of a society of agents, providing a valuable resource +for investigating conversational language models. In particular, we conduct com- +prehensive studies on instruction-following cooperation in multi-agent settings. +Our contributions include introducing a novel communicative agent framework, +offering a scalable approach for studying the cooperative behaviors and capabili- +ties of multi-agent systems, and open-sourcing our library to support research on +communicative agents and beyond: https://github.com/camel-ai/camel.",https://arxiv.org/abs/2303.17760,Communication,Artificial Intelligence (cs.AI),camel_communicative_agents_for_20230331,King Abdullah University of Science and Technology +Can Large Language Model Agents Simulate Human Trust Behaviors?,"Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li",2024.2.7,"Large Language Model (LLM) agents have been +increasingly adopted as simulation tools to model +humans in applications such as social science. +However, one fundamental question remains: can +LLM agents really simulate human behaviors? In +this paper, we focus on one of the most critical +behaviors in human interactions, trust, and aim to +investigate whether or not LLM agents can sim- +ulate human trust behaviors. We first find that +LLM agents generally exhibit trust behaviors, re- +ferred to as agent trust, under the framework of +Trust Games, which are widely recognized in be- +havioral economics. Then, we discover that LLM +agents can have high behavioral alignment with +humans regarding trust behaviors, particularly for +GPT-4, indicating the feasibility to simulate hu- +man trust behaviors with LLM agents. In addition, +we probe into the biases in agent trust and the +differences in agent trust towards agents and hu- +mans. We also explore the intrinsic properties of +agent trust under conditions including advanced +reasoning strategies and external manipulations. +We further offer important implications of our +discoveries for various scenarios where trust is +paramount. Our study provides new insights into +the behaviors of LLM agents and the fundamental +analogy between LLMs and humans.",https://arxiv.org/abs/2402.04559,Simulation,Artificial Intelligence (cs.AI),can_large_language_model_20240207,"KAUST, Illinois Institute of Technology, Pennsylvania State University, The University of Chicago, University of Oxford, California Institute of Technology" +Chain of Agents: Large Language Models Collaborating on Long-Context Tasks,"Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Ö. Arik",2024.6.4,"Addressing the challenge of effectively processing long contexts has become a critical issue for Large Language Models (LLMs). Two common strategies have emerged: 1) reducing the input length, such as retrieving relevant chunks by Retrieval-Augmented Generation (RAG), and 2) expanding the context window limit of LLMs. 
However, both strategies have drawbacks: input reduction has no guarantee of covering the part with needed information, while window extension struggles with focusing on the pertinent information for solving the task. To mitigate these limitations, we propose Chain-of-Agents (CoA), a novel framework that harnesses multi-agent collaboration through natural language to enable information aggregation and context reasoning across various LLMs over long-context tasks. CoA consists of multiple worker agents who sequentially communicate to handle different segmented portions of the text, followed by a manager agent who synthesizes these contributions into a coherent final output. CoA processes the entire input by interleaving reading and reasoning, and it mitigates long context focus issues by assigning each agent a short context. We perform comprehensive evaluation of CoA on a wide range of long-context tasks in question answering, summarization, and code completion, demonstrating significant improvements by up to 10% over strong baselines of RAG, Full-Context, and multi-agent LLMs.",https://arxiv.org/abs/2406.02818,Organization,Computation and Language (cs.CL),chain_of_agents_large_20240604,"Penn State University, Google Cloud AI Research" +ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation,"Zejun Wang, Jia Li, Ge Li, Zhi Jin",2023.11.1,"Large language models have shown good performances in generat- +ing code to meet human requirements. However, human require- +ments expressed in natural languages can be vague, incomplete, +and ambiguous, leading large language models to misunderstand +human requirements and make mistakes. Worse, it is difficult for a +human user to refine the requirement. To help human users refine +their requirements and improve large language models’ code gen- +eration performances, we propose ChatCoder: a method to refine +the requirements via chatting with large language models. We de- +sign a chat scheme in which the large language models will guide +the human users to refine their expression of requirements to be +more precise, unambiguous, and complete than before. Experiments +show that ChatCoder has improved existing large language models’ +performance by a large margin. Besides, ChatCoder has the advan- +tage over refine-based methods and LLMs fine-tuned via human +response.",https://arxiv.org/abs/2311.00272,Organization,Software Engineering (cs.SE),chatcoder_chat-based_refine_requirement_20231101,Peking University +ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun",2023.7.16,"Software development is a complex task that +necessitates cooperation among multiple mem- +bers with diverse skills. Numerous studies used +deep learning to improve specific phases in a +waterfall model, such as design, coding, and +testing. +However, the deep learning model +in each phase requires unique designs, lead- +ing to technical inconsistencies across various +phases, which results in a fragmented and in- +effective development process. In this paper, +we introduce ChatDev, a chat-powered soft- +ware development framework in which special- +ized agents driven by large language models +(LLMs) are guided in what to communicate +(via chat chain) and how to communicate (via +communicative dehallucination). 
These agents +actively contribute to the design, coding, and +testing phases through unified language-based +communication, with solutions derived from +their multi-turn dialogues. We found their uti- +lization of natural language is advantageous +for system design, and communicating in pro- +gramming language proves helpful in debug- +ging. This paradigm demonstrates how linguis- +tic communication facilitates multi-agent col- +laboration, establishing language as a unify- +ing bridge for autonomous task-solving among +LLM agents. The code and data are available +at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2307.07924,Communication,Software Engineering (cs.SE),chatdev_communicative_agents_for_20230716,"Tsinghua University, The University of Sydney, BUPT, Modelbest Inc." +ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun",2023.7.16,"Software development is a complex task that +necessitates cooperation among multiple mem- +bers with diverse skills. Numerous studies used +deep learning to improve specific phases in a +waterfall model, such as design, coding, and +testing. +However, the deep learning model +in each phase requires unique designs, lead- +ing to technical inconsistencies across various +phases, which results in a fragmented and in- +effective development process. In this paper, +we introduce ChatDev, a chat-powered soft- +ware development framework in which special- +ized agents driven by large language models +(LLMs) are guided in what to communicate +(via chat chain) and how to communicate (via +communicative dehallucination). These agents +actively contribute to the design, coding, and +testing phases through unified language-based +communication, with solutions derived from +their multi-turn dialogues. We found their uti- +lization of natural language is advantageous +for system design, and communicating in pro- +gramming language proves helpful in debug- +ging. This paradigm demonstrates how linguis- +tic communication facilitates multi-agent col- +laboration, establishing language as a unify- +ing bridge for autonomous task-solving among +LLM agents. The code and data are available +at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2307.07924,Organization,Software Engineering (cs.SE),chatdev_communicative_agents_for_20230716,"Tsinghua University, The University of Sydney, BUPT, Modelbest Inc." +ChatDev: Communicative Agents for Software Development,"Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun",2023.7.16,"Software development is a complex task that +necessitates cooperation among multiple mem- +bers with diverse skills. Numerous studies used +deep learning to improve specific phases in a +waterfall model, such as design, coding, and +testing. +However, the deep learning model +in each phase requires unique designs, lead- +ing to technical inconsistencies across various +phases, which results in a fragmented and in- +effective development process. In this paper, +we introduce ChatDev, a chat-powered soft- +ware development framework in which special- +ized agents driven by large language models +(LLMs) are guided in what to communicate +(via chat chain) and how to communicate (via +communicative dehallucination). 
These agents +actively contribute to the design, coding, and +testing phases through unified language-based +communication, with solutions derived from +their multi-turn dialogues. We found their uti- +lization of natural language is advantageous +for system design, and communicating in pro- +gramming language proves helpful in debug- +ging. This paradigm demonstrates how linguis- +tic communication facilitates multi-agent col- +laboration, establishing language as a unify- +ing bridge for autonomous task-solving among +LLM agents. The code and data are available +at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2307.07924,Simulation,Software Engineering (cs.SE),chatdev_communicative_agents_for_20230716,"Tsinghua University, The University of Sydney, BUPT, Modelbest Inc." +ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate,"Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu",2023.8.14,"Text evaluation has historically posed significant challenges, often demanding +substantial labor and time cost. With the emergence of large language models +(LLMs), researchers have explored LLMs’ potential as alternatives for human +evaluation. While these single-agent-based approaches show promise, experi- +mental results suggest that further advancements are needed to bridge the gap +between their current effectiveness and human-level evaluation quality. Recog- +nizing that best practices of human evaluation processes often involve multiple +human annotators collaborating in the evaluation, we resort to a multi-agent debate +framework, moving beyond single-agent prompting strategies. The multi-agent- +based approach enables a group of LLMs to synergize with an array of intelli- +gent counterparts, harnessing their distinct capabilities and expertise to enhance +efficiency and effectiveness in handling intricate tasks. In this paper, we con- +struct a multi-agent referee team called ChatEval to autonomously discuss and +evaluate the quality of generated responses from different models on open-ended +questions and traditional natural language generation (NLG) tasks. We derive +insights and lessons from practical scenarios where humans instigate group dis- +cussions for brainstorming and propose different communication strategies within +ChatEval......",https://arxiv.org/abs/2308.07201,Organization,Computation and Language (cs.CL),chateval_towards_better_llm-based_20230814,"Tsinghua University, Hong Kong University of Science and Technology, Peking University" +"CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving","Pei Chen, Boran Han, Shuai Zhang",2024.4.26,"Large Language Models (LLMs) have shown +great ability in solving traditional natural lan- +guage tasks and elementary reasoning tasks +with appropriate prompting techniques. How- +ever, their ability is still limited in solving com- +plicated science problems. In this work, we +aim to push the upper bound of the reason- +ing capability of LLMs by proposing a col- +laborative multi-agent, multi-reasoning-path +(CoMM) prompting framework. Specifically, +we prompt LLMs to play different roles in a +problem-solving team, and encourage differ- +ent role-play agents to collaboratively solve +the target task. In particular, we discover that +applying different reasoning paths for differ- +ent roles is an effective strategy to implement +few-shot prompting approaches in the multi- +agent scenarios. 
Empirical results demonstrate +the effectiveness of the proposed methods on +two college-level science problems over com- +petitive baselines. Our further analysis shows +the necessity of prompting LLMs to play dif- +ferent roles or experts independently. We re- +lease the code at: https://github.com/ +amazon-science/comm-prompt.",https://arxiv.org/abs/2404.17729,Organization,Computation and Language (cs.CL),"comm_collaborative_multi-agent,_multi-reasoning-path_20240426","Texas A&M University, Amazon Web Services" +CompeteAI: Understanding the Competition Dynamics in Large Language Model-based Agents,"Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie",2023.10.26,"Large language models (LLMs) have been widely +used as agents to complete different tasks, such +as personal assistance or event planning. While +most of the work has focused on cooperation +and collaboration between agents, little work +explores competition, another important mech- +anism that promotes the development of soci- +ety and economy. In this paper, we seek to ex- +amine the competition dynamics in LLM-based +agents. We first propose a general framework for +studying the competition between agents. Then, +we implement a practical competitive environ- +ment using GPT-4 to simulate a virtual town with +two types of agents, including restaurant agents +and customer agents. Specifically, the restaurant +agents compete with each other to attract more +customers, where competition encourages them +to transform, such as cultivating new operating +strategies. Simulation experiments reveal several +interesting findings at the micro and macro lev- +els, which align well with existing market and +sociological theories. We hope that the frame- +work and environment can be a promising testbed +to study the competition that fosters understand- +ing of society. Code is available at: https: +//github.com/microsoft/competeai.",https://arxiv.org/abs/2310.17512,Simulation,Artificial Intelligence (cs.AI),competeai_understanding_the_competition_20231026,"University of Science and Technology of China, Microsoft Research, William & Mary, Georgia Institute of Technology, Carnegie Mellon University" +"Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents","Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang",2023.2.3,"We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easy the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose ""Describe, Explain, Plan and Select"" (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction on initial LLM-generated plan by integrating description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, which is a trainable module that ranks parallel candidate sub-goals based on the estimated steps of completion, consequently refining the initial plan. 
Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performances. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the 𝙾𝚋𝚝𝚊𝚒𝚗𝙳𝚒𝚊𝚖𝚘𝚗𝚍 grand challenge with our approach.",https://arxiv.org/abs/2302.01560,Organization,Artificial Intelligence (cs.AI),"describe,_explain,_plan_and_20230203","Peking University, University of California Los Angeles, Beijing Institute for General Artificial Intelligence" +Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization,"Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang",2023.10.3,"Large language model (LLM) agents have been shown effective on a wide range +of tasks, and by ensembling multiple LLM agents, their performances could be +further improved. Existing approaches employ a fixed set of agents to interact +with each other in a static architecture, which limits their generalizability to vari- +ous tasks and requires strong human prior in designing these agents. In this work, +we propose to construct a strategic team of agents communicating in a dynamic +interaction architecture based on the task query. Specifically, we build a frame- +work named Dynamic LLM-Agent Network (DyLAN) for LLM-agent collabora- +tion on complicated tasks like reasoning and code generation. DyLAN enables +agents to interact for multiple rounds in a dynamic architecture with inference- +time agent selection and an early-stopping mechanism to improve performance +and efficiency. We further design an automatic agent team optimization algorithm +based on an unsupervised metric termed Agent Importance Score, enabling the +selection of best agents based on the contribution each agent makes. Empirically, +we demonstrate that DyLAN performs well in both reasoning and code generation +tasks with reasonable computational cost. DyLAN achieves 1",https://arxiv.org/abs/2310.02170,Organization,Computation and Language (cs.CL),dynamic_llm-agent_network_an_20231003,"Tsinghua University, Georgia Tech, Stanford University" +Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization,"Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang",2023.10.3,"Large language model (LLM) agents have been shown effective on a wide range +of tasks, and by ensembling multiple LLM agents, their performances could be +further improved. Existing approaches employ a fixed set of agents to interact +with each other in a static architecture, which limits their generalizability to vari- +ous tasks and requires strong human prior in designing these agents. In this work, +we propose to construct a strategic team of agents communicating in a dynamic +interaction architecture based on the task query. Specifically, we build a frame- +work named Dynamic LLM-Agent Network (DyLAN) for LLM-agent collabora- +tion on complicated tasks like reasoning and code generation. DyLAN enables +agents to interact for multiple rounds in a dynamic architecture with inference- +time agent selection and an early-stopping mechanism to improve performance +and efficiency. 
We further design an automatic agent team optimization algorithm based on an unsupervised metric termed Agent Importance Score, enabling the selection of best agents based on the contribution each agent makes. Empirically, we demonstrate that DyLAN performs well in both reasoning and code generation tasks with reasonable computational cost. DyLAN achieves 1",https://arxiv.org/abs/2310.02170,Evolution,Computation and Language (cs.CL),dynamic_llm-agent_network_an_20231003,"Tsinghua University, Georgia Tech, Stanford University" +EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities,"Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao",2023.10.16,"The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics. Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors are often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents’ decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent.",https://arxiv.org/abs/2310.10436,Organization,Artificial Intelligence (cs.AI),econagent_large_language_model-empowered_20231016,Tsinghua University +EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities,"Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao",2023.10.16,"The advent of artificial intelligence has led to a growing emphasis on data-driven modeling in macroeconomics, with agent-based modeling (ABM) emerging as a prominent bottom-up simulation paradigm. In ABM, agents (e.g., households, firms) interact within a macroeconomic environment, collectively generating market dynamics. Existing agent modeling typically employs predetermined rules or learning-based neural networks for decision-making. However, customizing each agent presents significant challenges, complicating the modeling of agent heterogeneity. Additionally, the influence of multi-period market dynamics and multifaceted macroeconomic factors are often overlooked in decision-making processes. In this work, we introduce EconAgent, a large language model-empowered agent with human-like characteristics for macroeconomic simulation. We first construct a simulation environment that incorporates various market dynamics driven by agents’ decisions regarding work and consumption. Through the perception module, we create heterogeneous agents with distinct decision-making mechanisms. Furthermore, we model the impact of macroeconomic trends using a memory module, which allows agents to reflect on past individual experiences and market dynamics. Simulation experiments show that EconAgent can make realistic decisions, leading to more reasonable macroeconomic phenomena compared to existing rule-based or learning-based agents. Our codes are released at https://github.com/tsinghua-fib-lab/ACL24-EconAgent.",https://arxiv.org/abs/2310.10436,Simulation,Artificial Intelligence (cs.AI),econagent_large_language_model-empowered_20231016,Tsinghua University +Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate,"Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi",2023.5.30,"Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives the research on cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to refine the solution with the feedback generated by itself iteratively. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of “tit for tat” and a judge manages the debate process to obtain a final solution. Clearly, our MAD framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation. Experiment results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that the adaptive break of debate and the modest level of “tit for tat” state are required for MAD to obtain good performance. Moreover, we find that LLMs might not be a fair judge if different LLMs are used for agents. Code is available at https://github.com/Skytliang/Multi-Agents-Debate.",https://arxiv.org/abs/2305.19118,Communication,Computation and Language (cs.CL),encouraging_divergent_thinking_in_20230530,"Tsinghua University, Shanghai Jiao Tong University, Tencent AI Lab" +Epidemic Modeling with Generative Agents,"Ross Williams, Niyousha Hosseinichimeh, Aritra Majumdar, Navid Ghaffarzadegan",2023.7.11,"This study offers a new paradigm of individual-level modeling to address the grand challenge of +incorporating human behavior in epidemic models. Using generative artificial intelligence in an +agent-based epidemic model, each agent is empowered to make its own reasonings and decisions +via connecting to a large language model such as ChatGPT. Through various simulation +experiments, we present compelling evidence that generative agents mimic real-world behaviors +such as quarantining when sick and self-isolation when cases rise. 
Collectively, the agents +demonstrate patterns akin to multiple waves observed in recent pandemics followed by an +endemic period. Moreover, the agents successfully flatten the epidemic curve. This study creates +potential to improve dynamic system modeling by offering a way to represent human brain, +reasoning, and decision making.",https://arxiv.org/abs/2307.04986,Simulation,Artificial Intelligence (cs.AI),epidemic_modeling_with_generative_20230711,Virginia Tech +Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate,"Kai Xiong, Xiao Ding, Yixin Cao, Ting Liu, Bing Qin",2023.5.19,"Large Language Models (LLMs) have shown +impressive capabilities in various applications, +but they still face various inconsistency issues. +Existing works primarily focus on the incon- +sistency issues within a single LLM, while we +complementarily explore the inter-consistency +among multiple LLMs for collaboration. To +examine whether LLMs can collaborate effec- +tively to achieve a consensus for a shared goal, +we focus on commonsense reasoning, and in- +troduce a formal debate framework (FORD) +to conduct a three-stage debate among LLMs +with real-world scenarios alignment: fair de- +bate, mismatched debate, and roundtable de- +bate. Through extensive experiments on var- +ious datasets, LLMs can effectively collabo- +rate to reach a consensus despite noticeable +inter-inconsistencies, but imbalances in their +abilities can lead to domination by superior +LLMs. Leveraging a more advanced LLM like +GPT-4 as an authoritative judge can boost col- +laboration performance. Our work contributes +to understanding the inter-consistency among +LLMs and lays the foundation for develop- +ing future collaboration methods. Codes and +data are available at https://github.com/Waste- +Wood/FORD.",https://arxiv.org/abs/2305.11595,Communication,Computation and Language (cs.CL),examining_inter-consistency_of_large_20230519,"Harbin Institute of Technology, Singapore Management University" +Experiential Co-Learning of Software-Developing Agents,"Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun",2023.12.28,"Recent advancements in large language mod- +els (LLMs) have brought significant changes +to various domains, especially through LLM- +driven autonomous agents. A representative +scenario is in software development, where +LLM agents demonstrate efficient collabora- +tion, task division, and assurance of software +quality, markedly reducing the need for man- +ual involvement. However, these agents fre- +quently perform a variety of tasks indepen- +dently, without benefiting from past experi- +ences, which leads to repeated mistakes and +inefficient attempts in multi-step task execu- +tion. To this end, we introduce Experiential Co- +Learning, a novel LLM-agent learning frame- +work in which instructor and assistant agents +gather shortcut-oriented experiences from their +historical trajectories and use these past expe- +riences for future task execution. The exten- +sive experiments demonstrate that the frame- +work enables agents to tackle unseen software- +developing tasks more effectively. We antici- +pate that our insights will guide LLM agents +towards enhanced autonomy and contribute +to their evolutionary growth in cooperative +learning. 
The code and data are available at +https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2312.17025,Evolution,Computation and Language (cs.CL),experiential_co-learning_of_software-developing_20231228,"Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens" +Experiential Co-Learning of Software-Developing Agents,"Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun",2023.12.28,"Recent advancements in large language mod- +els (LLMs) have brought significant changes +to various domains, especially through LLM- +driven autonomous agents. A representative +scenario is in software development, where +LLM agents demonstrate efficient collabora- +tion, task division, and assurance of software +quality, markedly reducing the need for man- +ual involvement. However, these agents fre- +quently perform a variety of tasks indepen- +dently, without benefiting from past experi- +ences, which leads to repeated mistakes and +inefficient attempts in multi-step task execu- +tion. To this end, we introduce Experiential Co- +Learning, a novel LLM-agent learning frame- +work in which instructor and assistant agents +gather shortcut-oriented experiences from their +historical trajectories and use these past expe- +riences for future task execution. The exten- +sive experiments demonstrate that the frame- +work enables agents to tackle unseen software- +developing tasks more effectively. We antici- +pate that our insights will guide LLM agents +towards enhanced autonomy and contribute +to their evolutionary growth in cooperative +learning. The code and data are available at +https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2312.17025,Organization,Computation and Language (cs.CL),experiential_co-learning_of_software-developing_20231228,"Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens" +Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View,"Jintian Zhang, Xin Xu, Ningyu Zhang, Ruibo Liu, Bryan Hooi, Shumin Deng",2023.10.3,"As Natural Language Processing (NLP) sys- +tems are increasingly employed in intricate so- +cial environments, a pressing query emerges: +Can these NLP systems mirror human-esque +collaborative intelligence, in a multi-agent so- +ciety consisting of multiple large language mod- +els (LLMs)? This paper probes the collabora- +tion mechanisms among contemporary NLP +systems by melding practical experiments with +theoretical insights. We fabricate four unique +‘societies’ comprised of LLM agents, where +each agent is characterized by a specific ‘trait’ +(easy-going or overconfident) and engages in +collaboration with a distinct ‘thinking pattern’ +(debate or reflection). +Through evaluating +these multi-agent societies on three benchmark +datasets, we discern that certain collaborative +strategies not only outshine previous top-tier +approaches but also optimize efficiency (using +fewer API tokens). Moreover, our results fur- +ther illustrate that LLM agents manifest human- +like social behaviors, such as conformity and +consensus reaching, mirroring foundational so- +cial psychology theories. In conclusion, we +integrate insights from social psychology to +contextualize the collaboration of LLM agents, +inspiring further investigations into the collab- +oration mechanism for LLMs. 
We have shared +our code and datasets1, hoping to catalyze fur- +ther research in this promising avenue.",https://arxiv.org/abs/2310.02124,Simulation,Computation and Language (cs.CL),exploring_collaboration_mechanisms_for_20231003,"Zhejiang University, National University of Singapore, NUS-NCS Joint Lab, Google DeepMind" +Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf,"Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu",2023.9.9,"Communication games, which we refer to as +incomplete information games that heavily de- +pend on natural language communication, hold +significant research value in fields such as eco- +nomics, social science, and artificial intelli- +gence. In this work, we explore the problem of +how to engage large language models (LLMs) +in communication games, and in response, pro- +pose a tuning-free framework. Our approach +keeps LLMs frozen, and relies on the retrieval +and reflection on past communications and ex- +periences for improvement. An empirical study +on the representative and widely-studied com- +munication game, “Werewolf”, demonstrates +that our framework can effectively play Were- +wolf game without tuning the parameters of the +LLMs. More importantly, strategic behaviors +begin to emerge in our experiments, suggest- +ing that it will be a fruitful journey to engage +LLMs in communication games and associated +domains.",https://arxiv.org/abs/2309.04658,Communication,Computation and Language (cs.CL),exploring_large_language_models_20230909,"Tsinghua University, Zhongguancun Laboratory" +Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf,"Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, Yang Liu",2023.9.9,"Communication games, which we refer to as +incomplete information games that heavily de- +pend on natural language communication, hold +significant research value in fields such as eco- +nomics, social science, and artificial intelli- +gence. In this work, we explore the problem of +how to engage large language models (LLMs) +in communication games, and in response, pro- +pose a tuning-free framework. Our approach +keeps LLMs frozen, and relies on the retrieval +and reflection on past communications and ex- +periences for improvement. An empirical study +on the representative and widely-studied com- +munication game, “Werewolf”, demonstrates +that our framework can effectively play Were- +wolf game without tuning the parameters of the +LLMs. More importantly, strategic behaviors +begin to emerge in our experiments, suggest- +ing that it will be a fruitful journey to engage +LLMs in communication games and associated +domains.",https://arxiv.org/abs/2309.04658,Organization,Computation and Language (cs.CL),exploring_large_language_models_20230909,"Tsinghua University, Zhongguancun Laboratory" +Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting,"Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan",2024.5.28,"The emergence of online recruitment services has revolutionized +the traditional landscape of job seeking and recruitment, neces- +sitating the development of high-quality industrial applications +to improve person-job fitting. Existing methods generally rely on +modeling the latent semantics of resumes and job descriptions and +learning a matching function between them. 
Inspired by the pow- +erful role-playing capabilities of Large Language Models (LLMs), +we propose to introduce a mock interview process between LLM- +played interviewers and candidates. The mock interview conver- +sations can provide additional evidence for candidate evaluation, +thereby augmenting traditional person-job fitting based solely on +resumes and job descriptions. However, characterizing these two +roles in online recruitment still presents several challenges, such +as developing the skills to raise interview questions, formulating +appropriate answers, and evaluating two-sided fitness. +To this end, we propose MockLLM, a novel applicable framework +that divides the person-job matching process into two modules: +mock interview generation and two-sided evaluation in handshake +protocol, jointly enhancing their performance through collaborative +behaviors between interviewers and candidates. We design a role- +playing framework as a multi-role and multi-behavior paradigm +to enable a single LLM agent to effectively behave with multiple +functions for both parties......",https://arxiv.org/abs/2405.18113,Organization,Computation and Language (cs.CL),facilitating_multi-role_and_multi-behavior_20240528,"Renmin University of China, BOSS Zhipin, King Abdullah University of Science and Technology, University of Electronic Science and Technology of China" +GameGPT: Multi-agent Collaborative Framework for Game Development,"Dake Chen, Hanbin Wang, Yunhao Huo, Yuzhao Li, Haoyang Zhang",2023.10.12,"The large language model (LLM) based agents have demonstrated their capacity +to automate and expedite software development processes. In this paper, we +focus on game development and propose a multi-agent collaborative framework, +dubbed GameGPT, to automate game development. While many studies have +pinpointed hallucination as a primary roadblock for deploying LLMs in production, +we identify another concern: redundancy. Our framework presents a series of +methods to mitigate both concerns. These methods include dual collaboration and +layered approaches with several in-house lexicons, to mitigate the hallucination +and redundancy in the planning, task identification, and implementation phases. +Furthermore, a decoupling approach is also introduced to achieve code generation +with better precision.",https://arxiv.org/abs/2310.08067,Organization,Artificial Intelligence (cs.AI),gamegpt_multi-agent_collaborative_framework_20231012,"AutoGame Research, X-Institute, University of Southern California" +Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein",2023.4.7,"Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. 
We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.",https://arxiv.org/abs/2304.03442,Communication,Human-Computer Interaction (cs.HC),generative_agents_interactive_simulacra_20230407,"Stanford University, Google Research, Google DeepMind" +Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein",2023.4.7,"Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.",https://arxiv.org/abs/2304.03442,Organization,Human-Computer Interaction (cs.HC),generative_agents_interactive_simulacra_20230407,"Stanford University, Google Research, Google DeepMind" +Generative Agents: Interactive Simulacra of Human Behavior,"Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. 
Bernstein",2023.4.7,"Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.",https://arxiv.org/abs/2304.03442,Simulation,Human-Computer Interaction (cs.HC),generative_agents_interactive_simulacra_20230407,"Stanford University, Google Research, Google DeepMind" +Humanoid Agents: Platform for Simulating Human-like Generative Agents,"Zhilin Wang, Yu Ying Chiu, Yu Cheung Chiu",2023.10.9,"Just as computational simulations of atoms, molecules and cells have shaped the way we study the sciences, true-to-life simulations of human-like agents can be valuable tools for studying human behavior. We propose Humanoid Agents, a system that guides Generative Agents to behave more like humans by introducing three elements of System 1 processing: Basic needs (e.g. hunger, health and energy), Emotion and Closeness in Relationships. Humanoid Agents are able to use these dynamic elements to adapt their daily activities and conversations with other agents, as supported with empirical experiments. Our system is designed to be extensible to various settings, three of which we demonstrate, as well as to other elements influencing human behavior (e.g. empathy, moral values and cultural background). Our platform also includes a Unity WebGL game interface for visualization and an interactive analytics dashboard to show agent statuses over time.",https://arxiv.org/abs/2310.05418,Simulation,Computation and Language (cs.CL),humanoid_agents_platform_for_20231009,"University of Washington, NVIDIA, The University of Hong Kong" +Improving Factuality and Reasoning in Language Models through Multiagent Debate,"Yilun Du, Shuang Li, Antonio Torralba, Joshua B. 
Tenenbaum, Igor Mordatch",2023.5.23,"Large language models (LLMs) have demonstrated remarkable capabilities in language generation, understanding, and few-shot learning in recent years. An extensive body of work has explored how their performance may be further improved through the tools of prompting, ranging from verification, self-consistency, or intermediate scratchpads. In this paper, we present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer. Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks. We also demonstrate that our approach improves the factual validity of generated content, reducing fallacious answers and hallucinations that contemporary models are prone to. Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate. Overall, our findings suggest that such ""society of minds"" approach has the potential to significantly advance the capabilities of LLMs and pave the way for further breakthroughs in language generation and understanding. Project website at https://composable-models.github.io/llm_debate/.",https://arxiv.org/abs/2305.14325,Communication,Computation and Language (cs.CL),improving_factuality_and_reasoning_20230523,"MIT CSAIL, Google Brain"
Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback,"Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata",2023.5.17,"We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the possibility of creating strong AI agents with minimal human intervention. We ask two LLMs to negotiate with each other, playing the roles of a buyer and a seller, respectively. They aim to reach a deal with the buyer targeting a lower price and the seller a higher one. A third language model, playing the critic, provides feedback to a player to improve the player’s negotiation strategies. We let the two agents play multiple rounds, using previous negotiation history and AI feedback as in-context demonstrations to improve the model’s negotiation strategy iteratively. We use different LLMs (GPT and Claude) for different roles and use the deal price as the evaluation metric. Our experiments reveal multiple intriguing findings: (",https://arxiv.org/abs/2305.10142,Communication,Computation and Language (cs.CL),improving_language_model_negotiation_20230517,"University of Edinburgh, Allen Institute for AI, University of Edinburgh"
Improving Multi-Agent Debate with Sparse Communication Topology,"Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie",2024.6.17,"Multi-agent debate has proven effective in improving large language models quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute force algorithm – each agent can communicate with all other agents. In this paper, we systematically investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance while significantly reducing computational costs. Furthermore, we extend the multi-agent debate framework to multimodal reasoning and alignment labeling tasks, showcasing its broad applicability and effectiveness. Our findings underscore the importance of communication connectivity on enhancing the efficiency and effectiveness of the “society of minds” approach.",https://arxiv.org/abs/2406.11776,Organization,Computation and Language (cs.CL),improving_multi-agent_debate_with_20240617,"Google, Google DeepMind"
Improving Multi-Agent Debate with Sparse Communication Topology,"Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie",2024.6.17,"Multi-agent debate has proven effective in improving large language models quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute force algorithm – each agent can communicate with all other agents. In this paper, we systematically investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance while significantly reducing computational costs. Furthermore, we extend the multi-agent debate framework to multimodal reasoning and alignment labeling tasks, showcasing its broad applicability and effectiveness. Our findings underscore the importance of communication connectivity on enhancing the efficiency and effectiveness of the “society of minds” approach.",https://arxiv.org/abs/2406.11776,Communication,Computation and Language (cs.CL),improving_multi-agent_debate_with_20240617,"Google, Google DeepMind"
Iterative Experience Refinement of Software-Developing Agents,"Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun",2024.5.7,"Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents’ adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance......",https://arxiv.org/abs/2405.04219,Evolution,Computation and Language (cs.CL),iterative_experience_refinement_of_20240507,"Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens"
Iterative Experience Refinement of Software-Developing Agents,"Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun",2024.5.7,"Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents’ adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance......",https://arxiv.org/abs/2405.04219,Organization,Computation and Language (cs.CL),iterative_experience_refinement_of_20240507,"Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications, Siemens"
Language Agents as Digital Representatives in Collective Decision-Making,"Jarrett, Daniel and Pislar, Miruna and Bakker, Michiel A and Tessler, Michael Henry and Koster, Raphael and Balaguer, Jan and Elie, Romuald and Summerfield, Christopher and Tacchetti, Andrea",2023.11.8,"Consider the process of collective decision-making, in which a group of individuals interactively select a preferred outcome from among a universe of alternatives. In this context, “representation” is the activity of making an individual’s preferences present in the process via participation by a proxy agent—i.e. their “representative”. To this end, learned models of human behavior have the potential to fill this role, with practical implications for multi-agent scenario studies and mechanism design. In this work, we investigate the possibility of training language agents to behave in the capacity of representatives of human agents, appropriately expressing the preferences of those individuals whom they stand for. First, we formalize the setting of collective decision-making—as the episodic process of interaction between a group of agents and a decision mechanism. On this basis, we then formalize the problem of digital representation—as the simulation of an agent’s behavior to yield equivalent outcomes from the mechanism. Finally, we conduct an empirical case study in the setting of consensus-finding among diverse humans, and demonstrate the feasibility of fine-tuning large language models to act as digital representatives.",https://openreview.net/pdf?id=sv7KZcUqu1,Simulation,,language_agents_as_digital_20231108,Google DeepMind
Language Agents as Optimizable Graphs,"Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber",2024.2.26,"Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. ",https://arxiv.org/abs/2402.16823,Organization,Artificial Intelligence (cs.AI),language_agents_as_optimizable_20240226,"King Abdullah University of Science and Technology, The Swiss AI Lab IDSIA, USI, SUPSI"
Language Agents as Optimizable Graphs,"Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber",2024.2.26,"Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. ",https://arxiv.org/abs/2402.16823,Evolution,Artificial Intelligence (cs.AI),language_agents_as_optimizable_20240226,"King Abdullah University of Science and Technology, The Swiss AI Lab IDSIA, USI, SUPSI"
Large Language Models are Diverse Role-Players for Summarization Evaluation,"Ning Wu, Ming Gong, Linjun Shou, Shining Liang, Daxin Jiang",2023.3.27,"Text summarization has a wide range of applications in many scenarios. The evaluation of the quality of the generated text is a complex problem. A big challenge to language evaluation is that there is a clear divergence between existing metrics and human evaluation. A document summary’s quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most of the automatic evaluation methods like BLUE/ROUGE may be not able to adequately capture the above dimensions. In this paper, we propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects. First, we propose to model objective and subjective dimensions of generated text based on roleplayers prompting mechanism. Furthermore, we introduce a context-based prompting mechanism that is able to generate dynamic roleplayer profiles based on input context. Finally, we design a multi-roleplayer prompting technology based on batch prompting and integrate multiple outputs into the final evaluation results. Experimental results on three real datasets for summarization show that our model is highly competitive and has a very high consistency with human annotators.",https://arxiv.org/abs/2303.15078,Organization,Computation and Language (cs.CL),large_language_models_are_20230327,Microsoft
Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game,"Qianqiao Xu, Zhiliang Tian, Hongyan Wu, Zhen Huang, Yiping Song, Feng Liu, Dongsheng Li",2024.4.3,"With the enhanced performance of large models on natural language processing tasks, potential moral and ethical issues of large models arise. There exist malicious attackers who induce large models to jailbreak and generate information containing illegal, privacy-invasive information through techniques such as prompt engineering. As a result, large models counter malicious attackers’ attacks using techniques such as safety alignment. However, the strong defense mechanism of the large model through rejection replies is easily identified by attackers and used to strengthen attackers’ capabilities. In this paper, we propose a multi-agent attacker-disguiser game approach to achieve a weak defense mechanism that allows the large model to both safely reply to the attacker and hide the defense intent. First, we construct a multi-agent framework to simulate attack and defense scenarios, playing different roles to be responsible for attack, disguise, safety evaluation, and disguise evaluation tasks. After that, we design attack and disguise game algorithms to optimize the game strategies of the attacker and the disguiser and use the curriculum learning process to strengthen the capabilities of the agents. The experiments verify that the method in this paper is more effective in strengthening the model’s ability to disguise the defense intent compared with other methods. Moreover, our approach can adapt any black-box large model to assist the model in defense and does not suffer from model version iterations.",https://arxiv.org/abs/2404.02532,Organization,Artificial Intelligence (cs.AI),learn_to_disguise_avoid_20240403,"National University of Defense Technology, Guangdong University of Foreign Studies, "
Leveraging Large Language Models for Collective Decision-Making,"Marios Papachristou, Longqi Yang, Chin-Chia Hsu",2023.11.3,"In various work contexts, such as meeting scheduling, collaborating, and project planning, collective decision-making is essential but often challenging due to diverse individual preferences, varying work focuses, and power dynamics among members. To address this, we propose a system leveraging Large Language Models (LLMs) to facilitate group decision-making by managing conversations and balancing preferences among individuals. Our system aims to extract individual preferences from conversations and suggest options that satisfy the preferences of the members. We specifically apply this system to corporate meeting scheduling. We create synthetic employee profiles and simulate conversations at scale, leveraging LLMs to evaluate the system performance as a novel approach to conducting a user study. Our results indicate efficient coordination with reduced interactions between the members and the LLM-based system. The system refines and improves its proposed options over time, ensuring that many of the members' individual preferences are satisfied in an equitable way. Finally, we conduct a survey study involving human participants to assess our system's ability to aggregate preferences and reasoning about them. Our findings show that the system exhibits strong performance in both dimensions",https://arxiv.org/abs/2311.04928,Organization,Computation and Language (cs.CL),leveraging_large_language_models_20231103,"Cornell University, Microsoft"
LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay,"Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang",2023.10.23,"This paper explores the open research problem of understanding the social behaviors of LLM-based agents. Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay. While previous studies have touched on gameplay with LLM agents, research on their social behaviors is lacking. We propose a novel framework, tailored for Avalon, features a multi-agent system facilitating efficient communication and interaction. We evaluate its performance based on game success and analyze LLM agents’ social behaviors. Results affirm the framework’s effectiveness in creating adaptive agents and suggest LLM-based agents’ potential in navigating dynamic social interactions. By examining collaboration and confrontation behaviors, we offer insights into this field’s research and applications. Our code is publicly available at https://github.com/3DAgentWorld/LLM-Game-Agent",https://arxiv.org/abs/2310.14985,Communication,Computation and Language (cs.CL),llm-based_agent_society_investigation_20231023,"The Hong Kong University of Science and Technology (Guangzhou), Singapore University of Technology and Design, Singapore Management University, Verily Life Sciences, Tencent"
LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay,"Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang",2023.10.23,"This paper explores the open research problem of understanding the social behaviors of LLM-based agents. Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay. While previous studies have touched on gameplay with LLM agents, research on their social behaviors is lacking. We propose a novel framework, tailored for Avalon, features a multi-agent system facilitating efficient communication and interaction. We evaluate its performance based on game success and analyze LLM agents’ social behaviors. Results affirm the framework’s effectiveness in creating adaptive agents and suggest LLM-based agents’ potential in navigating dynamic social interactions. By examining collaboration and confrontation behaviors, we offer insights into this field’s research and applications. Our code is publicly available at https://github.com/3DAgentWorld/LLM-Game-Agent",https://arxiv.org/abs/2310.14985,Organization,Computation and Language (cs.CL),llm-based_agent_society_investigation_20231023,"The Hong Kong University of Science and Technology (Guangzhou), Singapore University of Technology and Design, Singapore Management University, Verily Life Sciences, Tencent"
LLM-Driven Agents for Influencer Selection in Digital Advertising Campaigns,"Xiaoqing Zhang, Xiuying Chen, Yuhan Liu, Jianzhou Wang, Zhenxing Hu, Rui Yan",2024.3.22,"In the digital world, influencers are pivotal as opinion leaders, shaping the views and choices of their influencees. Modern advertising often follows this trend, where marketers choose appropriate influencers for product endorsements, based on thorough market analysis. Previous studies on influencer selection have typically relied on numerical representations of individual opinions and interactions, a method that simplifies the intricacies of social dynamics. With the development of large language models (LLMs), we now have the opportunity to capture the nuanced exchanges of information within social networks. Hence, in this work, we first introduce an Influencer Dynamics Simulator (IDS), helping promoters identify and select the right influencers to market their products, based on LLM simulation. Concretely, we first propose an influencer-influencee engagement-based pre-selection module to screen potential influencer candidates. Subsequently, a simulation is constructed for these candidates and their influencees. Each user is represented as an LLM-based agent, drawing from their interaction history to deduce their profile and interests. The influencee agents will predict their behavior in response to influencer advertising. Finally, we develop a ranking metric designed to pinpoint influencers who are most likely to drive product purchases based on feedback from their influencees. To evaluate our framework, we collect a real-world advertising network dataset, including social relations, post and comment content, and user behaviors.......",https://arxiv.org/abs/2403.15105,Simulation,Social and Information Networks (cs.SI),llm-driven_agents_for_influencer_20240322,"Renmin University of China, King Abdullah University of Science and Technology, Moonshot AI"
LM vs LM: Detecting Factual Errors via Cross Examination,"Roi Cohen, May Hamri, Mor Geva, Amir Globerson",2023.5.22,"A prominent weakness of modern language models (LMs) is their tendency to generate factually incorrect text, which hinders their usability. A natural question is whether such factual errors can be detected automatically. Inspired by truth-seeking mechanisms in law, we propose a factuality evaluation framework for LMs that is based on cross-examination. Our key idea is that an incorrect claim is likely to result in inconsistency with other claims that the model generates. To discover such inconsistencies, we facilitate a multi-turn interaction between the LM that generated the claim and another LM (acting as an examiner) which introduces questions to discover inconsistencies. We empirically evaluate our method on factual claims made by multiple recent LMs on four benchmarks, finding that it outperforms existing methods and baselines, often by a large gap. Our results demonstrate the potential of using interacting LMs to capture factual errors.",https://arxiv.org/abs/2305.13281,Communication,Computation and Language (cs.CL),lm_vs_lm_detecting_20230522,"Tel Aviv University, Google DeepMind, Google Research"
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration,"Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang",2024.2.18,"Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows have been notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as lost in the middle. In this paper, we propose LONGAGENT, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority in long-text processing compared to GPT-",https://arxiv.org/abs/2402.11550,Organization,Computation and Language (cs.CL),longagent_scaling_language_models_20240218,Fudan University
Lyfe Agents: Generative agents for low-cost real-time social interactions,"Zhao Kaiya, Michelangelo Naim, Jovana Kondic, Manuel Cortes, Jiaxin Ge, Shuying Luo, Guangyu Robert Yang, Andrew Ahn",2023.10.3,"Highly autonomous generative agents powered by large language models promise to simulate intricate social behaviors in virtual societies. However, achieving real-time interactions with humans at a low computational cost remains challenging. Here, we introduce Lyfe Agents. They combine low-cost with real-time responsiveness, all while remaining intelligent and goal-oriented. Key innovations include: (1) an option-action framework, reducing the cost of high-level decisions; (2) asynchronous self-monitoring for better self-consistency; and (3) a Summarize-and-Forget memory mechanism, prioritizing critical memory items at a low cost. We evaluate Lyfe Agents' self-motivation and sociability across several multi-agent scenarios in our custom LyfeGame 3D virtual environment platform. When equipped with our brain-inspired techniques, Lyfe Agents can exhibit human-like self-motivated social reasoning. For example, the agents can solve a crime (a murder mystery) through autonomous collaboration and information exchange. Meanwhile, our techniques enabled Lyfe Agents to operate at a computational cost 10-100 times lower than existing alternatives. Our findings underscore the transformative potential of autonomous generative agents to enrich human social experiences in virtual worlds.",https://arxiv.org/abs/2310.02172,Evolution,Human-Computer Interaction (cs.HC),lyfe_agents_generative_agents_20231003,"Massachusetts Institute of Technology, Peking University, LyfeAL"
Lyfe Agents: Generative agents for low-cost real-time social interactions,"Zhao Kaiya, Michelangelo Naim, Jovana Kondic, Manuel Cortes, Jiaxin Ge, Shuying Luo, Guangyu Robert Yang, Andrew Ahn",2023.10.3,"Highly autonomous generative agents powered by large language models promise to simulate intricate social behaviors in virtual societies. However, achieving real-time interactions with humans at a low computational cost remains challenging. Here, we introduce Lyfe Agents. They combine low-cost with real-time responsiveness, all while remaining intelligent and goal-oriented. Key innovations include: (1) an option-action framework, reducing the cost of high-level decisions; (2) asynchronous self-monitoring for better self-consistency; and (3) a Summarize-and-Forget memory mechanism, prioritizing critical memory items at a low cost. We evaluate Lyfe Agents' self-motivation and sociability across several multi-agent scenarios in our custom LyfeGame 3D virtual environment platform. When equipped with our brain-inspired techniques, Lyfe Agents can exhibit human-like self-motivated social reasoning. For example, the agents can solve a crime (a murder mystery) through autonomous collaboration and information exchange. Meanwhile, our techniques enabled Lyfe Agents to operate at a computational cost 10-100 times lower than existing alternatives. Our findings underscore the transformative potential of autonomous generative agents to enrich human social experiences in virtual worlds.",https://arxiv.org/abs/2310.02172,Simulation,Human-Computer Interaction (cs.HC),lyfe_agents_generative_agents_20231003,"Massachusetts Institute of Technology, Peking University, LyfeAL"
MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents,"Yuan Li, Yixuan Zhang, Lichao Sun",2023.10.10,"Significant advancements have occurred in the application of Large Language Models (LLMs) for various tasks and social simulations. Despite this, their capacities to coordinate within task-oriented social contexts are under-explored. Such capabilities are crucial if LLMs are to effectively mimic human-like social behavior and produce meaningful results. To bridge this gap, we introduce collaborative generative agents, endowing LLM-based Agents with consistent behavior patterns and task-solving abilities. We situate these agents in a simulated job fair environment as a case study to scrutinize their coordination skills. We propose a novel framework that equips collaborative generative agents with human-like reasoning abilities and specialized skills. Our evaluation demonstrates that these agents show promising performance. However, we also uncover limitations that hinder their effectiveness in more complex coordination tasks. Our work provides valuable insights into the role and evolution of LLMs in task-oriented social simulations.",https://arxiv.org/abs/2310.06500,Organization,Artificial Intelligence (cs.AI),metaagents_simulating_interactions_of_20231010,"University of Cambridge, William & Mary, Lehigh University"
MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents,"Yuan Li, Yixuan Zhang, Lichao Sun",2023.10.10,"Significant advancements have occurred in the application of Large Language Models (LLMs) for various tasks and social simulations. Despite this, their capacities to coordinate within task-oriented social contexts are under-explored. Such capabilities are crucial if LLMs are to effectively mimic human-like social behavior and produce meaningful results. To bridge this gap, we introduce collaborative generative agents, endowing LLM-based Agents with consistent behavior patterns and task-solving abilities. We situate these agents in a simulated job fair environment as a case study to scrutinize their coordination skills. We propose a novel framework that equips collaborative generative agents with human-like reasoning abilities and specialized skills. Our evaluation demonstrates that these agents show promising performance. However, we also uncover limitations that hinder their effectiveness in more complex coordination tasks. Our work provides valuable insights into the role and evolution of LLMs in task-oriented social simulations.",https://arxiv.org/abs/2310.06500,Simulation,Artificial Intelligence (cs.AI),metaagents_simulating_interactions_of_20231010,"University of Cambridge, William & Mary, Lehigh University"
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework,"Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, Jürgen Schmidhuber",2023.8.1,"Remarkable progress has been made on automated problem solving through societies of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs. Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. On collaborative software engineering benchmarks, MetaGPT generates more coherent solutions than previous chat-based multi-agent systems. Our project can be found at https://github.com/geekan/MetaGPT",https://arxiv.org/abs/2308.00352,Organization,Artificial Intelligence (cs.AI),metagpt_meta_programming_for_20230801,"DeepWisdom, King Abdullah University of Science and Technology, Xiamen University, The Chinese University of Hong Kong (Shenzhen), Nanjing University, University of Pennsylvania, University of California, Berkeley, The Swiss AI Lab IDSIA/USI/SUPSI"
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework,"Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun",2024.3.20,"Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled Sora's performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority being closed-source. To address this gap, this paper proposes a new multi-agent framework Mora, which incorporates several advanced visual AI agents to replicate generalist video generation demonstrated by Sora. In particular, Mora can utilize multiple visual agents and successfully mimic Sora's video generation capabilities in various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extend generated videos, (4) video-to-video editing, (5) connect videos and (6) simulate digital worlds. Our extensive experimental results show that Mora achieves performance that is proximate to that of Sora in various tasks. However, there exists an obvious performance gap between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents.",https://arxiv.org/abs/2403.13248,Organization,Computer Vision and Pattern Recognition (cs.CV),mora_enabling_generalist_video_20240320,"Lehigh University, Microsoft Research"
Multi-Agent Software Development through Cross-Team Collaboration,"Zhuoyun Du, Chen Qian, Wei Liu, Zihao Xie, Yifei Wang, Yufan Dang, Weize Chen, Cheng Yang",2024.6.13,"The latest breakthroughs in Large Language Models (LLMs), e.g., ChatDev, have catalyzed profound transformations, particularly through multi-agent collaboration for software development. LLM agents can collaborate in teams like humans, and follow the waterfall model to sequentially work on requirements analysis, development, review, testing, and other phases to perform autonomous software generation. However, for an agent team, each phase in a single development process yields only one possible outcome. This results in the completion of only one development chain, thereby losing the opportunity to explore multiple potential decision paths within the solution space. Consequently, this may lead to obtaining suboptimal results. To address this challenge, we introduce Cross-Team Collaboration (CTC), a scalable multi-team framework that enables orchestrated teams to jointly propose various decisions and communicate with their insights in a cross-team collaboration environment for superior content generation. Experimental results in software development reveal a notable increase in quality compared to state-of-the-art baselines, underscoring the efficacy of our framework. The significant improvements in story generation demonstrate the promising generalization ability of our framework across various domains. We anticipate that our work will guide LLM agents towards a cross-team paradigm and contribute to their significant growth in but not limited to software development. The code and data will be available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2406.08979,Organization,Computation and Language (cs.CL),multi-agent_software_development_through_20240613,"Zhejiang University, Tsinghua University, Beijing University of Posts and Telecommunications"
MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate,"Alfonso Amayuelas, Xianjun Yang, Antonis Antoniades, Wenyue Hua, Liangming Pan, William Wang",2024.6.20,"Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents, enabling interactions among multiple models to execute complex tasks. Such collaborations offer several advantages, including the use of specialized models (e.g. coding), improved confidence through multiple computations, and enhanced divergent thinking, leading to more diverse outputs. Thus, the collaborative use of language models is expected to grow significantly in the coming years. In this work, we evaluate the behavior of a network of models collaborating through debate under the influence of an adversary.
We introduce pertinent +metrics to assess the adversary’s effectiveness, +focusing on system accuracy and model agree- +ment. Our findings highlight the importance +of a model’s persuasive ability in influencing +others. Additionally, we explore inference-time +methods to generate more compelling argu- +ments and evaluate the potential of prompt- +based mitigation as a defensive strategy.",https://arxiv.org/abs/2406.14711v1,Organization,Computation and Language (cs.CL),multiagent_collaboration_attack_investigating_20240620,"UC Santa Barbara, Rutgers University" +On Generative Agents in Recommendation,"An Zhang, Yuxin Chen, Leheng Sheng, Xiang Wang, Tat-Seng Chua",2023.10.16,"Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development. Addressing this challenge, we envision a recommendation simulator, capitalizing on recent breakthroughs in human-level intelligence exhibited by Large Language Models (LLMs). We propose Agent4Rec, a user simulator in recommendation, leveraging LLM-empowered generative agents equipped with user profile, memory, and actions modules specifically tailored for the recommender system. In particular, these agents' profile modules are initialized using real-world datasets (e.g. MovieLens, Steam, Amazon-Book), capturing users' unique tastes and social traits; memory modules log both factual and emotional memories and are integrated with an emotion-driven reflection mechanism; action modules support a wide variety of behaviors, spanning both taste-driven and emotion-driven actions. Each agent interacts with personalized recommender models in a page-by-page manner, relying on a pre-implemented collaborative filtering-based recommendation algorithm. We delve into both the capabilities and limitations of Agent4Rec, aiming to explore an essential research question: ``To what extent can LLM-empowered generative agents faithfully simulate the behavior of real, autonomous humans in recommender systems?'' Extensive and multi-faceted evaluations of Agent4Rec highlight both the alignment and deviation between agents and user-personalized preferences. Beyond mere performance comparison, we explore insightful experiments, such as emulating the filter bubble effect and discovering the underlying causal relationships in recommendation tasks.",https://arxiv.org/abs/2310.10108,Simulation,Information Retrieval (cs.IR),on_generative_agents_in_20231016,"National University of Singapore, Tsinghua University, University of Science and Technology of China" +"Out of One, Many: Using Language Models to Simulate Human Samples","Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, David Wingate",2022.9.14,"We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the ""algorithmic bias"" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property ""algorithmic fidelity"" and explore its extent in GPT-3. 
We create ""silicon samples"" by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.",https://arxiv.org/abs/2209.06899,Simulation,Machine Learning (cs.LG),out_of_one_many_20220914,Brigham Young University +PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games,"Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He",2024.4.26,"We propose PLAYER*, a novel framework that addresses the limitations of existing agent-based approaches built on Large Language Models (LLMs) in handling complex questions and understanding interpersonal relationships in dynamic environments. PLAYER* enhances path planning in Murder Mystery Games (MMGs) using an anytime sampling-based planner and a questioning-driven search framework. By equipping agents with a set of sensors, PLAYER* eliminates the need for pre-defined questions and enables agents to navigate complex social interactions. We additionally make a contribution by introducing a quantifiable evaluation method using multiple-choice questions and present WellPlay, a dataset containing 1,482 question-answer pairs. Experimental results demonstrate PLAYER*'s superiority over existing multi-agent methods, enhancing the generalisability and adaptability of agents in MMGs and paving the way for more effective multi-agent interactions.",https://arxiv.org/abs/2404.17662,Communication,Computation and Language (cs.CL),player_enhancing_llm-based_multi-agent_20240426,"King’s College London, Huawei London Research Centre, The Alan Turing Institute" +Quantifying the Impact of Large Language Models on Collective Opinion Dynamics,"Chao Li, Xing Su, Haoying Han, Cong Xue, Chunmo Zheng, Chao Fan",2023.8.7,"The process of opinion expression and exchange is a critical component of democratic societies. As people interact with large language models (LLMs) in the opinion shaping process different from traditional media, the impacts of LLMs are increasingly recognized and being concerned. However, the knowledge about how LLMs affect the process of opinion expression and exchange of social opinion networks is very limited. Here, we create an opinion network dynamics model to encode the opinions of LLMs, cognitive acceptability and usage strategies of individuals, and simulate the impact of LLMs on opinion dynamics in a variety of scenarios. The outcomes of the simulations inform about effective demand-oriented opinion network interventions. The results from this study suggested that the output opinion of LLMs has a unique and positive effect on the collective opinion difference. The marginal effect of cognitive acceptability on collective opinion formation is nonlinear and shows a decreasing trend. When people partially rely on LLMs, the exchange process of opinion becomes more intense and the diversity of opinion becomes more favorable. 
In fact, there is 38.6% more opinion diversity when people all partially rely on LLMs, compared to prohibiting the use of LLMs entirely. The optimal diversity of opinion was found when the fractions of people who do not use, partially rely on, and fully rely on LLMs reached roughly 4:12:1. Our experiments also find that introducing extra agents with opposite/neutral/random opinions, we can effectively mitigate the impact of biased/toxic output from LLMs. Our findings provide valuable insights into opinion dynamics in the age of LLMs, highlighting the need for customized interventions tailored to specific scenarios to address the drawbacks of improper output and use of LLMs.",https://arxiv.org/abs/2308.03313,Simulation,Social and Information Networks (cs.SI),quantifying_the_impact_of_20230807," Zhejiang University, Clemson University, " +ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs,"Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal",2023.9.22,"Large Language Models (LLMs) still struggle with natural language reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. ReConcile enhances collaborative reasoning between LLM agents via multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism that leads to a better consensus. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. Experiments on seven benchmarks demonstrate that ReConcile significantly improves LLMs' reasoning -- both individually and as a team -- surpassing prior single-agent and multi-agent baselines by up to 11.4% and even outperforming GPT-4 on three datasets. ReConcile also flexibly incorporates different combinations of agents, including API-based, open-source, and domain-specific models, leading to an 8% improvement on MATH. Finally, we analyze the individual components of ReConcile, demonstrating that the diversity originating from different models is critical to its superior performance.",https://arxiv.org/abs/2309.13007,Organization,Computation and Language (cs.CL),reconcile_round-table_conference_improves_20230922,UNC Chapel Hill +Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?,"Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, Yangqiu Song",2024.2.28,"Recent progress in LLMs discussion suggests +that multi-agent discussion improves the rea- +soning abilities of LLMs. In this work, we +reevaluate this claim through systematic experi- +ments, where we propose a novel group discus- +sion framework to enrich the set of discussion +mechanisms. Interestingly, our results show +that a single-agent LLM with strong prompts +can achieve almost the same performance as +the best existing discussion approach on a wide +range of reasoning tasks and backbone LLMs. +We observe that the multi-agent discussion per- +forms better than a single agent only when there +is no demonstration in the prompt. 
Further study reveals the common interaction mechanisms of LLMs during the discussion.",https://arxiv.org/abs/2402.18272,Organization,Computation and Language (cs.CL),rethinking_the_bounds_of_20240228,"Zhejiang University, HKUST, UIUC"
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models,"Zhao Mandi, Shreeya Jain, Shuran Song",2023.7.10,"We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They then generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach – it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility – in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See project website project-roco.github.io for videos and code.",https://arxiv.org/abs/2307.04738,Communication,Robotics (cs.RO),roco_dialectic_multi-robot_collaboration_20230710,Columbia University
S3: Social-network Simulation System with Large Language Model-Empowered Agents,"Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, Yong Li",2023.7.27,"Simulation plays a crucial role in addressing various challenges within social science. It offers extensive applications such as state prediction, phenomena explanation, and policy-making support, among others. In this work, we harness the human-like capabilities of large language models (LLMs) in sensing, reasoning, and behaving, and utilize these qualities to construct the S3 system (short for Social network Simulation System). Adhering to the widely employed agent-based simulation paradigm, we employ fine-tuning and prompt engineering techniques to ensure that the agent’s behavior closely emulates that of a genuine human within the social network. Specifically, we simulate three pivotal aspects: emotion, attitude, and interaction behaviors. By endowing the agent in the system with the ability to perceive the informational environment and emulate human actions, we observe the emergence of population-level phenomena, including the propagation of information, attitudes, and emotions. We conduct an evaluation encompassing two levels of simulation, employing real-world social network data. Encouragingly, the results demonstrate promising accuracy. This work represents an initial step in the realm of social network simulation empowered by LLM-based agents.
 We anticipate that our endeavors will serve as a source of inspiration for the development of simulation systems within, but not limited to, social science.",https://arxiv.org/abs/2307.14984,Simulation,Social and Information Networks (cs.SI),s3_social-network_simulation_system_20230727,Tsinghua University
Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?,"Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, Chuchu Fan",2023.9.27,"A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered.",https://arxiv.org/abs/2309.15943,Organization,Robotics (cs.RO),scalable_multi-robot_collaboration_with_20230927,"Massachusetts Institute of Technology, Harvard University, MIT-IBM Watson AI Lab"
Scaling Large-Language-Model-based Multi-Agent Collaboration,"Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun",2024.6.11,"Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MACNET), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MACNET consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies resembling small-world properties achieved superior performance.
 Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as scaling agents, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2406.07155,Organization,Artificial Intelligence (cs.AI),scaling_large-language-model-based_multi-agent_collaboration_20240611,"Tsinghua University, Beijing University of Posts and Telecommunications"
Scaling Large-Language-Model-based Multi-Agent Collaboration,"Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun",2024.6.11,"Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MACNET), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MACNET consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies resembling small-world properties achieved superior performance. Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as scaling agents, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev.",https://arxiv.org/abs/2406.07155,Communication,Artificial Intelligence (cs.AI),scaling_large-language-model-based_multi-agent_collaboration_20240611,"Tsinghua University, Beijing University of Posts and Telecommunications"
Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization,"Yoichi Ishibashi, Yoshimasa Nishimura",2024.4.2,"Recent advancements in automatic code generation using large language model (LLM) agent have brought us closer to the future of automated software development. However, existing single-agent approaches face limitations in generating and improving large-scale, complex codebases due to constraints in context length. To tackle this challenge, we propose Self-Organized multi-Agent framework (SoA), a novel multi-agent framework that enables the scalable and efficient generation and optimization of large-scale code. In SoA, self-organized agents operate independently to generate and modify code components while seamlessly collaborating to construct the overall codebase. A key feature of our framework is the automatic multiplication of agents based on problem complexity, allowing for dynamic scalability.
 This enables the overall code volume to be increased indefinitely according to the number of agents, while the amount of code managed by each agent remains constant. We evaluate SoA on the HumanEval benchmark and demonstrate that, compared to a single-agent system, each agent in SoA handles significantly less code, yet the overall generated code is substantially greater. Moreover, SoA surpasses the powerful single-agent baseline by 5%......",https://arxiv.org/abs/2404.02183,Organization,Software Engineering (cs.SE),self-organized_agents_a_llm_20240402,TsukushiAI
Simulating Opinion Dynamics with Networks of LLM-based Agents,"Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers",2023.11.16,"Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality. This bias limits their utility for understanding resistance to consensus views on issues like climate change. After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research. These insights highlight the promise and limitations of LLM agents in this domain and suggest a path forward: refining LLMs with real-world discourse to better simulate the evolution of human beliefs.",https://arxiv.org/abs/2311.09618,Simulation,Physics and Society (physics.soc-ph),simulating_opinion_dynamics_with_20231116,University of Wisconsin-Madison
Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms,"Petter Törnberg, Diliara Valeeva, Justus Uitermark, Christopher Bail",2023.10.5,"Social media is often criticized for amplifying toxic discourse and discouraging constructive conversations. But designing social media platforms to promote better conversations is inherently challenging. This paper asks whether simulating social media through a combination of Large Language Models (LLM) and Agent-Based Modeling can help researchers study how different news feed algorithms shape the quality of online conversations. We create realistic personas using data from the American National Election Study to populate simulated social media platforms. Next, we prompt the agents to read and share news articles — and like or comment upon each other’s messages — within three platforms that use different news feed algorithms. In the first platform, users see the most liked and commented posts from users whom they follow. In the second, they see posts from all users — even those outside their own network. The third platform employs a novel “bridging” algorithm that highlights posts that are liked by people with opposing political views. We find this bridging algorithm promotes more constructive, non-toxic, conversation across political divides than the other two models. 
Though further research is needed to +evaluate these findings, we argue that LLMs hold consid- +erable potential to improve simulation research on social +media and many other complex social settings.",https://arxiv.org/abs/2310.05984,Simulation,Social and Information Networks (cs.SI),simulating_social_media_using_20231005,"University of Amsterdam, Duke University" +Social Simulacra: Creating Populated Prototypes for Social Computing Systems,"Joon Sung Park, Lindsay Popowski, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein",2022.8.8,"Social computing prototypes probe the social behaviors that may +arise in an envisioned system design. This prototyping practice +is currently limited to recruiting small groups of people. Unfortu- +nately, many challenges do not arise until a system is populated +at a larger scale. Can a designer understand how a social system +might behave when populated, and make adjustments to the de- +sign before the system falls prey to such challenges? We intro- +duce social simulacra, a prototyping technique that generates a +breadth of realistic social interactions that may emerge when a so- +cial computing system is populated. Social simulacra take as input +the designer’s description of a community’s design—goal, rules, and +member personas—and produce as output an instance of that design +with simulated behavior, including posts, replies, and anti-social +behaviors. We demonstrate that social simulacra shift the behaviors +that they generate appropriately in response to design changes, and +that they enable exploration of “what if?” scenarios where commu- +nity members or moderators intervene. To power social simulacra, +we contribute techniques for prompting a large language model +to generate thousands of distinct community members and their +social interactions with each other; these techniques are enabled by +the observation that large language models’ training data already +includes a wide variety of positive and negative behavior on social +media platforms. In evaluations, we show that participants are of- +ten unable to distinguish social simulacra from actual community +behavior and that social computing designers successfully refine +their social computing designs when using social simulacra. +",https://arxiv.org/abs/2208.04024,Simulation,Human-Computer Interaction (cs.HC),social_simulacra_creating_populated_20220808,"Stanford University, Google Research" +"StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving","Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, Wai Lam",2023.11.15,"Most existing prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other instances and lack task-level consistency across the selected few-shot examples. To address these limitations, we propose a comprehensive framework, StrategyLLM, allowing LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts. It employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task. 
Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2\% → 38.8\%), commonsense reasoning (70.3\% → 72.5\%), algorithmic reasoning (73.7\% → 85.0\%), and symbolic reasoning (30.0\% → 79.2\%). Further analysis reveals that StrategyLLM is applicable to various LLMs and demonstrates advantages across numerous scenarios.",https://arxiv.org/abs/2311.08803,Organization,Computation and Language (cs.CL),strategyllm_large_language_models_20231115,"The Chinese University of Hong Kong, Sun Yat-sen University, Tencent AI Lab" +The Impact of Language on Arithmetic Proficiency- A Multilingual Investigation with Cross-Agent Checking Computation,"Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, Yusuke Miyao",2024.6.16,"This paper critically examines the arithmetic capabilities of Large Language Models (LLMs), uncovering significant limitations in their performance. Our research reveals a notable decline in accuracy for complex calculations involving large numbers, with addition and subtraction tasks showing varying degrees of proficiency. Additionally, we challenge the notion that arithmetic is language-independent, finding up to a 10% difference in performance across twenty languages. The study also compares self-verification methods with cross-agent collaborations, showing that a single model often outperforms collaborative approaches in basic arithmetic tasks. These findings suggest a need to reassess the effectiveness of LLMs in tasks requiring numerical accuracy and precision.",https://aclanthology.org/2024.naacl-short.53.pdf,Communication,,the_impact_of_language_20240616,"AIST, University of Tokyo" +The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents,"Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers",2023.11.16,"Human groups are able to converge on more accurate beliefs through deliberation, +even in the presence of polarization and partisan bias — a phenomenon known as +the “wisdom of partisan crowds.” Generated agents powered by Large Language +Models (LLMs) are increasingly used to simulate human collective behavior, yet +few benchmarks exist for evaluating their dynamics against the behavior of hu- +man groups. In this paper, we examine the extent to which the wisdom of partisan +crowds emerges in groups of LLM-based agents that are prompted to role-play +as partisan personas (e.g., Democrat or Republican). We find that they not only +display human-like partisan biases, but also converge to more accurate beliefs +through deliberation as humans do. We then identify several factors that interfere +with convergence, including the use of chain-of-thought prompt and lack of details +in personas. Conversely, fine-tuning on human data appears to enhance conver- +gence. 
These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence.",https://arxiv.org/abs/2311.09665,Simulation,Computation and Language (cs.CL),the_wisdom_of_partisan_20231116,University of Wisconsin-Madison
Theory of Mind for Multi-Agent Collaboration via Large Language Models,"Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, Katia Sycara",2023.10.16,"While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remains largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents’ planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.",https://arxiv.org/abs/2310.10701,Communication,Computation and Language (cs.CL),theory_of_mind_for_20231016,"University of Pittsburgh, Carnegie Mellon University"
To Infinity and Beyond- SHOW-1 and Showrunner Agents in Multi-Agent Simulations,"Philipp Maas, Frank Carey, Chris Wheeler, Edward Saatchi, Pete Billington, Jessica Yaffa Shamash",2023.7.24,"In this work we present our approach to generating high-quality episodic content for IP’s (Intellectual Property) using large language models (LLMs), custom state-of-the-art diffusion models and our multi-agent simulation for contextualization, story progression and behavioral control. Powerful LLMs such as GPT-4 were trained on a large corpus of TV show data which lets us believe that with the right guidance users will be able to rewrite entire seasons. “That Is What Entertainment Will Look Like. Maybe people are still upset about the last season of Game of Thrones. Imagine if you could ask your A.I. to make a new ending that goes a different way and maybe even put yourself in there as a main character or something.”",https://fablestudio.github.io/showrunner-agents/static/pdfs/To_Infinity_and_Beyond_SHOW-1_And_Showrunner_Agents_in_Multi_Agent_Simulations_v2.pdf,Simulation,,to_infinity_and_beyond_20230724,Fable Studio
Toward Optimal LLM Alignments Using Two-Player Games,"Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu",2024.6.16,"Alignment of large language models is a critical process designed to ensure that the model’s responses to user prompts accurately reflect human intentions and adhere to societal values. The standard Reinforcement Learning from Human Feedback (RLHF) framework primarily focuses on optimizing the performance of large language models using pre-collected prompts. However, collecting prompts that provide comprehensive coverage is both tedious and challenging, and often fails to include scenarios that LLMs need to improve on the most. 
In this paper, +we investigate alignment through the lens of two-agent games, involving iterative +interactions between an adversarial and a defensive agent. The adversarial agent’s +task at each step is to generate prompts that expose the weakness of the defensive +agent. In return, the defensive agent seeks to improve its responses to these newly +identified prompts it “struggled"" with, based on feedback from the reward model. +We theoretically demonstrate that this iterative reinforcement learning optimization +converges to a Nash Equilibrium for the game induced by the agents. Experi- +mental results in safety scenarios demonstrate that learning in such a competitive +environment not only fully trains agents but also leads to policies with enhanced +generalization capabilities for both adversarial and defensive agents. Our code is +released at https://github.com/ruizheng20/gpo.",https://arxiv.org/abs/2406.10977,Communication,Computation and Language (cs.CL),toward_optimal_llm_alignments_20240616,"Fudan University, Northwestern University, ByteDance Research" +Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework,"Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan",2024.6.5,"The advent of large language models (LLMs) +has facilitated the development of natural lan- +guage text generation. It also poses unprece- +dented challenges, with content hallucination +emerging as a significant concern. Existing +solutions often involve expensive and complex +interventions during the training process. More- +over, some approaches emphasize problem dis- +assembly while neglecting the crucial valida- +tion process, leading to performance degrada- +tion or limited applications. To overcome these +limitations, we propose a Markov Chain-based +multi-agent debate verification framework to +enhance hallucination detection accuracy in +concise claims. Our method integrates the fact- +checking process, including claim detection, +evidence retrieval, and multi-agent verification. +In the verification stage, we deploy multiple +agents through flexible Markov Chain-based +debates to validate individual claims, ensuring +meticulous verification outcomes. Experimen- +tal results across three generative tasks demon- +strate that our approach achieves significant +improvements over baselines.",https://arxiv.org/abs/2406.03075,Communication,Computation and Language (cs.CL),towards_detecting_llms_hallucination_20240605,"Peking University, Renmin University of China" +TraveLER: A Multi-LMM Agent Framework for Video Question-Answering,"Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig",2024.4.1,"Recently, Large Multimodal Models (LMMs) have made significant progress +in video question-answering using a frame-wise approach by leveraging +large-scale, image-based pretraining in a zero-shot manner. While image- +based methods for videos have shown impressive performance, a current +limitation is that they often overlook how key timestamps are selected and +cannot adjust when incorrect timestamps are identified. Moreover, they are +unable to extract details relevant to the question, instead providing general +descriptions of the frame. To overcome this, we design a multi-LMM agent +framework that travels along the video, iteratively collecting relevant in- +formation from keyframes through interactive question-asking until there +is sufficient information to answer the question. 
Specifically, we propose +TraveLER, a model that can create a plan to “Traverse” through the video, +ask questions about individual frames to “Locate” and store key informa- +tion, and then “Evaluate” if there is enough information to answer the +question. Finally, if there is not enough information, our method is able to +“Replan” based on its collected knowledge. Through extensive experiments, +we find that the proposed TraveLER approach improves performance on +several video question-answering benchmarks, such as NExT-QA, STAR, +and Perception Test, without the need to fine-tune on specific datasets.",https://arxiv.org/abs/2404.01476,Organization,Computer Vision and Pattern Recognition (cs.CV),traveler_a_multi-lmm_agent_20240401,"University of California, Berkeley" +Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration,"Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji",2023.7.11,"Human intelligence thrives on cognitive syn- +ergy, where collaboration among different +minds yield superior outcomes compared to iso- +lated individuals. In this work, we propose Solo +Performance Prompting (SPP), which trans- +forms a single LLM into a cognitive synergist +by engaging in multi-turn self-collaboration +with multiple personas. +A cognitive syner- +gist is an intelligent agent that collaboratively +combines multiple minds’ strengths and knowl- +edge to enhance problem-solving in complex +tasks. By dynamically identifying and simu- +lating different personas based on task inputs, +SPP unleashes the potential of cognitive syn- +ergy in LLMs. Our in-depth analysis shows +that assigning multiple fine-grained personas +in LLMs improves problem-solving abilities +compared to using a single or fixed number +of personas. We evaluate SPP on three chal- +lenging tasks: Trivia Creative Writing, Code- +names Collaborative, and Logic Grid Puzzle, +encompassing both knowledge-intensive and +reasoning-intensive types. Unlike previous +works, such as Chain-of-Thought, that solely +enhance the reasoning abilities in LLMs, ex- +perimental results demonstrate that SPP effec- +tively reduces factual hallucination, and main- +tains strong reasoning capabilities. Addition- +ally, comparative experiments show that cog- +nitive synergy only emerges in GPT-4 and +does not appear in less capable models, such +as GPT-",https://arxiv.org/abs/2307.05300,Organization,Artificial Intelligence (cs.AI),unleashing_the_emergent_cognitive_20230711,"University of Illinois Urbana-Champaign, Microsoft Research Asia" +Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation,"Xinyi Mou, Zhongyu Wei, Xuanjing Huang",2024.2.26,"Social media has emerged as a cornerstone of +social movements, wielding significant influ- +ence in driving societal change. Simulating +the response of the public and forecasting the +potential impact has become increasingly im- +portant. However, existing methods for simu- +lating such phenomena encounter challenges +concerning their efficacy and efficiency in cap- +turing the behaviors of social movement par- +ticipants. In this paper, we introduce a hybrid +framework HiSim for social media user simu- +lation, wherein users are categorized into two +types. Core users are driven by Large Lan- +guage Models, while numerous ordinary users +are modeled by deductive agent-based models. 
+We further construct a Twitter-like environment +to replicate their response dynamics following +trigger events. Subsequently, we develop a +multi-faceted benchmark SoMoSiMu-Bench +for evaluation and conduct comprehensive ex- +periments across real-world datasets. Exper- +imental results demonstrate the effectiveness +and flexibility of our method",https://arxiv.org/abs/2402.16333,Simulation,Computers and Society (cs.CY),unveiling_the_truth_and_20240226,"Fudan University, Shanghai Collaborative Innovation Center of Intelligent Visual Computing" +User Behavior Simulation with Large Language Model based Agents,"Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen",2023.6.5,"Simulating high quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of human decision process. Recently, substantial evidences have suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence. We believe these models can provide significant opportunities to more believable user behavior simulation. To inspire such direction, we propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans. Concerning potential applications, we simulate and study two social phenomenons including (1) information cocoons and (2) user conformity behaviors. This research provides novel simulation paradigms for human-centered applications.",https://arxiv.org/abs/2306.02552,Organization,Information Retrieval (cs.IR),user_behavior_simulation_with_20230605,"Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, University College London" +User Behavior Simulation with Large Language Model based Agents,"Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen",2023.6.5,"Simulating high quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of human decision process. Recently, substantial evidences have suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence. We believe these models can provide significant opportunities to more believable user behavior simulation. To inspire such direction, we propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans. Concerning potential applications, we simulate and study two social phenomenons including (1) information cocoons and (2) user conformity behaviors. 
This research provides novel simulation paradigms for human-centered applications.",https://arxiv.org/abs/2306.02552,Simulation,Information Retrieval (cs.IR),user_behavior_simulation_with_20230605,"Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, University College London" +Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies,"Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai",2022.8.18,"We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as GPT models, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model's simulation of a specific human behavior. Unlike the Turing Test, which involves simulating a single arbitrary individual, a TE requires simulating a representative sample of participants in human subject research. We carry out TEs that attempt to replicate well-established findings from prior studies. We design a methodology for simulating TEs and illustrate its use to compare how well different language models are able to reproduce classic economic, psycholinguistic, and social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing findings were replicated using recent models, while the last TE reveals a ""hyper-accuracy distortion"" present in some language models (including ChatGPT and GPT-4), which could affect downstream applications in education and the arts.",https://arxiv.org/abs/2208.10264,Simulation,Computation and Language (cs.CL),using_large_language_models_20220818,"Olin College of Engineering, Georgia Tech, Microsoft Research" +War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars,"Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang",2023.11.28,"Can we avoid wars at the crossroads of history? This question has been pursued by +individuals, scholars, policymakers, and organizations throughout human history. +In this research, we attempt to answer the question based on the recent advances +of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose +WarAgent, an LLM-powered multi-agent AI system, to simulate the participating +countries, their decisions, and the consequences, in historical international conflicts, +including the World War I (WWI), the World War II (WWII), and the Warring +States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, +we examine the advancements and limitations of cutting-edge AI systems’ abilities +in studying complex collective human behaviors such as international conflicts +under diverse settings. In these simulations, the emergent interactions among +agents also offer a novel perspective for examining the triggers and conditions that +lead to war. Our findings offer data-driven and AI-augmented insights that can +redefine how we approach conflict resolution and peacekeeping strategies. The +implications stretch beyond historical analysis, offering a blueprint for using AI to +understand human history and possibly prevent future international conflicts. 
Code +and data are available at https://github.com/agiresearch/WarAgent.",https://arxiv.org/abs/2311.17227,Simulation,Artificial Intelligence (cs.AI),war_and_peace_(waragent)_20231128,Rutgers University +War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars,"Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang",2023.11.28,"Can we avoid wars at the crossroads of history? This question has been pursued by +individuals, scholars, policymakers, and organizations throughout human history. +In this research, we attempt to answer the question based on the recent advances +of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose +WarAgent, an LLM-powered multi-agent AI system, to simulate the participating +countries, their decisions, and the consequences, in historical international conflicts, +including the World War I (WWI), the World War II (WWII), and the Warring +States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, +we examine the advancements and limitations of cutting-edge AI systems’ abilities +in studying complex collective human behaviors such as international conflicts +under diverse settings. In these simulations, the emergent interactions among +agents also offer a novel perspective for examining the triggers and conditions that +lead to war. Our findings offer data-driven and AI-augmented insights that can +redefine how we approach conflict resolution and peacekeeping strategies. The +implications stretch beyond historical analysis, offering a blueprint for using AI to +understand human history and possibly prevent future international conflicts. Code +and data are available at https://github.com/agiresearch/WarAgent.",https://arxiv.org/abs/2311.17227,Organization,Artificial Intelligence (cs.AI),war_and_peace_(waragent)_20231128,Rutgers University diff --git a/MultiAgentEbook/simulation.html b/MultiAgentEbook/simulation.html new file mode 100644 index 000000000..3c3bfe2aa --- /dev/null +++ b/MultiAgentEbook/simulation.html @@ -0,0 +1,161 @@ + + + + + + + + + + + §4: Simulation + + + + + + + + + + + +
+
+
+
+
+
+
+ ← Back Homepage +
+

§4: Simulation

+
+
+

+ Multi-agent social simulations employ agents to create digital mappings of real-world societies, thereby offering insights into various social behaviors and trends to facilitate the analysis and prediction of social phenomena. Click on the ebook below to read. +

+
+ +
+ +
+
+
+
+
+ + + + + + + + + + + + +
TitleAuthorsAffiliationsLinkDate
+
+
+
+
+

+ Initiated by the ChatDev Group, Tsinghua + University +
Contact us via qianc62@gmail.com +

+
+ + + + + + \ No newline at end of file diff --git a/MultiAgentEbook/style.css b/MultiAgentEbook/style.css new file mode 100644 index 000000000..3fd7530cf --- /dev/null +++ b/MultiAgentEbook/style.css @@ -0,0 +1,1188 @@ + +*, +*::after, +*::before { + margin: 0; + padding: 0; + box-sizing: border-box; +} + +:root { + --clr-primary--one: #660874; + --clr-primary--two: #D93379; + + --clr-neutral--one: hsl(229, 8%, 60%); + --clr-neutral--two: hsl(229, 31%, 21%); + + --clr-primary--paper: #D5635F; + --clr-primary--code: hsl(225, 2%, 35%); + +} + +html { + font-size: 62.5%; + +} + +body { + font-size: 18px; + font-family: "Rubik", sans-serif; +} + +html, +body { + overflow-x: hidden; +} + +.container { + max-width: 1440px; + width: 100%; + margin: 0 auto; +} + +img { + max-width: 100%; + display: block; +} + +.bg-pattern { + position: absolute; + z-index: -1; + opacity: 0.7; +} + +h1 { + color: var(--clr-neutral--two); + font-size: 6rem; + font-weight: 500; +} + +h2 { + font-weight: 500; + font-size: 4rem; + color: var(--clr-neutral--two); +} + +p { + color: var(--clr-neutral--one); + max-width: 45ch; + line-height: 1.6; +} + +.btn { + padding: 0.9em 2em; + border-radius: 0.5rem; + font-weight: 500; + color: #fff; + transition: all 0.3s; + text-decoration: none; + display: inline-block; + box-shadow: 0 1rem 2rem rgba(0, 0, 0, 0.1); + -webkit-border-radius: 0.5rem; + -moz-border-radius: 0.5rem; + -ms-border-radius: 0.5rem; + -o-border-radius: 0.5rem; + -webkit-transition: all 0.3s; + -moz-transition: all 0.3s; + -ms-transition: all 0.3s; + -o-transition: all 0.3s; +} + +.btnsmall { + padding: 0.9em 2em; + border-radius: 0.5rem; + font-weight: 200; + color: #fff; + transition: all 0.3s; + text-decoration: none; + display: inline-block; + box-shadow: 0 1rem 2rem rgba(0, 0, 0, 0.1); + -webkit-border-radius: 0.5rem; + -moz-border-radius: 0.5rem; + -ms-border-radius: 0.5rem; + -o-border-radius: 0.5rem; + -webkit-transition: all 0.3s; + -moz-transition: all 0.3s; + -ms-transition: all 0.3s; + -o-transition: all 0.3s; +} + +.btn.clr1 { + background-color: var(--clr-primary--two); + border: 2px solid var(--clr-primary--two); +} + +.btn.clr1:hover { + color: var(--clr-primary--two); + background-color: #fff; +} + +.btn.clr2 { + margin-top: 2rem; + background-color: var(--clr-primary--one); + border: 2px solid var(--clr-primary--one); +} + +.btn.clr2:hover { + margin-top: 2rem; + color: var(--clr-primary--one); + background-color: transparent; +} + +.btn.clr3 { + background-color: transparent; + border: 2px solid transparent; + color: var(--clr-neutral--one); +} + +.btn.clr3:hover { + border: 2px solid var(--clr-neutral--one); +} + +.btnsmall.paper { + margin-top: 2rem; + background-color: var(--clr-primary--paper); + border: 2px solid var(--clr-primary--paper); +} + +.btnsmall.paper:hover { + margin-top: 2rem; + color: var(--clr-primary--paper); + background-color: transparent; +} + +.btnsmall.code { + margin-top: 2rem; + background-color: var(--clr-primary--code); + border: 2px solid var(--clr-primary--code); +} + +.btnsmall.code:hover { + margin-top: 2rem; + color: var(--clr-primary--code); + background-color: transparent; +} + +.flex { + display: flex; +} + +.section-heading { + text-align: center; +} + +.section-description { + margin: 2.5rem auto; + text-align: left; +} + + + +header .container { + position: relative; +} + +header .bg-pattern { + bottom: 5%; + right: -55%; + max-width: 130rem; +} + +.navbar { + justify-content: space-between; + align-items: center; + padding: 4rem 0; +} + 
+.hamburger-container { + display: none; +} + +.nav-list { + list-style: none; + align-items: center; +} + +.nav-list .social-media-list { + display: none; +} + +.nav-list .nav-item:not(:first-child) { + margin-left: 5rem; +} + +.nav-item .nav-link { + text-decoration: none; + text-transform: uppercase; + font-size: 1.5rem; + transition: all 0.3s; + color: var(--clr-neutral--two); + letter-spacing: 2px; +} + +.nav-item .nav-link.btn { + color: #fff; +} + +.nav-item .nav-link:hover { + color: var(--clr-primary--two); +} + + + +.intro { + padding: 1rem 0; + align-items: center; +} + +.intro-col-left { + flex: 1; + border-radius: 200px; +} + + +.intro-col-right { + flex: 1; + border-radius: 200px; +} + +.intro-col-left h1 { + font-size: 5rem; +} + +.intro-col-left p { + margin: 3rem 0 4rem 0; +} + +.intro-col-left .btn-group .btn:nth-child(2) { + margin-left: 1.5rem; +} + + + +.feature { + padding: 15rem 0; +} + +.feature .container { + position: relative; +} + +.feature .bg-pattern { + transform: rotateY(180deg); + top: 58%; + left: -42%; + max-width: 110rem; + -webkit-transform: rotateY(180deg); + -moz-transform: rotateY(180deg); + -ms-transform: rotateY(180deg); + -o-transform: rotateY(180deg); +} + +.tab-nav { + justify-content: center; + list-style: none; + width: -moz-fit-content; + width: -webkit-fit-content; + width: fit-content; + border-bottom: 1px solid rgba(0, 0, 0, 0.1); + margin: 0 auto; +} + +.tab-nav li { + padding: 3rem 4rem; + cursor: pointer; + position: relative; +} + +.tab-nav li:not(:last-child) { + margin-right: 4rem; +} + +.tab-nav li.active::before { + content: ""; + position: absolute; + width: 100%; + height: 4px; + background-color: var(--clr-primary--two); + bottom: 0; + left: 50%; + transform: translateX(-50%); +} + +.tab-body { + align-items: center; + justify-content: center; + margin-top: 8rem; + display: none; + height: 40rem; + animation: fadein 0.8s; + -webkit-animation: fadein 0.8s; +} + +@keyframes fadein { + from { + opacity: 0; + transform: translateX(-2rem); + } + + to { + opacity: 1; + transform: translateX(0); + } +} + +.tab-body.active { + display: flex; +} + +.tab-body .tab-col-left, +.tab-body .tab-col-right { + flex: 1; +} + +.tab-body .tab-col-left img { + margin: 0 auto; +} + +.tab-col-right .content { + width: -moz-fit-content; + width: -webkit-fit-content; + width: fit-content; + margin: 0 auto; +} + +.tab-col-right p { + margin: 3rem 0 4rem; +} + +.cards_row { + padding: 10rem 0; +} + +.browser-cards { + margin: 8rem auto 0 auto; + + display: grid; + grid-template-columns: repeat(4, 1fr); + width: 120rem; + + gap: 2rem; + +} + + +.browser-cards .card { + text-align: center; + padding: 3rem 0; + box-shadow: 0 1.5rem 2rem rgb(238, 238, 238); + max-width: 35rem; + border-radius: 1.5 +} + +.card img { + margin: 0 auto; + height: 200px; +} + +.card h4 { + color: var(--clr-neutral--two); + font-size: 2.5rem; + font-weight: 500; + margin-top: 1rem; +} + +.card p { + margin-top: 1rem; +} + + + +.faq { + padding: 10rem 0; +} + +.faq-container { + width: 80%; + margin: 8rem auto; +} + +.question button { + width: 100%; + display: flex; + justify-content: space-between; + align-items: center; + padding: 2.5rem 2rem 2.5rem 0; + border: none; + outline: none; + background-color: transparent; + cursor: pointer; + color: var(--clr-neutral--two); + font-size: 2rem; + font-family: "Rubik", sans-serif; + font-weight: 500; + letter-spacing: 1px; + transition: all 0.3s; + text-align: left; + -webkit-transition: all 0.3s; + -moz-transition: all 0.3s; + 
-ms-transition: all 0.3s; + -o-transition: all 0.3s; +} + +.faq-container .question { + border-bottom: 1px solid var(--clr-neutral--one); +} + +.faq-container .question:last-child { + border-bottom: 1px solid var(--clr-neutral--one); +} + +.faq-container .question:hover button { + color: var(--clr-primary--two); +} + +.question p { + max-width: 100%; + padding: 0; + height: 0; + overflow: hidden; + transition: all 0.3s; + opacity: 0; + -webkit-transition: all 0.3s; + -moz-transition: all 0.3s; + -ms-transition: all 0.3s; + -o-transition: all 0.3s; +} + +.question button svg { + transition: all 0.3s; + min-width: 1.8rem; + margin-left: 2rem; + -webkit-transition: all 0.3s; + -moz-transition: all 0.3s; + -ms-transition: all 0.3s; + -o-transition: all 0.3s; +} + +.question.open button svg { + transform: rotate(180deg); + -webkit-transform: rotate(180deg); + -moz-transform: rotate(180deg); + -ms-transform: rotate(180deg); + -o-transform: rotate(180deg); +} + +.question.open button svg path { + stroke: var(--clr-primary--two); +} + +.question.open p { + height: auto; + opacity: 1; + padding-bottom: 2.5rem; +} + +.faq .center { + text-align: center; +} + + + +.subscribe { + padding: 7rem 0; + background-color: var(--clr-primary--one); + text-align: center; +} + +.subscribe .heading-sm { + text-transform: uppercase; + color: #fff; + letter-spacing: 5px; + font-size: 1.5rem; +} + +.subscribe h2 { + color: #fff; + letter-spacing: 1px; + margin: 4rem 0; +} + +.subscribe .subscribe-form { + justify-content: center; +} + +.subscribe-form input { + padding: 2rem; + width: 35rem; + border: none; + outline: none; + font-family: "Rubik", sans-serif; + border-radius: 0.5rem; + color: var(--clr-neutral--two); + -webkit-border-radius: 0.5rem; + -moz-border-radius: 0.5rem; + -ms-border-radius: 0.5rem; + -o-border-radius: 0.5rem; +} + +.subscribe-form input::placeholder { + color: var(--clr-neutral--one); + font-size: 1.5rem; + opacity: 0.7; +} + +.subscribe-form .submit { + font-family: "Rubik", sans-serif; + font-size: 1.5rem; + cursor: pointer; + letter-spacing: 1px; + margin-left: 1rem; +} + + + +footer { + background-color: var(--clr-neutral--two); + padding: 3rem 0; +} + +footer .container { + justify-content: space-between; + align-items: center; +} + +.footer-nav .logo { + margin-right: 6rem; +} + +.footer-nav .logo svg path { + fill: #fff; +} + +.footer-nav, +.social-media-list { + list-style: none; +} + +.footer-nav .nav-item .nav-link { + font-size: 1.3rem; + color: #fff; +} + +.footer-nav .nav-item .nav-link:hover { + color: var(--clr-primary--two); +} + +.footer-nav .nav-item:not(:last-child), +.social-media-list li:first-child { + margin-right: 4rem; +} + +.social-media-list svg path { + transition: all 0.3s; + -webkit-transition: all 0.3s; + -moz-transition: all 0.3s; + -ms-transition: all 0.3s; + -o-transition: all 0.3s; +} + +.social-media-list svg:hover path { + fill: var(--clr-primary--two); +} + + + +@media only screen and (min-width: 162.5em) { + + + .bg-pattern { + display: none; + } +} + +@media only screen and (max-width: 90em) { + + + header .bg-pattern { + right: -70%; + } + + .feature .bg-pattern { + left: -62rem; + } +} + +@media only screen and (max-width: 75em) { + + + h1 { + font-size: 5rem; + } + + h2 { + font-size: 3.5rem; + } + + p { + font-size: 1.7rem; + } + + header .bg-pattern { + right: -80%; + max-width: 120rem; + } + + .feature .bg-pattern { + left: -68rem; + } + + .tab-body .tab-col-left { + margin-right: 2rem; + } + + .browser-cards { + grid-gap: 2rem; + } +} + 
+@media only screen and (max-width: 64em) { + + + h1 { + font-size: 4.5rem; + } + + p { + font-size: 1.6rem; + } + + .btn { + font-size: 1.7rem; + } + + header .bg-pattern { + right: -85%; + max-width: 105rem; + } + + .feature .bg-pattern { + left: -72rem; + } + + .faq-container { + width: 70%; + } +} + +@media only screen and (max-width: 56.25em) { + + + header .bg-pattern { + right: -75%; + max-width: 85rem; + } + + h1 { + font-size: 3.5rem; + } + + h2 { + font-size: 3rem; + } + + p { + font-size: 1.5rem; + max-width: 35ch; + } + + .btn { + font-size: 1.4rem; + } + + .feature .bg-pattern { + top: 64%; + left: -57rem; + max-width: 85rem; + } + + .tab-nav li { + padding: 3rem; + } +} + +@media only screen and (max-width: 48em) { + + + h1 { + font-size: 5rem; + } + + p { + max-width: 50ch; + font-size: 1.8rem; + } + + .btn { + font-size: 1.6rem; + } + + header .bg-pattern { + top: 25%; + left: 20%; + max-width: 115rem; + } + + .navbar { + padding: 4rem 2rem; + z-index: 300; + position: relative; + } + + .navbar .nav-list { + position: fixed; + top: 0; + left: 0; + width: 100%; + height: 100vh; + background-color: hsla(229, 31%, 21%, 0.95); + opacity: 0; + pointer-events: none; + z-index: 150; + overflow-y: scroll; + -webkit-overflow-scrolling: touch; + } + + .nav-list.active { + flex-direction: column; + opacity: 1; + padding: 0 4rem; + pointer-events: all; + } + + .nav-list .nav-item { + width: 100%; + text-align: center; + } + + .nav-list .nav-item:not(:last-child) { + margin-left: 0; + border-top: 1px solid rgba(255, 255, 255, 0.2); + padding: 2.5rem 0; + } + + .nav-list .nav-item:first-child { + margin-top: 12rem; + } + + .nav-list .nav-item .nav-link { + color: #fff; + font-size: 1.8rem; + letter-spacing: 2px; + } + + .nav-item .nav-link.btn { + color: #fff; + width: 100%; + background-color: transparent; + border: 2px solid #fff; + padding: 0.8em 0; + margin: 4rem auto; + } + + .nav-list .social-media-list { + display: flex; + margin: auto 0 6rem; + } + + .logo-container { + z-index: 999999; + } + + .logo-container svg circle, + .logo-container svg circle+path, + .logo-container svg path { + transition: all 0.3s; + } + + .logo-container.active svg circle, + .logo-container.active svg path { + fill: #fff; + } + + .logo-container.active svg circle+path { + fill: #000; + } + + .hamburger-container { + display: block; + position: absolute; + top: 50%; + transform: translateY(-50%); + right: 2rem; + z-index: 200; + -webkit-transform: translateY(-50%); + -moz-transform: translateY(-50%); + -ms-transform: translateY(-50%); + -o-transform: translateY(-50%); + } + + .hamburger-container img { + width: 3.5rem; + } + + .intro { + flex-direction: column; + padding: 5rem 0; + } + + .intro-col-left { + order: 2; + text-align: center; + margin-top: 15rem; + } + + .feature { + padding: 8rem 0; + } + + .feature .bg-pattern { + top: 50%; + left: -54rem; + max-width: 100rem; + } + + .tab-container { + margin-top: 6rem; + } + + .tab-nav { + flex-direction: column; + align-items: center; + width: 100%; + text-align: center; + border: none; + } + + .tab-nav li:not(:last-child) { + margin-right: 0; + } + + .tab-nav li { + width: 90%; + border-top: 1px solid rgba(0, 0, 0, 0.1); + } + + .tab-nav li:last-child { + border-bottom: 1px solid rgba(0, 0, 0, 0.1); + } + + .tab-nav li.active::before { + width: 50%; + } + + .tab-body { + flex-direction: column; + text-align: center; + width: auto; + } + + .tab-body .tab-col-right { + margin-top: 10rem; + } + + .tab-body .tab-col-left { + margin-right: 0; + } + + 
.browser-cards { + grid-template-columns: 1fr; + grid-gap: 3rem; + } + + .browser-cards .card { + margin: 0 auto; + width: 100%; + } + + .browser-cards .card:nth-child(2), + .browser-cards .card:nth-child(3) { + transform: none; + -webkit-transform: none; + -moz-transform: none; + -ms-transform: none; + -o-transform: none; + } + + .faq-container { + width: 90%; + } + + .subscribe .subscribe-form { + flex-direction: column; + } + + .subscribe-form input { + width: 70%; + margin: 0 auto; + } + + .subscribe-form .submit { + width: 70%; + margin: 1.5rem auto; + } + + footer .container, + .footer-nav { + flex-direction: column; + } + + .footer-nav { + align-items: center; + margin: 0 0 3rem 0; + } + + .footer-nav .logo { + margin: 0 0 4rem 0; + } + + .footer-nav .nav-item .nav-link { + font-size: 1.8rem; + } + + .footer-nav .nav-item:not(:last-child) { + margin-right: 0; + margin-bottom: 4rem; + } + + footer .social-media-list { + margin-top: 3rem; + } +} + +@media only screen and (max-width: 36em) { + + + h1 { + font-size: 4rem; + } + + p { + max-width: 45ch; + font-size: 1.6rem; + } + + header .bg-pattern { + top: 23%; + right: -55%; + max-width: 100rem; + } + + .feature .bg-pattern { + top: 50%; + left: -58rem; + max-width: 100rem; + } + + .faq-container { + width: 95%; + } + + .question button { + font-size: 1.8rem; + } +} + +@media only screen and (max-width: 30em) { + + + h1 { + font-size: 3.5rem; + } + + h2 { + font-size: 2.5rem; + } + + .btn { + font-size: 1.3rem; + } + + header .bg-pattern { + top: 27%; + right: -55%; + max-width: 70rem; + } + + .hamburger-container img { + width: 3rem; + } + + .intro-col-left { + margin-top: 8rem; + } + + .feature .bg-pattern { + top: 52%; + left: -55rem; + max-width: 82rem; + } + + .question button { + font-size: 1.6rem; + } + + .subscribe h2 br { + display: none; + } + + .footer-nav .nav-item .nav-link { + font-size: 1.6rem; + } +} + +@media only screen and (max-width: 22em) { + + + h1 { + font-size: 2.8rem; + padding: 0; + } + + p { + font-size: 1.4rem; + } + + header .bg-pattern { + top: 22%; + right: -55%; + max-width: 60rem; + } + + .hamburger-container img { + width: 2.5rem; + } + + .intro-col-left { + margin-top: 8rem; + } + + .intro .btn { + max-width: 200px; + display: block; + margin: 0 auto; + } + + .intro-col-left .btn-group .btn:nth-child(2) { + margin: 2rem auto 0; + } + + .feature .bg-pattern { + top: 55%; + left: -35rem; + max-width: 60rem; + } + + .tab-nav li { + width: 100%; + font-size: 1.6rem; + } + + .subscribe-form input, + .subscribe-form .submit { + width: 100%; + } +} + + +.attribution { + padding: 1rem 0; + background-color: #272727; +} + +.attribution p { + max-width: 100%; + text-align: center; + color: #fff; +} + +.attribution a { + text-decoration: none; + color: #ff7a00; +} + +.paper-list { + padding: 50px 0; +} + +.paper-list .btn { + display: inline-block; + margin-bottom: 30px; +} + +.section-heading { + font-size: 2em; + margin-bottom: 20px; +} + + +.faq { + padding: 50px 0; +} + +.question { + margin-bottom: 20px; +} + +html { + scroll-behavior: smooth; +} + +.btnsmall { + display: inline-flex; + align-items: center; + text-decoration: none; + padding: 5px 10px; + border: 1px solid #ccc; + border-radius: 5px; +} + +.btnsmall .icon img { + width: 20px; + height: auto; + margin-right: 5px; + transition: opacity 0.3s ease; +} + +.btnsmall.paper:hover .icon img { + content: url('images/pdf.png'); +} + +.btnsmall.code:hover .icon img { + content: url('images/github.png'); +} +/* 自定义DataTables样式 +.dataTables_wrapper 
.dataTables_filter { + float: right; + text-align: right; +} + +.dataTables_wrapper .dataTables_length { + float: left; +} + +.dataTables_wrapper .dataTables_paginate { + float: right; + text-align: right; +} + +.dataTables_wrapper .dataTables_info { + float: left; + padding-top: 8px; +} + +.dataTables_wrapper .dataTables_processing { + background-color: #f3f3f3; + border: 1px solid #ddd; + padding: 10px; +} */ \ No newline at end of file diff --git a/MultiAgentEbook/transform_csv.py b/MultiAgentEbook/transform_csv.py new file mode 100644 index 000000000..861ebea73 --- /dev/null +++ b/MultiAgentEbook/transform_csv.py @@ -0,0 +1,36 @@ +import pandas as pd + +input_file = 'papers.csv' +df_raw = pd.read_csv(input_file, on_bad_lines='warn') + +cat2id = {'Communication':'1', + 'Organization':'2', + 'Evolution':'3', + 'Simulation':'4'} + +for cat in ['Communication','Evolution','Simulation','Organization']: + df = df_raw[df_raw['AwesomeListCategory'] == cat] + + new_df = pd.DataFrame(columns=['image_path','title','author','summary','affiliation']) + + index = 0 + + first_title = df.iloc[0]['Title'] + first_author = df.iloc[0]['Authors'] + first_affiliation = df.iloc[0]['Affiliation'] + first_summary = df.iloc[0]['Abstract'].replace("\n","") + first_cover_path = "./images/" + cat2id[cat] + "d.png" + + first_line = pd.DataFrame([[first_cover_path,first_title,first_author,first_summary,first_affiliation]], columns=['image_path','title','author','summary','affiliation']) + new_df = pd.concat([new_df, first_line], ignore_index=True) + image_path_list = df['PaperIndex'].tolist() + for _, line in df[1:].iterrows(): + print(line['Title']) + new_line = pd.DataFrame([["./images/{}.png".format(image_path_list[index]),line['Title'],line['Authors'],str(line['Abstract']).replace("\n",""),line['Affiliation']]], columns=['image_path','title','author','summary','affiliation']) + new_df = pd.concat([new_df, new_line], ignore_index=True) + index += 1 + + last_line = pd.DataFrame([["./images/{}.png".format(image_path_list[index]),"To be Continued...","Your Contributions are Welcome!","",""]], columns=['image_path','title','author','summary','affiliation']) + new_df = pd.concat([new_df, last_line], ignore_index=True) + + new_df.to_csv("./book_{}/data.csv".format(cat.lower())) diff --git a/README.md b/README.md index b933a603b..ffd8a56ca 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@

- 【English | Chinese | Japanese | Korean | Filipino | French | Slovak | Portuguese | Spanish | Dutch | Turkish | Hindi | Bahasa Indonesia】 + 【English | Chinese | Japanese | Korean | Filipino | French | Slovak | Portuguese | Spanish | Dutch | Turkish | Hindi | Bahasa Indonesia | Russian】

【📚 Wiki | 🚀 Visualizer | 👥 Community Built Software | 🔧 Customization | 👾 Discord】 @@ -27,15 +27,35 @@

## 🎉 News -* **January 25, 2024: We have integrated Experiential Co-Learning Module into ChatDev. Please see the [Experiential Co-Learning Guide](wiki.md#co-tracking).** +* **June 25, 2024: 🎉 To foster development in LLM-powered multi-agent collaboration🤖🤖 and related fields, the ChatDev team has curated a collection of seminal papers📄 presented in an [open-source](https://github.com/OpenBMB/ChatDev/tree/main/MultiAgentEbook) interactive e-book📚 format. Now you can explore the latest advancements on the [Ebook Website](https://thinkwee.top/multiagent_ebook) and download the [paper list](https://github.com/OpenBMB/ChatDev/blob/main/MultiAgentEbook/papers.csv).** +

+ +

+* June 12, 2024: We introduce Multi-Agent Collaboration Networks (MacNet) 🎉, which utilize directed acyclic graphs to facilitate effective task-oriented collaboration among agents through linguistic interactions 🤖🤖. MacNet supports cooperation across various topologies and among more than a thousand agents without exceeding context limits. More versatile and scalable, MacNet can be considered a more advanced version of ChatDev's chain-shaped topology. Our preprint paper is available at [https://arxiv.org/abs/2406.07155](https://arxiv.org/abs/2406.07155). This technique will soon be incorporated into this repository, enhancing support for diverse organizational structures and offering richer solutions beyond software development (e.g., logical reasoning, data analysis, story generation, and more). +

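To make the DAG-shaped collaboration above concrete, here is a minimal sketch (an illustration only, not MacNet's actual implementation): each agent is a stub function, the graph records which predecessors an agent listens to, and agents run in topological order so upstream outputs are available downstream. The agent names, the use of Python's `graphlib`, and the example task are assumptions made for the sketch.

```python
# Illustrative sketch of DAG-ordered agent collaboration (not MacNet's code).
# Each "agent" is a stub; a real system would call an LLM inside the agent.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def make_agent(name):
    def agent(task, upstream_outputs):
        context = " | ".join(upstream_outputs) if upstream_outputs else "no upstream input"
        return f"[{name}] refined '{task}' given: {context}"
    return agent

# Directed acyclic graph: each node maps to the set of agents it consumes output from.
dag = {
    "analyst": set(),
    "coder_a": {"analyst"},
    "coder_b": {"analyst"},
    "reviewer": {"coder_a", "coder_b"},
}
agents = {name: make_agent(name) for name in dag}

task = "design a 2048 game"
outputs = {}
for name in TopologicalSorter(dag).static_order():  # predecessors always finish first
    outputs[name] = agents[name](task, [outputs[p] for p in dag[name]])

print(outputs["reviewer"])
```

Because the graph is acyclic, the same loop covers chains, trees, or wider topologies without changing the agents themselves.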
+ +

+ +
+Old News + +* May 7, 2024: We introduced "Iterative Experience Refinement" (IER), a novel method in which instructor and assistant agents enhance shortcut-oriented experiences to adapt efficiently to new tasks. This approach encompasses experience acquisition, utilization, propagation, and elimination across a series of tasks. Our preprint paper is available at https://arxiv.org/abs/2405.04219, and this technique will soon be incorporated into ChatDev. +

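The elimination step mentioned above is implemented later in this diff by `ecl/ece.py`, which applies a usage-coverage rule to previously used memories and a quality-gain rule to newly acquired ones. The sketch below is a simplified, hedged illustration that collapses both rules onto a single list just to show the thresholds; the field names `use_time` and `valueGain` come from that script, while the list layout is an assumption.

```python
# Simplified sketch of the pruning rules in ecl/ece.py (thresholds as in that script).
point = 0.95                # cumulative-usage coverage cutoff
eliminate_threshold = 0.95  # minimum valueGain to survive the gain-based filter

def select_experiences(experiences):
    """experiences: list of dicts carrying 'use_time' and 'valueGain' (assumed layout)."""
    by_usage = sorted(experiences, key=lambda e: e["use_time"], reverse=True)
    total = sum(e["use_time"] for e in by_usage) or 1
    kept, covered = [], 0
    for exp in by_usage:                      # usage-coverage rule
        if covered / total >= point:
            break
        covered += exp["use_time"]
        kept.append(exp)
    for exp in experiences:                   # quality-gain rule
        if exp.get("valueGain", 0) >= eliminate_threshold and exp not in kept:
            kept.append(exp)
    return kept

print(select_experiences([
    {"use_time": 9, "valueGain": 0.20},
    {"use_time": 1, "valueGain": 0.99},
    {"use_time": 0, "valueGain": 0.10},
]))
```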
+ +

+ +* January 25, 2024: We have integrated Experiential Co-Learning Module into ChatDev. Please see the [Experiential Co-Learning Guide](wiki.md#co-tracking). + * December 28, 2023: We present Experiential Co-Learning, an innovative approach where instructor and assistant agents accumulate shortcut-oriented experiences to effectively solve new tasks, reducing repetitive errors and enhancing efficiency. Check out our preprint paper at https://arxiv.org/abs/2312.17025 and this technique will soon be integrated into ChatDev.

+ * November 15, 2023: We launched ChatDev as a SaaS platform that enables software developers and innovative entrepreneurs to build software efficiently at a very low cost and barrier to entry. Try it out at https://chatdev.modelbest.cn/.

+ * November 2, 2023: ChatDev now supports a new feature: incremental development, which allows agents to develop upon existing code. Try `--config "incremental" --path "[source_code_directory_path]"` to start it.

@@ -45,7 +65,7 @@

-- September 25, 2023: The **Git** mode is now available, enabling the programmer to utilize Git for version control. To enable this feature, simply set ``"git_management"`` to ``"True"`` in ``ChatChainConfig.json``. See [guide](wiki.md#git-mode). +* September 25, 2023: The **Git** mode is now available, enabling the programmer to utilize Git for version control. To enable this feature, simply set ``"git_management"`` to ``"True"`` in ``ChatChainConfig.json``. See [guide](wiki.md#git-mode).
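If you prefer to flip that switch from a script, a small sketch is shown below. The preset path `CompanyConfig/Default/ChatChainConfig.json` is an assumption (point it at whichever company preset you actually run), and the value is written as the string "True", following the note above.

```python
# Hedged sketch: enable Git mode by editing ChatChainConfig.json in a company preset.
# The path below is an assumption, not the only valid location.
import json
from pathlib import Path

config_path = Path("CompanyConfig/Default/ChatChainConfig.json")
config = json.loads(config_path.read_text(encoding="utf-8"))
config["git_management"] = "True"  # the guide above specifies the string "True"
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
print("git_management:", config["git_management"])
```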

@@ -62,6 +82,7 @@ mode are now supported. - July 16, 2023: The [preprint paper](https://arxiv.org/abs/2307.07924) associated with this project was published. - June 30, 2023: The initial version of the ChatDev repository was released. +
## ❓ What Can ChatDev Do? @@ -201,22 +222,12 @@ Made with [contrib.rocks](https://contrib.rocks). ## 🔎 Citation ``` -@misc{qian2023communicative, - title={Communicative Agents for Software Development}, - author={Chen Qian and Xin Cong and Wei Liu and Cheng Yang and Weize Chen and Yusheng Su and Yufan Dang and Jiahao Li and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun}, - year={2023}, - eprint={2307.07924}, - archivePrefix={arXiv}, - primaryClass={cs.SE} -} - -@misc{qian2023experiential, - title={Experiential Co-Learning of Software-Developing Agents}, - author={Chen Qian and Yufan Dang and Jiahao Li and Wei Liu and Weize Chen and Cheng Yang and Zhiyuan Liu and Maosong Sun}, - year={2023}, - eprint={2312.17025}, - archivePrefix={arXiv}, - primaryClass={cs.CL} +@article{chatdev, + title = {ChatDev: Communicative Agents for Software Development}, + author = {Chen Qian and Wei Liu and Hongzhang Liu and Nuo Chen and Yufan Dang and Jiahao Li and Cheng Yang and Weize Chen and Yusheng Su and Xin Cong and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun}, + journal = {arXiv preprint arXiv:2307.07924}, + url = {https://arxiv.org/abs/2307.07924}, + year = {2023} } ``` @@ -232,8 +243,8 @@ Made with [contrib.rocks](https://contrib.rocks).       - + ## 📬 Contact -If you have any questions, feedback, or would like to get in touch, please feel free to reach out to us via email at [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +If you have any questions, feedback, or would like to get in touch, please feel free to reach out to us via email at [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/camel/messages/chat_messages.py b/camel/messages/chat_messages.py index 7ca507239..a79d05a11 100644 --- a/camel/messages/chat_messages.py +++ b/camel/messages/chat_messages.py @@ -38,12 +38,14 @@ class ChatMessage(BaseMessage): for the message. role (str): The role of the message in OpenAI chat system. content (str): The content of the message. (default: :obj:`""`) + refusal (str): The refusal to build argument. """ role_name: str role_type: RoleType meta_dict: Optional[Dict[str, str]] role: str content: str = "" + refusal: str = None if openai_new_api: function_call: Optional[FunctionCall] = None tool_calls: Optional[ChatCompletionMessageToolCall] = None @@ -55,6 +57,7 @@ def set_user_role_at_backend(self: BaseMessage): meta_dict=self.meta_dict, role="user", content=self.content, + refusal=self.refusal, ) @@ -72,12 +75,14 @@ class AssistantChatMessage(ChatMessage): role (str): The role of the message in OpenAI chat system. (default: :obj:`"assistant"`) content (str): The content of the message. (default: :obj:`""`) + refusal (str): The refusal to build argument. """ role_name: str role_type: RoleType = RoleType.ASSISTANT meta_dict: Optional[Dict[str, str]] = None role: str = "user" content: str = "" + refusal: str = None @dataclass @@ -92,9 +97,11 @@ class UserChatMessage(ChatMessage): role (str): The role of the message in OpenAI chat system. (default: :obj:`"user"`) content (str): The content of the message. (default: :obj:`""`) + refusal (str): The refusal to build argument. 
""" role_name: str role_type: RoleType = RoleType.USER meta_dict: Optional[Dict[str, str]] = None role: str = "user" content: str = "" + refusal: str = None diff --git a/camel/model_backend.py b/camel/model_backend.py index 394171a32..c4b6b5879 100644 --- a/camel/model_backend.py +++ b/camel/model_backend.py @@ -90,8 +90,9 @@ def run(self, *args, **kwargs): "gpt-4": 8192, "gpt-4-0613": 8192, "gpt-4-32k": 32768, - "gpt-4-1106-preview": 4096, - "gpt-4-1106-vision-preview": 4096, + "gpt-4-turbo": 100000, + "gpt-4o": 4096, #100000 + "gpt-4o-mini": 16384, #100000 } num_max_token = num_max_token_map[self.model_type.value] num_max_completion_tokens = num_max_token - num_prompt_tokens @@ -122,6 +123,9 @@ def run(self, *args, **kwargs): "gpt-4": 8192, "gpt-4-0613": 8192, "gpt-4-32k": 32768, + "gpt-4-turbo": 100000, + "gpt-4o": 4096, #100000 + "gpt-4o-mini": 16384, #100000 } num_max_token = num_max_token_map[self.model_type.value] num_max_completion_tokens = num_max_token - num_prompt_tokens @@ -182,6 +186,8 @@ def create(model_type: ModelType, model_config_dict: Dict) -> ModelBackend: ModelType.GPT_4_32k, ModelType.GPT_4_TURBO, ModelType.GPT_4_TURBO_V, + ModelType.GPT_4O, + ModelType.GPT_4O_MINI, None }: model_class = OpenAIModel diff --git a/camel/typing.py b/camel/typing.py index e922c5b52..e94987811 100644 --- a/camel/typing.py +++ b/camel/typing.py @@ -48,8 +48,10 @@ class ModelType(Enum): GPT_3_5_TURBO_NEW = "gpt-3.5-turbo-16k" GPT_4 = "gpt-4" GPT_4_32k = "gpt-4-32k" - GPT_4_TURBO = "gpt-4-1106-preview" - GPT_4_TURBO_V = "gpt-4-1106-vision-preview" + GPT_4_TURBO = "gpt-4-turbo" + GPT_4_TURBO_V = "gpt-4-turbo" + GPT_4O = "gpt-4o" + GPT_4O_MINI = "gpt-4o-mini" STUB = "stub" diff --git a/camel/utils.py b/camel/utils.py index a2713af34..2989e6ce0 100644 --- a/camel/utils.py +++ b/camel/utils.py @@ -89,6 +89,8 @@ def num_tokens_from_messages( ModelType.GPT_4_32k, ModelType.GPT_4_TURBO, ModelType.GPT_4_TURBO_V, + ModelType.GPT_4O, + ModelType.GPT_4O_MINI, ModelType.STUB }: return count_tokens_openai_chat_models(messages, encoding) @@ -124,6 +126,10 @@ def get_model_token_limit(model: ModelType) -> int: return 128000 elif model == ModelType.STUB: return 4096 + elif model == ModelType.GPT_4O: + return 128000 + elif model == ModelType.GPT_4O_MINI: + return 128000 else: raise ValueError("Unknown model type") diff --git a/chatdev/eval_quality.py b/chatdev/eval_quality.py new file mode 100644 index 000000000..562370d6a --- /dev/null +++ b/chatdev/eval_quality.py @@ -0,0 +1,199 @@ +import os +import re +import signal +import subprocess +import time +import numpy as np +from openai import OpenAI + +client = OpenAI( + api_key='', + base_url="", +) + +def getFilesFromType(sourceDir, filetype): + files = [] + for root, directories, filenames in os.walk(sourceDir): + for filename in filenames: + if filename.endswith(filetype): + files.append(os.path.join(root, filename)) + return files + +def get_code(directory): + def _format_code(code): + code = "\n".join([line for line in code.split("\n") if len(line.strip()) > 0]) + return code + + codebooks = {} + filepaths = getFilesFromType(directory, ".py") + for filepath in filepaths: + filename = os.path.basename(filepath) + codebooks[filename] = _format_code(open(filepath, "r", encoding="utf-8").read()) + + code = "" + for filename in codebooks.keys(): + code += "{}\n```Python\n{}\n```\n\n".format(filename, codebooks[filename]) + + if len(code) == 0: + code = "# None" + + return code.strip() + +def get_completeness(directory): + assert os.path.isdir(directory) + vn = 
get_code(directory) + lines = vn.split("\n") + lines = [line for line in lines if + "password" not in line.lower() and "passenger" not in line.lower() and "passed" not in line.lower() and "passes" not in line.lower()] + lines = [line for line in lines if "pass" in line.lower() or "todo" in line.lower()] + if len(lines) > 0: + return 0.0 + return 1.0 + +def get_executability(directory): + assert os.path.isdir(directory) + def findFile(directory, target): + main_py_path = None + for subroot, _, filenames in os.walk(directory): + for filename in filenames: + if target in filename: + main_py_path = os.path.join(subroot, filename) + return main_py_path + + def exist_bugs(directory): + assert os.path.isdir(directory) + success_info = "The software run successfully without errors." + try: + command = "cd \"{}\"; ls -l; python3 main.py;".format(directory) + process = subprocess.Popen(command, shell=True, preexec_fn=os.setsid, stdout=subprocess.PIPE, + stderr=subprocess.PIPE) + time.sleep(3) + + error_type = "" + return_code = process.returncode + if process.poll() is None: + os.killpg(os.getpgid(process.pid), signal.SIGTERM) + if return_code == 0: + return False, success_info, error_type + else: + error_output = process.stderr.read().decode('utf-8') + try: + error_pattern = r'\w+Error:' + error_matches = re.findall(error_pattern, error_output) + error_type = error_matches[0].replace(":", "") + except: + pass + if error_output: + if "Traceback".lower() in error_output.lower(): + errs = error_output.replace(directory + "/", "") + return True, errs, error_type + else: + return False, success_info, error_type + except subprocess.CalledProcessError as e: + return True, f"Error: {e}", "subprocess.CalledProcessError" + except Exception as ex: + return True, f"An error occurred: {ex}", "OtherException" + + return False, success_info, error_type + + main_py_path = findFile(directory, ".py") + pass_flag, error_type = True, "" + if main_py_path is not None: + main_py_path = os.path.dirname(main_py_path) + bug_flag, info, error_type = exist_bugs(main_py_path) + pass_flag = not bug_flag + else: + pass_flag, error_type = False, "NoMain" + + if error_type == "": + error_type = info.replace("\n", "\\n") + + if pass_flag: + return 1.0 + return 0.0 + +def get_consistency(directory): + def remove_comments(string): + def remove_comments_by_regex(string, regex): + lines = string.split("\n") + lines = [line for line in lines if not line.strip().startswith("#")] + string = "\n".join(lines) + comments = [] + matches = re.finditer(regex, string, re.DOTALL) + for match in matches: + group1 = match.group(1) + comments.append(group1) + for comment in comments + ["''''''\n"]: + string = string.replace(comment, "") + return string + + string = remove_comments_by_regex(string, r"'''(.*?)'''") + string = remove_comments_by_regex(string, r"\"\"\"(.*?)\"\"\"") + return string + + def get_text_embedding(text: str): + if text == "": + text = "None" + ada_embedding = client.embeddings.create(input=text, model="text-embedding-ada-002").model_dump()['data'][0]['embedding'] + return ada_embedding + + def get_code_embedding(code: str): + if code == "": + code = "#" + ada_embedding = client.embeddings.create(input=code, model="text-embedding-ada-002").model_dump()['data'][0]['embedding'] + return ada_embedding + + def get_cosine_similarity(embeddingi, embeddingj): + embeddingi = np.array(embeddingi) + embeddingj = np.array(embeddingj) + cos_sim = embeddingi.dot(embeddingj) / (np.linalg.norm(embeddingi) * np.linalg.norm(embeddingj)) + 
return cos_sim + + assert os.path.isdir(directory) + files = getFilesFromType(directory, ".txt") + if len(files) == 0: + print() + filepath = files[0] + task = open(filepath).read().strip() + codes = get_code(directory) + codes = remove_comments(codes) + + text_embedding = get_text_embedding(task) + code_embedding = get_code_embedding(codes) + task_code_alignment = get_cosine_similarity(text_embedding, code_embedding) + + return task_code_alignment + +def main(warehouse_root): + def write_string(string): + writer.write(string) + print(string, end="") + + directories = [] + for directory in os.listdir(warehouse_root): + directories.append(os.path.join(warehouse_root, directory)) + directories = sorted(directories) + directories = [directory for directory in directories if os.path.isdir(directory)] + print("len(directories):", len(directories)) + + suffix = warehouse_root.replace("/", "__").replace("-", "_") + tsv_file = __file__.replace(".py", ".{}.tsv".format(suffix)) + print("tsv_file:", tsv_file) + + counter = 0 + completeness_list, executability_list, consistency_list = [], [], [] + with open(tsv_file, "a", encoding="utf-8") as writer: + for i, directory in enumerate(directories): + directory_basename = os.path.basename(directory) + + completeness = get_completeness(directory) + executability = get_executability(directory) + consistency = get_consistency(directory) + + completeness_list.append(completeness) + executability_list.append(executability) + consistency_list.append(consistency) + + counter += 1 + +main(warehouse_root = "./WareHouse") diff --git a/chatdev/phase.py b/chatdev/phase.py index dc99faebd..d4ffde61d 100644 --- a/chatdev/phase.py +++ b/chatdev/phase.py @@ -586,7 +586,8 @@ def execute(self, chat_env, chat_turn_limit, need_reflect) -> ChatEnv: user_role_prompt=self.user_role_prompt, memory=chat_env.memory, chat_turn_limit=chat_turn_limit, - placeholders=self.phase_env) + placeholders=self.phase_env, + model_type=self.model_type) chat_env = self.update_chat_env(chat_env) return chat_env diff --git a/chatdev/statistics.py b/chatdev/statistics.py index 82a08fb60..d98c9e764 100644 --- a/chatdev/statistics.py +++ b/chatdev/statistics.py @@ -5,27 +5,29 @@ def prompt_cost(model_type: str, num_prompt_tokens: float, num_completion_tokens: float): input_cost_map = { - "gpt-3.5-turbo": 0.0015, + "gpt-3.5-turbo": 0.0005, "gpt-3.5-turbo-16k": 0.003, "gpt-3.5-turbo-0613": 0.0015, "gpt-3.5-turbo-16k-0613": 0.003, "gpt-4": 0.03, "gpt-4-0613": 0.03, "gpt-4-32k": 0.06, - "gpt-4-1106-preview": 0.01, - "gpt-4-1106-vision-preview": 0.01, + "gpt-4-turbo": 0.01, + "gpt-4o": 0.005, + "gpt-4o-mini": 0.00015, } output_cost_map = { - "gpt-3.5-turbo": 0.002, + "gpt-3.5-turbo": 0.0015, "gpt-3.5-turbo-16k": 0.004, "gpt-3.5-turbo-0613": 0.002, "gpt-3.5-turbo-16k-0613": 0.004, "gpt-4": 0.06, "gpt-4-0613": 0.06, "gpt-4-32k": 0.12, - "gpt-4-1106-preview": 0.03, - "gpt-4-1106-vision-preview": 0.03, + "gpt-4-turbo": 0.03, + "gpt-4o": 0.015, + "gpt-4o-mini": 0.0006, } if model_type not in input_cost_map or model_type not in output_cost_map: @@ -112,7 +114,11 @@ def get_info(dir, log_filepath): elif model_type == "GPT_4_32k": model_type = "gpt-4-32k" elif model_type == "GPT_4_TURBO": - model_type = "gpt-4-1106-preview" + model_type = "gpt-4-turbo" + elif model_type == "GPT_4O": + model_type = "gpt-4o" + elif model_type == "GPT_4O_MINI": + model_type = "gpt-4o-mini" # print("model_type:", model_type) lines = open(log_filepath, "r", encoding="utf8").read().split("\n") diff --git a/ecl/ece.py b/ecl/ece.py new 
file mode 100644 index 000000000..e7c615bda --- /dev/null +++ b/ecl/ece.py @@ -0,0 +1,146 @@ + +import os +import json +import re +import numpy as np +import argparse +point = 0.95 +eliminate_threshold = 0.95 + + +def retrieve_eliminate(Path_directory,UsedMemory_directory,Evolved_directory): + experiences_use = [] + content = [] + content1 = [] + experiences_total = [] + usetime_total = [] + exp_dict = {} + eliminated_exp = [] + + directories = [os.path.join(Path_directory, d) for d in os.listdir(Path_directory) if os.path.isdir(os.path.join(Path_directory, d))] + for subdir in directories: + directory = subdir + logdir = [filename for filename in os.listdir(directory) if filename.endswith(".log")] + logdir = os.path.join(directory, logdir[0]) + content1 = open(logdir, "r", encoding='UTF-8').read() + + pattern1 = re.compile(r'the source code MIDs is (.*?),', re.S) + experiences_sourceMIDs = re.findall(pattern1, content1) + pattern2 = re.compile(r'the target code MIDs is (.*?)\n',re.S) + experiences_targetMIDs = re.findall(pattern2, content1) + pattern3 = re.compile(r'And the (.*?) similarity is',re.S) + experiences_type = re.findall(pattern3,content1) + for i in range(0,len(experiences_sourceMIDs)): + sourceMID = experiences_sourceMIDs[i] + targetMID = experiences_targetMIDs[i] + type = experiences_type[i] + experiences_use.append((sourceMID,targetMID,type)) + + with open(UsedMemory_directory) as file: + content1 = json.load(file) + new_content = [] + for memorypiece in content1: + experiences = memorypiece.get("experiences") + if experiences != None: + experiences_total.extend(experiences) + for experience in experiences: + experience["use_time"] = 0 + for experience in experiences_use: + for experience_t in experiences_total: + if experience[0] == experience_t["sourceMID"] and experience[1] == experience_t["targetMID"]: + experience_t["use_time"] += 1 + for i,experience_t in enumerate(experiences_total): + usetime_total.append(experience_t["use_time"]) + exp_dict[i] = experience_t["use_time"] + file.close() + + usetime_sort = sorted(usetime_total)[::-1] + total = np.sum(usetime_sort) + for i in range(len(usetime_sort)): + if np.sum(usetime_sort[:i])/total >= point: + # print("α:",i) + alpha= i + break + index=0 + for k in sorted(exp_dict,key=exp_dict.__getitem__,reverse=True): + if index <= alpha: + eliminated_exp.append(experiences_total[k]) + index += 1 + else: + break + + for memorypiece in content1: + experiences = memorypiece.get("experiences") + retrieve_eliminated_experienceList = [] + if experiences != None: + for experience in experiences: + if experience in eliminated_exp: + retrieve_eliminated_experienceList.append(experience) + + memorypiece["experiences"] = retrieve_eliminated_experienceList + new_content.append(memorypiece) + + with open(Evolved_directory, 'w') as file: + json.dump(new_content, file) + + +# Quality score gain Elimination +def gain_eliminate(NewMemory_directory,Evolved_directory): + content2 = [] + with open(NewMemory_directory) as file: + content2 = json.load(file) + new_content2 = [] + for memorypiece in content2: + experiences = memorypiece.get("experiences") + gain_eliminated_experienceList = [] + + if experiences != None: + # print("origin:", len(experiences)) + for experience in experiences: + valueGain = experience.get("valueGain") + # print(valueGain) + if valueGain >= eliminate_threshold: + gain_eliminated_experienceList.append(experience) + # print(len(experiences)) + memorypiece["experiences"] = gain_eliminated_experienceList + 
new_content2.append(memorypiece) + else: + new_content2.append(memorypiece) + file.close() + + with open(Evolved_directory, 'r') as file: + new_content = json.load(file) + + new_content = new_content + new_content2 + + with open(Evolved_directory, 'w') as file: + json.dump(new_content, file) + + + +def recount_experience(Evolved_directory): + with open(Evolved_directory, 'r') as file: + content = json.load(file) + + with open(Evolved_directory, 'w') as file: + i = 0 + for memorypiece in content: + memorypiece["total"] = i + i += 1 + json.dump(content, file) + +def main(): + parser = argparse.ArgumentParser(description="Process memory with some directories.") + parser.add_argument("Path_directory", type = str, help="The directory of software") + parser.add_argument("UsedMemory_directory", type=str, help="The directory of MemoryCards") + parser.add_argument("NewMemory_directory", type=str, help="The directory of NewMemoryCards") + parser.add_argument("Evolved_directory", type= str, help="The directory for output") + + + args = parser.parse_args() + retrieve_eliminate(args.Path_directory,args.UsedMemory_directory,args.Evolved_directory) + gain_eliminate(args.NewMemory_directory,args.Evolved_directory) + recount_experience(args.Evolved_directory) + +if __name__ == "__main__": + main() diff --git a/ecl/utils.py b/ecl/utils.py index cb11cda1e..184d90b8f 100644 --- a/ecl/utils.py +++ b/ecl/utils.py @@ -65,6 +65,8 @@ def calc_max_token(messages, model): "gpt-4": 8192, "gpt-4-0613": 8192, "gpt-4-32k": 32768, + "gpt-4o": 4096, #100000 + "gpt-4o-mini": 16384, #100000 } num_max_token = num_max_token_map[model] num_max_completion_tokens = num_max_token - num_prompt_tokens @@ -136,6 +138,8 @@ def run(self, messages) : "gpt-4": 8192, "gpt-4-0613": 8192, "gpt-4-32k": 32768, + "gpt-4o": 4096, #100000 + "gpt-4o-mini": 16384, #100000 } response = client.chat.completions.create(messages = messages, model = "gpt-3.5-turbo-16k", diff --git a/misc/CommandDash.png b/misc/CommandDash.png new file mode 100644 index 000000000..fcae4bda6 Binary files /dev/null and b/misc/CommandDash.png differ diff --git a/misc/ebook.png b/misc/ebook.png new file mode 100644 index 000000000..68d9d1b09 Binary files /dev/null and b/misc/ebook.png differ diff --git a/misc/ier.png b/misc/ier.png new file mode 100644 index 000000000..ae8cb81e0 Binary files /dev/null and b/misc/ier.png differ diff --git a/misc/macnet.png b/misc/macnet.png new file mode 100644 index 000000000..5516d9e45 Binary files /dev/null and b/misc/macnet.png differ diff --git a/readme/README-Arabic.md b/readme/README-Arabic.md index e5e5456bc..9b3b96f1a 100644 --- a/readme/README-Arabic.md +++ b/readme/README-Arabic.md @@ -23,23 +23,23 @@ ## 🎉 أخبار -- **26 أكتوبر 2023: تم دعم ChatDev الآن بواسطة Docker للتنفيذ الآمن** (بفضل مساهمة من [ManindraDeMel](https://github.com/ManindraDeMel)). يرجى الرجوع إلى [دليل بدء Docker](wiki.md#docker-start). +- **26 أكتوبر 2023: تم دعم ChatDev الآن بواسطة Docker للتنفيذ الآمن** (بفضل مساهمة من [ManindraDeMel](https://github.com/ManindraDeMel)). يرجى الرجوع إلى [دليل بدء Docker](../wiki.md#docker-start).

-- 25 سبتمبر 2023: وضع **Git** متاح الآن، مما يتيح للمبرمج استخدام Git لمراقبة الإصدار. لتمكين هذه الميزة، قم ببساطة بتعيين ``"git_management"`` إلى ``"True"`` في ``ChatChainConfig.json``. راجع [الدليل](wiki.md#git-mode). +- 25 سبتمبر 2023: وضع **Git** متاح الآن، مما يتيح للمبرمج استخدام Git لمراقبة الإصدار. لتمكين هذه الميزة، قم ببساطة بتعيين ``"git_management"`` إلى ``"True"`` في ``ChatChainConfig.json``. راجع [الدليل](../wiki.md#git-mode).

- 20 سبتمبر 2023: وضع **تفاعل الإنسان مع الوكيل** متاح الآن! يمكنك المشاركة مع فريق ChatDev من خلال لعب دور المراجع وتقديم اقتراحات للمبرمج ; - جرب ``python3 run.py --task [وصف فكرتك] --config "Human"``. راجع [الدليل](wiki.md#human-agent-interaction) و[المثال](WareHouse/Gomoku_HumanAgentInteraction_20230920135038). + جرب ``python3 run.py --task [وصف فكرتك] --config "Human"``. راجع [الدليل](../wiki.md#human-agent-interaction) و[المثال](../WareHouse/Gomoku_HumanAgentInteraction_20230920135038).

- 1 سبتمبر 2023: وضع **الفن** متاح الآن! يمكنك تنشيط وكيل المصمم لإنشاء صور تستخدم في البرمجيات; - جرب ``python3 run.py --task [وصف فكرتك] --config "Art"``. راجع [الدليل](wiki.md#art) و[المثال](WareHouse/gomokugameArtExample_THUNLP_20230831122822). + جرب ``python3 run.py --task [وصف فكرتك] --config "Art"``. راجع [الدليل](../wiki.md#art) و[المثال](../WareHouse/gomokugameArtExample_THUNLP_20230831122822). - 28 أغسطس 2023: النظام متاح الآن للجمهور. - 17 أغسطس 2023: الإصدار v1.0.0 كان جاهزًا للإصدار. - 30 يوليو 2023: يمكن للمستخدمين تخصيص إعدادات ChatChain و Phase و Role. بالإضافة إلى ذلك، يتم دعم وضع السجل الأونلاين ووضع الاستعادة @@ -140,11 +140,11 @@ ### 🐳 بدء سريع باستخدام Docker -- نشكر [ManindraDeMel](https://github.com/ManindraDeMel) على دعم Docker. يرجى الرجوع إلى [دليل بدء Docker](wiki.md#docker-start). +- نشكر [ManindraDeMel](https://github.com/ManindraDeMel) على دعم Docker. يرجى الرجوع إلى [دليل بدء Docker](../wiki.md#docker-start). ## ✨️ مهارات متقدمة -لمزيد من المعلومات التفصيلية، يرجى الرجوع إلى [ويكي](wiki.md) لدينا، حيث يمكنك العثور على: +لمزيد من المعلومات التفصيلية، يرجى الرجوع إلى [ويكي](../wiki.md) لدينا، حيث يمكنك العثور على: - مقدمة إلى جميع معلمات تشغيل الأوامر. - دليل مباشر لإعداد عرض ويب محلي، يشمل سجلات مرئية محسنة وعرض تكراري وأداة بصرية بسيطة لـ ChatChain. @@ -159,7 +159,7 @@ **الكود**: نحن متحمسون لاهتمامك بالمشاركة في مشروعنا مفتوح المصدر. إذا واجهت أي مشاكل، فلا تتردد في الإبلاغ عنها. لا تتردد في إنشاء طلب استدراج إذا كان لديك أي استفسارات أو إذا كنت مستعدًا لمشاركة عملك معنا! تقديرنا الكبير لمساهماتك. يرجى إعلامي إذا كان هناك أي شيء آخر تحتاجه! -**الشركة**: إنشاء "شركة ChatDev" المخصصة الخاصة بك أمر سهل. يتضمن هذا الإعداد الشخصي ثلاثة ملفات JSON تكوينية بسيطة. تحقق من المثال المقدم في دليل "CompanyConfig/Default". للتعليمات التفصيلية حول التخصيص، يرجى الرجوع إلى [ويكي](wiki.md) لدينا. +**الشركة**: إنشاء "شركة ChatDev" المخصصة الخاصة بك أمر سهل. يتضمن هذا الإعداد الشخصي ثلاثة ملفات JSON تكوينية بسيطة. تحقق من المثال المقدم في دليل "CompanyConfig/Default". للتعليمات التفصيلية حول التخصيص، يرجى الرجوع إلى [ويكي](../wiki.md) لدينا. **البرمجيات**: في كل مرة تطوّر فيها برمجيات باستخدام ChatDev، يتم إنشاء مجلد مقابل يحتوي على جميع المعلومات الأساسية. مشاركة عملك معنا بسيطة مثل إنشاء طلب استدراج. إليك مثال: قم بتنفيذ الأمر "python3 run.py --task 'تصميم لعبة 2048' --name '2048' --org 'THUNLP' --config 'Default'". سيتم بذلك إنشاء حزمة برمجية وإنشاء مجلد بالاسم "/WareHouse/2048_THUNLP_timestamp". بداخله، ستجد: @@ -168,7 +168,7 @@ - سجل شامل يوثق عملية بناء البرمجية يمكن استخدامه للعب المسجل (timestamp.log) - الاستفهام الأولي المستخدم لإنشاء هذه البرمجية (2048.prompt) -**راجع البرمجيات المساهمة من قبل المجتمع [هنا](Contribution.md)!** +**راجع البرمجيات المساهمة من قبل المجتمع [هنا](../Contribution.md)!** ## 👨‍💻‍ مساهمون @@ -216,4 +216,4 @@ ## 📬 اتصل بنا -إذا كان لديك أي أسئلة أو تعليقات أو ترغب في التواصل معنا، فلا تتردد في الوصول إلينا عبر البريد الإلكتروني على [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +إذا كان لديك أي أسئلة أو تعليقات أو ترغب في التواصل معنا، فلا تتردد في الوصول إلينا عبر البريد الإلكترون [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Bahasa-Indonesia.md b/readme/README-Bahasa-Indonesia.md index 24f292318..28a6bbfe7 100644 --- a/readme/README-Bahasa-Indonesia.md +++ b/readme/README-Bahasa-Indonesia.md @@ -22,19 +22,19 @@ ## 🎉 Berita -- **26 Oktober 2023: ChatDev kini didukung oleh Docker untuk eksekusi yang aman** (berkat kontribusi dari [ManindraDeMel](https://github.com/ManindraDeMel)). 
Silakan lihat [Panduan Memulai Docker](wiki.md#memulai-docker). +- **26 Oktober 2023: ChatDev kini didukung oleh Docker untuk eksekusi yang aman** (berkat kontribusi dari [ManindraDeMel](https://github.com/ManindraDeMel)). Silakan lihat [Panduan Memulai Docker](../wiki.md#memulai-docker).

-- 25 September 2023: Mode **Git** kini tersedia, memungkinkan programmer untuk menggunakan Git untuk kontrol versi. Untuk mengaktifkan fitur ini, cukup atur ``"git_management"`` menjadi ``"True"`` di ``ChatChainConfig.json``. Lihat [panduan](wiki.md#mode-git). +- 25 September 2023: Mode **Git** kini tersedia, memungkinkan programmer untuk menggunakan Git untuk kontrol versi. Untuk mengaktifkan fitur ini, cukup atur ``"git_management"`` menjadi ``"True"`` di ``ChatChainConfig.json``. Lihat [panduan](../wiki.md#mode-git).

-- 20 September 2023: Mode **Interaksi Manusia-Agen** kini tersedia! Anda dapat terlibat dengan tim ChatDev dengan memainkan peran reviewer dan memberikan saran kepada programmer ; coba ``python3 run.py --task [deskripsi_ide_anda] --config "Manusia"``. Lihat [panduan](wiki.md#interaksi-manusia-agen) dan [contoh](WareHouse/Gomoku_HumanAgentInteraction_20230920135038). +- 20 September 2023: Mode **Interaksi Manusia-Agen** kini tersedia! Anda dapat terlibat dengan tim ChatDev dengan memainkan peran reviewer dan memberikan saran kepada programmer ; coba ``python3 run.py --task [deskripsi_ide_anda] --config "Manusia"``. Lihat [panduan](../wiki.md#interaksi-manusia-agen) dan [contoh](../WareHouse/Gomoku_HumanAgentInteraction_20230920135038).

-- 1 September 2023: Mode **Seni** kini tersedia! Anda dapat mengaktifkan agen desainer untuk menghasilkan gambar yang digunakan dalam perangkat lunak; coba ``python3 run.py --task [deskripsi_ide_anda] --config "Seni"``. Lihat [panduan](wiki.md#seni) dan [contoh](WareHouse/gomokugameArtExample_THUNLP_20230831122822). +- 1 September 2023: Mode **Seni** kini tersedia! Anda dapat mengaktifkan agen desainer untuk menghasilkan gambar yang digunakan dalam perangkat lunak; coba ``python3 run.py --task [deskripsi_ide_anda] --config "Seni"``. Lihat [panduan](../wiki.md#seni) dan [contoh](../WareHouse/gomokugameArtExample_THUNLP_20230831122822). - 28 Agustus 2023: Sistem tersedia untuk publik. - 17 Agustus 2023: Versi v1.0.0 siap untuk dirilis. - 30 Juli 2023: Pengguna dapat menyesuaikan pengaturan ChatChain, Fase, dan Peran. Selain itu, mode Log online dan mode pemutaran kini didukung. @@ -134,11 +134,11 @@ Untuk memulai, ikuti langkah-langkah berikut: ### 🐳 Memulai dengan Docker -- Kami berterima kasih kepada [ManindraDeMel](https://github.com/ManindraDeMel) atas dukungan Docker. Silakan lihat [Panduan Memulai Docker](wiki.md#memulai-docker). +- Kami berterima kasih kepada [ManindraDeMel](https://github.com/ManindraDeMel) atas dukungan Docker. Silakan lihat [Panduan Memulai Docker](../wiki.md#memulai-docker). ## ✨️ Keterampilan Lanjutan -Untuk informasi lebih rinci, silakan merujuk ke [Wiki](wiki.md) kami, di mana Anda dapat menemukan: +Untuk informasi lebih rinci, silakan merujuk ke [Wiki](../wiki.md) kami, di mana Anda dapat menemukan: - Pengantar untuk semua parameter jalankan perintah. - Panduan yang mudah untuk menyiapkan demo web lokal, yang mencakup log visual yang ditingkatkan, demo pemutaran, dan Visualizer ChatChain sederhana. @@ -153,7 +153,7 @@ Untuk informasi lebih rinci, silakan merujuk ke [Wiki](wiki.md) kami, di mana An **Kode**: Kami sangat antusias tentang minat Anda untuk berpartisipasi dalam proyek sumber terbuka kami. Jika Anda mengalami masalah, jangan ragu untuk melaporkannya. Jangan ragu untuk membuat permintaan tarik (pull request) jika Anda memiliki pertanyaan atau jika Anda siap untuk berbagi pekerjaan Anda dengan kami! Kontribusi Anda sangat dihargai. Tolong beri tahu saya jika ada yang perlu Anda bantu! -**Perusahaan**: Membuat "Perusahaan ChatDev" khusus Anda sendiri sangat mudah. Penyiapan ini melibatkan tiga file JSON konfigurasi sederhana. Lihat contoh yang disediakan dalam direktori ``CompanyConfig/Default``. Untuk petunjuk lebih rinci tentang penyesuaian, lihat [Wiki](wiki.md) kami. +**Perusahaan**: Membuat "Perusahaan ChatDev" khusus Anda sendiri sangat mudah. Penyiapan ini melibatkan tiga file JSON konfigurasi sederhana. Lihat contoh yang disediakan dalam direktori ``CompanyConfig/Default``. Untuk petunjuk lebih rinci tentang penyesuaian, lihat [Wiki](../wiki.md) kami. **Perangkat Lunak**: Setiap kali Anda mengembangkan perangkat lunak menggunakan ChatDev, folder yang sesuai akan dihasilkan yang berisi semua informasi penting. Berbagi pekerjaan Anda dengan kami sama mudahnya seperti membuat permintaan tarik. Berikut contohnya: jalankan perintah ``python3 run.py --task "mendesain game 2048" --name "2048" --org "THUNLP" --config "Default"``. Ini akan membuat paket perangkat lunak dan menghasilkan folder bernama ``/WareHouse/2048_THUNLP_timestamp``. 
Di dalamnya, Anda akan menemukan: @@ -162,7 +162,7 @@ Untuk informasi lebih rinci, silakan merujuk ke [Wiki](wiki.md) kami, di mana An - Log komprehensif yang mendetailkan proses pembangunan perangkat lunak yang dapat digunakan untuk pemutaran (``timestamp.log``) - Prompt awal yang digunakan untuk membuat perangkat lunak ini (``2048.prompt``) -**Lihat perangkat lunak yang telah disumbangkan oleh komunitas [di sini](Contribution.md)!** +**Lihat perangkat lunak yang telah disumbangkan oleh komunitas [di sini](../Contribution.md)!** ## 👨‍💻‍ Kontributor @@ -209,4 +209,4 @@ Dibuat dengan [contrib.rocks](https://contrib.rocks). ## 📬 Kontak -Jika Anda memiliki pertanyaan, umpan balik, atau ingin menghubungi kami, jangan ragu untuk menghubungi kami melalui email di [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +Jika Anda memiliki pertanyaan, umpan balik, atau ingin menghubungi kami, jangan ragu untuk menghubungi kami melalui email di [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Chinese.md b/readme/README-Chinese.md index c30076845..ff796e987 100644 --- a/readme/README-Chinese.md +++ b/readme/README-Chinese.md @@ -127,7 +127,7 @@ https://github.com/OpenBMB/ChatDev/assets/11889052/80d01d2f-677b-4399-ad8b-f7af9 ## 🐳 通过Docker执行ChatDev -- 我们感谢 [ManindraDeMel](https://github.com/ManindraDeMel) 提供Docker的支持。具体请参照 [Docker指南](wiki.md#docker-start) 使用。 +- 我们感谢 [ManindraDeMel](https://github.com/ManindraDeMel) 提供Docker的支持。具体请参照 [Docker指南](../wiki.md#docker-start) 使用。 ## ✨️ 进阶技能 @@ -208,4 +208,4 @@ request一样简单。这是一个示例:执行命令`python3 run.py --task "d ## 联系方式 -如果您有任何问题、反馈意见或想要联系我们,欢迎随时通过电子邮件与我们联系: [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +如果您有任何问题、反馈意见或想要联系我们,欢迎随时通过电子邮件与我们联系: [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Dutch.md b/readme/README-Dutch.md index 5bc1df824..ce04147c0 100644 --- a/readme/README-Dutch.md +++ b/readme/README-Dutch.md @@ -166,4 +166,4 @@ Voor meer gedetailleerde informatie, verwijzen wij u graag naar onze [Wiki](../w ## 📬 Contact -Als je vragen hebt, feedback wilt geven, of contact met ons wilt opnemen, aarzel dan niet om ons te mailen op [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +Als je vragen hebt, feedback wilt geven, of contact met ons wilt opnemen, aarzel dan niet om ons te mailen op [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Filipino.md b/readme/README-Filipino.md index ec8ccacda..6f4c95cc1 100644 --- a/readme/README-Filipino.md +++ b/readme/README-Filipino.md @@ -167,4 +167,4 @@ primaryClass={cs.SE} ## Makipag-ugnay -Kung mayroon kang anumang mga tanong, puna, o nais makipag-ugnay, huwag kang mag-atubiling makipag-ugnay sa amin sa pamamagitan ng email sa [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +Kung mayroon kang anumang mga tanong, puna, o nais makipag-ugnay, huwag kang mag-atubiling makipag-ugnay sa amin sa pamamagitan ng email sa [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-French.md b/readme/README-French.md index c1b46d43a..1b98f338d 100644 --- a/readme/README-French.md +++ b/readme/README-French.md @@ -188,4 +188,4 @@ de ``CompanyConfig/Default`` ## Contact -Si vous avez des questions, des retours ou souhaitez nous contacter, n'hésitez pas à nous envoyer un email à [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +Si vous avez des questions, des retours ou souhaitez nous contacter, n'hésitez pas à nous envoyer un email à 
[qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Hindi.md b/readme/README-Hindi.md index 7de59ad13..ec763515b 100644 --- a/readme/README-Hindi.md +++ b/readme/README-Hindi.md @@ -195,4 +195,4 @@ https://github.com/OpenBMB/ChatDev/assets/11889052/80d01d2f-677b-4399-ad8b-f7af9 ## 📬 संपर्क -यदि आपके पास कोई प्रश्न, प्रतिक्रिया है, या संपर्क करना चाहते हैं, तो कृपया बेझिझक हमें ईमेल के माध्यम से संपर्क करें [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +यदि आपके पास कोई प्रश्न, प्रतिक्रिया है, या संपर्क करना चाहते हैं, तो कृपया बेझिझक हमें ईमेल के माध्यम से संपर्क करें [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Japanese.md b/readme/README-Japanese.md index ca16678da..dbe297cef 100644 --- a/readme/README-Japanese.md +++ b/readme/README-Japanese.md @@ -145,7 +145,7 @@ ### 🐳 Docker のクイックスタート -- Docker のサポートを提供してくれた [ManindraDeMel](https://github.com/ManindraDeMel) に感謝します。[Docker スタートガイド](wiki.md#docker-start)を参照してください。 +- Docker のサポートを提供してくれた [ManindraDeMel](https://github.com/ManindraDeMel) に感謝します。[Docker スタートガイド](../wiki.md#docker-start)を参照してください。 ## ✨️ 高度なスキル @@ -228,4 +228,4 @@ ## 📬 お問い合わせ -ご質問、フィードバック、またはお問い合わせがある場合は、[chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) までお気軽にご連絡ください +ご質問、フィードバック、またはお問い合わせがある場合は、[qianc62@gmail.com](mailto:qianc62@gmail.com) までお気軽にご連絡ください diff --git a/readme/README-Korean.md b/readme/README-Korean.md index e67bc9e0a..af0f34234 100644 --- a/readme/README-Korean.md +++ b/readme/README-Korean.md @@ -4,24 +4,76 @@

-

- 【📚 Wiki | 🚀 Visualizer | 👥 Community Built Software | 🔧 Customization】 + 【📚 Wiki | 🚀 Visualizer | 👥 Community Built Software | 🔧 Customization | 👾 Discord】 +

## 📖 개요 -- **ChatDev**는 다양한 최고 경영자, 최고 기술 책임자, 프로그래머, 테스터 등 다양한 역할을 수행하는 **지능형 에이전트**들을 통해 운영되는 **가상 소프트웨어 회사**입니다. 여럿이서 조직 구조를 형성하고 "프로그래밍을 통해 디지털 세상을 혁신한다"는 사명을 가지고 있습니다. ChatDev 내 에이전트들은 디자인, 코딩, 테스트, 문서화를 진행하는 전문 기능 세미나에 참여하여 **협업**합니다. +- **ChatDev**는 다양한 최고 경영 책임자 , 최고 생산 책임자 , 최고 기술 책임자 , 프로그래머 , 리뷰어 , 테스터 , 아트 디자이너 와 같은 다양한 역할을 수행하는 **지능형 에이전트**들을 통해 운영되는 **가상 소프트웨어 회사**입니다. 여럿이서 조직 구조를 형성하고 "프로그래밍을 통해 디지털 세상을 혁신한다"는 사명을 가지고 있습니다. ChatDev 내 에이전트들은 디자인, 코딩, 테스트, 문서화를 진행하는 전문 기능 세미나에 참여하여 **협업**합니다. - ChatDev의 주요 목표는 **사용하기 쉽고**, **개조할 수 있으며**, **확장 가능한** 프레임워크를 제공하는 것입니다. 대규모 언어 모델(LLM)을 기반으로 하며 집단 지성을 연구하는 데 이상적인 시나리오를 제공합니다. -## 📰 뉴스 +

+ +

-* **2023년 9월 1일: Art 모드가 출시되었습니다! ``python3 run.py --config "Art"``로 소프트웨어에서 사용되는 이미지를 생성해보세요.** [예제](../WareHouse/gomokugameArtExample_THUNLP_20230831122822)를 참조하세요. -* 2023년 8월 28일: 시스템이 공개되었습니다. -* 2023년 8월 17일: V1.0.0 버전 출시 준비가 완료되었습니다. -* 2023년 7월 30일: 사용자가 ChatChain, Phase 및 Role을 설정할 수 있습니다. 또한, Online Log 모드와 Replay 모드가 지원됩니다. -* 2023년 7월 16일: 이 프로젝트와 관련된 [출판 전 논문](https://arxiv.org/abs/2307.07924)이 게시되었습니다. -* 2023년 6월 30일: `ChatDev` 리포지토리의 초기 버전이 공개되었습니다. +## 📰 뉴스 +* **2024년 6월 25일: 🎉ChatDev 팀은LLM 기반의 다중 에이전트 협업🤖🤖 및 관련 분야의 발전을 도모하기 위해, [오픈소스](https://github.com/OpenBMB/ChatDev/tree/main/MultiAgentEbook) 대화형 e-book📚 형식으로 제공되는 중요한 논문 모음📄을 선별했습니다. 이제 [Ebook 웹사이트](https://thinkwee.top/multiagent_ebook)에서 최신 발전 사항을 탐색하고 [논문 목록](https://github.com/OpenBMB/ChatDev/blob/main/MultiAgentEbook/papers.csv)을 다운로드할 수 있습니다.** +

+ +

+* 2024년 6월 12일: 언어 상호 작용을 통한 에이전트 간의 효과적인 작업 지향 협업을 용이하게 하기 위해 방향 비순환 그래프를 활용하는 다중 에이전트 협업 네트워크(MacNet) 🎉을 소개합니다. MacNet은 컨텍스트 제한을 초과하지 않고 다양한 위상과 천 개 이상의 에이전트 간 협력을 지원합니다. 보다 다용도적이고 확장 가능한 MacNet은 ChatDev의 체인 모양 토폴로지의 보다 고급 버전으로 간주될 수 있습니다. 사전 인쇄 논문은 [https://arxiv.org/abs/2406.07155 ](https://arxiv.org/abs/2406.07155) 에서 제공됩니다. 이 기술은 곧 이 저장소에 통합되어 다양한 조직 구조에 대한 지원을 강화하고 소프트웨어 개발(예: 논리 추론, 데이터 분석, 스토리 생성 등)을 넘어 더 풍부한 솔루션을 제공할 것입니다. +

+ +

+ +
+오래된 뉴스 + +* 2024년 5월 7일, 강사와 보조 에이전트에 단축된 경험을 향상시켜 새로운 작업에 효율적으로 적응하는 새로운 방법인 "Iterative Experience Refinement"(IER)(반복적 경험 개선)을 소개합니다. 이 접근 방식은 일련의 작업들에서 경험, 활용, 전달 및 제거를 포함합니다. 사전 인쇄 논문은 https://arxiv.org/abs/2405.04219 에서 제공되며 이 기술은 곧 ChatDev에 통합될 예정입니다. +

+ +

+ +* 2024년 1월 25일: ChatDev에 체혐형 공동학습 모듈을 통합하였습니다. [체험형 공동학습 가이드](../wiki.md#co-tracking)를 확인하세요. + +* 2023년 12월 28일: 강사와 보조 에이전트가 단축형 경험을 축적하여 새로운 작업을 효과적으로 해결하고 반복적인 오류를 줄이고 효율성을 향상시키는 혁신적인 접근 방식인 Experience Co-Learning을 소개합니다. 사전 인쇄된 논문은 https://arxiv.org/abs/2312.17025 에서 확인할 수 있고, 곧 ChatDev에 통합될 것입니다. +

+ +

+ +* 2023년 11월 15일: 소프트웨어 개발자와 혁신적인 기업가들이 매우 저렴한 비용과 진입 장벽으로 소프트웨어를 효율적으로 구축할 수 있도록 하는 SaaS 플랫폼으로 ChatDev를 출시했습니다. https://chatdev.modelbest.cn/ 에서 시도하세요. +

+ +

* 2023년 11월 2일: ChatDev는 에이전트가 기존 코드를 기반으로 개발할 수 있는 새로운 기능을 지원합니다. `--config "incremental" --path "[source_code_directory_path]"`를 시도하세요. +

+ +

* 2023년 10월 26일: ChatDev는 [ManindraDeMel](https://github.com/ManindraDeMel)의 기여 덕분에 Docker를 지원합니다. [도커 시작 가이드](../wiki.md#docker-start)를 참조하세요. +

+ +

+* 2023년 9월 25일: **Git** 모드가 출시되었으며, 프로그래머 가 Git 버전 제어를 사용할 수 있습니다. 이 기능을 사용하려면 ``ChatChainConfig.json`` 에서 ``"git_management"`` 를 ``"True"`` 로 설정해야 합니다 . [가이드](../wiki.md#git-mode)를 참조하세요. +

+ +

+- 2023년 9월 20일: **Human-Agent-Interaction** 모드가 출시되었습니다! 검토자 역할을 수행하고 프로그래머에게 제안하여 ChatDev 팀에 참여할 수 있습니다; + ``python3 run.py --task [description_of_your_idea] --config "Human"``. [가이드](../wiki.md#human-agent-interaction)와 [예제](../WareHouse/Gomoku_HumanAgentInteraction_20230920135038)를 참조하세요. +

+ +

+- 2023년 9월 1일: **Art** 모드가 출시되었습니다! 디자이너 에이전트를 활성화하여 소프트웨어에서 사용되는 이미지를 생성해보세요; + ``python3 run.py --task [description_of_your_idea] --config "Art"``. [가이드](../wiki.md#art)와 [예제](../WareHouse/gomokugameArtExample_THUNLP_20230831122822)를 참조하세요. +- 2023년 8월 28일: 시스템이 공개되었습니다. +- 2023년 8월 17일: V1.0.0 버전 출시 준비가 완료되었습니다. +- 2023년 7월 30일: 사용자가 ChatChain, Phase 및 Role을 설정할 수 있습니다. 또한, Online Log 모드와 Replay 모드가 지원됩니다. +- 2023년 7월 16일: 이 프로젝트와 관련된 [출판 전 논문](https://arxiv.org/abs/2307.07924)이 게시되었습니다. +- 2023년 6월 30일: `ChatDev` 리포지토리의 초기 버전이 공개되었습니다. +
## ❓ ChatDev는 무엇을 할 수 있나요? @@ -31,6 +83,12 @@ https://github.com/OpenBMB/ChatDev/assets/11889052/80d01d2f-677b-4399-ad8b-f7af9 ## ⚡️ 시작하기 +### 💻️ 웹을 이용하여 시작하기 + +시각화와 구성을 위한 웹 페이지 접근: https://chatdev.modelbest.cn/ + +### 🖥️ 터미널을 이용하여 시작하기 + 시작하려면 다음 단계를 따르세요: 1. **GitHub 리포지터리 복제:** 다음 명령을 사용하여 리포지토리를 복제하세요: @@ -76,7 +134,11 @@ https://github.com/OpenBMB/ChatDev/assets/11889052/80d01d2f-677b-4399-ad8b-f7af9 cd WareHouse/project_name_DefaultOrganization_timestamp python main.py ``` - + +### 🐳 도커를 이용하여 시작하기 + +- [ManindraDeMel](https://github.com/ManindraDeMel)의 도커 지원에 감사드립니다. [도커 시작 가이드](../wiki.md#docker-start)를 참조하세요. + ## ✨️ 심화 스킬 [위키](../wiki.md)에서 아래 더 자세한 정보를 확인할 수 있습니다: @@ -105,50 +167,41 @@ https://github.com/OpenBMB/ChatDev/assets/11889052/80d01d2f-677b-4399-ad8b-f7af9 **커뮤니티에서 기여한 소프트웨어를 보려면 [여기](../Contribution.md)를 참조해주세요!** -### 소프트웨어 기여자 - -Contributor -Contributor -Contributor -Contributor -Contributor -Contributor -Contributor -Contributor -Contributor -Contributor -Contributor -Contributor -Contributor +## 👨‍💻‍ 기여자 + + + + + +Made with [contrib.rocks](https://contrib.rocks). ## 📑 인용 문구 ``` -@misc{qian2023communicative, - title={Communicative Agents for Software Development}, - author={Chen Qian and Xin Cong and Wei Liu and Cheng Yang and Weize Chen and Yusheng Su and Yufan Dang and Jiahao Li and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun}, - year={2023}, - eprint={2307.07924}, - archivePrefix={arXiv}, - primaryClass={cs.SE} -} - -@misc{qian2023experiential, - title={Experiential Co-Learning of Software-Developing Agents}, - author={Chen Qian and Yufan Dang and Jiahao Li and Wei Liu and Weize Chen and Cheng Yang and Zhiyuan Liu and Maosong Sun}, - year={2023}, - eprint={2312.17025}, - archivePrefix={arXiv}, - primaryClass={cs.CL} +@article{chatdev, + title = {ChatDev: Communicative Agents for Software Development}, + author = {Chen Qian and Wei Liu and Hongzhang Liu and Nuo Chen and Yufan Dang and Jiahao Li and Cheng Yang and Weize Chen and Yusheng Su and Xin Cong and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun}, + journal = {arXiv preprint arXiv:2307.07924}, + url = {https://arxiv.org/abs/2307.07924}, + year = {2023} } ``` ## ⚖️ 라이선스 -- ChatDev의 목적은 오로지 연구 목적입니다. -- 데이터 세트는 비상업적 용도로만 사용할 수 있는 CC BY NC 4.0에 따라 라이센스가 부여됩니다. 해당 데이터 세트를 사용하여 학습된 모델은 연구 목적 이외의 용도로 사용해서는 안 된다는 점에 유의하세요. +- 소스코드 라이선스: ChatDev의 소스코드는 아파치 2.0 라이선스가 부여되어 있습니다. 아파치 2.0 라이선스에 명시된 특정 조건에 따라 코드의 사용, 수정 및 배포를 허용합니다. +- 데이터 라이선스: ChatDev에 사용되는 관련 데이터는 CC BY-NC 4.0라이선스가 부여되어 있습니다. 이 라이선스는 데이터의 비상업적 사용을 명시적으로 허용합니다. 이러한 데이터 세트를 사용하여 훈련된 모든 모델은 비상업적 사용 제한을 철저히 준수해야 하며 연구 목적으로만 사용되어야 한다는 점을 강조하고 싶습니다. 
+ + +## 🤝 감사의 말 +   +   +   + + + ## 연락처 -질문, 피드백 또는 저희와 연락을 원하시면 언제든지 이메일로 연락 주십시오: [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +질문, 피드백 또는 저희와 연락을 원하시면 언제든지 이메일로 연락 주십시오: [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Portuguese.md b/readme/README-Portuguese.md index 9421ecd07..fd1fc56ce 100644 --- a/readme/README-Portuguese.md +++ b/readme/README-Portuguese.md @@ -183,4 +183,4 @@ Para obter informações mais detalhadas, consulte nossa Wiki, onde você pode e ## 📬 Contato -Se você tiver alguma dúvida, feedback ou gostaria de entrar em contato, não hesite em nos enviar um e-mail para [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +Se você tiver alguma dúvida, feedback ou gostaria de entrar em contato, não hesite em nos enviar um e-mail para [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Russian.md b/readme/README-Russian.md new file mode 100644 index 000000000..d5d706666 --- /dev/null +++ b/readme/README-Russian.md @@ -0,0 +1,222 @@ +# Коммуникативные агенты для разработки программного обеспечения + +

+ +

+ + +

+ 【📚 Wiki | 🚀 Визуализатор | 👥 ПО от сообщества | 🔧 Кастомизация | 👾 Discord】 + +

+ +## 📖 Обзор + +- **ChatDev** представляет собой **виртуальную программную компанию**, работающую через различные **интеллектуальные агенты**, выполняющие разные роли, включая Генерального директора , Главного продуктового директора , Главного технолога , программиста , рецензента , тестировщика , арт-дизайнера . Эти агенты формируют многогранную организационную структуру и объединены миссией «провести революцию цифрового мира через программирование». Агенты внутри ChatDev **сотрудничают**, участвуя в специализированных функциональных семинарах, включая задачи по проектированию, кодингу, тестированию и документированию. +- Основная цель ChatDev — предложить **простой в использовании**, **высоконастраиваемый** и **расширяемый** фреймворк, основанный на больших языковых моделях (LLMs), который служит идеальным сценарием для изучения коллективного интеллекта. + +

+ +

+ +## 🎉 Новости +* **25 июня 2024 года:** 🎉 Для содействия развитию в области многогранного сотрудничества на основе LLM 🤖🤖 и смежных областях команда ChatDev собрала коллекцию основополагающих статей 📄, представленных в [открытом исходном](https://github.com/OpenBMB/ChatDev/tree/main/MultiAgentEbook) интерактивном электронном формате 📚. Теперь вы можете исследовать последние достижения на [веб-сайте электронной книги](https://thinkwee.top/multiagent_ebook) и скачать [список статей](https://github.com/OpenBMB/ChatDev/blob/main/MultiAgentEbook/papers.csv) . +

+ +

+* 12 июня 2024 года: Мы представляем Сети Многогранного Сотрудничества (MacNet) 🎉, которые используют направленные ациклические графы для эффективного выполнения задач в ходе лексических взаимодействий 🤖🤖. MacNet поддерживает сотрудничество среди различных топологий и более чем тысячи агентов без превышения ограничений контекста. Более универсальный и масштабируемый, MacNet можно считать более продвинутой версией топологии цепочки ChatDev. Наш препринт доступен по адресу [https://arxiv.org/abs/2406.07155](https://arxiv.org/abs/2406.07155). Эта техника скоро будет интегрирована в этот репозиторий, что расширит поддержку различных организационных структур и предложит более богатые решения за пределами разработки программного обеспечения (например, логическое рассуждение, анализ данных, генерация историй и другое). +

+ +

+ +
+Старые новости + +* 7 мая 2024 года: Мы представили «Итеративное уточнение опыта» (IER), новый метод, в котором агенты-инструкторы и ассистенты улучшают опыт, ориентированный на сокращение путей, для эффективного освоения новых задач. Этот подход охватывает приобретение, использование, распространение и устранение опыта в ходе выполнения задач. Наш препринт доступен по адресу https://arxiv.org/abs/2405.04219, и эта техника скоро будет интегрирована в ChatDev. +

+ +

+ +* 25 января 2024 года: Мы интегрировали Модуль Опытного Со-обучения в ChatDev. См. [Руководство по опытному со-обучению](../wiki.md#co-tracking). + +* 28 декабря 2023 года: Мы представляем Опытное Со-обучение, инновационный подход, в котором агенты-инструкторы и ассистенты накапливают опыт, ориентированный на сокращение путей, для эффективного решения новых задач, снижая количество повторяющихся ошибок и повышая эффективность. Ознакомьтесь с нашим препринтом по адресу https://arxiv.org/abs/2312.17025, и эта техника скоро будет интегрирована в ChatDev. +

+ +

+ +* 15 ноября 2023 года: Мы запустили ChatDev как платформу SaaS, которая позволяет разработчикам программного обеспечения и инновационным предпринимателям эффективно создавать программное обеспечение при очень низкой стоимости и барьере для входа. Попробуйте по адресу https://chatdev.modelbest.cn/. +

+ +

+ +* 2 ноября 2023 года: ChatDev теперь поддерживает новую функцию: инкрементальную разработку, которая позволяет агентам разрабатывать на основе существующего кода. Попробуйте `--config "incremental" --path "[source_code_directory_path]"`, чтобы начать. +

+ +

+ +* 26 октября 2023 года: ChatDev теперь поддерживает Docker для безопасного выполнения (благодаря вкладу [ManindraDeMel](https://github.com/ManindraDeMel)). См. [Руководство по запуску Docker](../wiki.md#docker-start). +

+ +

+* 25 сентября 2023 года: Теперь доступен режим **Git**, позволяющий программисту использовать Git для управления версиями. Чтобы включить эту функцию, просто установите ``"git_management"`` в ``"True"`` в ``ChatChainConfig.json``. См. [руководство](../wiki.md#git-mode). +

+ +

+- 20 сентября 2023 года: Теперь доступен режим **Human-Agent-Interaction**! Вы можете принять участие в работе команды ChatDev, сыграв роль рецензента и предоставив предложения программисту ; попробуйте ``python3 run.py --task [описание вашей идеи] --config "Human"``. См. [руководство](../wiki.md#human-agent-interaction) и [пример](../WareHouse/Gomoku_HumanAgentInteraction_20230920135038). +

+ +

+- 1 сентября 2023 года: Теперь доступен режим **Art**! Вы можете активировать агента-дизайнера +для генерации изображений, используемых в программном обеспечении; попробуйте ``python3 run.py --task [описание вашей идеи] --config "Art"``. См. [руководство](../wiki.md#art) и [пример](../WareHouse/gomokugameArtExample_THUNLP_20230831122822). +- 28 августа 2023 года: Система стала общедоступной. +- 17 августа 2023 года: Версия v1.0.0 была готова к выпуску. +- 30 июля 2023 года: Пользователи могут настроить параметры ChatChain, Phase и Role. Также теперь поддерживаются как онлайн режим журнала, так и режим воспроизведения. +- 16 июля 2023 года: Опубликован [препринт статьи](https://arxiv.org/abs/2307.07924), связанной с этим проектом. +- 30 июня 2023 года: Выпущена первоначальная версия репозитория ChatDev. +
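Ниже приведён минимальный набросок запуска режима инкрементальной разработки, упомянутого в новости от 2 ноября 2023 года выше. Путь `./my_existing_app`, описание задачи и имя проекта здесь условные: подставьте каталог с вашим существующим кодом.

```bash
# Предположение: исходный код существующего проекта лежит в ./my_existing_app.
# ChatDev продолжит разработку поверх этого кода (конфигурация "incremental").
python3 run.py --task "add a dark theme to the existing interface" \
               --name "my_existing_app_v2" \
               --config "incremental" \
               --path "./my_existing_app"
```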
+ +## ❓ Что может делать ChatDev? + +![intro](../misc/intro.png) + + + +## ⚡️ Быстрый старт + +### 💻 Быстрый старт с помощью веб-интерфейса + +Получите доступ к веб-странице для визуализации и настройки: https://chatdev.modelbest.cn/ + +### 🖥️ Быстрый старт с помощью терминала + +Для начала выполните следующие шаги: + +1. **Клонируйте репозиторий GitHub:** Начните с клонирования репозитория с помощью команды: + + ``` + git clone https://github.com/OpenBMB/ChatDev.git + ``` + +2. **Настройте окружение Python:** Убедитесь, что у вас установлена версия Python 3.9 или выше. Вы можете создать и активировать это окружение с помощью следующих команд, заменив `ChatDev_conda_env` на предпочитаемое имя окружения: + + ``` + conda create -n ChatDev_conda_env python=3.9 -y + conda activate ChatDev_conda_env + ``` + +3. **Установите зависимости:** Перейдите в каталог `ChatDev` и установите необходимые зависимости, выполнив: + + ``` + cd ChatDev + pip3 install -r requirements.txt + ``` + +4. **Настройте ключ API OpenAI:** Экспортируйте ваш ключ API OpenAI в качестве переменной окружения. Замените `"your_OpenAI_API_key"` на ваш реальный ключ API. Помните, что эта переменная окружения является специфичной для сессии, поэтому вам нужно будет установить её снова, если вы откроете новую сессию терминала. + На Unix/Linux: + + ``` + export OPENAI_API_KEY="your_OpenAI_API_key" + ``` + + На Windows: + + ``` + $env:OPENAI_API_KEY="your_OpenAI_API_key" + ``` + +5. **Разработайте ваше программное обеспечение:** Используйте следующую команду для начала разработки вашего программного обеспечения, заменив `[описание вашей идеи]` на описание вашей идеи и `[название проекта]` на желаемое название проекта: + На Unix/Linux: + + ``` + python3 run.py --task "[описание вашей идеи]" --name "[название проекта]" + ``` + + На Windows: + + ``` + python run.py --task "[описание вашей идеи]" --name "[название проекта]" + ``` + +6. **Запустите ваше программное обеспечение:** После генерации вы можете найти ваше программное обеспечение в каталоге `WareHouse` в конкретной папке проекта, например, `project_name_DefaultOrganization_timestamp`. Запустите ваше программное обеспечение с помощью следующей команды в этом каталоге: + На Unix/Linux: + + ``` + cd WareHouse/project_name_DefaultOrganization_timestamp + python3 main.py + ``` + + На Windows: + + ``` + cd WareHouse/project_name_DefaultOrganization_timestamp + python main.py + ``` + +### 🐳 Быстрый старт с помощью Docker + +- Мы благодарим [ManindraDeMel](https://github.com/ManindraDeMel) за предоставление поддержки Docker. См. [Руководство по запуску Docker](../wiki.md#docker-start). + +## ✨️ Расширенные возможности + +Для получения более подробной информации, пожалуйста, обратитесь к нашему [Wiki](../wiki.md), где вы найдете: + +- Введение во все параметры командного выполнения. +- Простой гид по настройке локального веб-дисплея, который может визуализировать журналы в реальном времени, воспроизведенные журналы и ChatChain. +- Обзор фреймворка ChatDev. +- Полное введение во все расширенные параметры конфигурации ChatChain. +- Руководства по настройке ChatDev, включая: + - ChatChain: Разработайте свой собственный процесс разработки программного обеспечения (или любой другой процесс), такой как ``DemandAnalysis -> Coding -> Testing -> Manual``. + - Phase: Разработайте свой собственный этап в ChatChain, например, ``DemandAnalysis``. + - Role: Определите различные роли в вашей компании, такие как ``Генеральный директор``. 
+ +## 🤗 Поделитесь своим ПО + +**Код:** Мы рады вашему интересу к участию в нашем проекте с открытым исходным кодом. Если вы обнаружите какие-либо проблемы, не стесняйтесь сообщить об этом. Не стесняйтесь создавать запрос на внесение изменений, если у вас есть вопросы или вы готовы поделиться своей работой с нами! Ваши вклады очень ценятся. Пожалуйста, дайте знать, если вам нужна дополнительная помощь! + +**Компания:** Создание вашей собственной настроенной "Компании ChatDev" — это просто. Эта персонализированная настройка включает три простых конфигурационных JSON-файла. Ознакомьтесь с примером в директории ``CompanyConfig/Default``. Для получения подробных инструкций по настройке см. наш [Wiki](../wiki.md). + +**Программное обеспечение:** Каждый раз, когда вы разрабатываете программное обеспечение с помощью ChatDev, создается соответствующая папка, содержащая всю необходимую информацию. Поделиться вашей работой с нами так же просто, как сделать запрос на внесение изменений. Вот пример: выполните команду ``python3 run.py --task "design a 2048 game" --name "2048" --org "THUNLP" --config "Default"``. Это создаст пакет программного обеспечения и сгенерирует папку с именем ``/WareHouse/2048_THUNLP_timestamp``. Внутри вы найдете: + +- Все файлы и документы, относящиеся к программному обеспечению игры 2048 +- Конфигурационные файлы компании, ответственной за это программное обеспечение, включая три JSON конфигурационных файла из ``CompanyConfig/Default`` +- Полный журнал, детализирующий процесс создания программного обеспечения, который можно использовать для воспроизведения (``timestamp.log``) +- Начальный запрос, использованный для создания этого программного обеспечения (``2048.prompt``) + +**Посмотрите программное обеспечение, предоставленное сообществом [здесь](../Contribution.md)!** + +## 👨‍💻‍ Участники + + + + + +Сделано с помощью [contrib.rocks](https://contrib.rocks). + +## 🔎 Цитирование + +``` +@article{chatdev, + title = {ChatDev: Communicative Agents for Software Development}, + author = {Chen Qian and Wei Liu and Hongzhang Liu and Nuo Chen and Yufan Dang and Jiahao Li and Cheng Yang and Weize Chen and Yusheng Su and Xin Cong and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun}, + journal = {arXiv preprint arXiv:2307.07924}, + url = {https://arxiv.org/abs/2307.07924}, + year = {2023} +} +``` + +## ⚖️ Лицензия + +- Лицензирование исходного кода: Исходный код нашего проекта лицензирован по лицензии Apache 2.0. Эта лицензия разрешает использование, модификацию и распространение кода при соблюдении определенных условий, изложенных в лицензии Apache 2.0. +- Лицензирование данных: Связанные данные, используемые в нашем проекте, лицензированы по лицензии CC BY-NC 4.0. Эта лицензия явно разрешает некоммерческое использование данных. Мы хотим подчеркнуть, что любые модели, обученные с использованием этих наборов данных, должны строго соблюдать ограничение на некоммерческое использование и использоваться исключительно в исследовательских целях. 
+ +## 🤝 Благодарности + +   +   +   + + + +## 📬 Контакты + +Если у вас есть какие-либо вопросы, отзывы или вы хотите связаться с нами, пожалуйста, не стесняйтесь обращаться к нам по электронной почте [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Slovak.md b/readme/README-Slovak.md index d577a82fa..8f6cbb216 100644 --- a/readme/README-Slovak.md +++ b/readme/README-Slovak.md @@ -193,4 +193,4 @@ vytvorí softvérový balík a vygeneruje priečinok s názvom ``/WareHouse/2048 ## 📬 Kontakt -Ak máte akékoľvek otázky, spätnú väzbu alebo by ste nás chceli kontaktovať, neváhajte nás kontaktovať e-mailom na adrese [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +Ak máte akékoľvek otázky, spätnú väzbu alebo by ste nás chceli kontaktovať, neváhajte nás kontaktovať e-mailom na adrese [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Spanish.md b/readme/README-Spanish.md index 1698741dd..681c88206 100644 --- a/readme/README-Spanish.md +++ b/readme/README-Spanish.md @@ -187,4 +187,4 @@ un paquete de software y generará una carpeta llamada ``/WareHouse/2048_THUNLP_ ## 📬 Contacto -Si tienes alguna pregunta, comentarios, o deseas ponerte en contacto, no dudes en enviarnos un correo electrónico a [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) +Si tienes alguna pregunta, comentarios, o deseas ponerte en contacto, no dudes en enviarnos un correo electrónico a [qianc62@gmail.com](mailto:qianc62@gmail.com) diff --git a/readme/README-Turkish.md b/readme/README-Turkish.md index 31b28b93f..e3ca57bf3 100644 --- a/readme/README-Turkish.md +++ b/readme/README-Turkish.md @@ -21,19 +21,19 @@ ## 🎉 Haberler -- **26 Ekim 2023: ChatDev artık güvenli yürütme için Docker ile destekleniyor** (katkı sağlayan [ManindraDeMel](https://github.com/ManindraDeMel) sayesinde). Lütfen [Docker Başlangıç Kılavuzu'na](wiki.md#docker-start) bakınız. +- **26 Ekim 2023: ChatDev artık güvenli yürütme için Docker ile destekleniyor** (katkı sağlayan [ManindraDeMel](https://github.com/ManindraDeMel) sayesinde). Lütfen [Docker Başlangıç Kılavuzu'na](../wiki.md#docker-start) bakınız.

-- 25 Eylül 2023: **Git** modu artık kullanılabilir durumda, programcının sürüm kontrolü için Git'i kullanmasına izin verir. Bu özelliği etkinleştirmek için sadece ``ChatChainConfig.json`` içinde ``"git_management"`` değerini ``"True"`` olarak ayarlamanız yeterlidir. [Kılavuza](wiki.md#git-mode) bakınız. +- 25 Eylül 2023: **Git** modu artık kullanılabilir durumda, programcının sürüm kontrolü için Git'i kullanmasına izin verir. Bu özelliği etkinleştirmek için sadece ``ChatChainConfig.json`` içinde ``"git_management"`` değerini ``"True"`` olarak ayarlamanız yeterlidir. [Kılavuza](../wiki.md#git-mode) bakınız.

-- 20 Eylül 2023: **İnsan-Ajan-İletişimi** modu artık kullanılabilir! ChatDev ekibine katılarak inceleyici rolünü üstlenebilir ve programcıya önerilerde bulunabilirsiniz; ``python3 run.py --task [fikrinizin açıklaması] --config "İnsan"`` komutunu deneyin. [Kılavuza](wiki.md#human-agent-interaction) ve [örneğe](WareHouse/Gomoku_HumanAgentInteraction_20230920135038) bakınız. +- 20 Eylül 2023: **İnsan-Ajan-İletişimi** modu artık kullanılabilir! ChatDev ekibine katılarak inceleyici rolünü üstlenebilir ve programcıya önerilerde bulunabilirsiniz; ``python3 run.py --task [fikrinizin açıklaması] --config "Human"`` komutunu deneyin. [Kılavuza](../wiki.md#human-agent-interaction) ve [örneğe](../WareHouse/Gomoku_HumanAgentInteraction_20230920135038) bakınız.

-- 1 Eylül 2023: **Sanat** modu şimdi kullanılabilir! Yazılımda kullanılan görselleri oluşturmak için tasarımcı ajanını etkinleştirebilirsiniz; ``python3 run.py --task [fikrinizin açıklaması] --config "Sanat"`` komutunu deneyin. [Kılavuza](wiki.md#art) ve [örneğe](WareHouse/gomokugameArtExample_THUNLP_20230831122822) bakınız. +- 1 Eylül 2023: **Sanat** modu şimdi kullanılabilir! Yazılımda kullanılan görselleri oluşturmak için tasarımcı ajanını etkinleştirebilirsiniz; ``python3 run.py --task [fikrinizin açıklaması] --config "Sanat"`` komutunu deneyin. [Kılavuza](../wiki.md#art) ve [örneğe](../WareHouse/gomokugameArtExample_THUNLP_20230831122822) bakınız. - 28 Ağustos 2023: Sistem halka açık durumda. - 17 Ağustos 2023: v1.0.0 sürümü hazırlandı. - 30 Temmuz 2023: Kullanıcılar ChatChain, Aşama ve Rol ayarlarını özelleştirebilirler. Ayrıca, hem çevrimiçi Log modu hem de yeniden oynatma mod @@ -136,11 +136,11 @@ Başlamak için şu adımları izleyin: ### 🐳 Docker ile Hızlı Başlangıç -- Docker desteği sağlayan [ManindraDeMel](https://github.com/ManindraDeMel) için teşekkür ederiz. Lütfen [Docker Başlangıç Kılavuzu'na](wiki.md#docker-start) bakınız. +- Docker desteği sağlayan [ManindraDeMel](https://github.com/ManindraDeMel) için teşekkür ederiz. Lütfen [Docker Başlangıç Kılavuzu'na](../wiki.md#docker-start) bakınız. ## ✨️ Gelişmiş Yetenekler -Daha ayrıntılı bilgi için [Wiki](wiki.md)'mize başvurabilirsiniz, burada şunları bulabilirsiniz: +Daha ayrıntılı bilgi için [Wiki](../wiki.md)'mize başvurabilirsiniz, burada şunları bulabilirsiniz: - Tüm komut çalıştırma parametrelerine giriş. - Gelişmiş görselleştirilmiş günlükler, yeniden oynatma demosu ve basit bir ChatChain Görselleştirici içeren yerel web demo kurulumu için basit bir kılavuz. @@ -155,7 +155,7 @@ Daha ayrıntılı bilgi için [Wiki](wiki.md)'mize başvurabilirsiniz, burada ş **Kod**: Açık kaynak projemize katılmak isteğinizden dolayı heyecanlıyız. Herhangi bir sorunla karşılaşırsanız, çekinmeden bildirin. Eğer herhangi bir sorunuz varsa veya çalışmanızı bizimle paylaşmaya hazırsanız, bir çekme isteği oluşturmanızdan çekinmeyin! Katkılarınız büyük bir değere sahiptir. Başka bir ihtiyacınız varsa lütfen bana bildirin! -**Şirket**: Kendi özelleştirilmiş "ChatDev Şirketi"ni oluşturmak çok kolaydır. Bu kişiselleştirilmiş kurulum, üç basit yapılandırma JSON dosyasını içerir. ``CompanyConfig/Default`` dizininde verilen örneğe bakın. Özelleştirme hakkında detaylı talimatlar için [Wiki](wiki.md) sayfamıza göz atın. +**Şirket**: Kendi özelleştirilmiş "ChatDev Şirketi"ni oluşturmak çok kolaydır. Bu kişiselleştirilmiş kurulum, üç basit yapılandırma JSON dosyasını içerir. ``CompanyConfig/Default`` dizininde verilen örneğe bakın. Özelleştirme hakkında detaylı talimatlar için [Wiki](../wiki.md) sayfamıza göz atın. **Yazılım**: ChatDev kullanarak yazılım geliştirdiğinizde, ilgili bilgileri içeren bir klasör oluşturulur. Çalışmanızı bizimle paylaşmak, bir çekme isteği oluşturmak kadar basittir. İşte bir örnek: ``python3 run.py --task "2048 oyunu tasarla" --name "2048" --org "THUNLP" --config "Default"`` komutunu çalıştırın. Bu, bir yazılım paketi oluşturur ve ``/WareHouse/2048_THUNLP_timestamp`` adında bir klasör oluşturur. 
İçinde şunları bulacaksınız: @@ -164,7 +164,7 @@ Daha ayrıntılı bilgi için [Wiki](wiki.md)'mize başvurabilirsiniz, burada ş - Yazılımın oluşturulma sürecini ayrıntılı olarak açıklayan kapsamlı bir günlük (``timestamp.log``) - Bu yazılımın oluşturulmasında kullanılan ilk prompt (``2048.prompt``) -**Topluluk tarafından sağlanan yazılımları buradan görüntüleyin [burada](Contribution.md)!** +**Topluluk tarafından sağlanan yazılımları buradan görüntüleyin [burada](../Contribution.md)!** ## 👨‍💻‍ Katkıda Bulunanlar @@ -211,4 +211,4 @@ Daha ayrıntılı bilgi için [Wiki](wiki.md)'mize başvurabilirsiniz, burada ş ## 📬 İletişim -Herhangi bir sorunuz, geri bildiriminiz veya iletişime geçmek isterseniz, lütfen bize [chatdev.openbmb@outlook.com](mailto:chatdev.openbmb@outlook.com) adresi üzerinden ulaşmaktan çekinmeyin. +Herhangi bir sorunuz, geri bildiriminiz veya iletişime geçmek isterseniz, lütfen bize [qianc62@gmail.com](mailto:qianc62@gmail.com) adresi üzerinden ulaşmaktan çekinmeyin. diff --git a/readme/README-Urdu.md b/readme/README-Urdu.md new file mode 100644 index 000000000..faa99d285 --- /dev/null +++ b/readme/README-Urdu.md @@ -0,0 +1,224 @@ +# Communicative Agents for Software Development + +

+ +

+ +

+ 【English | Chinese | Japanese | Korean | Filipino | French | Slovak | Portuguese | Spanish | Dutch | Turkish | Hindi | Bahasa Indonesia】 +

+

+ 【📚 Wiki | 🚀 Visualizer | 👥 Community Built Software | 🔧 Customization | 👾 Discord】 + +

+ +## 📖 Overview + +- **ChatDev** ایک **virtual software company** کے طور پر کھڑی ہے جو مختلف **intelligent agents** کے ذریعے کام کرتی ہے جن کے مختلف کردار ہیں، بشمول چیف ایگزیکٹو آفیسر ، چیف پراڈکٹ آفیسر ، چیف ٹیکنالوجی آفیسر ، پروگرامر ، ریویور ، ٹیسٹر ، آرٹ ڈیزائنر ۔ یہ ایجنٹس ایک کثیر ایجنسی تنظیمی ڈھانچہ بناتے ہیں اور ایک مشن کے ذریعے متحد ہوتے ہیں کہ "پروگرامنگ کے ذریعے ڈیجیٹل دنیا میں انقلاب لانا"۔ ChatDev میں ایجنٹس مخصوص فنکشنل سیمینارز میں حصہ لے کر **تعاون** کرتے ہیں، جن میں ڈیزائننگ، کوڈنگ، ٹیسٹنگ، اور دستاویزات جیسی سرگرمیاں شامل ہیں۔ +- ChatDev کا بنیادی مقصد ایک **easy-to-use**, **highly customizable** اور **extendable** فریم ورک پیش کرنا ہے، جو بڑے زبان کے ماڈلز (LLMs) پر مبنی ہے اور اجتماعی ذہانت کے مطالعہ کے لیے ایک مثالی منظر نامہ فراہم کرتا ہے۔ + +

+ +

+ +## 🎉 News + +* **25 جون 2024: 🎉LLM پر مبنی کثیر ایجنسی تعاون🤖🤖 اور متعلقہ شعبوں میں ترقی کو فروغ دینے کے لیے، ChatDev ٹیم نے ایک [اوپن سورس](https://github.com/OpenBMB/ChatDev/tree/main/MultiAgentEbook) انٹرایکٹو ای بک📚 فارمیٹ میں ایک مجموعہ تیار کیا ہے۔ اب آپ [Ebook ویب سائٹ](https://thinkwee.top/multiagent_ebook) پر تازہ ترین پیشرفتوں کا جائزہ لے سکتے ہیں اور [پیپر لسٹ](https://github.com/OpenBMB/ChatDev/blob/main/MultiAgentEbook/papers.csv) ڈاؤن لوڈ کر سکتے ہیں۔** +

+ +

+* 12 جون 2024: ہم نے ملٹی ایجنٹ کولیبریشن نیٹ ورکس (MacNet) 🎉 متعارف کرائے ہیں، جو لسانی تعاملات کے ذریعے ایجنٹس کے درمیان موثر کام پر مبنی تعاون کو آسان بنانے کے لیے ڈائریکٹڈ ایسائیکلیک گراف استعمال کرتے ہیں۔ 🤖🤖 MacNet مختلف ٹاپولوجیز اور ایک ہزار سے زیادہ ایجنٹس کے درمیان تعاون کی حمایت کرتا ہے بغیر سیاق و سباق کی حدود سے تجاوز کیے۔ زیادہ ورسٹائل اور قابل توسیع، MacNet کو ChatDev کی چین کی شکل کی ٹاپولوجی کے ایک زیادہ جدید ورژن کے طور پر سمجھا جا سکتا ہے۔ ہمارا پری پرنٹ پیپر [https://arxiv.org/abs/2406.07155](https://arxiv.org/abs/2406.07155) پر دستیاب ہے۔ یہ تکنیک جلد ہی اس ریپوزٹری میں شامل کر دی جائے گی، جو سافٹ ویئر ڈویلپمنٹ سے آگے (مثلاً منطقی استدلال، ڈیٹا کا تجزیہ، کہانی کی تخلیق، وغیرہ) متنوع تنظیمی ڈھانچے کی حمایت اور بھرپور حل پیش کرے گی۔ +

+ +

+ +
+پرانا نیوز + +* 7 مئی 2024 کو، ہم نے "آئیٹریٹو ایکسپیرینس ریفائنمنٹ" (IER) متعارف کروایا، جو ایک نیا طریقہ ہے جس میں انسٹرکٹر اور اسسٹنٹ ایجنٹس شارٹ کٹ پر مبنی تجربات کو بہتر بناتے ہیں تاکہ نئے کاموں کے لیے مؤثر طریقے سے اپنایا جا سکے۔ یہ طریقہ تجربات کے حصول، استعمال، پھیلاؤ، اور ختم کرنے کے مراحل پر مبنی ہے۔ ہمارا پیشگی پرنٹ پیپر https://arxiv.org/abs/2405.04219 پر دستیاب ہے، اور یہ تکنیک جلد ہی ChatDev میں شامل کی جائے گی۔ +

+ +

+ +* 25 جنوری 2024: ہم نے ChatDev میں تجرباتی کو لرننگ ماڈیول کو شامل کیا ہے۔ براہ کرم [تجرباتی کو لرننگ گائیڈ](wiki.md#co-tracking) دیکھیں۔ + +* 28 دسمبر 2023: ہم نے تجرباتی کو لرننگ کا نیا طریقہ متعارف کروایا ہے جس میں انسٹرکٹر اور اسسٹنٹ ایجنٹس شارٹ کٹ پر مبنی تجربات کو جمع کرتے ہیں تاکہ نئے کاموں کو مؤثر طریقے سے حل کیا جا سکے، جس سے تکراری غلطیوں کو کم کیا جا سکتا ہے اور کارکردگی میں اضافہ ہوتا ہے۔ مزید تفصیلات کے لیے ہمارا پیشگی پرنٹ پیپر https://arxiv.org/abs/2312.17025 پر دیکھیں اور یہ تکنیک جلد ہی ChatDev میں شامل کی جائے گی۔ +

+ +

+ +* 15 نومبر 2023: ہم نے ChatDev کو ایک SaaS پلیٹ فارم کے طور پر لانچ کیا، جو سافٹ ویئر ڈیولپرز اور تخلیقی کاروباری افراد کو بہت کم قیمت اور رکاوٹ کے ساتھ سافٹ ویئر بنانے کے قابل بناتا ہے۔ اسے آزمانے کے لیے https://chatdev.modelbest.cn/ پر جائیں۔ +

+ +

+ +* 2 نومبر 2023: ChatDev میں اب نیا فیچر شامل ہے: انکریمنٹل ڈیولپمنٹ، جو ایجنٹس کو موجودہ کوڈز پر مزید ترقی کرنے کی اجازت دیتا ہے۔ اسے شروع کرنے کے لیے `--config "incremental" --path "[source_code_directory_path]"` استعمال کریں۔ +

+ +

+ +* 26 اکتوبر 2023: ChatDev میں اب محفوظ عمل درآمد کے لیے Docker کی سپورٹ شامل ہے (شکریہ [ManindraDeMel](https://github.com/ManindraDeMel) کے تعاون کا)۔ براہ کرم [Docker شروع کرنے کی گائیڈ](../wiki.md#docker-start) دیکھیں۔ +

+ +

+* 25 ستمبر 2023: **Git** موڈ اب دستیاب ہے، جو پروگرامر کو ورژن کنٹرول کے لیے Git استعمال کرنے کے قابل بناتا ہے۔ اس فیچر کو فعال کرنے کے لیے، `ChatChainConfig.json` میں ``"git_management"`` کو ``"True"`` پر سیٹ کریں۔ [گائیڈ](../wiki.md#git-mode) دیکھیں۔ +

+ +

+- 20 ستمبر 2023: **انسان-ایجنٹ تعامل** موڈ اب دستیاب ہے! آپ ChatDev ٹیم میں شامل ہو سکتے ہیں اور ریویور کا کردار ادا کر کے پروگرامر کو تجاویز دے سکتے ہیں؛ ``python3 run.py --task [description_of_your_idea] --config "Human"`` استعمال کریں۔ [گائیڈ](../wiki.md#human-agent-interaction) اور [مثال](../WareHouse/Gomoku_HumanAgentInteraction_20230920135038) دیکھیں۔ +

+ +

+- 1 ستمبر 2023: **آرٹ** موڈ اب دستیاب ہے! آپ ڈیزائنر ایجنٹ کو سافٹ ویئر میں استعمال کے لیے تصاویر پیدا کرنے کے لیے فعال کر سکتے ہیں؛ ``python3 run.py --task [description_of_your_idea] --config "Art"`` استعمال کریں۔ [گائیڈ](../wiki.md#art) اور [مثال](../WareHouse/gomokugameArtExample_THUNLP_20230831122822) دیکھیں۔ +- 28 اگست 2023: نظام عوامی طور پر دستیاب ہے۔ +- 17 اگست 2023: ورژن v1.0.0 ریلیز کے لیے تیار تھا۔ +- 30 جولائی 2023: صارفین ChatChain، فیز، اور رول سیٹنگز کو اپنی مرضی کے مطابق بنا سکتے ہیں۔ اس کے علاوہ، دونوں آن لائن لاگ موڈ اور ریپلے موڈ اب سپورٹڈ ہیں۔ +- 16 جولائی 2023: اس پروجیکٹ سے منسلک [پیشگی پرنٹ پیپر](https://arxiv.org/abs/2307.07924) شائع ہوا۔ +- 30 جون 2023: ChatDev ریپوزٹری کا ابتدائی ورژن جاری ہوا۔ +
+ +## ❓ What Can ChatDev Do? + +![intro](misc/intro.png) + + + +## ⚡️ Quickstart + +### 💻️ Quickstart with Web + +ویژولائزیشن اور کنفیگریشن کے لیے ویب پیج تک رسائی کریں: https://chatdev.modelbest.cn/ + +### 🖥️ Quickstart with terminal + +شروع کرنے کے لیے، یہ اقدامات کریں: + +1. **Clone the GitHub Repository:** ریپوزٹری کو کلون کرنے کے لیے درج ذیل کمانڈ استعمال کریں: + + ``` + git clone https://github.com/OpenBMB/ChatDev.git + ``` + +2. **Set Up Python Environment:** یقینی بنائیں کہ آپ کے پاس Python کا ورژن 3.9 یا اس سے اوپر کا ماحول موجود ہے۔ آپ درج ذیل کمانڈز استعمال کر کے اس ماحول کو بنا سکتے ہیں اور فعال کر سکتے ہیں، `ChatDev_conda_env` کو اپنے پسندیدہ ماحول کے نام سے تبدیل کریں: + + ``` + conda create -n ChatDev_conda_env python=3.9 -y + conda activate ChatDev_conda_env + ``` + +3. **Install Dependencies:** `ChatDev` ڈائریکٹری میں جائیں اور ضروری ڈپنڈینسز کو انسٹال کرنے کے لیے درج ذیل کمانڈ استعمال کریں: + + ``` + cd ChatDev + pip3 install -r requirements.txt + ``` + +4. **Set OpenAI API Key:** اپنے OpenAI API کلید کو ایک ماحول ویریبل کے طور پر ایکسپورٹ کریں۔ `"your_OpenAI_API_key"` کو اپنی اصل API کلید سے تبدیل کریں۔ یاد رکھیں کہ یہ ماحول ویریبل سیشن مخصوص ہے، اس لیے اگر آپ ایک نیا ٹرمینل سیشن کھولتے ہیں تو آپ کو دوبارہ اسے سیٹ کرنا ہوگا۔ + Unix/Linux پر: + + ``` + export OPENAI_API_KEY="your_OpenAI_API_key" + ``` + + Windows پر: + + ``` + $env:OPENAI_API_KEY="your_OpenAI_API_key" + ``` + +5. **Build Your Software:** اپنے سافٹ ویئر کو بنانے کے لیے درج ذیل کمانڈ استعمال کریں، `[description_of_your_idea]` کو اپنے آئیڈیا کی وضاحت سے اور `[project_name]` کو اپنے مطلوبہ پروجیکٹ کے نام سے تبدیل کریں: + Unix/Linux پر: + + ``` + python3 run.py --task "[description_of_your_idea]" --name "[project_name]" + ``` + + Windows پر: + + ``` + python run.py --task "[description_of_your_idea]" --name "[project_name]" + ``` + +6. 
**Run Your Software:** ایک بار جب سافٹ ویئر بن جائے، آپ اپنا سافٹ ویئر `WareHouse` ڈائریکٹری کے مخصوص پروجیکٹ فولڈر میں پا سکتے ہیں، جیسے `project_name_DefaultOrganization_timestamp`۔ اس ڈائریکٹری میں درج ذیل کمانڈ استعمال کر کے اپنا سافٹ ویئر چلائیں: + Unix/Linux پر: + + ``` + cd WareHouse/project_name_Default + ``` + + On Windows: + + ``` + cd WareHouse/project_name_DefaultOrganization_timestamp + python main.py + ``` + +### 🐳 Quickstart with Docker + + - ہم [ManindraDeMel](https://github.com/ManindraDeMel) کا شکریہ ادا کرتے ہیں جنہوں نے Docker کی سپورٹ فراہم کی۔ براہ کرم [Docker شروع کرنے کی گائیڈ](wiki.md#docker-start) دیکھیں۔ + +## ✨️ Advanced Skills + +مزید تفصیلات کے لیے، براہ کرم ہماری [Wiki](wiki.md) دیکھیں، جہاں آپ کو درج ذیل معلومات مل سکتی ہیں: + +- تمام کمانڈ رن پیرامیٹرز کا تعارف۔ +- مقامی ویب ویزولائزر ڈیمو سیٹ اپ کرنے کے لیے ایک سیدھی سادی گائیڈ، جو حقیقی وقت کے لاگز، دوبارہ چلائے گئے لاگز، اور ChatChain کو ویزولائز کر سکتی ہے۔ +- ChatDev فریم ورک کا ایک جائزہ۔ +- ChatChain کنفیگریشن میں تمام ایڈوانس پیرامیٹرز کا ایک جامع تعارف۔ +- ChatDev کو حسب ضرورت بنانے کے لیے گائیڈز، بشمول: + - ChatChain: اپنا سافٹ ویئر ڈویلپمنٹ کا عمل (یا کوئی اور عمل) ڈیزائن کریں، جیسے ``DemandAnalysis -> Coding -> Testing -> Manual``۔ + - Phase: ChatChain کے اندر اپنے مرحلے کو ڈیزائن کریں، جیسے ``DemandAnalysis``۔ + - Role: آپ کی کمپنی کے مختلف ایجنٹس کو ڈیفائن کرنا، جیسے ``Chief Executive Officer``۔ + +## 🤗 Share Your Software + +**Code**: ہم اپنے اوپن سورس پروجیکٹ میں آپ کی دلچسپی کا خیرمقدم کرتے ہیں۔ اگر آپ کو کسی قسم کی مشکلات کا سامنا ہوتا ہے تو انہیں رپورٹ کرنے میں بالکل نہ ہچکچائیں۔ اگر آپ کے پاس کوئی سوالات ہیں یا آپ اپنا کام ہمارے ساتھ شیئر کرنے کے لیے تیار ہیں تو بلا جھجھک پل ریکویسٹ بنائیں! آپ کی شراکت کو بہت قدر کی نگاہ سے دیکھا جائے گا۔ اگر آپ کو کسی اور چیز میں مدد کی ضرورت ہے تو براہ کرم بتائیں! + +**Company**: اپنی حسب ضرورت "ChatDev کمپنی" بنانا بہت آسان ہے۔ یہ ذاتی سیٹ اپ تین سادہ کنفیگریشن JSON فائلوں پر مشتمل ہوتا ہے۔ ``CompanyConfig/Default`` ڈائریکٹری میں فراہم کردہ مثال دیکھیں۔ حسب ضرورت بنانے کی تفصیلی ہدایات کے لیے ہماری [Wiki](wiki.md) دیکھیں۔ + +**Software**: جب بھی آپ ChatDev استعمال کرتے ہوئے سافٹ ویئر تیار کرتے ہیں، تو اس کے مطابق ایک فولڈر بنایا جاتا ہے جس میں تمام ضروری معلومات شامل ہوتی ہیں۔ اپنا کام ہمارے ساتھ شیئر کرنا اتنا ہی آسان ہے جتنا کہ پل ریکویسٹ بنانا۔ مثال کے طور پر، یہ کمانڈ چلائیں ``python3 run.py --task "design a 2048 game" --name "2048" --org "THUNLP" --config "Default"``۔ یہ ایک سافٹ ویئر پیکج تیار کرے گا اور ``/WareHouse/2048_THUNLP_timestamp`` نامی فولڈر بنائے گا۔ اس کے اندر آپ کو درج ذیل چیزیں ملیں گی: + +- 2048 گیم سافٹ ویئر سے متعلق تمام فائلیں اور دستاویزات +- اس سافٹ ویئر کی ذمہ دار کمپنی کی کنفیگریشن فائلیں، جن میں ``CompanyConfig/Default`` کی تین JSON کنفیگریشن فائلیں شامل ہیں۔ +- سافٹ ویئر کے بنانے کے عمل کی ایک جامع لاگ جو دوبارہ چلانے کے لیے استعمال کی جا سکتی ہے (``timestamp.log``) +- اس سافٹ ویئر کو بنانے کے لیے استعمال کیا گیا ابتدائی پرامپٹ (``2048.prompt``) + +**See community contributed software [here](Contribution.md)!** + +## 👨‍💻‍ Contributors + + + + + +Made with [contrib.rocks](https://contrib.rocks). 
+ +## 🔎 Citation + +``` +@article{chatdev, + title = {ChatDev: Communicative Agents for Software Development}, + author = {Chen Qian and Wei Liu and Hongzhang Liu and Nuo Chen and Yufan Dang and Jiahao Li and Cheng Yang and Weize Chen and Yusheng Su and Xin Cong and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun}, + journal = {arXiv preprint arXiv:2307.07924}, + url = {https://arxiv.org/abs/2307.07924}, + year = {2023} +} +``` + +## ⚖️ License + +- **Source Code Licensing**: ہمارے پروجیکٹ کا سورس کوڈ Apache 2.0 لائسنس کے تحت لائسنس یافتہ ہے۔ اس لائسنس کے تحت کوڈ کے استعمال، ترمیم، اور تقسیم کی اجازت ہے، بشرطیکہ Apache 2.0 لائسنس میں بیان کردہ شرائط پر عمل کیا جائے۔ + +- **Data Licensing**: ہمارے پروجیکٹ میں استعمال ہونے والا متعلقہ ڈیٹا CC BY-NC 4.0 لائسنس کے تحت لائسنس یافتہ ہے۔ یہ لائسنس ڈیٹا کے غیر تجارتی استعمال کی صریح اجازت دیتا ہے۔ ہم اس بات پر زور دینا چاہیں گے کہ ان ڈیٹا سیٹس کا استعمال کرتے ہوئے تربیت یافتہ کسی بھی ماڈل کو سختی سے غیر تجارتی استعمال کی پابندی پر عمل کرنا چاہیے اور اسے صرف تحقیقی مقاصد کے لیے استعمال کیا جانا چاہیے۔ + +## 🤝 Acknowledgments + +   +   +   + + + +## 📬 Contact + +اگر آپ کے پاس کوئی سوالات، تجاویز ہیں یا آپ ہم سے رابطہ کرنا چاہتے ہیں، تو براہ کرم بلا جھجھک ہمیں ای میل کے ذریعے [qianc62@gmail.com](mailto:qianc62@gmail.com) پر رابطہ کریں۔ diff --git a/requirements.txt b/requirements.txt index 0e9f28136..5aae45430 100644 --- a/requirements.txt +++ b/requirements.txt @@ -7,11 +7,11 @@ openai==1.3.3 regex==2023.6.3 requests==2.31.0 tenacity==8.2.2 -tiktoken==0.4.0 +tiktoken==0.7.0 virtualenv==20.23.0 -Werkzeug==2.3.6 +Werkzeug==3.0.3 Markdown==3.4.4 -Pillow==10.2.0 +Pillow==10.3.0 Wikipedia-API==0.6.0 beautifulsoup4==4.12.2 faiss-cpu==1.7.4 diff --git a/run.py b/run.py index 0cea00f01..29293a758 100644 --- a/run.py +++ b/run.py @@ -79,7 +79,7 @@ def get_config(company): parser.add_argument('--name', type=str, default="Gomoku", help="Name of software, your software will be generated in WareHouse/name_org_timestamp") parser.add_argument('--model', type=str, default="GPT_3_5_TURBO", - help="GPT Model, choose from {'GPT_3_5_TURBO','GPT_4','GPT_4_32K', 'GPT_4_TURBO'}") + help="GPT Model, choose from {'GPT_3_5_TURBO', 'GPT_4', 'GPT_4_TURBO', 'GPT_4O', 'GPT_4O_MINI'}") parser.add_argument('--path', type=str, default="", help="Your file directory, ChatDev will build upon your software in the Incremental mode") args = parser.parse_args() @@ -92,9 +92,11 @@ def get_config(company): config_path, config_phase_path, config_role_path = get_config(args.config) args2type = {'GPT_3_5_TURBO': ModelType.GPT_3_5_TURBO, 'GPT_4': ModelType.GPT_4, - 'GPT_4_32K': ModelType.GPT_4_32k, + # 'GPT_4_32K': ModelType.GPT_4_32k, 'GPT_4_TURBO': ModelType.GPT_4_TURBO, - 'GPT_4_TURBO_V': ModelType.GPT_4_TURBO_V + # 'GPT_4_TURBO_V': ModelType.GPT_4_TURBO_V + 'GPT_4O': ModelType.GPT_4O, + 'GPT_4O_MINI': ModelType.GPT_4O_MINI, } if openai_new_api: args2type['GPT_3_5_TURBO'] = ModelType.GPT_3_5_TURBO_NEW diff --git a/wiki.md b/wiki.md index c9f8c1b99..0fd07680d 100644 --- a/wiki.md +++ b/wiki.md @@ -195,7 +195,7 @@ Stopping the containers does not affect the persistency of your files; all your After this process, the experiences have been extracted from the production of software and added to the agents' experience pool in `ecl/memory/MemoryCards.json`. 
\ **For example:**
- It you want to memorize only one software, you can use:
+ If you want to memorize only one software, you can use:
 ```bash
 python3 ecl/ecl.py ""
 ```
@@ -233,6 +233,25 @@ After this process, the experiences have been extracted from the production of s
 Detailed descriptions and experiment results about this **Experiential Co-Learning** Module lies in our preprint paper at https://arxiv.org/abs/2312.17025.
+## Experiential Co-Evolving Guide
+- **Using Co-Evolving**: Use the following command to initiate the evolving of experiences. It runs `ecl/ece.py` to eliminate certain experiences from `ecl/memory/UsedMemory.json` and `ecl/memory/NewMemory.json`, and then combines the two pruned parts of experiences to form a new experience pool at `<Evolved_directory>`.
+
+  ```bash
+  python3 ecl/ece.py "<Path_directory>" "<UsedMemory_directory>" "<NewMemory_directory>" "<Evolved_directory>"
+  ```
+  `<Path_directory>`: The path to the directory of the software, generated with the memory `<UsedMemory_directory>`. \
+  `<UsedMemory_directory>`: The path to the directory of UsedMemory, which was used to generate the software in `<Path_directory>`. \
+  `<NewMemory_directory>`: The path to the directory of NewMemory, which is acquired from the software in `<Path_directory>` using `ecl/ecl.py`. \
+  `<Evolved_directory>`: The path to a directory where you want to store the evolved memory.
+  \
+  **For example:**
+  ```bash
+  python3 ecl/ece.py "WareHouse" "ecl/memory/UsedMemory.json" "ecl/memory/NewMemory.json" "ecl/memory/MemoryCards_Evolved.json"
+  ```
+> **Notice:** The software directory and the memory files must correspond: the software in `<Path_directory>` must have been generated using `<UsedMemory_directory>`, and `<NewMemory_directory>` must be acquired from the software in `<Path_directory>` using `ecl/ecl.py`. This is because, when we calculate the frequency distribution of the experiences, we need to ensure that the software corresponds to the experiences, so that certain experiences can be eliminated to obtain a subset with a relatively high retrieval probability.
+
+Detailed descriptions and experiment results about this Experiential Co-Evolving Module lie in our preprint paper at https://arxiv.org/abs/2405.04219.
+
 ## Customization
 
 - You can customize your company in three kinds of granularity: