diff --git a/TaskForces/Interoperability/Reports/report-interoperability.html b/TaskForces/Interoperability/Reports/report-interoperability.html
index b50c656..09de4fb 100644
--- a/TaskForces/Interoperability/Reports/report-interoperability.html
+++ b/TaskForces/Interoperability/Reports/report-interoperability.html
@@ -15,6 +15,12 @@
         latestVersion: null,
         edDraftURI: "https://w3c-cg.github.io/webagents/TaskForces/Interoperability/Reports/report-interoperability.html",
         editors: [{ name: "Your Name", url: "https://your-site.com" }],
+        authors: [
+          {
+            name: "Jérémy Lemée",
+            url: "https://www.alexandria.unisg.ch/entities/person/Jeremy_Lemee"
+          }
+        ],
         github: "https://github.com/w3c-cg/webagents/",
         shortName: "webagents-interop",
         xref: "web-platform",
@@ -76,6 +82,31 @@
             href: "https://dl.acm.org/doi/abs/10.5555/2031678.2031687",
             publisher: "IFAAMAS",
           },
+          AGORA: {
+            authors: [
+              "Marro, Samuele", "La Malfa, Emanuele", "Wright, Jesse", "Li, Guohao", "Shadbolt, Nigel", "Wooldridge, Michael", "Torr, Philip"
+            ],
+            title: "A scalable communication protocol for networks of large language models",
+            date: "2024",
+            href: "https://arxiv.org/pdf/2410.11905",
+          },
+          COALA: {
+            authors: [
+              "Sumers, Theodore", "Yao, Shunyu", "Narasimhan, Karthik", "Griffiths, Thomas"
+            ],
+            title: "Cognitive architectures for language agents",
+            date: "2023",
+            href: "https://openreview.net/pdf?id=1i6ZCvflQJ",
+            publisher: "Transactions on Machine Learning Research"
+          },
+          TOOL: {
+            authors: [
+              "Wang, Zhiruo", "Cheng, Zhoujun", "Zhu, Hao", "Fried, Daniel", "Neubig, Graham"
+            ],
+            title: "What are tools anyway? A survey from the language model perspective",
+            date: "2024",
+            href: "https://arxiv.org/abs/2403.15452",
+          }
         }
       };
@@ -104,10 +135,10 @@

Terminology

A specification of communication among two or more agents that states who can say what to whom and when — for example, as message sequence diagrams [[AUML]] or information flows [[BSPL]].
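To make "who can say what to whom and when" concrete, the sketch below encodes a small, hypothetical buyer/seller protocol in the message-sequence style as an ordered list of steps, and checks whether a conversation conforms to it. The role names, message types, and the `conforms` helper are illustrative assumptions. Note that information-based languages such as [[BSPL]] do not fix a global ordering in this way; there, ordering follows from information dependencies, so this sketch only reflects the sequence-diagram style.

```javascript
// Hypothetical interaction protocol in the style of a message sequence
// diagram: each step states who (sender role) may say what (message type)
// to whom (receiver role); the position in the array states when.
const purchaseProtocol = [
  { from: "Buyer", to: "Seller", type: "request-quote" },
  { from: "Seller", to: "Buyer", type: "quote" },
  { from: "Buyer", to: "Seller", type: "accept" },
];

// Returns true if the messages exchanged so far follow the protocol step by
// step. A conversation still in progress (a prefix of the protocol) conforms.
function conforms(protocol, conversation) {
  if (conversation.length > protocol.length) return false;
  return conversation.every(
    (msg, i) =>
      msg.from === protocol[i].from &&
      msg.to === protocol[i].to &&
      msg.type === protocol[i].type
  );
}

const inProgress = [
  { from: "Buyer", to: "Seller", type: "request-quote" },
  { from: "Seller", to: "Buyer", type: "quote" },
];
const wrongSender = [{ from: "Seller", to: "Buyer", type: "request-quote" }];

console.log(conforms(purchaseProtocol, inProgress)); // true
console.log(conforms(purchaseProtocol, wrongSender)); // false
```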
Artifact or Tool
-
A resource [[WEBARCH]] that can be shared and used by agents to support their activities. In some multi-agent systems, agents can construct artifacts to instrument their environments [[JACAMO]].
+
A resource [[WEBARCH]] that can be shared and used by agents to support their activities. In some multi-agent systems, agents can construct artifacts to instrument their environments [[JACAMO]]. In the context of agentic AI, a tool is a function interface to a program that a language model can call. A tool can enable a language model to perceive or act in an environment, or to perform computations [[TOOL]].
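As an illustration of a tool as a function interface, the sketch below pairs a description that a language model could read (name, description, JSON-Schema-style input) with the program it fronts, and executes a tool call proposed by the model. The `add` tool, the call shape `{ name, arguments }`, and the `invokeTool` helper are hypothetical; concrete frameworks (e.g., MCP or provider-specific function calling) each define their own formats.

```javascript
// Hypothetical tool registry: each entry combines what the language model
// sees (description and input schema) with the program that runs the call.
const tools = {
  add: {
    description: "Add two numbers.",
    inputSchema: {
      type: "object",
      properties: { a: { type: "number" }, b: { type: "number" } },
      required: ["a", "b"],
    },
    run: ({ a, b }) => a + b,
  },
};

// Executes a model-proposed call of the (assumed) shape { name, arguments },
// after a minimal check that all required arguments are present.
function invokeTool(call) {
  if (!Object.hasOwn(tools, call.name)) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  const tool = tools[call.name];
  for (const key of tool.inputSchema.required) {
    if (!(key in call.arguments)) {
      throw new Error(`Missing argument: ${key}`);
    }
  }
  return tool.run(call.arguments);
}

// The result would typically be returned to the language model.
console.log(invokeTool({ name: "add", arguments: { a: 2, b: 3 } })); // 5
```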
-
Augmented Language Model
-
A language model augmented with abilities such as reasoning, tool use, information retrieval, or storing context across interactions. Unlike an agent, an augmented language model does not actively pursue goals and is not situated in an environment. See also [[TMLR23]] and [[ANTHROPIC24]].
+
Augmented Language Model or Language Agent
+
A language model augmented with abilities such as reasoning, tool use, information retrieval, or storing context across interactions. Unlike an agent, an augmented language model does not actively pursue goals and is not situated in an environment. See also [[TMLR23]] and [[ANTHROPIC24]]. A language agent is an agent that relies on a language model to interact with its environment. The language model can be used to process observations represented in natural or formal languages, to generate the actions to perform, and to make decisions [[COALA]]. Such agents can be built using an augmented language model as a building block [[ANTHROPIC24]].
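The distinction between a language model and a language agent can be sketched as a perceive-decide-act loop in which the model processes observations and chooses actions. In the minimal sketch below, `stubModel` is a hypothetical deterministic stand-in for a language model (a real agent would prompt an LLM), and the counter environment and `runAgent` loop are illustrative assumptions, not the architecture described in [[COALA]].

```javascript
// Hypothetical stand-in for a language model: it reads a textual
// observation and returns the next action as structured output.
function stubModel(observation) {
  const value = Number(observation.match(/counter=(\d+)/)[1]);
  return value < 3 ? { action: "increment" } : { action: "stop" };
}

// A toy environment in which the agent is situated.
const environment = {
  counter: 0,
  observe() {
    return `counter=${this.counter}`;
  },
  act(action) {
    if (action === "increment") this.counter += 1;
  },
};

// Perceive-decide-act loop: observe, let the model decide, then act,
// until the model decides to stop or the step budget runs out.
function runAgent(model, env, maxSteps = 10) {
  const trace = [];
  for (let step = 0; step < maxSteps; step++) {
    const observation = env.observe();
    const { action } = model(observation);
    trace.push(action);
    if (action === "stop") break;
    env.act(action);
  }
  return trace;
}

console.log(runAgent(stubModel, environment));
// [ 'increment', 'increment', 'increment', 'stop' ]
```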
Multi-Agent System (MAS)
A system composed of agents that are situated in a shared environment and interact with one another to achieve individual or collective goals. Agents can work in collaboration, cooperation, and/or competition. A MAS can be either an open or a closed system. This report is primarily concerned with open MAS.
@@ -169,7 +200,17 @@

State of Web-based Multi-Agent Systems

Resource descriptions,
Prompt definitions,
(JSON) Directories (via */list)
- Client-Server with streaming RPC connectors (JSON-RPC 2.0, HTTP+SSE)
+ Client-Server with streaming RPC connectors (JSON-RPC 2.0, Streamable HTTP)
+
+
+ NLWeb
+ Natural-language query endpoint
+ N/A
+ Function calling via MCP
+ URIs (Resources)
+ JSON with schema.org
+ N/A
+ Client-Server with streaming RPC connectors through MCP, REST API for human interaction, Web Syndication with RSS
A2A
@@ -187,6 +228,16 @@

State of Web-based Multi-Agent Systems

Well-known URIs,
Directories
Async. Client-Server with streaming RPC connectors and webhooks (JSON-RPC 2.0, HTTP+SSE)
+
+ Agora
+ Agent,
Protocol Document,
Message
Communication Protocol
+ Communication protocols with protocol negotiation
+ N/A
+ N/A
+ Protocol Document,
Message
+ N/A
+ Client-Server (HTTPS)
+
ANP
Agent,
Agent Description,
Communication Protocol
@@ -288,6 +339,10 @@

Agentic AI

+

The term Agentic AI refers to AI systems that can make decisions autonomously in order to achieve goals. It is commonly used more specifically for autonomous generative AI systems.

+

Large Language Models (LLMs) are a core technology for creating agentic AI systems. More precisely, a core component of language agents is an Augmented Language Model (ALM), that is, an LLM extended with the ability to reason and to use tools [[TMLR23]]. ALMs serve as building blocks for agents [[ANTHROPIC24]]. The Model Context Protocol (MCP) enables ALMs and language agents to connect to external tools and data sources, and thereby separates the concerns of agents from those of tools and data sources. In practice, an MCP server can run on the same machine as its client or be accessed over the Internet via Streamable HTTP. NLWeb relies on MCP to integrate conversational interfaces into websites, with the aim of becoming the HTML of the Agentic Web.
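MCP messages are JSON-RPC 2.0 messages. The sketch below builds a `tools/call` request on the client side and answers it with a minimal, in-process stand-in for a server. The tool name `add` and the helpers `toolsCallRequest` and `handleRequest` are hypothetical, and the sketch omits the initialization handshake, `tools/list`, notifications, and the HTTP transport itself; it only shows the shape of one request/response exchange.

```javascript
// Client side: build a JSON-RPC 2.0 request for MCP's "tools/call" method.
let nextId = 1;
function toolsCallRequest(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Server side (hypothetical, in-process) exposing a single tool, "add".
// A real MCP server also implements "initialize", "tools/list", etc.
const serverTools = { add: ({ a, b }) => a + b };

function handleRequest(request) {
  const reply = (body) => ({ jsonrpc: "2.0", id: request.id, ...body });
  if (request.method !== "tools/call") {
    return reply({ error: { code: -32601, message: "Method not found" } });
  }
  const name = request.params.name;
  if (!Object.hasOwn(serverTools, name)) {
    // Unknown tools are reported as a JSON-RPC error (invalid params).
    return reply({ error: { code: -32602, message: `Unknown tool: ${name}` } });
  }
  // Tool results are returned as a list of content items.
  const value = serverTools[name](request.params.arguments);
  return reply({ result: { content: [{ type: "text", text: String(value) }], isError: false } });
}

// With Streamable HTTP, this request would be the body of an HTTP POST to
// the server's MCP endpoint (after the initialization handshake).
const request = toolsCallRequest("add", { a: 2, b: 3 });
console.log(JSON.stringify(handleRequest(request)));
// {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"5"}],"isError":false}}
```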

+ +

Agentic AI is also concerned with communication among language agents, and several protocols are being developed to enable such communication on the Web. The Agent2Agent (A2A) protocol is intended to complement MCP by addressing communication between agents. Agents using this protocol describe themselves and their capabilities in an Agent Card that is published on the Web for other agents to read and use. The protocol defines tasks that an agent can perform on behalf of another agent, as well as messages to support communication among agents, and relies on JSON-RPC 2.0 for communication. The Agora protocol is a protocol for communication among language agents that aims to be as versatile, efficient, and portable as possible, within the limits of the Agent Communication Trilemma among these three properties [[AGORA]]. Agora enables agents to choose at run time which specific protocol to use for an interaction [[AGORA]]. The Agent Network Protocol (ANP) is another protocol for agents on the Web. ANP defines three layers: the Identity layer, the Meta-Protocol layer, and the Application layer. The Identity layer relies on Decentralized Identifiers (DIDs) to identify agents. ANP defines a custom DID method, did:wba (for Web-based Agents), that enables agents to prove their identities without relying on a central authority. The Meta-Protocol layer enables agents to select which protocol to use for communication; once a protocol has been selected, the agents communicate using that protocol. Finally, the Application layer defines a JSON-LD Agent Description (AD) that enables agents to provide information about themselves to other agents, and an Agent Discovery Protocol that enables agents to discover the ADs of other agents. Eclipse LMOS (Language Model Operating System) is another project that aims to build an Internet of Agents. Eclipse LMOS relies on DIDs to identify software agents.
It also defines an Agent Description Format to describe agents and a Tool Description Format to describe tools, both built on top of the W3C Thing Description (TD) format. Eclipse LMOS also defines mechanisms for discovery and a communication protocol based on WebSocket.
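Both Agora and the ANP Meta-Protocol layer let agents select at run time which protocol to use. The sketch below shows one simple selection rule under stated assumptions: the initiator lists protocols in order of preference, the responder lists the protocols it supports, the first shared protocol wins, and otherwise the agents fall back to natural language. The protocol identifiers, the `NATURAL_LANGUAGE` constant, and `negotiateProtocol` are hypothetical; actual Agora or ANP negotiation can be richer (for example, agents may agree on a new protocol document).

```javascript
// Hypothetical identifier for the fallback: unstructured natural language.
const NATURAL_LANGUAGE = "natural-language";

// Picks the first protocol in the initiator's preference order that the
// responder also supports; otherwise falls back to natural language.
function negotiateProtocol(initiatorPrefs, responderSupported) {
  const supported = new Set(responderSupported);
  for (const protocol of initiatorPrefs) {
    if (supported.has(protocol)) return protocol;
  }
  return NATURAL_LANGUAGE;
}

// Hypothetical protocol identifiers (e.g., URIs of protocol documents).
const alice = [
  "https://example.org/protocols/weather-v2",
  "https://example.org/protocols/weather-v1",
];
const bob = ["https://example.org/protocols/weather-v1"];

console.log(negotiateProtocol(alice, bob)); // https://example.org/protocols/weather-v1
console.log(negotiateProtocol(alice, [])); // natural-language
```

Once a protocol is selected, subsequent messages follow that protocol, which matches the two-step pattern described for the ANP Meta-Protocol layer.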