Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms
Thought Experiment. Imagine encountering a novel AI agent operating autonomously “in the wild.” This agent offers to perform tasks on your behalf – draft a legal contract, diagnose a medical symptom, or execute financial trades. Before granting it any trust or authority, what questions would you need answered? First, who (or what) is this agent? Is its identity verifiable and tied to a reputable source, or could it be an imposter? Second, can I trust its competence and integrity? Does it reason correctly, or will it hallucinate facts and fall prey to manipulated inputs? Has its model been tampered with or backdoored in some hidden way? Third, what is its reputation? How has it behaved in past interactions, and what do other entities report about it? Could it be colluding with others or whitewashing a bad history? Before we let this agent book our flights or access sensitive data, these questions must be answered. In essence, the agent should prove its identity, the trustworthiness of its outputs, and the reliability of its past behavior.
These questions illustrate the core challenges faced by LLM-based AI agents operating autonomously or semi-autonomously in open settings. Such agents – powered by large language models (LLMs) and capable of making decisions or calling tools on our behalf – are increasingly deployed in unbounded environments. Yet outside the sandbox of a controlled organization or closed API, an agent’s credibility cannot be taken for granted. This paper examines the intertwined challenges of Identity, Trust, and Reputation for AI agents in the wild. Identity is about proving who an agent is (and what model or code it runs). Trust is about ensuring correct and safe behavior (avoiding errors or exploits). Reputation is about historical behavior and community validation over time.
We begin by unpacking each of these three themes. Under Identity, we discuss how an autonomous agent can establish a verifiable persona: cryptographic identities and decentralized identifiers, provenance of its model and code, runtime attestation of its state, and the ever-present risk of impersonation by malicious actors. Under Trust, we explore the technical reliability of an agent’s actions: the tendency of LLM agents to hallucinate or confabulate information, vulnerabilities like prompt-injection attacks that can subvert an agent’s instructions, the possibility of backdoors or trojans implanted in its model, and issues of miscalibration (e.g., an agent expressing unjustifiably high confidence). Under Reputation, we examine mechanisms for capturing an agent’s track record and community feedback: how to resist Sybil attacks (where one entity spawns many fake agents to game the system), how to detect collusion among agents sharing illicit goals, how to prevent “whitewashing” (agents discarding a sullied identity to start fresh), and how audits or validation exercises can periodically check an agent’s claims.
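To make the identity theme concrete, the following is a minimal sketch of an agent binding a verifiable identifier to the provenance of its model. All names here (AgentIdentity, attest, verify) are our own illustrations, not part of any protocol discussed in this paper; a real deployment would use an asymmetric signature scheme such as Ed25519 and a DID registry, whereas this sketch substitutes an HMAC over a locally held key as a stand-in for a signature.

```python
import hashlib
import hmac
import json
import secrets

class AgentIdentity:
    """Illustrative identity: a key-derived identifier plus a signed
    provenance claim. HMAC stands in for a real signature scheme."""

    def __init__(self, name: str, model_hash: str):
        self._key = secrets.token_bytes(32)       # signing key (stand-in)
        self.name = name
        self.model_hash = model_hash              # provenance: hash of model/code
        # Public identifier derived from the key, analogous to a DID fragment.
        self.agent_id = hashlib.sha256(self._key).hexdigest()[:16]

    def attest(self) -> dict:
        """Produce a signed claim binding the agent ID to its model provenance."""
        claim = {"agent_id": self.agent_id, "model_hash": self.model_hash}
        payload = json.dumps(claim, sort_keys=True).encode()
        claim["sig"] = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        return claim

    def verify(self, claim: dict) -> bool:
        """Recompute the signature over the claim body; in a real asymmetric
        scheme the verifier would hold only the public key."""
        body = {k: v for k, v in claim.items() if k != "sig"}
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, claim["sig"])
```

An impersonator who alters the claimed model hash (or any other field) invalidates the signature, which is the basic property runtime attestation builds on.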
Next, we compare three emerging frameworks that aim to address aspects of these challenges: the Agent-to-Agent (A2A) communication protocol, NANDA (a decentralized agent framework whose name we expand here as Networked Agent Discovery and Attestation), and ERC-8004 (the “Trustless Agents” standard on Ethereum). We analyze how each proposal handles identity, trust, and reputation – and where they fall short – to understand the “lineage of trust” they establish from an agent’s creation to its interactions. We pay special attention to how trust propagates in multi-agent systems: when agents delegate to other agents or build on each other’s outputs, how do errors or malicious behaviors cascade? We highlight LLM-specific challenges in these settings, such as an agent’s hypersensitivity to slight prompt nudges (which can drastically alter its behavior) and latent behavior instabilities (where an agent might have hidden modes of operation that surface unpredictably). These properties mean that maintaining a chain of trust across agents is both crucial and non-trivial.
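The cascading-error point can be made quantitative with a toy model. Assuming each hop in a delegation chain fails independently with its own probability, end-to-end reliability is at most the product of the per-hop reliabilities; the function name and the numbers below are illustrative assumptions, not measurements of any real agent system.

```python
from math import prod

def chain_reliability(per_hop: list[float]) -> float:
    """End-to-end reliability of a delegation chain, assuming each hop
    succeeds independently with the given probability in [0, 1]."""
    if not all(0.0 <= r <= 1.0 for r in per_hop):
        raise ValueError("reliabilities must lie in [0, 1]")
    return prod(per_hop)

# Three individually reliable agents still compound into a weaker chain:
print(round(chain_reliability([0.95, 0.95, 0.95]), 4))  # 0.8574
```

Even this optimistic independence assumption understates the problem for LLM agents, since a single poisoned output upstream can correlate failures across every downstream hop.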
Finally, we synthesize our findings into a concrete, layered protocol proposal for building a credible “agentic web.” We argue that a combination of cryptographic identity infrastructure, robust trust verification (both crypto-verifiable and economic), and reputation tracking is needed to allow autonomous AI agents to interact safely across the open internet. Our proposed architecture is layered: at the base is secure identity (a web of cryptographic credentials and attestations), above that are adaptive trust and validation mechanisms (for real-time assurance of behavior), and on top is reputation and governance (collective memory and oversight of agents over time). We suggest how existing protocols like A2A and ERC-8004 can be integrated into this stack, and we indicate points at which optional diagrams (e.g., a handshake sequence between two agents establishing trust, or an agent “stack” illustrating these layers) could aid understanding. The goal is to chart a path toward an “agentic web” – an ecosystem where autonomous agents can roam widely yet operate under a lineage of trust, making them credible by default rather than by blind faith.
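The reputation-and-governance layer can be sketched in miniature. The class and constants below are our own illustration (not part of A2A, NANDA, or ERC-8004): validated interaction reports accrue to an identity, and a conservative Beta-style prior caps every fresh identity at a neutral score, so discarding a sullied identity resets an agent to untrusted status rather than granting it a clean high score. This mitigates but does not eliminate whitewashing; a fuller design would add registration costs or staking of the kind ERC-8004’s economic mechanisms contemplate.

```python
PRIOR_GOOD, PRIOR_TOTAL = 1, 2   # Beta-style prior: unknown agents score 0.5

class ReputationLedger:
    """Illustrative reputation layer: smoothed success rate per agent ID."""

    def __init__(self):
        self._records: dict[str, list[int]] = {}   # agent_id -> [good, total]

    def report(self, agent_id: str, success: bool) -> None:
        """Record one validated interaction outcome for an agent."""
        good, total = self._records.get(agent_id, [0, 0])
        self._records[agent_id] = [good + int(success), total + 1]

    def score(self, agent_id: str) -> float:
        """Smoothed reputation estimate; never-seen agents get the prior,
        so re-registering cannot manufacture an established track record."""
        good, total = self._records.get(agent_id, [0, 0])
        return (good + PRIOR_GOOD) / (total + PRIOR_TOTAL)
```

For example, an agent with ten validated successes scores 11/12 ≈ 0.92, well above the 0.5 ceiling available to any freshly minted (or whitewashed) identity.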