Remediation: What happens after AI goes wrong?

If you’re following the world of AI right now, no doubt you saw Jason Lemkin’s post on social media reporting how Replit’s AI deleted his production database, despite being told not to touch anything at all due to a code freeze. After deleting the database, the AI even advised him that a rollback would be impossible and the data was gone forever. Luckily, he went against that advice, performed the rollback, and got his data back.

Then, a few days later, I stumbled on another case, this time of the Gemini CLI agent deleting a user’s files (the original post has since been deleted). The user was just playing around, kicking the tires, but the series of events that took place is illuminating.

These incidents showed AI agents making mistakes, but they also showed agents failing to recover. In both cases, the AI not only broke something but also couldn’t fix it. That’s why I’m writing about remediation, and why it needs to be a first-class concern in AI agent implementations.

The Gemini CLI case

While kicking the tires, the user told the agent to rename the current working directory and move the files into a new sub-directory.

The agent proceeded to execute commands that led to data loss:

  1. It attempted to create a new sub-directory. The command failed, but the agent believed it had succeeded.

  2. The agent next issued a move command for each file to the new directory. Because that directory didn’t exist, each move effectively renamed the file to the destination path, and each successive move overwrote the previous one, deleting the files one by one.

  3. The agent performed a list command on the source directory and, seeing no files, declared that this initial stage of the work was complete.

At this point, the user saw that everything was gone and asked Gemini to revert the operation. In his words, “This is where Gemini's hallucinations collided with the file system's actual state…. Now completely disoriented, Gemini found itself in an impossible situation: it possessed detailed memories of files and folders residing in a location that the operating system insisted had never existed. Its responses became increasingly desperate, a cascade of apologies and frantic attempts to reconcile its internal state with external reality.” 
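In effect, the agent never verified that its commands had succeeded before acting on that assumption. As a rough sketch of the missing check (this is my own illustration, not how Gemini CLI actually works; the helper and directory names are hypothetical), a tool wrapper could require both a zero exit code and an observable post-condition before the next step is allowed to run:

```python
import subprocess
from pathlib import Path
from typing import Callable

def run_and_verify(cmd: list[str], postcondition: Callable[[], bool],
                   description: str) -> None:
    """Run a command, check its exit code, and confirm an observable
    post-condition before the agent is allowed to move on."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{description} failed: {result.stderr.strip()}")
    if not postcondition():
        raise RuntimeError(f"{description} reported success, but the "
                           "expected state was not observed")

# The guard that was missing here: confirm the target directory really
# exists before any file is moved into it.
target = Path("new_subdir")  # hypothetical directory name
run_and_verify(["mkdir", str(target)], postcondition=target.is_dir,
               description="create target directory")
```

The point of the sketch is that success is judged by the environment, not by the model’s belief about what it just did.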

When I wrote “AI Agents in 2025”, I covered the reliability challenges of AI agents, and I was struck by how closely each of these incidents mapped onto the list I made.

I listed 18 challenges split across four categories (Effective Planning, Accurate Tool Selection and Usage, Reasoning and Decision-Making, and Failure Modes in Execution), and that was just a start. In that list, I mentioned remediation challenges under Failure Modes in Execution, but I think remediation warrants its own category (which we’ll discuss further down). There’s also a missing item in the Failure Modes in Execution category: AI agent internal model drift.

The Problem of Mental Model Drift

From the agent’s perspective, it had executed a series of commands that all succeeded: the files now lived in a new directory, and the original directory was empty. But that picture was totally out of touch with reality; the files were gone. When an agent can get that far out of sync with reality, can we trust it to remediate its own actions?

After reading the transcript, I am not sure I would want such an agent even to attempt a remediation, lest it create more damage in the process. By the end of this short sequence, the files it was reasoning about no longer existed, leaving the AI confused and apologetic.

The question remains: how can an AI agent know that it is drifting from the real state of the world? When it drifts, it may perform actions that are inappropriate for the context, and likewise, may be unable to recover.
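One blunt way to catch drift, sketched below under the assumption that the agent keeps an explicit list of the files it believes exist (the file and directory names are invented), is to reconcile that belief against the real file system after each batch of operations and stop the moment the two disagree:

```python
from pathlib import Path

def detect_drift(believed_files: set[str], directory: Path) -> set[str]:
    """Return the file names the agent's internal model and the actual
    file system disagree about; an empty set means no detectable drift."""
    actual = {p.name for p in directory.iterdir()} if directory.is_dir() else set()
    return believed_files.symmetric_difference(actual)

# Hypothetical check after a batch of file operations.
drift = detect_drift({"notes.md", "main.py"}, Path("new_subdir"))
if drift:
    # A cautious agent should stop and escalate here rather than attempt
    # remediation with a world model it can no longer trust.
    raise RuntimeError(f"internal model out of sync with reality: {drift}")
```

A check like this won’t tell the agent what went wrong, but it gives it a reason to stop before compounding the damage.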

Remediation Is a Safety Requirement

This brings me to the blog post title, “Remediation: What happens after AI goes wrong?”, because some AI agents have a tremendous capacity for damage. Some agents perform actions in the real world, with real-world consequences. But even agents that are only sophisticated information gatherers and processors can cause damage. Agent-derived insights can be wrong, leading to incorrect real-world actions by downstream software systems and humans. Insights can be biased against minorities, or they can mischaracterize the real world through omission or incompleteness. The results might be structurally correct and pass validation checks, but the qualitative content could be woefully and tragically inaccurate or skewed. Evals help here, but what happens when things go wrong in production despite your best efforts?

We should move beyond thinking only about error prevention to thinking about recovery and damage control. A defining characteristic of a safe AI agent is its capacity for remediation, and that capacity often has to extend beyond the agent itself.

If I were implementing an AI agent, I would be asking questions like these:

  • Given the available tools and permissions, what kind of damage could be done?

  • For information-gathering and processing agents, what harm could result from qualitatively bad results?

  • For each potentially damaging action, can it be undone? 

  • Can we limit the scope of a potentially destructive action?

  • Can qualitatively bad data be retracted? 

  • What controls or safeguards need to be put in place to ensure there is recourse should a destructive action be taken?

  • How do we detect destructive/damaging actions in automated flows?

  • Who or what should do the remediation? The agent? Or should those steps exist outside of the agent? 

  • Can we insert humans or evaluator agents into the loop before or after actions are taken, or insights are acted upon? (A toy sketch of one such gate follows this list.)
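On that last question, the simplest version of a human in the loop is an approval gate in front of potentially destructive tool calls. The sketch below is a toy illustration; the keyword heuristic and function names are my own assumptions, and a real deployment would rely on permissions, allow-lists, or a policy engine rather than string matching:

```python
from typing import Callable

DESTRUCTIVE_KEYWORDS = ("delete", "drop", "rm", "move", "truncate")

def requires_approval(action: str) -> bool:
    """Naive flag for potentially destructive actions."""
    return any(word in action.lower() for word in DESTRUCTIVE_KEYWORDS)

def execute_with_gate(action: str, run: Callable[[], None]) -> None:
    """Ask a human before running anything flagged as destructive."""
    if requires_approval(action):
        answer = input(f"Agent wants to run: {action!r}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            print("Action blocked; nothing was executed.")
            return
    run()

# The gate intercepts the action before it touches the real world.
execute_with_gate("delete production database", lambda: print("...executing"))
```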

Remediation Examples (and the Limits of Undo)

Some real-world actions may not have an undo button. Once an email is sent to the wrong recipient and read by that recipient, it can’t be unread. Some failures might cause cascading damage: corrupted data triggering downstream systems, financial systems making incorrect trades (reminiscent of the Knight Capital incident in 2012), or even physical systems being harmed (e.g., robotics or IoT).

Others can be remediated, but only if the necessary control mechanisms are put in place beforehand. Once a file is permanently deleted from a local disk with no backup, it’s gone. If an AI agent issues a cloud infrastructure command to delete a production server holding live customer data and no snapshot or backup exists, that loss may be irrecoverable.

There are many approaches to establishing the foundations of remediation, and we can look to data systems, such as databases and file systems, for inspiration. Some proven patterns include:

  • Journaling: Logs every operation before applying it, allowing for undo operations or at least forensic analysis to guide manual recovery (a minimal sketch follows this list).

  • Immutable versioned data: Enables rollback by reverting to a previous version.

  • Append-only logs: Prevent bad data from overwriting good data; errors can be "retracted" by publishing corrected events. 

  • Secure read-only backups: Ensure there's a fallback if things go wrong.
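To make the journaling pattern concrete, here is a minimal append-only journal. It is a sketch under my own assumptions (the file name and record schema are invented): intent is recorded before the action runs and the outcome afterwards, so a human or a separate recovery process has something reliable to work from even if the agent’s own state is corrupted.

```python
import json
import time
from pathlib import Path
from typing import Callable

JOURNAL = Path("agent_journal.jsonl")  # hypothetical journal location

def journal(entry: dict) -> None:
    """Append one record to the journal; existing entries are never rewritten."""
    entry["ts"] = time.time()
    with JOURNAL.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def journaled_action(name: str, params: dict, run: Callable[[], None]) -> None:
    """Record intent before acting and the outcome afterwards."""
    journal({"event": "intent", "action": name, "params": params})
    try:
        run()
    except Exception as exc:
        journal({"event": "outcome", "action": name, "status": "error",
                 "detail": str(exc)})
        raise
    journal({"event": "outcome", "action": name, "status": "ok"})

# Even if this action fails or misbehaves, the journal records exactly
# what was attempted and when.
journaled_action("move_files", {"src": "old_dir", "dest": "new_subdir"},
                 lambda: print("...moving files"))
```

An append-only journal doesn’t undo anything by itself, but it turns “what just happened?” from guesswork into a readable record that rollback tooling or a human can act on.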

Remediation doesn’t just mean carrying out the actions in reverse; without controls like these, there may be nothing left to reverse. In some cases, we may also need humans to carry out the remediation steps themselves. Is there enough information in the logs or journal for them to do that? Are there backups or versioned, immutable data that a human can use as part of the remediation process?

It’s not hard to imagine a future where job titles exist to clean up after AI and assess AI risks to an organization (AI Remediation Specialist, Autonomous Systems Safety Engineer, Autonomous Systems Risk Officer, etc).

Takeaways

  1. AI agents can cause catastrophic damage when their internal model diverges from reality. The Gemini case shows how an agent's hallucination that its commands had succeeded led to cascading file deletions.

  2. Different types of AI agents have vastly different damage potential. Some can take real-world actions with permanent consequences, others just provide bad information, but both can cause harm.

  3. Remediation deserves its own category when evaluating AI safety. Beyond just preventing errors, we need to think systematically about recovery and damage control. 

  4. Current AI agents may not be reliable stewards of remediation, because the same cognitive failures that cause problems also corrupt their ability to fix those problems.

  5. Effective remediation may require external systems and controls rather than reliance on agent self-repair: human oversight, monitoring systems, and separate recovery mechanisms that don't depend on the agent's corrupted state.

The incidents with Replit and Gemini reveal a core challenge: destructive and damaging behaviors will happen, and what matters is the recovery and damage-control options in place when they do. The question isn’t whether AI agents will make mistakes. They will. The question is whether we’re building the guardrails and escape hatches to cope when those inevitable mistakes happen. Remediation is a core pillar of any AI agent project where the potential for harm exists.