Catching Phishing Sites with AI Teamwork

17 Dec, 2025

TL;DR: Single AI models often make mistakes when trying to spot phishing websites. A better approach is using a "multi-agent debate" system. By creating a team of specialized AI agents that debate the evidence together, we can drastically reduce errors and catch complex phishing attacks early. I built a tool called Cross-Check using Google’s Agent Development Kit to bring this concept to life.

As a security researcher, I spend a lot of time looking at how cyber attacks are evolving. Phishing is getting much smarter. Attackers are using AI to create flawless fake websites, so naturally, we want to use AI to defend against them. But there is a problem.

If you just ask a single Large Language Model, “Is this website malicious?”, it often guesses or hallucinates. It might look at the text, see it looks normal, and miss a hidden bad script in the background code. One AI model simply cannot look at everything perfectly at once.

We need to start thinking about AI like a real Security Operations Center. In a SOC, you don’t just rely on one person. You have a team of experts. We can build our defense tools the exact same way using a Multi-Agent Debate framework.

The Expert Panel

Instead of asking one general AI, we can split the job into four specialized AI roles:

First, there is the URL Analyst. This agent only looks at the web address, hunting for typosquatting and weird domain extensions. Next is the HTML Structure Analyst. This agent ignores the visible page and digs straight into the code, looking for hidden forms or obfuscated scripts. Then, we have the Content Semantic Analyst, who reads the visible text to spot manipulative language or fake urgency. Finally, the Brand Impersonation Analyst compares the website’s look and feel to the actual brand it claims to be.

The Debate Mechanism

Having experts is great, but they need to talk to each other. This is where the debate happens. All four agents look at the website data at the same time and make an initial guess.

Then, a Moderator AI reviews their answers. If they all agree, great. But if there is a disagreement—for example, the URL looks safe but the HTML agent finds something suspicious—the Moderator triggers a debate round. The agents review each other’s findings and argue their points. They keep debating until they reach a consensus. Once the debate is over, a Judge AI reviews the entire conversation history and makes the final decision.

From Theory to Reality

This collaborative approach is incredibly powerful. It forces the AI to explain its reasoning, which reduces hallucinations and gives security analysts a clear, transparent report.

I love turning these concepts into working solutions. That is why I built a project called Cross-Check. It is an advanced phishing detection framework that uses Google’s Agent Development Kit to orchestrate this exact debate loop. By building this pipeline, we move away from single-point failures and start using AI reasoning to its full potential.

The future of cybersecurity defense is not just bigger models; it is smarter teamwork between models.

Reference: This approach builds upon the research detailed in “PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection” (Li et al., arXiv:2506.15656).

#Ai #Security