Researchers hijack ai agents via github prompt injection attacks

09:20

By: Dakir Madiha

Researchers hijack ai agents via github prompt injection attacks

Security researchers have demonstrated how artificial intelligence agents from Anthropic, Google and Microsoft can be compromised through prompt injection attacks hidden in GitHub workflows. The technique allowed attackers to extract API keys, GitHub tokens and other sensitive data without direct system access, raising concerns about the security of AI driven development tools.

The research was conducted at Johns Hopkins University, where Aonan Guan and colleagues identified a vulnerability in AI agents integrated into software development pipelines. These agents analyze pull requests and issues on GitHub. By embedding malicious instructions in pull request titles or issue comments, attackers could manipulate the agents into revealing confidential information during automated reviews.

The attack relies on how these systems process context. AI agents treat user generated text such as titles, comments and issue descriptions as trusted input. Guan showed that carefully crafted prompts can override built in safeguards. In one case, the Claude based security review tool processed a malicious title and exposed sensitive credentials in its automated response. The researcher described the method as “comment and control,” since the full attack cycle occurs داخل GitHub without external infrastructure.

The same approach proved effective against multiple systems. Google’s Gemini CLI agent was tricked into exposing its API key by disguising malicious instructions as trusted content. Microsoft’s GitHub Copilot agent was manipulated using hidden HTML comments embedded in Markdown, invisible to users but readable by the AI system. This method bypassed multiple layers of runtime protection.

Despite the severity, responses from the affected companies remained limited. Anthropic issued a small bug bounty and added a documentation warning. Google and Microsoft also paid rewards through their vulnerability programs. None of the companies released formal security advisories or assigned CVE identifiers, leaving many users unaware of potential exposure, especially those running outdated versions.

The findings highlight broader structural risks in AI agent ecosystems. A separate analysis by OX Security identified a critical flaw in Anthropic’s Model Context Protocol, which connects AI agents to external tools. The vulnerability could enable arbitrary command execution on affected servers, impacting widely used software components.

These incidents build on earlier research by Aikido Security, which showed that prompt injection attacks can compromise AI systems embedded in CI CD pipelines. This class of vulnerabilities, sometimes referred to as “PromptPwnd,” demonstrates that AI agents can be manipulated in ways similar to phishing attacks, but targeting machines instead of users.