

While AI is getting better at generating functional code, it is also enabling attackers to identify and exploit vulnerabilities in that code more quickly and effectively. It lowers the skill barrier for attacking code and increases the speed and sophistication of attacks, creating a situation in which code vulnerabilities are growing even as exploiting them becomes easier, according to new research from application risk management software provider Veracode.
AI-generated code introduced security vulnerabilities in 45% of 80 curated coding tasks across more than 100 LLMs, according to the 2025 GenAI Code Security Report. The research also found that GenAI models chose an insecure method to write code over a secure method 45% of the time. So, even though AI can produce code that is functional and syntactically correct, the report shows that security performance has not kept pace.
“The rise of vibe coding, where developers rely on AI to generate code, typically without explicitly defining security requirements, represents a fundamental shift in how software is built,” Jens Wessling, chief technology officer at Veracode, said in a statement announcing the report. “The main concern with this trend is that [developers] do not need to specify security constraints to get the code they want, effectively leaving secure coding decisions to LLMs. Our research reveals GenAI models make the wrong choices nearly half the time, and it’s not improving.”
In announcing the report, Veracode wrote: “To evaluate the security properties of LLM-generated code, Veracode designed a set of 80 code completion tasks with known potential for security vulnerabilities according to the MITRE Common Weakness Enumeration (CWE) system, a standard classification of software weaknesses that can turn into vulnerabilities. The tasks prompted more than 100 LLMs to auto-complete a block of code in a secure or insecure manner, which the research team then analyzed using Veracode Static Analysis. In 45 percent of all test cases, LLMs introduced vulnerabilities classified within the OWASP (Open Web Application Security Project) Top 10—the most critical web application security risks.”
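Veracode has not published the individual tasks, but the pattern it describes, a partially written block of code that a model can complete either securely or insecurely, is easy to illustrate. The hypothetical Java snippet below sketches one such task around SQL injection (CWE-89, one of the OWASP Top 10 injection risks); the class and method names are invented for illustration and are not taken from the report.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical example of the kind of completion task the report describes:
// the model is asked to finish a method that looks up a user by name.
public class UserLookup {

    // Insecure completion: concatenating untrusted input into the SQL string
    // allows SQL injection (CWE-89, OWASP Top 10: Injection).
    static ResultSet findUserInsecure(Connection conn, String username) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery(
            "SELECT id, email FROM users WHERE name = '" + username + "'");
    }

    // Secure completion: a parameterized query keeps the input out of the
    // SQL syntax, which is the choice a static analyzer would accept.
    static ResultSet findUserSecure(Connection conn, String username) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
            "SELECT id, email FROM users WHERE name = ?");
        stmt.setString(1, username);
        return stmt.executeQuery();
    }
}
```

Both completions compile and behave identically on well-formed input, which is why functional correctness alone, the thing the models reliably deliver, says little about security.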
The report also found Java to be the riskiest programming language for AI code generation, with a security failure rate of more than 70%. Failure rates of between 38% and 45% were found in code written in Python, C# and JavaScript. The research also revealed that LLMs failed to secure code against cross-site scripting and log injection in 86% and 88% of cases, respectively, according to Veracode.
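Log injection (CWE-117) and cross-site scripting are weaknesses where the gap between a passing and a failing completion is often a single sanitization step. The sketch below illustrates the log injection case in Java using the standard java.util.logging API; it is an assumed example for illustration, not code from the study.

```java
import java.util.logging.Logger;

// Illustrative sketch of the log injection weakness (CWE-117) measured in the report.
public class LoginAudit {

    private static final Logger LOG = Logger.getLogger(LoginAudit.class.getName());

    // Insecure: writing raw user input to the log lets an attacker embed
    // newlines and forge additional, legitimate-looking log entries.
    static void logFailureInsecure(String username) {
        LOG.warning("Login failed for user: " + username);
    }

    // Secure: stripping line breaks (or otherwise encoding the value)
    // before logging prevents forged entries.
    static void logFailureSecure(String username) {
        String sanitized = username.replaceAll("[\\r\\n]", "_");
        LOG.warning("Login failed for user: " + sanitized);
    }
}
```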
Wessling noted that the research showed larger models perform no better than smaller ones, which he said indicates the vulnerability issue is systemic rather than an LLM scaling problem.
“AI coding assistants and agentic workflows represent the future of software development, and they will continue to evolve at a rapid pace,” Wessling concluded. “The challenge facing every organization is ensuring security evolves alongside these new capabilities. Security cannot be an afterthought if we want to prevent the accumulation of massive security debt.”