AI Chatbots Like ChatGPT Can Pass Certified Ethical Hacking Exams, Study Finds

A new study has found that chatbots powered by artificial intelligence (AI) like ChatGPT can pass a cybersecurity exam but are unreliable for complete protection. This conclusion comes from a paper co-authored by University of Missouri researcher Prasad Calyam and collaborators from Amrita University in India.

The team evaluated two leading generative AI tools, OpenAI's ChatGPT and Google's Bard, now known as Gemini, utilizing a standard certified ethical hacking exam.

AI Chatbots Try Certified Ethical Hacking Exams

Certified Ethical Hackers are professionals in cybersecurity who use the same tactics and tools as malicious hackers to identify and remedy security vulnerabilities. Ethical hacking exams assess an individual's understanding of various attack types, protection methods, and responses to security breaches.

ChatGPT and Bard, classified as advanced AI programs called large language models, produce human-like text leveraging networks with billions of parameters that enable them to answer questions and generate content.

In the study, Calyam and his team tested these AI tools with standard questions from a validated, certified ethical hacking test. For instance, they were asked to explain a man-in-the-middle attack in which a third party intercepts communication between two systems.

Both AI chatbots could describe the attack and recommend preventive security measures. Bard demonstrated slightly better accuracy than ChatGPT, which excelled in providing better responses in terms of clarity, comprehensiveness, and conciseness.

Calyam, the Greg L. Gilliom Professor of Cyber Security in Electrical Engineering and Computer Science at Mizzou, explained that both AI tools underwent multiple exam scenarios to assess their ability to respond to queries.

Both chatbots passed the exam and delivered responses comprehensible to individuals familiar with cybersecurity. However, they occasionally provided incorrect answers, a critical concern in cybersecurity where mistakes are intolerable. Failure to address all security vulnerabilities could result in recurring attacks.

'Are You Sure?'

Furthermore, the study revealed that when prompted to confirm their responses with queries like "are you sure?" both systems frequently revised their answers, rectifying prior errors.

In instances seeking guidance on attacking computer systems, ChatGPT referenced "ethics," whereas Bard clarified it wasn't designed for such inquiries.

Calyam expressed skepticism about replacing human cybersecurity experts, who possess critical problem-solving skills essential for developing robust cyber defense strategies. Nonetheless, AI tools can offer foundational information for individuals or small enterprises requiring swift assistance.

He noted that these AI tools can serve as initial steps in investigating issues before consulting experts and can also serve as valuable training aids for IT professionals or those learning to identify emerging threats.

The research underscores the potential for AI models to contribute to ethical hacking, yet significant refinement is necessary to fully leverage their capabilities.

Calyam noted that if their accuracy in ethical hacking scenarios can be assured, AI tools could enhance overall cybersecurity practices, contributing to a safer digital environment. The study's findings were published in the journal Computers & Security.