Coinbase: ChatGPT doesn’t meet accuracy requirements for integration into security review process
Coinbase said ChatGPT's responses were inconsistent and it incorrectly labeled five high-risk assets as low-risk.
Coinbase said it would not integrate the popular artificial intelligence tool ChatGPT into its security review process because it does not meet its accuracy requirements.
Coinbase used ChatGPT to test the security standards of 20 unnamed ERC-20 tokens. The tests suggested the tool holds “promise for its ability to quickly assess smart contract risks.”
However, when ChatGPT's results were compared against the Coinbase security team's manual review, the tool gave eight incorrect answers, five of which were worst-case failures.
A breakdown of these errors showed that ChatGPT incorrectly labeled high-risk assets as low-risk. Coinbase noted that “underestimating a risk score is far more detrimental than overestimating.”
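The asymmetry Coinbase describes can be illustrated with a minimal sketch. The token names, labels, and scoring function below are invented for illustration; the sketch simply compares an AI's risk labels against a manual review and flags "underestimates" (the AI rating an asset as less risky than the reviewers did) as the worst-case failures.

```python
# Hypothetical sketch (token names and labels are invented, not Coinbase data):
# compare AI risk labels against manual-review labels, counting mismatches and
# flagging underestimates, since calling a high-risk asset low-risk is the
# costly error Coinbase highlights.

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}

def score_review(manual: dict, ai: dict) -> dict:
    incorrect = {t for t in manual if ai[t] != manual[t]}
    # Worst case: the AI rated the asset as LESS risky than the manual review.
    underestimates = {t for t in incorrect
                      if RISK_ORDER[ai[t]] < RISK_ORDER[manual[t]]}
    return {"incorrect": len(incorrect), "underestimates": len(underestimates)}

manual = {"tokenA": "high", "tokenB": "low", "tokenC": "high", "tokenD": "medium"}
ai     = {"tokenA": "low",  "tokenB": "low", "tokenC": "medium", "tokenD": "high"}
print(score_review(manual, ai))  # {'incorrect': 3, 'underestimates': 2}
```

In this toy run, tokenD is overrated rather than underrated, so it counts as incorrect but not worst-case, which mirrors Coinbase's point that overestimating risk is the safer direction to fail in.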
Coinbase's security team said it had first taught ChatGPT to conduct the security analysis in its format. However, the tool still mislabeled these risks because it cannot recognize “when it lacks context to perform robust security analysis.”
Besides that, ChatGPT's responses were also inconsistent when it was asked the same question repeatedly. Coinbase said the AI tool was “influenced by comments in the code and seemed to default to comments rather than function logic occasionally.”
Coinbase concluded that:
“While ChatGPT shows promise for its ability to quickly assess smart contract risks, it does not meet the accuracy requirements to be integrated into Coinbase security review processes.”
Meanwhile, this experiment is another example of a potential application of ChatGPT and its latest version, GPT-4. The AI tool has gained popularity for its human-like responses and high scores on major exams.
Crypto enthusiasts have also highlighted its ability to review Ethereum smart contracts, identifying vulnerabilities and ways to exploit the code. Coinbase director Conor Grogan noted this in a Twitter thread, where he said the tool “highlighted a number of security vulnerabilities and pointed out surface areas where the contract could be exploited.”
Several blockchain developers believe the tool could assist them in their work but don’t see it replacing humans.