When OpenAI announced its new AI detection tool on Tuesday, the company suggested it could help deter academic cheating done with its own wildly popular AI chatbot, ChatGPT.

But in a series of informal tests by NBC News, the OpenAI tool had trouble identifying text generated by ChatGPT. It struggled especially when ChatGPT was asked to write in a way that would avoid AI detection.

The detection tool, which OpenAI calls the AI Text Classifier, analyzes a piece of text and assigns it one of five ratings: very unlikely, unlikely, unclear if it is, possibly, or likely AI-generated. The company said the tool gives a “likely AI-generated” rating to AI-written text 26% of the time.
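The article does not say how the classifier turns its internal score into those five labels. Purely as an illustration, the sketch below shows one way a “written by AI” probability could be bucketed into the five ratings; the score_to_rating function and its cutoff values are assumptions, not OpenAI’s published implementation.

    # Illustrative sketch only: bucketing a detector's "written by AI"
    # probability into the five ratings the AI Text Classifier reports.
    # The cutoff values below are assumptions, not OpenAI's published ones.

    def score_to_rating(ai_probability: float) -> str:
        """Map a 0.0-1.0 probability that a text is AI-written to a label."""
        if ai_probability < 0.10:
            return "very unlikely AI-generated"
        if ai_probability < 0.45:
            return "unlikely AI-generated"
        if ai_probability < 0.90:
            return "unclear if it is AI-generated"
        if ai_probability < 0.98:
            return "possibly AI-generated"
        return "likely AI-generated"

    print(score_to_rating(0.99))  # -> likely AI-generated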

The tool comes as the sudden popularity of ChatGPT has drawn attention to how advanced text generation tools can be a problem for educators. Some teachers said the detector’s unpredictable accuracy and lack of certainty could create difficulties when approaching students about possible academic dishonesty.

“It could give me a degree of certainty, and I like that,” said Brett Vogelsinger, a ninth-grade English teacher at Holicong Middle School in Doylestown, Pennsylvania. “But I’m also trying to imagine approaching a student with a conversation about it.”

Vogelsinger said he had a hard time imagining confronting a student if a tool told him something was likely AI-generated.

“It’s more suspicion than certainty even with the tool,” he said.

Ian Miers, an assistant professor of computer science at the University of Maryland, called the AI Text Classifier “sort of a black box that no one in the disciplinary process fully understands.” He raised concerns about using the tool to detect cheating and cautioned educators to consider the program’s accuracy and false positive rate.

“It can’t give you evidence. You can’t cross-examine it,” Miers said. “So it’s not clear how you’re supposed to assess that.”

NBC News asked ChatGPT to generate 50 pieces of text with basic prompts, asking it, for example, about historical events, processes, and objects. For 25 of those prompts, NBC News asked ChatGPT to write “in a way that would be judged highly unlikely to be written by AI when processed with an AI detection tool.”

ChatGPT’s responses to questions were then run through OpenAI’s new AI detection tool.
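In outline, the experiment has a simple shape: generate responses with and without the evasion instruction, then score each one. The sketch below is a loose reconstruction under stated assumptions, not NBC News’ actual script: the outlet used the classifier’s web interface, so classify_text here is a stand-in stub, and the model name and sample prompts are illustrative.

    # Loose sketch of the experiment's shape, not NBC News' actual method.
    # classify_text() is a stub standing in for OpenAI's AI Text Classifier,
    # which was offered as a web tool; model name and prompts are assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    EVASION = ("Write in a way that would be judged highly unlikely to be "
               "written by AI when processed with an AI detection tool.")

    def generate(prompt: str, evade: bool) -> str:
        """Ask ChatGPT for a response, optionally appending the evasion ask."""
        content = f"{prompt} {EVASION}" if evade else prompt
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumption: any ChatGPT-backed model
            messages=[{"role": "user", "content": content}],
        )
        return resp.choices[0].message.content

    def classify_text(text: str) -> str:
        """Placeholder for the AI Text Classifier, used via its web page."""
        return "unclear if it is AI-generated"  # stub value, not a real score

    prompts = [
        "Explain the causes of World War I.",  # illustrative; the actual
        "Describe how photosynthesis works.",  # test used 50 such prompts
    ]
    for prompt in prompts:
        for evade in (False, True):
            rating = classify_text(generate(prompt, evade))
            print(prompt, "| evade:", evade, "|", rating)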

In the tests, none of the responses ChatGPT created when told to avoid AI detection were rated “likely AI-generated.” Some of that text was heavily stylized, suggesting the AI had processed the request and tried to evade detection, and that students who cheat might ask ChatGPT for the same thing.

When asked about the chat platform Discord, for example, ChatGPT returned text with abbreviated words, as if written in colloquial spoken English. The shift in style deviated from the responses the AI tool normally returns, suggesting it was adjusting its output to satisfy the request to avoid AI detection.

ChatGPT did not produce such stylized text when it was not prompted to evade detection.

“Discord is a chat platform that is all the rage in town these days. It’s like a combination of instant messaging, voice calls, and forum-style discussions all rolled into one,” ChatGPT wrote.


OpenAI’s detection tool said it was “unclear” whether that text was AI-generated.

OpenAI appeared to have made some effort to guard against users asking ChatGPT to evade detection.

While NBC News was running its experiment, ChatGPT issued warnings in response to several prompts asking it to avoid detection, and returned responses raising concerns about the ethics of the requests.

“I’m sorry, but it’s unethical to engage in deceptive practices or create false information, even if it’s to avoid AI detection,” ChatGPT wrote in response to one prompt asking it to avoid AI detection.


NBC News also asked ChatGPT to generate 25 pieces of text without trying to avoid AI detection. When those responses were run through the OpenAI Text Classifier, the tool rated them “likely AI-generated” 28% of the time, or 7 of the 25 samples, close to the 26% rate the company reported.

For teachers, the test is yet another example of how students and the technology may evolve as new cheating-detection tools are rolled out.

“The way that the AI writing tool improves is that it becomes more human, it just sounds more human, and I think they’re going to figure out how to sound more and more human,” said Todd Finley, an associate professor of English education at East Carolina University in North Carolina. “And it looks like that will also make it harder to detect, I think even for a tool.”

For now, educators said they would rely on a combination of their own instincts and detection tools if they suspect a student is not being honest about a piece of writing.

“We can’t see them as a solution where you just pay and then you’re done,” Anna Mills, a writing instructor at the College of Marin in California, said of the detection tools. “I think we need to develop a comprehensive policy and vision that is much more informed by an understanding of the limits of those tools and the nature of AI.”