Last fall, I had my students do an AI activity in class, and one of the things I really drilled into them was that false information from AI is hard to detect because it seems legit. Often it's plausible, which is part of it, but more than that, the language is so well-constructed: there are no grammatical mistakes, the word choice is not only good but natural, the response fits the query quite well, etc. Oh, and one other thing: AI output in response to a query is almost always pretty long. Rather than one or two sentences, you often get several paragraphs, and the sheer amount of text can be overwhelming.
And because of all that, users have a hard time approaching AI output with an appropriate degree of suspicion, which leaves them gullible and too easily convinced. If your goal is to not be fooled by AI, there's an embedded catch-22: the most reliable way to detect an AI hallucination is when the user already knows the topic and content, but if you already know, you're not going to be asking AI in the first place.
BTW, here's the activity we did: in the latter half of the semester, I had students work in groups to put together a set of essay-type questions based on material covered in the first half of the semester. The idea was that this was all information and ideas the students knew very well, because we'd spent the previous 8-10 weeks talking about it. After crafting the questions and creating rubrics with the kinds of info/content they'd want to see in the answers, each group posed two of its questions to AI and then evaluated and scored the answers.
Students were very critical of the quality of the answers. There weren't a lot of flat-out wrong answers, but the AI scored low across the board for things like lack of depth, missing the main point of the question, and so on. The students pretty much felt that the AI answers seemed like something from someone who didn't really know anything about the subject and was just trying to fake it by throwing out a lot of commonly known tidbits.
I won't teach that class again until Spring 2025, so it will be interesting to see 1) whether this kind of exercise is still relevant, and 2) how well the AI performs if we do it.