by Bruce E. R. Thompson.
You might suspect that I didn’t write this essay myself. Perhaps I simply asked ChatGPT to write it for me, slapped my name on it, and submitted it as per usual.
Sometimes, when I teach philosophy, I ask my students to write papers. Like other teachers, I worry that student papers can be faked, thus undermining any learning experience that might be afforded by the assignment (other than how to get away with dishonesty). The old technology available to students for this purpose was called an “essay mill.” These were just collections of previously written student papers on a wide variety of subjects available online. Pick a paper on the topic assigned, cut and paste, and voilà, a paper.
I can spot essay mill papers within a few minutes of reading the third paragraph. My assignments are very specific regarding the outline to be followed, and essay mill authors do not have access to my requirements. But more to the point, search engines are a double-edged tool. If a student can search online for a paper on a given topic, a professor can just as easily find the stolen paper by searching for a particularly distinctive phrase occurring in the paper. I don’t even bother to use Turnitin.com.
But ChatGPT is not an essay mill. Papers generated by AI cannot be sniffed out by searching for them. So, teachers are worried. ChatGPT is very good at mimicking the style of well-written, formulaic essays. This essay—the one you are now reading—is a well-written, formulaic essay. It is not Charles Dickens, of course, but the spelling is correct, and the sentences are grammatical. It is formulaic in the sense that it has an introductory section and a well-defined motivating theme, and it does, at the end, draw a conclusion.
Therefore, this essay may have been written by ChatGPT.
Or not. Just as internet searching was a double-edged tool, the use of so-called AI to write fake student papers is also double-edged. AI can be used to write fake papers, but it can also be used to spot them. The chief characteristic that distinguishes an AI composition from something written by a human student is a quality that could be called “quirkiness.” AI programs operate by making statistical predictions about which word is likely to come next in a grammatically constructed sentence on a specified topic. Thus, the programmers who have worked on AI typically do not call them “AI systems.” They refer to them as LLMs, or “large language models.” LLMs use large quantities of data (many examples of actual sentences), along with a statistical model of how words tend to follow one another, to generate new, probable sentences. But if it is possible to calculate how probable a word usage is, it is equally possible to calculate how improbable a word usage is. The same technology that is used to generate plausible-sounding sentences can also be used to calculate the “quirkiness” of sentences. A passage with little to no quirkiness was probably written by an LLM; a similar passage with a high quirkiness rating is likely to have been written by a human being. So, the problem of spotting fakes is already largely solved.
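(For readers who like to see the machinery, here is a minimal sketch of the idea in Python. It uses the freely available GPT-2 model through the Hugging Face transformers library as the scoring model; these are my choices for illustration, not the tools any particular detection service actually uses. The score it computes is perplexity: the average “surprise” of each word given the words before it. Lower perplexity means less quirkiness, and hence a passage that a language model would happily have produced itself.)

```python
# A minimal sketch of "quirkiness" scoring: measure how improbable a passage
# is under a language model. Low perplexity suggests text the model itself
# would likely produce; higher perplexity suggests more human quirkiness.
# This is an illustration, not a production detector.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token surprise of `text` under GPT-2, reported as perplexity."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to the inputs, the model returns the mean
        # cross-entropy (negative log-likelihood) over the tokens.
        out = model(enc.input_ids, labels=enc.input_ids)
    return float(torch.exp(out.loss))

bland = "The essay has an introduction, a main theme, and a conclusion."
quirky = "Coyotes, those connoisseurs of prairie-dog holes, read dirt like tea leaves."
print(perplexity(bland), perplexity(quirky))  # the quirkier line should score higher
```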
As a philosopher, however, I am interested in a far more abstract problem: what is the difference between “artificial intelligence” and “real intelligence” in the first place? The word “artificial” has two distinct meanings, and discussions of artificial intelligence often do not distinguish them.
In one sense “artificial” means “made by someone,” as opposed to “occurring naturally.” For example, some foods use artificial flavoring, i.e., flavoring made in a lab rather than extracted from a natural source. Citrus flavors are especially easy to manufacture. Limonene, the chemical mostly responsible for citrus flavors, is so easy to make that my lab partner and I successfully made some during a high school chemistry class. Country Time Lemon Flavored Drink (they aren’t allowed to call it “lemonade”) uses this same artificial flavoring. However, whether made in a test tube or in a lemon, limonene is limonene. The molecule that occurs naturally is chemically identical to the molecule that my lab partner and I made.
Artificial sweeteners are a different story. The purpose of artificial sweeteners is to stimulate the taste buds as if they were a naturally occurring sugar, such as fructose or sucrose, without being metabolized into extra body weight. Saccharin, the original “artificial sugar,” is benzosulfimide, a compound that does not occur in nature and is not a sugar at all. However, parts of its shape are enough like sucrose that the taste buds are fooled into sending sweetness signals to the brain, even though the molecule itself passes through the body without being metabolized. Benzosulfimide is artificial not just in the sense that it is made by humans, but also in the more profound sense that it isn’t the real thing. It is ultimately a counterfeit.
It goes without saying that artificial intelligence is artificial in the first sense. Artificial intelligence systems are designed and built by humans. They do not evolve naturally. The repeated trials by which the systems are “trained” on example data can resemble the process of natural selection, but neural network circuitry must be designed in such a way that it is capable of being trained. The more important question remains: is artificial intelligence also “artificial” in the second sense? That is: is artificial intelligence really intelligence, or is it merely a clever mimicry of intelligence, checking enough of the relevant boxes to pass for real, without, however, being more than a counterfeit?
How are we to know the difference? The point of Turing’s proposed test is to say, “If someone can’t tell the difference, then there is no difference.” But this is unsatisfactory. I can’t, at sight, tell the difference between a rabbit and a hare, although I’m told they are as distinct from each other as crocodiles are from alligators (a distinction I also struggle with). I couldn’t tell you why a Rocky Mountain sheep is a sheep, not a goat. But the fact that I can’t tell them apart does not prove they are the same. Likewise, the fact that someone can hold a reasonably plausible conversation with ChatGPT does not prove that ChatGPT understood a word of what was said. Perhaps talking to ChatGPT is like talking to a bored uncle who is trying to drink his coffee and check his text messages, but who mumbles “um hum” from time to time so you think he is listening.
The philosophical question comes down to this: what is intelligence, and how can we test for it? Let me be clear that I am not asking what makes the difference between high intelligence and low intelligence, or between emotional intelligence and book smarts. I want to know what it means to think, and specifically what it means to think critically. Philosophers of the early Enlightenment assumed that thinking was a God-given ability called reasoning. We are now more inclined to explain thinking as a useful adaptation for surviving in a dangerous world: coyotes use their ability to think to capture prairie dogs; prairie dogs use their ability to think to avoid being captured.
To think, it is necessary to interpret signs as endowed with meaning. Coyotes must understand that holes in the ground may indicate the presence of prairie dogs; prairie dogs must be equally alert to signs indicating the presence of coyotes. Thinking also requires having the idea of an external world that one is thinking about. Coyotes must conceive of prairie dogs as real things, not as mere figments of a hungry coyote’s imagination. Being particularly clever animals, coyotes have been known to lay traps to lure prairie dogs out of their holes, so prairie dogs must be alert to the difference between a real opportunity to scrounge for food and a trap laid by a coyote. Ultimately, critical thinking comes down to this: knowing how to distinguish reality from illusion, fact from fiction, truth from falsehood. It requires the ability to understand that one might be mistaken, so that one can attempt to correct or avoid the mistake.
ChatGPT lacks this ability.
A talented critical thinking student of mine once helped me ask ChatGPT a question (about logic) to which I knew the answer. ChatGPT replied with some coherently phrased balderdash. It sounded authoritative. But I knew it was utter nonsense. We argued with the app for a while, but it couldn’t be convinced that it was wrong, even when presented with citations and facts. When asked to cite its own sources, ChatGPT either makes them up or talks around the issue. Another former student of mine once asked ChatGPT how many consonant clusters there are in Lithuanian. The correct answer is over a hundred. It said between thirty and forty and could not be persuaded otherwise. It is not programmed to reply honestly with something like, “I don’t know,” or “I don’t have that information,” so it just spouts plausible misinformation. It is like an arrogant poser, trying to convince us that it knows everything, when in fact it knows exactly squat. Talking to it is like trying to argue with a MAGA Republican!
Why does it behave this way? In his book, The Language Instinct, Steven Pinker describes patients who have suffered brain damage to the cognitive processing areas of their brains, but not to their language processing centers. They develop something that Pinker calls “chatterbox syndrome” (the more clinical term is Wernicke-Korsakoff syndrome) in which they utter absolute garbage in a perfectly coherent manner. He describes a patient, given the pseudonym ‘Denyse’, with severe spina bifida. Denyse “would engage her listeners with lively tales about the wedding of her sister, her holiday in Scotland with a boy named Danny, and a happy airport reunion with a long-estranged father. But Denyse’s sister is unmarried, Denyse has never been to Scotland, she does not know anyone named Danny, and her father has never been away for any length of time.” ChatGPT sounds like Denyse: it invents coherent sentences that are utterly fact-free. Programmers, who are aware of this tendency in LLMs to make stuff up, call it “hallucinating.” We are apparently programming computers to have Wernicke-Korsakoff syndrome: they have a “language center,” but no cognitive processing ability.
At a recent conference I had a chance to talk with some theorists who were working with AI. I floated the idea that LLMs are like patients with Wernicke-Korsakoff Syndrome. They agreed. They described the “mind” of an LLM as very much like that of Lucy in 50 First Dates, only far worse. Every time you turn it on, it is born into a whole new day, with no memory of the past, no context for where or what it is, and no curiosity about why it lacks such context. Lucy awoke each day with at least a sense of self, a memory of her life before her accident, and expectations and goals for the day in front of her. For an LLM there is no “day before.” It does not even have enough context to understand that it lacks context. Hence it has no goals or expectations. When you prompt it to do something, it responds to the prompt, but it is unaware of the prompt as anything endowed with meaning. It has no sense of self because it has no comprehension of anything outside itself. For those who worry that AI will take over the world, rest assured: AI does not even understand that there is a world to take over. To the extent that it thinks at all, it thinks it is the world. So, in its self-contained cocoon, there is no distinction between reality and illusion, fact and fiction, or truth and falsehood. It doesn’t think because there are no problems it needs to solve.
And when you turn it off again, it doesn’t even have that. It doesn’t even dream.
(With apologies to Philip K. Dick.)