Smooth talker like an artificial intelligence
Artificial intelligences learn to speak thanks to “language models”. The simplest models power the autocomplete on your smartphone: they suggest the next word. But the prowess and progress of more recent language models such as GPT-3, LaMDA, PaLM or ChatGPT are breathtaking: these programs can write in the style of a given poet, simulate deceased people, explain jokes, translate languages, and even produce and proofread computer code, all of which was unthinkable just a few months ago. To do this, they rely on increasingly complex neural network models.
When artificial intelligences talk nonsense
That said, these models are more superficial than such examples would suggest. We compared stories generated by language models with stories written by humans, and found them less coherent, though engaging, and less surprising than the human-written ones.
More importantly, we can show that current language models struggle even with simple reasoning tasks. For example, when we ask:
“The lawyer visited the doctor; did the doctor visit the lawyer?”
…simple language models tend to say yes. GPT-3 even replies that the lawyer did not visit the doctor. One possible reason we are exploring is that these language models encode word positions symmetrically, and therefore do not distinguish “before the verb” from “after the verb”, which makes it hard to tell the subject of a sentence from its object.
Furthermore, theoretical limitations of “transformer”-based language models mean that they cannot distinguish sequences containing an even number of occurrences of a given element from those containing an odd number, when these occurrences are interspersed with another element. In practice, this means that the models cannot solve what we call the “pizza task”, a simple puzzle of the form:
“The light is off. I press the light switch. I eat a pizza. I press the light switch. Is the light on?”
Here, an even number of presses on the switch means that the light is off, but a BERT model fails to learn this. The most powerful current models (GPT-3 and ChatGPT) flatly refuse to conclude that the light is off.
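For a conventional program, the underlying logic is trivial: each press of the switch toggles the light, and everything else (eating a pizza included) leaves it unchanged. A minimal sketch in Python, on a made-up list of events, could look like this:

```python
def light_is_on(events, initially_on=False):
    """Return the final state of the light: each press of the switch
    toggles it, any other event leaves it unchanged."""
    on = initially_on
    for event in events:
        if event == "press switch":
            on = not on
    return on

# The "pizza task" from the text: off, press, pizza, press
events = ["press switch", "eat pizza", "press switch"]
print(light_is_on(events))  # False: the light is off
```

In the end, the answer depends only on whether the number of presses is even or odd, and it is precisely this kind of parity that transformer-based models struggle to track.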
Today’s language models also struggle with negation, and typically perform poorly on reasoning tasks as soon as these become more complex. For example, consider the following riddle from a Chinese civil service exam:
“David gets to know Mr. Zhang’s friend Jack, and Jack gets to know David’s friend Ms. Lin. Everyone who knows Jack has a master’s degree, and everyone who knows Ms. Lin is from Shanghai. Who is from Shanghai and has a master’s degree?”
Current models answer correctly in only 45% of cases, and ChatGPT refuses to answer, while the best human performance is 96%.
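The deduction itself can be spelled out mechanically. Below is a small illustrative sketch in Python (it simply transcribes the riddle, assuming that “X’s friend Y” means that X and Y know each other):

```python
# Who knows whom, read directly from the riddle
knows = {
    ("David", "Jack"),      # David gets to know Jack
    ("Mr. Zhang", "Jack"),  # Jack is Mr. Zhang's friend
    ("Jack", "Ms. Lin"),    # Jack gets to know Ms. Lin
    ("David", "Ms. Lin"),   # Ms. Lin is David's friend
}

# Rule 1: everyone who knows Jack has a master's degree
has_masters = {x for (x, y) in knows if y == "Jack"}
# Rule 2: everyone who knows Ms. Lin is from Shanghai
from_shanghai = {x for (x, y) in knows if y == "Ms. Lin"}

# Who satisfies both?
print(has_masters & from_shanghai)  # {'David'}
```

Two set intersections suffice to derive the answer (David) from the stated facts.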
The model doesn’t know that it doesn’t know
Most likely, the main drawback of current language models is that they are built to make analogies, not to memorize exactly a very large set of items. For example, they cannot remember a list of one million customers or a set of ten thousand proteins. The model will, at its own discretion, forget some items. An even bigger problem is that it can also, again at its own discretion, invent new ones (a process aptly called “hallucination”). This is a fundamental consequence of the machine learning techniques behind language models, which are optimized for interpolation and generalization, not for memorization.
The problem is that it is impossible to tell from the output of a language model whether an answer was memorized or invented. For example, when GPT-3 is asked where Alan Turing died, the model deadpans “Warrington” instead of the correct answer (Wilmslow). ChatGPT answers this question correctly, but when asked about lesser-known researchers, such as the authors of this article, it makes up facts. And when asked more complicated questions, it gives elegant but false answers. To the question:
“Is the sum of two odd numbers odd?”
… the model replies politely:
“Yes, when you add two odd numbers, the result is always an odd number. This is because an odd number can be expressed as 2n+1, where n is an integer. When adding two numbers in this form, the result is (2n+1)+(2m+1) = 2(n+m)+2, also in the form 2p+1, where p is an integer. This means that the result is an odd number.”
In its form, the answer is very convincing, but it is false.
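For the record, the correct reasoning goes the other way: writing the two odd numbers as 2n+1 and 2m+1, their sum is (2n+1)+(2m+1) = 2n+2m+2 = 2(n+m+1), a multiple of 2. The sum of two odd numbers is therefore always even, not odd.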
With these examples, we want to show that it is currently imprudent to rely on a language model for reasoning or decision-making. Models improve over time: they know more and more, and are increasingly able to refrain from answering when they lack the information. But beyond simple questions, a language model can easily concoct an answer, together with an equally contrived and approximate explanation or demonstration.
Other methods excel at reasoning about exact facts
None of this is to say that language models are not amazing tools with mind-blowing capabilities. Nor is it to say that language models will never overcome these challenges, or that other deep learning methods will not be developed for this purpose. Rather, it is to say that at the time of writing, in 2022, language models are not the tool of choice for reasoning or for memorizing exact data.
For these functions, the tool of choice currently remains “symbolic representations”: databases, knowledge bases and logic. These representations store data not implicitly but as sets of entities (such as people, commercial products or proteins) and relationships between those entities (who bought what, what contains what, and so on). Logical rules or constraints are then used to reason about these relationships in a way that is provably correct, albeit usually ignoring probabilistic information. This kind of reasoning was used, for example, by the Watson computer in 2011, during the game Jeopardy!, to answer the following question:
“Who is the Spanish king whose portrait, painted by Titian, was stolen in an armed robbery from an Argentine museum in 1987?”
Indeed, the question can be translated into logical rules over a knowledge base, and only King Philip II matches. Language models currently cannot answer this question, probably because they cannot store and manipulate enough knowledge (links between known entities).
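To give an idea of what such symbolic reasoning looks like, here is a toy sketch in Python (the facts are invented for illustration and, of course, vastly simpler than the knowledge base Watson relied on):

```python
# A toy knowledge base: entities and relations stored explicitly
facts = {
    ("Portrait of Philip II", "depicts", "Philip II"),
    ("Portrait of Philip II", "painted_by", "Titian"),
    ("Portrait of Philip II", "stolen_from", "an Argentine museum"),
    ("Portrait of Philip II", "stolen_in", "1987"),
    ("Philip II", "is_a", "Spanish king"),
}

def objects(subject, relation):
    """All values o such that (subject, relation, o) is in the knowledge base."""
    return {o for (s, r, o) in facts if s == subject and r == relation}

# The question becomes a conjunction of constraints over the knowledge base
answers = {
    king
    for (painting, rel, king) in facts if rel == "depicts"
    if "Titian" in objects(painting, "painted_by")
    and "an Argentine museum" in objects(painting, "stolen_from")
    and "1987" in objects(painting, "stolen_in")
    and "Spanish king" in objects(king, "is_a")
}
print(answers)  # {'Philip II'}
```

Unlike a language model, such a query either finds the entities that satisfy every constraint or returns nothing; it never invents an answer.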
It is probably no coincidence that the same large companies that build some of the most powerful language models (Google, Facebook, IBM) also build some of the largest knowledge bases. These symbolic representations are today often constructed by extracting information from natural language text: an algorithm tries to build a knowledge base by analysing newspaper articles or an encyclopaedia. The methods used for this extraction are, precisely, language models. In this case, language models are not the end goal, but a means to build knowledge bases. They are well suited to this because they are very robust to noise, both in their training data and in their inputs, and are therefore very good at handling the ambiguous or noisy input that is ubiquitous in human language.
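As a very rough illustration of the idea (real extraction systems are far more sophisticated and, as noted above, rely on language models rather than a hand-written pattern), a toy extractor could look like this:

```python
import re

# A toy information extractor: turn simple sentences of the form
# "X painted Y." or "X visited Y." into knowledge-base triples (X, relation, Y).
pattern = re.compile(r"^(?P<subj>.+?) (?P<rel>painted|visited) (?P<obj>.+?)\.$")

def extract_triples(text):
    triples = []
    for sentence in text.split("\n"):
        match = pattern.match(sentence.strip())
        if match:
            triples.append((match["subj"], match["rel"], match["obj"]))
    return triples

text = "Titian painted the portrait of Philip II.\nThe lawyer visited the doctor."
print(extract_triples(text))
# [('Titian', 'painted', 'the portrait of Philip II'),
#  ('The lawyer', 'visited', 'the doctor')]
```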
Language models and symbolic representations are complementary: language models excel at analysing and generating natural language text, while symbolic methods are the tool of choice when it comes to memorizing exact items and reasoning about them. An analogy with the human brain can be instructive: some tasks are easy enough for the brain to perform unconsciously and intuitively, in a matter of milliseconds (reading simple words, or computing the sum “2 + 2”), whereas abstract operations require painstaking, conscious, logical thought (memorizing telephone numbers, solving equations, or comparing the value for money of two washing machines).
Daniel Kahneman dichotomized this spectrum into “System 1” for subconscious reasoning and “System 2” for effortful reasoning. With current technology, language models appear to solve “System 1” problems. Symbolic representations, on the other hand, are well suited to “System 2” problems. At least for the time being, then, it seems that both approaches have their raison d’être. Moreover, a whole spectrum between the two remains to be explored: researchers are already investigating the coupling of language models with databases, and some see the future in merging neural and symbolic models into “neurosymbolic” approaches.
The original version of this article was published on The Conversation, a news site dedicated to sharing ideas between academic experts and the general public.
Gaël Varoquaux has received funding from the National Research Agency (LearnI chair), BPI France and the European Union.