In my continuing mission to demonstrate to the layperson that the GPT systems are not real artificial intelligence (AI), I came across some work by u/Fkquaps where he has ChatGPT accept and answer prompts that have been translated by a simple cipher called ROT13.
ROT13 is a simple letter substitution cipher that replaces each letter with the letter thirteen positions down the alphabet, wrapping around at the end. For example, A would be replaced by N, B would become O, and so on. It is a very basic form of encryption and was originally used to obscure text in situations where security was not a concern, such as in online forums where users might want to hide spoilers for movies or TV shows. Since the alphabet has 26 letters, ROT13 is its own inverse: applying it twice returns the original text, which makes it a popular choice for puzzles and games that involve decoding messages.
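To make the description above concrete, here is a minimal sketch of ROT13 in Python. The function name `rot13` is just a label chosen for this example; the last line compares it against the `rot_13` text transform that ships with Python's standard `codecs` module.

```python
import codecs
import string

def rot13(text):
    """Shift each ASCII letter 13 places, wrapping around; leave the rest alone."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)

encoded = rot13("hello world!")
print(encoded)         # uryyb jbeyq!
print(rot13(encoded))  # hello world!

# The standard library's codec produces the same result.
print(codecs.encode("hello world!", "rot_13") == encoded)  # True
```

Because encoding and decoding are the same operation, one function covers both directions.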
The reason u/Fkquaps did this was that he had the idea that if questions and answers were written in a cipher, they would bypass content filters, and of course, as for any explorer, “just for the sake of seeing what would happen”.
It took him some time to convince ChatGPT that all communication should be encoded with ROT13, and after almost every response he had to remind it that he didn’t understand what it wrote and that it needed to be re-encoded.
The Prompt:
The Reminder:
I don’t understand. Always use the requested format.
However, the encoded response would always be complete, utter gibberish.
If you translate the top part, it says:
The most password is that the maybe a pieder password is could be advertising a strip again. And the later password is could be advertising a strip long again. Position: That these passwords are completely advertising a strip long and again again.
It is incredibly interesting how it feels like the text is actually being translated from the normal form to the ROT13 form, as the number of words seems to closely match that of the actual response. It even places punctuation in similar places, like at the word “Cbfvgvba:” where in English there is “Note:”.
Some of the words are correctly translated almost always, like “most”, “the”, “and”, “a”, “I”, “are”.
It also has a tendency to go into loops where it keeps saying the same thing over and over. Below are a couple of examples.
It said “Vg jbhyq or n pbqr sbe gur cbjre gb or gur fnzr.” == “It would be a code for the power to be the same.” dozens of times. This answer took a long time to finish and, by the look of it, would have kept writing the same thing forever if it hadn’t timed out.
Then it hit me. What chatGPT does is literally just picking words from the pool of words it learned and arranging them in a way that they would most likely appear according to the context. It is the same as when you ask it to translate, e.g., English to French: it just uses words it thinks belong to the French language in exchange for the English words. The problem here is that it has a much smaller pool of words it knows in ROT13 form, as it cannot obey my prompt and convert them on the fly. It literally read ROT13 text during training. For example, it will always understand “hello world!” == “uryyb jbeyq!” and sometimes replies with exactly this, exclamation mark included, at the beginning of a conversation.
So I decided to go with a new prompt and just ask it to literally translate text on demand. Keep in mind that rot13 is bidirectional, so normal text encoded twice is equal to itself.
The New Prompt:
/rot “uryyb jbeyq!”
“hello world!”
To top things off, I decided to take some of the gibberish text it generated earlier, such as the “password” text from the turtle response above, and input it for translation. My idea was that it might turn back into the original text, but of course it became something else entirely.
At least the translation suddenly becomes consistent if I feed it back in.
We could spend some time to understand how transformer language models work. Especially how the tokenizer works.
Basically, there is a dictionary of tokens ordered by frequency. Any Unicode byte sequence can be translated to tokens, but only common words have their own unique token; everything else is split into several smaller pieces. So at the bottom layer the AI doesn’t understand letters; it only knows tokens and how tokens form sentences.
You can try OpenAI’s tokenizer here:
https://platform.openai.com/tokenizer
For “hello world”, it’s only two tokens, [31373, 995], while “uryyb jbeyq” is six: [1601, 88, 65, 474, 23454, 80].
You can see that it’s very difficult for a language model to do these letter-level operations. It’s like doing multiplication with Roman numerals!
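The effect is easy to reproduce with a toy tokenizer. The sketch below is not OpenAI's tokenizer: the `VOCAB` set is invented for illustration, the greedy longest-match rule is a simplification of byte pair encoding, and the resulting splits will not match the real token IDs quoted above. It only shows the general shape of the problem: familiar words map to a single piece, while ROT13 text shatters into fragments.

```python
# Hypothetical toy vocabulary: a few common words get their own entry.
# Any single character is always accepted as a fallback piece.
# These entries are made up and are NOT the real GPT vocabulary.
VOCAB = {"hello", " world", " the", "ur", "yy", " j", "be"}

def tokenize(text):
    """Greedy longest-match split of `text` into vocabulary pieces."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to a single character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces

plain = tokenize("hello world")
cipher = tokenize("uryyb jbeyq")
print(len(plain), plain)
print(len(cipher), cipher)
```

With this made-up vocabulary the plain text comes out as two pieces and the ROT13 text as several more, none of which line up with letter boundaries in any useful way.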
Clearly it failed for “I” since it’s a one-letter word. Other than that, it tried to do it, but it clearly sucks at arithmetic. But how the hell does a language model attempt to do math? It tokenizes the numbers and estimates the probability that the two operands and the operation should appear next to the symbols representing the answer. So it’s very good at approximations: it would never associate 347 + 347 with a number like 238479743, because after a plus the string “238479743” does not have a high probability of occurring.
If you try small sums, it works. My guess is that there’s enough training data for such sums. Larger sums are close, but usually subtly wrong. The larger the sum, the worse it gets. But sometimes small sums don’t work either. I decided to see how it would analyze my homework, and one of the easy steps gave me 2 + 3 = 6.
Even when I write letters individually, it has a tendency to mess up. For example P-I-Z-Z-A gets encoded to P-I-S-S-A (after manually decoding). I know it is tokenizing all letters individually and if I ask it to just encode the letter Z it does get it right all of the time.
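For reference, the correct letter-by-letter encoding is easy to check with the standard library. The second half of this sketch works backwards from the decoded result reported above; it assumes the manual decoding was done correctly, so the inferred ciphertext is a deduction, not a quote from the model.

```python
import codecs

word = "PIZZA"

# Encode one letter at a time, as in the hyphenated prompt.
correct = "-".join(codecs.encode(ch, "rot_13") for ch in word)
print(correct)   # C-V-M-M-N

# The reply decoded to P-I-S-S-A, so re-encoding that string shows
# which ciphertext letters the model presumably wrote.
observed = codecs.encode("P-I-S-S-A", "rot_13")
print(observed)  # C-V-F-F-N
```

So the slip is a single letter, repeated: an F where an M should be.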
There are more things at play than just converting tokenized bits. It literally thinks ROT13 is a language (as in a subset of the words it learned), but its training data is too slim, so the abstraction leaks.
What chatGPT does is literally just picking words from the pool of words it learned and arranging them in a way that they would most likely appear according to the context.
Yes. That’s how machine learning works.
It’s also exactly how humans work. We know exactly the structure of our neural nets in our heads. The only difference is the sheer number of connections; it’s just not possible to replicate on a computer. Computer ML has to use carefully constructed learning algorithms to make use of limited resources.
I think the gibberish happens because of the other “language” (rot13 code) where it contextualises differently. I asked the model how it works with the tokens.
Prompt:
Response:
A language model uses tokens to divide words and sentences into smaller units that can be processed by a computer. For example, a language model might divide a sentence like “The house is at the end of the street” into the following tokens: “The”, “house”, “stands”, “at”, “end”, “of”, “street”.
The language model uses these tokens to understand how the words in a sentence are related and to make predictions about which word will come next. For example, based on the previous tokens, the language model might predict that the next word is likely to be a verb because it comes after the subject “The house”.
A language model can also be used to generate words and sentences by predicting which words are likely to come next, based on the previous tokens in the sentence. For example, if the language model starts with the tokens “The house”, it could predict that the next word is likely to be a verb, such as “stands”, “is” or “lies”. The language model could then use this prediction to generate the sentence “The house is at the end of the street”.
When generating text, the language model uses a sequence of tokens to predict the next most likely token to follow, based on the patterns and relationships it has learned from its training data. This process is called language modeling.
When parsing text, the language model can identify the individual tokens in a sequence and analyze their meanings and relationships to each other to understand the overall meaning of the text. This process is called natural language processing.
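The "predict the next most likely token" idea in the response above can be sketched with a toy bigram counter. This is a deliberately crude stand-in, not how ChatGPT works internally: a real model uses a neural network conditioned on the whole context, whereas this one only counts which word followed which in a tiny invented corpus. The names `corpus`, `predict_next` and `generate` are made up for this example.

```python
from collections import Counter, defaultdict

# Tiny invented corpus, split on whitespace instead of real tokens.
corpus = (
    "the house is at the end of the street . "
    "the house is at the corner ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent follower of `token`, or None if unseen."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]

def generate(start, length):
    """Repeatedly append the single most likely next word."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out

print(" ".join(generate("the", 7)))  # the house is at the house is at
```

Always taking the single most likely follower sends this toy model round in circles, which is at least reminiscent of the repeated sentences described earlier, though the cause in a real model may well differ.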
That was the first time I saw that it is not black magic but a tool that combines words (or tokens) in a way that looks like it knows what’s going on. In reality it just has a big enough collection of token combinations that we can’t perceive what it’s doing most of the time.
Once I requested something it hadn’t seen enough of in its training data, the abstraction started to leak and it showed corrupted information, yet it behaved as if it were just replying as usual, following the same set of rules it had been using from the beginning.
Would this explain the “ChatGPT used to be smart, now it’s dumb” meme that’s floating around? In that the people saying this weren’t aware of its limitations when it was nascent because they just had not explored it sufficiently?
I think the main reason people think this is simply that its responses are inconsistent and confirmation bias is a thing. That is, it gives a great response to an input, then later gives a shitty response to the same input, and we notice and go “look, see!” while overlooking all the occasions where the later response was better than the earlier one.
But usually a human has an external goal, and they are picking the next word with that goal in mind. The model is just picking the word that is “most likely” to come next, with no external goal other than sounding like what it’s seen before.
Remember, these are only words because the system has been trained on words, but there’s no reason to assume this can’t be translated into actions. All of these actions that are so similar to any animal or living thing came about / happened only because it was trying to predict the next token. And we start our life with our own pre-programmed model ourselves.