
Notes on ChatGPT and Copilot

#notes

Generative AI models are all the rage at the moment. Tools like ChatGPT and Copilot, which integrate large language models (LLMs) into the IDE or into a fancy chat web interface, get a lot of positive attention. And for good reason: they are surprisingly good compared to the status quo before them.

They are so good that some of the smartest people I know are wondering whether these tools will forever change education and maybe even render programming obsolete.

I am much more moderate in my expectations. They will influence education, they will impact software development, they will steal some of the StackOverflow and Google traffic, but programming won’t become obsolete and education won’t change that much. Au contraire: understanding complex programs will become even more important, and so will being able to judge an essay critically and attentively.

Blinded by the success of ChatGPT, we forget the limitations of these tools. A good reminder is Galactica, the “scientific text generation system” published and quickly retracted by Facebook/Meta research: the system, which was trained only on scientific papers, learned how to “parrot” scientific and encyclopedic articles; but people found so many horribly wrong examples that it generated an outcry, and Facebook was forced to close the public demo.

Are Generative ML Models the New Electronic Calculators?

I hear people compare ChatGPT with electronic calculators. They observe that once electronic calculators were widely available, schools stopped teaching students arithmetic. Instead, mathematics educators started teaching higher-level skills, because knowing how to do arithmetic was no longer a necessary skill. They wonder whether it’s going to be the same with ChatGPT and Copilot. Alas, writing and thinking are so tightly related that we cannot leave the writing to OpenAI’s glorious auto-complete bot.

There is another significant difference between electronic calculators and large language models. Electronic calculators will still function even if none of us can do arithmetic anymore, so there is indeed little reason for us to keep computing things in our heads. LLMs, on the other hand, can only be parasites on real code written by real developers, the code that is used for training these tools. If developers stop writing new code, these tools have nothing left to learn from. So the skill of writing code will never go away.

What might end up happening is that for every new technology, and every new language, a cohort of pioneers will go in and carve the path; those will be the amazingly well-paid programmers. After they have written sufficiently many programs that use the new language or API, the large language models will start providing decent recommendations for “the masses”.

In any case, because of the large number of examples that these systems need before they can become useful, generative ML models will never do to programming what calculators did to arithmetic: render it unnecessary.

Will Our Students No Longer Need to Know How to Write?

Only if our students no longer need to know how to think will they no longer need to know how to write.

Writing and deep thinking are tightly integrated. You can’t have one without the other, the only exception being LLMs themselves… But, as the dictum goes, “God gave us writing to see how lame our thoughts are”. Only in writing can we spot the errors of reason in our thinking. LLMs, however, can’t spot errors of thinking. They can only generate the most likely next token in a sequence.
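
To make that last claim concrete, here is a minimal sketch of greedy next-token generation, using the publicly available GPT-2 model via the Hugging Face transformers library. GPT-2 is an illustrative stand-in (ChatGPT’s own model is not public), and real deployments usually sample rather than always taking the single most likely token, but the principle is the same.

```python
# A minimal sketch of greedy next-token generation with the Hugging Face
# transformers library. GPT-2 is an illustrative stand-in: ChatGPT's own
# model is not publicly available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "God gave us writing to see how"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(5):
        logits = model(input_ids).logits  # a score for every vocabulary token
        next_id = logits[0, -1].argmax()  # pick the single most likely one
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

At no point does this loop ask whether the continuation is true or well reasoned; it only asks which token is most probable.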

And thus, just as with code, the human duty will now be to start debugging the generated text.

Yes, the generic essay about “life in a village in southern France” can be generated automatically. It can even be generated in the form of a sonnet. Or a hip-hop song in the style of Snoop Dogg. But a thesis on “a tool that intercepts procrastination in order to induce more useful habits” will not be “generated” anytime soon.

A Valid Use Case: The Ultimate Auto-Correct

Given that these systems are good at predicting text, and that most(?) writers of texts and reports at the moment are non-native speakers, I expect that one of the most wonderful uses will be correcting prose to make it more readable and understandable. Oh, how happy I would be if my students started using ChatGPT and sent in more readable reports!

A workflow that I imagine: write a paragraph, then submit it to ChatGPT with a request to rephrase it for clarity.
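
This workflow can also be scripted. Below is a minimal sketch using the openai Python client; the model name, the prompt wording, and the sample paragraph are illustrative assumptions, not a recommendation.

```python
# A minimal sketch of the "ultimate auto-correct" workflow with the openai
# Python client. The model name, the system prompt, and the sample
# paragraph are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

paragraph = (
    "Their is many reason for which the students writes reports "
    "that is hard to be understanding."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any capable chat model would do
    messages=[
        {"role": "system",
         "content": "Rephrase the user's text for clarity and grammar. "
                    "Do not change its meaning."},
        {"role": "user", "content": paragraph},
    ],
)

print(response.choices[0].message.content)
```

Note that the system prompt explicitly asks the model to preserve the meaning; the next paragraph explains why that instruction alone is not enough.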

However, a serious responsibility of the writer using the workflow above is making sure that the “corrected” text still expresses the original idea. Indeed, while past spell checkers didn’t change the meaning of a text, the new generation rephrases boldly and at times changes the meaning. Thus, the writer has to attentively scrutinize the work of their co-author.

Another Possible Use Case

A system that rates your writing on surprisingness.

You write a blog post or an essay. You submit it to the system, and it tells you how “surprising” it is. If it’s not very surprising, maybe you’re not that original. Some kind of idea-duplication detection: not syntactic, but semantic.
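
One way to approximate semantic duplication is with sentence embeddings: embed the new text and a corpus of existing texts, and report the highest cosine similarity as a rough (inverse) surprise score. A minimal sketch, assuming the sentence-transformers library; the model name and the toy corpus are illustrative.

```python
# A rough sketch of semantic (not syntactic) duplication detection using
# sentence embeddings. The sentence-transformers model and the toy corpus
# are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "LLMs will make programming obsolete.",
    "Electronic calculators changed how we teach arithmetic.",
]
new_text = "Large language models will render programmers unnecessary."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
new_embedding = model.encode(new_text, convert_to_tensor=True)

# The highest cosine similarity to anything already in the corpus:
# the closer to 1.0, the less "surprising" the new text.
similarity = util.cos_sim(new_embedding, corpus_embeddings).max().item()
print(f"max similarity to corpus: {similarity:.2f}")
```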

Sounds attractive. However, the originality might be in the details.

To think about

Further notes