There’s a preprint from MIT called “Your Brain on ChatGPT” making the rounds; I’ve heard it mentioned by enough friends, and in enough places online, that I wanted to blog a few of my initial impressions & thoughts.
For starters, this isn’t preliminary merely in the sense of “the paper is still in pre-print”; the authors themselves acknowledge the research was small-scale enough to call it preliminary.
Said another way, it’s a good-enough study to say “this field is worthy of further research, please fund further research,” but not good enough to say “we have concluded you should do XYZ.” So anyone already jumping to declare “you shouldn’t use AI because brain” isn’t using theirs enough.
The easiest thing to point to as evidence of “this is just preliminary” is the way the study selected and then grouped participants. If it were purporting to offer landmark, conclusive, research-backed recommendations on what we do going forward (which the authors aren’t), it’d be trivially easy to object that, for starters, every participant was a university student at one of five of the region’s major schools:

“between the ages of 18 to 39 years old (age M = 22.9, SD = 1.69) and all recruited from the following 5 universities in greater Boston area: MIT (14F, 5M), Wellesley (18F), Harvard (1N/A, 7M, 2 Non-Binary), Tufts (5M), and Northeastern (2M) (Figure 3). 35 participants reported pursuing undergraduate studies and 14 postgraduate studies. 6 participants either finished their studies with MSc or PhD degrees, and were currently working at the universities as post-docs”
…so it’s pre-selecting for age, for above-average baseline intelligence, and for a certain generation’s technology-literacy level. (Or tech-illiteracy level: see, for example, various prior reports of young-enough tech users being so accustomed to, and reliant upon, built-in search capabilities that they’re unfamiliar with the notion of folders/directories, or with how files on their devices are stored, structured, and located.)
And if you’re thinking 50-something participants is small enough to call statistical power into question, get this: only 18 of those participants took part in all 4 sessions.
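For a rough sense of scale, here’s a back-of-the-envelope power calculation (my own illustration, nothing from the paper), using statsmodels. With groups that small, a standard two-sample comparison can only reliably detect very large effects:

```python
# Back-of-the-envelope power check -- my own illustration, not from the paper.
# How big must an effect be for a two-sample t-test to detect it at
# alpha = 0.05 with 80% power, given the sample sizes in play?
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
# 18 per group is roughly the full cohort split three ways; 6 per group is
# roughly the 18 four-session completers split three ways.
for n_per_group in (18, 6):
    d = solver.solve_power(nobs1=n_per_group, alpha=0.05, power=0.8)
    print(f"n = {n_per_group:2d} per group -> minimum detectable Cohen's d ~ {d:.2f}")
# Prints roughly d ~ 1.0 at n = 18 and d ~ 1.8 at n = 6: only very large
# effects are reliably detectable; subtler differences drown in the noise.
```

None of which makes the study worthless; it just bounds what kinds of conclusions the data can support.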
But that’s not even the biggest “gap” I see in what this paper tells us. I see a substantial missed opportunity in the way it grouped and separated participants, and there are entire questions from the intake questionnaire which hardly get a mention later in the paper, yet are much more interesting to me than the comparisons/contrasts the authors did make.
Those university-student-only participants were assigned to one of three groups: ChatGPT-4o users, plain Google-search users, or unassisted users.
ChatGPT-4o is a “non-reasoning” model, functionally different from a model that deliberately takes more time and shows the user more of its chain of logic, such as o3. Subjectively, I can’t remember the last time I preferred the speed of a non-reasoning model over the added transparency of a reasoning model. And starting from the point of model choice (which can be a proactive choice, itself involving non-negligible brainpower), LLM users who’ve been trained on how to make use of LLMs, e.g. who are familiar with, or actively seeking to develop, novel techniques such as Chain of Density or inversion of the question-asker, use them substantially differently than untrained users, who tend to be content with “vibes,” approaching an LLM not unlike how a 1990’s or early-2000’s user would’ve approached the “Ask Jeeves” search engine.

The paper does mention that participants were given a welcome, a briefing, and a questionnaire, wherein “examples of the questions included: ‘How often do you use LLM tools like ChatGPT?’, ‘What tasks do you use LLM tools for?’, etc.” But I see nothing in the paper indicating how results differed for familiar vs. unfamiliar students, or for students who were self-taught vs. those with at least semi-formal instruction or comparable professional experience. I want to see studies looking for those distinctions. The closest this paper gets is when it flipped who was in the unassisted vs. LLM group, but that’s a recent-tool-history-effect comparison, not a trained-vs-untrained user analysis.
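To make the trained-vs-untrained distinction concrete, here’s a minimal sketch of a Chain of Density-style summarization loop (per Adams et al.’s 2023 technique) next to the naive “vibes” query. This is my own illustration, not anything from the MIT paper; the model name, loop count, and prompt wording are all placeholder choices, and the API usage assumes OpenAI’s Python client:

```python
# Sketch of trained vs. untrained LLM usage -- my own illustration,
# not from the MIT paper. Assumes OpenAI's Python client is installed
# and an API key is configured in the environment.
from openai import OpenAI

client = OpenAI()
article = "..."  # source text to summarize

# Untrained / "vibes" usage: one vague ask, accept whatever comes back.
naive = client.chat.completions.create(
    model="gpt-4o",  # placeholder model choice
    messages=[{"role": "user", "content": f"Summarize this:\n{article}"}],
)
summary = naive.choices[0].message.content

# Chain of Density-style usage: iteratively re-summarize, each pass folding
# in a few missing entities while holding length fixed, so information
# density rises without the summary growing.
for _ in range(3):  # iteration count is arbitrary for illustration
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Article:\n{article}\n\nCurrent summary:\n{summary}\n\n"
                "Identify 2-3 informative entities from the article that are "
                "missing from the summary, then rewrite the summary to "
                "include them WITHOUT increasing its word count."
            ),
        }],
    )
    summary = response.choices[0].message.content
```

The point isn’t this exact recipe; it’s that the trained user is doing deliberate, iterative cognitive work in how they drive the tool, which is precisely the variable the paper’s grouping can’t isolate.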

Plain Google search has gotten steadily worse over the past several years. https://www.wheresyoured.at/the-men-who-killed-google/ is a decent summary. tl;dr: by purposely making search worse than it used to be (and worse than it was on track to become before certain leadership), Google can force you to enter more searches before you find what you want, and therefore get paid to serve more ads to your eyeballs.
If you’re still using plain Google searches, you’re subjecting yourself to both more privacy invasiveness and more fighting with an uncooperative tool than you need to. Google search is past its prime, so in terms of both peace of mind and search performance, you’re better off choosing something else entirely. So I wouldn’t agree that measuring brainwave activity for plain Google usage is the most useful or cleanest comparison; if I were asked to use Google, I’d be concerned I was spending about as much time fighting the tool as getting what I want out of it, and I would expect that to show up in brain scans.
Unassisted is quaint, but for a lot of tasks “ain’t nobody got time for that.” Brain activity is not the be-all, end-all metric for intelligence, and I’m glad the study also tried to capture other observations, such as students’ “sense of ownership” over their work, or satisfaction upon completion. Follow-up studies would do well to use similar questionnaires or measure biomarkers, to likewise acknowledge the value in working smarter instead of harder on day-to-day tasks. After all, there’s value in proverbially enjoying the journey, not merely the destination: it’s shortsighted to only desire delivery of written-word output, completed and completed quickly. Ideally we’d also want the process of creation to be enjoyable in its own right. Or, if it can’t be enjoyed directly, we want to complete day-to-day tasks efficiently, because the opportunity cost is how well we can attend to other things we’d find valuable.
Overall I found the paper interesting enough to skim through and write about, but it’s yet another textbook case of “many people online are cherry-picking stats and posting clickbait for content & likes without citing the paper or encouraging people to read it more deeply.”