Through Kuhn’s Lens
The Expertise Inversion
May 17, 2026 | 2620 words
The Tutor Who Cannot Read the Book
A composition instructor at a regional state university posted a complaint to a teaching forum this month. She had asked her freshmen to write a 600-word personal essay without AI assistance. Twenty-three of twenty-six submissions tripped her detector. When she confronted the class, four students offered to show her, on their phones, the prompts they had used — not to confess, but to teach her. One walked her through a chain-of-thought scaffold she did not recognize. Another asked, with what she described as genuine concern, whether she had tried “the new Claude.” She had not. She was forty-one years old, tenured, and had read perhaps three articles about generative AI in the previous year. Her students treated her, she wrote, “like a substitute teacher who had wandered into the wrong room.”
The scene is not exotic. Tyton Partners’ 2024 Time for Class survey found that 59 percent of students reported using generative AI tools at least monthly, against 40 percent of faculty, and that the gap widened sharply when the question shifted from “have you tried it” to “do you use it in your weekly work.” A separate Wiley instructor survey last fall put the share of faculty who feel “well-prepared” to address AI in their courses at under one in five. Inside that ratio sits the phenomenon worth examining: not that students use AI, but that they use it more skillfully than the people who grade them.
This column reads that inversion through the instruments Thomas Kuhn assembled in The Structure of Scientific Revolutions and refined in The Essential Tension. The point is not to dramatize the gap. It is to ask what the gap is actually measuring, and whether the loudest names for it — crisis, paradigm shift, cheating epidemic — are doing analytic work or merely emotional work.
The Anomaly Underneath the Anomaly
Kuhn used anomaly to mean a finding the current framework cannot absorb without strain — the result that does not fit, that the practitioner first tries to explain away, then to localize, then, if it persists, to take seriously. Anomalies are interesting not because they are loud but because of what they reveal about the frame that cannot accommodate them.
The surface anomaly here is easy to name. The older paradigm of classroom and workplace pedagogy — paradigm in Kuhn’s sense being the shared set of assumptions, methods, and model problems that organize a community’s practice — assumed the instructor’s superior fluency with the medium of instruction. The English professor read more books than her students. The senior analyst built better spreadsheets than the junior. Authority and competence were aligned, and certification flowed from the more skilled to the less. The inversion breaks that alignment. The student demonstrably operates the medium more deftly than the teacher, yet the teacher must still issue the grade.
Most of the discourse stops there. Faculty development offices respond with training; opinion pieces call for “upskilling”; deans commission task forces. The implicit theory is that the gap is a training lag, and that pouring professional development into one side of the ratio will close it.
But there is a second anomaly underneath, and it is the more interesting one. The two sides of the ratio may not be measuring the same kind of knowing at all. When a student says she is “fluent” with ChatGPT, she typically means she can produce acceptable outputs quickly — she has internalized which prompts yield which textures of response, which models handle which tasks, when to switch tools. When a faculty member says a student is “fluent,” she often means something closer to facile: the student can generate text but cannot evaluate it, cannot detect its hallucinations, cannot situate its claims in a literature. Each side uses the word fluency. Each side believes the other side has misunderstood what fluency means.
This is where the loud discourse — students are cheating, faculty need training — papers over the quiet question. The quiet question is whether the prior paradigm’s notion of competence (the slow accumulation of background knowledge against which any new claim is checked) and the emerging notion (the rapid orchestration of generative systems whose outputs are checked against, well, what?) are continuous skills at different stages, or different skills entirely. If continuous, the gap closes with training. If different, no amount of faculty workshops will close it, because the two sides are running different races and counting different finish lines.
Kuhn’s diagnostic move is to ask what the anomaly forces into view. Here it forces into view a question the AI literacy field has been reluctant to pose directly: what, specifically, is the knowing that AI fluency consists of, and is it the same kind of knowing the older curriculum was built to certify?
Two Communities, Two Exemplars
In “Second Thoughts on Paradigms,” collected in The Essential Tension, Kuhn drew a distinction that matters here. The disciplinary matrix is the broad apparatus a community shares — its concepts, its values, its standard techniques. The exemplar is narrower and more powerful: it is the concrete model problem, the worked case, that a community holds up as showing what good work looks like. Students learn physics not by memorizing Newton’s laws in the abstract but by working through the specific pulley problems that demonstrate what applying those laws feels like. The exemplar trains the eye.
Two communities are arguing past each other about AI in education, and they have different exemplars in mind. This is what Kuhn called incommensurability, meaning, roughly, that each side’s standard for what counts as a good example does not map onto the other side’s, so the two cannot even agree on what they are disagreeing about. He developed the concept across his career and refined it in the work posthumously published as The Last Writings of Thomas S. Kuhn: Incommensurability in Science, where he located incommensurability not in whole worldviews but in localized translation failures affecting specific clusters of interrelated terms.
The AI industry community — vendors, developer-evangelists, the trade press, the venture-capital ecosystem that funds them — holds up a particular exemplar of student fluency. The model case is the high-schooler who builds a working web app over a weekend by prompting Claude, or the undergraduate who replicates a published economics paper in an afternoon using o1 and a CSV file. The exemplar is generative: the student produces something. The standard for fluency is throughput. Native adaptation is the praise term. In this frame, the faculty member who cannot do these things is, simply, behind.
The faculty community holds up a different exemplar. The model case is the student who submits a polished essay that misattributes a quote to the wrong philosopher, or the medical resident who confidently presents a fabricated drug interaction pulled from a chatbot. The exemplar is evaluative: the student fails to detect what is wrong with the output. The standard for fluency is judgment. Shortcut-taking is the criticism term. In this frame, the student who impresses the industry observer is, simply, undertrained in the skill the older curriculum was built to develop.
Notice what happens when the two exemplars are placed beside each other. They are not contradictory. The web-app-building student and the misattribution-submitting student may be the same student. The industry sees the first behavior and calls it fluency. The faculty sees the second behavior and calls it incompetence. Each side has correctly described something real, and each side has selected the exemplar that confirms its existing disciplinary commitments — the industry to production, the academy to verification.
This is why the dispute cannot be settled by exchanging more facts. When a vendor publishes a study showing that students using its tool complete assignments 40 percent faster, the faculty community asks whether the assignments were any good. When a faculty group publishes a study showing that AI-assisted student writing contains more factual errors, the industry community asks whether the errors mattered to the task. Each side hears the other’s evidence as off-topic. The disagreement is not at the level of data. It is at the level of what counts as the relevant case.
The Tyton statistic, 59 percent student use against 40 percent faculty use, is informative here, but only if pressed. The word use is doing enormous work in it. It includes the student who pastes an essay prompt into ChatGPT and copies the output, the student who uses Claude as a thesaurus, the student who runs three competing models against each other and synthesizes a draft, and the faculty member who has used Gemini once to summarize a meeting. These are not commensurable activities. The 19-point gap might reflect a skill difference; it might reflect a willingness-to-disclose difference, with faculty under-reporting because disclosure carries professional risk; it might reflect a definitional difference, with students counting any AI touch and faculty counting only substantive use. The number does not, by itself, tell us which.
What the number does establish is that the two communities are operating at different rates of contact with the tools. Higher contact does not necessarily mean better judgment about the tools. It usually does mean more articulated exemplars of what working with them looks like. The students have a thicker vocabulary for the activity than their instructors do. That, more than any raw skill gap, is what produces the scene in the freshman composition classroom — not that the student can do something the instructor cannot, but that the student has a richer set of model cases against which to interpret what is happening.
Auditing the “Paradigm Shift” Claim
The press will reach for the phrase paradigm shift. It almost always does. Kuhn set a high bar for the term, and the bar matters because cheapening it makes the concept useless for the cases where it actually applies.
Read schematically, The Structure of Scientific Revolutions describes a paradigm shift as four things in sequence. There must be sustained anomaly that the existing frame cannot absorb. Normal science, the patient puzzle-solving work that fills in the established frame, must visibly fail to repair the anomaly. A rival frame must emerge that handles the anomaly while preserving much of what the old frame explained. And the transition tends to run through a generational change in practitioners, because, as Kuhn observed with a nod to Max Planck, older practitioners rarely convert. A tool upgrade, by itself, does not qualify. Most of what gets called paradigm shift in the trade press is what he would have called normal science absorbing a new instrument.
The historical pattern is worth taking seriously. The pocket calculator arrived in classrooms in the 1970s and produced a near-identical inversion: students who took to the devices quickly outpaced teachers who had been trained on slide rules. The discourse predicted the death of arithmetic. What happened instead was that the curriculum adjusted: long division retreated, estimation and number sense advanced, and the calculator became a sanctioned tool on some exams and a forbidden one on others. The hierarchy re-stabilized within a decade. The same pattern played out with statistical software in the social sciences, where graduate students initially knew SPSS better than their advisors, and with internet search in the late 1990s and early 2000s, where librarians watched undergraduates Google their way past reference desks. In each case the inversion was transient. The frame adjusted. Normal science absorbed the instrument.
Why expect this case to be different? Three features deserve weight, and the column should weigh them reluctantly, because the easy answer is to declare novelty and the harder answer is to look for continuity.
The first feature is the breadth of the affected task domain. Calculators displaced routine arithmetic. Statistical software displaced certain computations. Generative AI plausibly touches reading, writing, coding, image-making, and analysis simultaneously. The surface area is larger. But surface area alone does not constitute a Kuhnian shift; it constitutes a wider absorption problem.
The second feature is opacity. A calculator’s output can be checked. A regression coefficient can be recomputed. A language model’s output cannot be checked by the same procedure that produced it, and often cannot be checked at all without bringing in the older paradigm’s verification skills — the close reading, the source check, the literature search. This is genuinely new, and it may be what makes the inversion stick longer than its predecessors. The instrument does not just speed up the old task; it produces outputs whose evaluation requires the very skills the instrument is being used to bypass.
The third feature is the speed of model change. Each prior inversion had a stable instrument. The calculator of 1975 was the calculator of 1985. The models that students are fluent with this semester will not be the models they are fluent with next semester. This favors the side that updates quickly — typically students — and disadvantages the side that institutionalizes slowly — typically faculty. The inversion may persist not because the underlying skill gap is permanent but because the instrument refuses to hold still long enough for the older community to catch up.
None of these features amounts to crisis in Kuhn’s sense. There is no failed normal science yet; there is barely any normal science yet. There is no rival frame; there are competing rhetorics. There is no generational replacement of practitioners; there is a training lag that institutions are already trying to close. The honest description is that this looks like a difficult absorption problem, not a revolution. The frame of pedagogical authority is under strain. It is not broken.
This matters because the paradigm shift framing licenses certain moves — wholesale curricular abandonment, the dismissal of older skills as obsolete, the celebration of any new fluency as progress — that the absorption problem framing does not. The first frame says: throw it out. The second says: figure out where the new instrument fits and what it changes about the work. Kuhn’s machinery, applied carefully, recommends the second.
What Would Move the Reading
The final discipline is to ask what evidence would actually shift the analysis. The question is harder than it sounds because it requires specifying, in advance, what observations would distinguish a transient training lag from a genuine reframing of what expertise means in this domain.
Three classes of evidence would push the reading toward genuine reframing.
The first is durable inversion at the post-training stage. If, by 2030, faculty cohorts who have received substantial professional development still cannot match the AI fluency of incoming students, the training-lag hypothesis weakens. The calculator inversion closed inside a decade; if this one has not closed by then despite institutional investment, the case for a deeper reframing strengthens. The specific evidence to watch: longitudinal surveys of faculty self-rated and observer-rated AI use, disaggregated by years of training received.
The second is the emergence of a stable evaluative vocabulary on the student side. Right now, students have richer exemplars but thinner concepts. They can do more with the tools than they can articulate about them. If, over the next several years, students develop a shared technical vocabulary for evaluating AI outputs — for distinguishing hallucination from error, for characterizing model strengths, for naming the failure modes — and if that vocabulary does not map onto the older paradigm’s evaluative vocabulary, that would be evidence of a genuinely new disciplinary matrix forming. The specific evidence to watch: do AI literacy curricula stabilize around terms the faculty community does not recognize, or around terms that translate cleanly into existing pedagogy?
The third is the failure of hybrid models. If, by 2028 or so, the assessment formats that currently look promising — oral defenses, in-class writing, process portfolios, AI-permitted exams with disclosure — turn out to produce reliable signals about student competence