In November 2022, OpenAI put ChatGPT in front of the public, and within a few months the meaning of a fifty-year-old word quietly narrowed. Ask someone in 2021 what artificial intelligence was and you might have heard about the recommendation engine deciding what plays next, the spam filter, the model that reads a chest x-ray, the routing that gets a package across a continent overnight. Ask the same question now and most people describe a single thing: you type, it types back.
None of that was the first time a person typed a question to a machine. Ask Jeeves was taking plain-English questions in the nineties, Google has fielded them ever since, and Siri has been in every pocket since 2011. But those tools retrieved — they pointed you at an answer that already existed somewhere. What went public at the end of 2022 generated: it produced fluent, original-looking text on demand, about almost anything, in seconds. That is the line it crossed, and it is why a fifty-year-old word collapsed onto a single product so fast. Hold on to that distinction — retrieval finds, generation produces — because the rest of this pillar turns on it.
The capability did not arrive overnight, either. The architecture underneath it — the transformer — was published in 2017. OpenAI’s GPT line built on that paper year after year: GPT-2 in 2019, GPT-3 in 2020, each one a sharper next-word predictor than the last. November 2022 was the moment that five-year-old machinery got wrapped in a free chat box anyone could open, and the public met all of it at once. The technology was old news to the field; the exposure was new. That is the date’s real significance — not an invention, but the morning a fifty-year-old research field became, in the public mind, a single website.
The narrowing looks like a harmless figure of speech. In practice it is a strategic blind spot, and it fails in two directions at once. You stop seeing the tools you already own, and you over-reach for the one tool you can see — trusting its answers in ways it was never built to earn. Both directions cost real money.
Everything else in this series depends on getting this first part right, so we start at the foundation: seeing the field at its real size, and knowing what the model in front of you is actually doing when it answers.
AI was never one thing
Artificial intelligence is the umbrella, not the product. Underneath it sits a research field that predates today’s chatbots by decades and that, on any given day, is doing far more economic work than generative AI does. This is not nostalgia for old techniques: as recently as 2024, in most industries a larger share of organizations reported spending more than a fifth of their digital budget on traditional, non-generative analytical AI than on the generative kind. (McKinsey, The state of AI, 2024)
Six of the families living under that umbrella. Only the last one is generative:
Predictive machine learning. Fraud scoring, churn, demand forecasting, credit decisions: the statistical workhorse, usually gradient-boosted trees like XGBoost.
Computer vision. Reading a scan, inspecting a part on the line, counting cars, guiding a tractor down a row.
Optimization. Crew scheduling, vehicle routing, warehouse slotting, pricing. The reason your overnight package arrives on time.
Recommender systems. Ordering a near-infinite catalog for one person: the feed, the next track, the product grid.
Rule-based systems. Explicit rules and logic encoded by people: tax engines, clinical decision rules, configuration checkers.
Generative AI. New text, images, audio, or code. The loudest corner, and just one of the six here.
None of this is a knock on generative models. The point is the proportion. When a board asks “what’s our AI strategy” and the room only hears “what’s our chatbot strategy,” the linear-programming model that already routes the delivery fleet and the vision system that could halve inspection cost never reach the agenda. Neither looks like the thing on the screen. The map shrank to a single landmark, and the rest of the territory went dark.
What text-generating AI is actually doing
Now the second half of the pillar, and the part that changes how you should read every answer one of these systems gives you.
Text-generating AI — the chatbots and coding assistants a lot of people now use every day — runs on a language engine that, stripped to its core, does not look up facts and does not reason its way from premises to a conclusion the way a person does. It predicts. Given the text so far, it produces a probability distribution over what the next small chunk of text — the next token — is most likely to be, samples one, appends it, and does the whole thing again. The fluency is real. The mechanism underneath the fluency is statistical pattern completion, learned from an enormous amount of text. (Vaswani et al., “Attention Is All You Need”, 2017)
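The predict, sample, append, repeat loop can be sketched in a few lines. This is a toy illustration, not how a production model is built: it assumes a tiny made-up corpus and uses simple follow-counts (a bigram table) in place of a transformer, and every name in it (`CORPUS`, `train_bigrams`, `generate`) is hypothetical. The shape of the loop is what matters here.

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for "an enormous amount of text" (illustrative only).
CORPUS = (
    "the court cited the case . "
    "the court cited the statute . "
    "the model cited the case ."
).split()

def train_bigrams(tokens):
    """Count which token follows which: a crude stand-in for learned patterns."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Turn follow-counts into a probability distribution over the next token."""
    followers = counts[prev]
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

def generate(counts, start, max_tokens=8, seed=0):
    """Predict a distribution, sample one token, append it, and repeat."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(max_tokens):
        dist = next_token_distribution(counts, out[-1])
        if not dist:  # nothing ever followed this token in the corpus
            break
        tokens, probs = zip(*dist.items())
        out.append(rng.choices(tokens, weights=probs, k=1)[0])
        if out[-1] == ".":
            break
    return " ".join(out)

counts = train_bigrams(CORPUS)
print(next_token_distribution(counts, "the"))
print(generate(counts, "the"))
```

Nothing in the loop consults a fact. Depending on the seed, this toy can emit a sentence such as "the model cited the statute ." that never appeared in its corpus: each step was a plausible continuation, and plausibility is the only thing the loop measures.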
Sit with what that implies. The model is optimizing for plausible, not for true. Most of the time plausible and true travel together, because text written by people who knew what they were talking about is the most common pattern in the training data. But the two can come apart, and when they do the model has no internal alarm that fires. It will complete the sentence “the case that established this precedent was” with a citation that has the exact shape of a real one and refers to nothing, in the same even cadence it uses to quote a statute that does exist. The tone never changes, because there is no internal signal that tracks truth. The confidence you hear is a property of the writing, not a measure of the facts.
This is not a thought experiment. In 2023 two New York lawyers filed a brief in federal court built on cases ChatGPT had invented and that they had not verified. Neither the court nor opposing counsel could locate the invented cases, and the lawyers were sanctioned $5,000. The decision is worth reading for the detail that the model, asked to confirm the cases were real, simply said yes. (Mata v. Avianca, S.D.N.Y., June 22, 2023, the sanctions ruling)
A clarification, since “generative AI” now covers a lot of ground. This pillar is about the models that generate text and code: the chatbots, the coding assistants, the agents that write and act on your behalf. Image, audio, and video generators are generative too, but they typically run on a different engine: diffusion, which sculpts a result out of random noise rather than guessing the next word. That is a separate mechanism and a separate conversation. Wherever this article says a model predicts, it means the text-generating kind you read, write, and build with.
This is also why a generative model can be brilliant on a topic with deep, consistent coverage in its training data and quietly unreliable on a niche where the patterns are thin or contradictory. The quality of the prediction rises and falls with the strength of the patterns it learned. The interface gives you no way to tell the two regimes apart from the outside, which is the whole reason the next five pillars exist.
The risk almost never lives in the model being wrong. It lives in the handoff — the moment a person takes a fluent, confident, unverified output and acts on it as if it had been checked.
But mine searches the web, and it has a thinking mode
If you use these tools daily, two objections should be forming, and they are the right ones. The assistant you use can look things up. And the newer models visibly do stop to think. If this article pretended otherwise, you would be right to close the tab. Both are real, and neither changes the engine underneath. Take them one at a time.
Retrieval and tools. A modern assistant is usually not a bare model. It is a model wired to a search engine, a database, or your own documents — the pattern engineers call retrieval-augmented generation, or RAG. When the system “looks something up,” a retrieval step fetches real text and drops it into the model’s context, and the model then predicts its answer over that fetched passage instead of over training-data patterns alone. (Lewis et al., 2020) This genuinely helps, and grounding a model in your own verified data is one of the highest-leverage things a business can do with it. The serious version goes well beyond pasting in a few documents: a curated knowledge base, an ontology that encodes how your entities actually relate, a knowledge graph the retrieval can walk — structured ground truth the model predicts against rather than loose text it predicts over. The richer and cleaner that scaffolding, the less room the model has to invent, which is most of the work in keeping hallucination down. But look at what moved. Retrieval improved the inputs; the thing consuming them is still a predictor. Hand it a real passage and it will predict a fluent summary — sometimes including a confident claim the passage never made. Retrieval lowers the rate of invented answers. It does not install a fact-checker.
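A minimal sketch of that pattern, under loud assumptions: the knowledge base is three invented sentences, retrieval is plain word overlap standing in for a real search index or vector store, and no actual model is called. The names `DOCUMENTS`, `retrieve`, `build_prompt`, and `unsupported_numbers` are all hypothetical. The last function illustrates the closing point of the paragraph: a check on the output has to be added separately, because retrieval does not supply one.

```python
import re

# Hypothetical in-house knowledge base (illustrative only).
DOCUMENTS = {
    "refund-policy": "Refunds are issued within 14 days of purchase with a receipt.",
    "shipping-policy": "Standard shipping takes 3 to 5 business days.",
    "warranty": "Hardware is covered by a 12 month limited warranty.",
}

def tokenize(text):
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(
        documents.items(),
        key=lambda item: len(q & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, passages):
    """Drop the fetched text into the context the model will predict over."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below. Cite the source id.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

def unsupported_numbers(answer, source):
    """Return numbers that appear in the answer but not in the source passage."""
    def numbers(text):
        return set(re.findall(r"\d+(?:\.\d+)?", text))
    return sorted(numbers(answer) - numbers(source))

question = "How many days do refunds take?"
passages = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)

# The model's answer would come back here. Two hand-written stand-ins:
print(unsupported_numbers("Refunds take up to 14 days.", passages[0][1]))
print(unsupported_numbers("Refunds take up to 30 days.", passages[0][1]))
```

The number check is deliberately crude. It catches one narrow kind of drift and would miss a misstated condition or a claim the passage never made, which is why a serious grounding pipeline pairs retrieval with a real evaluation step and, for consequential outputs, a person.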
Reasoning and “thinking” modes. The models that pause to work through steps before answering are doing something real, and on hard problems they are markedly better for it. What they are not doing is trading prediction for a different kind of cognition. The thinking is more prediction — the model writes out intermediate steps that make the final answer more likely to be right, the way working a long-division problem on paper helps a student arrive at the number. The extra steps are worth a great deal. They are still not the same as a system that understands what it wrote, which is why the discipline of checking the output does not get to retire just because the model showed its work.
The genuinely different thing — a system that understands and reasons across any domain the way a person does, rather than predicting within learned patterns — is what people mean by artificial general intelligence, and it does not exist yet. It is worth naming only because of what its arrival would do: the first organization to build something with real general competence would hold an advantage so large that everyone else would have to follow it, fast. Until that day, and nobody can tell you when or whether it comes, the responsible posture is the unglamorous one. You are working with prediction-based systems. Build for their strengths, design around their limits, and never quietly assume they have crossed a line they have not.
A lens for reading any AI system
Because the interface hides the mechanism, you need a habit that makes the mechanism visible again before you trust an output. The framework we teach is deliberately small enough to use in a meeting. We call it CIDA, and you can run it against any AI system in front of you — generative or not.
Context. Where and how it's used. A throwaway draft and a clinical decision sit at opposite ends of the risk scale, even with the same model.
Inputs. What data goes in, and how it's filtered. Inputs set accuracy, leak private information, and decide what the system could possibly know.
Decision procedure. The model, its system prompt, the retrieval and grounding, the guardrails: the whole apparatus that produces the output. This is the step you actually engineer, and the one that moves the result most. For a language model the engine inside it is next-token prediction.
Action. What happens to that output. The handoff, whether to a person or to the next automated step, is where most real-world failures occur. Decide who or what checks it before it acts.
Three of the four you mostly assess. The Decision procedure is the one you build — the grounding, the guardrails, the evaluation that decide whether the output is worth anything. It is the heaviest lever in the whole chain, and the place a data team earns its keep.
Run those four questions and the fog lifts. You stop asking “is the AI good” — an unanswerable question — and start asking “is this the right kind of system for this context, fed the right inputs, with a sane check on the action it triggers.” That is a question a team can actually answer, and it is the through-line of every pillar that follows.
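For teams that prefer a checklist they can keep next to the system, the four questions can be written down as a small record. This is only a sketch of one possible encoding, not a prescribed format: the field names follow the four questions as described above, while `high_stakes`, `checked_before_action`, and the `open_questions` helper are assumptions introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class CidaReview:
    context: str                 # where and how the system is used
    inputs: str                  # what data goes in, and how it is filtered
    decision_procedure: str      # model, prompt, grounding, guardrails
    action: str                  # what happens to the output
    high_stakes: bool            # assumed flag: would an error cause real harm?
    checked_before_action: bool  # assumed flag: does anything verify the output first?

def open_questions(review):
    """List the gaps a team should close before trusting the output."""
    gaps = []
    for name in ("context", "inputs", "decision_procedure", "action"):
        if not getattr(review, name).strip():
            gaps.append(f"{name} is not described")
    if review.high_stakes and not review.checked_before_action:
        gaps.append("high-stakes action has no check on the handoff")
    return gaps

draft = CidaReview(
    context="internal first-draft email",
    inputs="the author's own notes",
    decision_procedure="general chat assistant, no grounding",
    action="author edits before sending",
    high_stakes=False,
    checked_before_action=True,
)
brief = CidaReview(
    context="court filing",
    inputs="open-ended legal question",
    decision_procedure="general chat assistant, no grounding",
    action="citations pasted straight into the brief",
    high_stakes=True,
    checked_before_action=False,
)
print(open_questions(draft))
print(open_questions(brief))
```

The same model sits behind both records. What differs is the context and the handoff, which is the distinction the four questions are meant to surface.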
Questions a skeptic asks
None of this is a reason to avoid text-generating AI. It is how you get the most — and the most accurate — work out of it. The people who win with these tools are not the ones who trust them the most or the least, but the ones who understand the machine well enough to know where to lean on it and where to check it. A few questions in that spirit, with straight answers.
If it only predicts the next word, how is it writing working code and passing the bar exam?
Because working code and exam answers are abundant in the training data, and predicting the next token over that corpus reproduces those patterns remarkably well. The capability is real, and the mechanism is still prediction. The two were never in conflict.
People are shipping whole apps in a day with it. Doesn't that prove it reasons?
It proves the patterns for common app scaffolding are dense in the training data and the model reproduces them fast. These tools shine on well-trodden ground and get shakier the more novel the problem gets. The day-one demo and the system that survives real users are different animals, and the distance between them is exactly the limitation worth respecting.
Aren't you underselling it to sound cautious?
Underselling is as much a mistake as overselling, and this is neither. The argument is not that the tool is weak. It is that understanding how it works tells you precisely where to trust it.
Mine cites sources and has a thinking mode. Doesn't that solve it?
Those help, and you should use them. Retrieval lowers invented answers; reasoning steps lift accuracy on hard problems. Neither turns prediction into knowing, so the habit of checking the output stays.
Won't grounding it in our own data — RAG, a knowledge base, an ontology — fix the hallucination?
It helps a great deal, and building that scaffolding well is one of the highest-value things you can do with these tools. But the model still predicts over the grounding and can misstate even a correct source. Good grounding lowers the risk sharply; it does not remove the need to check.
So should we just not use it?
The opposite. Pointed at the work where prediction fits, with a human check on the handoff, it is among the most valuable tools available right now. The limits are not a reason to avoid it — they are the map to getting the most accurate work out of it.
Won't the next model make all of this obsolete?
Every generation gets better at prediction. None has changed the fact that it predicts. The literacy in this article is about the shape of the tool, which outlasts any single version of it.
"It doesn't understand" — isn't that just philosophy?
We keep it operational, not metaphysical. The testable question is whether the system has an internal signal that tracks truth. It does not, which is the whole practical reason you verify what it tells you.
What we will not claim
GenAI is just autocomplete, so it's useless.
The mechanism is prediction; the capability is still real and, on the right tasks, remarkable. Underselling it is as much an error as overselling it.
LLMs can't reason at all.
They do not reason the way people do, and the newer models are genuinely stronger at multi-step problems. We describe the mechanism honestly without pretending the capability isn't there.
Where this goes next
This pillar is the foundation: see the field at full size, and know that the generative model in front of you is predicting rather than knowing. The next pillar follows directly from the mechanism — if the model optimizes for plausible over true, then hallucination is not a bug to be patched out but a property to be managed, and Part 2 is about managing it well.
Whether you already have AI systems and tools in place and want to maximize their impact, or you’re preparing to bring generative AI into the business for the first time and want to start on solid ground, the CIDA lens is where we’d begin — pointing each system at the work where it pays off and engineering the Decision step so the output earns trust. That is what a free 60-minute call is built for. For how this thinking shapes the data and AI work we actually deliver, see our approach to AI engagements.