Artificial intelligence did not invent itself in 2022 when ChatGPT went public. The ideas, the algorithms, and most of the engineering tricks that make modern AI work were built up over seven decades of quiet (and occasionally loud) progress. Knowing this history matters because it tells you what is genuinely new, what is recycled, and what might be coming next. Every wave in AI has followed roughly the same pattern: breakthrough, hype, disappointment, quiet years of consolidation, then a fresh breakthrough that builds on the last. The current wave is no exception. Here are seven decades of AI in one readable timeline, landing exactly where the field stands in 2026.
1950s: the birth of an idea
The formal field of AI was born at a famous workshop at Dartmouth College in 1956, where John McCarthy, Marvin Minsky, Claude Shannon, and a small group of collaborators gathered for the summer to discuss "machines that think." McCarthy coined the term "artificial intelligence" for that gathering. The initial mood was boundless optimism: some attendees predicted human-level machine intelligence within a generation.
The decade's intellectual underpinnings predated Dartmouth. Alan Turing's 1950 paper "Computing Machinery and Intelligence" proposed the imitation game (now called the Turing Test) as a way to sidestep the unanswerable question of whether machines can truly think. His framing — can a machine behave indistinguishably from a human in conversation? — shaped the field's ambitions for decades.
The second half of the 1950s produced the first wave of practical results. The Logic Theorist, developed by Allen Newell and Herbert Simon, proved mathematical theorems from Principia Mathematica. Arthur Samuel's checkers program learned to beat its creator. Frank Rosenblatt's Perceptron, a single-layer learning machine, was demonstrated to the press in 1958 and hailed by the New York Times as a breakthrough. The era ended with widespread belief that general-purpose thinking machines were only a decade or two away.
1960s and early 1970s: ambition, reality, and the first winter
The 1960s saw ambitious projects run into hard walls. Machine translation, lavishly funded by US defence agencies during the Cold War, produced famously garbled Russian-to-English output, most memorably the apocryphal rendering of "the spirit is willing but the flesh is weak" as "the vodka is good but the meat is rotten." The 1966 ALPAC report, issued under the US National Research Council, concluded that machine translation was not going to work, and funding was cut.
In 1969, Minsky and Seymour Papert published a book titled Perceptrons that mathematically demonstrated fundamental limits of single-layer perceptrons; most famously, no single-layer network can compute the XOR function. The book was essentially correct about those limits but was widely misread as a death sentence for neural network research in general. Funding dried up.
By the mid-1970s the first AI winter had set in. Research budgets contracted. Departments closed. Ambitious projects were mothballed. The general mood shifted from "intelligent machines are imminent" to "AI was all hype, let us stop pouring money into it."
The lesson for readers in 2026: the field has over-promised and under-delivered for seventy years. Every cycle feels different in the moment and looks remarkably similar in retrospect.
1980s: expert systems rise and fall
The 1980s brought a partial rehabilitation. Expert systems, built from hand-coded rules elicited from domain experts, found commercial success. Research programs of the 1960s and 1970s like MYCIN (medical diagnosis) and DENDRAL (chemistry) had shown that narrowly scoped, rule-based AI could rival human specialists on specific tasks, and 1980s products such as DEC's XCON turned the approach into a business.
Japan launched the Fifth Generation Computer Systems project in 1982, a ten-year government effort, funded in the hundreds of millions of dollars, to leapfrog the West with AI-first computers. It inspired competitive investments in the US and UK. For a while, the expert systems boom echoed the 1950s optimism.
The bust was harder than the first. Expert systems turned out to be brittle: they handled the cases they had been hand-coded for and failed catastrophically on edge cases. Maintaining rule bases of thousands of hand-written rules was expensive. When cheaper personal computers undercut the specialised Lisp machines that ran most commercial AI, the market collapsed. The second AI winter arrived in the late 1980s and lasted into the mid-1990s.
In parallel, a small group of researchers (Geoffrey Hinton, Yann LeCun, Yoshua Bengio among them) kept neural network research alive through these lean years. They spent the winter refining the backpropagation algorithm and building tools. Their patience would pay off spectacularly later.
1990s and 2000s: the statistical revolution
The 1990s belonged to statistical machine learning. Rule-based expert systems gave way to methods that learned from data. Support vector machines, Bayesian networks, hidden Markov models, and decision tree ensembles became the new canon. These methods were pragmatic, competed effectively on benchmarks, and scaled.
A marquee moment came in 1997 when IBM's Deep Blue defeated world chess champion Garry Kasparov — not through general intelligence but through brute-force search combined with position-evaluation heuristics. The media treated it as AI's comeback. Inside the field, it was understood to be a specialised, narrow success with limited general lessons.
The early 2000s saw the rise of the web and with it an explosion of data. Google rebuilt search around statistical methods. Netflix launched a $1 million competition in 2006 to improve its collaborative-filtering recommendations, which sparked a creative explosion in ensemble methods. The phrase "big data" entered the vocabulary around 2008, and the infrastructure (MapReduce, Hadoop, and eventually Spark) caught up with it.
Throughout this era, neural networks remained a backwater. They worked, but they were slow to train, required large amounts of data, and frequently underperformed simpler methods. That was about to change.
2012: AlexNet and the deep-learning breakout
In 2012 a team from the University of Toronto (Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton) submitted a convolutional neural network to the annual ImageNet image-classification competition. ImageNet was the standard benchmark of the era: 1.2 million images across 1,000 categories. The previous best top-5 error rate was roughly 26%. AlexNet dropped it to 15.3%, a landslide margin that made clear something fundamental had changed.
Three ingredients converged to make AlexNet possible. First, big data: ImageNet itself, curated over years by Fei-Fei Li and her team. Second, GPU compute: the team trained their network on two commodity Nvidia GTX 580 gaming cards, which had quietly become powerful enough to handle large convolutional networks. Third, techniques: dropout for regularisation, ReLU activations for efficient training, and a deep-enough architecture to learn hierarchical features.
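To make those ingredients concrete, here is a minimal sketch in PyTorch (a modern stand-in; AlexNet itself was written in custom CUDA code) of a small convolutional classifier using ReLU activations and dropout. The layer sizes are illustrative and much smaller than AlexNet's actual architecture.

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """A toy AlexNet-flavoured classifier: stacked conv + ReLU layers
    learn hierarchical features; dropout regularises the classifier head."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(),                     # non-saturating activation, cheap to compute
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((6, 6)),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),             # randomly zero units during training
            nn.Linear(192 * 6 * 6, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.flatten(self.features(x), 1))

logits = TinyConvNet()(torch.randn(1, 3, 224, 224))  # shape: (1, 1000)
```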
Within two years every major technology company had reorganised around deep learning. Google, Facebook, Microsoft, and Baidu set up research labs, hired the field's academics, and started shipping deep-learning-powered products at scale. Image recognition, speech recognition, and machine translation all saw dramatic accuracy improvements within eighteen months of AlexNet.
The deep-learning era had begun, and it has run continuously from 2012 to the present.
2017: Attention Is All You Need
In June 2017, a team at Google Research published a paper with the unusually cheeky title "Attention Is All You Need." The paper introduced a new neural network architecture called the transformer, built around a mechanism called self-attention. The pitch: handle sequences like language more efficiently by letting every element in the sequence attend to every other element in parallel, rather than processing them one at a time.
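The mechanism is compact enough to sketch. Below is a minimal single-head self-attention in NumPy, stripped of the learned query/key/value projections, multiple heads, and masking a real transformer uses; it shows only the core idea of every position attending to every other position in one matrix multiply.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention over a sequence x of shape (seq_len, d)."""
    d = x.shape[-1]
    # For clarity, queries, keys, and values are x itself; a real
    # transformer applies three learned linear projections first.
    q, k, v = x, x, x
    scores = q @ k.T / np.sqrt(d)                     # pairwise affinities, (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # each output mixes all positions

out = self_attention(np.random.randn(10, 64))  # 10 tokens, 64 dims each
```

Because the whole sequence is handled by matrix multiplies rather than a step-by-step recurrence, training parallelises across positions, which is a large part of why transformers scale so well on GPUs.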
Transformers immediately took over natural language processing. Within two years, every major language task — translation, summarisation, question answering — was being dominated by transformer-based systems. Researchers also quickly discovered that transformers scaled beautifully: bigger models trained on more data kept getting better, with no apparent ceiling.
The paper turned out to be one of the most important in AI history. It unlocked what would eventually become GPT, BERT, Claude, Gemini, and every other large language model. If you want to read one paper that changed the course of AI, "Attention Is All You Need" is the one; nearly every frontier system built since 2017 has a transformer at its core.
2020–2022: the scaling era and the GPT explosion
OpenAI released GPT-3 in 2020. It had 175 billion parameters, a then-unprecedented scale, and demonstrated that a single transformer model could perform dozens of language tasks — translation, summarisation, question answering, even basic coding — without task-specific fine-tuning. Just the right prompt was enough.
This was a philosophical shift. Previously, models were trained for specific tasks. GPT-3 showed that scale alone produced something resembling general-purpose language intelligence. Researchers at OpenAI, and later DeepMind, formalised scaling laws: quantitative relationships between parameters, data, compute, and model quality that let teams predict capabilities before training.
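As a simplified illustration, the 2022 Chinchilla paper (Hoffmann et al.) modelled pretraining loss as an irreducible floor plus power-law penalties for too few parameters or too few training tokens. A toy Python sketch, using the paper's published fit but otherwise purely illustrative:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model of n_params trained on n_tokens.

    Constants are the fit reported by Hoffmann et al. (2022); treat them
    as illustrative rather than universal.
    """
    E, A, B = 1.69, 406.4, 410.7       # irreducible loss and scale coefficients
    alpha, beta = 0.34, 0.28           # power-law exponents for params and tokens
    return E + A / n_params**alpha + B / n_tokens**beta

# Diminishing returns from doubling data for a 70B-parameter model:
for tokens in (7e11, 1.4e12, 2.8e12):
    print(f"{tokens:.1e} tokens -> predicted loss {chinchilla_loss(7e10, tokens):.3f}")
```

The practical payoff was that labs could budget a training run (how big a model, how much data) before spending the compute.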
A wave of frontier models followed: DALL-E (image generation), Codex (coding), AlphaFold (protein folding), and many others. The mood inside the field shifted from "can we reach AGI?" to "what do we do when we get closer?"
November 2022: ChatGPT and the public inflection point
On 30 November 2022, OpenAI released ChatGPT — a consumer-friendly web interface on top of a fine-tuned GPT-3.5 model. Within five days it had a million users. Within two months it had a hundred million. It became the fastest-growing consumer application in history.
ChatGPT was not the most capable AI system in the world at launch. Inside Google, Anthropic, and DeepMind, researchers had comparable models. What ChatGPT did was package one of those models with a clean chat interface, free access, and a conversational style that worked well for everyday users. It taught the world that AI could feel like a useful tool rather than a research curiosity.
The consequences were immediate. Microsoft invested ten billion dollars in OpenAI. Google launched Bard (later Gemini). Anthropic released Claude. Meta released its open-weight Llama models. A wave of AI startups and products followed within months. The public discourse around AI intensified to a pitch not seen since the 1950s. Regulators began drafting laws.
The four years since ChatGPT's launch have changed the technology industry as profoundly as the launch of the iPhone in 2007.
2024–2026: reasoning, multimodality, and agents
The current era, still unfolding, has three defining themes.
The first is reasoning models. Starting with OpenAI's o1 in late 2024, a new class of model emerged that spends extra compute deliberating before answering. These models crush traditional benchmarks on maths, code, and science. Claude, Gemini, and DeepSeek all shipped their own reasoning variants within months. By 2026, reasoning modes are standard across the frontier.
The second is multimodality. Models have become fluent in images, audio, and video, not just text. You can show a model a diagram and ask it to explain the bug; you can play it a guitar riff and ask it to suggest chords. The boundary between "AI for text" and "AI for everything" has dissolved.
The third is agents. AI systems that can plan, use tools, and complete multi-step tasks are moving from fragile research demos to production-grade software. Claude Code, Cursor, and the wave of vertical AI agents (for customer support, scheduling, sales research, software engineering) are reshaping what "using AI" means.
As of 2026, the frontier is advancing faster than any regulatory, cultural, or economic system was designed to handle. That is the defining tension of the current decade.
Parallel tracks: robotics, vision, and games
The narrative above follows the language-and-reasoning thread, but AI has progressed in parallel along several other axes worth knowing about.
Computer vision followed its own arc. Convolutional networks dominated from 2012 through around 2020, when transformer-based vision models (ViT, Swin Transformer) began to overtake them. Today's frontier multimodal models handle images and text natively in the same architecture.
Reinforcement learning and games delivered some of the most visible AI moments. DeepMind's AlphaGo beat Lee Sedol at Go in 2016 — a result most researchers had not expected for another decade. AlphaZero then generalised the approach to chess and shogi, learning at superhuman level from scratch. AlphaStar mastered StarCraft II in 2019. These results showed that reinforcement learning at scale could crack problems previously thought intractable.
Robotics has advanced more slowly but steadily. Progress in manipulation, locomotion, and end-to-end learned control (Boston Dynamics, Tesla Optimus, Figure) had produced credible humanoid prototypes by 2026. The sim-to-real gap, long a blocker, is narrowing as simulation environments become more realistic and as large models trained on demonstrations learn to generalise.
Scientific AI has been perhaps the quietest but most consequential thread. AlphaFold's 2020 breakthrough on protein structure prediction is widely regarded as one of the most important scientific results of the decade. Domain-specific models for materials science, weather forecasting, and mathematics are reshaping what research productivity looks like.
The people who shaped AI
A few names recur across the timeline, and they are worth knowing.
Alan Turing defined the field's philosophical question in 1950. Marvin Minsky and John McCarthy founded the field at Dartmouth in 1956. Geoffrey Hinton, Yann LeCun, and Yoshua Bengio, later jointly awarded the 2018 Turing Award, carried neural network research through the winters of the 1980s and 1990s and led the deep-learning revolution. Fei-Fei Li built ImageNet, the dataset that made AlexNet's 2012 breakthrough possible. Demis Hassabis and David Silver led the DeepMind teams behind AlphaGo and AlphaFold. Ilya Sutskever, Sam Altman, and the OpenAI team shipped GPT-3 and ChatGPT. Dario and Daniela Amodei founded Anthropic and pioneered constitutional AI. LeCun, at Meta, championed open-weight AI with Llama. Knowing these names makes reading AI news far easier; they are the recurring cast.
What the timeline teaches us
A few patterns repeat across seven decades.
Every wave is preceded by a quiet period where a specific technical idea (perceptrons, expert systems, statistical methods, deep learning, transformers) matures unnoticed. The "sudden" breakthrough is never sudden.
Every wave is followed by over-promising and eventual disappointment. The peak hype point is rarely the peak capability point. Expect the current wave to follow the same curve.
Every wave builds on the previous one. Today's transformers are not a rejection of 1950s perceptrons; they are a distant descendant of the same core idea.
And every wave produces durable, useful systems that quietly integrate into daily life. The expert systems boom left behind production rule engines that still run banks today. The statistical era left behind the recommendation engines that still drive Netflix and YouTube. The current wave will leave behind a durable layer of AI infrastructure that outlasts the hype cycle.
What is probably next
Predicting AI is a hazardous sport, but a few near-term trends look reasonably safe. Agents will get genuinely reliable at multi-step tasks, turning AI from a conversational tool into an automation layer. On-device AI will reach capability levels that make many cloud-based features unnecessary for privacy-sensitive users. Scientific and engineering AI will compound quietly, producing research productivity gains that show up in papers and patents rather than consumer apps. Regulation will ship unevenly across the world, creating compliance complexity that favours larger players. And inference costs will keep halving roughly every year, pulling frontier capability down to free and cheap tiers about eighteen months behind the flagship products. If you want to look smart in 2028, bet on these. Any one of them could be wrong; it would be surprising if most of them were.
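To put rough numbers on that cost claim, here is a toy compounding calculation; the starting price and the exact halving period are assumptions for illustration, not measurements.

```python
def projected_cost(cost_today: float, years: float, halving_time: float = 1.0) -> float:
    """Cost after `years` if prices halve every `halving_time` years."""
    return cost_today * 0.5 ** (years / halving_time)

# Assume $10 per million tokens today (illustrative figure only):
for year in range(4):
    print(f"year {year}: ${projected_cost(10.0, year):.2f} per million tokens")
# year 0: $10.00 / year 1: $5.00 / year 2: $2.50 / year 3: $1.25
```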
AI has progressed in bursts — rule-based, statistical, deep-learning, transformer, reasoning — each breakthrough making the previous paradigm look quaint. The next one is already quietly incubating somewhere.
The short version
AI has been a seven-decade project, not a two-year sensation. It went through two winters, a statistical revolution, a deep-learning breakout in 2012, a transformer architecture breakthrough in 2017, and a public inflection point in late 2022. The current reasoning-model, multimodal, agent-oriented era is the most capable phase yet, and it is being built on ideas that are decades old. Understanding this lineage makes the present less mystifying and the future considerably less surprising.