Layer L2

Models

Intelligence refinement. Rent early, build custom at scale.

Why it matters

Foundation models are the smelters: expensive to run, and only a few players can operate them at scale. Once refined, the gold is a commodity, which is why model providers need to move up the chain.

The Smelter & Refinery

Raw ore becomes pure gold through smelting. In AI: foundation, specialized, and reasoning models refine raw data into intelligence. Refining is expensive and only a few can do it at scale, but once refined, the gold is a commodity.

The 5 sublayers

L2a

Foundation & Multimodal Models

Large pre-trained generalists (GPT, Claude, Gemini, Llama), plus vision-language-action and video models (Sora, Veo), spanning text, image, audio, and motion

L2b

Specialized & Fine-Tuned Models

Domain-tuned, distilled, and PEFT/LoRA-adapted models for specific verticals or tasks (BloombergGPT, Med-PaLM, Codestral)

L2c

Embedding & Retrieval

Vector representations, search indices, reranking, and RAG infrastructure

L2d

Model Routing & Composition

Selecting, chaining, ensembling, or mixture-of-experts routing across multiple models per task to balance cost, latency, and quality

L2e

Reasoning & World Models

Extended chain-of-thought, planning, and multi-step inference, plus predictive world models (V-JEPA, Genie, Sora-as-simulator) that let agents and robots imagine outcomes before acting
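
The cost, latency, and quality trade-off named under L2d can be illustrated with a minimal sketch. It assumes a simple rule that the source does not prescribe: pick the cheapest model that clears a quality floor and a latency budget. The model names, prices, latencies, and quality scores are hypothetical placeholders, not measurements of any real model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model label
    cost_per_1k_tokens: float  # illustrative price in USD
    latency_ms: float          # illustrative typical latency
    quality: float             # illustrative score in [0, 1]


def route(options: Sequence[ModelOption],
          min_quality: float,
          max_latency_ms: float) -> Optional[ModelOption]:
    """Return the cheapest option meeting both constraints, or None."""
    eligible = [
        m for m in options
        if m.quality >= min_quality and m.latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Hypothetical catalog spanning three of the sublayers described above.
CATALOG = [
    ModelOption("small-distilled", 0.10, 200, 0.60),
    ModelOption("mid-generalist", 0.50, 600, 0.80),
    ModelOption("frontier-reasoner", 3.00, 4000, 0.95),
]

choice = route(CATALOG, min_quality=0.7, max_latency_ms=1000)
print(choice.name if choice else "no model fits")
```

Real routers typically add fallbacks, per-task evaluation, and chaining; this sketch covers only the selection step.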

Layer diagnostic card · SCOI v1

Is a company really at L2?

The smelter: foundation, specialized, and reasoning models that refine data into general intelligence.

Inclusion tests · include if ALL

Trains models from scratch (or substantively post-trains with proprietary L1 data).
Owns model weights and can ship without third-party model licenses.
Compute spend is the dominant cost line.

Exclusion tests · exclude if ANY

Calls a closed-source API and fine-tunes prompts. That is L7, not L2.
Distills or wraps another lab's open weights with no novel training.
Runs RAG over a model it doesn't own. That is L2c at most, usually L5c.

The L2 removal test

Remove your in-house model and substitute the best public foundation model. If the product is unchanged, you are not at L2.
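
The card's logic (include if ALL, exclude if ANY, then the removal test) can be written as a small predicate. This is a sketch under stated assumptions: the field names are hypothetical, and treating the removal test as one more disqualifier is an interpretation of the card, not something it spells out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyProfile:
    # Inclusion tests (all must hold). Field names are hypothetical.
    trains_or_substantively_post_trains: bool
    owns_weights_without_third_party_license: bool
    compute_is_dominant_cost: bool
    # Exclusion tests (any one disqualifies).
    calls_closed_api_and_tunes_prompts: bool
    only_distills_or_wraps_open_weights: bool
    rag_over_unowned_model: bool
    # Removal test: is the product unchanged after swapping in the
    # best public foundation model?
    product_unchanged_after_swap: bool


def is_l2(p: CompanyProfile) -> bool:
    """Apply the card: ALL inclusions, no exclusions, passes removal test."""
    included = all([
        p.trains_or_substantively_post_trains,
        p.owns_weights_without_third_party_license,
        p.compute_is_dominant_cost,
    ])
    excluded = any([
        p.calls_closed_api_and_tunes_prompts,
        p.only_distills_or_wraps_open_weights,
        p.rag_over_unowned_model,
    ])
    return included and not excluded and not p.product_unchanged_after_swap


# Two illustrative profiles, not assessments of real companies.
frontier_lab = CompanyProfile(True, True, True, False, False, False, False)
api_wrapper = CompanyProfile(False, False, False, True, False, False, True)

print(is_l2(frontier_lab), is_l2(api_wrapper))
```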

Economic work this layer does

Converts raw data + compute into generalized capability that downstream layers can rent per token.
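
One way to read "rent per token" alongside "rent early, build custom at scale" is as a break-even calculation. The sketch below assumes a deliberately simple cost model that the source does not state: a one-time build cost, a flat rental price per million tokens, and a flat marginal cost for running an in-house model. Every figure is a hypothetical placeholder.

```python
from typing import Optional


def break_even_million_tokens(fixed_build_cost: float,
                              rent_price_per_m: float,
                              own_cost_per_m: float) -> Optional[float]:
    """Volume (in millions of tokens) at which building matches renting.

    Returns None when running your own model is not cheaper per token,
    because in that case no volume ever pays back the build cost.
    """
    saving_per_m = rent_price_per_m - own_cost_per_m
    if saving_per_m <= 0:
        return None
    return fixed_build_cost / saving_per_m


# Hypothetical figures: $5M to build, $10 per million tokens rented,
# $2 per million tokens to serve an in-house model.
volume = break_even_million_tokens(5_000_000, 10.0, 2.0)
print(volume)
```

Below the break-even volume, renting is cheaper under these assumptions; above it, the custom model starts to pay for itself.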

Canonical examples

OpenAI
Trains frontier models; charges per token; absorbs L7 wrappers structurally.
Anthropic
Frontier models plus L3 trust posture for regulated buyers.
Google DeepMind
Frontier models tied to L0/L4 distribution: a fortress, not a pure L2.

Anti-examples · look-alikes that fail

Most 'foundation model' startups
Fine-tune someone else's base model. L2b at best, with no frontier compute.
Open-source distillers
Weights ship to everyone; by Law I, no margin lasts.
Companies pitching "our model"
Quietly using GPT-4 underneath. L7, not L2.


Who's playing here

OpenAI, Anthropic, Google DeepMind, Meta AI

Verdict: Winner-take-most. Commodity risk high.

Case studies touching L2

Gamma at $2.1B: The Thin-Layer Graveyard in Real Time

Presentation generation lives at L7b, a single thin slice of the stack. Claude, Copilot, and Gemini now do it for free inside surfaces 100× larger than Gamma's. The Intelligence Cube predicted this before the market priced it in: when your entire product is one prompt away from being free inside an L4 you don't own, the valuation is a liability, not a moat.

Stack Overflow: When Your Community Becomes Training Data

Stack Overflow's traffic dropped roughly 35–50% after ChatGPT shipped. Fifteen years of community-built knowledge, packaged as L7b content and scraped into L2 training sets. The community that built the data captured none of the value; the model layer captured all of it. A textbook case of L1 data mis-packaged as L7 content.

Stability AI vs Midjourney: Why Open-Source L2 Couldn't Monetize

Stability AI open-sourced Stable Diffusion and watched the L2a it created become free infrastructure for everyone *except* Stability. Midjourney kept the model closed, built an obsessive Discord community, and compounded aesthetic memory at L8c. Same underlying technology, opposite layer architecture, 100× valuation gap. The cleanest L2-vs-L8 lesson in the open-vs-closed model debate.

McKinsey + OpenAI (Lilli): When the Consulting Firm Owns the Memory, Not the Model

McKinsey didn't build a model. It built Lilli, an internal assistant trained on 100,000+ McKinsey documents, 70 years of proprietary studies, and the firm's named expert network. OpenAI provides L2. McKinsey owns L1 (the IP) and L8 (the firm's institutional memory). The consultant doesn't get disrupted by the model; the consultant rents the model and keeps the moat.
