
The Open-Source LLM Power Shift: How Qwen, DeepSeek, and Mistral Changed Everything

Explore how open-source LLMs like Qwen, DeepSeek, Mistral, and Nemotron closed the gap with proprietary models in 2025-2026, reshaping AI's competitive landscape.

RayZ

In January 2025, the most-downloaded large language model on Hugging Face was not from Google, not from OpenAI, and not from any US lab; it was Qwen2.5-7B, from Alibaba Cloud. By mid-2025, the trend had become undeniable: across Hugging Face's download charts, the top ten models were dominated by open-weight releases from Chinese labs, European startups, and NVIDIA, while proprietary API providers watched their moats erode in real time. DeepSeek's models were pulling millions of downloads per month. Mistral held ground in Europe. And the old narrative that you needed a frontier proprietary model for serious work had quietly collapsed.

This is the story of how open-source LLMs rewrote the rules of the AI industry between 2023 and early 2026, who the key players are, what "open" actually means in this context, and what it all implies if you are building with these models today.

A Brief Timeline: From the First Leaks to the Present


To understand where we are, you need to see how fast this moved.

Early 2023: The open-source spark. The leak of Meta's Llama 1 weights in early 2023 catalyzes the open-source movement, proving that capable large language models can run outside of big labs. Stanford builds Alpaca on top of the leaked weights for under $600. The era of accessible large models begins, almost by accident.

Late 2023: Mistral enters. Mistral AI, the Paris-based startup, releases Mistral 7B under Apache 2.0, a genuinely open license. It outperforms models twice its size on most benchmarks. Then comes Mixtral 8x7B, a Mixture-of-Experts model that competes with GPT-3.5 Turbo. Europe has a horse in the race.

Early 2024: Qwen emerges. Alibaba's Qwen team, which had been quietly iterating, releases the Qwen1.5 series. The models are good. Surprisingly good for a lab that most Western researchers had been ignoring. Qwen1.5-72B matches or exceeds competing open models across multiple benchmarks, and the smaller variants punch well above their weight.

Mid-2024: The floodgates. DeepSeek releases DeepSeek-V2, introducing a novel Multi-head Latent Attention (MLA) mechanism and a fine-grained MoE architecture that slashes inference costs. Qwen2 arrives with strong multilingual performance. Mistral releases Mistral Large and Codestral. The open ecosystem is no longer trailing the frontier; it is approaching it.

September-December 2024: Qwen2.5 and the Chinese surge. Alibaba releases the Qwen2.5 family, spanning from 0.5B to 72B parameters, plus specialized coding and math variants. Qwen2.5-72B-Instruct achieves GPT-4-class performance on key benchmarks at a fraction of the compute cost. This is the release that puts Alibaba's Qwen lab firmly on the map: before Qwen2.5, most practitioners outside Asia treated the team as an afterthought; afterward, ignoring it was no longer an option. DeepSeek releases DeepSeek-V2.5, merging its chat and coder models. The center of gravity in open-source AI visibly shifts eastward.

January 2025: DeepSeek-R1 shocks the world. DeepSeek releases DeepSeek-R1, a reasoning model that matches OpenAI's o1 on math and coding benchmarks. Trained with reinforcement learning and chain-of-thought, it demonstrates that frontier-level reasoning is achievable in open models. The release sends shockwaves through the industry and briefly affects tech stock prices. It also comes with controversy: OpenAI publicly accuses DeepSeek of distilling from its models, and rumors circulate about unauthorized access to proprietary data. Whether those claims hold water or not, the model is out there, and it works.

Early-Mid 2025: Qwen3 and the new normal. Alibaba releases QwQ-32B (a reasoning model) and then the Qwen3 series, which lands as a genuine turning point. The MoE flagship, Qwen3-235B-A22B, handles multi-step reasoning, agentic tool use, and complex coding tasks that six months earlier were the exclusive territory of GPT-4o and Claude, all while activating only 22B parameters per token. The 32B variant in particular becomes a go-to for teams that need serious capability without paying API prices. Alibaba also releases Qwen3-Coder, a 480B-parameter coding specialist that goes head-to-head with the best proprietary code generation APIs. For the first time, "just use the open model" stops being a compromise and starts being the obvious call for a wide range of production workloads.

Late 2025-Early 2026: Qwen 3.5 and the maturation. Alibaba releases Qwen 3.5, further extending its lead in the open-weight space with improved reasoning, coding, and multilingual capabilities across every size tier. Mistral ships its biggest release yet with Mistral Large 3 (675B) in December 2025, followed by Mistral Small 4 in March 2026. NVIDIA rolls out the Nemotron 3 family. After over a year of Chinese labs dominating the open-weight leaderboards, the West is finally clawing back some ground. The open-source ecosystem has fully matured.

The Qwen Story: From Underdog to Benchmark Leader

If there is a single storyline that captures the open-source LLM shift, it is Alibaba's Qwen.

In 2023, when Western AI circles were fixated on whichever US lab had the latest release, Alibaba Cloud's Qwen team was building methodically. Their early models were competent but unremarkable. The Qwen1.5 series in early 2024 got people's attention. By Qwen2.5 in late 2024, they had the attention of every serious ML engineer on the planet.

What Made Qwen2.5 Special

The Qwen2.5 release was not a single model but a comprehensive family. The lineup included models at 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, plus Qwen2.5-Coder and Qwen2.5-Math specialist variants. Key factors behind its impact:

  • Training data scale and quality. Qwen2.5 was trained on up to 18 trillion tokens of carefully curated data, with strong multilingual coverage across 29+ languages. The data pipeline emphasized quality filtering at a level that matched or exceeded any competing lab.
  • Instruction tuning depth. The instruct variants benefited from extensive RLHF and DPO training, making them unusually capable at following complex instructions out of the box.
  • The 72B sweet spot. Qwen2.5-72B hit an extraordinary performance-to-cost ratio. On benchmarks like MMLU, HumanEval, GSM8K, and MATH, it competed with models nearly six times its size. For enterprises doing inference cost calculations, this was transformative.
  • Specialist variants that actually worked. Qwen2.5-Coder-32B-Instruct became arguably the best open-source coding model available, rivaling proprietary code-generation APIs on SWE-bench and HumanEval+.

Qwen3 and the Reasoning Push

Following DeepSeek-R1's demonstration that open models could achieve frontier reasoning, Alibaba responded aggressively. QwQ-32B, released in early 2025, was their first dedicated reasoning model, featuring extended chain-of-thought capabilities. The subsequent Qwen3 series expanded this across their full model lineup, offering both "thinking" and standard modes. Qwen3 also introduced a broader MoE variant lineup, signaling that Alibaba was not just keeping pace but actively pushing the frontier of what open models could do.

The Qwen story matters because it proved that a non-US lab could compete at the absolute highest level of model development, release those models openly, and build a thriving ecosystem around them. Qwen models now power thousands of applications in Asia and increasingly in the West.

DeepSeek: The Architecture Innovators

If Qwen won on breadth and polish, DeepSeek won on radical architectural innovation, changing what the community thought was possible.

DeepSeek-V2: Rethinking Attention and MoE

DeepSeek-V2, released in mid-2024, introduced two key innovations:

  1. Multi-head Latent Attention (MLA). Instead of standard multi-head attention with its large KV-cache overhead, MLA compresses key-value pairs into a low-rank latent space. This dramatically reduces memory requirements during inference (by roughly 90% compared to standard MHA), making long-context generation feasible on more modest hardware (sketched in code after this list).
  2. Fine-grained MoE with shared experts. DeepSeek's MoE implementation used 160 experts with a fine-grained routing strategy, plus shared experts that are always active. This gave the model 236B total parameters but only ~21B active per token, achieving a remarkable balance of capability and efficiency.
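
To make the MLA idea concrete, here is a minimal PyTorch sketch of the core mechanism: keys and values are compressed through a low-rank latent, and only that latent is cached between decoding steps. The dimensions are invented, and real MLA includes details (decoupled rotary embeddings, attention masking) omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Sketch of MLA's core trick: cache one small latent per token
    instead of full per-head keys and values. Sizes are illustrative."""

    def __init__(self, d_model=4096, n_heads=32, d_latent=512):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress KV to latent
        self.k_up = nn.Linear(d_latent, d_model)     # expand latent to keys
        self.v_up = nn.Linear(d_latent, d_model)     # expand latent to values
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)                     # (B, T, d_latent)
        if latent_cache is not None:                 # decoding: extend cache
            latent = torch.cat([latent_cache, latent], dim=1)
        L = latent.shape[1]
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, L, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, L, self.n_heads, self.d_head).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)  # mask omitted for brevity
        out = out.transpose(1, 2).reshape(B, T, -1)
        # Only `latent` is cached: d_latent floats per token instead of
        # 2 * d_model (keys plus values), ~16x smaller at these sizes.
        return self.o_proj(out), latent
```

The cache savings fall out of the dimensions: each token stores d_latent values instead of keys plus values across every head, which is where the order-of-magnitude KV-cache reduction comes from.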

The result was a model with roughly one-tenth the per-token inference cost of comparably performing dense models. DeepSeek essentially proved that you could have GPT-4-class capability at GPT-3.5-class costs, and then released it to the world.

DeepSeek-R1: The Reasoning Breakthrough

DeepSeek-R1, released in January 2025, was arguably the single most consequential open model release in the history of the movement. It demonstrated that:

  • Reinforcement learning (specifically, GRPO, or Group Relative Policy Optimization) applied to a strong base model could produce chain-of-thought reasoning rivaling OpenAI's o1 (see the sketch after this list).
  • This capability could be distilled into smaller models (1.5B to 70B) that retained meaningful reasoning ability.
  • Frontier AI capabilities were not the exclusive domain of well-funded US labs with proprietary training pipelines.
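
A rough sketch of the group-relative advantage computation at GRPO's core, as described in the R1 report: sample several responses per prompt, score them, and normalize each reward against its own group, removing the need for a learned value function. The reward values and group size here are toy stand-ins.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: normalize each sampled response's
    reward against the mean/std of its own group for the same prompt,
    instead of training a separate value network.
    rewards: (num_prompts, group_size) scalar rewards."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

# Toy example: 8 sampled answers to one math prompt, reward 1 if the
# final answer checks out. Correct samples get positive advantage and
# are reinforced; incorrect ones are pushed down.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))
```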

The paper accompanying R1 was unusually transparent, detailing the training process, reward shaping, and the "aha moment" where the model spontaneously developed self-verification behavior during RL training. This openness accelerated the entire field's understanding of reasoning model training.

NVIDIA Nemotron: The American Entry

For most of the open-source LLM race, US companies were either keeping their best models proprietary (OpenAI, Google, Anthropic) or releasing solid but unspectacular open weights (Meta's Llama). NVIDIA changed that calculus with the Nemotron 3 family, which started rolling out in December 2025 and continued through early 2026.

The Nemotron 3 Lineup

The family uses a hybrid Mamba-Transformer MoE architecture, with three tiers:

  • Nemotron 3 Nano (30B total, 3B active per token) — released December 2025. Optimized for cost-efficient agentic tasks like debugging, summarization, and retrieval.
  • Nemotron 3 Super (120B total, 12B active per token) — released early 2026. A high-throughput reasoning model aimed at multi-agent workflows.
  • Nemotron 3 Ultra (500B total, 50B active per token) — announced but not yet released at the time of writing (April 2026). Intended for complex reasoning at scale.

All models support up to 1M token context and are optimized for NVIDIA's TensorRT-LLM inference stack, meaning they run best on the GPUs that already dominate the market. NVIDIA releases them under the Nemotron Open Model License, which is commercially permissive but requires attribution.

Where Nemotron Stands

Let's be honest: Nemotron 3 is not beating Qwen on accuracy. Comparing apples to apples at the ~120B tier, Qwen3.5-122B-A10B outscores Nemotron 3 Super on knowledge and reasoning benchmarks like GPQA (86.6% vs 79.2%). Across most accuracy-focused evals, Qwen maintains a consistent lead. Where Nemotron does shine is throughput and long-context handling; Super delivers up to 7.5x faster inference than Qwen3.5-122B, and holds up remarkably well at 1M token contexts. If you're running multi-agent systems at scale where speed matters more than peak accuracy, that's a real advantage.

But the real significance of Nemotron 3 is not the benchmarks. It's that there is finally a credible American horse in the open LLM race. For enterprises that have compliance concerns about deploying Chinese-origin models, or that operate in regulated industries where the provenance of AI models matters, Nemotron gives them a viable US-based alternative. That alone makes it strategically important, even if Alibaba's Qwen team continues to stay a step ahead on raw capability.

The shift to MoE architecture is what makes large open models practical for real-world deployment. MoE models that only activate a fraction of their total parameters per token offer the knowledge capacity of a very large model at the inference cost of a relatively small one. That's a key insight that DeepSeek, Mistral, and NVIDIA have all embraced.
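
To see why, treat per-token compute as roughly proportional to active parameters (a simplification) and plug in the Qwen3-235B-A22B figures cited earlier:

```python
# MoE economics, roughly: compute per token scales with *active*
# parameters, while memory to hold the weights scales with *total*.
total_params = 235e9   # Qwen3-235B-A22B total parameter count
active_params = 22e9   # parameters activated per token

compute_vs_dense = active_params / total_params
print(f"Per-token FLOPs vs. an equal-capacity dense model: {compute_vs_dense:.0%}")
# => ~9%: near-order-of-magnitude cheaper inference for the same capacity,
# at the price of holding all 235B parameters in memory for routing.
```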

Mistral: Europe's Standard-Bearer

Mistral AI occupies a unique position in the open-source LLM landscape. The Paris-based company, founded by former Meta and Google DeepMind researchers, has consistently punched above its weight.

Key Mistral contributions:

  • Mistral 7B (September 2023): Released under Apache 2.0, it proved that a well-trained 7B model could outperform models twice its size. The sliding window attention mechanism it popularized was widely adopted.
  • Mixtral 8x7B (December 2023): One of the first widely available MoE language models, demonstrating the architecture's viability for the open community.
  • Mistral Large and Medium (2024): API-first models that showed Mistral could compete commercially, with Mistral Large upgraded multiple times through the year.
  • Mistral Medium 3 and Magistral (2025): Medium 3 (May 2025) pushed efficiency further, while the Magistral series (June 2025) marked Mistral's entry into chain-of-thought reasoning models.
  • Mistral Large 3 and Ministral 3 (December 2025): The big leap. Mistral Large 3, a 675B MoE with 41B active parameters, landed as one of the strongest open-weight models available. Ministral 3 filled out the small end with dense 3B, 7B, and 14B models.
  • Mistral Small 4 (March 2026): A 119B MoE activating just 6B per token, combining instruction following, reasoning, vision, and coding in a single model. Punches well above its active parameter count.

Mistral's role has been less about scale and more about efficiency and openness. They were the first major lab to use a genuinely permissive license (Apache 2.0) for a competitive model, setting a standard that pressured other labs. They also championed the idea that European AI sovereignty required open models, a political argument that has gained traction with EU policymakers.

What "Open" Actually Means: The License Debate

One of the most contentious issues in the open-source LLM space is what "open" actually means. The term is applied to models with wildly different levels of openness, and the distinction matters enormously for enterprises and researchers.

The Spectrum of Openness

Fully open source (OSI-compliant):

  • Model weights, training code, training data, and evaluation code are all available under OSI-approved licenses.
  • Examples: Very few models qualify. AI2's OLMo family, which releases weights, training code, and training data together, is the clearest case.

Open-weight with permissive license:

  • Model weights released under Apache 2.0, MIT, or similar licenses. Training code and data may not be included.
  • Examples: Mistral 7B, Mixtral (Apache 2.0); DeepSeek models (MIT license); NVIDIA Nemotron (permissive NVIDIA Open Model License).

Open-weight with custom license:

  • Model weights available for download, but under a bespoke license with restrictions.
  • Examples: Qwen models (Qwen license for larger variants, Apache 2.0 for smaller ones); some NVIDIA models with specific use-case terms.

Open-weight with significant restrictions:

  • Weights available but with limitations on commercial use, modification, or specific use cases.
  • Examples: Some early Qwen releases, various research-only models.

Why It Matters

For enterprises, the license question is not academic:

  • Can you fine-tune and deploy commercially? Under Apache 2.0 (Mistral, DeepSeek) or MIT, yes, without restriction. Under the Qwen license, yes, with some terms for very large-scale deployments. Under NVIDIA's license, yes, with enterprise-friendly terms.
  • Can you use outputs to train other models? Most permissive licenses (Apache 2.0, MIT) allow this freely. Some custom licenses have specific terms around synthetic data generation.
  • Are you subject to geographic restrictions? Some licenses include jurisdiction-specific terms. This is particularly relevant given US-China trade tensions.
  • Does the license hold up legally? This remains an open question. Model weights occupy an uncertain legal space. They are arguably neither traditional software nor creative works, and copyright law has not fully caught up.

The Open Source Initiative (OSI) released its Open Source AI Definition in late 2024, which requires access to training data and code in addition to weights. Under this definition, most "open-source" LLMs are not actually open source. They are open-weight. This distinction has generated significant debate, but the practical reality is that for most users, open-weight access is sufficient to fine-tune, deploy, and build upon these models.

The China Factor

The most geopolitically significant development in the open-source LLM space is the rise of Chinese labs to parity (and in some cases, superiority) with their US counterparts in open model releases.

How Chinese Labs Outpaced US Open-Source

Several factors drove this:

  1. Massive investment. Chinese tech giants (Alibaba, Baidu, Tencent, ByteDance) and startups (DeepSeek, 01.AI, Zhipu) poured resources into foundation model development, backed by national AI strategy goals.
  2. Talent density. Top Chinese universities produce a large share of the world's AI researchers, and many who trained at US institutions returned to China to lead model development efforts.
  3. Training efficiency innovations. DeepSeek, in particular, demonstrated that architectural innovation could compensate for hardware constraints. Their models achieved remarkable performance with reportedly lower training compute budgets than comparable US models.
  4. Strategic openness. By releasing models openly, Chinese labs built global adoption and community goodwill, making their models the default choice for many developers worldwide. This is a form of soft power that has not gone unnoticed by policymakers.
  5. Data advantages in specific domains. For multilingual capabilities, especially across Asian languages, Chinese labs had natural advantages in data collection and quality assessment.

The Export Control Shadow

This story has a geopolitical backdrop: US export controls on advanced AI chips (A100, H100, and successors) to China, imposed starting in October 2022 and progressively tightened. These restrictions were intended to slow Chinese AI development. Instead, they appear to have accelerated architectural innovation, as Chinese labs optimized their models to do more with less compute. DeepSeek-V2's efficiency innovations were arguably born of necessity.

The irony is stark: export controls designed to maintain US AI superiority may have catalyzed the very efficiency breakthroughs that made Chinese open-source models globally competitive.

Model Size Tiers: What to Use When

One of the most practical questions facing practitioners is which model size to use for a given task. The open-source ecosystem now covers an extraordinary range, from sub-1B parameter models that run on smartphones to 400B+ parameter MoE models that require multi-GPU clusters.

Tier Comparison Table

| Tier | Size Range | Example Models | Hardware Needed | Best For | Typical Performance |
|------|------------|----------------|-----------------|----------|---------------------|
| Nano | 0.5B - 1.5B | Qwen 3.5-0.5B, Qwen 3.5-1.5B | Phone / Raspberry Pi | On-device, simple classification, basic extraction | ~40-50% MMLU |
| Micro | 3B - 4B | Qwen 3.5-3B, Phi-3.5-mini | Laptop CPU, edge devices | Summarization, simple Q&A, mobile assistants | ~55-63% MMLU |
| Small | 7B - 8B | Qwen 3.5-8B, Mistral 7B, DeepSeek-R1-Distill-7B | Single consumer GPU (8-16GB VRAM) | General chat, code completion, RAG applications | ~65-75% MMLU |
| Medium | 14B - 32B | Qwen 3.5-14B, Qwen 3.5-32B, QwQ-32B, DeepSeek-R1-Distill-32B | Single prosumer GPU (24-48GB VRAM) | Complex reasoning, professional coding, multilingual | ~75-83% MMLU |
| Large | ~100B+ total | Qwen3.5-122B-A10B, Nemotron 3 Super (120B), Mistral Small 4 (119B) | 2-4 GPUs (80GB+ each) or quantized on 1x H100 | Enterprise applications, complex analysis, near-frontier quality | ~83-88% MMLU |
| XL (MoE) | 200B+ total | DeepSeek-V3 (671B), Mistral Large 3 (675B), Qwen3.5 397B | 2-8 GPUs depending on model | Maximum capability, multi-turn reasoning, complex agentic tasks | ~87-92% MMLU |

Note: MMLU scores are approximate and vary by exact model version and evaluation methodology. They serve as a rough capability proxy, not a definitive ranking.

Practical Guidance

  • If you need to run on-device or at extreme scale with minimal cost: Start at the Nano/Micro tier. Qwen 3.5-3B is remarkably capable for its size after good instruction tuning.
  • If you want the best single-GPU experience: The 7B-8B tier remains the sweet spot. Qwen 3.5-8B runs comfortably on a consumer RTX 4090 with quantization.
  • If you need strong reasoning without a GPU cluster: The 32B tier is the new sweet spot for quality. QwQ-32B and Qwen 3.5-32B offer outstanding reasoning at a manageable size.
  • If you need near-frontier quality: Qwen3.5-122B, Mistral Small 4 (119B), and similar models deliver performance that, for most tasks, is indistinguishable from early GPT-4.
  • If you need maximum capability and have the hardware: The MoE models (DeepSeek-V3, Mistral Large 3, Qwen3.5 397B) offer the best absolute performance in the open ecosystem.

Running these models locally or at scale? See LLM Inference Optimization for a detailed guide on quantization, batching, and serving strategies.

The Ecosystem: Infrastructure That Made It Real

Open models would be academic curiosities without the infrastructure to run them. The ecosystem that has developed around open-source LLMs is arguably as important as the models themselves.

Hugging Face: The GitHub of AI

Hugging Face has become the central platform for open model distribution. Key developments:

  • Transformers library: The standard interface for loading and running open models, with support for virtually every architecture released (see the snippet after this list).
  • Hub hosting: Over 1 million models hosted as of early 2025, with robust versioning, model cards, and community discussion.
  • GGUF format standardization: Hugging Face's embrace of quantized formats (particularly GGUF, used by llama.cpp) made it easy for anyone to find deployment-ready model files.
  • Leaderboards: The Open LLM Leaderboard became the de facto benchmark comparison tool, despite ongoing debates about benchmark gaming and saturation.
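
In practice, the Transformers loading flow looks like this. A minimal sketch using a current Qwen repo id (substitute any open checkpoint); the 4-bit quantization config assumes an NVIDIA GPU and the bitsandbytes package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-7B-Instruct"  # any open checkpoint works here

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place layers across available devices
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # shrink VRAM needs
)

messages = [{"role": "user", "content": "Explain mixture-of-experts briefly."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```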

Ollama: AI for Everyone

Ollama transformed local model deployment from a multi-step technical process into a single command. Running ollama run qwen3.5 on a MacBook became as simple as opening a web browser. By abstracting away quantization, memory management, and model loading, Ollama brought open models to hundreds of thousands of developers who would never have touched a CUDA toolkit.
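
Ollama also exposes a local REST API (port 11434 by default), so the same model is scriptable. A minimal sketch, assuming you have already pulled a model; the qwen3.5 tag mirrors the article's example, so substitute whatever ollama list actually shows on your machine.

```python
import json
import urllib.request

# Talk to a locally running Ollama server on its default port.
payload = {
    "model": "qwen3.5",  # illustrative tag; use one from `ollama list`
    "prompt": "Explain mixture-of-experts in one sentence.",
    "stream": False,     # return a single JSON object, not a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```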

vLLM: Production-Grade Serving

For production deployments, vLLM became the standard serving engine. Its PagedAttention mechanism, continuous batching, and support for tensor parallelism made it possible to serve open models at throughputs that rivaled proprietary API providers. vLLM's support for new architectures (including MoE models) has been critical in making cutting-edge open models production-ready.
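
For offline batch inference, the vLLM Python API is a few lines. A minimal sketch; the model id and sampling settings are placeholders to adapt to your hardware.

```python
from vllm import LLM, SamplingParams

# Load any Hugging Face causal LM that fits your GPUs; raise
# tensor_parallel_size to shard larger models across devices.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", tensor_parallel_size=1)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Summarize the benefits of continuous batching in two sentences."],
    params,
)
print(outputs[0].outputs[0].text)
```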

The Broader Stack

  • llama.cpp: CPU-optimized inference that runs on everything from servers to Raspberry Pis. The project's GGUF quantization format became a community standard.
  • SGLang: A high-performance serving framework with RadixAttention for efficient prefix caching.
  • TensorRT-LLM: NVIDIA's optimized inference library, essential for squeezing maximum performance from GPU deployments.
  • ExLlamaV2: Specialized in extreme quantization (down to 2-3 bits) while maintaining quality, enabling large models on consumer hardware.

Implications for Enterprises

The open-source LLM shift has profound implications for how organizations approach AI adoption. The decision framework has fundamentally changed.

When to Choose Open Models

Choose open models when:

  • Data privacy is critical. Running models on your own infrastructure means data never leaves your environment. For healthcare, finance, legal, and government applications, this can be decisive.
  • Cost predictability matters. API pricing can be volatile and scale unpredictably. Self-hosted models have fixed infrastructure costs.
  • You need fine-tuning control. Open models can be fine-tuned on domain-specific data. Fine-tuning open models is now accessible to anyone (a topic for our upcoming LoRA guide).
  • You want to avoid vendor lock-in. Building on open models means you can switch between providers, hosting options, and model versions freely.
  • Latency requirements are strict. Self-hosted models on local GPUs eliminate network round-trips.

Consider proprietary APIs when:

  • You need absolute frontier capability. For the most demanding tasks (complex multi-step reasoning, highly nuanced creative writing, cutting-edge multimodal understanding), the top proprietary models still hold a slight edge.
  • You lack ML infrastructure expertise. Running open models well requires DevOps and ML engineering knowledge.
  • You need rapid iteration without infrastructure overhead. API-first development is faster for prototyping.

The Hybrid Approach

Most sophisticated organizations are adopting a hybrid strategy: proprietary APIs for the hardest tasks and prototyping, open models for production workloads where they meet quality bars. This approach optimizes both cost and capability. The key insight is that the "quality bar" open models can meet has risen dramatically: what required GPT-4 in 2023 can often be handled by a well-tuned Qwen 3.5-32B in 2026.
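
In code, a hybrid policy can be as mundane as a routing function. Everything below is hypothetical; production routers typically use task type, evaluator scores, or a learned classifier rather than keyword matching.

```python
# Hypothetical hybrid router: default to the self-hosted open model,
# escalate to a proprietary API only for frontier-difficulty requests.
FRONTIER_HINTS = ("formal proof", "novel research", "multimodal analysis")

def route(prompt: str) -> str:
    """Return which backend should handle this request."""
    if any(hint in prompt.lower() for hint in FRONTIER_HINTS):
        return "proprietary-api"        # pay for peak capability
    return "open-model-self-hosted"     # meets the quality bar at lower cost
```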

Build vs. Buy Calculus

A rough framework: above roughly 10 million tokens per day on a sustained workload, the economics of self-hosting open models almost certainly beat API pricing; the arithmetic is sketched below. Below that threshold, it depends on your team's infrastructure expertise, latency requirements, and data sensitivity constraints.
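
Here is that calculation with every number loudly assumed; swap in your actual API quote and GPU pricing.

```python
# Build-vs-buy back-of-the-envelope. Every figure below is an assumption.
tokens_per_day = 10_000_000
api_price_per_1m_tokens = 10.00   # assumed blended $/1M tokens, frontier tier
gpu_hour_price = 2.50             # assumed on-demand price, one 80GB GPU

api_monthly = tokens_per_day / 1e6 * api_price_per_1m_tokens * 30
self_host_monthly = gpu_hour_price * 24 * 30   # one GPU, always on

print(f"API spend:    ${api_monthly:,.0f}/month")        # $3,000
print(f"Self-hosting: ${self_host_monthly:,.0f}/month")  # $1,800
# Self-hosting wins here (ignoring engineering time, the real hidden cost),
# and the gap widens with volume: GPU cost is flat, API cost scales linearly.
```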

What Comes Next

Several trends are shaping the next phase of open-source LLMs:

  1. MoE becomes the default. Dense models are increasingly giving way to MoE architectures at larger scales. DeepSeek-V3 and Mixtral demonstrated the viability of this approach, and Qwen, NVIDIA, and others are following suit across their full lineups.
  2. Reasoning as a standard feature. Following DeepSeek-R1 and QwQ-32B, "thinking" modes with chain-of-thought reasoning are becoming standard offerings alongside traditional instruction-following modes.
  3. Multimodal expansion. Open vision-language models, audio models, and video understanding models are following the same trajectory that text LLMs traced 18 months earlier. The gap with proprietary multimodal models is closing fast.
  4. Smaller models get smarter. The 1B-3B tier is becoming surprisingly capable through distillation from larger models, better training data, and architectural improvements. On-device AI is increasingly viable.
  5. Agentic capabilities. Open models are being optimized for tool use, code execution, and multi-step planning, the capabilities needed for AI agent frameworks.
  6. Geopolitical fragmentation risk. US-China tensions could lead to ecosystem splits, export control expansion, or licensing restrictions that fragment the global open-source model community. This is the most significant downside risk to the open ecosystem.

Key Takeaways

  • The gap has closed. For the vast majority of production use cases, open-weight models now match or approach proprietary model performance. The days when "open source" meant "significantly worse" are over.
  • The center of gravity shifted East. Alibaba's Qwen and DeepSeek have emerged as genuine leaders in open-source AI, not just fast followers. Chinese labs are setting the pace in architecture innovation and training efficiency.
  • MoE changes the economics. Mixture-of-Experts architectures make large-capacity models practical to deploy, and this architectural shift is reshaping what "model size" even means.
  • "Open" is a spectrum. From Apache 2.0 (Mistral, DeepSeek) to custom licenses (Qwen, NVIDIA), the actual freedoms you get vary significantly. Read the license before building your business on a model.
  • The ecosystem is the moat. Hugging Face, Ollama, vLLM, and the surrounding toolchain have matured to the point where deploying open models is a solved problem, not a research project.
  • Hybrid strategies win. The most effective approach for enterprises is not "open vs. proprietary" but rather using each where it makes sense: open models for cost-effective production workloads, proprietary APIs for frontier-difficulty tasks.
  • This is not slowing down. The pace of open model releases has accelerated, not plateaued. Every quarter brings models that would have been considered frontier six months earlier.

The open-source LLM revolution is not a future event. It has already happened. The question for practitioners is no longer whether open models are good enough, but how to best leverage an embarrassment of riches.