Differentiation in SaaS AI
A lot has happened in the last few months: the field has advanced (new models, new choices, lower prices), several companies have "inflected," and some things (CUDA dominance) have stayed the same.
What has remained elusive, however, is where differentiation will ultimately lie, specifically among startups.
In the interest of minimizing noise, I will skip the most common themes that other people have already discussed at length: inference vs. training hardware, open-source vs. closed-source models, large vs. small models, the various picks and shovels, and fine-tuning (which remains an unsolved problem for application developers). All of these areas have one thing in common: they require a lot of money and are thus mostly suitable for larger players.
The question is: where, if anywhere, is there a differentiated opportunity? (And by differentiated, I mean something beyond dumping a ton of money into a seed-stage company against a deck and a team, which can be differentiating on its own, or, more likely, not.)
Let’s look at the basic structure of an LLM-first SaaS app. What does it need?
Access to a set of competent LLMs of different sizes at a reasonable cost and speed (many choices, nothing super differentiated).
A RAG system (e.g., pgvector): not differentiated, especially with larger context windows, where the emphasis on retrieval accuracy is somewhat diminished.
An orchestration infrastructure: there is basically a classic flow of intent detection, query rewriting, summarization, output checking, and correction (a minimal sketch follows this list). These frameworks will evolve, and there will be many options. In fact, with LLMs increasingly writing the code itself, one could argue that frameworks like LangChain add more layers of complexity and problems than they solve, though I could be proven wrong. There may be entirely new ways to think about frameworks in a world where most of the code is generated.
A testing and observability infrastructure: definitely a major pain, but the problem area is obvious and many people are working on it. The question is whether it will spur a slew of new SaaS companies or, given the immaturity of the frameworks, end up more tightly coupled with the service providers offering models and data services.
Coding tools (Copilot, Cursor, etc.): again, quite obvious, and various tools will exist as wrappers around ever-better models. The IDE becomes more of a very complex prompting interface and a feedback-collection tool. There are several possibilities here:
A revolutionary new interface that we have not yet seen or thought of
A plethora of Cursor-like forks and VSCode plug-ins, though it’s difficult to see how they win, unless Microsoft simply yields. Cursor being superior to GitHub Copilot is mostly a function of Cursor RAG-ing your codebase, and GitHub being presumably more “privacy” conscious. A simple business decision by Kevin Scott can alter that equation overnight.
Differential fine-tuning and associated dataset management—no great solutions, but probably best addressed by model providers and companies like Scale.AI.
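To make the orchestration point above concrete, here is a minimal sketch of such a flow as plain Python functions. The `llm` and `retrieve` callables, the prompts, and the intent labels are all illustrative assumptions on my part, not any particular framework's API.

```python
from typing import Callable

LLM = Callable[[str], str]  # prompt in, completion out; stand-in for any model client

def detect_intent(llm: LLM, query: str) -> str:
    return llm("Classify the intent of this request as one word "
               f"(billing, bug, how_to, other): {query}")

def rewrite_query(llm: LLM, query: str) -> str:
    return llm(f"Rewrite this request as a standalone search query: {query}")

def draft_answer(llm: LLM, query: str, context: str) -> str:
    return llm(f"Using only this context:\n{context}\n\nAnswer the request: {query}")

def check_output(llm: LLM, answer: str, context: str) -> bool:
    verdict = llm("Does this answer stay within the given context? Reply yes or no.\n"
                  f"Context: {context}\nAnswer: {answer}")
    return verdict.strip().lower().startswith("yes")

def handle(llm: LLM, retrieve: Callable[[str], str], query: str) -> str:
    intent = detect_intent(llm, query)
    if intent.strip().lower() == "other":
        return llm(query)                        # fall back to a direct answer out of scope
    search_query = rewrite_query(llm, query)     # make the query retrieval-friendly
    context = retrieve(search_query)             # the RAG step (pgvector or otherwise)
    answer = draft_answer(llm, query, context)
    if not check_output(llm, answer, context):   # output checking plus one correction pass
        answer = draft_answer(llm, query, context)
    return answer
```

The point is less the specific steps than how thin the glue is; with models writing more of this code, it is not obvious a heavyweight framework earns its keep.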
Alright. Some winners could emerge (especially around testing and observability). The space will likely converge into a few larger winners, most closely tied to model and data providers.
What are we overlooking? Can building an LLM wrapper be intrinsically differentiated? By intrinsically, I mean: is there an accumulation of leverage that creates a competitive moat over time? There’s always an opportunity to build a better product—better UX, clever influencer marketing, superior GTM—that leads to the acquisition of market share, and that can be differentiated, albeit hardly a sure thing.
There is likely one area that can be a source of leverage. We know that in AI, proprietary data is that source. But stated that way, it is too unstructured: to be differentiating, the data needs to lead to a vastly superior outcome that is not obtainable just by using commonly available models.
So, what form is that, and how might it work?
When we look at a classic SaaS workflow (for example, responding to a customer service query), what happens is quite specific to each customer in several ways:
Prompts are likely different for each domain and differ further for each customer.
Prompts involving function calling (e.g., opening a Jira ticket) are definitely different, especially if a custom integration is involved.
Workflow routing logic may be different (though this is technically outside LLM data).
Error checking and validation at the end of the action can likewise be very specific to a customer domain and environment.
Moreover, in cases where user feedback is part of the loop, prompts may be dynamically updated (by an LLM).
So these prompts in fact constitute a customer-specific and user-specific dataset: one that grows over time, requires deep knowledge of customer workflows, and is thus not easily replicable. A rough sketch of what such a per-customer configuration might look like follows.
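Purely as an illustration (the names and fields here are hypothetical, not any specific product's schema), the accumulated per-customer "dataset" could take a shape like this:

```python
from dataclasses import dataclass, field

@dataclass
class ToolSpec:
    name: str                  # e.g. "create_jira_ticket"
    description: str
    parameters: dict           # JSON-schema-style argument spec for function calling

@dataclass
class CustomerWorkflowConfig:
    customer_id: str
    system_prompt: str                                            # domain- and customer-specific instructions
    tools: list[ToolSpec] = field(default_factory=list)           # custom integrations
    routing_rules: dict[str, str] = field(default_factory=dict)   # intent -> workflow
    validators: list[str] = field(default_factory=list)           # post-action checks
    prompt_history: list[str] = field(default_factory=list)       # feedback-driven revisions

    def record_revision(self, new_prompt: str) -> None:
        """Append a prompt revision; over time this history encodes the
        customer's workflow knowledge."""
        self.prompt_history.append(self.system_prompt)
        self.system_prompt = new_prompt

# Hypothetical customer configuration
acme = CustomerWorkflowConfig(
    customer_id="acme",
    system_prompt="You are Acme's support agent. Refunds over $500 need approval.",
    tools=[ToolSpec(
        name="create_jira_ticket",
        description="Open a ticket in Acme's Jira project",
        parameters={"summary": "string", "priority": "string"},
    )],
    routing_rules={"billing": "refund_flow", "bug": "escalation_flow"},
    validators=["response cites a policy section", "no PII in reply"],
)
```

The moat argument is that this object, multiplied across customers and revised continuously from feedback, encodes workflow knowledge a competitor cannot simply copy.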
Is this enough differentiation? Unclear. I think it depends on whether business workflows can converge and be serviced by a more intelligent model; doing so, however, will require understanding these workflows first. While I think it will ultimately converge, there is probably room on the way there.
Love to hear your thoughts,
Ruslan