A quick look at some popular AI predictions
Recently, a number of predictions have been voiced in respected channels that many of us follow.
A quick evening note, without going too deep or referencing sources and benchmarks, on three claims I am personally skeptical about.
Open source models will catch up with closed source models
The argument goes something like this: "The web is finite, therefore all data is/will ultimately be available for training, and what is not available (proprietary data) will remain a small fraction and thus of little impact. From there, it follows that if all models are trained on the same data, the performance will inevitably converge, and therefore there is not much value add in proprietary models."
The thing is, even if we assume that most relevant data is freely available (various legal issues aside), it takes a lot of resources to train and update a base model, and even more to instruction-tune it into something useful (considering all the infrastructure and humans involved). While off-the-shelf models like LlamaN… may work in some contexts, they are unlikely to form the foundation of a true competitive advantage.
What could alter this dynamic? Perhaps new architectures and approaches to training and fine-tuning that make the process far less resource- and cost-intensive.
LLM utility is limited to text generation
(or image generation)
From a first-principles standpoint, this is true: auto-regressive generation (LLMs) is "an exponentially divergent diffusion process, hence not controllable."
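One informal way to unpack that quote (my reading of the usual error-accumulation argument, not the original author's formulation): if each generated token has some small independent probability e of stepping outside the set of acceptable continuations, then the chance that an n-token sequence stays acceptable shrinks exponentially with its length, and the model has no built-in mechanism to pull a drifted sequence back.

```latex
P(\text{an $n$-token sequence stays acceptable}) \approx (1 - e)^{n} \longrightarrow 0
\quad \text{as } n \to \infty, \text{ for any fixed } e > 0
```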
However, combining an LLM with pure-logic software may yield good planning/reasoning outcomes, albeit not yet broadly generalizable ones.
For example, having an LLM generate code, attempt to run and fix that code, and then evaluate the output demonstrates that this is feasible. In addition, this limitation is so well understood, and so many people are working on solving it, that scalable solutions will inevitably come, or at least it's not unreasonable to bet on it.
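To make that generate-run-fix loop concrete, here is a minimal sketch in Python. The ask_llm function is a hypothetical stand-in for whatever model API you use; the rest is plain standard library, with the interpreter playing the role of the pure-logic component that judges success.

```python
import subprocess
import sys
import tempfile


def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to your model API of choice.
    Expected to return Python source code as a string."""
    raise NotImplementedError("wire this up to an actual LLM")


def generate_run_fix(task: str, max_attempts: int = 3) -> str | None:
    """Ask the LLM for code, run it, and feed errors back until it succeeds."""
    prompt = f"Write a Python script that does the following:\n{task}"
    for _ in range(max_attempts):
        code = ask_llm(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, path], capture_output=True, text=True, timeout=30
            )
        except subprocess.TimeoutExpired:
            prompt = f"This code:\n{code}\ntimed out after 30 seconds. Fix it."
            continue
        if result.returncode == 0:
            # The interpreter, not the model, is the arbiter of success here.
            return result.stdout
        # Feed the concrete error back so the model can repair its own output.
        prompt = f"This code:\n{code}\nfailed with:\n{result.stderr}\nFix it."
    return None
```

In a real system you would sandbox the execution and add a separate evaluation step that checks whether the output actually answers the task, not merely whether the script ran.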
LLMs are too slow for real-time user workflows
This is true on the face of it, and a massive improvement is unlikely in the medium term.
However, a lot of (if not most) workflows, especially enterprise workflows, are not real-time and do not need to be: communication, document generation, analytics, and many others. So not only can we take our sweet time, we can also execute very complex, multi-step LLM workflows without issues (except for cost). In fact, cost right now is a much bigger problem than latency.
There are also a ton of performance tricks one can play (from caching and progressive enhancement to routing different tasks to different models) to make the system feel a lot more performant than it really is.
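As a rough illustration of two of those tricks, here is a small Python sketch combining a prompt cache with a simple routing rule. The call_fast_model and call_strong_model functions, and the length-based heuristic, are placeholders I made up for the example.

```python
import functools


# Hypothetical stand-ins for a small, cheap model and a large, slower one.
def call_fast_model(prompt: str) -> str:
    raise NotImplementedError


def call_strong_model(prompt: str) -> str:
    raise NotImplementedError


@functools.lru_cache(maxsize=4096)
def answer(prompt: str) -> str:
    """Serve repeated prompts from cache and route the rest by rough difficulty."""
    # Toy heuristic purely for illustration: short prompts go to the fast model,
    # longer (presumably harder) ones go to the strong model.
    if len(prompt) < 200:
        return call_fast_model(prompt)
    return call_strong_model(prompt)
```

Progressive enhancement would sit on top of this: return the fast model's draft immediately and swap in the strong model's answer when it arrives.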
Happy Sunday, and here's to a productive Monday,
Ruslan