The Near Term Future Belongs to Agentic AI
In the pursuit of intelligence, the journey from simple LLMs to tangible agentic solutions is fraught with challenges, but we are steadily getting closer.

Written by
Bryan Cardenas
For the past few years, Large Language Models (LLMs) have captivated the world with their ability to generate human-like text, code, and images. We've witnessed an incredible scaling race, with models growing ever larger. But we're now hitting a wall. Simply making models bigger yields diminishing returns. The true next frontier is about turning these models from passive oracles into active agents that can do things for us.
From Language Models to Action Models
The current generation of LLMs are brilliant conversationalists, but their utility often stops at the chat window. The near-term future belongs to Agentic AI: autonomous systems that can understand a goal, create a plan, and execute multi-step tasks in the digital world.
Google DeepMind’s Project Mariner offers a glimpse into this paradigm: the agent sees what is on the screen (text, images, forms), reasons about a complex goal, and breaks it down into actionable steps. It then navigates websites, clicks buttons, and enters data to accomplish the task.
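That observe-plan-act cycle can be sketched in a few lines. This is a toy illustration, not Mariner's actual implementation; the `Agent` class, the fixed plan, and the step names are all hypothetical stand-ins for LLM-driven planning and browser-driving actions:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Toy agent: decompose a goal into steps, then execute them in order."""
    goal: str
    log: list = field(default_factory=list)

    def plan(self) -> list:
        # A real agent would ask an LLM to decompose the goal;
        # here we return a fixed plan for illustration.
        return ["open_site", "fill_form", "submit"]

    def act(self, step: str) -> str:
        # A real agent would drive a browser (click, type, scroll);
        # here each action just records that it ran.
        self.log.append(step)
        return f"done:{step}"

    def run(self) -> list:
        return [self.act(step) for step in self.plan()]

agent = Agent(goal="book a table for two")
print(agent.run())  # ['done:open_site', 'done:fill_form', 'done:submit']
```

The point of the sketch is the shape of the loop: plan once, then act step by step, keeping a log the agent can reason over if a step fails.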
To enable this, we expect a new breed of LLMs to emerge soon, trained specifically for these agentic workflows and excelling at reasoning, planning, and tool use.
The Bottleneck: Speed of Thought
The biggest obstacle to sophisticated agentic AI has been speed. The complex, chain-of-thought reasoning required for an agent to plan and act can be painfully slow on general-purpose hardware like GPUs, which were primarily designed for training, not ultra-low-latency inference. A task that takes a minute to complete is a novelty; a task that finishes in one second is a utility.
This is where a revolution in hardware and software is changing the game. Companies are no longer trying to fit a square peg in a round hole. Instead, they're building purpose-built hardware for AI inference.
Take Cerebras, which runs OpenAI's 120B reasoning model at a staggering 3,000 tokens per second, over 15 times faster than leading GPU clouds. Reasoning tasks that took nearly a minute now complete in a single second.
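The arithmetic behind that claim is simple. The 3,000 tokens/s figure is cited above; the reasoning-chain length and the GPU baseline rate below are illustrative assumptions chosen to match the "15 times faster" comparison:

```python
# Back-of-envelope check of the speedup, with illustrative numbers.
chain_tokens = 3_000   # assumed length of a long reasoning chain, in tokens
gpu_tps = 200          # assumed GPU-cloud decode rate (hypothetical baseline)
cerebras_tps = 3_000   # rate cited above

gpu_seconds = chain_tokens / gpu_tps            # 15.0 s: "nearly a minute"-scale
cerebras_seconds = chain_tokens / cerebras_tps  # 1.0 s: a single second
speedup = gpu_seconds / cerebras_seconds        # 15x
print(gpu_seconds, cerebras_seconds, speedup)
```

The exact numbers matter less than the threshold they cross: once a full reasoning chain fits inside a second, the agent stops being a batch job and starts being interactive.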
Or look at Groq, whose Language Processing Units (LPUs) are built from the ground up for inference speed. By using ultra-fast on-chip SRAM instead of slower HBM and a compiler that schedules every operation in advance, Groq eliminates the bottlenecks that plague GPUs. This is the kind of fundamental architectural shift that allows even massive models to run in real time.
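Why does memory technology dominate here? During decoding, every generated token must stream the model's weights from memory, so decode speed is roughly memory bandwidth divided by model size. The sketch below uses assumed, order-of-magnitude numbers (a ~140 GB model, HBM at ~3.35 TB/s as on H100-class GPUs, and ~80 TB/s of aggregate SRAM bandwidth as Groq has described for the LPU); it is an estimate, not a benchmark:

```python
# Roofline-style estimate: decode is memory-bandwidth bound, so
# tokens/s ~= bandwidth / bytes of weights streamed per token.
def tokens_per_second(model_gbytes: float, bandwidth_gbps: float) -> float:
    """Upper bound on decode rate for a bandwidth-bound model."""
    return bandwidth_gbps / model_gbytes

MODEL_GB = 140        # assumed: ~70B parameters at 16-bit precision
hbm_rate = tokens_per_second(MODEL_GB, 3_350)    # HBM-class bandwidth (GB/s)
sram_rate = tokens_per_second(MODEL_GB, 80_000)  # SRAM-class bandwidth (GB/s)
print(hbm_rate, sram_rate)  # tens of tokens/s vs. hundreds
```

Under these assumptions, the same model jumps from roughly 24 tokens/s to several hundred, purely from the memory subsystem; this is why "faster chips" for inference really means "faster memory."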
The Hardware Endgame: Baking AI into Silicon
These developments are just the beginning. We are rapidly moving toward a future of ASIC-like hardware for LLMs, where the core components of the transformer architecture are baked directly into the silicon. This specialization will unleash an exponential increase in performance and a drastic reduction in cost, accelerating the adoption of agentic AI by orders of magnitude.
In a way, this mirrors the history of computing itself. Early analog computers were powerful but inflexible, with their function defined by their physical structure. The digital revolution, which separated memory from processing, enabled the programmable, general-purpose computers we use today.
Current LLMs, with their vast, homogeneous neural structures, are akin to digital models of those old analog computers. The next step is this hyper-specialization of hardware. By designing chips specifically for the mathematical operations that power AI, we are creating systems that are more efficient at the task of intelligence. Remember, the human brain consumes 12-25 watts of power at all times, while LLMs need far more to infer even a single token, let alone what they require for training.
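A rough energy comparison makes the gap concrete. All figures below are illustrative assumptions: ~20 W for the brain's continuous draw (within the 12-25 W range above), and a hypothetical ~700 W inference server generating ~50 tokens/s:

```python
# Rough, assumption-laden energy comparison (not measured data).
brain_watts = 20          # continuous power draw of a human brain
server_watts = 700        # assumed draw of one GPU inference server
server_tokens_per_s = 50  # assumed decode rate for that server

joules_per_token = server_watts / server_tokens_per_s  # 14.0 J per token
print(joules_per_token, joules_per_token / brain_watts)
```

Under these assumptions a single generated token costs about 14 joules, nearly a full second of the brain's entire energy budget, which is exactly the inefficiency that baking the transformer into silicon aims to close.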
Hence, the gap between a user's intent and a completed task is bound to collapse. Powered by a new class of specialized hardware, agentic AI will move from a research concept to a ubiquitous productivity tool, transforming how we work, plan, and interact with the digital world. The revolution won't be felt as a shock; at some point we will take these workflows and tools for granted, just as we have with previous technologies, and by then we hope to stand prepared to supply them.

