Perplexity turns the AI PC into a cloud traffic controller

Perplexity is trying to make AI agents smarter about where they run. The bigger story is not the demo itself, but the cost pressure forcing AI work away from the cloud-only model.

Perplexity used Computex in Taipei to show where AI infrastructure may be heading next: not fully local, not fully cloud, but a constant negotiation between the two. Its new hybrid inference platform decides in real time which AI tasks can run on a PC and which need to be pushed to larger cloud systems.

That sounds technical, but the business point is simple. AI companies are spending heavily to answer questions, run agents, search the web, read files and take actions on behalf of users. Every one of those steps consumes compute. When an agent thinks, checks, revises and acts, the meter keeps running.

Perplexity CEO Aravind Srinivas appeared with Intel CEO Lip-Bu Tan at Computex 2026, where the companies described a system that can route workloads across local devices and cloud infrastructure. Intel said the approach is meant to help with privacy, security, compliance and cost, while letting heavier tasks still use cloud scale when they need it.

According to Reuters reporting carried by Investing.com, Srinivas described the system as acting like an air-traffic controller for AI jobs. That is the useful image. The PC becomes more than a screen for cloud software. It becomes one part of the compute network.

For the first wave of generative AI apps, cloud dependence was an acceptable trade. Models were large, consumer devices were weak, and the fastest path to market was to send the work to centralized servers. That logic still holds for many advanced tasks, but it is getting more expensive as AI moves from chat into agents.

An answer engine can often complete one turn and stop. An agent does not behave that way. It searches, reads, calls tools, checks permissions, opens files, compares options and sometimes repeats the whole process. Intel said at Computex that agentic AI can consume up to 1,000 times more tokens than single-turn reasoning in some cases. Even if that figure depends heavily on the task, it explains why inference is becoming a hard economic problem.
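
A back-of-the-envelope sketch shows how that multiplier turns into money. Only the 1,000x upper bound comes from Intel's statement; the turn size and the per-token price below are hypothetical values chosen for illustration, not figures from Perplexity or any provider.

```python
def inference_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of processing `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical inputs, for illustration only.
SINGLE_TURN_TOKENS = 1_000   # assumed size of one question-and-answer turn
PRICE_PER_MILLION = 2.00     # assumed flat price, dollars per million tokens
AGENT_MULTIPLIER = 1_000     # Intel's "up to 1,000 times" upper bound

single_turn = inference_cost(SINGLE_TURN_TOKENS, PRICE_PER_MILLION)
agent_task = inference_cost(SINGLE_TURN_TOKENS * AGENT_MULTIPLIER, PRICE_PER_MILLION)

print(f"single turn: ${single_turn:.4f}")
print(f"agent task:  ${agent_task:.2f}")
```

Under these assumptions a fraction of a cent per answer becomes a couple of dollars per agent task, which is the gap that makes moving even part of the work onto the device attractive.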

Perplexity already knows this pressure. In January, Bloomberg reported that the company signed a three-year, $750 million agreement with Microsoft to use Azure cloud services and access models through Microsoft Foundry, including systems from OpenAI, Anthropic and xAI. The company also said it had not shifted spending away from Amazon Web Services, its main cloud provider at the time.

Then in March, CoreWeave announced a multi-year partnership to support Perplexity's inference workloads on its AI cloud, using dedicated Nvidia GB200 NVL72-powered clusters. That tells us something important. Perplexity is not replacing the cloud. It is trying to make its cloud dependence more flexible.

Why enterprises will care first

The consumer version of this story is better battery life, faster responses and fewer trips to a remote server. Useful, but not enough by itself to change buying behavior. The enterprise version is stronger.

Companies do not want every document, customer record, contract, spreadsheet or engineering file leaving the device if it can be avoided. They also do not want employees waiting while every small AI operation travels through a cloud model. A hybrid system promises a cleaner division: local models handle sensitive or lightweight work, while cloud systems handle the larger reasoning jobs that need more context or more power.
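
That division can be written down as a very small routing rule. The sketch below is not Perplexity's or Intel's implementation, which neither company has detailed; the `Task` fields, the `route` function and the 4,000-token cutoff are all hypothetical names and values used to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est_tokens: int   # rough estimate of how much work the job needs
    sensitive: bool   # True if the data should stay on the device


# Hypothetical cutoff for what a local model handles acceptably.
LOCAL_TOKEN_LIMIT = 4_000


def route(task: Task, local_limit: int = LOCAL_TOKEN_LIMIT) -> str:
    """Return "local" for sensitive or lightweight work, otherwise "cloud"."""
    if task.sensitive or task.est_tokens <= local_limit:
        return "local"
    return "cloud"


tasks = [
    Task("summarize a contract", 20_000, sensitive=True),
    Task("rename a batch of files", 500, sensitive=False),
    Task("multi-step market research", 200_000, sensitive=False),
]
for task in tasks:
    print(f"{task.name}: {route(task)}")
```

A real router would have to weigh more than two signals, such as device load, latency and answer quality, but even this toy version shows where the hard decisions sit: in the cutoff and in how sensitivity is judged.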

This is why Intel is eager to attach Perplexity to its AI PC push. Intel's Computex materials said the hybrid orchestration capability demonstrated on stage is available on Intel processors and in the Perplexity for Windows PC application. That gives Intel a more concrete argument for AI PCs than the usual claim that future software will eventually need more local acceleration.

There is a practical sales pitch here. If an enterprise can reduce cloud calls, keep more data near the user and still access frontier models when needed, the PC refresh cycle becomes easier to justify. It is not just a faster laptop. It is a local node in the company's AI system.

The risk is that this works only if the orchestration is genuinely good. A bad router creates new problems. Send too much to the device and the user gets slower or weaker answers. Send too much to the cloud and the cost and data concerns return. The product has to make those decisions quietly, quickly and reliably.

Perplexity also has to balance partners with control. Azure, AWS and CoreWeave give it access to scale, models and specialized GPU infrastructure. Local inference gives it another lever. The more intelligently it can shift work between those layers, the less exposed it becomes to any single provider's pricing, capacity or strategic priorities.

This is the next infrastructure fight hiding inside the AI PC story. The winners will not simply be the companies with the largest models or the flashiest agents. They will be the ones that know where each piece of work should run, what it should cost, and which data should never leave the machine in the first place.
