Bitvise
2026-05-14
Startups & Business

Fractile Raises $220M to Speed Up AI Token Processing in Breakthrough Chip Deal

Fractile, a UK AI inference chip startup, raised $220M Series B. Its near-memory compute aims to drastically speed up AI token generation, rivaling GPUs.

Fractile Ltd., a British startup specializing in artificial intelligence inference chips, has secured $220 million in Series B funding, the company announced today. The investment underscores surging demand for hardware that can accelerate the costly, energy-intensive process of generating AI tokens—the basic units of output in large language models.

Founded in 2022 by Oxford-educated chip designer Walter Goodwin, who serves as CEO, Fractile has built custom inference chips aimed at slashing the time and power needed to run trained AI models. The company claims its architecture can deliver a tenfold improvement in token throughput compared to conventional GPU-based systems.

Background

The race to deploy generative AI at scale has exposed a critical bottleneck: inference. While training models grabs headlines, running them—converting billions of parameters into words or images—demands immense compute. Most firms still rely on Nvidia GPUs, but specialized chips for inference are emerging as a faster, cheaper alternative.

[Image: Fractile funding announcement. Source: siliconangle.com]

Fractile's approach centers on what it calls "near-memory compute," embedding processing logic directly next to memory cells. This reduces data movement, a major source of latency and energy waste. The company's first product targets token throughput, the rate at which a model outputs tokens per second, a key metric for real-time applications like chatbots.
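As an illustrative sketch, throughput and responsiveness are two sides of the same number: tokens per second determines the average time to emit each token. The figures below are hypothetical, not Fractile benchmarks; only the tenfold-improvement multiplier comes from the company's claim.

```python
# Illustrative only: how token throughput translates to per-token latency.
# The baseline figure is hypothetical; the 10x factor is Fractile's claim.

def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_second

gpu_tps = 50.0               # hypothetical GPU baseline throughput
fractile_tps = gpu_tps * 10  # the claimed tenfold improvement

print(per_token_latency_ms(gpu_tps))       # 20.0 ms per token
print(per_token_latency_ms(fractile_tps))  # 2.0 ms per token
```

For a chatbot streaming its reply word by word, that per-token gap is the difference between visible lag and text that appears as fast as a user can read.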

Industry Experts Weigh In

"Inference is becoming the new frontier in AI hardware, and Fractile has shown a promising path to break the memory wall," said Dr. Elena Marchetti, a semiconductor analyst at TechInsights. "Their near-memory design could shave milliseconds off responses while cutting power draw by 40–50%."

Walter Goodwin, Fractile's CEO, emphasized the funding's strategic importance. "This round gives us the fuel to move from prototypes to volume production," he said. "Our goal is to let every developer run models as fast as they train them, without the astronomical electricity bills."

"The token consumption race is on, and the winner will dominate the next trillion-dollar inference market," added James Henderson, managing partner at lead investor Index Ventures, which participated in the round. "Fractile's technology is one of the most exciting we've seen in this space."

What This Means

For companies deploying AI at scale, Fractile's chips promise lower latency and operational costs. A query that today takes three seconds and costs a fraction of a cent could become nearly instantaneous and half the price. That shifts the economics of everything from customer-service bots to real-time video generation.
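A back-of-envelope calculation makes the economics concrete. All numbers here are hypothetical illustrations chosen to match the scenario above (a three-second query becoming near-instant at roughly half the cost); none are from Fractile.

```python
# Hypothetical back-of-envelope: how throughput and per-token pricing
# change query latency and cost. No figures here are from Fractile.

def query_latency_s(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate a full response at a given throughput."""
    return output_tokens / tokens_per_sec

def query_cost_usd(output_tokens: int, usd_per_million_tokens: float) -> float:
    """Serving cost for one response at a given per-token price."""
    return output_tokens * usd_per_million_tokens / 1e6

tokens = 150  # hypothetical chatbot reply length

# Baseline GPU serving vs. a 10x-throughput, half-cost accelerator.
print(query_latency_s(tokens, 50.0))   # 3.0 s
print(query_latency_s(tokens, 500.0))  # 0.3 s
print(query_cost_usd(tokens, 2.00))    # $0.0003 per query
print(query_cost_usd(tokens, 1.00))    # $0.00015 per query
```

Fractions of a cent sound trivial per query, but multiplied across billions of daily requests they decide whether a product is profitable.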


For the broader chip industry, the deal signals that specialized inference accelerators are no longer a niche. Fractile joins a crowded field including Groq, Cerebras, and d-Matrix, all vying to unseat the GPU. The $220 million round, one of the largest ever raised by a British AI hardware company, validates the thesis that token throughput is the next battleground.

Regulators and data-center operators will watch closely. Faster inference means higher request volumes, potentially straining power grids. But Fractile claims its chips can deliver more tokens per watt, aligning with net-zero targets. Goodwin stated, "We're building for efficiency first. The planet can't afford brute-force scaling."
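"Tokens per watt" is shorthand for energy efficiency: since a watt is a joule per second, dividing throughput (tokens/s) by power draw (W) gives tokens generated per joule. The numbers below are hypothetical, used only to show the unit conversion.

```python
# "Tokens per watt" unpacked: tokens/s divided by watts gives tokens
# per joule of energy consumed. Figures are hypothetical illustrations.

def tokens_per_joule(tokens_per_sec: float, power_watts: float) -> float:
    """Energy efficiency of inference: output tokens per joule."""
    return tokens_per_sec / power_watts

print(tokens_per_joule(500.0, 250.0))  # 2.0 tokens per joule
```

On this metric, a chip can raise request volumes without a proportional rise in grid demand, which is the trade-off data-center operators will be watching.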

Funding and Milestones

The Series B was co-led by Index Ventures and DN Capital, with participation from existing backers including Oxford Science Enterprises. Total capital raised to date exceeds $300 million. The company plans to hire 150 engineers, expand its Bristol lab, and begin customer trials in the second half of next year.

Fractile's first commercial chip, code-named Telos, is scheduled for sampling in early 2026. It targets data centers running generative AI and could compete directly with Nvidia's L40S and AMD's MI300X.

For more on inference chip innovations, see our earlier report on the rise of near-memory compute and its impact on token throughput metrics.