New inference hardware claims up to 10x faster AI response times with drastically lower power and cost by embedding models directly into custom silicon rather than relying on GPUs.

A new AI accelerator architecture from Taalas is challenging the conventional GPU-driven approach to artificial intelligence inference by converting entire AI models into dedicated silicon. The chip architecture hardwires model weights and parameters directly into ASICs, enabling ultra-low latency and significantly lower energy consumption for large language model (LLM) inference workloads.
Unlike GPUs and even many AI accelerators that remain programmable, the new approach creates model-specific chips optimized for a single neural network. The result is dramatically faster inference performance, especially for high-volume workloads where the same AI model serves millions of queries repeatedly. According to disclosed benchmarks, the first-generation chip running a compressed Llama 3.1 8B model reportedly achieves 14,000 to 17,000 tokens per second, far beyond most currently deployed inference hardware.
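For a rough sense of what that throughput claim would mean in practice, the sketch below converts the quoted token rate into per-response latency. It assumes the figure applies to a single generation stream and assumes a 500-token reply; both are illustrative assumptions, not details disclosed in the benchmarks.

```python
# Back-of-the-envelope: what 14,000-17,000 tokens/s could mean for response time.
# Assumptions (not from the source): the rate applies to one generation stream,
# and a typical chat reply is about 500 tokens long.

def response_time_s(tokens_per_response: int, tokens_per_second: float) -> float:
    """Time to generate one response at a sustained decode rate."""
    return tokens_per_response / tokens_per_second

ASSUMED_RESPONSE_TOKENS = 500          # illustrative reply length
for rate in (14_000, 17_000):          # claimed throughput range
    t = response_time_s(ASSUMED_RESPONSE_TOKENS, rate)
    print(f"{rate:>6} tok/s -> {t * 1000:.0f} ms per {ASSUMED_RESPONSE_TOKENS}-token response")
```

Under those assumptions, a full 500-token reply would stream in roughly 30 to 36 milliseconds; if the figure instead reflects aggregate throughput across many batched requests, per-user latency would be higher.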
The key features are:
- Hardwires AI models directly into ASIC silicon
- Delivers 14,000–17,000 tokens per second
- Eliminates dependence on external HBM memory
- Operates at significantly lower rack-level power
- Supports chip revisions in roughly two months
The architecture removes dependence on high-bandwidth memory (HBM), one of the costliest and most supply-constrained elements in modern AI systems. Instead, model parameters are etched directly into silicon, reducing memory bottlenecks and enabling lower rack-level power consumption. The company claims AI racks based on the architecture consume roughly 12–15 kW, compared to 120–600 kW for GPU-based AI racks.
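Taken at face value, those figures imply a large per-rack power gap. The short sketch below only restates that arithmetic using the ranges quoted above; it does not account for how much inference work each rack completes, which the article does not disclose.

```python
# Compare the rack-level power figures quoted in the article.
# Note: this compares power draw only, not energy per token, since
# per-rack throughput for either system is not disclosed here.

taalas_rack_kw = (12, 15)    # claimed range for the ASIC-based rack
gpu_rack_kw = (120, 600)     # range cited for GPU-based AI racks

low_ratio = gpu_rack_kw[0] / taalas_rack_kw[1]    # most conservative case
high_ratio = gpu_rack_kw[1] / taalas_rack_kw[0]   # most favorable case
print(f"Claimed rack power reduction: {low_ratio:.0f}x to {high_ratio:.0f}x")
# -> roughly 8x to 50x lower rack power, per the cited figures
```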
The system also avoids liquid cooling and advanced packaging requirements, potentially allowing deployment in standard server infrastructure. Analysts suggest this could significantly reduce infrastructure costs for cloud providers and enterprise AI deployments.
However, the tradeoff is flexibility. Since each chip is tied to a specific AI model, updates require new silicon revisions. To address this, the startup says only two metal layers need modification during redesign, allowing updated chips to be produced within roughly two months instead of years.
The company plans to scale the architecture beyond smaller LLMs toward larger reasoning models and multi-chip AI systems later this year. Industry observers see the development as part of a broader trend toward highly specialized AI hardware, similar to how TPUs transformed cloud AI acceleration.

