OpenAI Broadcom LLM Inference Chip
Disclosure: This post may contain affiliate links. If you purchase through these links, we may earn a small commission at no extra cost to you.
OpenAI and Broadcom built a purpose-made chip for AI workloads. This is OpenAI's first custom processor, not a tweak of off-the-shelf silicon.
Broadcom handled manufacturing. Celestica is on system integration.
OpenAI and Broadcom introduced Jalapeño, a custom AI chip built specifically for LLM inference to improve performance, efficiency, and scale across AI systems. The chip is tuned for serving large language models at scale.
The goal: address the compute bottlenecks in ChatGPT-scale inference. The collaboration covers chip architecture, networking, and system deployment-each piece aimed at running LLMs more efficiently than commodity hardware.
1) Jalapeño - OpenAI & Broadcom LLM inference accelerator
Jalapeño is OpenAI's first custom AI chip for LLM inference. This is not a GPU with new firmware.
Jalapeño was designed from scratch for modern LLM inference, not retrofitted from legacy accelerators. OpenAI brought its operational experience from running ChatGPT, Codex, and its APIs directly into the design.
Nine months from concept to silicon. Broadcom built it; OpenAI will run it for serving AI models.
The chip mixes fixed-function and programmable compute. The architecture is focused on performance-per-watt, aiming to outclass GPUs for this workload.
The design supports today's LLMs and is meant to flex for future agentic AI products.
2) OpenAI-designed fixed-function tensor engines
Jalapeño uses fixed-function compute hardware for LLM inference. These tensor engines are built for the matrix math at the heart of transformers.
OpenAI designed these accelerators for its own LLM workloads. Fixed-function means less flexibility, more speed for the math that matters.
This is a break from GPU design. Jalapeño is optimized for the actual computation patterns of large language models.
Tensor engines sit alongside programmable compute in the chip. The hybrid design gives some adaptability for new AI techniques.
OpenAI led the accelerator design; Broadcom handled development and manufacturing. This division let OpenAI bake inference-specific optimizations into the silicon.
3) Broadcom Tomahawk integration for high-performance networking
OpenAI integrated Broadcom's Tomahawk networking chips into Jalapeño. Tomahawk is for high-throughput data movement between components.
This networking is critical for scaling multi-card clusters on frontier models. The integration is aimed at eliminating data-movement bottlenecks that can throttle LLM inference.
Broadcom handled the silicon and board/rack system engineering. Their networking tech is paired directly with the custom inference chip.
The Tomahawk networking solution balances compute, memory, and network resources. The result: less wasted data movement, more bandwidth for LLM workloads.
Broadcom's networking allows scaling across multiple chips and racks without choking on interconnects.
4) Celestica rack and system integration for Jalapeño deployments
Celestica's role: get Jalapeño from chip to data center. They handle board, rack, and system integration for the compute platform targeted for deployment by the end of 2026.
Celestica manages board, rack, and production integration. That means assembling the chips, configuring racks, and ensuring thermal and power are handled.
The partnership is clear-cut: OpenAI architects the chip, Broadcom implements silicon and networking, Celestica integrates the system.
This work is what gets Jalapeño into production data centers for inference workloads like ChatGPT.
5) Jalapeño's optimized inference kernels for ChatGPT-scale models
Jalapeño's inference kernels are tuned for LLMs. OpenAI leveraged its own knowledge of models, kernels, and serving systems at production scale.
Custom kernels target the compute-heavy operations: attention, matrix multiplies, token generation-done billions of times a day.
OpenAI used its own models to help develop the chip, optimizing for real-world deployment. Kernels are tuned for the GPT-5.x family.
Jalapeño is inference-only, not a training chip. This lets OpenAI design hardware that cuts inference costs sharply.
The tight coupling of chip and kernel means more performance for OpenAI's workloads.
6) First-generation Jalapeño performance-per-watt improvements
Early testing shows Jalapeño beats current chips on performance per watt. No public benchmarks yet.
The custom-built inference chip is tailored for LLMs and future agentic AI.
Better performance per watt means lower costs and more efficient scaling. The chip uses a systolic array with eight HBM stacks on TSMC's 3nm node.
Engineering samples are running GPT-5.3 Codex workloads in-house. Deployment is aimed for late 2026 at data center scale.
7) Multi-generation Intelligence Processor roadmap from OpenAI & Broadcom
Jalapeño is only the beginning-a multi-generation compute platform is in the works.
OpenAI brings LLM fundamentals and a roadmap for future models. Broadcom delivers chip implementation, board design, and rack integration.
Celestica is on board for manufacturing and scaling. The goal: deliver new processors for evolving AI workloads.
The roadmap is built to handle both current LLMs and future agentic AI. Jalapeño is architected to flex as models and workloads change.
8) On-device inference telemetry and observability features
Jalapeño includes built-in telemetry for real-time inference monitoring. Operators get performance metrics, latency, and resource use from the hardware itself.
OpenAI and Broadcom implemented observability tools to capture detailed span data during inference. The system logs metadata for each request, like processing time and token throughput.
Telemetry aligns with standard observability practices. Operators can trace how the chip handles queries and batches.
These features help teams spot bottlenecks and tune deployments. Data collected reveals how LLM applications perform under different loads.
Observability hooks into existing monitoring via standard protocols. Jalapeño metrics fit into broader dashboards without a new tooling stack.
9) Board-level solutions co-developed with Broadcom silicon
OpenAI and Broadcom built out board-level integration for Jalapeño. Broadcom handled chip, board, and rack system design for a full hardware stack.
Specialized circuit boards are optimized for the custom silicon. High-performance networking and power delivery are tuned for LLM inference.
Celestica's role: industrialize the platform through system integration. The three companies worked to ensure the boards scale across data centers.
Boards address thermal management, bandwidth, and memory layout for LLMs. OpenAI designed these from scratch with its own model serving needs in mind.
This approach gives OpenAI more control over the infrastructure behind ChatGPT and its APIs.
10) Jalapeño-enabled OpenAI serving stack optimizations
OpenAI designed Jalapeño from scratch around its LLM stack and product roadmap.
The processor plugs directly into OpenAI's infrastructure. This means software can be tuned to the chip, not the other way around.
Early tests of the custom accelerator show better performance per watt. The chip lets OpenAI refine its serving systems specifically for Jalapeño.
Optimization covers chip, board, and rack. Broadcom and Celestica handled networking and scalable deployment.
The chip went from design to tape-out in nine months. Large-scale deployment is set for late 2026.
Technical Overview of Custom AI Hardware
Jalapeño is a purpose-built ASIC, designed and taped out in nine months using AI-accelerated co-development. Manufactured on TSMC's 3nm process, it aims for roughly 50% lower inference costs than current GPUs.
Collaboration Dynamics between OpenAI and Broadcom
OpenAI and Broadcom jointly unveiled the Jalapeño chip on June 24, 2026. This is OpenAI's first custom-designed inference processor.
OpenAI designed the chip from scratch, drawing on its understanding of LLM fundamentals, model roadmaps, and product requirements. Broadcom handled chip implementation.
Celestica contributed to board and rack system integration, high-performance networking, and scalable infrastructure. The partnership delivered a nine-month turnaround from design to tape-out-fast by any measure.
OpenAI used its own AI models to accelerate parts of the chip design process. This software-hardware co-development compressed the schedule while keeping LLM-specific optimization in focus.
Architectural Innovations for LLM Inference
Jalapeño is a custom accelerator built for large language model inference workloads-not general-purpose AI. The architecture zeroes in on the computational patterns and memory access needs of transformer-based models.
Early testing showed better performance per watt than existing AI accelerators. Independent verification remains pending.
The chip targets roughly 50% lower inference cost per token than mainstream AI GPUs. The custom design is OpenAI's answer to compute-driven operational costs.
Large-scale deployment is planned for late 2026. Jalapeño is set as the foundation for a multi-generation compute platform.
Impact on Enterprise AI Applications
The Jalapeño chip targets gigawatt-scale data center deployment with significant cost reductions for organizations running AI workloads. Enterprise applications gain from improved processing efficiency and expanded deployment options.
Data Center Efficiency and Scalability
OpenAI and Broadcom built Jalapeño as an Application-Specific Integrated Circuit for LLM inference. The chip delivers 50% lower cost per token compared to traditional NVIDIA GPUs.
Organizations can process more language model requests within existing power and cooling constraints. The custom ASIC architecture is tuned for inference workloads.
Data centers running chatbots, coding assistants, and enterprise AI workflows see reduced operational expenses. The custom AI chip is optimized for LLM inference to handle model deployment at scale.
Enterprises get more predictable infrastructure costs as purpose-built silicon replaces GPU clusters. The nine-month development timeline shows what rapid hardware iteration for AI-specific needs can look like.
Implications for Real-Time Language Processing
Real-time applications live and die by latency and throughput. Jalapeño isn't a training chip-it's built for inference, and that focus shows up in every design choice.
Customer service platforms and conversational AI need predictable response times. OpenAI's partnership with Broadcom is about accelerating LLM inference and cutting operational costs for those workloads.
If you're deploying AI assistants, you can serve more users on less hardware. The chip's architecture strips out the overhead you get with general-purpose accelerators.
Summarization, translation, and content generation get cheaper when you halve inference costs versus GPUs. That's the difference between a proof-of-concept and a real deployment.
Related reading

Tech enthusiast and founder of Technize. Passionate about making technology accessible and helping people make smarter buying decisions.