Photon — Inference at the Speed of Light

PHOTON INFERENCE ENGINE — LIVE BENCHMARK DATA

PHOTON P99

0.8 ms

◈ Photonic

H100 GPU P99

14.2 ms

◇ NVIDIA H100

17.75× faster p99

Measured on 7B-parameter transformer inference, batch size 32, PyTorch 2.3.


PERFORMANCE MATRIX

Photon vs. GPU vs. TPU

Independent benchmarks. Production workloads. No cherry-picked scenarios — these numbers are reproducible on your models via our public benchmark harness.
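As a rough illustration of what a p99 latency measurement involves, the sketch below times repeated calls and takes the nearest-rank 99th percentile. It is not the Photon harness: `p99`, `measure_latencies_ms`, and the stand-in `fake_inference` workload are hypothetical names chosen for this example, and a real run would replace `fake_inference` with a batch-32 forward pass on the target hardware.

```python
import time


def p99(samples_ms):
    """Nearest-rank 99th percentile of a list of latencies (ms)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer ceil(0.99 * n), written without floats to avoid rounding surprises.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def measure_latencies_ms(fn, runs=1000, warmup=50):
    """Call `fn` `warmup` times untimed, then time `runs` calls in milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def fake_inference():
    # Hypothetical stand-in for one inference call; replace with a real model.
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    latencies = measure_latencies_ms(fake_inference, runs=500)
    print(f"p99 = {p99(latencies):.3f} ms over {len(latencies)} runs")
```

Warm-up calls are excluded so that one-time costs such as cache fills do not inflate the tail.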

Metric	Conditions	◈ Photon (photonic)	NVIDIA H100 (GPU)	Google TPU v5 (TPU)
P99 inference latency	Batch 32, 7B-parameter transformer	0.8 ms	14.2 ms	6.4 ms
Throughput	Sustained, single node	840K inf/sec	47K inf/sec	180K inf/sec
Cost per 1M inferences	On-demand, us-east-1, Feb 2026	$0.04	$0.87	$0.31
Power draw per TFLOP	Measured at rack PDU	12 W	700 W	290 W
Provisioning time	From API call to first token	< 2 min	4–12 weeks	2–6 weeks
Uptime SLA	Contractual, multi-zone	99.99%	99.9%	99.9%
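The headline multiples can be recomputed directly from the table. The short sketch below does this for latency and cost; the dictionary names and the `ratio_vs_photon` helper are illustrative, and the figures are simply the table values typed in by hand.

```python
# Values copied from the comparison table above.
P99_MS = {"Photon": 0.8, "H100": 14.2, "TPU v5": 6.4}
COST_PER_1M_USD = {"Photon": 0.04, "H100": 0.87, "TPU v5": 0.31}


def ratio_vs_photon(table):
    """How many times larger each entry is than the Photon entry."""
    base = table["Photon"]
    return {
        name: round(value / base, 2)
        for name, value in table.items()
        if name != "Photon"
    }


if __name__ == "__main__":
    print("p99 latency ratio:", ratio_vs_photon(P99_MS))
    print("cost ratio:", ratio_vs_photon(COST_PER_1M_USD))
```

The H100 latency ratio works out to 17.75, which matches the "17.75× faster p99" figure quoted at the top of the page.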

All benchmarks reproducible via public harness at bench.photon.io

Last updated: Feb 27, 2026 · Methodology v3.1


HOW IT WORKS

Computation at the speed of light.

Photon's Optical Processing Unit (OPU) natively executes matrix multiplications — the backbone of transformer inference — using photons instead of electrons. Wavelength Division Multiplexing encodes multiple data streams on distinct wavelengths, enabling massively parallel MAC operations across a single silicon photonic waveguide.
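To make the idea of wavelength-parallel multiply-accumulate concrete, here is a purely numerical toy model. It assumes one common mapping that the text above does not spell out: each input element is carried on its own wavelength, each wavelength is scaled by one weight, and a detector sums all channels into a single dot product. The function names are hypothetical and no optical physics (noise, loss, crosstalk) is modeled.

```python
def wdm_dot(inputs, weights):
    """Toy model of one multiply-accumulate across WDM channels.

    Assumption: inputs[i] rides on wavelength i and is scaled by weights[i];
    the detector output is the sum over all wavelengths.
    """
    if len(inputs) != len(weights):
        raise ValueError("one weight per wavelength channel is required")
    weighted_channels = [x * w for x, w in zip(inputs, weights)]  # per-wavelength multiply
    return sum(weighted_channels)  # accumulation at the detector


def wdm_matvec(weight_rows, inputs):
    """Matrix-vector product: one modeled detector per output row."""
    return [wdm_dot(inputs, row) for row in weight_rows]


if __name__ == "__main__":
    W = [[1, 0, 2],
         [0, 3, 1]]
    x = [4, 5, 6]
    print(wdm_matvec(W, x))  # [16, 21]
```

In this model the per-wavelength multiplications are independent of one another, which is the property that would let them proceed in parallel.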

▸ Zero thermal throttling

Photons generate near-zero heat. No TDP ceiling, no frequency scaling.

▸ Framework-transparent

Standard PyTorch, JAX, and ONNX runtimes. No model rewrite required.

▸ In-memory compute

Near-memory architecture eliminates intermediate storage reads.

160K TOPS

Peak throughput

300 TOPS/W

Energy efficiency

1.33 ns

Activation response

4–8 bit

Native precision
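For readers unfamiliar with low-precision inference, the sketch below shows what mapping floating-point weights to 4-bit or 8-bit integers can look like. It uses plain symmetric uniform quantization as an assumption; the page does not say which scheme the OPU uses, and `quantize` and `dequantize` are hypothetical helper names.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed `bits`-bit integer codes."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed range")
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0  # real value represented by one code step
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [1.0, -0.4, 0.25, 0.0]
    for bits in (8, 4):
        codes, scale = quantize(weights, bits)
        print(bits, "bit codes:", codes)
        print("  reconstructed:", dequantize(codes, scale))
```

Fewer bits mean a coarser grid, so the reconstructed values drift further from the originals; that is the accuracy trade-off to check when benchmarking a model at 4-bit versus 8-bit precision.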

INFERENCE PIPELINE

Input (PyTorch/JAX) → WDM Encode (multi-λ) → OPU Compute (160K TOPS) → Activation (1.33 ns) → Output (< 1 ms)

PRODUCTION RESULTS

Real workloads. Verified numbers.

The following benchmarks are from production deployments, not lab conditions. Each result was verified by the customer's engineering team.

Meridian AI · ML Platform Engineering

Llama-3 70B · PyTorch 2.3

P99 latency: 18.4 ms before → 0.9 ms after (20.4× faster)

"We were running 48 H100s for our inference cluster. After migrating to Photon, we decommissioned 44 of them. Same throughput, zero procurement queue."

Vertex Systems · CTO Office

Custom MoE 12B · JAX

Monthly compute spend: $218K/mo before → $11K/mo after (94.9% reduction)

"The board was asking questions about our cloud bill every quarter. After the Photon migration, that line item disappeared from the conversation entirely."

Cascade Labs · Infrastructure Lead

Whisper Large v3 · ONNX

Power draw: 840 W per rack before → 48 W per rack after (17.5× efficiency)

"Our data center had a power density problem — we were hitting limits. Photon let us triple our inference capacity in the same rack footprint."

COMPATIBLE FRAMEWORKS & RUNTIMES

◈ PyTorch 2.3 ◈ JAX 0.4 ◈ ONNX Runtime ◈ TensorRT ◈ Triton ◈ vLLM ◈ TGI ◈ llama.cpp

PRIMARY PATH

Run a Benchmark on Your Model.

Three inputs. Sixty seconds. We'll return p99 latency, throughput, and projected monthly cost side-by-side against your current stack.
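For a back-of-the-envelope version of the projected monthly cost, the sketch below multiplies a monthly inference volume by the per-1M prices listed in the comparison table. The volume is a made-up example, the helper names are hypothetical, and the benchmark tool's actual inputs and cost model are not described on this page.

```python
# Per-1M-inference prices copied from the comparison table (USD).
COST_PER_1M_USD = {"Photon": 0.04, "H100": 0.87, "TPU v5": 0.31}


def monthly_cost_usd(inferences_per_month, cost_per_1m_usd):
    """Projected spend = (volume / 1,000,000) * price per 1M inferences."""
    if inferences_per_month < 0:
        raise ValueError("volume must be non-negative")
    return inferences_per_month / 1_000_000 * cost_per_1m_usd


def compare(inferences_per_month):
    """Projected monthly cost per platform, rounded to cents."""
    return {
        name: round(monthly_cost_usd(inferences_per_month, price), 2)
        for name, price in COST_PER_1M_USD.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: one billion inferences per month.
    for name, cost in compare(1_000_000_000).items():
        print(f"{name}: ${cost:,.2f}/month")
```

This covers only per-inference pricing; a fuller estimate would also account for items such as reserved capacity, data transfer, and migration effort.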

SECONDARY PATH

Download Full Benchmark Report

42 pages of methodology, raw data, and reproducible test harnesses. Built for engineering teams who need internal ammunition before committing to a trial.

▸ Full methodology v3.1 + test harness code

▸ H100 · A100 · TPU v5 · Photon side-by-side

▸ TCO model with 12-month projections

▸ Migration playbook for PyTorch / JAX / ONNX

▸ Security & compliance appendix (SOC 2 Type II)
