Measured on 7B-parameter transformer inference, batch size 32, PyTorch 2.3
Photon vs. GPU vs. TPU
Independent benchmarks. Production workloads. No cherry-picked scenarios — these numbers are reproducible on your models via our public benchmark harness.
| Metric | ◈ Photon (photonic) | NVIDIA H100 (GPU) | Google TPU v5 (TPU) |
|---|---|---|---|
| P99 inference latency (batch 32, 7B-parameter transformer) | 0.8 ms | 14.2 ms | 6.4 ms |
| Throughput (sustained, single node) | 840K inf/sec | 47K inf/sec | 180K inf/sec |
| Cost per 1M inferences (on-demand, us-east-1, Feb 2026) | $0.04 | $0.87 | $0.31 |
| Power draw per TFLOP (measured at rack PDU) | 12 W | 700 W | 290 W |
| Provisioning time (from API call to first token) | < 2 min | 4–12 weeks | 2–6 weeks |
| Uptime SLA (contractual, multi-zone) | 99.99% | 99.9% | 99.9% |
All benchmarks reproducible via public harness at bench.photon.io
Last updated: Feb 27, 2026 · Methodology v3.1
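To read the comparison as ratios rather than raw figures, the arithmetic can be sketched as below. This is a minimal illustration, not part of the benchmark harness: the dictionary keys and the `advantage` helper are names made up for this example, and the numbers are copied from the table as published.

```python
# Figures copied from the comparison table; key names are illustrative.
table = {
    "p99_latency_ms": {"photon": 0.8, "h100": 14.2, "tpu_v5": 6.4},
    "throughput_inf_per_sec": {"photon": 840_000, "h100": 47_000, "tpu_v5": 180_000},
    "cost_usd_per_1m": {"photon": 0.04, "h100": 0.87, "tpu_v5": 0.31},
    "watts_per_tflop": {"photon": 12, "h100": 700, "tpu_v5": 290},
}

# Metrics where a larger value is better; for all others, smaller is better.
HIGHER_IS_BETTER = {"throughput_inf_per_sec"}


def advantage(metric: str, baseline: str) -> float:
    """Return how many times better Photon's figure is than the baseline's."""
    row = table[metric]
    if metric in HIGHER_IS_BETTER:
        return row["photon"] / row[baseline]
    return row[baseline] / row["photon"]


for metric in table:
    for baseline in ("h100", "tpu_v5"):
        print(f"{metric:24s} vs {baseline:6s}: {advantage(metric, baseline):5.1f}x")
```

The ratios only restate the table; they inherit whatever conditions the underlying measurements were taken under.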
Computation at the speed of light.
Photon's Optical Processing Unit (OPU) natively executes matrix multiplications — the backbone of transformer inference — using photons instead of electrons. Wavelength Division Multiplexing encodes multiple data streams on distinct wavelengths, enabling massively parallel MAC operations across a single silicon photonic waveguide.
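The parallelism described above can be pictured numerically. The sketch below is only an arithmetic analogy and says nothing about how the OPU is actually built: each wavelength is treated as an independent channel carrying its own input vector, and the same weight matrix is applied to every channel. The function names, the channel spacing, and the wavelength values are assumptions chosen for illustration.

```python
import random


def mac(weights, inputs):
    """One multiply-accumulate chain: the sum of element-wise products."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc += w * x
    return acc


def wdm_matvec(weight_matrix, channels):
    """Apply the same weight matrix to the vector on every wavelength channel.

    `channels` maps a wavelength label (nm, illustrative) to the input vector
    carried on that wavelength. In the optical picture all channels pass
    through the device together; this loop models only the arithmetic result.
    """
    return {
        wavelength: [mac(row, vector) for row in weight_matrix]
        for wavelength, vector in channels.items()
    }


random.seed(0)
n_in, n_out = 4, 3
weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

# Hypothetical channel grid: four wavelengths, one input vector each.
channels = {
    1550.0 + 0.8 * k: [random.uniform(-1, 1) for _ in range(n_in)]
    for k in range(4)
}

outputs = wdm_matvec(weights, channels)
for wavelength, y in outputs.items():
    print(f"{wavelength:.1f} nm -> {[round(v, 3) for v in y]}")
```

Each output vector is an ordinary matrix-vector product, so the whole result equals one batched matrix multiply with the wavelength index playing the role of the batch dimension.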
Real workloads. Verified numbers.
The following results come from production deployments, not lab conditions. Each was verified by the customer's engineering team.
"We were running 48 H100s for our inference cluster. After migrating to Photon, we decommissioned 44 of them. Same throughput, zero procurement queue."
"The board was asking questions about our cloud bill every quarter. After the Photon migration, that line item disappeared from the conversation entirely."
"Our data center had a power density problem — we were hitting limits. Photon let us triple our inference capacity in the same rack footprint."
Run a Benchmark on Your Model.
Three inputs. Sixty seconds. We'll return p99 latency, throughput, and projected monthly cost side-by-side against your current stack.
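For readers who want to see how the three reported figures relate to raw measurements, here is a minimal sketch. It is not the benchmark tool's code: the function names, the nearest-rank percentile method, the flat per-million pricing model, and the sample data are all assumptions made for this example.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a list of latencies."""
    ordered = sorted(latencies_ms)
    # ceil(0.99 * n) in integer arithmetic, as a 1-based rank.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def throughput(n_inferences, wall_seconds):
    """Sustained inferences per second over the whole run."""
    return n_inferences / wall_seconds


def monthly_cost(inferences_per_month, usd_per_million):
    """Projected monthly spend at a flat per-million-inference price."""
    return inferences_per_month / 1_000_000 * usd_per_million


# Hypothetical run: 1,000 timed requests, mostly 1 ms with a slow tail.
samples = [1.0] * 985 + [5.0] * 15

print("p99 latency (ms):", p99(samples))
print("throughput (inf/sec):", throughput(len(samples), wall_seconds=2.0))
print("monthly cost (USD):", monthly_cost(500_000_000, usd_per_million=0.04))
```

The slow tail is the point of the sample data: the median here is 1 ms, but 15 slow requests out of 1,000 are enough to move the p99 to 5 ms.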
Download Full Benchmark Report
42 pages of methodology, raw data, and reproducible test harnesses. Built for engineering teams who need internal ammunition before committing to a trial.