Photon (photonic) p99: 0.8 ms
NVIDIA H100 (GPU) p99: 14.2 ms
17.75× faster p99

Measured on 7B parameter transformer inference, batch size 32, PyTorch 2.3

Photon vs. GPU vs. TPU

Independent benchmarks. Production workloads. No cherry-picked scenarios — these numbers are reproducible on your models via our public benchmark harness.

Metric                   Conditions                        Photon (photonic)   NVIDIA H100 (GPU)   Google TPU v5 (TPU)
P99 Inference Latency    Batch 32, 7B param transformer    0.8 ms              14.2 ms             6.4 ms
Throughput               Sustained, single node            840K inf/sec        47K inf/sec         180K inf/sec
Cost per 1M Inferences   On-demand, us-east-1, Feb 2026    $0.04 USD           $0.87 USD           $0.31 USD
Power Draw per TFLOP     Measured at rack PDU              12 W                700 W               290 W
Provisioning Time        From API call to first token      < 2 min             4–12 weeks          2–6 weeks
Uptime SLA               Contractual, multi-zone           99.99%              99.9%               99.9%
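
The headline multipliers on this page can be re-derived from the comparison table. The sketch below does that arithmetic; the dictionary layout and the `advantage` helper are illustrative names, not part of any Photon tooling, and the only data used are the table values themselves.

```python
# Values copied from the comparison table (Photon, NVIDIA H100, Google TPU v5).
TABLE = {
    "p99_latency_ms":     {"photon": 0.8,  "h100": 14.2, "tpu_v5": 6.4},
    "throughput_k_inf_s": {"photon": 840,  "h100": 47,   "tpu_v5": 180},
    "cost_per_1m_usd":    {"photon": 0.04, "h100": 0.87, "tpu_v5": 0.31},
}

# For latency and cost a smaller number is better; for throughput a larger one is.
LOWER_IS_BETTER = {"p99_latency_ms", "cost_per_1m_usd"}


def advantage(metric: str, baseline: str) -> float:
    """Return how many times better Photon is than `baseline` on `metric`."""
    row = TABLE[metric]
    if metric in LOWER_IS_BETTER:
        return row[baseline] / row["photon"]
    return row["photon"] / row[baseline]


for metric in TABLE:
    for baseline in ("h100", "tpu_v5"):
        print(f"{metric:20s} vs {baseline:7s}: {advantage(metric, baseline):.2f}x")
```

The H100 latency ratio works out to 14.2 / 0.8 = 17.75, matching the "17.75× faster p99" headline, and the H100 cost ratio is 0.87 / 0.04 = 21.75, which the page footer rounds to 21.8×.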

All benchmarks reproducible via public harness at bench.photon.io
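
For readers who want to sanity-check a p99 figure before running the public harness, here is a minimal sketch of how a p99 latency is typically measured. It is not the bench.photon.io harness: `measure_latencies_ms`, `percentile`, and the `fake_batch` workload are hypothetical names for illustration, and the nearest-rank percentile definition is one common choice among several.

```python
import math
import time
from typing import Callable, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def measure_latencies_ms(run_batch: Callable[[], None],
                         warmup: int = 10,
                         iters: int = 200) -> List[float]:
    """Time `run_batch` repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):          # discard warm-up runs (caches, lazy init)
        run_batch()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        run_batch()                  # on an accelerator, synchronize before stopping the timer
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_batch() -> None:
    # Hypothetical stand-in for one batch-32 inference call on your model.
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    lat = measure_latencies_ms(fake_batch)
    print(f"p50={percentile(lat, 50):.3f} ms  p99={percentile(lat, 99):.3f} ms")
```

Replacing `fake_batch` with a real inference call gives a number that can be compared against the table, provided the batch size and model match the stated conditions.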

Last updated: Feb 27, 2026 · Methodology v3.1

Run Benchmark on Your Model →

Computation at the speed of light.

Photon's Optical Processing Unit (OPU) natively executes matrix multiplications, the backbone of transformer inference, using photons instead of electrons. Wavelength Division Multiplexing (WDM) encodes multiple data streams on distinct wavelengths, so many multiply-accumulate (MAC) operations run in parallel across a single silicon photonic waveguide.
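
As a rough mental model of the paragraph above, the following toy sketch treats each wavelength channel as carrying one input element, applies a per-channel weight, and lets a detector sum all channels into one dot product. This is a purely numerical illustration under those assumptions; it does not describe the actual OPU design, and `wdm_dot` and `wdm_matvec` are hypothetical names.

```python
from typing import List, Sequence


def wdm_dot(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """One detector: channel k carries inputs[k], is scaled by weights[k],
    and the detector accumulates the contributions of all channels."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is needed per wavelength channel")
    return sum(x * w for x, w in zip(inputs, weights))


def wdm_matvec(weight_rows: Sequence[Sequence[float]],
               inputs: Sequence[float]) -> List[float]:
    """Matrix-vector product: every output row reuses the same multiplexed
    input signal with its own bank of per-channel weights."""
    return [wdm_dot(inputs, row) for row in weight_rows]


# Three wavelength channels, two output rows (illustrative numbers only).
W = [[1.0, 0.5, 0.0],
     [0.25, 0.0, 1.0]]
x = [2.0, 4.0, 8.0]
print(wdm_matvec(W, x))  # [4.0, 8.5]
```

In software the multiply-accumulates in each row run one after another; the claim in the text is that the optical version performs them concurrently because the channels share a waveguide without interfering.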

Zero thermal throttling
Photons generate near-zero heat. No TDP ceiling, no frequency scaling.
Framework-transparent
Standard PyTorch, JAX, and ONNX runtimes. No model rewrite required.
In-memory compute
Near-memory architecture eliminates intermediate storage reads.
Peak throughput: 160K TOPS
Energy efficiency: 300 TOPS/W
Activation response: 1.33 ns
Native precision: 4–8 bit
INFERENCE PIPELINE
Input (PyTorch/JAX) → WDM Encode (Multi-λ) → OPU Compute (160K TOPS) → Activation (1.33 ns) → Output (< 1 ms)
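
The 4–8 bit native precision listed above matters for accuracy as well as speed. The page does not say which number format is used, so the sketch below assumes plain symmetric uniform quantization and shows how to compare a dot product computed at 4 and 8 bits against the full-precision result. The `quantize` helper and the sample vectors are illustrative only.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantization to a signed `bits`-bit grid.
    Returns the dequantized values so errors can be compared directly."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed grid")
    levels = 2 ** (bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return [0.0 for _ in values]
    scale = peak / levels                     # size of one quantization step
    return [round(v / scale) * scale for v in values]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Illustrative weights and activations, not taken from any real model.
weights = [0.82, -0.41, 0.07, -0.93, 0.56]
activations = [1.5, -0.25, 0.75, 2.0, -1.0]

exact = dot(weights, activations)
for bits in (4, 8):
    approx = dot(quantize(weights, bits), quantize(activations, bits))
    print(f"{bits}-bit: exact={exact:.4f} approx={approx:.4f} "
          f"abs_err={abs(exact - approx):.4f}")
```

Running the same comparison on a model's real weights and a validation set is a reasonable first check of whether low-precision execution is acceptable for a given workload.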

Real workloads. Verified numbers.

The following benchmarks are from production deployments, not lab conditions. Each result was verified by the customer's engineering team.

ML Platform Engineering
Meridian AI
Llama-3 70B · PyTorch 2.3
P99 Latency: 18.4 ms before · 0.9 ms after · 20.4× faster

"We were running 48 H100s for our inference cluster. After migrating to Photon, we decommissioned 44 of them. Same throughput, zero procurement queue."

CTO Office
Vertex Systems
Custom MoE 12B · JAX
Monthly Compute Spend: $218K/mo before · $11K/mo after · 94.9% reduction

"The board was asking questions about our cloud bill every quarter. After the Photon migration, that line item disappeared from the conversation entirely."

Infrastructure Lead
Cascade Labs
Whisper Large v3 · ONNX
Power Draw: 840 W rack before · 48 W rack after · 17.5× efficiency

"Our data center had a power density problem — we were hitting limits. Photon let us triple our inference capacity in the same rack footprint."

COMPATIBLE FRAMEWORKS & RUNTIMES
PyTorch 2.3 · JAX 0.4 · ONNX Runtime · TensorRT · Triton · vLLM · TGI · llama.cpp

Run a Benchmark on Your Model.

Three inputs. Sixty seconds. We'll return p99 latency, throughput, and projected monthly cost side-by-side against your current stack.

Results delivered via email within 60 seconds. No credit card required.

Download Full Benchmark Report

42 pages of methodology, raw data, and reproducible test harnesses. Built for engineering teams who need internal ammunition before committing to a trial.

Full methodology v3.1 + test harness code
H100 · A100 · TPU v5 · Photon side-by-side
TCO model with 12-month projections
Migration playbook for PyTorch / JAX / ONNX
Security & compliance appendix (SOC 2 Type II)
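
To get a feel for what a 12-month projection looks like before downloading the report, the sketch below multiplies the per-1M-inference prices from the comparison table by a monthly volume. It covers inference cost only, not a full TCO model, and the 50 billion inferences per month figure is a made-up placeholder to replace with your own traffic.

```python
# Cost per 1M inferences in USD, copied from the comparison table.
COST_PER_1M_USD = {
    "Photon": 0.04,
    "NVIDIA H100": 0.87,
    "Google TPU v5": 0.31,
}


def projected_cost_usd(inferences_per_month: float,
                       months: int,
                       cost_per_1m: float) -> float:
    """Flat projection: constant monthly volume and constant unit price."""
    return inferences_per_month / 1_000_000 * cost_per_1m * months


monthly_volume = 50_000_000_000   # hypothetical: 50 billion inferences per month

for platform, rate in COST_PER_1M_USD.items():
    total = projected_cost_usd(monthly_volume, 12, rate)
    print(f"{platform:14s} 12-month inference cost: ${total:,.0f}")
```

A fuller model would add migration effort, reserved-capacity discounts, and traffic growth, none of which are specified on this page.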

Work email required. PDF delivered instantly.

0.8ms p99 · 21.8× cheaper · Zero procurement queue

Photon Inference Engine — managed photonic compute