Evals

This site, evaluated honestly.

I like models with one job, and evals that prove it. It would be hypocritical to ship a website without an eval of its own. Two tables: real measurements, then an LLM-as-judge pass with a judge who is, in fairness, me.

The numbers (real)

Metric | Value | Verdict
Lighthouse · Performance (static site, optimized images, zero framework JS) | 99 / 100 | PASS
Lighthouse · Accessibility (two findings under appeal) | 96 / 100 | PASS
Lighthouse · Best practices | 100 / 100 | PASS
Lighthouse · SEO | 100 / 100 | PASS
First contentful paint | 1.2 s | PASS
Largest contentful paint | 2.0 s | PASS
Cumulative layout shift (was 0.206 at the first audit. the judge flagged it. fixed the same day.) | 0.000 | PASS
Total transfer, home (most of it is fonts and my face) | ~214 KB | PASS
Client-side frameworks (hand-written vanilla scripts only) | 0 | PASS
Em dashes shipped (hard ban, enforced at the source) | 0 | PASS
Pages built (Astro, static output) | 11 in ~1 s | PASS
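
For the curious, a ban like the em dash one can be enforced with a very small script. What follows is a minimal sketch, not the check this site actually runs: the function names, the directory you point it at, and the list of file extensions are all assumptions made for illustration.

```python
from pathlib import Path

EM_DASH = "\u2014"  # written as an escape so this file passes its own check

# Hypothetical set of extensions to scan; adjust to the project.
TEXT_SUFFIXES = {".astro", ".md", ".html", ".css", ".js", ".ts"}


def find_em_dashes(root: Path) -> list[tuple[Path, int, int]]:
    """Return (file, line number, count) for every line containing an em dash."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is not a known text file.
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            count = line.count(EM_DASH)
            if count:
                hits.append((path, lineno, count))
    return hits


def check(root: Path) -> int:
    """Print each offending line and return the total number of em dashes."""
    hits = find_em_dashes(root)
    for path, lineno, count in hits:
        print(f"{path}:{lineno}: {count} em dash(es)")
    return sum(count for _, _, count in hits)
```

A build step could call `check(Path("src"))` and exit non-zero when the result is above 0, which is one way to keep the "shipped" count at zero.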

The judge (less real)

Hero buzzword density (no 'passionate', no 'leverage', no 'journey'. clean.) | PASS
Footer jokes (barely) | PASS
Arsenal references (11 across 6 pages. flagged excessive. appeal denied.) | FLAGGED
Side-project count (acknowledged. no remediation planned.) | WONTFIX
Easter-egg discoverability (keep clicking things) | PASS
Humility (this page exists) | FAIL

Methodology: top table is a real Lighthouse run on the production build (headless Chrome, measured June 2026), plus counts from the build output. Bottom table is LLM-as-judge without human anchoring, which every eval person will tell you is malpractice. Appeals: shubhamgoel27@gmail.com.
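
The verdict column of the top table can be reproduced from a Lighthouse JSON report, which stores category scores under `categories` (as 0 to 1) and metric values under `audits[...]["numericValue"]`. The sketch below assumes cut-offs the page never states: a category score of at least 0.90, and the commonly cited "good" ranges for FCP (1.8 s), LCP (2.5 s), and CLS (0.1). The `sample` dict is a hand-built stand-in carrying the numbers from the table, not a real report.

```python
# Assumed pass thresholds; the page does not say which cut-offs it uses.
MIN_CATEGORY_SCORE = 0.90  # Lighthouse stores category scores as 0..1
METRIC_LIMITS = {
    "first-contentful-paint": 1800.0,    # milliseconds
    "largest-contentful-paint": 2500.0,  # milliseconds
    "cumulative-layout-shift": 0.1,      # unitless
}


def verdicts(report: dict) -> dict[str, str]:
    """Map each category and tracked metric in a report to PASS or FAIL."""
    out = {}
    for name, category in report["categories"].items():
        score = category["score"]  # can be None if the category errored
        passed = score is not None and score >= MIN_CATEGORY_SCORE
        out[name] = "PASS" if passed else "FAIL"
    for audit_id, limit in METRIC_LIMITS.items():
        value = report["audits"][audit_id]["numericValue"]
        out[audit_id] = "PASS" if value <= limit else "FAIL"
    return out


# Stand-in for a parsed report, filled with the values from the top table.
sample = {
    "categories": {
        "performance": {"score": 0.99},
        "accessibility": {"score": 0.96},
        "best-practices": {"score": 1.0},
        "seo": {"score": 1.0},
    },
    "audits": {
        "first-contentful-paint": {"numericValue": 1200.0},
        "largest-contentful-paint": {"numericValue": 2000.0},
        "cumulative-layout-shift": {"numericValue": 0.0},
    },
}

for name, verdict in verdicts(sample).items():
    print(f"{name}: {verdict}")
```

Under these assumed limits, swapping the layout shift value for the 0.206 from the first audit flips that row to FAIL, which matches the judge's complaint.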