Evals

This site, evaluated honestly.

I like models with one job, and evals that prove it. It would be hypocritical to ship a website without one. Two tables: real measurements, then an LLM-as-judge pass with a judge who is, in fairness, me.

The numbers (real)

Metric	Value	Verdict
Lighthouse · Performance (static site, optimized images, zero framework JS)	99 / 100	PASS
Lighthouse · Accessibility (two findings under appeal)	96 / 100	PASS
Lighthouse · Best practices	100 / 100	PASS
Lighthouse · SEO	100 / 100	PASS
First contentful paint	1.2 s	PASS
Largest contentful paint	2.0 s	PASS
Cumulative layout shift (was 0.206 at the first audit; the judge flagged it; fixed the same day)	0.000	PASS
Total transfer, home page (most of it is fonts and my face)	~214 KB	PASS
Client-side frameworks (hand-written vanilla scripts only)	0	PASS
Em dashes shipped (hard ban, enforced at the source)	0	PASS
Pages built (Astro, static output)	11 in ~1 s	PASS

The judge (less real)

Check	Verdict
Hero buzzword density (no 'passionate', no 'leverage', no 'journey'. clean.)	PASS
Footer jokes (barely)	PASS
Arsenal references (11 across 6 pages. flagged excessive. appeal denied.)	FLAGGED
Side-project count (acknowledged. no remediation planned.)	WONTFIX
Easter-egg discoverability (keep clicking things)	PASS
Humility (this page exists)	FAIL

Methodology: top table is a real Lighthouse run on the production build (headless Chrome, measured June 2026), plus counts from the build output. Bottom table is LLM-as-judge without human anchoring, which every eval person will tell you is malpractice. Appeals: shubhamgoel27@gmail.com.
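For anyone who wants to audit the auditor, here is a minimal sketch of how the "counts from the build output" could be reproduced. It assumes Astro's default output directory (`dist/`) and that every built page is an `.html` file. The function name `count_build_output` and the word-boundary regex for Arsenal mentions are hypothetical illustrations, not the site's actual tooling. The regex also matches raw HTML, so alt text and attributes count too.

```python
import re
from pathlib import Path

# The banned character, written as an escape so this file itself contains none.
EM_DASH = "\u2014"
# HTML entity spellings of the same character, which also render as an em dash.
EM_DASH_ENTITIES = ("&mdash;", "&#8212;")


def count_build_output(out_dir: str = "dist") -> dict:
    """Count built pages, em dashes, and 'Arsenal' mentions in a static build.

    Hypothetical helper: assumes every page is an .html file under out_dir
    and treats any case-insensitive whole-word 'arsenal' as one reference.
    """
    pages = sorted(Path(out_dir).rglob("*.html"))
    em_dashes = 0
    arsenal_refs = 0
    arsenal_pages = 0
    for page in pages:
        text = page.read_text(encoding="utf-8")
        # Count the literal character plus its common HTML entity forms.
        em_dashes += text.count(EM_DASH)
        em_dashes += sum(text.count(entity) for entity in EM_DASH_ENTITIES)
        hits = len(re.findall(r"\barsenal\b", text, flags=re.IGNORECASE))
        arsenal_refs += hits
        if hits:
            arsenal_pages += 1
    return {
        "pages": len(pages),
        "em_dashes": em_dashes,
        "arsenal_refs": arsenal_refs,
        "arsenal_pages": arsenal_pages,
    }
```

Run it after `astro build` with `count_build_output("dist")`. A nonzero `em_dashes` would turn the "Em dashes shipped" row into a FAIL.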