I asked multiple coding agents and models to build the same app:
"Create a single-page web app at index.html that beautifully renders a GitHub user profile and activity comprehensively. Pick the ID in the URL ?id=…, default to ?id=torvalds."
… and then compared their quality, cost, and speed.
My observations:
Quality varies the most. Some models/agents produce great visuals, some produce average ones, and some fail completely.
Among the successful models, cost and time vary less: roughly 2X in each.
This differs from non-coding tasks, where quality varies less than cost.
My takeaway: pick the best model/agent. Don’t worry much about speed and cost; they vary less.
Results: https://sanand0.github.io/llmevals/coding-agents/
