
I use Codex to write tools while I walk. Here are the merged PRs:
- Add editable system prompt
- Standardize toast notifications
- Persist form fields
- Fix SVG handling in page2md
- Add Google Tasks exporter
- Add Markdown table to CSV tool
- Replace simple alerts with toasts
- Add CSV joiner tool
- Add SpeakMD tool
This added technical debt. I spent four hours fixing the AI-generated tests and code.
What mistakes did it make?
- Inconsistency. It flips between `execCommand("copy")` and `clipboard.writeText()`. It wavers on timeouts (50 ms vs 100 ms). It doesn't always run/fix test cases. A single shared helper is sketched after this list.
- Missed edge cases. I switched a `<div>` to a `<form>`. My earlier code didn't have a `type="button"`, so clicks reloaded the page. It missed that. It also left scripts as plain `<script>` instead of the required `<script type="module">`. The markup fix is sketched after this list.
- Limited experimentation. My tests failed with an HTTP 404 because the `common/` directory wasn't served. I added `console.log`s to find this. Also, happy-dom won't handle multiple `export`s instead of a single `export { ... }`. I wrote code to verify this. Coding agents didn't run such experiments. The export pattern is sketched after this list.
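To make the inconsistency concrete, here is the kind of single clipboard helper I would want every tool to share. This is a minimal sketch, not code from the PRs; the helper name, the fallback, and the 100 ms delay are my picks.

```js
// clipboard.js - one shared helper, so tools stop flipping between APIs.
export async function copyText(text, button) {
  try {
    // Prefer the modern async Clipboard API.
    await navigator.clipboard.writeText(text);
  } catch {
    // Fall back to the legacy execCommand path (e.g. non-secure contexts).
    const textarea = document.createElement("textarea");
    textarea.value = text;
    document.body.appendChild(textarea);
    textarea.select();
    document.execCommand("copy");
    textarea.remove();
  }
  // One agreed feedback delay instead of 50 ms in one tool and 100 ms in another.
  if (button) {
    button.textContent = "Copied";
    setTimeout(() => (button.textContent = "Copy"), 100);
  }
}
```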
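The form and module-script fixes are small but easy to miss. A minimal markup sketch of both, with illustrative IDs and file names:

```html
<!-- Buttons inside a <form> default to type="submit", so a plain click reloads the page. -->
<form id="tool">
  <button type="button" id="copy">Copy</button>
</form>

<!-- A plain <script> cannot use import/export; these tools need type="module". -->
<script type="module" src="./app.js"></script>
```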
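And the export shape that tripped up happy-dom: this is my reconstruction of the pattern, with placeholder function bodies.

```js
// What the agent kept writing: several separate export statements.
//   export const parse = (md) => ...;
//   export const render = (rows) => ...;

// What worked under happy-dom in my tests: define first, export once.
const parse = (md) => md.trim().split("\n");
const render = (rows) => rows.join(",");

export { parse, render };
```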
What can we do about it?
Four things could have helped me:
- Detailed coding rules. E.g. always run test cases and fix until they pass. Only use ESM. Always import from CDN via jsDelivr. That sort of thing. A sample import is sketched after this list.
- 100% test coverage. Ideally 100% of code and all usage scenarios. One way to enforce this is sketched after this list.
- Log everything. My tests got an HTTP 404 because I was not serving the `common/` directory. LLMs couldn't figure this out because it was not logged. Logging everything helps humans and LLMs debug. A logging fetch wrapper is sketched after this list.
- Wait. LLMs and coding agents keep improving. A few months down the line, they'll run more experiments themselves.
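A rule like "ESM only, import from jsDelivr" is easy to state concretely. A minimal sketch of what I mean; the package and version are illustrative, not taken from the actual tools:

```js
// app.js - loaded via <script type="module">, so import/export just work.
// ESM only, with CDN imports from jsDelivr's +esm builds.
import { marked } from "https://cdn.jsdelivr.net/npm/marked@12/+esm";

export function renderPreview(markdown, target) {
  target.innerHTML = marked.parse(markdown);
}
```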
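For the coverage rule, a threshold in the test runner config turns "100%" from an aspiration into a failing build. A sketch assuming a Vitest setup with the happy-dom environment mentioned above; adjust for whatever runner the tools actually use.

```js
// vitest.config.js - fail the run if coverage drops below 100%.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    environment: "happy-dom",
    coverage: {
      provider: "v8",
      thresholds: { lines: 100, functions: 100, branches: 100, statements: 100 },
    },
  },
});
```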
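And for logging: even a thin wrapper around fetch would have surfaced the 404 immediately. A minimal sketch; the helper name is mine.

```js
// fetch-logged.js - log every request and every non-OK response, so 404s cannot hide.
export async function fetchLogged(url, options = {}) {
  console.log(`fetch ${options.method ?? "GET"} ${url}`);
  const response = await fetch(url, options);
  if (!response.ok) {
    // This line would have pointed straight at the unserved common/ directory.
    console.error(`fetch failed: ${response.status} ${response.statusText} for ${url}`);
  }
  return response;
}
```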
Was AI coding worth the effort? Here, yes. The tools worked. Codex saved me 90% of the effort. My code-quality obsession reduced the savings to ~70%. Still huge.