
Part 2 covered skills and memory. Today I actually wrote code - and learned that having one AI review another AI’s work catches bugs neither would find alone.
Oh, and I renamed the project. TimeBlocker became ProFocusWork. Better branding, same concept.
Phases 1 and 2: Done
With the documentation and workflow infrastructure from Parts 1 and 2, actual implementation moved fast:
Phase 1: Monorepo Foundation
- pnpm workspace with Turborepo
- Shared TypeScript config
- ESLint 9 flat config
- packages/types and packages/utils
Phase 2: Convex Backend
- Full schema: users, projects, tasks, subtasks, timeBlocks, timeEntries, userSettings, pomodoroSessions, calendarConnections
- CRUD mutations for all entities
- Helper functions: requireAuth, getCurrentUser, verifyOwnership
- Custom validators
The Convex backend agent handled most of it. I’d describe what I needed, it would implement, and the Distinguished Engineer agent would review. Standard workflow from Part 1.
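The helper functions listed above aren't shown in this post. As a rough sketch of what an ownership check in the spirit of verifyOwnership might look like, here is a plain TypeScript version. It assumes every owned document stores a `userId` field and that the helper throws on a missing or foreign document; both are assumptions for illustration, as are the `Owned` interface and the error messages.

```typescript
// Hypothetical shape: assumes every owned document stores its owner's id.
interface Owned {
  userId: string;
}

// Sketch of an ownership check. Throws if the document is missing or
// belongs to someone else; on success, narrows `doc` to non-null.
function verifyOwnership<T extends Owned>(
  doc: T | null,
  userId: string,
  label: string,
): asserts doc is T {
  if (doc === null) {
    throw new Error(`${label} not found`);
  }
  if (doc.userId !== userId) {
    throw new Error(`Not authorized to access this ${label}`);
  }
}
```

Writing it as a TypeScript assertion function means the caller can use the document right after the check without a separate null test.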
New Agent: QA Engineer
Before diving into the review story, I also added a new agent to the roster: @qa-engineer.
The original workflow was: Implement → Distinguished Engineer Review → Done. But that’s incomplete. Code review catches design issues, not runtime bugs. I needed verification that features actually work.
The new workflow:
Implement → Engineer Review → QA Verification → Done
The QA engineer agent knows how to verify each client:
| Client | Verification Method |
|---|---|
| Convex Backend | Run pnpm test |
| Web (SvelteKit) | Playwright MCP - navigate, screenshot, interact |
| Desktop (Tauri) | Playwright MCP for UI + cargo test for Rust |
| iOS (Expo) | AXe CLI - screenshot, describe UI, tap elements |
I also added reference docs for AXe CLI (iOS Simulator automation) and Playwright MCP to CLAUDE.md, so the agents know the exact commands to run.
Then I did something different.
Enter Codex CLI
I’ve been testing OpenAI’s Codex CLI with GPT 5.2 Codex (xhigh reasoning mode). It’s good at code review - really good at finding edge cases and invariant violations.
After Phase 2 was “complete,” I opened up Codex CLI in a separate terminal and asked it to review the changes. The workflow is manual: I describe what was built, paste relevant code snippets, and ask for a critical review.
Me: Review the timeEntries.ts and timeBlocks.ts mutations.
Look for state invariant violations and edge cases.
Codex: [analyzes code, returns findings]
The findings were uncomfortable:
| Severity | Finding |
|---|---|
| High | timeEntries.start can reactivate completed/skipped blocks, bypassing guards in timeBlocks.start |
| Medium | timeBlocks.update permits setting durationMinutes on scheduled blocks (and vice versa) |
| Medium | Pomodoro timing uses userSettings instead of session’s stored durations |
| Medium | pomodoroSessions and calendarConnections defined in schema but no CRUD modules |
| Low | Phase 2 marked completed despite pending Convex initialization |
The Distinguished Engineer agent had approved this code. Codex found five issues it missed.
The State Invariant Bug
The high severity finding was subtle. Here’s what the code looked like:
// timeEntries.ts - start mutation
export const start = mutation({
  handler: async (ctx, args) => {
    const block = await ctx.db.get(args.blockId);
    verifyOwnership(block, userId, 'time block');
    // Stop any currently active time entries
    // ... stops entries ...
    // Update block status to active if not already
    if (block.status !== 'active') {
      await ctx.db.patch(args.blockId, {
        status: 'active', // <-- This is the bug
        startedAt: now,
      });
    }
  },
});
The problem: if a block is completed or skipped, this code happily reactivates it. But timeBlocks.start has a guard:
// timeBlocks.ts - start mutation
if (block.status === 'completed' || block.status === 'skipped') {
  throw new Error('Cannot start a completed or skipped time block');
}
Two entry points to the same state, one guarded, one not. Classic invariant violation.
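One way to keep two entry points from drifting apart is to put the invariant in a single helper that both call before doing anything else. The sketch below is an in-memory stand-in, not Convex code: `assertBlockCanStart`, `startTimeEntry`, `startTimeBlock`, and the `"planned"` status are hypothetical names chosen for illustration.

```typescript
// Statuses named in the post, plus a hypothetical "planned" initial state.
type BlockStatus = "planned" | "active" | "completed" | "skipped";

interface TimeBlock {
  status: BlockStatus;
  startedAt?: number;
}

// Single source of truth for the invariant: finished blocks never restart.
function assertBlockCanStart(block: TimeBlock, action: string): void {
  if (block.status === "completed" || block.status === "skipped") {
    throw new Error(`Cannot ${action} a completed or skipped time block`);
  }
}

// In-memory stand-in for the timeEntries.start flow.
function startTimeEntry(block: TimeBlock, activeEntries: string[], now: number): void {
  // Guard first, so a rejected call leaves the active entries untouched.
  assertBlockCanStart(block, "start a time entry for");
  activeEntries.length = 0; // stand-in for "stop any currently active entries"
  if (block.status !== "active") {
    block.status = "active";
    block.startedAt = now;
  }
}

// In-memory stand-in for the timeBlocks.start flow, sharing the same guard.
function startTimeBlock(block: TimeBlock, now: number): void {
  assertBlockCanStart(block, "start");
  block.status = "active";
  block.startedAt = now;
}
```

With the check in one place, adding a third entry point later means calling the helper rather than remembering to copy the condition.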
The Fix Cycle
I copied Codex’s findings and pasted them into Claude Code. “Here’s a principal engineer review of the last commit.” Then Claude would fix the issues. I’d copy the updated code back to Codex for another review. Rinse, repeat.
Here’s how it went:
Round 1: Claude fixes the issues
Added the guard to timeEntries.start, fixed timeBlocks.update validation, corrected pomodoro timing, created the missing CRUD modules.
Round 2: Codex reviews the fixes
Findings:
- Medium: pomodoroSessions.start accepts override values without validation;
workMinutes can be <=0, longBreakAfter can be 0 (causes NaN in modulo)
- Medium: pomodoroSessions.start doesn't guard against completed/skipped blocks
New bugs in the fix code. The pomodoro session could be started on a finished block, and duration overrides weren’t validated.
Round 3: Claude fixes again
Added block status guard and input validation to pomodoroSessions.start.
Round 4: Codex approves
Findings:
- None. The two issues are addressed.
Recommendations:
- Consider enforcing integer inputs for pomodoro durations
Round 5: Final refinement
Added Number.isInteger() checks to prevent fractional modulo edge cases like pomodoroCount % 4.5.
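To make the modulo problem concrete, here is a small sketch of how JavaScript's remainder operator behaves with a zero or fractional divisor. `isLongBreakDue` and `assertPositiveInteger` are hypothetical helpers written for this example, not the project's actual functions.

```typescript
// Hypothetical helper: a long break is due after every `longBreakAfter` pomodoros.
function isLongBreakDue(completedCount: number, longBreakAfter: number): boolean {
  return completedCount % longBreakAfter === 0;
}

// Validation in the spirit of the fix: only positive integers pass.
function assertPositiveInteger(value: number, label: string): void {
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${label} must be a positive integer`);
  }
}

const withZero = 4 % 0;       // NaN, so the === 0 check can never pass
const withFraction = 4 % 4.5; // 4, so the 4th pomodoro is not detected
const firstHit = 9 % 4.5;     // 0, the first count a 4.5 interval matches
```

With `longBreakAfter` set to 0 the long break never triggers, and with 4.5 it first triggers after the 9th pomodoro. Rejecting both values up front is simpler than handling them in the timing logic.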
The Final Changes
Here’s what got fixed across the review cycles:
// timeEntries.ts - guard BEFORE side effects
if (block.status === 'completed' || block.status === 'skipped') {
  throw new Error('Cannot start a time entry for a completed or skipped block');
}
// timeBlocks.ts - reject cross-type fields
if (block.blockType === 'scheduled') {
  if (args.durationMinutes !== undefined) {
    throw new Error('Cannot set durationMinutes on a scheduled block');
  }
}
if (block.blockType === 'duration') {
  if (args.startTime !== undefined || args.endTime !== undefined) {
    throw new Error('Cannot set startTime/endTime on a duration block');
  }
}
// pomodoroSessions.ts - integer validation
if (args.workMinutes !== undefined) {
  if (!Number.isInteger(args.workMinutes) || args.workMinutes <= 0) {
    throw new Error('Work duration must be a positive integer');
  }
}
And two new CRUD modules: pomodoroSessions.ts (320 lines) and calendarConnections.ts (262 lines).
Why AI Reviewing AI Works
The Distinguished Engineer agent is good at checking against project constraints and patterns. It knows the architecture, enforces conventions, catches obvious mistakes.
Codex (xhigh) is good at finding edge cases, state invariants, and logical inconsistencies. It doesn’t know my project constraints, but it’s relentless about “what if this value is zero” and “what if this function is called from a different path.”
They catch different things.
| Claude’s DE Agent | Codex xhigh |
|---|---|
| Project constraint violations | State invariant bugs |
| Architecture mismatches | Edge case inputs |
| Convention enforcement | Logical inconsistencies |
| Pattern adherence | Cross-function interactions |
Running both is like having a domain expert and a pedantic QA engineer. You need both.
The Multi-AI Workflow
The workflow is manual but effective. Two terminals, two AIs, one clipboard:
┌─────────────────────────────────────────┐
│ Terminal 1: Claude Code                 │
│   - Implementation                      │
│   - Project context                     │
│   - Fixes based on review               │
└─────────────────────────────────────────┘
                     │
                     │ copy/paste
                     ▼
┌─────────────────────────────────────────┐
│ Terminal 2: Codex CLI (xhigh)           │
│   - Critical review                     │
│   - Edge case analysis                  │
│   - "What if this is zero?"             │
└─────────────────────────────────────────┘
                     │
                     │ copy/paste findings
                     ▼
                Claude fixes
                     │
                     │ copy/paste updated code
                     ▼
              Codex re-reviews
                     │
                     ▼
              Loop until clean
It’s not automated, and that’s fine. The manual step forces me to read the findings, understand them, and decide what to pass along. Sometimes Codex flags something that’s actually intentional - I can filter that out before Claude wastes time on it.
It took three Codex review passes today (the initial review plus Rounds 2 and 4). Better to catch the state invariant bug now than debug it in production.
Hardening the Workflow
After this experience, I added a new rule to CLAUDE.md:
## MANDATORY: Never Skip Reviews
**⚠️ NEVER skip the Distinguished Engineer review after ANY implementation.**
This is a hard rule with no exceptions:
1. Implementation agent completes work → MUST invoke @distinguished-engineer
2. If review requests changes → fix them → re-review until APPROVED
3. Only after approval → proceed to QA verification or commit
The Distinguished Engineer review was already part of the workflow, but it was optional in practice. Now it’s enforced in the project instructions. Claude Code reads CLAUDE.md at startup, so this rule persists across sessions.
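The rule amounts to a small state machine with no path around the review stage. The sketch below models it in TypeScript purely as an illustration; the `Stage` and `Outcome` names are hypothetical, and sending a failed QA check back to implementation is an assumption, since the rule above only spells out what happens when a review requests changes.

```typescript
type Stage = "implement" | "engineer_review" | "qa_verification" | "done";
type Outcome = "ok" | "changes_requested";

// Returns the next stage. There is deliberately no transition that
// skips "engineer_review".
function nextStage(stage: Stage, outcome: Outcome): Stage {
  switch (stage) {
    case "implement":
      return "engineer_review"; // always reviewed, whatever the outcome
    case "engineer_review":
      return outcome === "ok" ? "qa_verification" : "implement";
    case "qa_verification":
      // Assumption: a failed QA check goes back to implementation.
      return outcome === "ok" ? "done" : "implement";
    case "done":
      return "done";
  }
}
```

Because every fix re-enters at "implement", a change made in response to review is itself reviewed before it reaches QA.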
What I Learned
Different AIs have different blind spots. Claude is great at implementation and context-aware review. Codex is great at adversarial thinking and edge cases. Use both.
Guard placement matters. The timeEntries.start guard was initially placed after stopping active entries. If the check failed, the side effect already happened. Guard first, act second.
Review the fixes. The first round of fixes introduced new bugs. Always re-review after changes.
Integer validation prevents weird math. With longBreakAfter set to 4.5, 4 % 4.5 is 4, not 0, so an “is this the 4th pomodoro?” check never fires when you expect it to. Add Number.isInteger() for values used in modulo.
Current State
Phase 1: Monorepo Foundation - completed
Phase 2: Convex Backend - completed
Phase 3: Desktop App Shell - not_started
Commits:
- cc6f71d feat: add Convex backend with core schema
- 4f00a8b fix: address principal engineer review findings
Pending:
- Run `npx convex dev` to initialize and generate types
- Start Phase 3: Tauri + Svelte desktop shell
The Convex backend code is done: all CRUD modules are implemented and every code-level review finding is addressed. Convex initialization is still pending. Tomorrow, the desktop app.
Part 3 of the ProFocusWork build series. Part 4 will cover Phase 3: building the Tauri desktop shell with Svelte.