
Building ProFocusWork Part 3: Multi-AI Code Review


Part 2 covered skills and memory. Today I actually wrote code - and learned that having one AI review another AI’s work catches bugs neither would find alone.

Oh, and I renamed the project. TimeBlocker became ProFocusWork. Better branding, same concept.

Phases 1 and 2: Done

With the documentation and workflow infrastructure from Parts 1 and 2, actual implementation moved fast:

Phase 1: Monorepo Foundation

Phase 2: Convex Backend

The Convex backend agent handled most of it. I’d describe what I needed, it would implement, the Distinguished Engineer agent would review. Standard workflow from Part 1.

New Agent: QA Engineer

Before diving into the review story, I also added a new agent to the roster: @qa-engineer.

The original workflow was: Implement → Distinguished Engineer Review → Done. But that’s incomplete. Code review catches design issues, not runtime bugs. I needed verification that features actually work.

The new workflow:

Implement → Engineer Review → QA Verification → Done

The QA engineer agent knows how to verify each client:

| Client | Verification Method |
| --- | --- |
| Convex Backend | Run `pnpm test` |
| Web (SvelteKit) | Playwright MCP - navigate, screenshot, interact |
| Desktop (Tauri) | Playwright MCP for UI + `cargo test` for Rust |
| iOS (Expo) | AXe CLI - screenshot, describe UI, tap elements |

I also added reference docs for AXe CLI (iOS Simulator automation) and Playwright MCP to CLAUDE.md, so the agents know the exact commands to run.
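For the web client, the checks the QA agent runs through Playwright MCP look roughly like this when written as a plain Playwright test. This is a minimal sketch, not the actual agent flow - the dev URL, route, and selectors are placeholders rather than the real app:

```ts
import { test, expect } from '@playwright/test';

test('planner can create a time block', async ({ page }) => {
  // Hypothetical dev URL and route - the real app's paths will differ
  await page.goto('http://localhost:5173/planner');

  // Interact the way the QA agent does: click, fill, submit
  await page.getByRole('button', { name: 'New block' }).click();
  await page.getByLabel('Title').fill('Deep work');
  await page.getByRole('button', { name: 'Save' }).click();

  // Verify the result and capture evidence for the QA report
  await expect(page.getByText('Deep work')).toBeVisible();
  await page.screenshot({ path: 'qa-evidence/planner-new-block.png' });
});
```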

Then I did something different.

Enter Codex CLI

I’ve been testing OpenAI’s Codex CLI with GPT 5.2 Codex (xhigh reasoning mode). It’s good at code review - really good at finding edge cases and invariant violations.

After Phase 2 was “complete,” I opened up Codex CLI in a separate terminal and asked it to review the changes. The workflow is manual: I describe what was built, paste relevant code snippets, and ask for a critical review.

Me: Review the timeEntries.ts and timeBlocks.ts mutations.
    Look for state invariant violations and edge cases.

Codex: [analyzes code, returns findings]

The findings were uncomfortable:

| Severity | Finding |
| --- | --- |
| High | timeEntries.start can reactivate completed/skipped blocks, bypassing guards in timeBlocks.start |
| Medium | timeBlocks.update permits setting durationMinutes on scheduled blocks (and vice versa) |
| Medium | Pomodoro timing uses userSettings instead of the session’s stored durations |
| Medium | pomodoroSessions and calendarConnections defined in schema but no CRUD modules |
| Low | Phase 2 marked completed despite pending Convex initialization |

The Distinguished Engineer agent had approved this code. Codex found five issues it missed.

The State Invariant Bug

The high severity finding was subtle. Here’s what the code looked like:

// timeEntries.ts - start mutation
export const start = mutation({
  handler: async (ctx, args) => {
    const block = await ctx.db.get(args.blockId);
    verifyOwnership(block, userId, 'time block');

    // Stop any currently active time entries
    // ... stops entries ...

    // Update block status to active if not already
    if (block.status !== 'active') {
      await ctx.db.patch(args.blockId, {
        status: 'active',  // <-- This is the bug
        startedAt: now,
      });
    }
  },
});

The problem: if a block is completed or skipped, this code happily reactivates it. But timeBlocks.start has a guard:

// timeBlocks.ts - start mutation
if (block.status === 'completed' || block.status === 'skipped') {
  throw new Error('Cannot start a completed or skipped time block');
}

Two entry points to the same state, one guarded, one not. Classic invariant violation.
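One way to close that gap is to hoist the check into a shared helper that every entry point calls before any side effects. A sketch, assuming a helpers module that doesn't exist in the actual codebase (the real fix below simply adds the guard to the second mutation):

```ts
// convex/lib/guards.ts - hypothetical shared module
import { Doc } from '../_generated/dataModel';

// Single source of truth for the "can this block be (re)started?" invariant.
// Both timeBlocks.start and timeEntries.start call this before touching the DB.
export function assertBlockStartable(block: Doc<'timeBlocks'>): void {
  if (block.status === 'completed' || block.status === 'skipped') {
    throw new Error('Cannot start a completed or skipped time block');
  }
}
```

Whether the guard lives in a helper or is duplicated, the point is the same: every mutation that can flip a block to active has to enforce the invariant.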

The Fix Cycle

I copied Codex’s findings and pasted them into Claude Code. “Here’s a principal engineer review of the last commit.” Then Claude would fix the issues. I’d copy the updated code back to Codex for another review. Rinse, repeat.

Here’s how it went:

Round 1: Claude fixes the issues

Added the guard to timeEntries.start, fixed timeBlocks.update validation, corrected pomodoro timing, created the missing CRUD modules.

Round 2: Codex reviews the fixes

Findings:
- Medium: pomodoroSessions.start accepts override values without validation;
  workMinutes can be <=0, longBreakAfter can be 0 (causes NaN in modulo)
- Medium: pomodoroSessions.start doesn't guard against completed/skipped blocks

New bugs in the fix code. The pomodoro session could be started on a finished block, and duration overrides weren’t validated.
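The modulo problem is easy to demonstrate in isolation: in JavaScript, anything % 0 is NaN, so the "time for a long break?" check silently never fires, and fractional overrides are just as quietly wrong:

```ts
const pomodoroCount = 4;

// longBreakAfter = 0 slips past without validation
console.log(pomodoroCount % 0);        // NaN
console.log(pomodoroCount % 0 === 0);  // false - the long break never triggers

// Fractional overrides break the check too
console.log(pomodoroCount % 4.5);      // 4 - "every 4.5 pomodoros" never hits 0 here
```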

Round 3: Claude fixes again

Added block status guard and input validation to pomodoroSessions.start.

Round 4: Codex approves

Findings:
- None. The two issues are addressed.

Recommendations:
- Consider enforcing integer inputs for pomodoro durations

Round 5: Final refinement

Added Number.isInteger() checks to prevent fractional modulo edge cases like pomodoroCount % 4.5.

The Final Changes

Here’s what got fixed across the review cycles:

// timeEntries.ts - guard BEFORE side effects
if (block.status === 'completed' || block.status === 'skipped') {
  throw new Error('Cannot start a time entry for a completed or skipped block');
}

// timeBlocks.ts - reject cross-type fields
if (block.blockType === 'scheduled') {
  if (args.durationMinutes !== undefined) {
    throw new Error('Cannot set durationMinutes on a scheduled block');
  }
}
if (block.blockType === 'duration') {
  if (args.startTime !== undefined || args.endTime !== undefined) {
    throw new Error('Cannot set startTime/endTime on a duration block');
  }
}

// pomodoroSessions.ts - integer validation
if (args.workMinutes !== undefined) {
  if (!Number.isInteger(args.workMinutes) || args.workMinutes <= 0) {
    throw new Error('Work duration must be a positive integer');
  }
}

And two new CRUD modules: pomodoroSessions.ts (320 lines) and calendarConnections.ts (262 lines).
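The snippet above shows only the workMinutes check; the Round 3 fix paired that validation with the block-status guard inside pomodoroSessions.start. A condensed sketch of that shape, in the same excerpt style as the snippets above (the longBreakAfter argument name comes from the review findings; ownership and session-creation details are elided as in the original):

```ts
// pomodoroSessions.ts - start mutation (sketch)
export const start = mutation({
  handler: async (ctx, args) => {
    const block = await ctx.db.get(args.blockId);
    verifyOwnership(block, userId, 'time block');

    // Guard first: a finished block can never host a new pomodoro session
    if (block.status === 'completed' || block.status === 'skipped') {
      throw new Error('Cannot start a pomodoro session on a completed or skipped block');
    }

    // Validate overrides before they feed any timing math
    if (args.workMinutes !== undefined &&
        (!Number.isInteger(args.workMinutes) || args.workMinutes <= 0)) {
      throw new Error('Work duration must be a positive integer');
    }
    if (args.longBreakAfter !== undefined &&
        (!Number.isInteger(args.longBreakAfter) || args.longBreakAfter <= 0)) {
      throw new Error('longBreakAfter must be a positive integer');
    }

    // ... create the session ...
  },
});
```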

Why AI Reviewing AI Works

The Distinguished Engineer agent is good at checking against project constraints and patterns. It knows the architecture, enforces conventions, catches obvious mistakes.

Codex (xhigh) is good at finding edge cases, state invariants, and logical inconsistencies. It doesn’t know my project constraints, but it’s relentless about “what if this value is zero” and “what if this function is called from a different path.”

They catch different things.

| Claude’s DE Agent | Codex xhigh |
| --- | --- |
| Project constraint violations | State invariant bugs |
| Architecture mismatches | Edge case inputs |
| Convention enforcement | Logical inconsistencies |
| Pattern adherence | Cross-function interactions |

Running both is like having a domain expert and a pedantic QA engineer. You need both.

The Multi-AI Workflow

The workflow is manual but effective. Two terminals, two AIs, one clipboard:

┌─────────────────────────────────────────────────────────┐
│  Terminal 1: Claude Code                                │
│  - Implementation                                       │
│  - Project context                                      │
│  - Fixes based on review                                │
└─────────────────────────────────────────────────────────┘

                    │ copy/paste

┌─────────────────────────────────────────────────────────┐
│  Terminal 2: Codex CLI (xhigh)                          │
│  - Critical review                                      │
│  - Edge case analysis                                   │
│  - "What if this is zero?"                              │
└─────────────────────────────────────────────────────────┘

                    │ copy/paste findings

              Claude fixes

                    │ copy/paste updated code

              Codex re-reviews


              Loop until clean

It’s not automated, and that’s fine. The manual step forces me to read the findings, understand them, and decide what to pass along. Sometimes Codex flags something that’s actually intentional - I can filter that out before Claude wastes time on it.

It took three Codex review passes today. Better to catch the state invariant bug now than debug it in production.

Hardening the Workflow

After this experience, I added a new rule to CLAUDE.md:

## MANDATORY: Never Skip Reviews

**⚠️ NEVER skip the Distinguished Engineer review after ANY implementation.**

This is a hard rule with no exceptions:
1. Implementation agent completes work → MUST invoke @distinguished-engineer
2. If review requests changes → fix them → re-review until APPROVED
3. Only after approval → proceed to QA verification or commit

The Distinguished Engineer review was already part of the workflow, but it was optional in practice. Now it’s enforced in the project instructions. Claude Code reads CLAUDE.md at startup, so this rule persists across sessions.

What I Learned

Different AIs have different blind spots. Claude is great at implementation and context-aware review. Codex is great at adversarial thinking and edge cases. Use both.

Guard placement matters. The timeEntries.start guard was initially placed after stopping active entries. If the check failed, the side effect already happened. Guard first, act second.
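A stripped-down before/after of that ordering mistake (stopActiveEntries stands in for the "... stops entries ..." section elided earlier; it isn't a real helper in the codebase):

```ts
// Before: the side effect runs even when the guard would have rejected the call
await stopActiveEntries(ctx, userId);
if (block.status === 'completed' || block.status === 'skipped') {
  throw new Error('Cannot start a time entry for a completed or skipped block');
}

// After: guard first, act second - a rejected call leaves no trace
if (block.status === 'completed' || block.status === 'skipped') {
  throw new Error('Cannot start a time entry for a completed or skipped block');
}
await stopActiveEntries(ctx, userId);
```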

Review the fixes. The first round of fixes introduced new bugs. Always re-review after changes.

Integer validation prevents weird math. 4.5 % 4 = 0.5, not what you expect when checking “is this the 4th pomodoro?” Add Number.isInteger() for values used in modulo.

Current State

Phase 1: Monorepo Foundation - completed
Phase 2: Convex Backend - completed
Phase 3: Desktop App Shell - not_started

Commits:
- cc6f71d feat: add Convex backend with core schema
- 4f00a8b fix: address principal engineer review findings

Pending:
- Run `npx convex dev` to initialize and generate types
- Start Phase 3: Tauri + Svelte desktop shell

The Convex backend is done. All CRUD modules implemented, all review findings addressed. Tomorrow, the desktop app.


Part 3 of the ProFocusWork build series. Part 4 will cover Phase 3: building the Tauri desktop shell with Svelte.