# Pitfalls Research
**Domain:** Real estate broker web app — v1.1 additions: AI field placement, expanded field types, agent saved signature, filled document preview
**Researched:** 2026-03-21
**Confidence:** HIGH (all pitfalls grounded in the actual v1.0 codebase reviewed; no speculative claims)
---
## Context: What v1.1 Is Adding to the Existing System
The v1.0 codebase has been reviewed. Key facts that shape every pitfall below:
- `SignatureFieldData` (schema.ts) has **no `type` field** — it stores only `{ id, page, x, y, width, height }`. Every field is treated as a signature.
- `FieldPlacer.tsx` has **one draggable token** labeled "Signature" — no other field types exist in the palette.
- `SigningPageClient.tsx` **iterates `signatureFields`** and opens the signature modal for every field. It has no concept of field type.
- `embed-signature.ts` **only draws PNG images** — no logic for text, checkboxes, or dates.
- `prepare-document.ts` uses `@cantoo/pdf-lib` (confirmed import), fills AcroForm text fields and draws blue rectangles for signature placeholders. It does not handle the new field types.
- Prepared PDF paths are stored as relative local filesystem paths (not Vercel Blob URLs). The signing route builds absolute paths from these.
- Agent saved signature: no infrastructure exists yet. The v1.0 `SignatureModal` checks `localStorage` for a saved signature — that is the only "save" mechanism today, and it is per-browser only.
---
## Critical Pitfalls
### Pitfall 1: Breaking the Signing Page by Adding Field Types Without Type Discrimination
**What goes wrong:**
`SignatureFieldData` has no `type` field. `SigningPageClient.tsx` opens the signature-draw modal for every field in `signatureFields`. When new field types (text, checkbox, initials, date, agent-signature) are stored in that same array with only coordinates, the client signing page either (a) shows a signature canvas for a checkbox field, or (b) crashes with a runtime error when it encounters a field type it doesn't handle, blocking the entire signing page.
**Why it happens:**
The schema change is made on the agent side first (adding a `type` discriminant to `SignatureFieldData` and new field types to `FieldPlacer`), but the signing page is not updated in the same commit. Even one deployed document with mixed field types — sent before the signing page update — will be broken for that client.
**How to avoid:**
Add `type` to `SignatureFieldData` as a string literal union **before** any field placement UI changes ship. Make the signing page's field renderer branch on `type` defensively: unknown types default to a placeholder ("not required") rather than throwing. Ship both changes atomically — schema migration, `FieldPlacer` update, and `SigningPageClient` update must be deployed together. Never have a deployed state where the schema supports types the signing page doesn't handle.
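A minimal sketch of the defensive branching described above. The type names and the `renderKindFor` helper are illustrative assumptions, not the actual v1.1 schema; the point is that a missing `type` falls back to `'signature'` and an unknown `type` falls through to a placeholder instead of throwing.

```typescript
// Hypothetical type union; the names mirror the field types listed in this document.
type FieldType =
  | "signature" | "initials" | "text" | "checkbox" | "date" | "agent-signature";

interface SignatureFieldData {
  id: string;
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
  type?: FieldType; // optional because v1.0 rows were stored without a type key
}

// Hypothetical render targets for the signing page.
type RenderKind =
  | "signature-modal" | "initials-modal" | "text-input"
  | "checkbox" | "date-picker" | "hidden" | "placeholder";

// Decide how the signing page renders a field. Accepts a loose string so that
// data written by a newer deployment never crashes an older renderer.
function renderKindFor(field: { type?: string | null }): RenderKind {
  const type = field.type ?? "signature"; // legacy v1.0 fields are signatures
  switch (type) {
    case "signature": return "signature-modal";
    case "initials": return "initials-modal";
    case "text": return "text-input";
    case "checkbox": return "checkbox";
    case "date": return "date-picker";
    case "agent-signature": return "hidden"; // already embedded by the agent
    default: return "placeholder"; // unknown type: show "not required"
  }
}
```

A renderer built on a function like this can be deployed before the palette gains new tokens, which keeps the signing page ahead of the schema rather than behind it.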
**Warning signs:**
- `SignatureFieldData` in `schema.ts` gains a `type` property but `SigningPageClient.tsx` still iterates fields without branching on it.
- The FieldPlacer palette has more tokens than the signing page has rendering branches.
- A document is sent before the signing page is updated to handle the new types.
**Phase to address:**
Phase 1 of v1.1 (schema and signing page update) — must be the first change, before any AI or UI work touches field types.
---
### Pitfall 2: AI Coordinate System Mismatch — OpenAI Returns CSS-Space Percentages, pdf-lib Expects PDF Points
**What goes wrong:**
The OpenAI response for field placement will return bounding boxes in one of several formats: percentage of page (0-1 or 0-100), pixel coordinates at an assumed render resolution, or CSS-style top-left origin. The existing `SignatureFieldData` schema stores **PDF user space coordinates** (bottom-left origin, points). When the AI output is stored without conversion, every AI-placed field appears at the wrong position, often inverted on the Y axis. The mismatch is not obvious during development if you test with PDFs where fields happen to land near the correct area.
**Why it happens:**
The current `FieldPlacer.tsx` already has a correct `screenToPdfCoords` function for converting drag events. But that function takes rendered pixel dimensions as input. When AI output arrives as a JSON payload, developers mistakenly store the raw AI coordinates directly into the database without passing them through the same conversion. The sign-on-screen overlay in `SigningPageClient.tsx` then applies `getFieldOverlayStyle()` which expects PDF-space coords, producing the wrong position.
**Concrete example from the codebase:**
`screenToPdfCoords` in `FieldPlacer.tsx` computes:
```
pdfY = ((renderedH - screenY) / renderedH) * pageInfo.originalHeight
```
If the AI returns a y_min as fraction of page height from the top (0 = top), storing it directly as `field.y` means the field appears at the bottom of the page instead of the top, because PDF Y=0 is the bottom.
**How to avoid:**
Define a canonical AI output format contract before building the prompt. Use normalized coordinates (0-1 fractions from top-left) in the AI JSON response, then convert server-side using a single `aiCoordsToPagePdfSpace(norm_x, norm_y, norm_w, norm_h, pageWidthPts, pageHeightPts)` utility. This utility mirrors the existing `screenToPdfCoords` logic. Unit-test it against a known Utah purchase agreement with known field positions before shipping.
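One possible shape for that utility is sketched below. It assumes the stored `y` is the bottom edge of the field in PDF points (the existing schema may use a different anchor, so check `screenToPdfCoords` before adopting it); the `PdfRect` type and the range check are additions for illustration.

```typescript
interface PdfRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a normalized, top-left-origin box (all values in 0..1) into PDF user
// space (bottom-left origin, points). ASSUMPTION: the returned y is the bottom
// edge of the field, so the box height is subtracted as part of the Y flip.
function aiCoordsToPagePdfSpace(
  normX: number,
  normY: number,
  normW: number,
  normH: number,
  pageWidthPts: number,
  pageHeightPts: number,
): PdfRect {
  for (const v of [normX, normY, normW, normH]) {
    if (!Number.isFinite(v) || v < 0 || v > 1) {
      throw new RangeError(`Normalized coordinate out of range: ${v}`);
    }
  }
  return {
    x: normX * pageWidthPts,
    // Flip the axis: the top of the page is y = pageHeightPts in PDF space.
    y: pageHeightPts - (normY + normH) * pageHeightPts,
    width: normW * pageWidthPts,
    height: normH * pageHeightPts,
  };
}
```

For a US Letter page (612 x 792 points), a box at normalized `(0.1, 0.1)` with size `(0.2, 0.05)` maps to roughly `x = 61.2`, `y = 673.2`: near the top of the page, as expected. Rejecting out-of-range input also catches a model that answers in 0-100 percentages or pixels instead of fractions.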
**Warning signs:**
- AI-placed fields appear clustered at the bottom or top of the page regardless of document content.
- The AI integration test uses visual eyeballing rather than coordinate assertions.
- The conversion function is not covered by the existing test suite (`prepare-document.test.ts`).
**Phase to address:**
AI field placement phase — write the coordinate conversion utility and its test before the OpenAI API call is made.
---
### Pitfall 3: OpenAI Token Limits on Large Utah Real Estate PDFs
**What goes wrong:**
Utah standard real estate forms (REPC, listing agreements, buyer representation agreements) are 10-30 pages. Sending the raw PDF bytes or a base64-encoded PDF to GPT-4o-mini will immediately hit the 128k context window limit for multi-page forms, or produce truncated/hallucinated field detection when the document is silently cut off mid-content. GPT-4o-mini's vision context limit is further constrained by image tokens: a single PDF page rendered at 72 DPI costs roughly 1,700 tokens, so a 20-page document at standard resolution consumes ~34,000 tokens before any prompt text.
**Why it happens:**
Developers prototype with short test PDFs (2-3 pages) where the approach works, then discover it fails on production forms. The failure mode is not a hard error: the API returns a response, but field positions are wrong or missing because the model never saw the later pages.
**How to avoid:**
Page-by-page processing: render each PDF page to a base64 PNG (using `pdfjs-dist` or `sharp` on the server), send each page image in a separate API call, then merge the field results. Cap input image resolution to 1024px wide (sufficient for field detection). Set a token budget guard before each API call and log when pages approach the limit. Use structured output (JSON schema mode, not plain JSON mode) so partial responses fail loudly rather than silently returning incomplete data.
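The pipeline could be organized roughly as follows. This is a sketch under assumptions: `detectPage` is a hypothetical callback that wraps the actual OpenAI call, pages are numbered from 1, and the per-page token figure is the rough estimate quoted earlier rather than a measured value.

```typescript
interface DetectedField {
  page: number;
  type: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface PageResult {
  page: number;
  fields: DetectedField[];
}

// Rough per-page image cost taken from the estimate above; measure your own.
const TOKENS_PER_PAGE_IMAGE = 1700;

// Budget guard: refuse to send a request that is expected to exceed the budget.
function assertWithinBudget(promptTokens: number, budget: number): void {
  const estimated = promptTokens + TOKENS_PER_PAGE_IMAGE;
  if (estimated > budget) {
    throw new Error(`Estimated ${estimated} tokens exceeds budget of ${budget}`);
  }
}

// Merge per-page results in page order and fail loudly if any page is missing,
// so a dropped page can never look like "this page has no fields".
function mergePageResults(totalPages: number, results: PageResult[]): DetectedField[] {
  const byPage = new Map<number, DetectedField[]>();
  for (const r of results) byPage.set(r.page, r.fields);
  const merged: DetectedField[] = [];
  for (let page = 1; page <= totalPages; page++) {
    const fields = byPage.get(page);
    if (fields === undefined) {
      throw new Error(`No AI result for page ${page} of ${totalPages}`);
    }
    merged.push(...fields);
  }
  return merged;
}

// One request per page image. `detectPage` is a HYPOTHETICAL wrapper around
// the vision API call; it is injected so the pipeline can be tested offline.
async function analyzeDocument(
  pageImagesBase64: string[],
  detectPage: (page: number, imageBase64: string) => Promise<DetectedField[]>,
  promptTokens = 500,
  budget = 8000,
): Promise<DetectedField[]> {
  const results: PageResult[] = [];
  for (let i = 0; i < pageImagesBase64.length; i++) {
    assertWithinBudget(promptTokens, budget);
    results.push({ page: i + 1, fields: await detectPage(i + 1, pageImagesBase64[i]) });
  }
  return mergePageResults(pageImagesBase64.length, results);
}
```

Injecting `detectPage` keeps the merge and budget logic testable against a 20-page fixture without spending API calls.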
**Warning signs:**
- AI analysis is tested with only a 2-page or 3-page sample PDF.
- The implementation sends the entire PDF to OpenAI in a single request.
- Field detection success rate degrades noticeably on page 8+.
**Phase to address:**
AI integration phase — establish the page-by-page pipeline pattern before testing with real Utah forms.
---
### Pitfall 4: Prompt Design — AI Hallucinates Fields That Don't Exist or Misses Required Fields
**What goes wrong:**
Without a carefully constrained prompt, GPT-4o-mini will "helpfully" infer field locations that don't exist in the PDF (e.g., detecting a printed date as a fillable date field) or will use inconsistent field type names that don't match the application's `type` enum (`"text_input"` instead of `"text"`, `"check_box"` instead of `"checkbox"`). This produces spurious fields in the agent's document and breaks the downstream field type renderer.
**Why it happens:**
The default behavior of vision models is to be helpful and infer structure. Without explicit constraints (exact allowed types, instructions to return empty array when no fields exist, max field count), the output is non-deterministic and schema-incompatible.
**How to avoid:**
Use OpenAI's structured output (JSON schema mode) with an explicit enum for field types matching the application's type discriminant exactly. Include a negative instruction: "Only detect fields that have an explicit visual placeholder (blank line, box, checkbox square) — do not infer fields from printed text labels." Include a `confidence` score per field so the agent UI can filter low-confidence placements. Validate the response JSON against a Zod schema server-side before storing — reject the entire AI response if any field has an invalid type.
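As an illustration, the request-side schema and the server-side check might look like the sketch below. The schema name, property names, and field type list are assumptions, and the validator is hand-rolled only to keep the example dependency-free; in the application the same rules would be expressed as the Zod schema recommended above.

```typescript
// Hypothetical allowed types; must match the application's discriminant exactly.
const FIELD_TYPES = ["signature", "initials", "text", "checkbox", "date"] as const;
type AiFieldType = (typeof FIELD_TYPES)[number];

interface AiField {
  page: number;
  type: AiFieldType;
  x: number;
  y: number;
  width: number;
  height: number;
  confidence: number;
}

const NUMERIC_KEYS = ["page", "x", "y", "width", "height", "confidence"] as const;

// Value for the `response_format` request parameter (json_schema mode).
const fieldDetectionResponseFormat = {
  type: "json_schema",
  json_schema: {
    name: "detected_fields", // illustrative name
    strict: true,
    schema: {
      type: "object",
      additionalProperties: false,
      required: ["fields"],
      properties: {
        fields: {
          type: "array",
          items: {
            type: "object",
            additionalProperties: false,
            required: ["type", ...NUMERIC_KEYS],
            properties: {
              type: { type: "string", enum: [...FIELD_TYPES] },
              page: { type: "integer" },
              x: { type: "number" },
              y: { type: "number" },
              width: { type: "number" },
              height: { type: "number" },
              confidence: { type: "number" },
            },
          },
        },
      },
    },
  },
} as const;

// Reject the whole response if any single field is invalid.
function parseAiFields(raw: unknown): AiField[] {
  const fields = (raw as { fields?: unknown } | null)?.fields;
  if (!Array.isArray(fields)) throw new Error("AI response must contain a fields array");
  return fields.map((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`fields[${i}] is not an object`);
    }
    const f = item as Record<string, unknown>;
    if (!FIELD_TYPES.includes(f.type as AiFieldType)) {
      throw new Error(`fields[${i}].type is not allowed: ${String(f.type)}`);
    }
    for (const key of NUMERIC_KEYS) {
      if (typeof f[key] !== "number" || !Number.isFinite(f[key])) {
        throw new Error(`fields[${i}].${key} must be a finite number`);
      }
    }
    return f as unknown as AiField;
  });
}
```

Validating after the call still matters even with a strict schema: it is the server's last chance to reject a response before anything reaches the database.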
**Warning signs:**
- The prompt asks the model to "detect all form fields" without specifying what counts as a field.
- The response is stored directly in the database without Zod validation.
- The agent sees unexpected fields on pages with no visual placeholders.
**Phase to address:**
AI integration phase — validate prompt output against Zod before the first real Utah form is tested.
---
### Pitfall 5: Agent Saved Signature Stored as Raw DataURL — Database Bloat and Serving Risk
**What goes wrong:**
A canvas signature exported as `toDataURL('image/png')` produces a base64-encoded PNG string. A typical signature on a 400x150 canvas is 15-60KB as base64. If this is stored directly in the database (e.g., a `TEXT` column in the `users` table), every query that fetches the user row will carry 15-60KB of base64 data it may not need. More critically, if the dataURL is ever sent to the client to pre-populate a form field, it exposes the full signature as a downloadable string in page source.
**How to avoid:**
Store the signature as a file (Vercel Blob or the existing `uploads/` directory), and store only the file path/URL in the database. On the signing page and preview, serve the signature through an authenticated API route that streams the file bytes — never expose the raw dataURL to the client page. Alternatively, convert the dataURL to a `Uint8Array` immediately on the server (for PDF embedding only) and discard the string — only the file path goes to the DB.
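The "convert immediately and discard the string" step could look like this sketch. The function name is hypothetical, and `atob` is assumed to be available as a global (it is in browsers and in current Node.js releases); the resulting bytes would then be written to file storage, with only the path stored in the database.

```typescript
// Decode a PNG data URL into raw bytes so that only a file (and its path)
// is persisted, never the base64 string itself.
function pngDataUrlToBytes(dataUrl: string): Uint8Array {
  const prefix = "data:image/png;base64,";
  if (!dataUrl.startsWith(prefix)) {
    throw new Error("Expected a PNG data URL");
  }
  const binary = atob(dataUrl.slice(prefix.length));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Check the 8-byte PNG file signature before trusting the upload.
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length < signature.length || !signature.every((b, i) => bytes[i] === b)) {
    throw new Error("Payload is not a PNG image");
  }
  return bytes;
}
```

Checking the PNG signature on the server is a cheap guard against a client posting arbitrary data to the save-signature endpoint.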
**Warning signs:**
- A `savedSignatureDataUrl TEXT` column is added to the `users` table.
- The agent dashboard page fetches the user row and passes `savedSignatureDataUrl` to a React component prop.
- The signature appears in the React devtools component tree as a base64 string.
**Phase to address:**
Agent saved signature phase — establish the storage pattern (file + path, not dataURL + column) before any signature saving UI is built.
---
### Pitfall 6: Race Condition — Agent Updates Saved Signature While Client Is Mid-Signing
**What goes wrong:**
The agent draws a new saved signature and saves it while a client has the signing page open. The signing page has already loaded the signing request data (including `signatureFields`). When the agent applies their new saved signature to an agent-signature field and re-prepares the document, there are now two versions of the prepared PDF on disk: the one the client is looking at and the newly generated one. If the client submits their signature concurrently with the agent's re-preparation, `embedSignatureInPdf()` may read a partially-written prepared PDF (before the atomic rename completes) or the document may be marked "Sent" again after already being in "Viewed" state, breaking the audit trail.
**Why it happens:**
The existing prepare flow in `PreparePanel.tsx` allows re-preparation of Draft documents. Once agent signing is added, the agent can re-run preparation on a "Sent" or "Viewed" document to swap their signature, creating a mutable prepared PDF while a client session is active.
**How to avoid:**
Lock prepared documents once the first signing link is sent. Gate the agent re-prepare action behind a confirmation: "Resending will invalidate the existing signing link — the client will receive a new email." On confirmation, atomically: (1) mark the old signing token as `usedAt = now()` with reason "superseded", (2) delete the old prepared PDF (or rename to `_prepared_v1.pdf`), (3) generate a new prepared PDF, (4) issue a new signing token, (5) send a new email. This prevents mid-session clobber. The existing `embedSignatureInPdf` already uses atomic rename (`tmp → final`) which prevents partial-read corruption — preserve this.
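The state changes in that flow can be sketched as a pure function. Everything here is an assumption for illustration: the record shapes, the `supersededReason` property, and the `_prepared_v{n}.pdf` naming. A real implementation would apply the same changes inside a database transaction and then write the PDF and send the email.

```typescript
type DocStatus = "Draft" | "Sent" | "Viewed" | "Signed";

interface PreparedDoc {
  id: string;
  status: DocStatus;
  preparedVersion: number; // hypothetical counter used for versioned paths
}

interface SigningToken {
  id: string;
  documentId: string;
  usedAt: Date | null;
  supersededReason?: string; // hypothetical column
}

interface ResendPlan {
  doc: PreparedDoc;
  oldToken: SigningToken | null;
  newPreparedPath: string;
}

// Versioned path so a re-prepare never overwrites the file a client has open.
function preparedPathFor(docId: string, version: number): string {
  return `${docId}_prepared_v${version}.pdf`;
}

// Compute the next state for a re-prepare. Throws unless the agent confirmed
// that an already-sent signing link will be invalidated.
function planSupersedeAndResend(
  doc: PreparedDoc,
  activeToken: SigningToken | null,
  now: Date,
  confirmed: boolean,
): ResendPlan {
  if (doc.status === "Signed") {
    throw new Error("Signed documents cannot be re-prepared");
  }
  if ((doc.status === "Sent" || doc.status === "Viewed") && !confirmed) {
    throw new Error("Resending invalidates the existing signing link; confirmation required");
  }
  const nextVersion = doc.preparedVersion + 1;
  return {
    doc: { ...doc, status: "Sent", preparedVersion: nextVersion },
    oldToken: activeToken
      ? { ...activeToken, usedAt: now, supersededReason: "superseded" }
      : null,
    newPreparedPath: preparedPathFor(doc.id, nextVersion),
  };
}
```

Keeping the decision logic pure makes the confirmation rule easy to unit-test independently of the file system and email provider.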
**Warning signs:**
- Agent can click "Prepare and Send" on a document with status "Sent" without any confirmation dialog.
- The prepared PDF path is deterministic and overwritten in place (e.g. always `{docId}_prepared.pdf`).
- No "superseded" state exists in the `signingTokens` table.
**Phase to address:**
Agent signing phase — implement the supersede-and-resend flow before any agent signature is applied to a sent document.
---
### Pitfall 7: Filled Preview Is Served From the Same Path as the Prepared PDF — Stale Preview After Field Changes
**What goes wrong:**
The agent makes changes to field placement or pre-fill values after generating a preview. The preview file on disk is now stale. The preview URL is cached by the browser (or a CDN). The agent sees the old preview and believes the document is correct, then sends it to the client. The client receives a document with the old pre-fill values, not the updated ones.
**Why it happens:**
The existing `prepare-document.ts` writes to a deterministic path: `{docId}_prepared.pdf`. If the preview is served from the same path, any browser cache of that URL shows the old version. The agent has no visual indication that the preview is stale.
**How to avoid:**
Generate preview PDFs to a separate path with a timestamp or version suffix: `{docId}_preview_{timestamp}.pdf`. Never serve the preview from the same path as the final prepared PDF. Add a "Preview is stale — regenerate before sending" banner that appears when `signatureFields` or `textFillData` are changed after the last preview was generated. Store `lastPreviewGeneratedAt` in the document record and compare to `updatedAt`. The "Send" button should be disabled until a fresh preview has been generated (or explicitly skipped by the agent).
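A small sketch of the staleness check and versioned path follows. The `previewSkipped` flag and the helper names are assumptions; `lastPreviewGeneratedAt` and `updatedAt` are the document-record fields named above.

```typescript
interface PreviewState {
  lastPreviewGeneratedAt: Date | null;
  updatedAt: Date;
  previewSkipped?: boolean; // hypothetical flag: agent explicitly skipped the preview
}

// A preview is stale if none exists or the document changed after it was made.
function isPreviewStale(state: PreviewState): boolean {
  return (
    state.lastPreviewGeneratedAt === null ||
    state.lastPreviewGeneratedAt.getTime() < state.updatedAt.getTime()
  );
}

// The Send button is enabled only with a fresh preview or an explicit skip.
function canSend(state: PreviewState): boolean {
  return state.previewSkipped === true || !isPreviewStale(state);
}

// Timestamped path: each regeneration gets a new URL, which sidesteps caching.
function previewPathFor(docId: string, generatedAt: Date): string {
  return `${docId}_preview_${generatedAt.getTime()}.pdf`;
}
```

Because every regeneration produces a new path, browser and CDN caches cannot return an older preview for the current URL.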
**Warning signs:**
- The preview endpoint serves `/api/documents/{id}/prepared` without a cache-busting mechanism.
- The agent can modify fields after generating a preview and the preview URL does not change.
- No "stale preview" indicator exists in the UI.
**Phase to address:**
Filled document preview phase — establish the versioned preview path and staleness indicator before the first preview is rendered.
---
### Pitfall 8: Memory Issues Rendering Large PDFs for Preview on the Server
**What goes wrong:**
Generating a filled preview requires loading the PDF into memory (via `@cantoo/pdf-lib`), modifying it, and either returning the bytes for streaming or writing to disk. Utah real estate forms (REPC, addendums) can be 15-30 pages and 2-8MB as raw PDFs. Running `PDFDocument.load()` on an 8MB PDF in a Vercel serverless function that has a 256MB memory limit can cause OOM errors under concurrent load. The Vercel function timeout (10s default, 60s max on Pro) can also be exceeded for large PDFs with many embedded fonts.
**Why it happens:**
Developers test with a small 2-page PDF in development and the function works fine. The function hits the memory wall only when a real Utah standard form (often 20+ pages with embedded images) is processed in production.
**How to avoid:**
Do not generate the preview inline in a serverless function on every request. Instead: generate the preview once (as a write operation), store the result in the `uploads/` directory or Vercel Blob, and serve it from there. The preview generation can be triggered on-demand (agent clicks "Generate Preview") and is idempotent. Set a timeout guard: if `PDFDocument.load()` takes longer than 8 seconds, return a 504 with "Preview temporarily unavailable." Monitor the Vercel function execution time and memory in the dashboard — alert at 70% of the memory limit.
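The timeout guard might be sketched as below. `withTimeout` and `previewStatusFor` are hypothetical helpers, not part of `@cantoo/pdf-lib`. Note that racing a promise only stops waiting for it: the underlying load keeps running and keeps its memory until it finishes, so this bounds response time, not resource use.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`Timed out after ${ms} ms`);
      err.name = "TimeoutError";
      reject(err);
    }, ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Map a failure to the HTTP status the preview route should return.
function previewStatusFor(error: unknown): number {
  return error instanceof Error && error.name === "TimeoutError" ? 504 : 500;
}
```

In a route handler this would wrap the load call, for example `await withTimeout(PDFDocument.load(bytes), 8000)`, with the catch block using `previewStatusFor` to choose between 504 and 500.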
**Warning signs:**
- Preview is regenerated on every page load (no stored preview file).
- The preview route calls `PDFDocument.load()` within a synchronous request handler.
- Tests only use PDFs smaller than 2MB.
**Phase to address:**
Filled document preview phase — establish the "generate once, serve cached" pattern from the start.
---
### Pitfall 9: Client Signing Page Confusion — Preview Shows Agent Pre-Fill but Client Signs a Different Document
**What goes wrong:**
The filled preview shows the document with all text pre-fills applied (client name, property address, price). The client signing page also renders the prepared PDF — which already contains those fills (because `prepare-document.ts` fills AcroForm fields and draws text onto the PDF). But the visual design difference between "this is a preview for review" and "this is the actual document you are signing" is unclear. If the agent generates a stale preview and the client signs a different (more recent) prepared PDF, the client believes they signed what they previewed, but the legal document has different content.
**How to avoid:**
The client signing page must always serve the **same** prepared PDF that was cryptographically hashed at prepare time. The preview the agent saw must be generated from that exact file — not a re-generation. Store the SHA-256 hash of the prepared PDF at preparation time (same pattern as the existing `pdfHash` for signed PDFs). When serving the client's signing PDF, recompute and verify the hash matches before streaming. This ties the signed document back to the exact bytes the agent previewed.
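A minimal sketch of the hash-then-verify step, using Node's built-in `crypto` module. The function names are assumptions; the stored hash is whatever was recorded when the prepared PDF was written.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the PDF bytes; computed once at prepare time and stored.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Called before streaming to the client: refuse to serve bytes that differ
// from what was hashed when the agent prepared (and previewed) the document.
function assertPreparedPdfUnchanged(bytes: Uint8Array, storedHash: string): Uint8Array {
  const actual = sha256Hex(bytes);
  if (actual !== storedHash.toLowerCase()) {
    throw new Error("Prepared PDF hash mismatch; refusing to serve");
  }
  return bytes;
}
```

A mismatch here should be treated as an integrity incident (log it and block signing) rather than silently re-preparing the document.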
**Warning signs:**
- The preview is generated by a different code path than `prepare-document.ts` (e.g., a separate PDF rendering library).
- No hash is stored for the prepared PDF, only for the signed PDF.
- The agent can re-prepare after preview generation without the signing link being invalidated.
**Phase to address:**
Filled document preview phase AND agent signing phase — hash the prepared PDF immediately after writing it (extend the existing `pdfHash` pattern from signed to prepared).
---
### Pitfall 10: Agent Signature Field Handled by Client Signing Page
**What goes wrong:**
A new `"agent-signature"` field type is added to `FieldPlacer`. The agent applies their saved signature to this field before sending. But `SigningPageClient.tsx` iterates all fields in `signatureFields` and shows a signing prompt for each one. If the agent-signature field is included in the array sent to the client, the client sees a field labeled "Signature" (or unlabeled) that is already visually signed with someone else's signature, and the progress bar counts it as an unsigned field the client must complete.
**Why it happens:**
The client signing page receives the full `signatureFields` array from the GET `/api/sign/[token]` response. The route currently returns `doc.signatureFields ?? []` without filtering. When agent-signature fields are added to the same array, they are included in the client's field list.
**Concrete location in codebase:**
```typescript
// /src/app/api/sign/[token]/route.ts, line 88
signatureFields: doc.signatureFields ?? [],
```
This sends ALL fields to the client, including any agent-filled fields.
**How to avoid:**
Filter the `signatureFields` array in the signing token GET route: only return fields where `type !== 'agent-signature'` (or more precisely, only return fields the client is expected to sign). Agent-signed fields should be pre-embedded into the `preparedFilePath` PDF during document preparation — by the time the client opens the signing link, the agent's signature is already baked into the prepared PDF as a drawn image. The `signatureFields` array sent to the client should contain only the fields the client needs to provide.
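The filter could be as small as the sketch below. The helper names and the `AGENT_ONLY_TYPES` set are assumptions; the same helper can back both the signing route response and any count of fields the client must complete.

```typescript
interface FieldLike {
  id: string;
  type?: string | null;
}

// Field types the agent completes before sending; never shown to the client.
const AGENT_ONLY_TYPES = new Set<string>(["agent-signature"]);

// Fields the client is expected to complete. Legacy fields with no type are
// treated as client signatures, matching v1.0 behavior.
function clientFacingFields<T extends FieldLike>(fields: T[] | null | undefined): T[] {
  return (fields ?? []).filter((f) => !AGENT_ONLY_TYPES.has(f.type ?? "signature"));
}

// A document with no client-facing fields gives the client nothing to sign.
function hasClientFields(fields: FieldLike[] | null | undefined): boolean {
  return clientFacingFields(fields).length > 0;
}
```

In the route, the response line would then become something like `signatureFields: clientFacingFields(doc.signatureFields)`, and the progress bar would count the filtered array rather than the raw column.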
**Warning signs:**
- The full `signatureFields` array is returned from the signing token GET without filtering by `type`.
- Agent-signed fields are stored in the same `signatureFields` JSONB column as client signature fields.
- The client progress bar shows more fields than the client is responsible for signing.
**Phase to address:**
Agent signing phase — filter the signing response by field type before the first agent-signed document is sent to a client.
---
## Technical Debt Patterns
Shortcuts that seem reasonable but create long-term problems.
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|----------|-------------------|----------------|-----------------|
| Store saved signature as dataURL in users table | No new file storage code needed | Every user query pulls 15-60KB of base64; dataURL exposed in client props | Never; use file storage from the start |
| Re-use same `_prepared.pdf` path for preview and final prepared doc | No versioning logic needed | Stale previews; no way to prove which prepared PDF the client signed | Never — versioned paths required for legal integrity |
| Return all signatureFields to client (no type filtering) | Simpler route code | Client sees agent-signature fields as required fields to complete | Never for agent-signature type; acceptable for debugging only |
| Prompt OpenAI with entire PDF as one request | Simpler prompt code | Fails silently on documents > ~8 pages; token limit hit without hard error | Acceptable only for prototyping with < 5 page test PDFs |
| Add `type` to SignatureFieldData but don't add a schema migration | Skip Drizzle migration step | Existing rows have `null` type; `signatureFields` JSONB array has mixed null/typed entries; TypeScript union breaks | Never — migrate immediately |
| Generate preview on every page load | No caching logic needed | OOM errors on large PDFs under Vercel memory limit; slow UX | Acceptable only during local development |
---
## Integration Gotchas
Common mistakes when connecting to external services.
| Integration | Common Mistake | Correct Approach |
|-------------|----------------|------------------|
| OpenAI Vision API | Sending raw PDF bytes — PDFs are not natively supported by vision models | Convert each page to PNG via pdfjs-dist on the server; send page images, not PDF bytes |
| OpenAI structured output | Using `response_format: { type: 'json_object' }` and hoping the schema matches | Use `response_format: { type: 'json_schema', json_schema: { ... } }` with the exact schema, then validate with Zod |
| `@cantoo/pdf-lib` (confirmed import in codebase) | Calling `embedPng()` with a base64 dataURL that includes the `data:image/png;base64,` prefix on systems that strip it | The existing `embed-signature.ts` already handles this correctly — preserve the pattern when adding new embed paths |
| `@cantoo/pdf-lib` flatten | Drawing rectangles before flattening causes the flattened AcroForm content to appear on top of the drawn content | The existing `prepare-document.ts` already handles order correctly (flatten first, then draw); preserve this order in any new prepare paths |
| Vercel Blob (if migrated from local uploads) | Fetching a Blob URL inside a serverless function on the same Vercel deployment causes a request to the CDN with potential cold-start latency | Use the `@vercel/blob` SDK's `get()` method rather than `fetch(blob.url)` from within API routes |
| Agent signature file serving | Serving the agent's saved signature PNG via a public URL | Gate all signature file access behind the authenticated agent API — never expose with a public Blob URL |
---
## Performance Traps
Patterns that work at small scale but fail as usage grows.
| Trap | Symptoms | Prevention | When It Breaks |
|------|----------|------------|----------------|
| OpenAI call inline with agent "AI Place Fields" button click | 10-30 second page freeze; API timeout on multi-page PDFs | Trigger AI placement as a background job; poll for completion; show progress bar | Immediately on PDFs > 5 pages |
| PDF preview generation in a synchronous serverless function | Vercel function timeout (60s max Pro); OOM on 8MB PDFs | Generate once and store; serve from storage | On PDFs > 10MB or under concurrent load |
| Storing all signatureFields JSONB on documents table without a size guard | Large JSONB column slows document list queries | Add a field count limit (max 50 fields); if AI places more, require agent review | When AI places fields on 25+ page documents with many fields per page |
| dataURL signature image in `signaturesRef.current` in SigningPageClient | Each re-render serializes 50KB+ per signature into JSON | Already handled correctly in v1.0 (ref, not state) — do not move signature data to state when adding type-based rendering | Would break at > 5 simultaneous signature fields |
---
## Security Mistakes
Domain-specific security issues beyond general web security.
| Mistake | Risk | Prevention |
|---------|------|------------|
| Agent saved signature served via a predictable or public file path | Any user who can guess the path downloads the agent's legal signature | Store under a UUID path; serve only through `GET /api/agent/signature` which verifies the better-auth session before streaming |
| AI field placement values (pre-fill text) passed to OpenAI without scrubbing | Client PII (name, email, SSN, property address) sent to OpenAI and stored in their logs | Provide only anonymized document structure to the AI (page images without personally identifiable pre-fill values); apply pre-fill values server-side after AI field detection |
| Preview PDF served at a guessable URL (e.g. `/api/documents/{id}/preview`) without auth check | Anyone with the document ID can download a prepared document containing client PII | All document file routes must verify the agent session before streaming — apply the same guard as the existing `/api/documents/[id]/download/route.ts` |
| Agent signature dataURL transmitted from client to server in an unguarded API route | Any authenticated user (if multi-agent is ever added) can overwrite the saved signature | The save-signature endpoint must verify the session user matches the signature owner — prepare for this even in solo-agent v1 |
| Stale prepared PDF served to the client after re-preparation | Client signs a document that differs from what the agent reviewed and approved | Hash prepared PDF at prepare time; verify hash before serving to client signing page |
---
## UX Pitfalls
Common user experience mistakes in this domain.
| Pitfall | User Impact | Better Approach |
|---------|-------------|-----------------|
| Preview opens in a new browser tab as a raw PDF | Agent has no context that this is a preview vs. the final document; no field overlays visible | Display preview in-app with a "PREVIEW — Fields Filled" watermark overlay on each page |
| AI-placed fields shown without a review step | Agent sends a document with misaligned AI fields to a client; client is confused by floating sign boxes | AI placement populates the FieldPlacer UI for agent review — never auto-sends; agent must manually click "Looks good, proceed" |
| "Prepare and Send" button available before the agent has placed any fields | Agent sends a blank document with no signature fields; client has nothing to sign | Disable "Prepare and Send" if `signatureFields` is empty or contains only agent-signature fields (no client fields) |
| Agent saved signature is applied but no visual confirmation is shown | Agent thinks the signature was applied; document arrives unsigned because the apply step silently failed | Show the agent's saved signature PNG in the field placer overlay immediately after apply; require explicit confirmation before the prepare step |
| Preview shows pre-filled text but not field type labels | Agent cannot distinguish a "checkbox" pre-fill from a "text" pre-fill in the visual preview | Show field type badges (small colored labels) on the preview overlay, not just the filled content |
| Client signing page shows no progress for non-signature fields (text, checkbox, date) | Client doesn't know they need to fill in text boxes or check checkboxes — sees only signature prompts | The progress bar in `SigningProgressBar.tsx` counts `signatureFields.length` — this must count all client-facing fields, not just signature-type fields |
---
## "Looks Done But Isn't" Checklist
Things that appear complete but are missing critical pieces.
- [ ] **AI field placement:** Verify the coordinate conversion unit test asserts specific PDF-space x/y values (not just "fields are returned") — eyeball testing will miss Y-axis inversion errors on Utah standard forms.
- [ ] **Expanded field types:** Verify `SigningPageClient.tsx` has a rendering branch for every type in the `SignatureFieldData` type union — not just the new FieldPlacer palette tokens. Check for the default/fallback case.
- [ ] **Agent saved signature:** Verify the saved signature is stored as a file path, not a dataURL TEXT column — check the Drizzle schema migration and confirm no `dataUrl` column was added to `users`.
- [ ] **Agent signs first:** Verify that after agent applies their signature, the agent-signature field is embedded into the prepared PDF and removed from the `signatureFields` array that gets sent to the client — not just visually hidden in the FieldPlacer.
- [ ] **Filled preview:** Verify the preview URL changes when fields or text fill values change (cache-busting via timestamp or hash in the path) — open DevTools network tab, modify a field, re-generate preview, confirm a new file is fetched.
- [ ] **Filled preview freshness gate:** Verify the "Send" button is disabled when `lastPreviewGeneratedAt < lastFieldsUpdatedAt` — test by generating a preview, changing a field, and confirming the send button becomes disabled.
- [ ] **OpenAI token limit:** Verify the AI placement works on a real 20-page Utah REPC form, not just a 2-page test PDF — check that page 15+ fields are detected with the same accuracy as page 1.
- [ ] **Schema migration:** Verify that documents created in v1.0 (where `signatureFields` JSONB has entries without a `type` key) are handled gracefully by all v1.1 code paths — add a null-safe fallback for `field.type ?? 'signature'` throughout.
---
## Recovery Strategies
When pitfalls occur despite prevention, how to recover.
| Pitfall | Recovery Cost | Recovery Steps |
|---------|---------------|----------------|
| Client received signing link but signing page crashes on new field types | HIGH | Emergency hotfix: add `field.type ?? 'signature'` fallback in SigningPageClient; deploy; invalidate old token; send new link |
| AI placed fields are wrong/inverted on first real-form test | LOW | Fix coordinate conversion unit; re-run AI placement for that document; no data migration needed |
| Agent saved signature stored as dataURL in DB | MEDIUM | Add migration: extract dataURL to file, update path column, nullify dataURL column; existing signed PDFs are unaffected |
| Preview PDF served stale after field changes | LOW | Add cache-busting query param or timestamp to preview URL; no data changes needed |
| Agent-signature field appears in client's signing field list | HIGH | Emergency hotfix: filter signatureFields in signing token GET by type; redeploy; affected in-flight signing sessions may need new tokens |
| Large PDF causes Vercel function OOM during preview generation | MEDIUM | Switch preview to background job + polling; no data migration; existing prepared PDFs are valid |
---
## Pitfall-to-Phase Mapping
How roadmap phases should address these pitfalls.
| Pitfall | Prevention Phase | Verification |
|---------|------------------|--------------|
| Breaking signing page with new field types (Pitfall 1) | Phase 1: Schema + signing page update | Deploy field type union; confirm signing page renders placeholder for unknown types; load an old v1.0 document with no type field and verify graceful fallback |
| AI coordinate system mismatch (Pitfall 2) | Phase 2: AI integration — coordinate conversion utility | Unit test with a known Utah REPC: assert specific PDF-space x/y for a known field; Y-axis inversion test |
| OpenAI token limits on large PDFs (Pitfall 3) | Phase 2: AI integration — page-by-page pipeline | Test with the longest form Teressa uses (likely 20+ page REPC); verify all pages processed |
| Prompt hallucination and schema incompatibility (Pitfall 4) | Phase 2: AI integration — Zod validation of AI response | Feed an edge-case page (all text, no form fields) and verify AI returns empty array, not hallucinated fields |
| Saved signature as dataURL in DB (Pitfall 5) | Phase 3: Agent saved signature | Confirm Drizzle schema has a path column, not a dataURL column; verify file is stored under UUID path |
| Race condition: agent updates signature mid-signing (Pitfall 6) | Phase 3: Agent saved signature + supersede flow | Confirm "Prepare and Send" on a Sent/Viewed document requires confirmation and invalidates old token |
| Stale preview after field changes (Pitfall 7) | Phase 4: Filled document preview | Modify a field after preview generation; confirm send button disables or preview refreshes |
| OOM on large PDF preview (Pitfall 8) | Phase 4: Filled document preview | Test preview generation on a 20-page REPC; monitor Vercel function memory in dashboard |
| Client signs different doc than agent previewed (Pitfall 9) | Phase 4: Filled document preview | Confirm prepared PDF is hashed at prepare time; verify hash is checked before streaming to client |
| Agent-signature field shown to client (Pitfall 10) | Phase 3: Agent signing flow | Confirm signing token GET filters out `type === 'agent-signature'` fields before returning; test with a document that has both agent and client signature fields |
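
The Pitfall 10 verification can be illustrated with a small server-side filter. This is a sketch under assumptions: the `ClientField` shape and the `clientVisibleFields` helper are hypothetical, and the snippet does not reproduce the actual signing token route. It treats a missing `type` as `'signature'` so that v1.0 documents still expose their fields to the client.

```typescript
// Hypothetical minimal field shape; only `type` matters for the filter.
interface ClientField {
  id: string;
  type?: string; // absent on v1.0 rows
}

// Remove agent-only fields before the field list leaves the server.
// Untyped legacy fields fall back to 'signature' and stay visible.
function clientVisibleFields<T extends { type?: string }>(
  fields: T[] | null | undefined,
): T[] {
  return (fields ?? []).filter(
    (field) => (field.type ?? 'signature') !== 'agent-signature',
  );
}

// Example: a document with a legacy field, an agent field, and a checkbox.
const exampleFields: ClientField[] = [
  { id: 'f1' },
  { id: 'f2', type: 'agent-signature' },
  { id: 'f3', type: 'checkbox' },
];
const visibleToClient = clientVisibleFields(exampleFields);
```

Filtering on the server rather than in the signing page component keeps agent-only field data out of the response entirely, which is the property the verification step checks.
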
---
## Sources
- Reviewed `src/lib/db/schema.ts`: `SignatureFieldData` has no `type` field; confirmed by inspection 2026-03-21
- Reviewed `src/app/sign/[token]/_components/SigningPageClient.tsx` — confirmed all fields open signature modal; no type branching
- Reviewed `src/app/portal/(protected)/documents/[docId]/_components/FieldPlacer.tsx` — confirmed single "Signature" token; `screenToPdfCoords` function confirms Y-axis inversion pattern
- Reviewed `src/lib/signing/embed-signature.ts` — confirms `@cantoo/pdf-lib` import; PNG-only embed
- Reviewed `src/lib/pdf/prepare-document.ts` — confirms AcroForm flatten-first ordering; text stamp fallback
- Reviewed `src/app/api/sign/[token]/route.ts` — confirmed `signatureFields: doc.signatureFields ?? []` sends unfiltered fields to client (line 88)
- Reviewed `src/app/portal/(protected)/documents/[docId]/_components/PreparePanel.tsx` — no guard against re-preparation of Sent/Viewed documents
- [OpenAI Vision API Token Counting](https://platform.openai.com/docs/guides/vision#calculating-costs) — image token costs confirmed; LOW tile = 85 tokens, HIGH tile adds detail tokens per 512px tile
- [OpenAI Structured Output (JSON Schema mode)](https://platform.openai.com/docs/guides/structured-outputs) — `json_schema` mode confirmed as more reliable than `json_object` for typed responses
- [Vercel Serverless Function Limits](https://vercel.com/docs/functions/runtimes/node-js#memory-and-compute) — 256MB default, 1024MB on Pro; 60s max execution on Pro
- `@cantoo/pdf-lib` confirmed as the import used (not `@pdfme/pdf-lib` or `pdf-lib`) — v1.0 codebase uses this fork throughout
---
*Pitfalls research for: Teressa Copeland Homes — v1.1 AI field placement, expanded field types, agent signing, filled preview*
*Researched: 2026-03-21*