# Phase 13: AI Field Placement and Pre-fill - Research

**Researched:** 2026-03-21
**Domain:** OpenAI GPT-4o-mini structured outputs + pdfjs-dist server-side text extraction + coordinate conversion
**Confidence:** HIGH

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|-----------------|
| AI-01 | Agent can click one button to have AI auto-place all field types (text, checkbox, initials, date, agent signature, client signature) on a PDF in correct positions | pdfjs-dist legacy build for text extraction; GPT-4o-mini structured output for field classification; aiCoordsToPagePdfSpace() for coordinate conversion; fields PUT to existing /api/documents/[id]/fields endpoint |
| AI-02 | AI pre-fills text fields with known values from the client profile (name, property address, date) | Client profile already has name + propertyAddress in DB; textFillData (`Record<string,string>` keyed by field UUID) already wired through DocumentPageClient → FieldPlacer; AI route returns both fields array and pre-fill map; agent reviews before committing |
</phase_requirements>

---

## Summary

Phase 13 adds an "AI Auto-place" button that extracts text from the PDF via pdfjs-dist (already installed), sends that text to GPT-4o-mini with a structured output schema asking for field type, page, and normalized percentage coordinates, converts those percentages to PDF user-space points (with Y-axis flip), and writes the resulting `SignatureFieldData[]` array to the existing `/api/documents/[id]/fields` PUT endpoint. The agent can review, adjust, or delete any AI-placed field before proceeding.

The `openai` npm package (v6.x, latest as of March 2026) is NOT currently installed in the project and must be added. pdfjs-dist 5.4.296, by contrast, is already present (via react-pdf), and the legacy build at `pdfjs-dist/legacy/build/pdf.mjs` is usable for server-side text extraction. The coordinate conversion formula already exists in FieldPlacer.tsx and `prepare-document.test.ts`; Phase 13 must replicate it as a named utility (`aiCoordsToPagePdfSpace`) with a dedicated unit test.

The decision recorded in STATE.md is authoritative: do NOT use `zodResponseFormat` (broken with the installed Zod v4). Use a manual `json_schema` response_format with `strict: true`, `additionalProperties: false`, and every property in `required` at every nesting level. GPT-4o-mini supports this natively (confirmed since gpt-4o-mini-2024-07-18).

**Primary recommendation:** Install the `openai` npm package, use `pdfjs-dist/legacy/build/pdf.mjs` for text extraction with `GlobalWorkerOptions.workerSrc = ''` in the Node.js route context, send extracted text to GPT-4o-mini with a manual JSON schema, convert percentage coordinates to PDF points with a Y-axis flip, write fields via the existing PUT endpoint, and have the agent review before committing.

---

## Standard Stack

### Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| openai (npm) | ^6.32.0 (latest Mar 2026) | Official OpenAI TypeScript SDK for Chat Completions API | Not yet installed; must `npm install openai`. Official SDK with strict mode structured outputs |
| pdfjs-dist | 5.4.296 (already installed) | Server-side PDF text extraction via `getTextContent()` | Already a project dependency via react-pdf. Legacy build works in Node.js route handlers |

### Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| server-only | NOT installed (not in node_modules) | Build-time guard preventing client-side import of server modules | Use comment guard `// server-only` or explicit `if (typeof window !== 'undefined') throw`, or install the `server-only` package |
| crypto.randomUUID() | Node built-in | Generate UUIDs for AI-placed field IDs | Already used in FieldPlacer for field IDs; same pattern here |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| Manual `json_schema` response_format | `zodResponseFormat` helper | zodResponseFormat is broken with Zod v4 (confirmed issues #1540, #1602, #1709); this is a locked decision in STATE.md |
| pdfjs-dist legacy build | pdf-parse, unpdf | pdfjs-dist is already installed; unpdf would add a dependency; pdf-parse is simpler but exposes less positional data |
| GPT-4o-mini | GPT-4o, GPT-4.1 | GPT-4o-mini is cheapest and supports structured outputs with 100% schema compliance; good enough for field classification |

**Installation:**
```bash
npm install openai
```

---

## Architecture Patterns

### Recommended Project Structure

```
src/
├── lib/
│   ├── ai/
│   │   ├── extract-text.ts       # pdfjs-dist server-side text extraction (server-only guard)
│   │   └── field-placement.ts    # GPT-4o-mini call + aiCoordsToPagePdfSpace() (server-only guard)
│   └── pdf/
│       ├── prepare-document.ts   # existing, unchanged
│       └── __tests__/
│           ├── prepare-document.test.ts  # existing, unchanged
│           └── ai-coords.test.ts         # NEW: unit test for aiCoordsToPagePdfSpace()
└── app/
    └── api/
        └── documents/
            └── [id]/
                └── ai-prepare/
                    └── route.ts  # NEW: POST /api/documents/[id]/ai-prepare
```

### Pattern 1: Server-Side PDF Text Extraction with pdfjs-dist Legacy Build

**What:** Import from `pdfjs-dist/legacy/build/pdf.mjs`, set `GlobalWorkerOptions.workerSrc = ''` (the empty string tells pdf.js to fall back to synchronous/fake worker mode in Node.js; this is the documented pattern for server-side use without a browser worker thread), load the PDF bytes, then iterate over the pages calling `getTextContent()`.

**When to use:** Server-only route handlers (Next.js App Router route.ts files run on Node.js).

**Important:** The project's existing client-side usage (`PdfViewer.tsx`, `PreviewModal.tsx`) sets `workerSrc = new URL('pdfjs-dist/build/pdf.worker.min.mjs', import.meta.url).toString()`; that is browser-only. The server-side module must set `workerSrc = ''` independently. Do NOT share the import or the GlobalWorkerOptions assignment across server and client.

```typescript
// Source: pdfjs-dist docs + STATE.md v1.1 Research decision
// lib/ai/extract-text.ts (server-only)

import { getDocument, GlobalWorkerOptions } from 'pdfjs-dist/legacy/build/pdf.mjs';
import { readFile } from 'node:fs/promises';

// Empty string = no worker thread (fake/synchronous worker); required for Node.js server context
GlobalWorkerOptions.workerSrc = '';

export interface PageText {
  page: number;   // 1-indexed
  text: string;   // all text items joined with spaces
  width: number;  // page width in PDF points (72 DPI)
  height: number; // page height in PDF points (72 DPI)
}

export async function extractPdfText(filePath: string): Promise<PageText[]> {
  const data = new Uint8Array(await readFile(filePath));
  const pdf = await getDocument({ data }).promise;
  const pages: PageText[] = [];

  for (let pageNum = 1; pageNum <= pdf.numPages; pageNum++) {
    const page = await pdf.getPage(pageNum);
    const viewport = page.getViewport({ scale: 1.0 });
    const textContent = await page.getTextContent();
    // Keep only items that carry a string (skips marked-content entries)
    const text = textContent.items
      .filter((item) => 'str' in item)
      .map((item) => (item as { str: string }).str)
      .join(' ');
    pages.push({
      page: pageNum,
      width: viewport.width,
      height: viewport.height,
      text,
    });
  }
  return pages;
}
```

### Pattern 2: GPT-4o-mini Structured Output with Manual JSON Schema

**What:** Use the `openai` SDK `chat.completions.create()` with `response_format: { type: 'json_schema', json_schema: { ... strict: true } }`. The schema asks the model to return an array of field placement objects with: `page` (1-indexed integer), `fieldType` (enum of 7 types), `xPct` (0-100 percentage of page width), `yPct` (0-100 percentage of page height, measured from the page TOP, because AI models think in a top-left origin), `widthPct`, `heightPct`, and `prefillValue` (always present because strict mode requires every property; an empty string when there is no value to pre-fill).

**Critical JSON Schema rule:** When `strict: true`, ALL properties must be in `required` and ALL objects must have `additionalProperties: false`. Any property missing from `required` causes an API 400 error.

**When to use:** All AI field placement requests.

```typescript
// Source: OpenAI docs + STATE.md locked decision (manual json_schema, not zodResponseFormat)
// lib/ai/field-placement.ts (server-only)

import OpenAI from 'openai';
import type { PageText } from './extract-text';
import type { SignatureFieldData, SignatureFieldType } from '@/lib/db/schema';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const FIELD_PLACEMENT_SCHEMA = {
  type: 'object',
  properties: {
    fields: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          page: { type: 'integer' },
          fieldType: { type: 'string', enum: ['text', 'checkbox', 'initials', 'date', 'client-signature', 'agent-signature', 'agent-initials'] },
          xPct: { type: 'number' },
          yPct: { type: 'number' }, // % from page TOP (AI top-left origin)
          widthPct: { type: 'number' },
          heightPct: { type: 'number' },
          prefillValue: { type: 'string' }, // only for text fields; empty string if none
        },
        required: ['page', 'fieldType', 'xPct', 'yPct', 'widthPct', 'heightPct', 'prefillValue'],
        additionalProperties: false,
      },
    },
  },
  required: ['fields'],
  additionalProperties: false,
} as const;
```

### Pattern 3: Coordinate Conversion: AI Percentage to PDF User-Space

**What:** The AI returns percentage coordinates with a top-left origin (AI models think of a page as a grid where (0,0) is top-left). PDF user-space uses a bottom-left origin with points (1 pt = 1/72 inch). Two conversions are needed:
1. Percentage to absolute points: `x = (xPct / 100) * pageWidth`
2. Y-axis flip: `y = pageHeight - (yPct / 100) * pageHeight - fieldHeight`
   (the stored y is the BOTTOM edge of the field in PDF space)

**This is the exact same formula already used in FieldPlacer.tsx** for the drag-and-drop coordinate conversion. The new `aiCoordsToPagePdfSpace()` function extracts it as a named utility, verified by a unit test.

```typescript
// Source: FieldPlacer.tsx coordinate math (lines 287-295), same formula
// lib/ai/field-placement.ts

export interface AiFieldCoords {
  page: number;
  fieldType: SignatureFieldType;
  xPct: number;      // % from left, top-left origin
  yPct: number;      // % from top, top-left origin
  widthPct: number;
  heightPct: number;
  prefillValue: string;
}

/**
 * Convert AI percentage coordinates (top-left origin) to PDF user-space points (bottom-left origin).
 *
 * pageWidth/pageHeight in PDF points (from page.getViewport({ scale: 1.0 })).
 *
 * Formula mirrors FieldPlacer.tsx handleDragEnd (lines 289-291):
 *   pdfX = (clampedX / renderedW) * pageInfo.originalWidth
 *   pdfY = ((renderedH - (clampedY + fieldHpx)) / renderedH) * pageInfo.originalHeight
 */
export function aiCoordsToPagePdfSpace(
  coords: AiFieldCoords,
  pageWidth: number,
  pageHeight: number,
): { x: number; y: number; width: number; height: number } {
  const fieldWidth = (coords.widthPct / 100) * pageWidth;
  const fieldHeight = (coords.heightPct / 100) * pageHeight;
  const screenX = (coords.xPct / 100) * pageWidth;
  const screenY = (coords.yPct / 100) * pageHeight; // screen Y from top

  const x = screenX;
  // PDF y = distance from BOTTOM. screenY is from top, so flip:
  // pdfY = pageHeight - screenY - fieldHeight (bottom edge of field)
  const y = pageHeight - screenY - fieldHeight;

  return { x, y, width: fieldWidth, height: fieldHeight };
}
```

### Pattern 4: AI Auto-place API Route

**What:** POST `/api/documents/[id]/ai-prepare`, a server-side route that orchestrates:
1. Load document + client from DB
2. Resolve PDF file path
3. Call `extractPdfText()` for all pages
4. Call GPT-4o-mini with extracted text and client profile data
5. Convert AI coords to PDF user-space for each field
6. Write `SignatureFieldData[]` back to DB via direct DB update (same as fields PUT endpoint pattern)
7. Return `{ fields, textFillData }`; the client updates local state from the response

**When to use:** Called from the "AI Auto-place" button in PreparePanel.

```typescript
// Source: pattern matches /api/documents/[id]/prepare/route.ts
// app/api/documents/[id]/ai-prepare/route.ts

import path from 'node:path';
import { eq } from 'drizzle-orm';
// auth, db, documents, and UPLOADS_DIR: import them the same way prepare/route.ts does.
// The two module paths below follow the Recommended Project Structure; classifyFieldsWithAI
// is assumed to be exported from field-placement.ts.
import { extractPdfText } from '@/lib/ai/extract-text';
import { classifyFieldsWithAI } from '@/lib/ai/field-placement';

export async function POST(
  req: Request,
  { params }: { params: Promise<{ id: string }> }
) {
  const session = await auth();
  if (!session?.user?.id) return new Response('Unauthorized', { status: 401 });

  const { id } = await params;
  const doc = await db.query.documents.findFirst({
    where: eq(documents.id, id),
    with: { client: true },
  });
  if (!doc) return Response.json({ error: 'Not found' }, { status: 404 });
  if (!doc.filePath) return Response.json({ error: 'No PDF file' }, { status: 422 });
  if (doc.status !== 'Draft') return Response.json({ error: 'Document is locked' }, { status: 403 });

  const filePath = path.join(UPLOADS_DIR, doc.filePath);
  const pageTexts = await extractPdfText(filePath);
  const { fields, textFillData } = await classifyFieldsWithAI(pageTexts, doc.client);

  // Write fields to DB (same as PUT /fields)
  const [updated] = await db
    .update(documents)
    .set({ signatureFields: fields })
    .where(eq(documents.id, id))
    .returning();

  return Response.json({ fields: updated.signatureFields, textFillData });
}
```
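
The Open Questions section recommends returning a clear 503 when `OPENAI_API_KEY` is not configured, which the route above does not yet do. A minimal sketch of that check as a pure function, so it can be unit tested without invoking the route; the name `checkOpenAiConfig` and the result shape are illustrative assumptions, not existing project code:

```typescript
// Hypothetical helper (not in the codebase): validates AI configuration
// before the route does any PDF or OpenAI work.
type OpenAiConfigResult =
  | { ok: true; apiKey: string }
  | { ok: false; status: number; error: string };

function checkOpenAiConfig(
  env: Record<string, string | undefined>,
): OpenAiConfigResult {
  // Treat a missing or whitespace-only key as "not configured".
  const apiKey = (env.OPENAI_API_KEY ?? '').trim();
  if (apiKey === '') {
    return { ok: false, status: 503, error: 'OPENAI_API_KEY is not configured' };
  }
  return { ok: true, apiKey };
}
```

In the route this could run right after the auth check: call `checkOpenAiConfig(process.env)` and, when `ok` is false, return `Response.json({ error }, { status })` before touching the PDF.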

### Pattern 5: AI Auto-place Button in PreparePanel

**What:** Add an "AI Auto-place" button to PreparePanel. On click: POST to `/api/documents/[id]/ai-prepare`, receive `{ fields, textFillData }`, and update DocumentPageClient state (`setTextFillData`, invalidate the preview token). FieldPlacer then either reloads from the DB via its existing `loadFields` useEffect or receives the updated fields as a prop. Both approaches work; refreshing from the DB is recommended to keep a single source of truth.

**Recommended approach:** After the AI route returns, force FieldPlacer to re-fetch by incrementing a reload-key prop (called `fieldReloadKey` here and `aiPlacementKey` in the Code Examples section). `router.refresh()` is already used in PreparePanel after prepare+send and is an alternative, but it re-renders Server Components while preserving client component state, so it only helps if FieldPlacer actually re-mounts or re-runs its load effect. Exposing a `reload` callback from FieldPlacer is a third option.

### Anti-Patterns to Avoid

- **DO NOT** let the server-side pdfjs-dist extraction module be imported by client components: a server-only guard is mandatory (pdfjs-dist in a server route is fine; it just must not be bundled into the client)
- **DO NOT** use `zodResponseFormat`: broken with Zod v4 (issues #1540, #1602, #1709). This is a locked decision.
- **DO NOT** use `workerSrc = new URL('...', import.meta.url)` in the server route: `import.meta.url` may not resolve correctly in all Next.js Route Handler contexts and triggers browser worker initialization
- **DO NOT** use the AI coordinates as-is without conversion: the AI returns top-left origin percentages; PDF requires bottom-left origin points
- **DO NOT** lock fields after AI placement: the agent MUST be able to review, adjust, or delete any AI-placed field (this is a success criterion)
- **DO NOT** have the AI route set document status to anything other than Draft: only the prepare route should move status to Sent

---

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| PDF text extraction | Custom PDF parser | pdfjs-dist legacy build (`getTextContent`) | Already installed; handles fonts, encodings, multi-page; tested by Mozilla |
| Structured AI output | Manual JSON parsing + regex fallback | OpenAI `json_schema` response_format with `strict: true` | 100% schema compliance guaranteed via constrained decoding; no parse failures |
| UUID generation for field IDs | Custom ID generator | `crypto.randomUUID()` | Node built-in; same pattern used in FieldPlacer and schema |
| Field type enum validation | Custom validation function | TypeScript union type `SignatureFieldType` | Already defined in schema.ts; pass as `enum` array in JSON schema |

**Key insight:** Both the PDF library used for extraction (pdfjs-dist) and the field write endpoint already exist in the project. Phase 13 is primarily a thin orchestration layer connecting them via an AI call.

---

## Common Pitfalls

### Pitfall 1: Y-Axis Inversion Bug

**What goes wrong:** The AI returns `yPct: 10`, meaning "10% from the top of the page." A developer naively converts it on a 792 pt US Letter page: `y = (10/100) * pageHeight = 79.2 pts`. Stored as y=79.2 in PDF space, that means 79.2 pts from the BOTTOM, so the field appears near the bottom, not 10% from the top.

**Why it happens:** AI models describe positions with a top-left origin. PDF user-space uses a bottom-left origin.

**How to avoid:** Use `aiCoordsToPagePdfSpace()`, which applies the flip: `y = pageHeight - screenY - fieldHeight`. The unit test validates this with US Letter dimensions.

**Warning signs:** Fields appear at the wrong vertical position; fields meant for the top of a page appear near the bottom.

### Pitfall 2: OpenAI Strict Mode Schema Requirements

**What goes wrong:** Passing a JSON schema with `strict: true` that omits a property from `required` or omits `additionalProperties: false` on a nested object. The API returns a 400 error: "Invalid schema for response_format."

**Why it happens:** OpenAI strict mode enforces that EVERY object at EVERY nesting level has ALL properties listed in `required` AND `additionalProperties: false`. The requirement cascades to nested objects.

**How to avoid:** Include `required` listing all property keys and `additionalProperties: false` on EVERY object, including the items schema inside arrays.

**Warning signs:** `400 BadRequestError: Invalid schema for response_format` in server logs.
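
Because the strict-mode rules cascade through every nesting level, they are easy to break when the schema is edited by hand. The following sketch walks a schema and reports violations before any request is sent; `findStrictViolations` and the `JsonSchema` type are hypothetical helpers written for this document (they are not part of the OpenAI SDK), and they only cover the two rules described above:

```typescript
// Minimal schema shape covering only the keywords this check inspects.
type JsonSchema = {
  type?: string;
  properties?: Record<string, JsonSchema>;
  required?: readonly string[];
  additionalProperties?: boolean;
  items?: JsonSchema;
  enum?: readonly string[];
};

// Returns one message per violation; an empty array means the schema
// satisfies both rules at every nesting level.
function findStrictViolations(schema: JsonSchema, path: string = '$'): string[] {
  const problems: string[] = [];
  if (schema.type === 'object') {
    const entries = Object.entries(schema.properties ?? {});
    const required = new Set<string>(schema.required ?? []);
    for (const [key] of entries) {
      if (!required.has(key)) problems.push(`${path}: "${key}" missing from required`);
    }
    if (schema.additionalProperties !== false) {
      problems.push(`${path}: additionalProperties must be false`);
    }
    // Recurse into each property so nested objects are checked too.
    for (const [key, child] of entries) {
      problems.push(...findStrictViolations(child, `${path}.${key}`));
    }
  }
  if (schema.type === 'array' && schema.items) {
    problems.push(...findStrictViolations(schema.items, `${path}[]`));
  }
  return problems;
}
```

A unit test could assert that `findStrictViolations(FIELD_PLACEMENT_SCHEMA)` returns an empty array, so a schema edit that would trigger the 400 fails locally first.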

### Pitfall 3: pdfjs-dist Worker in Node.js Context

**What goes wrong:** Calling `getDocument()` without setting `GlobalWorkerOptions.workerSrc = ''` in Node.js throws `Error: 'No "GlobalWorkerOptions.workerSrc" specified.'`, or pdf.js tries to spin up a worker thread and fails.

**Why it happens:** pdf.js was designed for browsers. It tries to auto-detect the worker path via `document.currentScript.src`, which is null in Node.js.

**How to avoid:** Set `GlobalWorkerOptions.workerSrc = ''` at the top of the server-side extract-text module. This enables "fake worker" (synchronous) mode in Node.js; no worker thread is needed for text extraction. Confirm this against the installed 5.4.296 legacy build during implementation; if that build rejects an empty `workerSrc`, point the option at the bundled `pdfjs-dist/legacy/build/pdf.worker.mjs` instead.

**Warning signs:** `No "GlobalWorkerOptions.workerSrc" specified` error in route handler logs.

### Pitfall 4: pdfjs-dist TypeScript Import Path

**What goes wrong:** Importing from `'pdfjs-dist/legacy/build/pdf.mjs'` without adjusting the TypeScript config. The `legacy/build/pdf.d.mts` file contains `export * from "pdfjs-dist"`, which re-exports the main types, but the import path itself may still fail declaration resolution.

**Why it happens:** The legacy build ships its declarations with a `.d.mts` extension (ESM TypeScript declaration), which some older TypeScript configurations don't pick up automatically.

**How to avoid:** Confirmed: the project already has `transpilePackages: ['react-pdf', 'pdfjs-dist']` in next.config.ts. Import `getDocument` and `GlobalWorkerOptions` from `'pdfjs-dist/legacy/build/pdf.mjs'`. If type errors appear, add `@ts-ignore` or use the main `pdfjs-dist` types (they're the same via the re-export).

**Warning signs:** TypeScript error `Could not find declaration file for 'pdfjs-dist/legacy/build/pdf.mjs'`.

### Pitfall 5: FieldPlacer State Not Reflecting AI-Placed Fields

**What goes wrong:** The AI route writes fields to the DB successfully, but FieldPlacer on screen still shows the previous (empty) fields because its local state isn't updated.

**Why it happens:** FieldPlacer loads fields once on mount via `useEffect`. The AI route write bypasses the React state.

**How to avoid:** After the AI route returns successfully, trigger FieldPlacer to re-fetch from DB. Options:
- Call `router.refresh()` (Next.js App Router). This re-renders Server Components but preserves client component state, so it only helps if FieldPlacer actually re-mounts and calls `loadFields` again
- Add a `fieldReloadKey` prop to FieldPlacer that causes its `useEffect` to re-run when incremented
- Pass the returned `fields` array directly to FieldPlacer via a prop (requires making FieldPlacer accept `initialFields`)

**Recommended:** Add the reload-key prop (named `aiPlacementKey` in the Code Examples and Open Questions sections) and increment it from an `onAiPlacement`-style callback on DocumentPageClient. This re-runs the existing `loadFields` effect without relying on a re-mount, which `router.refresh()` does not guarantee.

### Pitfall 6: Text Field Pre-fill Requires Field IDs

**What goes wrong:** The AI route returns `textFillData` keyed by "client-name" or a label string. But `textFillData` in the project is keyed by **field UUID** (SignatureFieldData.id). A label-keyed map won't match anything in `textFillData[field.id]` lookups.

**Why it happens:** Phase 12.1 wired textFillData to be keyed by `field.id` (UUID), not by label. This is a confirmed design decision in STATE.md.

**How to avoid:** When building `textFillData` from AI output, use the `id: crypto.randomUUID()` assigned to each AI-placed field in the route handler. The route handler creates the field UUIDs, so it can simultaneously build `{ [fieldId]: prefillValue }` for text fields with prefill values.

**Warning signs:** Text fill values don't appear in the preview even though the AI returned them.
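
One way to keep the IDs and the pre-fill map in sync is to build both in the same loop. The sketch below assumes simplified stand-in types (`PlacedField`, `AiField`) rather than the real `SignatureFieldData` from schema.ts, and takes the ID generator as a parameter so it can be tested deterministically; in the route handler that parameter would be `() => crypto.randomUUID()`:

```typescript
// Simplified stand-ins for illustration; the project's SignatureFieldData
// may carry additional properties.
interface PlacedField {
  id: string;
  type: string;
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface AiField {
  fieldType: string;
  page: number;
  prefillValue: string; // empty string when the model has nothing to pre-fill
  box: { x: number; y: number; width: number; height: number }; // already in PDF points
}

function buildFieldsAndPrefill(
  aiFields: AiField[],
  newId: () => string,
): { fields: PlacedField[]; textFillData: Record<string, string> } {
  const fields: PlacedField[] = [];
  const textFillData: Record<string, string> = {};
  for (const ai of aiFields) {
    const id = newId(); // the same id is used for the field and for its pre-fill key
    fields.push({ id, type: ai.fieldType, page: ai.page, ...ai.box });
    if (ai.fieldType === 'text' && ai.prefillValue !== '') {
      textFillData[id] = ai.prefillValue;
    }
  }
  return { fields, textFillData };
}
```

Because the map is keyed by the generated `id`, `textFillData[field.id]` lookups in FieldPlacer resolve without any label matching.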

---

## Code Examples

### Full Coordinate Conversion with Unit Test Target

```typescript
// Source: FieldPlacer.tsx lines 287-295 (exact formula)
// Expected unit test cases for aiCoordsToPagePdfSpace:

// US Letter: 612 × 792 pts
// AI says: text field at xPct=10, yPct=5, widthPct=30, heightPct=5 (top of page)
// Expected: x=61.2, y=792-(0.05*792)-(0.05*792) = 792 - 39.6 - 39.6 = 712.8
// i.e., field bottom edge is 712.8 pts from page bottom (near the top)

// Checkbox: AI says xPct=90, yPct=95, widthPct=3, heightPct=3 (near bottom-right)
// Expected: x = 0.90*612 = 550.8
//           fieldH = 0.03*792 = 23.76
//           y = 792 - (0.95*792) - 23.76 = 792 - 752.4 - 23.76 = 15.84
```
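
The expected values above can be turned into executable checks. A minimal sketch follows; it redeclares a local copy of the conversion (`toPdfSpace`) so that it runs standalone, whereas the real `ai-coords.test.ts` would import `aiCoordsToPagePdfSpace` from `lib/ai/field-placement.ts`. The `approx` helper and its tolerance are illustrative choices to absorb floating-point rounding:

```typescript
interface PctBox {
  xPct: number;
  yPct: number;
  widthPct: number;
  heightPct: number;
}

// Local copy of the conversion, for illustration only.
function toPdfSpace(c: PctBox, pageWidth: number, pageHeight: number) {
  const width = (c.widthPct / 100) * pageWidth;
  const height = (c.heightPct / 100) * pageHeight;
  const x = (c.xPct / 100) * pageWidth;
  // Flip from top-left origin to bottom-left origin; y is the field's bottom edge.
  const y = pageHeight - (c.yPct / 100) * pageHeight - height;
  return { x, y, width, height };
}

// Compare floats with a small tolerance instead of strict equality.
const approx = (a: number, b: number): boolean => Math.abs(a - b) < 1e-6;

const LETTER = { width: 612, height: 792 };

const textField = toPdfSpace(
  { xPct: 10, yPct: 5, widthPct: 30, heightPct: 5 },
  LETTER.width,
  LETTER.height,
);
const checkbox = toPdfSpace(
  { xPct: 90, yPct: 95, widthPct: 3, heightPct: 3 },
  LETTER.width,
  LETTER.height,
);

// Each entry is true when the converted value matches the hand-computed one.
const checks: boolean[] = [
  approx(textField.x, 61.2),
  approx(textField.y, 712.8),
  approx(checkbox.x, 550.8),
  approx(checkbox.y, 15.84),
];
```

In a test runner each entry of `checks` would become its own assertion so a failure names the coordinate that drifted.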

### Manual JSON Schema for GPT-4o-mini (Complete)

```typescript
// Source: OpenAI structured outputs docs
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    {
      role: 'system',
      content: `You are a real estate document form field extractor.
Given extracted text from a PDF page (with context about page number and dimensions),
identify where signature, text, checkbox, initials, and date fields should be placed.
Return fields as percentage positions (0-100) from the TOP-LEFT of the page.
Use these field types: text (for typed values), checkbox, initials, date, client-signature, agent-signature, agent-initials.
For text fields that match the client profile, set prefillValue to the known value. Otherwise use empty string.`,
    },
    {
      role: 'user',
      content: `Client name: ${clientName}\nProperty address: ${propertyAddress}\n\nPDF pages:\n${pagesSummary}`,
    },
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'field_placement',
      strict: true,
      schema: FIELD_PLACEMENT_SCHEMA, // defined above; all fields required, additionalProperties: false
    },
  },
});
```
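
The call above returns the structured result as a JSON string in the first choice's `message.content`, and a declined request is reported in `message.refusal`. The sketch below shows one way to unwrap it; `CompletionLike` is a deliberately minimal structural type defined here (an assumption that covers only the properties this code reads, not the SDK's full response type), and `parseFieldPlacement` is a hypothetical helper name:

```typescript
// Only the parts of the Chat Completions response that this sketch reads.
interface CompletionLike {
  choices: Array<{
    message: { content: string | null; refusal?: string | null };
  }>;
}

interface AiFieldResult {
  page: number;
  fieldType: string;
  xPct: number;
  yPct: number;
  widthPct: number;
  heightPct: number;
  prefillValue: string;
}

function parseFieldPlacement(response: CompletionLike): AiFieldResult[] {
  const first = response.choices[0];
  if (!first) throw new Error('OpenAI response contained no choices');
  const { content, refusal } = first.message;
  if (refusal) throw new Error(`Model refused the request: ${refusal}`);
  if (!content) throw new Error('OpenAI response had empty content');
  // With strict structured outputs the content should match the schema;
  // JSON.parse still throws if the output was cut off (e.g. by a token limit).
  const parsed = JSON.parse(content) as { fields: AiFieldResult[] };
  return parsed.fields;
}
```

The route can catch these errors and map them to a 502-style response, leaving the document's existing fields untouched.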

### FieldPlacer Reload After AI Placement (Recommended Pattern)

```typescript
// In DocumentPageClient.tsx: add aiPlacementKey state
const [aiPlacementKey, setAiPlacementKey] = useState(0);

// Pass to FieldPlacer via PdfViewerWrapper
// In FieldPlacer, add to loadFields useEffect dependency array:
useEffect(() => {
  loadFields();
}, [docId, aiPlacementKey]); // re-fetch when key increments

// After AI auto-place succeeds in PreparePanel:
async function handleAiAutoPlace() {
  const res = await fetch(`/api/documents/${docId}/ai-prepare`, { method: 'POST' });
  if (res.ok) {
    const { textFillData: aiTextFill } = await res.json();
    setTextFillData(prev => ({ ...prev, ...aiTextFill }));
    setAiPlacementKey(k => k + 1); // triggers FieldPlacer reload
    setPreviewToken(null);
  }
}
```

---

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| AcroForm heuristic field detection (FORMS-V2-01) | GPT-4o-mini text classification | Superseded per REQUIREMENTS.md | AI-based approach handles flat/scanned forms that have no AcroForm fields |
| `zodResponseFormat` helper | Manual `json_schema` response_format | Broken in Zod v4 (as of 2025) | Must write JSON schema manually; more verbose but reliable |
| `disableWorker = true` (pdfjs-dist v2 API) | `GlobalWorkerOptions.workerSrc = ''` | pdfjs-dist v3+ | Correct server-side Node.js pattern for current pdfjs-dist 5.x |
| Positional text extraction via bounding boxes | Text-only extraction with AI classification | Phase 13 approach | Simpler and sufficient for label classification; AI handles the spatial reasoning |

**Deprecated/outdated:**
- `PDFJS.disableWorker = true`: v2 API, removed in v3+. Use `GlobalWorkerOptions.workerSrc = ''` instead.
- `zodResponseFormat`: Works only with Zod v3. Project uses Zod v4.3.6. Use manual json_schema.

---

## Open Questions

1. **OPENAI_API_KEY environment variable**
   - What we know: The project's `.env.local` file exists but its contents are private. The openai SDK reads from `process.env.OPENAI_API_KEY` by default.
   - What's unclear: Whether OPENAI_API_KEY is already set in `.env.local` for the project.
   - Recommendation: The plan should include a step noting that OPENAI_API_KEY must be set in `.env.local`. The route handler should return a clear 503 error if OPENAI_API_KEY is not configured.

2. **Utah REPC 20-page form field accuracy**
   - What we know: AI coordinate accuracy on real Utah forms is listed as an explicit concern in STATE.md Blockers/Concerns.
   - What's unclear: How well GPT-4o-mini will perform on a dense legal document like the Utah REPC without fine-tuning. Percentage coordinates from text-only extraction (no visual) may be imprecise.
   - Recommendation: Plan 13-04 integration test is non-negotiable. Expect iteration on the system prompt. Consider truncating very long page text to stay within GPT-4o-mini context limits.

3. **Token limit for 20-page document**
   - What we know: GPT-4o-mini context window is 128K tokens. A 20-page dense legal document could approach 30,000-50,000 tokens of extracted text.
   - What's unclear: Whether sending all 20 pages in one call will stay within limits comfortably.
   - Recommendation: In `classifyFieldsWithAI`, cap page text at ~2000 chars per page (truncate with ellipsis) to stay well under limits. Total text budget: 20 × 2000 = 40,000 chars ≈ 10,000 tokens, well within 128K. The plan should include this truncation.

4. **FieldPlacer prop vs. DB reload for displaying AI fields**
   - What we know: FieldPlacer currently loads fields via `useEffect([docId])` on mount. After AI placement writes to DB, FieldPlacer needs to re-fetch.
   - What's unclear: Whether adding `aiPlacementKey` prop to FieldPlacer (which threads through PdfViewerWrapper) or using `router.refresh()` is cleaner given the current component hierarchy.
   - Recommendation: Add `aiPlacementKey` prop to FieldPlacer and thread through PdfViewerWrapper. This avoids a full page re-render from `router.refresh()` and gives more surgical control.
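
The per-page cap recommended in Open Question 3 can be sketched as follows. `PageTextLike` mirrors the `PageText` interface from Pattern 1; the helper names, the three-dot ellipsis, and the `--- Page N ---` header format of the summary string are assumptions for illustration (the document does not specify how `pagesSummary` is laid out):

```typescript
interface PageTextLike {
  page: number;
  text: string;
  width: number;
  height: number;
}

const MAX_CHARS_PER_PAGE = 2000; // per-page budget suggested in Open Question 3

// Cut overly long page text and mark the cut with an ellipsis.
function truncatePageText(text: string, maxChars: number = MAX_CHARS_PER_PAGE): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars) + '...';
}

// Build the prompt body: one header line per page (number and size in points),
// followed by that page's (possibly truncated) text.
function buildPagesSummary(pages: PageTextLike[]): string {
  return pages
    .map((p) => `--- Page ${p.page} (${p.width} x ${p.height} pt) ---\n${truncatePageText(p.text)}`)
    .join('\n\n');
}
```

With 20 pages this bounds the text portion of the prompt at roughly 40,000 characters plus headers, matching the budget estimated above.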

---

## Sources

### Primary (HIGH confidence)
- pdfjs-dist 5.4.296 package introspection: `node_modules/pdfjs-dist/legacy/build/` directory listing, `package.json` main/types fields
- Project codebase: `FieldPlacer.tsx` (coordinate formula, lines 287-295), `prepare-document.test.ts` (existing unit test pattern), `schema.ts` (SignatureFieldData types), `PreparePanel.tsx` (UI pattern for async buttons), `prepare/route.ts` (API route pattern)
- STATE.md decisions (locked): manual json_schema (not zodResponseFormat); pdfjs-dist legacy build; Zod v4 incompatibility confirmed in issues #1540 #1602 #1709
- REQUIREMENTS.md AI-01, AI-02: exact success criteria

### Secondary (MEDIUM confidence)
- [OpenAI Structured Outputs docs](https://platform.openai.com/docs/guides/structured-outputs): json_schema shape, strict mode requirements (additionalProperties: false, all fields required)
- [openai npm package](https://www.npmjs.com/package/openai): version 6.32.0 confirmed as latest March 2026
- WebSearch results confirming zodResponseFormat broken with Zod v4: GitHub issues #1540, #1602, #1709, #1739 all open/confirmed
- pdfjs-dist server-side text extraction pattern: multiple WebSearch sources confirm `GlobalWorkerOptions.workerSrc = ''` for Node.js fake-worker mode

### Tertiary (LOW confidence)
- AI coordinate accuracy on Utah REPC forms: untested; flagged as open question in STATE.md
- GPT-4o-mini token usage estimate for 20-page legal document: estimated from typical legal document density, not measured

---

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH. pdfjs-dist already installed and confirmed v5.4.296; openai v6.32.0 confirmed from npm; project uses Zod v4.3.6 (manual json_schema confirmed required)
- Architecture: HIGH. Coordinate formula confirmed from FieldPlacer.tsx source; API route pattern confirmed from prepare/route.ts and fields/route.ts; field ID keying confirmed from STATE.md Phase 12.1 decisions
- Pitfalls: HIGH. Y-axis inversion, Zod v4 zodResponseFormat breakage, worker setup, and field ID keying are all confirmed from authoritative sources (code + STATE.md + verified GitHub issues)

**Research date:** 2026-03-21
**Valid until:** 2026-04-21 (30 days; openai SDK stable; pdfjs-dist stable; Zod v4 issues open but workaround confirmed)