# Pitfalls Research
**Domain:** Real estate broker web app — v1.2 additions: multi-signer support and Docker production deployment
**Researched:** 2026-04-03
**Confidence:** HIGH — all pitfalls grounded in the v1.1 codebase reviewed directly; no speculative claims. Source code line references included throughout.

---

## Context: What v1.2 Is Adding to the Existing System
The v1.1 codebase has been reviewed in full. Key facts that make every pitfall below concrete:

- `signingTokens` table has one row per document, no `signerEmail` column. One token = one signer = current architecture.
- `SignatureFieldData` (schema.ts) stores `{ id, page, x, y, width, height, type? }` — no `signerEmail` field. All fields belong to the single signer.
- `send/route.ts` calls `createSigningToken(doc.id)` once and emails `client.email`. Multi-signer needs iteration.
- `documents.status` enum is `Draft | Sent | Viewed | Signed`. No per-signer completion state exists.
- `POST /api/sign/[token]` marks `documents.status = 'Signed'` when its one token is claimed. With multiple signers, the first signer to complete will trigger this transition prematurely.
- PDF files live at `process.cwd() + '/uploads'` — a local filesystem path. Docker containers have ephemeral filesystems by default.
- `NEXT_PUBLIC_BASE_URL` is used to construct signing URLs. Variables prefixed `NEXT_PUBLIC_` are inlined at build time in Next.js, not resolved at container startup.
- Nodemailer transporter in `signing-mailer.tsx` calls `createTransporter()` per send — healthy pattern, but reads `CONTACT_SMTP_HOST` at call time, which only works if the env var is present in the container.
- `src/lib/db/index.ts` uses `postgres(url)` with no explicit `max` connection limit. In Docker, the `postgres` npm package defaults to `10` connections per instance. Against Neon, the free tier allows 10 concurrent connections total — one container saturates this budget entirely.
- `next.config.ts` declares `serverExternalPackages: ['@napi-rs/canvas']`. This native binary must be present in the Docker image. The package ships platform-specific `.node` files selected by npm at install time. If the Docker image is built on ARM (Apple Silicon) and run on x86_64 Linux, the wrong binary is included.
- `package.json` lists `@vercel/blob` as a production dependency. It is not used anywhere in the codebase. Its presence creates a risk of accidental use in future code that would break in a non-Vercel Docker deployment.

---

## Summary
Eight risk areas for v1.2:
1. **Multi-signer completion detection** — the current "first signer marks Signed" pattern will falsely complete documents.
2. **Docker filesystem and env var** — Next.js bakes `NEXT_PUBLIC_*` at build time; container loses uploads unless a volume is mounted; `DATABASE_URL` and SMTP secrets silently absent in container.
3. **SMTP in Docker** — not a DNS problem for external SMTP services, but env var injection failure is the confirmed root cause of the reported email breakage.
4. **PDF assembly on partial completion** — the final merged PDF must only be produced once, after all signers complete, without race conditions.
5. **Token security** — multiple tokens per document opens surfaces that a single-token system didn't have.
6. **Neon connection pool exhaustion** — `postgres` npm client's default 10 connections saturates Neon's free tier connection limit in a single container.
7. **`@napi-rs/canvas` native binary** — cross-platform Docker builds break this native module without explicit platform targeting.
8. **`@vercel/blob` dead dependency** — installed but unused; its presence risks accidental use in code that would silently fail outside Vercel.
---
## Multi-Signer Pitfalls
### Pitfall 1: First Signer Marks Document "Signed" — Completion Fires Prematurely
**What goes wrong:**

`POST /api/sign/[token]` at lines 254-263 of the current route unconditionally executes:
```typescript
await db.update(documents).set({ status: 'Signed', signedAt: now, ... })
.where(eq(documents.id, payload.documentId));
```
With two signers, Signer A completes and triggers this update. The document is now `Signed`. Signer B's token is still valid; when Signer B opens their signing page, the GET handler checks `doc.signatureFields` filtered by `isClientVisibleField`, and nothing prevents Signer B from completing. Two `signature_submitted` audit events are logged for the same document, two conflicting `_signed.pdf` files may be written, and the agent receives two "document signed" emails. The final PDF hash stored in `documents.pdfHash` is from whichever signer completed last and overwrote the row.

**Why it happens:**

The single-signer assumption is load-bearing in the POST handler. Completion detection is a single UPDATE, not a query across all tokens for the document.

**How to avoid:**

Add a `signerEmail TEXT NOT NULL` column to `signingTokens`. Completion detection becomes: after claiming a token (the atomic UPDATE that prevents double-submission), query `SELECT COUNT(*) FROM signing_tokens WHERE document_id = ? AND used_at IS NULL`. If count reaches zero, all signers have completed — only then trigger final PDF assembly and agent notification. Protect this with a database transaction so the count query and the "mark Signed" update are atomic. Never set `documents.status = 'Signed'` until the zero-remaining-tokens check passes.
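The zero-remaining-tokens predicate itself is small enough to isolate and unit-test. A minimal sketch (the `TokenRow` shape and `allSignersComplete` name are illustrative, not from the codebase; the surrounding transaction wiring is assumed):

```typescript
// Reduced shape of a signing_tokens row; signerEmail is the proposed new column.
interface TokenRow {
  signerEmail: string;
  usedAt: Date | null; // null = this signer has not completed yet
}

// A document is fully signed only when every issued token has been claimed.
// An empty token list is deliberately NOT complete: a document with no
// issued tokens must never transition to 'Signed'.
function allSignersComplete(tokens: TokenRow[]): boolean {
  return tokens.length > 0 && tokens.every((t) => t.usedAt !== null);
}
```

The rows fed into this check must be read inside the same transaction that claimed the token; otherwise the race described in Pitfall 2 applies.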

**Warning signs:**

- `POST /api/sign/[token]` sets `status = 'Signed'` without first counting remaining unclaimed tokens.
- Agent receives two notification emails after a two-signer document is tested.
- `documents.signedAt` is overwritten by both signers (last-write-wins).

**Phase to address:** Multi-signer schema phase — before any send or signing UI is changed, establish the completion detection query.

---

### Pitfall 2: Race Condition — Two Signers Complete Simultaneously, Both Trigger Final PDF Assembly

**What goes wrong:**

Signer A and Signer B submit within milliseconds of each other (common if they are in the same room). Both claim their respective tokens atomically — that part works. Both then execute the "count remaining unclaimed tokens" check. If that check is not inside the same database transaction as the token claim, both reads may return 0 remaining (after the other's claim propagated), and both handlers proceed to assemble the final merged PDF simultaneously. Two concurrent writes to `{docId}_signed.pdf` corrupt the file (partial PDF bytes interleaved), or the second write silently overwrites the first.

**Why it happens:**

The atomic token claim (`UPDATE ... WHERE used_at IS NULL RETURNING`) is a single row update. The subsequent completion check is a separate query. Two handlers can interleave between those two operations.

**How to avoid:**

Use a `completionTriggeredAt TIMESTAMP` column on `documents` with a one-time-set guard:
```typescript
const won = await db.update(documents)
  .set({ completionTriggeredAt: new Date() })
  .where(and(eq(documents.id, docId), isNull(documents.completionTriggeredAt)))
  .returning({ id: documents.id });
if (won.length === 0) return; // another handler already triggered completion
// proceed to final PDF assembly
```
This is the same pattern the existing token claim uses (`UPDATE ... WHERE used_at IS NULL RETURNING`). If 0 rows returned, another handler already won the race; skip assembly silently.

**Warning signs:**

- Two concurrent POST requests for the same document produce two `_signed.pdf` files.
- The `documents` table has no `completionTriggeredAt` column.

**Phase to address:** Multi-signer schema phase — establish this pattern alongside the completion detection fix.

---

### Pitfall 3: Legacy Single-Signer Documents Break When signingTokens Gains signerEmail

**What goes wrong:**

v1.0 and v1.1 documents have one row in `signingTokens` with no `signerEmail`. When the multi-signer schema adds `signerEmail NOT NULL` to `signingTokens`, all existing token rows become invalid (null violates NOT NULL). If the column is added without a migration that backfills existing rows, all existing signing links stop working: the token lookup succeeds but any code reading `token.signerEmail` throws a null dereference.

**Why it happens:**

Drizzle migrations add the column in a single ALTER TABLE. There is no Drizzle migration command that backfills legacy data — that requires a separate SQL step in the migration file.

**How to avoid:**

Add `signerEmail` as `TEXT` (nullable) initially. Backfill existing rows with the client's email via a JOIN at migration time. Then add the NOT NULL constraint in a second migration once backfill is confirmed. Alternatively, add `signerEmail TEXT DEFAULT ''` and document that empty string means "legacy single-signer." All code reading `signerEmail` must handle the legacy empty/null case.
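A hedged sketch of the two-step migration. The table and join-path names (`signing_tokens.document_id`, `documents.client_id`, `clients.email`) are assumed from the doc's naming and must be checked against the real schema:

```sql
-- Migration 1: add the column nullable, then backfill legacy rows
-- with the client's email via a JOIN.
ALTER TABLE signing_tokens ADD COLUMN signer_email TEXT;

UPDATE signing_tokens st
SET signer_email = c.email
FROM documents d
JOIN clients c ON c.id = d.client_id
WHERE st.document_id = d.id
  AND st.signer_email IS NULL;

-- Migration 2 (run only after the backfill is verified):
ALTER TABLE signing_tokens ALTER COLUMN signer_email SET NOT NULL;
```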

**Warning signs:**

- Drizzle migration adds `signer_email TEXT NOT NULL` in one step with no `DEFAULT` and no backfill SQL.
- A v1.0 document's signing link is not tested after migration.

**Phase to address:** Multi-signer schema phase — include legacy backfill SQL in the migration script.

---

### Pitfall 4: Field-to-Signer Tag Stored in JSONB — Queries Cannot Filter by Signer Efficiently

**What goes wrong:**

`signatureFields JSONB` is an array of field objects. Adding `signerEmail` to each field object is the right call for field filtering in the signing page (already done via `isClientVisibleField`). But if the completion detection, status dashboard, or "who has signed" query tries to derive signer list from the JSONB array, it requires a Postgres JSONB containment query (`@>` or `jsonb_array_elements`). These are unindexed by default and slow on large arrays. More critically, if the agent changes a field's `signerEmail` tag after the document has been sent, the JSONB update does not cascade to any `signingTokens` rows — the token was issued for the old email.

**How to avoid:**

The authoritative list of signers and their completion state lives in `signingTokens`, not in the JSONB. `signingTokens.signerEmail` is the source of truth for "who needs to sign." The JSONB field's `signerEmail` is used only at signing-page render time to filter which fields a given signer sees. Once a document is Sent (tokens issued), the JSONB field tags are considered frozen — re-tagging fields on a Sent document is not permitted without voiding the existing tokens.

**Warning signs:**

- A query tries to derive the recipient list from `signatureFields JSONB` rather than from `signingTokens`.

**Phase to address:** Multi-signer schema phase — document this invariant in a code comment on `signingTokens`.

---
### Pitfall 5: Audit Trail Gap — No Record of Which Signer Completed Which Field
**What goes wrong:**

The current `audit_events` table has `eventType: 'signature_submitted'` at the document level. With one signer this is unambiguous. With two signers, two `signature_submitted` events are logged for the same `documentId` with no `signerEmail` on the event. The legal audit trail cannot distinguish "Seller A signed at 14:00" from "Seller B signed at 14:05" — both appear as anonymous "signature submitted" events on the same document.

**Why this matters:**

Utah e-signature law requires proof of who signed what and when. An undifferentiated audit log is a legal compliance gap (see existing LEGAL-03 compliance requirement in v1.0).

**How to avoid:**

Add `signerEmail TEXT` to `auditEvents` (nullable, to preserve backward compatibility with v1.0 events). When logging `signature_submitted` in multi-signer mode, include the `signerEmail` from the claimed token row in the event metadata. The `metadata JSONB` column already exists and can carry this without a schema change — use `metadata: { signerEmail: tokenRow.signerEmail }` as a minimum before a proper column is added.

**Warning signs:**

- Two `signature_submitted` events logged for the same `documentId` with no distinguishing field.

**Phase to address:** Multi-signer signing flow phase — include signer identity in audit events before the first multi-signer document is tested.

---

### Pitfall 6: Document Status "Viewed" Conflicts Across Signers

**What goes wrong:**

The current GET `/api/sign/[token]` sets `documents.status = 'Viewed'` when any signer opens their link (line 81 of the current route). With two signers, Signer A opens the link → document becomes Viewed. Signer A backs out without signing. Signer B hasn't even opened their link yet. Agent sees "Viewed" status and assumes both signers have engaged. If Signer A then signs, status jumps from Viewed → Signed (via the POST handler), bypassing any intermediate state. The agent has no way to know that Signer B never opened their link.

**How to avoid:**

Per-signer status belongs in `signingTokens`, not in `documents`. Add a `viewedAt TIMESTAMP` column to `signingTokens`. The GET handler sets `signingTokens.viewedAt = NOW()` for the specific token, not `documents.status`. The document-level status becomes a computed aggregate: `Draft` → `Sent` (any token issued) → `Partially Signed` (some tokens have `usedAt` set) → `Signed` (all tokens have `usedAt` set). Consider adding `Partially Signed` to the `documentStatusEnum`, or compute it in the agent dashboard query.
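The aggregate can live in the dashboard query or in application code. A sketch of the application-side version (names are illustrative; assumes the proposed `Partially Signed` state):

```typescript
type DocStatus = 'Draft' | 'Sent' | 'Partially Signed' | 'Signed';

interface TokenState {
  usedAt: Date | null; // set when the signer completes
}

// Derive document-level status from per-signer token state.
// No tokens issued yet means the document is still a draft.
function deriveStatus(tokens: TokenState[]): DocStatus {
  if (tokens.length === 0) return 'Draft';
  const signedCount = tokens.filter((t) => t.usedAt !== null).length;
  if (signedCount === 0) return 'Sent';
  return signedCount < tokens.length ? 'Partially Signed' : 'Signed';
}
```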

**Warning signs:**

- The signing GET handler writes `documents.status = 'Viewed'` instead of `signingTokens.viewedAt = NOW()`.

**Phase to address:** Multi-signer schema phase — add `viewedAt` to `signingTokens` and derive document status from token states.

---

## Docker/Deployment Pitfalls
### Pitfall 7: NEXT_PUBLIC_BASE_URL Is Baked at Build Time — Wrong URL in Production Container

**What goes wrong:**

`send/route.ts` line 35 reads:
```typescript
const baseUrl = process.env.NEXT_PUBLIC_BASE_URL ?? 'http://localhost:3000';
```
In Next.js, any variable prefixed `NEXT_PUBLIC_` is substituted at `next build` time — it becomes a string literal in the compiled JavaScript bundle. If the Docker image is built with `NEXT_PUBLIC_BASE_URL=http://localhost:3000` (or not set at all), every signing URL emailed to clients will point to `localhost:3000` regardless of what is set in the container's runtime environment. The client clicks the link and gets "connection refused."

**This is specific to `NEXT_PUBLIC_*` variables.** Server-only variables (no `NEXT_PUBLIC_` prefix) ARE read at runtime from the container environment. Mixing the two causes precisely the confusion reported in this project.

**How to avoid:**

For variables that need to be available on the server only (like `BASE_URL` for constructing server-side URLs), remove the `NEXT_PUBLIC_` prefix. `NEXT_PUBLIC_` should only be used for variables that need to reach the browser bundle. The signing URL is constructed in a server-side API route — it does not need `NEXT_PUBLIC_`. Rename to `SIGNING_BASE_URL` (no prefix), read it only in API routes, and inject it into the container environment at runtime via Docker Compose `environment:` block.
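A sketch of the runtime injection (`SIGNING_BASE_URL` is the proposed rename, not an existing variable; the domain is a placeholder):

```yaml
# docker-compose.yml — read at container start by server-side code,
# unlike NEXT_PUBLIC_* values, which are inlined at `next build` time.
services:
  app:
    environment:
      - SIGNING_BASE_URL=https://example.com  # placeholder domain
```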

**Warning signs:**

- Signing emails send but clicking the link shows a browser connection error or goes to localhost.
- `NEXT_PUBLIC_BASE_URL` is set in `docker-compose.yml` under `environment:` and the developer assumes this is sufficient — it is not, because the value was already baked in during `docker build`.

**Phase to address:** Docker deployment phase — rename the variable and audit all `NEXT_PUBLIC_` usages before building the production image.

---

### Pitfall 8: Uploads Directory Is Lost on Container Restart

**What goes wrong:**

All uploaded PDFs, prepared PDFs, and signed PDFs are written to `process.cwd() + '/uploads'`. In the Docker container, `process.cwd()` is the directory where Next.js starts — typically `/app`. The path `/app/uploads` is inside the container's writable layer, which is ephemeral. When the container is stopped and recreated (deployment, crash, `docker compose up --force-recreate`), all PDFs are gone. Signed documents that were legally executed are permanently lost. Clients cannot download their signed copies. The agent loses the audit record.

**How to avoid:**

Mount a named Docker volume at `/app/uploads` (or whatever `process.cwd()` resolves to in the container) in `docker-compose.yml`:
```yaml
services:
  app:
    volumes:
      - uploads_data:/app/uploads
volumes:
  uploads_data:
```
Verify the mount path matches `process.cwd()` inside the container — do not assume it is `/app`. Run `docker exec <container> node -e "console.log(process.cwd())"` to confirm. The volume must also be backed up separately; Docker named volumes are not automatically backed up.

**Warning signs:**

- No `volumes:` key appears in `docker-compose.yml` for the app service.
- After a container restart, the agent portal shows documents with no downloadable PDF (the file path in the DB is valid but the file does not exist on disk).

**Phase to address:** Docker deployment phase — establish the volume before any production upload occurs.

---
### Pitfall 9: Database Connection String Absent in Container — App Boots but All Queries Fail
**What goes wrong:**

`DATABASE_URL` and other secrets (`SIGNING_JWT_SECRET`, `CONTACT_SMTP_HOST`, etc.) are not committed to the repository. In development they are in `.env.local`. In a Docker container, `.env.local` is not automatically copied (`.gitignore` typically excludes it, and `COPY . .` in a Dockerfile may or may not include it depending on `.dockerignore`). If the Docker image is built without the secret baked in (correct practice) but the `docker-compose.yml` does not inject it via `environment:` or `env_file:`, the container starts successfully — `next start` does not validate env vars at startup — but every database query throws "missing connection string" at request time. The agent portal loads its login page (server components that don't query the DB) but crashes on any data operation.

The `src/lib/db/index.ts` lazy singleton does throw `"DATABASE_URL environment variable is not set"` when first accessed — but this error is silent at startup and only surfaces at first request.

**How to avoid:**

Create a `.env.production` file (not committed) that is referenced in `docker-compose.yml` via `env_file: .env.production`. Alternatively, use Docker Compose `environment:` blocks with explicit variable names. Validate at container startup by adding a health check endpoint (`/api/health`) that runs `SELECT 1` against the database and returns 200 only when the connection is live. Gate the container's `healthcheck:` on this endpoint so Docker Compose's `depends_on: condition: service_healthy` prevents the app from accepting traffic before the DB is reachable.
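A sketch of gating the container on the proposed `/api/health` endpoint (the endpoint itself must run the `SELECT 1` check; `wget` is assumed to exist in the image — swap for `curl` if not):

```yaml
services:
  app:
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:3000/api/health || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s   # give `next start` time to boot before counting failures
```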

**Warning signs:**

- The login page loads in Docker but the agent portal shows 500 errors on every page.
- `docker logs <container>` shows "DATABASE_URL environment variable is not set" at the first request, not at startup.
- The `.env.production` or secrets file is not referenced anywhere in `docker-compose.yml`.

**Phase to address:** Docker deployment phase — validate all required env vars against a checklist before the first production deploy.

---
### Pitfall 10: PostgreSQL Container and App Container Start in Wrong Order — DB Not Ready
**What goes wrong:**

`docker compose up` starts all services in parallel by default. The Next.js app container may attempt its first database query before PostgreSQL has accepted connections. Drizzle's `postgres` client (using the `postgres` npm package) throws `ECONNREFUSED` or `ENOTFOUND` on the first query. The app container may crash-loop if the error is unhandled at startup, or silently return 500s until the DB is ready if queries are only made at request time.

**How to avoid:**

Add `depends_on` with `condition: service_healthy` in `docker-compose.yml`. The PostgreSQL service needs a `healthcheck:` using `pg_isready`:
```yaml
services:
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  app:
    depends_on:
      db:
        condition: service_healthy
```
Also run Drizzle migrations as part of app startup (add `drizzle-kit migrate` to the container's `command:` or an entrypoint script) so the schema is applied before the first request. Without this, a fresh deployment against an empty database will fail on every query.
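One way to sequence migrations before the server is an entrypoint script — a sketch; the start command depends on whether the build uses Next.js standalone output:

```sh
#!/bin/sh
# docker-entrypoint.sh — apply migrations, then start the server.
set -e
npx drizzle-kit migrate    # apply pending migrations before accepting traffic
exec node server.js        # standalone output; use `next start` otherwise
```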

**Warning signs:**

- `docker-compose.yml` has no `healthcheck:` on the database service.
- `docker-compose.yml` has no `depends_on` on the app service.

**Phase to address:** Docker deployment phase — write the complete `docker-compose.yml` with health checks before the first production deploy.

---

### Pitfall 11: Neon Connection Pool Exhaustion in Docker

**What goes wrong:**

`src/lib/db/index.ts` creates a `postgres(url)` client with no explicit `max` parameter. The `postgres` npm package defaults to `max: 10` connections per process. Neon's free tier allows 10 concurrent connections total. One Next.js container with default settings exhausts the entire connection budget. A second container (staging + production running simultaneously, or a restart overlap) causes all new queries to queue indefinitely until connections are freed, manifesting as timeouts on every request.

Additionally, the current proxy-singleton pattern in `db/index.ts` creates one pool per Node.js process. Next.js in development mode can hot-reload modules, creating multiple pool instances per dev session. In production this is not a problem, but it can silently leak connections during CI test runs or development stress tests.

**Why it happens:**

The `postgres` npm package does not warn when connection limits are exceeded — it silently queues queries. The Neon dashboard shows connection count; the app shows only request timeouts with no clear error.

**How to avoid:**

Set an explicit `max` connection limit appropriate for the deployment. For a single-container deployment against Neon free tier (10 connection limit), use `postgres(url, { max: 5 })` to leave headroom for migrations, admin queries, and overlap during deployments. For paid Neon tiers, scale accordingly. Add `idle_timeout: 20` (seconds) to release idle connections promptly. Add `connect_timeout: 10` to surface connection failures quickly rather than queuing indefinitely.

Recommended `db/index.ts` configuration:
```typescript
const client = postgres(url, {
  max: 5,              // conservative for Neon free tier; increase with a paid plan
  idle_timeout: 20,    // release idle connections within 20s
  connect_timeout: 10, // fail fast if Neon is unreachable
});
```

**Warning signs:**

- `postgres(url)` called with no second argument in `db/index.ts`.
- Neon dashboard shows connection count at ceiling during normal single-user usage.
- Requests time out with no database error in logs — only generic "fetch failed" errors.

**Phase to address:** Docker deployment phase — configure connection pool limits before the first production deploy.

---

### Pitfall 12: @napi-rs/canvas Native Binary — Wrong Platform in Docker Image

**What goes wrong:**

`@napi-rs/canvas` is declared in `serverExternalPackages` in `next.config.ts`, which tells Next.js to load it as a native Node.js module rather than bundling it. The package ships pre-compiled `.node` binary files for specific platforms (darwin-arm64, linux-x64-gnu, linux-arm64-gnu, etc.). When `npm install` runs on an Apple Silicon Mac during development, npm downloads the `darwin-arm64` binary. If the Docker image is built by running `npm install` inside a `node:alpine` container (which is `linux-musl`, not `linux-gnu`), the `linux-x64-musl` binary is selected — but `@napi-rs/canvas` does not publish musl builds. The canvas module fails to load at runtime with `Error: /app/node_modules/@napi-rs/canvas/...node: invalid ELF header`.

Even if the Docker base image is `node:20-slim` (Debian, linux-gnu), building on an ARM host and deploying to an x86 server results in the wrong binary unless the `--platform` flag is used during `docker build`.

**How to avoid:**

Always build the Docker image with an explicit platform target matching the production host:
```bash
docker build --platform linux/amd64 -t app .
```
Use `node:20-slim` (Debian-based, glibc) as the Docker base image — not `node:20-alpine` (musl). Verify the canvas module loads in the container before deploying:
```bash
docker exec <container> node -e "require('@napi-rs/canvas'); console.log('canvas OK')"
```
If developing on ARM and deploying to x86, add `--platform linux/amd64` to the `docker build` command in the deployment runbook and CI pipeline.

**Warning signs:**

- `next.config.ts` lists `@napi-rs/canvas` in `serverExternalPackages`.
- Docker base image is `node:alpine`.
- The build machine architecture differs from the deployment target.
- Runtime error: `invalid ELF header` or `Cannot find module '@napi-rs/canvas'` after a clean image build.

**Phase to address:** Docker deployment phase — verify canvas module compatibility before the first production build.

---

## Email/SMTP Pitfalls
### Pitfall 13: SMTP Env Vars Absent in Container — Root Cause of Reported Email Breakage

**What goes wrong:**

This is the reported issue: email worked in development but broke when deployed to Docker. The most likely root cause is that `CONTACT_SMTP_HOST`, `CONTACT_SMTP_PORT`, `CONTACT_EMAIL_USER`, `CONTACT_EMAIL_PASS`, and `AGENT_EMAIL` are not present in the container environment. `signing-mailer.tsx` reads these in `createTransporter()` which is called at send time (not at module load) — so the missing env vars do not cause a startup error. The first signing email attempt fails with Nodemailer throwing `connect ECONNREFUSED` (if host resolves to nothing) or `Invalid login` (if credentials are absent).

**Why it looks like a DNS problem but isn't:**

Docker containers on a bridge network use the host's DNS resolver (or Docker's embedded resolver) and can reach external SMTP servers by hostname without any special configuration. The SMTP server (`CONTACT_SMTP_HOST`) is an external service (e.g., Mailgun, SendGrid, or a personal SMTP relay) — Docker does not change its reachability. The error is env var injection failure, not DNS.

**Verification steps before attempting the Docker fix:**
1. `docker exec <container> printenv CONTACT_SMTP_HOST` — if empty, the env var is missing.
2. `docker exec <container> node -e "const n = require('nodemailer'); n.createTransport({host: process.env.CONTACT_SMTP_HOST, port: 465, secure: true, auth: {user: process.env.CONTACT_EMAIL_USER, pass: process.env.CONTACT_EMAIL_PASS}}).verify(console.log)"` — tests SMTP connectivity from inside the container.

**How to avoid:**

Include all SMTP variables in the `env_file:` or `environment:` block of the app service in `docker-compose.yml`. Use an `.env.production` file that is manually provisioned on the Docker host (not committed). Consider using Docker secrets (mounted files) for the SMTP password rather than environment variables if the host is shared.

**Warning signs:**

- `docker exec <container> printenv CONTACT_SMTP_HOST` returns empty.
- Signing emails silently fail with no error until the first send attempt.

**Phase to address:** Docker deployment phase — SMTP env var verification is the first check in the deployment runbook.

---
### Pitfall 14: Nodemailer Transporter Created With Mismatched Port and TLS Settings

**What goes wrong:**

`signing-mailer.tsx` contains:
```typescript
port: Number(process.env.CONTACT_SMTP_PORT ?? 465),
secure: Number(process.env.CONTACT_SMTP_PORT ?? 465) === 465,
```
`contact-mailer.ts` contains:

```typescript
port: Number(process.env.CONTACT_SMTP_PORT ?? 587),
secure: false, // STARTTLS on port 587
```

The two mailers use different defaults for the same env var. If `CONTACT_SMTP_PORT` is not set in the container, the signing mailer assumes port 465 (TLS), but the contact form mailer assumes port 587 (STARTTLS). If the SMTP provider only supports one of these, one mailer will connect and the other will time out. The mismatch is invisible until both code paths are exercised in production.

**How to avoid:**

Require `CONTACT_SMTP_PORT` explicitly — remove the fallback defaults and add a startup validation check that throws if this variable is missing. Use a single `createSmtpTransporter()` utility function shared by both mailers, not two separate inline `createTransport()` calls with different defaults. Document the required env var values in a `DEPLOYMENT.md` or the `docker-compose.yml` comments.
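A sketch of the shared builder (`buildSmtpConfig` and the `SmtpConfig` type are illustrative names; the env var names come from the codebase). It validates up front and derives `secure` from the port instead of hardcoding two divergent defaults:

```typescript
interface SmtpConfig {
  host: string;
  port: number;
  secure: boolean; // implicit TLS on 465; STARTTLS (secure: false) on 587
  auth: { user: string; pass: string };
}

// Both mailers call this instead of inlining createTransport() options.
// Throws loudly when a required variable is missing rather than silently
// falling back to a default port.
function buildSmtpConfig(env: Record<string, string | undefined>): SmtpConfig {
  const required = ['CONTACT_SMTP_HOST', 'CONTACT_SMTP_PORT', 'CONTACT_EMAIL_USER', 'CONTACT_EMAIL_PASS'];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing SMTP env vars: ${missing.join(', ')}`);
  }
  const port = Number(env.CONTACT_SMTP_PORT);
  return {
    host: env.CONTACT_SMTP_HOST as string,
    port,
    secure: port === 465, // one rule for both mailers
    auth: { user: env.CONTACT_EMAIL_USER as string, pass: env.CONTACT_EMAIL_PASS as string },
  };
}
```

The result feeds directly into `nodemailer.createTransport(...)` in both mailers.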

**Warning signs:**

- Two separate inline `createTransport()` calls with different `port` defaults for the same env var.
- Only one of the two email paths (signing email vs. contact form) is tested in Docker.
2026-03-19 11:50:51 -06:00
2026-04-03 14:47:06 -06:00
**Phase to address:** Docker deployment phase — consolidate SMTP transporter creation before the first production email test.
2026-03-19 11:50:51 -06:00
---
2026-04-03 14:47:06 -06:00
### Pitfall 15: Multi-Signer Email Loop Fails Halfway — No Partial-Send Recovery

**What goes wrong:**

When sending to three signers, the send route will loop: create token 1, email Signer 1, create token 2, email Signer 2, create token 3, email Signer 3. If the email to Signer 2 fails (SMTP timeout, invalid address), the loop either aborts — leaving tokens 1 and 2 in the database while Signer 3 never gets a token or an email — or continues past the failure, leaving Signer 2 holding a token that was never delivered. Either way the document is in an inconsistent state: tokens exist for recipients who were never successfully emailed. Signer 1 signs, completion detection still counts unclaimed tokens, and the document never reaches "Signed."

**How to avoid:**

Create all tokens before sending any emails. Wrap token creation in a transaction — if any token INSERT fails, roll back all tokens and return an error before any emails are sent. Send emails outside the transaction (SMTP is not transactional). If an email send fails, mark that token as `superseded` (add a `supersededAt` column to `signingTokens`) rather than deleting it, and surface the partial-send failure to the agent with a "resend to failed recipients" option. Never leave unclaimed tokens orphaned by partial email failure.
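The send-loop shape described above can be sketched with the database and SMTP dependencies injected. All names here are hypothetical; `createTokensInTx` stands in for a Drizzle transaction that inserts every token row or none:

```typescript
// Sketch: create all tokens atomically, then send emails outside the transaction
// so a mid-loop SMTP failure cannot orphan token creation.
type SendResult = { email: string; sent: boolean };

export async function sendToSigners(
  signers: string[],
  createTokensInTx: (emails: string[]) => Promise<Map<string, string>>, // all-or-nothing INSERTs
  sendEmail: (email: string, token: string) => Promise<void>,
): Promise<SendResult[]> {
  // A throw here means the transaction rolled back: no tokens exist, no emails were sent.
  const tokens = await createTokensInTx(signers);
  const results: SendResult[] = [];
  for (const email of signers) {
    try {
      await sendEmail(email, tokens.get(email)!);
      results.push({ email, sent: true });
    } catch {
      // Caller marks this token superseded and offers "resend to failed recipients".
      results.push({ email, sent: false });
    }
  }
  return results;
}
```

The per-signer `try/catch` means one bad address never blocks delivery to the remaining signers, and the returned results tell the agent exactly who needs a resend.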
**Warning signs:**

- The send loop interleaves token creation and email sending (create token 1, send email 1, create token 2, send email 2...) rather than creating all tokens atomically first.

**Phase to address:** Multi-signer send phase — design the send loop with transactional token creation from the start.

---
## PDF Assembly Pitfalls
### Pitfall 16: Final PDF Assembly Runs Multiple Times — Duplicate Signed PDFs

**What goes wrong:**

Completion detection triggers PDF assembly (merging all signer contributions into one final PDF). If the race condition guard (Pitfall 2) is not in place, assembly runs twice. Even with the guard, if the assembly function crashes partway through after `completionTriggeredAt` was already set, there is no way to retry assembly — the guard prevents re-entry and the document is stuck with no signed PDF.

**How to avoid:**

Separate the "completion triggered" flag from the "signed PDF ready" flag. Add both `completionTriggeredAt TIMESTAMP` (prevents double-triggering) and `signedFilePath TEXT` (set only when the PDF is successfully written). If `completionTriggeredAt` is set but `signedFilePath` is still null after 60 seconds, an admin retry endpoint can reset `completionTriggeredAt` to null to allow re-triggering. The existing atomic rename pattern (`tmp → final`) in `embed-signature.ts` already prevents partial PDF corruption — preserve it in the multi-signer assembly code.
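A sketch of the two-flag state machine. The SQL comment shows the atomic claim as it would run against the database; the in-memory model below it just illustrates the invariants:

```typescript
// Sketch: "triggered" and "done" are separate flags. In SQL the claim is the atomic
//   UPDATE documents SET completion_triggered_at = NOW()
//   WHERE id = $1 AND completion_triggered_at IS NULL
// (0 rows updated means another handler already won the race).
interface DocFlags {
  completionTriggeredAt: Date | null; // one-time trigger claim
  signedFilePath: string | null;      // set only after the final PDF is atomically renamed
}

export function tryClaimCompletion(doc: DocFlags, now = new Date()): boolean {
  if (doc.completionTriggeredAt !== null) return false; // someone already triggered assembly
  doc.completionTriggeredAt = now;
  return true;
}

export function isStuckAssembly(doc: DocFlags, now = new Date(), timeoutMs = 60_000): boolean {
  // Triggered long ago but no signed PDF: eligible for the admin retry reset.
  return (
    doc.completionTriggeredAt !== null &&
    doc.signedFilePath === null &&
    now.getTime() - doc.completionTriggeredAt.getTime() > timeoutMs
  );
}
```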
**Warning signs:**

- Only a single flag (`completionTriggeredAt`) is used to track both triggering and completion.
- No retry mechanism exists for a stuck assembly.

**Phase to address:** Multi-signer completion phase — implement idempotent assembly with separate trigger and completion flags.

---
### Pitfall 17: Multi-Signer Final PDF — Which Prepared PDF Is the Base?

**What goes wrong:**

In the current single-signer flow, `embedSignatureInPdf` reads from `doc.preparedFilePath` (the agent-prepared PDF with text fills and agent signatures already embedded) and writes to `_signed.pdf`. With multiple signers, each signer's signature needs to be embedded sequentially onto the same prepared PDF base. If two handlers run concurrently and both read from `preparedFilePath`, modify it in memory, and write independent output PDFs, the final "merge" step needs a different strategy — you cannot simply append two separately-signed PDFs into one document without losing the shared base.

**How to avoid:**

The correct architecture for multi-signer PDF assembly:

1. Each signer's POST handler embeds only that signer's signatures into an intermediate file: `{docId}_partial_{signerEmail_hash}.pdf`. This intermediate file is written atomically (tmp → rename). It is NOT the final document.
2. When completion is triggered (all tokens claimed), a single assembly function reads the prepared PDF once, iterates all signers' signature data (from the DB or the intermediate files), embeds all signatures in one pass, and writes `{docId}_signed.pdf`.
3. The `pdfHash` is computed only from the final assembled PDF, not from any intermediate.

This avoids the read-modify-write race entirely. Intermediate files are cleaned up after successful final assembly.

**Warning signs:**

- Each signer's POST handler directly writes to `_signed.pdf` rather than an intermediate file.
- The final assembly step reads from two separately-signed PDF files and tries to merge them.

**Phase to address:** Multi-signer completion phase — establish the intermediate file pattern before any signing submission code is written.

---
### Pitfall 18: Temp File Accumulation on Failed Assemblies

**What goes wrong:**

The current code already creates a temp file during date stamping (`preparedAbsPath.datestamped.tmp`) and cleans it up with `unlink().catch(() => {})`. Multi-signer assembly will create intermediate partial files. If the assembly handler crashes between writing intermediates and producing the final PDF, those temp files are never cleaned up. Over time, the `uploads/` directory fills with orphaned intermediate files. On the home Docker server with limited disk, this eventually causes write failures on new documents.

**How to avoid:**

Name all intermediate and temp files with a recognizable pattern (`*.tmp`, `*_partial_*.pdf`). Add a periodic cleanup job (a Next.js route called by a cron, or a simple `setInterval` in a route handler) that deletes `*.tmp` and `*_partial_*.pdf` files older than 24 hours. Log a warning when cleanup finds orphaned files — this surfaces incomplete assemblies that need investigation.
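A minimal sketch of such a cleanup pass, assuming the naming pattern above; it would run from a cron-invoked route or an interval:

```typescript
import { readdir, stat, unlink } from "node:fs/promises";
import path from "node:path";

// Sketch: delete orphaned intermediate files older than maxAgeMs from the uploads dir.
// Only files matching the intermediate-file naming pattern are ever touched.
const ORPHAN_PATTERN = /(\.tmp$)|(_partial_.*\.pdf$)/;

export async function cleanupOrphans(
  dir: string,
  maxAgeMs = 24 * 60 * 60 * 1000,
  now = Date.now(),
): Promise<string[]> {
  const removed: string[] = [];
  for (const name of await readdir(dir)) {
    if (!ORPHAN_PATTERN.test(name)) continue; // never touches _prepared or _signed PDFs
    const full = path.join(dir, name);
    const info = await stat(full);
    if (now - info.mtimeMs > maxAgeMs) {
      // Orphans indicate an assembly that crashed mid-flight; worth investigating.
      console.warn(`cleanup: removing orphaned intermediate ${name}`);
      await unlink(full).catch(() => {});
      removed.push(name);
    }
  }
  return removed;
}
```

The age threshold matters: a file seconds old may belong to an assembly that is still running, so only stale files are removed.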
**Warning signs:**

- The `uploads/` directory grows unbounded over time.
- Partial files from failed assemblies remain after a document is marked Signed.

**Phase to address:** Multi-signer completion phase — add cleanup alongside the assembly logic.

---
## Security Pitfalls
### Pitfall 19: Multiple Tokens Per Document — Token Enumeration Attack

**What goes wrong:**

In the single-signer system, one token is issued per document. An attacker who intercepts or guesses a token can sign one document. With multi-signer, multiple tokens are issued for the same document. If token generation used a predictable pattern (e.g., sequential IDs, short UUIDs, or low-entropy random values), an attacker holding one valid token for a document could enumerate sibling tokens for the same document by brute-forcing nearby values.

**Current state:** `createSigningToken` uses `crypto.randomUUID()` for the JTI and `SignJWT` with HS256. UUID v4 provides 122 bits of randomness — sufficient. The risk is theoretical given the current implementation but becomes concrete if the JTI generation is ever changed.

**How to avoid:**

Keep using `crypto.randomUUID()` for the JTI. Do not add any sequential or human-readable component to the JTI. Ensure the JWT is verified before the JTI is looked up in the database — `verifySigningToken()` already does this (JWT signature check first, then DB lookup). Add rate limiting on the signing GET and POST endpoints: a cap of 10 requests per IP per minute blunts brute force. Log and alert on repeated `status: 'invalid'` responses from the same IP.
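A minimal fixed-window limiter sketch. In-memory state is only acceptable because this is a single-container deployment; a multi-instance setup would need shared storage:

```typescript
// Sketch: fixed-window, per-IP rate limiter for the /api/sign/[token] handlers.
const hits = new Map<string, { windowStart: number; count: number }>();

export function allowRequest(
  ip: string,
  now = Date.now(),
  limit = 10,
  windowMs = 60_000,
): boolean {
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart >= windowMs) {
    hits.set(ip, { windowStart: now, count: 1 }); // new window for this IP
    return true;
  }
  entry.count += 1;
  return entry.count <= limit; // requests beyond the cap in this window are rejected
}
```

A route handler would call `allowRequest(ip)` first and return `429 Too Many Requests` when it yields `false`.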
**Warning signs:**

- JTI generation switches from `crypto.randomUUID()` to a sequential or short-UUID pattern.
- No rate limiting exists on `/api/sign/[token]` GET or POST.

**Phase to address:** Multi-signer send phase — add rate limiting before issuing multiple tokens per document.

---
### Pitfall 20: Token Shared Between Signers — Signer A Uses Signer B's Token
**What goes wrong:**

With multi-signer, the system issues separate tokens per signer email. But the signing GET handler at line 90 currently returns ALL client-visible fields (filtered by `isClientVisibleField`), not just the fields assigned to the specific signer. If Signer A obtains Signer B's token (email forward, shared email account, phishing), Signer A sees Signer B's fields and can sign them. In real estate, this is equivalent to signing another party's name on a contract — a serious legal issue.

The signing POST handler (lines 210-213) filters `signableFields` to all `client-signature` and `initials` fields for the entire document — it does not restrict by signer. A cross-token submission would succeed server-side.
**How to avoid:**

After multi-signer is implemented, the signing GET handler must filter `signatureFields` by `field.signerEmail === tokenRow.signerEmail`. The signing POST handler must verify that the field IDs in the `signatures` request body correspond only to fields tagged to `tokenRow.signerEmail` — reject any submission that includes field IDs not assigned to that signer. This is server-side enforcement, not a UI concern.
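Both checks can be sketched as pure functions, assuming the proposed `signerEmail` tag is added to `SignatureFieldData` in v1.2:

```typescript
// Sketch: signer-field binding, assuming the proposed signerEmail field tag.
interface SignatureField {
  id: string;
  signerEmail?: string; // proposed v1.2 addition; absent on legacy v1.1 fields
  type?: string;
}

// GET handler: return only the fields assigned to the token's signer.
export function fieldsForSigner(fields: SignatureField[], signerEmail: string): SignatureField[] {
  return fields.filter((f) => f.signerEmail === signerEmail);
}

// POST handler: reject any submission containing a field ID not assigned to this signer.
export function validateSubmission(
  fields: SignatureField[],
  signerEmail: string,
  submittedIds: string[],
): boolean {
  const allowed = new Set(fieldsForSigner(fields, signerEmail).map((f) => f.id));
  return submittedIds.every((id) => allowed.has(id));
}
```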
**Warning signs:**

- The signing GET handler's `signatureFields` filter does not include a `signerEmail` check.
- The signing POST handler's `signableFields` filter does not restrict by `signerEmail`.

**Phase to address:** Multi-signer signing flow phase — add signer-field binding validation to both GET and POST handlers.

---
### Pitfall 21: Completion Notification Email Sent to Wrong Recipients
**What goes wrong:**

The current `sendAgentNotificationEmail` sends to `process.env.AGENT_EMAIL`. In multi-signer, the requirement is to send the final merged PDF to all signers AND the agent when completion occurs. If the recipient list is derived from `documents.emailAddresses` (the JSONB array collected at prepare time) and that array is stale (e.g., the agent changed a signer's email between prepare and send), the final PDF goes to the old address.

A worse variant: if `emailAddresses` contains CC addresses that are NOT signers (e.g., a title company contact), those recipients receive the completed PDF immediately — before the agent has reviewed it. For a solo-agent workflow this is likely acceptable, but it should be an explicit decision.
**How to avoid:**

Derive the final recipient list from `signingTokens.signerEmail` (the authoritative record of who was actually sent a token), not from `documents.emailAddresses`. Separate "recipients who receive the signing link" from "recipients who receive the completed PDF" explicitly in the data model. The agent should review the final recipient list at send time.
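The derivation rule can be sketched as a small helper (the function name and shape are hypothetical):

```typescript
// Sketch: completed-PDF recipients come from the tokens actually issued,
// never from the prepare-time emailAddresses array. The agent is added explicitly.
interface TokenRow {
  signerEmail: string;
}

export function completionRecipients(tokens: TokenRow[], agentEmail: string): string[] {
  // Lowercase for dedup: the same signer may appear with different casing.
  const unique = new Set(tokens.map((t) => t.signerEmail.toLowerCase()));
  unique.add(agentEmail.toLowerCase());
  return [...unique];
}
```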
**Warning signs:**

- The completion handler derives email recipients from `documents.emailAddresses` rather than `signingTokens.signerEmail`.

**Phase to address:** Multi-signer send phase — establish the recipient derivation rule before tokens are issued.

---
### Pitfall 22: Signing Token Issued But Document Re-Prepared — Token Points to Stale PDF
**What goes wrong:**

v1.1 introduced a guard: only Draft documents can be AI-prepared (`ai-prepare/route.ts` line 37: `if (doc.status !== 'Draft') return 403`). But `prepare/route.ts` (which calls `preparePdf` and writes `_prepared.pdf`) has no equivalent guard — a Sent document can be re-prepared if the agent POSTs to `/api/documents/{id}/prepare` directly. With multi-signer, if any token has been issued (even if no signer has used it yet), re-preparing the document overwrites `_prepared.pdf` and changes `preparedFilePath`. Signers who already received their token will open the signing page and load the new prepared PDF — which may have different text fills, field positions, or a new agent signature — not what was legally sent to them.

**How to avoid:**

Add a guard to `prepare/route.ts`: if `signingTokens` has any row for this document with `usedAt IS NULL` (any token still outstanding), reject the prepare request with `409 Conflict: "Cannot re-prepare a document with outstanding signing tokens."` If the agent genuinely needs to change the document, they must first void all outstanding tokens (supersede them) and issue new ones.
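The guard condition, sketched with the proposed `supersededAt` column included (both the function name and column are v1.2 proposals, not existing code):

```typescript
// Sketch: prepare route checks for outstanding tokens before rewriting _prepared.pdf.
// A token is outstanding if it is unclaimed (usedAt null) and not voided
// (supersededAt null; supersededAt is the proposed v1.2 column).
interface TokenState {
  usedAt: Date | null;
  supersededAt: Date | null;
}

export function canReprepare(tokens: TokenState[]): boolean {
  return !tokens.some((t) => t.usedAt === null && t.supersededAt === null);
}
// Route handler: if (!canReprepare(tokens)) respond with
// 409 Conflict: "Cannot re-prepare a document with outstanding signing tokens."
```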
**Warning signs:**

- `prepare/route.ts` has no check against the `signingTokens` table before writing `_prepared.pdf`.

**Phase to address:** Multi-signer send phase — add the outstanding-token guard to the prepare route before multi-signer send is implemented.

---
### Pitfall 23: @vercel/blob Is Installed But Not Used — Risk of Accidental Use
**What goes wrong:**

`package.json` lists `@vercel/blob` as a production dependency, but no file in the codebase imports or uses it. The package provides a Vercel-hosted blob storage client that requires `BLOB_READ_WRITE_TOKEN` to be set in the environment. If any future code accidentally imports from `@vercel/blob` instead of using the local filesystem path utilities, it will fail in Docker (no `BLOB_READ_WRITE_TOKEN` in a non-Vercel environment) and would route file storage through Vercel's infrastructure rather than the local volume, breaking signed PDF storage entirely.

**Why it happens:**

`@vercel/blob` was likely installed during initial scaffolding, when Vercel deployment was under consideration. It was never wired up, and its presence in `package.json` is a footgun.

**How to avoid:**

Remove `@vercel/blob` from `package.json` and run `npm install` before building the Docker image. If Vercel deployment is ever reconsidered, re-add it intentionally with a clear decision to migrate storage. Until then, its presence is a liability.

**Warning signs:**

- `@vercel/blob` appears in `package.json` dependencies but `grep -r "@vercel/blob" src/` finds no usage.
- Any new code imports from `@vercel/blob` without an explicit architectural decision to use it.
**Phase to address:** Docker deployment phase — remove the unused dependency before building the production image.

---
## Prevention Checklist
Grouped by phase for the roadmap planner.
### Multi-Signer Schema Phase
- [ ] Add `signerEmail TEXT NOT NULL` to `signingTokens` (with backfill migration for v1.1 rows)
- [ ] Add `viewedAt TIMESTAMP` to `signingTokens`
- [ ] Add `completionTriggeredAt TIMESTAMP` to `documents`
- [ ] Add `Partially Signed` to `documentStatusEnum` or compute from token states
- [ ] Freeze `signatureFields` JSONB after tokens are issued (document invariant, enforced in prepare route)
- [ ] Document the invariant: `signingTokens.signerEmail` is the source of truth for recipient list
### Multi-Signer Send Phase
- [ ] Wrap all token creation in a single DB transaction; send emails after commit
- [ ] Add outstanding-token guard to `prepare/route.ts` (409 if any unclaimed token exists)
- [ ] Derive final PDF recipient list from `signingTokens.signerEmail`, not `emailAddresses`
- [ ] Add rate limiting to signing GET and POST endpoints
### Multi-Signer Signing Flow Phase
- [ ] Filter `signatureFields` by `field.signerEmail === tokenRow.signerEmail` in signing GET
- [ ] Validate submitted field IDs against signer's assigned fields in signing POST
- [ ] Include `signerEmail` in `signature_submitted` audit event metadata
- [ ] Completion detection: count unclaimed tokens in same transaction as token claim
### Multi-Signer Completion Phase
- [ ] Race condition guard: `UPDATE documents SET completion_triggered_at = NOW() WHERE completion_triggered_at IS NULL`
- [ ] Assemble final PDF in one pass from prepared PDF base (not by merging two separately-signed files)
- [ ] Set `signedFilePath` only after successful atomic rename of final assembled PDF
- [ ] Compute `pdfHash` only from final assembled PDF
- [ ] Clean up intermediate `_partial_*.pdf` files after successful assembly
- [ ] Add periodic orphaned-temp-file cleanup
### Docker Deployment Phase
- [ ] Rename `NEXT_PUBLIC_BASE_URL``SIGNING_BASE_URL` (server-only var, no NEXT_PUBLIC_ prefix)
- [ ] Audit all remaining `NEXT_PUBLIC_*` usages — confirm each one genuinely needs browser access
- [ ] Mount named Docker volume at `process.cwd() + '/uploads'` (verify path inside container first)
- [ ] Create `.env.production` on Docker host with all required secrets; reference in `docker-compose.yml`
- [ ] Add `CONTACT_SMTP_PORT` as required env var; remove fallback defaults from both mailers
- [ ] Consolidate SMTP transporter into a shared `createSmtpTransporter()` utility
- [ ] Add PostgreSQL `healthcheck` + app `depends_on: condition: service_healthy`
- [ ] Add Drizzle migration to container startup (before `next start`)
- [ ] Add `/api/health` endpoint that runs `SELECT 1` + checks `DATABASE_URL` + checks `CONTACT_SMTP_HOST`
- [ ] Verify SMTP connectivity from inside container before first production deploy
- [ ] Configure `postgres(url, { max: 5, idle_timeout: 20, connect_timeout: 10 })` for Neon free tier
- [ ] Build Docker image with `--platform linux/amd64` when deploying to x86_64 Linux
- [ ] Use `node:20-slim` (Debian glibc) as base image — not `node:alpine` (musl)
- [ ] Verify `@napi-rs/canvas` loads in container: `node -e "require('@napi-rs/canvas')"`
- [ ] Remove `@vercel/blob` from `package.json` dependencies
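The volume, env-file, and healthcheck items above can be captured in a `docker-compose.yml` shape like the following. This is a sketch, not the deployment file: the service names, image tag, and port are assumptions, and the `db` service applies only if running Postgres locally instead of Neon.

```yaml
services:
  app:
    image: broker-app:1.2              # hypothetical image name
    env_file: .env.production          # DATABASE_URL, SIGNING_BASE_URL, CONTACT_SMTP_* on the host
    volumes:
      - uploads:/app/uploads           # verify this matches process.cwd() + '/uploads' in-container
    ports:
      - "3000:3000"
    depends_on:
      db:
        condition: service_healthy     # requires the explicit healthcheck below
  db:                                  # only if running Postgres locally instead of Neon
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  uploads:
  pgdata:
```

Note that `condition: service_healthy` does nothing without the `healthcheck:` block on the `db` service; with only `depends_on: db`, Compose waits for the container to start, not for Postgres to accept connections.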
### Verification (Do Not Skip)
- [ ] Test a two-signer document where both signers submit within 1 second of each other — confirm one PDF, one notification, one `signedAt`
- [ ] Restart the Docker container and confirm all previously-uploaded PDFs are still accessible
- [ ] Confirm clicking a signing link emailed from Docker opens the correct production URL (not localhost)
- [ ] Confirm `docker exec <container> printenv CONTACT_SMTP_HOST` returns the expected value
- [ ] Test a v1.1 (single-signer) document after migration — confirm existing tokens still work
- [ ] Confirm Neon connection count stays below 7 during normal usage (check Neon dashboard)
- [ ] Confirm canvas module loads: `docker exec <container> node -e "require('@napi-rs/canvas'); console.log('OK')"`
---
## Phase-Specific Warning Summary
| Phase Topic | Likely Pitfall | Mitigation |
|-------------|---------------|------------|
| signingTokens schema change | NOT NULL constraint breaks existing token rows | Backfill migration with client email JOIN |
| Multi-signer send loop | Partial email failure orphans tokens | Transactional token creation, separate from email sends |
| Completion detection | First signer marks document Signed | Count unclaimed tokens inside transaction before marking |
| Concurrent completion | Two handlers both run final assembly | `completionTriggeredAt` one-time-set guard |
| Docker build | NEXT_PUBLIC_BASE_URL baked into bundle | Remove NEXT_PUBLIC_ prefix for server-only URL |
| Docker volumes | Uploads lost on container recreate | Named volume mounted at uploads path |
| Docker secrets | SMTP env vars absent in container | env_file in compose, verify with printenv |
| PostgreSQL startup | App queries before DB is ready | service_healthy depends_on + pg_isready healthcheck |
| Neon connection pool | Default 10 connections saturates free tier | Set max: 5 with idle_timeout and connect_timeout |
| Native module in Docker | @napi-rs/canvas wrong platform binary | --platform linux/amd64 + node:20-slim base image |
| Unused dependency | @vercel/blob accidentally used in new code | Remove from package.json before Docker build |
| Final PDF assembly | Signer PDFs assembled by merging two separate files | Single-pass assembly from prepared PDF base |
| Signer identity in audit | Two signature_submitted events indistinguishable | signerEmail in audit event metadata |
---
## Sources
- Reviewed `src/lib/db/schema.ts` — confirmed `signingTokens` has no `signerEmail`; `documentStatusEnum` has no partial state; `SignatureFieldData` has no signer tag
- Reviewed `src/app/api/sign/[token]/route.ts` — confirmed completion marks document Signed unconditionally at line 254; confirmed `isClientVisibleField` filter at line 90; confirmed `signableFields` filter does not restrict by signer at lines 210-213
- Reviewed `src/app/api/documents/[id]/send/route.ts` — confirmed single token creation, single recipient
- Reviewed `src/app/api/documents/[id]/prepare/route.ts` — confirmed no guard against re-preparation of Sent documents
- Reviewed `src/lib/signing/signing-mailer.tsx` — confirmed `createTransporter()` per send (healthy), confirmed `CONTACT_SMTP_PORT` defaults differ from `contact-mailer.ts`
- Reviewed `src/lib/signing/token.ts` — confirmed `crypto.randomUUID()` JTI generation (sufficient entropy)
- Reviewed `src/lib/signing/embed-signature.ts` — confirmed atomic rename pattern (`tmp → final`)
- Reviewed `src/lib/db/index.ts` — confirmed `postgres(url)` with no `max` parameter; Proxy singleton pattern; lazy initialization
- Reviewed `next.config.ts` — confirmed `serverExternalPackages: ['@napi-rs/canvas']`
- Reviewed `package.json` — confirmed `@vercel/blob` present in dependencies; confirmed `postgres` npm package in use; confirmed `node:` not specified in package engines
- [Next.js Environment Variables — Build-time vs Runtime](https://nextjs.org/docs/app/building-your-application/configuring/environment-variables) — NEXT_PUBLIC_ vars inlined at build time; confirmed in Next.js 15 docs
- [Docker Compose healthcheck + depends_on](https://docs.docker.com/compose/how-tos/startup-order/) — `service_healthy` condition requires explicit healthcheck definition
- [Nodemailer: SMTP port and TLS](https://nodemailer.com/smtp/) — port 465 = implicit TLS (`secure: true`), port 587 = STARTTLS (`secure: false`); mismatch causes connection timeout
- [postgres npm package documentation](https://github.com/porsager/postgres) — default `max: 10` connections per client instance; `idle_timeout` and `connect_timeout` options
- [Neon connection limits](https://neon.tech/docs/introduction/plans) — free tier: 10 concurrent connections; paid tiers increase this
- [@napi-rs/canvas supported platforms](https://github.com/Brooooooklyn/canvas#support-matrix) — no musl (Alpine) builds published; requires glibc (Debian/Ubuntu) base image
---
*Pitfalls research for: Teressa Copeland Homes — v1.2 multi-signer and Docker deployment*
*Researched: 2026-04-03*
*Previous v1.1 pitfalls (AI field placement, expanded field types, agent signing, filled preview) documented in git history — superseded by this file for v1.2 planning. The v1.1 pitfalls are assumed addressed; recovery strategies from that document remain valid if regressions occur.*