Pitfalls Research
Domain: Real estate broker web app — v1.2 additions: multi-signer support and Docker production deployment
Researched: 2026-04-03
Confidence: HIGH — all pitfalls grounded in the v1.1 codebase reviewed directly; no speculative claims. Source code line references included throughout.
Context: What v1.2 Is Adding to the Existing System
The v1.1 codebase has been reviewed in full. Key facts that make every pitfall below concrete:
- `signingTokens` table has one row per document, no `signerEmail` column. One token = one signer = current architecture.
- `SignatureFieldData` (schema.ts) stores `{ id, page, x, y, width, height, type? }` — no `signerEmail` field. All fields belong to the single signer.
- `send/route.ts` calls `createSigningToken(doc.id)` once and emails `client.email`. Multi-signer needs iteration.
- `documents.status` enum is `Draft | Sent | Viewed | Signed`. No per-signer completion state exists.
- `POST /api/sign/[token]` marks `documents.status = 'Signed'` when its one token is claimed. With multiple signers, the first signer to complete will trigger this transition prematurely.
- PDF files live at `process.cwd() + '/uploads'` — a local filesystem path. Docker containers have ephemeral filesystems by default.
- `NEXT_PUBLIC_BASE_URL` is used to construct signing URLs. Variables prefixed `NEXT_PUBLIC_` are inlined at build time in Next.js, not resolved at container startup.
- Nodemailer transporter in `signing-mailer.tsx` calls `createTransporter()` per send — a healthy pattern, but it reads `CONTACT_SMTP_HOST` at call time, which only works if the env var is present in the container.
- `src/lib/db/index.ts` uses `postgres(url)` with no explicit `max` connection limit. The `postgres` npm package defaults to 10 connections per instance. Against Neon, the free tier allows 10 concurrent connections total — one container saturates this budget entirely.
- `next.config.ts` declares `serverExternalPackages: ['@napi-rs/canvas']`. This native binary must be present in the Docker image. The package ships platform-specific `.node` files selected by npm at install time. If the Docker image is built on ARM (Apple Silicon) and run on x86_64 Linux, the wrong binary is included.
- `package.json` lists `@vercel/blob` as a production dependency. It is not used anywhere in the codebase. Its presence creates a risk of accidental use in future code that would break in a non-Vercel Docker deployment.
Summary
Eight risk areas for v1.2:
- Multi-signer completion detection — the current "first signer marks Signed" pattern will falsely complete documents.
- Docker filesystem and env vars — Next.js bakes `NEXT_PUBLIC_*` at build time; the container loses uploads unless a volume is mounted; `DATABASE_URL` and SMTP secrets are silently absent in the container.
- SMTP in Docker — not a DNS problem for external SMTP services, but env var injection failure is the confirmed root cause of the reported email breakage.
- PDF assembly on partial completion — the final merged PDF must only be produced once, after all signers complete, without race conditions.
- Token security — multiple tokens per document opens surfaces that a single-token system didn't have.
- Neon connection pool exhaustion — the `postgres` npm client's default of 10 connections saturates Neon's free-tier connection limit in a single container.
- `@napi-rs/canvas` native binary — cross-platform Docker builds break this native module without explicit platform targeting.
- `@vercel/blob` dead dependency — installed but unused; its presence risks accidental use in code that would silently fail outside Vercel.
Multi-Signer Pitfalls
Pitfall 1: First Signer Marks Document "Signed" — Completion Fires Prematurely
What goes wrong:
POST /api/sign/[token] at line 254–263 of the current route unconditionally executes:
await db.update(documents).set({ status: 'Signed', signedAt: now, ... })
.where(eq(documents.id, payload.documentId));
With two signers, Signer A completes and triggers this. The document is now Signed. Signer B's token is still valid, but when Signer B opens their signing page GET request, it checks doc.signatureFields filtered by isClientVisibleField. The document's fields are all there — nothing prevents Signer B from completing. Two signature_submitted audit events are logged for the same document, two conflicting _signed.pdf files may be written, and the agent receives two "document signed" emails. The final PDF hash stored in documents.pdfHash is from whichever signer completed last and overwrote the row.
Why it happens: The single-signer assumption is load-bearing in the POST handler. Completion detection is a single UPDATE, not a query across all tokens for the document.
How to avoid:
Add a signerEmail TEXT NOT NULL column to signingTokens. Completion detection becomes: after claiming a token (the atomic UPDATE that prevents double-submission), query SELECT COUNT(*) FROM signing_tokens WHERE document_id = ? AND used_at IS NULL. If count reaches zero, all signers have completed — only then trigger final PDF assembly and agent notification. Protect this with a database transaction so the count query and the "mark Signed" update are atomic. Never set documents.status = 'Signed' until the zero-remaining-tokens check passes.
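The completion decision itself can live in a small pure helper, which keeps the transactional route handler thin and makes the logic testable in isolation. A minimal TypeScript sketch — the `TokenRow` shape and helper name are assumptions mirroring the proposed `signingTokens` columns, not existing code:

```typescript
// Assumed row shape after the proposed migration: usedAt stays null until
// the signer's atomic claim (UPDATE ... WHERE used_at IS NULL) succeeds.
interface TokenRow {
  signerEmail: string;
  usedAt: Date | null;
}

// True only when every issued token has been claimed. Run this inside the
// same transaction as the token claim so two concurrent signers cannot
// both observe "zero remaining" (see Pitfall 2).
function allSignersComplete(tokens: TokenRow[]): boolean {
  return tokens.length > 0 && tokens.every((t) => t.usedAt !== null);
}
```

The route handler would load all token rows for `payload.documentId` inside the transaction and only set `documents.status = 'Signed'` when this returns true.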
Warning signs:
- `POST /api/sign/[token]` sets `status = 'Signed'` without first counting remaining unclaimed tokens.
- Agent receives two notification emails after a two-signer document is tested.
- `documents.signedAt` is overwritten by both signers (last-write-wins).
Phase to address: Multi-signer schema phase — before any send or signing UI is changed, establish the completion detection query.
Pitfall 2: Race Condition — Two Signers Complete Simultaneously, Both Trigger Final PDF Assembly
What goes wrong:
Signer A and Signer B submit within milliseconds of each other (common if they are in the same room). Both claim their respective tokens atomically — that part works. Both then execute the "count remaining unclaimed tokens" check. If that check is not inside the same database transaction as the token claim, both reads may return 0 remaining (after the other's claim propagated), and both handlers proceed to assemble the final merged PDF simultaneously. Two concurrent writes to {docId}_signed.pdf corrupt the file (partial PDF bytes interleaved), or the second write silently overwrites the first.
Why it happens:
The atomic token claim (UPDATE ... WHERE used_at IS NULL RETURNING) is a single row update. The subsequent completion check is a separate query. Two handlers can interleave between those two operations.
How to avoid:
Use a completionTriggeredAt TIMESTAMP column on documents with a one-time-set guard:
const won = await db.update(documents)
.set({ completionTriggeredAt: new Date() })
.where(and(eq(documents.id, docId), isNull(documents.completionTriggeredAt)))
.returning({ id: documents.id });
if (won.length === 0) return; // another handler already triggered completion
// proceed to final PDF assembly
This is the same pattern the existing token claim uses (UPDATE ... WHERE used_at IS NULL RETURNING). If 0 rows returned, another handler already won the race; skip assembly silently.
Warning signs:
- Two concurrent POST requests for the same document produce two `_signed.pdf` files.
- The `documents` table has no `completionTriggeredAt` column.
Phase to address: Multi-signer schema phase — establish this pattern alongside the completion detection fix.
Pitfall 3: Legacy Single-Signer Documents Break When signingTokens Gains signerEmail
What goes wrong:
v1.0 and v1.1 documents have one row in signingTokens with no signerEmail. If the multi-signer schema adds signerEmail NOT NULL to signingTokens in a single step, the ALTER TABLE fails outright — existing rows cannot satisfy NOT NULL without a DEFAULT or a backfill. And if the constraint is forced through with a placeholder default but no backfill, all existing signing links degrade: the token lookup succeeds, but any code reading token.signerEmail gets a null or meaningless value.
Why it happens: Drizzle migrations add the column in a single ALTER TABLE. There is no Drizzle migration command that backfills legacy data — that requires a separate SQL step in the migration file.
How to avoid:
Add signerEmail as TEXT (nullable) initially. Backfill existing rows with the client's email via a JOIN at migration time. Then add the NOT NULL constraint in a second migration once backfill is confirmed. Alternatively, add signerEmail TEXT DEFAULT '' and document that empty string means "legacy single-signer." All code reading signerEmail must handle the legacy empty/null case.
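A hedged sketch of that two-step migration as raw SQL, as it might appear in the Drizzle migration files. The join path from tokens to the client's email is an assumption based on the v1.1 schema described above — verify against the actual Drizzle schema before use:

```sql
-- Migration 1: add the column nullable, then backfill legacy single-signer
-- rows from the document's client email (join path is an assumption).
ALTER TABLE signing_tokens ADD COLUMN signer_email TEXT;

UPDATE signing_tokens st
SET signer_email = c.email
FROM documents d
JOIN clients c ON c.id = d.client_id
WHERE st.document_id = d.id
  AND st.signer_email IS NULL;

-- Migration 2 (run only after verifying the backfill left no NULLs):
ALTER TABLE signing_tokens ALTER COLUMN signer_email SET NOT NULL;
```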
Warning signs:
- Drizzle migration adds `signer_email TEXT NOT NULL` in one step with no `DEFAULT` and no backfill SQL.
- A v1.0 document's signing link is not tested after migration.
Phase to address: Multi-signer schema phase — include legacy backfill SQL in the migration script.
Pitfall 4: Field-to-Signer Tag Stored in JSONB — Queries Cannot Filter by Signer Efficiently
What goes wrong:
signatureFields JSONB is an array of field objects. Adding signerEmail to each field object is the right call for field filtering in the signing page (already done via isClientVisibleField). But if the completion detection, status dashboard, or "who has signed" query tries to derive signer list from the JSONB array, it requires a Postgres JSONB containment query (@> or jsonb_array_elements). These are unindexed by default and slow on large arrays. More critically, if the agent changes a field's signerEmail tag after the document has been sent, the JSONB update does not cascade to any signingTokens rows — the token was issued for the old email.
How to avoid:
The authoritative list of signers and their completion state lives in signingTokens, not in the JSONB. signingTokens.signerEmail is the source of truth for "who needs to sign." The JSONB field's signerEmail is used only at signing-page render time to filter which fields a given signer sees. Once a document is Sent (tokens issued), the JSONB field tags are considered frozen — re-tagging fields on a Sent document is not permitted without voiding the existing tokens.
Warning signs:
- A query tries to derive the recipient list from `signatureFields` JSONB rather than from `signingTokens`.
Phase to address: Multi-signer schema phase — document this invariant in a code comment on signingTokens.
Pitfall 5: Audit Trail Gap — No Record of Which Signer Completed Which Field
What goes wrong:
The current audit_events table has eventType: 'signature_submitted' at the document level. With one signer this is unambiguous. With two signers, two signature_submitted events are logged for the same documentId with no signerEmail on the event. The legal audit trail cannot distinguish "Seller A signed at 14:00" from "Seller B signed at 14:05" — both appear as anonymous "signature submitted" events on the same document.
Why this matters: Utah e-signature law requires proof of who signed what and when. An undifferentiated audit log is a legal compliance gap (see existing LEGAL-03 compliance requirement in v1.0).
How to avoid:
Add signerEmail TEXT to auditEvents (nullable, to preserve backward compatibility with v1.0 events). When logging signature_submitted in multi-signer mode, include the signerEmail from the claimed token row in the event metadata. The metadata JSONB column already exists and can carry this without a schema change — use metadata: { signerEmail: tokenRow.signerEmail } as a minimum before a proper column is added.
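A minimal sketch of that interim event payload, assuming a small helper in front of the existing `auditEvents` insert — the `tokenRow` shape and the function name are hypothetical:

```typescript
interface AuditEventInsert {
  eventType: string;
  documentId: string;
  metadata: Record<string, unknown>;
}

// Interim approach: carry signer identity in the existing metadata JSONB
// until a dedicated signerEmail column is added to auditEvents.
function buildSignatureSubmittedEvent(
  documentId: string,
  tokenRow: { signerEmail: string },
): AuditEventInsert {
  return {
    eventType: 'signature_submitted',
    documentId,
    metadata: { signerEmail: tokenRow.signerEmail },
  };
}
```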
Warning signs:
- Two `signature_submitted` events logged for the same `documentId` with no distinguishing field.
Phase to address: Multi-signer signing flow phase — include signer identity in audit events before the first multi-signer document is tested.
Pitfall 6: Document Status "Viewed" Conflicts Across Signers
What goes wrong:
The current GET /api/sign/[token] sets documents.status = 'Viewed' when any signer opens their link (line 81 of the current route). With two signers, Signer A opens the link → document becomes Viewed. Signer A backs out without signing. Signer B hasn't even opened their link yet. Agent sees "Viewed" status and assumes both signers have engaged. If Signer A then signs, status jumps from Viewed → Signed (via the POST handler), bypassing any intermediate state. The agent has no way to know that Signer B never opened their link.
How to avoid:
Per-signer status belongs in signingTokens, not in documents. Add a viewedAt TIMESTAMP column to signingTokens. The GET handler sets signingTokens.viewedAt = NOW() for the specific token, not documents.status. The documents-level status becomes a computed aggregate: Draft → Sent (any token issued) → Partially Signed (some tokens usedAt set) → Signed (all tokens usedAt set). Consider adding Partially Signed to the documentStatusEnum, or compute it in the agent dashboard query.
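One possible aggregation, sketched as a pure TypeScript function. The `viewedAt`/`usedAt` columns and the `Partially Signed` value are the proposals above, not existing schema:

```typescript
type DocStatus = 'Draft' | 'Sent' | 'Viewed' | 'Partially Signed' | 'Signed';

interface TokenState {
  viewedAt: Date | null;
  usedAt: Date | null;
}

// Aggregate per-signer token states into one document-level status.
function deriveStatus(tokens: TokenState[]): DocStatus {
  if (tokens.length === 0) return 'Draft'; // no tokens issued yet
  if (tokens.every((t) => t.usedAt !== null)) return 'Signed';
  if (tokens.some((t) => t.usedAt !== null)) return 'Partially Signed';
  if (tokens.some((t) => t.viewedAt !== null)) return 'Viewed';
  return 'Sent';
}
```

Computing this in the dashboard query avoids changing `documentStatusEnum` at all; the enum change is only needed if the status must be persisted.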
Warning signs:
- The signing GET handler writes `documents.status = 'Viewed'` instead of `signingTokens.viewedAt = NOW()`.
Phase to address: Multi-signer schema phase — add viewedAt to signingTokens and derive document status from token states.
Docker/Deployment Pitfalls
Pitfall 7: NEXT_PUBLIC_BASE_URL Is Baked at Build Time — Wrong URL in Production Container
What goes wrong:
send/route.ts line 35 reads:
const baseUrl = process.env.NEXT_PUBLIC_BASE_URL ?? 'http://localhost:3000';
In Next.js, any variable prefixed NEXT_PUBLIC_ is substituted at next build time — it becomes a string literal in the compiled JavaScript bundle. If the Docker image is built with NEXT_PUBLIC_BASE_URL=http://localhost:3000 (or not set at all), every signing URL emailed to clients will point to localhost:3000 regardless of what is set in the container's runtime environment. The client clicks the link and gets "connection refused."
This is specific to NEXT_PUBLIC_* variables. Server-only variables (no NEXT_PUBLIC_ prefix) ARE read at runtime from the container environment. Mixing the two causes precisely the confusion reported in this project.
How to avoid:
For variables that need to be available on the server only (like BASE_URL for constructing server-side URLs), remove the NEXT_PUBLIC_ prefix. NEXT_PUBLIC_ should only be used for variables that need to reach the browser bundle. The signing URL is constructed in a server-side API route — it does not need NEXT_PUBLIC_. Rename to SIGNING_BASE_URL (no prefix), read it only in API routes, and inject it into the container environment at runtime via Docker Compose environment: block.
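A sketch of the renamed, runtime-read variable in a server-side helper. `SIGNING_BASE_URL` and `buildSigningUrl` are proposed names, not existing code:

```typescript
// Server-only variable (no NEXT_PUBLIC_ prefix): process.env is read at
// request time, so the value comes from the container's runtime environment
// rather than being inlined into the bundle at `next build`.
function buildSigningUrl(token: string): string {
  const baseUrl = process.env.SIGNING_BASE_URL ?? 'http://localhost:3000';
  return `${baseUrl.replace(/\/+$/, '')}/sign/${encodeURIComponent(token)}`;
}
```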
Warning signs:
- Signing emails send but clicking the link shows a browser connection error or goes to localhost.
- `NEXT_PUBLIC_BASE_URL` is set in `docker-compose.yml` under `environment:` and the developer assumes this is sufficient — it is not, because the value was already baked in during `docker build`.
Phase to address: Docker deployment phase — rename the variable and audit all NEXT_PUBLIC_ usages before building the production image.
Pitfall 8: Uploads Directory Is Lost on Container Restart
What goes wrong:
All uploaded PDFs, prepared PDFs, and signed PDFs are written to process.cwd() + '/uploads'. In the Docker container, process.cwd() is the directory where Next.js starts — typically /app. The path /app/uploads is inside the container's writable layer, which is ephemeral. When the container is stopped and recreated (deployment, crash, docker compose up --force-recreate), all PDFs are gone. Signed documents that were legally executed are permanently lost. Clients cannot download their signed copies. The agent loses the audit record.
How to avoid:
Mount a named Docker volume at /app/uploads (or whatever process.cwd() resolves to in the container) in docker-compose.yml:
services:
  app:
    volumes:
      - uploads_data:/app/uploads

volumes:
  uploads_data:
Verify the mount path matches process.cwd() inside the container — do not assume it is /app. Run docker exec <container> node -e "console.log(process.cwd())" to confirm. The volume must also be backed up separately; Docker named volumes are not automatically backed up.
Warning signs:
- No `volumes:` key appears in `docker-compose.yml` for the app service.
- After a container restart, the agent portal shows documents with no downloadable PDF (the file path in the DB is valid but the file does not exist on disk).
Phase to address: Docker deployment phase — establish the volume before any production upload occurs.
Pitfall 9: Database Connection String Absent in Container — App Boots but All Queries Fail
What goes wrong:
DATABASE_URL and other secrets (SIGNING_JWT_SECRET, CONTACT_SMTP_HOST, etc.) are not committed to the repository. In development they are in .env.local. In a Docker container, .env.local is not automatically copied (.gitignore typically excludes it, and COPY . . in a Dockerfile may or may not include it depending on .dockerignore). If the Docker image is built without the secret baked in (correct practice) but the docker-compose.yml does not inject it via environment: or env_file:, the container starts successfully — next start does not validate env vars at startup — but every database query throws "missing connection string" at request time. The agent portal loads its login page (server components that don't query the DB) but crashes on any data operation.
The src/lib/db/index.ts lazy singleton does throw "DATABASE_URL environment variable is not set" when first accessed — but this error is silent at startup and only surfaces at first request.
How to avoid:
Create a .env.production file (not committed) that is referenced in docker-compose.yml via env_file: .env.production. Alternatively, use Docker Compose environment: blocks with explicit variable names. Validate at container startup by adding a health check endpoint (/api/health) that runs SELECT 1 against the database and returns 200 only when the connection is live. Gate the container's healthcheck: on this endpoint so Docker Compose's depends_on: condition: service_healthy prevents the app from accepting traffic before the DB is reachable.
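A hedged compose sketch of that gate on the app service. It assumes a `/api/health` route exists and uses Node's built-in `fetch` (Node 18+), since slim images often ship neither curl nor wget:

```yaml
services:
  app:
    healthcheck:
      # exec form avoids shell-quoting issues; node is guaranteed in the image
      test: ["CMD", "node", "-e", "fetch('http://localhost:3000/api/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
```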
Warning signs:
- The login page loads in Docker but the agent portal shows 500 errors on every page.
- `docker logs <container>` shows "Environment variable DATABASE_URL is not set" at the first request, not at startup.
- The `.env.production` or secrets file is not referenced anywhere in `docker-compose.yml`.
Phase to address: Docker deployment phase — validate all required env vars against a checklist before the first production deploy.
Pitfall 10: PostgreSQL Container and App Container Start in Wrong Order — DB Not Ready
What goes wrong:
docker compose up starts all services in parallel by default. The Next.js app container may attempt its first database query before PostgreSQL has accepted connections. Drizzle's postgres client (using the postgres npm package) throws ECONNREFUSED or ENOTFOUND on the first query. The app container may crash-loop if the error is unhandled at startup, or silently return 500s until the DB is ready if queries are only made at request time.
How to avoid:
Add depends_on with condition: service_healthy in docker-compose.yml. The PostgreSQL service needs a healthcheck: using pg_isready:
services:
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  app:
    depends_on:
      db:
        condition: service_healthy
Also run Drizzle migrations as part of app startup (add drizzle-kit migrate to the container's command: or an entrypoint script) so the schema is applied before the first request. Without this, a fresh deployment against an empty database will fail on every query.
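One way to sketch migrations-on-startup in the compose file — this assumes `drizzle-kit` is available in the production image and that `npm run start` maps to `next start`; adjust to the project's actual scripts:

```yaml
services:
  app:
    # Run pending migrations, then start Next.js. If migrations fail, the
    # container exits instead of serving requests against a stale schema.
    command: sh -c "npx drizzle-kit migrate && npm run start"
```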
Warning signs:
- `docker-compose.yml` has no `healthcheck:` on the database service.
- `docker-compose.yml` has no `depends_on:` on the app service.
Phase to address: Docker deployment phase — write the complete docker-compose.yml with health checks before the first production deploy.
Pitfall 11: Neon Connection Pool Exhaustion in Docker
What goes wrong:
src/lib/db/index.ts creates a postgres(url) client with no explicit max parameter. The postgres npm package defaults to max: 10 connections per process. Neon's free tier allows 10 concurrent connections total. One Next.js container with default settings exhausts the entire connection budget. A second container (staging + production running simultaneously, or a restart overlap) causes all new queries to queue indefinitely until connections are freed, manifesting as timeouts on every request.
Additionally, the current proxy-singleton pattern in db/index.ts creates one pool per Node.js process. Next.js in development mode can hot-reload modules, creating multiple pool instances per dev session. In production this is not a problem, but it can silently leak connections during CI test runs or development stress tests.
Why it happens:
The postgres npm package does not warn when connection limits are exceeded — it silently queues queries. The Neon dashboard shows connection count; the app shows only request timeouts with no clear error.
How to avoid:
Set an explicit max connection limit appropriate for the deployment. For a single-container deployment against Neon free tier (10 connection limit), use postgres(url, { max: 5 }) to leave headroom for migrations, admin queries, and overlap during deployments. For paid Neon tiers, scale accordingly. Add idle_timeout: 20 (seconds) to release idle connections promptly. Add connect_timeout: 10 to surface connection failures quickly rather than queuing indefinitely.
Recommended db/index.ts configuration:
const client = postgres(url, {
max: 5, // conservative for Neon free tier; increase with paid plan
idle_timeout: 20, // release idle connections within 20s
connect_timeout: 10, // fail fast if Neon is unreachable
});
Warning signs:
- `postgres(url)` called with no second argument in `db/index.ts`.
- Neon dashboard shows connection count at ceiling during normal single-user usage.
- Requests time out with no database error in logs — only generic "fetch failed" errors.
Phase to address: Docker deployment phase — configure connection pool limits before the first production deploy.
Pitfall 12: @napi-rs/canvas Native Binary — Wrong Platform in Docker Image
What goes wrong:
@napi-rs/canvas is declared in serverExternalPackages in next.config.ts, which tells Next.js to load it as a native Node.js module rather than bundling it. The package ships pre-compiled .node binary files for specific platforms (darwin-arm64, linux-x64-gnu, linux-arm64-gnu, etc.). When npm install runs on an Apple Silicon Mac during development, npm downloads the darwin-arm64 binary. If the Docker image is built by running npm install inside a node:alpine container (which is linux-musl, not linux-gnu), npm selects a linux-x64-musl binary — which @napi-rs/canvas may not publish for the installed version. The canvas module then fails to load at runtime with Cannot find module '@napi-rs/canvas' (missing musl build) or Error: /app/node_modules/@napi-rs/canvas/...node: invalid ELF header (wrong-architecture binary).
Even if the Docker base image is node:20-slim (Debian, linux-gnu), building on an ARM host and deploying to an x86 server results in the wrong binary unless the --platform flag is used during docker build.
How to avoid: Always build the Docker image with an explicit platform target matching the production host:
docker build --platform linux/amd64 -t app .
Use node:20-slim (Debian-based, glibc) as the Docker base image — not node:20-alpine (musl). Verify the canvas module loads in the container before deploying:
docker exec <container> node -e "require('@napi-rs/canvas'); console.log('canvas OK')"
If developing on ARM and deploying to x86, add --platform linux/amd64 to the docker build command in the deployment runbook and CI pipeline.
Warning signs:
- `next.config.ts` lists `@napi-rs/canvas` in `serverExternalPackages`.
- Docker base image is `node:alpine`.
- The build machine architecture differs from the deployment target.
- Runtime error: `invalid ELF header` or `Cannot find module '@napi-rs/canvas'` after a clean image build.
Phase to address: Docker deployment phase — verify canvas module compatibility before the first production build.
Email/SMTP Pitfalls
Pitfall 13: SMTP Env Vars Absent in Container — Root Cause of Reported Email Breakage
What goes wrong:
This is the reported issue: email worked in development but broke when deployed to Docker. The most likely root cause is that CONTACT_SMTP_HOST, CONTACT_SMTP_PORT, CONTACT_EMAIL_USER, CONTACT_EMAIL_PASS, and AGENT_EMAIL are not present in the container environment. signing-mailer.tsx reads these in createTransporter() which is called at send time (not at module load) — so the missing env vars do not cause a startup error. The first signing email attempt fails with Nodemailer throwing connect ECONNREFUSED (if host resolves to nothing) or Invalid login (if credentials are absent).
Why it looks like a DNS problem but isn't:
Docker containers on a bridge network use the host's DNS resolver (or Docker's embedded resolver) and can reach external SMTP servers by hostname without any special configuration. The SMTP server (CONTACT_SMTP_HOST) is an external service (e.g., Mailgun, SendGrid, or a personal SMTP relay) — Docker does not change its reachability. The error is env var injection failure, not DNS.
Verification steps before attempting the Docker fix:
- `docker exec <container> printenv CONTACT_SMTP_HOST` — if empty, the env var is missing.
- `docker exec <container> node -e "const n = require('nodemailer'); n.createTransport({host: process.env.CONTACT_SMTP_HOST, port: 465, secure: true, auth: {user: process.env.CONTACT_EMAIL_USER, pass: process.env.CONTACT_EMAIL_PASS}}).verify(console.log)"` — tests SMTP connectivity from inside the container.
How to avoid:
Include all SMTP variables in the env_file: or environment: block of the app service in docker-compose.yml. Use an .env.production file that is manually provisioned on the Docker host (not committed). Consider using Docker secrets (mounted files) for the SMTP password rather than environment variables if the host is shared.
Warning signs:
- `docker exec <container> printenv CONTACT_SMTP_HOST` returns empty.
- Signing emails fail silently — no error surfaces until the first send attempt.
Phase to address: Docker deployment phase — SMTP env var verification is the first check in the deployment runbook.
Pitfall 14: Nodemailer Transporter Created With Mismatched Port and TLS Settings
What goes wrong:
signing-mailer.tsx contains:
port: Number(process.env.CONTACT_SMTP_PORT ?? 465),
secure: Number(process.env.CONTACT_SMTP_PORT ?? 465) === 465,
contact-mailer.ts contains:
port: Number(process.env.CONTACT_SMTP_PORT ?? 587),
secure: false, // STARTTLS on port 587
The two mailers use different defaults for the same env var. If CONTACT_SMTP_PORT is not set in the container, the signing mailer assumes port 465 (TLS), but the contact form mailer assumes port 587 (STARTTLS). If the SMTP provider only supports one of these, one mailer will connect and the other will time out. The mismatch is invisible until both code paths are exercised in production.
How to avoid:
Require CONTACT_SMTP_PORT explicitly — remove the fallback defaults and add a startup validation check that throws if this variable is missing. Use a single createSmtpTransporter() utility function shared by both mailers, not two separate inline createTransport() calls with different defaults. Document the required env var values in a DEPLOYMENT.md or the docker-compose.yml comments.
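A sketch of a single shared config resolver both mailers could call. The function name and strictness are proposals; the env var names match the existing code:

```typescript
interface SmtpConfig {
  host: string;
  port: number;
  secure: boolean;
}

// Single source of truth for SMTP settings. No fallback ports: a missing or
// malformed variable fails loudly instead of silently diverging between mailers.
function resolveSmtpConfig(
  env: Record<string, string | undefined> = process.env,
): SmtpConfig {
  const host = env.CONTACT_SMTP_HOST;
  const rawPort = env.CONTACT_SMTP_PORT;
  if (!host || !rawPort) {
    throw new Error('CONTACT_SMTP_HOST and CONTACT_SMTP_PORT must be set');
  }
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid CONTACT_SMTP_PORT: ${rawPort}`);
  }
  // Port 465 is implicit TLS; 587/25 use STARTTLS (secure: false in Nodemailer).
  return { host, port, secure: port === 465 };
}
```

Both `signing-mailer.tsx` and `contact-mailer.ts` would spread this result into their `createTransport()` call, eliminating the divergent defaults.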
Warning signs:
- Two separate inline `createTransport()` calls with different `port` defaults for the same env var.
- Only one of the two email paths (signing email vs. contact form) is tested in Docker.
Phase to address: Docker deployment phase — consolidate SMTP transporter creation before the first production email test.
Pitfall 15: Multi-Signer Email Loop Fails Halfway — No Partial-Send Recovery
What goes wrong: When sending to three signers, the send route will loop: create token 1, email Signer 1, create token 2, email Signer 2, create token 3, email Signer 3. If email to Signer 2 fails (SMTP timeout, invalid address), tokens 1 and 3 may still be created in the database but Signer 3 never receives their email. The document is now in an inconsistent state: tokens exist for recipients who were never emailed. Signer 1 signs, completion detection counts 2 remaining unclaimed tokens (Signers 2 and 3 never signed), document never reaches "Signed."
How to avoid:
Create all tokens before sending any emails. Wrap token creation in a transaction — if any token INSERT fails, roll back all tokens and return an error before any emails are sent. Send emails outside the transaction (SMTP is not transactional). If an email send fails, mark that token as superseded (add a supersededAt column to signingTokens) rather than deleting it, and surface the partial-send failure to the agent with a "resend to failed recipients" option. Never leave unclaimed tokens orphaned by partial email failure.
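The tokens-first flow can be sketched as an orchestration function with injected effects — all signatures here are assumptions for illustration; the real route would pass its Drizzle transaction for token creation and the Nodemailer send:

```typescript
// createTokens must be all-or-nothing (wrap the INSERTs in one transaction);
// sendEmail is fallible and non-transactional, so failures are collected
// for a "resend to failed recipients" flow rather than rolled back.
async function sendToSigners(
  signers: string[],
  createTokens: (emails: string[]) => Promise<Map<string, string>>,
  sendEmail: (email: string, token: string) => Promise<void>,
): Promise<{ sent: string[]; failed: string[] }> {
  // Phase 1: every token exists before any email goes out.
  const tokens = await createTokens(signers);
  // Phase 2: send emails; a failure never leaves token state unknown.
  const sent: string[] = [];
  const failed: string[] = [];
  for (const email of signers) {
    try {
      await sendEmail(email, tokens.get(email)!);
      sent.push(email);
    } catch {
      failed.push(email); // token remains valid — surface this to the agent
    }
  }
  return { sent, failed };
}
```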
Warning signs:
- The send loop interleaves token creation and email sending (create token 1, send email 1, create token 2, send email 2...) rather than creating all tokens atomically first.
Phase to address: Multi-signer send phase — design the send loop with transactional token creation from the start.
PDF Assembly Pitfalls
Pitfall 16: Final PDF Assembly Runs Multiple Times — Duplicate Signed PDFs
What goes wrong:
Completion detection triggers PDF assembly (merging all signer contributions into one final PDF). If the race condition guard (Pitfall 2) is not in place, assembly runs twice. Even with the guard, if the assembly function crashes partway through and the completionTriggeredAt was already set, there is no way to retry assembly — the guard prevents re-entry and the document is stuck with no signed PDF.
How to avoid:
Separate the "completion triggered" flag from the "signed PDF ready" flag. Add both completionTriggeredAt TIMESTAMP (prevents double-triggering) and signedFilePath TEXT (set only when PDF is successfully written). If completionTriggeredAt is set but signedFilePath is null after 60 seconds, an admin retry endpoint can reset completionTriggeredAt to null to allow re-triggering. The existing atomic rename pattern (tmp → final) in embed-signature.ts already prevents partial PDF corruption — preserve this in the multi-signer assembly code.
Warning signs:
- Only a single flag (`completionTriggeredAt`) is used to track both triggering and completion.
- No retry mechanism exists for a stuck assembly.
Phase to address: Multi-signer completion phase — implement idempotent assembly with separate trigger and completion flags.
Pitfall 17: Multi-Signer Final PDF — Which Prepared PDF Is the Base?
What goes wrong:
In the current single-signer flow, embedSignatureInPdf reads from doc.preparedFilePath (the agent-prepared PDF with text fills and agent signatures already embedded) and writes to _signed.pdf. With multiple signers, each signer's signature needs to be embedded sequentially onto the same prepared PDF base. If two handlers run concurrently and both read from preparedFilePath, modify it in memory, and write independent output PDFs, the final "merge" step needs a different strategy — you cannot simply append two separately-signed PDFs into one document without losing the shared base.
How to avoid: The correct architecture for multi-signer PDF assembly:
- Each signer's POST handler embeds only that signer's signatures into an intermediate file: `{docId}_partial_{signerEmail_hash}.pdf`. This intermediate file is written atomically (tmp → rename). It is NOT the final document.
- When completion is triggered (all tokens claimed), a single assembly function reads the prepared PDF once, iterates all signers' signature data (from DB or intermediate files), embeds all signatures in one pass, and writes `{docId}_signed.pdf`.
- The `pdfHash` is computed only from the final assembled PDF, not from any intermediate.
This avoids the read-modify-write race entirely. Intermediate files are cleaned up after successful final assembly.
Warning signs:
- Each signer's POST handler directly writes to `_signed.pdf` rather than an intermediate file.
- The final assembly step reads from two separately-signed PDF files and tries to merge them.
Phase to address: Multi-signer completion phase — establish the intermediate file pattern before any signing submission code is written.
Pitfall 18: Temp File Accumulation on Failed Assemblies
What goes wrong:
The current code already creates a temp file during date stamping (`preparedAbsPath.datestamped.tmp`) and cleans it up with `unlink().catch(() => {})`. Multi-signer assembly will create intermediate partial files. If the assembly handler crashes between writing intermediates and producing the final PDF, those temp files are never cleaned up. Over time, the `uploads/` directory fills with orphaned intermediate files. On the home Docker server with limited disk, this causes write failures on new documents.
How to avoid:
Name all intermediate and temp files with a recognizable pattern (`*.tmp`, `*_partial_*.pdf`). Add a periodic cleanup job (a Next.js route called by a cron, or a simple `setInterval` in a route handler) that deletes `*.tmp` and `*_partial_*.pdf` files older than 24 hours. Log a warning when cleanup finds orphaned files — this surfaces incomplete assemblies that need investigation.
Warning signs:
- The `uploads/` directory grows unbounded over time.
- Partial files from failed assemblies remain after a document is marked Signed.
Phase to address: Multi-signer completion phase — add cleanup alongside the assembly logic.
Security Pitfalls
Pitfall 19: Multiple Tokens Per Document — Token Enumeration Attack
What goes wrong: In the single-signer system, one token is issued per document. An attacker who intercepts or guesses a token can sign one document. With multi-signer, multiple tokens are issued for the same document. If token generation uses a predictable pattern (e.g., sequential IDs, short UUIDs, or low-entropy random values), an attacker who holds one valid token for a document can enumerate sibling tokens for the same document by brute-forcing nearby values.
Current state: `createSigningToken` uses `crypto.randomUUID()` for the JTI and `SignJWT` with HS256. UUID v4 provides 122 bits of randomness — sufficient. The risk is theoretical given the current implementation but becomes concrete if the JTI generation is ever changed.
How to avoid:
Keep using `crypto.randomUUID()` for the JTI. Do not add any sequential or human-readable component to the JTI. Ensure the JWT is verified before the JTI is looked up in the database — `verifySigningToken()` already does this (JWT signature check first, then DB lookup). Add rate limiting on the signing GET and POST endpoints: a cap of 10 requests per IP per minute is enough to block brute force. Log and alert on repeated `status: 'invalid'` responses from the same IP.
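For a single-container deployment, the rate limit can live in process memory. A minimal fixed-window sketch (the helper name and constants are assumptions; a multi-instance deployment would need a shared store instead):

```typescript
// Fixed-window, in-memory rate limiter -- adequate for one container.
const WINDOW_MS = 60_000;   // 1 minute window
const MAX_REQUESTS = 10;    // per IP per window

const hits = new Map<string, { windowStart: number; count: number }>();

// Returns true if the request is within budget; false means "respond 429".
export function allowRequest(ip: string, now = Date.now()): boolean {
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(ip, { windowStart: now, count: 1 }); // new window for this IP
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_REQUESTS;
}
```

In the signing GET and POST handlers this check would run before token verification, so invalid-token probing burns the attacker's budget rather than the JWT verifier's CPU.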
Warning signs:
- JTI generation switches from `crypto.randomUUID()` to a sequential or short-UUID pattern.
- No rate limiting exists on `/api/sign/[token]` GET or POST.
Phase to address: Multi-signer send phase — add rate limiting before issuing multiple tokens per document.
Pitfall 20: Token Shared Between Signers — Signer A Uses Signer B's Token
What goes wrong:
With multi-signer, the system issues separate tokens per signer email. But the signing GET handler at line 90 currently returns ALL client-visible fields (filtered by `isClientVisibleField`), not fields tagged to the specific signer. If Signer A somehow obtains Signer B's token (e.g., email forward, shared email account, phishing), Signer A sees Signer B's fields and can sign them. In real estate, this is equivalent to signing another party's name on a contract — a serious legal issue.
The signing POST handler (lines 210-213) filters `signableFields` to all client-signature and initials fields for the entire document — it does not restrict by signer. A cross-token submission would succeed server-side.
How to avoid:
After multi-signer is implemented, the signing GET handler must filter `signatureFields` by `field.signerEmail === tokenRow.signerEmail`. The signing POST handler must verify that the field IDs in the `signatures` request body correspond only to fields tagged to `tokenRow.signerEmail` — reject any submission that includes field IDs not assigned to that signer. This is server-side enforcement, not a UI concern.
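Both checks reduce to pure functions over the field data, which makes them easy to test independently of the route handlers. A sketch, assuming `SignatureFieldData` gains the planned `signerEmail` tag (the helper names are illustrative):

```typescript
// SignatureFieldData as in schema.ts, plus the planned v1.2 signer tag.
type SignatureFieldData = {
  id: string; page: number; x: number; y: number;
  width: number; height: number; type?: string;
  signerEmail?: string; // new in v1.2 -- ties a field to one signer
};

// GET handler: return only the fields assigned to the token's signer.
export function fieldsForSigner(
  fields: SignatureFieldData[], signerEmail: string,
): SignatureFieldData[] {
  return fields.filter((f) => f.signerEmail === signerEmail);
}

// POST handler: reject any submission touching a field that is not
// assigned to this signer -- server-side enforcement, not UI.
export function validateSubmission(
  fields: SignatureFieldData[], signerEmail: string, submittedFieldIds: string[],
): { ok: boolean; rejectedIds: string[] } {
  const allowed = new Set(fieldsForSigner(fields, signerEmail).map((f) => f.id));
  const rejectedIds = submittedFieldIds.filter((id) => !allowed.has(id));
  return { ok: rejectedIds.length === 0, rejectedIds };
}
```

Returning the rejected IDs (rather than a bare boolean) lets the POST handler include them in the 403 response body and the audit event.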
Warning signs:
- The signing GET handler's `signatureFields` filter does not include a `signerEmail` check.
- The signing POST handler's `signableFields` filter does not restrict by `signerEmail`.
Phase to address: Multi-signer signing flow phase — add signer-field binding validation to both GET and POST handlers.
Pitfall 21: Completion Notification Email Sent to Wrong Recipients
What goes wrong:
The current `sendAgentNotificationEmail` sends to `process.env.AGENT_EMAIL`. In multi-signer, the requirement is to send the final merged PDF to all signers AND the agent when completion occurs. If the recipient list is derived from `documents.emailAddresses` (the JSONB array collected at prepare time), and that array is stale (e.g., the agent changed a signer's email between prepare and send), the final PDF goes to the old address.
A worse variant: if `emailAddresses` contains CC addresses that are NOT signers (e.g., a title company contact), those recipients receive the completed PDF immediately — before the agent has reviewed it. For a solo agent workflow, this is likely acceptable, but it should be explicit.
How to avoid:
Derive the final recipient list from `signingTokens.signerEmail` (the authoritative record of who was actually sent a token), not from `documents.emailAddresses`. Separate "recipients who receive the signing link" from "recipients who receive the completed PDF" explicitly in the data model. The agent should review the final recipient list at send time.
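The derivation rule is small enough to state as a pure function over token rows (column names assumed from the schema plan; `documents.emailAddresses` is deliberately absent from the signature):

```typescript
// Row shape assumed from the planned signingTokens schema.
type SigningTokenRow = { signerEmail: string; usedAt: Date | null };

// Completed-PDF recipients = every signer who was actually issued a
// token (deduplicated, order preserved), plus the agent.
export function completedPdfRecipients(
  tokens: SigningTokenRow[], agentEmail: string,
): string[] {
  return [...new Set([...tokens.map((t) => t.signerEmail), agentEmail])];
}
```

Keeping `emailAddresses` out of the function signature makes it structurally impossible for the completion handler to fall back to the stale prepare-time list.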
Warning signs:
- The completion handler derives email recipients from `documents.emailAddresses` rather than `signingTokens.signerEmail`.
Phase to address: Multi-signer send phase — establish the recipient derivation rule before tokens are issued.
Pitfall 22: Signing Token Issued But Document Re-Prepared — Token Points to Stale PDF
What goes wrong:
v1.1 introduced a guard: Draft-only documents can be AI-prepared (`ai-prepare/route.ts` line 37: `if (doc.status !== 'Draft') return 403`). But `prepare/route.ts` (which calls `preparePdf` and writes `_prepared.pdf`) has no equivalent guard — a Sent document can be re-prepared if the agent POSTs to `/api/documents/{id}/prepare` directly. With multi-signer, if any token has been issued (even if no signer has used it), re-preparing the document overwrites `_prepared.pdf` and changes `preparedFilePath`. Signers who have already received their token will open the signing page and load the new prepared PDF — which may have different text fills, field positions, or the agent's new signature — not what was legally sent to them.
How to avoid:
Add a guard to `prepare/route.ts`: if `signingTokens` has any row for this document with `usedAt IS NULL` (any token still outstanding), reject the prepare request with `409 Conflict`: "Cannot re-prepare a document with outstanding signing tokens." If the agent genuinely needs to change the document, they must first void all outstanding tokens (supersede them) and issue new ones.
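The guard condition itself reduces to a predicate over the document's token rows; a sketch, with the Drizzle query wiring omitted and the `supersededAt` column assumed from the voiding mechanism described above:

```typescript
// usedAt exists in the current schema; supersededAt is an assumed
// column for voided tokens, per the supersede flow described above.
type TokenRow = { usedAt: Date | null; supersededAt?: Date | null };

// A token is outstanding if it was issued, never claimed, and not voided.
export function hasOutstandingTokens(tokens: TokenRow[]): boolean {
  return tokens.some((t) => t.usedAt === null && !t.supersededAt);
}

// In prepare/route.ts, before writing _prepared.pdf (illustrative):
//   if (hasOutstandingTokens(rows)) {
//     return new Response(
//       "Cannot re-prepare a document with outstanding signing tokens",
//       { status: 409 },
//     );
//   }
```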
Warning signs:
- `prepare/route.ts` has no check against the `signingTokens` table before writing `_prepared.pdf`.
Phase to address: Multi-signer send phase — add the outstanding-token guard to the prepare route before multi-signer send is implemented.
Pitfall 23: @vercel/blob Is Installed But Not Used — Risk of Accidental Use
What goes wrong:
`package.json` lists `@vercel/blob` as a production dependency. No file in the codebase imports or uses it. The package provides a Vercel-hosted blob storage client that requires `BLOB_READ_WRITE_TOKEN` to be set in the environment. If any future code accidentally imports from `@vercel/blob` instead of using the local filesystem path utilities, it will silently fail in Docker (no `BLOB_READ_WRITE_TOKEN` in a non-Vercel environment) and would route file storage through Vercel's infrastructure rather than the local volume, breaking signed PDF storage entirely.
Why it happens:
`@vercel/blob` may have been installed during initial scaffolding when Vercel deployment was considered. It was never wired up. Its presence in `package.json` is a footgun.
How to avoid:
Remove `@vercel/blob` from `package.json` and run `npm install` before building the Docker image. If Vercel deployment is ever considered in the future, re-add it intentionally with a clear decision to migrate storage. Until then, its presence is a liability.
Warning signs:
- `@vercel/blob` appears in `package.json` dependencies but `grep -r "@vercel/blob"` finds no usage in `src/`.
- Any new code imports from `@vercel/blob` without an explicit architectural decision to use it.
Phase to address: Docker deployment phase — remove the unused dependency before building the production image.
Prevention Checklist
Group by phase for the roadmap planner.
Multi-Signer Schema Phase
- Add `signerEmail TEXT NOT NULL` to `signingTokens` (with backfill migration for v1.1 rows)
- Add `viewedAt TIMESTAMP` to `signingTokens`
- Add `completionTriggeredAt TIMESTAMP` to `documents`
- Add `Partially Signed` to `documentStatusEnum`, or compute it from token states
- Freeze `signatureFields` JSONB after tokens are issued (document invariant, enforced in prepare route)
- Document the invariant: `signingTokens.signerEmail` is the source of truth for the recipient list
Multi-Signer Send Phase
- Wrap all token creation in a single DB transaction; send emails after commit
- Add outstanding-token guard to `prepare/route.ts` (409 if any unclaimed token exists)
- Derive the final PDF recipient list from `signingTokens.signerEmail`, not `emailAddresses`
- Add rate limiting to signing GET and POST endpoints
Multi-Signer Signing Flow Phase
- Filter `signatureFields` by `field.signerEmail === tokenRow.signerEmail` in signing GET
- Validate submitted field IDs against the signer's assigned fields in signing POST
- Include `signerEmail` in `signature_submitted` audit event metadata
- Completion detection: count unclaimed tokens in the same transaction as the token claim
Multi-Signer Completion Phase
- Race condition guard: `UPDATE documents SET completion_triggered_at = NOW() WHERE completion_triggered_at IS NULL`
- Assemble the final PDF in one pass from the prepared PDF base (not by merging two separately-signed files)
- Set `signedFilePath` only after successful atomic rename of the final assembled PDF
- Compute `pdfHash` only from the final assembled PDF
- Clean up intermediate `_partial_*.pdf` files after successful assembly
- Add periodic orphaned-temp-file cleanup
Docker Deployment Phase
- Rename `NEXT_PUBLIC_BASE_URL` → `SIGNING_BASE_URL` (server-only var, no `NEXT_PUBLIC_` prefix)
- Audit all remaining `NEXT_PUBLIC_*` usages — confirm each one genuinely needs browser access
- Mount a named Docker volume at `process.cwd() + '/uploads'` (verify the path inside the container first)
- Create `.env.production` on the Docker host with all required secrets; reference it in `docker-compose.yml`
- Add `CONTACT_SMTP_PORT` as a required env var; remove fallback defaults from both mailers
- Consolidate the SMTP transporter into a shared `createSmtpTransporter()` utility
- Add a PostgreSQL `healthcheck` + app `depends_on: condition: service_healthy`
- Add Drizzle migration to container startup (before `next start`)
- Add an `/api/health` endpoint that runs `SELECT 1` + checks `DATABASE_URL` + checks `CONTACT_SMTP_HOST`
- Verify SMTP connectivity from inside the container before the first production deploy
- Configure `postgres(url, { max: 5, idle_timeout: 20, connect_timeout: 10 })` for the Neon free tier
- Build the Docker image with `--platform linux/amd64` when deploying to x86_64 Linux
- Use `node:20-slim` (Debian glibc) as the base image — not `node:alpine` (musl)
- Verify `@napi-rs/canvas` loads in the container: `node -e "require('@napi-rs/canvas')"`
- Remove `@vercel/blob` from `package.json` dependencies
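Several of the compose-level items in this checklist fit in one file. The sketch below is a config fragment under stated assumptions: service names, the `uploads` volume name, the in-container path `/app/uploads`, and the healthcheck intervals are all illustrative, and the `db` service only applies if Postgres runs in compose rather than on Neon (in which case drop `db` and the `depends_on` block):

```yaml
services:
  app:
    image: broker-app:latest        # built with --platform linux/amd64
    env_file: .env.production       # DATABASE_URL, SMTP vars, SIGNING_BASE_URL, ...
    volumes:
      - uploads:/app/uploads        # named volume -- verify this matches
                                    # process.cwd() + '/uploads' in the container
    depends_on:
      db:
        condition: service_healthy  # wait for Postgres before `next start`
  db:
    image: postgres:16
    healthcheck:                    # required for service_healthy to mean anything
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
volumes:
  uploads:                          # survives container recreation
```

Without the explicit `healthcheck` block, `condition: service_healthy` has nothing to evaluate and compose refuses to start the dependent service.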
Verification (Do Not Skip)
- Test a two-signer document where both signers submit within 1 second of each other — confirm one PDF, one notification, one `signedAt`
- Restart the Docker container and confirm all previously-uploaded PDFs are still accessible
- Confirm clicking a signing link emailed from Docker opens the correct production URL (not localhost)
- Confirm `docker exec <container> printenv CONTACT_SMTP_HOST` returns the expected value
- Test a v1.1 (single-signer) document after migration — confirm existing tokens still work
- Confirm the Neon connection count stays below 7 during normal usage (check the Neon dashboard)
- Confirm the canvas module loads: `docker exec <container> node -e "require('@napi-rs/canvas'); console.log('OK')"`
Phase-Specific Warning Summary
| Phase Topic | Likely Pitfall | Mitigation |
|---|---|---|
| signingTokens schema change | NOT NULL constraint breaks existing token rows | Backfill migration with client email JOIN |
| Multi-signer send loop | Partial email failure orphans tokens | Transactional token creation, separate from email sends |
| Completion detection | First signer marks document Signed | Count unclaimed tokens inside transaction before marking |
| Concurrent completion | Two handlers both run final assembly | completionTriggeredAt one-time-set guard |
| Docker build | NEXT_PUBLIC_BASE_URL baked into bundle | Remove NEXT_PUBLIC_ prefix for server-only URL |
| Docker volumes | Uploads lost on container recreate | Named volume mounted at uploads path |
| Docker secrets | SMTP env vars absent in container | env_file in compose, verify with printenv |
| PostgreSQL startup | App queries before DB is ready | service_healthy depends_on + pg_isready healthcheck |
| Neon connection pool | Default 10 connections saturates free tier | Set max: 5 with idle_timeout and connect_timeout |
| Native module in Docker | @napi-rs/canvas wrong platform binary | --platform linux/amd64 + node:20-slim base image |
| Unused dependency | @vercel/blob accidentally used in new code | Remove from package.json before Docker build |
| Final PDF assembly | Signer PDFs assembled by merging two separate files | Single-pass assembly from prepared PDF base |
| Signer identity in audit | Two signature_submitted events indistinguishable | signerEmail in audit event metadata |
Sources
- Reviewed `src/lib/db/schema.ts` — confirmed `signingTokens` has no `signerEmail`; `documentStatusEnum` has no partial state; `SignatureFieldData` has no signer tag; 2026-04-03
- Reviewed `src/app/api/sign/[token]/route.ts` — confirmed completion marks the document Signed unconditionally at line 254; confirmed the `isClientVisibleField` filter at line 90; confirmed the `signableFields` filter does not restrict by signer at lines 210-213
- Reviewed `src/app/api/documents/[id]/send/route.ts` — confirmed single token creation, single recipient
- Reviewed `src/app/api/documents/[id]/prepare/route.ts` — confirmed no guard against re-preparation of Sent documents
- Reviewed `src/lib/signing/signing-mailer.tsx` — confirmed `createTransporter()` per send (healthy); confirmed `CONTACT_SMTP_PORT` defaults differ from `contact-mailer.ts`
- Reviewed `src/lib/signing/token.ts` — confirmed `crypto.randomUUID()` JTI generation (sufficient entropy)
- Reviewed `src/lib/signing/embed-signature.ts` — confirmed atomic rename pattern (tmp → final)
- Reviewed `src/lib/db/index.ts` — confirmed `postgres(url)` with no `max` parameter; Proxy singleton pattern; lazy initialization
- Reviewed `next.config.ts` — confirmed `serverExternalPackages: ['@napi-rs/canvas']`
- Reviewed `package.json` — confirmed `@vercel/blob` present in dependencies; confirmed the `postgres` npm package in use; confirmed `node` not specified in package engines
- Next.js Environment Variables — Build-time vs Runtime — `NEXT_PUBLIC_` vars are inlined at build time; confirmed in Next.js 15 docs
- Docker Compose healthcheck + depends_on — the `service_healthy` condition requires an explicit healthcheck definition
- Nodemailer: SMTP port and TLS — port 465 = implicit TLS (`secure: true`), port 587 = STARTTLS (`secure: false`); a mismatch causes connection timeout
- `postgres` npm package documentation — default `max: 10` connections per client instance; `idle_timeout` and `connect_timeout` options
- Neon connection limits — free tier: 10 concurrent connections; paid tiers increase this
- `@napi-rs/canvas` supported platforms — no musl (Alpine) builds published; requires a glibc (Debian/Ubuntu) base image
Pitfalls research for: Teressa Copeland Homes — v1.2 multi-signer and Docker deployment Researched: 2026-04-03 Previous v1.1 pitfalls (AI field placement, expanded field types, agent signing, filled preview) documented in git history — superseded by this file for v1.2 planning. The v1.1 pitfalls are assumed addressed; recovery strategies from that document remain valid if regressions occur.