markdown-review/DESIGN.md
TLC AI Lab 8eba380a3f markdown-review v0.0.1 spike: comment-on-markdown -> agent (MCP Apps)
Antigravity-style inline markdown commenting as an installable Claude plugin.
- MCP Apps webview (markdown-it renderer + comment UI)
- three lanes: Answer Now (sendMessage), Add to batch, Submit All (persist+send)
- edit/review mode toggle; sidecar comment persistence
- self-contained prebuilt server/dist (zero runtime node_modules)
- self-marketplace manifest for /plugin install
- 11/11 stdio smoke tests passing

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 00:58:36 -05:00

10 KiB

markdown-review — Design Doc

Antigravity-style comment-on-markdown → send-to-agent as a Claude plugin (MCP Apps / MCP-UI). Status: SPEC (Phase 0 complete — capability verified, API grounded). Author: JARVIS / CDT. 2026-06-09.


0. Why this is buildable (verification result)

The mechanism is already proven on this machine: Anthropic's pdf-viewer plugin is installed and its pdf MCP server is live. It renders an interactive viewer with zero Electron/browser — it ships a single-file HTML app as an MCP resource (ui://…, mimetype text/html;profile=mcp-app) that Claude Desktop renders in its own sandboxed webview. We replicate that pattern with a markdown renderer + comment UI.

Grounded stack (same as the working pdf server): Node 24, @modelcontextprotocol/ext-apps@^1.7, @modelcontextprotocol/sdk@^1.29, zod@^4, Vite + vite-plugin-singlefile to bundle app.html.

Key ext-apps primitives we use (real, verified):

  • registerAppTool(server, name, { _meta: { ui: { resourceUri } } }, cb) — model-facing tool that opens the view.
  • registerAppResource(server, title, "ui://…", {}, readCb) — serves the HTML, with CSP/permission _meta.ui.
  • Tool visibility _meta.ui.visibility: ["app"] — hide plumbing tools from the model (isToolVisibilityAppOnly).
  • App side (@modelcontextprotocol/ext-apps/app + app-bridge): the webview can call server tools and updateModelContext — push text directly into the model's turn. This is what powers "Answer Now".

Host note: the live comment panel renders in Claude Desktop (the MCP-Apps host). Claude Code builds/installs the plugin; the interactive surface is Desktop's. A headless Claude Code fallback (render to a temp HTML + Read) is a secondary, non-interactive mode.


1. The interaction model — three lanes

Antigravity has two effective behaviors (batch artifact comments; select-text→chat). Rav's refinement adds a third, and makes the batch-result behavior a per-session toggle. The result is three lanes:

Lane A — Reading-phase batch (the default, flow-preserving)

You read top-to-bottom and drop comments as you go. No AI is invoked. Comments accumulate silently. A single Submit All flushes the whole set into one agent turn. This is the steering loop.

Lane B — "Answer Now" (single, ephemeral, mid-read) ⟵ Rav's key addition

A separate button on an individual comment. Fires only that one item + that one comment to the agent immediately, via updateModelContext. It:

  • does not enter the batch store,
  • does not disturb the other pending batch comments,
  • is for a quick correction or question without breaking the reading phase.

Mechanically distinct from Lane A: different button, different storage (memory/none), different trigger.

Lane C — what Submit All does (per-session mode toggle)

Orthogonal to A/B. Controls the effect of a batch flush:

  • edit mode — comments become edit instructions; agent modifies the anchored ranges in the real .md, re-renders, verifies. (Antigravity's steering loop.)
  • review mode — comments become a punch-list/review thread; the file is left untouched until you approve items one by one.
  • Switchable per session (a toolbar toggle + a tool arg). Default edit.
Lane Trigger Scope Storage Effect
A Batch Submit All all pending sidecar (default) edit or review (Lane C)
B Answer Now per-comment button that one ephemeral (memory) immediate single ask
C (mode) toggle governs A's effect

2. Anchor model

Because we operate on real files (unlike Antigravity's opaque rendered-span anchor), comments anchor richer and survive edits better:

"anchor": {
  "line_start": 12, "line_end": 14,        // 1-based, inclusive
  "char_start": 340, "char_end": 410,       // byte/char offsets into the raw file
  "block_id": "h2-architecture-3",          // stable-ish slug of the enclosing md block
  "quote": "We should use Flask here",      // verbatim selected text (re-anchor fallback)
  "context_before": "…", "context_after": "…"  // ~120 chars each, for fuzzy re-anchoring
}

Re-anchoring on a changed file: try char_start → verify quote matches; else search quote near block_id; else fuzzy-match quote in context. Stale anchors are flagged, never silently dropped.


3. Storage — "all of the above, intelligently"

Three backends, selected by intent, with explicit override. Combos are first-class.

Backend Lives in Lifetime Default use (scenario)
memory server process, per viewUUID session Lane B Answer-Now; throwaway brainstorm; "don't clutter anything"
sidecar .claude/md-comments/<sha1(path)>.json durable, resumable Lane A batch review threads; multi-file reviews; keeps .md clean (Antigravity-like)
inline <!-- mdrev:{id} … --> markers in the .md travels with file hand-off to a colleague, git-committed review, portable, want comment visible in raw source

Intelligent routing (defaults; all overridable per call / toolbar)

  • Answer-Now → memory (never persisted).
  • Batch comment authored during reading → sidecar (source of truth), auto-resumes next open.
  • Explicit "Export for sharing" → materialize sidecar threads as inline markers.
  • On open, if the file already contains inline mdrev: markers → import them into the sidecar working set (combo: inline-authored → sidecar-managed).
  • Promote: a memory/Answer-Now comment you want to keep → "pin" → sidecar.

Combos we explicitly support

  1. sidecar SoT + inline export — manage privately, publish a shareable copy.
  2. inline import → sidecar — colleague sent a .md with mdrev: comments; we ingest + manage them.
  3. memory → sidecar promote — a quick ask turned out worth keeping.
  4. sidecar + inline mirror (synced) — opt-in two-way: edits in either reflect (sidecar is authority on conflict).

Storage backend is a field on every comment (store: "memory"|"sidecar"|"inline") so a single view can mix all three. resolve/delete operate uniformly across backends.


4. MCP surface

Model-facing (the only tools the model sees)

  • open_markdown(path, { mode?: "edit"|"review" }){ viewUUID, fileMeta, importedComments }. Opens the view (returns _meta.ui.resourceUri), imports inline markers + loads sidecar.
  • interact(viewUUID, commands[]) → batched ops the model issues: scroll_to(anchor) | highlight(anchor) | get_comments(filter?) | get_state | set_mode(edit|review) | resolve(id) | export_inline | get_pages(rendered+source).
  • (get_comments is how the model receives a Submit All payload after the user clicks it.)

App-only (hidden, _meta.ui.visibility:["app"]) — the plumbing

  • poll_md_commands(viewUUID) — long-poll queue draining interact commands to the webview.
  • submit_batch(viewUUID, comments[], mode) — Lane A: user clicked Submit All; resolves the model's pending get_comments/blocks the interact promise (pdf-viewer pattern).
  • persist_comment(viewUUID, comment) — webview writes a single batch comment to its backend as authored.
  • read_md_bytes(viewUUID, range?) — webview pulls source to render.
  • save_md(viewUUID, bytes) — when edit mode writes back inline markers.

App→model direct (no server tool) — Lane B

  • "Answer Now" calls the bridge updateModelContext({ item, comment }) → injected into the live turn. No queue, no store. (Falls back to a dedicated answer_now app-tool if a host lacks updateModelContext.)

5. Round-trip (mirrors the verified pdf-viewer mechanism)

model → interact(cmds)           server enqueues to commandQueues[viewUUID]
webview → poll_md_commands(...)   long-poll; executes in the renderer (scroll/highlight/extract)
                                  for data cmds: interact() BLOCKS on Promise(requestId, timeout)
user clicks "Submit All"          webview → submit_batch(...)  → resolves the pending model call
user clicks "Answer Now"          webview → updateModelContext(...) → straight into the turn

6. Plugin packaging

markdown-review/
├── .claude-plugin/plugin.json     # name, version, author, components
├── .mcp.json                      # { mcpServers: { "md": { command: node, args:[server/index.js,--stdio] } } }
├── commands/                      # /open.md  /comment.md  /review.md
├── skills/review-markdown/SKILL.md  # orchestration brain (open-once, batched interact, mode handling)
├── server/                        # Node MCP server (queue + blocking-promise plumbing)
│   ├── index.js  · server.js  · storage.js  · anchor.js
└── app/                           # Vite single-file → bundled app.html (md renderer + comment UI)
    └── src/  → built to ../server/app.html

Renderer: markdown-it (CommonMark + tables/footnotes) in the sandboxed webview. Comment UI = margin threads anchored to selection; toolbar = mode toggle, Submit All, Export-inline, store selector; per-comment = Answer-Now + Submit + store-pick + resolve.


7. Build phases (slow-is-fast)

  1. Verify host support + ground the API. (done)
  2. Spike the round-trip — minimal md server: open a file, render in webview, prove interact/poll/submit + updateModelContext fire end-to-end. De-risks everything.
  3. Comment UI — selection → anchored thread → Submit All payload + Answer-Now.
  4. Storage layer — sidecar + inline + memory + the intelligent router + combos.
  5. Agent wiring — SKILL.md + commands turn submitted comments into edits (edit mode) / list (review mode).
  6. Anchor robustness — re-anchoring across edits; stale flagging.
  7. Package + install — plugin.json + local marketplace entry; smoke-test in Claude Desktop.

8. Open questions / risks

  • updateModelContext host support in this Desktop build — confirm in Phase 1 spike; fallback = answer_now app-tool.
  • markdown-it bundle size in single-file (acceptable; pdf-viewer ships 4.4MB).
  • Two-way sidecar↔inline sync (combo #4) deferred past v1 — conflict policy needs care.