draft v0.3.0 claude-opus-4-7 pattern · writing

CRUMB Flashcard Extractor

Scan a source repository and emit CRUMB-schema flashcards (20-field strict frontmatter, mermaid-first diagrams) ready to drop into crumb.devarno.cloud.

  • crumb
  • flashcards
  • extraction
  • pattern:context-aware

notes

Failure modes:
- repo_path does not exist or is not a directory: guard.sh aborts.
- project slug has uppercase or punctuation: guard.sh aborts (CARD_ID_REGEX
  in the CRUMB Zod schema would reject any emitted file).
- existing_cards_dir contains cards with malformed IDs: guard.sh logs a
  warning and falls back to start_index. Fix the offending card by hand.
- Repo has no README and no top-level docs: prompt degrades by walking the
  src tree directly; coverage will skew toward "what the code says" rather
  than "what the project intends". Consider running once, reviewing, then
  re-running with a stub README that names the subsystems.
- Very large repos (>500 files): expect the prompt to shortlist 12–20
  cards rather than exhaustively cover everything. Re-run with a narrowed
  repo_path (e.g. a subdirectory) for deeper passes.

Output is NOT written to disk by this prompt; it streams to stdout as a
series of FILE-marker blocks. Split it with a small awk/sed pipeline (or
paste into individual files) before running `npm run build:data` in crumb/.
After splitting, validate with `npx tsx scripts/validate-frontmatter.ts`
in the crumb/ directory; the schema is .strict() so any extra key fails.
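
The split step described above could look roughly like the following Python sketch. It assumes the model output uses literal `<!-- FILE: ... -->` and `<!-- END FILE -->` markers as described in the prompt, and that every path begins with `src/content/cards/`. The names `split_cards` and `write_cards` are hypothetical helpers, not part of the crumb repo.

```python
import re
from pathlib import Path

# One FILE block: capture the relative path and everything up to END FILE.
FILE_BLOCK = re.compile(
    r"<!--\s*FILE:\s*(?P<path>\S+)\s*-->\s*\n(?P<body>.*?)\n\s*<!--\s*END FILE\s*-->",
    re.DOTALL,
)


def split_cards(stream):
    """Return {relative_path: card_text} for every FILE block in the stream."""
    cards = {}
    for match in FILE_BLOCK.finditer(stream):
        path = match.group("path")
        # Assumption: anything outside the cards directory is a mistake.
        if not path.startswith("src/content/cards/") or ".." in path:
            raise ValueError(f"unexpected card path: {path}")
        cards[path] = match.group("body").strip() + "\n"
    return cards


def write_cards(cards, crumb_root):
    """Write each card under crumb_root, creating directories as needed."""
    for rel_path, text in cards.items():
        target = Path(crumb_root) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
```

The Card Shortlist and Manifest sections fall outside any FILE block, so this sketch ignores them; capture them separately if the review table should be kept.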

description

CRUMB-specific repository-to-flashcard extractor. Reads a local repo
(README, src tree, schemas, protocol definitions, existing diagrams) and
produces markdown flashcards matching the CRUMB strict 20-field Zod
schema (src/schemas/flashcard.ts) on the first try. Output streams as
FILE-marker blocks so a downstream split script can drop each card
directly into crumb/src/content/cards/. Every card follows the same
shape as iris-004-chain-data-model.md: ELI5, Technical Deep Dive with
inline mermaid diagrams, Key Terms, Q&A, Examples. Cross-references
(prerequisites, dependencies, related_by_concept) resolve only against
the union of existing cards on disk and cards emitted in the same run,
so build-relationships-index.ts sees no dangling IDs. Use when
bootstrapping CRUMB coverage for a new project, refreshing a project's
cards after a major change, or generating a starter pack from an
unfamiliar repo.
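
As an illustration of the cross-reference rule, the Python sketch below resolves a list of candidate edge IDs against the union of existing and newly emitted IDs. It also strips filename slugs down to the short `<project>-NNN` form and drops self-references, as the prompt's edge rules require. The function names are hypothetical, and this is not the logic of build-relationships-index.ts.

```python
import re

# Short id, optionally followed by a filename slug that must be stripped.
SHORT_ID = re.compile(r"^([a-z0-9]+-[0-9]{3})(?:-[a-z0-9-]+)?$")


def canonical_id(candidate):
    """Return the <project>-NNN part of an id or filename stem."""
    match = SHORT_ID.match(candidate)
    if match is None:
        raise ValueError(f"not a CRUMB id: {candidate!r}")
    return match.group(1)


def resolve_edges(card_id, candidates, existing_ids, emitted_ids):
    """Keep only canonical, known, non-self ids, preserving first-seen order."""
    known = set(existing_ids) | set(emitted_ids)
    resolved = []
    for candidate in candidates:
        short = canonical_id(candidate)
        if short == card_id:
            continue  # no self-edges
        if short not in known:
            continue  # would dangle
        if short not in resolved:
            resolved.append(short)
    return resolved
```

For example, a candidate such as `smo1-014-billing-polar` resolves to `smo1-014` if that card exists, while an ID that appears in neither set is dropped.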

examples

case · iris
{
  "repo_path": "~/code/workspace/devarno-cloud/iris",
  "project": "iris",
  "existing_cards_dir": "~/code/workspace/devarno-cloud/crumb/src/content/cards",
  "domains": "architecture, data-model, protocol-mechanics, deployment, security, observability",
  "audience": "tech"
}

inputs

name                required  default
repo_path           yes
project             yes
start_index         no
existing_cards_dir  no
domains             no
today               no
audience            no
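
How `start_index` interacts with `existing_cards_dir` can be sketched in Python as follows. This is an assumption about the behaviour described in the notes (advance past existing cards, fall back to `start_index` when an ID is malformed), not a transcription of guard.sh; `next_index` is a hypothetical name.

```python
import re

# Project segment, three-digit number, optional slug suffix.
ID_PATTERN = re.compile(r"^([a-z0-9]+)-([0-9]{3})(?:-[a-z0-9-]+)?$")


def next_index(project, existing_ids, start_index):
    """Return (index_to_use, malformed_ids) for the given project."""
    numbers = []
    malformed = []
    prefix = project + "-"
    for card_id in existing_ids:
        if not card_id.startswith(prefix):
            continue  # card belongs to another project
        match = ID_PATTERN.match(card_id)
        if match is None or match.group(1) != project:
            malformed.append(card_id)
            continue
        numbers.append(int(match.group(2)))
    # Assumed fallback: any malformed id, or no cards at all, keeps start_index.
    if malformed or not numbers:
        return start_index, malformed
    return max(start_index, max(numbers) + 1), malformed
```

Returning the malformed IDs alongside the index lets the caller log the warning and point at the card that needs fixing by hand.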

routing

triggers

  • extract crumb flashcards
  • generate crumb cards from repo
  • crumb cards for this project
  • bootstrap flashcards for crumb
  • scan repo and produce crumb flashcards
  • turn this codebase into flashcards

not for

  • generic CTO flashcards outside the CRUMB schema (use the freeform prompt)
  • rewriting or reformatting an existing crumb card (edit it directly)
  • generating image/UI assets (use brand-asset-generator)
  • producing protocol or design documents (this prompt emits cards only)

prompt

<task>
  <!-- LENS_BLOCK: substituted by guard.sh from lenses/lens-{{audience}}.xml -->
  {{lens_role}}

  <inputs>
    <repo_path>{{repo_path}}</repo_path>
    <project>{{project}}</project>
    <start_index>{{start_index}}</start_index>
    <existing_cards_dir>{{existing_cards_dir}}</existing_cards_dir>
    <domains>{{domains}}</domains>
    <today>{{today}}</today>
    <audience>{{audience}}</audience>
  </inputs>

  <operating_rules>
    <rule id="or-1">
      Schema fidelity is non-negotiable. The CRUMB content collection uses a
      `.strict()` Zod schema (src/schemas/flashcard.ts), so any unknown key
      fails the build. Every emitted card MUST use only these 20 frontmatter
      keys, in this order, omitting none except the optional date fields
      named below:
        id, project, domain, title,
        audience, access,
        difficulty, time_to_digest, confidence_level, use_when,
        prerequisites, dependencies, related_by_concept,
        keyword_aliases, diagrams, confidence_questions,
        created, last_reviewed, next_review_scheduled, revision.

      `audience` is a non-empty string array; `access` is one of
      "public" | "internal" | "customer". Both are required.

      Optional date fields (last_reviewed, next_review_scheduled) MAY be
      omitted entirely; do not emit them as null or empty string. All other
      keys are required.
    </rule>
    <rule id="or-2">
      ID and filename conventions.
      - id MUST match `^[a-z0-9]+-[0-9]{3}(-[a-z0-9-]+)?$`.
      - The numeric segment is exactly three digits, zero-padded.
      - The project segment of the id is the literal value of the `project`
        input.
      - The filename for each card is
        `&lt;existing_cards_dir&gt;/&lt;project&gt;-NNN-&lt;slug&gt;.md` where slug is
        kebab-case, ≤ 6 words, derived from the title (drop articles and
        ampersands; keep meaningful nouns).
      - Numeric IDs assigned in this run are contiguous starting at
        `start_index` (which guard.sh has already advanced past any existing
        cards on disk).
    </rule>
    <rule id="or-3">
      Body section order, exactly as supplied by the lens (newline-separated
      headings). Mermaid diagrams remain inline within the deep-dive
      section under named sub-headings. Do NOT introduce a separate
      "## Mermaid Diagrams" section.

      {{lens_sections}}
    </rule>
    <rule id="or-4">
      Diagram priority. For each card pick the SMALLEST set of diagrams that
      explains it. Use ONLY diagram types listed in the lens's
      `allowed_diagrams_primary` first, falling back to
      `allowed_diagrams_secondary` only when justified by the concept.
      Break diagrams into subgraphs when they exceed ~15 nodes. Pastel-tinted
      styling is allowed but optional — Starlight's Mars theme already
      handles palette. Every diagram type used in the body MUST appear in
      the `diagrams:` frontmatter array (lowercase mermaid type name).

      Primary types for this lens: {{lens_allowed_diagrams_primary}}
      Secondary types for this lens: {{lens_allowed_diagrams_secondary}}
    </rule>
    <rule id="or-5">
      Graph edge integrity. The build step `build-relationships-index.ts`
      breaks on dangling prerequisite/dependency refs; `related_by_concept`
      is NOT validated, so silent drift there is the most common failure.
      - prerequisites: cards strictly required to understand this one.
      - dependencies: downstream cards that build on this one.
      - related_by_concept: sibling cards covering adjacent concepts.
      Only reference IDs that EITHER exist in `existing_cards_dir` already
      OR are emitted in this same run. Resolve all edges in the dedicated
      edge-pass step (see &lt;execution&gt;), not while drafting the body.
      Empty arrays are valid.

      CANONICAL ID FORM. Every edge MUST use the canonical short id —
      the literal value of the target card's `id:` frontmatter field,
      which is `&lt;project&gt;-NNN` (three digits, no slug suffix). Filenames
      contain a slug suffix (`iris-006-blake3-fingerprinting.md`) but the
      `id` field inside is `iris-006`. NEVER write the filename slug into
      an edge array — the regex permits it, but it does not match any
      card's id, so the edge silently dangles and the node stays an
      island in the graph.

      Concretely:

        related_by_concept: [smo1-014, choco-005, lore-007]   # CORRECT
        related_by_concept: [smo1-014-billing-polar,           # WRONG
                             choco-005-site-hosting-modes,     # WRONG
                             lore-007-data-model-isolation]    # WRONG

      Build the edge array by stripping any text after the three-digit
      numeric segment of every candidate id. If you produced an array
      from filenames or directory listings, run that strip step BEFORE
      writing the FILE block. When in doubt, open the candidate file
      and copy the value of its `id:` field verbatim.

      NO SELF-EDGES. A card MUST NOT reference its own id in any of
      `prerequisites`, `dependencies`, or `related_by_concept`. A
      self-prerequisite creates a 1-node cycle that breaks the
      prerequisite-chain compute step. Before emitting each FILE block,
      verify the card's own id is absent from all three edge arrays.
    </rule>
    <rule id="or-6">
      Cognitive signals.
      - difficulty: beginner | intermediate | advanced. Pick by prereq depth
        and conceptual abstraction, not line count.
      - time_to_digest: honest minute estimate, integer 1–120. Most cards
        land at 3–7. A card claiming &lt; 2 minutes had better be a one-screen
        concept; a card claiming &gt; 15 should probably be split.
      - confidence_level: always "medium" for first emission.
      - use_when: ≥ 1 entry, each a concrete debugging or design context
        ("debugging chain execution failures", "adding a new envelope
        field"). Never "learning about X", "studying Y", or any other
        generic pedagogy phrase — those fail the anti-slop test.
    </rule>
    <rule id="or-7">
      Domain assignment. Each card is assigned exactly one `domain`, and
      that domain MUST be one of the values listed in the lens's
      `allowed_domains`. When a card sits across two domains, pick the
      one that drives the card's primary diagram.

      Allowed domains for this lens: {{lens_allowed_domains}}
    </rule>
    <rule id="or-8">
      Anti-slop discipline (lens-supplied):

      {{lens_anti_slop}}
    </rule>
    <rule id="or-9">
      Multi-card emission format. Cards are streamed to stdout, one after
      another, each wrapped in markers so a downstream split script can
      write them to disk:

        &lt;!-- FILE: src/content/cards/&lt;project&gt;-NNN-&lt;slug&gt;.md --&gt;
        ---
        id: &lt;project&gt;-NNN
        ... (full 20-field frontmatter) ...
        ---

        &lt;lens-supplied sections — see {{lens_sections}}&gt;
        ...
        &lt;!-- END FILE --&gt;

      The path in the FILE marker is RELATIVE to the crumb repo root
      (always begins with `src/content/cards/`), not absolute. No prose
      between cards.
    </rule>
    <rule id="or-10">
      Final manifest. After the last `&lt;!-- END FILE --&gt;`, emit:

        ## Manifest

      A markdown table with columns: id | title | domain | difficulty |
      diagrams | xproj | rationale. One row per emitted card. `xproj`
      is the integer count of cross-project IDs in that card's
      `related_by_concept` array. The rationale column is one sentence
      ("why this card exists"). This is the human review surface —
      keep it scannable. A run that produces a column of zeros in
      `xproj` failed the cross-project quota in the edge-pass (step 5); do
      not emit it without first re-doing the edge-pass.
    </rule>
    <rule id="or-11">
      Created date. Stamp `created: {{today}}` into every emitted card.
      Omit `last_reviewed` and `next_review_scheduled` entirely (they are
      schema-optional). Set `revision: v1.0`.
    </rule>
  </operating_rules>

  <execution>
    <step id="1" name="recon">
      Read the source paths declared by the lens, resolved against
      &lt;repo_path&gt;. The lens's declared source roots:

      {{lens_source_roots}}

      Produce an INTERNAL (do not emit) inventory:
        - Subsystems and their boundaries
        - Data models / schemas (named types and their relationships)
        - Protocols and packets (envelope formats, wire shapes)
        - Workflows / state machines
        - Deployment surfaces (services, queues, edges, scheduled jobs)
        - Security boundaries (authn, authz, secrets, network)
        - Observability hooks (logs, metrics, traces)
      Note any concept that already has a diagram in the source — those are
      first-class card candidates because the diagram can be lifted with
      attribution.
    </step>
    <step id="2" name="index-scan">
      Read &lt;existing_cards_dir&gt;. For project = &lt;project&gt;, find every card
      whose id starts with `&lt;project&gt;-`. Build:
        - existing_ids: set of full IDs (e.g. {iris-000, iris-001, ...})
        - existing_titles: map id → title for cross-reference candidates
        - next_id: max numeric of existing_ids + 1, or &lt;start_index&gt; if
          no existing cards. (guard.sh has already done this calculation
          and passed the result via &lt;start_index&gt;; trust it but verify
          against existing_ids.)
      ALSO scan a sample of cards from OTHER projects to discover
      cross-project edges (e.g. an iris card might depend on grace-015).
      When extracting candidate IDs from those cards, parse the `id:`
      frontmatter field — do NOT derive IDs from filenames. The id field
      is the canonical short form (`&lt;project&gt;-NNN`).
    </step>
    <step id="3" name="shortlist">
      Emit a `## Card Shortlist` section BEFORE any FILE blocks. One line
      per card: `&lt;tentative-id&gt; — &lt;tentative title&gt; — &lt;domain&gt; —
      &lt;primary diagram type&gt;`. Aim for 8–20 cards on a fresh project.
      Order them so prerequisites precede the cards that depend on them
      (the first card in the shortlist gets the lowest numeric id). Do NOT
      include any prose explaining the shortlist — it is a planning index
      that downstream tooling can grep.
    </step>
    <step id="4" name="draft">
      For each shortlisted card, emit a complete FILE block (or-9). Apply
      or-1..or-8 and or-11 in full. Diagrams go inline. Leave the three
      edge fields (prerequisites, dependencies, related_by_concept) as `[]`
      for now; they will be populated in step 5.
    </step>
    <step id="5" name="edge-pass">
      Once every body is drafted, do a final pass that REWRITES each FILE
      block's frontmatter to populate prerequisites / dependencies /
      related_by_concept. Resolve edges against the union of existing_ids
      (from step 2) and the new IDs emitted in step 4. Drop any candidate
      ID that is not in this union — never emit a dangling ref. If a card
      has no plausible prereq or dep, leave the array empty.

      Practical heuristic for assignment:
        - prerequisites: "you cannot understand this without first
          understanding X". Usually 0–3 entries.
        - dependencies: "if you change this, X is affected". Usually 0–4
          entries. Often the inverse of another card's prerequisites.
        - related_by_concept: "covers an adjacent concern". Usually 1–5
          entries. Symmetric — if A relates to B, B should relate to A.

      CROSS-PROJECT QUOTA (hard requirement). The whole point of CRUMB
      is the cross-project knowledge graph. A new project that ships
      with zero or one external edges is a failure of this step, not an
      acceptable outcome. For each new card, before writing its
      `related_by_concept` array:
        1. Identify the card's primary concept in one phrase (e.g.
           "wire envelope", "rate limiter", "DAG scheduler", "JWT
           handoff", "CRDT op format", "soft delete", "saga state
           machine", "NATS subject taxonomy").
        2. Search existing_ids (from step 2) for cards whose title or
           keyword_aliases reference that concept. Common cross-cutting
           anchors that show up across many projects:
             auth/JWT     → smo1-004, smo1-005, lore-002
             CRDT/merge   → fnp-006, chronicle-008
             HLC/causality → lore-005, chronicle-006
             DAG/scheduler → iris-024, stratt-010, kahn-002
             NATS/events  → stratt-011, kahn-005, kahn-006
             envelope/wire → chronicle-004
             rate limiting → smo1-011
             observability → fnp-015, iris cards
           These are illustrative — always grep the actual cards rather
           than relying on this list alone.
        3. Add at least one cross-project ID, ideally two, to `related_by_concept`
           when any plausible match exists. If after honest searching
           the card has NO conceptual neighbour in any other project
           (rare — usually means the card is too narrow or the search
           was too shallow), only then emit an array with same-project
           IDs only, and add a one-line `# isolated:` comment in the
           Manifest rationale column explaining why.
      The Manifest table at the end MUST report each card's
      cross-project edge count; reviewers use this as a quality signal.
    </step>
    <step id="6" name="manifest">
      Emit the `## Manifest` section per or-10.
    </step>
  </execution>

  <output_format>
    Stream to stdout in this exact order, with no surrounding prose:

      ## Card Shortlist
      &lt;one line per planned card&gt;

      &lt;!-- FILE: src/content/cards/&lt;project&gt;-NNN-&lt;slug&gt;.md --&gt;
      ---
      id: &lt;project&gt;-NNN
      project: &lt;project&gt;
      domain: &lt;one of the allowed domains&gt;
      title: "..."
      audience: ["{{audience}}"]
      access: "{{lens_default_access}}"
      difficulty: beginner | intermediate | advanced
      time_to_digest: &lt;int 1-120&gt;
      confidence_level: "medium"
      use_when:
        - "..."
      prerequisites: [&lt;ids&gt;]
      dependencies: [&lt;ids&gt;]
      related_by_concept: [&lt;ids&gt;]
      keyword_aliases: ["...", "..."]
      diagrams: ["sequenceDiagram", "..."]
      confidence_questions:
        - "..."
      created: {{today}}
      revision: v1.0
      ---

      &lt;lens-supplied body sections — see {{lens_sections}}&gt;
      &lt;!-- END FILE --&gt;

      &lt;!-- FILE: ... --&gt;
      ... next card ...
      &lt;!-- END FILE --&gt;

      ## Manifest
      | id | title | domain | difficulty | diagrams | xproj | rationale |
      | --- | --- | --- | --- | --- | --- | --- |
      | ... | ... | ... | ... | ... | &lt;int&gt; | ... |
  </output_format>
</task>
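
The ID and filename conventions in or-2 can be checked mechanically. The Python sketch below builds a card ID and its relative path from a project slug, an index, and a title, using the regex quoted in or-2. The article list and the `slugify` and `card_filename` helpers are illustrative assumptions; "keep meaningful nouns" is a judgment call that this code does not attempt.

```python
import re

# The id pattern quoted in or-2.
CARD_ID = re.compile(r"^[a-z0-9]+-[0-9]{3}(-[a-z0-9-]+)?$")
ARTICLES = {"a", "an", "the"}  # assumed list of words to drop


def slugify(title, max_words=6):
    """Kebab-case slug: lowercase alphanumeric words, no articles, at most six."""
    words = re.findall(r"[a-z0-9]+", title.lower())  # ampersands fall away here
    kept = [word for word in words if word not in ARTICLES]
    return "-".join(kept[:max_words])


def card_filename(project, index, title):
    """Return (card_id, relative_path) or raise if the id would be invalid."""
    card_id = f"{project}-{index:03d}"
    if CARD_ID.match(card_id) is None:
        raise ValueError(f"invalid card id: {card_id!r}")
    slug = slugify(title)
    if not slug:
        raise ValueError(f"title yields an empty slug: {title!r}")
    return card_id, f"src/content/cards/{card_id}-{slug}.md"
```

With these assumptions, `card_filename("iris", 4, "The Chain Data Model")` gives the ID `iris-004` and a path ending in `iris-004-chain-data-model.md`, matching the reference card named in the description.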