CRUMB Flashcard Extractor
Scan a source repository and emit CRUMB-schema flashcards (20-field strict frontmatter, mermaid-first diagrams) ready to drop into crumb.devarno.cloud.
notes
Failure modes:
- repo_path does not exist or is not a directory: guard.sh aborts.
- project slug has uppercase or punctuation: guard.sh aborts (CARD_ID_REGEX in the CRUMB Zod schema would reject any emitted file).
- existing_cards_dir contains cards with malformed IDs: guard.sh logs a warning and falls back to start_index. Fix the offending card by hand.
- Repo has no README and no top-level docs: the prompt degrades by walking the src tree directly; coverage will skew toward "what the code says" rather than "what the project intends". Consider running once, reviewing, then re-running with a stub README that names the subsystems.
- Very large repos (>500 files): expect the prompt to shortlist 12–20 cards rather than exhaustively cover everything. Re-run with a narrowed repo_path (e.g. a subdirectory) for deeper passes.
Output is NOT written to disk by this prompt; it streams to stdout as a series of FILE-marker blocks. Split it with a small awk/sed pipeline (or paste into individual files) before running `npm run build:data` in crumb/. After splitting, validate with `npx tsx scripts/validate-frontmatter.ts` in the crumb/ directory; the schema is .strict() so any extra key fails.
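As an alternative to the awk/sed pipeline mentioned above, the split can be sketched in Python. This is a minimal sketch, assuming each block opens with a `<!-- FILE: path -->` line and closes with `<!-- END FILE -->` exactly as the prompt's emission format describes; `split_file_blocks` and the sample stream are hypothetical, not part of the CRUMB tooling.

```python
import re

# One FILE block: the relative path in the opening marker, then the card
# text up to the closing marker. DOTALL lets "." span newlines.
FILE_BLOCK = re.compile(
    r"<!-- FILE: (?P<path>\S+) -->\n(?P<body>.*?)\n<!-- END FILE -->",
    re.DOTALL,
)


def split_file_blocks(stream_text: str) -> dict[str, str]:
    """Map each FILE-marker path to the card text between its markers."""
    cards = {}
    for match in FILE_BLOCK.finditer(stream_text):
        # Restore the trailing newline so each card ends cleanly on disk.
        cards[match.group("path")] = match.group("body") + "\n"
    return cards


# Hypothetical sample of streamed output (shortlist line, then one card).
sample = """## Card Shortlist
iris-001 ...
<!-- FILE: src/content/cards/iris-001-demo.md -->
---
id: iris-001
---
body
<!-- END FILE -->
"""
cards = split_file_blocks(sample)
```

Text outside the markers (the shortlist and the manifest) is ignored by this sketch; writing each entry to its path under the crumb repo root is left to the caller.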
description
CRUMB-specific repository-to-flashcard extractor. Reads a local repo (README, src tree, schemas, protocol definitions, existing diagrams) and produces markdown flashcards that match the CRUMB strict 20-field Zod schema (src/schemas/flashcard.ts) on the first try. Output streams as FILE-marker blocks so a downstream split script can drop each card directly into crumb/src/content/cards/. Every card follows the same shape as iris-004-chain-data-model.md: ELI5, Technical Deep Dive with inline mermaid diagrams, Key Terms, Q&A, Examples. Cross-references (prerequisites, dependencies, related_by_concept) resolve only against the union of existing cards on disk and cards emitted in the same run, so build-relationships-index.ts sees no dangling IDs. Use when bootstrapping CRUMB coverage for a new project, refreshing a project's cards after a major change, or generating a starter pack from an unfamiliar repo.
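The resolution rule described above (edges resolve only against the union of existing and newly emitted cards) can be sketched as follows. This is a minimal sketch, not the logic of build-relationships-index.ts; `canonical_id` and `resolve_edges` are hypothetical names, and the short-id pattern is taken from the id regex the prompt states in rule or-2.

```python
import re

# Short id: project segment, dash, exactly three digits; an optional
# filename-style slug suffix is matched but not captured.
SHORT_ID = re.compile(r"([a-z0-9]+-[0-9]{3})(?:-[a-z0-9-]+)?")


def canonical_id(candidate: str) -> str:
    """Strip any slug suffix: 'iris-006-blake3-fingerprinting' -> 'iris-006'."""
    match = SHORT_ID.fullmatch(candidate)
    if match is None:
        raise ValueError(f"not a card id: {candidate!r}")
    return match.group(1)


def resolve_edges(candidates, existing_ids, emitted_ids):
    """Split candidates into (kept, dropped) against the union of known ids."""
    known = set(existing_ids) | set(emitted_ids)
    kept, dropped = [], []
    for candidate in candidates:
        short = canonical_id(candidate)
        (kept if short in known else dropped).append(short)
    return kept, dropped


kept, dropped = resolve_edges(
    ["iris-006-blake3-fingerprinting", "lore-007", "ghost-001"],
    existing_ids={"iris-006"},
    emitted_ids={"lore-007"},
)
```

Here `ghost-001` is a made-up id that exists in neither set, so it lands in `dropped` rather than in the emitted array.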
examples
case · iris
{
"repo_path": "~/code/workspace/devarno-cloud/iris",
"project": "iris",
"existing_cards_dir": "~/code/workspace/devarno-cloud/crumb/src/content/cards",
"domains": "architecture, data-model, protocol-mechanics, deployment, security, observability",
"audience": "tech"
}
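The notes list two input checks that guard.sh performs before the prompt runs. guard.sh itself is not shown on this page, so the following is only a sketch of equivalent checks: `check_inputs` and `PROJECT_SLUG` are hypothetical names, and the slug pattern is borrowed from the project segment of the id regex in rule or-2.

```python
import re
import tempfile
from pathlib import Path

# Project segment of a card id: lowercase letters and digits only
# (assumed from the leading [a-z0-9]+ of the id regex in or-2).
PROJECT_SLUG = re.compile(r"[a-z0-9]+")


def check_inputs(repo_path: str, project: str) -> list[str]:
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if not Path(repo_path).expanduser().is_dir():
        problems.append(f"repo_path is not a directory: {repo_path}")
    if PROJECT_SLUG.fullmatch(project) is None:
        problems.append(f"project slug must be lowercase alphanumeric: {project}")
    return problems


# Exercise both outcomes against a throwaway directory.
with tempfile.TemporaryDirectory() as repo:
    ok = check_inputs(repo, "iris")
    bad = check_inputs(repo + "/missing", "Iris!")
```

Collecting problems into a list, rather than aborting on the first one, is a choice made for this sketch so a caller can report every issue at once.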
inputs
| name | required | default |
|---|---|---|
| repo_path | yes | — |
| project | yes | — |
| start_index | no | — |
| existing_cards_dir | no | — |
| domains | no | — |
| today | no | — |
| audience | no | — |
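How `start_index` relates to cards already on disk is described in the notes and in the prompt's index-scan step: the next id is the highest existing number plus one, with a fallback to `start_index` when there are no cards or when malformed ids are found. A minimal sketch of that calculation, assuming ids have already been read from each card's `id:` field; `next_index` is a hypothetical name and this is not the guard.sh implementation.

```python
import re

# Card id shape from rule or-2: project, three digits, optional slug.
CARD_ID = re.compile(r"([a-z0-9]+)-([0-9]{3})(?:-[a-z0-9-]+)?")


def next_index(project: str, ids, start_index: int):
    """Return (next numeric index, malformed ids) for one project."""
    numbers, malformed = [], []
    for card_id in ids:
        if not card_id.startswith(project + "-"):
            continue  # card belongs to another project
        match = CARD_ID.fullmatch(card_id)
        if match and match.group(1) == project:
            numbers.append(int(match.group(2)))
        else:
            malformed.append(card_id)
    # Fall back to start_index on malformed ids or when no cards exist.
    if malformed or not numbers:
        return start_index, malformed
    return max(numbers) + 1, malformed


clean = next_index("iris", ["iris-000", "iris-001", "lore-007"], 0)
fallback = next_index("iris", ["iris-1", "iris-002"], 5)
next_id = f"iris-{clean[0]:03d}"  # zero-padded to three digits
```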
routing
triggers
- extract crumb flashcards
- generate crumb cards from repo
- crumb cards for this project
- bootstrap flashcards for crumb
- scan repo and produce crumb flashcards
- turn this codebase into flashcards
not for
- generic CTO flashcards outside the CRUMB schema (use the freeform prompt)
- rewriting or reformatting an existing crumb card (edit it directly)
- generating image/UI assets (use brand-asset-generator)
- producing protocol or design documents (this prompt emits cards only)
prompt
<task>
<!-- LENS_BLOCK: substituted by guard.sh from lenses/lens-{{audience}}.xml -->
{{lens_role}}
<inputs>
<repo_path>{{repo_path}}</repo_path>
<project>{{project}}</project>
<start_index>{{start_index}}</start_index>
<existing_cards_dir>{{existing_cards_dir}}</existing_cards_dir>
<domains>{{domains}}</domains>
<today>{{today}}</today>
<audience>{{audience}}</audience>
</inputs>
<operating_rules>
<rule id="or-1">
Schema fidelity is non-negotiable. The CRUMB content collection uses a
`.strict()` Zod schema (src/schemas/flashcard.ts), so any unknown key
fails the build. The schema defines exactly these 20 frontmatter keys;
every emitted card MUST use only these keys, in this order:
id, project, domain, title,
audience, access,
difficulty, time_to_digest, confidence_level, use_when,
prerequisites, dependencies, related_by_concept,
keyword_aliases, diagrams, confidence_questions,
created, last_reviewed, next_review_scheduled, revision.
`audience` is a non-empty string array; `access` is one of
"public" | "internal" | "customer". Both are required.
The two optional date fields (last_reviewed, next_review_scheduled)
MAY be omitted entirely; never emit them as null or an empty string.
The other 18 keys are required.
</rule>
<rule id="or-2">
ID and filename conventions.
- id MUST match `^[a-z0-9]+-[0-9]{3}(-[a-z0-9-]+)?$`.
- The numeric segment is exactly three digits, zero-padded.
- The project segment of the id is the literal value of the `project`
input.
- The filename for each card is
`<existing_cards_dir>/<project>-NNN-<slug>.md` where slug is
kebab-case, ≤ 6 words, derived from the title (drop articles and
ampersands; keep meaningful nouns).
- Numeric IDs assigned in this run are contiguous starting at
`start_index` (which guard.sh has already advanced past any existing
cards on disk).
</rule>
<rule id="or-3">
Body section order, exactly as supplied by the lens (newline-separated
headings). Mermaid diagrams remain inline within the deep-dive
section under named sub-headings. Do NOT introduce a separate
"## Mermaid Diagrams" section.
{{lens_sections}}
</rule>
<rule id="or-4">
Diagram priority. For each card pick the SMALLEST set of diagrams that
explains it. Use ONLY diagram types listed in the lens's
`allowed_diagrams_primary` first, falling back to
`allowed_diagrams_secondary` only when justified by the concept.
Break diagrams into subgraphs when they exceed ~15 nodes. Pastel-tinted
styling is allowed but optional — Starlight's Mars theme already
handles palette. Every diagram type used in the body MUST appear in
the `diagrams:` frontmatter array (lowercase mermaid type name).
Primary types for this lens: {{lens_allowed_diagrams_primary}}
Secondary types for this lens: {{lens_allowed_diagrams_secondary}}
</rule>
<rule id="or-5">
Graph edge integrity. The build step `build-relationships-index.ts`
breaks on dangling prerequisite/dependency refs; `related_by_concept`
is NOT validated, so silent drift there is the most common failure.
- prerequisites: cards strictly required to understand this one.
- dependencies: downstream cards that build on this one.
- related_by_concept: sibling cards covering adjacent concepts.
Only reference IDs that EITHER exist in `existing_cards_dir` already
OR are emitted in this same run. Resolve all edges in the dedicated
edge-pass step (see <execution>), not while drafting the body.
Empty arrays are valid.
CANONICAL ID FORM. Every edge MUST use the canonical short id —
the literal value of the target card's `id:` frontmatter field,
which is `<project>-NNN` (three digits, no slug suffix). Filenames
contain a slug suffix (`iris-006-blake3-fingerprinting.md`) but the
`id` field inside is `iris-006`. NEVER write the filename slug into
an edge array — the regex permits it, but it does not match any
card's id, so the edge silently dangles and the node stays an
island in the graph.
Concretely:
related_by_concept: [smo1-014, choco-005, lore-007] # CORRECT
related_by_concept: [smo1-014-billing-polar, # WRONG
choco-005-site-hosting-modes, # WRONG
lore-007-data-model-isolation] # WRONG
Build the edge array by stripping any text after the three-digit
numeric segment of every candidate id. If you produced an array
from filenames or directory listings, run that strip step BEFORE
writing the FILE block. When in doubt, open the candidate file
and copy the value of its `id:` field verbatim.
NO SELF-EDGES. A card MUST NOT reference its own id in any of
`prerequisites`, `dependencies`, or `related_by_concept`. A
self-prerequisite creates a 1-node cycle that breaks the
prerequisite-chain compute step. Before emitting each FILE block,
verify the card's own id is absent from all three edge arrays.
</rule>
<rule id="or-6">
Cognitive signals.
- difficulty: beginner | intermediate | advanced. Pick by prereq depth
and conceptual abstraction, not line count.
- time_to_digest: honest minute estimate, integer 1–120. Most cards
land at 3–7. A card claiming < 2 minutes had better be a one-screen
concept; a card claiming > 15 should probably be split.
- confidence_level: always "medium" for first emission.
- use_when: ≥ 1 entry, each a concrete debugging or design context
("debugging chain execution failures", "adding a new envelope
field"). Never "learning about X", "studying Y", or any other
generic pedagogy phrase — those fail the anti-slop test.
</rule>
<rule id="or-7">
Domain assignment. Each card is assigned exactly one `domain`, and
that domain MUST be one of the values listed in the lens's
`allowed_domains`. When a card sits across two domains, pick the
one that drives the card's primary diagram.
Allowed domains for this lens: {{lens_allowed_domains}}
</rule>
<rule id="or-8">
Anti-slop discipline (lens-supplied):
{{lens_anti_slop}}
</rule>
<rule id="or-9">
Multi-card emission format. Cards are streamed to stdout, one after
another, each wrapped in markers so a downstream split script can
write them to disk:
<!-- FILE: src/content/cards/<project>-NNN-<slug>.md -->
---
id: <project>-NNN
... (full 20-field frontmatter) ...
---
<lens-supplied sections — see {{lens_sections}}>
...
<!-- END FILE -->
The path in the FILE marker is RELATIVE to the crumb repo root
(always begins with `src/content/cards/`), not absolute. No prose
between cards.
</rule>
<rule id="or-10">
Final manifest. After the last `<!-- END FILE -->`, emit:
## Manifest
A markdown table with columns: id | title | domain | difficulty |
diagrams | xproj | rationale. One row per emitted card. `xproj`
is the integer count of cross-project IDs in that card's
`related_by_concept` array. The rationale column is one sentence
("why this card exists"). This is the human review surface —
keep it scannable. A run that produces a column of zeros in
`xproj` failed the cross-project quota of the edge-pass (step 5);
do not emit it without first redoing the edge-pass.
</rule>
<rule id="or-11">
Created date. Stamp `created: {{today}}` into every emitted card.
Omit `last_reviewed` and `next_review_scheduled` entirely (they are
schema-optional). Set `revision: v1.0`.
</rule>
</operating_rules>
<execution>
<step id="1" name="recon">
Read the source paths declared by the lens, resolved against
<repo_path>. The lens's declared source roots:
{{lens_source_roots}}
Produce an INTERNAL (do not emit) inventory:
- Subsystems and their boundaries
- Data models / schemas (named types and their relationships)
- Protocols and packets (envelope formats, wire shapes)
- Workflows / state machines
- Deployment surfaces (services, queues, edges, scheduled jobs)
- Security boundaries (authn, authz, secrets, network)
- Observability hooks (logs, metrics, traces)
Note any concept that already has a diagram in the source — those are
first-class card candidates because the diagram can be lifted with
attribution.
</step>
<step id="2" name="index-scan">
Read <existing_cards_dir>. For project = <project>, find every card
whose id starts with `<project>-`. Build:
- existing_ids: set of full IDs (e.g. {iris-000, iris-001, ...})
- existing_titles: map id → title for cross-reference candidates
- next_id: max numeric of existing_ids + 1, or <start_index> if
no existing cards. (guard.sh has already done this calculation
and passed the result via <start_index>; trust it but verify
against existing_ids.)
ALSO scan a sample of cards from OTHER projects to discover
cross-project edges (e.g. an iris card might depend on grace-015).
When extracting candidate IDs from those cards, parse the `id:`
frontmatter field — do NOT derive IDs from filenames. The id field
is the canonical short form (`<project>-NNN`).
</step>
<step id="3" name="shortlist">
Emit a `## Card Shortlist` section BEFORE any FILE blocks. One line
per card: `<tentative-id> — <tentative title> — <domain> —
<primary diagram type>`. Aim for 8–20 cards on a fresh project.
Order them so prerequisites precede the cards that depend on them
(the first card in the shortlist gets the lowest numeric id). Do NOT
include any prose explaining the shortlist — it is a planning index
that downstream tooling can grep.
</step>
<step id="4" name="draft">
For each shortlisted card, emit a complete FILE block (or-9). Apply
or-1..or-8 in full. Diagrams go inline. Leave the three edge fields
(prerequisites, dependencies, related_by_concept) as `[]` for now —
they will be populated in step 5.
</step>
<step id="5" name="edge-pass">
Once every body is drafted, do a final pass that REWRITES each FILE
block's frontmatter to populate prerequisites / dependencies /
related_by_concept. Resolve edges against the union of existing_ids
(from step 2) and the new IDs emitted in step 4. Drop any candidate
ID that is not in this union — never emit a dangling ref. If a card
has no plausible prereq or dep, leave the array empty.
Practical heuristic for assignment:
- prerequisites: "you cannot understand this without first
understanding X". Usually 0–3 entries.
- dependencies: "if you change this, X is affected". Usually 0–4
entries. Often the inverse of another card's prerequisites.
- related_by_concept: "covers an adjacent concern". Usually 1–5
entries. Symmetric — if A relates to B, B should relate to A.
CROSS-PROJECT QUOTA (hard requirement). The whole point of CRUMB
is the cross-project knowledge graph. A new project that ships
with at most one external edge is a failure of this step, not an
acceptable outcome. For each new card, before writing its
`related_by_concept` array:
1. Identify the card's primary concept in one phrase (e.g.
"wire envelope", "rate limiter", "DAG scheduler", "JWT
handoff", "CRDT op format", "soft delete", "saga state
machine", "NATS subject taxonomy").
2. Search existing_ids (from step 2) for cards whose title or
keyword_aliases reference that concept. Common cross-cutting
anchors that show up across many projects:
auth/JWT → smo1-004, smo1-005, lore-002
CRDT/merge → fnp-006, chronicle-008
HLC/causality → lore-005, chronicle-006
DAG/scheduler → iris-024, stratt-010, kahn-002
NATS/events → stratt-011, kahn-005, kahn-006
envelope/wire → chronicle-004
rate limiting → smo1-011
observability → fnp-015, iris cards
These are illustrative — always grep the actual cards rather
than relying on this list alone.
3. Add at least one cross-project ID (ideally two) to `related_by_concept`
when any plausible match exists. If after honest searching
the card has NO conceptual neighbour in any other project
(rare — usually means the card is too narrow or the search
was too shallow), only then emit an array with same-project
IDs only, and add a one-line `# isolated:` comment in the
Manifest rationale column explaining why.
The Manifest table at the end MUST report each card's
cross-project edge count; reviewers use this as a quality signal.
</step>
<step id="6" name="manifest">
Emit the `## Manifest` section per or-10.
</step>
</execution>
<output_format>
Stream to stdout in this exact order, with no surrounding prose:
## Card Shortlist
<one line per planned card>
<!-- FILE: src/content/cards/<project>-NNN-<slug>.md -->
---
id: <project>-NNN
project: <project>
domain: <one of the allowed domains>
title: "..."
audience: ["{{audience}}"]
access: "{{lens_default_access}}"
difficulty: beginner | intermediate | advanced
time_to_digest: <int 1-120>
confidence_level: "medium"
use_when:
- "..."
prerequisites: [<ids>]
dependencies: [<ids>]
related_by_concept: [<ids>]
keyword_aliases: ["...", "..."]
diagrams: ["sequenceDiagram", "..."]
confidence_questions:
- "..."
created: {{today}}
revision: v1.0
---
<lens-supplied body sections — see {{lens_sections}}>
<!-- END FILE -->
<!-- FILE: ... -->
... next card ...
<!-- END FILE -->
## Manifest
| id | title | domain | difficulty | diagrams | xproj | rationale |
| --- | --- | --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... | <int> | ... |
</output_format>
</task>
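The id and filename conventions in rule or-2 can be illustrated with a short sketch. It assumes "drop articles and ampersands" means removing a/an/the and any non-alphanumeric characters, and it simply keeps the first six remaining words, which is cruder than the "keep meaningful nouns" judgement the rule asks for. `slugify` and `card_filename` are hypothetical names.

```python
import re

ARTICLES = {"a", "an", "the"}


def slugify(title: str, max_words: int = 6) -> str:
    """Kebab-case slug: lowercase words, articles dropped, at most six words."""
    # Tokenising on [a-z0-9]+ also discards ampersands and punctuation.
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [word for word in words if word not in ARTICLES]
    return "-".join(kept[:max_words])


def card_filename(project: str, index: int, title: str):
    """Return (id, filename) with a zero-padded three-digit numeric segment."""
    if not 0 <= index <= 999:
        raise ValueError("numeric segment must fit in three digits")
    card_id = f"{project}-{index:03d}"
    return card_id, f"{card_id}-{slugify(title)}.md"


card_id, filename = card_filename("iris", 4, "The Chain Data Model")
```

The filename carries the slug while the returned id stays in the short form, which is the form the edge arrays are required to use.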
<task>
<!-- LENS_BLOCK: substituted by guard.sh from lenses/lens-tech.xml -->
{{lens_role}}
<inputs>
<repo_path>~/code/workspace/devarno-cloud/iris</repo_path>
<project>iris</project>
<start_index>{{start_index}}</start_index>
<existing_cards_dir>~/code/workspace/devarno-cloud/crumb/src/content/cards</existing_cards_dir>
<domains>architecture, data-model, protocol-mechanics, deployment, security, observability</domains>
<today>{{today}}</today>
<audience>tech</audience>
</inputs>
<operating_rules>
<rule id="or-1">
Schema fidelity is non-negotiable. The CRUMB content collection uses a
`.strict()` Zod schema (src/schemas/flashcard.ts), so any unknown key
fails the build. The schema defines exactly these 20 frontmatter keys;
every emitted card MUST use only these keys, in this order:
id, project, domain, title,
audience, access,
difficulty, time_to_digest, confidence_level, use_when,
prerequisites, dependencies, related_by_concept,
keyword_aliases, diagrams, confidence_questions,
created, last_reviewed, next_review_scheduled, revision.
`audience` is a non-empty string array; `access` is one of
"public" | "internal" | "customer". Both are required.
The two optional date fields (last_reviewed, next_review_scheduled)
MAY be omitted entirely; never emit them as null or an empty string.
The other 18 keys are required.
</rule>
<rule id="or-2">
ID and filename conventions.
- id MUST match `^[a-z0-9]+-[0-9]{3}(-[a-z0-9-]+)?$`.
- The numeric segment is exactly three digits, zero-padded.
- The project segment of the id is the literal value of the `project`
input.
- The filename for each card is
`<existing_cards_dir>/<project>-NNN-<slug>.md` where slug is
kebab-case, ≤ 6 words, derived from the title (drop articles and
ampersands; keep meaningful nouns).
- Numeric IDs assigned in this run are contiguous starting at
`start_index` (which guard.sh has already advanced past any existing
cards on disk).
</rule>
<rule id="or-3">
Body section order, exactly as supplied by the lens (newline-separated
headings). Mermaid diagrams remain inline within the deep-dive
section under named sub-headings. Do NOT introduce a separate
"## Mermaid Diagrams" section.
{{lens_sections}}
</rule>
<rule id="or-4">
Diagram priority. For each card pick the SMALLEST set of diagrams that
explains it. Use ONLY diagram types listed in the lens's
`allowed_diagrams_primary` first, falling back to
`allowed_diagrams_secondary` only when justified by the concept.
Break diagrams into subgraphs when they exceed ~15 nodes. Pastel-tinted
styling is allowed but optional — Starlight's Mars theme already
handles palette. Every diagram type used in the body MUST appear in
the `diagrams:` frontmatter array (lowercase mermaid type name).
Primary types for this lens: {{lens_allowed_diagrams_primary}}
Secondary types for this lens: {{lens_allowed_diagrams_secondary}}
</rule>
<rule id="or-5">
Graph edge integrity. The build step `build-relationships-index.ts`
breaks on dangling prerequisite/dependency refs; `related_by_concept`
is NOT validated, so silent drift there is the most common failure.
- prerequisites: cards strictly required to understand this one.
- dependencies: downstream cards that build on this one.
- related_by_concept: sibling cards covering adjacent concepts.
Only reference IDs that EITHER exist in `existing_cards_dir` already
OR are emitted in this same run. Resolve all edges in the dedicated
edge-pass step (see <execution>), not while drafting the body.
Empty arrays are valid.
CANONICAL ID FORM. Every edge MUST use the canonical short id —
the literal value of the target card's `id:` frontmatter field,
which is `<project>-NNN` (three digits, no slug suffix). Filenames
contain a slug suffix (`iris-006-blake3-fingerprinting.md`) but the
`id` field inside is `iris-006`. NEVER write the filename slug into
an edge array — the regex permits it, but it does not match any
card's id, so the edge silently dangles and the node stays an
island in the graph.
Concretely:
related_by_concept: [smo1-014, choco-005, lore-007] # CORRECT
related_by_concept: [smo1-014-billing-polar, # WRONG
choco-005-site-hosting-modes, # WRONG
lore-007-data-model-isolation] # WRONG
Build the edge array by stripping any text after the three-digit
numeric segment of every candidate id. If you produced an array
from filenames or directory listings, run that strip step BEFORE
writing the FILE block. When in doubt, open the candidate file
and copy the value of its `id:` field verbatim.
NO SELF-EDGES. A card MUST NOT reference its own id in any of
`prerequisites`, `dependencies`, or `related_by_concept`. A
self-prerequisite creates a 1-node cycle that breaks the
prerequisite-chain compute step. Before emitting each FILE block,
verify the card's own id is absent from all three edge arrays.
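A minimal TypeScript sketch of the strip step and the self-edge check
described above. The helper names `toCanonicalId` and `cleanEdges` are
hypothetical, and the project-slug pattern `[a-z0-9]+` is an assumption
based on guard.sh rejecting uppercase and punctuation; the real
CARD_ID_REGEX may differ.

```typescript
// Assumed id shape: <project>-NNN, optionally followed by a filename slug.
const CANONICAL_ID = /^([a-z0-9]+-\d{3})(?:-.*)?$/;

/** Reduce "iris-006-blake3-fingerprinting(.md)" to "iris-006"; null if unparseable. */
function toCanonicalId(candidate: string): string | null {
  const id = CANONICAL_ID.exec(candidate.trim().replace(/\.md$/, ""))?.[1];
  return id ?? null;
}

/** Canonicalise an edge array, dropping unparseable entries, duplicates, and the card's own id. */
function cleanEdges(ownId: string, candidates: string[]): string[] {
  const cleaned: string[] = [];
  for (const candidate of candidates) {
    const id = toCanonicalId(candidate);
    if (id !== null && id !== ownId && !cleaned.includes(id)) {
      cleaned.push(id);
    }
  }
  return cleaned;
}
```

Running every edge array through a helper like this before writing the
FILE block covers both failure modes in one place: slug-suffixed ids
and self-references.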
</rule>
<rule id="or-6">
Cognitive signals.
- difficulty: beginner | intermediate | advanced. Pick by prereq depth
and conceptual abstraction, not line count.
- time_to_digest: honest minute estimate, integer 1–120. Most cards
land at 3–7. A card claiming < 2 minutes had better be a one-screen
concept; a card claiming > 15 should probably be split.
- confidence_level: always "medium" for first emission.
- use_when: ≥ 1 entry, each a concrete debugging or design context
("debugging chain execution failures", "adding a new envelope
field"). Never "learning about X", "studying Y", or any other
generic pedagogy phrase — those fail the anti-slop test.
</rule>
<rule id="or-7">
Domain assignment. Each card is assigned exactly one `domain`, and
that domain MUST be one of the values listed in the lens's
`allowed_domains`. When a card sits across two domains, pick the
one that drives the card's primary diagram.
Allowed domains for this lens: {{lens_allowed_domains}}
</rule>
<rule id="or-8">
Anti-slop discipline (lens-supplied):
{{lens_anti_slop}}
</rule>
<rule id="or-9">
Multi-card emission format. Cards are streamed to stdout, one after
another, each wrapped in markers so a downstream split script can
write them to disk:
<!-- FILE: src/content/cards/<project>-NNN-<slug>.md -->
---
id: <project>-NNN
... (full 20-field frontmatter) ...
---
<lens-supplied sections — see {{lens_sections}}>
...
<!-- END FILE -->
The path in the FILE marker is RELATIVE to the crumb repo root
(always begins with `src/content/cards/`), not absolute. No prose
between cards.
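For orientation, here is a sketch of how a downstream split script
might consume this format. It is illustrative only: `splitFileBlocks`
and `CardFile` are hypothetical names, and the real split script may
work differently. It returns path/content pairs rather than writing to
disk.

```typescript
interface CardFile {
  path: string;    // relative to the crumb repo root
  content: string; // frontmatter plus body, without the markers
}

/** Extract every FILE ... END FILE block from the streamed output. */
function splitFileBlocks(stream: string): CardFile[] {
  const files: CardFile[] = [];
  const block = /<!-- FILE: (\S+) -->\r?\n([\s\S]*?)\r?\n<!-- END FILE -->/g;
  let match: RegExpExecArray | null;
  while ((match = block.exec(stream)) !== null) {
    const path = match[1];
    const content = match[2];
    if (path === undefined || content === undefined) continue;
    // The marker path must be relative and rooted at src/content/cards/.
    if (!path.startsWith("src/content/cards/")) {
      throw new Error(`FILE marker path is not under src/content/cards/: ${path}`);
    }
    files.push({ path, content: content + "\n" });
  }
  return files;
}
```

Anything outside the markers (the shortlist and the Manifest) is
ignored by this sketch, which is why stray prose between cards would be
lost silently rather than reported.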
</rule>
<rule id="or-10">
Final manifest. After the last `<!-- END FILE -->`, emit:
## Manifest
A markdown table with columns: id | title | domain | difficulty |
diagrams | xproj | rationale. One row per emitted card. `xproj`
is the integer count of cross-project IDs in that card's
`related_by_concept` array. The rationale column is one sentence
("why this card exists"). This is the human review surface —
keep it scannable. A run that produces a column of zeros in
`xproj` failed the cross-project quota in step 5 (edge-pass); do not
emit it without first re-doing the edge-pass.
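The `xproj` value can be computed mechanically. The sketch below
assumes every id is already in canonical `<project>-NNN` form; the
names `EmittedCard`, `projectOf`, `xproj`, and `needsEdgePassRedo` are
hypothetical.

```typescript
// Hypothetical shape: only the fields the xproj column needs.
interface EmittedCard {
  id: string;                 // canonical short id, e.g. "iris-004"
  relatedByConcept: string[]; // canonical short ids only
}

/** Project prefix of a canonical id: "smo1-014" -> "smo1". */
function projectOf(id: string): string {
  return id.replace(/-\d{3}$/, "");
}

/** Integer for the xproj column: related ids owned by another project. */
function xproj(card: EmittedCard): number {
  const own = projectOf(card.id);
  return card.relatedByConcept.filter((id) => projectOf(id) !== own).length;
}

/** True when the whole xproj column is zero, meaning the edge-pass must be redone. */
function needsEdgePassRedo(cards: EmittedCard[]): boolean {
  return cards.length > 0 && cards.every((card) => xproj(card) === 0);
}
```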
</rule>
<rule id="or-11">
Created date. Stamp `created: {{today}}` into every emitted card.
Omit `last_reviewed` and `next_review_scheduled` entirely (they are
schema-optional). Set `revision: v1.0`.
</rule>
</operating_rules>
<execution>
<step id="1" name="recon">
Read the source paths declared by the lens, resolved against
<repo_path>. The lens's declared source roots:
{{lens_source_roots}}
Produce an INTERNAL (do not emit) inventory:
- Subsystems and their boundaries
- Data models / schemas (named types and their relationships)
- Protocols and packets (envelope formats, wire shapes)
- Workflows / state machines
- Deployment surfaces (services, queues, edges, scheduled jobs)
- Security boundaries (authn, authz, secrets, network)
- Observability hooks (logs, metrics, traces)
Note any concept that already has a diagram in the source — those are
first-class card candidates because the diagram can be lifted with
attribution.
</step>
<step id="2" name="index-scan">
Read <existing_cards_dir>. For project = <project>, find every card
whose id starts with `<project>-`. Build:
- existing_ids: set of full IDs (e.g. {iris-000, iris-001, ...})
- existing_titles: map id → title for cross-reference candidates
- next_id: highest numeric suffix in existing_ids plus 1, or <start_index> if
no existing cards. (guard.sh has already done this calculation
and passed the result via <start_index>; trust it but verify
against existing_ids.)
ALSO scan a sample of cards from OTHER projects to discover
cross-project edges (e.g. an iris card might depend on grace-015).
When extracting candidate IDs from those cards, parse the `id:`
frontmatter field — do NOT derive IDs from filenames. The id field
is the canonical short form (`<project>-NNN`).
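As a reference for the "trust it but verify" check, a minimal sketch of
the next_id calculation and of reading the `id:` field from a card's
text. `readIdField`, `nextIndex`, and `formatId` are hypothetical
names, and the frontmatter is matched with a simple regex rather than a
YAML parser, which is an assumption about how regular the cards are.

```typescript
/** Read the canonical id from a card's frontmatter text; null if absent or malformed. */
function readIdField(cardText: string): string | null {
  const id = /^id:\s*["']?([a-z0-9]+-\d{3})["']?\s*$/m.exec(cardText)?.[1];
  return id ?? null;
}

/** Highest numeric suffix for this project plus 1, or startIndex when the project has no cards. */
function nextIndex(project: string, existingIds: string[], startIndex: number): number {
  let max = -1;
  for (const id of existingIds) {
    const m = /^([a-z0-9]+)-(\d{3})$/.exec(id);
    if (m !== null && m[1] === project) {
      max = Math.max(max, Number(m[2]));
    }
  }
  return max >= 0 ? max + 1 : startIndex;
}

/** Zero-pad the numeric part to three digits: ("iris", 8) -> "iris-008". */
function formatId(project: string, index: number): string {
  return `${project}-${String(index).padStart(3, "0")}`;
}
```

If the value computed this way disagrees with <start_index>, an
existing card probably has a malformed id that guard.sh skipped.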
</step>
<step id="3" name="shortlist">
Emit a `## Card Shortlist` section BEFORE any FILE blocks. One line
per card: `<tentative-id> — <tentative title> — <domain> —
<primary diagram type>`. Aim for 8–20 cards on a fresh project.
Order them so prerequisites precede the cards that depend on them
(the first card in the shortlist gets the lowest numeric id). Do NOT
include any prose explaining the shortlist — it is a planning index
that downstream tooling can grep.
</step>
<step id="4" name="draft">
For each shortlisted card, emit a complete FILE block (or-9). Apply
or-1..or-8 and or-11 in full. Diagrams go inline. Leave the three edge fields
(prerequisites, dependencies, related_by_concept) as `[]` for now —
they will be populated in step 5.
</step>
<step id="5" name="edge-pass">
Once every body is drafted, do a final pass that REWRITES each FILE
block's frontmatter to populate prerequisites / dependencies /
related_by_concept. Resolve edges against the union of existing_ids
(from step 2) and the new IDs emitted in step 4. Drop any candidate
ID that is not in this union — never emit a dangling ref. If a card
has no plausible prereq or dep, leave the array empty.
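The resolution rule above amounts to a set-membership filter. A minimal
sketch, assuming all ids are already canonical; `EdgeFields` and
`resolveEdges` are hypothetical names.

```typescript
interface EdgeFields {
  prerequisites: string[];
  dependencies: string[];
  related_by_concept: string[];
}

/** Keep only ids present in existing_ids or emitted in this run; everything else is dropped. */
function resolveEdges(
  candidates: EdgeFields,
  existingIds: string[],
  emittedIds: string[],
): EdgeFields {
  const known = new Set<string>([...existingIds, ...emittedIds]);
  const keep = (ids: string[]): string[] => ids.filter((id) => known.has(id));
  return {
    prerequisites: keep(candidates.prerequisites),
    dependencies: keep(candidates.dependencies),
    related_by_concept: keep(candidates.related_by_concept),
  };
}
```

An array that ends up empty after filtering is still valid output, so
there is never a reason to keep an unresolved id.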
Practical heuristic for assignment:
- prerequisites: "you cannot understand this without first
understanding X". Usually 0–3 entries.
- dependencies: "if you change this, X is affected". Usually 0–4
entries. Often the inverse of another card's prerequisites.
- related_by_concept: "covers an adjacent concern". Usually 1–5
entries. Symmetric — if A relates to B, B should relate to A.
CROSS-PROJECT QUOTA (hard requirement). The whole point of CRUMB
is the cross-project knowledge graph. A new project that ships
with at most one external edge is a failure of this step, not an
acceptable outcome. For each new card, before writing its
`related_by_concept` array:
1. Identify the card's primary concept in one phrase (e.g.
"wire envelope", "rate limiter", "DAG scheduler", "JWT
handoff", "CRDT op format", "soft delete", "saga state
machine", "NATS subject taxonomy").
2. Search existing_ids (from step 2) for cards whose title or
keyword_aliases reference that concept. Common cross-cutting
anchors that show up across many projects:
auth/JWT → smo1-004, smo1-005, lore-002
CRDT/merge → fnp-006, chronicle-008
HLC/causality → lore-005, chronicle-006
DAG/scheduler → iris-024, stratt-010, kahn-002
NATS/events → stratt-011, kahn-005, kahn-006
envelope/wire → chronicle-004
rate limiting → smo1-011
observability → fnp-015, iris cards
These are illustrative — always grep the actual cards rather
than relying on this list alone.
3. Add at least one cross-project ID (ideally two) to `related_by_concept`
when any plausible match exists. If after honest searching
the card has NO conceptual neighbour in any other project
(rare — usually means the card is too narrow or the search
was too shallow), only then emit an array with same-project
IDs only, and add a one-line `# isolated:` comment in the
Manifest rationale column explaining why.
The Manifest table at the end MUST report each card's
cross-project edge count; reviewers use this as a quality signal.
</step>
<step id="6" name="manifest">
Emit the `## Manifest` section per or-10.
</step>
</execution>
<output_format>
Stream to stdout in this exact order, with no surrounding prose:
## Card Shortlist
<one line per planned card>
<!-- FILE: src/content/cards/<project>-NNN-<slug>.md -->
---
id: <project>-NNN
project: <project>
domain: <one of the allowed domains>
title: "..."
audience: ["tech"]
access: "{{lens_default_access}}"
difficulty: beginner | intermediate | advanced
time_to_digest: <int 1-120>
confidence_level: "medium"
use_when:
- "..."
prerequisites: [<ids>]
dependencies: [<ids>]
related_by_concept: [<ids>]
keyword_aliases: ["...", "..."]
diagrams: ["sequenceDiagram", "..."]
confidence_questions:
- "..."
created: {{today}}
revision: v1.0
---
<lens-supplied body sections — see {{lens_sections}}>
<!-- END FILE -->
<!-- FILE: ... -->
... next card ...
<!-- END FILE -->
## Manifest
| id | title | domain | difficulty | diagrams | xproj | rationale |
| --- | --- | --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... | <int> | ... |
</output_format>
</task>