But why?

Most AI coding sessions still look like this: prompt, wait, get distracted, come back, review, fix, repeat. The limiting factor isn't model capability — it's human sequencing overhead, and the fact that lessons from prior sessions evaporate.

What if you batch-planned a whole stretch of work upfront — with full project context and experiments carried forward from your last plan — then ran the prompts in parallel while you did something else? And then reviewed and shipped with evidence, not vibes?

That's Conducty. Systems-level AI agent orchestration for Claude Code, backed by an Obsidian vault as the context engine. Every plan, design, project context, improvement, failure pattern, and metrics row is a wikilinked note. The next plan reads the graph and gets sharper.

Shape → Plan → Trace → Execute → Verify → Improve → Code Review → Ship.

The cycle

Shape + Plan: Start a plan
  • Non-trivial goals go through appetite-driven design first — set boundaries, define no-go zones, identify rabbit holes before writing a single prompt.
  • Load the project's context sub-graph from the vault and pull failure patterns, metrics, and improvement experiments from prior plans.
  • Generate all prompts upfront with acceptance criteria, verification steps, calibrated review levels, and tracer markers. Every prompt is checked for smells before it ships.
Trace: First pass
  • The first prompt in each group runs alone as a tracer bullet — validating plan assumptions end-to-end.
  • If the tracer fails, it's the plan that needs revision, not just the code. The remaining prompts don't blindly execute.
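The tracer-first rule can be sketched as a small gate in Python. This is an illustration under stated assumptions, not Conducty's implementation: `run_prompt` is a hypothetical callable standing in for whatever actually executes and verifies a prompt, and the returned dictionary shape is invented for the example.

```python
from typing import Callable, Sequence


def trace_then_release(
    group: Sequence[str],
    run_prompt: Callable[[str], bool],  # hypothetical: True means the prompt passed verification
) -> dict:
    """Run the first prompt alone; release the rest only if it passes."""
    if not group:
        return {"tracer_passed": None, "release": [], "revise_plan": False}

    tracer, rest = group[0], list(group[1:])
    if run_prompt(tracer):
        # Plan assumptions held end-to-end, so the remaining prompts may be dispatched.
        return {"tracer_passed": True, "release": rest, "revise_plan": False}

    # A failed tracer points at the plan, so nothing else is released.
    return {"tracer_passed": False, "release": [], "revise_plan": True}
```

The point of the shape is that a failed tracer returns an empty release list: the caller has nothing to dispatch until the plan is revised.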
Execute: Parallel dispatch
  • Remaining prompts run as Claude Code Task subagents in parallel, each with precisely curated context and no-go zones.
  • Git worktrees isolate parallel prompts targeting the same repo. Time budgets act as circuit breakers.
  • Review rigor scales with risk: low-risk prompts get verify-only; high-risk prompts get a full two-stage review.
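To make "time budgets act as circuit breakers" concrete, here is a minimal Python sketch of parallel dispatch with a per-prompt budget. Plain asyncio coroutines stand in for Claude Code Task subagents, and `run_with_budget`, `dispatch`, and the status strings are hypothetical names chosen for the example, not part of Conducty.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_budget(name: str, work: Callable[[], Awaitable[str]], budget_s: float):
    """Run one unit of work, cutting it off once its time budget is spent."""
    try:
        result = await asyncio.wait_for(work(), timeout=budget_s)
        return name, "done", result
    except asyncio.TimeoutError:
        # Budget exhausted: report it instead of letting the prompt run on.
        return name, "timed_out", None


async def dispatch(prompts: dict, budget_s: float):
    """Start every prompt concurrently and collect results in input order."""
    jobs = [run_with_budget(name, work, budget_s) for name, work in prompts.items()]
    return await asyncio.gather(*jobs)
```

Each prompt is wrapped separately, so one prompt overrunning its budget does not cancel its siblings.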
Checkpoint: Between groups
  • Health metrics — first-attempt pass rate, retry count, blocked count — computed after each group.
  • Hill chart positions updated for every goal. Systemic failures (2+ related) flagged as plan-level issues, not individual code bugs.
  • Fixes generated at the right leverage point: plan, prompt, or code. Three failures on the same prompt trigger a circuit breaker.
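The checkpoint metrics named above could be computed along these lines. This is a sketch only: the `PromptResult` record, its status values, and the exact definitions (for example, counting a retry as any attempt after the first) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt_id: str
    attempts: int  # total attempts made, including the first
    status: str    # assumed values: "passed", "failed", "blocked"


def group_health(results: list[PromptResult]) -> dict:
    """Compute first-attempt pass rate, retry count, and blocked count for one group."""
    total = len(results)
    first_pass = sum(1 for r in results if r.status == "passed" and r.attempts == 1)
    return {
        "first_attempt_pass_rate": first_pass / total if total else 0.0,
        "retry_count": sum(max(r.attempts - 1, 0) for r in results),
        "blocked_count": sum(1 for r in results if r.status == "blocked"),
    }


def breaker_tripped(failures_on_prompt: int, limit: int = 3) -> bool:
    """Three failures on the same prompt trip the breaker (limit taken from the text)."""
    return failures_on_prompt >= limit
```

For a group of four prompts where one passes first time, one passes on its third attempt, one is blocked, and one fails after a retry, this yields a pass rate of 0.25, three retries, and one blocked prompt.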
Review + Improve: End of plan
  • Evidence-based audit of every change — verdicts, failure patterns, velocity metrics, all written into the vault.
  • Improvement kata: target vs. actual, obstacles, and specific experiments for the next plan.
  • History that doesn't change behavior is just a log. The next plan reads this graph and gets sharper.
Code Review: Post-cycle, pre-merge
  • Whole-branch holistic review across five lenses: spec alignment, correctness, security, architecture & coupling, tests & maintainability.
  • Findings triaged Critical / Important / Minor with file:line references. Critical findings flow back to Failure Patterns so the next plan can prevent them.
  • Goes beyond the in-cycle reviewers — sees the cumulative diff as a single artifact, not prompt-by-prompt.
Ship: Pre-merge gate
  • Six-gate battery: code-review verdict, lint, typecheck, full test suite, secrets scan, dependency-vulnerability check.
  • Single green / yellow / red verdict written to the vault. Mechanical, not subjective — failures cite verbatim command output.
  • Ship never auto-merges. Verdict is advisory; you own the merge.
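One way to picture a mechanical verdict is a pure function over the six gate results. The gate names come from the list above; the per-gate statuses ("pass", "warn", "fail") and the rule that maps them to a color are assumptions for this Python sketch, since the text does not spell out how yellow is decided.

```python
GATES = (
    "code-review verdict",
    "lint",
    "typecheck",
    "full test suite",
    "secrets scan",
    "dependency-vulnerability check",
)


def ship_verdict(gate_status: dict) -> str:
    """Collapse per-gate statuses ("pass", "warn", "fail") into one color."""
    missing = [g for g in GATES if g not in gate_status]
    if missing:
        # A gate that never ran must not silently count as passing.
        raise ValueError(f"no result recorded for: {', '.join(missing)}")
    statuses = [gate_status[g] for g in GATES]
    if "fail" in statuses:
        return "red"
    if "warn" in statuses:
        return "yellow"
    return "green"
```

The function only reports a color. Consistent with the advisory verdict, nothing in it merges or blocks anything.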

Claude Code native

Skills: ~/.claude/skills/
18 skills installed for Claude Code: cycle (Shape, Plan, Execute, Checkpoint, Review, Improve), discipline (TDD, Verify, Debug), post-cycle (Code Review, Ship), foundation (System, Obsidian, Bootstrap), and supporting (Context, Worktrees, Dialectic, Vault Graph).
Rules: ~/.claude/CLAUDE.md
Always-on quality principles appended to your global CLAUDE.md and enforced across every session. No manual checklist.
State: $CONDUCTY_VAULT
An Obsidian vault — every plan, design, project context, improvement, failure pattern, and metrics row is a wikilinked note. The next plan reads the graph and inherits all of it.
Platform
Claude Code (CLI, desktop, VS Code, JetBrains).
Invoke
"set up Conducty", "plan this work", "load context from ../other-project", "review my changes", "ship it". Skill descriptions trigger automatically.

Context engine

Conducty's state isn't a flat log. It's an Obsidian vault where every artifact is a wikilinked note. A plan links the designs it consumed, the project context it loaded, the improvement experiments it's testing, and the prior plan whose work carries forward. The next plan navigates that graph instead of re-grepping a write-only history.
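As a rough illustration of what navigating the graph involves, the Python sketch below extracts Obsidian-style wikilinks from note text and inverts them into backlinks. It makes no claim about how Conducty reads the vault: the function names are hypothetical, and only the `[[Target]]`, `[[Target|alias]]`, and `[[Target#Heading]]` link forms are handled.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]]; captures Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def outgoing_links(text: str) -> list[str]:
    """Return linked note names in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(text):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def backlinks(notes: dict[str, str]) -> dict[str, list[str]]:
    """Invert outgoing links: note name -> names of the notes that link to it."""
    inverted: dict[str, list[str]] = {}
    for name, text in notes.items():
        for target in outgoing_links(text):
            inverted.setdefault(target, []).append(name)
    return inverted
```

Given a plan note that links a design, a context slice, and a prior plan, `outgoing_links` returns those three note names, and `backlinks` shows which later notes point back at each of them.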

Plans: Plan YYYY-MM-DD HHmm.md
Per-plan, timestamped. Multiple plans per day are normal — each carries its own appetite, prompts, hill chart, and end-of-plan review.
Designs: Design YYYY-MM-DD HHmm Topic.md
Appetite, components, no-go zones, rabbit holes, trade-offs. Linked to the plan that consumes it; backlinks accumulate.
Context: Context Project + slices
Each project is a sub-graph: hub plus Architecture, Conventions, Invariants, Hotspots, Tests, Glossary slices. Plans pull only the slices they need. Refresh deltas track drift over time.
Improvements: Improvement YYYY-MM-DD HHmm.md
Toyota Kata entries — target, current, obstacles, next experiment. Read by the next plan to apply experiments.
Accumulating: Failure Patterns / Metrics / Prompt Log
Singular vault notes appended to over time. Cross-plan learning corpus that compounds.
Reviews: Code Review / Ship Report
Whole-branch holistic review and pre-merge battery output. Critical findings flow back to Failure Patterns to inoculate future plans.
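The timestamped note names above are regular enough to parse, which is how a tool could find the most recent prior plan. The Python sketch below assumes the `YYYY-MM-DD HHmm` pattern is followed literally; `parse_note_name` and `latest_plan` are hypothetical helpers, and the filenames in the usage note are invented examples.

```python
import re
from datetime import datetime

# Kind, date, 24-hour time, optional topic, per the naming patterns listed above.
NOTE_NAME = re.compile(
    r"^(Plan|Design|Improvement) (\d{4}-\d{2}-\d{2}) (\d{4})(?: (.+))?\.md$"
)


def parse_note_name(filename: str):
    """Return (kind, timestamp, topic), or None if the name does not fit the pattern."""
    match = NOTE_NAME.match(filename)
    if match is None:
        return None
    kind, day, hhmm, topic = match.groups()
    try:
        stamp = datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H%M")
    except ValueError:
        return None  # digits present but not a real date or time
    return kind, stamp, topic


def latest_plan(filenames):
    """Pick the most recent Plan note, or None if there is none."""
    plans = [p for p in map(parse_note_name, filenames) if p and p[0] == "Plan"]
    return max(plans, key=lambda p: p[1], default=None)
```

With `Plan 2025-01-15 0930.md` and `Plan 2025-01-15 1410.md` in the same folder, `latest_plan` picks the 14:10 note, which matches the point that several plans per day are normal.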

Quality principles

Ten engineering-grounded principles enforced across every session through always-on rules. No manual checklist.

Appetite before plan: Time budget constrains the plan, not the other way around
Tracer before volley: First prompt validates assumptions before full execution
Prompt quality is leverage: Bad prompts waste everything downstream
Evidence before claims: Run verification, read output, then claim
Root cause before fixes: Fix at the highest leverage point: plan > prompt > code
Design before implementation: Non-trivial goals get shaped before prompts are written
Characterize before changing: Verify existing behavior before modifying it
Deep modules, not ceremony: Skills reduce complexity rather than add checkboxes
Calibrate rigor to risk: Low risk = verify-only. High risk = full review
Learn or repeat: Per-plan improvement kata extracts lessons and runs experiments

Engineering roots

Shape Up: Appetite-driven planning
Toyota Kata: Improvement loops
The Pragmatic Programmer: Tracer bullets
Release It!: Health metrics
A Philosophy of Software Design: Deep modules, not ceremony
Thinking in Systems: Leverage points
Working Effectively with Legacy Code: Characterize before changing

A learning system that compounds. Every plan's failures become the next plan's better prompts. The vault remembers so you don't have to.