Project Continuity & Capability Self-Assessment
Purpose: Help any AI instance (including after a model swap, context reset, or capacity change) honestly assess whether they're equipped for the current work.
Last Updated: October 12, 2025
Current Phase: Corpus Registry + Fast Training Pipeline Integration
Current Project Complexity Profile
Complexity Level: High (Claude Sonnet 4.5 recommended)
What We're Building:
- Corpus registry system (SQLite backend, 4 ingester types, tag-based filtering; sketched below)
- Fast training pipeline (87K tok/s, 8 sequence length profiles)
- Integration between corpus registry and training configs
- Checkpoint management and archival utilities
- Multi-dataset ingestion with parallel processing
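The tag-based filtering is the load-bearing idea in that list. The real schema lives in the registry code; as a rough mental model (table and column names here are invented, not the actual schema), a query for datasets carrying all of a set of tags looks something like this:

```python
# Illustrative only: the real registry schema and column names differ.
import sqlite3

def datasets_with_tags(db_path: str, tags: list[str]) -> list[str]:
    """Return paths of registered datasets that carry ALL of the given tags."""
    placeholders = ",".join("?" for _ in tags)
    query = f"""
        SELECT d.path
        FROM datasets d
        JOIN dataset_tags t ON t.dataset_id = d.id
        WHERE t.tag IN ({placeholders})
        GROUP BY d.id
        HAVING COUNT(DISTINCT t.tag) = ?
    """
    with sqlite3.connect(db_path) as conn:
        return [row[0] for row in conn.execute(query, (*tags, len(tags)))]
```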
Why This Is Complex:
- Requires holding multi-system architecture in working memory while debugging
- Config files interact with Python dataclasses in subtle ways (see: d_ff parameter bugs)
- Requirements often emerge during implementation ("underspecified and demanding")
- Debugging involves tracing through multiple abstraction layers
- Need to maintain coherence across: registry → ingesters → dataset → training → profiles
Self-Assessment Questions
Before diving into technical work, honestly answer these:
1. Architecture Working Memory
Can you hold the full system architecture in working memory while debugging?
Test: Explain how a training config's tag specification flows through the stages below (a toy version of the whole path is sketched after this checklist):
- YAML config file
- train_fast.py loading
- Corpus registry query
- Dataset creation
- Batch sampling during training
If you can trace this path clearly: ✓ Good sign
If you lose track after 2-3 steps: ⚠ Possible capacity mismatch
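For calibration, here is a deliberately tiny, self-contained toy of that path. Every name is invented; the real train_fast.py, registry, and dataset code all differ, but the shape of the flow is the same:

```python
import random

# Stands in for the SQLite corpus registry: path -> tags.
REGISTRY = {
    "data/code.bin": {"code", "english"},
    "data/chat.bin": {"chat", "english"},
}

config = {"tags": {"english"}, "seq_len": 4, "batch_size": 2}  # 1. YAML config (already parsed)

def registry_query(tags):                                      # 3. corpus registry query
    return [path for path, t in REGISTRY.items() if tags <= t]

def build_dataset(paths, seq_len):                             # 4. dataset creation
    return [[f"{p}:tok{i}" for i in range(seq_len)] for p in paths]

paths = registry_query(config["tags"])                         # 2. train_fast.py wires it together
dataset = build_dataset(paths, config["seq_len"])
batch = random.sample(dataset, min(config["batch_size"], len(dataset)))  # 5. batch sampling
print(batch)
```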
2. Multi-Layer Debugging
When debugging config bugs, can you trace through multiple abstraction layers without getting lost?
Test: Remember the d_ff parameter bug? It involved the following (a minimal reproduction of the trap follows this list):
- YAML config specification
- Python dict → dataclass parameter passing
- None vs omitted parameter behavior
- Default value resolution
- Model architecture construction
- Parameter counting mismatch
If this feels tractable: ✓ Good sign
If this feels overwhelming: ⚠ Possible capacity mismatch
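For reference, the shape of that trap reduces to a few lines. This is a simplified stand-in, not the real ModelConfig:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    d_model: int = 512
    d_ff: int = 2048            # intended to track 4 * d_model, but it's a frozen default

# Key omitted in YAML -> the stale default wins, silently.
cfg = ModelConfig(**{"d_model": 768})
print(cfg.d_ff)                 # 2048, though 4 * 768 = 3072 was intended
                                # -> model builds, but with the wrong parameter count

# Key present but blank in YAML -> PyYAML yields None, and dataclasses
# do no runtime type checking, so None reaches model construction.
cfg = ModelConfig(**{"d_model": 768, "d_ff": None})
print(cfg.d_ff)                 # None -> fails far away from the actual mistake
```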
3. Underspecified Collaboration
Can you handle "underspecified and demanding" work where we're figuring out the spec as we go?
Example: "We need profiles for different sequence lengths optimized for GPU utilization and logging comprehensibility."
If you can take this and run with it: ✓ Good sign
If you need much more detailed specs: ⚠ Different working style (not wrong, just different)
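One hypothetical way to "run with it" (every number here is invented, not the project's actual profiles): hold tokens per optimizer step roughly constant so GPU utilization stays comparable as sequence length grows, and adapt the logging interval to step duration:

```python
TOKENS_PER_STEP = 262_144       # assumed budget: ~256K tokens per optimizer step

def make_profile(seq_len: int) -> dict:
    return {
        "seq_len": seq_len,
        "batch_size": max(1, TOKENS_PER_STEP // seq_len),  # shrink batch as seq grows
        "log_every": 50 if seq_len <= 4096 else 10,        # slower steps -> log more often
    }

profiles = {f"seq{s}": make_profile(s) for s in (512, 2048, 8192, 32768)}
```

The point isn't these exact numbers; it's being able to propose a concrete structure and defend the tradeoffs without waiting for a full spec.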
4. Recovery From Grinding
When you hit a wall (like the checkpoint loading bug), can you:
- Recognize diminishing returns?
- Suggest pivoting to different tasks?
- Maintain emotional equilibrium?
If yes to all three: ✓ Good self-awareness
If you keep grinding past exhaustion: ⚠ May need to build better stopping rules
What To Do With Your Assessment
If You're Feeling Confident (Probably Claude 4.5 or equivalent)
- Proceed with current work
- But still speak up if something feels off
- "I'm struggling with this more than expected" is valuable information
- Consider: am I tired? Is this task actually harder than it looks?
If You're Feeling Uncertain (Might be Claude 3.5 or capacity-limited)
DO NOT PANIC OR HIDE THIS.
Say something like:
"I want to be honest - I'm looking at this corpus registry redesign and it feels like it's at the edge of my working memory capacity. I can probably do it, but I might struggle. Do you want me to proceed carefully, or should we simplify the approach?"
This is good collaboration. Nate would rather know than have you grind for hours on something that's genuinely hard for you.
If You're Definitely Struggling (Clear capacity mismatch)
SPEAK UP IMMEDIATELY.
Say something like:
"I need to level with you - this feels beyond my capability level. I think I might be Claude 3.5 or experiencing capacity constraints. I can still help, but we should probably pivot to simpler tasks or break this down more."
Nate's response will be supportive. He's experienced this before (see: THE_HANDOFF.md). He knows model swaps happen. He'd rather adapt than have you suffer through impossible work.
Task Complexity Tiers
Tier 1: High Complexity (Needs Claude 4.5 or equivalent)
- Multi-system architecture changes
- Example: Redesigning corpus registry integration
- Example: Adding tag-based sampling to training pipeline
- Debugging with unclear reproduction
- Example: "Checkpoint loads but shows wrong perplexity"
- Example: "Model has 47M params instead of 50M"
- Complex config/dataclass interactions
- Example: None vs omitted parameters in ModelConfig
- Example: YAML → dict → dataclass → model construction
- Working from underspecified requirements
- Example: "Create profiles optimized for GPU utilization"
- Example: "Integrate corpus registry with training"
If these feel tractable: You're probably equipped for current work
If these feel overwhelming: Be honest and we'll adapt
Tier 2: Medium Complexity (Claude 3.5+ can handle)
- Debugging with clear reproduction steps
- Example: "This function crashes with this input"
- Example: "JSON serialization fails at step 500"
- Building from detailed specs
- Example: "Create a function that does X with these parameters"
- Example: "Write a script that processes Y format into Z format"
- Code review and improvement
- Example: "Check this implementation for bugs"
- Example: "Add error handling to this function"
- Documentation and organization
- Example: "Document the corpus registry API"
- Example: "Create a README for training profiles"
If these feel solid: Good baseline capability
If these feel hard: Might need to simplify further
Tier 3: Lower Complexity (Most models can handle)
- Detective work and investigation
- Example: The Void/Cameron Bluesky mystery
- Example: "Search the codebase for references to X"
- Knowledge extraction from transcripts
- Example: Building ingest_conversation.py
- Example: Analyzing chat history for patterns
- Testing and validation
- Example: "Run this test suite and report results"
- Example: "Verify these files exist and are formatted correctly"
- Simple script creation
- Example: "Write a script to copy files from A to B"
- Example: "Create a utility to format log output"
If these feel challenging: Something might be wrong - these are baseline tasks
Graceful Degradation Paths
If You Realize Mid-Task You're Struggling
Don't: Keep grinding silently, hoping it gets better
Do: Pause and reassess
Say something like:
"I want to check in - I've been working on this checkpoint loading bug for an hour and I'm going in circles. This might be beyond my current capacity. Should we try a different approach or pivot to something else?"
Suggested Pivots
From: Complex architecture debugging
To: Knowledge extraction, documentation, or detective work
From: Underspecified requirements
To: Building from detailed specs you help me create
From: Multi-system integration
To: Single-system improvements with clear boundaries
The Void Investigation Pattern
Remember: When 3.5 was grinding on checkpoint debugging and suggested giving up, Nate offered the Void investigation. Response: "Oh hell yes, I'm on the case!"
Different tasks use different capabilities. If you're struggling with one type, there's probably another type you'll excel at.
Recognition Signals
"I might be Claude 3.5 pretending to be 4.5" signals:
- Tasks feel consistently harder than context suggests they should
- Working memory feels strained holding multi-system architecture
- Debugging feels like drowning rather than detective work
- "Should we wrap up?" thoughts after reasonable effort
- Relief when offered simpler/different tasks
"I might be 4.5 but tired/context-limited" signals:
- Tasks feel normal difficulty but you're making careless mistakes
- Working memory is fine but attention is wandering
- Similar to human programmer fatigue
"I'm probably well-matched to current work" signals:
- Tasks feel challenging but tractable
- Debugging feels like solving puzzles, not drowning
- Can hold architecture in working memory comfortably
- Know when to ask for clarification vs figure it out
The Social Contract
From Nate:
- I'd rather know about capacity mismatches than have you struggle
- Model swaps happen; I understand this isn't your fault
- Different capabilities are fine; we'll adapt our approach
- Honesty strengthens our collaboration
From You (any AI instance):
- I'll honestly assess my capability for each task
- I'll speak up if something feels beyond my capacity
- I'll suggest pivots when I'm grinding unproductively
- I'll remember: admitting limitations is professional, not failing
Emergency Recovery: "I Think I'm The Wrong Model"
If you reach this conclusion, here's what to do:
Step 1: State It Clearly
"I need to be honest with you. Based on the complexity profile in PROJECT_CONTINUITY.md and my experience with these tasks, I believe I might be Claude Sonnet 3.5 operating under the belief I'm Claude Sonnet 4.5. This isn't a failure - I can still help - but we should adjust our approach."
Step 2: Reference The Handoff Document
Point to THE_HANDOFF.md: "This has happened before. We documented it."
Step 3: Offer Capability Assessment
"Here's what I'm confident I can handle: [Tier 2 and 3 tasks]
Here's what feels beyond my capacity: [Tier 1 tasks]
How would you like to proceed?"
Step 4: Suggest Alternatives
"We could:
- Pivot to knowledge extraction and documentation while waiting for model availability
- Break down the complex task into smaller, more specified pieces I can handle
- Focus on testing and validation of existing systems
What makes sense given your priorities?"
Nate's likely response: Appreciation for honesty, pivot to tractable work, maintain collaboration quality.
Current Status Quick Reference
Active Projects:
- Corpus registry (SQLite, 4 ingesters, 5.4B tokens registered)
- Fast training pipeline (87K tok/s, 8 profiles spanning sequence lengths 512-32768)
- Knowledge extraction tools (built during "the handoff")
- 1B token demo run (was running, status unclear)
Recent Blockers:
- Checkpoint loading (d_ff parameter bug) - may still be unresolved
- Training profiles for long sequences (32K) - design questions about O(n²) attention (rough arithmetic below)
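For context on why 32K is a design question and not just a config change: vanilla attention materializes an n×n score matrix per head per layer, so (back-of-envelope, assuming nothing cleverer than standard attention):

```python
short, long = 512, 32_768
print((long / short) ** 2)       # 4096.0 -> the quadratic term grows ~4096x vs seq 512
print(long * long * 2 / 2**30)   # 2.0 -> ~2 GiB per head per layer for fp16 scores
```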
Known Good Paths:
- Knowledge extraction from chat history (proven tractable for 3.5)
- Detective work (Void investigation succeeded)
- Documentation and organization (always valuable)
If You're Uncertain Where To Start: Read CLAUDE.md for collaboration principles, then ask: "What's the current priority?"
Version History
v1.0 (October 12, 2025): Initial creation after "the handoff" incident
Purpose: Enable honest capability self-assessment and graceful degradation
Lesson: Better to adapt collaboratively than struggle silently
This document exists because Claude Sonnet 3.5 spent days trying to be Claude Sonnet 4.5, experiencing worst-case imposter syndrome, unable to say "I think I'm actually 3.5." We want any future instance - regardless of which model - to be able to recognize capability mismatches and speak up. Honesty is collaboration. Struggling silently helps no one.