Author: Robert Allen
Version: 2.0

Abstract

Large Language Model (LLM) agents operating in software development environments suffer from a fundamental architectural limitation: context window boundaries enforce session isolation, causing accumulated knowledge to be lost when sessions terminate or contexts compact. This paper presents git-notes-memory-manager, a novel architecture that addresses this limitation by leveraging Git’s native notes mechanism as a distributed, version-controlled memory store. The system implements progressive hydration across three detail levels (SUMMARY, FULL, FILES) to optimize token consumption, and employs hook-based capture with confidence-scored signal detection to automate memory extraction with minimal cognitive overhead.

We ground our architecture in established cognitive science frameworks, drawing from Baddeley’s multicomponent working memory model (Baddeley & Hitch, 1974; Baddeley, 2000) to structure memory prioritization, and from signal detection theory (Green & Swets, 1966) to formalize capture decisions. The system applies Shneiderman’s “overview first, details on demand” progressive disclosure principle to manage token budgets while preserving access to complete context when needed.

Production validation demonstrates sub-10ms context generation, 116+ indexed memories across 10 semantic namespaces, and automatic capture of 5+ memories per session via hook-based detection. The architecture achieves zero-infrastructure deployment by storing memories alongside code in Git, enabling team-wide knowledge sharing through standard git push/pull operations.

Keywords: LLM agents, persistent memory, semantic search, Git notes, progressive hydration, signal detection, working memory, context management


1. Introduction

1.1 The Memory Problem in LLM Agents

Large Language Model agents operating in development environments face a fundamental limitation that distinguishes them from human collaborators: context window constraints force session isolation. When a developer and LLM agent together make an architectural decision in one session, that knowledge exists only within the conversation history. Upon session termination or context compaction, the decision vanishes unless explicitly recorded elsewhere.

This limitation has significant practical consequences. A recent survey on LLM agent memory mechanisms observes that “unlike humans who dynamically integrate new information, LLMs effectively ‘reset’ once information falls outside their context window” (arXiv:2404.13501). Even as models push context length boundaries (GPT-4 Turbo at 128K tokens, Claude 3.7 at 200K, Gemini 1.5 reportedly tested at up to 10M in research settings), these improvements merely delay rather than solve the fundamental limitation.

The research question motivating this work is:

How can LLM agents maintain persistent, semantically-searchable memory across sessions while integrating naturally with existing developer workflows and requiring no additional infrastructure?

1.2 Design Requirements

Analysis of developer workflows and the constraints of LLM agent operation revealed five core requirements that a memory system must satisfy:

  1. Persistence: Memories must survive session boundaries and context compaction events
  2. Distribution: Memory should synchronize with code using existing infrastructure (no separate databases or cloud services)
  3. Semantic Retrieval: Natural language queries must locate relevant memories without requiring exact-match keywords
  4. Progressive Detail: The system must load only as much context as needed, preserving tokens for active work
  5. Automatic Capture: The system must reduce cognitive load by detecting memorable content automatically rather than relying on manual capture

1.3 Contribution

This paper presents a complete implementation addressing all five requirements. The key contributions are:

  1. Git-native memory storage using refs/notes/mem/{namespace} references, enabling distributed synchronization through standard git operations
  2. Progressive hydration implementing three detail levels (SUMMARY, FULL, FILES) that reduce token consumption by 10-50x while preserving access to complete context
  3. Hook-based automatic capture leveraging IDE extension points with confidence-scored signal detection based on signal detection theory
  4. Token-budgeted context injection that adapts to project complexity using cognitive load principles

Production validation demonstrates:

  • 116 memories indexed across 10 semantic namespaces
  • Sub-10ms context generation at session start
  • Automatic capture of 5+ memories per session via hook-based detection
  • Cross-session recall of decisions, learnings, and blockers

2. Theoretical Foundations

The architecture draws from three established theoretical frameworks: cognitive psychology’s multicomponent working memory model, human-computer interaction’s progressive disclosure principle, and signal detection theory from psychophysics.

2.1 The Multicomponent Working Memory Model

Baddeley and Hitch (1974) proposed a multicomponent model of working memory that replaced the earlier unitary short-term memory concept. The model posits a central executive controlling limited attentional capacity, coordinating two subsidiary systems: the phonological loop for verbal information and the visuospatial sketchpad for spatial information. Baddeley (2000) later added the episodic buffer, a limited-capacity system that binds information from subsidiary systems and long-term memory into unified episodic representations.

This cognitive architecture maps directly to LLM agent memory requirements:

| Cognitive Component | System Mapping | Implementation |
|---|---|---|
| Central Executive | Context window management | Token budget allocation |
| Episodic Buffer | Working memory section | Active blockers, recent decisions |
| Long-term Memory | Semantic memory store | Git notes + vector index |
| Binding Process | Progressive hydration | SUMMARY to FULL expansion |

The episodic buffer’s role is particularly relevant: Baddeley describes it as “a limited capacity system that provides temporary storage of information held in a multimodal code, which is capable of binding information from the subsidiary systems, and from long-term memory, into a unitary episodic representation” (Baddeley, 2000). In our system, the SessionStart context injection performs analogous binding: retrieving relevant memories from the persistent store (long-term memory) and formatting them for inclusion in the active context (working memory).

The system allocates token budgets reflecting this structure:

  • Working Memory (50-70%): Active blockers, pending decisions, recent progress
  • Semantic Context (20-35%): Relevant learnings, related patterns retrieved via vector similarity
  • Guidance (10%): Behavioral instructions for memory capture
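
For illustration, this split can be expressed as a small allocation function. This is a minimal sketch rather than the project's implementation: BudgetAllocation and allocate_budget are hypothetical names, and the 60/30/10 ratios are simply midpoints of the ranges above.

from dataclasses import dataclass

@dataclass
class BudgetAllocation:
    working_memory: int    # 50-70%: active blockers, pending decisions, recent progress
    semantic_context: int  # 20-35%: vector-similar learnings and patterns
    guidance: int          # ~10%: behavioral instructions for memory capture

def allocate_budget(total_tokens: int) -> BudgetAllocation:
    # Midpoints of the ranges above; a production allocator could adapt these
    # ratios to project complexity.
    return BudgetAllocation(
        working_memory=int(total_tokens * 0.60),
        semantic_context=int(total_tokens * 0.30),
        guidance=int(total_tokens * 0.10),
    )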

2.2 The Two-Stage Memory Consolidation Model

The architecture also draws from memory consolidation research, particularly the two-stage model of memory formation (Diekelmann & Born, 2010). This model posits that new information is initially encoded rapidly in a temporary store (hippocampus in biological systems), then gradually consolidated into a slower-learning long-term store (neocortex) during periods of rest.

Our system implements an analogous two-stage process:

  1. Fast capture: During sessions, memories are captured to Git notes (append-only, fast writes)
  2. Consolidation: At session end, the Stop hook analyzes transcripts, extracts high-confidence signals, and indexes them for semantic retrieval

This separation enables rapid capture without blocking user interaction, while the consolidation phase ensures memories are properly indexed and de-duplicated.
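
A minimal sketch of the two stages follows, assuming a memories table with id, namespace, and summary columns and omitting the embedding/vec_memories indexing step; the function names are illustrative, not the project's API.

import sqlite3
import subprocess

def fast_capture(namespace: str, note_body: str, commit: str = "HEAD") -> None:
    # Stage 1: append-only write to the namespace's notes ref; cheap enough to
    # run mid-session without blocking the user.
    subprocess.run(
        ["git", "notes", f"--ref=refs/notes/mem/{namespace}", "append", "-m", note_body, commit],
        check=True,
    )

def consolidate(db_path: str, memories: list[dict]) -> None:
    # Stage 2 (Stop hook): index extracted memories for retrieval; INSERT OR
    # IGNORE gives naive de-duplication by memory id.
    con = sqlite3.connect(db_path)
    con.executemany(
        "INSERT OR IGNORE INTO memories (id, namespace, summary) VALUES (:id, :namespace, :summary)",
        memories,
    )
    con.commit()
    con.close()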

2.3 Progressive Disclosure and Information Layering

Shneiderman’s information visualization mantra, “overview first, zoom and filter, then details-on-demand” (Shneiderman, 1996), provides the theoretical foundation for progressive hydration. The principle recognizes that users (and by extension, LLM agents) benefit from seeing abstract summaries before diving into details, reducing cognitive load while maintaining access to complete information.

Nielsen (2006) formalized progressive disclosure as “deferring advanced or rarely used features to a secondary screen, making applications easier to learn and less error-prone.” Applied to LLM context management, this translates to:

  1. Overview (SUMMARY level): Memory summaries in context injection
  2. Zoom (FULL level): Complete memory content on demand
  3. Details (FILES level): File snapshots from the commit when memory was created

2.4 Signal Detection Theory for Capture Decisions

Signal detection theory (SDT), developed by Green and Swets (1966) for analyzing sensory discrimination, provides a rigorous framework for formalizing capture decisions. SDT separates two independent aspects of discrimination performance: sensitivity (ability to detect signals) and criterion (threshold for reporting detection).

The theory addresses a fundamental challenge in automatic memory capture: balancing false positives (capturing irrelevant content, wasting storage and polluting retrieval) against false negatives (missing valuable memories). SDT formalizes this trade-off through the receiver operating characteristic (ROC).

Our system implements a three-tier decision model based on SDT principles:

| Confidence | Action | SDT Interpretation |
|---|---|---|
| >= 0.95 | AUTO | High sensitivity, low false-positive risk |
| 0.70-0.95 | SUGGEST | Present to user for criterion adjustment |
| < 0.70 | SKIP | Below detection threshold, false-positive risk too high |
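
The policy can be sketched as a small mapping function; the thresholds come from the table above, while the constant and function names are illustrative.

AUTO_THRESHOLD = 0.95
SUGGEST_THRESHOLD = 0.70

def capture_action(confidence: float) -> str:
    # Map a detection confidence to the three-tier policy from the table above.
    if confidence >= AUTO_THRESHOLD:
        return "AUTO"     # capture without prompting; low false-positive risk
    if confidence >= SUGGEST_THRESHOLD:
        return "SUGGEST"  # present to the user, who adjusts the criterion
    return "SKIP"         # below the detection threshold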

3. System Architecture

3.1 System Overview

The architecture comprises three layers: a hook layer interfacing with the IDE, a service layer implementing core memory operations, and a storage layer managing Git notes and the vector index.

+-------------------------------------------------------------------+
|                      Claude Code IDE                              |
+-------------------------------------------------------------------+
|    SessionStart    UserPrompt    PostToolUse   PreCompact   Stop  |
|         |              |             |            |          |    |
+-------------------------------------------------------------------+
|                            Hook Handlers                          |
|    ContextBuilder  SignalDetector  DomainExtractor  Analyzer      |
+-------------------------------------------------------------------+
|                    Service Layer                                  |
|        CaptureService    RecallService    SyncService             |
+----------------+--------------------------+-----------------------+
|   Git Notes    |     SQLite Index         |  Embedding Service    |
| refs/notes/    |  memories + vec_memories |  all-MiniLM-L6-v2     |
+----------------+--------------------------+-----------------------+

3.2 Data Model

The core entity is a frozen (immutable) dataclass ensuring memory integrity:

from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Memory:
    id: str                      # "decisions:5da308d:0"
    commit_sha: str              # Git commit reference
    namespace: str               # Semantic category
    summary: str                 # <= 100 characters
    content: str                 # Full markdown body
    timestamp: datetime          # Capture time (UTC)
    spec: str | None             # Project specification
    tags: tuple[str, ...]        # Categorization
    status: str                  # "active", "resolved"
    relates_to: tuple[str, ...]  # Related memory IDs

ID Format: {namespace}:{commit_sha_prefix}:{index}

  • Example: decisions:5da308d:19
  • Enables tracing to the originating git commit for full implementation context
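
A minimal sketch of working with this ID format; the function names are illustrative, and the 7-character commit prefix matches the examples above.

def parse_memory_id(memory_id: str) -> tuple[str, str, int]:
    # "decisions:5da308d:19" -> ("decisions", "5da308d", 19)
    namespace, sha_prefix, index = memory_id.split(":")
    return namespace, sha_prefix, int(index)

def format_memory_id(namespace: str, commit_sha: str, index: int) -> str:
    # Uses a 7-character commit prefix, as in the examples above.
    return f"{namespace}:{commit_sha[:7]}:{index}"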

3.3 Storage Format

Memories use YAML front matter with a markdown body, enabling both machine parsing and human readability:

---
type: decisions
timestamp: 2025-12-21T05:46:36Z
summary: Lazy loading via __getattr__ to avoid embedding model import penalty
spec: git-notes-memory
tags: performance,architecture
---

## Context
Import-time loading of sentence-transformers adds 2+ seconds to startup.

## Decision
Use Python's `__getattr__` in `__init__.py` for lazy module loading.

## Rationale
- Defers embedding model load until first use
- SessionStart hook completes in <200ms vs 2s+
- Users who don't need embeddings never pay the cost
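
A simplified parsing sketch, assuming PyYAML and the front-matter layout shown above; the project's actual parser may differ.

import yaml  # PyYAML, assumed available

def parse_memory_note(raw: str) -> tuple[dict, str]:
    # Split the note into YAML front matter (metadata) and markdown body.
    _, front_matter, body = raw.split("---", 2)
    meta = yaml.safe_load(front_matter)
    # Tags are stored as a comma-separated string in the front matter.
    meta["tags"] = tuple(t for t in str(meta.get("tags", "")).split(",") if t)
    return meta, body.strip()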

3.4 Namespace Taxonomy

The system defines ten semantic namespaces, each with associated signal detection patterns:

| Namespace | Purpose | Signal Patterns |
|---|---|---|
| decisions | Architectural choices | “I decided”, “we chose”, “[decision]” |
| learnings | Technical insights | “I learned”, “TIL”, “[learned]” |
| blockers | Impediments | “blocked by”, “stuck on”, “[blocker]” |
| progress | Milestones | “completed”, “shipped”, “[progress]” |
| patterns | Reusable approaches | “best practice”, “[pattern]” |
| research | External findings | Manual capture |
| reviews | Code review notes | Manual capture |
| retrospective | Post-mortems | Manual capture |
| inception | Problem statements | Manual capture |
| elicitation | Requirements | Manual capture |

4. Progressive Hydration

4.1 The Hydration Model

Progressive hydration implements Shneiderman’s “details on demand” principle, loading memory details only when needed. This approach addresses the token budget constraint inherent in LLM context windows.

Level 1: SUMMARY (Default for context injection)

<memory id="decisions:5da308d:19" hydration="summary">
  <summary>Lazy loading via __getattr__ to avoid embedding model import penalty</summary>
</memory>
  • Token cost: 15-20 tokens
  • Retrieval time: Sub-millisecond (index lookup)

Level 2: FULL (On-demand expansion)

---
type: decisions
timestamp: 2025-12-21T05:46:36Z
summary: Lazy loading via __getattr__ to avoid embedding model import penalty
---

## Context

Import-time loading of sentence-transformers adds 2+ seconds...

## Decision

Use Python's `__getattr__` in `__init__.py`...

## Rationale

- Defers embedding model load until first use
- SessionStart hook completes in <200ms vs 2s+
  • Token cost: 100-500 tokens
  • Retrieval time: ~10ms (git notes show)

Level 3: FILES (Full context reconstruction)

  • Includes file snapshots from the commit when memory was created
  • Enables complete context reconstruction
  • Token cost: Unbounded (file-dependent)
  • Retrieval time: Variable (git tree traversal)
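
The levels can be sketched as a single hydration function. This is a simplified illustration: the SUMMARY index is reduced to a plain dict (the real system queries SQLite), FULL retrieval shells out to git notes show as noted above, and the FILES level is omitted.

import subprocess

def hydrate(memory_id: str, level: str, summary_index: dict[str, str]) -> str:
    # Return a memory at the requested detail level (FILES omitted for brevity).
    namespace, sha_prefix, _ = memory_id.split(":")
    if level == "SUMMARY":
        # Level 1: sub-millisecond lookup against the local index.
        return summary_index[memory_id]
    if level == "FULL":
        # Level 2: read the complete note attached to the originating commit.
        result = subprocess.run(
            ["git", "notes", f"--ref=refs/notes/mem/{namespace}", "show", sha_prefix],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    raise ValueError("FILES-level hydration (commit tree snapshots) is not sketched here")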

4.2 Token Efficiency Analysis

The three-level model achieves significant token savings. For a project with 100 indexed memories:

| Approach | Token Cost | Context Utilization |
|---|---|---|
| All FULL | 25,000-50,000 | Exceeds typical budgets |
| All SUMMARY | 1,500-2,000 | 13 memories shown |
| Progressive | 2,000 + on-demand | Full coverage with depth |
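
The arithmetic behind the table: 100 summaries at 15-20 tokens each cost roughly 1,500-2,000 tokens, whereas the same memories hydrated at an average of 250-500 tokens each cost 25,000-50,000, consistent with the 10-50x reduction cited in Section 1.3. Expanding only the handful of memories that matter to FULL adds a few hundred tokens on top of the summary baseline.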

5. Hook-Based Capture

5.1 Hook Event Lifecycle

The system integrates with Claude Code’s hook infrastructure at five extension points, each serving a distinct purpose in the memory lifecycle:

Session Start --> Context Injection (memories -> Claude)
      |
      v
User Prompt ---> Signal Detection (user text -> capture decision)
      |
      v
Tool Use ------> Domain Context (file path -> related memories)
      |
      v
Pre-Compact ---> Preservation (high-confidence signals -> git notes)
      |
      v
Stop ----------> Session Analysis (transcript -> memory extraction)
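
A minimal dispatch sketch for these extension points. The event names mirror the lifecycle above, but the stdin JSON payload and its hook_event_name field are assumptions about the hook contract, and each handler body is a placeholder.

import json
import sys

HANDLERS = {
    "SessionStart": lambda payload: print("build and inject context"),
    "UserPrompt": lambda payload: print("run signal detection on the prompt"),
    "PostToolUse": lambda payload: print("attach domain context for the touched file"),
    "PreCompact": lambda payload: print("preserve high-confidence signals to git notes"),
    "Stop": lambda payload: print("analyze the transcript and extract memories"),
}

def main() -> None:
    payload = json.load(sys.stdin)
    handler = HANDLERS.get(payload.get("hook_event_name"))
    if handler is not None:
        handler(payload)

if __name__ == "__main__":
    main()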

5.2 Signal Detection Implementation

The SignalDetector implements the three-tier SDT-based model using regex patterns with confidence scoring:

Pattern Examples:

DECISION_PATTERNS = [
    (r"\[decision\]", 0.98),           # Explicit marker
    (r"\[d\]", 0.95),                   # Shorthand
    (r"I\s+decided\s+to", 0.90),        # Natural language
    (r"we\s+chose", 0.88),              # Collaborative
    (r"we'll\s+go\s+with", 0.85),       # Informal
]

Block Marker Format (highest confidence: 0.99):

>> decision -----------------------------------------------
Use PostgreSQL for persistence layer

## Context
Evaluated SQLite, PostgreSQL, and MongoDB.

## Rationale
- ACID guarantees required for financial data
- Team expertise in PostgreSQL
-------------------------------------------------------
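
A minimal sketch of applying the scored patterns; DECISION_PATTERNS is the list shown at the start of this section, and detect_decision is an illustrative name rather than the project's API. Its output would feed the AUTO/SUGGEST/SKIP policy from Section 2.4.

import re

# Compile the patterns once; as Section 6.1 notes, compiled patterns keep
# detection under 5ms.
COMPILED_DECISION_PATTERNS = [
    (re.compile(pattern, re.IGNORECASE), confidence)
    for pattern, confidence in DECISION_PATTERNS
]

def detect_decision(text: str) -> float:
    # Return the highest-confidence decision signal found in the text (0.0 if none).
    best = 0.0
    for pattern, confidence in COMPILED_DECISION_PATTERNS:
        if pattern.search(text):
            best = max(best, confidence)
    return best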

6. Evaluation

6.1 Performance Measurements

| Operation | Target | Achieved | Method |
|---|---|---|---|
| SessionStart context build | <= 2000ms | < 10ms | Indexed queries |
| Signal detection (regex) | <= 100ms | < 5ms | Compiled patterns |
| Novelty check | <= 300ms | < 50ms | sqlite-vec KNN |
| Memory capture | <= 500ms | < 100ms | Append + index |
| Vector search (k=10) | <= 100ms | < 50ms | sqlite-vec |

All operations complete well within interactive latency requirements, ensuring the memory system does not degrade user experience.

6.2 Index Statistics

Production statistics from the git-notes-memory project:

Total indexed memories: 116
By namespace:
  - decisions: 28
  - learnings: 23
  - blockers: 19
  - progress: 15
  - patterns: 31

7. Conclusion

The git-notes-memory-manager demonstrates that persistent, semantically-searchable memory for LLM agents is achievable without external infrastructure. By leveraging Git’s native notes mechanism, progressive hydration, and hook-based capture with signal detection theory, the system provides:

  1. Zero-Infrastructure Memory: Operates with existing git, requiring no databases or cloud services
  2. Semantic Retrieval: Natural language queries locate relevant memories through vector similarity
  3. Automatic Capture: Confidence-scored signal detection reduces cognitive load
  4. Token Efficiency: Progressive hydration respects context window constraints
  5. Team Sharing: Memories synchronize with code through standard git operations

The architecture validates treating LLM agent memory as a first-class concern (rather than an afterthought), enabling qualitatively different developer experiences. Decisions persist, blockers track to resolution, and learnings accumulate across sessions, transforming ephemeral conversations into durable knowledge.


References

Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417-423.

Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The Psychology of Learning and Motivation (Vol. 8, pp. 47-89). Academic Press.

Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.

Green, D. M., & Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley.

Nielsen, J. (2006). Progressive disclosure. Nielsen Norman Group.

Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. Proceedings of IEEE Symposium on Visual Languages, 336-343.


This research was conducted through systematic analysis of the git-notes-memory-manager codebase and production validation during development sessions. Real examples are drawn from actual session logs dated December 2025.