Git-Native Semantic Memory for LLM Agents
Author: Robert Allen
Version: 2.0
Abstract
Large Language Model (LLM) agents operating in software development environments suffer from a fundamental architectural limitation: context window boundaries enforce session isolation, causing accumulated knowledge to be lost when sessions terminate or contexts compact. This paper presents git-notes-memory-manager, a novel architecture that addresses this limitation by leveraging Git’s native notes mechanism as a distributed, version-controlled memory store. The system implements progressive hydration across three detail levels (SUMMARY, FULL, FILES) to optimize token consumption, and employs hook-based capture with confidence-scored signal detection to automate memory extraction with minimal cognitive overhead.
We ground our architecture in established cognitive science frameworks, drawing from Baddeley’s multicomponent working memory model (Baddeley & Hitch, 1974; Baddeley, 2000) to structure memory prioritization, and from signal detection theory (Green & Swets, 1966) to formalize capture decisions. The system applies Shneiderman’s “overview first, details on demand” progressive disclosure principle to manage token budgets while preserving access to complete context when needed.
Production validation demonstrates sub-10ms context generation, 116+ indexed memories across 10 semantic namespaces, and automatic capture of 5+ memories per session via hook-based detection. The architecture achieves zero-infrastructure deployment by storing memories alongside code in Git, enabling team-wide knowledge sharing through standard git push/pull operations.
Keywords: LLM agents, persistent memory, semantic search, Git notes, progressive hydration, signal detection, working memory, context management
1. Introduction
1.1 The Memory Problem in LLM Agents
Large Language Model agents operating in development environments face a fundamental limitation that distinguishes them from human collaborators: context window constraints force session isolation. When a developer and LLM agent together make an architectural decision in one session, that knowledge exists only within the conversation history. Upon session termination or context compaction, the decision vanishes unless explicitly recorded elsewhere.
This limitation has significant practical consequences. Recent surveys on LLM agent memory mechanisms observe that “unlike humans who dynamically integrate new information, LLMs effectively ‘reset’ once information falls outside their context window” (arXiv:2404.13501). Even as models push context length boundaries (GPT-4 at 128K tokens, Claude 3.7 at 200K, Gemini at 10M), these improvements merely delay rather than solve the fundamental limitation.
The research question motivating this work is:
How can LLM agents maintain persistent, semantically-searchable memory across sessions while integrating naturally with existing developer workflows and requiring no additional infrastructure?
1.2 Design Requirements
Analysis of developer workflows and the constraints of LLM agent operation revealed five core requirements that a memory system must satisfy:
- Persistence: Memories must survive session boundaries and context compaction events
- Distribution: Memory should synchronize with code using existing infrastructure (no separate databases or cloud services)
- Semantic Retrieval: Natural language queries must locate relevant memories without requiring exact-match keywords
- Progressive Detail: The system must load only as much context as needed, preserving tokens for active work
- Automatic Capture: The system must detect memorable content automatically rather than requiring manual intervention, reducing cognitive load
1.3 Contribution
This paper presents a complete implementation addressing all five requirements. The key contributions are:
- Git-native memory storage using `refs/notes/mem/{namespace}` references, enabling distributed synchronization through standard git operations
- Progressive hydration implementing three detail levels (SUMMARY, FULL, FILES) that reduce token consumption by 10-50x while preserving access to complete context
- Hook-based automatic capture leveraging IDE extension points with confidence-scored signal detection based on signal detection theory
- Token-budgeted context injection that adapts to project complexity using cognitive load principles
Production validation demonstrates:
- 116 memories indexed across 10 semantic namespaces
- Sub-10ms context generation at session start
- Automatic capture of 5+ memories per session via hook-based detection
- Cross-session recall of decisions, learnings, and blockers
2. Theoretical Foundations
The architecture draws from three established theoretical frameworks: cognitive psychology’s multicomponent working memory model, human-computer interaction’s progressive disclosure principle, and signal detection theory from psychophysics.
2.1 The Multicomponent Working Memory Model
Baddeley and Hitch (1974) proposed a multicomponent model of working memory that replaced the earlier unitary short-term memory concept. The model posits a central executive controlling limited attentional capacity, coordinating two subsidiary systems: the phonological loop for verbal information and the visuospatial sketchpad for spatial information. Baddeley (2000) later added the episodic buffer, a limited-capacity system that binds information from subsidiary systems and long-term memory into unified episodic representations.
This cognitive architecture maps directly to LLM agent memory requirements:
| Cognitive Component | System Mapping | Implementation |
|---|---|---|
| Central Executive | Context window management | Token budget allocation |
| Episodic Buffer | Working memory section | Active blockers, recent decisions |
| Long-term Memory | Semantic memory store | Git notes + vector index |
| Binding Process | Progressive hydration | SUMMARY to FULL expansion |
The episodic buffer’s role is particularly relevant: it is “a limited capacity system that provides temporary storage of information held in a multimodal code, which is capable of binding information from the subsidiary systems, and from long-term memory, into a unitary episodic representation” (Baddeley, 2000). In our system, the SessionStart context injection performs analogous binding: retrieving relevant memories from the persistent store (long-term memory) and formatting them for inclusion in the active context (working memory).
The system allocates token budgets reflecting this structure:
- Working Memory (50-70%): Active blockers, pending decisions, recent progress
- Semantic Context (20-35%): Relevant learnings, related patterns retrieved via vector similarity
- Guidance (10%): Behavioral instructions for memory capture
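A minimal sketch of such a split, assuming a fixed 60/30/10 allocation within the ranges above (the proportions and function name are illustrative, not the project's actual API):

def allocate_budget(total_tokens: int) -> dict[str, int]:
    """Split a context-injection token budget across the three sections.

    The proportions mirror the working-memory / semantic-context / guidance
    split described above; the exact values are illustrative.
    """
    return {
        "working_memory": int(total_tokens * 0.60),    # active blockers, decisions, progress
        "semantic_context": int(total_tokens * 0.30),  # vector-retrieved learnings and patterns
        "guidance": int(total_tokens * 0.10),          # memory-capture instructions
    }

# Example: a 3,000-token injection budget
print(allocate_budget(3000))  # {'working_memory': 1800, 'semantic_context': 900, 'guidance': 300}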
2.2 The Two-Stage Memory Consolidation Model
The architecture also draws from memory consolidation research, particularly the two-stage model of memory formation (Diekelmann & Born, 2010). This model posits that new information is initially encoded rapidly in a temporary store (hippocampus in biological systems), then gradually consolidated into a slower-learning long-term store (neocortex) during periods of rest.
Our system implements an analogous two-stage process:
- Fast capture: During sessions, memories are captured to Git notes (append-only, fast writes)
- Consolidation: At session end, the Stop hook analyzes transcripts, extracts high-confidence signals, and indexes them for semantic retrieval
This separation enables rapid capture without blocking user interaction, while the consolidation phase ensures memories are properly indexed and de-duplicated.
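A minimal sketch of the two stages, assuming a thin wrapper around `git notes append` for fast capture and a simple confidence filter for consolidation (types and names here are illustrative, not the project's API):

import subprocess
from dataclasses import dataclass

@dataclass
class Signal:
    namespace: str
    text: str
    confidence: float

def fast_capture(namespace: str, commit_sha: str, note_text: str) -> None:
    """Stage 1: append-only write to the per-namespace notes ref (fast, non-blocking)."""
    subprocess.run(
        ["git", "notes", f"--ref=refs/notes/mem/{namespace}", "append",
         "-m", note_text, commit_sha],
        check=True,
    )

def consolidate(signals: list[Signal], threshold: float = 0.95) -> list[Signal]:
    """Stage 2 (Stop hook): keep only high-confidence signals for indexing."""
    return [s for s in signals if s.confidence >= threshold]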
2.3 Progressive Disclosure and Information Layering
Shneiderman’s information visualization mantra, “overview first, zoom and filter, then details-on-demand” (Shneiderman, 1996), provides the theoretical foundation for progressive hydration. The principle recognizes that users (and by extension, LLM agents) benefit from seeing abstract summaries before diving into details, reducing cognitive load while maintaining access to complete information.
Nielsen (2006) formalized progressive disclosure as “deferring advanced or rarely used features to a secondary screen, making applications easier to learn and less error-prone.” Applied to LLM context management, this translates to:
- Overview (SUMMARY level): Memory summaries in context injection
- Zoom (FULL level): Complete memory content on demand
- Details (FILES level): File snapshots from the commit when memory was created
2.4 Signal Detection Theory for Capture Decisions
Signal detection theory (SDT), developed by Green and Swets (1966) for analyzing sensory discrimination, provides a rigorous framework for formalizing capture decisions. SDT separates two independent aspects of discrimination performance: sensitivity (ability to detect signals) and criterion (threshold for reporting detection).
The theory addresses a fundamental challenge in automatic memory capture: balancing false positives (capturing irrelevant content, wasting storage and polluting retrieval) against false negatives (missing valuable memories). SDT formalizes this trade-off through the receiver operating characteristic (ROC).
Our system implements a three-tier decision model based on SDT principles:
| Confidence | Action | SDT Interpretation |
|---|---|---|
| >= 0.95 | AUTO | High sensitivity, low false-positive risk |
| 0.70-0.95 | SUGGEST | Present to user for criterion adjustment |
| < 0.70 | SKIP | Below detection threshold, false-positive risk too high |
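This three-tier model reduces to a simple threshold function; a minimal sketch (the thresholds come from the table above, the function name is illustrative):

AUTO_THRESHOLD = 0.95
SUGGEST_THRESHOLD = 0.70

def capture_action(confidence: float) -> str:
    """Map a detection confidence score to the three-tier SDT decision."""
    if confidence >= AUTO_THRESHOLD:
        return "AUTO"      # capture without asking
    if confidence >= SUGGEST_THRESHOLD:
        return "SUGGEST"   # defer to the user (criterion adjustment)
    return "SKIP"          # below detection threshold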
3. System Architecture
3.1 System Overview
The architecture comprises three layers: a hook layer interfacing with the IDE, a service layer implementing core memory operations, and a storage layer managing Git notes and the vector index.
+-------------------------------------------------------------------+
| Claude Code IDE |
+-------------------------------------------------------------------+
| SessionStart UserPrompt PostToolUse PreCompact Stop |
| | | | | | |
+-------------------------------------------------------------------+
| Hook Handlers |
| ContextBuilder SignalDetector DomainExtractor Analyzer |
+-------------------------------------------------------------------+
| Service Layer |
| CaptureService RecallService SyncService |
+----------------+--------------------------+-----------------------+
| Git Notes | SQLite Index | Embedding Service |
| refs/notes/ | memories + vec_memories | all-MiniLM-L6-v2 |
+----------------+--------------------------+-----------------------+
3.2 Data Model
The core entity is a frozen (immutable) dataclass ensuring memory integrity:
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Memory:
    id: str                      # "decisions:5da308d:0"
    commit_sha: str              # Git commit reference
    namespace: str               # Semantic category
    summary: str                 # <= 100 characters
    content: str                 # Full markdown body
    timestamp: datetime          # Capture time (UTC)
    spec: str | None             # Project specification
    tags: tuple[str, ...]        # Categorization
    status: str                  # "active", "resolved"
    relates_to: tuple[str, ...]  # Related memory IDs
ID Format: `{namespace}:{commit_sha_prefix}:{index}`
- Example: `decisions:5da308d:19`
- Enables tracing to the originating git commit for full implementation context
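Because the ID is a colon-delimited triple, composing and parsing it is straightforward; a small sketch (function names are illustrative):

def make_memory_id(namespace: str, commit_sha: str, index: int) -> str:
    """Compose an ID like "decisions:5da308d:19" from its parts."""
    return f"{namespace}:{commit_sha[:7]}:{index}"

def parse_memory_id(memory_id: str) -> tuple[str, str, int]:
    """Split an ID back into (namespace, commit prefix, index)."""
    namespace, sha_prefix, index = memory_id.split(":")
    return namespace, sha_prefix, int(index)

assert parse_memory_id(make_memory_id("decisions", "5da308d1ab", 19)) == ("decisions", "5da308d", 19)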
3.3 Storage Format
Memories use YAML front matter with a markdown body, enabling both machine parsing and human readability:
---
type: decisions
timestamp: 2025-12-21T05:46:36Z
summary: Lazy loading via __getattr__ to avoid embedding model import penalty
spec: git-notes-memory
tags: performance,architecture
---
## Context
Import-time loading of sentence-transformers adds 2+ seconds to startup.
## Decision
Use Python's `__getattr__` in `__init__.py` for lazy module loading.
## Rationale
- Defers embedding model load until first use
- SessionStart hook completes in <200ms vs 2s+
- Users who don't need embeddings never pay the cost
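For illustration, a small sketch of splitting such a note into front matter and body, assuming PyYAML is available (the project's actual parser is not shown here):

import yaml

def parse_note(raw: str) -> tuple[dict, str]:
    """Split a stored note into (front matter dict, markdown body).

    Assumes the note begins with a '---' delimited YAML block,
    as in the example above.
    """
    _, front, body = raw.split("---", 2)
    return yaml.safe_load(front), body.strip()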
3.4 Namespace Taxonomy
The system defines ten semantic namespaces, each with associated signal detection patterns:
| Namespace | Purpose | Signal Patterns |
|---|---|---|
| decisions | Architectural choices | “I decided”, “we chose”, “[decision]” |
| learnings | Technical insights | “I learned”, “TIL”, “[learned]” |
| blockers | Impediments | “blocked by”, “stuck on”, “[blocker]” |
| progress | Milestones | “completed”, “shipped”, “[progress]” |
| patterns | Reusable approaches | “best practice”, “[pattern]” |
| research | External findings | Manual capture |
| reviews | Code review notes | Manual capture |
| retrospective | Post-mortems | Manual capture |
| inception | Problem statements | Manual capture |
| elicitation | Requirements | Manual capture |
4. Progressive Hydration
4.1 The Hydration Model
Progressive hydration implements Shneiderman’s “details on demand” principle, loading memory details only when needed. This approach addresses the token budget constraint inherent in LLM context windows.
Level 1: SUMMARY (Default for context injection)
<memory id="decisions:5da308d:19" hydration="summary">
<summary>Lazy loading via __getattr__ to avoid embedding model import penalty</summary>
</memory>
- Token cost: 15-20 tokens
- Retrieval time: Sub-millisecond (index lookup)
Level 2: FULL (On-demand expansion)
---
type: decisions
timestamp: 2025-12-21T05:46:36Z
summary: Lazy loading via __getattr__ to avoid embedding model import penalty
---
## Context
Import-time loading of sentence-transformers adds 2+ seconds...
## Decision
Use Python's `__getattr__` in `__init__.py`...
## Rationale
- Defers embedding model load until first use
- SessionStart hook completes in <200ms vs 2s+
- Token cost: 100-500 tokens
- Retrieval time: ~10ms (git notes show)
Level 3: FILES (Full context reconstruction)
- Includes file snapshots from the commit when memory was created
- Enables complete context reconstruction
- Token cost: Unbounded (file-dependent)
- Retrieval time: Variable (git tree traversal)
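A sketch of the on-demand expansion path, wrapping standard `git notes show` and `git show` invocations in Python (the wrapper functions are illustrative, not the project's API):

import subprocess

def hydrate_full(namespace: str, commit_sha: str) -> str:
    """Level 2 (FULL): read the complete note body attached to a memory's commit."""
    result = subprocess.run(
        ["git", "notes", f"--ref=refs/notes/mem/{namespace}", "show", commit_sha],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def hydrate_file(commit_sha: str, path: str) -> str:
    """Level 3 (FILES): reconstruct a file snapshot from the memory's commit."""
    result = subprocess.run(
        ["git", "show", f"{commit_sha}:{path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout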
4.2 Token Efficiency Analysis
The three-level model achieves significant token savings. For a project with 100 indexed memories:
| Approach | Token Cost | Context Utilization |
|---|---|---|
| All FULL | 25,000-50,000 | Exceeds typical budgets |
| All SUMMARY | 1,500-2,000 | 13 memories shown |
| Progressive | 2,000 + on-demand | Full coverage with depth |
5. Hook-Based Capture
5.1 Hook Event Lifecycle
The system integrates with Claude Code’s hook infrastructure at five extension points, each serving a distinct purpose in the memory lifecycle:
Session Start --> Context Injection (memories -> Claude)
|
v
User Prompt ---> Signal Detection (user text -> capture decision)
|
v
Tool Use ------> Domain Context (file path -> related memories)
|
v
Pre-Compact ---> Preservation (high-confidence signals -> git notes)
|
v
Stop ----------> Session Analysis (transcript -> memory extraction)
5.2 Signal Detection Implementation
The SignalDetector implements the three-tier SDT-based model using regex patterns with confidence scoring:
Pattern Examples:
DECISION_PATTERNS = [
(r"\[decision\]", 0.98), # Explicit marker
(r"\[d\]", 0.95), # Shorthand
(r"I\s+decided\s+to", 0.90), # Natural language
(r"we\s+chose", 0.88), # Collaborative
(r"we'll\s+go\s+with", 0.85), # Informal
]
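For illustration, a sketch of how such patterns might be applied, pre-compiled once so each scan stays within the latency targets in Section 6 (the compilation step and scoring function are illustrative):

import re

DECISION_PATTERNS = [
    (r"\[decision\]", 0.98),
    (r"\[d\]", 0.95),
    (r"I\s+decided\s+to", 0.90),
    (r"we\s+chose", 0.88),
    (r"we'll\s+go\s+with", 0.85),
]

# Compile once at import time so per-prompt scans remain sub-5ms.
_COMPILED = [(re.compile(pattern), confidence) for pattern, confidence in DECISION_PATTERNS]

def decision_confidence(text: str) -> float:
    """Return the highest confidence among matching decision patterns (0.0 if none match)."""
    return max((c for rx, c in _COMPILED if rx.search(text)), default=0.0)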
Block Marker Format (highest confidence: 0.99):
>> decision -----------------------------------------------
Use PostgreSQL for persistence layer
## Context
Evaluated SQLite, PostgreSQL, and MongoDB.
## Rationale
- ACID guarantees required for financial data
- Team expertise in PostgreSQL
-------------------------------------------------------
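A hedged sketch of extracting such a block, assuming the opening line is `>> <namespace>` followed by dashes and the block ends at a long dashed line (the real detector's delimiters may differ):

import re

BLOCK_RE = re.compile(
    r">>\s*(?P<namespace>\w+)\s*-+\n(?P<body>.*?)\n-{10,}",
    re.DOTALL,
)

def extract_blocks(text: str) -> list[tuple[str, str]]:
    """Return (namespace, body) pairs for explicit block markers (confidence 0.99)."""
    return [(m.group("namespace"), m.group("body").strip())
            for m in BLOCK_RE.finditer(text)]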
6. Evaluation
6.1 Performance Measurements
| Operation | Target | Achieved | Method |
|---|---|---|---|
| SessionStart context build | <= 2000ms | < 10ms | Indexed queries |
| Signal detection (regex) | <= 100ms | < 5ms | Compiled patterns |
| Novelty check | <= 300ms | < 50ms | sqlite-vec KNN |
| Memory capture | <= 500ms | < 100ms | Append + index |
| Vector search (k=10) | <= 100ms | < 50ms | sqlite-vec |
All operations complete well within interactive latency requirements, ensuring the memory system does not degrade user experience.
6.2 Index Statistics
Production statistics from the git-notes-memory project:
Total indexed memories: 116
By namespace:
- decisions: 28
- learnings: 23
- blockers: 19
- progress: 15
- patterns: 31
7. Conclusion
The git-notes-memory-manager demonstrates that persistent, semantically-searchable memory for LLM agents is achievable without external infrastructure. By leveraging Git’s native notes mechanism, progressive hydration, and hook-based capture with signal detection theory, the system provides:
- Zero-Infrastructure Memory: Operates with existing git, requiring no databases or cloud services
- Semantic Retrieval: Natural language queries locate relevant memories through vector similarity
- Automatic Capture: Confidence-scored signal detection reduces cognitive load
- Token Efficiency: Progressive hydration respects context window constraints
- Team Sharing: Memories synchronize with code through standard git operations
The architecture validates treating LLM agent memory as a first-class concern (rather than an afterthought), enabling qualitatively different developer experiences. Decisions persist, blockers track to resolution, and learnings accumulate across sessions, transforming ephemeral conversations into durable knowledge.
References
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417-423.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The Psychology of Learning and Motivation (Vol. 8, pp. 47-89). Academic Press.
Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.
Green, D. M., & Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley.
Nielsen, J. (2006). Progressive disclosure. Nielsen Norman Group.
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. Proceedings of IEEE Symposium on Visual Languages, 336-343.
This research was conducted through systematic analysis of the git-notes-memory-manager codebase and production validation during development sessions. Real examples are drawn from actual session logs dated December 2025.