不要轻易关 cc 的/recap：KV Cache 复用逻辑

发布时间： May 19, 2026

前两篇文章分别拆了 /compact 的机制和压缩后对话如何恢复。compact 是昂贵的（120K token 一次性全价），但 Claude Code 里还有另一个和上下文相关的命令——/recap。它的成本和 compact 完全相反：几乎免费。

/recap 的作用是当你中途离开、回来时，给你一个当前进度的简短回顾。默认开启，也可以关掉。这篇文章拆一个真实的 recap 请求。

/recap 触发时机

当你离开终端一段时间再回来（或者说，当 CLI 检测到用户重新聚焦窗口），Claude Code 会自动生成一条 recap，让你知道「刚才在做什么、下一步该做什么」。如果嫌烦，/recap 可以手动开关。

这不是 subagent

compact 是新开一个 subagent，system prompt 砍掉、tools 砍光、缓存全丢。但 recap 不是 subagent——它是主 agent 本身在无工具模式下的一次轻量推理。

看一眼 recap 请求的 system 和 tools：

"system": [
  { "text": "x-anthropic-billing-header: ..." },
  {
    "text": "You are Claude Code, Anthropic's official CLI for Claude.",
    "cache_control": {}          // 持久缓存标记
  },
  {
    "text": "You are an interactive agent that helps users...",  // 千字长文全保留
    "cache_control": {}
  }
],
"tools": [
  {}, {}, {}, ...  // 全部 28 个工具都在
]

system prompt 完整保留，28 个工具一个不少，和正常对话的请求结构完全一样。这意味着它的请求前缀和之前的正常对话是同一个，KV cache 可以无缝复用。

唯一的区别：最后一条 message

recap 请求没有多轮对话、没有 tool call 的 system-reminder。messages 数组前面是 207 条对话历史（已缓存在 KV cache 中），最后附加一条特殊的 user message：

The user stepped away and is coming back. Recap in under 40 words,
1-2 plain sentences, no markdown. Lead with the overall goal and
current task, then the one next action. Skip root-cause narrative,
fix internals, secondary to-dos, and em-dash tangents.

这段 prompt 非常精准：

under 40 words, 1-2 plain sentences：输出要短，别啰嗦
Lead with the overall goal and current task, then the one next action：先说大目标+当前进度，再接下一动作
Skip root-cause narrative, fix internals, secondary to-dos, and em-dash tangents：不要讲分析过程、修复细节、次要待办

它是在让模型做一个极简摘要，而不是 compact 那种 9 节结构化总结。

另外注意 context_management.edits 是空的 []——没有清理任何历史 thinking，和 compact 的 clear_thinking_20251015 完全不同。因为这只是一次读操作，不需要清理什么。

核心数据：142K token 从缓存读，只付了 404 token

模型的响应：

{
  "type": "thinking",
  "thinking": "The user stepped away and is coming back. I need to give a brief recap. What we did: 1. Analyzed the CTD M2 project... 2. Set up local .env from K8s secrets... 3. Ran local tests... 4. Ran K8s tests... Current state: Both APIs tested and working..."
},
{
  "type": "text",
  "text": "你在做 CTD 药品注册文档平台的打标和 L1/L2 提取接口。本地和 K8s 两个接口都测通了，最新代码在 worktree 的 `feature-m24-phase4-extract-plugin` 分支。下一步可以通知后端健宇开始联调。"
}

然后看 usage：

"usage": {
  "input_tokens": 184,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 142592,
  "output_tokens": 220,
}

三个数字：

指标	数值	说明
input_tokens	184	实际要计算的 token，只有最后那条 recap prompt
cache_read_input_tokens	142,592	从 KV cache 直接读的，包括 system prompt + 207 条历史消息
output_tokens	220	thinking + 两句 text，极短

缓存命中率 = 142,592 / (184 + 142,592) = 99.87%。

真正花钱的就 184 + 220 = 404 token。如果按 deepseek-v4-pro 的 API 价格（$0.28/M input, $1.10/M output），这一次 recap 的成本是 $0.0003。一万次 recap 还不到 3 美元。

这就是 KV cache 的威力——前缀没变（system prompt + 历史消息一模一样），API 直接从缓存读，不需要重新计算。recap 的设计就是为了让这个特性最大化：不换 system prompt，不换 tools，不清理 thinking，只在末尾追加一条简短的 user message。

和 compact 放在一起看

	/recap	/compact
身份	主 agent	subagent
system prompt	千字长文完整保留	砍成一句话
tools	28 个	1 个
context_management	空 []	clear_thinking_20251015
目的	提醒用户当前进度	压缩上下文
cache_read	142K（99.87%）	0（0%）
input tokens	184	120K
output tokens	220	4.5K
单次成本	~$0.0003	~完整计费

recap 是白嫖 KV cache。compact 是 KV cache 大屠杀。

结论：开着，别关

recap 输出示例

/recap 默认是开启的，也可以关掉。但从这次的 request 来看，一次 recap 真正的计费 token 只有 400 左右，其他 14 万 token 全从缓存走。就算你一天触发 20 次 recap，加起来还不到 0.01 美元。

所以结论很简单：开着，别关。 它不是花钱的，是省钱省注意力的——花 0.0003 美元让你不用翻聊天记录，比自己翻 5 分钟高效得多。