Claude Code Under Packet Capture: The Real Token Cost of Running /compact

Published: May 18, 2026

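The previous article took apart the request structure of a normal Claude Code conversation: 3 system blocks, 27 tools, the skills registry, and system-reminder injection. In day-to-day use, my logs show a cache hit rate of roughly 99%, which is great: it costs almost nothing. Then today my Claude Code context got too long, and on a whim I typed /compact. I didn't think much of it at the time, but when I switched over to cc-viewer and looked at the request, I lost it.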

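/compact is not the main agent; it starts a new subagent

/compact is not the current conversation's main agent summarizing itself. Claude Code instead starts a new subagent: the system prompt in its request is completely different (details below), and so is the tool list. That means every compact is a brand-new API call, on a request chain separate from the main agent's conversation. The design itself is fine: rather than having the working agent summarize itself, handing the job to a dedicated "summarizer" subagent is reasonable. The problem is that this subagent's context structure is completely different from the main agent's 😭, so the KV cache cannot be hit.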

What the subagent's request looks like

Here is the raw JSON:

{
  "model": "deepseek-v4-pro",
  "messages": [
    {}, {}, {}, {},  // ... 200+ history messages, all carried along
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "CRITICAL: Respond with TEXT ONLY. Do NOT call any tools..."
        }
      ]
    }
  ],
  "system": [
    {
      "type": "text",
      "text": "x-anthropic-billing-header: cc_version=2.1.126.88c; cc_entrypoint=cli; cch=88547;"
    },
    {
      "type": "text",
      "text": "You are Claude Code, Anthropic's official CLI for Claude."
    },
    {
      "type": "text",
      "text": "You are a helpful AI assistant tasked with summarizing conversations."
    }
  ],
  "tools": [
    {
      "name": "Read",
      "description": "Reads a file from the local filesystem..."
    }
  ],
  "max_tokens": 20000,
  "temperature": 1,
  "output_config": {
    "effort": "high"
  },
  "stream": true
}

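Let's go through it piece by piece:

The system prompt shrank from a long essay to a single sentence

As covered in the previous article, a normal main agent's system consists of three blocks:

billing header (version number, entrypoint, billing marker)
Identity statement: You are Claude Code, Anthropic's official CLI for Claude.
Main system prompt: software-engineering task framing, safety boundaries, don't guess URLs, tool usage rules, output style, confirmation policy for risky actions, session-specific guidance, auto memory, environment, context management... thousands of words of text

In the compact subagent, the third block becomes: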

You are a helpful AI assistant tasked with summarizing conversations.

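That's it. One sentence. The multi-thousand-word system prompt is gone. It went from "you are Anthropic's official CLI tool and must obey the following 847 rules" to "you are an assistant that helps summarize conversations." Is that reasonable? Yes. The compact subagent doesn't need to know how to use Edit, handle git, or deal with risky actions. It has one job: read the conversation history and write a summary. Giving it the main system prompt would be pure waste.

But reasonable doesn't mean free.

tools cut from 27 down to 1

A normal main agent has 27 tools: Agent, AskUserQuestion, Bash, CronCreate, CronDelete, CronList, Edit, EnterPlanMode, EnterWorktree, ExitPlanMode, ExitWorktree, Monitor, NotebookEdit, PushNotification, Read, RemoteTrigger, ScheduleWakeup, Skill, TaskCreate, TaskGet, TaskList, TaskOutput, TaskStop, TaskUpdate, WebFetch, WebSearch…

The compact subagent has only 1 tool: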

{
  "name": "Read",
  "description": "Reads a file from the local filesystem..."
}

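Just one Read. Not even Bash. That's reasonable too: a summarizer only needs to read the message history, and at most a previous transcript file. Giving it Bash would be superfluous. But here's the catch: system prompt changed + tools changed = the entire prefix cache is invalidated.

0% cache hits; I was stunned

First, the compact subagent's usage: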

{
  "input_tokens": 120223,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0,
  "output_tokens": 4513,
  "service_tier": "standard"
}

Three key numbers:

input: 120K (120,223 tokens)
cache_read: 0 (zero cache hits)
cache_creation: 0 (no cache was created either, because this is a one-off subagent call)
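To make "hit rate" concrete, here is a minimal sketch that derives it from a usage object. It assumes the three input fields are additive (uncached input, cache writes, and cache reads sum to the total prompt size), as in Anthropic's Messages API; a proxy or another provider may count differently. The helper name `cache_hit_rate` is my own, not part of any SDK.

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from cache.

    Assumption: input_tokens excludes cached tokens, so the total prompt
    size is the sum of the three fields below.
    """
    uncached = usage.get("input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    total = uncached + created + read
    return read / total if total else 0.0


# The usage reported for the compact subagent call in this article.
compact_usage = {
    "input_tokens": 120223,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 4513,
}

print(f"{cache_hit_rate(compact_usage):.1%}")  # 0.0%
```
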

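Compare that with a normal main agent request: a 99.7% cache hit rate. Send a "hello" and the tokens you actually pay full price for may be only a few hundred. The compact subagent, though: 120K tokens, all billed at full price, with no discount. Why did the cache miss completely? Because prefix caching only hits when the prefix (system prompt + tools + preceding messages) is identical to a previous request's. The compact subagent's system prompt changed (the third block went from a long essay to one sentence), its tools changed (from 27 to 1), and the structure of messages changed too. It shares no reusable prefix with the main agent's requests.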
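The prefix rule can be illustrated with a toy model. This is only a sketch: real caches work on tokens and cache breakpoints, not on whole JSON blocks, and I am assuming the prefix is ordered tools, then system, then messages, which is how Anthropic's prompt caching documentation describes it. The request contents below are placeholders, not the captured payloads.

```python
import json

# Assumption: the cacheable prefix is ordered tools -> system -> messages.
CACHE_ORDER = ("tools", "system", "messages")


def prefix_blocks(request: dict) -> list:
    """Flatten a request into an ordered list of canonical block strings."""
    return [
        f"{key}:{json.dumps(item, sort_keys=True)}"
        for key in CACHE_ORDER
        for item in request.get(key, [])
    ]


def shared_prefix_len(a: dict, b: dict) -> int:
    """Count the leading blocks that are identical in both requests."""
    count = 0
    for x, y in zip(prefix_blocks(a), prefix_blocks(b)):
        if x != y:
            break
        count += 1
    return count


# Toy stand-ins for the requests (placeholder contents).
history = [{"role": "user", "content": "m1"}, {"role": "assistant", "content": "m2"}]
main = {
    "tools": [{"name": "Agent"}, {"name": "Bash"}, {"name": "Read"}],
    "system": [{"text": "billing"}, {"text": "identity"}, {"text": "main prompt"}],
    "messages": history,
}
next_main = {**main, "messages": history + [{"role": "user", "content": "m3"}]}
compact = {
    "tools": [{"name": "Read"}],
    "system": [{"text": "billing"}, {"text": "identity"}, {"text": "summarizer"}],
    "messages": history + [{"role": "user", "content": "CRITICAL: ..."}],
}

print(shared_prefix_len(main, next_main))  # 8: every block of `main` is reused
print(shared_prefix_len(main, compact))    # 0: the very first tool already differs
```

In this model a normal follow-up turn reuses everything that came before it, while the compact request diverges at the first block, so nothing after that point can be served from cache even though the message history is identical.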

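So every /compact is a brand-new, uncached, full-price API call, about 120K tokens in my case. After the compact, the main agent's context is indeed shorter, but the KV cache for all those earlier messages is thrown away. On the next message you send after compacting, the main agent has to re-encode the compressed context. You pay either way: without compacting, the long context costs money; compacting with a cold cache costs money; and re-encoding after the compact costs money again.

Don't use /compact lightly. Just let the cache keep doing its job.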

The last user message

Note the content of the last user message in the request:

CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.
- Your entire response must be plain text: an <analysis> block followed by a <summary> block.
Your task is to create a detailed summary of the conversation so far...

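This is a heavily constrained prompt:

"Respond with TEXT ONLY. Do NOT call any tools.": a Read tool is provided, yet the model is told outright not to use it. Handing over a tool and then forbidding its use is something I can't make sense of 🤣
"You already have all the context": tells the model not to go reading files for extra information; everything it needs is already in messages.
"Tool calls will be REJECTED": tells the model that any tool call it makes will be rejected.
"Your entire response must be plain text: an <analysis> block followed by a <summary> block.": a hard constraint on the output format. Analyze first, then summarize.
Then comes a long section of detailed requirements for the summary content: 9 sections, from Primary Request to Optional Next Step, each with explicit instructions on what to fill in.

That is the compact subagent.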
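The output contract in that prompt (an <analysis> block followed by a <summary> block) is easy to check mechanically. The sketch below shows one way to split such a reply; I don't know how Claude Code actually parses it, so the function `split_compact_reply` and the sample text are purely illustrative.

```python
import re


def split_compact_reply(text: str) -> dict:
    """Extract the <analysis> and <summary> blocks from a plain-text reply.

    A missing block is returned as None rather than raising.
    """
    result = {}
    for tag in ("analysis", "summary"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        result[tag] = match.group(1).strip() if match else None
    return result


# Made-up reply in the required shape (not the captured output).
reply = (
    "<analysis>\nWalked through the chat in order.\n</analysis>\n"
    "<summary>\n1. Primary Request and Intent: ...\n</summary>"
)

parts = split_compact_reply(reply)
print(parts["summary"])  # 1. Primary Request and Intent: ...
```
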

The model's reply: thinking + structured summary

The structure the model returns:

{
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze the conversation chronologically..."
    },
    {
      "type": "text",
      "text": "<analysis>...</analysis>\n<summary>...</summary>"
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 120223,
    "cache_read_input_tokens": 0,
    "output_tokens": 4513
  }
}

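4.5K tokens of output produced a structured 9-section summary. That summary is injected back into the main agent's context, replacing the history messages that were compacted away. What the user ends up seeing: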

⏺ Compact summary
  ⎿  This session is being continued from a previous conversation that ran out of context.
     The summary below covers the earlier portion of the conversation.

     Summary:
     1. Primary Request and Intent: ...
     2. Key Technical Concepts: ...
     3. Files and Code Sections: ...
     ...

ctrl+o expands the detailed version, which also shows the file path of the full transcript:

/Users/hanwenbo/.claude/projects/Users-hanwenbo-PycharmProjects/c07046ac-...jsonl

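Summary: the cost of /compact

The essence of compact: one expensive, uncached API call in exchange for slimming down the main agent's context afterward. If the conversation has grown so long that the main agent fills the context window every turn and truncates frequently, then compact is expensive but at least lets the conversation continue. But if we just "feel the context is getting a bit long and want to tidy up," we are burning money for nothing: the compact itself cost me 120K tokens for the summary, and afterward the main agent's cache starts accumulating from zero again. Once it builds up, it's time to compact again, and it resets to zero again.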
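A rough back-of-the-envelope comparison makes the trade-off visible. Everything about pricing here is an assumption for illustration: `PRICE_PER_MTOK` is a made-up unit price, and the 0.1x multiplier for cache reads matches Anthropic's published cache-read discount but may not apply to the model or provider in this capture. Output tokens and cache-write surcharges are ignored.

```python
PRICE_PER_MTOK = 1.0          # hypothetical price per million input tokens
CACHE_READ_MULTIPLIER = 0.1   # assumed discount for tokens read from cache


def input_cost(tokens: int, cached: bool = False) -> float:
    """Input-side cost of one request under the assumed pricing."""
    rate = PRICE_PER_MTOK * (CACHE_READ_MULTIPLIER if cached else 1.0)
    return tokens * rate / 1_000_000


context_tokens = 120_223  # input size of the compact call in this article

compact_cost = input_cost(context_tokens)               # cold, full price
cached_turn = input_cost(context_tokens, cached=True)   # same context, warm cache

# How many fully cached turns cost the same as one compact call.
ratio = compact_cost / cached_turn
print(f"{ratio:.1f}")  # 10.0
```

Under these assumed numbers, a single compact costs about as much on the input side as ten ordinary turns that read the same context from cache, before counting the cold re-encoding of the summary on the next turn.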

So /compact is an emergency stop-loss tool, not an everyday cleanup tool. If the context hasn't blown up, leave it alone 😭