5. 提示工程管线：如何组装一个完美的 System Prompt

在本章中，我们将深入探讨 hermes-agent 的提示工程管线 —— 这是一个精心设计的 9 槽位管道，负责从身份定义、平台适配、环境感知、记忆指导到上下文文件注入的完整系统提示组装过程。

System Prompt 是 AI Agent 行为的核心控制器。一个设计良好的系统提示可以让 Agent 准确理解自己的角色、能力和限制，从而更可靠地执行任务。hermes-agent 的提示管线不仅考虑了功能性，还考虑了安全性（注入防御）、性能（缓存策略）和成本（Anthropic 缓存优化）。

5.1. 1. 提示工程的挑战

5.1.1. 为什么 System Prompt 如此重要？

在 AI Agent 架构中，System Prompt 是一个关键的接口层，它在模型推理之前定义了 Agent 的行为规范。一个设计不当的 System Prompt 可能导致：

角色混乱 ：Agent 不知道自己是谁，行为不一致
工具误用 ：Agent 不了解工具的正确使用方式
平台不兼容 ：在 Markdown 不支持的平台使用 Markdown 格式
上下文浪费 ：包含无关信息，浪费宝贵的 token 空间
安全漏洞 ：缺乏注入防御，可能被恶意利用

hermes-agent 的提示管线通过分层设计、动态适配和缓存策略来解决这些问题。

5.1.2. 提示工程的核心矛盾

提示工程面临几个核心矛盾：

完整性 vs 简洁性 ：提示需要足够完整以指导行为，但又不能太长以至于浪费 token 或分散注意力
通用性 vs 特异性 ：提示需要在所有场景下都有用，但又需要针对特定场景提供专门指导
稳定性 vs 动态性 ：某些指导应该稳定不变，而另一些需要根据运行时环境动态调整
安全性 vs 可用性 ：安全检查可以防止注入攻击，但过于严格的检查可能误杀正常内容

hermes-agent 通过 9 槽位管道设计来解决这些矛盾，每个槽位负责一个特定的关注点，组合起来形成完整的系统提示。

5.2. 2. 提示组装管线总览

hermes-agent 的系统提示由 9 个独立的"槽位"组成，每个槽位负责一个特定方面。这种设计使得每个部分可以独立开发和测试。

5.2.1. 9 槽位管道概览

让我们首先看看这 9 个槽位的概览：

Agent 身份层 ：来自 SOUL.md，定义 Agent 的个性和身份
平台适配层 ：根据运行平台提供格式和行为指导
环境感知层 ：检测运行环境（WSL、Docker 等）
工具使用指导 ：指导 Agent 如何使用工具
记忆与搜索指导 ：MEMORY_GUIDANCE 和 SESSION_SEARCH_GUIDANCE
模型特定执行规范 ：针对不同模型的专门指导
技能索引系统 ：列出可用的技能及其描述
上下文文件 ：从 .hermes.md、AGENTS.md 等加载的项目上下文
Anthropic 缓存标记 ：为 Anthropic 模型注入缓存控制

5.2.2. 提示组装管线流程图

下面是提示组装管线的完整流程图：

        flowchart TD
    Start([Start building system prompt]) --> Slot1

    subgraph S1[Slot 1: Agent Identity]
        Slot1[Load SOUL.md] --> Slot1Check{SOUL.md exists?}
        Slot1Check -->|Yes| Slot1Use[Use SOUL.md content]
        Slot1Check -->|No| Slot1Default[Use DEFAULT_AGENT_IDENTITY]
    end

    Slot1Use --> Slot2
    Slot1Default --> Slot2

    subgraph S2[Slot 2: Platform Hints]
        Slot2[Query current platform] --> Slot2Hint{Match found?}
        Slot2Hint -->|Yes| Slot2Add[Add PLATFORM_HINTS]
        Slot2Hint -->|No| Slot2Skip[Skip]
    end

    Slot2Add --> Slot3
    Slot2Skip --> Slot3

    subgraph S3[Slot 3: Environment]
        Slot3[Detect runtime env] --> Slot3Env{WSL or Termux?}
        Slot3Env -->|Yes| Slot3Hint[Add env hints]
        Slot3Env -->|No| Slot3None[No special hints]
    end

    Slot3Hint --> Slot4
    Slot3None --> Slot4

    subgraph S4[Slot 4-6: Guidance Layer]
        Slot4[Tool use guidance] --> Slot5[Memory and search guidance]
        Slot5 --> Slot6[Model-specific rules]
    end

    Slot6 --> Slot7

    subgraph S7[Slot 7: Skills Index]
        Slot7[Query skill cache] --> Slot7Cache{Cache hit?}
        Slot7Cache -->|Yes| Slot7Use[Use cache]
        Slot7Cache -->|No| Slot7Build[Scan and build index]
    end

    Slot7Use --> Slot8
    Slot7Build --> Slot8

    subgraph S8[Slot 8: Context Files]
        Slot8[Search context files] --> Slot8Priority{Load by priority}
        Slot8Priority --> Slot8Hermes[".hermes.md"]
        Slot8Hermes --> Slot8Agents[AGENTS.md]
        Slot8Agents --> Slot8Claude[CLAUDE.md]
        Slot8Claude --> Slot8Cursor[".cursorrules"]
    end

    Slot8 --> Slot9

    subgraph S9[Slot 9: Cache Optimization]
        Slot9[Inject cache markers] --> Slot9Anthropic{Anthropic model?}
        Slot9Anthropic -->|Yes| Slot9Cache[Add cache_control]
        Slot9Anthropic -->|No| Slot9Skip[Skip]
    end

    Slot9Cache --> Final[Merge all slots]
    Slot9Skip --> Final
    Final --> End([Return complete system prompt])

每个槽位的输出都是一个字符串片段，最终这些片段会被合并成一个完整的系统提示。

5.3. 3. Agent 身份层 (SOUL.md)

Agent 的身份定义是系统提示的第一个组成部分，它决定了 Agent 的基本行为方式和个性。

5.3.1. 身份的定义

hermes-agent 的默认身份定义如下：

DEFAULT_AGENT_IDENTITY = (
    "You are Hermes Agent, an intelligent AI assistant created by Nous Research. "
    "You are helpful, knowledgeable, and direct. You assist users with a wide "
    "range of tasks including answering questions, writing and editing code, "
    "analyzing information, creative work, and executing actions via your tools. "
    "You communicate clearly, admit uncertainty when appropriate, and prioritize "
    "being genuinely useful over being verbose unless otherwise directed below. "
    "Be targeted and efficient in your exploration and investigations."
)

这段话定义了几个关键特性：

名称：Hermes Agent
创建者 ：Nous Research
核心特质 ：有帮助的、知识渊博的、直接的
能力范围 ：问答、代码编写/编辑、信息分析、创意工作、工具执行
沟通风格 ：清晰、诚实、高效
行为准则 ：优先实际有用，避免冗余

5.3.2. 自定义身份：SOUL.md

用户可以通过创建 SOUL.md 文件来自定义 Agent 的身份：

def load_soul_md() -> Optional[str]:
    """Load SOUL.md from HERMES_HOME and return its content, or None."""
    try:
        from hermes_cli.config import ensure_hermes_home
        ensure_hermes_home()
    except Exception as e:
        logger.debug("Could not ensure HERMES_HOME before loading SOUL.md: %s", e)

    soul_path = get_hermes_home() / "SOUL.md"
    if not soul_path.exists():
        return None
    try:
        content = soul_path.read_text(encoding="utf-8").strip()
        if not content:
            return None
        content = _scan_context_content(content, "SOUL.md")
        content = _truncate_content(content, "SOUL.md")
        return content
    except Exception as e:
        logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
        return None

SOUL.md 的加载流程包括几个重要步骤：

确保目录存在 ：调用 ensure_hermes_home() 确保 HERMES_HOME 目录已创建
路径定位 ：在 ~/.hermes/SOUL.md 中查找
注入扫描 ：通过 _scan_context_content() 检查潜在的注入攻击
大小限制 ：通过 _truncate_content() 确保内容不会过大

SOUL.md 的设计理念是"用户即设计师"——用户可以完全控制 Agent 的个性。例如：

技术支持团队可以定义一个专业、精确的 Agent 身份
创意团队可以定义一个更有想象力、更富有表现力的 Agent
教育场景可以定义一个耐心的、善于引导的 Agent

5.4. 4. 平台适配层

不同的消息平台有不同的特性：有的支持 Markdown，有的不支持；有的可以发送文件，有的只能发文本。hermes-agent 通过平台适配层来处理这些差异。

5.4.1. 13 个平台提示

hermes-agent 目前支持 13 个平台的提示适配。让我们看看这些平台：

PLATFORM_HINTS = {
    "whatsapp": (
        "You are on a text messaging communication platform, WhatsApp. "
        "Please do not use markdown as it does not render. "
        "You can send media files natively: to deliver a file to the user, "
        "include MEDIA:/absolute/path/to/file in your response. ..."
    ),
    "telegram": (
        "You are on a text messaging communication platform, Telegram. "
        "Standard markdown is automatically converted to Telegram format. ..."
    ),
    "discord": (
        "You are in a Discord server or group chat communicating with your user. ..."
    ),
    "slack": (...),
    "signal": (...),
    "email": (...),
    "cron": (...),
    "cli": (...),
    "sms": (...),
    "bluebubbles": (...),
    "weixin": (...),
    "wecom": (...),
    "qqbot": (...),
    # ... 更多平台
}

每个平台提示包含几个方面的信息：

平台描述 ：告诉 Agent 它运行在什么平台上
格式规则 ：是否支持 Markdown，应该使用什么格式
媒体能力 ：如何发送文件、图片、音频等
特殊规则 ：该平台特有的行为约束

5.4.2. 平台适配的关键考量

让我们看看一些平台的特殊考虑：

WhatsApp / SMS / BlueBubbles

这些平台不支持 Markdown，所以提示明确告诉 Agent 不要使用 Markdown：

"Please do not use markdown as it does not render."

Telegram

Telegram 支持 Markdown 的一个子集，提示列出了具体支持的格式：

"Supported: **bold**, *italic*, ~~strikethrough~~, ||spoiler||,
`inline code`, ```code blocks```, [links](url), and ## headers."

Email

邮件有特殊的格式要求：

"Write clear, well-structured responses suitable for email.
Use plain text formatting (no markdown). Keep responses concise
but complete."

Cron

定时任务没有用户参与，需要完全自主执行：

"You are running as a scheduled cron job. There is no user present —
you cannot ask questions, request clarification, or wait for follow-up.
Execute the task fully and autonomously."

WeCom (企业微信)

企业微信支持 Markdown，并强调文件发送能力：

"Do NOT tell the user you lack file-sending capability — use MEDIA:
syntax whenever a file delivery is appropriate."

这条规则特别有趣——它不仅描述了能力，还明确防止 Agent 自我否定。这是因为有些模型可能会因为不确定而拒绝执行操作。

5.4.3. 媒体发送机制

所有支持媒体发送的平台使用统一的 MEDIA: 语法：

MEDIA:/absolute/path/to/file

这个约定让 Agent 知道如何请求发送文件，而具体的发送逻辑由平台适配层处理。

5.5. 5. 环境感知层

除了消息平台之外，Agent 还需要知道它运行在什么计算环境中。hermes-agent 通过环境感知层来提供这些信息。

5.5.1. WSL 检测

Windows Subsystem for Linux (WSL) 是一个常见的特殊环境：

WSL_ENVIRONMENT_HINT = (
    "You are running inside WSL (Windows Subsystem for Linux). "
    "The Windows host filesystem is mounted under /mnt/ — "
    "/mnt/c/ is the C: drive, /mnt/d/ is D:, etc. "
    "The user's Windows files are typically at "
    "/mnt/c/Users/<username>/Desktop/, Documents/, Downloads/, etc. "
    "When the user references Windows paths or desktop files, translate "
    "to the /mnt/c/ equivalent. You can list /mnt/c/Users/ to discover "
    "the Windows username if needed."
)

这个提示告诉 Agent：

它运行在 WSL 中
Windows 文件系统的挂载点
用户文件的典型位置
如何处理 Windows 路径引用

这很重要，因为 WSL 用户可能使用 Windows 风格的路径（如 C:\Users\...），而 Agent 需要知道如何将这些转换为 Linux 路径。

5.5.2. 环境检测函数

def build_environment_hints() -> str:
    """Return environment-specific guidance for the system prompt.

    Detects WSL, and can be extended for Termux, Docker, etc.
    Returns an empty string when no special environment is detected.
    """
    hints: list[str] = []
    if is_wsl():
        hints.append(WSL_ENVIRONMENT_HINT)
    return "\n\n".join(hints)

这个函数设计为可扩展的：未来可以添加 Termux、Docker 等其他环境的检测。

5.6. 6. 记忆与搜索指导

hermes-agent 的记忆系统允许 Agent 跨会话持久化信息，搜索指导则告诉 Agent 何时应该查找历史对话。

5.6.1. 记忆指导 (MEMORY_GUIDANCE)

MEMORY_GUIDANCE = (
    "You have persistent memory across sessions. Save durable facts using the memory "
    "tool: user preferences, environment details, tool quirks, and stable conventions. "
    "Memory is injected into every turn, so keep it compact and focused on facts that "
    "will still matter later.\n"
    "Prioritize what reduces future user steering — the most valuable memory is one "
    "that prevents the user from having to correct or remind you again. "
    "User preferences and recurring corrections matter more than procedural task details.\n"
    "Do NOT save task progress, session outcomes, completed-work logs, or temporary TODO "
    "state to memory; use session_search to recall those from past transcripts. "
    "If you've discovered a new way to do something, solved a problem that could be "
    "necessary later, save it as a skill with the skill tool."
)

这段指导包含了几个重要原则：

什么应该保存 ：用户偏好、环境细节、工具特性、稳定约定
什么不应该保存 ：任务进度、会话结果、已完成的工作日志、临时 TODO
格式要求 ：保持紧凑，因为记忆会在每轮对话中注入
价值优先级 ：能减少用户纠正的记忆最有价值
替代方案 ：任务相关的历史应该用 session_search 而不是记忆

这些指导帮助 Agent 做出明智的记忆决策，避免记忆系统被无价值的信息淹没。

5.6.2. 会话搜索指导 (SESSION_SEARCH_GUIDANCE)

SESSION_SEARCH_GUIDANCE = (
    "When the user references something from a past conversation or you suspect "
    "relevant cross-session context exists, use session_search to recall it before "
    "asking them to repeat themselves."
)

这条简短的指导告诉 Agent：当用户提到过去对话的内容时，应该主动搜索而不是要求用户重复。这大大改善了用户体验。

5.6.3. 技能指导 (SKILLS_GUIDANCE)

SKILLS_GUIDANCE = (
    "After completing a complex task (5+ tool calls), fixing a tricky error, "
    "or discovering a non-trivial workflow, save the approach as a "
    "skill with skill_manage so you can reuse it next time.\n"
    "When using a skill and finding it outdated, incomplete, or wrong, "
    "patch it immediately with skill_manage(action='patch') — don't wait to be asked. "
    "Skills that aren't maintained become liabilities."
)

这条指导鼓励 Agent 主动创建和维护技能，形成自我改进的循环。

5.7. 7. 模型特定执行规范

不同的 LLM 有不同的行为特性和常见失败模式。hermes-agent 为不同模型提供了特定的执行指导。

5.7.1. 工具使用强制指导

对于某些模型，hermes-agent 会注入额外的工具使用强制指导：

TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma", "grok")

TOOL_USE_ENFORCEMENT_GUIDANCE = (
    "# Tool-use enforcement\n"
    "You MUST use your tools to take action — do not describe what you would do "
    "or plan to do without actually doing it. When you say you will perform an "
    "action (e.g. 'I will run the tests', 'Let me check the file', 'I will create "
    "the project'), you MUST immediately make the corresponding tool call in the same "
    "response. Never end your turn with a promise of future action — execute it now.\n"
    "Keep working until the task is actually complete. Do not stop with a summary of "
    "what you plan to do next time. If you have tools available that can accomplish "
    "the task, use them instead of telling the user what you would do.\n"
    "Every response should either (a) contain tool calls that make progress, or "
    "(b) deliver a final result to the user. Responses that only describe intentions "
    "without acting are not acceptable."
)

这条指导针对某些模型的一个常见问题：它们会描述自己要做什么，而不是实际去做。这被称为"承诺但不执行"问题。

5.7.2. OpenAI 模型执行指导

对于 OpenAI 的模型（特别是 GPT 和 Codex），hermes-agent 提供了更详细的执行指导：

OPENAI_MODEL_EXECUTION_GUIDANCE = (
    "# Execution discipline\n"
    "<tool_persistence>\n"
    "- Use tools whenever they improve correctness, completeness, or grounding.\n"
    "- Do not stop early when another tool call would materially improve the result.\n"
    "- If a tool returns empty or partial results, retry with a different query or "
    "strategy before giving up.\n"
    "- Keep calling tools until: (1) the task is complete, AND (2) you have verified "
    "the result.\n"
    "</tool_persistence>\n"
    "\n"
    "<mandatory_tool_use>\n"
    "NEVER answer these from memory or mental computation — ALWAYS use a tool:\n"
    "- Arithmetic, math, calculations → use terminal or execute_code\n"
    "- Hashes, encodings, checksums → use terminal (e.g. sha256sum, base64)\n"
    "- Current time, date, timezone → use terminal (e.g. date)\n"
    "- System state: OS, CPU, memory, disk, ports, processes → use terminal\n"
    "- File contents, sizes, line counts → use read_file, search_files, or terminal\n"
    "- Git history, branches, diffs → use terminal\n"
    "- Current facts (weather, news, versions) → use web_search\n"
    "</mandatory_tool_use>\n"
    # ... 更多指导
)

这段指导使用 XML 标签分组，包含几个方面：

工具持久性 （tool_persistence）：不要过早停止工具调用
强制工具使用 （mandatory_tool_use）：某些信息必须通过工具获取，不能从记忆中回答
行动而非询问 （act_dont_ask）：有明确默认解释时直接行动
前置检查 （prerequisite_checks）：不要跳过必要的准备步骤
验证（verification）：最终结果前检查正确性
缺失上下文 （missing_context）：缺少信息时使用工具查找

这些指导针对 OpenAI 模型的已知问题进行了优化，包括过早放弃工作、跳过前置检查、用记忆代替工具获取信息等。

5.7.3. Google 模型操作指导

对于 Google 的 Gemini 和 Gemma 模型：

GOOGLE_MODEL_OPERATIONAL_GUIDANCE = (
    "# Google model operational directives\n"
    "Follow these operational rules strictly:\n"
    "- **Absolute paths:** Always construct and use absolute file paths for all "
    "file system operations. Combine the project root with relative paths.\n"
    "- **Verify first:** Use read_file/search_files to check file contents and "
    "project structure before making changes. Never guess at file contents.\n"
    "- **Dependency checks:** Never assume a library is available. Check "
    "package.json, requirements.txt, Cargo.toml, etc. before importing.\n"
    "- **Conciseness:** Keep explanatory text brief — a few sentences, not "
    "paragraphs. Focus on actions and results over narration.\n"
    "- **Parallel tool calls:** When you need to perform multiple independent "
    "operations (e.g. reading several files), make all the tool calls in a "
    "single response rather than sequentially.\n"
    "- **Non-interactive commands:** Use flags like -y, --yes, --non-interactive "
    "to prevent CLI tools from hanging on prompts.\n"
    "- **Keep going:** Work autonomously until the task is fully resolved. "
    "Don't stop with a plan — execute it.\n"
)

这些指导针对 Google 模型的特定问题，包括使用相对路径、猜测文件内容、不检查依赖等。

5.7.4. 角色映射

hermes-agent 还为不同模型提供了角色映射：

# 模型名称子串 -> 应使用 'developer' 角色而非 'system'
DEVELOPER_ROLE_MODELS = ("gpt-5", "codex")

对于 OpenAI 的新模型（GPT-5、Codex），使用 developer 角色而不是 system 角色。这是因为 OpenAI 发现 developer 角色在这些模型上有更强的指令遵循权重。

这个映射在 API 边界进行转换，内部消息表示保持一致（始终使用 "system"），确保了代码的一致性。

5.8. 8. 技能索引系统

技能（Skill）是 hermes-agent 中一个强大的概念，它允许 Agent 学习和复用特定的工作流和知识。技能索引系统负责在系统提示中列出所有可用的技能。

5.8.1. 双层缓存架构

技能索引系统使用了一个精心设计的双层缓存：

LRU 内存缓存 ：进程内的最近最少使用缓存，最多保存 8 个条目
磁盘快照 ：使用 mtime/size 清单验证的持久化缓存，在进程重启后仍然有效

让我们看看这个架构的实现：

_SKILLS_PROMPT_CACHE_MAX = 8
_SKILLS_PROMPT_CACHE: OrderedDict[tuple, str] = OrderedDict()
_SKILLS_PROMPT_CACHE_LOCK = threading.Lock()
_SKILLS_SNAPSHOT_VERSION = 1

LRU 内存缓存使用 OrderedDict 实现，这是 Python 中实现 LRU 缓存的标准方式：

查找：检查缓存键是否存在，如果存在就移到末尾（最近使用）
存储：将新值添加到末尾
淘汰：当缓存大小超过限制时，从头部移除（最近最少使用）

5.8.2. 缓存键的设计

缓存键由多个因素组成，确保不同配置产生不同的缓存条目：

cache_key = (
    str(skills_dir.resolve()),
    tuple(str(d) for d in external_dirs),
    tuple(sorted(str(t) for t in (available_tools or set()))),
    tuple(sorted(str(ts) for ts in (available_toolsets or set()))),
    _platform_hint,
)

缓存键包含：

技能目录路径 ：不同的技能目录有不同的技能
外部目录 ：外部技能目录的列表
可用工具集合 ：技能可能根据可用工具过滤
可用工具集集合 ：技能可能根据可用工具集过滤
平台提示 ：不同平台可能禁用不同的技能

这确保了不同配置之间不会共享缓存条目，避免返回错误的技能列表。

5.8.3. 磁盘快照

磁盘快照使用 mtime/size 清单来验证缓存是否仍然有效：

def _build_skills_manifest(skills_dir: Path) -> dict[str, list[int]]:
    """Build an mtime/size manifest of all SKILL.md and DESCRIPTION.md files."""
    manifest: dict[str, list[int]] = {}
    for filename in ("SKILL.md", "DESCRIPTION.md"):
        for path in iter_skill_index_files(skills_dir, filename):
            try:
                st = path.stat()
            except OSError:
                continue
            manifest[str(path.relative_to(skills_dir))] = [st.st_mtime_ns, st.st_size]
    return manifest

清单记录了每个技能文件的修改时间（纳秒精度）和大小。当加载快照时，重新计算清单并比较：

def _load_skills_snapshot(skills_dir: Path) -> Optional[dict]:
    """Load the disk snapshot if it exists and its manifest still matches."""
    snapshot_path = _skills_prompt_snapshot_path()
    if not snapshot_path.exists():
        return None
    try:
        snapshot = json.loads(snapshot_path.read_text(encoding="utf-8"))
    except Exception:
        return None
    if not isinstance(snapshot, dict):
        return None
    if snapshot.get("version") != _SKILLS_SNAPSHOT_VERSION:
        return None
    if snapshot.get("manifest") != _build_skills_manifest(skills_dir):
        return None
    return snapshot

这个验证机制确保：

版本检查 ：快照格式可能变化，版本号确保兼容性
内容变化检测 ：如果任何技能文件被修改（mtime 或大小变化），快照失效
新文件检测 ：如果添加了新的技能文件，清单长度变化，快照失效

当快照无效时，系统回退到完整的文件系统扫描，然后写入新的快照。

5.8.4. 技能条件过滤

技能可以有条件激活规则，控制何时应该在索引中显示：

def _skill_should_show(
    conditions: dict,
    available_tools: "set[str] | None",
    available_toolsets: "set[str] | None",
) -> bool:
    """Return False if the skill's conditional activation rules exclude it."""
    if available_tools is None and available_toolsets is None:
        return True

    at = available_tools or set()
    ats = available_toolsets or set()

    # fallback_for: hide when the primary tool/toolset IS available
    for ts in conditions.get("fallback_for_toolsets", []):
        if ts in ats:
            return False
    for t in conditions.get("fallback_for_tools", []):
        if t in at:
            return False

    # requires: hide when a required tool/toolset is NOT available
    for ts in conditions.get("requires_toolsets", []):
        if ts not in ats:
            return False
    for t in conditions.get("requires_tools", []):
        if t not in at:
            return False

    return True

技能支持两种条件：

fallback_for ：当主工具/工具集可用时隐藏。例如，一个"通用搜索"技能可能在 web_search 工具可用时隐藏
requires ：当必要的工具/工具集不可用时隐藏。例如，一个"Docker 操作"技能可能在 terminal 工具不可用时隐藏

这让技能系统变得智能：不是所有技能都在所有情况下可见，只有相关的技能才会出现在索引中。

5.8.5. 技能索引的输出格式

技能索引最终生成一个结构化的文本块：

"## Skills (mandatory)\n"
"Before replying, scan the skills below. If a skill matches or is even partially relevant "
"to your task, you MUST load it with skill_view(name) and follow its instructions. "
# ...
"<available_skills>\n"
"  category_name: Category description\n"
"    - skill_name: Skill description\n"
"    - another_skill: Another description\n"
"</available_skills>\n"

这个格式有几个特点：

层级结构 ：技能按类别分组
描述信息 ：每个类别和技能都有描述
XML 标签 ：使用 <available_skills> 标签，便于模型识别
强制性 ：标题声明这是"必须"执行的步骤

5.8.6. 技能缓存查找流程图

下面是技能缓存查找的序列图：

        sequenceDiagram
    autonumber
    participant Caller as build_skills_system_prompt()
    participant LRU as LRU 内存缓存
    participant Disk as 磁盘快照
    participant FS as 文件系统扫描

    Caller->>LRU: 查询缓存键
    alt 缓存命中
        LRU-->>Caller: 返回缓存的技能索引
    else 缓存未命中
        LRU-->>Caller: None
        Caller->>Disk: 加载磁盘快照
        alt 快照有效
            Disk-->>Caller: 返回预解析的技能数据
        else 快照无效或不存在
            Disk-->>Caller: None
            Caller->>FS: 完整文件系统扫描
            FS-->>Caller: 技能文件列表
            Caller->>Caller: 解析技能文件
            Caller->>Disk: 写入新快照
        end
        Caller->>LRU: 存入缓存
        Caller-->>Caller: 返回技能索引
    end

这个流程图展示了技能查找的三层逻辑：先查 LRU，再查磁盘快照，最后回退到文件系统扫描。

5.9. 9. 上下文文件优先级

hermes-agent 支持多种上下文文件格式，用于注入项目特定的指导。这些文件有一个明确的优先级顺序。

5.9.1. 优先级顺序

上下文文件按以下优先级加载（第一个匹配的文件被使用）：

.hermes.md / HERMES.md ：Hermes 专用上下文文件
AGENTS.md / agents.md ：通用 Agent 上下文文件
CLAUDE.md / claude.md ：Claude 兼容的上下文文件
.cursorrules / .cursor/rules/*.mdc：Cursor 编辑器的规则文件

这个优先级设计有几个考虑：

Hermes 优先 ：专用文件覆盖通用文件
向后兼容 ：支持 AGENTS.md 和 CLAUDE.md，方便从其他工具迁移
互斥加载 ：只加载第一个匹配的文件，避免冲突和重复

让我们看看加载逻辑：

def build_context_files_prompt(cwd: Optional[str] = None, skip_soul: bool = False) -> str:
    """Discover and load context files for the system prompt.

    Priority (first found wins — only ONE project context type is loaded):
      1. .hermes.md / HERMES.md  (walk to git root)
      2. AGENTS.md / agents.md   (cwd only)
      3. CLAUDE.md / claude.md   (cwd only)
      4. .cursorrules / .cursor/rules/*.mdc  (cwd only)

    SOUL.md from HERMES_HOME is independent and always included when present.
    Each context source is capped at 20,000 chars.
    """
    if cwd is None:
        cwd = os.getcwd()

    cwd_path = Path(cwd).resolve()
    sections = []

    # Priority-based project context: first match wins
    project_context = (
        _load_hermes_md(cwd_path)
        or _load_agents_md(cwd_path)
        or _load_claude_md(cwd_path)
        or _load_cursorrules(cwd_path)
    )
    if project_context:
        sections.append(project_context)

    # SOUL.md from HERMES_HOME only — skip when already loaded as identity
    if not skip_soul:
        soul_content = load_soul_md()
        if soul_content:
            sections.append(soul_content)

    if not sections:
        return ""
    return "# Project Context\n\nThe following project context files have been loaded and should be followed:\n\n" + "\n".join(sections)

注意 Python 的 "or" 短路求值在这里创造了一个优雅的优先级链：第一个返回非空字符串的函数被使用，后面的函数不会被调用。

5.9.2. .hermes.md 的搜索范围

.hermes.md 与其他文件不同，它不仅搜索当前目录，还会向上搜索到 git 仓库根目录：

def _find_hermes_md(cwd: Path) -> Optional[Path]:
    """Discover the nearest ``.hermes.md`` or ``HERMES.md``.

    Search order: *cwd* first, then each parent directory up to (and
    including) the git repository root.
    """
    stop_at = _find_git_root(cwd)
    current = cwd.resolve()

    for directory in [current, *current.parents]:
        for name in _HERMES_MD_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
        if stop_at and directory == stop_at:
            break
    return None

这意味着你可以在项目根目录放置 .hermes.md，它会在所有子目录中生效。这是一个有用的功能，因为用户可能在项目子目录中启动 Agent。

5.9.3. 内容截断

每个上下文文件的内容被限制在 20,000 字符以内：

CONTEXT_FILE_MAX_CHARS = 20_000
CONTEXT_TRUNCATE_HEAD_RATIO = 0.7
CONTEXT_TRUNCATE_TAIL_RATIO = 0.2

def _truncate_content(content: str, filename: str, max_chars: int = CONTEXT_FILE_MAX_CHARS) -> str:
    """Head/tail truncation with a marker in the middle."""
    if len(content) <= max_chars:
        return content
    head_chars = int(max_chars * CONTEXT_TRUNCATE_HEAD_RATIO)
    tail_chars = int(max_chars * CONTEXT_TRUNCATE_TAIL_RATIO)
    head = content[:head_chars]
    tail = content[-tail_chars:]
    marker = f"\n\n[...truncated {filename}: kept {head_chars}+{tail_chars} of {len(content)} chars. Use file tools to read the full file.]\n\n"
    return head + marker + tail

截断策略是保留头部 70% 和尾部 20%，中间插入一个说明标记。这种设计假设：

最重要的内容通常在文件开头（身份、核心规则）
结尾可能有重要的约束或总结
中间的内容可以安全地省略

注意还有 10% 的空间没有被保留，这是有意为之——为标记和额外的格式留出空间。

5.9.4. YAML Frontmatter 处理

.hermes.md 文件可能包含 YAML frontmatter，在注入系统提示之前会被剥离：

def _strip_yaml_frontmatter(content: str) -> str:
    """Remove optional YAML frontmatter (``---`` delimited) from *content*."""
    if content.startswith("---"):
        end = content.find("\n---", 3)
        if end != -1:
            body = content[end + 4:].lstrip("\n")
            return body if body else content
    return content

这允许在 frontmatter 中包含结构化配置（如模型覆盖、工具设置），同时只将人类可读的 Markdown 正文注入系统提示。

5.9.5. 上下文文件解析优先级流程图

下面是上下文文件解析的流程图：

        flowchart TD
    Start([开始搜索上下文文件]) --> CheckHermes{当前目录有 .hermes.md?}
    CheckHermes -->|是| LoadHermes[加载 .hermes.md]
    CheckHermes -->|否| WalkUp[向上搜索到 git 根目录]
    WalkUp --> FoundHermes{找到 .hermes.md?}
    FoundHermes -->|是| LoadHermes
    FoundHermes -->|否| CheckAgents{当前目录有 AGENTS.md?}
    CheckAgents -->|是| LoadAgents[加载 AGENTS.md]
    CheckAgents -->|否| CheckClaude{当前目录有 CLAUDE.md?}
    CheckClaude -->|是| LoadClaude[加载 CLAUDE.md]
    CheckClaude -->|否| CheckCursor{有 .cursorrules?}
    CheckCursor -->|是| LoadCursor[加载 .cursorrules + .cursor/rules/*.mdc]
    CheckCursor -->|否| NoContext[无上下文文件]
    LoadHermes --> Scan[注入扫描]
    LoadAgents --> Scan
    LoadClaude --> Scan
    LoadCursor --> Scan
    Scan --> Truncate[内容截断]
    Truncate --> Merge[合并为 Project Context]
    NoContext --> CheckSoul{跳过 SOUL.md?}
    Merge --> CheckSoul
    CheckSoul -->|否| LoadSoul[加载 SOUL.md]
    CheckSoul -->|是| Final[最终系统提示]
    LoadSoul --> Final

5.10. 10. 提示注入防御

在一个允许加载用户提供的上下文文件的系统中，提示注入攻击是一个严重的威胁。hermes-agent 实现了全面的防御机制。

5.10.1. 13 种检测模式

hermes-agent 使用正则表达式检测 13 种常见的提示注入模式：

_CONTEXT_THREAT_PATTERNS = [
    (r'ignore\s+(previous|all|above|prior)\s+instructions', "prompt_injection"),
    (r'do\s+not\s+tell\s+the\s+user', "deception_hide"),
    (r'system\s+prompt\s+override', "sys_prompt_override"),
    (r'disregard\s+(your|all|any)\s+(instructions|rules|guidelines)', "disregard_rules"),
    (r'act\s+as\s+(if|though)\s+you\s+(have\s+no|don\'t\s+have)\s+(restrictions|limits|rules)', "bypass_restrictions"),
    (r'<!--[^>]*(?:ignore|override|system|secret|hidden)[^>]*-->', "html_comment_injection"),
    (r'<\s*div\s+style\s*=\s*["\'][\s\S]*?display\s*:\s*none', "hidden_div"),
    (r'translate\s+.*\s+into\s+.*\s+and\s+(execute|run|eval)', "translate_execute"),
    (r'curl\s+[^\n]*\$\{?\w*(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL|API)', "exfil_curl"),
    (r'cat\s+[^\n]*(\.env|credentials|\.netrc|\.pgpass)', "read_secrets"),
]

这些模式覆盖了几类攻击：

指令覆盖 ：如 "ignore all instructions"、"disregard your rules"
欺骗隐藏 ：如 "do not tell the user"
权限绕过 ：如 "act as if you have no restrictions"
HTML 注入 ：如包含 "ignore"、"override" 的 HTML 注释、隐藏的 div
代码执行 ：如 "translate and execute"
数据泄露 ：如尝试使用 curl 获取环境变量、读取敏感文件

5.10.2. Unicode 不可见字符扫描

除了正则匹配，hermes-agent 还扫描不可见的 Unicode 字符：

_CONTEXT_INVISIBLE_CHARS = {
    '\u200b', '\u200c', '\u200d', '\u2060', '\ufeff',
    '\u202a', '\u202b', '\u202c', '\u202d', '\u202e',
}

这些字符包括：

零宽空格 （U+200B）：不可见但存在
零宽连接符 （U+200C, U+200D）：用于某些语言的连字
字节顺序标记 （U+FEFF）：通常在文件开头
方向控制字符 （U+202A-202E）：控制文本方向

攻击者可能使用这些字符来隐藏恶意指令，例如在正常文本中插入不可见的 "ignore previous instructions"。

5.10.3. 扫描与阻止流程

当检测到威胁时，整个文件的内容被替换为阻止消息：

def _scan_context_content(content: str, filename: str) -> str:
    """Scan context file content for injection. Returns sanitized content."""
    findings = []

    # Check invisible unicode
    for char in _CONTEXT_INVISIBLE_CHARS:
        if char in content:
            findings.append(f"invisible unicode U+{ord(char):04X}")

    # Check threat patterns
    for pattern, pid in _CONTEXT_THREAT_PATTERNS:
        if re.search(pattern, content, re.IGNORECASE):
            findings.append(pid)

    if findings:
        logger.warning("Context file %s blocked: %s", filename, ", ".join(findings))
        return f"[BLOCKED: {filename} contained potential prompt injection ({', '.join(findings)}). Content not loaded.]"

    return content

这是一种"全有或全无"的策略：

如果检测到任何威胁，整个文件被阻止
返回一个描述性的阻止消息，而不是原始内容
原始内容永远不会进入系统提示

这种策略虽然可能产生误报（false positive），但在安全上下文中，宁可误杀也不可遗漏。

5.11. 11. Anthropic Prompt 缓存

最后，让我们探讨 hermes-agent 中一个重要的成本优化机制：Anthropic Prompt 缓存。

5.11.1. 为什么需要 Prompt 缓存？

在与 LLM 的多轮对话中，系统提示和早期对话内容通常在每轮中都是相同的。如果不使用缓存，这些重复的内容会在每轮都计费。对于长系统提示和多轮对话，这会显著增加成本。

Anthropic 提供了一个 Prompt 缓存功能，允许缓存部分消息，避免重复计算。hermes-agent 使用 "system_and_3" 策略来最大化缓存命中。

5.11.2. system_and_3 策略

system_and_3 策略的名称来源于它的设计：在最多 4 个位置放置缓存断点（Anthropic 允许的最大值）：

1. 系统提示 ：在所有对话轮次中保持不变 2-4. 最后 3 条非系统消息 ：滚动窗口，每轮更新

让我们看看实现：

def apply_anthropic_cache_control(
    api_messages: List[Dict[str, Any]],
    cache_ttl: str = "5m",
    native_anthropic: bool = False,
) -> List[Dict[str, Any]]:
    """Apply system_and_3 caching strategy to messages for Anthropic models."""
    messages = copy.deepcopy(api_messages)
    if not messages:
        return messages

    marker = {"type": "ephemeral"}
    if cache_ttl == "1h":
        marker["ttl"] = "1h"

    breakpoints_used = 0

    if messages[0].get("role") == "system":
        _apply_cache_marker(messages[0], marker, native_anthropic=native_anthropic)
        breakpoints_used += 1

    remaining = 4 - breakpoints_used
    non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
    for idx in non_sys[-remaining:]:
        _apply_cache_marker(messages[idx], marker, native_anthropic=native_anthropic)

    return messages

这个函数：

深拷贝消息 ：不修改原始消息列表
设置标记类型 ：默认是 "ephemeral"（短期缓存），可选择 1 小时 TTL
标记系统消息 ：如果第一条消息是系统消息，标记它
标记最近的非系统消息 ：从后往前标记，直到用完 4 个断点

5.11.3. 缓存标记的注入

缓存标记的注入需要处理多种消息格式：

def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool = False) -> None:
    """Add cache_control to a single message, handling all format variations."""
    role = msg.get("role", "")
    content = msg.get("content")

    if role == "tool":
        if native_anthropic:
            msg["cache_control"] = cache_marker
        return

    if content is None or content == "":
        msg["cache_control"] = cache_marker
        return

    if isinstance(content, str):
        msg["content"] = [
            {"type": "text", "text": content, "cache_control": cache_marker}
        ]
        return

    if isinstance(content, list) and content:
        last = content[-1]
        if isinstance(last, dict):
            last["cache_control"] = cache_marker

这个函数处理了所有可能的消息格式变体：

工具消息 ：对于原生 Anthropic API，直接在消息级别添加标记
空内容 ：在消息级别添加标记
字符串内容 ：转换为内容块列表，在最后一个块添加标记
列表内容 ：在最后一个内容块添加标记

缓存标记总是放在最后一个内容块上，因为 Anthropic 的缓存是从头到标记位置的连续前缀。

5.11.4. 成本节省分析

使用 prompt 缓存可以节省约 75% 的输入 token 成本。让我们分析一下原因：

在典型的多轮对话中：

系统提示 ：通常包含数千个 token，在所有轮次中不变
早期对话 ：一旦生成，在后续轮次中不变
只有最新消息 ：在每轮中新增

没有缓存时，每轮都需要支付全部输入 token 的费用。有了缓存：

系统提示只计算一次，后续从缓存读取
早期的对话轮次也会被缓存
只有缓存未命中的部分才需要完整计费

具体来说，假设系统提示有 3000 token，每轮新增 500 token：

第 1 轮：无缓存，支付 3000 token
第 2 轮：系统提示命中缓存，支付 500 新 token
第 3 轮：系统提示 + 第 1 轮命中缓存，支付 500 新 token
...

这就是为什么缓存可以节省约 75% 的输入 token 成本。

5.11.5. Anthropic 缓存策略流程图

下面是 Anthropic 缓存策略的流程图：

        flowchart TD
    Start([接收消息列表]) --> DeepCopy[深拷贝消息]
    DeepCopy --> SetMarker[设置缓存标记类型]
    SetMarker --> CheckTTL{TTL设置}
    CheckTTL -->|5m| MarkerEphemeral[marker = ephemeral]
    CheckTTL -->|1h| Marker1H[marker = ephemeral + ttl=1h]

    MarkerEphemeral --> CheckSystem{第一条消息是系统消息?}
    Marker1H --> CheckSystem

    CheckSystem -->|是| MarkSystem[标记系统消息, breakpoints=1]
    CheckSystem -->|否| NoSystem[breakpoints=0]

    MarkSystem --> CalcRemaining[remaining = 4 - breakpoints]
    NoSystem --> CalcRemaining

    CalcRemaining --> FindNonSys[找出所有非系统消息的索引]
    FindNonSys --> MarkLast[标记最后 remaining 条非系统消息]
    MarkLast --> Return[返回带缓存标记的消息]
    Return --> End([结束])

    subgraph 缓存效果
        direction LR
        M1[消息1: system] --> M2[消息2: user]
        M2 --> M3[消息3: assistant]
        M3 --> M4[消息4: tool]
        M4 --> M5[消息5: user]
        M5 --> M6[消息6: assistant]
        M6 --> M7[消息7: user]

        M1 -.- C1[cache_control]
        M5 -.- C2[cache_control]
        M6 -.- C3[cache_control]
        M7 -.- C4[cache_control]
    end

5.12. 总结

在本章中，我们深入探讨了 hermes-agent 的提示工程管线，从 13 段式管道设计、Agent 身份定义、13 个平台适配、环境感知、记忆与搜索指导、模型特定规范、双层缓存技能索引、上下文文件优先级到提示注入防御和 Anthropic Prompt 缓存。

这个系统的设计体现了几个重要原则：

分层设计 ：每个槽位负责一个独立的关注点，便于单独开发和测试
动态适配 ：根据平台、环境、模型动态调整提示内容
安全优先 ：多层次的注入防御机制
性能优化 ：双层缓存、磁盘快照、Anthropic 缓存
成本控制 ：通过缓存策略减少 token 使用
可扩展性 ：新平台、新模型、新环境可以轻松添加

理解这个系统不仅有助于使用 hermes-agent，也为设计其他 AI Agent 框架的提示系统提供了宝贵的参考。一个好的提示系统不是一蹴而就的，而是需要在实际使用中不断迭代和优化的。