Leaning toward Scheme B.
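Leaning toward Scheme B. In Scheme A, losing a key means losing the data. From a data-availability and product-usability standpoint, that doesn't seem acceptable? Unrecoverable data is hard for a system to accept. PS: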
English Version
🔍 Review Items
Already confirmed (no review needed): Algorithm selection (§1), API Key hashing (§4.4), VectorDB no-encryption strategy (§5).
Context
Currently all sensitive data in OpenViking's AGFS storage layer is stored in plaintext:
- `/{account_id}/_system/users.json` (AGFS), `secrets.token_hex(32)`
- `/local/{account_id}/...` (AGFS local)
- `.relations.json` (AGFS)
- `/_system/accounts.json` (AGFS)

Threat Model
OpenViking is a multi-tenant system where different customers (accounts) store data (resources, memories, skills) in the same server-side AGFS.
Core Threat: Anyone with server storage access (ops personnel, DBAs, or attackers who compromised the storage system) can directly read any customer's plaintext data.
Protection Goal: Even if an attacker obtains all files on the AGFS disk, without the corresponding account's encryption key, they cannot read any customer's file content.
Encryption Scope
`ov.conf` (config file)

1. Algorithm Selection
This design involves two confirmed cryptographic choices and one optional choice that depends on which AGFS scheme is selected:
1.1 Symmetric Encryption Selection
Requirement: Encrypt AGFS file content (L0/L1/L2 text, JSON, etc.) with both confidentiality and integrity.
Note: one of the compared candidates has no direct support in the `cryptography` library.

Final Selection: AES-256-GCM (NIST SP 800-38D)
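For reference, a minimal AES-256-GCM round trip with the `cryptography` package's `AESGCM` class. The `nonce || ciphertext` layout and the use of the account ID as associated data are assumptions for illustration, not the Envelope Format v1 described in §4.1.

```python
import os
from typing import Optional

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the size recommended for GCM


def encrypt(key: bytes, plaintext: bytes, aad: Optional[bytes] = None) -> bytes:
    """AES-256-GCM encrypt; returns nonce || ciphertext || 16-byte tag."""
    nonce = os.urandom(NONCE_LEN)  # must never repeat under the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, aad)


def decrypt(key: bytes, blob: bytes, aad: Optional[bytes] = None) -> bytes:
    """Split off the nonce and decrypt; raises InvalidTag on any tampering."""
    return AESGCM(key).decrypt(blob[:NONCE_LEN], blob[NONCE_LEN:], aad)


if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)
    blob = encrypt(key, b"print('hello')", aad=b"acc_teamA")
    assert decrypt(key, blob, aad=b"acc_teamA") == b"print('hello')"
    flipped = blob[:-1] + bytes([blob[-1] ^ 1])  # flip one bit of the tag
    try:
        decrypt(key, flipped, aad=b"acc_teamA")
    except InvalidTag:
        print("tampering detected")
```

The tag check is what provides the integrity half of the requirement: any modified byte, or mismatched associated data, makes `decrypt` raise instead of returning garbage.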
1.2 Key Derivation Function (Scheme B only)
Final Selection: HKDF-SHA256 (RFC 5869). The `info` parameter binds `account_id` for per-account isolation; cost is under 1 μs per call.

1.3 API Key Hash Selection
Requirement: API Keys only need verification (one-way) after storage. Must resist brute-force.
Final Selection: Argon2id (RFC 9106)
Parameters: memory=64 MiB, iterations=3, parallelism=4

1.4 Summary & Referenced Standards
2. AGFS File Encryption: Key Source Scheme 🔍 Under Review
Core question: Where does the encryption key come from, and who holds it? This determines the security boundary and system complexity.
2.1 Scheme A: Client-Provided Key
How it works: Each account has its own encryption key, held by the client. Every API request carries the key in the `X-Encryption-Key` header. The server holds the key only while processing the request and never persists it.

Security boundary: The server does not store keys. Even with disk and process access, operators cannot decrypt data, because the key exists only briefly in request memory.
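A sketch of how the server side of Scheme A might pull the key out of the `X-Encryption-Key` header. It assumes the client sends the raw 32-byte key base64-encoded; that encoding and the function name `key_from_headers` are hypothetical.

```python
import base64
import binascii
from typing import Mapping

KEY_HEADER = "X-Encryption-Key"
KEY_LEN = 32  # AES-256


def key_from_headers(headers: Mapping[str, str]) -> bytes:
    """Return the per-request AES-256 key sent by the client (Scheme A).

    The key lives only in this request's memory and is never persisted.
    Note: a plain dict is case-sensitive; web frameworks usually provide a
    case-insensitive header mapping.
    """
    value = headers.get(KEY_HEADER)
    if value is None:
        raise PermissionError(f"missing {KEY_HEADER} header")
    try:
        key = base64.b64decode(value, validate=True)  # assumed encoding
    except binascii.Error as exc:
        raise ValueError("encryption key is not valid base64") from exc
    if len(key) != KEY_LEN:
        raise ValueError(f"encryption key must be {KEY_LEN} bytes")
    return key
```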
Pros:
Cons:
`X-Encryption-Key` header

2.2 Scheme B: Server-side Root Key Derivation (Envelope Encryption) ⭐ Recommended
How it works: The server manages a master key (Root Key). The client needs no knowledge of encryption; encryption is fully transparent:
HKDF(Root Key, "acc_teamA") → teamA's dedicated key, called the account key (industry term: KEK).

Security boundary: All data on disk is ciphertext, so direct storage access cannot read it. However, the server holds the Root Key at runtime, so anyone with access to the Root Key can decrypt every account's data.
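The derivation above maps directly onto the `cryptography` package's `HKDF` class. A minimal sketch, assuming the hypothetical `info` label `openviking:account-key:v1:{account_id}`; the design itself only says that `info` binds `account_id` and carries a `v1` marker, and that no salt is used.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def derive_account_key(root_key: bytes, account_id: str) -> bytes:
    """HKDF-SHA256(Root Key, account_id) -> 32-byte account key (KEK)."""
    hkdf = HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=None,  # no salt: the Root Key is already high-entropy
        # Hypothetical info label; "v1" leaves room for a future scheme change
        info=f"openviking:account-key:v1:{account_id}".encode(),
    )
    return hkdf.derive(root_key)  # HKDF objects are single-use


if __name__ == "__main__":
    root_key = bytes(32)  # demo only; a real Root Key comes from a provider
    k_a = derive_account_key(root_key, "acc_teamA")
    k_b = derive_account_key(root_key, "acc_teamB")
    assert k_a == derive_account_key(root_key, "acc_teamA")  # deterministic
    assert k_a != k_b  # per-account isolation
```

Because the output is deterministic, account keys never need to be stored: the server can re-derive them from the Root Key on every request.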
Pros:
Cons:
Rationale: Under our threat model (protecting against direct storage-layer access), Scheme B provides sufficient security with minimal architectural impact. Root Key protection can be strengthened further with a KMS (Vault / AWS KMS), which is standard industry practice.
2.3 Scheme Comparison
2.4 Review Decision Points
Sections 2.5 through 3 below are Scheme B's detailed design and will be adjusted once the scheme is confirmed.
2. Scheme Details: Flow Diagrams & Async Sub-schemes
Scheme A Flow Diagram
Issues to solve:
- After `add_resource()` returns, the background SemanticQueue generates L0/L1 and writes them to AGFS, but no client request context (and therefore no client key) is available at that point.

Implementation plan (if chosen):
Scheme B Flow Diagram
Implementation plan (if chosen):
2.5 Three-Layer Model & OpenViking Component Mapping (Scheme B Detailed Design)
Scheme B uses the three-layer Envelope Encryption key model:
Walkthrough: acc_teamA uploads `viking://resources/utils.py`.

Write flow (`VikingFS.write()`):

Then the semantic queue generates L0/L1, which are also encrypted:

Read flow (`VikingFS.read()`):

Cross-account isolation: teamB's derived key ≠ teamA's, so decryption fails with `InvalidTag`.

Why Three Layers
Account Key Derivation
- `salt`: omitted (the Root Key is already high-entropy)
- `info`: binds `account_id`
- `v1`: reserved for future upgrades
- Output is deterministic

File Key & Envelope
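A minimal sketch of the file-key/envelope step, assuming AES-256-GCM for both layers, `account_id` as associated data, and a hypothetical in-memory `Envelope` container rather than the on-disk Format v1. Each write gets a fresh random file key, the content is encrypted with it, and the file key itself is encrypted (wrapped) with the account key. The demo at the end reproduces the cross-account `InvalidTag` case from the walkthrough.

```python
import os
from dataclasses import dataclass

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12


@dataclass(frozen=True)
class Envelope:
    wrapped_file_key: bytes  # nonce || AES-GCM(account key, file key)
    ciphertext: bytes        # nonce || AES-GCM(file key, content)


def _seal(key: bytes, data: bytes, aad: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    return nonce + AESGCM(key).encrypt(nonce, data, aad)


def _open(key: bytes, blob: bytes, aad: bytes) -> bytes:
    return AESGCM(key).decrypt(blob[:NONCE_LEN], blob[NONCE_LEN:], aad)


def seal_file(account_key: bytes, account_id: str, content: bytes) -> Envelope:
    """Encrypt content under a fresh file key, then wrap that key with the account key."""
    file_key = AESGCM.generate_key(bit_length=256)  # new random key per write
    aad = account_id.encode()  # binds both layers to the owning account
    return Envelope(_seal(account_key, file_key, aad), _seal(file_key, content, aad))


def open_file(account_key: bytes, account_id: str, env: Envelope) -> bytes:
    """Unwrap the file key, then decrypt the content; raises InvalidTag on a wrong key."""
    aad = account_id.encode()
    file_key = _open(account_key, env.wrapped_file_key, aad)
    return _open(file_key, env.ciphertext, aad)


if __name__ == "__main__":
    key_a, key_b = os.urandom(32), os.urandom(32)  # stand-ins for derived account keys
    env = seal_file(key_a, "acc_teamA", b"def util(): ...")
    assert open_file(key_a, "acc_teamA", env) == b"def util(): ..."
    try:
        open_file(key_b, "acc_teamA", env)  # teamB's key cannot unwrap teamA's file key
    except InvalidTag:
        print("cross-account read rejected")
```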
3. Root Key Provider Detailed Design (Scheme B)
Three implementations for different environments:
3.1 Abstract Interface
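One possible shape for the interface, sketched under assumptions: the class and method names are hypothetical, and the interface is modelled as wrap/unwrap of per-file keys because the Vault and AWS KMS providers never hand out the Root Key itself. The local-file implementation reads the hex-encoded key file, refuses files readable by group or others, and derives the account key with HKDF-SHA256. `init_key_file` is a hypothetical helper showing what generating such a file involves; it is not the implementation of `openviking-cli crypto init-key`.

```python
import os
import stat
from abc import ABC, abstractmethod
from pathlib import Path

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


class RootKeyProvider(ABC):
    """Hypothetical interface: wrap/unwrap per-file keys on behalf of an account."""

    @abstractmethod
    def wrap_file_key(self, account_id: str, file_key: bytes) -> bytes:
        """Encrypt a file key so that only this account can recover it."""

    @abstractmethod
    def unwrap_file_key(self, account_id: str, wrapped: bytes) -> bytes:
        """Recover a file key previously returned by wrap_file_key()."""


class LocalFileProvider(RootKeyProvider):
    """Dev/single-node provider backed by a hex-encoded key file."""

    def __init__(self, key_file: str) -> None:
        path = Path(key_file).expanduser()
        if stat.S_IMODE(path.stat().st_mode) & 0o077:  # expect chmod 0600
            raise PermissionError(f"{path} is readable by group or others")
        self._root_key = bytes.fromhex(path.read_text().strip())
        if len(self._root_key) != 32:
            raise ValueError("Root Key must be 32 bytes (64 hex characters)")

    def _account_key(self, account_id: str) -> bytes:
        # HKDF-SHA256 without salt; the info label is a hypothetical example
        return HKDF(
            algorithm=hashes.SHA256(),
            length=32,
            salt=None,
            info=f"openviking:account-key:v1:{account_id}".encode(),
        ).derive(self._root_key)

    def wrap_file_key(self, account_id: str, file_key: bytes) -> bytes:
        nonce = os.urandom(12)
        aead = AESGCM(self._account_key(account_id))
        return nonce + aead.encrypt(nonce, file_key, account_id.encode())

    def unwrap_file_key(self, account_id: str, wrapped: bytes) -> bytes:
        aead = AESGCM(self._account_key(account_id))
        return aead.decrypt(wrapped[:12], wrapped[12:], account_id.encode())


def init_key_file(path: str) -> None:
    """Hypothetical helper: write 32 random bytes, hex-encoded, with mode 0600."""
    p = Path(path).expanduser()
    p.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(p, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)  # never overwrite
    with os.fdopen(fd, "w") as f:
        f.write(os.urandom(32).hex())
```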
3.2 Local File Provider (Dev / Single-node)
- Key file: `~/.openviking/master.key` (hex-encoded, `chmod 0600`)
- Config: `{"encryption": {"enabled": true, "provider": "local", "local": {"key_file": "~/.openviking/master.key"}}}`
- Initialization: `openviking-cli crypto init-key --output ~/.openviking/master.key`

3.3 HashiCorp Vault Provider (Production)
- The `context` parameter provides per-account derivation (Vault performs HKDF internally)
- Encrypt/decrypt: `POST /v1/transit/encrypt|decrypt/openviking-root` with `context=base64(account_id)`
- Rotation: `POST /v1/transit/keys/openviking-root/rotate`; old key versions are retained

Dependency: hvac

3.4 AWS KMS Provider (AWS Cloud)
- `GenerateDataKey` returns both the plaintext and the encrypted file key
- `EncryptionContext={"account_id": "xxx"}` acts as AAD and must match on decrypt
- Rotation: `aws kms enable-key-rotation`; all calls are logged to CloudTrail

Dependency: boto3

3.5 Provider Comparison
4. Data Encryption Detailed Design
4.1 Envelope Format v1
Magic bytes: if a file starts with `OVE1`, take the decrypt path; otherwise treat it as plaintext (this allows gradual migration).

4.2 AGFS Encryption Scope
- `.abstract.md`
- `.overview.md`
- `file.py`
- `.relations.json`
- `_system/users.json`
- `_system/accounts.json`
- `collection_meta.json`

4.3 VikingFS Integration
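As an illustration only, a hook on the VikingFS write and read paths could look like the sketch below. The function names are hypothetical, the byte layout after the `OVE1` magic (a 2-byte big-endian length, the wrapped file key, a 12-byte nonce, then ciphertext) is an assumption rather than the actual Format v1, and the `wrap`/`unwrap` callables stand in for a Root Key provider.

```python
import os
import struct
from typing import Callable

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

MAGIC = b"OVE1"
NONCE_LEN = 12

# Stand-in for a Root Key provider: (account_id, key bytes) -> key bytes
WrapFn = Callable[[str, bytes], bytes]


def encrypt_for_write(account_id: str, content: bytes, wrap: WrapFn) -> bytes:
    """Build MAGIC | u16 len | wrapped file key | nonce | ciphertext (hypothetical layout)."""
    file_key = AESGCM.generate_key(bit_length=256)
    wrapped = wrap(account_id, file_key)
    nonce = os.urandom(NONCE_LEN)
    ct = AESGCM(file_key).encrypt(nonce, content, account_id.encode())
    return MAGIC + struct.pack(">H", len(wrapped)) + wrapped + nonce + ct


def decrypt_on_read(account_id: str, raw: bytes, unwrap: WrapFn) -> bytes:
    """Decrypt files that carry the magic; pass legacy plaintext through unchanged."""
    if not raw.startswith(MAGIC):
        return raw  # written before encryption was enabled (gradual migration)
    (wlen,) = struct.unpack_from(">H", raw, len(MAGIC))
    pos = len(MAGIC) + 2
    wrapped, pos = raw[pos:pos + wlen], pos + wlen
    nonce, ct = raw[pos:pos + NONCE_LEN], raw[pos + NONCE_LEN:]
    file_key = unwrap(account_id, wrapped)
    return AESGCM(file_key).decrypt(nonce, ct, account_id.encode())
```

One caveat of magic-byte detection: a legacy plaintext file that happens to begin with `OVE1` would be sent down the decrypt path and fail, so migration tooling may want to check for that case.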
4.4 API Key Hash ✅ Confirmed
Argon2id (RFC 9106), one-way hash:
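A sketch of the Argon2id hashing with `argon2-cffi`, combined with the `{prefix → [(identity, hash)]}` index from this section. `PasswordHasher` takes `memory_cost` in KiB, so 64 MiB is `65536`. The 8-character prefix length and the `ApiKeyIndex` class are hypothetical; `register`/`resolve` only loosely mirror `register_user()` and `resolve()`.

```python
import secrets
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

# Parameters from §1.3: 64 MiB memory (given in KiB), 3 iterations, 4 lanes
ph = PasswordHasher(time_cost=3, memory_cost=64 * 1024, parallelism=4)

PREFIX_LEN = 8  # hypothetical number of leading characters kept in the index


class ApiKeyIndex:
    """Maps key prefix -> [(identity, Argon2id hash)]; plaintext keys are never stored."""

    def __init__(self) -> None:
        self._index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def register(self, identity: str) -> str:
        """Issue a new key, store only its prefix and hash, return the key once."""
        key = secrets.token_hex(32)
        self._index[key[:PREFIX_LEN]].append((identity, ph.hash(key)))
        return key

    def resolve(self, key: str) -> Optional[str]:
        """O(1) prefix lookup, then (normally) a single Argon2id verify."""
        for identity, hashed in self._index.get(key[:PREFIX_LEN], []):
            try:
                ph.verify(hashed, key)
                return identity
            except VerifyMismatchError:
                continue
        return None
```

Storing a short prefix in the clear reveals those characters of every key; with 64-hex-character keys from `secrets.token_hex(32)`, the remaining 56 characters (224 bits) still rule out brute force.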
Prefix-based lookup: `{prefix → [(identity, hash)]}` → O(1) lookup plus a single Argon2id verify (~50 ms)

Related code paths: `_load_account_keys()`, `resolve()` (`key_index[key]`, O(1)), `register_user()`

5. VectorDB Strategy ✅ No Encryption
Backends: `local`, `http`, `volcengine`, `vikingdb`

Future options: encrypt text metadata fields, or use homomorphic encryption (still at the research stage).
6. Key Rotation
Keys are replaced periodically to limit the impact of a breach.
- File key: generated anew on each `write()`

In Scheme A, rotation is managed by the client.
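Because content is encrypted under per-file keys, rotating an account key (or the Root Key it is derived from) only requires re-wrapping the small file keys; the file content itself does not need to be re-encrypted. This is a general property of envelope encryption rather than something this section spells out. A minimal sketch, assuming AES-256-GCM wrapping with `account_id` as associated data (function names are hypothetical):

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def wrap(kek: bytes, file_key: bytes, account_id: str) -> bytes:
    """Encrypt a file key under an account key (KEK): nonce || ciphertext."""
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, file_key, account_id.encode())


def unwrap(kek: bytes, wrapped: bytes, account_id: str) -> bytes:
    """Recover a file key; raises InvalidTag if the KEK or account_id is wrong."""
    return AESGCM(kek).decrypt(wrapped[:12], wrapped[12:], account_id.encode())


def rewrap(old_kek: bytes, new_kek: bytes, wrapped: bytes, account_id: str) -> bytes:
    """Move one file key from the old account key to the new one.

    Only the wrapped key changes; the content ciphertext, encrypted under
    the unchanged file key, is left untouched.
    """
    return wrap(new_kek, unwrap(old_kek, wrapped, account_id), account_id)
```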
7. Implementation Summary
Changes
- `openviking/crypto/`
- `viking_fs.py`
- `api_keys.py`
- `openviking_service.py`

Dependencies
- `cryptography`
- `argon2-cffi`
- `hvac`
- `boto3`

Verification
- Wrong key raises `InvalidTag`; permission checks
- End-to-end: `add_resource()` → `find()`, `add_message()` → `commit()`