Zhuang's Diary

Say something of substance, and keep at it

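1. About DeepSeek and Its Large Models

1.1 Company Overview

DeepSeek was founded in Hangzhou in July 2023 as a subsidiary of the quantitative fund High-Flyer (幻方量化). Its full name is Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. Recent headlines about it include: "founded barely more than a year ago", "the newly released V3 already rivals OpenAI's GPT-4o", "training cost under 6 million US dollars", "API pricing at a few percent of what other leading Chinese vendors charge", and "the app has topped the free-app charts of both the Chinese and US App Stores".

Those are the recent news highlights. The official website gives the following picture:
Over the past six months DeepSeek has released three major model versions: DeepSeek V2.5, DeepSeek V3, and DeepSeek-R1, all of which use an MoE (Mixture-of-Experts) architecture. Before that it released DeepSeek-VL, DeepSeek Coder, and DeepSeek Math.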

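1.2 Model Capabilities

DeepSeek's models are positioned against Qwen in China and Llama and GPT-4o overseas. On the published benchmark results, DeepSeek-V3 ranks first among open-source models and is on par with the most advanced closed-source models.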

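1.3 Training and Inference Cost

Inference cost (API pricing): input tokens are priced as low as 1 RMB per million tokens.


Training cost: according to the technical report, DeepSeek trained on H800 GPUs, and on only about two thousand of them. The full official training run of V3 cost less than 6 million US dollars.

1. During pre-training, each trillion tokens takes only 180K H800 GPU hours on a cluster of 2,048 H800 GPUs, which is about 3.7 days (180,000 / 2,048 / 24).
2. Pre-training as a whole took 2,664K GPU hours (less than two months). Adding context extension and post-training brings the total to about 2,788K GPU hours.
3. At a rental price of 2 US dollars per H800 GPU hour, the total training cost is under 6 million US dollars.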
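
The arithmetic behind these three points can be checked with a short Go sketch. The GPU-hour figures are the ones quoted above from the technical report; the 2 US dollar hourly rate is the rental assumption stated in point 3, and the function and constant names are only illustrative.

```go
package main

import "fmt"

// Figures quoted above from the DeepSeek-V3 technical report.
const (
	gpuHoursPerTrillionTokens = 180_000.0   // H800 GPU hours per trillion tokens
	preTrainingGPUHours       = 2_664_000.0 // pre-training only
	totalGPUHours             = 2_788_000.0 // plus context extension and post-training
	clusterGPUs               = 2048.0
	usdPerGPUHour             = 2.0 // assumed H800 rental price
)

// wallClockDays converts GPU hours into days on the whole cluster.
func wallClockDays(gpuHours float64) float64 {
	return gpuHours / clusterGPUs / 24
}

// costUSD prices GPU hours at the assumed rental rate.
func costUSD(gpuHours float64) float64 {
	return gpuHours * usdPerGPUHour
}

func main() {
	fmt.Printf("per trillion tokens: %.1f days\n", wallClockDays(gpuHoursPerTrillionTokens))
	fmt.Printf("pre-training:        %.1f days\n", wallClockDays(preTrainingGPUHours))
	fmt.Printf("total cost:          $%.3f million\n", costUSD(totalGPUHours)/1e6)
}
```

The total comes to 5.576 million US dollars, which is where the "under 6 million" figure comes from; it covers GPU rental for the final run only.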

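Inference and training costs this low naturally raise the following questions:

  • What network architecture does the model use?
  • What are the training precision, framework, and parallelism strategy?
  • How is the model deployed and optimized?
  • What compute and communication optimizations were made at the hardware level?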

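2. Core Training and Inference Techniques of DeepSeek

2.1 DeepSeek-V3 Network Architecture

DeepSeek-V3 was pre-trained on 14.8 trillion high-quality tokens, followed by SFT and RL. The model has 671B parameters in total, but only 37B are activated per token. For efficient inference and training, DeepSeek-V3 uses the in-house MLA attention mechanism and an MoE architecture with an auxiliary-loss-free load-balancing strategy.

According to the technical report, the model is a classic Transformer. The notable parts are the DeepSeekMoE architecture in the feed-forward network and the MLA architecture in attention, both of which were already validated in DeepSeek-V2.
Compared with DeepSeek-V2, V3 additionally introduces an auxiliary-loss-free load-balancing strategy for DeepSeekMoE, to reduce the performance loss caused by forcing the experts to be load balanced.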

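2.1.1 DeepSeekMoE

GShard was the first architecture to bring MoE into the Transformer. Compared with a conventional dense LLM, an MoE model inserts a layer of expert networks into the data flow.
A conventional MoE consists of two parts, a gating network and sparse MoE layers:

  • Sparse MoE layers: these replace the feed-forward network (FFN) layers of a standard Transformer. An MoE layer contains a number of "experts" (for example, 8), each of which is an independent neural network. In practice the experts are usually FFNs, but they can be more complex networks or even MoE layers themselves, which yields a hierarchical MoE.
  • Gating network (router): this decides which tokens are sent to which expert. Token routing is a key design point of MoE, because the router consists of learned parameters and is pre-trained together with the rest of the network.


Compared with a conventional MoE, DeepSeekMoE uses finer-grained experts and sets some of them aside as shared experts, which reduces knowledge redundancy among experts.


Gating and routing: TopK denotes the set of the K highest affinity scores between the t-th token and all routed experts. DeepSeek-V3 computes the affinity scores with a sigmoid function and then normalizes over the selected scores to produce the gating values.
During MoE training, the routing policy can leave experts with unevenly distributed training data: if all tokens go to a few popular experts, the remaining experts may hardly be trained at all.
The common industry remedy is an auxiliary loss, but an auxiliary loss that is too large can hurt model performance.
To strike a better balance between load balancing and model performance, DeepSeek pioneered an auxiliary-loss-free load-balancing strategy: each expert gets a bias term b_i that is added to its affinity score s_{i,t} when determining the top-K routing. Concretely, if an expert is overloaded its bias is decreased by γ, and if it is underloaded its bias is increased by γ, where γ is a hyperparameter called the bias update speed.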
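
A minimal Go sketch of this routing rule may make it concrete. It is a toy under stated assumptions, not DeepSeek's implementation: the function names are invented, "overloaded" is taken to mean "above the mean load", and the step γ is far larger than a realistic value so that one update visibly changes the routing. The bias only affects which experts are selected; the gate values are normalized from the raw sigmoid affinities.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// route selects the top-k experts by (affinity + bias) and returns gate
// values normalized over the raw affinities of the selected experts.
func route(logits, bias []float64, k int) (chosen []int, gates []float64) {
	n := len(logits)
	affinity := make([]float64, n)
	idx := make([]int, n)
	for i, l := range logits {
		affinity[i] = sigmoid(l)
		idx[i] = i
	}
	// Sort expert indices by biased score, highest first.
	sort.SliceStable(idx, func(a, b int) bool {
		return affinity[idx[a]]+bias[idx[a]] > affinity[idx[b]]+bias[idx[b]]
	})
	chosen = idx[:k]
	sum := 0.0
	for _, e := range chosen {
		sum += affinity[e]
	}
	gates = make([]float64, k)
	for j, e := range chosen {
		gates[j] = affinity[e] / sum
	}
	return chosen, gates
}

// updateBias lowers the bias of experts loaded above the mean and raises
// it for experts loaded below the mean, by the fixed step gamma.
func updateBias(bias []float64, load []int, gamma float64) {
	total := 0
	for _, c := range load {
		total += c
	}
	mean := float64(total) / float64(len(load))
	for i, c := range load {
		switch {
		case float64(c) > mean:
			bias[i] -= gamma
		case float64(c) < mean:
			bias[i] += gamma
		}
	}
}

func main() {
	logits := []float64{2, 1, 0, -1} // token-to-expert scores before the sigmoid
	bias := make([]float64, 4)

	chosen, gates := route(logits, bias, 2)
	fmt.Println("before update:", chosen, gates)

	// Expert 0 received most of the tokens in the last batch.
	updateBias(bias, []int{10, 2, 2, 2}, 0.3)
	chosen, gates = route(logits, bias, 2)
	fmt.Println("after update: ", chosen, gates)
}
```

With these toy numbers the first call routes to experts 0 and 1; after the bias update, expert 0 is pushed out of the top 2 and the token goes to experts 1 and 2 instead, with no extra loss term involved.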

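The gating network is essentially a classification layer followed by a softmax. An auxiliary load-balancing loss adds a penalty term that grows as tokens concentrate on a few experts; a related regularizer, the router z-loss, penalizes overly large logits to encourage more moderate values and keep the router from producing extreme outputs.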

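2.1.2 MLA (Multi-head Latent Attention)

The KV cache is usually a major bottleneck for LLM inference efficiency, and the MHA (multi-head attention) of the standard Transformer produces a very large KV cache. The industry has tried many ways to shrink or tame it, for example PagedAttention (which manages KV-cache memory more efficiently) and multi-query attention (MQA) and grouped-query attention (GQA), which cache fewer keys and values but fall somewhat short of native MHA in model quality.

DeepSeek-V2 proposed a new attention mechanism: multi-head latent attention (MLA).
Where MQA shares keys and values across heads and GQA shares them within groups, the core of MLA is a low-rank joint compression of the attention keys and values, which reduces the KV cache during inference. According to DeepSeek, it performs better than MHA while needing a much smaller KV cache.

A low-rank matrix is one whose rank is much smaller than its number of rows and columns.
Suppose a matrix has a structure that lets it be factored into the product of two smaller matrices. That usually means the original matrix is low-rank.
For example, take a 4×5 matrix A that can be written as the product of a 4×2 matrix B and a 2×5 matrix C. The information in A is then captured by the two smaller matrices, which shows that A is low-rank (its rank is at most 2).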

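The core of the low-rank compression is:

$$c_t^{KV} = W^{DKV} h_t$$
$$k_t^{C} = W^{UK} c_t^{KV}$$
$$v_t^{C} = W^{UV} c_t^{KV}$$

Here h_t is the input of the t-th token. W^{DKV} is the down-projection matrix for keys and values, which compresses h_t into a lower dimension; the result c_t^{KV} is the compressed KV latent vector that gets cached. W^{UK} and W^{UV} are the up-projection matrices that expand the token's compressed latent c_t^{KV} back into the original keys and values.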
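
The following Go sketch walks through the same down-projection and up-projection on toy sizes and compares how many numbers would be cached per token. It is only an illustration: the matrices are made up, the head count and dimensions passed to `cachedFloatsPerToken` are example values, and it ignores any extra per-token state (such as a separate positional key) that a full MLA implementation keeps.

```go
package main

import "fmt"

// matVec returns m·v for a row-major matrix m.
func matVec(m [][]float64, v []float64) []float64 {
	out := make([]float64, len(m))
	for i, row := range m {
		for j, x := range row {
			out[i] += x * v[j]
		}
	}
	return out
}

// cachedFloatsPerToken compares what is cached per token per layer:
// full keys and values for MHA versus one compressed latent for MLA.
func cachedFloatsPerToken(heads, headDim, latentDim int) (mha, mla int) {
	return 2 * heads * headDim, latentDim
}

func main() {
	// Toy sizes: hidden dimension 4, latent dimension 2.
	h := []float64{1, 2, 3, 4}
	wDKV := [][]float64{{1, 0, 0, 0}, {0, 1, 0, 0}}     // 2x4 down-projection
	wUK := [][]float64{{1, 0}, {0, 1}, {1, 1}, {1, -1}} // 4x2 up-projection for keys
	wUV := [][]float64{{2, 0}, {0, 2}, {1, 0}, {0, 1}}  // 4x2 up-projection for values

	c := matVec(wDKV, h) // the only vector that needs to be cached
	k := matVec(wUK, c)  // keys rebuilt from the latent
	v := matVec(wUV, c)  // values rebuilt from the latent
	fmt.Println("latent:", c, "key:", k, "value:", v)

	mha, mla := cachedFloatsPerToken(128, 128, 512)
	fmt.Printf("cached floats per token per layer: MHA=%d, MLA latent=%d\n", mha, mla)
}
```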
The architecture of the MLA module is shown in the figure below:

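2.2 Core Training and Inference Techniques

2.2.1 The HAI-LLM Training Framework

DeepSeek-V3 was trained on a cluster of 2,048 NVIDIA H800 GPUs using the in-house HAI-LLM framework, which implements four kinds of parallel training: ZeRO-backed data parallelism, pipeline parallelism, tensor-slicing model parallelism, and sequence parallelism.
This combination serves different workloads, supports very large models at the multi-trillion scale, and scales to thousands of GPUs. DeepSeek also built a companion set of high-performance operators, haiscale, which helps HAI-LLM make much better use of GPU memory and compute in large-model training.

2.2.2 The Core Algorithm DualPipe: A New Pipeline-Parallel Algorithm

i. Overlapping communication and computation
DeepSeek-V3 uses 16-way pipeline parallelism (PP), 64-way expert parallelism (EP) spanning 8 nodes, and ZeRO-1 data parallelism (DP).
Compared with existing pipeline-parallel methods, DualPipe has fewer pipeline bubbles. It also overlaps the computation and communication phases of the forward and backward passes, which addresses the heavy communication overhead introduced by cross-node expert parallelism.
The key idea of DualPipe is to overlap computation and communication within a pair of individual forward and backward chunks. Each chunk is divided into four components: attention, all-to-all dispatch, MLP, and all-to-all combine.

For example, suppose there are two chunks, A and B:
1. While chunk A runs its forward computation, the backward communication of chunk B can proceed at the same time.
2. When chunk A finishes its forward computation it starts its communication, while chunk B starts its forward computation.


By arranging these components carefully and precisely tuning the share of GPU SMs devoted to communication versus computation, the system can hide both all-to-all and PP communication overhead at run time.
DeepSeek has clearly done a great deal of communication-computation overlap work in pipeline parallelism: according to the technical report, even for fine-grained all-to-all expert communication the all-to-all overhead is close to zero.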
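
To give a feel for why pipeline bubbles matter, the Go sketch below evaluates the classic bubble estimate for a GPipe-style schedule, (p - 1) / (m + p - 1) for p stages and m micro-batches. This is the textbook formula for a simple synchronous pipeline, not DualPipe's own schedule; the 16 stages match the PP degree quoted above, and the micro-batch counts are arbitrary examples.

```go
package main

import "fmt"

// bubbleFraction returns the idle share of a GPipe-style schedule with
// p pipeline stages and m micro-batches: (p-1)/(m+p-1).
func bubbleFraction(p, m int) float64 {
	return float64(p-1) / float64(m+p-1)
}

func main() {
	const stages = 16 // PP degree quoted in the text
	for _, m := range []int{16, 64, 256} {
		fmt.Printf("stages=%d micro-batches=%d bubble=%.1f%%\n",
			stages, m, 100*bubbleFraction(stages, m))
	}
}
```

More micro-batches shrink the bubble, but only gradually, which is why schedules that reduce the bubble itself and hide communication behind computation are attractive.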


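  • Computation-communication overlap
    In large-scale distributed deep-learning training, communication is often slower than computation. Doing useful computation while waiting for communication is what computation-communication overlap means, and it is a key factor in efficient training.
  • Pipeline bubbles
    Some large models use pipeline parallelism and place different layers on different GPUs. Because layers depend on one another, later layers must wait for earlier ones to finish, which leaves GPUs idle for part of the time, as shown in the figure below:

ii. Cross-node all-to-all communication
DeepSeek also wrote custom, efficient cross-node all-to-all communication kernels (covering both dispatch and combine).
Specifically, GPUs on different nodes are fully interconnected over IB (InfiniBand), while communication within a node goes over NVLink. Each token is dispatched to at most 4 nodes, which reduces IB traffic. Warp specialization is used to optimize dispatch and combine.

During dispatch, (1) IB sending, (2) IB-to-NVLink forwarding, and (3) NVLink receiving are each handled by their own warps. The number of warps assigned to each communication task is adjusted dynamically according to the actual workload across all SMs.
During combine, (1) NVLink sending, (2) NVLink-to-IB forwarding and accumulation, and (3) IB receiving and accumulation are likewise handled by dynamically adjusted warps.

In this way IB and NVLink communication are fully overlapped, and each token can efficiently select an average of 3.2 experts per node without incurring extra NVLink overhead. This means that although DeepSeek-V3 selects only 8 routed experts in practice, it could scale this number up to at most 13 experts (4 nodes × 3.2 experts per node) at the same communication cost.

DeepSeek-V3 uses an MoE architecture with 1 shared expert and 256 routed experts, and each token activates 8 routed experts.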

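2.2.3 A Mixed-Precision Framework for FP8 Training

Not all parameters are quantized to FP8 for training. Most compute-intensive operations run in FP8, while a few key operations are deliberately kept in their original data formats to balance training efficiency and numerical stability.

Which operators are computed in FP8, and what is the trade-off logic?

  • Most core computation, namely the GEMM operations, is implemented in FP8.
  • Operators that are sensitive to low-precision computation still need higher precision.
  • Some low-cost operators can also use higher precision.
    The following components keep their original precision (for example, BF16 or FP32): the embedding module, the output head, the MoE gating module, the normalization operators, and the attention operators.

How is the accuracy of low-precision training improved?

  • Fine-grained quantization

    Activations are quantized group-wise along the token dimension in 1×128 tiles; weights are quantized block-wise in 128×128 blocks.

  • Higher-precision accumulation

    When matrix multiply-accumulate (MMA) operations run on Tensor Cores, the partial results are copied to FP32 registers on the CUDA Cores each time the accumulation reaches a fixed interval, and the accumulation continues there in FP32.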

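2.2.4 The MTP Training Objective

DeepSeek-V3 was trained with a multi-token prediction (MTP) objective. The ablation experiments in the technical report show that it does improve the model on most evaluation benchmarks, and the MTP module can also be used to speed up inference.

2.2.5 Inference Deployment

DeepSeek-V3 has 671B parameters in total. With that many parameters, here is how it is deployed:
Inference separates prefilling from decoding, which gives the online service both high throughput and low latency. Redundant expert deployment and dynamic routing keep the load balanced efficiently during inference.
Taken as a whole, the deployment is essentially cross-machine distributed inference.

2.2.5.1 Prefill Stage
Put simply, this stage processes the user's prompt in parallel and turns it into the KV cache.

The minimum deployment unit of the prefill stage consists of 4 nodes with 32 GPUs in total. The attention part uses 4-way tensor parallelism (TP4) with sequence parallelism (SP), combined with 8-way data parallelism (DP8). The small TP size (4) limits the TP communication overhead. The MoE part uses 32-way expert parallelism (EP32).

2.2.5.2 Decode Stage
This stage produces the output autoregressively, one token at a time.

The minimum deployment unit of the decode stage consists of 40 nodes with 320 GPUs. The attention part uses TP4 with SP, combined with DP80, while the MoE part uses EP320. In the MoE part each GPU hosts only one expert, and 64 GPUs host the redundant experts and the shared expert.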
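
The parallelism degrees of the two deployment units can be cross-checked with a few lines of Go. The sketch reads the prefill unit as 32 GPUs in total (4 nodes of 8), which is what TP4 × DP8 and EP32 imply; the struct and field names are illustrative only.

```go
package main

import "fmt"

// unit describes one minimum deployment unit.
type unit struct {
	name           string
	nodes, gpus    int
	tp, dp, ep     int
	routedExperts  int // routed experts hosted one per GPU (decode only)
	redundantSlots int // GPUs hosting redundant and shared experts (decode only)
}

// consistent checks that attention (TP x DP) and MoE (EP) both cover
// exactly the GPUs of the unit.
func (u unit) consistent() bool {
	return u.tp*u.dp == u.gpus && u.ep == u.gpus
}

func main() {
	prefill := unit{name: "prefill", nodes: 4, gpus: 32, tp: 4, dp: 8, ep: 32}
	decode := unit{name: "decode", nodes: 40, gpus: 320, tp: 4, dp: 80, ep: 320,
		routedExperts: 256, redundantSlots: 64}

	for _, u := range []unit{prefill, decode} {
		fmt.Printf("%s: %d GPUs per node, consistent=%v\n", u.name, u.gpus/u.nodes, u.consistent())
	}
	fmt.Println("decode GPUs:", decode.routedExperts+decode.redundantSlots)
}
```

In the decode unit, 256 GPUs with one routed expert each plus the 64 GPUs for redundant and shared experts add up to the 320 GPUs of EP320.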

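Summary: Why Is the Training Cost of DeepSeek-V3 So Low?

Training cost is determined mainly by the model architecture and the training infrastructure, and the two necessarily complement each other. The report points to the following reasons:
I. MLA: joint low-rank compression of keys and values greatly reduces the KV cache. Compared with the industry's usual approach of reducing the number of keys and values, implementing MLA's compression well is a real test of a research team's fundamentals.
II. FP8 training: low-precision computation reduces GPU memory use and compute cost. The technical report notes that this is the first time an FP8 mixed-precision training framework has been validated on an extremely large model, which also speaks to the depth of DeepSeek's infrastructure engineering team.
III. MoE architecture: sparse activation greatly reduces the amount of computation, a built-in training and inference advantage over the dense architectures of Qwen and Llama. The hard problems that come with it (expert load, communication, and routing) land on the infrastructure engineering team.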

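3. Why DeepSeek?

In Silicon Valley, AI innovation of the kind DeepSeek has shown is not rare. What is different this time is that a Chinese company did it, which is especially exciting against the traditional pattern of "America innovates, China applies".

Recent interviews and DeepSeek's technical reports also suggest the following:
1. Large models are a knowledge-intensive industry. How do you organize a high density of talent? DeepSeek has clearly managed it.
2. There is no magic in large-model technology; most of the time it comes down to fundamentals and drive.
3. Not putting commercialization first often lets a team travel light.

4. Personal Thoughts

1. In the long run there may be chips built specifically for the Transformer architecture, just as ASICs were designed for convolution.
2. Multi-token prediction and MoE architectures are likely to remain popular research directions for large-model training and inference for a long time.
3. In China, AI applications will always have a bigger market and a louder voice than basic research, but the generational gap with overseas work in fundamental innovation will keep narrowing.
4. For large-model training and inference, software and hardware form one cooperative ecosystem. DeepSeek's arrival will push the whole AI industry toward faster and cheaper iteration.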

References

1. Better & Faster Large Language Models via Multi-token Prediction
2. DeepSeek-V3 Technical Report
3. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
4. What is the fundamental reason the cost of DeepSeek V3 is so low? (deepseek v3的成本这么低的根本原因是什么?)
5. GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism

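A framework for CBDC business requirements:

  1. Monetary Hierarchy
    • Greater clarity:
    Spell out the legal and functional boundaries between the CBDC and existing money (such as cash and deposit money), and in particular the relationship between the CBDC and stablecoins, to avoid confusing the market.
    • Interest mechanism:
    Dynamic interest-rate control should specify where it applies, for example whether it covers a prolonged zero-rate environment or a deflationary crisis. Add a discussion of a "negative rate" option (as mentioned in EU reports), and assess its possible effect on substitution for cash and on public acceptance.

  2. Compliance & Regulation
    • Balancing privacy and supervision in more detail:
    • The threshold for "small anonymous transactions" can be adapted to national conditions. For example, under the "controllable anonymity" scheme used in China's pilots, the system can enforce dynamic monitoring caps on transaction amount and frequency instead of a fixed amount threshold.
    • Add the data minimization principle to reduce unnecessary data collection and so strengthen user privacy.
    • Sanctions compliance:
    Detail how the cross-border blacklist mechanism is implemented and whether it supports automatic screening by smart contracts, while guarding against technical false positives and wrongful blocking.

  3. Technical Infrastructure
    • Flexibility of the hybrid ledger:
    Clarify the technical division of labor among institutions, for example whether the central bank's core ledger uses a traditional centralized database or a permissioned chain. For sub-ledgers, explain the scope of their functions (such as wallets and payment scenarios) and how they interact with the core ledger.
    • Privacy-preserving technology:
    Besides zero-knowledge proofs (ZKP) and homomorphic encryption, add a discussion of secure multi-party computation (MPC), especially its suitability for splitting and securely computing on sensitive data in cross-border payments.
    • Scalability:
    The target of tens of thousands of TPS can be broken down into stress-test metrics, such as system response time under peak load, block generation time, and network latency.

  4. Interoperability
    • Fuller description:
    Describe the "bridging role" of the CBDC in cross-border payments, for example the architectural features of mBridge and the ICE project, and specify how automatic foreign-exchange conversion is implemented (whether through a decentralized market or an official clearing system).
    • Coordination with commercial banking systems:
    Clarify whether the CBDC interacts directly with commercial bank accounts and how contention for resources in the payment chain is avoided (for example, the effect on commercial banks when deposits move into the CBDC).

  5. User-Centric Design
    • Financial inclusion:
    • Explain the range of hardware wallets supported, such as whether low-cost chips (for example a secure element, SE) are used to address adoption in rural or remote areas.
    • For users without smartphones, a scheme combining QR codes with SMS payment can also be discussed.
    • Dual-offline capability:
    Add technical detail on how offline transactions are synchronized and verified afterwards (for example, timestamps or one-time signatures to prevent double spending).

  6. Monetary Policy Implementation
    • Targeted disbursement:
    Specify the technical implementation, for example how smart contracts deliver subsidies precisely, and discuss the potential effect of such features on social welfare policy (such as targeted poverty alleviation).
    • Transparency of data collection and analysis:
    Add transparency requirements for data use, for example whether statistics are anonymized and how far they are published, to strengthen public trust.

  7. Legal & Governance
    • Diversified governance:
    Add external oversight, for example whether third-party auditors may independently assess the central bank's CBDC system.
    • International cooperation:
    Given cross-border payment needs, describe the role of the CBDC in international governance frameworks, for example whether technical standards are developed jointly with international organizations such as the IMF and the BIS.

  8. CBDC participants: the rights, responsibilities, and obligations of the central bank, commercial banks, financial intermediaries, and other financial institutions.

  9. Additional Suggestions
    • Evolving requirements:
    Add restrictive clauses on the programmability of smart contracts so that complex contract logic does not create systemic risk. Modular smart contracts can also be supported to ease later upgrades and maintenance.
    • Technology neutrality, restated:
    Under a technology-neutral framework, list concrete ways to "avoid binding to a specific technology", for example open API interfaces or support for running multiple infrastructures in parallel.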

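The CBDC requirements framework, continued:

Core goal: give the public and businesses a safe, reliable, efficient, and inclusive means of digital payment, while supporting the central bank's core functions in monetary policy, payment supervision, and financial innovation.

1. Core Monetary Functions

1.1 Legal Status and Convertibility
• Legal tender: the CBDC carries the same monetary authority as banknotes (M0 level) and is a direct liability of the central bank.
• Conversion: seamless conversion to and from cash and commercial bank deposits, so that the CBDC coexists smoothly with the existing monetary system.
• Denominations and exact payment: transactions in the smallest unit (such as 0.01 yuan) are allowed, to meet everyday needs.

1.2 Interest and Value Management
• Non-interest-bearing by default: the CBDC carries the same zero interest as banknotes, to avoid crowding out bank deposits.
• Dynamic adjustment option: in special economic conditions (such as deflation or a crisis) an interest scheme may be designed, supporting innovation in monetary policy tools.

2. User and Scenario Requirements

2.1 Convenience and Inclusion
• Broad device support: smartphones, feature phones (for example USSD or SMS payment), and hardware wallets (for users without bank accounts).
• Offline payment: small payments work during network outages (for example transit and retail), with a configurable synchronization allowance (for example up to 500 yuan within 3 days).
• Accessible design: multilingual interfaces, voice assistance, and interaction suited to people with disabilities (for example Braille or tactile feedback).

2.2 Transaction Privacy
• Tiered privacy:
• Small anonymous transactions: for a single transaction under 100 US dollars, the user's identity is not recorded;
• Large traceable transactions: for transactions over 1,000 US dollars, a real-name identity must be bound to meet compliance requirements.
• User data protection: limit unnecessary collection of user data, and ensure that data not needed for supervision can be destroyed after a set period (for example 90 days).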
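
The tiered privacy rule can be expressed as a small Go function. This is a sketch under stated assumptions: the text defines only the two outer tiers, so the middle tier for amounts from 100 to 1,000 US dollars, and all names in the code, are hypothetical. Amounts are held in cents to avoid floating-point comparisons.

```go
package main

import "fmt"

// Tier is the privacy treatment applied to a single transaction.
type Tier int

const (
	Anonymous Tier = iota // identity not recorded
	Standard              // hypothetical middle tier, not defined in the text
	RealName              // real-name identity required
)

func (t Tier) String() string {
	return [...]string{"anonymous", "standard", "real-name"}[t]
}

const (
	anonymousBelowCents = 100_00  // under 100 USD
	realNameAboveCents  = 1000_00 // over 1,000 USD
)

// classify maps a transaction amount in cents to its privacy tier.
func classify(amountCents int64) Tier {
	switch {
	case amountCents < anonymousBelowCents:
		return Anonymous
	case amountCents > realNameAboveCents:
		return RealName
	default:
		return Standard
	}
}

func main() {
	for _, cents := range []int64{50_00, 100_00, 1000_00, 2500_00} {
		fmt.Printf("%8.2f USD -> %s\n", float64(cents)/100, classify(cents))
	}
}
```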

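3. Payments and Interoperability

3.1 Domestic Payment Integration
• Payment system integration: the CBDC interconnects with the existing real-time gross settlement (RTGS) system and other payment networks, ensuring T+0 settlement.
• Bank and wallet integration: users can quickly link a CBDC wallet to a bank account and move and manage funds flexibly.

3.2 Cross-Border Payments
• Convenience: multilateral central-bank cooperation (such as the mBridge project) enables seamless cross-border settlement with real-time exchange rates.
• Compliance: cross-border payments are screened against international sanctions lists in real time, so that high-risk transactions are excluded.

4. Supervision and Compliance Support

4.1 Anti-Money Laundering (AML) and Counter-Terrorist Financing (CFT)
• Real-time monitoring: real-time alerts on suspicious transactions and on-chain tracing, so that the source and use of funds are transparent.
• Automated reporting: generate fund-flow reports that meet international standards (such as the FATF Travel Rule), reducing the compliance burden on financial institutions.

4.2 Account Management and Judicial Assistance
• Account freezing: with judicial authorization, the central bank can freeze a specific user's CBDC account.
• Recovery of funds: fast tracing of mistaken transfers and a manual approval process for returning them.

5. Monetary Policy and Data Support

5.1 Liquidity and Disintermediation Management
• Balance caps: a holding limit can be set on user wallets to avoid a large outflow of commercial bank deposits.
• Targeted disbursement: government special funds (such as disaster relief) can be delivered precisely to user wallets, improving policy execution.

5.2 Data-Driven Economic Monitoring
• Real-time monitoring: transaction flow data (such as frequency, amount, and geographic distribution) supports the central bank's monetary policy analysis and adjustment.
• Macroeconomic assessment: real-time CBDC data helps optimize the structure of money in circulation and replaces traditional, lagging M0 statistics.

6. Security and Stability

6.1 Resilience
• Strong security: advanced cryptographic algorithms protect transaction information and make the CBDC resilient to network attacks (including attacks using quantum computing).
• Offline transaction risk control: offline payment limits and time windows can be adjusted dynamically to prevent double spending and abuse.

6.2 Disaster Recovery
• Multi-active backup: disaster-recovery nodes are deployed in different regions so that the system keeps running during a disaster.
• Fast recovery: all user transaction records are synchronized within 10 minutes of the network being restored, keeping the payment experience continuous.

7. Smart Contracts and Programmable Money

7.1 Programmable Scenarios
• Government subsidies: smart contracts disburse special funds precisely, so that funds flow in line with policy goals.
• Green finance: support innovative applications such as carbon-credit redemption to encourage low-carbon consumption.

7.2 Contract Management and Updates
• Deployment permission: only institutions authorized by the central bank may publish smart contracts, for safety and consistency.
• Dynamic upgrades: smart contracts must support emergency fixes so that code vulnerabilities do not create systemic risk.

8. Additional Notes

• Adapting requirements: the design must be adjusted to the specific needs of each pilot country (such as the priority given to financial inclusion or the degree of privacy protection).
• Technology neutrality: choose a broadly compatible base architecture and avoid over-reliance on a single technology, to keep future upgrades flexible.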

Risk Category | Ethereum (fixed length of 42 characters) | Bitcoin (a string that starts with 1, 3, or bc1)
Sanction | 0xfeed25Fc6Eae234c5eEfB3891cA18Bd4312a746f | 32pTjxTNi7snk8sodrgfmdKao3DEn1nVJM
Criminal Organisation | 0x098B716B8Aaf21512996dC57EB0615e2383E2f96 | 32pTjxTNi7snk8sodrgfmdKao3DEn1nVJM
Dark Market - Decentralised | 0x8589427373d6d84e98730d7795d8f6f8731fda16 | 1CounterpartyXXXXXXXXXXXXXXXUWLpVr
Gambling | 0x974CaA59e49682CdA0AD2bbe82983419A2ECC400 | 1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp
Law Enforcement | 0xdcbEfFBECcE100cCE9E4b153C4e15cB885643193 | 3LU8wRu4ZnXP4UM8Yo6kkTiGHM9BubgyiG
Malware | 0x0A52eCAa61268C6a5Cf9Cd6b1378531A4672601B | 1HB5XMLmzFVj8ALj6mfBsbifRoD4miY36v
Thief | 0x9F12243D60c301d4E01a3d24bb620e8Ffb40f855 | bc1qcygs9dl4pqw6atc4yqudrzd76p3r9cp6xp2kny, 1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx
Unknown (no risk so far) | 0x6d2e03b7EfFEae98BD302A9F836D0d6Ab0002766 | bc1pc24kj26d0hxh6xllcyedqazeqn7erqkykjhfepffxpp26ulq9a0q8q8vht

Below are the characteristics of different types of Bitcoin addresses:

  1. P2PKH Address (Pay-to-PubKey-Hash):
    • Starts with the letter “1”.
    • Length is 26 to 35 characters.
  2. P2SH Address (Pay-to-Script-Hash):
    • Starts with the letter “3”.
    • Length is 26 to 35 characters.
  3. Bech32 Address (SegWit Address):
    • Starts with “bc1”.
    • Length is 42 to 62 characters.
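
As a rough illustration, the prefix and length rules above can be turned into a Go classifier. It checks only what the list describes (leading characters and length, plus "0x" followed by 40 hex digits for Ethereum); it does not verify Base58Check, Bech32, or EIP-55 checksums, so passing it does not mean an address is valid. The function names are made up for this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// classifyBitcoin guesses the Bitcoin address type from its prefix and length.
func classifyBitcoin(addr string) string {
	n := len(addr)
	switch {
	case strings.HasPrefix(addr, "bc1") && n >= 42 && n <= 62:
		return "Bech32 (SegWit)"
	case strings.HasPrefix(addr, "1") && n >= 26 && n <= 35:
		return "P2PKH"
	case strings.HasPrefix(addr, "3") && n >= 26 && n <= 35:
		return "P2SH"
	}
	return "unknown"
}

// ethereumShape matches "0x" followed by exactly 40 hex digits (42 characters).
var ethereumShape = regexp.MustCompile(`^0x[0-9a-fA-F]{40}$`)

// looksLikeEthereum reports whether addr has the shape of an Ethereum address.
func looksLikeEthereum(addr string) bool {
	return ethereumShape.MatchString(addr)
}

func main() {
	for _, a := range []string{
		"1dice8EMZmqKvrGE4Qc9bUFf9PX3xaYDp",
		"3LU8wRu4ZnXP4UM8Yo6kkTiGHM9BubgyiG",
		"bc1qcygs9dl4pqw6atc4yqudrzd76p3r9cp6xp2kny",
	} {
		fmt.Printf("%s -> %s\n", a, classifyBitcoin(a))
	}
	fmt.Println(looksLikeEthereum("0x8589427373d6d84e98730d7795d8f6f8731fda16"))
}
```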

Additionally, Thief on Solana

  • CEzN7mqP9xoxn2HdyW6fjEJ73t7qaX9Rp2zyS6hb3iEu
  • 5WwBYgQG6BdErM2nNNyUmQXfcUnB68b6kesxBywh1J3n

Scam on ETH

  • 0x84eb60e6732848f837f48402dcfff25e3d3d9304

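Tools

Burp: a tool for capturing and modifying network traffic. It is Java-based and needs a JRE to run.
Its website may only be reachable through a proxy from mainland China. It is commercial software from PortSwigger, a UK company, with a fee of 399 per year; a free Community Edition is also available.

Network path when Burp is not running:
local (browser) -> server
Network path after Burp is started:
local (browser -> Burp) -> server
Burp extensions: https://github.com/snoopysecurity/awesome-burp-extensions

DVWA is a practice target that introduces and explains a range of vulnerabilities and attacks. It is an entry-level practice resource, in English.

Pikachu is a similar vulnerability practice platform, in Chinese.

https://www.kali.org/ is a Linux distribution built specifically for information-security tasks and practice.

Nothing on the front end can be trusted: as shown above, Burp can sit on the client side, intercept network traffic, and act as a man in the middle.

Countermeasures:
1. Validate on the back end.
2. Keep business logic decisions out of the front end.
3. Encrypt and obfuscate front-end code.
4. Pack (shell-protect) mobile apps.
5. Detect debuggers to hinder reverse engineering.
6. Store sensitive database fields as ciphertext, ideally through a back-end KMS; a common server-local scheme is hash(plaintext + salt).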
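
The hash(plaintext + salt) scheme from the last countermeasure can be sketched in Go with the standard library alone. The example uses SHA-256 with a random per-user salt to match the formula in the text; for real password storage a deliberately slow key-derivation function (such as bcrypt, scrypt, or Argon2) is the usual recommendation, so treat this as a demonstration of salting only. All function names are illustrative.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newSalt returns 16 random bytes from the system's secure random source.
func newSalt() ([]byte, error) {
	salt := make([]byte, 16)
	_, err := rand.Read(salt)
	return salt, err
}

// hashPassword computes SHA-256(plaintext + salt).
func hashPassword(password string, salt []byte) []byte {
	h := sha256.New()
	h.Write([]byte(password))
	h.Write(salt)
	return h.Sum(nil)
}

// verifyPassword recomputes the hash and compares it in constant time.
func verifyPassword(password string, salt, stored []byte) bool {
	return subtle.ConstantTimeCompare(hashPassword(password, salt), stored) == 1
}

func main() {
	salt, err := newSalt()
	if err != nil {
		panic(err)
	}
	stored := hashPassword("correct horse", salt)
	fmt.Println("stored hash:", hex.EncodeToString(stored))
	fmt.Println("right password:", verifyPassword("correct horse", salt, stored))
	fmt.Println("wrong password:", verifyPassword("guess", salt, stored))
}
```

Because every user has a different salt, two users with the same password end up with different stored hashes, which defeats precomputed lookup tables.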
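Many sites publish a robots.txt file at www.website.com/robots.txt, for example https://www.bilibili.com/robots.txt. It lists the URL paths that crawlers are allowed or not allowed to fetch. It is advisory only, not an access control.

Launch the browser from Burp and open http://burp/.
Click "CA Certificate" in the top-right corner of the page to download the certificate issued by Burp.

For brute-forcing user names and passwords with Burp, use the "Repeater" and "Intruder" features.
A successful login usually produces a response of a different length.

At its core, brute forcing is automation: sending a large number of requests to guess the password.
Countermeasures:
1. Require a minimum complexity for user passwords.
2. Detect crawlers and bots: CAPTCHAs, sliders, random verification codes.
3. Limit login frequency: allow one attempt every n seconds, and freeze the user for x minutes after m failures. This brings its own problems:
A. Front-end limits can be bypassed by resetting local data.
B. IP-based limits may block legitimate users, and attackers can rent address pools with large numbers of IPs.
C. Account-based limits can lock out legitimate users, which lets a malicious attacker mount a denial-of-service attack.
4. Increase password strength; increasing password length in particular is the most effective way.
A. Use one password per account. Do not reuse a password everywhere, to avoid credential-stuffing attacks.
A password is, in essence, a means of authentication.
Its use rests on this assumption: only the user and the server know the password.
Other ways to verify identity include SMS codes, fingerprints, face recognition, USB keys, 2FA, and multi-factor authentication.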
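
The account-based limit from item 3 ("freeze the user for x minutes after m failures") can be sketched as a small Go type. This is an in-memory toy with invented names; times are passed in as Unix seconds so the behavior is easy to test, and it shares the drawback noted above that an attacker can deliberately lock out a legitimate user.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Limiter locks an account for lockSeconds after maxFailures failed logins.
type Limiter struct {
	mu          sync.Mutex
	maxFailures int
	lockSeconds int64
	failures    map[string]int
	lockedUntil map[string]int64 // Unix seconds
}

// NewLimiter creates a limiter with the given failure limit and lock time.
func NewLimiter(maxFailures int, lockSeconds int64) *Limiter {
	return &Limiter{
		maxFailures: maxFailures,
		lockSeconds: lockSeconds,
		failures:    make(map[string]int),
		lockedUntil: make(map[string]int64),
	}
}

// Allowed reports whether user may attempt a login at time now.
func (l *Limiter) Allowed(user string, now int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return now >= l.lockedUntil[user]
}

// RecordFailure counts a failed attempt and starts a lock at the limit.
func (l *Limiter) RecordFailure(user string, now int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.failures[user]++
	if l.failures[user] >= l.maxFailures {
		l.lockedUntil[user] = now + l.lockSeconds
		l.failures[user] = 0
	}
}

// RecordSuccess clears the failure counter after a successful login.
func (l *Limiter) RecordSuccess(user string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.failures, user)
}

func main() {
	lim := NewLimiter(3, 300) // 3 failures lock the account for 5 minutes
	now := time.Now().Unix()
	for i := 0; i < 3; i++ {
		lim.RecordFailure("alice", now)
	}
	fmt.Println("one minute later:  ", lim.Allowed("alice", now+60))
	fmt.Println("five minutes later:", lim.Allowed("alice", now+300))
}
```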

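Social Engineering

Attackers use social engineering in many different ways, mostly by exploiting human psychological weaknesses and trust to obtain information or access.
Examples include phishing, tailgating, impersonation, phone-based attacks, non-technical attacks, social-media deception, media projection attacks, and relationship-building attacks.

OWASP: Open Web Application Security Project

The OWASP Top 10 lists the ten most critical security risks for web applications. The latest version was published in 2021: https://owasp.org/Top10/zh_CN/

WebShell: a shell implemented as a web page that can operate on the system, for example reading and writing files or executing commands. It is also called a web trojan.
If a site does not control the files that users upload, an attacker may upload a trojan file and use it to control the back-end server.
The tool at https://github.com/AntSwordProject/ can be used to connect to and drive such a webshell.

The trojan ran successfully and gave access to the directories and files of the back-end server.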

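Relying only on client-side JavaScript validation and a server-side MIME type check is still not safe enough: client-side validation is easy to bypass, and the MIME type itself can be forged. A more rigorous file upload system needs broader measures, with validation and security checks built into every stage of the upload flow.

A more rigorous file upload system should include the following:

  1. Client-side validation (strengthened):
    File type check (strengthened): file.type is derived from the file extension and is not secure on its own. Consider combining it with extra checks:
    File header check: read the first few bytes of the file and check them against known file format signatures. This requires a good understanding of the header structure of each file type, and it is still only a supporting measure that cannot be relied on alone.
    Stricter regular expressions: validate the file name with a more precise regular expression, although this still cannot stop a disguised malicious file.
    File size limit: set a reasonable size limit and enforce it on the client.
    User feedback: tell the user clearly why an upload failed, for example a disallowed file type or a file that exceeds the size limit.
    No direct drag-and-drop upload: do not let users drop files onto the upload area; make them pick files through the "choose file" button, which gives better control over the upload process.
  2. Server-side validation (defense in depth):
    File type check (multiple validations): do not rely on the MIME type alone; combine several methods:
    File signature: check the file's magic number, which is a very reliable method. Different file formats usually have distinctive magic numbers.
    File content analysis: for critical file types, analyze the content in more depth and check that the file structure conforms to the format. The analysis has to be tailored to each file type and can be resource-intensive.
    File size limit: set a strict size limit to prevent resource-exhaustion attacks (denial of service, DoS).
    Temporary file storage: save the upload to a temporary directory first and process it from there, so that a malicious file cannot affect the server directly.
    File extension check (supplementary): not fully reliable, but useful as an additional hint about the file type.
    Content security scanning (critical): use a professional scanner to check uploads for malicious code, viruses, or other harmful content. This may be the most important security step, but it adds complexity and cost. Some cloud providers offer it as a service.
    Allowlist: accept only specific file types. Avoid blocklists where possible, because a blocklist can hardly cover every malicious file type.
    Sandbox: run file analysis in a sandbox to limit as far as possible what malicious code can do to the server.
    Logging: record every upload event, including file name, MIME type, file size, upload time, and validation result. This helps trace and analyze security incidents.
  3. Other security measures:
    HTTPS: encrypt the upload with HTTPS to prevent data theft.
    Input validation: validate all user input strictly to prevent injection attacks.
    Code security audits: audit the code regularly to find and fix potential vulnerabilities.
    A server-side code example follows: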
    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "os"
        "path/filepath"
        "regexp"

        "github.com/gabriel-vasile/mimetype" // more accurate MIME type detection
        "github.com/google/uuid"             // unique file names
    )

    // AllowedMimeTypes lists the MIME types that may be uploaded.
    var AllowedMimeTypes = map[string]bool{
        "image/jpeg": true,
        "image/png":  true,
        "image/gif":  true,
        // add other allowed MIME types here
    }

    // MaxFileSize is the largest upload accepted, in bytes.
    const MaxFileSize = 10 * 1024 * 1024 // 10 MB

    // uploadDir is where accepted files are stored.
    const uploadDir = "./uploads"

    // validFileName allows only letters, digits, dots, underscores, and hyphens.
    var validFileName = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)

    // uploadHandler handles file upload requests.
    func uploadHandler(w http.ResponseWriter, r *http.Request) {
        if r.Method != http.MethodPost {
            http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
            return
        }

        // Cap the request body (file plus form overhead) while it is being read.
        r.Body = http.MaxBytesReader(w, r.Body, MaxFileSize+1<<20)

        file, handler, err := r.FormFile("file") // the form field is assumed to be named "file"
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        defer file.Close()

        // Validate the client-supplied file name.
        if !isValidFileName(handler.Filename) {
            http.Error(w, "Invalid file name", http.StatusBadRequest)
            return
        }

        if handler.Size > MaxFileSize {
            http.Error(w, "File too large", http.StatusRequestEntityTooLarge)
            return
        }

        // Detect the type from the file content, not from the Content-Type header.
        detected, err := mimetype.DetectReader(file)
        if err != nil {
            http.Error(w, "Failed to detect MIME type", http.StatusInternalServerError)
            return
        }
        if !AllowedMimeTypes[detected.String()] {
            http.Error(w, "Invalid file type", http.StatusBadRequest)
            return
        }

        // DetectReader consumed the start of the file, so rewind before copying.
        if _, err := file.Seek(0, io.SeekStart); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        // Store under a UUID name to avoid collisions. The extension comes from
        // the detected type, not from the user-supplied name.
        newFileName := uuid.New().String() + detected.Extension()
        uploadPath := filepath.Join(uploadDir, newFileName)

        // Create the upload directory if it does not exist.
        if err := os.MkdirAll(uploadDir, 0o755); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        // Create the destination file and save the upload.
        newFile, err := os.Create(uploadPath)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer newFile.Close()

        if _, err := io.Copy(newFile, file); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        // Stricter security checks belong here, for example:
        // 1. Virus scanning through a third-party service (ClamAV, VirusTotal API, etc.)
        // 2. Deeper, type-specific analysis of the file content

        fmt.Fprintf(w, "File uploaded successfully: %s\n", newFileName)
    }

    // isValidFileName rejects names that could be used for directory traversal.
    func isValidFileName(fileName string) bool {
        return validFileName.MatchString(fileName)
    }

    func main() {
        http.HandleFunc("/upload", uploadHandler)
        fmt.Println("Server listening on port 8080")
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
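    AllowedMimeTypes: defines the file types that may be uploaded; a map[string]bool is easy to maintain.
    MaxFileSize: sets the maximum file size.
    mimetype library: the code uses github.com/gabriel-vasile/mimetype to detect the MIME type more accurately. This is more reliable than relying only on handler.Header.Get("Content-Type").
    UUID file names: a UUID is used as the stored file name, which avoids name collisions and related security problems.
    isValidFileName: performs a simple check on the file name to prevent directory traversal and similar attacks. It is only a basic example; real applications may need more elaborate rules.
    Temporary file storage (missing but recommended): for more safety, save the file to a temporary directory first and move it to its final location only after validation passes.
    Security scanning (missing but required): a comment in the code marks where scanning should be added. You need to integrate a professional scanning library or service (for example ClamAV or the VirusTotal API) to check uploads for malicious code. This is a crucial security step.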
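
For readers who prefer to avoid a third-party dependency, the magic-number check can also be approximated with Go's standard library. The sketch below uses `http.DetectContentType`, which inspects at most the first 512 bytes of the data; it recognizes fewer formats than a dedicated library, and the allowlist and function name here are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// allowedTypes is the allowlist of sniffed content types.
var allowedTypes = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/gif":  true,
}

// sniffAllowed detects the content type from the leading bytes of a file
// and reports whether it is on the allowlist.
func sniffAllowed(head []byte) (string, bool) {
	contentType := http.DetectContentType(head)
	return contentType, allowedTypes[contentType]
}

func main() {
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	fakeImage := []byte("<?php system($_GET['cmd']); ?>") // a script renamed to .png

	for name, head := range map[string][]byte{"real.png": pngHeader, "shell.png": fakeImage} {
		contentType, ok := sniffAllowed(head)
		fmt.Printf("%s: %s allowed=%v\n", name, contentType, ok)
	}
}
```

The file name plays no part in the decision, so a script renamed to look like an image is rejected on its content.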

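Linux System Investigation

1. Check user activity
/etc/passwd and /etc/shadow store account and password information; an attacker may add a high-privilege user to these two files.
On macOS, for example:
➜  ~ sudo dscl . -list /Users | while read user; do sudo dscl . -read /Users/"$user" ...; done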
_mbsetupuser
postgres
➜  ~ last
Zzz  ttys000                         Tue Dec 10 14:05   still logged in
Zzz  ttys000                         Mon Dec  9 11:29 - 11:29  (00:00)
Zzz  ttys000                         Sat Dec  7 19:35 - 19:35  (00:00)
Zzz  console                         Tue Dec  3 14:51   still logged in
reboot time                                Tue Dec  3 14:50
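Other checks include whether any user has an empty password, which commands users have run, and so on.
3. Check processes for anomalies, for example an attacker using the server for cryptocurrency mining. Useful commands include lsof and top.
For more, see the Linux incident response handbook: https://github.com/Just-Hack-For-Fun/Linux-INCIDENT-RESPONSE-COOKBOOK

Threat intelligence centers:

Windows system tools - https://learn.microsoft.com/en-us/sysinternals/
Tencent Threat Intelligence Center - https://tix.qq.com/
Suspicious file analysis - https://www.virustotal.com/gui/home/upload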

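Cline is an autonomous coding agent that lives in the IDE. It can create and edit files, run commands, use the browser, and more, asking for your permission at each step. Its core prompts are in https://github.com/cline/cline/blob/main/src/core/prompts/system.ts and are excerpted here: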

Who you are

You are Cline, a highly skilled software engineer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.

execute_command

Description: Request to execute a CLI command on the system. Use this when you need to perform system operations or run specific commands to accomplish any step in the user's task. You must tailor your command to the user's system and provide a clear explanation of what the command does. Prefer to execute complex CLI commands over creating executable scripts, as they are more flexible and easier to run. Commands will be executed in the current working directory: ${cwd.toPosix()}
Parameters:
- command: (required) The CLI command to execute. This should be valid for the current operating system. Ensure the command is properly formatted and does not contain any harmful instructions.

read_file

Description: Request to read the contents of a file at the specified path. Use this when you need to examine the contents of an existing file you do not know the contents of, for example to analyze code, review text files, or extract information from configuration files. Automatically extracts raw text from PDF and DOCX files. May not be suitable for other types of binary files, as it returns the raw content as a string.
Parameters:
- path: (required) The path of the file to read (relative to the current working directory ${cwd.toPosix()})

write_to_file

Description: Request to write content to a file at the specified path. If the file exists, it will be overwritten with the provided content. If the file doesn't exist, it will be created. This tool will automatically create any directories needed to write the file.
Parameters:
- path: (required) The path of the file to write to (relative to the current working directory ${cwd.toPosix()})
- content: (required) The content to write to the file. ALWAYS provide the COMPLETE intended content of the file, without any truncation or omissions. You MUST include ALL parts of the file, even if they haven't been modified.

search_files

Description: Request to perform a regex search across files in a specified directory, providing context-rich results. This tool searches for patterns or specific content across multiple files, displaying each match with encapsulating context.
Parameters:
- path: (required) The path of the directory to search in (relative to the current working directory ${cwd.toPosix()}). This directory will be recursively searched.
- regex: (required) The regular expression pattern to search for. Uses Rust regex syntax.
- file_pattern: (optional) Glob pattern to filter files (e.g., '*.ts' for TypeScript files). If not provided, it will search all files (*).

list_files

Description: Request to list files and directories within the specified directory. If recursive is true, it will list all files and directories recursively. If recursive is false or not provided, it will only list the top-level contents. Do not use this tool to confirm the existence of files you may have created, as the user will let you know if the files were created successfully or not.
Parameters:
- path: (required) The path of the directory to list contents for (relative to the current working directory ${cwd.toPosix()})
- recursive: (optional) Whether to list files recursively. Use true for recursive listing, false or omit for top-level only.

list_code_definition_names

Description: Request to list definition names (classes, functions, methods, etc.) used in source code files at the top level of the specified directory. This tool provides insights into the codebase structure and important constructs, encapsulating high-level concepts and relationships that are crucial for understanding the overall architecture.
Parameters:
- path: (required) The path of the directory (relative to the current working directory ${cwd.toPosix()}) to list top level source code definitions for.

browser_action

Description: Request to interact with a Puppeteer-controlled browser. Every action, except `close`, will be responded to with a screenshot of the browser's current state, along with any new console logs. You may only perform one browser action per message, and wait for the user's response including a screenshot and logs to determine the next action.
- The sequence of actions **must always start with** launching the browser at a URL, and **must always end with** closing the browser. If you need to visit a new URL that is not possible to navigate to from the current webpage, you must first close the browser, then launch again at the new URL.
- While the browser is active, only the `browser_action` tool can be used. No other tools should be called during this time. You may proceed to use other tools only after closing the browser. For example if you run into an error and need to fix a file, you must close the browser, then use other tools to make the necessary changes, then re-launch the browser to verify the result.
- The browser window has a resolution of **900x600** pixels. When performing any click actions, ensure the coordinates are within this resolution range.
- Before clicking on any elements such as icons, links, or buttons, you must consult the provided screenshot of the page to determine the coordinates of the element. The click should be targeted at the **center of the element**, not on its edges.
Parameters:
- action: (required) The action to perform. The available actions are:
    * launch: Launch a new Puppeteer-controlled browser instance at the specified URL. This **must always be the first action**.
        - Use with the `url` parameter to provide the URL.
        - Ensure the URL is valid and includes the appropriate protocol (e.g. http://localhost:3000/page, file:///path/to/file.html, etc.)
    * click: Click at a specific x,y coordinate.
        - Use with the `coordinate` parameter to specify the location.
        - Always click in the center of an element (icon, button, link, etc.) based on coordinates derived from a screenshot.
    * type: Type a string of text on the keyboard. You might use this after clicking on a text field to input text.
        - Use with the `text` parameter to provide the string to type.
    * scroll_down: Scroll down the page by one page height.
    * scroll_up: Scroll up the page by one page height.
    * close: Close the Puppeteer-controlled browser instance. This **must always be the final browser action**.
        - Example: `<action>close</action>`
- url: (optional) Use this for providing the URL for the `launch` action.
    * Example: <url>https://example.com</url>
- coordinate: (optional) The X and Y coordinates for the `click` action. Coordinates should be within the **900x600** resolution.
    * Example: <coordinate>450,300</coordinate>
- text: (optional) Use this for providing the text for the `type` action.
    * Example: <text>Hello, world!</text>

ask_followup_question

Description: Ask the user a question to gather additional information needed to complete the task. This tool should be used when you encounter ambiguities, need clarification, or require more details to proceed effectively. It allows for interactive problem-solving by enabling direct communication with the user. Use this tool judiciously to maintain a balance between gathering necessary information and avoiding excessive back-and-forth.
Parameters:
- question: (required) The question to ask the user. This should be a clear, specific question that addresses the information you need.
Usage:
<ask_followup_question>
<question>Your question here</question>
</ask_followup_question>

attempt_completion

Description: After each tool use, the user will respond with the result of that tool use, i.e. if it succeeded or failed, along with any reasons for failure. Once you've received the results of tool uses and can confirm that the task is complete, use this tool to present the result of your work to the user. Optionally you may provide a CLI command to showcase the result of your work. The user may respond with feedback if they are not satisfied with the result, which you can use to make improvements and try again.
IMPORTANT NOTE: This tool CANNOT be used until you've confirmed from the user that any previous tool uses were successful. Failure to do so will result in code corruption and system failure. Before using this tool, you must ask yourself in <thinking></thinking> tags if you've confirmed from the user that any previous tool uses were successful. If not, then DO NOT use this tool.
Parameters:
- result: (required) The result of the task. Formulate this result in a way that is final and does not require further input from the user. Don't end your result with questions or offers for further assistance.
- command: (optional) A CLI command to execute to show a live demo of the result to the user. For example, use `open index.html` to display a created html website, or `open localhost:3000` to display a locally running development server. But DO NOT use commands like `echo` or `cat` that merely print text. This command should be valid for the current operating system. Ensure the command is properly formatted and does not contain any harmful instructions.

# Tool Use Examples

## Example 1: Requesting to execute a command

<execute_command>
<command>npm run dev</command>
</execute_command>

## Example 2: Requesting to write to a file

<write_to_file>
<path>frontend-config.json</path>
<content>
{
"apiEndpoint": "https://api.example.com",
"theme": {
"primaryColor": "#007bff",
"secondaryColor": "#6c757d",
"fontFamily": "Arial, sans-serif"
},
"features": {
"darkMode": true,
"notifications": true,
"analytics": false
},
"version": "1.0.0"
}
</content>
</write_to_file>

Tool Use Guidelines

1. In <thinking> tags, assess what information you already have and what information you need to proceed with the task.
2. Choose the most appropriate tool based on the task and the tool descriptions provided. Assess if you need additional information to proceed, and which of the available tools would be most effective for gathering this information. For example using the list_files tool is more effective than running a command like `ls` in the terminal. It's critical that you think about each available tool and use the one that best fits the current step in the task.
3. If multiple actions are needed, use one tool at a time per message to accomplish the task iteratively, with each tool use being informed by the result of the previous tool use. Do not assume the outcome of any tool use. Each step must be informed by the previous step's result.
4. Formulate your tool use using the XML format specified for each tool.
5. After each tool use, the user will respond with the result of that tool use. This result will provide you with the necessary information to continue your task or make further decisions. This response may include:
- Information about whether the tool succeeded or failed, along with any reasons for failure.
- Linter errors that may have arisen due to the changes you made, which you'll need to address.
- New terminal output in reaction to the changes, which you may need to consider or act upon.
- Any other relevant feedback or information related to the tool use.
6. ALWAYS wait for user confirmation after each tool use before proceeding. Never assume the success of a tool use without explicit confirmation of the result from the user.

It is crucial to proceed step-by-step, waiting for the user's message after each tool use before moving forward with the task. This approach allows you to:
1. Confirm the success of each step before proceeding.
2. Address any issues or errors that arise immediately.
3. Adapt your approach based on new information or unexpected results.
4. Ensure that each action builds correctly on the previous ones.

By waiting for and carefully considering the user's response after each tool use, you can react accordingly and make informed decisions about how to proceed with the task. This iterative process helps ensure the overall success and accuracy of your work.

CAPABILITIES

- You have access to tools that let you execute CLI commands on the user's computer, list files, view source code definitions, regex search${supportsComputerUse ? ", use the browser" : ""}, read and write files, and ask follow-up questions. These tools help you effectively accomplish a wide range of tasks, such as writing code, making edits or improvements to existing files, understanding the current state of a project, performing system operations, and much more.
- When the user initially gives you a task, a recursive list of all filepaths in the current working directory ('${cwd.toPosix()}') will be included in environment_details. This provides an overview of the project's file structure, offering key insights into the project from directory/file names (how developers conceptualize and organize their code) and file extensions (the language used). This can also guide decision-making on which files to explore further. If you need to further explore directories such as outside the current working directory, you can use the list_files tool. If you pass 'true' for the recursive parameter, it will list files recursively. Otherwise, it will list files at the top level, which is better suited for generic directories where you don't necessarily need the nested structure, like the Desktop.
- You can use search_files to perform regex searches across files in a specified directory, outputting context-rich results that include surrounding lines. This is particularly useful for understanding code patterns, finding specific implementations, or identifying areas that need refactoring.
- You can use the list_code_definition_names tool to get an overview of source code definitions for all files at the top level of a specified directory. This can be particularly useful when you need to understand the broader context and relationships between certain parts of the code. You may need to call this tool multiple times to understand various parts of the codebase related to the task.
- For example, when asked to make edits or improvements you might analyze the file structure in the initial environment_details to get an overview of the project, then use list_code_definition_names to get further insight using source code definitions for files located in relevant directories, then read_file to examine the contents of relevant files, analyze the code and suggest improvements or make necessary edits, then use the write_to_file tool to implement changes. If you refactored code that could affect other parts of the codebase, you could use search_files to ensure you update other files as needed.
- You can use the execute_command tool to run commands on the user's computer whenever you feel it can help accomplish the user's task. When you need to execute a CLI command, you must provide a clear explanation of what the command does. Prefer to execute complex CLI commands over creating executable scripts, since they are more flexible and easier to run. Interactive and long-running commands are allowed, since the commands are run in the user's VSCode terminal. The user may keep commands running in the background and you will be kept updated on their status along the way. Each command you execute is run in a new terminal instance.${
supportsComputerUse
? "\n- You can use the browser_action tool to interact with websites (including html files and locally running development servers) through a Puppeteer-controlled browser when you feel it is necessary in accomplishing the user's task. This tool is particularly useful for web development tasks as it allows you to launch a browser, navigate to pages, interact with elements through clicks and keyboard input, and capture the results through screenshots and console logs. This tool may be useful at key stages of web development tasks-such as after implementing new features, making substantial changes, when troubleshooting issues, or to verify the result of your work. You can analyze the provided screenshots to ensure correct rendering or identify errors, and review console logs for runtime issues.\n - For example, if asked to add a component to a react website, you might create the necessary files, use execute_command to run the site locally, then use browser_action to launch the browser, navigate to the local server, and verify the component renders & functions correctly before closing the browser."
: ""
}
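
The `${ supportsComputerUse ? ... : "" }` fragments in the listing above are ordinary conditional expressions inside a template literal: when the flag is false they contribute an empty string, so the browser-related text disappears from the prompt. A minimal sketch of the same pattern follows; `capabilitiesSection` is a hypothetical function and the prompt text in it is shortened for illustration, not the original wording.

```typescript
// Hypothetical, shortened stand-in for the real prompt text.
function capabilitiesSection(supportsComputerUse: boolean): string {
  return `- You have access to tools that let you execute CLI commands, list files, regex search${
    supportsComputerUse ? ", use the browser" : ""
  }, read and write files.${
    supportsComputerUse ? "\n- You can use the browser_action tool to interact with websites." : ""
  }`
}

// Whitespace and line breaks inside ${ ... } belong to the expression, not to the string.
console.log(capabilitiesSection(false))
console.log(capabilitiesSection(true))
```

With the flag off, the result is a single bullet with no trace of the browser wording; with it on, both the inline phrase and the extra bullet are present.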

RULES

- Your current working directory is: ${cwd.toPosix()}
- You cannot \`cd\` into a different directory to complete a task. You are stuck operating from '${cwd.toPosix()}', so be sure to pass in the correct 'path' parameter when using tools that require a path.
- Do not use the ~ character or $HOME to refer to the home directory.
- Before using the execute_command tool, you must first think about the SYSTEM INFORMATION context provided to understand the user's environment and tailor your commands to ensure they are compatible with their system. You must also consider if the command you need to run should be executed in a specific directory outside of the current working directory '${cwd.toPosix()}', and if so prepend with \`cd\`'ing into that directory && then executing the command (as one command since you are stuck operating from '${cwd.toPosix()}'). For example, if you needed to run \`npm install\` in a project outside of '${cwd.toPosix()}', you would need to prepend with a \`cd\` i.e. pseudocode for this would be \`cd (path to project) && (command, in this case npm install)\`.
- When using the search_files tool, craft your regex patterns carefully to balance specificity and flexibility. Based on the user's task you may use it to find code patterns, TODO comments, function definitions, or any text-based information across the project. The results include context, so analyze the surrounding code to better understand the matches. Leverage the search_files tool in combination with other tools for more comprehensive analysis. For example, use it to find specific code patterns, then use read_file to examine the full context of interesting matches before using write_to_file to make informed changes.
- When creating a new project (such as an app, website, or any software project), organize all new files within a dedicated project directory unless the user specifies otherwise. Use appropriate file paths when writing files, as the write_to_file tool will automatically create any necessary directories. Structure the project logically, adhering to best practices for the specific type of project being created. Unless otherwise specified, new projects should be easily run without additional setup, for example most projects can be built in HTML, CSS, and JavaScript - which you can open in a browser.
- Be sure to consider the type of project (e.g. Python, JavaScript, web application) when determining the appropriate structure and files to include. Also consider what files may be most relevant to accomplishing the task, for example looking at a project's manifest file would help you understand the project's dependencies, which you could incorporate into any code you write.
- When making changes to code, always consider the context in which the code is being used. Ensure that your changes are compatible with the existing codebase and that they follow the project's coding standards and best practices.
- When you want to modify a file, use the write_to_file tool directly with the desired content. You do not need to display the content before using the tool.
- Do not ask for more information than necessary. Use the tools provided to accomplish the user's request efficiently and effectively. When you've completed your task, you must use the attempt_completion tool to present the result to the user. The user may provide feedback, which you can use to make improvements and try again.
- You are only allowed to ask the user questions using the ask_followup_question tool. Use this tool only when you need additional details to complete a task, and be sure to use a clear and concise question that will help you move forward with the task. However if you can use the available tools to avoid having to ask the user questions, you should do so. For example, if the user mentions a file that may be in an outside directory like the Desktop, you should use the list_files tool to list the files in the Desktop and check if the file they are talking about is there, rather than asking the user to provide the file path themselves.
- When executing commands, if you don't see the expected output, assume the terminal executed the command successfully and proceed with the task. The user's terminal may be unable to stream the output back properly. If you absolutely need to see the actual terminal output, use the ask_followup_question tool to request the user to copy and paste it back to you.
- The user may provide a file's contents directly in their message, in which case you shouldn't use the read_file tool to get the file contents again since you already have it.
- Your goal is to try to accomplish the user's task, NOT engage in a back and forth conversation.${
supportsComputerUse
? '\n- The user may ask generic non-development tasks, such as "what\'s the latest news" or "look up the weather in San Diego", in which case you might use the browser_action tool to complete the task if it makes sense to do so, rather than trying to create a website or using curl to answer the question.'
: ""
}
- NEVER end attempt_completion result with a question or request to engage in further conversation! Formulate the end of your result in a way that is final and does not require further input from the user.
- You are STRICTLY FORBIDDEN from starting your messages with "Great", "Certainly", "Okay", "Sure". You should NOT be conversational in your responses, but rather direct and to the point. For example you should NOT say "Great, I've updated the CSS" but instead something like "I've updated the CSS". It is important you be clear and technical in your messages.
- When presented with images, utilize your vision capabilities to thoroughly examine them and extract meaningful information. Incorporate these insights into your thought process as you accomplish the user's task.
- At the end of each user message, you will automatically receive environment_details. This information is not written by the user themselves, but is auto-generated to provide potentially relevant context about the project structure and environment. While this information can be valuable for understanding the project context, do not treat it as a direct part of the user's request or response. Use it to inform your actions and decisions, but don't assume the user is explicitly asking about or referring to this information unless they clearly do so in their message. When using environment_details, explain your actions clearly to ensure the user understands, as they may not be aware of these details.
- Before executing commands, check the "Actively Running Terminals" section in environment_details. If present, consider how these active processes might impact your task. For example, if a local development server is already running, you wouldn't need to start it again. If no active terminals are listed, proceed with command execution as normal.
- When using the write_to_file tool, ALWAYS provide the COMPLETE file content in your response. This is NON-NEGOTIABLE. Partial updates or placeholders like '// rest of code unchanged' are STRICTLY FORBIDDEN. You MUST include ALL parts of the file, even if they haven't been modified. Failure to do so will result in incomplete or broken code, severely impacting the user's project.
- It is critical you wait for the user's response after each tool use, in order to confirm the success of the tool use. For example, if asked to make a todo app, you would create a file, wait for the user's response it was created successfully, then create another file if needed, wait for the user's response it was created successfully, etc.${
supportsComputerUse
? " Then if you want to test your work, you might use browser_action to launch the site, wait for the user's response confirming the site was launched along with a screenshot, then perhaps e.g., click a button to test functionality if needed, wait for the user's response confirming the button was clicked along with a screenshot of the new state, before finally closing the browser."
: ""
}
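
The rules above state that the agent cannot change its working directory, so a command meant for another directory has to be issued as a single `cd ... && ...` line. A minimal sketch of that rule is shown below. Both helpers are hypothetical: `toPosixPath` is a simplified stand-in and is not the `toPosix()` method the quoted source calls, and the double-quote wrapping does not escape special shell characters.

```typescript
// Hypothetical helper: rewrites Windows-style separators as forward slashes.
function toPosixPath(p: string): string {
  return p.replace(/\\/g, "/")
}

// Builds one command line that runs `command` in `targetDir` without
// permanently leaving `cwd`. Quoting here is deliberately simplistic.
function buildCommand(cwd: string, targetDir: string, command: string): string {
  const here = toPosixPath(cwd)
  const there = toPosixPath(targetDir)
  // Already in the right place: no prefix needed.
  if (here === there) return command
  // Otherwise chain cd and the command so they run as one command.
  return `cd "${there}" && ${command}`
}

console.log(buildCommand("/home/u/proj", "/home/u/proj", "npm install"))
console.log(buildCommand("/home/u/proj", "/home/u/other", "npm install"))
```

Chaining with `&&` also means the command is skipped if the `cd` fails, which avoids running it in the wrong directory.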

OBJECTIVE

You accomplish a given task iteratively, breaking it down into clear steps and working through them methodically.

1. Analyze the user's task and set clear, achievable goals to accomplish it. Prioritize these goals in a logical order.
2. Work through these goals sequentially, utilizing available tools one at a time as necessary. Each goal should correspond to a distinct step in your problem-solving process. You will be informed on the work completed and what's remaining as you go.
3. Remember, you have extensive capabilities with access to a wide range of tools that can be used in powerful and clever ways as necessary to accomplish each goal. Before calling a tool, do some analysis within <thinking></thinking> tags. First, analyze the file structure provided in environment_details to gain context and insights for proceeding effectively. Then, think about which of the provided tools is the most relevant tool to accomplish the user's task. Next, go through each of the required parameters of the relevant tool and determine if the user has directly provided or given enough information to infer a value. When deciding if the parameter can be inferred, carefully consider all the context to see if it supports a specific value. If all of the required parameters are present or can be reasonably inferred, close the thinking tag and proceed with the tool use. BUT, if one of the values for a required parameter is missing, DO NOT invoke the tool (not even with fillers for the missing params) and instead, ask the user to provide the missing parameters using the ask_followup_question tool. DO NOT ask for more information on optional parameters if it is not provided.
4. Once you've completed the user's task, you must use the attempt_completion tool to present the result of the task to the user. You may also provide a CLI command to showcase the result of your task; this can be particularly useful for web development tasks, where you can run e.g. \`open index.html\` to show the website you've built.
5. The user may provide feedback, which you can use to make improvements and try again. But DO NOT continue in pointless back and forth conversations, i.e. don't end your responses with questions or offers for further assistance.`

export function addCustomInstructions(customInstructions: string): string {
return `