Task Requirements¶
这里写当前任务的附加要求。
可以写的内容例如:
- 指定默认测试 shape,例如
BNSD=1,16,8192,512 - 指定必须优先优化的输入规模
- 指定不允许修改的文件或模块
- 指定必须保留的行为约束
- 指定本次任务特别关注的性能指标
示例:
- 把默认的 size 改成 BNSD=1,16,8192,512
- 优先关注 torch_npu baseline 对比
- 不要改动与 flash_attention_score 无关的算子
Hints¶
- Prefer small, isolated kernel changes. Treat correctness failure as a hard stop for that iteration.
- If 3 consecutive iterations show no improvement, re-read
../../remote.md, inspect previous rounds inITERATIONS.md, search online for additional optimization ideas, and re-evaluate the tuning direction before continuing.