ai-code-review
AI代码审查 AI Code Review
介绍 Introduction
With the rapid advancement of large language models (LLMs) and other AI technologies, AI-assisted code review is expected to play an increasingly significant role in future software development processes.
Traditional code review not only requires substantial human resources and time, but also heavily relies on individual reviewers’ expertise, often failing to comprehensively identify potential issues. By integrating LLM-based solutions, defects can be detected, diverse strategies can be proposed, and both cost and review time can be substantially reduced, ultimately resulting in higher-quality and more efficient development workflows.
- AI Code Review VS Static Code Analysis(Linter, Sonar)
- AI Code Review VS Manual Code Review
Manual code review
- High Cost: requires significant human efforts
- Inconsistent Quality & Standards: relies on individual reviewers’ expertise
- Human Errors: Often fail to identify potential issues due to time constraints
- High Latency: requires peers take lots of time to review
With LLM, we can achieve better code review
- Low Cost
- Consistent Quality & Standards
- Boost productivity and efficiency
- Automation
相关工作 Related Work
文章
- BitsAI-CR: Automated Code Review via LLM in Practice https://arxiv.org/pdf/2501.15134
- BitsAI-CR的前身:基于大模型 + 知识库的 Code Review 实践 https://zhuanlan.zhihu.com/p/655550057
产品
- PR Agents
- CodeRabbits
- Cursor
- CodeAnt
- Kody
整体架构
此设计参考了BitsAI-CR
方法
Git Code Diff
- Detailed Description of Unified Format https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html
gitlab、github使用的code diff都是git的code diff
让我们看看一个code diff长什么样
- 在repo中创建一个新的
test.txt
文件,内容为Hello World
。 - 使用
git diff head
命令可以看到其内容如下
diff --git a/test.txt b/test.txt
new file mode 100644
index 0000000..c57eff5
--- /dev/null
+++ b/test.txt
@@ -0,0 +1 @@
+Hello World!
\ No newline at end of file
Prompt的设计
(1) 如何传入Code Diff让LLM获取足够的信息来进行Code Review
不同的模式(todo:这里应该绘制一个图)
- MR级别,一次性传入所有的code diff
- 文件级别
- 对某个文件的code diff做一个汇总
- 基于当前的code diff进行扩充(上下10行),参考linux codebase提交MR的code diff展示
(2) 输出设计
-
inline的review comment
- 需要LLM返回end_new_line
-
suggestion mode
- 需要LLM返回精确的new line start和end,以及suggesting_code
减少Cost
ChatGPT的API是要钱的,部署开源模型也需要算力资源。
如何在模型的Cost和
- 有一个gitignore似的文件用于过滤掉proto, proto gen, unit test, mock相关文件
- 增加一个前序Precheck Agents,使用低价的4o-mini基座模型,可以提前过滤掉类似于import和proto改变的信息
- 由于目前ChatGPT的KVCache主要是对prefix做的,尽量把固定的内容放在system prompt前面,使得其可以被cached
- 减少prompt的context
- incremental code review of pro mode