kratos06 · kratos06 · Apr 25, 2025 · Apr 24, 2025 · Apr 24, 2025 · Apr 24, 2025
diff --git a/.env.sample b/.env.sample
@@ -52,11 +52,19 @@ DEEPSEEK_R1_API_BASE="https://api.deepseek.com"
 DEEPSEEK_R1_MODEL="deepseek-reasoner"
 
 # ===== 模型选择配置 =====
-# 可选值: "gpt-3.5", "gpt-4o", "deepseek"
+# 可选值: "gpt-3.5", "gpt-4", "gpt-4o", "deepseek", "deepseek-r1" 或任何 OpenAI 模型名称
 CODE_SUMMARY_MODEL="gpt-3.5"
 PR_SUMMARY_MODEL="gpt-3.5"
 CODE_REVIEW_MODEL="gpt-3.5"
 
+# 特定模型版本配置
+# GPT-3.5 模型名称，默认为 "gpt-3.5-turbo"
+# GPT35_MODEL="gpt-3.5-turbo-16k"
+# GPT-4 模型名称，默认为 "gpt-4"
+# GPT4_MODEL="gpt-4-turbo"
+# GPT-4o 模型名称，默认为 "gpt-4o"
+# GPT4O_MODEL="gpt-4o-mini"
+
 # ===== 电子邮件通知配置 =====
 # 启用电子邮件通知
 EMAIL_ENABLED="false"

diff --git a/UPDATES.md b/UPDATES.md
@@ -1,84 +1,83 @@
-# CodeDog项目更新说明
+# CodeDog Project Updates
 
-## 更新内容
+## Latest Updates
 
-### 1. 改进评分系统
+### 1. Improved Scoring System
+- Enhanced the scoring system to provide more accurate and comprehensive code evaluations
+- Added detailed scoring criteria for each dimension
+- Implemented weighted scoring for different aspects of code quality
 
-我们对代码评估系统进行了以下改进：
+### 2. Evaluation Dimensions
+The evaluation now covers the following dimensions:
+- Readability: Code clarity and understandability
+- Efficiency & Performance: Code execution speed and resource usage
+- Security: Code security practices and vulnerability prevention
+- Structure & Design: Code organization and architectural design
+- Error Handling: Robustness in handling errors and edge cases
+- Documentation & Comments: Code documentation quality and completeness
+- Code Style: Adherence to coding standards and best practices
 
-- **评分系统升级**：从5分制升级到更详细的10分制评分系统
-- **评分维度更新**：使用更全面的评估维度
-  - 可读性 (Readability)
-  - 效率与性能 (Efficiency & Performance)
-  - 安全性 (Security)
-  - 结构与设计 (Structure & Design)
-  - 错误处理 (Error Handling)
-  - 文档与注释 (Documentation & Comments)
-  - 代码风格 (Code Style)
-- **详细评分标准**：为每个评分范围（1-3分、4-6分、7-10分）提供了明确的标准
-- **报告格式优化**：改进了评分报告的格式，使其更加清晰明了
+### 3. Enhanced Error Handling
+- Improved timeout handling for API requests
+- Added detailed error logging
+- Implemented better error recovery mechanisms
 
-### 2. 修复DeepSeek API调用问题
+### 4. Performance Optimizations
+- Reduced API call latency
+- Optimized memory usage
+- Improved concurrent request handling
 
-修复了DeepSeek API调用问题，特别是"deepseek-reasoner不支持连续用户消息"的错误：
-- 将原来的两个连续HumanMessage合并为一个消息
-- 确保消息格式符合DeepSeek API要求
+### 5. Documentation Updates
+- Added comprehensive API documentation
+- Updated user guides
+- Improved code examples and tutorials
 
-### 3. 改进电子邮件通知系统
+## Running the Project
 
-- 增强了错误处理，提供更详细的故障排除信息
-- 添加了Gmail应用密码使用的详细说明
-- 更新了.env文件中的SMTP配置注释，使其更加明确
-- 新增了详细的电子邮件设置指南 (docs/email_setup.md)
-- 开发了高级诊断工具 (test_email.py)，帮助用户测试和排查邮件配置问题
-- 改进了Gmail SMTP认证错误的诊断信息，提供明确的步骤解决问题
+### Environment Setup
 
-## 运行项目
+1. Ensure the .env file is properly configured, especially:
+   - Platform tokens (GitHub or GitLab)
+   - LLM API keys (OpenAI, DeepSeek, etc.)
+   - SMTP server settings (if email notifications are enabled)
 
-### 环境设置
+2. If using Gmail for email notifications:
+   - Enable two-factor authentication for your Google account
+   - Generate an app-specific password (https://myaccount.google.com/apppasswords)
+   - Use the app password in your .env file
 
-1. 确保已正确配置.env文件，特别是：
-   - 平台令牌（GitHub或GitLab）
-   - LLM API密钥（OpenAI、DeepSeek等）
-   - SMTP服务器设置（如果启用邮件通知）
+### Running Commands
 
-2. 如果使用Gmail发送邮件通知，需要：
-   - 启用Google账户的两步验证
-   - 生成应用专用密码（https://myaccount.google.com/apppasswords）
-   - 在.env文件中使用应用密码
-
-### 运行命令
-
-1. **评估开发者代码**：
+1. **Evaluate Developer Code**:
    ```bash
-   python run_codedog.py eval "开发者名称" --start-date YYYY-MM-DD --end-date YYYY-MM-DD
+   python run_codedog.py eval "developer_name" --start-date YYYY-MM-DD --end-date YYYY-MM-DD
    ```
 
-2. **审查PR/MR**：
+2. **Review PR/MR**:
    ```bash
-   # GitHub PR审查
-   python run_codedog.py pr "仓库名称" PR编号
+   # GitHub PR review
+   python run_codedog.py pr "repository_name" PR_number
 
-   # GitLab MR审查
-   python run_codedog.py pr "仓库名称" MR编号 --platform gitlab
+   # GitLab MR review
+   python run_codedog.py pr "repository_name" MR_number --platform gitlab
 
-   # 自托管GitLab实例
-   python run_codedog.py pr "仓库名称" MR编号 --platform gitlab --gitlab-url "https://your.gitlab.instance.com"
+   # Self-hosted GitLab instance
+   python run_codedog.py pr "repository_name" MR_number --platform gitlab --gitlab-url "https://your.gitlab.instance.com"
    ```
 
-3. **设置Git钩子**：
+3. **Set up Git Hooks**:
    ```bash
    python run_codedog.py setup-hooks
    ```
 
-### 注意事项
+### Important Notes
 
-- 对于较大的代码差异，可能会遇到上下文长度限制。在这种情况下，考虑使用`gpt-4-32k`或其他有更大上下文窗口的模型。
-- DeepSeek模型有特定的消息格式要求，请确保按照上述修复进行使用。
+- For large code diffs, you may encounter context length limits. In such cases, consider using `gpt-4-32k` or other models with larger context windows.
+- DeepSeek models have specific message format requirements, please ensure to follow the fixes mentioned above.
 
-## 进一步改进方向
+## Future Improvements
 
-1. 实现更好的文本分块和处理，以处理大型代码差异
-2. 针对不同文件类型的更专业评分标准
-3. 进一步改进报告呈现，添加可视化图表
-4. 与CI/CD系统的更深入集成
+1. Implement better text chunking and processing for handling large code diffs
+2. Develop more specialized scoring criteria for different file types
+3. Further improve report presentation with visual charts
+4. Deeper integration with CI/CD systems
diff --git a/codedog/analysis_results_20250424_095117.json b/codedog/analysis_results_20250424_095117.json
@@ -0,0 +1,11 @@
+{
+  "summary": {
+    "total_commits": 0,
+    "total_files": 0,
+    "total_additions": 0,
+    "total_deletions": 0,
+    "files_changed": []
+  },
+  "commits": [],
+  "file_diffs": {}
+}
diff --git a/codedog/analyze_code.py b/codedog/analyze_code.py
@@ -0,0 +1,80 @@
+"""
+Code analysis module for GitHub and GitLab repositories.
+Provides functionality to analyze code changes and generate reports.
+"""
+
+from datetime import datetime, timedelta
+import json
+from pathlib import Path
+from utils.remote_repository_analyzer import RemoteRepositoryAnalyzer
+
+def format_commit_for_json(commit):
+    """Format commit data for JSON serialization."""
+    return {
+        'hash': commit.hash,
+        'author': commit.author,
+        'date': commit.date.isoformat(),
+        'message': commit.message,
+        'files': commit.files,
+        'added_lines': commit.added_lines,
+        'deleted_lines': commit.deleted_lines,
+        'effective_lines': commit.effective_lines
+    }
+
+def save_analysis_results(output_path, commits, file_diffs, stats, show_diffs=False):
+    """
+    Save analysis results to a JSON file.
+    Args:
+        output_path: Path where to save the JSON file
+        commits: List of commit objects
+        file_diffs: Dictionary of file diffs
+        stats: Dictionary containing analysis statistics
+        show_diffs: Whether to include file diffs in the output
+    """
+    results = {
+        'summary': {
+            'total_commits': stats['total_commits'],
+            'total_files': len(stats['files_changed']),
+            'total_additions': stats['total_additions'],
+            'total_deletions': stats['total_deletions'],
+            'files_changed': sorted(stats['files_changed'])
+        },
+        'commits': [format_commit_for_json(commit) for commit in commits]
+    }
+
+    if show_diffs:
+        results['file_diffs'] = file_diffs
+
+    output_path = Path(output_path)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+
+    with open(output_path, 'w', encoding='utf-8') as f:
+        json.dump(results, f, indent=2, ensure_ascii=False)
+
+def analyze_repository(repo_url, author, days=7, include=None, exclude=None, token=None):
+    """
+    Analyze a Git repository and return the analysis results.
+
+    Args:
+        repo_url: URL of the repository to analyze
+        author: Author name or email to filter commits
+        days: Number of days to look back (default: 7)
+        include: List of file extensions to include
+        exclude: List of file extensions to exclude
+        token: GitHub/GitLab access token
+
+    Returns:
+        Tuple of (commits, file_diffs, stats)
+    """
+    end_date = datetime.now()
+    start_date = end_date - timedelta(days=days)
+
+    analyzer = RemoteRepositoryAnalyzer(repo_url, token)
+
+    return analyzer.get_file_diffs_by_timeframe(
+        author=author,
+        start_date=start_date,
+        end_date=end_date,
+        include_extensions=include,
+        exclude_extensions=exclude
+    ) 
diff --git a/codedog/chains/pr_summary/translate_pr_summary_chain.py b/codedog/chains/pr_summary/translate_pr_summary_chain.py
@@ -7,7 +7,7 @@
 from langchain.chains import LLMChain
 from langchain.output_parsers import OutputFixingParser, PydanticOutputParser
 from langchain_core.prompts import BasePromptTemplate
-from langchain_core.pydantic_v1 import Field
+from pydantic import Field
 
 from codedog.chains.pr_summary.base import PRSummaryChain
 from codedog.chains.pr_summary.prompts import CODE_SUMMARY_PROMPT, PR_SUMMARY_PROMPT