Commit 9cc33d0

Support/Fix issues 15, 16, and 17
Signed-off-by: Rahul Krishna <[email protected]>
1 parent 4af838c commit 9cc33d0

384 files changed (+198433 / -242 lines)

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+

CHANGELOG.md

Lines changed: 107 additions & 0 deletions
@@ -5,6 +5,113 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.10] - 2025-07-20
+
+### Added
+- Ray distributed processing support for parallel symbol table generation (addresses [#16](https://github.com/codellm-devkit/codeanalyzer-python/issues/16))
+- `--ray/--no-ray` CLI flag to enable/disable Ray-based distributed analysis
+- `--skip-tests/--include-tests` CLI flag to control whether test files are analyzed (improves analysis performance)
+- `--file-name` CLI flag for single file analysis (addresses part of [#16](https://github.com/codellm-devkit/codeanalyzer-python/issues/16))
+- Incremental caching system with SHA256-based file change detection
+- Automatic caching of analysis results to `analysis_cache.json`
+- File-level caching with content hash validation to avoid re-analyzing unchanged files
+- Significant performance improvements for subsequent analysis runs
+- Cache reuse statistics logging
+- Custom exception classes for better error handling in symbol table building:
+  - `SymbolTableBuilderException` (base exception)
+  - `SymbolTableBuilderFileNotFoundError` (file not found errors)
+  - `SymbolTableBuilderParsingError` (parsing errors)
+  - `SymbolTableBuilderRayError` (Ray processing errors)
+- Enhanced PyModule schema with metadata fields for caching:
+  - `last_modified` timestamp tracking
+  - `content_hash` for precise change detection
+- Progress bar support for both serial and parallel processing modes
+- Enhanced test fixtures including xarray project for comprehensive testing
+- Comprehensive `__init__.py` exports for syntactic analysis module
+- Smart dependency installation with conditional logic:
+  - Only installs requirements files when they exist (requirements.txt, requirements-dev.txt, dev-requirements.txt, test-requirements.txt)
+  - Only performs editable installation when package definition files are present (pyproject.toml, setup.py, setup.cfg)
+- Improved virtual environment setup with better dependency detection and installation logic
+
+### Changed
+- **BREAKING CHANGE**: Updated Python version requirement from `>=3.10` to `>=3.9` for broader compatibility (closes [#17](https://github.com/codellm-devkit/codeanalyzer-python/issues/17))
+- **BREAKING CHANGE**: Updated dependency versions with more conservative constraints for better stability:
+  - `pydantic` downgraded from `>=2.11.7` to `>=1.8.0,<2.0.0` for stability
+  - `pandas` constrained to `>=1.3.0,<2.0.0`
+  - `numpy` constrained to `>=1.21.0,<1.24.0`
+  - `rich` constrained to `>=12.6.0,<14.0.0`
+  - `typer` constrained to `>=0.9.0,<1.0.0`
+  - Other dependencies updated with conservative version ranges for better compatibility
+- Major Architecture Enhancement: Complete rewrite of analysis caching system
+- `analyze()` method now implements intelligent caching with PyApplication serialization
+- Symbol table building redesigned to support incremental updates and cache reuse
+- File change detection using SHA256 content hashing for maximum accuracy
+- Enhanced `Codeanalyzer` constructor signature to accept `file_name` parameter for single file analysis
+- Refactored symbol table building from monolithic `build()` method to cache-aware file-level processing
+- Enhanced `Codeanalyzer` constructor signature to accept `skip_tests` and `using_ray` parameters
+- Improved error handling with proper context managers in core analyzer
+- Updated CLI to use Pydantic v1 compatible JSON serialization methods
+- Reorganized syntactic analysis module structure with proper exception handling and exports
+- Enhanced virtual environment detection with better fallback mechanisms
+- Symbol table builder now sets metadata fields (`last_modified`, `content_hash`) for all PyModule objects
+
+### Fixed
+- Fixed critical symbol table bug for nested functions (closes [#15](https://github.com/codellm-devkit/codeanalyzer-python/issues/15))
+- Corrected `_callables()` method recursion logic to properly capture both outer and inner functions
+- Previously, only inner/nested functions were being captured in the symbol table
+- Now correctly processes module-level functions, class methods, and all nested function definitions
+- Fixed nested method/function signature generation in symbol table builder
+- Corrected `_callables()` method to properly build fully qualified signatures for nested structures
+- Fixed issue where nested functions and methods were getting incorrect signatures (e.g., `main.__init__` instead of `main.outer_function.NestedClass.__init__`)
+- Added `prefix` parameter to `_callables()` and `_add_class()` methods to maintain proper nesting context
+- Signatures now correctly reflect the full nested hierarchy (e.g., `main.outer_function.NestedClass.nested_class_method.method_nested_function`)
+- Updated class method processing to pass class signature as prefix to nested callable processing
+- Improved path relativization to project directory for cleaner signature generation
+- Fixed Pydantic v2 compatibility issues by reverting to v1 API (`json()` instead of `model_dump_json()`)
+- Fixed missing import statements and type annotations throughout the codebase
+- Fixed symbol table builder to support individual file processing for distributed execution
+- Improved error handling in virtual environment detection and Python interpreter resolution
+- Fixed schema type annotations to use proper string keys for better serialization
+- Enhanced import ordering and removed unnecessary blank lines in CLI module
+- Improved virtual environment setup reliability:
+  - Fixed unnecessary pip installs by adding conditional logic to only install when dependencies are available
+  - Only attempts to install requirements files if they actually exist in the project
+  - Only performs editable installation when package definition files are present
+  - Prevents errors and warnings from attempting to install non-existent dependencies
+
+### Technical Details
+- Added Ray as a core dependency for distributed computing capabilities (addresses [#16](https://github.com/codellm-devkit/codeanalyzer-python/issues/16))
+- Implemented `@ray.remote` decorator for parallel file processing
+- Comprehensive caching system implementation:
+  - `_load_pyapplication_from_cache()` and `_save_analysis_cache()` methods for PyApplication serialization
+  - `_file_unchanged()` method with SHA256 content hash validation
+  - Cache-aware symbol table building with selective file processing
+  - Automatic cache statistics and performance reporting
+- Enhanced progress tracking for both serial and parallel execution modes with Rich progress bars
+- Updated schema to use `Dict[str, PyModule]` instead of `dict[Path, PyModule]` for better serialization
+- Extended PyModule schema with optional `last_modified` and `content_hash` fields for caching metadata
+- Added comprehensive exception hierarchy for better error classification and handling
+- Refactored symbol table building into modular, file-level processing suitable for distribution
+- Enhanced Python interpreter detection with support for multiple version managers (pyenv, conda, asdf)
+- Added `hashlib` integration for file content hashing throughout the codebase
+- Enhanced virtual environment setup logic:
+  - Modified `_add_class()` method to accept `prefix` parameter and pass class signature to method processing
+  - Updated `_callables()` method signature to include `prefix` parameter for nested context tracking
+  - Enhanced signature building logic to use prefix when available, falling back to Jedi resolution for top-level definitions
+  - Fixed recursive calls to pass current signature as prefix for proper nesting hierarchy
+  - Implemented conditional dependency installation with existence checks for requirements files and package definition files
+
+### Notes
+- This release significantly addresses the performance improvements requested in [#16](https://github.com/codellm-devkit/codeanalyzer-python/issues/16):
+  - ✅ Ray parallelization implemented
+  - ✅ Incremental caching with SHA256-based change detection implemented
+  - `--file-name` option for single-file analysis implemented
+  - `--nproc` options not yet included (still uses all available cores with Ray)
+- ✅ Critical bug fix for nested function detection ([#15](https://github.com/codellm-devkit/codeanalyzer-python/issues/15)) is now included in this version
+- Expected performance improvements: 2-10x faster on subsequent runs depending on code change frequency
+- Enhanced symbol table accuracy ensures all function definitions are properly captured
+- Virtual environment setup is now more robust and only installs dependencies when they are actually available
+
 ## [0.1.9] - 2025-07-14
 
 ### Fixed
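The changelog describes SHA256-based change detection (`_file_unchanged()`, a `content_hash` field on `PyModule`) without showing code. A minimal sketch of the idea, assuming the cache is a plain dict keyed by file path string with a `content_hash` entry per file; `content_hash` and `partition_by_cache` are hypothetical helpers, not the project's API:

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def content_hash(path: Path) -> str:
    """Return the SHA256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 64 KiB chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def partition_by_cache(
    files: List[Path], cache: Dict[str, Dict[str, str]]
) -> Tuple[List[Path], List[Path]]:
    """Split files into (reusable, needs_analysis).

    `cache` is an assumed shape: file path string -> cached metadata.
    Only the `content_hash` key is consulted here.
    """
    reusable: List[Path] = []
    needs_analysis: List[Path] = []
    for path in files:
        entry = cache.get(str(path))
        if entry is not None and entry.get("content_hash") == content_hash(path):
            reusable.append(path)  # identical bytes: reuse the cached result
        else:
            needs_analysis.append(path)  # new or modified file
    return reusable, needs_analysis
```

Comparing content hashes rather than modification times means a file that was touched but not edited is still reused, at the cost of reading every file once per run.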
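The schema change from `dict[Path, PyModule]` to `Dict[str, PyModule]` is credited to serialization. The standard-library encoder, which Pydantic v1's `json()` uses by default, shows why: `json.dumps` rejects `Path` objects as object keys, and its `default` hook is only applied to values, never to keys. The data below is a made-up stand-in for a symbol table:

```python
import json
from pathlib import Path

# Hypothetical symbol table keyed by Path, like the old dict[Path, PyModule].
by_path = {Path("main.py"): {"functions": ["main.outer_function"]}}

try:
    json.dumps(by_path)
    path_keys_serialize = True
except TypeError:
    # Object keys must be str, int, float, bool or None.
    path_keys_serialize = False

# Converting keys to str (the Dict[str, PyModule] shape) serializes cleanly.
by_str = {str(p): module for p, module in by_path.items()}
encoded = json.dumps(by_str)
```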
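The #15 fix threads a `prefix` through `_callables()` and `_add_class()` so each nested definition is named relative to the one enclosing it. According to the changelog, the project falls back to Jedi for top-level names; the sketch below swaps in the standard-library `ast` module and a hypothetical `qualified_names` function to show only the prefix-passing recursion, using the nesting from the changelog's own examples:

```python
import ast
from typing import List

DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def qualified_names(node: ast.AST, prefix: str) -> List[str]:
    """Collect dotted names for every function/class defined under `node`."""
    names: List[str] = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, DEFINITIONS):
            qualified = f"{prefix}.{child.name}"
            names.append(qualified)  # record the outer definition itself...
            names.extend(qualified_names(child, qualified))  # ...then its nested ones
        else:
            # Look inside if/for/try/with bodies without extending the prefix.
            names.extend(qualified_names(child, prefix))
    return names


SOURCE = '''
def outer_function():
    class NestedClass:
        def __init__(self):
            pass

        def nested_class_method(self):
            def method_nested_function():
                pass

    return NestedClass
'''

signatures = qualified_names(ast.parse(SOURCE), "main")
```

Recording the current definition before recursing is what keeps outer functions in the result; the bug described above captured only the inner ones.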
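The conditional dependency installation can be expressed as a planning step that inspects the project before pip runs. This sketch uses the file names listed in the changelog; `plan_installs` is hypothetical and only builds the commands, leaving execution (for example `subprocess.run(cmd, check=True)` with the virtual environment's interpreter) to the caller:

```python
from pathlib import Path
from typing import List

# File names taken from the changelog entries above.
REQUIREMENTS_FILES = (
    "requirements.txt",
    "requirements-dev.txt",
    "dev-requirements.txt",
    "test-requirements.txt",
)
PACKAGE_DEFINITION_FILES = ("pyproject.toml", "setup.py", "setup.cfg")


def plan_installs(project_dir: Path, python: str = "python") -> List[List[str]]:
    """Return the pip commands worth running for `project_dir`, in order."""
    commands: List[List[str]] = []
    for name in REQUIREMENTS_FILES:
        requirements = project_dir / name
        if requirements.is_file():  # skip files the project doesn't have
            commands.append([python, "-m", "pip", "install", "-r", str(requirements)])
    if any((project_dir / name).is_file() for name in PACKAGE_DEFINITION_FILES):
        # Editable install only when there is something installable.
        commands.append([python, "-m", "pip", "install", "-e", str(project_dir)])
    return commands
```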

codeanalyzer/__main__.py

Lines changed: 40 additions & 14 deletions
@@ -1,13 +1,12 @@
 from pathlib import Path
-from typing import Annotated, Optional
+from typing import Optional, Annotated
 
 import typer
 
 from codeanalyzer.core import Codeanalyzer
 from codeanalyzer.utils import _set_log_level, logger
 from codeanalyzer.config import OutputFormat
 
-
 def main(
     input: Annotated[
         Path, typer.Option("-i", "--input", help="Path to the project root directory.")
@@ -32,25 +31,45 @@ def main(
     using_codeql: Annotated[
         bool, typer.Option("--codeql/--no-codeql", help="Enable CodeQL-based analysis.")
     ] = False,
+    using_ray: Annotated[
+        bool,
+        typer.Option(
+            "--ray/--no-ray", help="Enable Ray for distributed analysis."
+        ),
+    ] = False,
     rebuild_analysis: Annotated[
         bool,
         typer.Option(
             "--eager/--lazy",
             help="Enable eager or lazy analysis. Defaults to lazy.",
         ),
     ] = False,
+    skip_tests: Annotated[
+        bool,
+        typer.Option(
+            "--skip-tests/--include-tests",
+            help="Skip test files in analysis.",
+        ),
+    ] = True,
+    file_name: Annotated[
+        Optional[Path],
+        typer.Option(
+            "--file-name",
+            help="Analyze only the specified file (relative to input directory).",
+        ),
+    ] = None,
     cache_dir: Annotated[
         Optional[Path],
         typer.Option(
             "-c",
             "--cache-dir",
-            help="Directory to store analysis cache.",
+            help="Directory to store analysis cache. Defaults to '.codeanalyzer' in the input directory.",
         ),
     ] = None,
     clear_cache: Annotated[
         bool,
-        typer.Option("--clear-cache/--keep-cache", help="Clear cache after analysis."),
-    ] = True,
+        typer.Option("--clear-cache/--keep-cache", help="Clear cache after analysis. By default, cache is retained."),
+    ] = False,
     verbosity: Annotated[
         int, typer.Option("-v", count=True, help="Increase verbosity: -v, -vv, -vvv")
     ] = 0,
@@ -62,21 +81,28 @@ def main(
         logger.error(f"Input path '{input}' does not exist.")
         raise typer.Exit(code=1)
 
+    # Validate file_name if provided
+    if file_name is not None:
+        full_file_path = input / file_name
+        if not full_file_path.exists():
+            logger.error(f"Specified file '{file_name}' does not exist in '{input}'.")
+            raise typer.Exit(code=1)
+        if not full_file_path.is_file():
+            logger.error(f"Specified path '{file_name}' is not a file.")
+            raise typer.Exit(code=1)
+        if not str(file_name).endswith('.py'):
+            logger.error(f"Specified file '{file_name}' is not a Python file (.py).")
+            raise typer.Exit(code=1)
+
     with Codeanalyzer(
-        input, analysis_level, using_codeql, rebuild_analysis, cache_dir, clear_cache
+        input, analysis_level, skip_tests, using_codeql, rebuild_analysis, cache_dir, clear_cache, using_ray, file_name
     ) as analyzer:
         artifacts = analyzer.analyze()
 
         # Handle output based on format
         if output is None:
             # Output to stdout (only for JSON)
-            if format == OutputFormat.JSON:
-                print(artifacts.model_dump_json(separators=(",", ":")))
-            else:
-                logger.error(
-                    f"Format '{format.value}' requires an output directory (use -o/--output)"
-                )
-                raise typer.Exit(code=1)
+            print(artifacts.json(separators=(",", ":")))
         else:
             # Output to file
             output.mkdir(parents=True, exist_ok=True)
@@ -88,7 +114,7 @@ def _write_output(artifacts, output_dir: Path, format: OutputFormat):
     if format == OutputFormat.JSON:
         output_file = output_dir / "analysis.json"
         # Use Pydantic's json() with separators for compact output
-        json_str = artifacts.model_dump_json(indent=None)
+        json_str = artifacts.json(indent=None)
         with output_file.open("w") as f:
             f.write(json_str)
         logger.info(f"Analysis saved to {output_file}")
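The three `--file-name` checks in `main()` depend on Typer only for reporting. A hypothetical `validate_file_name` helper that returns the error message instead of exiting keeps the same checks and messages, but can be unit-tested without invoking the CLI:

```python
from pathlib import Path
from typing import Optional


def validate_file_name(input_dir: Path, file_name: Path) -> Optional[str]:
    """Return an error message for an unusable --file-name, or None if it is valid."""
    full_file_path = input_dir / file_name
    if not full_file_path.exists():
        return f"Specified file '{file_name}' does not exist in '{input_dir}'."
    if not full_file_path.is_file():
        return f"Specified path '{file_name}' is not a file."
    if not str(file_name).endswith(".py"):
        return f"Specified file '{file_name}' is not a Python file (.py)."
    return None
```

In `main()`, a non-`None` result would be passed to `logger.error` before `raise typer.Exit(code=1)`, preserving the current behavior.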
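Pydantic v1's `BaseModel.json()` forwards extra keyword arguments to `json.dumps`, so the two output paths in this diff produce different text. The stdout branch passes `separators=(",", ":")`, while `_write_output` passes only `indent=None`, which keeps the default `", "` and `": "` separators even though its comment mentions compact output. The standard library alone shows the difference, with `payload` as a made-up stand-in for the serialized artifacts:

```python
import json

# Hypothetical stand-in for the data artifacts.json() would serialize.
payload = {"symbol_table": {"main.py": {"functions": ["main.outer_function"]}}}

# What the stdout branch requests: no whitespace after ',' or ':'.
compact = json.dumps(payload, separators=(",", ":"))

# What _write_output requests: indent=None keeps the default ', ' and ': '.
single_line = json.dumps(payload, indent=None)
```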
