Discussion about Support for Multiple Programming Languages #86

Umpire2018 · 2024-12-30T03:39:48Z

@magaton @Major-wagh @sandeshchand @alikinir Adapted from issue #60 for continued discussion in this thread.

Background

Currently, the repository primarily supports Python: it uses jedi together with custom-built modules to construct the Repository Map and saves the result to .project_doc_record, which serves as the basis for subsequent features in the repoagent project. Supporting additional languages would require a substantial amount of work, so this project will not consider it in the short term. Developers with the necessary skills are encouraged to understand, modify, and adapt this functionality for their specific needs. The relevant code is:

RepoAgent/repo_agent/doc_meta_info.py

Lines 296 to 391 in 825d988

    
    def find_all_referencer(
        repo_path, variable_name, file_path, line_number, column_number, in_file_only=False
    ):
        """Copied over from the previous implementation."""
        script = jedi.Script(path=os.path.join(repo_path, file_path))
        try:
            if in_file_only:
                references = script.get_references(
                    line=line_number, column=column_number, scope="file"
                )
            else:
                references = script.get_references(line=line_number, column=column_number)
            # Keep only the references named variable_name and return their locations
            variable_references = [ref for ref in references if ref.name == variable_name]
            # if variable_name == "need_to_generate":
            #     import pdb; pdb.set_trace()
            return [
                (os.path.relpath(ref.module_path, repo_path), ref.line, ref.column)
                for ref in variable_references
                if not (ref.line == line_number and ref.column == column_number)
            ]
        except Exception as e:
            # Log the error message and the relevant parameters
            logger.error(f"Error occurred: {e}")
            logger.error(
                f"Parameters: variable_name={variable_name}, file_path={file_path}, line_number={line_number}, column_number={column_number}"
            )
            return []


    @dataclass
    class MetaInfo:
        repo_path: Path = ""  # type: ignore
        document_version: str = (
            ""  # Changes over time; "" means unfinished, otherwise a commit hash of the target repo
        )
        target_repo_hierarchical_tree: "DocItem" = field(
            default_factory=lambda: DocItem()
        )  # File structure of the whole repo
        white_list: Any[List] = None
        fake_file_reflection: Dict[str, str] = field(default_factory=dict)
        jump_files: List[str] = field(default_factory=list)
        deleted_items_from_older_meta: List[List] = field(default_factory=list)
        in_generation_process: bool = False
        checkpoint_lock: threading.Lock = threading.Lock()

        @staticmethod
        def init_meta_info(file_path_reflections, jump_files) -> MetaInfo:
            """Initialize MetaInfo from a repository path."""
            setting = SettingsManager.get_setting()
            project_abs_path = setting.project.target_repo
            print(
                f"{Fore.LIGHTRED_EX}Initializing MetaInfo: {Style.RESET_ALL}from {project_abs_path}"
            )
            file_handler = FileHandler(project_abs_path, None)
            repo_structure = file_handler.generate_overall_structure(
                file_path_reflections, jump_files
            )
            metainfo = MetaInfo.from_project_hierarchy_json(repo_structure)
            metainfo.repo_path = project_abs_path
            metainfo.fake_file_reflection = file_path_reflections
            metainfo.jump_files = jump_files
            return metainfo

        @staticmethod
        def from_checkpoint_path(checkpoint_dir_path: Path) -> MetaInfo:
            """Load MetaInfo from an existing metainfo directory."""
            setting = SettingsManager.get_setting()
            project_hierarchy_json_path = checkpoint_dir_path / "project_hierarchy.json"
            with open(project_hierarchy_json_path, "r", encoding="utf-8") as reader:
                project_hierarchy_json = json.load(reader)
            metainfo = MetaInfo.from_project_hierarchy_json(project_hierarchy_json)
            with open(
                checkpoint_dir_path / "meta-info.json", "r", encoding="utf-8"
            ) as reader:
                meta_data = json.load(reader)
                metainfo.repo_path = setting.project.target_repo
                metainfo.document_version = meta_data["doc_version"]
                metainfo.fake_file_reflection = meta_data["fake_file_reflection"]
                metainfo.jump_files = meta_data["jump_files"]
                metainfo.in_generation_process = meta_data["in_generation_process"]
                metainfo.deleted_items_from_older_meta = meta_data[
                    "deleted_items_from_older_meta"
                ]
            print(f"{Fore.CYAN}Loading MetaInfo:{Style.RESET_ALL} {checkpoint_dir_path}")
            return metainfo

Community Interest

Multiple users have expressed interest in extending this capability to additional programming languages and have provided valuable feedback. We sincerely appreciate their contributions and ideas. Below are some of my thoughts on this topic.

First, the aider project uses tree-sitter to implement its Repository Map feature. This approach has a limitation, however: because each programming language has its own unique features, supporting several languages at once requires developers with strong cross-language expertise.

I have personally attempted to use tree-sitter to mimic what jedi does when constructing reference relationships, but that effort has been shelved for now.

Additionally, RepoGraph has made progress in this area and published a related paper.
The work introduces an effective plug-in, repo-level module that provides the desired context and significantly enhances LLM-based AI software engineering capabilities.

Another approach is the Language Server Protocol (LSP). With implementations for many languages, LSP can assist in static analysis.
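As a rough illustration of what talking to a language server involves, the sketch below builds a `textDocument/references` request and frames it with the `Content-Length` header that LSP messages use. The helper names and the example file URI are made up for this sketch; actually starting a server and reading its response is left out. Note that LSP positions are zero-based, whereas jedi counts lines from 1.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Encode a JSON-RPC payload with the Content-Length header used by LSP."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def references_request(request_id: int, file_uri: str, line: int, character: int) -> dict:
    """Build a textDocument/references request (line and character are zero-based)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": file_uri},
            "position": {"line": line, "character": character},
            # Exclude the declaration itself, similar to how find_all_referencer
            # drops the position it was asked about.
            "context": {"includeDeclaration": False},
        },
    }


# Hypothetical file URI, used only for illustration.
message = frame_lsp_message(
    references_request(1, "file:///repo/src/main.go", line=10, character=4)
)
```

The resulting bytes would be written to the language server's stdin (or socket) after the usual `initialize` handshake; the same request works regardless of which language the server analyzes.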

Proposal

Based on the above, my personal suggestion is to adopt different implementations for different programming languages to assist in analysis. The results can then be stored in a unified format to integrate seamlessly with repoagent.
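To make the idea more concrete, here is a minimal sketch of what "different implementations, one unified format" could look like. Everything in it is an assumption for discussion: `SymbolRecord`, `ANALYZERS`, `register`, and `analyze` are hypothetical names, and the record fields are not RepoAgent's actual schema. The Python analyzer uses the standard-library `ast` module purely to keep the example self-contained; a real one could keep using jedi, and other languages could plug in tree-sitter or an LSP client.

```python
import ast
import json
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SymbolRecord:
    """Hypothetical language-neutral record; not RepoAgent's actual schema."""

    language: str
    file_path: str
    name: str
    kind: str  # "function" or "class" in this sketch
    line: int  # 1-based line where the definition starts


Analyzer = Callable[[str, str], List[SymbolRecord]]
ANALYZERS: Dict[str, Analyzer] = {}  # file extension -> analyzer


def register(extension: str):
    """Register an analyzer for one file extension."""

    def decorator(func: Analyzer) -> Analyzer:
        ANALYZERS[extension] = func
        return func

    return decorator


@register(".py")
def analyze_python(file_path: str, source: str) -> List[SymbolRecord]:
    """Collect function and class definitions with the stdlib ast module."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        records.append(SymbolRecord("python", file_path, node.name, kind, node.lineno))
    return sorted(records, key=lambda record: record.line)


def analyze(file_path: str, source: str) -> List[SymbolRecord]:
    """Dispatch on the file extension; unsupported languages yield no records."""
    analyzer = ANALYZERS.get(PurePosixPath(file_path).suffix)
    return analyzer(file_path, source) if analyzer else []


sample = "class Greeter:\n    def hello(self):\n        return 'hi'\n"
records = analyze("pkg/greeter.py", sample)
unified_json = json.dumps([asdict(record) for record in records], indent=2)
```

Adding a language would then mean registering one more analyzer that emits the same records, while the code that consumes `unified_json` stays unchanged.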

This is a preliminary idea intended to spark further discussion.


magaton · 2024-12-30T09:53:44Z

Hello, you mention RepoGraph. How is it different from, or better than, Aider's repomap?
This is what Aider brings:

support for all the languages from py-treesitter-languages
max-token as parameter
construct networkx graph based on the call references
use pagerank to detect most important nodes
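The last three points can be sketched roughly as follows. This is not Aider's code: it is a dependency-free stand-in that replaces networkx with a small power-iteration PageRank, and the call edges, token costs, and the `select_within_budget` helper are invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def pagerank(
    edges: Iterable[Tuple[str, str]], damping: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """Power-iteration PageRank over (caller, callee) edges."""
    edges = list(edges)
    nodes = sorted({name for edge in edges for name in edge})
    out_links: Dict[str, List[str]] = {name: [] for name in nodes}
    for caller, callee in edges:
        out_links[caller].append(callee)

    count = len(nodes)
    rank = {name: 1.0 / count for name in nodes}
    for _ in range(iterations):
        # Nodes without outgoing edges spread their rank evenly over all nodes.
        dangling = sum(rank[name] for name in nodes if not out_links[name])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {name: base for name in nodes}
        for caller in nodes:
            for callee in out_links[caller]:
                new_rank[callee] += damping * rank[caller] / len(out_links[caller])
        rank = new_rank
    return rank


def select_within_budget(
    rank: Dict[str, float], token_cost: Dict[str, int], max_tokens: int
) -> List[str]:
    """Greedily keep the highest-ranked symbols that still fit the token budget."""
    selected, used = [], 0
    for name in sorted(rank, key=lambda n: (-rank[n], n)):
        if used + token_cost[name] <= max_tokens:
            selected.append(name)
            used += token_cost[name]
    return selected


# Invented call graph: each edge points from the caller to the callee.
call_edges = [
    ("main", "load_config"),
    ("main", "run"),
    ("run", "load_config"),
    ("test_run", "run"),
]
ranks = pagerank(call_edges)
# Invented per-symbol token costs.
costs = {"load_config": 40, "run": 30, "main": 50, "test_run": 20}
selected = select_within_budget(ranks, costs, max_tokens=80)
```

In this toy graph `load_config` is referenced most, so it ranks highest and is kept first when the budget is tight.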

Umpire2018 added the enhancement New feature or request label Dec 30, 2024

Umpire2018 mentioned this issue Jan 2, 2025

feat(file_handler) replace python.ast with tree-sitter to parse python #87

Open
