Skip to content

Conversation

@weizhang25
Copy link
Collaborator

A utility function that scans all files under openjudge/graders/ folder, find all grader classes, extract and return the core metadata of these graders.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @weizhang25, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new module designed to programmatically gather essential information about all Grader classes within the openjudge/graders directory. This functionality is crucial for dynamic introspection and management of available grading mechanisms, enabling automated discovery and utilization of grader capabilities without manual configuration. It provides a structured way to access key details of each grader, facilitating future development and integration.

Highlights

  • New Grader Information Utility: A new utility function, get_all_grader_info, has been introduced to dynamically discover and extract core metadata from all Grader classes located under the openjudge/graders/ directory.
  • AST-based Parsing: The utility leverages Python's ast module to parse Python source files, identify Grader classes (by tracing inheritance from a 'BaseGrader' seed), and extract details such as class names, parent classes, and the signatures and docstrings of their __init__ and aevaluate methods.
  • Structured Metadata Storage: Two internal helper classes, _MethodInfo and _GraderInfo, are defined to provide a structured way to store the extracted method and grader metadata, respectively.
  • Unit Tests: A new unit test file, tests/utils/test_grader_info.py, has been added to ensure the correct functionality and data extraction of the get_all_grader_info utility.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a utility to gather information about grader classes by statically analyzing the source code, which is a solid approach. The code is well-structured, but I've identified a few areas for improvement. The most significant issue is the brittle method for parsing function signatures, which could fail on complex cases. I've also provided suggestions to enhance code clarity by simplifying file traversal, improve correctness by fixing type hints and flawed test assumptions, and add a valuable feature by including the file path in the collected grader information.

Comment on lines +168 to +195
segment = _NEWLINE_OR_MULTI_SPACE_PATTERN.sub(" ", segment.strip())
segment = segment.replace(") :", "):").replace(") ->", ")->").replace(" :", ":")

# Method head ends in two ways: w/ or w/o return type annocation.
# Figure it out.
idx0 = segment.find("):")
idx1 = segment.find(")->")
if idx1 > 0:
idx1 = segment.find(":", idx1)

if idx0 > 0 and idx1 > 0:
if idx0 < idx1:
idx = idx0 + 2
else:
idx = idx1 + 1
elif idx0 > 0:
idx = idx0 + 2
elif idx1 > 0:
idx = idx1 + 1
else:
idx = -1

# def foo(...) -> type:
# def foo(...):
if idx > 0:
signature = segment[0:idx]
else:
signature = "SIGNATURE_NOT_FOUND"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current method of parsing the function signature is brittle. Applying a regex to the entire function's source segment can corrupt string literals within the function body. The subsequent string searching logic is also fragile and may fail with multi-line signatures or complex default arguments.

A more robust approach is to first isolate the signature portion of the code before performing any transformations. This can be done by finding the colon that terminates the signature while respecting parentheses, which prevents incorrect matches inside arguments.

            # Find the colon that ends the function signature by tracking parenthesis depth.
            # This is more robust than simple string searching.
            paren_level = 0
            end_of_sig_idx = -1
            for i, char in enumerate(segment):
                if char == '(':
                    paren_level += 1
                elif char == ')':
                    paren_level -= 1
                elif char == ':' and paren_level == 0:
                    end_of_sig_idx = i
                    break

            if end_of_sig_idx != -1:
                # Extract just the signature part, up to and including the colon.
                signature_text = segment[:end_of_sig_idx + 1]
                # Now it's safe to clean up whitespace.
                signature = _NEWLINE_OR_MULTI_SPACE_PATTERN.sub(" ", signature_text.strip())
                signature = signature.replace(" )", ")").replace(" :", ":")
            else:
                signature = "SIGNATURE_NOT_FOUND"

assert len(gi.get("class_name")) > 0, gi
assert isinstance(gi.get("parent_class_names"), list), gi
assert len(gi.get("parent_class_names")) > 0, gi
assert len(gi.get("init_method").get("signature")) > 0, gi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This assertion incorrectly assumes that every grader class will define its own __init__ method. A grader can inherit __init__ from its base class without redefining it. In such valid cases, the AST parser won't find a local __init__ definition, its signature will be an empty string, and this test will fail. This assertion should be removed to ensure the test correctly handles valid graders that don't implement a custom __init__.

Comment on lines +28 to +35
def __init__(
self,
file_path: str = "",
class_name: str = "",
parent_class_names: list = None,
init_method: _MethodInfo = None,
aevaluate_method: _MethodInfo = None,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The __init__ method signature has incorrect type hints. For example, parent_class_names: list = None should be parent_class_names: Optional[List[str]] = None to be type-correct and explicit about None being a valid default. Using Optional from the typing module is the standard way to denote optional arguments.

Suggested change
def __init__(
self,
file_path: str = "",
class_name: str = "",
parent_class_names: list = None,
init_method: _MethodInfo = None,
aevaluate_method: _MethodInfo = None,
):
def __init__(
self,
file_path: str = "",
class_name: str = "",
parent_class_names: List[str] | None = None,
init_method: "_MethodInfo" | None = None,
aevaluate_method: "_MethodInfo" | None = None,
):

Comment on lines +66 to +77
unprocessed_folders = Queue()
unprocessed_folders.put(grader_root_folder)
while not unprocessed_folders.empty():
folder = unprocessed_folders.get()
for item in folder.iterdir():
if item.is_dir():
unprocessed_folders.put(item)
elif item.stem == "__init__" or item.suffix != ".py":
# use safe heuristics to reduce candidate count
continue
else:
_get_defs_of_classes_having_parent(item, defs_of_classes_having_parent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The manual, queue-based directory traversal to find Python files is more complex than necessary. It can be simplified significantly by using pathlib.Path.rglob, which makes the code more concise, readable, and idiomatic.

Suggested change
unprocessed_folders = Queue()
unprocessed_folders.put(grader_root_folder)
while not unprocessed_folders.empty():
folder = unprocessed_folders.get()
for item in folder.iterdir():
if item.is_dir():
unprocessed_folders.put(item)
elif item.stem == "__init__" or item.suffix != ".py":
# use safe heuristics to reduce candidate count
continue
else:
_get_defs_of_classes_having_parent(item, defs_of_classes_having_parent)
for py_file in grader_root_folder.rglob("*.py"):
if py_file.stem == "__init__":
continue
_get_defs_of_classes_having_parent(py_file, defs_of_classes_having_parent)

if isinstance(node, ast.ClassDef) and node.bases:
parent_count = len(node.bases)
if parent_count > 1 or node.bases[0].id != "ABC":
defs_of_classes_having_parent[node] = source_code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The file path of the grader is a valuable piece of information that is currently being discarded. The py_file path is available here and should be propagated through to the final _GraderInfo object.

This would require a series of changes:

  1. Here, store a tuple of (source_code, str(py_file)).
  2. Update _get_grader_class_def to handle and pass on this tuple.
  3. In get_all_grader_info, pass the file path to _parse_grader_class_def.
  4. _parse_grader_class_def should accept the file path and set it on the _GraderInfo object.
  5. The test in tests/utils/test_grader_info.py on line 22 (assert not gi["file_path"]) must be updated to assert that the path is correctly populated.
Suggested change
defs_of_classes_having_parent[node] = source_code
defs_of_classes_having_parent[node] = (source_code, str(py_file))

Copy link
Collaborator

@ployts ployts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@helloml0326 helloml0326 merged commit 07de4b5 into modelscope:main Jan 13, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants