-
-
Notifications
You must be signed in to change notification settings - Fork 102
Add a Command to Sync External OWASP Repositories #1656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add a Command to Sync External OWASP Repositories #1656
Conversation
Summary by CodeRabbit
Summary by CodeRabbit
WalkthroughA new Django management command was introduced to synchronize external OWASP-related GitHub organizations and their repositories. A corresponding Makefile target was added to invoke this command, and it was integrated into the broader data update process. The command fetches organizations, retrieves their repositories, and syncs repository data, handling errors and logging progress. Changes
Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes were found. 📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
🧪 Generate Unit Tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
I am yet to add tests for the command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/Makefile
(2 hunks)backend/apps/github/management/commands/github_update_external_repositories.py
(1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
backend/apps/github/management/commands/github_update_external_repositories.py
[refactor] 24-24: Too many local variables (20/15)
(R0914)
[refactor] 19-19: Too few public methods (1/2)
(R0903)
🔇 Additional comments (4)
backend/Makefile (1)
61-64
: LGTM! Consistent integration of the new command.The new Makefile target follows the established pattern and is properly integrated into the data update workflow.
backend/apps/github/management/commands/github_update_external_repositories.py (3)
1-17
: LGTM! Clean imports and setup.The imports are well-organized and appropriate for the command's functionality.
40-43
: LGTM! Appropriate organization filtering logic.The query correctly filters for OWASP-related organizations while excluding the main OWASP organization.
77-81
: Address missing multiple projects detection logic.According to the PR objectives, the command should detect and handle cases where a repository is linked to multiple projects. The current implementation doesn't handle this scenario.
Based on the PR objectives, this logic needs to be implemented. Can you clarify how to detect multiple linked projects using the Repository model?
#!/bin/bash # Search for Repository model and related project relationships ast-grep --pattern 'class Repository { $$$ }' # Search for methods that might return project relationships rg -A 5 "def.*project" --type py
backend/apps/github/management/commands/github_update_external_repositories.py
Outdated
Show resolved
Hide resolved
backend/apps/github/management/commands/github_update_external_repositories.py
Outdated
Show resolved
Hide resolved
def handle(self, *_args, **options) -> None: | ||
"""Handle the command execution. | ||
Args: | ||
*_args: Variable length argument list. | ||
**options: Arbitrary keyword arguments containing command options. | ||
""" | ||
try: | ||
gh = github.Github(os.getenv("GITHUB_TOKEN"), per_page=GITHUB_ITEMS_PER_PAGE) | ||
except BadCredentialsException: | ||
logger.warning( | ||
"Invalid GitHub token. Please create and update .env file with a valid token." | ||
) | ||
return | ||
|
||
external_organizations = Organization.objects.filter( | ||
is_owasp_related_organization=True | ||
).exclude(login=OWASP_ORGANIZATION_NAME) | ||
org_count = external_organizations.count() | ||
|
||
synced_projects = [] | ||
for org_idx, ext_org in enumerate(external_organizations): | ||
print(f"Processing organization {org_idx + 1}/{org_count}: {ext_org.login}") | ||
try: | ||
gh_organization = gh.get_organization(ext_org.login) | ||
except Exception: | ||
logger.exception("Failed fetching GitHub org %s", ext_org.login) | ||
continue | ||
|
||
try: | ||
gh_repositories = gh_organization.get_repos( | ||
type="public", | ||
sort="created", | ||
direction="desc", | ||
) | ||
gh_repositories_count = gh_repositories.totalCount | ||
except Exception: | ||
logger.exception("Failed fetching GitHub repository for org %s", ext_org.login) | ||
continue | ||
|
||
for repo_idx, gh_repository in enumerate(gh_repositories): | ||
entity_key = gh_repository.name.lower() | ||
org_key = ext_org.login.lower() | ||
repository_url = f"https://github.com/{org_key}/{entity_key}" | ||
print(f"{repo_idx + 1}/{gh_repositories_count}: {repository_url}") | ||
|
||
try: | ||
organization, repository = sync_repository(gh_repository) | ||
except Exception: | ||
logger.exception("Error syncing repository %s", repository_url) | ||
continue | ||
|
||
project = repository.project | ||
if project: | ||
project.repositories.add(repository) | ||
synced_projects.append(project) | ||
|
||
Project.bulk_save(synced_projects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Address missing command-line arguments and refactor for better structure.
Based on the PR objectives, this command should support --repository
and --offset
arguments like related commands. Additionally, the static analysis correctly identifies too many local variables.
Consider refactoring to:
- Add command-line argument support
- Extract helper methods to reduce local variables in the main handle method
- Add pagination support with offset parameter
+ def add_arguments(self, parser):
+ """Add command arguments."""
+ parser.add_argument(
+ '--repository',
+ type=str,
+ help='Process only the specified repository'
+ )
+ parser.add_argument(
+ '--offset',
+ type=int,
+ default=0,
+ help='Skip the first N repositories'
+ )
Also consider extracting methods like:
_initialize_github_client()
_process_organization(gh, org, options)
_process_repository(gh_repository, org)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def handle(self, *_args, **options) -> None: | |
"""Handle the command execution. | |
Args: | |
*_args: Variable length argument list. | |
**options: Arbitrary keyword arguments containing command options. | |
""" | |
try: | |
gh = github.Github(os.getenv("GITHUB_TOKEN"), per_page=GITHUB_ITEMS_PER_PAGE) | |
except BadCredentialsException: | |
logger.warning( | |
"Invalid GitHub token. Please create and update .env file with a valid token." | |
) | |
return | |
external_organizations = Organization.objects.filter( | |
is_owasp_related_organization=True | |
).exclude(login=OWASP_ORGANIZATION_NAME) | |
org_count = external_organizations.count() | |
synced_projects = [] | |
for org_idx, ext_org in enumerate(external_organizations): | |
print(f"Processing organization {org_idx + 1}/{org_count}: {ext_org.login}") | |
try: | |
gh_organization = gh.get_organization(ext_org.login) | |
except Exception: | |
logger.exception("Failed fetching GitHub org %s", ext_org.login) | |
continue | |
try: | |
gh_repositories = gh_organization.get_repos( | |
type="public", | |
sort="created", | |
direction="desc", | |
) | |
gh_repositories_count = gh_repositories.totalCount | |
except Exception: | |
logger.exception("Failed fetching GitHub repository for org %s", ext_org.login) | |
continue | |
for repo_idx, gh_repository in enumerate(gh_repositories): | |
entity_key = gh_repository.name.lower() | |
org_key = ext_org.login.lower() | |
repository_url = f"https://github.com/{org_key}/{entity_key}" | |
print(f"{repo_idx + 1}/{gh_repositories_count}: {repository_url}") | |
try: | |
organization, repository = sync_repository(gh_repository) | |
except Exception: | |
logger.exception("Error syncing repository %s", repository_url) | |
continue | |
project = repository.project | |
if project: | |
project.repositories.add(repository) | |
synced_projects.append(project) | |
Project.bulk_save(synced_projects) | |
def add_arguments(self, parser): | |
"""Add command arguments.""" | |
parser.add_argument( | |
'--repository', | |
type=str, | |
help='Process only the specified repository' | |
) | |
parser.add_argument( | |
'--offset', | |
type=int, | |
default=0, | |
help='Skip the first N repositories' | |
) | |
def handle(self, *_args, **options) -> None: | |
"""Handle the command execution. | |
Args: | |
*_args: Variable length argument list. | |
**options: Arbitrary keyword arguments containing command options. | |
""" | |
try: | |
gh = github.Github(os.getenv("GITHUB_TOKEN"), per_page=GITHUB_ITEMS_PER_PAGE) | |
except BadCredentialsException: | |
logger.warning( | |
"Invalid GitHub token. Please create and update .env file with a valid token." | |
) | |
return | |
external_organizations = Organization.objects.filter( | |
is_owasp_related_organization=True | |
).exclude(login=OWASP_ORGANIZATION_NAME) | |
org_count = external_organizations.count() | |
synced_projects = [] | |
for org_idx, ext_org in enumerate(external_organizations): | |
print(f"Processing organization {org_idx + 1}/{org_count}: {ext_org.login}") | |
try: | |
gh_organization = gh.get_organization(ext_org.login) | |
except Exception: | |
logger.exception("Failed fetching GitHub org %s", ext_org.login) | |
continue | |
try: | |
gh_repositories = gh_organization.get_repos( | |
type="public", | |
sort="created", | |
direction="desc", | |
) | |
gh_repositories_count = gh_repositories.totalCount | |
except Exception: | |
logger.exception("Failed fetching GitHub repository for org %s", ext_org.login) | |
continue | |
for repo_idx, gh_repository in enumerate(gh_repositories): | |
entity_key = gh_repository.name.lower() | |
org_key = ext_org.login.lower() | |
repository_url = f"https://github.com/{org_key}/{entity_key}" | |
print(f"{repo_idx + 1}/{gh_repositories_count}: {repository_url}") | |
try: | |
organization, repository = sync_repository(gh_repository) | |
except Exception: | |
logger.exception("Error syncing repository %s", repository_url) | |
continue | |
project = repository.project | |
if project: | |
project.repositories.add(repository) | |
synced_projects.append(project) | |
Project.bulk_save(synced_projects) |
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 24-24: Too many local variables (20/15)
(R0914)
🤖 Prompt for AI Agents
In
backend/apps/github/management/commands/github_update_external_repositories.py
from lines 24 to 82, the handle method lacks support for --repository and
--offset command-line arguments and contains too many local variables, making it
hard to maintain. To fix this, add argument parsing for --repository and
--offset in the add_arguments method, then refactor handle by extracting helper
methods such as _initialize_github_client to create the GitHub client,
_process_organization to handle each organization including pagination with the
offset parameter, and _process_repository to sync individual repositories. This
will reduce local variables in handle, improve readability, and add the required
command-line argument support and pagination functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkid15r What flags should the command have?
since we are going to fetch a lot of organization's repositories, adding an offset
may complicate things or cause confusion.
adding a flag for repository
also doesn't make sense to me.
However, we can add a flag for organization
. Then only fetch that organization.
I'm currently working on adding tests. Will mark the PR as ready once done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
backend/apps/github/management/commands/github_update_external_repositories.py
Outdated
Show resolved
Hide resolved
backend/apps/github/management/commands/github_update_external_repositories.py
Outdated
Show resolved
Hide resolved
|
Resolves #1538
Added a command that syncs external OWASP GitHub repositories.
Example Change:
Notes for reviewers:
--repository
or--offset
as arguments. However, I'm not sure how to add these in this command.Repository
model to get the parent project, however, the current existing implementation only returns one project. The issue stated that if more than one project then we must log and skip. However, I don't know how to know if there are multiple projects linked.