Refactor project structure, enhance logic, update configurations, and improve code quality #85

Merged
1 change: 0 additions & 1 deletion .dockerignore
@@ -37,5 +37,4 @@ docs/
tests/
*.md
LICENSE
pytest.ini
setup.py
48 changes: 48 additions & 0 deletions .pre-commit-config.yaml
@@ -83,3 +83,51 @@ repos:
      - id: markdownlint
        description: "Lint markdown files."
        args: ["--disable=line-length"]

  - repo: https://github.com/terrencepreilly/darglint
    rev: v1.8.1
    hooks:
      - id: darglint
        name: darglint for source
        args: [--docstring-style=numpy]
        files: ^src/

  - repo: https://github.com/pycqa/pylint
    rev: v3.3.3
    hooks:
      - id: pylint
        name: pylint for source
        files: ^src/
        additional_dependencies:
          [
            click,
            fastapi-analytics,
            pytest-asyncio,
            python-dotenv,
            slowapi,
            starlette,
            tiktoken,
            uvicorn,
          ]
      - id: pylint
        name: pylint for tests
        files: ^tests/
        args:
          - --rcfile=tests/.pylintrc
        additional_dependencies:
          [
            click,
            fastapi-analytics,
            pytest,
            pytest-asyncio,
            python-dotenv,
            slowapi,
            starlette,
            tiktoken,
            uvicorn,
          ]

  - repo: meta
    hooks:
      - id: check-hooks-apply
      - id: check-useless-excludes
2 changes: 2 additions & 0 deletions README.md
@@ -4,9 +4,11 @@

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/cyclotruc/gitingest/blob/main/LICENSE)
[![PyPI version](https://badge.fury.io/py/gitingest.svg)](https://badge.fury.io/py/gitingest)
[![GitHub stars](https://img.shields.io/github/stars/cyclotruc/gitingest?style=social.svg)](https://github.com/cyclotruc/gitingest)
[![Downloads](https://pepy.tech/badge/gitingest)](https://pepy.tech/project/gitingest)
[![GitHub issues](https://img.shields.io/github/issues/cyclotruc/gitingest)](https://github.com/cyclotruc/gitingest/issues)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![Discord](https://dcbadge.limes.pink/api/server/https://discord.com/invite/zerRaGK9EC)](https://discord.com/invite/zerRaGK9EC)

Turn any Git repository into a prompt-friendly text ingest for LLMs.
63 changes: 63 additions & 0 deletions pyproject.toml
@@ -1,6 +1,60 @@
[project]
name = "gitingest"
version = "0.1.2"
description="CLI tool to analyze and create text dumps of codebases for LLMs"
readme = {file = "README.md", content-type = "text/markdown" }
requires-python = ">= 3.10"
dependencies = [
    "click>=8.0.0",
    "fastapi-analytics",
    "fastapi[standard]",
    "python-dotenv",
    "slowapi",
    "starlette",
    "tiktoken",
    "uvicorn",
]
license = {file = "LICENSE"}
authors = [{name = "Romain Courtois", email = "[email protected]"}]
classifiers=[
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
]

[project.scripts]
gitingest = "gitingest.cli:main"

[project.urls]
homepage = "https://gitingest.com"
github = "https://github.com/cyclotruc/gitingest"

[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools]
packages = {find = {where = ["src"]}}
include-package-data = true

# Linting configuration
[tool.pylint.format]
max-line-length = 119

[tool.pylint.'MESSAGES CONTROL']
disable = [
    "too-many-arguments",
    "too-many-positional-arguments",
    "too-many-locals",
    "too-few-public-methods",
    "broad-exception-caught",
    "duplicate-code",
]

[tool.pycln]
all = true

@@ -14,3 +68,12 @@ filter_files = true

[tool.black]
line-length = 119

# Test configuration
[tool.pytest.ini_options]
pythonpath = ["src"]
testpaths = ["tests/"]
python_files = "test_*.py"
asyncio_mode = "auto"
python_classes = "Test*"
python_functions = "test_*"
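
As a rough illustration of what the `asyncio_mode = "auto"` setting above enables: with pytest-asyncio in auto mode, a plain `async def test_*` function is collected and awaited without an explicit `@pytest.mark.asyncio` marker. The module below is a hypothetical sketch, not a file from this PR; `fetch_value` is an invented stand-in for real async code, and the file would need a name matching `test_*.py` under `tests/` to be discovered.

```python
import asyncio


async def fetch_value() -> int:
    """Hypothetical coroutine standing in for real async code under test."""
    await asyncio.sleep(0)  # Yield to the event loop once, like real async I/O would
    return 42


async def test_fetch_value() -> None:
    # No @pytest.mark.asyncio decorator: auto mode treats async tests as asyncio tests.
    assert await fetch_value() == 42
```

Outside pytest, the same coroutine can be driven by hand with `asyncio.run(test_fetch_value())`, which is a quick way to check the test body itself.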
8 changes: 0 additions & 8 deletions pytest.ini

This file was deleted.

4 changes: 3 additions & 1 deletion src/config.py
@@ -1,9 +1,11 @@
""" Configuration file for the project. """

MAX_DISPLAY_SIZE: int = 300_000
TMP_BASE_PATH: str = "/tmp/gitingest"
DELETE_REPO_AFTER: int = 60 * 60 # In seconds

EXAMPLE_REPOS: list[dict[str, str]] = [
    {"name": "Gitingest", "url": "https://github.com/cyclotruc/gitingest"},
    {"name": "GitIngest", "url": "https://github.com/cyclotruc/gitingest"},
    {"name": "FastAPI", "url": "https://github.com/tiangolo/fastapi"},
    {"name": "Flask", "url": "https://github.com/pallets/flask"},
    {"name": "Tldraw", "url": "https://github.com/tldraw/tldraw"},
2 changes: 2 additions & 0 deletions src/gitingest/__init__.py
@@ -1,3 +1,5 @@
""" gitingest: A package for ingesting data from git repositories. """

from gitingest.clone import clone_repo
from gitingest.ingest import ingest
from gitingest.ingest_from_query import ingest_from_query
6 changes: 5 additions & 1 deletion src/gitingest/cli.py
@@ -1,3 +1,7 @@
""" Command-line interface for the GitIngest package. """

# pylint: disable=no-value-for-parameter

import click

from gitingest.ingest import ingest
@@ -40,7 +44,7 @@ def main(

    Raises
    ------
    click.Abort
    Abort
        If there is an error during the execution of the command, this exception is raised to abort the process.
    """
    try:
41 changes: 17 additions & 24 deletions src/gitingest/clone.py
@@ -1,7 +1,8 @@
""" This module contains functions for cloning a Git repository to a local path. """

import asyncio
from dataclasses import dataclass

from gitingest.exceptions import AsyncTimeoutError
from gitingest.utils import async_timeout

CLONE_TIMEOUT: int = 20
@@ -59,11 +60,7 @@ async def clone_repo(config: CloneConfig) -> tuple[bytes, bytes]:
    Raises
    ------
    ValueError
        If the repository does not exist or if required query parameters are missing.
    RuntimeError
        If any git command fails during execution.
    AsyncTimeoutError
        If the cloning process exceeds the specified timeout.
        If the 'url' or 'local_path' parameters are missing, or if the repository is not found.
    """
    # Extract and validate query parameters
    url: str = config.url
@@ -81,29 +78,25 @@ async def clone_repo(config: CloneConfig) -> tuple[bytes, bytes]:
    if not await _check_repo_exists(url):
        raise ValueError("Repository not found, make sure it is public")

    try:
        if commit:
            # Scenario 1: Clone and checkout a specific commit
            # Clone the repository without depth to ensure full history for checkout
            clone_cmd = ["git", "clone", "--single-branch", url, local_path]
            await _run_git_command(*clone_cmd)

            # Checkout the specific commit
            checkout_cmd = ["git", "-C", local_path, "checkout", commit]
            return await _run_git_command(*checkout_cmd)
    if commit:
        # Scenario 1: Clone and checkout a specific commit
        # Clone the repository without depth to ensure full history for checkout
        clone_cmd = ["git", "clone", "--single-branch", url, local_path]
        await _run_git_command(*clone_cmd)

        if branch and branch.lower() not in ("main", "master"):
        # Checkout the specific commit
        checkout_cmd = ["git", "-C", local_path, "checkout", commit]
        return await _run_git_command(*checkout_cmd)

            # Scenario 2: Clone a specific branch with shallow depth
            clone_cmd = ["git", "clone", "--depth=1", "--single-branch", "--branch", branch, url, local_path]
            return await _run_git_command(*clone_cmd)
    if branch and branch.lower() not in ("main", "master"):

        # Scenario 3: Clone the default branch with shallow depth
        clone_cmd = ["git", "clone", "--depth=1", "--single-branch", url, local_path]
        # Scenario 2: Clone a specific branch with shallow depth
        clone_cmd = ["git", "clone", "--depth=1", "--single-branch", "--branch", branch, url, local_path]
        return await _run_git_command(*clone_cmd)

    except (RuntimeError, asyncio.TimeoutError, AsyncTimeoutError):
        raise  # Re-raise the exception
    # Scenario 3: Clone the default branch with shallow depth
    clone_cmd = ["git", "clone", "--depth=1", "--single-branch", url, local_path]
    return await _run_git_command(*clone_cmd)


async def _check_repo_exists(url: str) -> bool:
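
The three clone scenarios in the hunk above can be summarized by a small pure function that only builds the git argument lists. This is a sketch for illustration, not code from the PR: `build_git_commands` is a hypothetical helper, and the real `clone_repo` passes each command to `_run_git_command` instead of returning it.

```python
from typing import Optional


def build_git_commands(
    url: str,
    local_path: str,
    commit: Optional[str] = None,
    branch: Optional[str] = None,
) -> list[list[str]]:
    """Hypothetical helper: return the git commands for each clone scenario."""
    if commit:
        # Scenario 1: clone without --depth so the commit is reachable, then check it out
        return [
            ["git", "clone", "--single-branch", url, local_path],
            ["git", "-C", local_path, "checkout", commit],
        ]

    if branch and branch.lower() not in ("main", "master"):
        # Scenario 2: shallow clone of a specific, non-default-looking branch
        return [["git", "clone", "--depth=1", "--single-branch", "--branch", branch, url, local_path]]

    # Scenario 3: shallow clone of the default branch
    return [["git", "clone", "--depth=1", "--single-branch", url, local_path]]


commands = build_git_commands("https://github.com/cyclotruc/gitingest", "/tmp/gitingest/demo", branch="dev")
```

Keeping the command construction separate from execution makes the branching easy to test without touching the network. Note that a branch named `main` or `master` falls through to Scenario 3, matching the condition in the diff.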
26 changes: 24 additions & 2 deletions src/gitingest/exceptions.py
@@ -1,12 +1,13 @@
""" Custom exceptions for the GitIngest package. """


class InvalidPatternError(ValueError):
    """
    Exception raised when a pattern contains invalid characters.

    This exception is used to signal that a pattern provided for some operation
    contains characters that are not allowed. The valid characters for the pattern
    include alphanumeric characters, dash (-), underscore (_), dot (.), forward slash (/),
    plus (+), and asterisk (*).

    Parameters
    ----------
    pattern : str
@@ -27,3 +28,24 @@ class AsyncTimeoutError(Exception):
    This exception is used by the `async_timeout` decorator to signal that the wrapped
    asynchronous function has exceeded the specified time limit for execution.
    """


class MaxFilesReachedError(Exception):
    """Exception raised when the maximum number of files is reached."""

    def __init__(self, max_files: int) -> None:
        super().__init__(f"Maximum number of files ({max_files}) reached.")


class MaxFileSizeReachedError(Exception):
    """Raised when the maximum file size is reached."""

    def __init__(self, max_size: int):
        super().__init__(f"Maximum file size limit ({max_size/1024/1024:.1f}MB) reached.")


class AlreadyVisitedError(Exception):
    """Exception raised when a symlink target has already been visited."""

    def __init__(self, path: str) -> None:
        super().__init__(f"Symlink target already visited: {path}")
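
To show how the new limit exceptions might be used, the sketch below restates two of the classes from the diff and raises one from a hypothetical `collect_files` helper. The helper and its inputs are assumptions made for illustration; the PR does not show where these exceptions are raised.

```python
class MaxFilesReachedError(Exception):
    """Exception raised when the maximum number of files is reached."""

    def __init__(self, max_files: int) -> None:
        super().__init__(f"Maximum number of files ({max_files}) reached.")


class MaxFileSizeReachedError(Exception):
    """Raised when the maximum file size is reached."""

    def __init__(self, max_size: int) -> None:
        # max_size is in bytes; the message reports it in megabytes with one decimal
        super().__init__(f"Maximum file size limit ({max_size/1024/1024:.1f}MB) reached.")


def collect_files(paths: list[str], max_files: int) -> list[str]:
    """Hypothetical helper: collect paths and raise once the limit would be exceeded."""
    collected: list[str] = []
    for path in paths:
        if len(collected) >= max_files:
            raise MaxFilesReachedError(max_files)
        collected.append(path)
    return collected


try:
    collect_files(["a.py", "b.py", "c.py"], max_files=2)
except MaxFilesReachedError as exc:
    files_message = str(exc)  # "Maximum number of files (2) reached."

size_message = str(MaxFileSizeReachedError(10 * 1024 * 1024))  # "Maximum file size limit (10.0MB) reached."
```

Because each class formats its own message, callers only pass the limit value and every raise site reports the limit in the same way.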
2 changes: 2 additions & 0 deletions src/gitingest/ignore_patterns.py
@@ -1,3 +1,5 @@
""" Default ignore patterns for GitIngest. """

DEFAULT_IGNORE_PATTERNS: list[str] = [
    # Python
    "*.pyc",
15 changes: 9 additions & 6 deletions src/gitingest/ingest.py
@@ -1,3 +1,5 @@
""" Main entry point for ingesting a source and processing its contents. """

import asyncio
import inspect
import shutil
@@ -25,14 +27,15 @@ def ingest(
    ----------
    source : str
        The source to analyze, which can be a URL (for a GitHub repository) or a local directory path.
    max_file_size : int, optional
        The maximum allowed file size for file ingestion. Files larger than this size are ignored, by default 10*1024*1024 (10 MB).
    max_file_size : int
        Maximum allowed file size for file ingestion. Files larger than this size are ignored, by default
        10*1024*1024 (10 MB).
    include_patterns : list[str] | str | None, optional
        A pattern or list of patterns specifying which files to include in the analysis. If `None`, all files are included.
        Pattern or list of patterns specifying which files to include. If `None`, all files are included.
    exclude_patterns : list[str] | str | None, optional
        A pattern or list of patterns specifying which files to exclude from the analysis. If `None`, no files are excluded.
        Pattern or list of patterns specifying which files to exclude. If `None`, no files are excluded.
    output : str | None, optional
        The file path where the summary and content should be written. If `None`, the results are not written to a file.
        File path where the summary and content should be written. If `None`, the results are not written to a file.

    Returns
    -------
@@ -74,7 +77,7 @@ def ingest(
    summary, tree, content = ingest_from_query(query)

    if output is not None:
        with open(output, "w") as f:
        with open(output, "w", encoding="utf-8") as f:
            f.write(tree + "\n" + content)

    return summary, tree, content
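
The `encoding="utf-8"` change above matters because `open()` in text mode otherwise falls back to a platform-dependent default encoding. A minimal sketch of the explicit form follows, assuming the output may contain non-ASCII characters; the sample text and the `digest.txt` file name are invented for the example.

```python
import tempfile
from pathlib import Path

# Sample output with non-ASCII characters (box-drawing glyphs and an accented letter)
text = "└── café.py"

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "digest.txt"

    # Explicit encoding: the bytes on disk no longer depend on the locale
    with open(out, "w", encoding="utf-8") as f:
        f.write(text)

    raw = out.read_bytes()
    round_trip = out.read_text(encoding="utf-8")
```

With the encoding pinned on both write and read, the file round-trips identically on any platform.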