feat: add windows support and refactor setup #11

Merged: 8 commits merged on Nov 17, 2023

2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -51,7 +51,7 @@ jobs:
uses: ./.github/workflows/reusable-build.yml
with:
CIBW_SKIP: "pp* cp36-* cp37-*"
CIBW_BUILD: "cp*-macosx* cp*-manylinux*"
CIBW_BUILD: "cp*-macosx* cp*-manylinux* cp*-win*"
VERSION: ${{ github.ref_name }}
secrets: inherit

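The only change in this hunk is the extra `cp*-win*` selector, so CPython wheels are now also built on Windows while `CIBW_SKIP` keeps excluding PyPy and CPython 3.6/3.7. cibuildwheel treats both options as space-separated shell-style globs matched against build identifiers such as `cp311-win_amd64`. The sketch below approximates that selection with `fnmatch` to show which identifiers survive. It is a simplification under that assumption (cibuildwheel itself also supports brace expansion and architecture options), and `is_selected` plus the identifiers used are illustrative.

```python
from fnmatch import fnmatchcase


def is_selected(identifier: str, build: str, skip: str) -> bool:
    """Approximate cibuildwheel selection: match any CIBW_BUILD glob and no CIBW_SKIP glob."""
    matches_build = any(fnmatchcase(identifier, pattern) for pattern in build.split())
    matches_skip = any(fnmatchcase(identifier, pattern) for pattern in skip.split())
    return matches_build and not matches_skip


# Values copied from the workflow above.
CIBW_SKIP = "pp* cp36-* cp37-*"
CIBW_BUILD = "cp*-macosx* cp*-manylinux* cp*-win*"

print(is_selected("cp311-win_amd64", CIBW_BUILD, CIBW_SKIP))  # True
print(is_selected("cp37-win_amd64", CIBW_BUILD, CIBW_SKIP))  # False: excluded by cp37-*
print(is_selected("pp39-win_amd64", CIBW_BUILD, CIBW_SKIP))  # False: PyPy is not selected
```

Note that `cp*-win*` is a broad glob: it matches every Windows architecture suffix, so which Windows wheels actually get built is left to cibuildwheel's architecture settings.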
2 changes: 1 addition & 1 deletion .github/workflows/reusable-build.yml
@@ -17,7 +17,7 @@ jobs:
build-wheels:
strategy:
matrix:
os: [ubuntu-latest, macos-latest]
os: [ubuntu-latest, macos-latest, windows-latest]
fail-fast: false
name: Build wheels on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
15 changes: 10 additions & 5 deletions .github/workflows/test.yml
@@ -41,7 +41,7 @@ jobs:
with:
python-version: '3.12'
cache: 'pip' # caching pip dependencies
- name: Install dependencies
- name: Install lib from source and dependencies
run: |
python -m pip install -e .[test]
- name: Run tests
@@ -52,15 +52,15 @@ jobs:
uses: ./.github/workflows/reusable-build.yml
with:
CIBW_SKIP: "pp* cp36-* cp37-*"
CIBW_BUILD: "cp*-macosx* cp*-manylinux*"
CIBW_BUILD: "cp*-macosx* cp*-manylinux* cp*-win*"
secrets: inherit

full-tests-python:
needs: [fast-tests-python, external-build-workflow]
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
os: [ubuntu-latest, macos-latest]
os: [ubuntu-latest, macos-latest, windows-latest]
fail-fast: false
name: Test wheel on ${{ matrix.os }} and Python ${{ matrix.python-version }}
runs-on: ${{ matrix.os }}
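Adding `windows-latest` to the `os` axis means `full-tests-python` now runs one job per combination of the five Python versions and three runners. For a matrix without `include` or `exclude` entries this is a plain Cartesian product, which the sketch below enumerates. The job-name format copies the `name:` template above; the ordering of the list is not meant to reflect the order in which GitHub schedules jobs.

```python
from itertools import product

# Matrix axes copied from the workflow above.
python_versions = ["3.8", "3.9", "3.10", "3.11", "3.12"]
runners = ["ubuntu-latest", "macos-latest", "windows-latest"]

# One job per (python-version, os) pair, as in a matrix without include/exclude.
jobs = [
    f"Test wheel on {runner} and Python {version}"
    for version, runner in product(python_versions, runners)
]

print(len(jobs))  # 15
```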
@@ -72,16 +72,21 @@ jobs:
path: dist
- name: Show dist files
run: ls -lah ./dist
shell: bash
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: 'pip' # caching pip dependencies
- name: Install dependencies
- name: Remove sdist package to force install wheel later
run: |
rm -rf ./dist/*.tar.gz
shell: bash
- name: Install lib and dependencies
run: |
# force install package from local dist directory
pip uninstall -y codebleu || true
# TODO: check the sdist package is not installed
rm -rf ./dist/*.tar.gz
pip install --upgrade --no-deps --no-index --find-links=./dist codebleu
# install dependencies for the package and tests
pip install .[test]
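On `windows-latest` runners the default shell for `run` steps is PowerShell, which is why the steps calling `ls -lah` and `rm -rf` now pin `shell: bash`. If one preferred not to depend on bash, the sdist removal could also be written in Python, which behaves the same on all three runners. The sketch below is such an alternative, not what this PR does; the helper name `remove_sdists` is hypothetical. Once the `*.tar.gz` files are gone, `pip install --no-index --find-links=./dist codebleu` can only resolve to a wheel from `dist`.

```python
from __future__ import annotations

from pathlib import Path


def remove_sdists(dist_dir: Path) -> list[str]:
    """Delete source distributions (*.tar.gz) so that only wheels remain in dist_dir."""
    removed = []
    for sdist in sorted(dist_dir.glob("*.tar.gz")):
        sdist.unlink()
        removed.append(sdist.name)
    return removed


if __name__ == "__main__":
    # Prints the names of the deleted files (an empty list if ./dist has no sdist).
    print(remove_sdists(Path("dist")))
```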
16 changes: 9 additions & 7 deletions .gitignore
@@ -4,6 +4,15 @@ codebleu/parser/*.so
codebleu/parser/*.dylib
codebleu/parser/*.dll

/codebleu/parser/tree-sitter-c-sharp/
/codebleu/parser/tree-sitter-go/
/codebleu/parser/tree-sitter-java/
/codebleu/parser/tree-sitter-javascript/
/codebleu/parser/tree-sitter-php/
/codebleu/parser/tree-sitter-python/
/codebleu/parser/tree-sitter-ruby/
/tree_sitter/
codebleu/*.so


# Byte-compiled / optimized / DLL files
@@ -166,10 +175,3 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
/codebleu/parser/tree-sitter-c-sharp/
/codebleu/parser/tree-sitter-go/
/codebleu/parser/tree-sitter-java/
/codebleu/parser/tree-sitter-javascript/
/codebleu/parser/tree-sitter-php/
/codebleu/parser/tree-sitter-python/
/codebleu/parser/tree-sitter-ruby/
4 changes: 2 additions & 2 deletions README.md
@@ -5,7 +5,7 @@
[![PyPI version](https://badge.fury.io/py/codebleu.svg)](https://badge.fury.io/py/codebleu)


This repository contains an unofficial `CodeBLEU` implementation that supports Linux and MacOS. It is available through `PyPI` and the `evaluate` library.
This repository contains an unofficial `CodeBLEU` implementation that supports `Linux`, `MacOS` and `Windows`. It is available through `PyPI` and the `evaluate` library.

The code is based on the original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) and updated version by [XLCoST/CodeBLEU](https://github.com/reddy-lab-code-research/XLCoST/tree/main/code/translation/evaluator/CodeBLEU). It has been refactored, tested, built for macOS, and multiple improvements have been made to enhance usability.

@@ -28,7 +28,7 @@ The metric has shown higher correlation with human evaluation than `BLEU` and `a
## Installation

As this library requires `so` file compilation, it is platform-dependent.
Currently available for `Linux` (manylinux) and `MacOS` with Python 3.8+.
Currently available for `Linux` (manylinux), `MacOS` and `Windows` with Python 3.8+.

The metric is available as a [pip package](https://pypi.org/project/codebleu/) and can be installed as indicated above:
```bash
6 changes: 3 additions & 3 deletions codebleu/codebleu.py
@@ -17,7 +17,7 @@ def calc_codebleu(
weights: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
tokenizer: Optional[Callable] = None,
keywords_dir: Path = PACKAGE_DIR / "keywords",
lang_so_file: Path = PACKAGE_DIR / "parser" / "my-languages.so",
lang_so_file: Path = PACKAGE_DIR / "my-languages.so",
) -> Dict[str, float]:
"""Calculate CodeBLEU score

@@ -69,10 +69,10 @@ def make_weights(reference_tokens, key_word_list):
weighted_ngram_match_score = weighted_ngram_match.corpus_bleu(tokenized_refs_with_weights, tokenized_hyps)

# calculate syntax match
syntax_match_score = syntax_match.corpus_syntax_match(references, hypothesis, lang, lang_so_file)
syntax_match_score = syntax_match.corpus_syntax_match(references, hypothesis, lang, str(lang_so_file))

# calculate dataflow match
dataflow_match_score = dataflow_match.corpus_dataflow_match(references, hypothesis, lang, lang_so_file)
dataflow_match_score = dataflow_match.corpus_dataflow_match(references, hypothesis, lang, str(lang_so_file))

alpha, beta, gamma, theta = weights
code_bleu_score = (
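Only the opening of the final combination is visible in this hunk (`alpha, beta, gamma, theta = weights` followed by `code_bleu_score = (`). CodeBLEU is defined as a weighted sum of its four component scores, so, assuming the collapsed lines follow that definition, the computation reduces to the sketch below. `combine_scores` is a hypothetical helper for illustration, not a function exported by `codebleu`.

```python
from typing import Tuple


def combine_scores(
    ngram_match: float,
    weighted_ngram_match: float,
    syntax_match: float,
    dataflow_match: float,
    weights: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """Weighted sum of the four CodeBLEU components (hypothetical helper)."""
    alpha, beta, gamma, theta = weights
    return (
        alpha * ngram_match
        + beta * weighted_ngram_match
        + gamma * syntax_match
        + theta * dataflow_match
    )


print(combine_scores(1.0, 1.0, 1.0, 1.0))  # 1.0
```

With the default weights of 0.25 each, a perfect match on all four components yields 1.0, and setting a weight to 0 removes that component from the score entirely.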
20 changes: 0 additions & 20 deletions codebleu/parser/build.py

This file was deleted.

11 changes: 0 additions & 11 deletions codebleu/parser/build.sh

This file was deleted.

12 changes: 6 additions & 6 deletions pyproject.toml
@@ -1,17 +1,17 @@
[build-system]
requires = ["setuptools>=61.0.0", "wheel", "tree-sitter>=0.20.0,<1.0.0"]
requires = ["setuptools>=61.0.0", "wheel", "tree-sitter>=0.20.0,<1.0.0", "requests>=2.0.0,<3.0.0"]
build-backend = "setuptools.build_meta"


[project]
name = "codebleu"
description = "Unofficial CodeBLEU implementation that supports Linux and MacOS available on PyPI."
description = "Unofficial CodeBLEU implementation that supports Linux, MacOS and Windows available on PyPI."
readme = "README.md"
license = {text = "MIT License"}
authors = [
{name = "Konstantin Chernyshev", email = "[email protected]"},
]
keywords = ["codebleu", "code", "bleu", "nlp", "natural language processing", "programming", "evaluate", "evaluation", "code generation", "matrics"]
keywords = ["codebleu", "code", "bleu", "nlp", "natural language processing", "programming", "evaluate", "evaluation", "code generation", "metrics"]
dynamic = ["version"]

requires-python = ">=3.8"
@@ -77,7 +77,7 @@ warn_redundant_casts = true
warn_unused_ignores = true
warn_unreachable = true
allow_untyped_decorators = true
exclude = ["codebleu/parser/tree-sitter", "codebleu/parser/tree-sitter/python"]
exclude = ["codebleu/parser/tree-sitter", "codebleu/parser/tree-sitter/python", "tree_sitter"]

[tool.pytest.ini_options]
minversion = "6.0"
@@ -86,7 +86,7 @@ python_files = "test_*.py"
addopts = "--cov=codebleu/ --cov-report term-missing"

[tool.coverage.run]
omit = ["tests/*", "codebleu/parser/tree-sitter/*"]
omit = ["tests/*", "codebleu/parser/tree-sitter/*", "tree_sitter"]


[tool.isort]
@@ -95,7 +95,7 @@ src_paths = ["codebleu", "tests"]
known_first_party = ["codebleu", "tests"]
line_length = 120
combine_as_imports = true
skip = ["build", "dist", ".venv", ".eggs", ".mypy_cache", ".pytest_cache", ".git", ".tox", ".nox", "codebleu/parser"]
skip = ["build", "dist", ".venv", ".eggs", ".mypy_cache", ".pytest_cache", ".git", ".tox", ".nox", "codebleu/parser", "tree_sitter"]

[tool.black]
line_length=120
75 changes: 69 additions & 6 deletions setup.py
@@ -1,24 +1,87 @@
import subprocess
from __future__ import annotations

import io
import shutil
import zipfile
from pathlib import Path

import requests
from setuptools import setup
from setuptools.dist import Distribution

from tree_sitter import Language

ROOT = Path(__file__).parent


subprocess.run(
["bash", "build.sh"],
cwd=ROOT / "codebleu" / "parser",
check=True,
tree_sitter_languages = {
"go": "https://github.com/tree-sitter/tree-sitter-go/archive/refs/tags/v0.20.0.zip",
"javascript": "https://github.com/tree-sitter/tree-sitter-javascript/archive/refs/tags/v0.20.1.zip",
"python": "https://github.com/tree-sitter/tree-sitter-python/archive/refs/tags/v0.20.4.zip",
"ruby": "https://github.com/tree-sitter/tree-sitter-ruby/archive/refs/tags/v0.19.0.zip",
"php": "https://github.com/tree-sitter/tree-sitter-php/archive/refs/tags/v0.19.0.zip",
"java": "https://github.com/tree-sitter/tree-sitter-java/archive/refs/tags/v0.20.2.zip",
"c-sharp": "https://github.com/tree-sitter/tree-sitter-c-sharp/archive/refs/tags/v0.20.0.zip",
"c": "https://github.com/tree-sitter/tree-sitter-c/archive/refs/tags/v0.20.6.zip",
"cpp": "https://github.com/tree-sitter/tree-sitter-cpp/archive/refs/tags/v0.20.3.zip",
}


def download_tree_sitter_languages(languages: dict[str, str], languages_folder: Path) -> list[str]:
if languages_folder.exists():
shutil.rmtree(languages_folder)
languages_folder.mkdir(parents=True)

extracted_folders: list[str] = []
for lang, url in languages.items():
# Download the ZIP file
response = requests.get(url)
response.raise_for_status()

# Extract the ZIP file
with zipfile.ZipFile(io.BytesIO(response.content)) as zip_f:
zip_f.extractall(languages_folder)
extracted_folders.append(zip_f.namelist()[0]) # get the name of the extracted folder

return extracted_folders

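`download_tree_sitter_languages` relies on `zip_f.namelist()[0]` being the archive's top-level directory (a GitHub tag archive unpacks into a single folder named after the repository and version). A slightly more defensive variant, sketched below as a suggestion rather than what the PR does, derives the top-level folder from every entry name and fails loudly if there is more than one. `extract_single_root` is a hypothetical name, and the archive in the usage block is built in memory with made-up contents so the sketch runs without network access.

```python
from __future__ import annotations

import io
import zipfile
from pathlib import Path


def extract_single_root(archive: bytes, target: Path) -> Path:
    """Extract a ZIP archive into target and return its single top-level folder."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zip_f:
        # First path component of every entry (ZIP entry names always use "/").
        roots = {name.split("/", 1)[0] for name in zip_f.namelist()}
        if len(roots) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(roots)}")
        zip_f.extractall(target)
    return target / roots.pop()


if __name__ == "__main__":
    import tempfile

    # Mimic a tag archive in memory (hypothetical file names and contents).
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("tree-sitter-demo-0.1.0/grammar.js", "module.exports = {};")
        zf.writestr("tree-sitter-demo-0.1.0/src/parser.c", "/* generated */")

    with tempfile.TemporaryDirectory() as tmp:
        root = extract_single_root(buffer.getvalue(), Path(tmp))
        print(root.name)  # tree-sitter-demo-0.1.0
```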

def build_tree_sitter_languages(languages: dict[str, str], languages_folder: Path, target_lib_file: Path) -> str:
extracted_folders = download_tree_sitter_languages(languages, languages_folder)

Language.build_library(
str(target_lib_file),
[str(languages_folder / lang_folder) for lang_folder in extracted_folders],
)

return str(target_lib_file)


build_tree_sitter_languages(
tree_sitter_languages,
ROOT / "tree_sitter",
ROOT / "codebleu" / "my-languages.so",
)


# tree_sitter_extension = Extension(
# 'codebleu.tree_sitter',
# sources=[],
# include_dirs=[],
# libraries=[],
# extra_objects=[
#
# ],
# )


class PlatformSpecificDistribution(Distribution):
"""Distribution which always forces a binary package with platform name"""

def has_ext_modules(self):
return True


setup(distclass=PlatformSpecificDistribution)
setup(
distclass=PlatformSpecificDistribution,
)
8 changes: 4 additions & 4 deletions tests/test_codebleu.py
@@ -65,17 +65,17 @@ def test_error_when_input_length_mismatch() -> None:
["public static int Sign ( double d ) { return ( float ) ( ( d == 0 ) ? 0 : ( c < 0.0 ) ? - 1 : 1) ; }"],
["public static int Sign ( double d ) { return ( int ) ( ( d == 0 ) ? 0 : ( d < 0 ) ? - 1 : 1) ; }"],
0.7846,
11/19, # In example, it is 13/21, but with new version of tree-sitter it is 11/19
2/3,
11 / 19, # In example, it is 13/21, but with new version of tree-sitter it is 11/19
2 / 3,
0.7019, # Should be 0.7238 if AST=13/21 as in the paper; however, tree-sitter currently gives AST=11/19
),
# https://arxiv.org/pdf/2009.10297.pdf "3.4 Two Examples" at the page 4
(
["public static int Sign ( double d ) { return ( float ) ( ( d == 0 ) ? 0 : ( c < 0.0 ) ? - 1 : 1) ;"],
["public static int Sign ( double d ) { return ( int ) ( ( d == 0 ) ? 0 : ( d < 0 ) ? - 1 : 1) ; }"],
0.7543,
11/19, # In example, it is 13/21, but with new version of tree-sitter it is 11/19
2/3,
11 / 19, # In example, it is 13/21, but with new version of tree-sitter it is 11/19
2 / 3,
0.6873, # Should be 0.6973 if AST=13/21 as in the paper; however, tree-sitter currently gives AST=11/19
),
# https://arxiv.org/pdf/2009.10297.pdf "3.4 Two Examples" at the page 4