Refactor scripts/create_repository.py, fix some issues (#1639)
* First create_repository.py refactoring

Preserves the behavior of the previous version in preparation for
further improvements and eventual behavior changes. Part of #1482.

Tested by running the original script, calling `git stash
third_party/...`, then running the updated script. Confirmed that the
updated script produced no new changes in the working tree compared to
the indexed changes.

* Second create_repository.py refactoring

Preserves the behavior of the previous version in preparation for further
improvements and eventual behavior changes. Part of #1482.

This change eliminates a lot of duplication, and replaces the previous
functions to add trailing commas to the output with a `re.sub()` call.
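
As a rough illustration of the `re.sub()` approach, here is a minimal
sketch. It assumes the output is pretty-printed with
`json.dumps(indent=4)`; the `add_trailing_commas` name and the pattern are
illustrative and may differ from what the script actually uses.

```python
import json
import re


def add_trailing_commas(text):
    """Append a comma to each line that ends a value or closes a container.

    Lines that already end in ',' and lines that open a container ('{' or
    '[') are left alone. The final line has no trailing newline, so the
    outermost closing brace stays bare.
    """
    return re.sub(r'([^\[{,\s])\n', r'\1,\n', text)


# Hypothetical artifact data, for illustration only.
artifacts = {"deps": ["@a", "@b"], "sha256": "abc"}
print(add_trailing_commas(json.dumps(artifacts, indent=4)))
```

A single substitution like this handles nested containers uniformly,
which is presumably why it can replace several special-purpose helper
functions.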

Tested by running the original script, calling `git stash third_party/...`,
then running the updated script. Confirmed that the updated script produced no
new changes in the working tree compared to the indexed changes.

Also:

- Added `#!/usr/bin/env python3` and set file mode 0755 to enable
  running the script directly instead of as an argument to `python3`.

- Used `pathlib.Path` on `__file__` to enable running the script from
  any directory, not just from inside `scripts/`.

- Updated `scripts/README.md` to reflect some of the script changes, and
  to improve the Markdown formatting in general.

- Added the `DOWNLOADED_ARTIFACTS_FILE` variable, added its value to
  `.gitignore`, and added a `Path.unlink()` at the end of the script to
  clean it up.
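
The `pathlib` items above can be sketched roughly as follows. The helper
names `repo_root_from` and `remove_downloaded_artifacts_file` are
hypothetical, and the file name is inferred from the `.gitignore` entry
added in this commit; the script itself may structure this differently.

```python
from pathlib import Path

# Inferred from the .gitignore entry added in this commit.
DOWNLOADED_ARTIFACTS_FILE = "repository-artifacts.json"


def repo_root_from(script_path):
    """Return the repo root, assuming the script lives in <root>/scripts/."""
    return Path(script_path).resolve().parent.parent


def remove_downloaded_artifacts_file(directory):
    """Delete the scratch file if present; a missing file is not an error."""
    # missing_ok requires Python 3.8 or later.
    (Path(directory) / DOWNLOADED_ARTIFACTS_FILE).unlink(missing_ok=True)
```

Inside the script, `repo_root_from(__file__)` would yield the same root
regardless of the caller's current working directory.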

* Third create_repository.py refactoring

Preserves the behavior of the previous version in preparation for
further improvements and eventual behavior changes. Part of #1482.

This vastly improves performance by _not_ downloading and hashing
artifacts that are already present in the current configuration.

Added `PROTOBUF_JAVA_VERSION = "4.28.2"` as a core artifact to avoid
downgrades, and to signal the importance of this particular artifact.

Tested by running the original script, calling `git stash
third_party/...`, then running the updated script. Confirmed that the
updated script produced no new changes in the working tree compared to
the indexed changes, modulo the previous `protobuf-java` downgrade.

* Small create_repository.py format updates, fixes

Added missing trailing comma in `COORDINATE_GROUPS[0]`.

Updated `get_label` to check for groups starting with `org.scala-lang.`

Sorted `deps` in `to_rules_scala_compatible_dict` and always set `deps`
in the return value, even when empty.

* Ensure Scala 2 libs get "_2" repo names in Scala 3

Without this change, Scala 3 core library versions could be overwritten
with Scala 2 versions specified as dependencies of other jars.
Specifically, all the `@io_bazel_rules_scala_scala_*_2` deps explicitly
added to the `scala_3_{1,2,3,4,5}.bzl` files in #1631 for Scalafmt were
stripped of the `_2` suffix before this change.

Also computed `is_scala_3` in `create_file` and passed it through where
needed. At some point it might be worth refactoring the script into a
proper object instead.

* Bump Scalafmt to 3.8.3 in create_repository.py

This matches the version set in #1631.

* Prevent create_repository.py from downgrading jars

The script will now only update an existing artifact if the newly
resolved version is greater than that already in the
third_party/repositories file. Part of #1482.

Tested by running on `master` both before and after #1631. The script
produced no new changes, and is even faster now, since it also won't try
to downgrade existing artifact versions. More specifically, this change
prevents the following downgrades:

- scalap: from latest Scala version to an earlier version, including
  2.13.14 to 2.13.11
- com.lihaoyi.pprint: from pprint_3:0.9.0 to pprint_2.13:0.6.4
- compiler-interface: 1.10.1 to 1.3.5 or 1.9.6
- com.lihaoyi.geny: from geny_3:1.1.1 to geny_2.13:0.6.5
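
The "only update if newer" rule depends on comparing versions
numerically, not as strings. A minimal sketch follows, assuming plain
dotted numeric versions; the signature of `is_newer_version` and the
`version_key` helper are assumptions, not the script's actual code, and
qualifiers such as `-RC1` are not handled properly here.

```python
import re


def version_key(version):
    """Turn '1.10.1' into (1, 10, 1) so components compare numerically."""
    return tuple(int(part) for part in re.findall(r'\d+', version))


def is_newer_version(resolved, current):
    """Return True only if the resolved version is strictly greater."""
    return version_key(resolved) > version_key(current)


# The downgrades listed above would all be rejected.
for resolved, current in [("2.13.11", "2.13.14"), ("1.3.5", "1.10.1")]:
    print(resolved, "replaces", current, "->", is_newer_version(resolved, current))
```

Note that a plain string comparison would get the `compiler-interface`
case wrong, since `"1.3.5" > "1.10.1"` is true lexicographically.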

* Ensure all artifact deps written to repo files

Updates the main loop in `create_file` to iterate over the newly
generated artifact dictionary, not the original dictionary parsed from
the existing file. This ensures that all dependencies of the updated
artifacts are written to the file.

This also improves performance when rerunning, since the script will no
longer keep trying to fetch the previously missing artifacts.

Tested by running once, `git add`ing the results, and running again to
ensure the second run was a no-op.

* Update get_label, fix scala3 label bug

Fixes a bug in the previous implementation whereby the `scala3_`
components of org.scala-lang artifact labels listed in an artifact's
`deps` weren't transformed to `scala_`. Improved the handling of
org.scala-lang artifacts more generally via the new
`adjust_scala_lang_label` function.

Refactored the `COORDINATE_GROUPS` array into a series of separate `set`
variables named after their `get_label` transformations. Makes the
`get_label` implementation read a little more easily.

* Replace `split_artifact_and_version`

I finally noticed that `get_maven_coordinates` already did the same
parsing, essentially. I migrated the comment and implementation from
`split_artifact_and_version` into it.

* Check against current ResolvedArtifact instances

Updates `create_artifact_version_map` to
`create_current_resolved_artifacts_map` to set up future improvements.

Specifically, an existing artifact's dependency coordinates aren't
updated unless the resolved artifact version is newer. A small change
after this will ensure all existing dependency coordinates get updated.
This could be a one-off change, or we could decide to keep it; either
way, it would be a very small one to make.

Also:

- Added `MavenCoordinates.artifact_name` to replace
  `artifact_name_from_maven_coordinates`.

- Renamed `is_newer_than_current_version` to `is_newer_version`.

- Added logic in `create_artifact_metadata_map` to ensure values with
  `testonly` set to `True` aren't part of the automatic resolution
  process.

* Handle scalap, org.thesamet.scalapb special cases

Preserves the existing labels for these artifacts.

Tested by running in a new branch that updates the direct dependencies
of core artifacts. Prior to this change, the `scalap` repo and some of
the `org.thesamet.scalapb` repos were duplicated with a new label. After
this change, that duplication disappears.

* Add --version, emit command, stderr only on error

These are changes suggested by @WojciechMazur in #1639.

* Raise and catch SubprocessError, return exit code

Further embellishment of the suggestions from @WojciechMazur in #1639.

This way we can see exactly which version updates encountered an error,
and how many, while continuing to attempt to update other versions.
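
The error handling described here might look roughly like the sketch
below. The `run_command` and `update_all` names, and the choice of `1` as
the failing exit code, are assumptions for illustration only.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return its stdout.

    check=True makes subprocess.run raise CalledProcessError, a subclass
    of SubprocessError, when the command exits with a nonzero status.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def update_all(versions, update_one):
    """Try every version, report the failures, and return an exit code."""
    failed = []
    for version in versions:
        try:
            update_one(version)
        except subprocess.SubprocessError as err:
            print(f"update for {version} failed: {err}", file=sys.stderr)
            failed.append(version)
    if failed:
        print(f"{len(failed)} update(s) failed: {', '.join(failed)}",
              file=sys.stderr)
    return 1 if failed else 0
```

Here `update_one` stands in for whatever per-version work calls
`run_command`; catching the exception inside the loop is what lets the
remaining versions proceed after one fails.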

* Hoist output_dir from create_or_update_repo_file

Makes the logic slightly easier to follow.

* Hoist OUTPUT_DIR as top level constant

I still like passing it as an argument to
`create_or_update_repository_file`.

* Add --output_dir flag, add defaults to --help

Making the output directory configurable from the command line was the
next logical extension. This makes it possible to see what the script
will generate from scratch without having to erase the existing repo
files.

Figured it would be nice to see the default values in the `--help`
output.
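
One way to get default values into `--help` output is
`argparse.ArgumentDefaultsHelpFormatter`. The sketch below assumes that
approach and a default of `third_party/repositories`; the script may set
these up differently, and `build_parser` is a hypothetical name.

```python
import argparse

# Assumed default, based on the third_party/repositories path noted earlier.
OUTPUT_DIR = "third_party/repositories"


def build_parser():
    """Build a parser whose --help output shows each flag's default."""
    parser = argparse.ArgumentParser(
        description="Create or update repository files.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--output_dir",
        default=OUTPUT_DIR,
        help="directory in which to write the repository files",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args().output_dir)
```

The formatter appends the default to any argument that has help text, so
it only needs to be set once on the parser.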

* Create a new empty file if no previous file exists

Decided it might be better to literally start from scratch in
`--output_dir` than trying to copy a file from `OUTPUT_DIR`. One could
always copy files from `OUTPUT_DIR` before running the script if so
desired.

Also changed the `__main__` logic to exit immediately on
`CreateRepositoryError` rather than attempt to keep going.

* Remove redundant `file.exists()` check

Eliminated the check in `create_or_update_repository_file` in favor of
that in `copy_previous_version_or_create_new_file_if_missing`.

* Improve `create_repository.py --help` docstrings

* Refactor `get_label` in `create_repository.py`

`get_label` now uses the `SPECIAL_CASE_GROUP_LABELS` dict instead of the
`LAST_GROUP_COMPONENT_GROUP` and `NEXT_TO_LAST_GROUP_COMPONENT_GROUP`
sets.

Also added `com.google.guava` to `ARTIFACT_LABEL_ONLY_GROUPS` to
generate the correct `io_bazel_rules_scala_guava` label. Added
`com.google.api.grpc` and `dev.dirs.directories` to
`SCALA_PROTO_RULES_GROUPS`.

Updated indentation of `ARTIFACT_LABEL_ONLY_GROUPS` and
`GROUP_AND_ARTIFACT_LABEL_GROUPS` elements.

* Emit correct label for scala-collection-compat

Specifically, `org_scala_lang_modules_scala_collection_compat`.
mbland authored Nov 4, 2024
1 parent 8480f9a commit 35353b7
Showing 11 changed files with 954 additions and 204 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -10,4 +10,7 @@ hash2
.vscode
unformatted-*.backup.scala
.scala-build
test/semanticdb/tempsrc
test/semanticdb/tempsrc

# From scripts/create_repository.py
repository-artifacts.json
115 changes: 80 additions & 35 deletions scripts/README.md
@@ -6,56 +6,101 @@
- [Debugging](#debugging)
- [Requirements](#requirements)

### About
The script allows to update a certain scala_x_x.bzl file and its content (artifact, sha, dependencies), by changing the value of `root_scala_version` variable.
It can be used to create non-existent file for chosen Scala version. <br>
It's using a [https://get-coursier.io/docs/](coursier) in order to **resolve** lists the transitive dependencies of dependencies and **fetch** the JARs of it.

### Usage
Usage from `/rules_scala/scripts`:
````
python3 create_repository.py
````

### Examples
Current value of `root_scala_versions`:
## About

The script allows to update a certain scala_x_x.bzl file and its content
(artifact, sha, dependencies), by changing the value of `root_scala_version`
variable.

It can be used to create non-existent file for chosen Scala version.

It's using a [https://get-coursier.io/docs/](coursier) in order to **resolve**
lists the transitive dependencies of dependencies and **fetch** the JARs of it.

## Usage

Usage from the rules_scala root directory:

```sh
./scripts/create_repository.py
```
root_scala_versions = ["2.11.12", "2.12.19", "2.13.14", "3.1.3", "3.2.2", "3.3.3", "3.4.3", "3.5.0"]

## Examples

Current value of `root_scala_versions`:

```py
root_scala_versions = [
"2.11.12",
"2.12.19",
"2.13.14",
"3.1.3",
"3.2.2",
"3.3.3",
"3.4.3",
"3.5.0",
]
```

To **update** content of `scala_3_4.bzl` file:
```
root_scala_versions = ["2.11.12", "2.12.19", "2.13.14", "3.1.3", "3.2.2", "3.3.3", "3.4.4", "3.5.0"]
^^^^^^^ <- updated version

```py
root_scala_versions = [
"2.11.12",
"2.12.19",
"2.13.14",
"3.1.3",
"3.2.2",
"3.3.3",
"3.4.4", # <- updated version
"3.5.0"
]
```

To **create** new `scala_3_6.bzl` file:
```
root_scala_versions = ["2.11.12", "2.12.19", "2.13.14", "3.1.3", "3.2.2", "3.3.3", "3.4.3", "3.5.0", "3.6.0"]
^^^^^^^ <- new version
```

### Debugging
Certain dependency version may not have a support for chosen Scala version e.g.
```py
root_scala_versions = [
"2.11.12",
"2.12.19",
"2.13.14",
"3.1.3",
"3.2.2",
"3.3.3",
"3.4.3",
"3.5.0",
"3.6.0", # <- new version
]
```

## Debugging

Certain dependency versions may not support a specific Scala versions, e.g.,

```py
kind_projector_version = "0.13.2" if scala_major < "2.13" else "0.13.3"
```

In order of that, there may be situations that script won't work. To debug that problem and adjust the values of hard-coded variables:
```
scala_test_major = "3" if scala_major >= "3.0" else scala_major
scala_fmt_major = "2.13" if scala_major >= "3.0" else scala_major
There may be situations in which the script won't work. To debug that problem
and adjust the values of hard-coded variables:

```py
scalatest_major = "3" if scala_major >= "3.0" else scala_major
scalafmt_major = "2.13" if scala_major >= "3.0" else scala_major
kind_projector_version = "0.13.2" if scala_major < "2.13" else "0.13.3"
f"org.scalameta:scalafmt-core_{scala_fmt_major}:{"2.7.5" if scala_major == "2.11" else scala_fmt_version}"
scalafmt_version = "2.7.5" if scala_major == "2.11" else SCALAFMT_VERSION
```
there is an option to print the output of these two subprocesses:

`output = subprocess.run(f'cs fetch {artifact}', capture_output=True, text=True, shell=True).stdout.splitlines()` <br>
there is an option to print the output of these two subprocesses:

```py
command = f'cs resolve {' '.join(root_artifacts)}'
output = subprocess.run(
command, capture_output=True, text=True, shell=True
).stdout.splitlines()
```
command = f'cs resolve {' '.join(root_artifacts)}'
output = subprocess.run(command, capture_output=True, text=True, shell=True).stdout.splitlines()
```

### Requirements
Installed [Coursier](https://get-coursier.io/) and [Python 3](https://www.python.org/downloads/)
## Requirements

Install [Coursier](https://get-coursier.io/) and
[Python 3](https://www.python.org/downloads/) before running the script.