Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix github commented filter and wrong until #363

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Felixoid
Copy link

Hello. Here is a couple of fixes.

It fixes #362 by monkey-patching a wrong until value. GH search API requires exact values for both since and until

It fixes #325 by additional filtering PRs and issues. For each entity, an additional request to GH API is done to check if the user has commented on it.

@Felixoid
Copy link
Author

Here's the difference for running did --width=0 --since=2024-03-19 --until=2024-03-25 --format=markdown | grep -v private on the version 0.21 and from the branch:

--- /tmp/did-0.21.md    2024-03-26 23:20:08.714955329 +0100
+++ /tmp/did-felixoid.md        2024-03-26 23:19:54.871448943 +0100
@@ -11,7 +11,5 @@

-* Issues commented on github: 8
-  * [woefe/git-prompt.zsh#44](https://github.com/woefe/git-prompt.zsh/issues/44) - [Feature] Status of the current process
+* Issues commented on github: 5
   * [ClickHouse/ClickHouse#61541](https://github.com/ClickHouse/ClickHouse/issues/61541) - Improvements in CI jobs queue
   * [psss/did#362](https://github.com/psss/did/issues/362) - Too recent GitHub pull requests included
-  * [ClickHouse/ClickHouse#58979](https://github.com/ClickHouse/ClickHouse/issues/58979) - Logical error: 'Expected single dictionary argument for function.'. (Analyzer)
   * [ClickHouse/ClickHouse#58497](https://github.com/ClickHouse/ClickHouse/issues/58497) - Why sending `SEGSEGV` does not terminate the clickhouse-server process
@@ -19,6 +17,4 @@
   * [ClickHouse/ClickHouse#54767](https://github.com/ClickHouse/ClickHouse/issues/54767) - ClickHouse Keeper docker image and VOLUME directive
-  * [ClickHouse/ClickHouse#39797](https://github.com/ClickHouse/ClickHouse/issues/39797) - Integration tests fail with timeout trying to pull `minio` image

-* Pull requests created on github: 12
-  * [psss/did#363](https://github.com/psss/did/pull/363) - Fix GH commented filter and wrong `until`
+* Pull requests created on github: 11
   * [ClickHouse/ClickHouse#61751](https://github.com/ClickHouse/ClickHouse/pull/61751) - Process removed files, decouple _check_mime
@@ -33,3 +29,3 @@

-* Pull requests commented on github: 25
+* Pull requests commented on github: 16
   * [ClickHouse/ClickHouse#61715](https://github.com/ClickHouse/ClickHouse/pull/61715) - Test submodules-retries feature of ClickHouse/checkout
@@ -37,3 +33,2 @@
   * [ClickHouse/ClickHouse#61662](https://github.com/ClickHouse/ClickHouse/pull/61662) - Fix logging of autoscaling lambda, add test for effective_capacity
-  * [ClickHouse/ClickHouse#61658](https://github.com/ClickHouse/ClickHouse/pull/61658) - CI: integration tests: use runner as py module
   * [ClickHouse/ClickHouse#61639](https://github.com/ClickHouse/ClickHouse/pull/61639) - [Experiment] Test the wrong duplicate
@@ -43,14 +38,6 @@
   * [ClickHouse/ClickHouse#61592](https://github.com/ClickHouse/ClickHouse/pull/61592) - Improve build_download_helper
-  * [ClickHouse/ClickHouse#61544](https://github.com/ClickHouse/ClickHouse/pull/61544) - Write `binary version -> commit hash` mapping to CI database (in private)
   * [ClickHouse/ClickHouse#61530](https://github.com/ClickHouse/ClickHouse/pull/61530) - Fix client `-s` argument
-  * [ClickHouse/ClickHouse#61197](https://github.com/ClickHouse/ClickHouse/pull/61197) - Merge commit style autofix
-  * [ClickHouse/ClickHouse#61105](https://github.com/ClickHouse/ClickHouse/pull/61105) - Fix removing is_active node after re-creation
   * [ClickHouse/ClickHouse#61089](https://github.com/ClickHouse/ClickHouse/pull/61089) - Analyzer: Fix 01244_optimize_distributed_group_by_sharding_key
-  * [ClickHouse/ClickHouse#60656](https://github.com/ClickHouse/ClickHouse/pull/60656) - Add --now option to enable and start the service
   * [python/mypy#16965](https://github.com/python/mypy/pull/16965) - Add a function to search for pyproject.toml in a project root
-  * [ClickHouse/ClickHouse#59394](https://github.com/ClickHouse/ClickHouse/pull/59394) - Release pull request for branch 24.1
-  * [ClickHouse/ClickHouse#58287](https://github.com/ClickHouse/ClickHouse/pull/58287) - Release pull request for branch 23.12
   * [docker-library/official-images#15846](https://github.com/docker-library/official-images/pull/15846) - Initial commitment of clickhouse official image
-  * [ClickHouse/ClickHouse#54187](https://github.com/ClickHouse/ClickHouse/pull/54187) - Release pull request for branch 23.8
-  * [ClickHouse/ClickHouse#48280](https://github.com/ClickHouse/ClickHouse/pull/48280) - Release pull request for branch 23.3

@Felixoid Felixoid force-pushed the main branch 2 times, most recently from b7983a0 to 38256ec Compare March 26, 2024 22:41
@Felixoid
Copy link
Author

Felixoid commented Mar 27, 2024

The tests fail with

:: [ 23:23:02 ] :: [  BEGIN   ] :: Running 'did --test last quarter'
[ERROR] GitHub API rate limit exceeded. Consider creating an access token.

It's expected. I am not sure how isolated the runners are now, but I feel that they are more or less static and used across the many jobs.

As a quick solution, maybe it would help to replace did --test last quarter by did --test last week. On the other hand, maybe, we could add an argument --github-approximate to filter comments by default and fallback to the previous behavior.

@Felixoid
Copy link
Author

Felixoid commented Apr 4, 2024

Hello, dear @psss.

I've spent some additional time researching whether it's possible to add an argument like --github-approximate easily. I'm not sure if it's even a way to go, so I'm looking forward to your feedback.

For now, I've found that plugins' arguments are parsed automatically on the fly. I implemented something to add a hook for global StatsGroup arguments, but it would be good to discuss whether it fits the project or not.

@Felixoid Felixoid force-pushed the main branch 2 times, most recently from a015ee9 to d585e98 Compare April 4, 2024 21:31
@Felixoid
Copy link
Author

Felixoid commented May 7, 2024

Dear @psss, is there anything I can do to have a kick-start here?

@Felixoid
Copy link
Author

The last commit fixes the next issue

> did --width=0 --since=2024-05-14 --until=2024-05-27 --format=markdown --GitHub-issues-commented
Status report for given date range (2024-05-14 to 2024-05-27)
=============================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Mikhail f. Shiryaev <[email protected]>
Mikhail f. Shiryaev <[email protected]>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
  File "/usr/bin/did", line 42, in <module>
    did.cli.main()
  File "/usr/lib/python3.12/site-packages/did/cli.py", line 238, in main
    user_stats.check()
  File "/usr/lib/python3.12/site-packages/did/stats.py", line 169, in check
    stat.check()
  File "/usr/lib/python3.12/site-packages/did/stats.py", line 169, in check
    stat.check()
  File "/usr/lib/python3.12/site-packages/did/stats.py", line 78, in check
    self.fetch()
  File "/usr/lib/python3.12/site-packages/did/plugins/github.py", line 223, in fetch
    if approx or self.parent.github.has_comments(issue, user, since, until)]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/did/plugins/github.py", line 104, in has_comments
    date = Date(comment["created_at"].split("T", 1)[0])
                ~~~~~~~^^^^^^^^^^^^^^
TypeError: string indices must be integers, not 'str'

And makes it more friendly and descriptive:

> python3 ./bin/did --width=0 --since=2024-05-14 --until=2024-05-27 --format=markdown --GitHub-issues-commented
Status report for given date range (2024-05-14 to 2024-05-27)
=============================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Mikhail f. Shiryaev <[email protected]>
Mikhail f. Shiryaev <[email protected]>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ERROR  Problems with the token scope. Please, see the details in the response: {"message":"Resource protected by organization SAML enforcement. You must grant your Personal Access token access to this organization.","documentation_url":"https://docs.github.com/articles/authenticating-to-a-github-organization-with-saml-single-sign-on/"}

@psss
Copy link
Owner

psss commented Oct 4, 2024

@Felixoid, thanks a lot for looking into this! And sorry for the late response. The approach for fixing both issues looks good to me. Have two suggestions:

  • Would you mind covering the wrong until in a separate pull request?
  • What about implementing the approximate-commented as a config option rather than the command line?

Could you please rebase on the latest main, after recent changes there is a conflict. Sorry for the troubles.

@psss psss self-assigned this Oct 4, 2024
@psss psss added this to the 0.22 milestone Oct 4, 2024
@psss psss changed the title Fix GH commented filter and wrong until Fix github commented filter and wrong until Oct 4, 2024
@Felixoid
Copy link
Author

Felixoid commented Oct 4, 2024

No problem, will do both!

Glad you're back

@Felixoid
Copy link
Author

  • What about implementing the approximate-commented as a config option rather than the command line?

Should it be enabled or disabled by default? I'd disable it by default, but then smoking tests need the custom config

@psss
Copy link
Owner

psss commented Oct 14, 2024

Agreed, I'd also recommend to make the stats precise by default. Is the problem with the smoke test that it takes too long? We can adjust it if necessary. Also in recent commits a better support for handling github limits has been implemented, so even for larger reports it should now work nicely (although slowly) without interruption.

@Felixoid
Copy link
Author

The problem was the guaranteed out-of-requests issue. Maybe with a new approach it will be reliable

GH search API unable to identify if the issues and PRs were updated by
the user's comment. To make it reliable, one needs to search the
comments for the entity after the `since` date, sort them by created_at
field, and check if the comments in the given interval are created by
the user.
The name of the method in different classes was the same, although each
one did another thing. Renamed some of them to reflect the actual task.
@Felixoid
Copy link
Author

I'll work on two things:

  • Change the approx-comments setting
  • Review the failing tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Too recent GitHub pull requests included Github "issues commented" includes comments by other users
2 participants