Skip to content
This repository was archived by the owner on Jan 28, 2021. It is now read-only.
This repository was archived by the owner on Jan 28, 2021. It is now read-only.

Wrong state of SquashedTable in SHOW PROCESSLIST #658

@smacker

Description

@smacker
Contributor

6639/113 doesn't look correct:

Command: query
   Time: 2373
  State: SquashedTable(ref_commits, commits, commit_files)(6639/113), blobs(0/113)
   Info: SELECT DISTINCT LANGUAGE(cf.file_path, b.blob_content) AS lang
FROM blobs b
NATURAL JOIN commit_files cf
NATURAL JOIN commits c
NATURAL JOIN ref_commits r
WHERE r.ref_name = 'HEAD'
AND r.history_index = 0
AND is_binary(b.blob_content) = false
AND cf.file_path NOT REGEXP '^vendor.*'
AND lang != ""

Activity

ajnavarro

ajnavarro commented on Dec 4, 2018

@ajnavarro
Contributor

Could you provide more info? How did you reproduce it? how many repositories did you have? did you modify the repositories folder after launch gitbase?

smacker

smacker commented on Dec 4, 2018

@smacker
ContributorAuthor

I had 113 repositories. All public repos of src-d organization.

curl https://api.github.com/orgs/src-d/repos?per_page=100 > repos_list.json
mkdir -p repos
for repo in `cat repos_list.json | jq -r '.[] | .clone_url'`
do
    name=`echo ${repo%.git} | cut -d '/' -f 5`
    git clone "$repo" "repos/$name"
done

(and then repeat it with page=2)

Gitbase was restarted after cloning finished and I didn't modify repos anymore.
The query to reproduce is visible in the output of SHOW PROCESSLIST above.

I hope it would help!

ajnavarro

ajnavarro commented on Dec 4, 2018

@ajnavarro
Contributor

which version did you use? I cannot reproduce the error with the latest 0.18.0-beta.3

smacker

smacker commented on Dec 4, 2018

@smacker
ContributorAuthor
$ ./gitbase version
gitbase (v0.18.0-beta.3) - build 11-27-2018_11_31_37

Easily reproducible for me. Tried with empty everything.

Use the script above to get repositories.

Then:

mysql> select count(*) from repositories;
+----------+
| COUNT(*) |
+----------+
|      100 |
+----------+
1 row in set (0.01 sec)

mysql> SELECT DISTINCT LANGUAGE(cf.file_path, b.blob_content) AS lang
    -> FROM blobs b
    -> NATURAL JOIN commit_files cf
    -> NATURAL JOIN commits c
    -> NATURAL JOIN ref_commits r
    -> WHERE r.ref_name = 'HEAD'
    -> AND r.history_index = 0
    -> AND is_binary(b.blob_content) = false
    -> AND cf.file_path NOT REGEXP '^vendor.*'
    -> AND lang != ""
    -> ;

PROCESSLIST shows 299/100.

mysql> SHOW PROCESSLIST \G;
*************************** 1. row ***************************
     Id: 3
   User: root
   Host: 127.0.0.1:52412
     db: gitbase
Command: query
   Time: 25
  State: SquashedTable(ref_commits, commits, commit_files)(299/100), blobs(0/100)
   Info: SELECT DISTINCT LANGUAGE(cf.file_path, b.blob_content) AS lang
FROM blobs b
NATURAL JOIN commit_files cf
NATURAL JOIN commits c
NATURAL JOIN ref_commits r
WHERE r.ref_name = 'HEAD'
AND r.history_index = 0
AND is_binary(b.blob_content) = false
AND cf.file_path NOT REGEXP '^vendor.*'
AND lang != ""
*************************** 2. row ***************************
     Id: 6
   User: root
   Host: 127.0.0.1:52412
     db: gitbase
Command: query
   Time: 0
  State: running
   Info: SHOW PROCESSLIST
2 rows in set (0.00 sec)

ERROR:
No query specified

After few minutes 1399/100:

*************************** 2. row ***************************
     Id: 3
   User: root
   Host: 127.0.0.1:52412
     db: gitbase
Command: query
   Time: 125
  State: SquashedTable(ref_commits, commits, commit_files)(1399/100), blobs(0/100)
   Info: SELECT DISTINCT LANGUAGE(cf.file_path, b.blob_content) AS lang
FROM blobs b
NATURAL JOIN commit_files cf
NATURAL JOIN commits c
NATURAL JOIN ref_commits r
WHERE r.ref_name = 'HEAD'
AND r.history_index = 0
AND is_binary(b.blob_content) = false
AND cf.file_path NOT REGEXP '^vendor.*'
AND lang != ""
self-assigned this
on Dec 5, 2018
erizocosmico

erizocosmico commented on Mar 19, 2019

@erizocosmico
Contributor

The PR that enables auto in memory joins on small tables will mitigate this to a certain extent, but I don't think this can be solved in any way as long as we do multi pass on joins. There's just no way to know ahead of time the maximum number. It's a very confusing thing, but not one we can solve to the best of my knowledge.

Perhaps we should think of a way to display progress that will not incur in this. What if instead of showing X/Y we show just the number of partitions iterated? You will not know when it will end, but it serves the purpose of knowing if it's stuck and where it's stuck more or less. If we keep it this way, 90% of the time (when it's not in-memory) it's not gonna work.

kuba--

kuba-- commented on Mar 21, 2019

@kuba--
Contributor

IMO, any changing counter which can show some progress would be good enough (and maybe less confusing than proportion)

removed their assignment
on Mar 27, 2019
self-assigned this
on Apr 4, 2019
transferred this issue fromsrc-d/gitbaseon Apr 4, 2019
reopened this on May 24, 2019
removed their assignment
on May 24, 2019
kuba--

kuba-- commented on May 30, 2019

@kuba--
Contributor

Any ideas?

erizocosmico

erizocosmico commented on May 30, 2019

@erizocosmico
Contributor

It just can't be done if join is performed in multipass. Only solution is to not display (x/y), but only x

ajnavarro

ajnavarro commented on Jun 3, 2019

@ajnavarro
Contributor

Could be possible to hide y only if join cannot be done on memory?

erizocosmico

erizocosmico commented on Jun 3, 2019

@erizocosmico
Contributor

It starts in in-memory mode and then switches to multipass during execution, so you don't know until then. And the switch is done per join on the join, so the table has no knowledge about that.

ajnavarro

ajnavarro commented on Jun 3, 2019

@ajnavarro
Contributor

I think in most of the cases having the total number of partitions is really useful, so I would rather keep it, even if sometimes can be misleading. We can add some documentation explaining what those numbers mean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    blockedSome other issue is blocking thisbugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @smacker@ajnavarro@erizocosmico@kuba--

      Issue actions

        Wrong state of SquashedTable in SHOW PROCESSLIST · Issue #658 · src-d/go-mysql-server