Add ProGit v1 redirects #1915

Merged
merged 64 commits into from
Nov 20, 2024

dscho
Member

@dscho dscho commented Nov 6, 2024

Changes

With this PR, the original ProGit v1 links (like https://git-scm.com/book/en/Getting-Started-Git-Basics) will no longer show the 404 page, but instead redirect to the corresponding v2 section.

Context

It was reported in #1782 that the ProGit v1 URLs stopped working a while ago. This was before the Hugo/Pagefind switch and resulted in a 500 page; after that switch they are still broken, only now a 404 page is shown instead.

I mulled for a while what to do because we cannot have wildcard routes in a static site.

Eventually, I suggested to resurrect the previously-working URLs by inspecting the original progit repository.

In the end, I figured out a way that was much more convenient: obtain the table of contents from the Internet Archive and then use common sense to manually map the v1 URLs to the v2 URLs.

I did not want to leave potential contributors with a lot of uncertainty whether this is even workable, so I investigated a bit, and even came up with that manual v1 <-> v2 mapping.

After that, I did not want to invest even more effort myself, hoping that other people would run with the idea. But nobody bit. So today, I thought: "How hard could this be?". And to my surprise, it was not hard at all.

Note that only the first 5 commits of this PR are interesting with regard to review; the remainder of the commits were created by a manually-triggered update-book workflow run where I specifically forced a complete rebuild (the second parameter of the getPendingBookUpdates() call reflects the force-rebuild input parameter).

For convenience, I deployed this to my fork. You can verify that the ProGit v1 URLs work, e.g. by picking a link from the original ProGit v1 table of contents in the Internet Archive, editing it so that it points to my forked site instead, and then checking that it redirects as intended. For example:

I also tried to make sure that the translated pages work as well. Example:

This fixes #1782

@dscho
Member Author

dscho commented Nov 7, 2024

I just updated this by rebasing to the current gh-pages revision (skipping the book updates), then letting another update-book workflow run update it. Here is the diff (spoiler alert: it essentially just updates the eBook download URLs).

@LemmingAvalanche
Contributor

I was asked to review this PR, a task I’m not qualified for (no Ruby knowledge etc.). So I’ll probably try to read the code back.

@LemmingAvalanche
Contributor

I wanted to test some redirects but I had the same local-only problem as last time I tried to serve the page locally: got infinite redirects when navigating.

Resolved review threads: script/book.rb (3), script/extract-book-v1-urls.rb (2)
@LemmingAvalanche
Contributor

I compared the book_v1.yml on Chapter 4 (since V1/V2 diverge a lot) to Dscho’s initial table:

#1782 (comment)

The mapping in that file is the same as the table.

@LemmingAvalanche
Contributor

I was a bit confused by all the one-commit merge commits for the localized book versions. But from reading the Git mailing list I know that Dscho knows his merge commits. So there’s a reason for those.

LGTM

@dscho
Member Author

dscho commented Nov 11, 2024

I wanted to test some redirects but I had the same local-only problem as last time I tried to serve the page locally: got infinite redirects when navigating.

@LemmingAvalanche how do you serve the local site? Via hugo --serve? That won't work, due to the .html-less pages that GitHub supports (and which we must use for backwards compatibility with the Rails app version of git-scm.com).

Could you try serving it via script/serve-public.js instead?

@dscho
Member Author

dscho commented Nov 11, 2024

I was a bit confused by all the one-commit merge commits for the localized book versions. But from reading the Git mailing list I know that Dscho knows his merge commits. So there’s a reason for those.

@LemmingAvalanche you mean commits like 4b00169? These are a side effect of a matrix of jobs taking care of updating the individual translations of the book. The changes made by those jobs all have to be merged together. (They could be merged via a single octopus merge, but my experience is that octopus merges baffle users and Git commands alike, therefore I avoid them.)

@dscho
Member Author

dscho commented Nov 11, 2024

Replying to this thread more visibly:

Uh oh. The $v1_to_v2 data is not even used. Which means that https://dscho.github.io/git-scm.com/book/en/Git-Tools-Submodules redirects to https://dscho.github.io/git-scm.com/book/en/v2/GitHub-Summary (i.e. 6.6 -> 6.6, when it should redirect to https://dscho.github.io/git-scm.com/book/en/v2/Git-Tools-Submodules instead, i.e. to 7.11)!

Will fix immediately.

I fixed this, force-pushed, then let the update-book.yml workflow do its job. The combined result is here.

You can see that this fixed the English links starting here.

You can verify that the Git-Tools-Submodules section I mentioned in the quoted message is now correctly aliased

Or just direct your web browser to https://dscho.github.io/git-scm.com/book/en/Git-Tools-Submodules or to https://dscho.github.io/git-scm.com/book/en/v1/Git-Tools-Submodules to see that it works now.
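
To make the 6.6 -> 6.6 versus 6.6 -> 7.11 mix-up concrete, here is a minimal sketch of a section lookup that either ignores or applies a manual mapping. The constant and method names (`V2_TOC`, `V1_TO_V2`, `v2_slug_for`) are hypothetical and the tables only contain the two sections mentioned above; the real script/book.rb may be structured quite differently.

```ruby
# Hypothetical excerpt of the v2 table of contents (section number => slug).
V2_TOC = {
  "6.6"  => "GitHub-Summary",
  "7.11" => "Git-Tools-Submodules",
}.freeze

# Hypothetical excerpt of the manual v1 -> v2 section mapping.
V1_TO_V2 = { "6.6" => "7.11" }.freeze

# Resolve a v1 section number to the v2 slug it should redirect to.
# Sections missing from the mapping fall back to the same number in v2.
def v2_slug_for(v1_section, mapping)
  v2_section = mapping.fetch(v1_section, v1_section)
  V2_TOC.fetch(v2_section)
end

without_mapping = v2_slug_for("6.6", {})
with_mapping = v2_slug_for("6.6", V1_TO_V2)
puts "mapping ignored: #{without_mapping}"
puts "mapping applied: #{with_mapping}"
```

With an empty mapping the lookup silently falls back to the identical section number, which is the kind of failure that still produces a working (but wrong) redirect.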

@dscho
Member Author

dscho commented Nov 11, 2024

Will have to re-fix, hold on a sec.

@dscho
Member Author

dscho commented Nov 11, 2024

Will have to re-fix, hold on a sec.

Fixed. Still had this left-over from an internal iteration: https://github.com/git/git-scm.com/compare/6e803d156d69500d0cf099771078e46934d4cd3e..8d9bfac33af659438239db9f9926f5da5181717d

@LemmingAvalanche
Contributor

(They could be merged via a single octopus merge, but my experience is that octopus merges baffle users and Git commands alike, therefore I avoid them.)

@dscho A sound assessment. :P

@dscho
Member Author

dscho commented Nov 14, 2024

I wanted to test some redirects but I had the same local-only problem as last time I tried to serve the page locally: got infinite redirects when navigating.

@LemmingAvalanche how do you serve the local site? Via hugo --serve? That won't work, due to the .html-less pages that GitHub supports (and which we must use for backwards compatibility with the Rails app version of git-scm.com).

Could you try serving it via script/serve-public.js instead?

@LemmingAvalanche did you have any luck with that yet?

@LemmingAvalanche
Contributor

did you have any luck with that yet?

@dscho I’ll try this evening

@LemmingAvalanche
Contributor

I’ll try this evening

This setup works fine.

hugo
node script/serve-public.js

I’m writing a longer reply now.

@LemmingAvalanche
Contributor

Most links are good to go. I just checked that the title in the link makes sense with the title of the subchapter that I arrived at via the redirect.

See Dscho’s findings in this comment. Maybe in particular the Wayback Machine snapshot of the v1 book.

Setup

hugo
node script/serve-public.js

None of the chapter subtitle links work

Each v1 X.1 section has two links: one for the chapter title alone and one for the chapter title plus the subtitle. The chapter-title-only link works, but the second variant does not.

For example:

* http://localhost:5000/book/en/Git-Branching

* Which correctly redirects to http://localhost:5000/book/en/v2/Git-Branching-Branches-in-a-Nutshell

* But: http://localhost:5000/book/en/Git-Branching-Branches-in-a-Nutshell

* Gets 404

The affected links are the following (space separated). I just end up at the first link instead of the expected one.

Link Expected
localhost:5000/book/en/Getting-Started-About-Version-Control http://localhost:5000/book/en/v2/Getting-Started-About-Version-Control
localhost:5000/book/en/Git-Basics-Getting-a-Git-Repository http://localhost:5000/book/en/v2/Git-Basics-Getting-a-Git-Repository
localhost:5000/book/en/Git-Branching-What-a-Branch-Is http://localhost:5000/book/en/v2/Git-Branching-Branches-in-a-Nutshell
localhost:5000/book/en/Git-on-the-Server-The-Protocols http://localhost:5000/book/en/v2/Git-on-the-Server-The-Protocols
localhost:5000/book/en/Distributed-Git-Distributed-Workflows http://localhost:5000/book/en/v2/Distributed-Git-Distributed-Workflows
localhost:5000/book/en/Git-Tools-Revision-Selection http://localhost:5000/book/en/v2/Git-Tools-Revision-Selection
localhost:5000/book/en/Customizing-Git-Git-Configuration http://localhost:5000/book/en/v2/Customizing-Git-Git-Configuration
localhost:5000/book/en/Git-and-Other-Systems-Git-and-Subversion http://localhost:5000/book/en/v2/Git-and-Other-Systems-Git-as-a-Client
localhost:5000/book/en/Git-Internals-Plumbing-and-Porcelain http://localhost:5000/book/en/v2/Git-Internals-Plumbing-and-Porcelain

These are the links that have different titles in the two versions:

Link Expected
localhost:5000/book/en/Git-Branching-What-a-Branch-Is http://localhost:5000/book/en/v2/Git-Branching-Branches-in-a-Nutshell
localhost:5000/book/en/Git-and-Other-Systems-Git-and-Subversion http://localhost:5000/book/en/v2/Git-and-Other-Systems-Git-as-a-Client

Chapter 2

Wrong redirect for this one:

Link Expected Actual
localhost:5000/book/en/Git-Basics-Tips-and-Tricks 404 http://localhost:5000/book/en/v2/Git-Basics-Git-Aliases

I don’t see any “tips and tricks”. Seems like this should just 404.

Chapter 4

I’ve added a column 404? since the faulty behavior includes a 404. But three of them should result in 404, to be clear, just without the faulty redirect.

Link Expected Actual 404?
localhost:5000/book/en/Git-on-the-Server-GitWeb http://localhost:5000/book/en/v2/Git-on-the-Server-GitWeb http://localhost:1313/book/en/v2/Git-on-the-Server-Smart-HTTP Yes
localhost:5000/book/en/Git-on-the-Server-Gitosis 404 http://localhost:1313/book/en/v2/Git-on-the-Server-GitWeb Yes
localhost:5000/book/en/Git-on-the-Server-Gitolite 404 http://localhost:1313/book/en/v2/Git-on-the-Server-GitLab Yes
localhost:5000/book/en/Git-on-the-Server-Hosted-Git 404 http://localhost:1313/book/en/v2/Git-on-the-Server-Summary Yes

Chapter 5

The summary page doesn’t work for me.

Link Expected Actual 404?
http://localhost:5000/book/en/Distributed-Git-Summary http://localhost:5000/book/en/v2/Distributed-Git-Summary http://localhost:1313/book/en/v2/Distributed-Git-Summary Yes

Chapter 10

Here are wrong redirects for all subchapters. The pattern is that you get redirected to the previous subchapter. For example the first link redirects to 9.3 “Git and Other Systems - Summary”. The last one (summary) redirects to the redirect page that we expected for the second-to-last v1 link.

Link Expected Actual
localhost:5000/book/en/Git-Internals-Git-References http://localhost:5000/book/en/v2/Git-Internals-Git-References http://localhost:1313/book/en/v2/Git-and-Other-Systems-Summary
localhost:5000/book/en/Git-Internals-Packfiles http://localhost:5000/book/en/v2/Git-Internals-Packfiles http://localhost:5000/book/en/v2/Git-Internals-Git-References
localhost:5000/book/en/Git-Internals-The-Refspec http://localhost:5000/book/en/v2/Git-Internals-The-Refspec http://localhost:5000/book/en/v2/Git-Internals-Packfiles
localhost:5000/book/en/Git-Internals-Transfer-Protocols http://localhost:5000/book/en/v2/Git-Internals-Transfer-Protocols http://localhost:5000/book/en/v2/Git-Internals-The-Refspec
localhost:5000/book/en/Git-Internals-Maintenance-and-Data-Recovery http://localhost:5000/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery http://localhost:5000/book/en/v2/Git-Internals-Transfer-Protocols
localhost:5000/book/en/Git-Internals-Summary http://localhost:5000/book/en/v2/Git-Internals-Summary http://localhost:5000/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery

@dscho dscho force-pushed the book-v1-redirects branch 3 times, most recently from d162536 to a073826 Compare November 17, 2024 20:16
@dscho
Member Author

dscho commented Nov 17, 2024

@LemmingAvalanche thank you for doing some much-needed QA!

Each v1 X.1 section has two links: one for the chapter title alone and one for the chapter title plus the subtitle. The chapter-title-only link works, but the second variant does not.

For example:

* http://localhost:5000/book/en/Git-Branching

* Which correctly redirects to http://localhost:5000/book/en/v2/Git-Branching-Branches-in-a-Nutshell

* But: http://localhost:5000/book/en/Git-Branching-Branches-in-a-Nutshell

* Gets 404

I tried very hard to get both chapter X and the X.1 section to link to X.1 (because there are no chapter links in v2). Here is where both v1's 3 and 3.1 are supposed to map to v2's 3.1.

But I see that https://dscho.github.io/git-scm.com/book/en/Git-Branching-What-a-Branch-Is does not redirect as desired...

<clicketyclick>

Aha! My command of the Ruby language was insufficient, and I failed to realize that the way I implemented book_v1_aliases, an enumerator was returned instead of an array, which means that only the first array item (and in this case, intended alias) was respected.

This is now fixed.
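
The actual book_v1_aliases code is not shown in this thread, so the following is only a sketch of one way an enumerator can end up where an array was intended, and how `to_a` resolves it. The constant `V1_SLUGS` and both method names are hypothetical.

```ruby
# Hypothetical v1 slugs that should all alias the same v2 section.
V1_SLUGS = ["Git-Branching", "Git-Branching-What-a-Branch-Is"].freeze

# Lazy variant: returns an Enumerator::Lazy, which is not an Array.
def lazy_aliases(lang)
  V1_SLUGS.lazy.map { |slug| "/book/#{lang}/#{slug}" }
end

# Eager variant: `to_a` forces the enumerator into a plain Array.
def array_aliases(lang)
  lazy_aliases(lang).to_a
end

puts lazy_aliases("en").is_a?(Array)
puts array_aliases("en").inspect
```

Code that expects an Array (for example, anything that serializes the aliases) may treat the enumerator differently from a list of strings, so converting explicitly at the method boundary is the safer choice.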

Chapter 2

Wrong redirect for this one:

Link Expected Actual
localhost:5000/book/en/Git-Basics-Tips-and-Tricks 404 http://localhost:5000/book/en/v2/Git-Basics-Git-Aliases

Actually, if you have a closer look, https://web.archive.org/web/20141022015825/http://git-scm.com/book/en/Git-Basics-Tips-and-Tricks#Git-Aliases talks about Git Aliases, and the other half of the section talks about completion (which I did not find mentioned in v2).

So I think that "Git Aliases" is the best match here.

Chapter 4

I’ve added a column 404? since the faulty behavior includes a 404. But three of them should result in 404, to be clear, just without the faulty redirect.

Link Expected Actual 404?
localhost:5000/book/en/Git-on-the-Server-GitWeb http://localhost:5000/book/en/v2/Git-on-the-Server-GitWeb http://localhost:1313/book/en/v2/Git-on-the-Server-Smart-HTTP Yes
localhost:5000/book/en/Git-on-the-Server-Gitosis 404 http://localhost:1313/book/en/v2/Git-on-the-Server-GitWeb Yes
localhost:5000/book/en/Git-on-the-Server-Gitolite 404 http://localhost:1313/book/en/v2/Git-on-the-Server-GitLab Yes
localhost:5000/book/en/Git-on-the-Server-Hosted-Git 404 http://localhost:1313/book/en/v2/Git-on-the-Server-Summary Yes

I considered 404ing, but thought it would be much more helpful to redirect to http://localhost:5000/book/en/v2/Git-on-the-Server-Third-Party-Hosted-Options instead. 404s kind of suck when you're trying to find information.

Chapter 5

The summary page doesn’t work for me.

Link Expected Actual 404?
http://localhost:5000/book/en/Distributed-Git-Summary http://localhost:5000/book/en/v2/Distributed-Git-Summary http://localhost:1313/book/en/v2/Distributed-Git-Summary Yes

Thank you for pointing this out! This was an inadvertent omission of the section 5.4 in the mapping, which is now fixed.

Chapter 10

Here are wrong redirects for all subchapters. The pattern is that you get redirected to the previous subchapter. For example the first link redirects to 9.3 “Git and Other Systems - Summary”. The last one (summary) redirects to the redirect page that we expected for the second-to-last v1 link.

Link Expected Actual
localhost:5000/book/en/Git-Internals-Git-References http://localhost:5000/book/en/v2/Git-Internals-Git-References http://localhost:1313/book/en/v2/Git-and-Other-Systems-Summary
localhost:5000/book/en/Git-Internals-Packfiles http://localhost:5000/book/en/v2/Git-Internals-Packfiles http://localhost:5000/book/en/v2/Git-Internals-Git-References
localhost:5000/book/en/Git-Internals-The-Refspec http://localhost:5000/book/en/v2/Git-Internals-The-Refspec http://localhost:5000/book/en/v2/Git-Internals-Packfiles
localhost:5000/book/en/Git-Internals-Transfer-Protocols http://localhost:5000/book/en/v2/Git-Internals-Transfer-Protocols http://localhost:5000/book/en/v2/Git-Internals-The-Refspec
localhost:5000/book/en/Git-Internals-Maintenance-and-Data-Recovery http://localhost:5000/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery http://localhost:5000/book/en/v2/Git-Internals-Transfer-Protocols
localhost:5000/book/en/Git-Internals-Summary http://localhost:5000/book/en/v2/Git-Internals-Summary http://localhost:5000/book/en

Thank you for diligent checking!

This was a mistake in my mapping, where I accidentally doubled the "10.2" entry, which is now fixed.
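
A doubled entry like this is easy to miss by eye. As a rough sketch, assuming the mapping is available as a list of [v1, v2] section pairs (the `PAIRS` data and the `duplicates` helper below are made up for illustration), repeated values can be listed for review. Some repeats are intentional, such as several v1 sections pointing at one v2 section, so the output is a prompt for a human check rather than an error.

```ruby
# Hypothetical excerpt of a v1 -> v2 section list containing a doubled target.
PAIRS = [
  ["9.1", "10.1"],
  ["9.2", "10.2"],
  ["9.3", "10.2"], # doubled target, intended to be "10.3" in this made-up data
  ["9.4", "10.4"],
].freeze

# Return every value that occurs more than once in the given column
# (column 0 holds the v1 section, column 1 the v2 section).
def duplicates(pairs, column)
  pairs.map { |pair| pair[column] }.tally.select { |_, count| count > 1 }.keys
end

puts "repeated v1 sections: #{duplicates(PAIRS, 0).inspect}"
puts "repeated v2 sections: #{duplicates(PAIRS, 1).inspect}"
```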

@LemmingAvalanche
Contributor

The following was tested on 9333518 (Merge branch 'book-fa' of bundle-fa/fa.bundle into book-v1-redirects, 2024-11-17).

Actually, if you have a closer look, https://web.archive.org/web/20141022015825/http://git-scm.com/book/en/Git-Basics-Tips-and-Tricks#Git-Aliases talks about Git Aliases, and the other half of the section talks about completion (which I did not find mentioned in v2).

So I think that "Git Aliases" is the best match here.

@dscho

Okay got it. Makes sense.

Chapter 4

...
I considered 404ing, but thought it would be much more helpful to redirect to http://localhost:5000/book/en/v2/Git-on-the-Server-Third-Party-Hosted-Options instead. 404s kind of suck when you're trying to find information.

Confirmed that this works as intended. See below in the chapter walkthrough.

Chapters 1–3

All okay. Nothing to note.

Chapter 4

Like Dscho mentioned here: some outdated-in-v2 links now point to

http://localhost:5000/book/en/v2/Git-on-the-Server-Third-Party-Hosted-Options

Which makes sense when we see that these are third-party things, just not
covered in v2:

localhost:5000/book/en/Git-on-the-Server-GitWeb
localhost:5000/book/en/Git-on-the-Server-Gitosis
localhost:5000/book/en/Git-on-the-Server-Gitolite

Chapter 5

All okay.

Chapter 6

v2 chapter 6 is on GitHub. Does not exist in v1.

(It seems that I ended up using the v2 chapters as the headings since they don’t line up quite as I expected)

Chapter 7

“Subtree merging” redirects to ch. 7.1:

Link Actual
localhost:5000/book/en/Git-Tools-Subtree-Merging http://localhost:5000/book/en/v2/Git-Tools-Revision-Selection

It might be better if it redirected to ch. 7.8:

http://localhost:5000/book/en/v2/Git-Tools-Advanced-Merging

Because that one has the subsubchapter (?) “Subtree Merging”.

Chapter 8

All good.

Chapter 9

v1 ch. 8.1 with title “Git and Other Systems: Git and Subversion” redirects to 9.1 “Git and Other Systems - Git as a Client”.

Link Actual
localhost:5000/book/en/Git-and-Other-Systems-Git-and-Subversion http://localhost:5000/book/en/v2/Git-and-Other-Systems-Git-as-a-Client

Which makes sense since:

  1. Both are the first subchapter
  2. There is a “Git and Subversion” subsubchapter/section in v2

Chapter 10

All good.

Thank you for diligent checking!

Thanks for the structured reply :D

In 2014, the second edition of the ProGit book was started, in 2016 this
edition became the default, and in 2020 all v1 URLs were redirected to
the landing page of v2.

At some stage, the v1 links stopped working, returning a "500 Internal
server error".

After switching the git-scm.com site to a static Hugo-generated site,
those links now return a "404 That page doesn't exist".

This is far from an ideal situation.

Unfortunately, in a static site, there is no way to install wildcard
routes that would redirect to a better URL, therefore we have to
enumerate all of the URLs that we want to redirect.

Internet Archive to the rescue!

This script downloads the tables of contents of the ProGit v1 book and
its translations, and determines the URLs that had been active back
then, maps them to the v2 equivalents (on a best effort basis), and then
writes out a YAML file with that information.

Signed-off-by: Johannes Schindelin <[email protected]>

This file was generated by running script/extract-book-v1-urls.rb.

Signed-off-by: Johannes Schindelin <[email protected]>

The `File.write(path, content)` form is quite a bit more readable than
the long form.

Signed-off-by: Johannes Schindelin <[email protected]>

The default language of the ProGit book is English, therefore
https://git-scm.com/book should redirect to the table of contents of the
English version of that book.

This means that the `/external/book/content/book/en/_index.html` file
needs to be part of the sparse checkout, otherwise the workflow run
would not be able to update it (should it ever become necessary).

This was not a problem so far because that file remained unchanged (and
is likely to remain so for quite some time yet).

Signed-off-by: Johannes Schindelin <[email protected]>

Once upon a time, the first edition of the ProGit book was available
on Git's home page, and the sun was shining. Then, one day in 2014, the
sun shone brighter and work was begun to write the second edition of
this book. At some stage, this became the default when directing web
browsers to https://git-scm.com/book, and a few years later, 2020 or so,
the links that formerly led to the first edition would redirect to v2.

Then, one day, clouds moved across the sky and the redirects from v1 to
v2 stopped working and instead a "500 Internal server error" page was
shown.

Time went by and nobody really knew how to fix it (or more likely,
wasn't in the mood, or wanted other people to fix it).

Finally, in the fall of 2024, git-scm.com was switched to a static web
site, generated using Hugo, and local development became much easier.
Naturally, the v1-to-v2 redirects were no longer in place and the v1
links therefore showed 500 no longer, but a 404.

Still, nobody knew how to fix it, or wasn't in the mood, or wanted other
people to fix it for them.

Until now. Now is the day when we resurrect the v1-to-v2 redirects, in
even more glory than ever before, for now we redirect to the v2 sections
that correspond to the v1 sections (as far as possible, that is)!

Only one (slight) fly in the ointment: URLs to v1 sections of the book
which contain anchors will keep those anchors as-are, and not translate
them to the corresponding new anchors. Example:
Git-Basics-Getting-a-Git-Repository#Cloning-an-Existing-Repository
should redirect to v2/Git-Basics-Getting-a-Git-Repository#_git_cloning,
but does not. It redirects to that page but still tries to find the
anchor `#Cloning-an-Existing-Repository`.

Alas, this is where I do not know how to fix it, or ain't in the mood,
or want other people to fix it for themselves.

This commit addresses git#1782

Signed-off-by: Johannes Schindelin <[email protected]>
dscho added 26 commits November 19, 2024 21:08
@dscho
Member Author

dscho commented Nov 19, 2024

Chapter 7

“Subtree merging” redirects to ch. 7.1:

Link Actual
localhost:5000/book/en/Git-Tools-Subtree-Merging http://localhost:5000/book/en/v2/Git-Tools-Revision-Selection

It might be better if it redirected to ch. 7.8:

http://localhost:5000/book/en/v2/Git-Tools-Advanced-Merging

Because that one has the subsubchapter (?) “Subtree Merging”.

Fixed!

Of course it would be even better if it redirected to https://git-scm.com/book/en/v2/Git-Tools-Advanced-Merging#_subtree_merge, but that would require more than just adding aliases to the target page; files with redirect_to front matter would need to be created, similar to the redirects for /book/en and /book/en/v1 to /book/en/v2. Doable, but I'd like to declare the current state as good enough. Agree?
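
For readers unfamiliar with the alias approach, here is a small sketch of what generating such front matter could look like. Hugo's `aliases` front matter key is real; the `front_matter` helper, the page title, and the exact alias paths are illustrative assumptions and not taken from script/book.rb.

```ruby
require "yaml"

# Build Hugo-style YAML front matter for a v2 page, listing the old v1 paths
# as aliases. Each alias names a path only; the redirect target is the page
# that carries the front matter.
def front_matter(title, aliases)
  { "title" => title, "aliases" => aliases }.to_yaml + "---\n"
end

page = front_matter(
  "Git Tools - Advanced Merging",
  ["/book/en/Git-Tools-Subtree-Merging", "/book/en/v1/Git-Tools-Subtree-Merging"]
)
puts page
```

Because an alias is declared on the target page itself, there is no place in it to name a fragment such as `#_subtree_merge`, which matches the limitation described above.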

@LemmingAvalanche
Contributor

Fixed!

Confirmed.

Doable, but I'd like to declare the current state as good enough. Agree?

Yep.

LGTM

@dscho dscho merged commit 6238923 into git:gh-pages Nov 20, 2024
@dscho dscho deleted the book-v1-redirects branch November 20, 2024 06:32
Successfully merging this pull request may close these issues.

500 Internal server error when visiting link for Pro Git v1 book
2 participants