Tracking: Seamless Git support #1787

Open
adamziel opened this issue Sep 19, 2024 · 0 comments
Labels
[Package][@wp-playground] Blueprints [Type] Developer Experience [Type] Mindmap Tree [Type] Tracking Tactical breakdown of efforts across the codebase and/or tied to Overview issues.

adamziel commented Sep 19, 2024

Let's make git a first-class citizen in Playground and enable seamless data loading from all contexts where Playground is running:

@adamziel adamziel added [Type] Developer Experience [Type] Tracking Tactical breakdown of efforts across the codebase and/or tied to Overview issues. [Package][@wp-playground] Blueprints [Type] Mindmap Tree labels Sep 19, 2024
@adamziel adamziel moved this from Inbox to Project: Up soon in Playground Board Sep 19, 2024
@adamziel adamziel moved this from Project: Up soon to Project: In Progress in Playground Board Sep 26, 2024
adamziel added a commit that referenced this issue Oct 7, 2024
## Motivation

Related to #1787

Adds a set of TypeScript functions that support the native git protocol
and can power a sparse checkout feature. This is the basis for a faster,
more user-friendly git integration. No more guessing repository paths.
Just provide the repo URL, browse the files, and tell Playground which
directories are plugins, themes, etc.

Technically, this PR performs [git sparse checkout using just
JavaScript](https://adamadam.blog/2024/06/21/cloning-a-git-repository-from-a-web-browser-using-fetch/page/1)
and a generic CORS proxy.

**This PR doesn't provide any user-facing feature yet.** However, it
paves the way to features like:

* Checkout any git repo, even non-GitHub ones, without going through the
OAuth flow
* Retrieve a subset of the files directly from the repo and without
going through zipballs.
* Provide a visual git repo browser (instead of asking the user to
manually type the path)
* Introduce a new Blueprint resource type: git repo
* Fetch the names of all the repository branches (or just the branches
with the specified prefix)
* (future) commit and push to any git repo, even non-GitHub ones

## Notable points of this PR

* Exposes the `sparseCheckout()`, `lsRefs()`, and `listFiles()`
functions from the `@wp-playground/storage` package. I'm not yet sure
whether we need a dedicated `@wp-playground/git` package or not.
* Ships basic unit test coverage for those functions.
* Silences a few warnings in the CORS proxy. CC @brandonpayton we may
not want to do that in the production release.
* Adds `isomorphic-git` as a git submodule in the `/isomorphic-git`
path. We can't rely on the published npm package because it doesn't
export the internal APIs we need to use here.
* Adds a bunch of WIP components in `@wp-playground/components`. They're
not used anywhere on the website yet and I'd rather keep them moving
with the project than isolate them in a PR until they're perfect. We'll
need some accessibility and mobile testing before using them in the
webapp, though.

## How does it even work?

Let me quote [my own
article](https://adamadam.blog/2024/06/21/cloning-a-git-repository-from-a-web-browser-using-fetch/):

### Running a Git Client in the browser

The good news was
[isomorphic-git](https://github.com/isomorphic-git/isomorphic-git),
[wasm-git](https://github.com/petersalomonsen/wasm-git), and a few other
projects were already running Git in the browser. The bad news was none
of them supported fetching a subset of files via [sparse
checkout](https://git-scm.com/docs/git-sparse-checkout). You’d still
have to download 20MB of data even if you only wanted 100KB.

However, everything the desktop Git client does, including sparse
checkouts, can be done via
[HTTP](https://git-scm.com/docs/http-protocol/2.5.6) by requesting URLs
like
[https://github.com/WordPress/wordpress-playground.git](https://github.com/isomorphic-git/isomorphic-git.git).

Git [documentation](https://git-scm.com/) was… less than helpful, but
eventually it worked! A few hours later I was running Git commands by
sending GET and POST requests to the repository URLs.

### Fetching a hash of the branch

The first command I needed was ls-refs to get the SHA1 hash of the right
git branch. Here’s how you can get it with fetch() for the HEAD ref
of the WordPress/gutenberg repo:

```ts
const response = await fetch(
  'https://github.com/WordPress/gutenberg.git/git-upload-pack',
  {
    method: 'POST',
    headers: {
        'Accept': 'application/x-git-upload-pack-advertisement',
        'content-type': 'application/x-git-upload-pack-request',
        'Git-Protocol': 'version=2'
    },
    body: [
        `0014command=ls-refs\n`,
      // ^^^^ line length in hex
        `0015agent=git/2.37.3\n`,
        `0017object-format=sha1\n`,
        '0001',
      // ^^^^ command separator
        // Filter the results to only contain the HEAD branch,
        // otherwise it will return all the branches and
        // tags which may require downloading many 
        // megabytes of data:
        `0009peel\n`,
        `0014ref-prefix HEAD\n`,
        '0000',
      // ^^^^ end of request
    ].join(""),
  }
);
```
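
The four hex digits in front of each line are the pkt-line length: the payload size in bytes plus four for the prefix itself. Here is a minimal sketch of computing them instead of counting by hand. `pktLine` and `lsRefsBody` are hypothetical helper names used for illustration only, not functions shipped in this PR.

```ts
// Hypothetical helpers for illustration; not part of @wp-playground/storage.

const FLUSH_PKT = '0000'; // special packet: end of request
const DELIM_PKT = '0001'; // special packet: separates the command from its arguments

/** Prefixes a payload with its pkt-line length (payload bytes + 4) as 4 hex digits. */
function pktLine(payload: string): string {
  const totalLength = new TextEncoder().encode(payload).length + 4;
  if (totalLength > 65520) {
    throw new Error('Payload too long for a single pkt-line');
  }
  return totalLength.toString(16).padStart(4, '0') + payload;
}

/** Builds the same ls-refs request body as the fetch() call above. */
function lsRefsBody(refPrefix: string): string {
  return [
    pktLine('command=ls-refs\n'),
    pktLine('agent=git/2.37.3\n'),
    pktLine('object-format=sha1\n'),
    DELIM_PKT,
    pktLine('peel\n'),
    pktLine(`ref-prefix ${refPrefix}\n`),
    FLUSH_PKT,
  ].join('');
}
```

Calling `lsRefsBody('HEAD')` yields the same string as the hand-written `body` array, so the prefixes no longer need to be recounted when a line changes.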

I won’t go into details of the Git protocol – the point is that with a few
special headers and lines you can act as a Git client. If you paste that
fetch() into your devtools while on GitHub.com, it returns a response
similar to this:

```
0032950f5c8239b6e78e9051ec5e845bac5aa863c4cb HEAD
0000
```

Good! That’s our commit hash.
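
The response uses the same pkt-line framing as the request, so extracting the hash is a matter of reading the length prefix and splitting the payload. This is a rough sketch that assumes an ASCII response already read as text. `parseLsRefs` is a hypothetical name, not the `lsRefs()` function from the PR.

```ts
// Hypothetical parser for illustration; assumes an ASCII-only response body.

interface RefAdvertisement {
  oid: string;
  name: string;
}

function parseLsRefs(responseText: string): RefAdvertisement[] {
  const refs: RefAdvertisement[] = [];
  let offset = 0;
  while (offset + 4 <= responseText.length) {
    const length = parseInt(responseText.slice(offset, offset + 4), 16);
    if (Number.isNaN(length)) {
      throw new Error(`Invalid pkt-line length at offset ${offset}`);
    }
    if (length < 4) {
      // 0000 (flush) and 0001 (delimiter) carry no payload.
      offset += 4;
      continue;
    }
    const payload = responseText.slice(offset + 4, offset + length).replace(/\n$/, '');
    offset += length;
    // Payload looks like "<oid> <refname>[ <attributes>...]".
    const [oid, name] = payload.split(' ');
    if (oid && name) {
      refs.push({ oid, name });
    }
  }
  return refs;
}
```

For the response shown above, this returns a single entry whose `name` is `HEAD` and whose `oid` is the commit hash.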

### Fetching a list of objects at a specific commit

With this, we can fetch [the list of
objects](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) in
that branch:

```ts
fetch("https://github.com/wordpress/gutenberg/git-upload-pack", {
  "headers": {
    "accept": "application/x-git-upload-pack-advertisement",
    "content-type": "application/x-git-upload-pack-request",
  },
  "referrer": "http://localhost:8000/",
  "referrerPolicy": "strict-origin-when-cross-origin",
  "body": [
      `0088want 950f5c8239b6e78e9051ec5e845bac5aa863c4cb multi_ack_detailed no-done side-band-64k thin-pack ofs-delta agent=git/2.37.3 filter \n`,
      `0015filter blob:none\n`,
      // ^ sparse checkout secret sauce:
      // only fetches a list of objects without
      // their content
      `0035shallow 950f5c8239b6e78e9051ec5e845bac5aa863c4cb\n`,
      `000ddeepen 1\n`,
      `0000`,
      `0009done\n`,
      `0009done\n`,
  ].join(""),
  "method": "POST"
});
```

And here’s the response:

```
00000008NAK
0026�Enumerating objects: 2189, done.
0025�Counting objects:   0% (1/2189)
...
0032�Compressing objects: 100% (1568/1568), done.
2004�PACK��(binary data)
0040 Total 2189 (delta 1), reused 1550 (delta 0), pack-reused 0
0006��0000
```

The binary data after PACK is a compressed list of all objects the
repository had at commit `950f5c8239b6e78e9051ec5e845bac5aa863c4cb`. It
is not limited to the files changed in `950f5c`; it covers every file in
the repository at that commit.
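
The odd characters right after each length prefix in the response above are side-band channel markers. With the `side-band-64k` capability, the first payload byte of every packet names a channel: 1 for packfile data, 2 for progress messages, 3 for errors. Below is a rough sketch of separating them, assuming the whole response is already buffered in memory. `demuxSideBand` is a hypothetical name; in practice isomorphic-git's parser handles this step.

```ts
// Hypothetical demultiplexer for illustration; assumes a fully buffered response.

interface SideBandResult {
  packfile: Uint8Array;
  progress: string;
}

function demuxSideBand(response: Uint8Array): SideBandResult {
  const decoder = new TextDecoder();
  const packChunks: Uint8Array[] = [];
  let progress = '';
  let offset = 0;

  while (offset + 4 <= response.length) {
    const length = parseInt(decoder.decode(response.subarray(offset, offset + 4)), 16);
    if (Number.isNaN(length)) {
      throw new Error(`Invalid pkt-line length at offset ${offset}`);
    }
    if (length < 4) {
      offset += 4; // flush or delimiter packet, no payload
      continue;
    }
    if (offset + length > response.length) {
      throw new Error('Truncated pkt-line');
    }
    const payload = response.subarray(offset + 4, offset + length);
    offset += length;

    const channel = payload[0];
    if (channel === 1) {
      packChunks.push(payload.subarray(1)); // raw packfile bytes
    } else if (channel === 2) {
      progress += decoder.decode(payload.subarray(1)); // "Counting objects..." etc.
    } else if (channel === 3) {
      throw new Error(`Remote error: ${decoder.decode(payload.subarray(1))}`);
    }
    // Anything else (e.g. "NAK\n") is a plain protocol line and is skipped here.
  }

  const total = packChunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const packfile = new Uint8Array(total);
  let written = 0;
  for (const chunk of packChunks) {
    packfile.set(chunk, written);
    written += chunk.length;
  }
  return { packfile, progress };
}
```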

The [pack format](https://git-scm.com/docs/pack-format) is a binary
blob. It’s similar to
[ZIP](https://en.wikipedia.org/wiki/ZIP_(file_format)) in that it
encodes a series of objects, each stored as a binary header followed by
binary data. Here’s an approximate visual to help grok the idea:

```
PACK format – inaccurate explanation,
Pack consists of the string "PACK" and binary data structured roughly as follows:

 ___________________________________
|                                   |
|        ASCII string "PACK"        |
|        Binary data starts         |
|           Pack Header             |
|___________________________________|
|                                   |
|        Offset 0x0010              |
|          Object 1 Header          |  (Object type, hash,
|                                   |   data length, etc.)
|        ________________           |
|       |                |          |
|       |  Object 1 Data |          |  (Gzipped data)
|       |________________|          |
|                                   |
|        Offset 0x0050              |
|          Object 2 Header          |  
|                                   | 
|        ________________           |
|       |                |          |
|       |  Object 2 Data |          |  (Gzipped data)
|       |________________|          |
|___________________________________|
|                                   |
|           Pack Footer             |
|         Binary data ends          |
|___________________________________|
```
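
For reference, the parts of the real layout that matter here are small. The pack starts with 12 bytes: the signature `PACK`, a 4-byte big-endian version, and a 4-byte big-endian object count. Each object entry then begins with a variable-length header that packs the object type and its uncompressed size, followed by zlib-deflated data. The sketch below reads just those two headers; the function names are hypothetical and the delta object types are only labelled, not resolved.

```ts
// Hypothetical readers for illustration; they do not inflate or resolve objects.

interface PackHeader {
  version: number;
  objectCount: number;
}

interface PackObjectHeader {
  type: string;
  size: number; // uncompressed size in bytes
  dataOffset: number; // where the entry's data starts (zlib data for non-delta types)
}

const PACK_OBJECT_TYPES: Record<number, string> = {
  1: 'commit',
  2: 'tree',
  3: 'blob',
  4: 'tag',
  6: 'ofs_delta',
  7: 'ref_delta',
};

function parsePackHeader(pack: Uint8Array): PackHeader {
  if (pack.length < 12) throw new Error('Pack is shorter than its 12-byte header');
  const view = new DataView(pack.buffer, pack.byteOffset, pack.byteLength);
  if (view.getUint32(0) !== 0x5041434b) throw new Error('Missing "PACK" signature');
  return { version: view.getUint32(4), objectCount: view.getUint32(8) };
}

function parseObjectHeader(pack: Uint8Array, offset: number): PackObjectHeader {
  const view = new DataView(pack.buffer, pack.byteOffset, pack.byteLength);
  let cursor = offset;
  let byte = view.getUint8(cursor++);
  // First byte: continuation bit, 3 bits of type, low 4 bits of the size.
  const typeCode = (byte >> 4) & 0b111;
  let size = byte & 0b1111;
  let shift = 4;
  // Each following byte contributes 7 more size bits, least significant first.
  while (byte & 0x80) {
    byte = view.getUint8(cursor++);
    size += (byte & 0x7f) * 2 ** shift;
    shift += 7;
  }
  return { type: PACK_OBJECT_TYPES[typeCode] ?? 'unknown', size, dataOffset: cursor };
}
```

The first object always starts at offset 12, right after the pack header.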

The decoding is tedious, so I used [the
decoder](https://github.com/isomorphic-git/isomorphic-git/blob/main/src/models/GitPackIndex.js)
provided by the isomorphic-git package:

```ts
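// streamToIterator() adapts the fetch() body stream into an async iterator.
// parseUploadPackResponse(), collect(), and GitPackIndex are isomorphic-git internals.
const iterator = streamToIterator(response.body);
const parsed = await parseUploadPackResponse(iterator);
const packfile = Buffer.from(await collect(parsed.packfile));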

const index = await GitPackIndex.fromPack({
    pack: packfile
});
```

The parsed index object provides information about all the objects
encoded in the received packfile. Let’s peek inside:

```
{
  // ...
  "hashes": [
    "5f4f0a5367476fdb7c98ffa5fa35300ec4c3f48b",
    "950f5c8239b6e78e9051ec5e845bac5aa863c4cb",
    // ...
  ],
  "offsets": {
    "5f4f0a5367476fdb7c98ffa5fa35300ec4c3f48b": 12,
    "950f5c8239b6e78e9051ec5e845bac5aa863c4cb": 181,
    // ...
  },
  "offsetCache": {
    "12": {
      "type": "tree",
      "object": "100644 async-http-download.php\u0000��p4��\u0014�g\u0015i��\u0004��\\���100644 async-http.php\u0000�\n�8K�RT������F\u001b8�� (more binary data)"
    },
    // ...
  },
  "readDepth": 4,
  "externalReadDepth": 0
}
```

Each object has a type and some data. The decoder stored some objects in
the offsetCache, and kept track of the others in the form of a
hash => packfile offset mapping.

Let’s read the details of the commit from our parsed index:

```ts
> const commit = await index.read({
    oid: '950f5c8239b6e78e9051ec5e845bac5aa863c4cb'
  });

{
  "type": "commit",
  "object": "tree c7b8440c83b8c987895f9a1949650eb60bccd2ec\nparent b6132f2d381865353e09edf88aa64a0dd042811a\nauthor Adam Zieliński <[email protected]> 1717689108 +0200\ncommitter Adam Zieliński <[email protected]> 1717689108 +0200\n\nUpdate rebuild workflow\n"
}
```

The read result contains the object type and the uncompressed object
bytes which, in this case, provide the commit details in a specific
microformat. From here, we can get the tree hash and look for its details
in the same index we’ve already downloaded:

```ts
> const tree = await index.read({ oid: "c7b8440c83b8c987895f9a1949650eb60bccd2ec" })

{
  "type": "tree",
  "object": "40000 .github\u0000_O\nSgGo�|����50\u000e���40000 (... binary data ...)"
}
```
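
The raw `object` bytes above are a sequence of entries, each laid out as an ASCII mode, a space, the path, a NUL byte, and then the 20 raw bytes of the SHA-1 hash. Here is a minimal sketch of decoding that layout by hand. `parseTree` is a hypothetical name, and it assumes SHA-1 object ids.

```ts
// Hypothetical decoder for illustration; assumes 20-byte SHA-1 object ids.

interface TreeEntry {
  mode: string;
  path: string;
  oid: string;
  type: 'tree' | 'blob' | 'commit';
}

function parseTree(bytes: Uint8Array): TreeEntry[] {
  const decoder = new TextDecoder();
  const entries: TreeEntry[] = [];
  let cursor = 0;

  while (cursor < bytes.length) {
    const space = bytes.indexOf(0x20, cursor);
    const nul = space === -1 ? -1 : bytes.indexOf(0x00, space + 1);
    if (space === -1 || nul === -1 || nul + 21 > bytes.length) {
      throw new Error(`Malformed tree entry at offset ${cursor}`);
    }
    // Git writes directory modes without the leading zero ("40000").
    const mode = decoder.decode(bytes.subarray(cursor, space)).padStart(6, '0');
    const path = decoder.decode(bytes.subarray(space + 1, nul));
    const oid = Array.from(bytes.subarray(nul + 1, nul + 21), (b) =>
      b.toString(16).padStart(2, '0')
    ).join('');
    // 040000 is a directory, 160000 a submodule commit, everything else a blob.
    const type = mode === '040000' ? 'tree' : mode === '160000' ? 'commit' : 'blob';
    entries.push({ mode, path, oid, type });
    cursor = nul + 21;
  }
  return entries;
}
```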

The content of the tree object is a list of files in the repository.
Just like with the commit, tree details are encoded in their own
microformat. Luckily, isomorphic-git ships the relevant decoders:

```ts
> GitTree.from(tree.object).entries()
[
  {
    "mode": "040000",
    "path": ".github",
    "oid": "ece277ec006eb517d5c5399d7a5c00b7e61018f1",
    "type": "tree"
  },
  {
    "mode": "100644",
    "path": "readme.txt",
    "oid": "3fe6e3aaf1dc4df204be575041383fc8e2e1e070",
    "type": "blob"
  },
  {
    "mode": "040000",
    "path": "src",
    "oid": "dbc84f20ee64fbd924617b41ee0e66128c9a8d97",
    "type": "tree"
  },
  // ...
]
```

Yay! That’s the list of files and directories in the repository root
with their hashes! From here we can recursively retrieve the ones
relevant for our sparse checkout.
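
The recursive step can be sketched as a walk that reads one tree per path segment until it reaches the requested file or directory. The `readTree` callback is an assumption standing in for "read the tree from the pack index and decode its entries", and `resolvePath` is a hypothetical name rather than a function from the PR.

```ts
// Hypothetical walker for illustration; readTree is injected by the caller.

interface ResolvedEntry {
  path: string;
  oid: string;
  type: 'tree' | 'blob';
}

type ReadTree = (oid: string) => Promise<ResolvedEntry[]>;

/** Walks from the root tree down to `path` and returns the entry found there. */
async function resolvePath(
  rootTreeOid: string,
  path: string,
  readTree: ReadTree
): Promise<ResolvedEntry> {
  const segments = path.split('/').filter(Boolean);
  let current: ResolvedEntry = { path: '', oid: rootTreeOid, type: 'tree' };

  for (const segment of segments) {
    if (current.type !== 'tree') {
      throw new Error(`"${current.path}" is not a directory`);
    }
    const entries = await readTree(current.oid);
    const next = entries.find((entry) => entry.path === segment);
    if (!next) {
      throw new Error(`Path not found: ${path}`);
    }
    current = next;
  }
  return current;
}
```

Only the trees along the requested path are read, which is what keeps a sparse checkout cheap compared to walking the whole repository.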

### Fetching full files from specific paths

We’re finally ready to check out a few particular paths. Let’s ask for a
blob at readme.txt and a tree at docs/tool:

```ts
const response = await fetch("https://github.com/wordpress/gutenberg/git-upload-pack", {
  "headers": {
    "accept": "application/x-git-upload-pack-advertisement",
    "content-type": "application/x-git-upload-pack-request",
  },
  "body": [
      // Each prefix is the line length in hex, including the 4 prefix characters
      `0080want 28facb763312f40c9ab3251fb91edb87c8476cf9 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta agent=git/2.37.3\n`,
      `0080want 3fe6e3aaf1dc4df204be575041383fc8e2e1e070 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta agent=git/2.37.3\n`,
      `00000009done\n`
  ].join(""),
  "method": "POST"
});
```
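
The request above can also be generated from a list of object hashes. This is a sketch under one assumption worth flagging: the Git protocol documentation describes the capability list as belonging to the first `want` line only, so this version sends bare `want` lines after the first one, whereas the hand-written request repeats the capabilities. `encodePktLine` and `buildWantRequest` are hypothetical names.

```ts
// Hypothetical request builder; the capability list is copied from the request above.

const CAPABILITIES =
  'multi_ack_detailed no-done side-band-64k thin-pack ofs-delta agent=git/2.37.3';

/** Adds the 4-digit hex length prefix. Payloads here are ASCII, so length === bytes. */
function encodePktLine(payload: string): string {
  return (payload.length + 4).toString(16).padStart(4, '0') + payload;
}

/** Builds an upload-pack request body asking for the given object ids. */
function buildWantRequest(oids: string[]): string {
  if (oids.length === 0) {
    throw new Error('At least one object id is required');
  }
  const wants = oids.map((oid, index) =>
    encodePktLine(index === 0 ? `want ${oid} ${CAPABILITIES}\n` : `want ${oid}\n`)
  );
  // Flush packet ends the want list; "done" tells the server to send the pack.
  return [...wants, '0000', encodePktLine('done\n')].join('');
}
```

Passing the tree hash of `docs/tool` and the blob hash of `readme.txt` produces a body equivalent to the one above, with the length prefixes computed rather than typed in.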

The response is another index, but this time each blob comes with binary
contents. Some decoding and recursive processing later, we finally get
this:

```ts
{
    "readme.txt": "=== Gutenberg ===\nContri (...)",
    "docs/tool": {
        "index.js": "/**\n * External depe (...)",
        "manifest.js": "/* eslint no-console (...)"
    }
}
```

Yay! It took some effort, but it was worth it!

### CORS proxy and other notes

You’ll still need to run a CORS proxy. The fetch() examples above will
work if you try them in devtools on github.com, but you won’t be able to
just use them on your own site. Git servers typically do not send the
Access-Control-* headers the browser requires to allow these requests.

So we need a server after all. Was this a failure, then? No! A CORS
proxy is cheaper, simpler, and safer to maintain than a Git service.
Also, this approach fetches all the files in three fetch() requests,
whereas the GitHub REST API needs two requests per file.

#### Try it yourself

I’ve shared a functional demo that includes a CORS proxy in this
repository on GitHub:
https://github.com/adamziel/git-sparse-checkout-in-js


## Testing instructions

* Start two terminals
* Run `nx dev playground-components` in the first one
* Run `nx start playground-php-cors-proxy` in the second one to start
the PHP CORS proxy
* Go to http://localhost:5173/ and play with the UI
* Play with an early demo of the git repository browser shipped in this PR:



https://github.com/user-attachments/assets/731b2a89-8004-4d0b-8c6f-8646d4840a29
adamziel added a commit that referenced this issue Oct 7, 2024
Related to #1787, Follows up on #1793

Implements GitDirectoryResource to enable loading files directly from
git repositories as follows:

```ts
{
	"landingPage": "/guides/for-plugin-developers.md",
	"steps": [
		{
			"step": "writeFiles",
			"writeToPath": "/wordpress/guides",
			"filesTree": {
				"resource": "git:directory",
				"url": "https://github.com/WordPress/wordpress-playground.git",
				"ref": "trunk",
				"path": "packages/docs/site/docs/main/guides"
			}
		}
	]
}
```

 ## Implementation details

Uses git client functions merged in
#1764 to sparse
checkout the requested files. It also leans on the PHP CORS proxy which
is now started as a part of the `npm run dev` command.

The CORS proxy URL is configurable per `compileBlueprint()` call so that each
Playground runtime may choose to either use it or not. For example, it
wouldn't be very useful in the CLI version of Playground.

 ## Testing plan

Go to
`http://localhost:5400/website-server/#{%20%22landingPage%22:%20%22/guides/for-plugin-developers.md%22,%20%22steps%22:%20[%20{%20%22step%22:%20%22writeFiles%22,%20%22writeToPath%22:%20%22/wordpress/guides%22,%20%22filesTree%22:%20{%20%22resource%22:%20%22git:directory%22,%20%22url%22:%20%22https://github.com/WordPress/wordpress-playground.git%22,%20%22ref%22:%20%22trunk%22,%20%22path%22:%20%22packages/docs/site/docs/main/guides%22%20}%20}%20]%20}`
and confirm Playground loads a markdown file.
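
The testing URL is the Blueprint JSON placed in the URL fragment. Below is a small sketch of building such a URL from a typed object. The interfaces are hypothetical and model only the fields used in the example, and `encodeURI()` is one possible encoding: it escapes braces and brackets that the hand-written URL leaves as-is, so the result is not byte-identical even though it decodes to the same JSON.

```ts
// Hypothetical types and helper; only the fields from the example are modelled.

interface GitDirectoryReference {
  resource: 'git:directory';
  url: string;
  ref: string;
  path: string;
}

interface WriteFilesStep {
  step: 'writeFiles';
  writeToPath: string;
  filesTree: GitDirectoryReference;
}

interface ExampleBlueprint {
  landingPage?: string;
  steps: WriteFilesStep[];
}

/** Appends the JSON-encoded Blueprint to the base URL as its fragment. */
function blueprintUrl(baseUrl: string, blueprint: ExampleBlueprint): string {
  return `${baseUrl}#${encodeURI(JSON.stringify(blueprint))}`;
}

const guidesBlueprint: ExampleBlueprint = {
  landingPage: '/guides/for-plugin-developers.md',
  steps: [
    {
      step: 'writeFiles',
      writeToPath: '/wordpress/guides',
      filesTree: {
        resource: 'git:directory',
        url: 'https://github.com/WordPress/wordpress-playground.git',
        ref: 'trunk',
        path: 'packages/docs/site/docs/main/guides',
      },
    },
  ],
};

const testingUrl = blueprintUrl('http://localhost:5400/website-server/', guidesBlueprint);
```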
adamziel added a commit that referenced this issue Oct 8, 2024