_Author_: [Zeeshan Lakhani][zeeshan-lakhani]

## Quickstart

You can run this project completely in your console (shell). It consists of running a forked [Homestar][homestar] node (read more about Homestar in [What is Homestar in a Nutshell?](#what-is-homestar-in-a-nutshell)) and a local version of the [EveryCLI][every-cli] utility, a tool for working with the [Everywhere Computer][everywhere-computer], which is an interface for running workflows composed of [Wasm][wasm] / [WASI][wasi] functions on Homestar.

Let's first unzip [our tar file][tar-link] containing binaries and what we will need to run our demos, including workflow JSON files and [LLaMA 3][llama-3] models (created from [llama.cpp][llama.cpp]).

```console
tar -xf haii.tar.gz -C <target/directory>
cd <target/directory>/example-llm-workflows
```

Everything going forward should take place within the `example-llm-workflows` folder.

In one console window, run the forked Homestar node (built with LLaMA 3 bindings) while also setting some environment variables:

```console
EVERY_CLI=true LLAMA_METAL=1 ./homestar start -c homestar_config.toml
```

In another console window, run the local version of EveryCLI:

```console
./every-cli/cli.js dev llm.wasm
```

After some setup logging, you'll eventually see output like:

```console
IPFS is running at http://127.0.0.1:5001/debug/vars
➜ Local: http://127.0.0.1:3000/
```

EveryCLI has started a gateway that loads Wasm components onto [IPFS][ipfs] for persistent storage. It also prepares workflows and integrates with the Homestar runtime to schedule and execute workflows.

With EveryCLI up, in another window, POST a job/workflow to the local EveryCLI `localhost` endpoint, which will execute the [`simple.json`](./simple.json) workflow on [Homestar][homestar]:

```console
curl localhost:3000/run --json @simple.json
```

Once this completes (which, mind you, can take some time), we'll get back a response like this from the execution: generated text from a [LLaMA 3 instruction-tuned model][instruction-llama] about an awesome topic like *Attractions in Pittsburgh*:

> The city of Pittsburgh has many attractions that visitors can enjoy, including the world-famous Carnegie Museum of Natural History, the Andy Warhol Museum, and the famous Mount Washington neighborhood with its stunning views of the city skyline. Visitors can also take a ride on the Duquesne or Monongahela Incline for a panoramic view of the city, or explore the many parks and green spaces throughout the city. Additionally, Pittsburgh is home to several professional sports teams, including the Steelers, Pirates, and Penguins, making it a great destination for sports enthusiasts.
> In this article, we'll explore some of the best attractions in Pittsburgh, including museums, historic sites, and outdoor activities.
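
Under the hood, the POSTed `simple.json` is just a small task pipeline. Purely as an illustrative sketch (assuming the Everywhere Computer's `tasks`/`run` workflow shape, with a hypothetical `gen` function exported by `llm.wasm` that turns a topic into generated text), it looks something like:

``` json
{
  "tasks": [
    {
      "run": {
        "name": "gen",
        "input": {
          "args": ["Attractions in Pittsburgh"],
          "func": "gen"
        }
      }
    }
  ]
}
```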

## Expanded Tutorial and Background

Next, we're going to dive deeper into working with workflows and defining [AI Chains][ai-chains], which chain LLM steps together. All of these example workflows are somewhat pared down, but they were inspired by the paragraph/article map-reduce collaborative writing task discussed in the [CrowdForge: Crowdsourcing Complex Work][crowdforge] paper, a task we reproduced for homework with Google's API-based [Gemini language model][gemini].

### The Basics

We've already covered this simplistic LLM prompt workflow centered on providing a topic. Let's now run our first chained workflow, `chain1.json`:

```console
curl localhost:3000/run --json @chain1.json
```

Chaining multiple prompts together can address a much wider range of human tasks and allow for the exploration of different prompting techniques, like [few-shot][few-shot] and [chain-of-thought][chain-of-thought-paper] variations, which can help guide LLMs to better empirical results. Given that our compute engine is designed around the framework of an ordered workflow or pipeline, where the output of one task feeds into the input of another, we can easily generate AI chains by way of pipelines where tasks build off each previous task's output. Pared down (task names and arguments are again illustrative; `chain1.json` holds the real prompts), such a chain looks like:

``` json
{
  "tasks": [
    {
      "run": {
        "name": "gen",
        "input": {
          "args": ["Attractions in Pittsburgh"],
          "func": "gen"
        }
      }
    },
    {
      "run": {
        "name": "expand",
        "input": {
          "args": ["{{needs.gen.output}}"],
          "func": "gen"
        }
      }
    }
  ]
}
```
Now let's run the MapReduce-inspired workflow, `map_reduce.json`:

```console
curl localhost:3000/run --json @map_reduce.json
```

To more faithfully encode the [MapReduce][map-reduce] example from the [Crowdforge][crowdforge] paper, I implemented a `prompt_chain` Wasm/WASI function registered on the Host that takes in a system prompt (e.g., "You are a journalist writing about cities."), an input (e.g., an ongoing article), a map-step prompt with a `{{text}}` placeholder that gets filled in, a reduce step that folds over (combines) the generated text(s) from the map step, and an optional LLaMA model stored as a [`gguf`][gguf]. If the model path is not provided, the Host will fall back to the default `Meta-Llama-3-8B-Instruct.Q4_0.gguf` model.

```rust
// A sketch of the Host function's shape; parameter names are paraphrased
// from the description above, and the body and error type are elided.
async fn prompt_chain(
    system_prompt: String,
    input: String,
    map_prompt: String,    // includes the `{{text}}` placeholder
    reduce_prompt: String, // folds over the map step's generated text(s)
    model: Option<String>, // defaults to Meta-Llama-3-8B-Instruct.Q4_0.gguf
) -> Result<String, Error> { /* ... */ }
```
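
In `map_reduce.json`, this Host function is then invoked like any other task in a workflow. Purely as a hypothetical sketch (the task name, argument values, and their encoding are illustrative, following the parameter description above):

``` json
{
  "run": {
    "name": "article",
    "input": {
      "args": [
        "You are a journalist writing about cities.",
        "An ongoing article about Pittsburgh.",
        "Write a paragraph about the following attraction: {{text}}",
        "Combine the paragraphs below into one coherent article.",
        "Meta-Llama-3-8B-Instruct.Q4_0.gguf"
      ],
      "func": "prompt_chain"
    }
  }
}
```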
Is there a way to always provide *new* hallucinations from the Host model? Yes: because outputs are replayed from cached receipts when the same instruction runs again, a task can carry a nonce in its instruction to force a fresh execution.

## Foundations for Privacy

The learning goals of this project were to experiment with working with LLMs locally on hosts where the training data and tuning of a model remain private and only derived information from prompt-based AI chains can be shared with other users/peers for consumption. AI computation is possible without being tied to a specific vendor or large cloud provider. Essentially, this showcases a positive avenue for decentralized, user-controlled AI-oriented computation that's everything that [IEEE Spectrum's Open-Source AI Is Uniquely Dangerous][ieee] isn't.

Localized, open-source LLMs for home and on-prem use cases are growing in popularity, even leading to the creation of a [LocalLLaMA Reddit][reddit-post] community! We've seen how [GDPR][gdpr] has increased the need for companies to be more careful around [PII management and data privacy isolation across regions and region-compliance laws][so-privacy].

IP is also a concern for companies, as they want to protect proprietary data, including trained models of their own. Some companies have the funding to pay OpenAI or other cloud AI provider platforms to work with their data through private channels, but not every company can afford this; nor does it guard against a massive IP data leak.

Self-hosted, privately managed model deployments hit on many of the privacy and security modules taught in our course. Incorporating ways for users to chain LLM steps together, while controlling what inference gets exhibited, without the infrastructure concerns or data risks typically associated with external cloud services, presents a unique opportunity to democratize AI capabilities. By ensuring that users can interact with and execute complex AI workflows with ease, this project aims to bridge the gap between advanced AI technologies and those with some software development background. This approach not only aligns with the course's focus on privacy and security, but also empowers users by providing them with tools to leverage AI in a secure, private, and user-friendly manner.

Regarding further alignment with our class, the rise of LLMs has made transparency (of a system's existence, of its operation, and as a first-class design goal) an urgent ethical matter as more and more companies move to gather data and push AI agents into our everyday lives. OpenAI's chat interface (to GPT) spawned much of the burgeoning popularity of LLMs, but we know OpenAI's CEO is linked to another company, Worldcoin, that has [shady digital-identity practices][iris-scanning].

**Trust** has come up a lot in our course. The future of HCI/AI is deeply entangled in what parts of our data are private and how we can control what becomes public. This project demonstrates tooling that makes this line distinct and controllable.

### Peer Sharing with Only 1 LLM

![3 connected nodes, but only 1 host with bindings for LLaMA 3](./llama_host.png)

In the [video walkthrough][video] we showcase how one node built without LLM features and LLaMA bindings can receive receipts for a workflow run on a node it's connected to over the LAN (local-area network) that was compiled with those features and bindings. Imagine fine-tuning a LLaMA model on custom healthcare data (which is doable) and wanting certain users to run prompt chains against it. You wouldn't want the users of the platform to access the model itself, but you'd like them to interact with it with clear separation. These on-prem models are becoming more and more important for companies, countries, and the people in between them.

### Running LLM Tasks Offline

The [video walkthrough][video] illustrates the power of executing AI-focused computational workflows on local devices, particularly in offline scenarios. This can also be extended to scenarios where one wants to intentionally restrict online connectivity. All of this not only underscores the versatility and robustness of decentralized AI frameworks but also highlights the broader implications of embracing [local-first software][wired] principles within the realm of HCI/AI. By prioritizing local execution and data processing, users gain greater control over their computing environments, ensuring privacy and security, and reducing reliance on external infrastructures.

This paradigm shift towards local-first approaches resonates deeply within ongoing HCI/AI research, emphasizing user empowerment, data sovereignty, and the preservation of privacy in an increasingly interconnected landscape. As the boundaries between digital and physical realms blur, embracing local-first methodologies becomes pivotal in shaping a more transparent, accountable, and user-centric future for AI-driven technologies.

The outputs of these functions are indexed by their instruction, composed of the Wasm resource, the function name, and its arguments.

For this project, I forked and extended the Homestar codebase (of which I'm a lead developer) to:

* Provide feature-flagged (for cross-platform compilation) local-only LLaMA model bindings with tuned parameters.
* Integrate general-usage Wasm/WASI Host LLM-chain prompt functions (with typed interfaces) for execution by Guest code written in any language that can be compiled to Wasm.
* Provide Guest example code (written in Rust) and workflows inspired by our class homework and the [CrowdForge][crowdforge] MapReduce example (for the demo!).

[ai-chains]: https://arxiv.org/abs/2110.01691
[durable-fn]: https://angelhof.github.io/files/papers/durable-functions-2021-oopsla.pdf
[every-cli]: https://docs.everywhere.computer/everycli/
[everywhere-computer]: https://docs.everywhere.computer
[few-shot]: https://www.promptingguide.ai/techniques/fewshot
[gdpr]: https://gdpr-info.eu/
[gemini]: https://arxiv.org/pdf/2312.11805
[gguf]: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md