chore: merge main branch
zhoushaw committed Dec 30, 2024
2 parents dfb15aa + 8042bcc commit a028b3c
Showing 100 changed files with 3,587 additions and 9,516 deletions.
@@ -0,0 +1,28 @@
---
name: LLM Connectivity Issue / 模型连接错误
about: How to solve the LLM connectivity problem
title: "[Connectivity]"
labels: ''
assignees: ''

---

## Read this before opening an issue

How to choose and configure a model: https://midscenejs.com/model-provider.html

Use this project to check the connection: https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test

## If the error persists, tell us the following information

- Where you are using Midscene.js (Chrome extension, YAML with CLI, Puppeteer, …)

- The version of Midscene.js or the extension

- The error message

- The model name and endpoint (if they can be made public)

## Security Check

Do NOT include your API key in your issue! Revoke it immediately if it has already been leaked in your issue.
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -22,7 +22,7 @@ jobs:
with:
ref: ${{ github.event.inputs.branch }}
- name: Pushing to the protected branch 'protected'
uses: CasperWA/push-protected@v2
uses: zhoushaw/push-protected@v2
with:
token: ${{ secrets.PUSH_TO_PROTECTED_BRANCH }}
branch: ${{ github.event.inputs.branch }}
1 change: 1 addition & 0 deletions .gitignore
@@ -52,6 +52,7 @@ jspm_packages/

# dotenv environment variables file
.env
.env.*

# next.js build output
.next
13 changes: 12 additions & 1 deletion .vscode/settings.json
@@ -4,5 +4,16 @@
},
"editor.defaultFormatter": "biomejs.biome",
"editor.formatOnSave": true,
"cSpell.words": ["AITEST", "aweme", "douyin", "httpbin", "iconfont", "taobao"]
"cSpell.words": [
"AITEST",
"Aliyun",
"aweme",
"doubao",
"douyin",
"httpbin",
"iconfont",
"qwen",
"taobao",
"Volcengine"
]
}
1 change: 1 addition & 0 deletions README.ja.md
@@ -42,6 +42,7 @@ Midscene.jsは、自然言語を使用してページを制御し、アサーシ
* [YAML形式の自動化スクリプトを使用する](https://midscenejs.com/automate-with-scripts-in-yaml.html)
* [Puppeteerとの統合](https://midscenejs.com/integrate-with-puppeteer.html)
* [Playwrightとの統合](https://midscenejs.com/integrate-with-playwright.html)
* [モデルとサービスプロバイダーのカスタマイズ](https://midscenejs.com/model-provider.html)

## ライセンス

8 changes: 8 additions & 0 deletions README.md
@@ -46,6 +46,14 @@ Midscene.js is an AI-powered automation SDK can control the page, perform assert
* [Automate with Scripts in YAML](https://midscenejs.com/automate-with-scripts-in-yaml.html)
* [Integrate with Puppeteer](https://midscenejs.com/integrate-with-puppeteer.html)
* [Integrate with Playwright](https://midscenejs.com/integrate-with-playwright.html)
* [Customize Model and Provider](https://midscenejs.com/model-provider.html)

## Community

* [Lark Group](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=291q2b25-e913-411a-8c51-191e59aab14d)


<img src="https://github.com/user-attachments/assets/7c132fbf-37a7-4005-8fb1-59342efdf9b2" alt="lark group link" width="300" />

## License

9 changes: 8 additions & 1 deletion README.zh.md
@@ -31,7 +31,7 @@ Midscene.js 是一个由 AI 驱动的自动化 SDK,能够使用自然语言对
- **自然语言互动 👆**:只需描述你的步骤,Midscene 会为你规划和操作用户界面
- **理解UI、JSON格式回答 🔍**:你可以提出关于数据格式的要求,然后得到 JSON 格式的预期回应。
- **直观断言 🤔**:用自然语言表达你的断言,AI 会理解并处理。
- **开箱即用的LLM 🪓**使用公开的多模态大语言模型( 如GPT-4o ),无需任何定制训练。
- **开箱即用的LLM 🪓**支持使用公开的多模态大语言模型( 如 GPT-4o ),无需任何定制训练。
- **可视化报告 🎞️**:通过我们的测试报告和 Playground,你可以轻松理解和调试整个过程。
- **全新体验 🔥**:体验全新的自动化开发世界,尽情享受吧!

@@ -43,6 +43,13 @@ Midscene.js 是一个由 AI 驱动的自动化 SDK,能够使用自然语言对
* [使用 YAML 格式的自动化脚本](https://midscenejs.com/zh/automate-with-scripts-in-yaml.html)
* [集成到 Puppeteer](https://midscenejs.com/zh/integrate-with-puppeteer.html)
* [集成到 Playwright](https://midscenejs.com/zh/integrate-with-playwright.html)
* [自定义模型和服务商](https://midscenejs.com/zh/model-provider.html)

## 社区

* [飞书群](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=291q2b25-e913-411a-8c51-191e59aab14d)

<img src="https://github.com/user-attachments/assets/7c132fbf-37a7-4005-8fb1-59342efdf9b2" alt="lark group link" width="300" />


## 授权许可
2 changes: 1 addition & 1 deletion apps/site/docs/en/automate-with-scripts-in-yaml.mdx
@@ -40,7 +40,7 @@ or you can use a `.env` file to store the configuration
OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

or you may [customize model provider](./model-provider.html)
or you may [customize model and provider](./model-provider.html)

## Start

10 changes: 3 additions & 7 deletions apps/site/docs/en/faq.md
@@ -2,9 +2,7 @@

## Can Midscene smartly plan the actions according to my one-line goal? Like executing "Tweet 'hello world'"

Midscene is an automation assistance SDK with a key feature of action stability — ensuring the same actions are performed in each run. To maintain this stability, we encourage you to provide detailed instructions to help the AI understand each step of your task.

If you require a 'goal-to-task' AI planning tool, you can develop one based on Midscene.
No. Midscene is an automation assistance SDK with a key feature of action stability — ensuring the same actions are performed in each run. To maintain this stability, we encourage you to provide detailed instructions to help the AI understand each step of your task.

Related Docs: [Prompting Tips](./prompting-tips.html)

@@ -16,11 +14,9 @@ There are some limitations with Midscene. We are still working on them.
2. LLM is not 100% stable. Even GPT-4o can't return the right answer all the time. Following the [Prompting Tips](./prompting-tips) will help improve stability.
3. Since we use JavaScript to retrieve items from the page, the elements inside the iframe cannot be accessed.

## Which LLM should I choose ?

Midscene needs a multimodal Large Language Model (LLM) to understand the UI. Currently, we find that OpenAI's GPT-4o performs much better than others.
## Can I use a model other than `gpt-4o`?

You can [customize model provider](./model-provider.html) if needed.
Yes. You can [customize model and provider](./model-provider.html) if needed.

## About the token cost

10 changes: 8 additions & 2 deletions apps/site/docs/en/index.mdx
@@ -6,8 +6,6 @@ Introducing Midscene.js, an innovative SDK designed to bring joy back to automat

Midscene.js leverages a multimodal Large Language Model (LLM) to intuitively “understand” your user interface and carry out the necessary actions. You can simply describe the interaction steps or expected data formats, and the AI will handle the execution for you.

Currently, the model we are using by default is the OpenAI GPT-4o model, while you can customize it to a different model if needed.

<div style={{"width": "100%", "display": "flex", justifyContent: "center"}}>
<iframe
style={{"maxWidth": "100%", "width": "800px", "height": "450px"}}
@@ -67,3 +65,11 @@ Midscene will provide a visual report after each run. With this report, you can
## Just you and model provider, no third-party services

⁠Midscene.js is an open-source project (GitHub: [Midscene](https://github.com/web-infra-dev/midscene/)) under the MIT license. You can run it in your own environment. All data gathered from pages will be sent directly to OpenAI or the custom model provider according to your configuration. Therefore, only you and the model provider will have access to the data. No third-party platform will access the data.

## Customize Model

By default, Midscene uses the OpenAI GPT-4o model; you can [customize it to a different multimodal model](./model-provider.html) if needed.

## Start with Chrome Extension

To quickly experience the main features of Midscene, you can use the [Chrome Extension](./quick-experience.html). It allows you to use Midscene on any webpage without writing any code.
2 changes: 1 addition & 1 deletion apps/site/docs/en/integrate-with-playwright.mdx
@@ -12,7 +12,7 @@ you can check the demo project of Playwright here: [https://github.com/web-infra

## Preparation

Config the OpenAI API key, or [customize model provider](./model-provider.html)
Configure the OpenAI API key, or [customize model and provider](./model-provider.html)

```bash
# replace with your own
4 changes: 3 additions & 1 deletion apps/site/docs/en/integrate-with-puppeteer.mdx
@@ -7,11 +7,13 @@ import { PackageManagerTabs } from '@theme';

:::info Demo Project
you can check the demo project of Puppeteer here: [https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo)

There is also a demo of Puppeteer with Vitest: [https://github.com/web-infra-dev/midscene-example/tree/main/puppeteer-with-vitest-demo](https://github.com/web-infra-dev/midscene-example/tree/main/puppeteer-with-vitest-demo)
:::

## Preparation

Config the OpenAI API key, or [customize model provider](./model-provider.html)
Configure the OpenAI API key, or [customize model and provider](./model-provider.html)

```bash
# replace with your own
80 changes: 76 additions & 4 deletions apps/site/docs/en/model-provider.md
@@ -1,8 +1,8 @@
# Customize Model Provider
# Customize Model and Provider

Midscene uses the OpenAI SDK as the default AI service. You can customize the configuration using environment variables.
Midscene uses the OpenAI SDK to call AI services. You can customize the configuration using environment variables. All the configs can also be used in the [Chrome Extension](./quick-experience.html).

There are the main configs, in which `OPENAI_API_KEY` is required.
These are the main configs, in which `OPENAI_API_KEY` is required.

Required:

@@ -21,11 +21,83 @@ export OPENAI_BASE_URL="https://..."
export OPENAI_USE_AZURE="true"

# if you want to specify a model name other than gpt-4o
export MIDSCENE_MODEL_NAME='claude-3-opus-20240229';
export MIDSCENE_MODEL_NAME='qwen-vl-max-latest';

# if you want to pass customized JSON data to the `init` process of OpenAI SDK
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"....","defaultHeaders":{"key": "value"}}'

# if you want to use proxy. Midscene uses `socks-proxy-agent` under the hood.
export MIDSCENE_OPENAI_SOCKS_PROXY="socks5://127.0.0.1:1080"

# if you want to specify the max tokens for the model
export OPENAI_MAX_TOKENS=2048
```

## Using Azure OpenAI Service

```bash
export MIDSCENE_USE_AZURE_OPENAI=1
export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
export MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON='{"apiVersion": "2024-11-01-preview", "endpoint": "...", "deployment": "..."}'
```

## Choose a model other than `gpt-4o`

We find that `gpt-4o` performs the best for Midscene at this moment. The other known supported models are `claude-3-opus-20240229`, `gemini-1.5-pro`, `qwen-vl-max-latest`, and `doubao-vision-pro-32k`.

If you want to use other models, please follow these steps:

1. Choose a model that supports image input (a.k.a. multimodal model).
2. Find out how to call it with an OpenAI SDK compatible endpoint. Usually you should set `OPENAI_BASE_URL`, `OPENAI_API_KEY`, and `MIDSCENE_MODEL_NAME`.
3. If it does not work well after changing the model, try using short and clear prompts (or roll back to the previous model). See more details in [Prompting Tips](./prompting-tips.html).
4. Remember to follow the terms of use of each model.
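
Step 2 above can be sketched in TypeScript as follows. This is only an illustration and not Midscene's actual implementation: the `resolveModelConfig` helper and the `ModelConfig` type are hypothetical names introduced here. Only the three environment variable names and the `gpt-4o` default come from this document.

```typescript
// Hypothetical shape for the settings named in step 2 (not a Midscene type).
interface ModelConfig {
  apiKey: string;
  baseURL?: string; // undefined means "use the SDK's default endpoint"
  model: string;
}

// Hypothetical helper: reads the three variables from an environment-like object.
function resolveModelConfig(env: Record<string, string | undefined>): ModelConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    // This document lists OPENAI_API_KEY as the required setting.
    throw new Error("OPENAI_API_KEY is required");
  }
  return {
    apiKey,
    baseURL: env.OPENAI_BASE_URL || undefined,
    // gpt-4o is the default model name mentioned in this document.
    model: env.MIDSCENE_MODEL_NAME || "gpt-4o",
  };
}
```

In a Node.js script this could be called as `resolveModelConfig(process.env)`, and the result passed to whichever OpenAI-compatible client you use.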

## Example: Using `claude-3-opus-20240229` from Anthropic

When `MIDSCENE_USE_ANTHROPIC_SDK=1` is set, Midscene uses the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.

Configure the environment variables:

```bash
export MIDSCENE_USE_ANTHROPIC_SDK=1
export ANTHROPIC_API_KEY="....."
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```

## Example: Using `gemini-1.5-pro` from Google

Configure the environment variables:

```bash
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
export OPENAI_API_KEY="....."
export MIDSCENE_MODEL_NAME="gemini-1.5-pro"
```

## Example: Using `qwen-vl-max-latest` from Aliyun

Configure the environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
```

## Example: Using `doubao-vision-pro-32k` from Volcengine

Create an inference endpoint first: https://console.volcengine.com/ark/region:ark+cn-beijing/endpoint

Configure the environment variables:

```bash
export OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="ep-202....."
```

## Troubleshooting LLM Service Connectivity Issues

If you want to troubleshoot connectivity issues, you can use the 'connectivity-test' folder in our example project: [https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test](https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test)

Put your `.env` file in the `connectivity-test` folder, and run the test with `npm i && npm run test`.
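
For a quick manual check outside that project, the sketch below builds a minimal request for an OpenAI-compatible endpoint. It is independent of the `connectivity-test` project and assumes the provider follows OpenAI's `/chat/completions` REST convention with Bearer authentication, which may not hold for every provider (Azure, for example, is configured differently above). `buildProbeRequest` and `ProbeRequest` are hypothetical names.

```typescript
// Hypothetical request shape, compatible with the arguments of fetch(url, init).
interface ProbeRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds a tiny chat-completions request. Assumes an OpenAI-compatible REST
// endpoint with Bearer auth; this is an assumption, not a Midscene API.
function buildProbeRequest(baseURL: string, apiKey: string, model: string): ProbeRequest {
  // Strip trailing slashes so the path is joined with exactly one "/".
  const root = baseURL.replace(/\/+$/, "");
  return {
    url: `${root}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: "ping" }],
        max_tokens: 1,
      }),
    },
  };
}

// Usage (Node.js 18+ or a browser, where fetch is available globally):
//   const req = buildProbeRequest(baseURL, apiKey, modelName);
//   const res = await fetch(req.url, req.init);
//   console.log(res.status);
```

A network-level failure (DNS, proxy, TLS) typically surfaces as a rejected `fetch` promise, while an HTTP error status usually means the endpoint was reached but rejected the key or model name.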
12 changes: 9 additions & 3 deletions apps/site/docs/en/quick-experience.mdx
@@ -1,4 +1,4 @@
# Experience By Chrome Extension
# Quick Experience by Chrome Extension

Midscene.js provides a Chrome extension. By using it, you can quickly experience the main features of Midscene on any webpage, without needing to set up a code project.

@@ -20,7 +20,7 @@ Start the extension (may be folded by default), setup the config by pasting the
OPENAI_API_KEY="sk-replace-by-your-own"
```

You can also paste the configuration as described in [customize model provider](./model-provider.html) here.
You can also paste the configuration as described in [customize model and provider](./model-provider.html) here.

## Start experiencing

@@ -43,4 +43,10 @@ After experiencing, you may want to write some code to integrate Midscene. There

* Extension fails to run and shows 'Cannot access a chrome-extension:// URL of different extension'

Make sure you are using the Midscene extension on a normal http(s):// page. If the error persists, it's mainly due to conflicts with other extensions injecting `<iframes />` into the page. Try disabling the suspicious plugins and refresh.
It's mainly due to conflicts with other extensions injecting `<iframe />` or `<script />` into the page. Try disabling the suspicious plugins and refresh.

To find the suspicious plugins:

1. Open the DevTools of the page and find the `<script>` or `<iframe>` with a URL like `chrome-extension://{ID-of-the-suspicious-plugin}/...`.
2. Copy the ID from the URL, open chrome://extensions/, find the plugin with the same ID, and disable it.
3. Refresh the page and try again.
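
The ID lookup in step 1 can also be scripted. The sketch below is a hypothetical helper, not part of Midscene: given the `src` values of the page's `<script>` and `<iframe>` elements, it returns the unique extension IDs found in `chrome-extension://` URLs.

```typescript
// Hypothetical helper: extracts unique extension IDs from a list of URLs.
function extractExtensionIds(urls: string[]): string[] {
  const ids = new Set<string>();
  for (const url of urls) {
    // The ID is the host part that follows the chrome-extension:// scheme.
    const match = /^chrome-extension:\/\/([^/]+)/.exec(url);
    if (match && match[1]) {
      ids.add(match[1]);
    }
  }
  return Array.from(ids);
}
```

In the DevTools console, the input list could be collected with `Array.from(document.querySelectorAll("script[src], iframe[src]")).map((el) => el.getAttribute("src") ?? "")`. Each returned ID can then be matched against the entries on chrome://extensions/.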
2 changes: 1 addition & 1 deletion apps/site/docs/zh/automate-with-scripts-in-yaml.mdx
@@ -40,7 +40,7 @@ export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

[自定义模型服务](./model-provider.html)
[自定义模型和服务商](./model-provider.html)

## 开始

6 changes: 2 additions & 4 deletions apps/site/docs/zh/faq.md
@@ -16,11 +16,9 @@ Midscene 存在一些局限性,我们仍在努力改进。
2. 稳定性风险:即使是 GPT-4o 也无法确保 100% 返回正确答案。遵循 [编写提示词的技巧](./prompting-tips) 可以帮助提高 SDK 稳定性。
3. 元素访问受限:由于我们使用 JavaScript 从页面提取元素,所以无法访问 iframe 内部的元素。

## 选用那个 LLM 模型
## 能否选用 `gpt-4o` 以外的其他模型

Midscene 需要一个能够理解用户界面的多模态大型语言模型。目前,我们发现 OpenAI 的 GPT-4o 表现最好,远超其它模型。

你可以根据需要[自定义模型服务](./model-provider.html)
可以。你可以[自定义模型和服务商](./model-provider.html)

## 关于 token 成本

16 changes: 11 additions & 5 deletions apps/site/docs/zh/index.mdx
@@ -1,13 +1,11 @@
# Midscene.js - AI 加持,带来愉悦的 UI 自动化体验

UI 自动化太难维护了。UI 自动化脚本里往往到处都是选择器,比如 `#ids``data-test``.selectors`。在需要重构的时候,这可能会让人感到非常头疼,尽管在这种情况下,理论上UI自动化应该能够发挥作用
UI 自动化太难维护了。UI 自动化脚本里往往到处都是选择器,比如 `#ids``data-test``.selectors`。在需要重构的时候,这可能会让人感到非常头疼,尽管在这种情况下,UI 自动化应该能够发挥作用

我们在这里推出 Midscene.js,助你重拾编码的乐趣。

Midscene.js 采用了多模态大语言模型(LLM),能够直观地“理解”你的用户界面并执行必要的操作。你只需描述交互步骤或期望的数据格式,AI 就能为你完成任务。

目前我们默认选择的是 OpenAI GPT-4o 作为模型,你也可以自定义为其他模型。

<video src="/introduction/Midscene.mp4" controls/>

## 通过 AI 执行交互、提取数据和断言
@@ -22,7 +20,7 @@ Midscene.js 采用了多模态大语言模型(LLM),能够直观地“理

```typescript
// 👀 输入关键字,执行搜索
// 注:尽管这是一个英文页面,你也可以用中文指令控制它
// 尽管这是一个英文页面,你也可以用中文指令控制它
await ai('在搜索框输入 "Headphones" ,敲回车');

// 👀 找到列表里耳机相关的信息
@@ -55,4 +53,12 @@ console.log("headphones in stock", items);

## 直连模型端,无需三方服务

Midscene.js 是一个采用 MIT 许可证的开源项目 (GitHub: [Midscene](https://github.com/web-infra-dev/midscene/)) 。项目代码运行在用户的自有环境中,所有从页面收集的数据会依照用户的配置,直接传送到 OpenAI 或指定的自定义模型。因此,数据仅用户和指定的模型服务商可访问,任何第三方平台均无法获取这些数据。
Midscene.js 是一个采用 MIT 许可证的开源项目 (GitHub: [Midscene](https://github.com/web-infra-dev/midscene/)) 。项目代码运行在用户的自有环境中,所有从页面收集的数据会依照用户的配置,直接传送到 OpenAI 或指定的自定义模型。因此,数据仅用户和指定的模型服务商可访问,任何第三方平台均无法获取这些数据。

## 自定义模型

目前我们默认选择的是 OpenAI GPT-4o 作为模型,你也可以[自定义为其他多模态模型](./model-provider.html)

## 从 Chrome插件开始快速体验

通过使用 Midscene.js Chrome 插件,你可以快速在任意网页上体验 Midscene 的主要功能,而无需编写任何代码。请参照文档 [通过 Chrome 插件快速体验](./quick-experience.html) 进行安装和配置。
2 changes: 1 addition & 1 deletion apps/site/docs/zh/integrate-with-playwright.mdx
@@ -13,7 +13,7 @@ import { PackageManagerTabs } from '@theme';

## 准备工作

配置 OpenAI API Key,或 [自定义模型服务](./model-provider.html)
配置 OpenAI API Key,或 [自定义模型和服务商](./model-provider.html)

```bash
# 更新为你自己的 Key
4 changes: 3 additions & 1 deletion apps/site/docs/zh/integrate-with-puppeteer.mdx
@@ -7,11 +7,13 @@ import { PackageManagerTabs } from '@theme';

:::info 样例项目
你可以在这里看到向 Puppeteer 集成的样例项目:[https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo)

这里还有一个 Puppeteer 和 Vitest 结合的样例项目:[https://github.com/web-infra-dev/midscene-example/tree/main/puppeteer-with-vitest-demo](https://github.com/web-infra-dev/midscene-example/tree/main/puppeteer-with-vitest-demo)
:::

## 准备工作

配置 OpenAI API Key,或 [自定义模型服务](./model-provider.html)
配置 OpenAI API Key,或 [自定义模型和服务商](./model-provider.html)

```bash
# 更新为你自己的 Key