feat: implement cli wrapper #43

Merged: 7 commits, Aug 8, 2024

5 changes: 4 additions & 1 deletion .gitignore
@@ -101,4 +101,7 @@ __ai_responses__/


.nx/cache
.nx/workspace-data
# Midscene.js dump files
midscene_run/report
midscene_run/dump
32 changes: 28 additions & 4 deletions apps/site/docs/en/docs/getting-started/quick-start.mdx
@@ -2,20 +2,45 @@

import { PackageManagerTabs } from '@theme';


In this example, we use OpenAI GPT-4o to search for headphones on eBay and then get the result items and prices in JSON format.

Before running, make sure you have an API key with access to OpenAI's GPT-4o.

## Preparation

Configure the API key
Configure the OpenAI API key, or [customize the model vendor](../usage/model-vendor.html)

```bash
# replace with your own key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

## Experience with Command Line Tools

The command-line version of Midscene is a convenient way to try out its basic features.

Ensure that you have [Node.js](https://nodejs.org/) installed.

```bash
# headless mode to visit bing.com and search for 'weather today'
npx @midscene/cli --url https://www.bing.com --action "type 'weather today', hit enter"

# headed mode (i.e. visible browser) to visit bing.com and search for 'weather today'
npx @midscene/cli --headed --url https://www.bing.com --action "type 'weather today', hit enter"

# visit github status page and save the status to ./status.json
npx @midscene/cli \
  --url https://www.githubstatus.com/ \
  --query-output status.json \
  --query '{name: string, status: string}[], service status of github page'
```

To dive deeper into Midscene, we recommend using the SDK version and integrating it with Playwright or Puppeteer.

### View test report after running

After running, Midscene generates a log dump, saved by default to `./midscene_run/report/latest.web-dump.json`. Load this file into the [Visualization Tool](/visualization/) to get a clearer picture of the process.

## Integrate with Playwright

> [Playwright.js](https://playwright.dev/) is an open-source automation library developed by Microsoft, primarily designed for end-to-end testing and web scraping of web applications.

@@ -182,11 +207,10 @@ npx ts-node demo.ts
# ]
```

### Step 4. view test report after running
### View test report after running

After running, Midscene generates a log dump, saved by default to `./midscene_run/report/latest.web-dump.json`. Load this file into the [Visualization Tool](/visualization/) to get a clearer picture of the process.


## View demo report

Click the 'Load Demo' button in the [Visualization Tool](/visualization/) to see the results of the previous code as well as some other samples.
8 changes: 8 additions & 0 deletions apps/site/docs/en/docs/more/faq.md
@@ -21,6 +21,8 @@ There are some limitations with Midscene. We are still working on them.

Midscene needs a multimodal Large Language Model (LLM) to understand the UI. Currently, we find that OpenAI's GPT-4o performs much better than others.

You can [customize model vendor](../usage/model-vendor.html) if needed.

### About the token cost

Image resolution and the number of elements (i.e., the size of the UI context Midscene creates) affect the token bill.
@@ -35,6 +37,12 @@ Here are some typical data with GPT-4o.

> The price data was calculated in August 2024.

### What data is sent to the LLM?

Currently, Midscene sends:
1. key information extracted from the DOM, such as text content, class name, tag name, and coordinates;
2. a screenshot of the page.
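To make the list above concrete, here is a minimal TypeScript sketch of the per-element record it describes, paired with a screenshot. Everything in it is an assumption for illustration: `ElementInfo`, `NodeLike`, `collectElementInfo`, and `buildPayload` are hypothetical names, not Midscene's actual types or extractor, and the sketch walks a plain-object tree instead of the real DOM so it runs outside a browser.

```typescript
// Hypothetical types for illustration; not Midscene's actual data model.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// One element's key information: text content, class name, tag name, coordinates.
interface ElementInfo {
  content: string;
  className: string;
  tagName: string;
  rect: Rect;
}

// Minimal stand-in for a DOM node, so the sketch runs without a browser.
interface NodeLike {
  tagName: string;
  className?: string;
  text?: string;
  rect: Rect;
  children?: NodeLike[];
}

// What gets sent: the extracted element list plus a page screenshot.
interface UIContextPayload {
  elements: ElementInfo[];
  screenshotBase64: string;
}

// Depth-first walk that records every node whose own text is non-empty.
function collectElementInfo(root: NodeLike): ElementInfo[] {
  const out: ElementInfo[] = [];
  const visit = (node: NodeLike): void => {
    const content = (node.text ?? "").trim();
    if (content.length > 0) {
      out.push({
        content,
        className: node.className ?? "",
        tagName: node.tagName.toLowerCase(),
        rect: node.rect,
      });
    }
    for (const child of node.children ?? []) {
      visit(child);
    }
  };
  visit(root);
  return out;
}

// Pair the element list with a base64-encoded screenshot.
function buildPayload(root: NodeLike, screenshotBase64: string): UIContextPayload {
  return { elements: collectElementInfo(root), screenshotBase64 };
}
```

Dropping nodes without text is a simplification chosen for this sketch; which elements Midscene actually keeps is not described here.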

### The automation process is running more slowly than it did before

Since Midscene.js invokes AI for each planning and querying operation, the running time may increase by a factor of 3 to 10 compared to traditional Playwright scripts, for instance from 5 seconds to 20 seconds. This is currently inevitable but may improve with advancements in LLMs.
32 changes: 2 additions & 30 deletions apps/site/docs/en/docs/usage/API.md
@@ -1,34 +1,6 @@
# API Reference

## config AI vendor

Midscene uses the OpenAI SDK as the default AI service. You can customize the configuration using environment variables.

These are the main configs, of which `OPENAI_API_KEY` is required.

Required:

```bash
# replace with your own key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

Optional:

```bash
# optional, if you want to use a customized endpoint
export OPENAI_BASE_URL="https://..."

# optional, if you want to specify a model name other than gpt-4o
export MIDSCENE_MODEL_NAME='claude-3-opus-20240229';

# optional, if you want to pass customized JSON data to the `init` process of OpenAI SDK
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"....","defaultHeaders":{"key": "value"}}'
```

## Integration

### Puppeteer
## Integrate with Puppeteer

To initialize:

@@ -40,7 +12,7 @@ const mid = new PuppeteerAgent(puppeteerPageInstance);

You can view the integration sample in [quick-start](../getting-started/quick-start).

### Playwright
## Integrate with Playwright

You can view the integration sample in [quick-start](../getting-started/quick-start).

2 changes: 1 addition & 1 deletion apps/site/docs/en/docs/usage/_meta.json
@@ -1 +1 @@
["API.md", "cache.md"]
["API.md", "cli.md", "cache.md", "model-vendor.md"]
73 changes: 73 additions & 0 deletions apps/site/docs/en/docs/usage/cli.md
@@ -0,0 +1,73 @@
# Command Line Tools

`@midscene/cli` is the command-line version of Midscene. It is suited to running very simple tasks or trying out Midscene's basic features.

## Preparation

* Install Node.js

  Ensure that you have [Node.js](https://nodejs.org/) installed.

* Configure the AI vendor

  ```bash
  # replace with your own key
  export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
  ```

  Related docs: [Customize model vendor](./model-vendor.html)

## Examples

```bash
# headed mode (i.e. visible browser) to visit bing.com and search for 'weather today'
npx @midscene/cli --headed --url https://www.bing.com --action "type 'weather today', hit enter" --sleep 3000

# visit github status page and save the status to ./status.json
npx @midscene/cli --url https://www.githubstatus.com/ \
  --query-output status.json \
  --query '{name: string, status: string}[], service status of github page'
```

Alternatively, install `@midscene/cli` globally and call it as `midscene`:

```bash
# install
npm i -g @midscene/cli

# call it as `midscene`
midscene --url https://www.bing.com --action "type 'weather today', hit enter"
```
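The GitHub status example writes its query result to `status.json`. Below is a minimal TypeScript sketch for consuming that file, assuming (not confirmed by this page) that it holds a JSON array matching the query's type description `{name: string, status: string}[]`. `ServiceStatus` and `parseStatusOutput` are hypothetical names; the function takes the file contents as a string, which in Node you could obtain with `fs.readFileSync('status.json', 'utf8')`.

```typescript
// Shape named in the --query prompt: {name: string, status: string}[]
interface ServiceStatus {
  name: string;
  status: string;
}

// Type guard for a single entry.
function isServiceStatus(value: unknown): value is ServiceStatus {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.name === "string" && typeof v.status === "string";
}

// Parse the contents of status.json and check every entry.
// Throws on malformed JSON or on entries that do not match the shape.
function parseStatusOutput(text: string): ServiceStatus[] {
  const data: unknown = JSON.parse(text);
  if (!Array.isArray(data) || !data.every(isServiceStatus)) {
    throw new Error("query output does not match {name: string, status: string}[]");
  }
  return data;
}
```

Validating rather than casting is deliberate: an LLM-generated result is not guaranteed to match the requested shape, so failing loudly is safer than passing malformed data downstream.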

## Usage

Usage: `midscene [options] [actions]`

Options:

```log
Options:
  --url <url>                 The URL to visit, required
  --user-agent <ua>           The user agent to use, optional
  --viewport-width <width>    The width of the viewport, optional
  --viewport-height <height>  The height of the viewport, optional
  --viewport-scale <scale>    The device scale factor, optional
  --headed                    Run in headed mode, default false
  --help                      Display this help message
  --version                   Display the version

Actions (order matters, can be used multiple times):
  --action <action>           Perform an action, optional
  --assert <assert>           Perform an assert, optional
  --query-output <path>       Save the result of the query to a file, this must be put before --query, optional
  --query <query>             Perform a query, optional
  --sleep <ms>                Sleep for a number of milliseconds, optional
```


## Note

1. Always put options before any action parameters.
2. The order of action parameters matters. For example, `--action "some action" --query "some data"` means performing the action first, then running the query.
3. For more complex requirements, such as loops, using the SDK version (rather than this CLI) is the easier route.
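The ordering rules above can be enforced by construction when scripting the CLI. Below is a minimal TypeScript sketch with hypothetical names (`CliOptions`, `Step`, `buildMidsceneArgs`) that emits every option first, then the actions in the order given, placing `--query-output` immediately before the `--query` it applies to. It uses only the flags listed in the usage text and produces the argument list to append after `npx @midscene/cli`.

```typescript
// Hypothetical helper; flag names come from the usage text above.
// Options: must appear before any action on the command line.
interface CliOptions {
  url: string;
  userAgent?: string;
  viewportWidth?: number;
  viewportHeight?: number;
  viewportScale?: number;
  headed?: boolean;
}

// Actions: order matters and each kind may repeat.
type Step =
  | { kind: "action"; prompt: string }
  | { kind: "assert"; prompt: string }
  | { kind: "query"; prompt: string; outputPath?: string }
  | { kind: "sleep"; ms: number };

function buildMidsceneArgs(options: CliOptions, steps: Step[]): string[] {
  const args: string[] = ["--url", options.url];
  if (options.userAgent !== undefined) args.push("--user-agent", options.userAgent);
  if (options.viewportWidth !== undefined) args.push("--viewport-width", String(options.viewportWidth));
  if (options.viewportHeight !== undefined) args.push("--viewport-height", String(options.viewportHeight));
  if (options.viewportScale !== undefined) args.push("--viewport-scale", String(options.viewportScale));
  if (options.headed) args.push("--headed");

  // Actions follow all options, preserving the caller's order.
  for (const step of steps) {
    switch (step.kind) {
      case "action":
        args.push("--action", step.prompt);
        break;
      case "assert":
        args.push("--assert", step.prompt);
        break;
      case "query":
        // --query-output must precede the --query it applies to.
        if (step.outputPath !== undefined) args.push("--query-output", step.outputPath);
        args.push("--query", step.prompt);
        break;
      case "sleep":
        args.push("--sleep", String(step.ms));
        break;
    }
  }
  return args;
}
```

Building an argument array, rather than one shell string, also sidesteps quoting problems with prompts such as `"type 'weather today', hit enter"`.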
25 changes: 25 additions & 0 deletions apps/site/docs/en/docs/usage/model-vendor.md
@@ -0,0 +1,25 @@
# Customize model vendor

Midscene uses the OpenAI SDK as the default AI service. You can customize the configuration using environment variables.

These are the main configs, of which `OPENAI_API_KEY` is required.

Required:

```bash
# replace with your own key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

Optional:

```bash
# optional, if you want to use a customized endpoint
export OPENAI_BASE_URL="https://..."

# optional, if you want to specify a model name other than gpt-4o
export MIDSCENE_MODEL_NAME='claude-3-opus-20240229';

# optional, if you want to pass customized JSON data to the `init` process of OpenAI SDK
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"....","defaultHeaders":{"key": "value"}}'
```
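For illustration only, here is a minimal TypeScript sketch of how the variables above could be combined into options for an OpenAI client. This is not Midscene's source: the name `resolveVendorConfig` and the choice to let `MIDSCENE_OPENAI_INIT_CONFIG_JSON` override the individual variables are assumptions, while the `gpt-4o` fallback follows the "model name other than gpt-4o" comment above. The environment is passed in as a plain object (e.g. `process.env` in Node) so the sketch has no dependencies.

```typescript
// Hypothetical resolver; not Midscene's actual implementation.
type Env = Record<string, string | undefined>;

interface VendorConfig {
  model: string;
  // Options intended for the OpenAI SDK constructor (apiKey, baseURL, defaultHeaders, ...).
  clientOptions: Record<string, unknown>;
}

function resolveVendorConfig(env: Env): VendorConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }

  const clientOptions: Record<string, unknown> = { apiKey };
  if (env.OPENAI_BASE_URL) {
    clientOptions.baseURL = env.OPENAI_BASE_URL;
  }

  // Assumption: keys from the JSON override the individual variables above.
  const initJson = env.MIDSCENE_OPENAI_INIT_CONFIG_JSON;
  if (initJson) {
    const parsed: unknown = JSON.parse(initJson);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("MIDSCENE_OPENAI_INIT_CONFIG_JSON must be a JSON object");
    }
    Object.assign(clientOptions, parsed);
  }

  return { model: env.MIDSCENE_MODEL_NAME || "gpt-4o", clientOptions };
}
```

The returned `clientOptions` could then be handed to the `openai` package's `new OpenAI({...})` constructor after narrowing or casting to the SDK's options type.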
28 changes: 27 additions & 1 deletion apps/site/docs/zh/docs/getting-started/quick-start.mdx
@@ -10,13 +10,39 @@ import { PackageManagerTabs } from '@theme';

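## Preparation

Configure the API key
Configure the OpenAI API key, or [customize the model vendor](../usage/model-vendor.html)

```bash
# replace with your own key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

## Try the Command-Line Version

You can quickly try out Midscene's basic capabilities with its command-line version.

Make sure you have [Node.js](https://nodejs.org/) installed.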

```bash
# headless mode to visit bing.com and search for 'weather today'
npx @midscene/cli --url https://www.bing.com --action "type 'weather today', hit enter"

# headed mode (i.e. visible browser) to visit bing.com and search for 'weather today'
npx @midscene/cli --headed --url https://www.bing.com --action "type 'weather today', hit enter"

# visit github status page and save the status to ./status.json
npx @midscene/cli \
  --url https://www.githubstatus.com/ \
  --query-output status.json \
  --query '{name: string, status: string}[], service status of github page'
```

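To explore Midscene in more depth, we recommend using the SDK version and integrating it with Playwright or Puppeteer.

### View the run report

After Midscene runs, it generates a log file, stored by default at `./midscene_run/report/latest.web-dump.json`. You can then import this file into the [Visualization Tool](/visualization/) for a clearer understanding of the whole process.

## Integrate with Playwright

> [Playwright.js](https://playwright.dev/) is an open-source automation library developed by Microsoft, mainly used for end-to-end testing and web scraping of web applications.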
18 changes: 15 additions & 3 deletions apps/site/docs/zh/docs/more/faq.md
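@@ -17,6 +17,12 @@ Midscene has some limitations, and we are still working to improve them.
2. Insufficient stability: even GPT-4o cannot guarantee a correct answer 100% of the time. Following the [prompting tips](./prompting-tips) can help improve SDK stability.
3. Limited element access: because we extract elements from the page with JavaScript, elements inside iframes cannot be accessed.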

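### Which LLM should I choose?

Midscene needs a multimodal large language model that can understand user interfaces. Currently, we find that OpenAI's GPT-4o performs best, far ahead of other models.

You can [customize the model vendor](../usage/model-vendor.html) as needed.

### About the token cost

Image resolution and the number of elements (i.e., the size of the UI context Midscene creates) significantly affect token consumption.
@@ -25,12 +31,18 @@ Midscene has some limitations, and we are still working to improve them.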

| Task | Resolution | Prompt tokens / price | Completion tokens / price |
|-----|------------|--------------|---------------|
| Plan the steps for performing a search | 1280x800 | 6,975 / $0.034875 | 150 / $0.00225 |
| Locate the search box | 1280x800 | 8,004 / $0.04002 | 92 / $0.00138 |
| Query product information | 1280x800 | 13,403 / $0.067015 | 95 / $0.001425 |
| Plan the steps: analyze how to perform a search on eBay | 1280x800 | 6,975 / $0.034875 | 150 / $0.00225 |
| Locate the search box on eBay | 1280x800 | 8,004 / $0.04002 | 92 / $0.00138 |
| Query product information from the eBay search results | 1280x800 | 13,403 / $0.067015 | 95 / $0.001425 |

> The price data was calculated in August 2024.
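Every price in the table is the token count times a flat per-million-token rate: $5 per million prompt tokens and $15 per million completion tokens (for example, 6,975 × 5 / 1,000,000 = $0.034875 and 150 × 15 / 1,000,000 = $0.00225). These rates are inferred from the table's own numbers rather than quoted from a price list. A minimal TypeScript sketch with a hypothetical `estimateCost` helper reproduces the rows and lets you plug in other rates:

```typescript
// Per-million-token rates implied by the table (GPT-4o, August 2024).
// Inferred from the table's own numbers; check current pricing before reuse.
interface Rates {
  promptPerMillion: number;
  completionPerMillion: number;
}

const AUG_2024_GPT4O: Rates = { promptPerMillion: 5, completionPerMillion: 15 };

interface Usage {
  task: string;
  promptTokens: number;
  completionTokens: number;
}

interface Cost {
  prompt: number;
  completion: number;
  total: number;
}

// Cost in USD = tokens x (rate per million tokens) / 1,000,000.
function estimateCost(usage: Usage, rates: Rates = AUG_2024_GPT4O): Cost {
  const prompt = (usage.promptTokens * rates.promptPerMillion) / 1_000_000;
  const completion = (usage.completionTokens * rates.completionPerMillion) / 1_000_000;
  return { prompt, completion, total: prompt + completion };
}

// The three tasks from the table, all at 1280x800.
const rows: Usage[] = [
  { task: "Plan", promptTokens: 6975, completionTokens: 150 },
  { task: "Locate", promptTokens: 8004, completionTokens: 92 },
  { task: "Query", promptTokens: 13403, completionTokens: 95 },
];

for (const row of rows) {
  const cost = estimateCost(row);
  console.log(`${row.task}: $${cost.prompt.toFixed(6)} + $${cost.completion.toFixed(6)} = $${cost.total.toFixed(6)}`);
}
```

Because prompt tokens dominate every row, image resolution and UI context size (which drive the prompt token count) are the main levers on cost.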

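### What data is sent to the LLM?

The following:
1. Key information extracted from the DOM, such as text content, class name, tag name, and coordinates
2. A screenshot of the page

### Is the script running slowly?

Because Midscene.js calls the AI for each planning and query operation, its running time may be 3 to 10 times that of a traditional Playwright test case, for example going from 5 seconds to 20 seconds. This is currently unavoidable, but performance may improve as large language models (LLMs) advance.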
34 changes: 3 additions & 31 deletions apps/site/docs/zh/docs/usage/API.md
@@ -1,34 +1,6 @@
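# API Documentation
# API

## Configure the AI vendor

By default, Midscene calls AI services through the OpenAI SDK. You can also customize the configuration with environment variables.

The main configuration items are listed below; `OPENAI_API_KEY` is required:

Required:

```bash
# replace with your own API key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```

Optional:

```bash
# optional, if you want to change the base URL
export OPENAI_BASE_URL="https://..."

# optional, if you want to specify the model name
export MIDSCENE_MODEL_NAME='claude-3-opus-20240229';

# optional, if you want to change the SDK's initialization parameters
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"....","defaultHeaders":{"key": "value"}}'
```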

## Integration

### Integrate with Puppeteer
## Integrate with Puppeteer

To initialize:

@@ -40,7 +12,7 @@ const mid = new PuppeteerAgent(puppeteerPageInstance);

You can find a complete integration sample in [Quick Start](../getting-started/quick-start).

### Integrate with Playwright
## Integrate with Playwright

You can find a complete integration sample in [Quick Start](../getting-started/quick-start).

2 changes: 1 addition & 1 deletion apps/site/docs/zh/docs/usage/_meta.json
@@ -1 +1 @@
["API.md", "cache.md"]
["API.md", "cli.md", "cache.md", "model-vendor.md"]