fix: keypress issue in chrome extension (#201)
* fix: keypress issue in chrome extension

* fix: keypress issue in chrome extension

* fix: connectivity

* doc: update readme
yuyutaotao authored Dec 23, 2024
1 parent 21b3574 commit e6343b6
Showing 7 changed files with 968 additions and 90 deletions.
6 changes: 3 additions & 3 deletions apps/site/docs/en/index.mdx
@@ -6,8 +6,6 @@ Introducing Midscene.js, an innovative SDK designed to bring joy back to automat

Midscene.js leverages a multimodal Large Language Model (LLM) to intuitively “understand” your user interface and carry out the necessary actions. You can simply describe the interaction steps or expected data formats, and the AI will handle the execution for you.
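The describe-and-execute flow above can be sketched with a stubbed agent. This is illustrative only: the `ai` / `aiQuery` method names follow Midscene's documented agent surface, but the stub, its canned response, and the query string are assumptions, not the library's real implementation — it exists solely to show the call shape without a browser or an API key.

```typescript
// Minimal sketch of "describe the step, then describe the data you want".
// fakeAgent is a hypothetical stand-in for a real Midscene agent.
type Agent = {
  ai: (instruction: string) => Promise<void>;
  aiQuery: <T>(dataDemand: string) => Promise<T>;
};

function fakeAgent(): Agent & { log: string[] } {
  const log: string[] = [];
  return {
    log,
    // Records the natural-language instruction instead of driving a browser.
    async ai(instruction: string) {
      log.push(instruction);
    },
    // Returns canned data instead of asking an LLM to extract it.
    async aiQuery() {
      return [{ name: "Headphones", price: 99 }] as any;
    },
  };
}

export async function demo() {
  const agent = fakeAgent();
  // Describe the interaction step in natural language...
  await agent.ai('type "Headphones" in search box, hit Enter');
  // ...then describe the expected data format and let the model extract it.
  const items = await agent.aiQuery<{ name: string; price: number }[]>(
    "{name: string, price: number}[], the items in the list",
  );
  return { agent, items };
}
```

With a real agent, the same two calls would drive the page and return live data; the stub only demonstrates that callers never touch selectors.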

Currently, the model we are using by default is the OpenAI GPT-4o model, while you can [customize it to a different multimodal model](./model-provider.html) if needed.

<div style={{"width": "100%", "display": "flex", justifyContent: "center"}}>
<iframe
style={{"maxWidth": "100%", "width": "800px", "height": "450px"}}
@@ -68,7 +66,9 @@ Midscene will provide a visual report after each run. With this report, you can

⁠Midscene.js is an open-source project (GitHub: [Midscene](https://github.com/web-infra-dev/midscene/)) under the MIT license. You can run it in your own environment. All data gathered from pages will be sent directly to OpenAI or the custom model provider according to your configuration. Therefore, only you and the model provider will have access to the data. No third-party platform will access the data.

For custom model, you can refer to [Customize Model and Provider](./model-provider.html) document.
## Customize Model

Currently, the model we are using by default is the OpenAI GPT-4o model, while you can [customize it to a different multimodal model](./model-provider.html) if needed.

## Start with Chrome Extension

8 changes: 4 additions & 4 deletions apps/site/docs/zh/index.mdx
@@ -1,13 +1,11 @@
# Midscene.js - Joyful UI Automation, Powered by AI

UI automation is hard to maintain. Automation scripts are typically full of selectors such as `#ids`, `data-test`, `.selectors`, which can become a real headache when refactoring, even though this is, in theory, exactly the situation where UI automation should shine.
UI automation is hard to maintain. Automation scripts are typically full of selectors such as `#ids`, `data-test`, `.selectors`, which can become a real headache when refactoring, even though this is exactly the situation where UI automation should shine.

We are introducing Midscene.js to help you rediscover the joy of coding.

Midscene.js uses a multimodal Large Language Model (LLM) to intuitively "understand" your user interface and perform the necessary actions. You only need to describe the interaction steps or the expected data format, and the AI completes the task for you.

Currently we use OpenAI GPT-4o as the default model; you can also [customize it to another multimodal model](./model-provider.html).

<video src="/introduction/Midscene.mp4" controls/>

## Perform Interactions, Extract Data, and Run Assertions with AI
@@ -57,7 +55,9 @@ console.log("headphones in stock", items);

Midscene.js is an open-source project under the MIT license (GitHub: [Midscene](https://github.com/web-infra-dev/midscene/)). The code runs in your own environment, and all data collected from pages is sent, according to your configuration, directly to OpenAI or the custom model provider you specify. Therefore, only you and the model provider can access the data; no third-party platform has access to it.

For custom models, see the [Customize Model and Provider](./model-provider.html) document.
## Customize Model

Currently we use OpenAI GPT-4o as the default model; you can also [customize it to another multimodal model](./model-provider.html).

## Quick Start with the Chrome Extension

2 changes: 1 addition & 1 deletion apps/site/docs/zh/model-provider.md
@@ -37,7 +37,7 @@ export MIDSCENE_AZURE_OPENAI_INIT_CONFIG_JSON='{"apiVersion": "2024-11-01-previe

## Choosing a Model Other Than `gpt-4o`

We find `gpt-4o` performs best at the moment. Other models known to work: `claude-3-opus-20240229`, `gemini-1.5-pro`, `qwen-vl-max-latest` (Qwen), `doubao-vision-pro-32k` (Doubao)
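Switching to one of these models is done through environment variables, in the same style as the Azure example earlier in this file. The variable names below follow Midscene's model-provider docs, but the endpoint URL and key are placeholders — treat the exact values as assumptions, not working credentials:

```shell
# Point Midscene at an OpenAI-compatible endpoint serving another model.
# Replace the key and base URL with your own provider's values.
export OPENAI_API_KEY="sk-replace-with-your-key"
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
```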

如果你想要使用其他模型,请遵循以下步骤:

115 changes: 59 additions & 56 deletions packages/midscene/tests/ai/connectivity.test.ts
@@ -23,76 +23,79 @@ vi.setConfig({
return;
}

describe(`LLM service connectivity: ${envFile}`, () => {
  beforeAll(() => {
    const result = dotenv.config({
      debug: true,
      path: configPath,
      override: true,
    });
    if (result.error) {
      throw result.error;
    }
  });

  it('text only', async () => {
    const result = await call([
      {
        role: 'system',
        content: 'Answer the question',
      },
      {
        role: 'user',
        content:
          '鲁迅认识周树人吗?回答我:1. 分析原因 2.回答:是/否/无效问题',
      },
    ]);

    expect(result.content.length).toBeGreaterThan(1);
  });

  it('call to get json result', async () => {
    const result = await callToGetJSONObject<{ answer: number }>(
      [
        {
          role: 'system',
          content: 'Answer the question with JSON: {answer: number}',
        },
        {
          role: 'user',
          content: '3 x 5 = ?',
        },
      ],
      AIActionType.EXTRACT_DATA,
    );
    expect(result.content).toEqual({ answer: 15 });
  });

  it('image input', async () => {
    const imagePath = getFixture('baidu.png');
    const result = await call([
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'Describe this image in one sentence.',
          },
          {
            type: 'image_url',
            image_url: {
              url: base64Encoded(imagePath),
              detail: 'high',
            },
          },
        ],
      },
    ]);

    expect(result.content.length).toBeGreaterThan(10);
  });
});

describe('keep at least one test in each suite', () => {
  it('test', () => {
    expect(1).toBe(1);
  });
});
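The "image input" test sends an OpenAI-style chat message whose content mixes a text part with an `image_url` part carrying a base64 data URL. A self-contained sketch of that payload shape, where `toDataUrl` is an illustrative stand-in for the test's `base64Encoded()` fixture helper rather than an actual Midscene export:

```typescript
// Shape of a multimodal chat message: one user turn whose content array
// mixes a text part and an image_url part holding an inline data URL.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail: "low" | "high" } };

// Hypothetical helper mirroring what base64Encoded() does in the fixture:
// chat-completions APIs accept images inline as base64 data URLs.
function toDataUrl(image: Buffer, mime = "image/png"): string {
  return `data:${mime};base64,${image.toString("base64")}`;
}

export function buildImageMessage(text: string, dataUrl: string) {
  return {
    role: "user" as const,
    content: [
      { type: "text", text },
      { type: "image_url", image_url: { url: dataUrl, detail: "high" } },
    ] as ContentPart[],
  };
}

export { toDataUrl };
```

Because the image travels inline, no file hosting is needed; the trade-off is a larger request body, which is why the fixture reads the PNG and encodes it at call time.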
