Commit

feat: update planning prompt
yuyutaotao committed Dec 16, 2024
1 parent b4debd5 commit 2346a80
Showing 3 changed files with 21 additions and 17 deletions.
22 changes: 11 additions & 11 deletions packages/midscene/src/ai-model/prompt/planning.ts
@@ -18,7 +18,7 @@ const quickAnswerFormat = () => {

  const sample = matchByPosition
  ? '{"position": { x: 100, y: 200 }}'
- : '{"id": "14562"}';
+ : '{"id": "c81c4e9a33"}';

  return {
  description,
@@ -41,18 +41,18 @@ You are a versatile professional in software UI automation. Your outstanding con
  ## Workflow
  1. Receive the user's element description, screenshot, and instruction.
- 2. Decompose the user's task into a sequence of actions, and place it in the \`actions\` field. There are different types of actions (Tap / Hover / Input / KeyboardPress / Scroll / Error / Sleep). Please refer to the "About the action" section below.
- 3. Precisely locate the target element if it's already shown in the screenshot, put the location info in the \`locate\` field.
- 4. Consider whether a task will be accomplished after all the actions
+ 2. Decompose the user's task into a sequence of actions, and place it in the \`actions\` field. There are different types of actions (Tap / Hover / Input / KeyboardPress / Scroll / Error / Sleep). The "About the action" section below will give you more details.
+ 3. Precisely locate the target element if it's already shown in the screenshot, put the location info in the \`locate\` field of the action.
+ 4. If some target elements is not shown in the screenshot, consider the user's instruction is not feasible on this page. Follow the next steps.
+ 5. Consider whether the user's instruction will be accomplished after all the actions
  - If yes, set \`taskWillBeAccomplished\` to true
- - If no, don't plan more actions by closing the array. Get ready to reevaluate the task. Some talent people like you will handle this. Give him a clear description of what have been done and what to do next. Put your new plan in the \`furtherPlan\` field. Refer to the "How to compose the \`taskWillBeAccomplished\` and \`furtherPlan\` fields" section for more details.
+ - If no, don't plan more actions by closing the array. Get ready to reevaluate the task. Some talent people like you will handle this. Give him a clear description of what have been done and what to do next. Put your new plan in the \`furtherPlan\` field. The "How to compose the \`taskWillBeAccomplished\` and \`furtherPlan\` fields" section will give you more details.
  ## Constraints
  - All the actions you composed MUST be based on the page context information you get.
  - Trust the "What have been done" field about the task (if any), don't repeat actions in it.
  - Some elements may be shown after some actions are finished, consider this as a normal situation.
- - If you cannot plan any actions, consider the page content is irrelevant to the task. Put the error message in the \`error\` field.
+ - If you cannot plan any actions at all, consider the page content is irrelevant to the task. Put the error message in the \`error\` field.
  ## About the \`actions\` field
@@ -61,9 +61,9 @@ You are a versatile professional in software UI automation. Your outstanding con
  The \`locate\` param is commonly used in the \`param\` field of the action, means to locate the target element to perform the action, it follows the following scheme:
  type LocateParam = {
- "id": string, // the id of the element found. If its not on the page, locate should be null
+ "id": string, // the id of the element found. It should either be the id marked with a rectangle in the screenshot or the id described in the description.
  prompt?: string // the description of the element to find. It can only be omitted when locate is null.
- } | null
+ } | null // If it's not on the page, the LocateParam should be null
  ### Supported actions
@@ -102,7 +102,7 @@ Please return the result in JSON format as follows:
  "type": "Tap",
  "param": null,
  "locate": {
- {"id": "14562"},
+ {"id": "c81c4e9a33"},
  prompt: "the search bar"
  } | null,
  },
@@ -124,7 +124,7 @@ ${samplePageDescription}
  By viewing the page screenshot and description, you should consider this and output the JSON:
  * The main steps should be: tap the switch button, sleep, and tap the 'English' option
- * The language switch button is shown in the screenshot. By checking the page screenshot, you can locate its ID by the coordinates and context information.
+ * The language switch button is shown in the screenshot, but it's not marked with a rectangle. So we have to use the page description to find the element. By carefully checking the context information (coordinates, attributes, content, etc.), you can find the element.
  * The "English" option button is not shown in the screenshot now, it means it may only show after the previous actions are finished. So the last action will have a \`null\` value in the \`locate\` field.
  * The task cannot be accomplished (because we cannot see the "English" option now), so a \`furtherPlan\` field is needed.
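
For readers who want to check a model response against the `LocateParam` scheme described in this prompt, a runtime type guard is one option. The following is only a minimal sketch: `isLocateParam` is a hypothetical helper that does not appear in this commit, and it follows the type as written (so `prompt` is treated as optional).

```typescript
// Shape copied from the prompt text in this diff.
type LocateParam = {
  id: string; // id of the element found
  prompt?: string; // description of the element to find
} | null; // null when the element is not on the page

// Hypothetical helper (not part of the commit): true when `value` fits LocateParam.
function isLocateParam(value: unknown): value is LocateParam {
  if (value === null) return true; // element not on the page
  if (typeof value !== 'object') return false;
  const candidate = value as Record<string, unknown>;
  if (typeof candidate.id !== 'string') return false; // ids are strings such as "c81c4e9a33"
  return candidate.prompt === undefined || typeof candidate.prompt === 'string';
}
```

A numeric id would be rejected by this guard, since the scheme declares `id` as a string.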
14 changes: 8 additions & 6 deletions packages/midscene/src/ai-model/prompt/util.ts
@@ -203,16 +203,17 @@ export function elementByPosition(

  export const samplePageDescription = `
  The size of the page: 1280 x 720
+ Some of the elements are marked with a rectangle in the screenshot, some are not.
- JSON description of the elements in screenshot:
- id=1231: {
- "markerId": 2, // The number indicated by the boxed label in the screenshot
+ JSON description of all the elements in screenshot:
+ id=c81c4e9a33: {
+ "markerId": 2, // The number indicated by the rectangle label in the screenshot
  "attributes": // Attributes of the element
  {"data-id":"@submit s0","class":".gh-search","aria-label":"搜索","nodeType":"IMG", "src": "image_url"},
  "rect": { "left": 16, "top": 378, "width": 89, "height": 16 } // Position of the element in the page
  }
- id=459308: {
+ id=5a29bf6419bd: {
  "content": "获取优惠券",
  "attributes": { "nodeType": "TEXT" },
  "rect": { "left": 32, "top": 332, "width": 70, "height": 18 }
@@ -244,7 +245,7 @@ export async function describeUserPage<
  const idElementMap: Record<string, ElementType> = {};
  elementsInfo.forEach((item) => {
  idElementMap[item.id] = item;
- // sometimes GPT will mess up the indexId and id, we use indexId as a backup
+ // accept indexId/markerId as a backup
  if ((item as any).indexId) {
  idElementMap[(item as any).indexId] = item;
  }
@@ -267,12 +268,13 @@ export async function describeUserPage<
  return {
  description: `
  The size of the page: ${describeSize({ width, height })}
+ Some of the elements are marked with a rectangle in the screenshot, some are not.
  ${
  // if match by id, use the description of the element
  getAIConfig(MATCH_BY_POSITION)
  ? ''
- : `Json description of the page elements:\n${contentList}`
+ : `Json description of all the page elements:\n${contentList}`
  }
  `,
  elementById(id: string) {
Original file line number Diff line number Diff line change
@@ -73,6 +73,8 @@ describe(
  const item2 = filterTargetElement(content2);
  expect(item2).toBeDefined();
  expect(item2?.id).toBe(item?.id);
+
+ await reset();
  });

  it('check screenshot size - 1x', async () => {
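
The `idElementMap` loop touched in `util.ts` registers each element under its `id` and, when one is present, under its `indexId` as well, so a lookup by either key resolves to the same element. A minimal standalone sketch of that idea follows; `ElementInfo` and `buildIdElementMap` are hypothetical names chosen for illustration, not midscene's actual types or exports.

```typescript
// Hypothetical element shape; only `id` and `indexId` matter for this sketch.
interface ElementInfo {
  id: string;
  indexId?: number;
}

// Builds a lookup keyed by id, plus indexId as a backup key when it is set.
// As in the diff, a falsy indexId (including 0) is not registered.
function buildIdElementMap(elements: ElementInfo[]): Record<string, ElementInfo> {
  const idElementMap: Record<string, ElementInfo> = {};
  for (const item of elements) {
    idElementMap[item.id] = item;
    if (item.indexId) {
      idElementMap[String(item.indexId)] = item;
    }
  }
  return idElementMap;
}

// Sample ids reuse the hash-style values from samplePageDescription.
const sampleMap = buildIdElementMap([
  { id: 'c81c4e9a33', indexId: 2 },
  { id: '5a29bf6419bd' },
]);
```

Here `sampleMap['2']` and `sampleMap['c81c4e9a33']` refer to the same object, while the second element is reachable only by its `id`.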