
Commit fd67f2a

quanru and claude committed
refactor(core,web-integration,docs): rename API methods for clarity
BREAKING CHANGE: Renamed aiAction() to aiAct() and logScreenshot() to recordToReport() for improved naming consistency. The aiAction() method is kept as deprecated for backward compatibility.

Changes:
- Renamed aiAction() to aiAct() across core and web-integration
- Renamed logScreenshot() to recordToReport()
- Updated all English and Chinese documentation
- Updated code examples in README files
- Updated Playwright fixture to support new method names
- Added deprecation warning for aiAction() method
- Updated all test files and examples

This improves API consistency and clarity while maintaining backward compatibility through deprecated methods.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 80a2c97 commit fd67f2a
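The commit message says `aiAction()` is kept as a deprecated alias that warns and delegates to `aiAct()`. A minimal sketch of that backward-compatibility pattern (a hypothetical illustration, not the actual Midscene source) could look like:

```javascript
// Hypothetical sketch of the deprecation pattern described in the commit
// message: keep the old name working, warn once, and delegate to the new one.
class Agent {
  constructor() {
    this.warned = false;
  }

  async aiAct(prompt) {
    // ...real planning and execution would happen here...
    return `executed: ${prompt}`;
  }

  /** @deprecated Use aiAct() instead. */
  async aiAction(prompt) {
    if (!this.warned) {
      console.warn('aiAction() is deprecated, use aiAct() instead');
      this.warned = true;
    }
    return this.aiAct(prompt);
  }
}

const agent = new Agent();
agent.aiAction('click the button').then((result) => console.log(result));
```

Both spellings reach the same implementation, so existing callers keep working while the warning nudges them toward the new name.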


47 files changed: +242 −188 lines

README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -81,7 +81,7 @@ Read more about [Choose a model](https://midscenejs.com/choose-a-model)
 Midscene will automatically plan the steps and execute them. It may be slower and heavily rely on the quality of the AI model.
 
 ```javascript
-await aiAction('click all the records one by one. If one record contains the text "completed", skip it');
+await aiAct('click all the records one by one. If one record contains the text "completed", skip it');
 ```
 
 ### Workflow Style
````

README.zh.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -81,7 +81,7 @@ Midscene.js supports vision-language models such as `Qwen3-VL` and `Doubao-1.6-vision`
 Midscene will automatically plan the steps and execute them. It may be slower and heavily rely on the quality of the AI model.
 
 ```javascript
-await aiAction('click all the records one by one. If one record contains the text "completed", skip it');
+await aiAct('click all the records one by one. If one record contains the text "completed", skip it');
 ```
 
 ### Workflow Style
````

apps/report/src/components/store/index.tsx

Lines changed: 1 addition & 1 deletion
```diff
@@ -31,7 +31,7 @@ export const useBlackboardPreference = create<{
   },
 }));
 export interface HistoryItem {
-  type: 'aiAction' | 'aiQuery' | 'aiAssert';
+  type: 'aiAct' | 'aiQuery' | 'aiAssert';
   prompt: string;
   timestamp: number;
 }
```

apps/site/docs/en/api.mdx

Lines changed: 15 additions & 15 deletions
````diff
@@ -17,7 +17,7 @@ These Agents share some common constructor parameters:
 - `reportFileName: string`: The name of the report file. (Default: generated by midscene)
 - `autoPrintReportMsg: boolean`: If true, report messages will be printed. (Default: true)
 - `cacheId: string | undefined`: If provided, this cacheId will be used to save or match the cache. (Default: undefined, means cache feature is disabled)
-- `actionContext: string`: Some background knowledge that should be sent to the AI model when calling `agent.aiAction()`, like 'close the cookie consent dialog first if it exists' (Default: undefined)
+- `actionContext: string`: Some background knowledge that should be sent to the AI model when calling `agent.aiAct()`, like 'close the cookie consent dialog first if it exists' (Default: undefined)
 - `onTaskStartTip: (tip: string) => void | Promise<void>`: Optional hook that fires before each execution task begins with a human-readable summary of the task (Default: undefined)
 
 In Playwright and Puppeteer, there are some common parameters:
@@ -42,14 +42,14 @@ In Midscene, you can choose to use either auto planning or instant action.
 
 :::
 
-### `agent.aiAction()` or `.ai()`
+### `agent.aiAct()` or `.ai()`
 
 This method allows you to perform a series of UI actions described in natural language. Midscene automatically plans the steps and executes them.
 
 - Type
 
 ```typescript
-function aiAction(
+function aiAct(
   prompt: string,
   options?: {
     cacheable?: boolean;
@@ -72,7 +72,7 @@ function ai(prompt: string): Promise<void>; // shorthand form
 
 ```typescript
 // Basic usage
-await agent.aiAction(
+await agent.aiAct(
   'Type "JavaScript" into the search box, then click the search button',
 );
 
@@ -82,14 +82,14 @@ await agent.ai(
 );
 
 // When using UI Agent models like ui-tars, you can try a more goal-driven prompt
-await agent.aiAction('Post a Tweet "Hello World"');
+await agent.aiAct('Post a Tweet "Hello World"');
 ```
 
 :::tip
 
 Under the hood, Midscene uses AI model to split the instruction into a series of steps (a.k.a. "Planning"). It then executes these steps sequentially. If Midscene determines that the actions cannot be performed, an error will be thrown.
 
-For optimal results, please provide clear and detailed instructions for `agent.aiAction()`. For guides about writing prompts, you may read this doc: [Tips for Writing Prompts](./prompting-tips).
+For optimal results, please provide clear and detailed instructions for `agent.aiAct()`. For guides about writing prompts, you may read this doc: [Tips for Writing Prompts](./prompting-tips).
 
 Related Documentation:
 
@@ -700,7 +700,7 @@ For more information about YAML scripts, please refer to [Automate with Scripts
 
 ### `agent.setAIActionContext()`
 
-Set the background knowledge that should be sent to the AI model when calling `agent.aiAction()` or `agent.ai()`. This will override the previous setting.
+Set the background knowledge that should be sent to the AI model when calling `agent.aiAct()` or `agent.ai()`. This will override the previous setting.
 
 For instant action type APIs, like `aiTap()`, this setting will not take effect.
 
@@ -749,14 +749,14 @@ const result = await agent.evaluateJavaScript('document.title');
 console.log(result);
 ```
 
-### `agent.logScreenshot()`
+### `agent.recordToReport()`
 
 Log the current screenshot with a description in the report file.
 
 - Type
 
 ```typescript
-function logScreenshot(title?: string, options?: Object): Promise<void>;
+function recordToReport(title?: string, options?: Object): Promise<void>;
 ```
 
 - Parameters:
@@ -772,7 +772,7 @@ function logScreenshot(title?: string, options?: Object): Promise<void>;
 - Examples:
 
 ```typescript
-await agent.logScreenshot('Login page', {
+await agent.recordToReport('Login page', {
   content: 'User A',
 });
 ```
@@ -872,7 +872,7 @@ export MIDSCENE_RUN_DIR=midscene_run # The default value is the midscene_run in
 
 ### Customize the replanning cycle limit
 
-Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` variable to customize the maximum number of replanning cycles allowed during action execution (`aiAction`).
+Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` variable to customize the maximum number of replanning cycles allowed during action execution (`aiAct`).
 
 ```bash
 export MIDSCENE_REPLANNING_CYCLE_LIMIT=10 # The default value is 10. When the AI needs to replan more than this limit, an error will be thrown suggesting to split the task into multiple steps
@@ -1143,8 +1143,8 @@ describe('Android Settings Test', () => {
     await sleep(1000);
     await adb.shell('am start -n com.android.settings/.Settings');
     await sleep(1000);
-    await agent.aiAction('find and enter WLAN setting');
-    await agent.aiAction(
+    await agent.aiAct('find and enter WLAN setting');
+    await agent.aiAct(
       'toggle WLAN status *once*, if WLAN is off pls turn it on, otherwise turn it off.',
     );
   });
@@ -1154,8 +1154,8 @@ describe('Android Settings Test', () => {
     await sleep(1000);
     await adb.shell('am start -n com.android.settings/.Settings');
     await sleep(1000);
-    await agent.aiAction('find and enter bluetooth setting');
-    await agent.aiAction(
+    await agent.aiAct('find and enter bluetooth setting');
+    await agent.aiAct(
       'toggle bluetooth status *once*, if bluetooth is off pls turn it on, otherwise turn it off.',
     );
   });
````
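The `MIDSCENE_REPLANNING_CYCLE_LIMIT` behavior documented in this file (default 10, error past the limit) can be sketched as a plain guard. This is a hypothetical simplification, not the actual Midscene implementation:

```javascript
// Hypothetical sketch: read the replanning limit from the environment
// (defaulting to 10) and throw once an action replans past it.
function getReplanningLimit() {
  const raw = process.env.MIDSCENE_REPLANNING_CYCLE_LIMIT;
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isNaN(parsed) ? 10 : parsed;
}

function checkReplanCount(count) {
  if (count > getReplanningLimit()) {
    throw new Error(
      'Replanning cycle limit exceeded; consider splitting the task into multiple steps',
    );
  }
}
```

Reading the limit lazily on each check means `export MIDSCENE_REPLANNING_CYCLE_LIMIT=10` takes effect without restarting anything that caches configuration at startup.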

apps/site/docs/en/automate-with-scripts-in-yaml.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -183,7 +183,7 @@ If you need to use the `aiActionContext` parameter, you can set it through the g
 ```yaml
 # Global AI agent configuration
 agent:
-  # Background knowledge to send to the AI model when calling aiAction, optional.
+  # Background knowledge to send to the AI model when calling aiAct, optional.
   aiActionContext: <string>
 ```
 
@@ -269,12 +269,12 @@ tasks:
 # Auto Planning (.ai)
 # ----------------
 
-# Perform an interaction. `ai` is a shorthand for `aiAction`.
+# Perform an interaction. `ai` is a shorthand for `aiAct`.
 - ai: <prompt>
   cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
 
 # This usage is the same as `ai`.
-- aiAction: <prompt>
+- aiAct: <prompt>
   cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
 
 # Instant Action (.aiTap, .aiHover, .aiInput, .aiKeyboardPress, .aiScroll)
@@ -317,7 +317,7 @@ tasks:
   cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
 
 # Log the current screenshot with a description in the report file.
-- logScreenshot: <title> # Optional, the title of the screenshot. If not provided, the title will be 'untitled'.
+- recordToReport: <title> # Optional, the title of the screenshot. If not provided, the title will be 'untitled'.
   content: <content> # Optional, the description of the screenshot.
 
 # Data Extraction
````
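Since the YAML docs above treat `ai` as a shorthand for `aiAct`, a YAML runner could normalize step keys through a small alias table. This is a hypothetical sketch of that idea, not the actual midscene CLI code:

```javascript
// Hypothetical sketch: map YAML flow step keys to agent method names,
// treating `ai` (and the deprecated spellings) as aliases of the new names.
const STEP_ALIASES = {
  ai: 'aiAct',
  aiAction: 'aiAct', // deprecated spelling
  logScreenshot: 'recordToReport', // deprecated spelling
};

function normalizeStep(step) {
  // A YAML flow step parses to an object like { ai: '<prompt>' }.
  const [key, value] = Object.entries(step)[0];
  return { method: STEP_ALIASES[key] ?? key, arg: value };
}

console.log(normalizeStep({ ai: 'tap search button' }));
```

Keeping the alias table in one place means old YAML scripts written with `aiAction:` or `logScreenshot:` keep running after the rename.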

apps/site/docs/en/blog-programming-practice-using-structured-api.md

Lines changed: 10 additions & 10 deletions
````diff
@@ -1,24 +1,24 @@
 # Use JavaScript to optimize the AI automation code
 
-Many developers love using `ai` or `aiAction` to accomplish complex tasks, and even describe all logic in a single natural language instruction. Although it may seem 'intelligent', in practice, this approach may not provide a reliable and efficient experience, and results in an endless loop of Prompt tuning.
+Many developers love using `ai` or `aiAct` to accomplish complex tasks, and even describe all logic in a single natural language instruction. Although it may seem 'intelligent', in practice, this approach may not provide a reliable and efficient experience, and results in an endless loop of Prompt tuning.
 
 Here is a typical example, developers may write a large logic storm with long descriptions, such as:
 
 ```javascript
 // complex tasks
-aiAction(`
+aiAct(`
 1. click the first user
 2. click the chat bubble on the right side of the user page
 3. if I have already sent a message to him/her, go back to the previous page
 4. if I have not sent a message to him/her, input a greeting text and click send
 `)
 ```
 
-Another common misconception is that the complex workflow can be effectively controlled using `aiAction` methods. These prompts are far from reliable when compared to traditional JavaScript. For example:
+Another common misconception is that the complex workflow can be effectively controlled using `aiAct` methods. These prompts are far from reliable when compared to traditional JavaScript. For example:
 
 ```javascript
 // not stable !
-aiAction('click all the records one by one. If one record contains the text "completed", skip it')
+aiAct('click all the records one by one. If one record contains the text "completed", skip it')
 ```
 
 ## One path to optimize the automation code: use JavaScript and structured API
@@ -27,7 +27,7 @@ From v0.16.10, Midscene provides data extraction methods like `aiBoolean` `aiStr
 
 Combining them with the instant action methods, like `aiTap`, `aiInput`, `aiScroll`, `aiHover`, etc., you can split complex logic into multiple steps to improve the stability of the automation code.
 
-Let's take the first bad case above, you can convert the `.aiAction` method into a structured API call:
+Let's take the first bad case above, you can convert the `.aiAct` method into a structured API call:
 
 Original prompt:
 
@@ -53,7 +53,7 @@ After modifying the coding style, the whole process can be much more reliable an
 Here is another example, this is what it looks like before rewriting:
 
 ```javascript
-aiAction(`
+aiAct(`
 1. click the first unfollowed user, enter the user's homepage
 2. click the follow button
 3. go back to the previous page
@@ -185,14 +185,14 @@ After you input the prompt, the AI IDE will convert the prompt into structured j
 
 Enjoy it!
 
-## Which approach is best: `aiAction` or structured code?
+## Which approach is best: `aiAct` or structured code?
 
 There is no standard answer. It depends on the model's ability, the complexity of the actual business.
 
-Generally, if you encounter the following situations, you should consider abandoning the `aiAction` method:
+Generally, if you encounter the following situations, you should consider abandoning the `aiAct` method:
 
-- The success rate of `aiAction` does not meet the requirements after multiple retries
-- You have already felt tired and spent too much time repeatedly tuning the `aiAction` prompt
+- The success rate of `aiAct` does not meet the requirements after multiple retries
+- You have already felt tired and spent too much time repeatedly tuning the `aiAct` prompt
 - You need to debug the script step by step
 
 ## What's next?
````

apps/site/docs/en/blog-support-android-automation.mdx

Lines changed: 4 additions & 4 deletions
```diff
@@ -47,10 +47,10 @@ android:
 tasks:
   - name: search headphones
     flow:
-      - aiAction: open browser and navigate to ebay.com
-      - aiAction: type 'Headphones' in ebay search box, hit Enter
+      - aiAct: open browser and navigate to ebay.com
+      - aiAct: type 'Headphones' in ebay search box, hit Enter
       - sleep: 5000
-      - aiAction: scroll down the page for 800px
+      - aiAct: scroll down the page for 800px
 
   - name: extract headphones info
     flow:
@@ -88,7 +88,7 @@ Promise.resolve(
   await sleep(5000);
 
   // 👀 type keywords, perform a search
-  await agent.aiAction('type "Headphones" in search box, hit Enter');
+  await agent.aiAct('type "Headphones" in search box, hit Enter');
 
   // 👀 wait for the loading
   await agent.aiWaitFor("there is at least one headphone item on page");
```

apps/site/docs/en/blog-support-ios-automation.mdx

Lines changed: 6 additions & 6 deletions
```diff
@@ -42,11 +42,11 @@ ios:
 tasks:
   - name: search content
     flow:
-      - aiAction: tap address bar
-      - aiAction: input 'Midscene AI automation'
-      - aiAction: tap search button
+      - aiAct: tap address bar
+      - aiAct: input 'Midscene AI automation'
+      - aiAct: tap search button
       - sleep: 3000
-      - aiAction: scroll down 500px
+      - aiAct: scroll down 500px
 
   - name: extract search results
     flow:
@@ -89,10 +89,10 @@ Promise.resolve(
   await sleep(3000);
 
   // 👀 tap address bar and input search keywords
-  await agent.aiAction('tap address bar and input "Midscene automation"');
+  await agent.aiAct('tap address bar and input "Midscene automation"');
 
   // 👀 perform search
-  await agent.aiAction('tap search button');
+  await agent.aiAct('tap search button');
 
   // 👀 wait for loading to complete
   await agent.aiWaitFor("there is at least one search result on the page");
```

apps/site/docs/en/caching.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -19,7 +19,7 @@ With caching hit, time cost is significantly reduced. For example, in the follow
 Midscene's caching mechanism is based on input stability and output reusability. When the same task instructions are repeatedly executed in similar page environments, Midscene will prioritize using cached results to avoid repeated AI model calls, significantly improving execution efficiency.
 
 The core caching mechanisms include:
-- **Task instruction caching**: For planning operations (such as `ai`, `aiAction`), Midscene uses the prompt instruction as the cache key to store the execution plan returned by AI
+- **Task instruction caching**: For planning operations (such as `ai`, `aiAct`), Midscene uses the prompt instruction as the cache key to store the execution plan returned by AI
 - **Element location caching**: For location operations (such as `aiLocate`, `aiTap`), the system uses the location prompt as the cache key to store element XPath information, and verifies whether the XPath is still valid on the next execution
 - **Invalidation mechanism**: When cache becomes invalid, the system automatically falls back to AI model for re-analysis
 - **Never cache query results**: The query results like `aiBoolean`, `aiQuery`, `aiAssert` will never be cached.
```
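The task-instruction caching described in this file, prompt string as cache key, fall back to the model on a miss, can be sketched as follows. This is a hypothetical simplification of Midscene's cache, with the model call stubbed out:

```javascript
// Hypothetical sketch: cache planning results keyed by the prompt string,
// and fall back to the (stubbed) AI model only when the cache misses.
function createPlanCache(planWithModel) {
  const cache = new Map();
  return async function plan(prompt) {
    if (cache.has(prompt)) {
      return { ...cache.get(prompt), fromCache: true };
    }
    const result = await planWithModel(prompt); // expensive model call
    cache.set(prompt, result);
    return { ...result, fromCache: false };
  };
}

// Stub model: counts how many times it is actually invoked.
let modelCalls = 0;
const plan = createPlanCache(async (prompt) => {
  modelCalls += 1;
  return { steps: [`do: ${prompt}`] };
});
```

Query-style results (`aiBoolean`, `aiQuery`, `aiAssert`) would simply bypass a cache like this, matching the "never cache query results" rule above.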

apps/site/docs/en/changelog.mdx

Lines changed: 7 additions & 7 deletions
````diff
@@ -13,8 +13,8 @@ We've adapted the latest Qwen `Qwen3-VL` model, giving developers faster and mor
 
 ### 🤖 AI core capability enhancement
 
-- **UI-TARS Model Performance Optimization**: Optimized aiAction planning, improved dialogue history management, and provided better context awareness capabilities
-- **AI Assertion and Action Optimization**: We updated the prompt for `aiAssert` and optimized the internal implementation of `aiAction`, making AI-driven assertions and action execution more precise and reliable
+- **UI-TARS Model Performance Optimization**: Optimized aiAct planning, improved dialogue history management, and provided better context awareness capabilities
+- **AI Assertion and Action Optimization**: We updated the prompt for `aiAssert` and optimized the internal implementation of `aiAct`, making AI-driven assertions and action execution more precise and reliable
 
 ### 📊 Reporting and debugging experience optimization
 - **URL Parameter Playback Control**: To improve debugging experience, you can now directly control the default behavior of report playback through URL parameters
@@ -194,7 +194,7 @@ Based on the introduction of [Rslib](https://github.com/web-infra-dev/rslib) in
 - Support storing more complex data structures, laying the foundation for future feature extensions
 
 #### 3️⃣ Customize replanning cycle limit
-- Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` environment variable to customize the maximum number of re-planning cycles allowed when executing operations (aiAction).
+- Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` environment variable to customize the maximum number of re-planning cycles allowed when executing operations (aiAct).
 - The default value is 10. When the AI needs to re-plan more than this limit, an error will be thrown and suggest splitting the task.
 - Provide more flexible task execution control, adapting to different automation scenarios
 
@@ -306,14 +306,14 @@ Reduce the size of the generated report by trimming redundant data, significantl
 
 ### Custom node in report
 
-* Add the `logScreenshot` API to the agent. Take a screenshot of the current page as a report node, and support setting the node title and description to make the automated testing process more intuitive. Applicable for capturing screenshots of key steps, error status capture, UI validation, etc.
+* Add the `recordToReport` API to the agent. Take a screenshot of the current page as a report node, and support setting the node title and description to make the automated testing process more intuitive. Applicable for capturing screenshots of key steps, error status capture, UI validation, etc.
 
-![](/blog/logScreenshot-api.png)
+![](/blog/recordToReport-api.png)
 
 * Example:
 
 ```javascript
-test('login github', async ({ ai, aiAssert, aiInput, logScreenshot }) => {
+test('login github', async ({ ai, aiAssert, aiInput, recordToReport }) => {
   if (CACHE_TIME_OUT) {
     test.setTimeout(200 * 1000);
   }
@@ -322,7 +322,7 @@ test('login github', async ({ ai, aiAssert, aiInput, logScreenshot }) => {
   await aiInput('123456', 'password');
 
   // log by your own
-  await logScreenshot('Login page', {
+  await recordToReport('Login page', {
     content: 'Username is quanru, password is 123456',
   });
 
````