refactor(core,web-integration,docs): rename API methods for clarity
BREAKING CHANGE: Renamed aiAction() to aiAct() and logScreenshot()
to recordToReport() for improved naming consistency. The aiAction()
method is kept as deprecated for backward compatibility.
Changes:
- Renamed aiAction() to aiAct() across core and web-integration
- Renamed logScreenshot() to recordToReport()
- Updated all English and Chinese documentation
- Updated code examples in README files
- Updated Playwright fixture to support new method names
- Added deprecation warning for aiAction() method
- Updated all test files and examples
This improves API consistency and clarity while maintaining
backward compatibility through deprecated methods.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
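The commit message says `aiAction()` stays available as a deprecated method and now emits a deprecation warning. A minimal sketch of that pattern follows. `SketchAgent`, its `warnings` array, the warning text, and the warn-once behavior are all assumptions made for illustration; the actual implementation in core is not shown in this diff.

```typescript
// Hypothetical stand-in for the agent class; not the real Midscene implementation.
class SketchAgent {
  // Warnings are collected in memory here so the behavior is easy to inspect.
  readonly warnings: string[] = [];
  private warnedAiAction = false;

  // New method name introduced by this commit.
  async aiAct(prompt: string): Promise<string> {
    // Placeholder for planning and execution.
    return `executed: ${prompt}`;
  }

  /** @deprecated Use aiAct() instead. */
  async aiAction(prompt: string): Promise<string> {
    // Assumption: warn only on the first call to avoid noisy logs.
    if (!this.warnedAiAction) {
      this.warnedAiAction = true;
      this.warnings.push('aiAction() is deprecated, use aiAct() instead');
    }
    // The deprecated name simply delegates to the new one.
    return this.aiAct(prompt);
  }
}
```

Because the old name only delegates, existing callers keep working while new code moves to `aiAct()`.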
apps/site/docs/en/api.mdx
+15 -15 (15 additions & 15 deletions)
@@ -17,7 +17,7 @@ These Agents share some common constructor parameters:
 - `reportFileName: string`: The name of the report file. (Default: generated by midscene)
 - `autoPrintReportMsg: boolean`: If true, report messages will be printed. (Default: true)
 - `cacheId: string | undefined`: If provided, this cacheId will be used to save or match the cache. (Default: undefined, means cache feature is disabled)
-- `actionContext: string`: Some background knowledge that should be sent to the AI model when calling `agent.aiAction()`, like 'close the cookie consent dialog first if it exists' (Default: undefined)
+- `actionContext: string`: Some background knowledge that should be sent to the AI model when calling `agent.aiAct()`, like 'close the cookie consent dialog first if it exists' (Default: undefined)
 - `onTaskStartTip: (tip: string) => void | Promise<void>`: Optional hook that fires before each execution task begins with a human-readable summary of the task (Default: undefined)

 In Playwright and Puppeteer, there are some common parameters:
@@ -42,14 +42,14 @@ In Midscene, you can choose to use either auto planning or instant action.

 :::

-### `agent.aiAction()` or `.ai()`
+### `agent.aiAct()` or `.ai()`

 This method allows you to perform a series of UI actions described in natural language. Midscene automatically plans the steps and executes them.

 - Type

 ```typescript
-function aiAction(
+function aiAct(
   prompt: string,
   options?: {
     cacheable?: boolean;
@@ -72,7 +72,7 @@ function ai(prompt: string): Promise<void>; // shorthand form

 ```typescript
 // Basic usage
-await agent.aiAction(
+await agent.aiAct(
   'Type "JavaScript" into the search box, then click the search button',
 );

@@ -82,14 +82,14 @@ await agent.ai(
 );

 // When using UI Agent models like ui-tars, you can try a more goal-driven prompt
-await agent.aiAction('Post a Tweet "Hello World"');
+await agent.aiAct('Post a Tweet "Hello World"');
 ```

 :::tip

 Under the hood, Midscene uses AI model to split the instruction into a series of steps (a.k.a. "Planning"). It then executes these steps sequentially. If Midscene determines that the actions cannot be performed, an error will be thrown.

-For optimal results, please provide clear and detailed instructions for `agent.aiAction()`. For guides about writing prompts, you may read this doc: [Tips for Writing Prompts](./prompting-tips).
+For optimal results, please provide clear and detailed instructions for `agent.aiAct()`. For guides about writing prompts, you may read this doc: [Tips for Writing Prompts](./prompting-tips).

 Related Documentation:

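The tip in this hunk notes that `aiAct()` throws when the planned actions cannot be performed. One way a caller might handle that is to run small instructions one at a time and record which one failed. The sketch below assumes only the `aiAct(prompt)` call shape from the docs; `ActAgent`, `StepResult`, `runSteps`, and `stubAgent` are hypothetical names introduced for illustration and are not part of Midscene.

```typescript
// Hypothetical minimal shape; only the aiAct(prompt) call is assumed.
interface ActAgent {
  aiAct(prompt: string): Promise<void>;
}

interface StepResult {
  ok: boolean;
  failedStep?: string;
  reason?: string;
}

// Runs short instructions in order and stops at the first one that throws.
async function runSteps(agent: ActAgent, steps: string[]): Promise<StepResult> {
  for (const step of steps) {
    try {
      await agent.aiAct(step);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return { ok: false, failedStep: step, reason };
    }
  }
  return { ok: true };
}

// Stub agent for illustration: rejects any instruction that mentions "missing".
const stubAgent: ActAgent = {
  async aiAct(prompt: string): Promise<void> {
    if (prompt.includes('missing')) {
      throw new Error(`cannot perform: ${prompt}`);
    }
  },
};
```

Reporting the failing step makes it easier to see which instruction needs a clearer prompt.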
@@ -700,7 +700,7 @@ For more information about YAML scripts, please refer to [Automate with Scripts

 ### `agent.setAIActionContext()`

-Set the background knowledge that should be sent to the AI model when calling `agent.aiAction()` or `agent.ai()`. This will override the previous setting.
+Set the background knowledge that should be sent to the AI model when calling `agent.aiAct()` or `agent.ai()`. This will override the previous setting.

 For instant action type APIs, like `aiTap()`, this setting will not take effect.

@@ -749,14 +749,14 @@ const result = await agent.evaluateJavaScript('document.title');
 console.log(result);
 ```

-### `agent.logScreenshot()`
+### `agent.recordToReport()`

 Log the current screenshot with a description in the report file.
@@ -772,7 +772,7 @@ function logScreenshot(title?: string, options?: Object): Promise<void>;
 - Examples:

 ```typescript
-await agent.logScreenshot('Login page', {
+await agent.recordToReport('Login page', {
   content: 'User A',
 });
 ```
@@ -872,7 +872,7 @@ export MIDSCENE_RUN_DIR=midscene_run # The default value is the midscene_run in

 ### Customize the replanning cycle limit

-Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` variable to customize the maximum number of replanning cycles allowed during action execution (`aiAction`).
+Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` variable to customize the maximum number of replanning cycles allowed during action execution (`aiAct`).

 ```bash
 export MIDSCENE_REPLANNING_CYCLE_LIMIT=10 # The default value is 10. When the AI needs to replan more than this limit, an error will be thrown suggesting to split the task into multiple steps
apps/site/docs/en/blog-programming-practice-using-structured-api.md
+10 -10 (10 additions & 10 deletions)
@@ -1,24 +1,24 @@
 # Use JavaScript to optimize the AI automation code

-Many developers love using `ai` or `aiAction` to accomplish complex tasks, and even describe all logic in a single natural language instruction. Although it may seem 'intelligent', in practice, this approach may not provide a reliable and efficient experience, and results in an endless loop of Prompt tuning.
+Many developers love using `ai` or `aiAct` to accomplish complex tasks, and even describe all logic in a single natural language instruction. Although it may seem 'intelligent', in practice, this approach may not provide a reliable and efficient experience, and results in an endless loop of Prompt tuning.

 Here is a typical example, developers may write a large logic storm with long descriptions, such as:

 ```javascript
 // complex tasks
-aiAction(`
+aiAct(`
 1. click the first user
 2. click the chat bubble on the right side of the user page
 3. if I have already sent a message to him/her, go back to the previous page
 4. if I have not sent a message to him/her, input a greeting text and click send
 `)
 ```

-Another common misconception is that the complex workflow can be effectively controlled using `aiAction` methods. These prompts are far from reliable when compared to traditional JavaScript. For example:
+Another common misconception is that the complex workflow can be effectively controlled using `aiAct` methods. These prompts are far from reliable when compared to traditional JavaScript. For example:

 ```javascript
 // not stable !
-aiAction('click all the records one by one. If one record contains the text "completed", skip it')
+aiAct('click all the records one by one. If one record contains the text "completed", skip it')
 ```

 ## One path to optimize the automation code: use JavaScript and structured API
@@ -27,7 +27,7 @@ From v0.16.10, Midscene provides data extraction methods like `aiBoolean` `aiStr

 Combining them with the instant action methods, like `aiTap`, `aiInput`, `aiScroll`, `aiHover`, etc., you can split complex logic into multiple steps to improve the stability of the automation code.

-Let's take the first bad case above, you can convert the `.aiAction` method into a structured API call:
+Let's take the first bad case above, you can convert the `.aiAct` method into a structured API call:

 Original prompt:

@@ -53,7 +53,7 @@ After modifying the coding style, the whole process can be much more reliable an
 Here is another example, this is what it looks like before rewriting:

 ```javascript
-aiAction(`
+aiAct(`
 1. click the first unfollowed user, enter the user's homepage
 2. click the follow button
 3. go back to the previous page
@@ -185,14 +185,14 @@ After you input the prompt, the AI IDE will convert the prompt into structured j

 Enjoy it!

-## Which approach is best: `aiAction` or structured code?
+## Which approach is best: `aiAct` or structured code?

 There is no standard answer. It depends on the model's ability, the complexity of the actual business.

-Generally, if you encounter the following situations, you should consider abandoning the `aiAction` method:
+Generally, if you encounter the following situations, you should consider abandoning the `aiAct` method:

-- The success rate of `aiAction` does not meet the requirements after multiple retries
-- You have already felt tired and spent too much time repeatedly tuning the `aiAction` prompt
+- The success rate of `aiAct` does not meet the requirements after multiple retries
+- You have already felt tired and spent too much time repeatedly tuning the `aiAct` prompt
apps/site/docs/en/caching.mdx
+1 -1 (1 addition & 1 deletion)
@@ -19,7 +19,7 @@ With caching hit, time cost is significantly reduced. For example, in the follow
 Midscene's caching mechanism is based on input stability and output reusability. When the same task instructions are repeatedly executed in similar page environments, Midscene will prioritize using cached results to avoid repeated AI model calls, significantly improving execution efficiency.

 The core caching mechanisms include:
-- **Task instruction caching**: For planning operations (such as `ai`, `aiAction`), Midscene uses the prompt instruction as the cache key to store the execution plan returned by AI
+- **Task instruction caching**: For planning operations (such as `ai`, `aiAct`), Midscene uses the prompt instruction as the cache key to store the execution plan returned by AI
 - **Element location caching**: For location operations (such as `aiLocate`, `aiTap`), the system uses the location prompt as the cache key to store element XPath information, and verifies whether the XPath is still valid on the next execution
 - **Invalidation mechanism**: When cache becomes invalid, the system automatically falls back to AI model for re-analysis
 - **Never cache query results**: The query results like `aiBoolean`, `aiQuery`, `aiAssert` will never be cached.
apps/site/docs/en/changelog.mdx
+7 -7 (7 additions & 7 deletions)
@@ -13,8 +13,8 @@ We've adapted the latest Qwen `Qwen3-VL` model, giving developers faster and mor

 ### 🤖 AI core capability enhancement

-- **UI-TARS Model Performance Optimization**: Optimized aiAction planning, improved dialogue history management, and provided better context awareness capabilities
-- **AI Assertion and Action Optimization**: We updated the prompt for `aiAssert` and optimized the internal implementation of `aiAction`, making AI-driven assertions and action execution more precise and reliable
+- **UI-TARS Model Performance Optimization**: Optimized aiAct planning, improved dialogue history management, and provided better context awareness capabilities
+- **AI Assertion and Action Optimization**: We updated the prompt for `aiAssert` and optimized the internal implementation of `aiAct`, making AI-driven assertions and action execution more precise and reliable

 ### 📊 Reporting and debugging experience optimization
 - **URL Parameter Playback Control**: To improve debugging experience, you can now directly control the default behavior of report playback through URL parameters
@@ -194,7 +194,7 @@ Based on the introduction of [Rslib](https://github.com/web-infra-dev/rslib) in
 - Support storing more complex data structures, laying the foundation for future feature extensions

 #### 3️⃣ Customize replanning cycle limit
-- Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` environment variable to customize the maximum number of re-planning cycles allowed when executing operations (aiAction).
+- Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` environment variable to customize the maximum number of re-planning cycles allowed when executing operations (aiAct).
 - The default value is 10. When the AI needs to re-plan more than this limit, an error will be thrown and suggest splitting the task.
 - Provide more flexible task execution control, adapting to different automation scenarios

@@ -306,14 +306,14 @@ Reduce the size of the generated report by trimming redundant data, significantl

 ### Custom node in report

-* Add the `logScreenshot` API to the agent. Take a screenshot of the current page as a report node, and support setting the node title and description to make the automated testing process more intuitive. Applicable for capturing screenshots of key steps, error status capture, UI validation, etc.
+* Add the `recordToReport` API to the agent. Take a screenshot of the current page as a report node, and support setting the node title and description to make the automated testing process more intuitive. Applicable for capturing screenshots of key steps, error status capture, UI validation, etc.