Add the evaluation on MemoryAgentBench #39

HUST-AI-HYZ · 2025-08-19T16:56:39Z

What

Add evaluation for MemoryAgentBench.
Include scripts/configs to run the benchmark within MIRIX.

Why

Provide a standardized evaluation of MIRIX's memory capabilities, with metrics that are comparable across runs.

How to run

Command: python main.py --agent_name mirix --dataset MemoryAgentBench --config_path ../mirix/configs/mirix_azure_example.yaml --num_exp 2
Outcome: The run currently produces a results JSON file and log files.
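For scripted runs, the command above can be wrapped in a small Python helper. This is only a sketch: the helper names build_command and run_benchmark are hypothetical and not part of this PR, and the working directory is an assumption, since the description does not say where main.py lives. Only the flags shown in the command above are used.

```python
import shlex
import subprocess
import sys


def build_command(config_path, num_exp=2, agent_name="mirix",
                  dataset="MemoryAgentBench"):
    """Return the argv list for the run command quoted in this PR.

    Hypothetical helper: it only assembles the flags shown in the
    description and does not validate them against main.py.
    """
    return [
        sys.executable, "main.py",
        "--agent_name", agent_name,
        "--dataset", dataset,
        "--config_path", str(config_path),
        "--num_exp", str(num_exp),
    ]


def run_benchmark(config_path, num_exp=2, cwd=None):
    """Run the benchmark and return the process exit code.

    `cwd` is assumed to be the directory that contains main.py;
    the PR description does not state which directory that is.
    """
    cmd = build_command(config_path, num_exp=num_exp)
    print("Running:", shlex.join(cmd))
    completed = subprocess.run(cmd, cwd=cwd, check=False)
    return completed.returncode
```

Calling run_benchmark("../mirix/configs/mirix_azure_example.yaml", num_exp=2) from the directory containing main.py should be equivalent to the command above. A non-zero return code indicates that the run failed before writing its results.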

Links

Benchmark: https://github.com/HUST-AI-HYZ/MemoryAgentBench

Checklist

Docs updated if needed
CI passes

wangyu-ustc and others added 22 commits July 12, 2025 13:00

finish public_evaluations

7b143c5

fix a bug for evaluation

d65093d

public_evaluations: update conversation_creator; add bench_template

1dab78e

add the memoryagentbench

adc793e

add extral settings

799de01

add hyperparamters

c6657ca

upload changes

16a5d33

sync

ff360e1

sync

54336cb

Azure

96af825

sync

07b9bb3

check the main / sub process

c6464b3

sync

6218406

sync

e7fd109

sqlite db bug try to fix

f792183

set back to previous main.py

83a32d6

sync

114d353

fixed bug on sqlitedb

190e8a4

num exp constraints

b37c639

fix the bug on sqlite db

370590d

add log utils

9f5376c

optimization in logging

5058c33
