Functionality for noRandomDuplicates parameter and custom Stream options with seeding #62
I do not quite like that we introduce a new way to add vertices in the execution graph. The core need here is to let one existing vertex in the graph have multiple streams with different random seeds. What you can probably do is remove the StreamPath and keep the StreamCount and Seeds. For queries, always use the stage's queries.
  func (m *MySQLRunRecorder) RecordQuery(_ context.Context, s *Stage, result *QueryResult) {
  	recordNewQuery := `INSERT INTO pbench_queries (run_id, stage_id, query_file, query_index, query_id, sequence_no,
- cold_run, succeeded, start_time, end_time, row_count, expected_row_count, duration_ms, info_url) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
+ cold_run, succeeded, start_time, end_time, row_count, expected_row_count, duration_ms, info_url, seed) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
Please update pbench_queries_ddl.sql with the new column.
And why do we need a seed in this table?
There are two tables, pbench_runs and pbench_queries. Currently pbench_runs has a seed column, but now that each stream can be seeded separately, a single run can involve multiple seeds, so there had to be some way to report them. The simplest way I found was to add a seed column to pbench_queries and group by stage_id to present it in Grafana.
I was looking for other ways to report the seed per stream. Please let me know if you have any suggestions.
  )

  // Streams defines the configuration for stream-based execution
  type Streams struct {
- type Streams struct {
+ type Stream struct {
This is not a collection
  		return fmt.Errorf("stream_count must be positive, got %d for stream %s", spec.StreamCount, spec.StreamPath)
  	}

  	if len(spec.Seeds) > 0 {
Why not use your Validate() method here? This code looks duplicated.
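To illustrate the suggestion, here is a minimal sketch of keeping every check in one Validate() method and having the caller delegate to it. The struct fields, the seed-count rule (zero, one, or one per stream instance), and the prepareStream helper are assumptions for illustration only; the PR's actual Validate() may check different things.

```go
package main

import "fmt"

// Stream is a hypothetical, simplified version of the stream spec.
type Stream struct {
	StreamCount int
	Seeds       []int64
}

// Validate holds all spec checks in one place so callers do not repeat them.
// The seed-count rule below is an assumption, not taken from the PR.
func (s *Stream) Validate() error {
	if s.StreamCount <= 0 {
		return fmt.Errorf("stream_count must be positive, got %d", s.StreamCount)
	}
	if len(s.Seeds) > 1 && len(s.Seeds) != s.StreamCount {
		return fmt.Errorf("expected 0, 1 or %d seeds, got %d", s.StreamCount, len(s.Seeds))
	}
	return nil
}

// prepareStream is a hypothetical caller: it delegates to Validate instead of
// re-implementing the same conditions inline.
func prepareStream(s *Stream) error {
	if err := s.Validate(); err != nil {
		return fmt.Errorf("invalid stream spec: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(prepareStream(&Stream{StreamCount: 0}))
	fmt.Println(prepareStream(&Stream{StreamCount: 2, Seeds: []int64{1, 2}}))
}
```

With this shape, a new rule only has to be added to Validate(), and every call site picks it up.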
Add NoRandomDuplicates and Custom Stream Seeding Features
NoRandomDuplicates:
Prevents duplicate query selection during random execution until all queries have been executed at least once.
When true: Creates shuffled sequences of all queries, executing each query once before repeating any.
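The no-duplicates behaviour described above can be sketched roughly as follows. This is not the PR's implementation; shuffledPicker and its methods are hypothetical names, and the sketch assumes queries are addressed by index and that a fresh permutation is drawn whenever the previous one is used up.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffledPicker (hypothetical) returns query indices in random order and
// never repeats an index until every index has been returned once.
type shuffledPicker struct {
	rng   *rand.Rand
	n     int
	order []int // indices still to be handed out in the current pass
}

func newShuffledPicker(n int, seed int64) *shuffledPicker {
	return &shuffledPicker{rng: rand.New(rand.NewSource(seed)), n: n}
}

// Next returns the next query index, or -1 if there are no queries.
// When the current pass is exhausted, a new permutation is drawn.
func (p *shuffledPicker) Next() int {
	if p.n <= 0 {
		return -1
	}
	if len(p.order) == 0 {
		p.order = p.rng.Perm(p.n)
	}
	idx := p.order[0]
	p.order = p.order[1:]
	return idx
}

func main() {
	p := newShuffledPicker(4, 42)
	// The first four picks cover indices 0..3 exactly once, then a new pass starts.
	for i := 0; i < 8; i++ {
		fmt.Print(p.Next(), " ")
	}
	fmt.Println()
}
```

Because the generator is seeded, two pickers created with the same seed produce the same sequence, which keeps a run reproducible.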
Custom Streams and Seeding:
Purpose: Enables deterministic random behavior across stream instances while maintaining reproducibility.
Individual Stream Seeds:
Base Stream Seed (applied to all instances with offset for each one):
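As a rough illustration of the two seeding modes (this is not the configuration example from the PR), one seed per stream instance could be resolved as below. The streamSeeds helper is hypothetical, and the offset scheme (base seed plus the instance index) is an assumption; the PR may derive offsets differently.

```go
package main

import "fmt"

// streamSeeds (hypothetical) resolves one seed per stream instance.
// If explicit holds exactly one seed per instance, those seeds are used as-is.
// Otherwise each instance gets base plus its index (assumed offset scheme).
func streamSeeds(streamCount int, explicit []int64, base int64) []int64 {
	seeds := make([]int64, streamCount)
	for i := range seeds {
		if len(explicit) == streamCount {
			seeds[i] = explicit[i]
		} else {
			seeds[i] = base + int64(i)
		}
	}
	return seeds
}

func main() {
	fmt.Println(streamSeeds(3, nil, 100))           // prints [100 101 102]
	fmt.Println(streamSeeds(2, []int64{7, 9}, 100)) // prints [7 9]
}
```

Either way, every instance ends up with a known seed that can be recorded alongside its queries, so a given stream can be replayed later.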