Conversation

@allenshen13 (Member)

Add NoRandomDuplicates and Custom Stream Seeding Features

NoRandomDuplicates:

Prevents duplicate query selection during random execution until every query has been executed at least once.

When true, the runner creates shuffled sequences of all queries, executing each query once before repeating any.

{
  "random_execution": true,
  "randomly_execute_until": "20",
  "no_random_duplicates": true,
  "query_files": [
    "../queries/complex-1.sql",
    "../queries/complex-2.sql",
    "../queries/complex-3_ordered.sql"
  ]
}
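The no-duplicates behavior described above can be sketched as a selector that reshuffles the query indices whenever a pass is exhausted. This is a minimal illustration of the technique, not pbench's actual implementation; all names here are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// noDupSelector yields query indices in random order, guaranteeing each
// query is selected once before any query repeats (illustrative sketch of
// the no_random_duplicates behavior).
type noDupSelector struct {
	rng   *rand.Rand
	n     int
	order []int // remaining shuffled indices for the current pass
}

func newNoDupSelector(n int, seed int64) *noDupSelector {
	return &noDupSelector{rng: rand.New(rand.NewSource(seed)), n: n}
}

func (s *noDupSelector) next() int {
	if len(s.order) == 0 {
		// Start a new pass: reshuffle all query indices.
		s.order = s.rng.Perm(s.n)
	}
	idx := s.order[0]
	s.order = s.order[1:]
	return idx
}

func main() {
	sel := newNoDupSelector(3, 42)
	for i := 0; i < 6; i++ {
		fmt.Println(sel.next())
	}
}
```

Within each pass of three picks, every index appears exactly once; duplicates can only occur across pass boundaries.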

Custom Streams and Seeding:

Purpose: Enables deterministic random behavior across stream instances while maintaining reproducibility.

Individual Stream Seeds:

{
  "streams": [
    {
      "stream_name": "query_stream.json",
      "stream_count": 3,
      "seeds": [12345, 67890, 11111]
    }
  ]
}

Base Stream Seed (a single seed applied to all instances, with a distinct offset for each one):

{
  "streams": [
    {
      "stream_name": "query_stream.json", 
      "stream_count": 3,
      "seeds": [54321]
    }
  ]
}
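The two seeding modes above can be sketched as a single lookup: a full seeds list gives each instance its own seed, while a single entry acts as a base seed with a per-instance offset. This is a hypothetical sketch assuming a simple +i offset; the actual offset scheme may differ.

```go
package main

import "fmt"

// resolveSeed returns the seed for stream instance i (0-based). With one
// configured seed it is treated as a base seed plus a per-instance offset
// (assumed here to be +i); with a full list, each instance gets its own
// entry. Illustrative only, not pbench's actual code.
func resolveSeed(seeds []int64, i int) int64 {
	if len(seeds) == 1 {
		return seeds[0] + int64(i)
	}
	return seeds[i]
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(resolveSeed([]int64{54321}, i)) // 54321, 54322, 54323
	}
	fmt.Println(resolveSeed([]int64{12345, 67890, 11111}, 1)) // 67890
}
```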

@allenshen13 allenshen13 added the enhancement New feature or request label Oct 15, 2025
Collaborator

@ethanyzhang left a comment

I do not quite like that we introduce a new way to add vertices to the execution graph. The core need here is to let one existing vertex in the graph have multiple streams with different random seeds. What you can probably do is remove the StreamPath and keep the StreamCount and Seeds. For queries, always use the stage's queries.

func (m *MySQLRunRecorder) RecordQuery(_ context.Context, s *Stage, result *QueryResult) {
	recordNewQuery := `INSERT INTO pbench_queries (run_id, stage_id, query_file, query_index, query_id, sequence_no,
-		cold_run, succeeded, start_time, end_time, row_count, expected_row_count, duration_ms, info_url) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
+		cold_run, succeeded, start_time, end_time, row_count, expected_row_count, duration_ms, info_url, seed) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
Collaborator
Please update pbench_queries_ddl.sql with the new column

Collaborator

And why do we need a seed in this table?

Member Author

There are two tables, pbench_runs and pbench_queries. Currently pbench_runs has a seed column, but now that each stream can be seeded individually, a single run can involve multiple seeds, so there had to be some way to report all of them. The simplest way I found was to add a seed column to pbench_queries and group by stage_id to present it in Grafana.

Member Author
I was looking for other ways to report seed per stream, please let me know if you have any suggestions.

)

// Streams defines the configuration for stream-based execution
type Streams struct {
Collaborator
Suggested change:

-type Streams struct {
+type Stream struct {

This is not a collection.

return fmt.Errorf("stream_count must be positive, got %d for stream %s", spec.StreamCount, spec.StreamPath)
}

if len(spec.Seeds) > 0 {
Collaborator
Why not use your Validate() method? The code seems duplicated.
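One way to address this comment is to consolidate the scattered checks into a single Validate method on the stream config. The sketch below is a hypothetical illustration under assumed field names; the real method may check more conditions.

```go
package main

import "fmt"

// Stream mirrors the stream config under discussion (illustrative fields).
type Stream struct {
	StreamCount int
	Seeds       []int64
}

// Validate gathers the config checks in one place, so callers do not
// duplicate them inline. Error messages here are illustrative.
func (s *Stream) Validate() error {
	if s.StreamCount <= 0 {
		return fmt.Errorf("stream_count must be positive, got %d", s.StreamCount)
	}
	if n := len(s.Seeds); n > 1 && n != s.StreamCount {
		return fmt.Errorf("seeds must have 1 or stream_count entries, got %d", n)
	}
	return nil
}

func main() {
	s := Stream{StreamCount: 3, Seeds: []int64{1, 2, 3}}
	fmt.Println(s.Validate()) // <nil>
}
```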
