Functionality for noRandomDuplicates parameter and custom Stream options with seeding #62

allenshen13 · 2025-09-30T23:56:27Z

Add NoRandomDuplicates and Custom Stream Seeding Features

NoRandomDuplicates:

Prevents duplicate query selection during random execution until all queries have been executed at least once.

When true: Creates shuffled sequences of all queries, executing each query once before repeating any.

{
  "random_execution": true,
  "randomly_execute_until": "20",
  "no_random_duplicates": true,
  "query_files": [
    "../queries/complex-1.sql",
    "../queries/complex-2.sql",
    "../queries/complex-3_ordered.sql"
  ]
}
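The shuffled-round behavior described above can be sketched in Go as follows. This is a minimal illustration, not the code in this PR: `noDuplicatePicker`, `newNoDuplicatePicker`, and `Next` are hypothetical names, and the sketch assumes queries are addressed by their index in `query_files`.

```go
package main

import "math/rand"

// noDuplicatePicker hands out query indexes in shuffled rounds: every index
// appears exactly once per round before any index can repeat.
// Hypothetical type for illustration only; it is not taken from the PR.
type noDuplicatePicker struct {
	rng   *rand.Rand
	n     int   // number of query files; must be positive
	order []int // indexes still to be handed out in the current round
}

// newNoDuplicatePicker builds a picker over n queries with a fixed seed,
// so the same seed always yields the same sequence of rounds.
func newNoDuplicatePicker(n int, seed int64) *noDuplicatePicker {
	return &noDuplicatePicker{rng: rand.New(rand.NewSource(seed)), n: n}
}

// Next returns the next query index, starting a freshly shuffled round
// whenever the current one is exhausted.
func (p *noDuplicatePicker) Next() int {
	if len(p.order) == 0 {
		p.order = p.rng.Perm(p.n) // random permutation of 0..n-1
	}
	idx := p.order[0]
	p.order = p.order[1:]
	return idx
}
```

With the three query files above, and if `randomly_execute_until` is read as an execution count of 20, calling `Next()` 20 times would cover six complete rounds plus two queries of a seventh.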

Custom Streams and Seeding:

Purpose: Makes random behavior deterministic for each stream instance, so that runs are reproducible.

Individual Stream Seeds:

{
  "streams": [
    {
      "stream_name": "query_stream.json",
      "stream_count": 3,
      "seeds": [12345, 67890, 11111]
    }
  ]
}

Base Stream Seed (applied to all instances with offset for each one):

{
  "streams": [
    {
      "stream_name": "query_stream.json", 
      "stream_count": 3,
      "seeds": [54321]
    }
  ]
}
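One way to read the two seeding modes above is sketched below. The function name `resolveSeeds` is hypothetical, and two details are assumptions rather than facts from this PR: that the per-instance offset is simply the instance index, and that an empty `seeds` list means the caller falls back to unseeded randomness.

```go
package main

import "fmt"

// resolveSeeds returns one seed per stream instance.
//   - len(seeds) == streamCount: each instance uses its own listed seed.
//   - len(seeds) == 1: the single value is a base seed; instance i gets
//     base + i (ASSUMPTION: the PR only says "offset for each one").
//   - len(seeds) == 0: returns nil so the caller can pick seeds itself
//     (ASSUMPTION: seeds are optional).
func resolveSeeds(streamCount int, seeds []int64) ([]int64, error) {
	switch {
	case streamCount <= 0:
		return nil, fmt.Errorf("stream_count must be positive, got %d", streamCount)
	case len(seeds) == 0:
		return nil, nil
	case len(seeds) == streamCount:
		return append([]int64(nil), seeds...), nil // copy, do not alias the config
	case len(seeds) == 1:
		out := make([]int64, streamCount)
		for i := range out {
			out[i] = seeds[0] + int64(i)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("seeds must have 1 or %d entries, got %d", streamCount, len(seeds))
	}
}
```

Under these assumptions, the base-seed example above (`"seeds": [54321]` with `"stream_count": 3`) would resolve to 54321, 54322, and 54323.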


ethanyzhang

I do not like that this introduces a new way to add vertices to the execution graph. The core need here is to let one existing vertex in the graph have multiple streams with different random seeds. What you can probably do is remove the StreamPath and keep the StreamCount and Seeds. For queries, always use the stage's queries.
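A rough sketch of that shape, for illustration only: the `stage` type below is a hypothetical, pared-down stand-in for the real Stage, and it assumes `Seeds` already holds one entry per stream. Each stream is a goroutine over the same stage's queries with its own seeded generator, so no extra vertex is needed.

```go
package main

import (
	"math/rand"
	"sync"
)

// stage is a hypothetical stand-in for the real Stage type: StreamCount and
// Seeds live on the stage itself, and there is no separate stream path.
type stage struct {
	Queries     []string
	StreamCount int
	Seeds       []int64 // assumed to hold one seed per stream in this sketch
}

// runStreams starts StreamCount goroutines over the stage's own queries.
// Each stream owns a private *rand.Rand, so its order depends only on its seed.
// It returns the order in which each stream visited the queries.
func (s *stage) runStreams() [][]string {
	orders := make([][]string, s.StreamCount)
	var wg sync.WaitGroup
	for i := 0; i < s.StreamCount; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			rng := rand.New(rand.NewSource(s.Seeds[i]))
			order := make([]string, 0, len(s.Queries))
			for _, idx := range rng.Perm(len(s.Queries)) {
				// A real stream would execute the query here.
				order = append(order, s.Queries[idx])
			}
			orders[i] = order // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return orders
}
```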

ethanyzhang · 2025-10-24T12:41:03Z

stage/mysql_run_recorder.go

 func (m *MySQLRunRecorder) RecordQuery(_ context.Context, s *Stage, result *QueryResult) {
 	recordNewQuery := `INSERT INTO pbench_queries (run_id, stage_id, query_file, query_index, query_id, sequence_no,
-cold_run, succeeded, start_time, end_time, row_count, expected_row_count, duration_ms, info_url) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
+cold_run, succeeded, start_time, end_time, row_count, expected_row_count, duration_ms, info_url, seed) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`


Please update pbench_queries_ddl.sql with the new column

And why do we need a seed in this table?

There are two tables, pbench_runs and pbench_queries. Currently pbench_runs has a seed column, but with this additional functionality to be able to seed each stream, there had to be some way to add reporting for multiple seeds in a run. The simplest way I found to do it was to add 'seed' as a column to pbench_queries and group by stage_id to be able to present it in Grafana.

I was looking for other ways to report seed per stream, please let me know if you have any suggestions.

cmd/round/main.go

ethanyzhang · 2025-10-25T12:42:27Z

stage/streams.go

+)
+
+// Streams defines the configuration for stream-based execution
+type Streams struct {


Suggested change

-type Streams struct {
+type Stream struct {

This is not a collection

ethanyzhang · 2025-10-25T12:42:56Z

stage/map.go

+			return fmt.Errorf("stream_count must be positive, got %d for stream %s", spec.StreamCount, spec.StreamPath)
+		}
+
+		if len(spec.Seeds) > 0 {


Why not use your Validate() method? The code seems duplicated.
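A sketch of what that could look like, with the checks kept in one place and the map-building code delegating to them. The field names come from the diff above; the seed-count rule and the `validateStreams` helper are assumptions for illustration, not the PR's actual code.

```go
package main

import "fmt"

// Stream carries the fields visible in the diff. The exact field types are
// assumed for this sketch.
type Stream struct {
	StreamPath  string
	StreamCount int
	Seeds       []int64
}

// Validate is the single home for all stream checks.
func (s *Stream) Validate() error {
	if s.StreamCount <= 0 {
		return fmt.Errorf("stream_count must be positive, got %d for stream %s", s.StreamCount, s.StreamPath)
	}
	// ASSUMPTION: seeds may be omitted, given once as a base seed, or given
	// once per instance; any other length is rejected.
	if n := len(s.Seeds); n > 1 && n != s.StreamCount {
		return fmt.Errorf("stream %s has %d seeds for %d instances", s.StreamPath, n, s.StreamCount)
	}
	return nil
}

// validateStreams is a hypothetical stand-in for the loop in stage/map.go:
// it calls Validate instead of repeating the same checks inline.
func validateStreams(specs []Stream) error {
	for i := range specs {
		if err := specs[i].Validate(); err != nil {
			return err
		}
	}
	return nil
}
```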

Functionality for noRandomDuplicates parameter and custom Stream options with seeding

d1a5c83

allenshen13 requested a review from ethanyzhang as a code owner September 30, 2025 23:56

allenshen13 added 2 commits October 8, 2025 14:19

Fixed functionality for custom seeds in streams.

de56042

Fixes to stream indexing and naming conventions

2f057f8

allenshen13 added the enhancement New feature or request label Oct 15, 2025

allenshen13 added 2 commits October 16, 2025 16:22

Seed recording

ae8c95d

Increase bufio Writer size

9b38245

ethanyzhang requested changes Oct 27, 2025
