binlog parallel parsing

In some use cases parsing the binlog is the bottleneck, for example when syncing from a position several hours or days in the past after dumping a large table that we are not interested in. For us, production throughput is around 40,000 to 50,000 records per second, and that saturates one CPU core. If we could parse the binlog in parallel, I think much higher throughput could be reached in this scenario.

To make this possible, we have to break the sequential assumption (within one input stream) that currently holds from the input to the sliding window. One possible solution is to add a `prepare` step before `submit` in the scheduler. Sequence-sensitive logic such as ID allocation should be done before `prepare`; parsing can then run in parallel, and the result is finally submitted as before.
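The proposed flow could be sketched roughly as follows. This is only an illustration of the ordering idea, not the project's scheduler: the names `Prepared`, `prepare`, `parse`, and `run` are hypothetical, the "parsing" is a stand-in for real binlog decoding, and a plain list stands in for `submit`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import count


@dataclass
class Prepared:
    seq: int    # ID allocated sequentially, before any parallel work
    raw: bytes  # unparsed event payload (hypothetical representation)


def prepare(raw_events, ids):
    """Sequential step: allocate IDs strictly in input order."""
    return [Prepared(next(ids), raw) for raw in raw_events]


def parse(item):
    """Stand-in for the expensive parsing; safe to run concurrently
    because it touches no shared, order-sensitive state."""
    return (item.seq, item.raw.decode("utf-8").upper())


def run(raw_events, workers=4):
    prepared = prepare(raw_events, count(1))
    submitted = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, even when
        # workers finish out of order, so submission stays sequential.
        for parsed in pool.map(parse, prepared):
            submitted.append(parsed)  # stand-in for `submit`
    return submitted
```

Calling `run([b"a", b"b", b"c"])` submits the events in their original order with IDs 1, 2, 3. Note that Python threads do not give real CPU parallelism for pure-Python parsing; the sketch only shows how sequence-sensitive work can be separated from the parallel part.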

  

binlog parallel parsing #270
