Blockgen job fails to clean up failed reduce attempts #56

@xkrogen

Description

The block generation job has custom output logic to allow each reducer to output to multiple block files.

When speculative execution is enabled, two attempts of the same reducer can each generate a copy of the same block file (one of which may be incomplete). A workaround is to set mapreduce.reduce.speculative = false.
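
For illustration, a minimal sketch of applying that workaround at job-setup time; the BlockGenDriver class and job name are hypothetical, and only the mapreduce.reduce.speculative key comes from this issue:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver snippet: disables speculative reduce attempts so
// that only a single attempt ever writes each block file.
public class BlockGenDriver {
  public static Job createBlockGenJob() throws IOException {
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.reduce.speculative", false);
    return Job.getInstance(conf, "blockgen");
  }
}
```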

When a reducer attempt fails, the partial output files will not be cleaned up. I'm not aware of an easy workaround for this beyond manually cleaning up the files after the job completes.
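
Until proper cleanup exists, a manual post-job sweep might look roughly like the sketch below. This issue does not say how failed-attempt files are named, so the name-based filter used to identify them is purely an assumption for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical post-job cleanup of leftover files from failed reduce
// attempts. The filter below is an assumption; the actual partial-file
// naming is not specified in this issue.
public class PartialBlockFileCleaner {
  public static void cleanUp(Path outputDir, Configuration conf) throws IOException {
    FileSystem fs = outputDir.getFileSystem(conf);
    for (FileStatus status : fs.listStatus(outputDir)) {
      if (status.getPath().getName().contains("attempt_")) { // assumed marker
        fs.delete(status.getPath(), true);
      }
    }
  }
}
```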

We should have each reducer write to a staging directory and move the output files into their final location only once it completes successfully.
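
A sketch of what that might look like in the reducer: write block files under a per-attempt staging directory, then promote them in cleanup(), which only runs after reduce() has finished without error. The key/value types, the blockgen.output.dir key, and the _staging layout are all illustrative assumptions, not the project's actual code:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer demonstrating the staging-directory approach;
// the real blockgen reducer's types and output logic will differ.
public class StagedBlockGenReducer extends Reducer<Text, Text, NullWritable, NullWritable> {
  private FileSystem fs;
  private Path finalDir;
  private Path stagingDir;

  @Override
  protected void setup(Context context) throws IOException {
    fs = FileSystem.get(context.getConfiguration());
    finalDir = new Path(context.getConfiguration().get("blockgen.output.dir")); // assumed key
    // One staging directory per task attempt, so a failed or speculative
    // attempt never writes into the committed output location.
    stagingDir = new Path(finalDir, "_staging/" + context.getTaskAttemptID());
    fs.mkdirs(stagingDir);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException {
    // Write each block file under stagingDir instead of finalDir here.
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // Reached only if reduce() completed without throwing: move the
    // finished block files into place, then drop the staging directory.
    for (FileStatus status : fs.listStatus(stagingDir)) {
      fs.rename(status.getPath(), new Path(finalDir, status.getPath().getName()));
    }
    fs.delete(stagingDir, true);
  }
}
```

A failed attempt would still leave its staging directory behind, but isolated under _staging it can be removed safely without touching committed output.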
