find_unique_reads.py keeps lower-scoring duplicates instead of highest MAPQ #293

@HDash

Description of the bug

There is a logic error in find_unique_reads.py where duplicate read1s are not filtered correctly.

Expected behaviour:
When multiple alignments share the same read1, only the alignment with the highest MAPQ score should be retained.

Observed behaviour:
In cases of duplicates, the alignment with the lowest MAPQ is kept instead.

Cause:
In lines 91–92:

elif alignments[unique_id][1] > row[4]:
    alignments[unique_id] = [row[3], row[4]]

  • alignments[unique_id][1] is the stored MAPQ.
  • row[4] is the new alignment’s MAPQ.
  • The current comparison replaces the stored alignment if the stored MAPQ is higher, effectively keeping the lower-scoring alignment.
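
To make the effect concrete, here is a minimal sketch that replays the quoted comparison on three duplicate alignments. Only the elif branch comes from the script; the surrounding loop, the initial `if` branch, the use of row[0] as the read1 identifier, the integer MAPQ values, and the function name are assumptions made for illustration.

```python
# Hypothetical reconstruction: only the elif comparison is quoted from
# find_unique_reads.py; everything else here is assumed for illustration.
def keep_with_current_logic(rows):
    alignments = {}
    for row in rows:
        unique_id = row[0]  # assumed: read1 identifier
        if unique_id not in alignments:
            # First time this read1 is seen: store it unconditionally.
            alignments[unique_id] = [row[3], row[4]]
        elif alignments[unique_id][1] > row[4]:
            # Comparison as quoted: replaces when the STORED MAPQ is higher.
            alignments[unique_id] = [row[3], row[4]]
    return alignments


# Three alignments for the same read1 with MAPQ 60, 10 and 30.
rows = [
    ["read1_A", "chr1", 100, "aln_1", 60],
    ["read1_A", "chr1", 200, "aln_2", 10],
    ["read1_A", "chr1", 300, "aln_3", 30],
]

buggy_result = keep_with_current_logic(rows)
print(buggy_result)  # the MAPQ-10 alignment survives, not the MAPQ-60 one
```

With this input the stored entry ends up as the MAPQ 10 alignment, which matches the observed behaviour.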

Fix:
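The comparison should be reversed (< instead of >), so that the stored alignment is replaced only when the new alignment has a higher MAPQ. This ensures the alignment with the highest MAPQ is kept.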
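
A possible shape for the corrected logic is sketched below. It is not the script's actual code: the function name, the loop, and the use of row[0] as the read1 identifier are assumptions. The int() cast is also an assumption; it only matters if MAPQ is read from a text file as a string, in which case a plain comparison would be lexicographic (for example "9" > "60").

```python
# Sketch of the reversed comparison; names and row layout are assumed,
# not taken from find_unique_reads.py.
def keep_highest_mapq(rows):
    alignments = {}
    for row in rows:
        unique_id = row[0]   # assumed: read1 identifier
        mapq = int(row[4])   # assumed cast, guards against string comparison
        # Store the row if the read1 is new or the new MAPQ is strictly higher.
        if unique_id not in alignments or mapq > alignments[unique_id][1]:
            alignments[unique_id] = [row[3], mapq]
    return alignments


# Same read1 three times; MAPQ given as strings, as if parsed from a file.
example_rows = [
    ["read1_A", "chr1", "100", "aln_1", "9"],
    ["read1_A", "chr1", "200", "aln_2", "60"],
    ["read1_A", "chr1", "300", "aln_3", "30"],
]

fixed_result = keep_highest_mapq(example_rows)
print(fixed_result)
```

Using a strict > keeps the first alignment seen when two duplicates tie on MAPQ; whether ties should be resolved differently is not covered by this issue.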

Command used and terminal output

NA

Relevant files

NA

System information

NA

Labels

bug
