[BARNA-206] Duplicated reads have no unique ids - Jira

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.14
Fix Version/s: Simulator 1.1 (API 1.15)
Component/s: Simulator
Labels:
- git-branch-develop

Description

Some reads are identical in all respects and occur twice (in both the fastq and the bed file, of course). Here is an example of headers from a simulated fastq:

@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:S/2
@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:A/1
@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:S/2
@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:A/1

Read ids in a fastq file are supposed to be unique. I am wondering how software usually handles this (does it just skip the second occurrence?) and, hence, what the impact on the reads is after mapping.

I ran a test on a fastq generated directly by flux-simulator that contained about 14 million reads in total and found 2 * 489353 = approx. 1 million duplicated reads. I have removed them for now. Reads may legitimately occur more than once (due to amplification or other factors, or by chance, which is highly unlikely here), but in those cases the read ids must still be changed somehow. Right?
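For anyone who wants to reproduce the check, here is a minimal sketch in Python of how duplicated ids could be counted. It is not the script used for the numbers above. The function name `count_duplicate_ids` is made up for illustration, the sequence and quality lines are placeholders, and the code assumes strict 4-line fastq records.

```python
from collections import Counter
from typing import Dict, Iterable


def count_duplicate_ids(fastq_lines: Iterable[str]) -> Dict[str, int]:
    """Return {read_id: occurrences} for every id seen more than once.

    Assumes strict 4-line fastq records (header, sequence, '+', quality).
    """
    counts: Counter = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 == 0:  # first line of each record is the header
            counts[line.rstrip("\r\n")] += 1
    return {read_id: n for read_id, n in counts.items() if n > 1}


# The four headers from this report; sequence/quality are placeholders.
headers = [
    "@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:S/2",
    "@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:A/1",
    "@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:S/2",
    "@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:A/1",
]
example = []
for header in headers:
    example.extend([header, "ACGT", "+", "IIII"])

duplicates = count_duplicate_ids(example)
for read_id, n in duplicates.items():
    print(n, read_id)
```

On a real file the same function can be fed an open file handle instead of a list. All ids are held in memory, which should be workable for a file of about 14 million reads but may not scale much further.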

People

Assignee: Thasso Griebel

Reporter: Thasso Griebel

Votes: 0

Watchers: 4

Dates

Created: 02/Aug/12 11:30 AM

Updated: 09/Oct/15 4:34 AM

Resolved: 02/Aug/12 2:02 PM