Barna Package / BARNA-206

Duplicated reads have no unique IDs


Details

    Description

      Some reads are identical in all respects and occur twice (in both the FASTQ and the BED file, of course). Here is an example of the headers from a simulated FASTQ file:

      @Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:S/2
      @Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:A/1
      @Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:S/2
      @Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872:A/1
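Repeated headers like the ones above can be counted with a short script. This is a minimal Python sketch, not part of flux-simulator. It assumes standard 4-line FASTQ records (sequence and quality not wrapped) and takes the read ID to be the header text after `@` up to the first whitespace. The function names are hypothetical, and the reads in the example are toy data.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, "+" line, quality string).
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes unwrapped sequence/quality lines)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate stray blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a '@' header line, got {header!r}")
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header, seq, plus, qual


def read_id(header: str) -> str:
    """Read ID = header text after '@', up to the first whitespace."""
    parts = header[1:].split(maxsplit=1)
    return parts[0] if parts else ""


def duplicated_ids(lines: Iterable[str]) -> Dict[str, int]:
    """Map every read ID that occurs more than once to its number of occurrences."""
    counts = Counter(read_id(rec[0]) for rec in read_fastq(lines))
    return {rid: n for rid, n in counts.items() if n > 1}


# Toy input: read "r1/1" appears twice, "r2/1" once.
toy = ["@r1/1\n", "ACGT\n", "+\n", "IIII\n",
       "@r1/1\n", "ACGT\n", "+\n", "IIII\n",
       "@r2/1\n", "GGCC\n", "+\n", "IIII\n"]
print(duplicated_ids(toy))  # {'r1/1': 2}
```

For a real file, pass an open handle instead, e.g. `with open(path) as fh: dups = duplicated_ids(fh)`, where `path` is a placeholder. Counting IDs rather than whole records also catches repeated IDs whose sequences differ.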
      

      FASTQ files are supposed to have unique read IDs. I am wondering how tools usually handle this (do they just skip the second occurrence?) and, hence, what the impact is on the reads after mapping.

      I ran a test on a FASTQ file generated directly by flux-simulator, which contained about 14 million reads in total, and found 2 × 489,353 = 978,706 (roughly 1 million) duplicated reads. I have removed them for now. Reads can of course occur more than once (through amplification or other factors, or by chance, although chance is highly unlikely here), but in those cases the read IDs must still be made distinct somehow. Right?
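The two remedies discussed above can be sketched as follows: dropping exact duplicates (what was done here), or keeping the reads and giving repeats distinct IDs. This is a minimal Python sketch, not flux-simulator functionality. The `_dup<k>` naming scheme and the function names are hypothetical choices. It assumes headers carry no whitespace-separated description, which holds for the example headers. Each header is renamed independently, so the two mates of a repeated fragment get the same counter only when both repeat the same number of times, as in the example. The sketch also does not check whether a generated name collides with an existing ID. Sequences and qualities in the example are toy data.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, "+" line, quality string).
Record = Tuple[str, str, str, str]

# Optional trailing mate marker, e.g. the "/1" or "/2" in the example headers.
MATE_SUFFIX = re.compile(r"/[12]$")


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes unwrapped sequence/quality lines)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate stray blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header, seq, plus, qual


def drop_exact_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first copy of each fully identical record; skip later copies."""
    seen = set()
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            yield rec


def make_ids_unique(records: Iterable[Record], tag: str = "_dup") -> Iterator[Record]:
    """Rename the k-th repeat (k >= 1) of a header by inserting tag + k before /1 or /2."""
    seen: Counter = Counter()
    for header, seq, plus, qual in records:
        k = seen[header]  # occurrences of this exact header so far
        seen[header] += 1
        if k:
            m = MATE_SUFFIX.search(header)
            cut = m.start() if m else len(header)
            header = f"{header[:cut]}{tag}{k}{header[cut:]}"
            if plus != "+":
                plus = "+" + header[1:]  # keep a repeated ID on the '+' line in sync
        yield header, seq, plus, qual


def to_fastq(records: Iterable[Record]) -> str:
    """Serialise records back to 4-line FASTQ text."""
    return "".join(f"{h}\n{s}\n{p}\n{q}\n" for h, s, p, q in records)


# Toy input mirroring the example: the S/2 and A/1 mates each appear twice.
H = "@Chr1:8017397-8021712W:AT1G22660.1:541:2005:723:872"
toy = []
for mate, seq in [("S/2", "ACGT"), ("A/1", "TTGC")] * 2:
    toy += [f"{H}:{mate}\n", f"{seq}\n", "+\n", "IIII\n"]

renamed = list(make_ids_unique(read_fastq(toy)))
print([h.rsplit(":", 1)[1] for h, _, _, _ in renamed])
# ['S/2', 'A/1', 'S_dup1/2', 'A_dup1/1']
print(len(list(drop_exact_duplicates(read_fastq(toy)))))  # 2
```

Inserting the counter before the mate marker, rather than appending it, keeps the trailing `/1` or `/2` intact for tools that rely on it to pair mates.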

          People

            Thasso Griebel
            Votes: 0
            Watchers: 4