Genozip's Deep method uses the BAM file's alignment data to "predict" the contents of the FASTQ files. Only the differences between the prediction and the actual FASTQ data (the residual information) are stored.
This effectively "hides" the FASTQ data within the BAM file's footprint, potentially halving the storage costs for genomic repositories.
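The predict-and-store-residuals idea described above can be illustrated with a toy sketch. This is not Genozip's actual algorithm or file format. The function names (`predict_fastq_seq`, `encode_residual`, `decode_residual`) are hypothetical, and the sketch assumes the only differences between a BAM record and its FASTQ read are strand orientation and a few substituted bases. Real data also involves trimming, clipping, and quality scores, all of which are ignored here.

```python
# Toy illustration only: hypothetical helpers, not Genozip's implementation.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def predict_fastq_seq(bam_seq: str, is_reverse: bool) -> str:
    """Predict the original FASTQ sequence from a BAM record's sequence.

    BAM stores reads aligned to the reverse strand as their reverse
    complement, so the prediction undoes that transformation.
    """
    if is_reverse:
        return bam_seq.translate(_COMPLEMENT)[::-1]
    return bam_seq


def encode_residual(predicted: str, actual: str) -> list[tuple[int, str]]:
    """Return (position, base) pairs where the actual read differs."""
    if len(predicted) != len(actual):
        raise ValueError("this sketch assumes equal-length sequences")
    return [(i, b) for i, (a, b) in enumerate(zip(predicted, actual)) if a != b]


def decode_residual(predicted: str, residual: list[tuple[int, str]]) -> str:
    """Rebuild the actual read from the prediction plus the stored residual."""
    bases = list(predicted)
    for position, base in residual:
        bases[position] = base
    return "".join(bases)


if __name__ == "__main__":
    predicted = predict_fastq_seq("AACGT", is_reverse=True)
    residual = encode_residual(predicted, "ACGNT")
    print(predicted, residual, decode_residual(predicted, residual))
```

When the prediction matches the FASTQ read exactly, the residual is an empty list, which is why the FASTQ data adds so little to the compressed size in this simplified picture.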
Deep FASTQ and BAM co-compression in Genozip 15 - bioRxiv
Because BAM files originate from FASTQ files, they contain nearly identical sequence information, creating a massive redundancy that traditional compression (like GZIP or BGZF) cannot exploit, because it treats the two files as separate data.

The "Deep" Compression Method
FASTQ: containing raw sequence reads and quality scores.
Decompression is handled automatically by standard Genozip commands like genounzip (to restore the full set of files) or genocat (to extract a specific file).

Comparison: Deep vs. Standard ZIP
BAM: containing those same reads mapped to a reference genome.
The primary bottleneck in modern bioinformatics is the massive storage requirement for Next-Generation Sequencing (NGS) data. Standard workflows typically involve two main file types: