Why do I get so many insertions from Minimap2 on my Nanopore WGS?

I'm a starting my analysis on nanopore whole genome sequencing. I start my analysis from this popular Github.

The sample I downloaded was WGS for NA12878, so I would assume it's alignment to GRCh38 shouldn't be that bad.

But ... I'm getting lot's of insertions...

enter image description here

I'm not sure it was caused by the sequencing platform, the MiniMap aligner or my bugs.

I supply all my commands, starting from sample download.

wget https://s3.amazonaws.com/nanopore-human-wgs/rel5-guppy-0.3.0-chunk10k.fastq.gz

gunzip rel5-guppy-0.3.0-chunk10k.fastq.gz

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

minimap2 -d GRCh38_full_analysis_set_plus_decoy_hla.fa.index GRCh38_full_analysis_set_plus_decoy_hla.fa

minimap2 -ax map-ont -t 20 GRCh38_full_analysis_set_plus_decoy_hla.fa.index rel5-guppy-0.3.0-chunk10k.fastq | samtools view -bS - | samtools sort -@ 5 > rel5-guppy-0.3.0-chunk10k.bam 

samtools index rel5-guppy-0.3.0-chunk10k.bam

edited 16 hours ago

Daniel Standage

2,184328

asked yesterday

SmallChess

1,343622

add a comment |

I'm a starting my analysis on nanopore whole genome sequencing. I start my analysis from this popular Github.

The sample I downloaded was WGS for NA12878, so I would assume it's alignment to GRCh38 shouldn't be that bad.

But ... I'm getting lot's of insertions...

enter image description here

I'm not sure it was caused by the sequencing platform, the MiniMap aligner or my bugs.

I supply all my commands, starting from sample download.

wget https://s3.amazonaws.com/nanopore-human-wgs/rel5-guppy-0.3.0-chunk10k.fastq.gz

gunzip rel5-guppy-0.3.0-chunk10k.fastq.gz

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

minimap2 -d GRCh38_full_analysis_set_plus_decoy_hla.fa.index GRCh38_full_analysis_set_plus_decoy_hla.fa

minimap2 -ax map-ont -t 20 GRCh38_full_analysis_set_plus_decoy_hla.fa.index rel5-guppy-0.3.0-chunk10k.fastq | samtools view -bS - | samtools sort -@ 5 > rel5-guppy-0.3.0-chunk10k.bam 

samtools index rel5-guppy-0.3.0-chunk10k.bam

edited 16 hours ago

Daniel Standage

2,184328

asked yesterday

SmallChess

1,343622

add a comment |

I'm a starting my analysis on nanopore whole genome sequencing. I start my analysis from this popular Github.

The sample I downloaded was WGS for NA12878, so I would assume it's alignment to GRCh38 shouldn't be that bad.

But ... I'm getting lot's of insertions...

enter image description here

I'm not sure it was caused by the sequencing platform, the MiniMap aligner or my bugs.

I supply all my commands, starting from sample download.

wget https://s3.amazonaws.com/nanopore-human-wgs/rel5-guppy-0.3.0-chunk10k.fastq.gz

gunzip rel5-guppy-0.3.0-chunk10k.fastq.gz

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

minimap2 -d GRCh38_full_analysis_set_plus_decoy_hla.fa.index GRCh38_full_analysis_set_plus_decoy_hla.fa

minimap2 -ax map-ont -t 20 GRCh38_full_analysis_set_plus_decoy_hla.fa.index rel5-guppy-0.3.0-chunk10k.fastq | samtools view -bS - | samtools sort -@ 5 > rel5-guppy-0.3.0-chunk10k.bam 

samtools index rel5-guppy-0.3.0-chunk10k.bam

edited 16 hours ago

Daniel Standage

2,184328

asked yesterday

SmallChess

1,343622

I'm a starting my analysis on nanopore whole genome sequencing. I start my analysis from this popular Github.

The sample I downloaded was WGS for NA12878, so I would assume it's alignment to GRCh38 shouldn't be that bad.

But ... I'm getting lot's of insertions...

enter image description here

I'm not sure it was caused by the sequencing platform, the MiniMap aligner or my bugs.

I supply all my commands, starting from sample download.

wget https://s3.amazonaws.com/nanopore-human-wgs/rel5-guppy-0.3.0-chunk10k.fastq.gz

gunzip rel5-guppy-0.3.0-chunk10k.fastq.gz

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

minimap2 -d GRCh38_full_analysis_set_plus_decoy_hla.fa.index GRCh38_full_analysis_set_plus_decoy_hla.fa

minimap2 -ax map-ont -t 20 GRCh38_full_analysis_set_plus_decoy_hla.fa.index rel5-guppy-0.3.0-chunk10k.fastq | samtools view -bS - | samtools sort -@ 5 > rel5-guppy-0.3.0-chunk10k.bam 

samtools index rel5-guppy-0.3.0-chunk10k.bam

sequence-alignment sequencing nanopore errors

edited 16 hours ago

Daniel Standage

2,184328

asked yesterday

SmallChess

1,343622

edited 16 hours ago

Daniel Standage

2,184328

asked yesterday

SmallChess

1,343622

edited 16 hours ago

Daniel Standage

2,184328

edited 16 hours ago

Daniel Standage

2,184328

edited 16 hours ago

Daniel Standage

2,184328

asked yesterday

SmallChess

1,343622

asked yesterday

SmallChess

1,343622

asked yesterday

SmallChess

1,343622

add a comment |

1 Answer
1

active

oldest

votes

Insertions and deletions are the dominant error mode of long read sequencing, including nanopore sequencing. What you see is not unexpected. Things may have improved by now if you would download the raw fast5 data and repeat the basecalling. There is no need to gunzip the fastq.gz prior to alignment. Your commands for alignment look alright, except that (if you are using a recent version of samtools) you don't really need samtools view and can pipe directly to samtools sort, using -o to specify the output format:

minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam

In a recent version of igv you can right-click the alignment and chose a "Quick consensus mode" (or something like that), which will make your alignments at least a bit prettier. Note how the consensus sequence (top of the graph) is quite clean - while individual reads may be noisy this is mostly randomly distributed and as such ironed out in consensus.

answered 23 hours ago

Wouter De Coster

49716

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "676"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fbioinformatics.stackexchange.com%2fquestions%2f6980%2fwhy-do-i-get-so-many-insertions-from-minimap2-on-my-nanopore-wgs%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam

answered 23 hours ago

Wouter De Coster

49716

add a comment |

minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam

answered 23 hours ago

Wouter De Coster

49716

add a comment |

minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam

answered 23 hours ago

Wouter De Coster

49716

minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam

answered 23 hours ago

Wouter De Coster

49716

answered 23 hours ago

Wouter De Coster

49716

answered 23 hours ago

Wouter De Coster

49716

answered 23 hours ago

Wouter De Coster

49716

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Bioinformatics Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Argthtjtr