Why do I get so many insertions from Minimap2 on my Nanopore WGS?
$begingroup$
I'm a starting my analysis on nanopore whole genome sequencing. I start my analysis from this popular Github.
The sample I downloaded was WGS for NA12878, so I would assume it's alignment to GRCh38 shouldn't be that bad.
But ... I'm getting lot's of insertions...
I'm not sure it was caused by the sequencing platform, the MiniMap aligner or my bugs.
I supply all my commands, starting from sample download.
wget https://s3.amazonaws.com/nanopore-human-wgs/rel5-guppy-0.3.0-chunk10k.fastq.gz
gunzip rel5-guppy-0.3.0-chunk10k.fastq.gz
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
minimap2 -d GRCh38_full_analysis_set_plus_decoy_hla.fa.index GRCh38_full_analysis_set_plus_decoy_hla.fa
minimap2 -ax map-ont -t 20 GRCh38_full_analysis_set_plus_decoy_hla.fa.index rel5-guppy-0.3.0-chunk10k.fastq | samtools view -bS - | samtools sort -@ 5 > rel5-guppy-0.3.0-chunk10k.bam
samtools index rel5-guppy-0.3.0-chunk10k.bam
sequence-alignment sequencing nanopore errors
$endgroup$
add a comment |
$begingroup$
I'm a starting my analysis on nanopore whole genome sequencing. I start my analysis from this popular Github.
The sample I downloaded was WGS for NA12878, so I would assume it's alignment to GRCh38 shouldn't be that bad.
But ... I'm getting lot's of insertions...
I'm not sure it was caused by the sequencing platform, the MiniMap aligner or my bugs.
I supply all my commands, starting from sample download.
wget https://s3.amazonaws.com/nanopore-human-wgs/rel5-guppy-0.3.0-chunk10k.fastq.gz
gunzip rel5-guppy-0.3.0-chunk10k.fastq.gz
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
minimap2 -d GRCh38_full_analysis_set_plus_decoy_hla.fa.index GRCh38_full_analysis_set_plus_decoy_hla.fa
minimap2 -ax map-ont -t 20 GRCh38_full_analysis_set_plus_decoy_hla.fa.index rel5-guppy-0.3.0-chunk10k.fastq | samtools view -bS - | samtools sort -@ 5 > rel5-guppy-0.3.0-chunk10k.bam
samtools index rel5-guppy-0.3.0-chunk10k.bam
sequence-alignment sequencing nanopore errors
$endgroup$
add a comment |
$begingroup$
I'm a starting my analysis on nanopore whole genome sequencing. I start my analysis from this popular Github.
The sample I downloaded was WGS for NA12878, so I would assume it's alignment to GRCh38 shouldn't be that bad.
But ... I'm getting lot's of insertions...
I'm not sure it was caused by the sequencing platform, the MiniMap aligner or my bugs.
I supply all my commands, starting from sample download.
wget https://s3.amazonaws.com/nanopore-human-wgs/rel5-guppy-0.3.0-chunk10k.fastq.gz
gunzip rel5-guppy-0.3.0-chunk10k.fastq.gz
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
minimap2 -d GRCh38_full_analysis_set_plus_decoy_hla.fa.index GRCh38_full_analysis_set_plus_decoy_hla.fa
minimap2 -ax map-ont -t 20 GRCh38_full_analysis_set_plus_decoy_hla.fa.index rel5-guppy-0.3.0-chunk10k.fastq | samtools view -bS - | samtools sort -@ 5 > rel5-guppy-0.3.0-chunk10k.bam
samtools index rel5-guppy-0.3.0-chunk10k.bam
sequence-alignment sequencing nanopore errors
$endgroup$
I'm a starting my analysis on nanopore whole genome sequencing. I start my analysis from this popular Github.
The sample I downloaded was WGS for NA12878, so I would assume it's alignment to GRCh38 shouldn't be that bad.
But ... I'm getting lot's of insertions...
I'm not sure it was caused by the sequencing platform, the MiniMap aligner or my bugs.
I supply all my commands, starting from sample download.
wget https://s3.amazonaws.com/nanopore-human-wgs/rel5-guppy-0.3.0-chunk10k.fastq.gz
gunzip rel5-guppy-0.3.0-chunk10k.fastq.gz
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
minimap2 -d GRCh38_full_analysis_set_plus_decoy_hla.fa.index GRCh38_full_analysis_set_plus_decoy_hla.fa
minimap2 -ax map-ont -t 20 GRCh38_full_analysis_set_plus_decoy_hla.fa.index rel5-guppy-0.3.0-chunk10k.fastq | samtools view -bS - | samtools sort -@ 5 > rel5-guppy-0.3.0-chunk10k.bam
samtools index rel5-guppy-0.3.0-chunk10k.bam
sequence-alignment sequencing nanopore errors
sequence-alignment sequencing nanopore errors
edited 16 hours ago
Daniel Standage
2,184328
2,184328
asked yesterday
SmallChessSmallChess
1,343622
1,343622
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
Insertions and deletions are the dominant error mode of long read sequencing, including nanopore sequencing. What you see is not unexpected. Things may have improved by now if you would download the raw fast5 data and repeat the basecalling. There is no need to gunzip
the fastq.gz prior to alignment. Your commands for alignment look alright, except that (if you are using a recent version of samtools) you don't really need samtools view and can pipe directly to samtools sort, using -o to specify the output format:
minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam
In a recent version of igv you can right-click the alignment and chose a "Quick consensus mode" (or something like that), which will make your alignments at least a bit prettier. Note how the consensus sequence (top of the graph) is quite clean - while individual reads may be noisy this is mostly randomly distributed and as such ironed out in consensus.
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "676"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fbioinformatics.stackexchange.com%2fquestions%2f6980%2fwhy-do-i-get-so-many-insertions-from-minimap2-on-my-nanopore-wgs%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Insertions and deletions are the dominant error mode of long read sequencing, including nanopore sequencing. What you see is not unexpected. Things may have improved by now if you would download the raw fast5 data and repeat the basecalling. There is no need to gunzip
the fastq.gz prior to alignment. Your commands for alignment look alright, except that (if you are using a recent version of samtools) you don't really need samtools view and can pipe directly to samtools sort, using -o to specify the output format:
minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam
In a recent version of igv you can right-click the alignment and chose a "Quick consensus mode" (or something like that), which will make your alignments at least a bit prettier. Note how the consensus sequence (top of the graph) is quite clean - while individual reads may be noisy this is mostly randomly distributed and as such ironed out in consensus.
$endgroup$
add a comment |
$begingroup$
Insertions and deletions are the dominant error mode of long read sequencing, including nanopore sequencing. What you see is not unexpected. Things may have improved by now if you would download the raw fast5 data and repeat the basecalling. There is no need to gunzip
the fastq.gz prior to alignment. Your commands for alignment look alright, except that (if you are using a recent version of samtools) you don't really need samtools view and can pipe directly to samtools sort, using -o to specify the output format:
minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam
In a recent version of igv you can right-click the alignment and chose a "Quick consensus mode" (or something like that), which will make your alignments at least a bit prettier. Note how the consensus sequence (top of the graph) is quite clean - while individual reads may be noisy this is mostly randomly distributed and as such ironed out in consensus.
$endgroup$
add a comment |
$begingroup$
Insertions and deletions are the dominant error mode of long read sequencing, including nanopore sequencing. What you see is not unexpected. Things may have improved by now if you would download the raw fast5 data and repeat the basecalling. There is no need to gunzip
the fastq.gz prior to alignment. Your commands for alignment look alright, except that (if you are using a recent version of samtools) you don't really need samtools view and can pipe directly to samtools sort, using -o to specify the output format:
minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam
In a recent version of igv you can right-click the alignment and chose a "Quick consensus mode" (or something like that), which will make your alignments at least a bit prettier. Note how the consensus sequence (top of the graph) is quite clean - while individual reads may be noisy this is mostly randomly distributed and as such ironed out in consensus.
$endgroup$
Insertions and deletions are the dominant error mode of long read sequencing, including nanopore sequencing. What you see is not unexpected. Things may have improved by now if you would download the raw fast5 data and repeat the basecalling. There is no need to gunzip
the fastq.gz prior to alignment. Your commands for alignment look alright, except that (if you are using a recent version of samtools) you don't really need samtools view and can pipe directly to samtools sort, using -o to specify the output format:
minimap2 -ax map-ont -t 20 <index> <fastq> | samtools sort -@5 -o alignment.bam
In a recent version of igv you can right-click the alignment and chose a "Quick consensus mode" (or something like that), which will make your alignments at least a bit prettier. Note how the consensus sequence (top of the graph) is quite clean - while individual reads may be noisy this is mostly randomly distributed and as such ironed out in consensus.
answered 23 hours ago
Wouter De CosterWouter De Coster
49716
49716
add a comment |
add a comment |
Thanks for contributing an answer to Bioinformatics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fbioinformatics.stackexchange.com%2fquestions%2f6980%2fwhy-do-i-get-so-many-insertions-from-minimap2-on-my-nanopore-wgs%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown