Random uninterruptible sleep processes / IO spikes












Starting last week, I've been having a problem where various processes go into uninterruptible sleep (state D) for about 5-10 minutes at a time and then unblock themselves as if nothing had happened. It might happen a few times an hour or only a few times a day.



I'm running Arch with kernel 4.20.3-arch1-1-ARCH and have two hard drives in a RAID 1 array with the filesystem encrypted with LUKS.



Running ps, I see that the following processes are commonly in uninterruptible sleep during these IO spikes:



md125_raid1
dmcrypt_write/2
jbd2/dm-1-8
kworker/u16:2+flush-253:1
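
A list like the one above can be reproduced by filtering ps output for processes whose state starts with D (uninterruptible sleep). This is only a minimal sketch, assuming the procps version of ps; the function name filter_dstate is hypothetical and simply wraps an awk filter, and the wchan column is included only as a hint at what each task is waiting on.

```shell
#!/bin/sh
# Keep the header line plus every process whose STAT field starts with "D"
# (uninterruptible sleep). Expects "PID STAT WCHAN COMMAND" columns on stdin.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Typical use during a stall (wchan names the kernel function being waited in):
#   ps -eo pid,stat,wchan:32,comm | filter_dstate
```

Running it repeatedly during a spike (for example under watch) should show whether the same kernel threads are blocked every time.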


The output from iostat during a spike:



Device   r/s   w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm   %util
sda     0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
sdc     0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
sdb     0.00  1.50   0.00   3.00    0.00    0.00   0.00   0.00     0.00     1.67    0.00      0.00      2.00   0.00    0.00
sdd     0.00  1.50   0.00   3.00    0.00    0.00   0.00   0.00     0.00     2.67    0.00      0.00      2.00   0.00    0.00
md127   0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
md126   0.00  0.50   0.00   2.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      4.00   0.00    0.00
md125   0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
sde     0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
sdf     0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
md124   0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
dm-0    0.00  0.50   0.00   2.00    0.00    0.00   0.00   0.00     0.00    26.00    0.01      0.00      4.00  26.00    1.30
dm-1    0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00   51.00      0.00      0.00   0.00  100.00
dm-2    0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
loop0   0.00  0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00




  • dm-1 always hits 100% utilization.

  • There is no relevant info in the kernel log.

  • Both disks are ~6 months old and pass a SMART self-test.
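
The pattern in the iostat output above (dm-1 at 100% utilization with 51 requests queued but no completed reads or writes) can be picked out automatically. This is a minimal sketch, assuming extended output from iostat -x; the function name find_stalled and the default threshold of 90 are hypothetical choices for illustration, not part of sysstat.

```shell
#!/bin/sh
# Print devices that look stalled in `iostat -x` output: %util at or above a
# threshold (first argument, default 90) while no reads or writes complete.
find_stalled() {
    awk -v threshold="${1:-90}" '
        $1 == "Device" || $1 == "Device:" {
            # Map column names to positions; the layout varies by sysstat version.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("%util" in col) && ("r/s" in col) && ("w/s" in col) && NF >= col["%util"] {
            util = $(col["%util"]) + 0
            if (util >= threshold && $(col["r/s"]) + 0 == 0 && $(col["w/s"]) + 0 == 0) {
                # Older sysstat releases name the queue column differently.
                queue = ("aqu-sz" in col) ? $(col["aqu-sz"]) : "n/a"
                print $1, "util=" $(col["%util"]), "aqu-sz=" queue
            }
        }
    '
}

# Typical use, sampling every 2 seconds:
#   iostat -x 2 | find_stalled
```

Looking up columns by header name rather than by position keeps the filter usable if a different sysstat version prints the columns in another order.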


I'm not really sure where to go from here. It doesn't seem like it's any particular program that's causing this, but rather something in the kernel or the RAID/LUKS code. Is there anything else I can do to further debug what is causing this problem?










  • Not sure if this is related: I've had a similar problem for quite some time now (though no RAID). Dropping caches with echo 3 > /proc/sys/vm/drop_caches unblocks the processes and improves I/O throughput, though they may block again after some time. I haven't found out the reason for this.

    – dirkt
    Jan 23 at 6:06











  • @dirkt Awesome, that unblocks it for me too. I have to keep a window open running /bin/sh as root to run it because otherwise zsh will block on IO when trying to run a command. It doesn't solve the problem, but at least it can unblock it so the system is usable without waiting 10 minutes. Thank you!

    – shanet
    Jan 23 at 19:29











  • If you find out the actual root problem of this, I'd be very much interested in it.

    – dirkt
    Jan 24 at 7:02
















linux hard-drive raid arch-linux luks
















asked Jan 23 at 5:11 by shanet









