Random uninterruptible sleep processes / IO spikes
Starting last week, I've been having a problem where various processes will go into the uninterruptible sleep state for about 5-10 minutes at a time and then unblock themselves like nothing ever happened. It might happen a few times an hour or only a few times a day.
I'm running Arch with kernel 4.20.3-arch1-1-ARCH and have two hard drives in a RAID 1 array with the filesystem encrypted with LUKS.
Running ps, I see that the following processes are commonly in uninterruptible sleep during these IO spikes:
md125_raid1
dmcrypt_write/2
jbd2/dm-1-8
kworker/u16:2+flush-253:1
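As a rough sketch of how a list like this can be collected: the function below filters ps-style output, read from standard input, for tasks whose state field begins with D, which is how ps marks uninterruptible sleep. The name `d_state_tasks` is made up for illustration, and the input is assumed to be in the PID, STAT, COMMAND layout printed by `ps -eo pid,stat,comm`.

```shell
# Hypothetical helper: print tasks in uninterruptible sleep (state D).
# Assumes "PID STAT COMMAND" lines on standard input, e.g. the output of:
#   ps -eo pid,stat,comm
d_state_tasks() {
    # Skip the header row, keep rows whose STAT field starts with "D".
    # Only the first word of the command name is printed.
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $2, $3 }'
}
```

Usage would be `ps -eo pid,stat,comm | d_state_tasks`, run repeatedly during a spike to see which tasks stay blocked.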
The output from iostat during a spike:
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 1.50 0.00 3.00 0.00 0.00 0.00 0.00 0.00 1.67 0.00 0.00 2.00 0.00 0.00
sdd 0.00 1.50 0.00 3.00 0.00 0.00 0.00 0.00 0.00 2.67 0.00 0.00 2.00 0.00 0.00
md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md126 0.00 0.50 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00
md125 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md124 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.50 0.00 2.00 0.00 0.00 0.00 0.00 0.00 26.00 0.01 0.00 4.00 26.00 1.30
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 51.00 0.00 0.00 0.00 100.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
- dm-1 always hits 100% utilization.
- There is no relevant info in the kernel log.
- Both disks are ~6 months old and pass a SMART self-test.
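To pick the saturated device out of iostat output like the above automatically, a small filter along these lines might help. It is only a sketch: `busy_devices` and its default threshold of 90 are assumptions for illustration, and it expects extended-statistics output (as from `iostat -x`) on standard input with a header row that starts with "Device" and contains a %util column.

```shell
# Hypothetical helper: list devices whose %util is at or above a threshold.
# Reads iostat -x style output on standard input. The %util column is
# located from the header row, so the exact column layout does not matter.
busy_devices() {
    awk -v min="${1:-90}" '
        # Header row: remember which column holds %util.
        $1 == "Device" || $1 == "Device:" {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Data row: print the device name and %util when over the threshold.
        col && NF >= col && $col + 0 >= min + 0 { print $1, $col }
    '
}
```

Run against the output shown earlier, this would flag only dm-1. Something like `iostat -x 2 | busy_devices 95` could be left running in a terminal to log when the stall begins.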
I'm not really sure where to go from here. It doesn't seem like it's any particular program that's causing this, but rather something in the kernel or the RAID/LUKS code. Is there anything else I can do to further debug what is causing this problem?
linux hard-drive raid arch-linux luks
Not sure if this is related: I've had a similar problem for quite some time now (though no RAID). Dropping caches with
echo 3 > /proc/sys/vm/drop_caches
unblocks the processes and improves I/O throughput, though they may block again after some time. I haven't found out the reason for this.
– dirkt
Jan 23 at 6:06
@dirkt Awesome, that unblocks it for me too. I have to keep a window open running /bin/sh as root to run it because otherwise zsh will block on IO when trying to run a command. It doesn't solve the problem, but at least it can unblock it so the system is usable without waiting 10 minutes. Thank you!
– shanet
Jan 23 at 19:29
If you find out the actual root problem of this, I'd be very much interested in it.
– dirkt
Jan 24 at 7:02
asked Jan 23 at 5:11
shanet