Random uninterruptible sleep processes / IO spikes

Starting last week, I've been having a problem where various processes will go into the uninterruptible sleep state for about 5-10 minutes at a time and then unblock themselves like nothing ever happened. It might happen a few times an hour or only a few times a day.

I'm running Arch with kernel 4.20.3-arch1-1-ARCH and have two hard drives in a RAID 1 array with the filesystem encrypted with LUKS.

Running ps, I see that the following processes are commonly in uninterruptible sleep during these IO spikes:

md125_raid1

dmcrypt_write/2

jbd2/dm-1-8

kworker/u16:2+flush-253:1
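
For reference, a minimal sketch of one way to produce such a list. The function names `filter_dstate` and `list_dstate` are made up for illustration, and the `ps` column selection is an assumption about what is useful here, not necessarily the exact command used above.

```shell
# Keep the header row plus every row whose STAT column starts with "D"
# (uninterruptible sleep). Reads `ps`-style rows on stdin.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Live usage: the wchan column shows the kernel function each task sleeps in.
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

Running `list_dstate` repeatedly during a spike (for example under `watch -n 1`) shows whether the same tasks stay blocked for the whole stall.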

The output from iostat during a spike:

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdc              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    1.50      0.00      3.00     0.00     0.00   0.00   0.00    0.00    1.67   0.00     0.00     2.00   0.00   0.00
sdd              0.00    1.50      0.00      3.00     0.00     0.00   0.00   0.00    0.00    2.67   0.00     0.00     2.00   0.00   0.00
md127            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
md126            0.00    0.50      0.00      2.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     4.00   0.00   0.00
md125            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sde              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdf              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
md124            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-0             0.00    0.50      0.00      2.00     0.00     0.00   0.00   0.00    0.00   26.00   0.01     0.00     4.00  26.00   1.30
dm-1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00  51.00     0.00     0.00   0.00 100.00
dm-2             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00

dm-1 always hits 100% utilization.
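
The dm-1 row stands out because it reports 100% utilization and a queue size of 51 while completing no reads or writes, which suggests requests are outstanding on that device but not finishing. A rough sketch for picking such rows out of `iostat -x` output automatically follows. The function name `stuck_devices` and the 99% default threshold are assumptions for illustration, and the column names are taken from the header shown above (other sysstat versions may label them differently).

```shell
# Print devices that look busy but complete no I/O:
# %util at or above a threshold while r/s + w/s is zero.
# Column positions are read from the "Device" header line.
stuck_devices() {
    awk -v min_util="${1:-99}" '
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("%util" in col) && NF >= col["%util"] {
            u = col["%util"]; r = col["r/s"]; w = col["w/s"]; q = col["aqu-sz"]
            if ($(u) + 0 >= min_util && $(r) + $(w) == 0)
                print $1, "util=" $(u), "aqu-sz=" $(q)
        }
    '
}
```

Usage would be something like `iostat -x 2 | stuck_devices`, left running until the next spike.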

There is no relevant info in the kernel log.

Both disks are ~6 months old and pass a SMART self-test.

I'm not really sure where to go from here. It doesn't seem like it's any particular program that's causing this, but rather something in the kernel or the RAID/LUKS code. Is there anything else I can do to further debug what is causing this problem?

asked Jan 23 at 5:11

shanet

Not sure if this is related: I've had a similar problem for quite some time now (though no RAID). Dropping caches with echo 3 > /proc/sys/vm/drop_caches unblocks the processes and improves I/O throughput, though they may block again after some time. I haven't found out the reason for this.

– dirkt
Jan 23 at 6:06

@dirkt Awesome, that unblocks it for me too. I have to keep a window open running /bin/sh as root to run it because otherwise zsh will block on IO when trying to run a command. It doesn't solve the problem, but at least it can unblock it so the system is usable without waiting 10 minutes. Thank you!

– shanet
Jan 23 at 19:29
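
The workaround from these comments could be kept ready as a small function in an already-open root shell. This is only a sketch: the name `drop_caches` and the optional path argument are hypothetical (the argument exists only so the function can be tried against a scratch file). A preceding `sync` is deliberately left out, on the assumption that it could itself block while I/O is stalled.

```shell
# Must be run as root when targeting the real /proc file.
drop_caches() {
    target="${1:-/proc/sys/vm/drop_caches}"
    # 3 = free page cache plus dentries and inodes
    # (1 = page cache only, 2 = dentries and inodes only)
    echo 3 > "$target"
}
```

Because `echo` is a shell builtin, calling the function from a shell that is already running does not need to load a new executable from disk.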

If you find out the actual root problem of this, I'd be very much interested in it.

– dirkt
Jan 24 at 7:02

linux hard-drive raid arch-linux luks
