Block storage rewrites

Summary

In a block storage system, new data is written in blocks. We are going to represent the flash memory as one sequential array. We have a list of block writes coming in the form of arrays of size 2: writes[i] = [first_block_written, last_block_written].

Each block has a rewrite limit. If rewrites on a block reach a certain specified threshold we should run a special diagnostic.

Given blockCount (an integer representing the total number of blocks), writes (the list of block-write arrays of size 2), and threshold, your task is to return the list of disjoint block segments, each consisting of blocks that have reached the rewrite threshold. The list of block segments should be sorted in increasing order by their left ends.

Examples

For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 2, the output should be blockStorageRewrites(blockCount, writes, threshold) = [[2, 5]].

After the first write, the blocks 0, 1, 2, 3 and 4 were written in once;
After the second write, the blocks 0, 1, 2 and 5 were written in once, and the blocks 3 and 4 reached the rewrite threshold;
After the final write, the blocks 2 and 5 reached the rewrite threshold as well, so the blocks that should be diagnosed are 2, 3, 4 and 5.
Blocks 2, 3, 4 and 5 form one consequent segment [2, 5].

For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 3, the output should be
blockStorageRewrites(blockCount, writes, threshold) = [[3, 4]]

For blockCount = 10, writes = [[3, 4], [0, 1], [6, 6]], and threshold = 1, the output should be
blockStorageRewrites(blockCount, writes, threshold) = [[0, 1], [3, 4], [6, 6]]

Constraints

1 ≤ blockCount ≤ 10**5

0 ≤ writes.length ≤ 10**5

writes[i].length = 2

0 ≤ writes[i][0] ≤ writes[i][1] < blockCount

First try

from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length



def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = [0] * blockCount

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return list(group_blocks(num_writes, threshold))

Here I just do exactly what is asked, I create an array of blockCount size, loop over the writes and lastly group the consecutive ranges with itertoos.groupby

After trying to optimize

I'm not quite sure what the complexity was, but I tried to lessen the load, yet I still didn't pass the TLE constraints

def get_bad_blocks(blockCount, writes, threshold):

    cons = False

    u = l = -1

    for i in range(blockCount):

        count = 0

        for lower, upper in writes:

            if lower <= i <= upper:

                count += 1

            if count >= threshold:

                if not cons:

                    cons = True

                    l = i

                    u = -1

                break

        else:

            u = i - 1

            cons = False



        if u != -1 and l != -1:

            yield [l, u]

            l = -1

    if cons:

        yield [l, i]





def blockStorageRewrites(blockCount, writes, threshold):    

    return list(get_bad_blocks(blockCount, writes, threshold))

Questions

You can review any and all, but preferably I'm looking for answers that:

Tell me if my first example is readable

I'm less concerned about readability in my second try, and more interested in speed

Please ignore the PEP8 naming violations as this is an issue with the programming challenge site

edited 12 hours ago

asked 13 hours ago

Ludisposed

8,60722164

1

$begingroup$
I don't think it's possible to give an "answer" addressing the performance issue — your current algorithm is unsalvageable. Consider what would happen in a realistic scenario where blockCount=1000000. The very first thing you'd do is loop for i in range(1000000)! That cannot possibly be part of a valid solution. You need to come up with an algorithm that doesn't take O(blockCount) time.
$endgroup$
– Quuxplusone
13 hours ago

$begingroup$
The comment "should be faster compared to len(list(g))" makes me think you should be doing some performance tests :)
$endgroup$
– Peilonrayz
13 hours ago

$begingroup$
@Quuxplusone probably, but I fail to see how this can be done without looping over the entire blockcounts... hence the asking for a review.
$endgroup$
– Ludisposed
13 hours ago

$begingroup$
@Peilonrayz I have tested that locally, it should be faster because it doesn't have to construct the list as far as I can tell
$endgroup$
– Ludisposed
13 hours ago

2

$begingroup$
I don't have time to check now, but my guess is that you would want to use a (default)dict with only those blocks that are encountered in writes. If it is a starting block, add +1, if it's the end block, subtract 1. Then go over the values with itertool.accumulate to get the numbers of writes between blocks.
$endgroup$
– Georgy
12 hours ago

|
show 4 more comments

Summary

In a block storage system, new data is written in blocks. We are going to represent the flash memory as one sequential array. We have a list of block writes coming in the form of arrays of size 2: writes[i] = [first_block_written, last_block_written].

Each block has a rewrite limit. If rewrites on a block reach a certain specified threshold we should run a special diagnostic.

Given blockCount (an integer representing the total number of blocks), writes (the list of block-write arrays of size 2), and threshold, your task is to return the list of disjoint block segments, each consisting of blocks that have reached the rewrite threshold. The list of block segments should be sorted in increasing order by their left ends.

Examples

For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 2, the output should be blockStorageRewrites(blockCount, writes, threshold) = [[2, 5]].

After the first write, the blocks 0, 1, 2, 3 and 4 were written in once;
After the second write, the blocks 0, 1, 2 and 5 were written in once, and the blocks 3 and 4 reached the rewrite threshold;
After the final write, the blocks 2 and 5 reached the rewrite threshold as well, so the blocks that should be diagnosed are 2, 3, 4 and 5.
Blocks 2, 3, 4 and 5 form one consequent segment [2, 5].

For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 3, the output should be
blockStorageRewrites(blockCount, writes, threshold) = [[3, 4]]

For blockCount = 10, writes = [[3, 4], [0, 1], [6, 6]], and threshold = 1, the output should be
blockStorageRewrites(blockCount, writes, threshold) = [[0, 1], [3, 4], [6, 6]]

Constraints

1 ≤ blockCount ≤ 10**5

0 ≤ writes.length ≤ 10**5

writes[i].length = 2

0 ≤ writes[i][0] ≤ writes[i][1] < blockCount

First try

from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length



def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = [0] * blockCount

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return list(group_blocks(num_writes, threshold))

Here I just do exactly what is asked, I create an array of blockCount size, loop over the writes and lastly group the consecutive ranges with itertoos.groupby

After trying to optimize

I'm not quite sure what the complexity was, but I tried to lessen the load, yet I still didn't pass the TLE constraints

def get_bad_blocks(blockCount, writes, threshold):

    cons = False

    u = l = -1

    for i in range(blockCount):

        count = 0

        for lower, upper in writes:

            if lower <= i <= upper:

                count += 1

            if count >= threshold:

                if not cons:

                    cons = True

                    l = i

                    u = -1

                break

        else:

            u = i - 1

            cons = False



        if u != -1 and l != -1:

            yield [l, u]

            l = -1

    if cons:

        yield [l, i]





def blockStorageRewrites(blockCount, writes, threshold):    

    return list(get_bad_blocks(blockCount, writes, threshold))

Questions

You can review any and all, but preferably I'm looking for answers that:

Tell me if my first example is readable

I'm less concerned about readability in my second try, and more interested in speed

Please ignore the PEP8 naming violations as this is an issue with the programming challenge site

edited 12 hours ago

asked 13 hours ago

Ludisposed

8,60722164

1

$begingroup$
I don't think it's possible to give an "answer" addressing the performance issue — your current algorithm is unsalvageable. Consider what would happen in a realistic scenario where blockCount=1000000. The very first thing you'd do is loop for i in range(1000000)! That cannot possibly be part of a valid solution. You need to come up with an algorithm that doesn't take O(blockCount) time.
$endgroup$
– Quuxplusone
13 hours ago

$begingroup$
The comment "should be faster compared to len(list(g))" makes me think you should be doing some performance tests :)
$endgroup$
– Peilonrayz
13 hours ago

$begingroup$
@Quuxplusone probably, but I fail to see how this can be done without looping over the entire blockcounts... hence the asking for a review.
$endgroup$
– Ludisposed
13 hours ago

$begingroup$
@Peilonrayz I have tested that locally, it should be faster because it doesn't have to construct the list as far as I can tell
$endgroup$
– Ludisposed
13 hours ago

2

$begingroup$
I don't have time to check now, but my guess is that you would want to use a (default)dict with only those blocks that are encountered in writes. If it is a starting block, add +1, if it's the end block, subtract 1. Then go over the values with itertool.accumulate to get the numbers of writes between blocks.
$endgroup$
– Georgy
12 hours ago

|
show 4 more comments

Summary

In a block storage system, new data is written in blocks. We are going to represent the flash memory as one sequential array. We have a list of block writes coming in the form of arrays of size 2: writes[i] = [first_block_written, last_block_written].

Each block has a rewrite limit. If rewrites on a block reach a certain specified threshold we should run a special diagnostic.

Given blockCount (an integer representing the total number of blocks), writes (the list of block-write arrays of size 2), and threshold, your task is to return the list of disjoint block segments, each consisting of blocks that have reached the rewrite threshold. The list of block segments should be sorted in increasing order by their left ends.

Examples

For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 2, the output should be blockStorageRewrites(blockCount, writes, threshold) = [[2, 5]].

After the first write, the blocks 0, 1, 2, 3 and 4 were written in once;
After the second write, the blocks 0, 1, 2 and 5 were written in once, and the blocks 3 and 4 reached the rewrite threshold;
After the final write, the blocks 2 and 5 reached the rewrite threshold as well, so the blocks that should be diagnosed are 2, 3, 4 and 5.
Blocks 2, 3, 4 and 5 form one consequent segment [2, 5].

For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 3, the output should be
blockStorageRewrites(blockCount, writes, threshold) = [[3, 4]]

For blockCount = 10, writes = [[3, 4], [0, 1], [6, 6]], and threshold = 1, the output should be
blockStorageRewrites(blockCount, writes, threshold) = [[0, 1], [3, 4], [6, 6]]

Constraints

1 ≤ blockCount ≤ 10**5

0 ≤ writes.length ≤ 10**5

writes[i].length = 2

0 ≤ writes[i][0] ≤ writes[i][1] < blockCount

First try

from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length



def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = [0] * blockCount

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return list(group_blocks(num_writes, threshold))

Here I just do exactly what is asked, I create an array of blockCount size, loop over the writes and lastly group the consecutive ranges with itertoos.groupby

After trying to optimize

I'm not quite sure what the complexity was, but I tried to lessen the load, yet I still didn't pass the TLE constraints

def get_bad_blocks(blockCount, writes, threshold):

    cons = False

    u = l = -1

    for i in range(blockCount):

        count = 0

        for lower, upper in writes:

            if lower <= i <= upper:

                count += 1

            if count >= threshold:

                if not cons:

                    cons = True

                    l = i

                    u = -1

                break

        else:

            u = i - 1

            cons = False



        if u != -1 and l != -1:

            yield [l, u]

            l = -1

    if cons:

        yield [l, i]





def blockStorageRewrites(blockCount, writes, threshold):    

    return list(get_bad_blocks(blockCount, writes, threshold))

Questions

You can review any and all, but preferably I'm looking for answers that:

Tell me if my first example is readable

I'm less concerned about readability in my second try, and more interested in speed

Please ignore the PEP8 naming violations as this is an issue with the programming challenge site

edited 12 hours ago

asked 13 hours ago

Ludisposed

8,60722164

Summary

In a block storage system, new data is written in blocks. We are going to represent the flash memory as one sequential array. We have a list of block writes coming in the form of arrays of size 2: writes[i] = [first_block_written, last_block_written].

Each block has a rewrite limit. If rewrites on a block reach a certain specified threshold we should run a special diagnostic.

Given blockCount (an integer representing the total number of blocks), writes (the list of block-write arrays of size 2), and threshold, your task is to return the list of disjoint block segments, each consisting of blocks that have reached the rewrite threshold. The list of block segments should be sorted in increasing order by their left ends.

Examples

For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 2, the output should be blockStorageRewrites(blockCount, writes, threshold) = [[2, 5]].

After the first write, the blocks 0, 1, 2, 3 and 4 were written in once;
After the second write, the blocks 0, 1, 2 and 5 were written in once, and the blocks 3 and 4 reached the rewrite threshold;
After the final write, the blocks 2 and 5 reached the rewrite threshold as well, so the blocks that should be diagnosed are 2, 3, 4 and 5.
Blocks 2, 3, 4 and 5 form one consequent segment [2, 5].

For blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]], and threshold = 3, the output should be
blockStorageRewrites(blockCount, writes, threshold) = [[3, 4]]

For blockCount = 10, writes = [[3, 4], [0, 1], [6, 6]], and threshold = 1, the output should be
blockStorageRewrites(blockCount, writes, threshold) = [[0, 1], [3, 4], [6, 6]]

Constraints

1 ≤ blockCount ≤ 10**5

0 ≤ writes.length ≤ 10**5

writes[i].length = 2

0 ≤ writes[i][0] ≤ writes[i][1] < blockCount

First try

from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length



def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = [0] * blockCount

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return list(group_blocks(num_writes, threshold))

Here I just do exactly what is asked, I create an array of blockCount size, loop over the writes and lastly group the consecutive ranges with itertoos.groupby

After trying to optimize

I'm not quite sure what the complexity was, but I tried to lessen the load, yet I still didn't pass the TLE constraints

def get_bad_blocks(blockCount, writes, threshold):

    cons = False

    u = l = -1

    for i in range(blockCount):

        count = 0

        for lower, upper in writes:

            if lower <= i <= upper:

                count += 1

            if count >= threshold:

                if not cons:

                    cons = True

                    l = i

                    u = -1

                break

        else:

            u = i - 1

            cons = False



        if u != -1 and l != -1:

            yield [l, u]

            l = -1

    if cons:

        yield [l, i]





def blockStorageRewrites(blockCount, writes, threshold):    

    return list(get_bad_blocks(blockCount, writes, threshold))

Questions

You can review any and all, but preferably I'm looking for answers that:

Tell me if my first example is readable

I'm less concerned about readability in my second try, and more interested in speed

Please ignore the PEP8 naming violations as this is an issue with the programming challenge site

python python-3.x programming-challenge time-limit-exceeded

edited 12 hours ago

asked 13 hours ago

Ludisposed

8,60722164

edited 12 hours ago

asked 13 hours ago

Ludisposed

8,60722164

edited 12 hours ago

asked 13 hours ago

Ludisposed

8,60722164

asked 13 hours ago

Ludisposed

8,60722164

asked 13 hours ago

Ludisposed

8,60722164

1

$begingroup$
I don't think it's possible to give an "answer" addressing the performance issue — your current algorithm is unsalvageable. Consider what would happen in a realistic scenario where blockCount=1000000. The very first thing you'd do is loop for i in range(1000000)! That cannot possibly be part of a valid solution. You need to come up with an algorithm that doesn't take O(blockCount) time.
$endgroup$
– Quuxplusone
13 hours ago

$begingroup$
The comment "should be faster compared to len(list(g))" makes me think you should be doing some performance tests :)
$endgroup$
– Peilonrayz
13 hours ago

$begingroup$
@Quuxplusone probably, but I fail to see how this can be done without looping over the entire blockcounts... hence the asking for a review.
$endgroup$
– Ludisposed
13 hours ago

$begingroup$
@Peilonrayz I have tested that locally, it should be faster because it doesn't have to construct the list as far as I can tell
$endgroup$
– Ludisposed
13 hours ago

2

$begingroup$
I don't have time to check now, but my guess is that you would want to use a (default)dict with only those blocks that are encountered in writes. If it is a starting block, add +1, if it's the end block, subtract 1. Then go over the values with itertool.accumulate to get the numbers of writes between blocks.
$endgroup$
– Georgy
12 hours ago

|
show 4 more comments

1

$begingroup$
I don't think it's possible to give an "answer" addressing the performance issue — your current algorithm is unsalvageable. Consider what would happen in a realistic scenario where blockCount=1000000. The very first thing you'd do is loop for i in range(1000000)! That cannot possibly be part of a valid solution. You need to come up with an algorithm that doesn't take O(blockCount) time.
$endgroup$
– Quuxplusone
13 hours ago

$begingroup$
The comment "should be faster compared to len(list(g))" makes me think you should be doing some performance tests :)
$endgroup$
– Peilonrayz
13 hours ago

$begingroup$
@Quuxplusone probably, but I fail to see how this can be done without looping over the entire blockcounts... hence the asking for a review.
$endgroup$
– Ludisposed
13 hours ago

$begingroup$
@Peilonrayz I have tested that locally, it should be faster because it doesn't have to construct the list as far as I can tell
$endgroup$
– Ludisposed
13 hours ago

2

$begingroup$
I don't have time to check now, but my guess is that you would want to use a (default)dict with only those blocks that are encountered in writes. If it is a starting block, add +1, if it's the end block, subtract 1. Then go over the values with itertool.accumulate to get the numbers of writes between blocks.
$endgroup$
– Georgy
12 hours ago

I don't think it's possible to give an "answer" addressing the performance issue — your current algorithm is unsalvageable. Consider what would happen in a realistic scenario where blockCount=1000000. The very first thing you'd do is loop for i in range(1000000)! That cannot possibly be part of a valid solution. You need to come up with an algorithm that doesn't take O(blockCount) time.

– Quuxplusone
13 hours ago

The comment "should be faster compared to len(list(g))" makes me think you should be doing some performance tests :)

– Peilonrayz
13 hours ago

@Quuxplusone probably, but I fail to see how this can be done without looping over the entire blockcounts... hence the asking for a review.

– Ludisposed
13 hours ago

@Peilonrayz I have tested that locally, it should be faster because it doesn't have to construct the list as far as I can tell

– Ludisposed
13 hours ago

I don't have time to check now, but my guess is that you would want to use a (default)dict with only those blocks that are encountered in writes. If it is a starting block, add +1, if it's the end block, subtract 1. Then go over the values with itertool.accumulate to get the numbers of writes between blocks.

– Georgy
12 hours ago

|
show 4 more comments

1 Answer
1

active

oldest

votes

First off when you're optimizing code you need to know what to optimize. At first I thought the problem code was not the groupby, but instead the creation of num_writes. And so I changed your code to be able to find the performance of it.

import cProfile

import random



from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length





def build_writes(block_count, writes):

    num_writes = [0] * block_count

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return num_writes





def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = build_writes(blockCount, writes)

    return list(group_blocks(num_writes, threshold))





block_count = 10**5

writes = 

for _ in range(10**4):

    a = random.randrange(0, block_count)

    b = random.randrange(a, block_count)

    writes.append([a, b])





cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')

Resulting in the following profile:

         200008 function calls in 25.377 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002   25.377   25.377 <string>:1(<module>)

   100001    0.019    0.000    0.025    0.000 lus.py:10(<genexpr>)

        1   25.342   25.342   25.342   25.342 lus.py:16(build_writes)

        1    0.000    0.000   25.375   25.375 lus.py:24(blockStorageRewrites)

        1    0.000    0.000    0.033    0.033 lus.py:6(group_blocks)

   100000    0.007    0.000    0.007    0.000 lus.py:8(<lambda>)

        1    0.000    0.000   25.377   25.377 {built-in method builtins.exec}

        1    0.007    0.007    0.033    0.033 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Changing the code as per Georgy's comment to:

def build_writes(block_count, writes):

    num_writes = dict(enumerate([0] * block_count))

    for lower, upper in writes:

        num_writes[lower] += 1

        num_writes[upper] -= 1

    return list(accumulate(num_writes))

Gets the following profile, which is orders of magnitude faster:

         200011 function calls in 0.066 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002    0.066    0.066 <string>:1(<module>)

   100002    0.021    0.000    0.028    0.000 lus.py:10(<genexpr>)

        1    0.025    0.025    0.025    0.025 lus.py:16(build_writes)

        1    0.003    0.003    0.064    0.064 lus.py:24(blockStorageRewrites)

        2    0.000    0.000    0.036    0.018 lus.py:6(group_blocks)

   100000    0.008    0.000    0.008    0.000 lus.py:8(<lambda>)

        1    0.000    0.000    0.066    0.066 {built-in method builtins.exec}

        2    0.008    0.004    0.036    0.018 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

answered 12 hours ago

Peilonrayz

25.7k338109

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f215355%2fblock-storage-rewrites%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

import cProfile

import random



from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length





def build_writes(block_count, writes):

    num_writes = [0] * block_count

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return num_writes





def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = build_writes(blockCount, writes)

    return list(group_blocks(num_writes, threshold))





block_count = 10**5

writes = 

for _ in range(10**4):

    a = random.randrange(0, block_count)

    b = random.randrange(a, block_count)

    writes.append([a, b])





cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')

Resulting in the following profile:

         200008 function calls in 25.377 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002   25.377   25.377 <string>:1(<module>)

   100001    0.019    0.000    0.025    0.000 lus.py:10(<genexpr>)

        1   25.342   25.342   25.342   25.342 lus.py:16(build_writes)

        1    0.000    0.000   25.375   25.375 lus.py:24(blockStorageRewrites)

        1    0.000    0.000    0.033    0.033 lus.py:6(group_blocks)

   100000    0.007    0.000    0.007    0.000 lus.py:8(<lambda>)

        1    0.000    0.000   25.377   25.377 {built-in method builtins.exec}

        1    0.007    0.007    0.033    0.033 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Changing the code as per Georgy's comment to:

def build_writes(block_count, writes):

    num_writes = dict(enumerate([0] * block_count))

    for lower, upper in writes:

        num_writes[lower] += 1

        num_writes[upper] -= 1

    return list(accumulate(num_writes))

Gets the following profile, which is orders of magnitude faster:

         200011 function calls in 0.066 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002    0.066    0.066 <string>:1(<module>)

   100002    0.021    0.000    0.028    0.000 lus.py:10(<genexpr>)

        1    0.025    0.025    0.025    0.025 lus.py:16(build_writes)

        1    0.003    0.003    0.064    0.064 lus.py:24(blockStorageRewrites)

        2    0.000    0.000    0.036    0.018 lus.py:6(group_blocks)

   100000    0.008    0.000    0.008    0.000 lus.py:8(<lambda>)

        1    0.000    0.000    0.066    0.066 {built-in method builtins.exec}

        2    0.008    0.004    0.036    0.018 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

answered 12 hours ago

Peilonrayz

25.7k338109

add a comment |

import cProfile

import random



from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length





def build_writes(block_count, writes):

    num_writes = [0] * block_count

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return num_writes





def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = build_writes(blockCount, writes)

    return list(group_blocks(num_writes, threshold))





block_count = 10**5

writes = 

for _ in range(10**4):

    a = random.randrange(0, block_count)

    b = random.randrange(a, block_count)

    writes.append([a, b])





cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')

Resulting in the following profile:

         200008 function calls in 25.377 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002   25.377   25.377 <string>:1(<module>)

   100001    0.019    0.000    0.025    0.000 lus.py:10(<genexpr>)

        1   25.342   25.342   25.342   25.342 lus.py:16(build_writes)

        1    0.000    0.000   25.375   25.375 lus.py:24(blockStorageRewrites)

        1    0.000    0.000    0.033    0.033 lus.py:6(group_blocks)

   100000    0.007    0.000    0.007    0.000 lus.py:8(<lambda>)

        1    0.000    0.000   25.377   25.377 {built-in method builtins.exec}

        1    0.007    0.007    0.033    0.033 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Changing the code as per Georgy's comment to:

def build_writes(block_count, writes):

    num_writes = dict(enumerate([0] * block_count))

    for lower, upper in writes:

        num_writes[lower] += 1

        num_writes[upper] -= 1

    return list(accumulate(num_writes))

Gets the following profile, which is orders of magnitude faster:

         200011 function calls in 0.066 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002    0.066    0.066 <string>:1(<module>)

   100002    0.021    0.000    0.028    0.000 lus.py:10(<genexpr>)

        1    0.025    0.025    0.025    0.025 lus.py:16(build_writes)

        1    0.003    0.003    0.064    0.064 lus.py:24(blockStorageRewrites)

        2    0.000    0.000    0.036    0.018 lus.py:6(group_blocks)

   100000    0.008    0.000    0.008    0.000 lus.py:8(<lambda>)

        1    0.000    0.000    0.066    0.066 {built-in method builtins.exec}

        2    0.008    0.004    0.036    0.018 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

answered 12 hours ago

Peilonrayz

25.7k338109

add a comment |

import cProfile

import random



from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length





def build_writes(block_count, writes):

    num_writes = [0] * block_count

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return num_writes





def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = build_writes(blockCount, writes)

    return list(group_blocks(num_writes, threshold))





block_count = 10**5

writes = 

for _ in range(10**4):

    a = random.randrange(0, block_count)

    b = random.randrange(a, block_count)

    writes.append([a, b])





cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')

Resulting in the following profile:

         200008 function calls in 25.377 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002   25.377   25.377 <string>:1(<module>)

   100001    0.019    0.000    0.025    0.000 lus.py:10(<genexpr>)

        1   25.342   25.342   25.342   25.342 lus.py:16(build_writes)

        1    0.000    0.000   25.375   25.375 lus.py:24(blockStorageRewrites)

        1    0.000    0.000    0.033    0.033 lus.py:6(group_blocks)

   100000    0.007    0.000    0.007    0.000 lus.py:8(<lambda>)

        1    0.000    0.000   25.377   25.377 {built-in method builtins.exec}

        1    0.007    0.007    0.033    0.033 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Changing the code as per Georgy's comment to:

def build_writes(block_count, writes):

    num_writes = dict(enumerate([0] * block_count))

    for lower, upper in writes:

        num_writes[lower] += 1

        num_writes[upper] -= 1

    return list(accumulate(num_writes))

Gets the following profile, which is orders of magnitude faster:

         200011 function calls in 0.066 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002    0.066    0.066 <string>:1(<module>)

   100002    0.021    0.000    0.028    0.000 lus.py:10(<genexpr>)

        1    0.025    0.025    0.025    0.025 lus.py:16(build_writes)

        1    0.003    0.003    0.064    0.064 lus.py:24(blockStorageRewrites)

        2    0.000    0.000    0.036    0.018 lus.py:6(group_blocks)

   100000    0.008    0.000    0.008    0.000 lus.py:8(<lambda>)

        1    0.000    0.000    0.066    0.066 {built-in method builtins.exec}

        2    0.008    0.004    0.036    0.018 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

answered 12 hours ago

Peilonrayz

25.7k338109

import cProfile

import random



from itertools import groupby



def group_blocks(num_writes, threshold):

    i = 0

    for key, group in groupby(num_writes, lambda x : x >= threshold):

        # This is faster compared to len(list(g))

        length = sum(1 for _ in group)

        if key:

            yield [i, i + length - 1]

        i += length





def build_writes(block_count, writes):

    num_writes = [0] * block_count

    for lower, upper in writes:

        for i in range(lower, upper + 1):

            num_writes[i] += 1

    return num_writes





def blockStorageRewrites(blockCount, writes, threshold):

    num_writes = build_writes(blockCount, writes)

    return list(group_blocks(num_writes, threshold))





block_count = 10**5

writes = 

for _ in range(10**4):

    a = random.randrange(0, block_count)

    b = random.randrange(a, block_count)

    writes.append([a, b])





cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')

Resulting in the following profile:

         200008 function calls in 25.377 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002   25.377   25.377 <string>:1(<module>)

   100001    0.019    0.000    0.025    0.000 lus.py:10(<genexpr>)

        1   25.342   25.342   25.342   25.342 lus.py:16(build_writes)

        1    0.000    0.000   25.375   25.375 lus.py:24(blockStorageRewrites)

        1    0.000    0.000    0.033    0.033 lus.py:6(group_blocks)

   100000    0.007    0.000    0.007    0.000 lus.py:8(<lambda>)

        1    0.000    0.000   25.377   25.377 {built-in method builtins.exec}

        1    0.007    0.007    0.033    0.033 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Changing the code as per Georgy's comment to:

def build_writes(block_count, writes):

    num_writes = dict(enumerate([0] * block_count))

    for lower, upper in writes:

        num_writes[lower] += 1

        num_writes[upper] -= 1

    return list(accumulate(num_writes))

Gets the following profile, which is orders of magnitude faster:

         200011 function calls in 0.066 seconds



   Ordered by: standard name



   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.002    0.002    0.066    0.066 <string>:1(<module>)

   100002    0.021    0.000    0.028    0.000 lus.py:10(<genexpr>)

        1    0.025    0.025    0.025    0.025 lus.py:16(build_writes)

        1    0.003    0.003    0.064    0.064 lus.py:24(blockStorageRewrites)

        2    0.000    0.000    0.036    0.018 lus.py:6(group_blocks)

   100000    0.008    0.000    0.008    0.000 lus.py:8(<lambda>)

        1    0.000    0.000    0.066    0.066 {built-in method builtins.exec}

        2    0.008    0.004    0.036    0.018 {built-in method builtins.sum}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

answered 12 hours ago

Peilonrayz

25.7k338109

answered 12 hours ago

Peilonrayz

25.7k338109

answered 12 hours ago

Peilonrayz

25.7k338109

answered 12 hours ago

Peilonrayz

25.7k338109

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Argthtjtr

Block storage rewrites

Summary

Examples

Constraints

First try

After trying to optimize

Questions

Summary

Examples

Constraints

First try

After trying to optimize

Questions

Summary

Examples

Constraints

First try

After trying to optimize

Questions

Summary

Examples

Constraints

First try

After trying to optimize

Questions

1 Answer
1

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Post as a guest

Popular posts from this blog

"Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

Alcedinidae

RAC Tourist Trophy

Block storage rewrites

Summary

Examples

Constraints

First try

After trying to optimize

Questions

Summary

Examples

Constraints

First try

After trying to optimize

Questions

Summary

Examples

Constraints

First try

After trying to optimize

Questions

Summary

Examples

Constraints

First try

After trying to optimize

Questions

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

"Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

Alcedinidae

RAC Tourist Trophy

1 Answer
1

1 Answer
1

1 Answer
1