Range (list) as dummy columns


I have two columns holding the start and end of a range. I want to create dummy columns for every value in the range between them. I can do this with the apply method, but it is very slow. Can I do it without apply? I have roughly 2-5M rows.

Entire DataFrame:

    start     end
0   36        36
1   31        31
2   29        29
3   10        10
4   35        35
5   42        44
6   24        26

What I want to see:

    start   end 8   9   10  24  25  26  29  31  35  36  42  43  44
0   36      36  NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN
1   31      31  NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN
2   29      29  NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN
3   10      10  NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   35      35  NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN
5   42      44  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 1.0
6   24      26  NaN NaN NaN 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN
7   25      25  NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
8   35      35  NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN
9   8       10  1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Now I use this code:

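import itertools

import pandas as pd

def zip_with_scalar(l, o):
    # Map every key in l to the same scalar value o
    return dict(zip(l, itertools.repeat(o)))

# Build one Series of dummies per row, then merge back on the index
df.merge(df.apply(lambda s: pd.Series(zip_with_scalar(range(s['start'], s['end']+1), 1)), axis=1), left_index=True, right_index=True)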

edited Nov 23 '18 at 11:18

asked Nov 23 '18 at 11:13

Sergey Panyushkin


python pandas


1 Answer

Use a list comprehension with the DataFrame constructor:

# One dict per row: every value in [start, end] maps to 1
a = [dict.fromkeys(range(x, y), 1) for x, y in zip(df['start'], df['end']+1)]
df = df.join(pd.DataFrame(a, index=df.index))
print(df)

   start  end   10   24   25   26   29   31   35   36   42   43   44
0     36   36  NaN  NaN  NaN  NaN  NaN  NaN  NaN  1.0  NaN  NaN  NaN
1     31   31  NaN  NaN  NaN  NaN  NaN  1.0  NaN  NaN  NaN  NaN  NaN
2     29   29  NaN  NaN  NaN  NaN  1.0  NaN  NaN  NaN  NaN  NaN  NaN
3     10   10  1.0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4     35   35  NaN  NaN  NaN  NaN  NaN  NaN  1.0  NaN  NaN  NaN  NaN
5     42   44  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  1.0  1.0  1.0
6     24   26  NaN  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN  NaN
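To run the snippet above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the seven rows shown in the question; the names `rows`, `dummies`, and `result` are illustrative and not part of the original answer. The explicit `sort_index(axis=1)` is an assumption on my side: depending on the pandas version, columns built from a list of dicts may come out in first-seen key order rather than sorted, so sorting keeps the layout in line with the output shown.

```python
import pandas as pd

# Sample data copied from the seven-row frame in the question
df = pd.DataFrame({
    "start": [36, 31, 29, 10, 35, 42, 24],
    "end":   [36, 31, 29, 10, 35, 44, 26],
})

# One dict per row: each integer in [start, end] becomes a key with value 1
rows = [dict.fromkeys(range(x, y), 1)
        for x, y in zip(df["start"], df["end"] + 1)]

# The constructor takes the union of all keys as columns and fills
# missing keys with NaN; sorting the columns is an added assumption
dummies = pd.DataFrame(rows, index=df.index).sort_index(axis=1)

result = df.join(dummies)
print(result)
```

If zeros are preferred over NaN, something like `dummies.fillna(0).astype(int)` before the join should work, though that changes the output format the question asked for.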

Performance:

# [70000 rows x 2 columns]
df = pd.concat([df] * 10000, ignore_index=True)

def a(df):
    # List comprehension + DataFrame constructor
    a = [dict.fromkeys(range(x, y), 1) for x, y in zip(df['start'], df['end']+1)]
    return df.join(pd.DataFrame(a, index=df.index))

import itertools

def zip_with_scalar(l, o):
    return dict(zip(l, itertools.repeat(o)))

def b(df):
    # Original row-wise apply from the question
    return df.merge(df.apply(lambda s: pd.Series(zip_with_scalar(range(s['start'], s['end']+1), 1)), axis=1), left_index=True, right_index=True)

In [176]: %timeit a(df.copy())
202 ms ± 6.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [177]: %timeit b(df.copy())
38.9 s ± 1.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
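Since the question mentions 2-5M rows, a fully vectorized variant may also be worth trying. The following is a sketch of a different technique (NumPy broadcasting), not the method from the answer, and it has not been benchmarked here. The function name `range_dummies_numpy` is hypothetical. It assumes a non-empty frame with integer columns and `start <= end` in every row, and it allocates a boolean matrix of size rows x (max end - min start + 1), so memory use can become large when the value span is wide.

```python
import numpy as np
import pandas as pd

def range_dummies_numpy(df):
    """Hypothetical helper: range dummies via broadcasting (assumes start <= end)."""
    start = df["start"].to_numpy()
    end = df["end"].to_numpy()

    # Every candidate value between the smallest start and the largest end
    values = np.arange(start.min(), end.max() + 1)

    # mask[i, j] is True when values[j] lies inside row i's closed range
    mask = (start[:, None] <= values) & (values <= end[:, None])

    # Drop values that no row covers, mirroring the dict-based output
    keep = mask.any(axis=0)

    dummies = pd.DataFrame(
        np.where(mask[:, keep], 1.0, np.nan),  # 1.0 inside the range, NaN outside
        index=df.index,
        columns=values[keep],
    )
    return df.join(dummies)

# Same seven sample rows as in the question
df = pd.DataFrame({
    "start": [36, 31, 29, 10, 35, 42, 24],
    "end":   [36, 31, 29, 10, 35, 44, 26],
})
result = range_dummies_numpy(df)
print(result)
```

Whether this beats the list comprehension will depend on the data: with narrow value spans the broadcasted comparison is cheap, while with wide, sparse spans the intermediate mask may dominate.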

edited Nov 23 '18 at 11:52

answered Nov 23 '18 at 11:32

jezrael
