Create a binary pandas DataFrame (optimize a for loop)

Imagine I have this dataframe :

import pandas as pd

test = pd.DataFrame({"id" : [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})

This DataFrame:

   id        cit
0   0     [6, 7]
1   1         []
2   4  [9, 2, 1]
3   3     [0, 1]

(In reality, my DataFrame has about 13,000 rows.)

The cit column holds one-way links from each id: id #0 links to ids #6 and #7, id #1 has no links, id #4 links to #9, #2 and #1, and id #3 links to #0 and #1.

I want a 1 wherever two ids are linked, and 0 otherwise.

I want this output:

id  0  1  4  3
0   X  0  0  1
1   0  X  1  1
4   1  1  X  0
3   1  0  0  X

I have written code for this, but it uses two nested for loops. I want to optimize the following code:

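import numpy as np
import pandas as pd

# t2 is not shown in the question; assumed here to be an empty frame with one row/column per id
labels = test.id.astype(str)
t2 = pd.DataFrame(index=labels, columns=labels)

for i in range(len(test.id)):
    tmp = []
    for j in range(len(test.cit)):
        # "1" if row j's cit list contains id i, else "0"
        if test.id.iloc[i] in test.cit.iloc[j]:
            tmp.append(str(1))
        else:
            tmp.append(str(0))
    t2.loc[str(test.id.iloc[i])] = tmp
    print(i, '/', len(test.id))

# Mark the diagonal with "X" (recent NumPy treats a list of arrays as one array index, not a diagonal)
t2 = t2.mask(np.eye(len(test.id), dtype=bool), "X")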
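
For comparison, here is a minimal loop-free sketch of the same computation. It assumes pandas 0.25 or later (for `DataFrame.explode`) and that every value in `id` is unique. It keeps the loop's orientation (row = cited id, column = citing row) and, like the loop, ignores citations of ids that are not in the `id` column. The names `pos`, `pairs`, `m` and `links` are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({"id": [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})

ids = test["id"].to_numpy()
# Map each id label to its row/column position in the matrix
pos = pd.Series(np.arange(len(ids)), index=ids)

# One row per (citing id, cited id) pair; explode turns [] into NaN, which is dropped
pairs = test.explode("cit").dropna(subset=["cit"])
cited = pairs["cit"].astype(int)
citing = pairs["id"]
# Ignore citations of ids that are not in the id column (6, 7, 9 and 2 here)
keep = cited.isin(ids).to_numpy()

m = np.zeros((len(ids), len(ids)), dtype=np.int8)
# Same orientation as the loop: row = cited id, column = citing row
m[pos.loc[cited[keep]].to_numpy(), pos.loc[citing[keep]].to_numpy()] = 1

links = pd.DataFrame(m, index=ids, columns=ids)
print(links)
```

For 13,000 ids the dense matrix has about 169 million cells, which is why the sketch uses `np.int8` (roughly 169 MB) rather than 64-bit integers (about 1.35 GB) or Python string objects.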

I also don't know how to copy the upper triangle to the lower triangle of a DataFrame. I could do it with for loops, but with four for loops over 13,000 rows it would be very slow.
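
Copying the upper triangle onto the lower one can be done on the underlying NumPy array with `np.triu`, without loops. A minimal sketch, assuming a square integer matrix whose rows and columns follow the same id order; the matrix `m` below is hard-coded as the 0/1 result of the loop for the example data, before the "X" diagonal is set.

```python
import numpy as np
import pandas as pd

ids = [0, 1, 4, 3]
# 0/1 result of the nested loop for the example data (diagonal not yet marked)
m = np.array([[0, 0, 0, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])

# Keep the upper triangle (with diagonal) and add the transpose of the strict upper triangle
sym = np.triu(m) + np.triu(m, k=1).T

t2 = pd.DataFrame(sym, index=ids, columns=ids).astype(object)
# Mark the diagonal with "X", as in the expected output
t2 = t2.mask(np.eye(len(ids), dtype=bool), "X")
print(t2)
```

If a link in either direction should count, `np.maximum(m, m.T)` is a simpler alternative to copying one triangle.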

I looked at iterrows() and itertuples(), but I have no idea how to use them for this; the same goes for isin() and apply()/map().
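
As a minimal sketch of where `isin()` can help: for each citing row, `test["id"].isin(cit)` flags in one vectorized step which ids that row cites, which replaces the inner loop. This assumes ids are unique; columns follow the order of `test.id`, matching the loop's orientation (row = cited id, column = citing row). The name `links` is illustrative.

```python
import pandas as pd

test = pd.DataFrame({"id": [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})

# One boolean column per citing row: True where that row's cit list contains the id.
# .to_numpy() avoids aligning on the old RangeIndex when the index becomes test["id"].
links = pd.DataFrame(
    {citing: test["id"].isin(cit).to_numpy() for citing, cit in zip(test["id"], test["cit"])},
    index=test["id"],
).astype(int)
print(links)
```

This still visits each row once in Python, but the per-row membership test is vectorized instead of being a second nested loop.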

Thanks in advance for your help.

asked Nov 17 at 22:39

Hervé

python pandas list loops optimization

1 Answer

I'd create a new DataFrame with one row per link, and then use pd.crosstab:

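import pandas as pd

# One row per (id, cited id) pair: shorter lists are padded with NaN, then stacked
df = (pd.DataFrame(test.cit.values.tolist(), index=test.id)
        .stack()
        .dropna()  # drop the NaN padding explicitly (newer pandas versions of stack may keep it)
        .reset_index(level=1, drop=True)
        .to_frame())

pd.crosstab(df.index, df[0].values.astype(int)).rename_axis(None, axis=1).rename_axis('id', axis=0)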

Output:

    0  1  2  6  7  9
id                  
0   0  0  0  1  1  0
3   1  1  0  0  0  0
4   0  1  1  0  0  1

If needed, you can reindex afterwards to get all rows or all columns. But since your expected output doesn't match the data you provided, I'm not sure whether that's necessary.
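
A minimal sketch of that reindexing step, assuming the square result should have one row and one column per value of `test.id`, in that order, with 0 where there is no link. Here rows are the citing ids and columns the cited ids; cited ids that are not in the `id` column (6, 7, 9 and 2 in the example) are dropped by the reindex. The names `ct` and `square` are illustrative.

```python
import pandas as pd

test = pd.DataFrame({"id": [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})

df = (pd.DataFrame(test.cit.values.tolist(), index=test.id)
        .stack()
        .dropna()
        .reset_index(level=1, drop=True)
        .to_frame())

ct = pd.crosstab(df.index, df[0].values.astype(int))

# Square matrix over all ids; missing rows/columns (e.g. id 1, which cites nothing) become 0
square = ct.reindex(index=test.id, columns=test.id, fill_value=0)
print(square)
```

`square.T` gives the orientation used by the question's loop (rows = cited ids, columns = citing ids).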

edited Nov 18 at 4:02

answered Nov 18 at 3:09

ALollz

Looks good! I didn't know there was a function like that. As for 6, 7 and 9, I forgot to say that some ids are not in the id column (because of a pre-processing filter).
– Hervé
Nov 18 at 17:37
