Create binary pandas DataFrame (optimize for loop)
Imagine I have this DataFrame:
    import pandas as pd

    test = pd.DataFrame({"id": [0, 1, 4, 3],
                         "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})
It prints as:
       id        cit
    0   0     [6, 7]
    1   1         []
    2   4  [9, 2, 1]
    3   3     [0, 1]
(In reality, my DataFrame has about 13,000 rows.)
The cit column holds one-way links between ids: id 0 links to ids 6 and 7, id 1 has no links, id 4 links to ids 9, 2 and 1, and id 3 links to ids 0 and 1.
For each pair of ids, I want 1 if the two are linked and 0 otherwise.
I want this output:
    id  0  1  4  3
    0   X  0  0  1
    1   0  X  1  1
    4   1  1  X  0
    3   1  0  0  X
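I have written code that does this, but it uses two nested for loops, and I want to optimize it:
    import numpy as np

    # t2 is not defined in the original post; assumed here to be an empty frame labelled by the ids as strings
    labels = test.id.astype(str).tolist()
    t2 = pd.DataFrame(index=labels, columns=labels)
    for i in range(len(test.id)):
        tmp = []
        for j in range(len(test.cit)):
            if test.id.iloc[i] in test.cit.iloc[j]:
                tmp.append(str(1))
            else:
                tmp.append(str(0))
        t2.loc[str(test.id.iloc[i])] = tmp
        print(i, '/', len(test.id))
    # index with a tuple so that NumPy addresses the diagonal cells
    t2.values[tuple([np.arange(len(test.id))] * 2)] = "X"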
I also don't know how to copy the upper triangle to the lower triangle of a DataFrame. (I could do it with for loops, but that would mean four for loops over 13,000 rows, which would be very slow.)
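For the triangle-copying step, a loop should not be needed. Below is a minimal sketch, assuming the matrix is numeric 0/1 and that "linked" should hold in both directions. The names directed and sym are made up for illustration, and the hard-coded values are what the loop in the question yields for the four sample rows:

```python
import pandas as pd

# Directed 0/1 link matrix for the sample ids (hypothetical name: directed).
# Rows and columns must carry the same ids in the same order.
ids = [0, 1, 4, 3]
directed = pd.DataFrame(
    [[0, 0, 0, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]],
    index=ids, columns=ids,
)

a = directed.to_numpy()
# Element-wise OR with the transpose: a cell is 1 if either direction is set.
sym = pd.DataFrame(a | a.T, index=ids, columns=ids)
print(sym)
```

If only the upper triangle should be trusted and mirrored, np.triu(a) + np.triu(a, 1).T is an alternative that ignores whatever is currently stored below the diagonal.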
I looked at iterrows() and itertuples(), but I can't see how to apply them here; the same goes for isin() and apply()/map().
Thanks in advance for your help.
python pandas list loops optimization
asked Nov 17 at 22:39
Hervé
1 Answer
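I'd create a new DataFrame, and then you can use pd.crosstab:
    import pandas as pd

    # One row per (id, cited id) pair, indexed by id
    df = (pd.DataFrame(test.cit.values.tolist(),
                       index=test.id)
            .stack()
            .dropna()  # harmless if stack() has already dropped the NaN padding
            .reset_index(level=1, drop=True)
            .to_frame())
    # keyword form of rename_axis (newer pandas rejects the positional axis argument)
    pd.crosstab(df.index, df[0].values.astype(int)).rename_axis(index='id', columns=None)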
Output:
        0  1  2  6  7  9
    id
    0   0  0  0  1  1  0
    3   1  1  0  0  0  0
    4   0  1  1  0  0  1
If needed, you can reindex afterwards to get all rows or all columns. But since your expected output doesn't match the data you provided, I'm not sure whether that's needed.
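The reindexing step could look roughly like this. It is a sketch, assuming a square matrix over exactly the ids in the id column is wanted, with rows as the citing id and columns as the cited id; the names stacked, links and square are made up for illustration:

```python
import pandas as pd

test = pd.DataFrame({"id": [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})

# Pad the lists into a frame, then stack to one value per (id, position) pair.
stacked = (pd.DataFrame(test["cit"].tolist(), index=test["id"])
             .stack()
             .dropna())

# Rows: citing id, columns: cited id.
links = pd.crosstab(stacked.index.get_level_values(0),
                    stacked.to_numpy().astype(int))

# Keep only the ids from the "id" column on both axes, in their original order.
# Ids with no links (here id 1) get an all-zero row instead of being dropped.
wanted = test["id"].tolist()
square = links.reindex(index=wanted, columns=wanted, fill_value=0)
print(square)
```

Cited ids that never appear in the id column (6, 7, 9 and 2 in the sample) simply fall away, because reindex keeps only the requested labels.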
edited Nov 18 at 4:02
answered Nov 18 at 3:09
ALollz
Looking good! I didn't know a function like that existed. As for 6, 7 and 9: I forgot to mention that some ids do not appear in the id column (because of a pre-processing filter).
– Hervé
Nov 18 at 17:37
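Given that some cited ids are missing from the id column, one option is to drop those links up front with isin() and then fill a preallocated NumPy array in a single assignment. This is only a sketch, assuming the ids are unique and that a dense n x n array fits in memory (about 13,000 x 13,000 one-byte cells for the real data). The names pairs, pos, matrix and result are made up for illustration:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({"id": [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})

# One row per (id, cited id) pair; an empty list turns into a NaN row, dropped here.
pairs = test.explode("cit").dropna(subset=["cit"]).reset_index(drop=True)
pairs["cit"] = pairs["cit"].astype(int)

# Keep only links whose target is itself present in the "id" column.
pairs = pairs[pairs["cit"].isin(test["id"])]

# Map each id to its position in the frame (assumes the ids are unique).
ids = test["id"].to_numpy()
pos = pd.Series(np.arange(len(ids)), index=ids)
rows = pairs["id"].map(pos).to_numpy()
cols = pairs["cit"].map(pos).to_numpy()

# Set all linked cells at once; rows are the citing id, columns the cited id.
matrix = np.zeros((len(ids), len(ids)), dtype=np.int8)
matrix[rows, cols] = 1
result = pd.DataFrame(matrix, index=ids, columns=ids)
print(result)
```

The loop in the question fills the opposite orientation (row = cited id), so result.T would be the matching layout.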