Create a binary pandas DataFrame (optimize a for loop)











Imagine I have this DataFrame:



import pandas as pd

test = pd.DataFrame({"id": [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})


This gives the following DataFrame:

   id        cit
0   0     [6, 7]
1   1         []
2   4  [9, 2, 1]
3   3     [0, 1]


(In reality, my DataFrame has about 13,000 rows.)



The cit column holds one-way links from each id: id #0 links to ids #6 and #7, id #1 has no links, id #4 links to ids #9, #2 and #1, and id #3 links to ids #0 and #1.



For each pair of ids, I want 1 if the two ids are linked and 0 otherwise.



I want this output:

id  0  1  4  3
0   X  0  0  1
1   0  X  1  1
4   1  1  X  0
3   1  0  0  X


I have written code that does this, but it uses two nested for loops.
I want to optimize the following code:



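import numpy as np
import pandas as pd

# t2 is not defined in the original snippet; assumed to start as an empty frame with one column per id
t2 = pd.DataFrame(columns=test.id.astype(str))

for i in range(len(test.id)):
    tmp = []
    for j in range(len(test.cit)):
        if test.id.iloc[i] in test.cit.iloc[j]:
            tmp.append(str(1))
        else:
            tmp.append(str(0))
    t2.loc[str(test.id.iloc[i])] = tmp
    print(i, '/', len(test.id))

# Put "X" on the diagonal (an id paired with itself)
vals = t2.to_numpy(copy=True)
np.fill_diagonal(vals, "X")
t2 = pd.DataFrame(vals, index=t2.index, columns=t2.columns)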


I also don't know how to copy the upper triangle to the lower triangle of a DataFrame. I could do it with more for loops, but four for loops over 13,000 rows would be very slow.
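One loop-free way to mirror a square 0/1 matrix is to take the element-wise maximum of the matrix and its transpose. The sketch below assumes the link matrix is still numeric (no "X" on the diagonal yet) and that rows and columns are in the same id order. The frame `a` is hypothetical sample data built from the links described above, not output of the loop.

```python
import numpy as np
import pandas as pd

ids = [0, 1, 4, 3]

# Hypothetical one-way link matrix: row = source id, column = target id.
# id 4 links to id 1; id 3 links to ids 0 and 1.
a = pd.DataFrame(
    [[0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 1, 0, 0],
     [1, 1, 0, 0]],
    index=ids, columns=ids,
)

vals = a.to_numpy()
# A cell becomes 1 if the link exists in either direction.
sym = pd.DataFrame(np.maximum(vals, vals.T), index=ids, columns=ids)
print(sym)
```

The diagonal can be replaced with "X" afterwards, since that step turns the frame into an object-typed one.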



I looked at iterrows() and itertuples(), but I have no idea how to apply them here. The same goes for isin() and apply()/map().
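As a rough sketch of how an isin-style check could replace the inner loop, the snippet below uses `numpy.isin` once per row to test all ids against that row's cit list. It is only an illustration under the assumption that every cit entry is a plain list of integers; the names `ids`, `hits` and `links` are made up for the example.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({"id": [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})
ids = test["id"].to_numpy()

# Row j of `hits` marks which ids appear in the cit list of row j.
hits = np.array([np.isin(ids, c) for c in test["cit"]], dtype=int)

# Transpose so that cell (i, j) is 1 when id i appears in the cit list of row j,
# which is the orientation the nested loop produces.
links = pd.DataFrame(hits.T, index=ids, columns=ids)
print(links)
```

This still loops over the rows once in Python, but the membership test for each row is vectorized.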



Thanks in advance for your help.










      python pandas list loops optimization






asked Nov 17 at 22:39
Hervé
























          1 Answer
I'd create a new DataFrame with one row per link, and then you can use pd.crosstab:



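import pandas as pd

# Expand each cit list into columns (padded with NaN), indexed by id,
# then stack into one row per (id, linked id) pair.
df = (pd.DataFrame(test.cit.values.tolist(),
                   index=test.id)
        .stack()
        .dropna()  # drop the NaN padding explicitly
        .reset_index(level=1, drop=True)
        .to_frame())

# The NaN padding makes the values floats, hence astype(int).
(pd.crosstab(df.index, df[0].values.astype(int))
   .rename_axis(None, axis=1)
   .rename_axis('id', axis=0))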


Output:

    0  1  2  6  7  9
id
0   0  0  0  1  1  0
3   1  1  0  0  0  0
4   0  1  1  0  0  1




If needed, you can reindex afterwards to get all rows or all columns. Since your expected output doesn't match the data you provided, I'm not sure whether that's needed.
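A minimal sketch of that reindex step, assuming the goal is a square matrix restricted to the ids present in the id column. It builds the pairs with `DataFrame.explode` (available in pandas 0.25 and later) rather than the stack approach above, and the names `pairs`, `ct` and `square` are illustrative only.

```python
import pandas as pd

test = pd.DataFrame({"id": [0, 1, 4, 3],
                     "cit": [[6, 7], [], [9, 2, 1], [0, 1]]})

# One row per (id, linked id) pair; empty lists become NaN and are dropped.
pairs = (test.explode("cit")
             .dropna(subset=["cit"])
             .reset_index(drop=True))
ct = pd.crosstab(pairs["id"], pairs["cit"].astype(int))

# Keep only known ids on both axes, filling missing rows/columns with 0.
known = test["id"].tolist()
square = ct.reindex(index=known, columns=known, fill_value=0)
print(square)
```

Ids such as 6, 7 and 9 that never appear in the id column drop out of the columns, and id 1, which has no links, comes back as a row of zeros.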






• Looking good! I didn't know a function like that existed. As for 6, 7 and 9: I forgot to say that some ids do not appear in the id column (because of a pre-processing filter).
  – Hervé
  Nov 18 at 17:37













edited Nov 18 at 4:02
answered Nov 18 at 3:09
ALollz











