How to compute the correlations of a long-format dataframe with pandas?



























I have a dataframe with 3 columns.



UserId | ItemId | Rating


(Rating is the rating a user gave to an item; it's an np.float16. The two IDs are np.int32.)



What is the best way to compute correlations between items using Python pandas?



My approach is to first pivot the table to wide format and then apply DataFrame.corr:



df = df.pivot(index='UserId', columns='ItemId', values='Rating')
df.corr()


This works on small datasets, but not on big ones.



That first step creates a big UserId × ItemId matrix that is mostly missing values. It's quite RAM-intensive and I can't run it with bigger dataframes: with the ~200k users and ~50k items mentioned in the comments, a dense float32 pivot needs roughly 200,000 × 50,000 × 4 bytes ≈ 40 GB.



Isn't there a simpler way to compute the correlations directly on the long dataset, without pivoting?



(I looked into DataFrame.groupby, but that only seems to split the dataframe into groups, which is not what I'm looking for.)



EDIT: oversimplified data and working pivot code



import pandas as pd
import numpy as np

d = {'UserId': [1, 2, 3,  1, 2, 3,  1, 2, 3],
     'ItemId': [1, 1, 1,  2, 2, 2,  3, 3, 3],
     'Rating': [1.1, 4.5, 7.1,  5.5, 3.1, 5.5,  1.1, np.nan, 2.2]}
df = pd.DataFrame(data=d)
df = df.astype(dtype={'UserId': np.int32, 'ItemId': np.int32, 'Rating': np.float32})
print(df.info())

pivot = df.pivot(index='UserId', columns='ItemId', values='Rating')
print()
print(pivot)

corr = pivot.corr()
print()
print(corr)
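
For reference, on this toy data the pivot is a 3 × 3 UserId × ItemId table and, up to float32 rounding, the correlation matrix works out to roughly r ≈ -0.077 between items 1 and 2, r = 1.0 between items 1 and 3 (only two users rated both, and two points always fit a line exactly), and NaN between items 2 and 3 (the two common ratings of item 2 are both 5.5, so its variance is zero).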


EDIT2: Large random data generator



def randDf(size=100):
    ## MAKE RANDOM DATAFRAME, df =======================
    import numpy as np
    import pandas as pd
    import random
    import math

    dict_for_df = {}
    for i in ('UserId', 'ItemId', 'Rating'):
        dict_for_df[i] = {}
        for j in range(size):
            if i == 'Rating':
                val = round(random.random() * 5, 1)
            else:
                val = round(random.random() * math.sqrt(size / 2))
            dict_for_df[i][j] = val  # store in a dict
    # print(dict_for_df)
    df = pd.DataFrame(dict_for_df)  # after the loop, convert the dict to a dataframe
    # print(df.head())
    df = df.astype(dtype={'UserId': np.int32, 'ItemId': np.int32, 'Rating': np.float32})
    # df = df.astype(dtype={'UserId': np.int64, 'ItemId': np.int64, 'Rating': np.float64})
    ## remove duplicate (UserId, ItemId) pairs -----
    df.drop_duplicates(subset=['UserId', 'ItemId'], keep='first', inplace=True)
    ## show -----
    print(df.info())
    print(df.head())
    return df
    # =======================
# =======================

df = randDf()









Tags: python pandas dataframe correlation

asked Nov 23 '18 at 13:31 by stallingOne, edited Nov 26 '18 at 13:45
  • Could you provide some example data and expected output, and explain in more detail what you mean by "correlations between items"? Are you interested in finding situations in which, e.g., User A liked Item 1 and also Item 2 (a particular user's ratings for two different products are correlated)? Also, how many total users, items, and ratings do you have? Has each user rated each item exactly once? – Peter Leimbigler, Nov 23 '18 at 13:41

  • @PeterLeimbigler Small example added in the question (but the data types are not respected). My numbers are ~50k items, ~200k users, and ~20M ratings. No, not "exactly" once: at most once (thus once or never). – stallingOne, Nov 23 '18 at 13:57


















1 Answer






































I had another go and have something that gets exactly the same correlation numbers as your method without using pivot, but it is much slower (with the ~50k items from the comments there are about 50,000 × 49,999 / 2 ≈ 1.25 billion item pairs to loop over). I can't say whether it uses less or more memory:



from scipy.stats import pearsonr
import itertools
import pandas as pd
import numpy as np

d = []                                    # one dict per item pair
itemids = list(set(df['ItemId']))
pairsofitems = list(itertools.combinations(itemids, 2))

for itempair in pairsofitems:
    a = df[df['ItemId'] == itempair[0]][['Rating', 'UserId']]
    b = df[df['ItemId'] == itempair[1]][['Rating', 'UserId']]

    # One slot per user, NaN where that user did not rate the item
    # (assumes UserIds run from 0 to n_users - 1, as in randDf()).
    z = np.full(len(set(df.UserId)), np.nan)
    z[a.UserId.values] = a.Rating.values

    w = np.full(len(set(df.UserId)), np.nan)
    w[b.UserId.values] = b.Rating.values

    # Keep only the users who rated both items.
    ok = ~np.logical_or(np.isnan(w), np.isnan(z))
    z = np.compress(ok, z)
    w = np.compress(ok, w)

    if len(z) < 2:        # pearsonr needs at least two points
        continue
    d.append({'firstitem': itempair[0],
              'seconditem': itempair[1],
              'correlation': pearsonr(z, w)[0]})

df_out = pd.DataFrame(d, columns=['firstitem', 'seconditem', 'correlation'])


The fiddly part was working out how to handle the NaNs before taking the correlation.



The slicing in the two lines after the for statement takes time. I think, though, that it may have potential if the bottlenecks could be fixed (see the sketch after the list below).



Yes, there is some repetition in there with the z and w variables; that could be put into a function.



Some explanation of what it does:




  • find all combinations of pairs within your items

  • organise an "x" and a "y" set of points for UserId / Rating, dropping any point pair where one of the two values is missing (NaN). I think of a scatter plot, with the correlation measuring how well a straight line fits through it.

  • run the Pearson correlation on this x-y pair

  • put the ItemIds of each pair, and their correlation, into a dataframe
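
Following up on the bottleneck and the z/w repetition mentioned above, here is a sketch of one way to restructure it (untested at scale; the helper name item_correlations and the idea of pre-grouping by ItemId are my own, not part of the original answer):

from itertools import combinations
from scipy.stats import pearsonr
import numpy as np
import pandas as pd

def item_correlations(df):
    """Pairwise item-item Pearson correlations on a long-format frame.
    Sketch only: assumes UserIds are non-negative integers."""
    n_users = df['UserId'].max() + 1
    # Build each item's dense rating vector once, instead of slicing
    # the whole frame twice inside the pair loop.
    vectors = {}
    for itemid, grp in df.groupby('ItemId'):
        v = np.full(n_users, np.nan)
        v[grp['UserId'].values] = grp['Rating'].values
        vectors[itemid] = v

    rows = []
    for i, j in combinations(sorted(vectors), 2):
        z, w = vectors[i], vectors[j]
        ok = ~(np.isnan(z) | np.isnan(w))  # users who rated both items
        if ok.sum() < 2:                   # pearsonr needs >= 2 points
            continue
        rows.append({'firstitem': i, 'seconditem': j,
                     'correlation': pearsonr(z[ok], w[ok])[0]})
    return pd.DataFrame(rows, columns=['firstitem', 'seconditem', 'correlation'])

# Usage with the question's random data generator:
df_out = item_correlations(randDf(500))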






answered Nov 27 '18 at 15:51 by cardamom, edited Nov 27 '18 at 16:01

























