Pandas Split DataFrame using row index

I want to split a dataframe into pieces with an uneven number of rows, using row indices as the split points.

The code below:

    groups = df.groupby((np.arange(len(df.index))/l[1]).astype(int))

works only when every piece has the same number of rows.

df

    a b c
    1 1 1
    2 2 2
    3 3 3
    4 4 4
    5 5 5
    6 6 6
    7 7 7

l = [2, 5, 7]

Expected result:

df1
    1 1 1
    2 2 2

df2
    3 3 3
    4 4 4
    5 5 5

df3
    6 6 6
    7 7 7

df4
    8 8 8
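To illustrate why the integer-division grouping only produces equal-sized pieces, here is a minimal sketch. The 8-row frame and the fixed `chunk_size` of 3 are assumptions made for the example; they stand in for the question's data and for `l[1]`.

```python
import numpy as np
import pandas as pd

# Hypothetical 8-row frame mirroring the example in the question.
df = pd.DataFrame({"a": range(1, 9), "b": range(1, 9), "c": range(1, 9)})

chunk_size = 3  # assumed fixed size, standing in for l[1] in the question
labels = np.arange(len(df)) // chunk_size  # 0, 0, 0, 1, 1, 1, 2, 2

# Every group gets chunk_size rows, except possibly the last one.
sizes = df.groupby(labels).size().tolist()
print(sizes)  # [3, 3, 2]
```

Because every row position is divided by the same number, the labels change at regular intervals, so this approach cannot express split points such as 2, 5 and 7.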

  • Have you tried df.loc?
    – Mohit Motwani
    Nov 20 '18 at 11:10

  • Do you want to split randomly, or do you have a set of indexes you'd like to split at?
    – Mohit Motwani
    Nov 20 '18 at 11:12

  • Not random, I would like to split based on array l: the first 2 rows, then the 3rd to 5th rows, and so on.
    – Pradeep Tummala
    Nov 21 '18 at 7:37

Tags: python, pandas, dataframe, pandas-groupby

asked Nov 20 '18 at 10:51 by Pradeep Tummala
edited Nov 20 '18 at 11:04 by anky_91

5 Answers

You could use a list comprehension after a small modification to your list, l.

    print(df)

       a  b  c
    0  1  1  1
    1  2  2  2
    2  3  3  3
    3  4  4  4
    4  5  5  5
    5  6  6  6
    6  7  7  7
    7  8  8  8

    l = [2, 5, 7]
    # Add the start (0) and the end (len(df), which equals max(l)+1 here) as boundaries
    l_mod = [0] + l + [len(df)]

    # Slice between each pair of consecutive boundaries
    list_of_dfs = [df.iloc[l_mod[n]:l_mod[n+1]] for n in range(len(l_mod)-1)]

Output:

    list_of_dfs[0]

       a  b  c
    0  1  1  1
    1  2  2  2

    list_of_dfs[1]

       a  b  c
    2  3  3  3
    3  4  4  4
    4  5  5  5

    list_of_dfs[2]

       a  b  c
    5  6  6  6
    6  7  7  7

    list_of_dfs[3]

       a  b  c
    7  8  8  8
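The boundary-pair idea in this answer can also be wrapped in a small function. The following is only a sketch: `split_at` is a hypothetical helper name (not part of pandas), and the 8-row frame is assumed to match the one printed above.

```python
import pandas as pd

def split_at(df, points):
    """Split df by position at each index in points (hypothetical helper)."""
    bounds = [0, *points, len(df)]
    # Pair each boundary with the next one, e.g. (0, 2), (2, 5), (5, 7), (7, 8).
    pieces = [df.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]
    # Drop empty pieces, e.g. when the last split point equals len(df).
    return [piece for piece in pieces if not piece.empty]

df = pd.DataFrame({"a": range(1, 9), "b": range(1, 9), "c": range(1, 9)})
parts = split_at(df, [2, 5, 7])
print([len(p) for p in parts])  # [2, 3, 2, 1]
```

Using `len(df)` as the final boundary keeps any rows after the last split point, however many there are.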

  • Thanks. Works pretty well in minimum lines.
    – Pradeep Tummala
    Nov 21 '18 at 7:47

  • @PradeepTummala if this answer helped you, would you consider upvoting and accepting?
    – Scott Boston
    Dec 4 '18 at 14:20

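I think this is what you are looking for.

    l = [2, 5, 7]
    dfs = []
    i = 0
    for val in l:
        if i == 0:
            # first chunk: everything before the first split point
            temp = df.iloc[:val]
        else:
            # later chunks: from the previous split point up to this one
            temp = df.iloc[l[i-1]:val]
        dfs.append(temp)
        i += 1
    # rows after the last split point, if there are any
    if l[-1] < len(df):
        dfs.append(df.iloc[l[-1]:])

    for d in dfs:
        print(d)

Output:

       a  b  c
    0  1  1  1
    1  2  2  2
       a  b  c
    2  3  3  3
    3  4  4  4
    4  5  5  5
       a  b  c
    5  6  6  6
    6  7  7  7

Another solution:

    l = [2, 5, 7]
    # One label per row; this assumes df has exactly l[-1] rows
    t = np.arange(l[-1])
    # Fill from the largest split point down, so each row ends up labelled
    # with the first split point greater than its position
    for val in reversed(l):
        t[:val] = val
    temp = pd.DataFrame(t)  # a single column named 0
    temp = pd.concat([df, temp], axis=1)
    for u, v in temp.groupby(0):
        print(v)

Output:

       a  b  c  0
    0  1  1  1  2
    1  2  2  2  2
       a  b  c  0
    2  3  3  3  5
    3  4  4  4  5
    4  5  5  5  5
       a  b  c  0
    5  6  6  6  7
    6  7  7  7  7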
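The second solution above builds one label per row and groups on it. As a sketch of the same idea without a helper column, `np.searchsorted` can compute the labels directly. The 7-row frame is an assumed stand-in for the data shown in the output, and `l` must be sorted in ascending order for this to work.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(1, 8), "b": range(1, 8), "c": range(1, 8)})
l = [2, 5, 7]

# For each row position, count how many split points are <= that position.
labels = np.searchsorted(l, np.arange(len(df)), side="right")
print(labels.tolist())  # [0, 0, 1, 1, 1, 2, 2]

# Group on the label array; the frame itself is left unchanged.
groups = [chunk for _, chunk in df.groupby(labels)]
print([len(g) for g in groups])  # [2, 3, 2]
```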

Do this:

    l = [2, 5, 7]
    c = 0
    d = dict()  # A dictionary to hold multiple dataframes

    # Note: rows are selected by the values in column a. This works here
    # because a runs 1, 2, 3, ... and df has the default 0-based index.
    for i in l:
        if c == 0:
            index_list = df[df.a <= i].index
        else:
            index_list = df[(df.a > l[c-1]) & (df.a <= l[c])].index
        min_index = index_list[0]
        max_index = index_list[-1] + 1
        d[i] = df.iloc[min_index:max_index]
        c += 1

    for key in d.keys():
        print(d[key])

       a  b  c
    0  1  1  1
    1  2  2  2
       a  b  c
    2  3  3  3
    3  4  4  4
    4  5  5  5
       a  b  c
    5  6  6  6
    6  7  7  7
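Since the loop above filters on the values of column `a`, it relies on `a` tracking the row position. A position-only variant that keeps the dictionary keyed by split point might look like the sketch below. The frame, with its letter index and unordered values, is made up purely to show that neither the values nor the index labels matter.

```python
import pandas as pd

# Hypothetical frame whose values and index labels are unrelated to row position.
df = pd.DataFrame(
    {"a": [10, 40, 20, 70, 30, 60, 50]},
    index=list("tuvwxyz"),
)
l = [2, 5, 7]

# Each chunk starts where the previous one ended; the first starts at 0.
starts = [0] + l[:-1]
# Key each chunk by its end position, slicing purely by position with iloc.
d = {stop: df.iloc[start:stop] for start, stop in zip(starts, l)}

print(d[5])  # the rows at positions 2, 3 and 4
```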

You can create an array to use for grouping via NumPy:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))

    L = [2, 5, 7]
    # Mark the positions listed in L, then take a running count so that
    # each row gets the number of its chunk (np.in1d in older NumPy versions)
    idx = np.cumsum(np.isin(np.arange(len(df.index)), L))

    for _, chunk in df.groupby(idx):
        print(chunk, '\n')

       a  b  c
    0  0  1  2
    1  3  4  5

        a   b   c
    2   6   7   8
    3   9  10  11
    4  12  13  14

        a   b   c
    5  15  16  17
    6  18  19  20

        a   b   c
    7  21  22  23

Instead of defining a new variable for each dataframe, you can use a dictionary:

    d = dict(tuple(df.groupby(idx)))

    print(d[1])  # print second groupby value

        a   b   c
    2   6   7   8
    3   9  10  11
    4  12  13  14
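To make the construction of the grouping array easier to follow, here is a sketch that prints the intermediate values for an 8-row frame. The names `positions` and `starts_chunk` are introduced only for this illustration.

```python
import numpy as np

n_rows = 8
L = [2, 5, 7]

positions = np.arange(n_rows)          # 0, 1, ..., 7
starts_chunk = np.isin(positions, L)   # True at positions 2, 5 and 7
idx = np.cumsum(starts_chunk)          # running count of chunk starts

print(starts_chunk.astype(int).tolist())  # [0, 0, 1, 0, 0, 1, 0, 1]
print(idx.tolist())                       # [0, 0, 1, 1, 1, 2, 2, 3]
```

Each `True` marks the first row of a new chunk, so the running count rises by one exactly at the split points and stays constant within a chunk.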

I think this is what you need:

    df = pd.DataFrame({'a': np.arange(1, 8),
                       'b': np.arange(1, 8),
                       'c': np.arange(1, 8)})
    print(df)

       a  b  c
    0  1  1  1
    1  2  2  2
    2  3  3  3
    3  4  4  4
    4  5  5  5
    5  6  6  6
    6  7  7  7

    last_check = 0
    dfs = []
    for ind in [2, 5, 7]:
        # .loc slices by label and includes the end label, hence ind-1
        dfs.append(df.loc[last_check:ind-1])
        last_check = ind

Although a list comprehension is usually more concise than a for loop, the running last_check value is needed here because the split points in your list follow no regular pattern.

    dfs[0]

       a  b  c
    0  1  1  1
    1  2  2  2

    dfs[2]

       a  b  c
    5  6  6  6
    6  7  7  7
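The loop above slices with `.loc`, which works by label and includes the end label. The sketch below contrasts it with positional `.iloc` slicing. The `shifted` frame, with index labels starting at 100, is an assumption added to show where the two diverge.

```python
import pandas as pd

df = pd.DataFrame({"a": range(1, 8)})

# .loc slices by label and includes the end label.
by_label = df.loc[2:4]
# .iloc slices by position and excludes the end, like ordinary Python slicing.
by_position = df.iloc[2:5]
print(by_label.equals(by_position))  # True

# With a non-default index, the label-based slice no longer means "rows 2 to 4".
shifted = df.copy()
shifted.index = range(100, 107)
print(len(shifted.loc[2:4]), len(shifted.iloc[2:5]))  # 0 3
```

If the frame might not have the default 0-based index, slicing with `.iloc` is the safer choice for splitting by row position.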

Answer using the l_mod list comprehension: answered Nov 20 '18 at 14:40 by Scott Boston
Answer using the "for val in l" loop and the groupby alternative: answered Nov 20 '18 at 11:13 by Mohamed Thasin ah (edited Nov 20 '18 at 11:31)
Answer using a dictionary d of dataframes: answered Nov 20 '18 at 11:20 by Mayank Porwal (edited Nov 20 '18 at 12:24)
                          0














                          You can create an array to use for indexing via NumPy:



                          import pandas as pd, numpy as np

                          df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))

                          L = [2, 5, 7]
                          idx = np.cumsum(np.in1d(np.arange(len(df.index)), L))

                          for _, chunk in df.groupby(idx):
                          print(chunk, 'n')

                          a b c
                          0 0 1 2
                          1 3 4 5

                          a b c
                          2 6 7 8
                          3 9 10 11
                          4 12 13 14

                          a b c
                          5 15 16 17
                          6 18 19 20

                          a b c
                          7 21 22 23


                          Instead of defining a new variable for each dataframe, you can use a dictionary:



                          d = dict(tuple(df.groupby(idx)))

                          print(d[1]) # print second groupby value

                          a b c
                          2 6 7 8
                          3 9 10 11
                          4 12 13 14





                          share|improve this answer


























                            0














                            You can create an array to use for indexing via NumPy:



                            import pandas as pd, numpy as np

                            df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))

                            L = [2, 5, 7]
                            idx = np.cumsum(np.in1d(np.arange(len(df.index)), L))

                            for _, chunk in df.groupby(idx):
                            print(chunk, 'n')

                            a b c
                            0 0 1 2
                            1 3 4 5

                            a b c
                            2 6 7 8
                            3 9 10 11
                            4 12 13 14

                            a b c
                            5 15 16 17
                            6 18 19 20

                            a b c
                            7 21 22 23


                            Instead of defining a new variable for each dataframe, you can use a dictionary:



                            d = dict(tuple(df.groupby(idx)))

                            print(d[1]) # print second groupby value

                            a b c
                            2 6 7 8
                            3 9 10 11
                            4 12 13 14





                            share|improve this answer
























                              0












                              0








                              0






                              You can create an array to use for indexing via NumPy:



                              import pandas as pd, numpy as np

                              df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))

                              L = [2, 5, 7]
                              idx = np.cumsum(np.in1d(np.arange(len(df.index)), L))

                              for _, chunk in df.groupby(idx):
                              print(chunk, 'n')

                              a b c
                              0 0 1 2
                              1 3 4 5

                              a b c
                              2 6 7 8
                              3 9 10 11
                              4 12 13 14

                              a b c
                              5 15 16 17
                              6 18 19 20

                              a b c
                              7 21 22 23


                              Instead of defining a new variable for each dataframe, you can use a dictionary:



                              d = dict(tuple(df.groupby(idx)))

                              print(d[1]) # print second groupby value

                              a b c
                              2 6 7 8
                              3 9 10 11
                              4 12 13 14





                              share|improve this answer












                              You can create an array to use for indexing via NumPy:



                              import pandas as pd, numpy as np

                              df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))

                              L = [2, 5, 7]
                              idx = np.cumsum(np.in1d(np.arange(len(df.index)), L))

                              for _, chunk in df.groupby(idx):
                              print(chunk, 'n')

                              a b c
                              0 0 1 2
                              1 3 4 5

                              a b c
                              2 6 7 8
                              3 9 10 11
                              4 12 13 14

                              a b c
                              5 15 16 17
                              6 18 19 20

                              a b c
                              7 21 22 23


Instead of defining a new variable for each dataframe, you can use a dictionary keyed by group label:

    d = dict(tuple(df.groupby(idx)))

    print(d[1])  # print the second group

        a   b   c
    2   6   7   8
    3   9  10  11
    4  12  13  14
















                              answered Nov 20 '18 at 14:04









                              jpp






































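I think this is what you need:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': np.arange(1, 8),
                       'b': np.arange(1, 8),
                       'c': np.arange(1, 8)})
    print(df)

       a  b  c
    0  1  1  1
    1  2  2  2
    2  3  3  3
    3  4  4  4
    4  5  5  5
    5  6  6  6
    6  7  7  7

    last_check = 0
    dfs = []
    for ind in [2, 5, 7]:
        # .loc slices by label and includes both endpoints, hence ind - 1
        # (this relies on the default 0..n-1 integer index)
        dfs.append(df.loc[last_check:ind - 1])
        last_check = ind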


A list comprehension is often more efficient than a for loop, but tracking last_check is necessary when the split points in your list do not follow a regular pattern.



    dfs[0]

       a  b  c
    0  1  1  1
    1  2  2  2

    dfs[2]

       a  b  c
    5  6  6  6
    6  7  7  7
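If the row labels are not the default 0..n-1 range, label-based .loc slicing may not pick the rows you expect. A minimal sketch of a position-based variant follows, assuming each value in the list is the exclusive end position of a chunk; split_points and starts are names chosen for this example only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(1, 8),
                   'b': np.arange(1, 8),
                   'c': np.arange(1, 8)})

split_points = [2, 5, 7]           # exclusive end position of each chunk
starts = [0] + split_points[:-1]   # each chunk starts where the previous one ended

# .iloc slices by position and excludes the stop value, so no "- 1" is needed
dfs = [df.iloc[start:stop] for start, stop in zip(starts, split_points)]

print([len(part) for part in dfs])  # [2, 3, 2]
```

Pairing each start with its end via zip removes the need for a running last_check variable while still handling irregular chunk sizes.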















                                      answered Nov 21 '18 at 9:37









                                      Mohit Motwani






























