How to get sum of values in column based on variables in other column separately? [duplicate]











up vote
4
down vote

favorite
1













This question already has an answer here:




  • How to calculate the sum of the data that have the same ID in the first column?

    4 answers




I have a table data like below



abc 1   1   1
bcd 2 2 4
bcd 12 23 3
cde 3 5 5
cde 3 4 5
cde 14 2 25


I want the sum of values in each column based on variables in first column and desired result is like below:



abc 1   1   1
bcd 14 25 7
cde 20 11 35


I used awk command like this



awk -F"t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath


and I got a result below:



abc 3
bcd 46
cde 66


I think the end of my code is wrong but don't know how to fix it.
I need some directions to fix the code.










share|improve this question









New contributor




awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac 2 days ago


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















    up vote
    4
    down vote

    favorite
    1













    This question already has an answer here:




    • How to calculate the sum of the data that have the same ID in the first column?

      4 answers




    I have a table data like below



    abc 1   1   1
    bcd 2 2 4
    bcd 12 23 3
    cde 3 5 5
    cde 3 4 5
    cde 14 2 25


    I want the sum of values in each column based on variables in first column and desired result is like below:



    abc 1   1   1
    bcd 14 25 7
    cde 20 11 35


    I used awk command like this



    awk -F"t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath


    and I got a result below:



    abc 3
    bcd 46
    cde 66


    I think the end of my code is wrong but don't know how to fix it.
    I need some directions to fix the code.










    share|improve this question









    New contributor




    awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.











    marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac 2 days ago


    This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

















      up vote
      4
      down vote

      favorite
      1









      up vote
      4
      down vote

      favorite
      1






      1






      This question already has an answer here:




      • How to calculate the sum of the data that have the same ID in the first column?

        4 answers




      I have a table data like below



      abc 1   1   1
      bcd 2 2 4
      bcd 12 23 3
      cde 3 5 5
      cde 3 4 5
      cde 14 2 25


      I want the sum of values in each column based on variables in first column and desired result is like below:



      abc 1   1   1
      bcd 14 25 7
      cde 20 11 35


      I used awk command like this



      awk -F"t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath


      and I got a result below:



      abc 3
      bcd 46
      cde 66


      I think the end of my code is wrong but don't know how to fix it.
      I need some directions to fix the code.










      share|improve this question









      New contributor




      awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.












      This question already has an answer here:




      • How to calculate the sum of the data that have the same ID in the first column?

        4 answers




      I have a table data like below



      abc 1   1   1
      bcd 2 2 4
      bcd 12 23 3
      cde 3 5 5
      cde 3 4 5
      cde 14 2 25


      I want the sum of values in each column based on variables in first column and desired result is like below:



      abc 1   1   1
      bcd 14 25 7
      cde 20 11 35


      I used awk command like this



      awk -F"t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath


      and I got a result below:



      abc 3
      bcd 46
      cde 66


      I think the end of my code is wrong but don't know how to fix it.
      I need some directions to fix the code.





      This question already has an answer here:




      • How to calculate the sum of the data that have the same ID in the first column?

        4 answers








      shell-script text-processing awk numeric-data






      share|improve this question









      New contributor




      awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question









      New contributor




      awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question








      edited 2 days ago









      terdon

      126k31243419




      126k31243419






      New contributor




      awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 2 days ago









      awkprob

      232




      232




      New contributor




      awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      awkprob is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.




      marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac 2 days ago


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.






      marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac 2 days ago


      This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
























          3 Answers
          3






          active

          oldest

          votes

















          up vote
          4
          down vote



          accepted










          You were fairly close. 
          You see what you were doing wrong, don't you? 
          You were keeping one total for each column 1 value,
          when you should have been keeping three.



          This is similar to Inian's answer,
          but trivially extendable to handle any number of columns:



          awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}
          END {for(i in a) {
          printf "%s", i
          for (n=2; n<=4; ++n) printf "t%s", a[i][n]
          printf "n"
          }
          }'


          Rather than keep three arrays, like Inian's answer,
          it keeps a two-dimensional array.






          share|improve this answer





















          • Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
            – terdon
            2 days ago










          • @terdon: Thanks for dropping by.  "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX  Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort.  Your enhancement would output the columns in random order, with no way to fix it by post-processing.
            – Scott
            2 days ago










          • Ah, yes indeed. Fair point.
            – terdon
            2 days ago










          • @Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
            – awkprob
            2 days ago










          • @Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
            – awkprob
            yesterday


















          up vote
          4
          down vote













          So long as your file is tab-delimited, datamash is a good fit for this.



          $ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath
          abc 1 1 1
          bcd 14 25 7
          cde 20 11 35


          Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.



          Datamash won't work if your input is delimited by arbitrary whitespace (i.e. possible multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash:



          sed -i 's/ +/t/g' tablefilepath





          share|improve this answer

















          • 1




            At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
            – steeldriver
            2 days ago












          • @steeldriver Thanks!
            – cryptarch
            2 days ago


















          up vote
          2
          down vote













          Using awk summing up the columns 2-4 based on 1.



          awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file





          share|improve this answer




























            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            4
            down vote



            accepted










            You were fairly close. 
            You see what you were doing wrong, don't you? 
            You were keeping one total for each column 1 value,
            when you should have been keeping three.



            This is similar to Inian's answer,
            but trivially extendable to handle any number of columns:



            awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}
            END {for(i in a) {
            printf "%s", i
            for (n=2; n<=4; ++n) printf "t%s", a[i][n]
            printf "n"
            }
            }'


            Rather than keep three arrays, like Inian's answer,
            it keeps a two-dimensional array.






            share|improve this answer





















            • Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
              – terdon
              2 days ago










            • @terdon: Thanks for dropping by.  "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX  Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort.  Your enhancement would output the columns in random order, with no way to fix it by post-processing.
              – Scott
              2 days ago










            • Ah, yes indeed. Fair point.
              – terdon
              2 days ago










            • @Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
              – awkprob
              2 days ago










            • @Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
              – awkprob
              yesterday















            up vote
            4
            down vote



            accepted










            You were fairly close. 
            You see what you were doing wrong, don't you? 
            You were keeping one total for each column 1 value,
            when you should have been keeping three.



            This is similar to Inian's answer,
            but trivially extendable to handle any number of columns:



            awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}
            END {for(i in a) {
            printf "%s", i
            for (n=2; n<=4; ++n) printf "t%s", a[i][n]
            printf "n"
            }
            }'


            Rather than keep three arrays, like Inian's answer,
            it keeps a two-dimensional array.






            share|improve this answer





















            • Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
              – terdon
              2 days ago










            • @terdon: Thanks for dropping by.  "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX  Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort.  Your enhancement would output the columns in random order, with no way to fix it by post-processing.
              – Scott
              2 days ago










            • Ah, yes indeed. Fair point.
              – terdon
              2 days ago










            • @Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
              – awkprob
              2 days ago










            • @Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
              – awkprob
              yesterday













            up vote
            4
            down vote



            accepted







            up vote
            4
            down vote



            accepted






            You were fairly close. 
            You see what you were doing wrong, don't you? 
            You were keeping one total for each column 1 value,
            when you should have been keeping three.



            This is similar to Inian's answer,
            but trivially extendable to handle any number of columns:



            awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}
            END {for(i in a) {
            printf "%s", i
            for (n=2; n<=4; ++n) printf "t%s", a[i][n]
            printf "n"
            }
            }'


            Rather than keep three arrays, like Inian's answer,
            it keeps a two-dimensional array.






            share|improve this answer












            You were fairly close. 
            You see what you were doing wrong, don't you? 
            You were keeping one total for each column 1 value,
            when you should have been keeping three.



            This is similar to Inian's answer,
            but trivially extendable to handle any number of columns:



            awk -F"t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}
            END {for(i in a) {
            printf "%s", i
            for (n=2; n<=4; ++n) printf "t%s", a[i][n]
            printf "n"
            }
            }'


            Rather than keep three arrays, like Inian's answer,
            it keeps a two-dimensional array.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered 2 days ago









            Scott

            6,76642650




            6,76642650












            • Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
              – terdon
              2 days ago










            • @terdon: Thanks for dropping by.  "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX  Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort.  Your enhancement would output the columns in random order, with no way to fix it by post-processing.
              – Scott
              2 days ago










            • Ah, yes indeed. Fair point.
              – terdon
              2 days ago










            • @Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
              – awkprob
              2 days ago










            • @Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
              – awkprob
              yesterday


















            • Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
              – terdon
              2 days ago










            • @terdon: Thanks for dropping by.  "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX  Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort.  Your enhancement would output the columns in random order, with no way to fix it by post-processing.
              – Scott
              2 days ago










            • Ah, yes indeed. Fair point.
              – terdon
              2 days ago










            • @Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
              – awkprob
              2 days ago










            • @Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
              – awkprob
              yesterday
















            Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
            – terdon
            2 days ago




            Why limit it at all? Why not awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array so you don't need to keep track of its size?
            – terdon
            2 days ago












            @terdon: Thanks for dropping by.  "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX  Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort.  Your enhancement would output the columns in random order, with no way to fix it by post-processing.
            – Scott
            2 days ago




            @terdon: Thanks for dropping by.  "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." — POSIX  Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort.  Your enhancement would output the columns in random order, with no way to fix it by post-processing.
            – Scott
            2 days ago












            Ah, yes indeed. Fair point.
            – terdon
            2 days ago




            Ah, yes indeed. Fair point.
            – terdon
            2 days ago












            @Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
            – awkprob
            2 days ago




            @Scott: Thanks for the direction!. Now I can see what was wrong with my code. But when I try your code, I get syntax error message "awk: line 1: syntax error at or near [ ". Is this caused by variables expansion problem or escaping problem? It's difficult to find the reason.
            – awkprob
            2 days ago












            @Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
            – awkprob
            yesterday




            @Scott: I'm running Linux ubuntu 14.04 and after gnu awk installation, 'awk --version' say GNU Awk 3.1.8. But still have syntax error
            – awkprob
            yesterday












            up vote
            4
            down vote













            So long as your file is tab-delimited, datamash is a good fit for this.



            $ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath
            abc 1 1 1
            bcd 14 25 7
            cde 20 11 35


            Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.



            Datamash won't work if your input is delimited by arbitrary whitespace (i.e. possible multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash:



            sed -i 's/ +/t/g' tablefilepath





            share|improve this answer

















            • 1




              At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
              – steeldriver
              2 days ago












            • @steeldriver Thanks!
              – cryptarch
              2 days ago















            up vote
            4
            down vote













            So long as your file is tab-delimited, datamash is a good fit for this.



            $ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath
            abc 1 1 1
            bcd 14 25 7
            cde 20 11 35


            Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.



            Datamash won't work if your input is delimited by arbitrary whitespace (i.e. possible multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash:



            sed -i 's/ +/t/g' tablefilepath





            share|improve this answer

















            • 1




              At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
              – steeldriver
              2 days ago












            • @steeldriver Thanks!
              – cryptarch
              2 days ago













            up vote
            4
            down vote










            up vote
            4
            down vote









            So long as your file is tab-delimited, datamash is a good fit for this.



            $ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath
            abc 1 1 1
            bcd 14 25 7
            cde 20 11 35


            Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.



            Datamash won't work if your input is delimited by arbitrary whitespace (i.e. possible multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash:



            sed -i 's/ +/t/g' tablefilepath





            share|improve this answer












            So long as your file is tab-delimited, datamash is a good fit for this.



            $ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath
            abc 1 1 1
            bcd 14 25 7
            cde 20 11 35


            Datamash can also work with non-tabs, if you specify -t <delimiter>. But tabs seem closest to the example input you have provided.



            Datamash won't work if your input is delimited by arbitrary whitespace (i.e. possible multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash:



            sed -i 's/ +/t/g' tablefilepath






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered 2 days ago









            cryptarch

            3766




            3766








            • 1




              At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
              – steeldriver
              2 days ago












            • @steeldriver Thanks!
              – cryptarch
              2 days ago














            • 1




              At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
              – steeldriver
              2 days ago












            • @steeldriver Thanks!
              – cryptarch
              2 days ago








            1




            1




            At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
            – steeldriver
            2 days ago






            At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
            – steeldriver
            2 days ago














            @steeldriver Thanks!
            – cryptarch
            2 days ago




            @steeldriver Thanks!
            – cryptarch
            2 days ago










            up vote
            2
            down vote













            Using awk summing up the columns 2-4 based on 1.



            awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file





            share|improve this answer

























              up vote
              2
              down vote













              Using awk summing up the columns 2-4 based on 1.



              awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file





              share|improve this answer























                up vote
                2
                down vote










                up vote
                2
                down vote









                Using awk summing up the columns 2-4 based on 1.



                awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file





                share|improve this answer












                Using awk summing up the columns 2-4 based on 1.



                awk -v FS="t" -v OFS="t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i]  }' file






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered 2 days ago









                Inian

                3,805824




                3,805824















                    Popular posts from this blog

                    Paul Cézanne

                    UIScrollView CustomStickyHeader Resize height generates problems when scroll is too fast

                    Angular material date-picker (MatDatepicker) auto completes the date on focus out