Find duplicate records in a CSV file with a shell script (Ubuntu)











I have the CSV file below:



name,mobile
name1,123456
name2,98765
name1,123456
name3,98765
name1,123456
name4,344545443


If two records have the same mobile number, they are considered duplicates. When printing the duplicate records, however, the first occurrence has to be ignored.



So my output should look like this:



name,mobile
name1,123456
name1,123456
name2,98765


Here 123456 appears three times in my file, but I only want to print it twice: for me the first occurrence is unique and every later occurrence is a duplicate.



I have tried:



awk -F, 'NR==FNR {++A[$2]; next} A[$2]>1'  file1.csv file1.csv


It gives me:



name1,123456
name2,98765
name1,123456
name3,98765
name1,123456


It does not ignore the first occurrence.



Please help me with this.










  • @NicoHaase I tried awk -F, 'NR==FNR {++A[$2]; next} A[$2]>1' file1.csv file1.csv, but it does not ignore the first occurrence.
    – user10676353
    Nov 19 at 18:33










  • What happened to "name3" and "name4" in your output?
    – glenn jackman
    Nov 19 at 18:33










  • @glennjackman Using the script above, I get the following output: name1,123456 name2,98765 name1,123456 name3,98765 name1,123456
    – user10676353
    Nov 19 at 18:35










  • @NicoHaase I have updated my question. Please have a look and help me sort this out.
    – user10676353
    Nov 19 at 18:46















Tags: csv, awk






asked Nov 19 at 18:28 by user10676353
edited Nov 19 at 18:49 by glenn jackman












1 Answer
accepted










As I understand your question, you want to output the records whose 2nd field occurs at least twice, but without printing the first instance.



awk -F, '++seen[$2] > 1' file


Given your sample data, this prints



name1,123456
name3,98765
name1,123456


These are lines 4, 5 and 6 of the input data (counting the header as line 1).
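
The one-liner treats the header like any other row, so it is not printed. If the header should appear in the output, as in the expected output in the question, a minimal sketch could look like the following. The function name `print_repeats` and the temporary sample file are made up for illustration; the sketch assumes the header is always the first line and that the mobile number is the 2nd comma-separated field.

```shell
# Hypothetical helper: print the header, then every row whose 2nd field
# (the mobile number) has already appeared in an earlier row.
print_repeats() {
    awk -F, '
        NR == 1 { print; next }      # pass the header through without counting it
        {
            seen[$2]++               # count this occurrence of the mobile number
            if (seen[$2] > 1) print  # 2nd and later occurrences are duplicates
        }
    ' "$1"
}

# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name,mobile
name1,123456
name2,98765
name1,123456
name3,98765
name1,123456
name4,344545443
EOF

print_repeats "$sample"
rm -f "$sample"
```

With the sample data this should print the header followed by the same three rows as above. The two-pass command from the question behaves differently because its first pass counts every row before anything is printed, so in the second pass the first row of a repeated number already has a count above 1.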






answered Nov 19 at 18:42 by glenn jackman
edited Nov 19 at 18:48





























