Find duplicate records in a CSV file with a shell script (Ubuntu)

I have the CSV below:

name,mobile
name1,123456
name2,98765
name1,123456
name3,98765
name1,123456
name4,344545443

If two records have the same mobile number, they are considered duplicates. But when printing the duplicate records, the first occurrence has to be ignored.

So my output should be like this

name,mobile
name1,123456
name1,123456
name2,98765

Here 123456 appears three times in my file, but I only want to print it twice: for me the first occurrence is unique and every later occurrence is a duplicate.

I have tried

awk -F, 'NR==FNR {++A[$2]; next} A[$2]>1'  file1.csv file1.csv

It gives me

name1,123456
name2,98765
name1,123456
name3,98765
name1,123456

It is not ignoring the first occurrence.

Please help me with this.

edited Nov 19 at 18:49

glenn jackman

asked Nov 19 at 18:28

user10676353

@NicoHaase awk -F, 'NR==FNR {++A[$2]; next} A[$2]>1' file1.csv file1.csv This is not ignoring the first occurrence
– user10676353
Nov 19 at 18:33

What happened to "name3" and "name4" in your output?
– glenn jackman
Nov 19 at 18:33

@glennjackman Using the script above, I get this output: name1,123456 name2,98765 name1,123456 name3,98765 name1,123456
– user10676353
Nov 19 at 18:35

@NicoHaase I have updated my question. Please have a look and help me sort this out.
– user10676353
Nov 19 at 18:46

Tags: csv, awk

1 Answer (accepted)

As I understand your question, you want to output every record whose 2nd field has already appeared earlier in the file, that is, every occurrence of a repeated value except the first.

awk -F, '++seen[$2] > 1' file

Given your sample data, this prints

name1,123456
name3,98765
name1,123456

These are lines 4, 5 and 6 of the input data (counting the header as line 1).
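For anyone who wants to try this, here is a minimal sketch that recreates the sample data in a temporary file and runs the one-liner against it. The temporary file and the variable names (`tmp`, `dups`, `with_header`) are my own choices for illustration. The second command is only a guess at what the question wants: since the expected output there starts with `name,mobile`, it also prints the header line.

```shell
# Recreate the sample data from the question in a temporary file
# (the file itself is an assumption made for this demo).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
name,mobile
name1,123456
name2,98765
name1,123456
name3,98765
name1,123456
name4,344545443
EOF

# seen[$2] counts how often each mobile number has appeared so far.
# The pre-increment yields 1 on the first sighting, so the condition is
# false there and true for every later repeat of the same number.
dups=$(awk -F, '++seen[$2] > 1' "$tmp")

# Variant (assumed requirement): always print the header line as well.
# NR == 1 short-circuits the ||, so the header is never counted in seen[].
with_header=$(awk -F, 'NR == 1 || ++seen[$2] > 1' "$tmp")

printf '%s\n' "$dups"
printf '%s\n' "$with_header"
rm -f "$tmp"
```

Because the counter is updated and tested in a single pass, the file only needs to be read once, unlike the two-pass `NR==FNR` approach in the question, which knows the final count of every value before printing and therefore cannot tell the first occurrence from the later ones.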

edited Nov 19 at 18:48

answered Nov 19 at 18:42

glenn jackman