Find duplicate records in a CSV with a shell script (Ubuntu)
I have the CSV file below:
name,mobile
name1,123456
name2,98765
name1,123456
name3,98765
name1,123456
name4,344545443
If two records have the same mobile number, they are considered duplicates. When printing the duplicates, however, the first occurrence should be ignored.
So my output should look like this:
name,mobile
name1,123456
name1,123456
name2,98765
Here 123456 appears 3 times in my file, but I only want to print it twice: for me the first occurrence is unique and all later occurrences are duplicates.
I have tried:
awk -F, 'NR==FNR {++A[$2]; next} A[$2]>1' file1.csv file1.csv
It gives me
name1,123456
name2,98765
name1,123456
name3,98765
name1,123456
It is not ignoring the first occurrence.
Please help me with this.
csv awk
@NicoHaase awk -F, 'NR==FNR {++A[$2]; next} A[$2]>1' file1.csv file1.csv This is not ignoring the first occurrence
– user10676353
Nov 19 at 18:33
What happened to "name3" and "name4" in your output?
– glenn jackman
Nov 19 at 18:33
@glennjackman By using the above script I am getting the output below: name1,123456 name2,98765 name1,123456 name3,98765 name1,123456
– user10676353
Nov 19 at 18:35
@NicoHaase I have updated my question. Please have a look and help me to get out of it
– user10676353
Nov 19 at 18:46
edited Nov 19 at 18:49
glenn jackman
165k26142234
asked Nov 19 at 18:28
user10676353
133
1 Answer (accepted)
As I understand your question, you want to output records where the 2nd field occurs at least twice, but do not output the first instance.
awk -F, '++seen[$2] > 1' file
Given your sample data, this prints
name1,123456
name3,98765
name1,123456
These are lines 4, 5, and 6 of the input data.
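If the header row should also appear in the output, as in the question's expected result, the one-liner above can be extended. The following is only a sketch under stated assumptions: `print_repeats` is a hypothetical helper name, the header is assumed to be the first line, and the input is assumed to be plain comma-separated text with no quoted commas.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print the header
# plus every record whose 2nd field has already been seen earlier.
# Assumes simple CSV input with no quoted or embedded commas.
print_repeats() {
    # NR == 1 passes the header through unchanged.
    # ++seen[$2] > 1 is false for the first occurrence of a value
    # and true for every later occurrence.
    awk -F, 'NR == 1 || ++seen[$2] > 1' "$1"
}

# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name,mobile
name1,123456
name2,98765
name1,123456
name3,98765
name1,123456
name4,344545443
EOF

print_repeats "$sample"
rm -f "$sample"
```

On the sample data this prints the header followed by the same three records shown above. Because the counter is incremented while the file is read, a single pass is enough and the file does not need to be given twice.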
edited Nov 19 at 18:48
answered Nov 19 at 18:42
glenn jackman
165k26142234