comm -23 not deleting all common lines

I want to delete the lines from 1.txt that also appear in 2.txt and save the result to 3.txt.
I am using this bash command:

comm -23 1.txt 2.txt > 3.txt

When I check 3.txt, I find that some lines common to 1.txt and 2.txt are still there, for example the word "registry". What is the problem?

You can download the two files below:

file 1.txt : https://ufile.io/n7vn6

file 2.txt : https://ufile.io/p4s58

edited Nov 22 '18 at 15:42

chepner

254k34242335

asked Nov 22 '18 at 15:26

Youcef

366

I haven't checked your files but I am guessing you didn't sort them before using comm.

– mickp
Nov 22 '18 at 15:28

Is there extra whitespace you're not taking into account?

– glenn jackman
Nov 22 '18 at 15:43

They are sorted, and there is no extra space.

– Youcef
Nov 22 '18 at 16:53


2 Answers
I'm not sure how you generated your text files, but the problem is that some lines in 1.txt and 2.txt don't have consistent line endings. Some end with a CR character (ctrl-M) before the line feed, instead of the lone line feed Linux expects for text files. For example, one of the files has registry^M, which doesn't match registry: Linux programs that examine text treat ^M as just another character (or as white space), not as part of a line terminator to be ignored. Many text editors don't display the ^M, so registry appears identical in both places, but it isn't.
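If you want to check for this yourself, one option is to count the lines that end in a carriage return. This is only a minimal sketch: sample1.txt and sample2.txt are made-up stand-ins for your files, and printf is used to produce a literal CR so that it runs in plain sh as well as bash.

```shell
# Work in a scratch directory so no real files are touched (hypothetical sample data).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# sample1.txt contains one DOS-style line: "registry" followed by CR LF.
printf 'apple\nregistry\r\nzebra\n' > sample1.txt
# sample2.txt contains the same word with a plain LF ending.
printf 'registry\n' > sample2.txt

# A literal carriage return, obtained portably.
cr=$(printf '\r')

# Count the lines that end in CR in each file.
# grep -c exits non-zero when the count is 0, hence the "|| true".
crlf_in_1=$(grep -c "${cr}\$" sample1.txt || true)
crlf_in_2=$(grep -c "${cr}\$" sample2.txt || true)

echo "sample1.txt: $crlf_in_1 line(s) ending in CR"
echo "sample2.txt: $crlf_in_2 line(s) ending in CR"
```

A non-zero count for either of your real files would suggest mixed line endings are the cause.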

You could try:

dos2unix 1.txt 2.txt

comm -23 <(sort 1.txt) <(sort 2.txt) > 3.txt

dos2unix will convert any DOS-style CR LF line endings to the plain LF that Linux expects. Note that removing the CR can change the sort order slightly, so I'm also re-sorting the files. You can try this without re-sorting; if there's an issue, comm will report that one of the files isn't sorted.

edited Nov 22 '18 at 16:31

answered Nov 22 '18 at 16:06

lurker

44.8k74574

1

dos2unix solved it ! Thanks man!

– Youcef
Nov 22 '18 at 16:53

well, it would change the input files.

– hek2mgl
Nov 22 '18 at 16:53

All I need is the alphabetic characters. What would I do with those hidden characters related to some bizarre encoding? :D

– Youcef
Nov 22 '18 at 16:56

@hek2mgl that may be a good thing if the contents need fixing. Unclear what the overall use case is. The OP can choose other means of addressing the issue if needed now that they know the problem.

– lurker
Nov 22 '18 at 16:56

@Youcef it's not about alphabetic characters. This solution doesn't change the encoding, just the line endings.

– hek2mgl
Nov 22 '18 at 16:58


comm needs the input to be sorted. You can use process substitution for that:

comm -23 <(sort 1.txt) <(sort 2.txt) > 3.txt
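Whether a file counts as sorted depends on the collation rules in effect, so it can be worth checking with sort -c before blaming comm. A small sketch on made-up files (the file names, the check_sorted helper, and the choice of LC_ALL=C are assumptions for illustration, not something from the question):

```shell
# Hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'apple\nregistry\nzebra\n' > "$tmpdir/sorted.txt"
printf 'zebra\napple\nregistry\n' > "$tmpdir/unsorted.txt"

# sort -c only checks the order: it exits 0 if the file is sorted
# and non-zero otherwise, without writing any sorted output.
check_sorted() {
    if LC_ALL=C sort -c "$1" 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

sorted_result=$(check_sorted "$tmpdir/sorted.txt")
unsorted_result=$(check_sorted "$tmpdir/unsorted.txt")

echo "sorted.txt in order: $sorted_result"
echo "unsorted.txt in order: $unsorted_result"
```

Use the same locale for sort and comm (for example LC_ALL=C on both), otherwise the two tools may disagree about the order.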

Update: if you additionally have a problem with line endings, you can use sed to normalize them:

comm -23 <(sed 's/\r//g' 1.txt | sort) <(sed 's/\r//g' 2.txt | sort) > 3.txt
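To see the effect of stripping the CR without touching real data, here is a sketch on made-up sample files. It uses tr -d '\r' as a stand-in for sed and intermediate files instead of process substitution, so that it also runs under plain sh; both choices, the file names, and LC_ALL=C are assumptions for the illustration.

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# "registry" ends in CR LF in sample1.txt but in plain LF in sample2.txt.
printf 'apple\nregistry\r\nzebra\n' > sample1.txt
printf 'registry\nzebra\n' > sample2.txt

# Naive run: "registry<CR>" and "registry" are different lines to comm,
# so the word survives in the output.
LC_ALL=C comm -23 sample1.txt sample2.txt > naive.txt

# Strip every CR, sort, and compare again. The inputs are left unmodified.
tr -d '\r' < sample1.txt | LC_ALL=C sort > clean1.txt
tr -d '\r' < sample2.txt | LC_ALL=C sort > clean2.txt
LC_ALL=C comm -23 clean1.txt clean2.txt > fixed.txt

# grep -c exits non-zero when the count is 0, hence the "|| true".
naive_hits=$(grep -c registry naive.txt || true)
fixed_hits=$(grep -c registry fixed.txt || true)
echo "registry lines left, naive: $naive_hits, after stripping CR: $fixed_hits"
```

Unlike dos2unix, this leaves the original files as they were and writes cleaned copies instead.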

edited Nov 22 '18 at 16:53

answered Nov 22 '18 at 15:31

hek2mgl

108k13146170
