How do I grep for lines containing either of two words, but not both?
I'm trying to use grep to show only lines containing either of the two words, if only one of them appears in the line, but not if they are in the same line.
So far I've tried grep pattern1 | grep pattern2 | ... but didn't get the result I expected.
(1) You talk about “words” and “patterns”. Which is it? Ordinary words like “quick”, “brown” and “fox”, or regular expressions like [a-z][a-z0-9]{,7}(\.[a-z0-9]{,3})+? (2) What if one of the words / patterns appears more than once in a line (and the other one doesn’t appear)? Is that equivalent to the word appearing once, or does it count as multiple occurrences?
– G-Man
yesterday
Question asked yesterday by Trasmos (new contributor); edited yesterday by Olorin.
7 Answers
A tool other than grep is the way to go.
Using perl, for instance, the command would be:
perl -ne 'print if /pattern1/ xor /pattern2/'
perl -ne runs the command given over each line of stdin, which in this case prints the line if it matches /pattern1/ xor /pattern2/, or in other words matches one pattern but not the other (exclusive or).
This works for the patterns in either order, should have better performance than multiple invocations of grep, and is less typing as well.
Or, even shorter, with awk:
awk 'xor(/pattern1/,/pattern2/)'
or for versions of awk that don't have xor:
awk '/pattern1/+/pattern2/==1'
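As a quick sanity check of the portable awk form, here is a minimal sketch run on made-up input. The words apple and pear and the four sample lines are assumptions for illustration; they are not from the question.

```shell
# Hypothetical sample input; "apple" and "pear" stand in for pattern1 and pattern2.
sample=$(printf '%s\n' 'apple pie' 'pear tart' 'apple and pear' 'plum jam')

# Each regex match evaluates to 1 or 0, so the sum equals 1 exactly when
# one pattern matches and the other does not. The parentheses are only
# there for readability.
xor_result=$(printf '%s\n' "$sample" | awk '(/apple/) + (/pear/) == 1')

printf '%s\n' "$xor_result"
```

Only the first two sample lines contain exactly one of the words, so those are the lines this should print.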
Nice - is the Awk xor available in GNU Awk only?
– steeldriver
yesterday
@steeldriver I think it's GNU only, yes. Or at least it's missing on older versions. You can replace it with /pattern1/+/pattern2/==1 if xor is missing.
– Chris
yesterday
Just curious, how could those methods be modified to be word-sensitive? The OP uses the phrase "two words".
– Jim L.
yesterday
@JimL. You could put word boundaries (\b) in the patterns themselves, i.e. \bword\b.
– wjandrea
yesterday
@Chris Are there any implementations one would run into that do have zero-width look-behind assertions and do not have a word boundary function?
– Caleb
14 hours ago
With GNU grep, you could pass both words to grep and then remove the lines containing both the patterns.
$ cat testfile.txt
abc
def
abc def
abc 123 def
1234
5678
1234 def abc
def abc
$ grep -w -e 'abc' -e 'def' testfile.txt | grep -v -e 'abc.*def' -e 'def.*abc'
abc
def
@Chris Thanks for the feedback, I've edited my answer.
– Haxiel
yesterday
Try with egrep
egrep 'pattern1|pattern2' file | grep -v -e 'pattern1.*pattern2' -e 'pattern2.*pattern1'
can also be written as grep -e foo -e bar | grep -v -e 'foo.*bar' -e 'bar.*foo'
– glenn jackman
yesterday
Also, note from the grep man page: "Direct invocation as either egrep or fgrep is deprecated" -- prefer grep -E
– glenn jackman
yesterday
That isn't in my OS @glennjackman
– Grump
yesterday
I'm on linux, so GNU coreutils
– glenn jackman
yesterday
In Boolean terms, you're looking for A xor B, which can be written as
(A and not B) or (B and not A)
Given that your question doesn't mention that you are concerned with the order of the output so long as the matching lines are shown, the Boolean expansion of A xor B is pretty darn simple in grep:
$ cat << EOF > foo
> a b
> a
> b
> c a
> c b
> b a
> b c
> EOF
$ grep -w 'a' foo | grep -vw 'b'; grep -w 'b' foo | grep -vw 'a';
a
c a
b
c b
b c
This works, but it will scramble the order of the file.
– Sparhawk
yesterday
@Sparhawk True, although "scramble" is a harsh word. ;) It lists all the 'a' matches first, in order, then all the 'b' matches next, in order. The OP didn't express any interest in maintaining the order, just show the lines. FAWK, the next step could be sort | uniq.
– Jim L.
yesterday
Fair call; I agree my language was inaccurate. I meant to imply that the original order would be changed.
– Sparhawk
yesterday
@Sparhawk ... And I edited in your observation for full disclosure.
– Jim L.
yesterday
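If the original line order matters, as raised in the comments, one possible workaround (a sketch, not part of the answer itself) is to tag each line with its number using grep -n, merge the two result sets, and sort numerically. It assumes the words being searched for cannot match the "N:" prefix that grep -n adds, which holds for a and b here. The temporary directory is only there to make the example self-contained.

```shell
# Recreate the sample file from the answer in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'a b' 'a' 'b' 'c a' 'c b' 'b a' 'b c' > "$workdir/foo"

# grep -n prefixes every line with "N:". Both halves of the xor are collected,
# sorted by that line number, and the prefix is stripped again with cut.
ordered=$(
  {
    grep -nw 'a' "$workdir/foo" | grep -vw 'b'
    grep -nw 'b' "$workdir/foo" | grep -vw 'a'
  } | sort -n | cut -d: -f2-
)

printf '%s\n' "$ordered"
rm -r "$workdir"
```

The result lists the same five lines as the answer, but in file order: a, b, c a, c b, b c.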
With grep implementations that support perl-like regular expressions (like pcregrep or GNU grep -P), you can do it in one grep invocation with:
grep -P '^(?=.*pat1)(?!.*pat2)|^(?=.*pat2)(?!.*pat1)'
That is, find the lines that match pat1 but not pat2, or pat2 but not pat1.
(?=...) and (?!...) are respectively look ahead and negative look ahead operators. So technically, the above looks for the beginning of the subject (^) provided it's followed by .*pat1 and not followed by .*pat2, or the same with pat1 and pat2 reversed.
That's suboptimal for lines that contain both patterns as they would then be looked for twice. You could instead use more advanced perl operators like:
grep -P '^(?=.*pat1|())(?(1)(?=.*pat2)|(?!.*pat2))'
(?(1)yespattern|nopattern) matches against yespattern if the 1st capture group (empty () above) matched, and nopattern otherwise. If that () matches, that means pat1 didn't match, so we look for pat2 (positive look ahead), and we look for not pat2 otherwise (negative look ahead).
With sed, you could write it:
sed -ne '/pat1/{/pat2/!p;d;}' -e '/pat2/p'
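To make the control flow of the sed script easier to follow, here is a minimal sketch with made-up input. The words apple and pear and the sample lines are assumptions for illustration only.

```shell
# Hypothetical sample input; "apple" and "pear" stand in for pat1 and pat2.
sample=$(printf '%s\n' 'apple pie' 'pear tart' 'apple and pear' 'plum jam')

# For lines matching apple: print unless pear also matches, then delete the
# line so the second expression is never reached. Lines that get past the
# first block cannot contain apple, so printing on pear is safe.
sed_result=$(printf '%s\n' "$sample" | sed -n -e '/apple/{/pear/!p;d;}' -e '/pear/p')

printf '%s\n' "$sed_result"
```

Here "apple pie" is printed by the first expression, "pear tart" by the second, and the other two lines are dropped.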
Your first solution fails with "grep: the -P option only supports a single pattern", at least on every system I have access to. +1 for your second solution, though.
– Chris
6 hours ago
@Chris, you're right. That seems to be a limitation specific to GNU grep. pcregrep and ast-open grep don't have that problem. I've replaced the multiple -e with the alternation RE operator, so it should work with GNU grep as well now.
– Stéphane Chazelas
6 hours ago
Yes, it works fine now.
– Chris
6 hours ago
For the following example:
# Patterns:
# apple
# pear
# Example line
line="a_apple_apple_pear_a"
This can be done purely with grep -E, sort -u, and wc.
# Grep for regex pattern, reduce to unique matches, and count the number of lines
result=$(grep -oE 'apple|pear' <<< "$line" | sort -u | wc -l)
If grep is compiled with Perl regular expressions then you can match on the last occurrence instead of needing to pipe to sort -u:
# Grep for regex pattern and count the number of lines
result=$(grep -oP '(apple(?!.*apple)|pear(?!.*pear))' <<< "$line" | wc -l)
Output the result:
# Exactly one of the words exists if the result is 1; both exist if it is 2
if ((result == 1)); then
    echo "Only one word matched"
elif ((result > 1)); then
    echo "Both words matched"
fi
A one-liner:
(($(grep -oP '(apple(?!.*apple)|pear(?!.*pear))' <<< "$line" | wc -l) == 1)) && echo "Only one word matched"
If you don't want to hard-code the pattern, assembling it with a variable set of elements can be automated with a function.
This can also be done natively in Bash as a function without pipes or additional processes but would be more involved and is probably outside the scope of your question.
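One way the process-free variant mentioned above might look is sketched below. The function name exactly_one and its argument order are hypothetical, not from the answer. It treats each word as a literal substring rather than a regular expression, and uses case so that it also runs in plain POSIX shells.

```shell
# Hypothetical helper: succeeds when exactly one of the given words occurs
# in the line. Usage: exactly_one LINE WORD...
exactly_one() {
  local line="$1" word count=0
  shift
  for word in "$@"; do
    # Substring test via a glob; the quoted "$word" is matched literally.
    case $line in
      *"$word"*) count=$((count + 1)) ;;
    esac
  done
  [ "$count" -eq 1 ]
}

line="a_apple_apple_pear_a"
if exactly_one "$line" apple pear; then
  echo "Only one word matched"
else
  echo "Zero or both words matched"
fi
```

Because the words are passed as arguments, the set of words is not hard-coded. For the example line, which contains both apple and pear, the else branch is taken.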
Without knowing Perl or Awk or other tools, grep can also do it:
grep -e pattern1 -e pattern2 | grep -v 'pattern1.*pattern2'
This will first select lines with either (using -e to specify each pattern), then pipe it to another grep command which filters out (-v is inverse matching) any lines containing pattern2 after pattern1.
To also filter lines that contain pattern1 after pattern2 (needed if the two can appear in either order), add another pipe with | grep -v 'pattern2.*pattern1'.
perl/awk answer: answered yesterday by Chris, edited yesterday.
GNU grep -w answer: answered yesterday by Haxiel, edited yesterday.
egrep answer: answered yesterday by msp9011, edited yesterday.
Boolean-expansion answer: answered yesterday by Jim L., edited yesterday.
edited 6 hours ago
answered 14 hours ago
Stéphane Chazelas
For the following example:
# Patterns:
# apple
# pear
# Example line
line="a_apple_apple_pear_a"
This can be done purely with grep -E, sort -u, and wc.
# Grep for the regex pattern, keep only unique matches, and count the number of lines
result=$(grep -oE 'apple|pear' <<< "$line" | sort -u | wc -l)
If grep is compiled with Perl regular expression support, you can match only the last occurrence of each word instead of piping to sort -u:
# Grep for the regex pattern and count the number of lines
result=$(grep -oP '(apple(?!.*apple)|pear(?!.*pear))' <<< "$line" | wc -l)
Output the result:
# Exactly one of the words exists if the result is 1; nothing is printed if neither matched
if ((result == 1)); then
    echo Only one word matched
elif ((result > 1)); then
    echo Both words matched
fi
A one-liner:
(($(grep -oP '(apple(?!.*apple)|pear(?!.*pear))' <<< "$line" | wc -l) == 1)) && echo Only one word matched
If you don't want to hard-code the pattern, assembling it with a variable set of elements can be automated with a function.
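One way to sketch that automation is shown below. It assumes the words are plain literals with no regex metacharacters and that none is a substring of another. The name count_distinct_words is hypothetical.

```shell
# count_distinct_words LINE WORD...
# Prints how many of the given words occur at least once in LINE.
# The body runs in a subshell ( ... ) so its variables stay local.
count_distinct_words() (
    line=$1
    shift
    pattern=
    for word in "$@"; do
        # Join the words with '|' to form an ERE alternation.
        pattern="${pattern:+$pattern|}$word"
    done
    # wc -l may pad its output with spaces; arithmetic expansion strips them.
    echo "$(( $(printf '%s\n' "$line" | grep -oE -- "$pattern" | sort -u | wc -l) ))"
)

count_distinct_words 'a_apple_apple_pear_a' apple pear   # prints 2
count_distinct_words 'a_apple_apple_a' apple pear        # prints 1
```

A result of exactly 1 then means that only one of the words is present.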
This can also be done natively in Bash as a function without pipes or additional processes but would be more involved and is probably outside the scope of your question.
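For what it's worth, a minimal sketch of such a function could look like this. It uses only shell built-ins (a case pattern match per word) and treats each word as a literal substring rather than a regular expression. The function name count_words_builtin and the variable matched_count are assumptions made for this example.

```shell
# count_words_builtin LINE WORD...
# Sets the global variable matched_count to the number of words found
# in LINE. No pipes, subshells or external commands are involved.
count_words_builtin() {
    local line="$1" word
    shift
    matched_count=0
    for word in "$@"; do
        case $line in
            # The quotes make the word match literally, not as a glob.
            *"$word"*) matched_count=$((matched_count + 1)) ;;
        esac
    done
}

count_words_builtin 'a_apple_apple_pear_a' apple pear
if [ "$matched_count" -eq 1 ]; then
    echo Only one word matched
elif [ "$matched_count" -gt 1 ]; then
    echo Both words matched
fi
# prints: Both words matched
```

Returning the count through a variable rather than printing it avoids the subshell that a command substitution would create.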
edited 21 hours ago
answered yesterday
Zhro
Without knowing Perl or Awk or other tools, grep can also do it:
grep -e pattern1 -e pattern2 | grep -v 'pattern1.*pattern2'
This first selects lines containing either pattern (using -e to specify each one), then pipes them to a second grep command that filters out (-v inverts the match) any lines containing pattern2 after pattern1.
Since the two patterns can appear in either order, add another pipe with | grep -v 'pattern2.*pattern1' to also filter out lines that contain pattern1 after pattern2.
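Both filters can be folded into a single second grep by giving it two -e expressions. The following is a sketch under that assumption; xor_two_greps and sample_lines are hypothetical names, and the patterns are treated as basic regular expressions.

```shell
# Hypothetical sample input covering both orders.
sample_lines() {
    printf '%s\n' 'apple only' 'pear only' 'apple and pear' \
        'pear and apple' 'neither'
}

# xor_two_greps PAT1 PAT2
# Stage 1 keeps lines matching either pattern.
# Stage 2 drops lines where both occur, in either order.
xor_two_greps() {
    grep -e "$1" -e "$2" |
        grep -v -e "${1}.*${2}" -e "${2}.*${1}"
}

sample_lines | xor_two_greps apple pear
# prints:
# apple only
# pear only
```

Note that the .* test assumes the two matches do not overlap within a line.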
answered 15 hours ago
Luc