How to find the category of keywords in a log file using a Python script?
My input file:
INCIDENT 677700 password reset
INCIDENT 677742 C: drive full
INCIDENT 500901 mouse not working
INCIDENT 500942 unable to connect oracle box
INCIDENT 500949 high cpu utilization
INCIDENT 600901 sql server clustering failed
INCIDENT 490203 Low disk space issue
INCIDENT 10I891 Lotus Notes client failed
INCIDENT 489011 Low disk space issue
INCIDENT 89G901 SSIS Load failed
words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]
I would like to assign a category to each line. My output file should look like this (and I would also like to know how to add new words in the future):
Password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
Oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed
python
edited Nov 21 '18 at 5:07 by DYZ
asked Nov 21 '18 at 4:34 by Gower
So, what have you tried?
– Andreas
Nov 21 '18 at 4:37
What if there are two keywords in one entry?
– NotAnAmbiTurner
Nov 21 '18 at 5:00
1 Answer
Just iterate over each line and each word; if the word appears in the line, write a new line of the form word,line to the output file.
Some assumptions:
- You want case-insensitive matching, e.g. Disk space and disk space would match.
- Each entry in the log file contains only one matching word. If it contains more, the line is written once per matching word.
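If an entry can contain more than one keyword (as NotAnAmbiTurner asks in the comments on the question), one option is to collect every matching keyword for a line and write them together, instead of writing the line once per match. A minimal sketch, assuming a hypothetical helper named categorize and the same case-insensitive substring matching as the demo; the sample entry is made up:

```python
def categorize(line, words):
    """Return every keyword from `words` that appears in `line`, ignoring case.

    `categorize` is a hypothetical helper name used for illustration.
    """
    lowered = line.lower()
    return [word for word in words if word.lower() in lowered]


words = ["password", "drive full", "disk space", "SSIS", "sql server",
         "cpu utilization", "oracle", "Lotus Notes", "mouse"]

# A made-up entry that mentions two keywords
print(categorize("INCIDENT 123456 oracle box low disk space", words))
# ['disk space', 'oracle']
```

Keywords come back in the order of the words list, so joining them (for example with ";".join(...)) gives exactly one output line per log entry.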
Demo:
words = [
    "password",
    "drive full",
    "disk space",
    "SSIS",
    "sql server",
    "cpu utilization",
    "oracle",
    "Lotus Notes",
    "mouse",
]

with open("input.txt") as file, open("output.txt", "w") as out:
    for line in file:
        # Strip the newline so the last line (which may lack one) is handled the same way
        line = line.rstrip("\n")
        for word in words:
            # Case-insensitive match: compare lowercase versions
            if word.lower() in line.lower():
                out.write("%s,%s\n" % (word, line))
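Because the output is comma-separated, a log line that itself contains a comma would produce an ambiguous row. A sketch using the standard-library csv module, which quotes such fields automatically; write_categories is a hypothetical helper, and io.StringIO stands in for the real input.txt and output.txt files (the second sample line is made up):

```python
import csv
import io


def write_categories(lines, words, out):
    """Write one CSV row (keyword, line) per keyword found in each line.

    `write_categories` is a hypothetical helper; `lines` is any iterable of
    strings (e.g. an open file) and `out` is any writable text file object.
    """
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.rstrip("\n")  # csv.writer adds its own line terminator
        for word in words:
            if word.lower() in line.lower():
                writer.writerow([word, line])


words = ["password", "drive full", "mouse"]
sample = io.StringIO(
    "INCIDENT 677700 password reset\n"
    "INCIDENT 500901 mouse, keyboard not working\n"
)
out = io.StringIO()
write_categories(sample, words, out)
print(out.getvalue())
```

With real files, open the output as open("output.txt", "w", newline=""), as the csv documentation recommends for writer objects.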
output.txt:
password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed
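For the "how to add future words" part of the question, one option is to keep the keywords in a plain-text file, one per line, so new categories can be added without editing the script. A sketch, assuming a hypothetical keywords.txt with that layout; blank lines and lines starting with # are skipped, and io.StringIO stands in for the file here:

```python
import io


def load_keywords(fp):
    """Read one keyword per line from a text file object.

    Blank lines and lines starting with '#' are ignored. The file layout
    (and the name keywords.txt) is an assumption for illustration.
    """
    keywords = []
    for raw in fp:
        word = raw.strip()
        if word and not word.startswith("#"):
            keywords.append(word)
    return keywords


# Stand-in for open("keywords.txt"); add a new category by appending a line.
sample = io.StringIO("# categories\npassword\ndrive full\n\nERROR_BAD_ENVIRONMENT\n")
words = load_keywords(sample)
print(words)  # ['password', 'drive full', 'ERROR_BAD_ENVIRONMENT']
```

With a real file: with open("keywords.txt") as fp: words = load_keywords(fp), then run either loop above unchanged.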
You can also condense the two nested loops with itertools.product():
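from itertools import product

with open("input.txt") as file, open("output.txt", "w") as out:
    # product() yields every (line, word) pair; note it reads the whole file into memory first
    for line, word in product(file, words):
        if word.lower() in line.lower():
            out.write("%s,%s\n" % (word, line.rstrip("\n")))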
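Substring matching can also give false positives, e.g. mouse would match a made-up entry mentioning a mousepad. A sketch of an alternative (a swapped-in technique, not part of the answer above) that builds one case-insensitive regular expression from the keyword list, using \b word boundaries so only whole words match, and scans each line once; first_category and canonical are hypothetical names, and two of the sample lines are made up:

```python
import re

words = ["password", "drive full", "disk space", "SSIS", "sql server",
         "cpu utilization", "oracle", "Lotus Notes", "mouse"]

# Longest keywords first so a longer keyword wins over one it contains;
# re.escape guards against special characters in future keywords.
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(words, key=len, reverse=True))) + r")\b",
    re.IGNORECASE,
)

# Map the lower-cased match back to the keyword as written in the list.
canonical = {word.lower(): word for word in words}


def first_category(line):
    """Return the first whole-word keyword found in `line`, or None."""
    match = pattern.search(line)
    return canonical[match.group(1).lower()] if match else None


print(first_category("INCIDENT 500901 mouse not working"))  # mouse
print(first_category("INCIDENT 500999 mousepad missing"))   # None
print(first_category("INCIDENT 89G901 ssis Load failed"))   # SSIS
```

Compiling the pattern once keeps the per-line work to a single search, which may matter when the keyword list grows.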
edited Nov 21 '18 at 5:06
answered Nov 21 '18 at 4:47 by RoadRunner
Without using words = ["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"], is it possible to identify categories?
– Gower
Nov 21 '18 at 6:09
How can I identify new features? Tomorrow I may get something like INCIDENT 500986 ERROR_BAD_ENVIRONMENT. How can I capture that?
– Gower
Nov 21 '18 at 10:01
@Gower Where are you getting these features from?
– RoadRunner
Nov 21 '18 at 10:05
The incident, release, and problem ticket data is located in a SQL Server database; I have to pull it from there and categorize it.
– Gower
Nov 21 '18 at 10:26
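Regarding the follow-up comments (new entries such as ERROR_BAD_ENVIRONMENT, or finding categories without a predefined list): plain keyword matching cannot invent categories, but it can flag what it misses. A minimal sketch, assuming the same substring matching as the answer: unmatched entries are collected, and collections.Counter counts the words in their descriptions so frequent ones can be reviewed and added to the keyword list. split_by_category and candidate_keywords are hypothetical names, the "INCIDENT <id> <text>" layout is assumed from the sample input only, and the 500987 entry is made up:

```python
from collections import Counter


def split_by_category(lines, words):
    """Return (matched, unmatched); matched is a list of (keyword, line) pairs."""
    matched, unmatched = [], []
    for line in lines:
        line = line.strip()
        hits = [w for w in words if w.lower() in line.lower()]
        if hits:
            matched.extend((w, line) for w in hits)
        else:
            unmatched.append(line)
    return matched, unmatched


def candidate_keywords(unmatched, top=5):
    """Count description words of unmatched lines (assumes 'INCIDENT <id> <text>')."""
    counts = Counter()
    for line in unmatched:
        parts = line.split(maxsplit=2)
        if len(parts) == 3:
            counts.update(parts[2].lower().split())
    return counts.most_common(top)


words = ["password", "mouse"]
lines = [
    "INCIDENT 677700 password reset",
    "INCIDENT 500986 ERROR_BAD_ENVIRONMENT",
    "INCIDENT 500987 ERROR_BAD_ENVIRONMENT on build server",
]
matched, unmatched = split_by_category(lines, words)
print(matched)
print(candidate_keywords(unmatched))
```

In practice you would probably filter out common words such as "on" before reviewing the counts; the same functions work on rows pulled from the SQL Server tables once they are turned into strings.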