How to find the category of a key words from log file using python script?












-1















my input file



INCIDENT 677700 password reset
INCIDENT 677742 C: drive full
INCIDENT 500901 mouse not working
INCIDENT 500942 unable to connect oracle box
INCIDENT 500949 high cpu utilization
INCIDENT 600901 sql server clustering failed
INCIDENT 490203 Low disk space issue
INCIDENT 10I891 Lotus Notes client failed
INCIDENT 489011 Low disk space issue
INCIDENT 89G901 SSIS Load failed

words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]


I would like assign category in my output file should be: and how to add future words.



Password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
Oracle,500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space, INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed









share|improve this question

























  • So, what have you tried?

    – Andreas
    Nov 21 '18 at 4:37











  • What if there are two keywords in one entry?

    – NotAnAmbiTurner
    Nov 21 '18 at 5:00
















-1















my input file



INCIDENT 677700 password reset
INCIDENT 677742 C: drive full
INCIDENT 500901 mouse not working
INCIDENT 500942 unable to connect oracle box
INCIDENT 500949 high cpu utilization
INCIDENT 600901 sql server clustering failed
INCIDENT 490203 Low disk space issue
INCIDENT 10I891 Lotus Notes client failed
INCIDENT 489011 Low disk space issue
INCIDENT 89G901 SSIS Load failed

words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]


I would like assign category in my output file should be: and how to add future words.



Password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
Oracle,500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space, INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed









share|improve this question

























  • So, what have you tried?

    – Andreas
    Nov 21 '18 at 4:37











  • What if there are two keywords in one entry?

    – NotAnAmbiTurner
    Nov 21 '18 at 5:00














-1












-1








-1








my input file



INCIDENT 677700 password reset
INCIDENT 677742 C: drive full
INCIDENT 500901 mouse not working
INCIDENT 500942 unable to connect oracle box
INCIDENT 500949 high cpu utilization
INCIDENT 600901 sql server clustering failed
INCIDENT 490203 Low disk space issue
INCIDENT 10I891 Lotus Notes client failed
INCIDENT 489011 Low disk space issue
INCIDENT 89G901 SSIS Load failed

words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]


I would like assign category in my output file should be: and how to add future words.



Password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
Oracle,500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space, INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed









share|improve this question
















my input file



INCIDENT 677700 password reset
INCIDENT 677742 C: drive full
INCIDENT 500901 mouse not working
INCIDENT 500942 unable to connect oracle box
INCIDENT 500949 high cpu utilization
INCIDENT 600901 sql server clustering failed
INCIDENT 490203 Low disk space issue
INCIDENT 10I891 Lotus Notes client failed
INCIDENT 489011 Low disk space issue
INCIDENT 89G901 SSIS Load failed

words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]


I would like assign category in my output file should be: and how to add future words.



Password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
Oracle,500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space, INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed






python






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 21 '18 at 5:07









DYZ

26.2k61948




26.2k61948










asked Nov 21 '18 at 4:34









GowerGower

405




405













  • So, what have you tried?

    – Andreas
    Nov 21 '18 at 4:37











  • What if there are two keywords in one entry?

    – NotAnAmbiTurner
    Nov 21 '18 at 5:00



















  • So, what have you tried?

    – Andreas
    Nov 21 '18 at 4:37











  • What if there are two keywords in one entry?

    – NotAnAmbiTurner
    Nov 21 '18 at 5:00

















So, what have you tried?

– Andreas
Nov 21 '18 at 4:37





So, what have you tried?

– Andreas
Nov 21 '18 at 4:37













What if there are two keywords in one entry?

– NotAnAmbiTurner
Nov 21 '18 at 5:00





What if there are two keywords in one entry?

– NotAnAmbiTurner
Nov 21 '18 at 5:00












1 Answer
1






active

oldest

votes


















0














Just iterate over each line and each word, and if the word exists in the line, then write the new line word,line to the output file.



Some assumptions:




  • You want to do caseless matching e.g. Disk space and disk space would match.

  • Each entry in the log file only has one matching word. If there are more, then the line would get written twice with each separate entry.


Demo:



words = [
"password",
"drive full",
"disk space",
"SSIS",
"sql server",
"cpu utilization",
"oracle",
"Lotus Notes",
"mouse",
]

with open("input.txt") as file, open("output.txt", "w") as out:
for line in file:
for word in words:
# Do lowercase matching
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))


output.txt:



password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed


You can also condense the two nested loops with itertools.product():



from itertools import product

with open("input.txt") as file, open("output.txt", "w") as out:
for line, word in product(file, words):
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))





share|improve this answer


























  • without using words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]. Is it possible to identify categories?

    – Gower
    Nov 21 '18 at 6:09











  • How to identify new features? Tomorrow i may get like INCIDENT 500986 ERROR_BAD_ENVIRONMENT? How can i capture?

    – Gower
    Nov 21 '18 at 10:01













  • @Gower Where are you getting these features from?

    – RoadRunner
    Nov 21 '18 at 10:05











  • Incident / release / problem tickets data located @ SQL Server database. from there i have to pull and categorize.

    – Gower
    Nov 21 '18 at 10:26











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53405316%2fhow-to-find-the-category-of-a-key-words-from-log-file-using-python-script%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














Just iterate over each line and each word, and if the word exists in the line, then write the new line word,line to the output file.



Some assumptions:




  • You want to do caseless matching e.g. Disk space and disk space would match.

  • Each entry in the log file only has one matching word. If there are more, then the line would get written twice with each separate entry.


Demo:



words = [
"password",
"drive full",
"disk space",
"SSIS",
"sql server",
"cpu utilization",
"oracle",
"Lotus Notes",
"mouse",
]

with open("input.txt") as file, open("output.txt", "w") as out:
for line in file:
for word in words:
# Do lowercase matching
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))


output.txt:



password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed


You can also condense the two nested loops with itertools.product():



from itertools import product

with open("input.txt") as file, open("output.txt", "w") as out:
for line, word in product(file, words):
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))





share|improve this answer


























  • without using words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]. Is it possible to identify categories?

    – Gower
    Nov 21 '18 at 6:09











  • How to identify new features? Tomorrow i may get like INCIDENT 500986 ERROR_BAD_ENVIRONMENT? How can i capture?

    – Gower
    Nov 21 '18 at 10:01













  • @Gower Where are you getting these features from?

    – RoadRunner
    Nov 21 '18 at 10:05











  • Incident / release / problem tickets data located @ SQL Server database. from there i have to pull and categorize.

    – Gower
    Nov 21 '18 at 10:26
















0














Just iterate over each line and each word, and if the word exists in the line, then write the new line word,line to the output file.



Some assumptions:




  • You want to do caseless matching e.g. Disk space and disk space would match.

  • Each entry in the log file only has one matching word. If there are more, then the line would get written twice with each separate entry.


Demo:



words = [
"password",
"drive full",
"disk space",
"SSIS",
"sql server",
"cpu utilization",
"oracle",
"Lotus Notes",
"mouse",
]

with open("input.txt") as file, open("output.txt", "w") as out:
for line in file:
for word in words:
# Do lowercase matching
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))


output.txt:



password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed


You can also condense the two nested loops with itertools.product():



from itertools import product

with open("input.txt") as file, open("output.txt", "w") as out:
for line, word in product(file, words):
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))





share|improve this answer


























  • without using words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]. Is it possible to identify categories?

    – Gower
    Nov 21 '18 at 6:09











  • How to identify new features? Tomorrow i may get like INCIDENT 500986 ERROR_BAD_ENVIRONMENT? How can i capture?

    – Gower
    Nov 21 '18 at 10:01













  • @Gower Where are you getting these features from?

    – RoadRunner
    Nov 21 '18 at 10:05











  • Incident / release / problem tickets data located @ SQL Server database. from there i have to pull and categorize.

    – Gower
    Nov 21 '18 at 10:26














0












0








0







Just iterate over each line and each word, and if the word exists in the line, then write the new line word,line to the output file.



Some assumptions:




  • You want to do caseless matching e.g. Disk space and disk space would match.

  • Each entry in the log file only has one matching word. If there are more, then the line would get written twice with each separate entry.


Demo:



words = [
"password",
"drive full",
"disk space",
"SSIS",
"sql server",
"cpu utilization",
"oracle",
"Lotus Notes",
"mouse",
]

with open("input.txt") as file, open("output.txt", "w") as out:
for line in file:
for word in words:
# Do lowercase matching
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))


output.txt:



password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed


You can also condense the two nested loops with itertools.product():



from itertools import product

with open("input.txt") as file, open("output.txt", "w") as out:
for line, word in product(file, words):
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))





share|improve this answer















Just iterate over each line and each word, and if the word exists in the line, then write the new line word,line to the output file.



Some assumptions:




  • You want to do caseless matching e.g. Disk space and disk space would match.

  • Each entry in the log file only has one matching word. If there are more, then the line would get written twice with each separate entry.


Demo:



words = [
"password",
"drive full",
"disk space",
"SSIS",
"sql server",
"cpu utilization",
"oracle",
"Lotus Notes",
"mouse",
]

with open("input.txt") as file, open("output.txt", "w") as out:
for line in file:
for word in words:
# Do lowercase matching
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))


output.txt:



password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed


You can also condense the two nested loops with itertools.product():



from itertools import product

with open("input.txt") as file, open("output.txt", "w") as out:
for line, word in product(file, words):
if word.lower() in line.lower():
out.write("%s,%s" % (word, line))






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 21 '18 at 5:06

























answered Nov 21 '18 at 4:47









RoadRunnerRoadRunner

11.2k31340




11.2k31340













  • without using words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]. Is it possible to identify categories?

    – Gower
    Nov 21 '18 at 6:09











  • How to identify new features? Tomorrow i may get like INCIDENT 500986 ERROR_BAD_ENVIRONMENT? How can i capture?

    – Gower
    Nov 21 '18 at 10:01













  • @Gower Where are you getting these features from?

    – RoadRunner
    Nov 21 '18 at 10:05











  • Incident / release / problem tickets data located @ SQL Server database. from there i have to pull and categorize.

    – Gower
    Nov 21 '18 at 10:26



















  • without using words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]. Is it possible to identify categories?

    – Gower
    Nov 21 '18 at 6:09











  • How to identify new features? Tomorrow i may get like INCIDENT 500986 ERROR_BAD_ENVIRONMENT? How can i capture?

    – Gower
    Nov 21 '18 at 10:01













  • @Gower Where are you getting these features from?

    – RoadRunner
    Nov 21 '18 at 10:05











  • Incident / release / problem tickets data located @ SQL Server database. from there i have to pull and categorize.

    – Gower
    Nov 21 '18 at 10:26

















without using words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]. Is it possible to identify categories?

– Gower
Nov 21 '18 at 6:09





without using words =["password","drive full","disk space","SSIS","sql server","cpu utilization","oracle","Lotus Notes","mouse"]. Is it possible to identify categories?

– Gower
Nov 21 '18 at 6:09













How to identify new features? Tomorrow i may get like INCIDENT 500986 ERROR_BAD_ENVIRONMENT? How can i capture?

– Gower
Nov 21 '18 at 10:01







How to identify new features? Tomorrow i may get like INCIDENT 500986 ERROR_BAD_ENVIRONMENT? How can i capture?

– Gower
Nov 21 '18 at 10:01















@Gower Where are you getting these features from?

– RoadRunner
Nov 21 '18 at 10:05





@Gower Where are you getting these features from?

– RoadRunner
Nov 21 '18 at 10:05













Incident / release / problem tickets data located @ SQL Server database. from there i have to pull and categorize.

– Gower
Nov 21 '18 at 10:26





Incident / release / problem tickets data located @ SQL Server database. from there i have to pull and categorize.

– Gower
Nov 21 '18 at 10:26


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53405316%2fhow-to-find-the-category-of-a-key-words-from-log-file-using-python-script%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

"Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

Alcedinidae

Origin of the phrase “under your belt”?