How to find the category of keywords from a log file using a Python script?

-1

My input file:

INCIDENT 677700 password reset
INCIDENT 677742 C: drive full
INCIDENT 500901 mouse not working
INCIDENT 500942 unable to connect oracle box
INCIDENT 500949 high cpu utilization
INCIDENT 600901 sql server clustering failed
INCIDENT 490203 Low disk space issue
INCIDENT 10I891 Lotus Notes client failed
INCIDENT 489011 Low disk space issue
INCIDENT 89G901 SSIS Load failed

words = ["password", "drive full", "disk space", "SSIS", "sql server", "cpu utilization", "oracle", "Lotus Notes", "mouse"]

I would like to assign a category to each line based on these keywords, and I would also like to know how to add more keywords in the future. My output file should be:

Password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
Oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed

asked Nov 21 '18 at 4:34 by Gower
edited Nov 21 '18 at 5:07 by DYZ

So, what have you tried?

– Andreas
Nov 21 '18 at 4:37

What if there are two keywords in one entry?

– NotAnAmbiTurner
Nov 21 '18 at 5:00

1 Answer

Iterate over each line and each keyword; if the keyword occurs in the line, write word,line to the output file.

Some assumptions:

You want case-insensitive matching, e.g. Disk space and disk space would match.

Each entry in the log file contains only one matching keyword. If an entry contains more than one, the line is written once for each keyword that matches.

Demo:

words = [
    "password",
    "drive full",
    "disk space",
    "SSIS",
    "sql server",
    "cpu utilization",
    "oracle",
    "Lotus Notes",
    "mouse",
]

with open("input.txt") as file, open("output.txt", "w") as out:
    for line in file:
        for word in words:
            # Compare lowercased strings for case-insensitive matching
            if word.lower() in line.lower():
                out.write("%s,%s" % (word, line))

output.txt:

password,INCIDENT 677700 password reset
drive full,INCIDENT 677742 C: drive full
mouse,INCIDENT 500901 mouse not working
oracle,INCIDENT 500942 unable to connect oracle box
cpu utilization,INCIDENT 500949 high cpu utilization
sql server,INCIDENT 600901 sql server clustering failed
disk space,INCIDENT 490203 Low disk space issue
Lotus Notes,INCIDENT 10I891 Lotus Notes client failed
disk space,INCIDENT 489011 Low disk space issue
SSIS,INCIDENT 89G901 SSIS Load failed

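You can also condense the two nested loops with itertools.product(). Note that product() consumes its input iterables completely before yielding anything, so the whole file is read into memory first: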

from itertools import product

with open("input.txt") as file, open("output.txt", "w") as out:
    for line, word in product(file, words):
        if word.lower() in line.lower():
            out.write("%s,%s" % (word, line))
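If an entry can contain more than one keyword and you want each line written only once, one possible variation is to combine the keywords into a single case-insensitive regular expression and take the first hit. This is only a sketch under assumptions: the helper names categorize and categorize_file are made up for illustration, the keyword list is copied from the question, and "first" here means the keyword that occurs earliest in the line.

```python
import re

# Keyword list copied from the question.
words = ["password", "drive full", "disk space", "SSIS", "sql server",
         "cpu utilization", "oracle", "Lotus Notes", "mouse"]

# One case-insensitive pattern matching any keyword. re.escape protects
# keywords that happen to contain regex metacharacters.
pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)

# Map the lowercased keyword back to its spelling in the list.
canonical = {w.lower(): w for w in words}


def categorize(line):
    """Return the keyword that occurs earliest in the line, or None."""
    match = pattern.search(line)
    if match is None:
        return None
    found = match.group(0)
    return canonical.get(found.lower(), found)


def categorize_file(in_path="input.txt", out_path="output.txt"):
    """Write 'keyword,line' once per input line that matches a keyword."""
    with open(in_path) as src, open(out_path, "w") as out:
        for line in src:
            category = categorize(line)
            if category is not None:
                out.write("%s,%s" % (category, line))
```

Calling categorize_file() with the default paths reads input.txt and writes output.txt as in the demo above, but a line that mentions two keywords appears only once.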

answered Nov 21 '18 at 4:47 by RoadRunner
edited Nov 21 '18 at 5:06

Is it possible to identify the categories without using words = ["password", "drive full", "disk space", "SSIS", "sql server", "cpu utilization", "oracle", "Lotus Notes", "mouse"]?

– Gower
Nov 21 '18 at 6:09

How can I identify new features? Tomorrow I may get something like INCIDENT 500986 ERROR_BAD_ENVIRONMENT. How can I capture that?

– Gower
Nov 21 '18 at 10:01

@Gower Where are you getting these features from?

– RoadRunner
Nov 21 '18 at 10:05

The incident / release / problem ticket data is located in a SQL Server database. I have to pull it from there and categorize it.

– Gower
Nov 21 '18 at 10:26
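The thread does not settle how to handle future keywords. One simple option, offered only as a sketch, is to keep the keywords in a separate text file so new ones can be added without touching the script, and to set aside lines that match nothing so they can be reviewed for new keywords. The file name keywords.txt, the label uncategorized, and the function names below are all assumptions made for illustration. This does not discover categories automatically; it only surfaces the entries that still need one.

```python
def load_keywords(path="keywords.txt"):
    """Read one keyword per line from a text file, skipping blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def split_by_keyword(lines, keywords):
    """Return (matched, unmatched).

    matched is a list of (keyword, line) pairs, one pair per keyword hit;
    unmatched is a list of lines that contain none of the keywords.
    """
    matched, unmatched = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore empty lines in the input
        hits = [kw for kw in keywords if kw.lower() in text.lower()]
        if hits:
            matched.extend((kw, text) for kw in hits)
        else:
            unmatched.append(text)
    return matched, unmatched


def run(in_path="input.txt", out_path="output.txt", kw_path="keywords.txt"):
    """Categorize in_path and label lines without a keyword 'uncategorized'."""
    keywords = load_keywords(kw_path)
    with open(in_path) as src:
        matched, unmatched = split_by_keyword(src, keywords)
    with open(out_path, "w") as out:
        for kw, text in matched:
            out.write("%s,%s\n" % (kw, text))
        for text in unmatched:
            out.write("uncategorized,%s\n" % text)
```

With this layout, an entry such as INCIDENT 500986 ERROR_BAD_ENVIRONMENT ends up under uncategorized until a suitable keyword is added to keywords.txt.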
