Parse HTML tags into dict
I have HTML data that I have almost finished parsing with BeautifulSoup, but I can't work out how to capture the start and end times, because they sit right next to each other in the output.
Here is the data:
    [u'Start', u'End',
     u'2018-11-14 05:00 GMT (Greenwich Mean Time)', u'2018-11-14 11:00 GMT (Greenwich Mean Time)',
     u'2018-11-14 00:00 EST (Eastern Standard Time)', u'2018-11-14 06:00 EST (Eastern Standard Time)',
     u'Customer Name', u'Circuit ID', u'Alt Circuit ID', u'Bandwidth', u'A Location', u'Z Location', u'Impact Type', u'Maximum Duration', u'Order Number', u'Status',
     u'COMPANY, LLC', u'BDKN1111', u'N/A', u'10GIG-E LAN', u'CT USA', u'KINGS MOUNTAIN', u'Outage', u'1 hour ', u'\xa0', u'Alternate Night',
     u'COMPANY, LLC', u'BDKN1112', u'N/A', u'10GIG-E LAN', u'BRISTOL', u'KINGS MOUNTAIN', u'Outage', u'1 hour ', u'\xa0', u'Alternate Night',
     u'COMPANY, LLC', u'BDKF1011', u'N/A', u'10GIG-E LAN', u'BRISTOL', u'OMAHA ', u'Outage', u'1 hour ', u'\xa0', u'Alternate Night']
Here is the code (data is the list above):
    for i in data:
        pattern = re.compile(r'([1-9]{4}|[0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}:[0-9]{2} GMT)')
        if re.search(pattern, i):
            match = re.search(pattern, i)
            match = match.group().split()
            output["startdate"] = match[0]
            if match[1] not in output["endtime"]:
                output["endtime"] = match[1:-1]
I am trying to capture the start date and time and the end date and time, but for some reason each match overwrites the previous value.
python html list parsing
asked 18 hours ago
miu
184
1 Answer
I was able to figure it out. Thanks.
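The fix itself was not posted. In the question's loop, output["startdate"] is assigned on every matching item, so the last GMT entry wins, which would explain the overwriting. What follows is only a minimal sketch of one possible approach, not the poster's actual solution. It assumes the first GMT entry is the start and the second is the end (inferred from the 'Start', 'End' order in the data). The names extract_window and GMT_PATTERN and the extra keys starttime and enddate are hypothetical and do not appear in the question.

```python
import re

# Hypothetical excerpt of the list from the question; only the leading rows matter here.
data = [
    u'Start', u'End',
    u'2018-11-14 05:00 GMT (Greenwich Mean Time)',
    u'2018-11-14 11:00 GMT (Greenwich Mean Time)',
    u'2018-11-14 00:00 EST (Eastern Standard Time)',
    u'2018-11-14 06:00 EST (Eastern Standard Time)',
    u'Customer Name',
]

# Compile once, outside the loop; capture the date and the time as separate groups.
GMT_PATTERN = re.compile(r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}) GMT')


def extract_window(items):
    """Return start/end date and time taken from the first two GMT entries.

    Assumption: the first GMT timestamp is the start, the second is the end.
    """
    matches = [m for m in (GMT_PATTERN.search(item) for item in items) if m]
    if len(matches) < 2:
        raise ValueError("expected a start and an end GMT timestamp")
    start_date, start_time = matches[0].groups()
    end_date, end_time = matches[1].groups()
    return {
        "startdate": start_date, "starttime": start_time,
        "enddate": end_date, "endtime": end_time,
    }


output = extract_window(data)
print(output)
```

Collecting all matches first and then assigning by position means each key is written once, so nothing is overwritten.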
answered 17 hours ago
miu
184