Scraping with Python and Selenium - how should I return a 'null' if element not present












Good Day,
I am a newbie to Python and Selenium and have been searching for a solution for a while now. While some answers come close, I can't seem to find one that solves my problem. The snippet of my code that is causing the problem is as follows:



for url in links:
    driver.get(url)
    company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
    date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
    title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
    urlinf = driver.current_url #url info

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)


While this does work if all elements are present (and I can see the output to Pandas dataframe), if one of the elements doesn't exist (either 'date' or 'title') Python sends out the error:




IndexError: list index out of range




What I have tried thus far:



1) created a try/except (doesn't work)
2) tried if/else (if variable is not "")



I would like to insert "Null" if the element doesn't exist so that the Pandas dataframe populates with "Null" in the event an element doesn't exist.



Any assistance and guidance would be greatly appreciated.



EDIT 1:



I have tried the following:



for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
    except:
        pass
    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)


and:



for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
    except (NoSuchElementException, ElementNotVisibleException, InvalidSelectorException):
        pass

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)


and:



for url in links:
    driver.get(url)
    try:
        company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
        date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
        title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
        urlinf = driver.current_url #url info
    except:
        i = 'Null'
        pass

    num_page_items = len(date)

    for i in range(num_page_items):
        df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)


I tried the same try/except at the point of appending to Pandas.



EDIT 2
The error I get:




IndexError: list index out of range




is attributed to the line:




df = df.append({'Company': company[i].text, 'Date': date[i].text,
'Title': title[i].text, 'URL': urlinf[i]}, ignore_index=True)











  • Can you show your attempts with try/except? That is the best way to handle errors and ignore them if needed.

    – Moshe Slavin
    Nov 22 '18 at 6:57













  • I've tried quite a few iterations and overwrote them when I found they didn't work, but I have added what I have tried to my question.

    – qbbq
    Nov 22 '18 at 7:36













  • I'll take a look...

    – Moshe Slavin
    Nov 22 '18 at 10:10











  • I posted an answer; let me know if you need any other assistance!

    – Moshe Slavin
    Nov 22 '18 at 10:27
















python selenium selenium-chromedriver screen-scraping






asked Nov 22 '18 at 5:12 by qbbq
edited Nov 22 '18 at 7:48 by qbbq

























1 Answer














As your error shows, you have an IndexError.



To overcome that, add a try/except around the code that actually raises this error.
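
A minimal sketch of why the try/except has to wrap the indexing step: Selenium's find_elements_* calls return an empty list when nothing matches rather than raising, so the exception only shows up later, at company[i], date[i] or title[i]. The lists below are hypothetical stand-ins for the Selenium results, not real page data.

```python
# Stand-ins for what find_elements_* returns: plain lists (hypothetical data).
date = ["22 Nov 2018"]  # one match on the page
title = []              # no match: an empty list, not an exception

caught = None
try:
    # The lookup itself succeeded; it is the indexing that fails.
    row = {"Date": date[0], "Title": title[0]}
except IndexError as exc:
    caught = exc

print(type(caught).__name__)  # IndexError
```

This is why a try/except placed only around the find_elements_by_xpath lines never fires for a missing element.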



Also, you are using driver.current_url, which returns the URL as a single string.
But in your inner for loop you index it as if it were a list (urlinf[i]); this could be an origin of your error.
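
To illustrate the point about driver.current_url: indexing a string yields one character, not a per-item URL. A small sketch, with a made-up URL standing in for the value of driver.current_url:

```python
# Hypothetical value standing in for driver.current_url.
urlinf = "https://example.com/node/1"

first_char = urlinf[0]  # indexing a string gives a single character
whole_url = urlinf      # what the 'URL' column should actually receive

print(first_char)  # h
print(whole_url)   # https://example.com/node/1
```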



In your case try this:



for url in links:
    driver.get(url)
    company = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[2]/ul/li/div/div[1]/span""")
    date = driver.find_elements_by_xpath("""//*[contains(@id, 'node')]/div[1]/div[1]/div[2]/div/span""")
    title = driver.find_elements_by_xpath("""//*[@id="page-title"]/span""")
    urlinf = driver.current_url  # a single string, so it is used as-is below

    num_page_items = len(date)
    for i in range(num_page_items):
        try:
            df = df.append({'Company': company[i].text, 'Date': date[i].text, 'Title': title[i].text, 'URL': urlinf}, ignore_index=True)
        except IndexError:
            # The return value is not assigned, so df is left unchanged here.
            df.append(None) # or df.append('Null')


Hope you find this helpful!
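
If the goal is a row with "Null" in place of a missing field, rather than leaving the row out, one option is to look up each field separately. The sketch below is an illustration under stated assumptions, not the answerer's code: `text_or_null` and `FakeElement` are hypothetical names, and `FakeElement` only mimics the `.text` attribute of a Selenium element so the example runs without a browser.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only .text)."""

    def __init__(self, text):
        self.text = text


def text_or_null(elements, i, default="Null"):
    """Return elements[i].text, or default when index i does not exist."""
    try:
        return elements[i].text
    except IndexError:
        return default


# Stand-ins for the three find_elements_by_xpath results on one page.
company = [FakeElement("Acme")]
date = [FakeElement("22 Nov 2018")]
title = []  # element absent on this page
urlinf = "https://example.com/node/1"  # made-up URL

rows = []
# Use the longest list so a short 'date' list does not hide other fields.
num_page_items = max(len(company), len(date), len(title))
for i in range(num_page_items):
    rows.append({
        "Company": text_or_null(company, i),
        "Date": text_or_null(date, i),
        "Title": text_or_null(title, i),
        "URL": urlinf,
    })

print(rows)
```

The collected `rows` list can then be handed to pandas in one step, for example with `pd.DataFrame(rows)`. That also avoids calling `df.append` inside the loop, a method that was deprecated and then removed in pandas 2.0.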






  • this solution works! thank you very much - I really appreciate it.

    – qbbq
    Nov 22 '18 at 11:27











  • Just as a matter of interest, I tried df.append('Null') and got this error message: TypeError: cannot concatenate object of type "<type 'str'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

    – qbbq
    Nov 22 '18 at 11:28











  • It's a pandas issue probably... Just use None...

    – Moshe Slavin
    Nov 22 '18 at 11:31











  • Glad to help!!!

    – Moshe Slavin
    Nov 22 '18 at 11:33











  • Just an update to this: I decided to write directly to a CSV. However, with the original solution, the "None" / Null was creating a line break instead of making the variable = "Null". As a result I have added the following: blank = "blank" and except IndexError: with open('results.csv', 'a') as f: f.write(blank). However, my data in the CSV is getting offset by the missing value. Would you suggest I create if statements in the loop to check if the variable = ""?

    – qbbq
    Nov 26 '18 at 3:07
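
On the CSV follow-up: the columns shift because writing a bare placeholder does not put it in any particular column. One way to keep the columns aligned, sketched here with the standard library's `csv.DictWriter` and its `restval` parameter, is to write each row as a dict so that missing keys are filled with a placeholder. The field names and sample rows are assumptions for illustration, and `io.StringIO` stands in for the real results.csv file.

```python
import csv
import io

fieldnames = ["Company", "Date", "Title", "URL"]

# Sample rows (made up); the second one has no 'Title' key at all.
rows = [
    {"Company": "Acme", "Date": "22 Nov 2018", "Title": "Report", "URL": "https://example.com/1"},
    {"Company": "Beta", "Date": "23 Nov 2018", "URL": "https://example.com/2"},
]

buffer = io.StringIO()  # stand-in for open('results.csv', 'a', newline='')
# restval is written for any field name missing from a row dict.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="Null")
writer.writeheader()
writer.writerows(rows)

lines = buffer.getvalue().splitlines()
print(lines)
```

Every row then has the same four columns, with "Null" sitting in the Title column of the second row.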











answered Nov 22 '18 at 10:23 by Moshe Slavin
































