How to get the web page using requests.post?

I want to get the results from the web page http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx with the stock code input set to 5.

The problem is that I don't know the URL of the page that loads after pressing Search, because the search is triggered by JavaScript.

Also, how do I find the parameters that need to be passed to requests.post, e.g. data? Are headers needed?

  • Do you want to simulate the POST request that is sent after you enter 5 in the "Stock code" field and press "Search"?

    – Andersson
    Nov 22 '18 at 10:48

  • Yes, you are right.

    – Chan
    Nov 22 '18 at 10:51

  • It looks like the site has problems when clicking search.

    – mirhossein
    Nov 22 '18 at 17:16

  • The website works. After entering 5 in the stock code field and pressing search, you can view the page that shows the results.

    – Chan
    Nov 23 '18 at 1:09

  • Can anyone help?

    – Chan
    Nov 23 '18 at 3:46

python web-scraping python-requests

edited Nov 22 '18 at 11:27 by petezurich
asked Nov 22 '18 at 10:45 by Chan


1 Answer

You have multiple options:



1) You can use Selenium. First install Selenium:

    sudo pip3 install selenium

Then get a driver from https://sites.google.com/a/chromium.org/chromedriver/downloads (depending on your OS, you may need to specify the location of the driver).



    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from bs4 import BeautifulSoup
    import time

    browser = webdriver.Chrome()
    url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
    browser.get(url)
    # find_element(By.ID, ...) replaces find_element_by_id, which Selenium 4 removed
    element = browser.find_element(By.ID, 'ctl00_txt_stock_code')  # find the text box
    time.sleep(2)
    element.send_keys('5')  # populate the text box
    time.sleep(2)
    element.submit()  # submit the form
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    browser.quit()
    for news in soup.find_all(class_='news'):
        print(news.text)


2) Or use PyQt with QWebEngineView.

Install PyQt on Ubuntu:

    sudo apt-get install python3-pyqt5
    sudo apt-get install python3-pyqt5.qtwebengine

or on other OSes (64-bit versions of Python; from PyQt5 5.12 on, QtWebEngine is packaged separately as PyQtWebEngine):

    pip3 install PyQt5 PyQtWebEngine


Basically, you load the first page, which contains the form, fill in the form by running JavaScript, and then submit it. The loadFinished signal is emitted twice, the second time because you submitted the form, so you can use an if statement to tell the two calls apart.



    import sys
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEngineView
    from bs4 import BeautifulSoup


    class Render(QWebEngineView):
        def __init__(self, url):
            self.html = None
            self.first_pass = True
            self.app = QApplication(sys.argv)
            QWebEngineView.__init__(self)
            self.loadFinished.connect(self._load_finished)
            self.load(QUrl(url))
            self.app.exec_()  # blocks until callable() quits the app

        def _load_finished(self, result):
            # First emission: the form page has loaded. Second: the results page.
            if self.first_pass:
                self._first_finished()
                self.first_pass = False
            else:
                self._second_finished()

        def _first_finished(self):
            # Fill in the form with JavaScript, then submit it
            self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
            self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
            self.page().runJavaScript("preprocessMainForm();")
            self.page().runJavaScript("document.forms[0].submit();")

        def _second_finished(self):
            # toHtml is asynchronous; the HTML is delivered to the callback
            self.page().toHtml(self.callable)

        def callable(self, data):
            self.html = data
            self.app.quit()

    url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
    web = Render(url)
    soup = BeautifulSoup(web.html, 'html.parser')
    for news in soup.find_all(class_='news'):
        print(news.text)


Outputs:



Voting Rights and Capital
Next Day Disclosure Return
NOTICE OF REDEMPTION AND CANCELLATION OF LISTING
THIRD INTERIM DIVIDEND FOR 2018
Notification of Transactions by Persons Discharging Managerial Responsibilities
Next Day Disclosure Return
THIRD INTERIM DIVIDEND FOR 2018
Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018
Voting Rights and Capital
PUBLICATION OF BASE PROSPECTUS SUPPLEMENT
3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL
3Q EARNINGS RELEASE - HIGHLIGHTS
Scrip Dividend Circular
2018 Third Interim Dividend; Scrip Dividend
THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE
NOTIFICATION OF MAJOR HOLDINGS
EARNINGS RELEASE FOR THIRD QUARTER 2018
NOTIFICATION OF MAJOR HOLDINGS
Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018
THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES
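
The extraction step shared by both scripts, find_all(class_='news'), can also be tried offline against a saved copy of the results page. The following is only a rough standard-library sketch of that step, not the BeautifulSoup implementation: the names NewsTextExtractor and extract_news are made up for illustration, it assumes reasonably well-formed HTML, and unlike BeautifulSoup it reports a matching element nested inside another matching element only once, as part of the outer one.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NewsTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains class_name."""

    def __init__(self, class_name="news"):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # nesting depth inside the current matching element
        self.buffer = []    # text pieces of the current matching element
        self.items = []     # finished text of each matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child of a matching element
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering a matching element
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # the matching element just closed
            self.items.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_news(html):
    """Return the text of each class="news" element found in html."""
    parser = NewsTextExtractor()
    parser.feed(html)
    return parser.items
```

Feeding it the HTML string obtained from browser.page_source or from the toHtml callback should give a list comparable to the titles printed above, provided the site still marks them with the news class.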


Other options for rendering JavaScript are scrapy-splash (https://github.com/scrapy-plugins/scrapy-splash) and Requests-HTML (https://html.python-requests.org/), but I am not sure how you would fill in the form with either of them.



Update: how to read the next pages:



    import sys
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEngineView
    from bs4 import BeautifulSoup


    class Render(QWebEngineView):
        def __init__(self, url):
            self.html = None
            self.count = 0  # number of result pages handled so far
            self.first_pass = True
            self.app = QApplication(sys.argv)
            QWebEngineView.__init__(self)
            self.loadFinished.connect(self._load_finished)
            self.load(QUrl(url))
            self.app.exec_()  # blocks until callable() quits the app

        def _load_finished(self, result):
            # First emission: the form page. Later emissions: one per results page.
            if self.first_pass:
                self._first_finished()
                self.first_pass = False
            else:
                self._second_finished()

        def _first_finished(self):
            # Fill in the form with JavaScript, then submit it
            self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
            self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
            self.page().runJavaScript("preprocessMainForm();")
            self.page().runJavaScript("document.forms[0].submit();")

        def _second_finished(self):
            try:
                self.page().toHtml(self.parse)  # print the titles on this page
                self.count += 1
                if self.count > 5:
                    # Stop after six result pages
                    self.page().toHtml(self.callable)
                else:
                    # Clicking "Next" triggers another loadFinished
                    self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")
            except Exception:
                self.page().toHtml(self.callable)

        def parse(self, data):
            soup = BeautifulSoup(data, 'html.parser')
            for news in soup.find_all(class_='news'):
                print(news.text)

        def callable(self, data):
            self.app.quit()

    url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
    web = Render(url)
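
To make the control flow of the paging version easier to follow, here is a Qt-free sketch of the same decision logic. The class PageWalker, its max_pages parameter and the action names are hypothetical, introduced only for illustration; the `self.count > 5` check in the script above corresponds to max_pages=6.

```python
class PageWalker:
    """Decide what to do each time a page finishes loading."""

    def __init__(self, max_pages):
        self.max_pages = max_pages  # how many result pages to parse in total
        self.first_pass = True      # True until the form page has loaded
        self.pages_parsed = 0

    def on_load_finished(self):
        """Return the action for this load event as a string."""
        if self.first_pass:
            # The form page has loaded: fill it in and submit it.
            self.first_pass = False
            return "submit_form"
        # Every later load event is a results page.
        self.pages_parsed += 1
        if self.pages_parsed >= self.max_pages:
            return "parse_and_quit"
        return "parse_and_click_next"


walker = PageWalker(max_pages=3)
for _ in range(4):
    print(walker.on_load_finished())
```

One thing the sketch does not cover: if the Next button is missing on the last page, the JavaScript click fails inside the page and no further loadFinished signal arrives, so the Python try/except in the script above will probably not catch it. A real script would likely need a timeout or a check for the button before clicking.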






  • Thank you, Dan. Selenium can solve the problem. But I want to know how to deal with a JavaScript web page like this one using requests. Are there any faster alternatives?

    – Chan
    Nov 26 '18 at 1:30

  • This particular web page has a VIEWSTATE token generated by JavaScript and an encrypted version of it also generated by JavaScript. Without actually running JavaScript it is virtually impossible to recreate these tokens. There is no way to do this with requests, and I'm not sure how you would run the required JavaScript with Requests-HTML. If you don't like the Selenium option, try the PyQt5 solution I gave in the answer.

    – Dan-Dev
    Nov 26 '18 at 11:50

  • How do I read the next page?

    – Chan
    Feb 12 at 4:03

  • Updated the post with how to read the next pages

    – Dan-Dev
    Feb 12 at 20:41
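
On the original question of how to find the parameters for requests.post: for form pages whose hidden fields are already present in the served HTML, a common approach is to GET the form, copy its hidden inputs, add your own value and POST everything back. The sketch below illustrates only that general pattern. Per the comment above, this particular site reportedly builds its tokens in JavaScript, so it may well not work here. The helper names and the field name ctl00$txt_stock_code are assumptions (the latter guessed from the element id used in the answer); the real names can be read from the request shown in the browser's developer tools (Network tab) after pressing Search.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() != "hidden":
            return
        name = attr.get("name")
        if name:
            self.fields[name] = attr.get("value") or ""


def collect_hidden_fields(html):
    """Return a dict of every named hidden input found in html."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    return collector.fields


def post_search(url, stock_code):
    """GET the form page, echo its hidden fields back and POST the search."""
    import requests  # third-party; imported here so the helpers above stay stdlib-only

    # A session keeps cookies between the GET and the POST.
    with requests.Session() as session:
        form_page = session.get(url, timeout=30)
        payload = collect_hidden_fields(form_page.text)
        # Hypothetical field name; confirm it in the browser's Network tab.
        payload["ctl00$txt_stock_code"] = stock_code
        response = session.post(url, data=payload, timeout=30)
        response.raise_for_status()
        return response.text
```

As for headers: when data is a dict, requests sets the form-encoded Content-Type itself, so extra headers are often unnecessary, although some sites check values such as User-Agent or Referer. If the response from post_search is just the empty form again, that is a sign the server expects values that only the page's JavaScript can produce, and a browser-based approach like the ones in the answer is needed.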











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53429181%2fhow-to-get-the-web-page-using-requests-post%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














You have multiple options:



1) You can use Selenium. First install Selenium.



sudo pip3 install selenium


Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads (Depending upon your OS you may need to specify the location of your driver)



from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Chrome()
url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
browser.get(url)
element = browser.find_element_by_id('ctl00_txt_stock_code') # find the text box
time.sleep(2)
element.send_keys('5') # populate the text box
time.sleep(2)
element.submit() # submit the form
soup = BeautifulSoup(browser.page_source, 'html.parser')
browser.quit()
for news in soup.find_all(class_='news'):
print(news.text)


2) Or use PyQt with QWebEngineView.



Install PyQt on Ubuntu:



    sudo apt-get install python3-pyqt5
sudo apt-get install python3-pyqt5.qtwebengine


or on other OS (64 bit versions of Python)



    pip3 install PyQt5


Basically you load the first page with the form on. Fill in the form by running JavaScript then submit it. The loadFinished() signal is called twice, the second time because you submitted the form so you can use an if statement to differentiate between the calls.



import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from bs4 import BeautifulSoup


class Render(QWebEngineView):
def __init__(self, url):
self.html = None
self.first_pass = True
self.app = QApplication(sys.argv)
QWebEngineView.__init__(self)
self.loadFinished.connect(self._load_finished)
self.load(QUrl(url))
self.app.exec_()

def _load_finished(self, result):
if self.first_pass:
self._first_finished()
self.first_pass = False
else:
self._second_finished()

def _first_finished(self):
self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
self.page().runJavaScript("preprocessMainForm();")
self.page().runJavaScript("document.forms[0].submit();")

def _second_finished(self):
self.page().toHtml(self.callable)

def callable(self, data):
self.html = data
self.app.quit()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
web = Render(url)
soup = BeautifulSoup(web.html, 'html.parser')
for news in soup.find_all(class_ = 'news'):
print(news.text)


Outputs:



Voting Rights and Capital
Next Day Disclosure Return
NOTICE OF REDEMPTION AND CANCELLATION OF LISTING
THIRD INTERIM DIVIDEND FOR 2018
Notification of Transactions by Persons Discharging Managerial Responsibilities
Next Day Disclosure Return
THIRD INTERIM DIVIDEND FOR 2018
Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018
Voting Rights and Capital
PUBLICATION OF BASE PROSPECTUS SUPPLEMENT
3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL
3Q EARNINGS RELEASE - HIGHLIGHTS
Scrip Dividend Circular
2018 Third Interim Dividend; Scrip Dividend
THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE
NOTIFICATION OF MAJOR HOLDINGS
EARNINGS RELEASE FOR THIRD QUARTER 2018
NOTIFICATION OF MAJOR HOLDINGS
Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018
THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES


Alternatively you can use Scrapy splash https://github.com/scrapy-plugins/scrapy-splash



Or Requests-HTML https://html.python-requests.org/ .



But I am not sure how you would fill the form in using these two last approaches.



Updated how to read the next pages:



import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from bs4 import BeautifulSoup


class Render(QWebEngineView):
def __init__(self, url):
self.html = None
self.count = 0
self.first_pass = True
self.app = QApplication(sys.argv)
QWebEngineView.__init__(self)
self.loadFinished.connect(self._load_finished)
self.load(QUrl(url))
self.app.exec_()

def _load_finished(self, result):
if self.first_pass:
self._first_finished()
self.first_pass = False
else:
self._second_finished()

def _first_finished(self):
self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
self.page().runJavaScript("preprocessMainForm();")
self.page().runJavaScript("document.forms[0].submit();")

def _second_finished(self):
try:
self.page().toHtml(self.parse)
self.count += 1
if self.count > 5:
self.page().toHtml(self.callable)
else:
self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")
except:
self.page().toHtml(self.callable)

def parse(self, data):
soup = BeautifulSoup(data, 'html.parser')
for news in soup.find_all(class_ = 'news'):
print(news.text)

def callable(self, data):
self.app.quit()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
web = Render(url)





share|improve this answer


























  • Thank you, Dan. Selenium can solve the problem. But I want to know how to deal with javascript webpage like this one using requests. Any other faster alternatives?

    – Chan
    Nov 26 '18 at 1:30













  • This particular web page has a VIEWSTATE token generated by JavaScript and an encrypted version of it also generated by JavaScript. Without actually running JavaScript is is virtually impossible to recreate these tokens. There is no way to do this with requests and I'm not sure how you would run the require JavaScript with Requests-HTML. If you don't like the Selenium option try the PyQt5 solution I gave in the answer.

    – Dan-Dev
    Nov 26 '18 at 11:50











  • How to read the next page?

    – Chan
    Feb 12 at 4:03











  • Updated the post with how to read the next pages

    – Dan-Dev
    Feb 12 at 20:41
















1














You have multiple options:



1) You can use Selenium. First install Selenium.



sudo pip3 install selenium


Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads (Depending upon your OS you may need to specify the location of your driver)



from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Chrome()
url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
browser.get(url)
element = browser.find_element_by_id('ctl00_txt_stock_code') # find the text box
time.sleep(2)
element.send_keys('5') # populate the text box
time.sleep(2)
element.submit() # submit the form
soup = BeautifulSoup(browser.page_source, 'html.parser')
browser.quit()
for news in soup.find_all(class_='news'):
print(news.text)


2) Or use PyQt with QWebEngineView.



Install PyQt on Ubuntu:



    sudo apt-get install python3-pyqt5
sudo apt-get install python3-pyqt5.qtwebengine


or on other OS (64 bit versions of Python)



    pip3 install PyQt5


Basically you load the first page with the form on. Fill in the form by running JavaScript then submit it. The loadFinished() signal is called twice, the second time because you submitted the form so you can use an if statement to differentiate between the calls.



import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from bs4 import BeautifulSoup


class Render(QWebEngineView):
def __init__(self, url):
self.html = None
self.first_pass = True
self.app = QApplication(sys.argv)
QWebEngineView.__init__(self)
self.loadFinished.connect(self._load_finished)
self.load(QUrl(url))
self.app.exec_()

def _load_finished(self, result):
if self.first_pass:
self._first_finished()
self.first_pass = False
else:
self._second_finished()

def _first_finished(self):
self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
self.page().runJavaScript("preprocessMainForm();")
self.page().runJavaScript("document.forms[0].submit();")

def _second_finished(self):
self.page().toHtml(self.callable)

def callable(self, data):
self.html = data
self.app.quit()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
web = Render(url)
soup = BeautifulSoup(web.html, 'html.parser')
for news in soup.find_all(class_ = 'news'):
print(news.text)


Outputs:



Voting Rights and Capital
Next Day Disclosure Return
NOTICE OF REDEMPTION AND CANCELLATION OF LISTING
THIRD INTERIM DIVIDEND FOR 2018
Notification of Transactions by Persons Discharging Managerial Responsibilities
Next Day Disclosure Return
THIRD INTERIM DIVIDEND FOR 2018
Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018
Voting Rights and Capital
PUBLICATION OF BASE PROSPECTUS SUPPLEMENT
3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL
3Q EARNINGS RELEASE - HIGHLIGHTS
Scrip Dividend Circular
2018 Third Interim Dividend; Scrip Dividend
THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE
NOTIFICATION OF MAJOR HOLDINGS
EARNINGS RELEASE FOR THIRD QUARTER 2018
NOTIFICATION OF MAJOR HOLDINGS
Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018
THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES


Alternatively you can use Scrapy splash https://github.com/scrapy-plugins/scrapy-splash



Or Requests-HTML https://html.python-requests.org/ .



But I am not sure how you would fill the form in using these two last approaches.



Updated how to read the next pages:



import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from bs4 import BeautifulSoup


class Render(QWebEngineView):
def __init__(self, url):
self.html = None
self.count = 0
self.first_pass = True
self.app = QApplication(sys.argv)
QWebEngineView.__init__(self)
self.loadFinished.connect(self._load_finished)
self.load(QUrl(url))
self.app.exec_()

def _load_finished(self, result):
if self.first_pass:
self._first_finished()
self.first_pass = False
else:
self._second_finished()

def _first_finished(self):
self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
self.page().runJavaScript("preprocessMainForm();")
self.page().runJavaScript("document.forms[0].submit();")

def _second_finished(self):
try:
self.page().toHtml(self.parse)
self.count += 1
if self.count > 5:
self.page().toHtml(self.callable)
else:
self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")
except:
self.page().toHtml(self.callable)

def parse(self, data):
soup = BeautifulSoup(data, 'html.parser')
for news in soup.find_all(class_ = 'news'):
print(news.text)

def callable(self, data):
self.app.quit()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
web = Render(url)





share|improve this answer


























  • Thank you, Dan. Selenium can solve the problem. But I want to know how to deal with javascript webpage like this one using requests. Any other faster alternatives?

    – Chan
    Nov 26 '18 at 1:30













  • This particular web page has a VIEWSTATE token generated by JavaScript and an encrypted version of it also generated by JavaScript. Without actually running JavaScript is is virtually impossible to recreate these tokens. There is no way to do this with requests and I'm not sure how you would run the require JavaScript with Requests-HTML. If you don't like the Selenium option try the PyQt5 solution I gave in the answer.

    – Dan-Dev
    Nov 26 '18 at 11:50











  • How to read the next page?

    – Chan
    Feb 12 at 4:03











  • Updated the post with how to read the next pages

    – Dan-Dev
    Feb 12 at 20:41














1












1








1







You have multiple options:



1) You can use Selenium. First install Selenium.



sudo pip3 install selenium


Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads (Depending upon your OS you may need to specify the location of your driver)



from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Chrome()
url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
browser.get(url)
element = browser.find_element_by_id('ctl00_txt_stock_code') # find the text box
time.sleep(2)
element.send_keys('5') # populate the text box
time.sleep(2)
element.submit() # submit the form
soup = BeautifulSoup(browser.page_source, 'html.parser')
browser.quit()
for news in soup.find_all(class_='news'):
print(news.text)


2) Or use PyQt with QWebEngineView.



Install PyQt on Ubuntu:



    sudo apt-get install python3-pyqt5
sudo apt-get install python3-pyqt5.qtwebengine


or on other OS (64 bit versions of Python)



    pip3 install PyQt5


Basically you load the first page with the form on. Fill in the form by running JavaScript then submit it. The loadFinished() signal is called twice, the second time because you submitted the form so you can use an if statement to differentiate between the calls.



import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from bs4 import BeautifulSoup


class Render(QWebEngineView):
def __init__(self, url):
self.html = None
self.first_pass = True
self.app = QApplication(sys.argv)
QWebEngineView.__init__(self)
self.loadFinished.connect(self._load_finished)
self.load(QUrl(url))
self.app.exec_()

def _load_finished(self, result):
if self.first_pass:
self._first_finished()
self.first_pass = False
else:
self._second_finished()

def _first_finished(self):
self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
self.page().runJavaScript("preprocessMainForm();")
self.page().runJavaScript("document.forms[0].submit();")

def _second_finished(self):
self.page().toHtml(self.callable)

def callable(self, data):
self.html = data
self.app.quit()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
web = Render(url)
soup = BeautifulSoup(web.html, 'html.parser')
for news in soup.find_all(class_ = 'news'):
print(news.text)


Outputs:



Voting Rights and Capital
Next Day Disclosure Return
NOTICE OF REDEMPTION AND CANCELLATION OF LISTING
THIRD INTERIM DIVIDEND FOR 2018
Notification of Transactions by Persons Discharging Managerial Responsibilities
Next Day Disclosure Return
THIRD INTERIM DIVIDEND FOR 2018
Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018
Voting Rights and Capital
PUBLICATION OF BASE PROSPECTUS SUPPLEMENT
3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL
3Q EARNINGS RELEASE - HIGHLIGHTS
Scrip Dividend Circular
2018 Third Interim Dividend; Scrip Dividend
THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE
NOTIFICATION OF MAJOR HOLDINGS
EARNINGS RELEASE FOR THIRD QUARTER 2018
NOTIFICATION OF MAJOR HOLDINGS
Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018
THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES


Alternatively you can use Scrapy splash https://github.com/scrapy-plugins/scrapy-splash



Or Requests-HTML https://html.python-requests.org/ .



But I am not sure how you would fill the form in using these two last approaches.



Updated how to read the next pages:



import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
from bs4 import BeautifulSoup


class Render(QWebEngineView):
def __init__(self, url):
self.html = None
self.count = 0
self.first_pass = True
self.app = QApplication(sys.argv)
QWebEngineView.__init__(self)
self.loadFinished.connect(self._load_finished)
self.load(QUrl(url))
self.app.exec_()

def _load_finished(self, result):
if self.first_pass:
self._first_finished()
self.first_pass = False
else:
self._second_finished()

def _first_finished(self):
self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
self.page().runJavaScript("preprocessMainForm();")
self.page().runJavaScript("document.forms[0].submit();")

def _second_finished(self):
try:
self.page().toHtml(self.parse)
self.count += 1
if self.count > 5:
self.page().toHtml(self.callable)
else:
self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")
except:
self.page().toHtml(self.callable)

def parse(self, data):
soup = BeautifulSoup(data, 'html.parser')
for news in soup.find_all(class_ = 'news'):
print(news.text)

def callable(self, data):
self.app.quit()

url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
web = Render(url)





share|improve this answer















You have multiple options:



1) You can use Selenium. First install Selenium.



sudo pip3 install selenium


Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads (Depending upon your OS you may need to specify the location of your driver)



from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Chrome()
url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
browser.get(url)
element = browser.find_element_by_id('ctl00_txt_stock_code') # find the text box
time.sleep(2)
element.send_keys('5') # populate the text box
time.sleep(2)
element.submit() # submit the form
soup = BeautifulSoup(browser.page_source, 'html.parser')
browser.quit()
for news in soup.find_all(class_='news'):
print(news.text)


2) Or use PyQt with QWebEngineView.



Install PyQt on Ubuntu:



    sudo apt-get install python3-pyqt5
sudo apt-get install python3-pyqt5.qtwebengine


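or, on other operating systems (64-bit versions of Python):



    pip3 install PyQt5

From PyQt5 5.12 onwards, QtWebEngine ships as a separate package, so if the QtWebEngineWidgets import fails, also run pip3 install PyQtWebEngine.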


Basically, you load the first page, which contains the form, fill in the form by running JavaScript, and then submit it. The loadFinished signal is emitted twice, the second time because you submitted the form, so you can use an if statement to differentiate between the two calls.



    import sys
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEngineView
    from bs4 import BeautifulSoup


    class Render(QWebEngineView):
        def __init__(self, url):
            self.html = None
            self.first_pass = True
            self.app = QApplication(sys.argv)
            QWebEngineView.__init__(self)
            self.loadFinished.connect(self._load_finished)
            self.load(QUrl(url))
            self.app.exec_()

        def _load_finished(self, result):
            if self.first_pass:
                self._first_finished()
                self.first_pass = False
            else:
                self._second_finished()

        def _first_finished(self):
            self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
            self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
            self.page().runJavaScript("preprocessMainForm();")
            self.page().runJavaScript("document.forms[0].submit();")

        def _second_finished(self):
            self.page().toHtml(self.callable)

        def callable(self, data):
            self.html = data
            self.app.quit()

    url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
    web = Render(url)
    soup = BeautifulSoup(web.html, 'html.parser')
    for news in soup.find_all(class_='news'):
        print(news.text)


Outputs:



Voting Rights and Capital
Next Day Disclosure Return
NOTICE OF REDEMPTION AND CANCELLATION OF LISTING
THIRD INTERIM DIVIDEND FOR 2018
Notification of Transactions by Persons Discharging Managerial Responsibilities
Next Day Disclosure Return
THIRD INTERIM DIVIDEND FOR 2018
Monthly Return of Equity Issuer on Movements in Securities for the month ended 31 October 2018
Voting Rights and Capital
PUBLICATION OF BASE PROSPECTUS SUPPLEMENT
3Q 2018 EARNINGS RELEASE AUDIO WEBCAST AND CONFERENCE CALL
3Q EARNINGS RELEASE - HIGHLIGHTS
Scrip Dividend Circular
2018 Third Interim Dividend; Scrip Dividend
THIRD INTERIM DIVIDEND FOR 2018 SCRIP DIVIDEND ALTERNATIVE
NOTIFICATION OF MAJOR HOLDINGS
EARNINGS RELEASE FOR THIRD QUARTER 2018
NOTIFICATION OF MAJOR HOLDINGS
Monthly Return of Equity Issuer on Movements in Securities for the month ended 30 September 2018
THIRD INTERIM DIVIDEND FOR 2018; DIVIDEND ON PREFERENCE SHARES


Alternatively, you can use scrapy-splash (https://github.com/scrapy-plugins/scrapy-splash) or Requests-HTML (https://html.python-requests.org/), but I am not sure how you would fill in the form with either of these two approaches.



Update: how to read the next pages:



    import sys
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEngineView
    from bs4 import BeautifulSoup


    class Render(QWebEngineView):
        def __init__(self, url):
            self.html = None
            self.count = 0
            self.first_pass = True
            self.app = QApplication(sys.argv)
            QWebEngineView.__init__(self)
            self.loadFinished.connect(self._load_finished)
            self.load(QUrl(url))
            self.app.exec_()

        def _load_finished(self, result):
            if self.first_pass:
                self._first_finished()
                self.first_pass = False
            else:
                self._second_finished()

        def _first_finished(self):
            self.page().runJavaScript("document.getElementById('ctl00_txt_stock_code').value = '5';")
            self.page().runJavaScript("document.getElementById('ctl00_sel_DateOfReleaseFrom_y').value='1999';")
            self.page().runJavaScript("preprocessMainForm();")
            self.page().runJavaScript("document.forms[0].submit();")

        def _second_finished(self):
            try:
                # Print the news items on the current results page
                self.page().toHtml(self.parse)
                self.count += 1
                if self.count > 5:
                    # Enough pages read: stop the application
                    self.page().toHtml(self.callable)
                else:
                    # Load the next results page; this triggers loadFinished again
                    self.page().runJavaScript("document.getElementById('ctl00_btnNext2').click();")
            except Exception:
                self.page().toHtml(self.callable)

        def parse(self, data):
            soup = BeautifulSoup(data, 'html.parser')
            for news in soup.find_all(class_='news'):
                print(news.text)

        def callable(self, data):
            self.app.quit()

    url = "http://www3.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx"
    web = Render(url)
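
The control flow of the updated class above can be easier to follow with the Qt parts taken out. The sketch below is only a model of it: PagingFlow, max_pages and the recorded action strings are invented for illustration, and nothing here touches a real page. The first load-finished event fills in and submits the form, and every later one parses a page and either clicks "next" or quits.

```python
class PagingFlow:
    """Qt-free model of the load-finished dispatch used for paging."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.first_pass = True
        self.pages_read = 0
        self.actions = []  # what a real view would do, recorded as strings

    def on_load_finished(self):
        if self.first_pass:
            # First load: the empty search form has arrived.
            self.first_pass = False
            self.actions.append("fill form and submit")
            return
        # Every later load is one page of search results.
        self.pages_read += 1
        self.actions.append("parse page %d" % self.pages_read)
        if self.pages_read >= self.max_pages:
            self.actions.append("quit")
        else:
            self.actions.append("click next")


flow = PagingFlow(max_pages=2)
for _ in range(3):  # one load for the form, two for result pages
    flow.on_load_finished()
print(flow.actions)
```

The count > 5 test in the class above reads six result pages before quitting, which corresponds to max_pages=6 in this model.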














edited Feb 12 at 20:41

























answered Nov 24 '18 at 2:35









Dan-Dev













  • Thank you, Dan. Selenium can solve the problem, but I want to know how to deal with a JavaScript web page like this one using requests. Are there any other, faster alternatives?

    – Chan
    Nov 26 '18 at 1:30













  • This particular web page has a VIEWSTATE token generated by JavaScript, plus an encrypted version of it that is also generated by JavaScript. Without actually running the JavaScript it is virtually impossible to recreate these tokens. There is no way to do this with requests, and I'm not sure how you would run the required JavaScript with Requests-HTML. If you don't like the Selenium option, try the PyQt5 solution I gave in the answer.

    – Dan-Dev
    Nov 26 '18 at 11:50











  • How to read the next page?

    – Chan
    Feb 12 at 4:03











  • Updated the post with how to read the next pages

    – Dan-Dev
    Feb 12 at 20:41
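
On the question of how to find the parameters for requests.post: on pages where the hidden form fields (such as a VIEWSTATE token) are already present in the HTML the server sends, a common pattern is to read them out of the form and post them back along with the visible fields. The sketch below shows that step with the standard library only. The sample markup and field names are made up, and, per the comment above, this is not expected to be enough for the HKEX page, where the tokens are produced by JavaScript.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden input fields found in `html`."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    return collector.fields


# Made-up sample markup, not taken from the HKEX page.
SAMPLE = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="text" name="stock_code" />
</form>
"""
print(hidden_fields(SAMPLE))
```

The resulting dict, merged with the values for the visible fields, is what would be passed as the data argument of requests.post. The browser's developer tools (Network tab) show which fields and headers a real submission sends, which is the usual way to check what the server expects.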






































