Python Scrapy & Ubuntu Connection Limits
I'm running a Python Scrapy scraper on an Ubuntu virtual machine. It can scrape only ~200 pages from a website at a time, even if I set concurrent connections to 10000. If I start a second process of the same scraper, the two processes essentially split that 200 pages per minute between them, so the total number of concurrent connections does not increase. However, if I start a new virtual machine (sharing the same IP as the first one), the two virtual machines together achieve 400 concurrent connections. Does anyone know what might be causing this limitation, or how to work around it?
I've tried the advice in this question about increasing Ubuntu TCP/IP connections, without any luck: Increasing the maximum number of tcp/ip connections in linux
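For reference, the post does not show the actual settings, so the following is only a sketch of what "concurrent connections set to 10000" might look like in a Scrapy project. The setting names are real Scrapy settings; the values, the `SETTINGS` dict, and the helper `effective_site_concurrency` are assumptions made up for illustration. The helper mirrors how Scrapy caps requests to a single site: the global `CONCURRENT_REQUESTS` limit applies together with the per-domain limit (or the per-IP limit, when that one is non-zero).

```python
# Hypothetical settings.py values; illustrative assumptions, not the asker's real config.
SETTINGS = {
    "CONCURRENT_REQUESTS": 10000,             # global cap (Scrapy default: 16)
    "CONCURRENT_REQUESTS_PER_DOMAIN": 10000,  # per-domain cap (Scrapy default: 8)
    "CONCURRENT_REQUESTS_PER_IP": 0,          # 0 means the per-domain cap is used
    "REACTOR_THREADPOOL_MAXSIZE": 20,         # thread pool used e.g. for DNS lookups (default: 10)
    "DOWNLOAD_DELAY": 0,
    "AUTOTHROTTLE_ENABLED": False,
}


def effective_site_concurrency(settings):
    """Upper bound on simultaneous requests to one site under these settings.

    Hypothetical helper: falls back to Scrapy's documented defaults
    for any setting that is missing from the dict.
    """
    per_ip = settings.get("CONCURRENT_REQUESTS_PER_IP", 0)
    per_domain = settings.get("CONCURRENT_REQUESTS_PER_DOMAIN", 8)
    per_slot = per_ip or per_domain  # a non-zero per-IP cap replaces the per-domain cap
    return min(settings.get("CONCURRENT_REQUESTS", 16), per_slot)


if __name__ == "__main__":
    print(effective_site_concurrency(SETTINGS))
    # Raising only the global cap leaves the per-domain default in place.
    print(effective_site_concurrency({"CONCURRENT_REQUESTS": 10000}))
```

If only `CONCURRENT_REQUESTS` is raised, the per-domain default still bounds a single-site crawl, so that is worth ruling out first. It would not by itself explain why a second virtual machine adds capacity while a second process does not, so this is a check to run, not a diagnosis.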
python scrapy
asked Nov 23 '18 at 1:57
Michael3256