How to make wget download a site with HTML5 video hosted on other domains?
Hi, I am trying to download a site in a way that most closely mimics Firefox's "Save As > Web Page, complete".
The site is a page containing various forms of media (some hosted on other domains), as well as CSS and JS. I logged in to the site using curl, and I have been trying numerous option combinations, such as:
wget --load-cookies /tmp/cookies/cookie1.txt -r -np -nd -l 1 -nc -H -nH -E -p -k -P foldertosave http://someurl
This gets me to the authenticated page, but no MP4s are downloaded, and some images fail to save because their filenames are too long. For instance, some elements have URLs like:
https://dvgddacn5gxxx.cloudfront.net/49ji.jpg?response-content-disposition=inline%3Bfilename%3D%22L1-MMG_DVD-pg3.jpg%22&Expires=21474836547&Signature=VcQ2sde9X8-EAPLGqp9I28LOf67ueNciGnVXkuh19NJG8MUzNy-N8e~ElbFS87JZIiG3nLIIqhqIzD6YJ6WbqwbQVaOT0wxYuISEslxJhkHlEjh-~jkpvTCv2BKOtvxEwTjh-ipbJjs-FI~qBKrEjlDHWOL0H7IW0x5jYaxhQeE_&Key-Pair-Id=APKAIVZN7KJ762UIENTQ
and the video URLs are similar:
https://dvgddacn5xxxx.cloudfront.net/53hj.mp4?response-content-disposition=inline%3Bfilename%3D%22MMM_DVD-pg3.mp4%22&Expires=2147483675&Signature=hz6jTrh5j71D2x4QTSSB6myPAB5a69pDTNV5CgdB0DVu7~E1bluCenDMFoEnX2KX~tt0nHECurIalXsu8icE6rZQo5C9AoYihTVPD49pBJcBJA3yQffu-wo1AODWqgFu6uwzfS2FtBZhhwMmDrjJiHxLCKSTObkIYLZ7PZ7QN08_&Key-Pair-Id=APKAIVZN4AJ799UIENTQ
Maybe the content disposition is the key here, but in any case I just can't get this working quite right. Almost, but not quite.
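One lead on the filename problem: those response-content-disposition query parameters appear to ask CloudFront to return a Content-Disposition response header carrying a clean filename, and wget has a --content-disposition option (documented as experimental) that names saved files from that header instead of from the URL. A sketch, untested against this site, reusing the command from above:

wget --load-cookies /tmp/cookies/cookie1.txt \
     --content-disposition --restrict-file-names=windows \
     -r -np -nd -l 1 -H -nH -E -p -k \
     -P foldertosave http://someurl

If the header is honored, the JPG above should save as L1-MMG_DVD-pg3.jpg rather than as the full signed URL; --restrict-file-names=windows additionally escapes characters such as ? that are awkward in filenames.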
So ideally, here is what I am trying to end up with:
- A single HTML file named using the title of the page (like Firefox's Save As does).
- A single folder with the same base name as the HTML file, so "some page.html" gets a "some_page_files" folder.
- All CSS, JS, images, and video saved.
- No other HTML files saved.
- No crazy filenames like the ones above; no filename should contain anything after the file extension.
- URLs rewritten so that everything is locally consistent and I can view the offline site at localhost/.
- No robots.txt saved.
- In short, exactly what Firefox's Save As "Web Page, complete" produces...
If I can get this working it will be great!
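Since the goal is a single page plus its requisites rather than a crawl, the closest wget analog to "Web Page, complete" that I know of is page-requisites mode rather than recursion. A minimal sketch, untested against this site (foldertosave and someurl are placeholders as above); note that whether wget's HTML parser follows HTML5 <video>/<source> URLs may depend on your wget version:

wget --load-cookies /tmp/cookies/cookie1.txt \
     -e robots=off \
     -p -H -k -E -nd -nH \
     --content-disposition \
     -P foldertosave http://someurl

Here -p fetches the page requisites, -H lets them span hosts (the CloudFront domains), -k rewrites links for local viewing, -E adds .html extensions where needed, and -e robots=off tells wget to ignore robots.txt.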
Tags: firefox, download, wget
asked Dec 15 '18 at 11:32 by Brian; edited Dec 16 '18 at 12:36 by Twisty Impersonator
If Firefox's Save Website Complete gives you what you want, why not just use that? – fixer1234, Dec 16 '18 at 7:11

Hi, I am trying to do it automatically. I am studying online, and we are only given one year after we graduate to still have access to class material online; I procrastinated on saving lesson material offline for the past two years, so now I am just trying to get an offline version of the class material before I lose access (which is very soon). Doing it through Firefox would be very tedious, but I may just do that... – Brian, Dec 16 '18 at 13:38
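Since the real goal here is batching this over many lesson pages, a hypothetical wrapper along these lines could avoid the per-page Firefox tedium (urls.txt is an assumed file with one lesson URL per line, not something from the course site):

while read -r url; do
    wget --load-cookies /tmp/cookies/cookie1.txt \
         -e robots=off -p -H -k -E --content-disposition \
         -P lessons "$url"
done < urls.txt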