How to make wget download site with HTML5 video at other domains?

Hi, I am trying to download a site in a way that most closely mimics Firefox's "Save Page As > Web Page, complete".



The site is a page containing various forms of media (some hosted on other domains), as well as CSS and JS. I logged in to the site using curl, and have been trying numerous option combinations such as



wget --load-cookies /tmp/cookies/cookie1.txt -r -np -nd -l 1 -nc -H -nH -E -p -k -P foldertosave http://someurl


This gets me to the authenticated page, but no mp4s are ever touched, and some images fail to save because the filenames are too long. For instance, some elements have URLs like



https://dvgddacn5gxxx.cloudfront.net/49ji.jpg?response-content-disposition=inline%3Bfilename%3D%22L1-MMG_DVD-pg3.jpg%22&Expires=21474836547&Signature=VcQ2sde9X8-EAPLGqp9I28LOf67ueNciGnVXkuh19NJG8MUzNy-N8e~ElbFS87JZIiG3nLIIqhqIzD6YJ6WbqwbQVaOT0wxYuISEslxJhkHlEjh-~jkpvTCv2BKOtvxEwTjh-ipbJjs-FI~qBKrEjlDHWOL0H7IW0x5jYaxhQeE_&Key-Pair-Id=APKAIVZN7KJ762UIENTQ


and the video URLs are similar:



https://dvgddacn5xxxx.cloudfront.net/53hj.mp4?response-content-disposition=inline%3Bfilename%3D%22MMM_DVD-pg3.mp4%22&Expires=2147483675&Signature=hz6jTrh5j71D2x4QTSSB6myPAB5a69pDTNV5CgdB0DVu7~E1bluCenDMFoEnX2KX~tt0nHECurIalXsu8icE6rZQo5C9AoYihTVPD49pBJcBJA3yQffu-wo1AODWqgFu6uwzfS2FtBZhhwMmDrjJiHxLCKSTObkIYLZ7PZ7QN08_&Key-Pair-Id=APKAIVZN4AJ799UIENTQ
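One likely reason the mp4s are never touched: `-p` downloads page requisites, and depending on the wget version the HTML5 `<video>`/`<source>` elements may not be parsed as requisites at all. A workaround is a second pass that extracts the video URLs from the saved page and feeds them back to wget with `-i`. A minimal sketch of the extraction step, run here on a fabricated stand-in page rather than the real one:

```shell
#!/bin/sh
# Stand-in page with a <source> element shaped like the CloudFront URL above.
cat > /tmp/page.html <<'EOF'
<video controls>
  <source src="https://dvgddacn5xxxx.cloudfront.net/53hj.mp4?Expires=1&Signature=sig" type="video/mp4">
</video>
EOF

# Pull every quoted https URL containing ".mp4" out of the saved page.
grep -oE 'https://[^"]+\.mp4[^"]*' /tmp/page.html > /tmp/videos.txt
cat /tmp/videos.txt

# A second wget pass (not run here) would then fetch them with the same
# cookies, e.g.:
#   wget --load-cookies /tmp/cookies/cookie1.txt -nd -P foldertosave -i /tmp/videos.txt
```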


Maybe the Content-Disposition header is the key here, but in any case I just can't get this working quite right. Almost, but not quite.
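That header is probably usable: CloudFront turns the `response-content-disposition` query parameter into a real `Content-Disposition` response header, and wget's `--content-disposition` option names saved files from that header's `filename="..."` instead of from the URL, which would also avoid the too-long query-string names. Combined with `-e robots=off` this gives a sketch like the following. This is a command sketch only, not a tested recipe: `http://someurl` and `foldertosave` are the placeholders from the command above, and exact behavior varies by wget version.

```shell
# Sketch: same placeholders as the command above; untested.
wget --load-cookies /tmp/cookies/cookie1.txt \
     -p -k -E -nd -H \
     -e robots=off \
     --content-disposition \
     -P foldertosave \
     http://someurl
```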



So ideally, here is what I am trying to end up with:




  1. A single HTML file named using the page title (like Firefox's Save As does).

  2. A single folder with the same name as the HTML file, except that "some page.html" becomes some_page_files.

  3. All CSS, JS, images, and video saved.

  4. No other HTML files saved.

  5. No crazy filenames like the ones above; no filename should contain anything after the file extension.

  6. URLs rewritten so everything is locally consistent and I can view the offline site at localhost/.

  7. No robots.txt saved.

  8. In short, just what Firefox's "Save Page As > Web Page, complete" produces...
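For item 5, any query-string suffixes that do end up in saved filenames can be stripped after the fact. A minimal sketch, assuming the downloads landed in foldertosave/ as in the command above; the demo file is fabricated for illustration, and note that two URLs sharing a base name would collide after renaming:

```shell
#!/bin/sh
# Demo file standing in for a real download whose name kept the query string.
mkdir -p foldertosave
touch 'foldertosave/49ji.jpg?Expires=123&Signature=abc'

# Rename every file containing '?' to just the part before the first '?'.
for f in foldertosave/*\?*; do
    [ -e "$f" ] || continue        # glob matched nothing; skip the literal pattern
    mv -- "$f" "${f%%\?*}"         # e.g. "49ji.jpg?Expires=..." -> "49ji.jpg"
done

ls foldertosave                    # -> 49ji.jpg
```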


If I can get this working it will be great!

  • If Firefox's Save Website Complete gives you what you want, why not just use that?
    – fixer1234
    Dec 16 '18 at 7:11










  • Hi, I am trying to do it automatically because I am studying online, and we are only given one year after graduating to keep access to class material. I procrastinated saving lesson material offline for the past two years, so I am just trying to get an offline version before I lose access (which is very soon). It will be very tedious to do through Firefox, but I may just do that...
    – Brian
    Dec 16 '18 at 13:38
















firefox download wget






edited Dec 16 '18 at 12:36 by Twisty Impersonator

asked Dec 15 '18 at 11:32 by Brian











