Unable to copy/mirror website-page by using WinHTTrack

I am using HTTrack to copy/mirror a website, and I am facing one problem.

I am talking about this website. Suppose I want to capture this page together with all of its internal links (for example, Problem 6.11 and Problem 6.10, which are linked from that page). I tried the following:




  1. Enter the project name and URL:


screen-shot




  2. Set the option "can go up and down both":


screen-shot



Then I started the mirror. The process finished, but when I browse index.html, the main page displays correctly while the deeper links (the sub-pages mentioned earlier: Problem 6.11, Problem 6.10, etc.) do not display - only the file name "feed" is shown. (Try it yourself to see what is going wrong.)



How do I fix this issue?
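
For reference, the two GUI steps above correspond roughly to a command-line run like the minimal sketch below. The URL, output directory, and depth are placeholders, not the actual site from the question; it assumes the httrack CLI is installed, and that its -B travel-mode flag ("can go both up & down") is the counterpart of the GUI option in step 2:

```python
import subprocess

# Minimal sketch of the mirror described above. The URL and output
# directory are placeholders, not the real site.
subprocess.run(
    [
        "httrack", "https://example.com/problems/",  # the start page
        "-O", "./mirror",   # project directory, where index.html is written
        "-B",               # travel mode "can go both up & down", as in step 2
        "-r6",              # follow links up to 6 clicks deep
        "-v",               # log to screen while mirroring
    ],
    check=True,
)
```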










website mirroring httrack

asked Jul 15 '15 at 11:24 by APLUS, edited Jul 15 '15 at 13:08

1 Answer

I suggest you read the FAQ.

Here is a quote from the WinHTTrack website:

Question: Some sites are captured very well, others aren't. Why?

Answer: There are several reasons (and solutions) for a mirror to fail. Reading the log files (and this FAQ!) is generally a VERY good idea to figure out what occurred.

• Links within the site refer to external links, or to links located in other (or upper) directories, which are not captured by default - the use of filters is generally THE solution, as this is one of the most powerful options in HTTrack. See the questions/answers above.

• The website's 'robots.txt' rules forbid access to several parts of the site - you can disable them, but only with great care!

• HTTrack is filtered (by its default User-Agent identity) - you can change the browser User-Agent identity to an anonymous one (MSIE, Netscape...) - here again, use this option with care, as this measure might have been put in place to avoid bandwidth abuse (see also the abuse FAQ!).

There are cases, however, that cannot (yet) be handled:

• Flash sites - no full support

• Intensive Java/JavaScript sites - might be bogus/incomplete

• Complex CGI with built-in redirects, and other tricks - very complicated to handle, and therefore might cause problems

• Parsing problems in the HTML code (cases where the engine is fooled, for example by a false comment (<!--) detected as a real comment (-->)). Rare cases, but they might occur. A bug report is then generally good!

Note: For some sites, setting the "Force old HTTP/1.0 requests" option can be useful, as this option uses more basic requests (no HEAD request, for example). This causes a performance loss but increases compatibility with some CGI-based sites.
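
As the FAQ says, the log file is the first diagnostic, and filters, robots.txt, and the User-Agent are the usual levers. The sketch below shows how those suggestions might look from the httrack command line, plus a quick scan of the log for problems. It is illustrative only: the URL, output path, and filter pattern are placeholders, and it assumes the httrack CLI is installed and writes its messages to hts-log.txt in the project directory (its usual behavior):

```python
import pathlib
import subprocess

# Illustrative sketch only: the URL, output directory, and filter are
# placeholders, not the actual site from the question.
project = pathlib.Path("./mirror")

subprocess.run(
    [
        "httrack", "https://example.com/problems/",
        "-O", str(project),
        "+*example.com/*",    # filter: also follow links in other/upper directories
        "-s0",                # ignore robots.txt rules -- only with great care!
        "-F", "Mozilla/5.0",  # send a browser-like User-Agent instead of HTTrack's
    ],
    check=True,
)

# Per the FAQ, read the log to see which requests failed or were skipped.
log = project / "hts-log.txt"
if log.exists():
    for line in log.read_text(errors="replace").splitlines():
        if "error" in line.lower() or "warning" in line.lower():
            print(line)
```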




P.S. There are many reasons why a website can't be captured 100%. I think we on Super User are enthusiasts, but we are not going to reverse-engineer a website to discover which system is running behind it. (That's my opinion.)






answered Jul 15 '15 at 12:14 by Francisco Tapia