rsync can't complete upload due to connection drops












I am trying to upload ~3 million JPEG files, ~90 GB in total, to a remote server. At first I thought rsync would be great for the job, and initially it did saturate my upload link completely.

However, my internet connection is somewhat unstable and drops every few hours. rsync's startup phase appears to take substantially longer when there are already files in the target directory, apparently because it checks all of them for changes. This phase now takes hours before rsync starts to upload any new files. My connection does not stay up that long, so the process restarts from the beginning.

So I effectively make no progress at all anymore, because the startup takes too long and gets cancelled before it can complete. I need rsync, or something like rsync, that is aware of connection drops and can reconnect without losing all the progress.

Is there such a tool, or such an option for rsync?







linux rsync






asked Jan 26 at 3:38 by Cola_Colin
  • Make sure to include the options --partial --progress so rsync can pick up where it left off after a failure. rsync is the correct tool.
    – David C. Rankin
    Jan 26 at 5:51

  • I assume that the 3 million JPEGs aren't all in the same directory? Could you divide the process per directory to get a manageable size (<100K files each)?
    – xenoid
    Jan 26 at 10:13

  • They're all in the same directory. I know that is a lot of files for a single directory, but it works fine in all regards except the restart issue.
    – Cola_Colin
    Jan 26 at 11:41

  • Then you can rsync a*, b*, etc., or use any other strategy where each rsync run covers only a manageable part of the files.
    – xenoid
    Jan 26 at 13:18

  • That's a great idea. I should've thought of that.
    – Cola_Colin
    Jan 26 at 13:38
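
A minimal sketch of the batching idea from the comments above, assuming all files sit in one local directory and their names start with a lowercase letter or a digit. The paths, the host name and the helper name `batch_commands` are hypothetical placeholders. Instead of a shell glob (which could exceed the argument-length limit with this many files), it uses rsync's `--include`/`--exclude` filter pair so that each run only considers one prefix:

```python
import shlex
import string


def batch_commands(src_dir, dest, prefixes=string.ascii_lowercase + string.digits):
    """Build one rsync argument list per filename prefix.

    src_dir and dest are placeholders. The include/exclude pair limits
    each run to names starting with the given prefix, so the file list
    rsync has to build and compare stays small.
    """
    src = src_dir.rstrip("/") + "/"  # trailing slash: transfer the directory's contents
    commands = []
    for prefix in prefixes:
        commands.append([
            "rsync", "-a", "--partial",
            f"--include={prefix}*",  # keep names starting with this prefix
            "--exclude=*",           # drop everything else
            src, dest,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands, shell-quoted, so they can be inspected before running.
    for cmd in batch_commands("/data/jpegs", "user@example.com:/srv/jpegs/"):
        print(shlex.join(cmd))
```

Names that start with other characters (uppercase letters, for instance) would not be covered by the default prefixes, so the prefix set has to match the actual file names.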



















3 Answers
A couple of thoughts:

Is rsync checksumming the files? If so, switch it to comparing modification time and file size instead, and make sure those attributes are being preserved on the destination.

Set up OpenVPN between the client and the server and run rsync across that. Because the IP addresses of the endpoints don't change and there is no NAT to break the connection, rsync will continue where it left off when OpenVPN resumes.







answered Jan 26 at 4:52 by davidgo













  • OpenVPN is a cool idea. I'll try that on the next connection drop. However, overnight my connection was nice to me, and right now it is transferring files. Maybe I will get lucky. If not, OpenVPN it is.
    – Cola_Colin
    Jan 26 at 11:43
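
A rough sketch of how the flags discussed so far could be combined with an automatic restart, assuming rsync is on the PATH. The function names, the 60-second timeout and the retry counts are illustrative choices, not something taken from the answer. `-a` preserves modification times so rsync's default size-and-mtime check can skip completed files, `--checksum` is deliberately left out, and `--timeout` makes a dead connection fail rather than hang:

```python
import subprocess
import time


def build_command(src, dest):
    """rsync arguments for a resumable upload (src and dest are placeholders).

    -a preserves modification times, --partial keeps an interrupted file,
    and --timeout=60 aborts the run after 60 seconds without I/O.
    """
    return ["rsync", "-a", "--partial", "--timeout=60", src, dest]


def run_until_done(cmd, max_attempts=50, wait_seconds=30,
                   run=subprocess.call, sleep=time.sleep):
    """Re-run cmd until it exits with status 0; return the attempts used.

    Raises RuntimeError if max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)  # give the link time to come back
    raise RuntimeError(f"command still failing after {max_attempts} attempts")
```

This only removes the manual restart. Each new attempt still repeats rsync's startup scan, so it helps most when combined with smaller batches.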



















If your files are unchanged since the initial rsync, you can try rsync's --ignore-existing option, which skips files that already exist on the receiving server and transfers only what is missing.







answered Jan 26 at 3:46 by Eric Feldhusen













  • Tried that just now. It does not appear to help. The verbose logging output during startup changes from "file xyz uptodate" to "file xyz exists", but rsync still goes through all of the files and checks that they exist, which appears to be just as slow as checking that they are up to date.
    – Cola_Colin
    Jan 26 at 3:54



















To summarize my experience for future googlers:

  • Splitting the files into multiple batches by copying a*, b*, etc. is a good idea, and it helped me complete the upload.

  • The actual problem was that I had made the mistake of selecting an HDD volume on the cloud server I was uploading to. That HDD volume could not handle a directory with 3 million files at all: even tools like cp were unable to move the data elsewhere from it, and just spent forever preparing at 100% disk wait without actually copying any files. After switching to an SSD, rsync's startup phase is much faster and no longer poses a problem.








answered Jan 29 at 12:12 by Cola_Colin
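
Not part of the answer above, but in the spirit of xenoid's comment about keeping each directory to a manageable size: one hypothetical way to avoid a single 3-million-file directory is to derive a subdirectory from a hash of each file name. The function name `shard_path` and the choice of MD5 with a two-character prefix are assumptions made for illustration only:

```python
import hashlib
from pathlib import PurePosixPath


def shard_path(filename, width=2):
    """Map a flat filename to 'xx/filename', where xx is a hash prefix.

    With width=2 there are 256 possible subdirectories, so about
    3 million files would average roughly 12,000 per directory.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return str(PurePosixPath(digest[:width]) / filename)
```

The mapping is deterministic, so the same name always lands in the same subdirectory and a file can be located later without any index.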





























