rsync can't complete upload due to connection drops
I am trying to upload ~3 million JPEG files (~90 GB) to a remote server.
At first I thought rsync would be great for the job, and initially it did saturate my upload link completely.
However, my internet connection is somewhat unstable and drops every few hours. rsync's startup phase appears to take substantially longer when there are already files in the target directory, apparently because it checks all of them for changes. This phase now takes hours before any new files are uploaded. My connection does not stay up that long, so the process restarts from scratch.
So I effectively make no progress at all anymore, because the startup takes too long and gets cancelled before it can complete. I need rsync, or something like it, that is aware of connection drops and can reconnect without losing all progress.
Is there such a tool, or such an option for rsync?
linux rsync
asked Jan 26 at 3:38 by Cola_Colin
Make sure to include the options --partial --progress so rsync can pick up where it left off after failure. rsync is the correct tool.
– David C. Rankin
Jan 26 at 5:51
I assume that the 3 million JPEGs aren't all in the same directory? Could you divide the process per directory to have a manageable size (<100K each)?
– xenoid
Jan 26 at 10:13
They're all in the same directory. I know that is a lot of files for a single directory, but it works fine in all regards, except the restart issue.
– Cola_Colin
Jan 26 at 11:41
Then you can rsync a*, b*, etc. and use any strategy where you rsync only manageable parts of the files.
– xenoid
Jan 26 at 13:18
That's a great idea. I should've thought of that.
– Cola_Colin
Jan 26 at 13:38
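The --partial suggestion in the comments can be combined with a small retry loop, so that a dropped connection restarts rsync automatically. This is only a minimal sketch: the function name, paths, host name, retry count, and delay are hypothetical placeholders, and --timeout=60 is an assumed value. Note that it automates the restart but does not by itself shorten rsync's startup scan.

```shell
#!/usr/bin/env bash
# Sketch: rerun rsync until it succeeds. Paths, host and retry settings are
# hypothetical placeholders, not values from the question.

RSYNC="${RSYNC:-rsync}"   # overridable, e.g. with a stub for a dry run

sync_with_retry() {
    local src="$1" dest="$2" max_tries="${3:-50}" delay="${4:-30}"
    local attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # -a preserves modification times, so later runs can skip finished files.
        # --partial keeps a half-transferred file instead of deleting it.
        # --timeout makes rsync give up on a dead connection instead of hanging.
        if "$RSYNC" -a --partial --timeout=60 "$src" "$dest"; then
            echo "finished after $attempt attempt(s)"
            return 0
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "giving up after $max_tries attempts" >&2
    return 1
}

# Example call (not executed here):
# sync_with_retry /data/jpegs/ user@example.com:/srv/jpegs/
```

The loop relies only on rsync's exit status, so it treats any failure (not just a network drop) as a reason to retry; the attempt limit keeps a permanent error from looping forever.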
3 Answers
A couple of thoughts:
Is rsync checksumming the files? If so, switch it back to comparing modification time and file size, and make sure these attributes are preserved on the destination.
Set up OpenVPN between the client and server and run rsync across that. Because the IP addresses of the endpoints don't change and there is no NAT to break the connection, rsync will continue where it left off when OpenVPN resumes.
answered Jan 26 at 4:52 by davidgo
OpenVPN is a cool idea. I'll try that on the next connection drop. However, overnight my connection was nice to me, and right now it is transferring files. Maybe I will get lucky. If not, OpenVPN it is.
– Cola_Colin
Jan 26 at 11:43
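On the checksumming point: rsync's default comparison is already the size and modification-time "quick check"; full checksumming only happens when -c/--checksum is passed. A minimal sketch of an invocation that stays on the quick check is shown below. The function name and the example paths are hypothetical.

```shell
RSYNC="${RSYNC:-rsync}"   # overridable, e.g. with a stub

upload_quick_check() {
    local src="$1" dest="$2"
    # -a implies -t (preserve modification times). Without -t the timestamps on
    # the server would differ and every file would look changed on the next run.
    # No -c/--checksum: that would make both sides read every file in full.
    "$RSYNC" -a --partial "$src" "$dest"
}

# Example call (not executed here):
# upload_quick_check /data/jpegs/ user@example.com:/srv/jpegs/
```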
If your files are unchanged since the initial rsync, you can try rsync's --ignore-existing option, which skips files that already exist on the receiving server and transfers only those that are missing.
answered Jan 26 at 3:46 by Eric Feldhusen
Tried that just now. It does not appear to help. The verbose logging output during startup changes from "file xyz uptodate" to "file xyz exists", but it still goes through all of the files and checks that they exist, which appears to be just as slow as making sure they are up to date.
– Cola_Colin
Jan 26 at 3:54
To summarize my experience for future googlers:
Splitting the files into multiple batches by copying a*, b*, etc. is a good idea and helped to complete the upload.
The actual problem was that I had made the mistake of selecting an HDD volume on the cloud server I was uploading to. That HDD could not handle a directory with 3 million files at all. Even tools like cp were unable to move the data elsewhere from the HDD; they spent forever preparing at 100% disk wait without actually copying any files. After switching to an SSD, rsync's startup phase is much faster and no longer poses a problem.
answered Jan 29 at 12:12 by Cola_Colin
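The a*, b* batching could be scripted roughly as follows. This is a sketch under assumptions, not the exact commands used for the upload: the function names, paths, host, and prefix list are hypothetical. It passes the pattern to rsync as a filter rule rather than letting the shell expand a*, since a glob over millions of files can exceed the shell's argument-length limit.

```shell
RSYNC="${RSYNC:-rsync}"   # overridable, e.g. with a stub

# Upload only the files whose names start with one prefix.
sync_prefix() {
    local src="$1" dest="$2" prefix="$3"
    # Filter rules are checked in order: keep names starting with the prefix,
    # then exclude everything else from the file list.
    "$RSYNC" -a --partial --include="${prefix}*" --exclude='*' "$src" "$dest"
}

# Run one rsync per prefix; stop at the first batch that fails.
sync_in_batches() {
    local src="$1" dest="$2"
    shift 2
    local prefix
    for prefix in "$@"; do
        echo "batch: ${prefix}*"
        sync_prefix "$src" "$dest" "$prefix" || return 1
    done
}

# Example call (not executed here), assuming hex-like file names:
# sync_in_batches /data/jpegs/ user@example.com:/srv/jpegs/ 0 1 2 3 4 5 6 7 8 9 a b c d e f
```

Each batch has a much smaller file list to compare, so a connection drop costs at most one batch's startup rather than the whole directory's. The prefixes have to cover every file name; any name that matches none of them is silently skipped.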