Understanding Cloud Backup

GaryML
Posts: 10
Joined: Thu Apr 30, 2020 8:34 pm

Understanding Cloud Backup

Post by GaryML »

Hi,

I'm new here so please be gentle.

I'm looking for a backup system to use with Microsoft OneDrive Personal. I currently have about 800GB of slow-moving data, with about 40GB per year to add.

As far as I can tell, this is how Backup4all works for this:
1) It creates the backup in a temporary folder.
2) It then tests the contents of the zip.
3) The progress bar reaches 100%.
4) Backup4all then uploads the zip file to OneDrive (at close to my internet connection's upload speed).
5) Once this is complete I press Test; this downloads the backup catalogue for the OneDrive copy of the zip, analyses it somehow and reports that all is good.

As it stands, probably due to ignorance, I have a few concerns about this.
1) For the amount of data I want to upload and my internet connection, the upload will take about 72 hours, which is fine. However, if the backup is built completely before the upload starts, that will slow the process right down and also take up about 800GB on disk. I believe OneDrive Personal can now take 100GB files, up from 15GB. Given that I auto-split my backup into files no bigger than 50GB, will the first 50GB start to upload and be removed from temporary storage once it is uploaded, while the second (and possibly third to tenth) is being created?

2) Leading on from the first question, is there any way to get the backup created directly on OneDrive so that it doesn't take up so much temporary storage, or is the only option to split the 800GB into smaller backups?

3) I take it only the full backup requires the full size of the backup in temporary storage, i.e. incremental and differential backups will only use as much space in temporary storage as they do in the final copy?

4) When the copy is in local storage and it tests the contents of the zip, what is it actually doing? Is it a complete bit-by-bit comparison of the zipped files against the originals?

5) The part I'm most concerned about (and ignorant of) is the transfer from my PC to sitting safely on Microsoft's servers, as I believe this is where problems might occur. The Test button runs very quickly and barely does any reading of the cloud-based file, so I take it this uses some form of checksum? Is that checksum created on the cloud side, based on the data actually there at that time? In other words: if the file leaves my PC with the correct checksum and data, but part of it gets slightly corrupted on the way, or while sitting on the cloud server before I hit the Test button, will hitting Test ask the cloud provider to recalculate the checksum, so that it correctly represents the now slightly corrupted zip and therefore fails to match the one I'm expecting, which I assume is stored in the catalogue somewhere?

6) Is there any way to see progress on the upload stage, i.e. how much has been uploaded so far? For me this is the slowest part of the journey.

Sorry that this is really basic stuff.

Thanks,
Best Regards, Gary

Adrian (Softland)
Posts: 1914
Joined: Wed Dec 16, 2009 12:46 pm

Re: Understanding Cloud Backup

Post by Adrian (Softland) »

Hi,

The backup steps you indicate are correct.

1. In Backup4all you can split the zip files into smaller parts. Each split is created in the temp folder and then uploaded; the temp file is deleted and then the next zip split is created. You just need to set the split limit in "Backup Properties->Compression" and also select the "Create independent splits" option on the same page.

2. The zip files cannot be created directly on OneDrive, but you have the split solution described above.

3. You are correct: incremental / differential backups are smaller and don't use as much space.

4. When testing the zip in temp, Backup4all checks that each file exists in the zip and has the correct CRC (see the sketch after this list).

5. After uploading the zip to the cloud, we check the CRC for that file on the cloud.

6. I'm afraid upload progress to the cloud is not available.
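Purely as an illustration of the kind of per-file check described in point 4 (this is not our actual code, just the general idea, shown with Python's standard zipfile module; the paths and file names are placeholders):

Code:
# Sketch only: verify that every expected file is present in the archive
# and that each member's stored CRC-32 still matches its data.
import zipfile

def test_archive(zip_path, expected_files):
    with zipfile.ZipFile(zip_path) as zf:
        # 1) every expected file must be present in the archive
        missing = set(expected_files) - set(zf.namelist())
        if missing:
            return False, "missing from zip: " + ", ".join(sorted(missing))
        # 2) read every member and verify its stored CRC-32
        bad = zf.testzip()  # returns the first corrupt member, or None
        if bad is not None:
            return False, "CRC mismatch: " + bad
    return True, "all files present, all CRCs OK"

# placeholder paths, for illustration only
ok, detail = test_archive(r"C:\Temp\backup1.zip", ["docs/report.docx"])
print(ok, detail)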

If you have any other questions, please contact us.

GaryML
Posts: 10
Joined: Thu Apr 30, 2020 8:34 pm

Re: Understanding Cloud Backup

Post by GaryML »

Hi Adrian,

Thank you for your quick response and clarifications.

That sounds really good!

I can use the file splitting to give a rough indication of progress (which is all anyone probably needs) as well as to manage the temporary storage requirements. Is there any particular split size you would recommend? Most of my files are 20-100MB or 50-100KB, with some around 4GB and a few bigger. I was thinking something in the 20-60GB region.

Pardon my ignorance, but how do you calculate the CRC (I assume the standard CRC32 algorithm?) on the cloud without downloading the file? Or is this a standard API call to the cloud provider, so that they calculate it locally at their end and pass the result back?

Do all incrementals and differentials create new files, so that if something goes wrong with those files the damage is limited to that particular version? (For an incremental, obviously the file changes made at that point wouldn't be captured again until the next full backup, another change to the same file, or a repair.)

Thanks,
Best Regards, Gary

Adrian (Softland)
Posts: 1914
Joined: Wed Dec 16, 2009 12:46 pm

Re: Understanding Cloud Backup

Post by Adrian (Softland) »

Hi,

We suggest you split the files at 4 GB.

For the CRC, it's not the standard CRC32. When uploading the file to OneDrive we receive a checksum for the file content. Afterwards we check whether the checksum calculated for the file is still the same.
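For illustration only (this is not our implementation): OneDrive exposes file hashes as part of the file's metadata through the Microsoft Graph API, so a checksum can be read without downloading the file content. A rough sketch, where the item ID and access token are placeholders:

Code:
# Sketch only: read the hashes that OneDrive reports for a stored file
# via Microsoft Graph metadata, without downloading the file itself.
# ITEM_ID and ACCESS_TOKEN are placeholders obtained elsewhere.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
ITEM_ID = "ITEM-ID-PLACEHOLDER"
ACCESS_TOKEN = "ACCESS-TOKEN-PLACEHOLDER"

resp = requests.get(
    f"{GRAPH}/me/drive/items/{ITEM_ID}?$select=name,size,file",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
item = resp.json()

# The 'hashes' facet carries whatever hash types the service provides
# (for example sha1Hash / crc32Hash on OneDrive Personal).
print(item["name"], item["size"])
print(item.get("file", {}).get("hashes", {}))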

Yes, incremental and differential backup executions create new files.

GaryML
Posts: 10
Joined: Thu Apr 30, 2020 8:34 pm

Re: Understanding Cloud Backup

Post by GaryML »

Hi Adrian,

Thank you for the confirmation of the recommended size and the way incrementals and differentials work.

I might well be oversimplifying a complex process here. As far as I'm aware (please correct me if I'm wrong), the way CRC works for file validation is that it reads all the bits in a file and uses every single one of them to create a short code. Any particular short code could obviously be produced by many different files; however, the short code is constructed so that similar files differing only in a small way won't end up with (or are incredibly unlikely to end up with) the same short code. The short code is also long enough that any two random files that are different are incredibly unlikely to share it. Therefore, if the same short code is produced for two files that are meant to be the same, we can have high confidence that the files are in fact identical, with no differences, corruption or otherwise.
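To make what I mean concrete, here's a toy example using Python's zlib.crc32 (purely illustrative, and nothing to do with how Backup4all actually computes anything):

Code:
# Toy illustration: a single-character change produces a completely
# different CRC-32 short code.
import zlib

original  = b"The quick brown fox jumps over the lazy dog"
corrupted = b"The quick brown fox jumps over the lazy cog"  # one byte changed

print(hex(zlib.crc32(original)))    # 0x414fa339
print(hex(zlib.crc32(corrupted)))   # a completely different value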

So I'm happy with the idea that the backup version stored in temp is correct, with no corruption. However, at this point it is sent over the internet, bounced around a fair few servers, and ends up (in my case) in a cloud system run by Microsoft.

For me the check of the upload is important: a CRC check should give me a good degree of confidence that nothing went wrong in the transmission from the local temporary copy to the permanent cloud copy.

To do the CRC check, either the whole file needs to be downloaded/read locally on my machine, analysing every bit to create the CRC short code and compare it against the expected value; or the analysis is done by Microsoft on their servers, checking every bit and sending the result back to my PC; or the file is sent from Microsoft to some third-party server for the check to be done and the result sent to my PC.

How is the CRC code on the remote server calculated (e.g. is there a script that can be triggered to run remotely on the cloud servers, or an API that can be called), or is it not calculated on the server at all and instead copied there from my local PC for later comparison?

Are the checksums from OneDrive calculated from the temp file about to be uploaded and then, once uploaded, recalculated by OneDrive on request on the cloud server? And does the method for creating the checksum vary between cloud providers and services, and differ from the CRC method used for Backup4all's initial checks?

Apologies if this level of scrutiny sounds paranoid. I've used other dedicated backup software where the only way to get confidence that a backup was valid was to download it and compare it to the original static data. The number of invalid backups across my local network was zero, while across the internet the failure rate was about 20%.

Thanks,
Best Regards, Gary

Adrian (Softland)
Posts: 1914
Joined: Wed Dec 16, 2009 12:46 pm

Re: Understanding Cloud Backup

Post by Adrian (Softland) »

Hi,

We use SSL when uploading the files, which ensures data integrity during the upload. When the file has finished uploading, we receive a checksum from OneDrive for the uploaded file. We store that checksum, and during Test we compare it to the checksum currently reported by OneDrive for that file. This ensures that the file has not changed while in storage.
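Purely as an illustration of the Test idea (not our actual implementation): the comparison amounts to something like the following, where the hash names and values are illustrative and stand in for whatever the cloud metadata query returns.

Code:
# Sketch only: compare the checksum recorded at upload time against the
# checksum the cloud currently reports for the same file, with no download.

def test_uploaded_file(stored_checksum: str, cloud_hashes: dict) -> bool:
    # cloud_hashes is the 'hashes' facet returned by a metadata query,
    # e.g. {"sha1Hash": "...", "crc32Hash": "..."} (illustrative names)
    current = cloud_hashes.get("sha1Hash", "")
    return bool(current) and current.lower() == stored_checksum.lower()

# The checksum saved at upload time still matches what the cloud reports:
print(test_uploaded_file("AB12CD34EF", {"sha1Hash": "ab12cd34ef"}))  # True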

GaryML
Posts: 10
Joined: Thu Apr 30, 2020 8:34 pm

Re: Understanding Cloud Backup

Post by GaryML »

Thank you Adrian!

This is exactly what I'm looking for in a cloud backup!

Best Regards, Gary
