
Azure Data Lake Storage is designed to enable operational and exploratory analytics through a hyper-scale repository. At the current date there are two different types of Data Lake Store available in Azure (Gen1 and Gen2). If a new instance is deployed, it is recommended to use Data Lake Store Gen2. The data is already replicated by the platform, so the backup concept has to consider the “human fault” component (accidental deletion or corruption) as well as the technical backup aspect.
Data Lake Gen1
Azure Data Lake Storage Gen1 is an enterprise-wide hyper-scale repository for big data analytic workloads. Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics as stated at: https://docs.microsoft.com/de-de/azure/data-lake-store/data-lake-store-overview
Data Lake Gen2
Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob storage. Data Lake Storage Gen2 is the result of converging the capabilities of Microsoft's two existing storage services, Azure Blob storage and Azure Data Lake Storage Gen1. Features from Azure Data Lake Storage Gen1, such as file system semantics, directory- and file-level security, and scale, are combined with the low-cost, tiered storage and high-availability/disaster-recovery capabilities of Azure Blob storage.
ADLS Gen2 is built on Azure Storage; therefore, storage capacity is virtually limitless. All high-availability features supported by Azure Storage (GRS, RA-GRS, etc.) are readily available for ADLS Gen2, which also means ADLS Gen2 takes advantage of all the security lockdown features offered by Azure Storage. Azure Storage supports RBAC-based resource access control, and so does ADLS Gen2. In addition, access control lists (ACLs) offer fine-grained access control to files and directories.
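A minimal sketch, assuming the Az.Storage PowerShell module and placeholder account, filesystem, path and object ID values, of how an ACL entry can be added to a Gen2 directory:

```powershell
# Minimal sketch (Az.Storage module): grant a user read/execute on an ADLS Gen2 directory.
# Account name, filesystem, path and object ID are placeholders.
$ctx = New-AzStorageContext -StorageAccountName "mydatalakegen2" -UseConnectedAccount

# Read the current ACL, append an entry for the user, and write the ACL back.
$dirAcl = (Get-AzDataLakeGen2Item -Context $ctx -FileSystem "raw" -Path "sales/2021").ACL
$dirAcl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user `
    -EntityId "00000000-0000-0000-0000-000000000000" -Permission "r-x" -InputObject $dirAcl
Update-AzDataLakeGen2Item -Context $ctx -FileSystem "raw" -Path "sales/2021" -Acl $dirAcl
```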
ADLS Gen2 is based on the hierarchical namespace feature of Azure Storage. Object stores like Azure Blob storage historically had virtual file paths but no physically implemented file system. This makes it harder to query, iterate over, or move files within a path, because each of these operations means iterating over all the blobs; at analytical workload scales, the latency of such operations becomes noticeable. The hierarchical namespace in ADLS Gen2 introduces directories and a real file system, which helps organize the data within directories and also makes it possible to grant or restrict access at directory or file level. See https://docs.microsoft.com/en-au/azure/storage/blobs/data-lake-storage-introduction for more details.
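A minimal sketch, assuming Azure PowerShell and placeholder resource group, account name and region, of creating a Gen2-capable storage account by enabling the hierarchical namespace:

```powershell
# Minimal sketch: create a storage account with the hierarchical namespace enabled (ADLS Gen2).
# Resource group, account name and region are placeholders.
New-AzStorageAccount -ResourceGroupName "rg-datalake" `
    -Name "mydatalakegen2" `
    -Location "westeurope" `
    -SkuName "Standard_GRS" `
    -Kind "StorageV2" `
    -EnableHierarchicalNamespace $true
```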
Microsoft Recommendations
Microsoft recommends copying the data into another Data Lake Store in another region with a dedicated frequency. This can be done via AdlCopy, Azure PowerShell or Data Factory.
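A minimal sketch of such a copy run from a PowerShell session, assuming the AdlCopy tool is installed locally, the session is signed in to both accounts, and the Gen1 account names and paths are placeholders:

```powershell
# Minimal sketch: replicate a folder from the primary Gen1 account to a backup account
# in another region using the AdlCopy command-line tool (account names are placeholders).
.\AdlCopy.exe /Source swebhdfs://primaryadls.azuredatalakestore.net/data/ `
              /Dest swebhdfs://backupadls.azuredatalakestore.net/data/
```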
To protect against data corruption or accidental deletion, it is recommended to use resource locks, to make use of the available Data Lake security features, and to restrict access via RBAC roles.
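A minimal sketch, assuming a Gen1 account and placeholder resource group, lock and account names, of placing a delete lock on the Data Lake Store via Azure PowerShell:

```powershell
# Minimal sketch: place a delete lock on the Data Lake Store account so it cannot be
# removed accidentally (resource group and account name are placeholders).
New-AzResourceLock -LockName "ProtectDataLake" `
    -LockLevel CanNotDelete `
    -ResourceGroupName "rg-datalake" `
    -ResourceName "primaryadls" `
    -ResourceType "Microsoft.DataLakeStore/accounts"
```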
Details are listed at https://docs.microsoft.com/en-au/azure/data-lake-store/data-lake-store-disaster-recovery-guidance and https://docs.microsoft.com/en-au/azure/data-lake-store/data-lake-store-security-overview
Possible Implementations
The following section describes different implementation methods for the backup concept. In the first two options Data Factory is used. AdlCopy is a command-line tool for copying files; PowerShell also requires additional scripting. The recommended way is to use Data Factory with dedicated triggers. In general it is easier to back up Data Lake Store Gen2, as it is based on Azure Storage.
Backup via Data Factory and second Data Lake store
This method uses Data Factory to copy the data to another Azure Data Lake Store for backup. In the Data Factory two pipelines are created: one which performs the backup action and one which is used to restore the backup data (see the sketch after the prerequisites below). Keep in mind that a second Data Lake may increase costs dramatically.

Prerequisites:
- Data Factory with Copy pipeline
- Data Lake Store for Backup
- Data Lake (Backup Source)
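A minimal sketch, assuming the two pipelines are already deployed in a factory named adf-datalake-backup (all names are placeholders), of running the backup and restore pipelines on demand via Azure PowerShell; in production the backup pipeline would normally run on a schedule trigger instead:

```powershell
# Run the backup pipeline (names are placeholders).
Invoke-AzDataFactoryV2Pipeline -ResourceGroupName "rg-backup" `
    -DataFactoryName "adf-datalake-backup" `
    -PipelineName "CopyToBackupDataLake"

# Restore is the same copy activity with source and sink reversed, packaged as its own pipeline.
Invoke-AzDataFactoryV2Pipeline -ResourceGroupName "rg-backup" `
    -DataFactoryName "adf-datalake-backup" `
    -PipelineName "RestoreFromBackupDataLake"
```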
Backup via Data Factory and File Storage
Here Data Factory is used to copy the data to a Storage Account (Azure Files) for backup. In the Data Factory two pipelines are created: one which performs the backup action and one which is used to restore the backup data. A scheduling sketch follows the prerequisites below.

Prerequisites:
- Data Factory with Move and Transform pipeline
- Storage Account (Backup Target) with HTTPS (secure transfer) enabled
- Access Key to the Storage Account
- Data Lake (Backup Source)
- Recovery Services Vault to back up the Storage Account for versioning
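A minimal sketch, assuming the backup pipeline exists in the factory and a local trigger.json defines a ScheduleTrigger that references it (all names are placeholders), of attaching and starting a daily trigger via Azure PowerShell:

```powershell
# Minimal sketch: register and start a daily schedule trigger for the backup pipeline.
# Factory, trigger and definition file names are placeholders.
Set-AzDataFactoryV2Trigger -ResourceGroupName "rg-backup" `
    -DataFactoryName "adf-datalake-backup" `
    -Name "DailyBackupTrigger" `
    -DefinitionFile ".\trigger.json"

Start-AzDataFactoryV2Trigger -ResourceGroupName "rg-backup" `
    -DataFactoryName "adf-datalake-backup" `
    -Name "DailyBackupTrigger"
```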
Backup via Azure Function and File Storage
It is possible to use an Azure Function to automatically save data from the Data Lake to a Storage Account based on a specific trigger; a minimal function sketch follows the prerequisites below.

Prerequisites:
- App Service plan for the Azure Function and the function code
- Storage Account (Backup Target)
- Access Key to the Storage Account
- Data Lake (Backup Source)
- Recovery Services Vault to back up the Storage Account for versioning
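A minimal sketch of such a function, assuming a timer-triggered PowerShell Azure Function with the Az modules available, a Gen2 source account, an Azure Files target, a BACKUP_STORAGE_KEY application setting, and target directories that already exist in the share; all names are placeholders:

```powershell
# run.ps1 of a timer-triggered PowerShell Azure Function (sketch; all names are placeholders).
param($Timer)

# Source: ADLS Gen2 account (signed-in identity); target: Azure Files share (key from app settings).
$sourceCtx = New-AzStorageContext -StorageAccountName "mydatalakegen2" -UseConnectedAccount
$destCtx   = New-AzStorageContext -StorageAccountName "mybackupfiles" `
                                  -StorageAccountKey $env:BACKUP_STORAGE_KEY

# Enumerate all files in the "raw" filesystem and start a server-side copy for each one.
# Assumes the corresponding directories already exist in the "backup" file share.
Get-AzDataLakeGen2ChildItem -Context $sourceCtx -FileSystem "raw" -Recurse |
    Where-Object { -not $_.IsDirectory } |
    ForEach-Object {
        Start-AzStorageFileCopy -SrcBlobName $_.Path -SrcContainerName "raw" -Context $sourceCtx `
            -DestShareName "backup" -DestFilePath $_.Path -DestContext $destCtx -Force
    }
```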
Note
In addition, it is possible to use an Azure Recovery Services Vault to back up the data of the Storage Account.
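A minimal sketch, assuming an existing Recovery Services vault and placeholder vault, storage account and file share names, of enabling Azure Files backup protection via Azure PowerShell:

```powershell
# Minimal sketch: protect the backup file share with Azure Backup (names are placeholders).
$vault = Get-AzRecoveryServicesVault -ResourceGroupName "rg-backup" -Name "rsv-backup"
Set-AzRecoveryServicesVaultContext -Vault $vault

# Create a default schedule/retention policy for Azure Files and enable protection for the share.
$sched  = Get-AzRecoveryServicesBackupSchedulePolicyObject -WorkloadType AzureFiles
$ret    = Get-AzRecoveryServicesBackupRetentionPolicyObject -WorkloadType AzureFiles
$policy = New-AzRecoveryServicesBackupProtectionPolicy -Name "FileSharePolicy" `
    -WorkloadType AzureFiles -SchedulePolicy $sched -RetentionPolicy $ret

Enable-AzRecoveryServicesBackupProtection -StorageAccountName "mybackupfiles" `
    -Name "backup" -Policy $policy
```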