INTRODUCTION
S3 is one of the most resilient and highly available services in AWS (it is designed for eleven nines of durability): it is a trusted service for storing backups over the long term.
From a Windows Server machine, in AWS EC2 or elsewhere, the easiest and most reliable ways to back up SQL Server to S3 are:
- Via URL, natively supported since SQL Server 2022
- Via a Storage Gateway with File Gateway, which exposes an SMB share. The Storage Gateway is then responsible for the replication to S3 (see the example below)
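For the second option, here is a minimal sketch using the dbatools cmdlet Backup-DbaDatabase; the instance name and the UNC path of the File Gateway share are hypothetical values to adapt to your environment:
# Back up a database to the SMB share exposed by the File Gateway
# "SRV01\INST01" and "\\filegateway\sqlbackups\daily" are example values only
Backup-DbaDatabase -SqlInstance "SRV01\INST01" -Database "MyDatabase" `
    -Path "\\filegateway\sqlbackups\daily\SRV01_INST01\MyDatabase" `
    -Type Full -CompressBackup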
1- Concept
For data that must be available over a long period of time (finance, insurance, medical), less costly archiving can be achieved as follows:
- By optimizing the storage class of the objects
- By reducing the volume of backups with a retention strategy
a- Backup Retention
For a customer project, we chose to implement the following strategy:
- All backups with a 30-day retention (in the Standard storage class)
- One full backup per month with a 12-month retention (in the Glacier storage class)
- One full backup per year with a 15-year retention (in the Glacier Deep Archive storage class)
To do this, we decided to create, in a Bucket, the following 3 Prefixes:
- Daily: contains all backups (.bak, .trn…) from the last 30 days
- Monthly: contains the first FULL backup of each month for the last 12 months
- Yearly: contains the first FULL backup of each year for the last 15 years
b- S3 Lifecycle Policy
To implement this retention, we set up a lifecycle rule on each of the 3 prefixes.
These rules include the following parameters (a configuration sketch is shown after the list):
- A Transition Action: the object is transitioned to the target storage class of the prefix, based on the backup retention defined above. Of course, it is more efficient to create the objects directly in the right storage class
- An Expiration Action: the object is deleted after a defined period
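As an illustration, here is a minimal sketch of such a rule for the Monthly prefix, using the Write-S3LifecycleConfiguration cmdlet from the AWS Tools for PowerShell; the bucket name, prefix and periods are assumptions to adapt to your context:
# Rule for the Monthly prefix: transition to Glacier immediately, delete after ~12 months
# "my-sql-backups" and "monthly/" are example values only
$MonthlyRule = [Amazon.S3.Model.LifecycleRule] @{
    Id          = "monthly-backups"
    Status      = 'Enabled'
    Filter      = @{ LifecycleFilterPredicate = [Amazon.S3.Model.LifecyclePrefixPredicate] @{ Prefix = "monthly/" } }
    Transitions = @( @{ Days = 0; StorageClass = 'GLACIER' } )
    Expiration  = @{ Days = 365 }
}
# Warning: this call replaces the existing lifecycle configuration of the bucket, so include all your rules
Write-S3LifecycleConfiguration -BucketName "my-sql-backups" -Configuration @{ Rules = @($MonthlyRule) }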
c- Archiving backups
While lifecycle management exists natively in S3 with Lifecycle Policies, there is no native mechanism that moves S3 objects between prefixes.
Generating an extra backup once a month and/or once a year from SQL Server to S3 is not efficient, since these backups are already present in the Daily prefix.
To do this, you need to set up a custom script which will archive backups from Daily to Monthly and Yearly.
This script can be scheduled, for example, by AWS Lambda or another scheduler tool.
In our context, we chose SQL Server Agent to schedule the archiving, as a step following the backup step in a job, for the following reasons (a sketch of such a job step is shown after the list):
- Archiving is executed as soon as a new eligible backup has been generated
- The script is easy to access and modify, if necessary, from the database server. Of course, the script is server and instance agnostic… but you may want to temporarily modify it on one server as a workaround
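For illustration, here is a minimal sketch, using the dbatools cmdlet New-DbaAgentJobStep, of how the archiving step could be appended to an existing backup job; the instance, job name and script path are hypothetical:
# Append a PowerShell step that runs the archiving script after the backup step
# "DatabaseBackup - FULL" and C:\Scripts\Archive-SqlBackups.ps1 are example names
New-DbaAgentJobStep -SqlInstance "SRV01\INST01" -Job "DatabaseBackup - FULL" `
    -StepName "Archive backups to S3" `
    -Subsystem PowerShell `
    -Command "& 'C:\Scripts\Archive-SqlBackups.ps1'"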
2- Implementation
a- Accessing S3
First, an IAM role must be attached to the EC2 instance to authorize access to the S3 bucket where backups are stored. Keep in mind that this access is granted at the machine level, which is why I recommend restricting it to read/write only.
This role is accessible, on the OS, through a metadata token as follows (IMDSv2 requirement):
# Send a PUT request to IMDSv2 to get a session token valid for 6 hours (21600 seconds)
[string]$token = Invoke-RestMethod -Headers @{"X-aws-ec2-metadata-token-ttl-seconds" = "21600"} -Method PUT -Uri "http://169.254.169.254/latest/api/token"
Once the token is retrieved, you can securely access the name of the role assigned to the EC2 instance:
$roleName = Invoke-RestMethod -Uri "http://169.254.169.254/latest/meta-data/iam/security-credentials/" -Headers @{"X-aws-ec2-metadata-token"=$token}
From there, you can get temporary AWS security credentials for this role:
$credentials = Invoke-RestMethod -Uri "http://169.254.169.254/latest/meta-data/iam/security-credentials/$roleName" -Headers @{"X-aws-ec2-metadata-token"=$token}
Then, you can set the credentials using the AWS Tools for PowerShell module:
Set-AWSCredential -AccessKey $credentials.AccessKeyId -SecretKey $credentials.SecretAccessKey -SessionToken $credentials.Token
# Credentials set for the current PowerShell session
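At this point, a quick way to validate that the role and credentials work is to list a few objects in the backup bucket; the bucket name and prefix below are example values:
# List the most recent objects under the Daily prefix to validate access
Get-S3Object -BucketName "my-sql-backups" -KeyPrefix "daily/" |
    Sort-Object LastModified -Descending |
    Select-Object -First 5 Key, LastModified, StorageClass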
b- Retrieving backups to archive
Using the dbatools PowerShell module, we retrieve the first full backup taken since a reference date.
Here is an example when a Storage Gateway with File Gateway is used as the backup destination:
$Today = Get-Date;
$RefDate = New-Object DateTime( $Today.Year, $Today.Month, 1 ); # First backup of the month
# $RefDate = New-Object DateTime( $Today.Year, 1, 1 ); # First backup of the year
$BackupHistory = Get-DbaDbBackupHistory -SqlInstance $ConnectionString -Since $RefDate -Type Full -DeviceType Disk | Group-Object Database;
foreach ($Database in $BackupHistory) {
    # Keep the oldest full backup of each database since the reference date
    $Backup = ($Database.Group | Sort-Object End)[0];
    $Files = $Backup.Path;
    foreach ($File in $Files) {
        # Process your copy (see next section)
    }
}
c- Archiving backups
To process the copy, we must construct the source key and the destination key in S3, based on the information retrieved from msdb.
So, it mainly depends on the prefix structure of your bucket, which must remain the same over time.
Here is an example when the structure is RetentionPeriod/InstanceFolder/DatabaseName/FileName:
# Process copy for each file
$Leaf = Split-Path -Path $File -Leaf;
# Construct Source Key in S3
$SourceKey = "$($RefSource)/$($InstanceFolder)/$($Backup.Database)/$($Leaf)";
# Construct Destination Key in S3
$DestinationKey = "$($RefDestination)/$($InstanceFolder)/$($Backup.Database)/$($Leaf)";
# Copy the backup to the new Prefix
Copy-S3Object -BucketName $bucketName -Key $SourceKey -DestinationKey $DestinationKey -StorageClass $StorageClass -Region $Region;
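For completeness, here is how the variables used above could be defined; the bucket name, region, prefixes, instance folder and storage class are example values to adapt to your own bucket layout:
# Example values only: adapt to your bucket, prefixes and region
$bucketName     = "my-sql-backups";
$Region         = "eu-west-1";
$InstanceFolder = "SRV01_INST01";   # folder identifying the server/instance
$RefSource      = "daily";          # prefix where SQL Server writes its backups
$RefDestination = "monthly";        # or "yearly" for the yearly archive
$StorageClass   = "GLACIER";        # or "DEEP_ARCHIVE" for the yearly archive
# Note: without -DestinationBucket, Copy-S3Object copies the object within the source bucket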
CONCLUSION
In this blog, we demystified the concepts and steps required to implement a backup archiving strategy in AWS EC2 for SQL Server.
For other RDBMS running in EC2, the approach and code used for archiving would be very similar.
If you restrict access to the Storage Gateway with File Gateway to READ/WRITE permissions and you enable backup COMPRESSION in SQL Server, then I recommend reading this blog I wrote:
Random permission denied when backing up SQL Server databases to AWS Storage Gateway