Overview
Backup and archiving serve a variety of purposes and are in common use throughout companies. Traditional backup and archiving operations target tape libraries, like the Sun StorageTek SL500 Modular Library System. Increasingly, companies are using local disk for a perceived decrease in access time and latency. With lower access time and latency, archives can remain highly accessible and restore operations can meet increasingly aggressive service level agreements.
Smaller companies that want the advantages of local disk solutions (performance and online browsing features) combined with a level of off-site backup and archiving for disaster recovery purposes are finding that storage utilities provide highly competitive offerings. Companies such as SmugMug and elephantdrive are using storage utilities to reduce their hardware costs and offload the system administration costs to the storage utility. The storage utility, in turn, can create better economies of scale: it can use hierarchical storage management, build large systems with a lower cost per gigabyte, and purchase or build tools for managing large amounts of data, amortizing all of these costs across many customers. These savings are passed back to the smaller companies. Further, a company well versed in data management can provide a safer and more reliable solution than a small company whose core competency is application software or managing store inventory.
Requirements
- Up to 12 Terabytes of accessible storage with offsite backup (assume 1TB per month utilized)
- Provide a lower monthly cost for backups than a locally hosted solution amortized over 1 year, including:
  - Hardware expense
  - Power expense
  - Remote hosting facility expense
  - Bandwidth expense
  - System administration costs
Assumptions
- 1TB backed up per month
- Two 500GB restores occur in separate months throughout the year
Non-Requirements
- Access time: Assume "normal" Network Latency is acceptable
- Access rate: Assume 100 Mbps Network Connection is acceptable
- Reliability: Assume colocated solution has same availability as storage utility
Out of Scope
- Complex local storage and network solutions (assumption is a single X4500 being backed up over the network)
- Tape backup (the price of a cohosting solution does not go down for hosting tape)
Architecture Overview
We will assume a "simple" point to point file-based backup infrastructure is in place, as shown in the illustration below.

Implementation Guide - ZFS to Amazon S3
Overview
ZFS was chosen simply as a practical starting point. Other file systems should be usable but could provide different performance characteristics and certainly different snapshot and restore abilities. If you find a substantial difference in a particular file system, I would suggest branching this entire implementation section into an alternative implementation.
Amazon S3 is an interesting choice for a storage utility. I chose it for the following reasons:
- High usage in the industry
- Stable and predictable pricing model, lowest in the industry so this represents a good "best case"
- Trusted source of storage
The costs are as follows for hosting at Amazon S3:
- Consume 1TB per month aggregating to 12TB in the last month
- Hardware expense: $0
- Power expense: $0 (included in remote hosting cost)
- Storage Utility expense: $11,700 (1TB + 2TB + ... + 12TB, $0.15 / GB-Month, $150 + $300 + ... + $1,800)
- Bandwidth expense: $1,380 (Inbound: 12 * 1000GB * $0.10/GB = $1,200, Outbound: 2 * 500GB * $0.18/GB = $180)
- System administration costs: $0
- TCO for first year: $13,080
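The arithmetic behind these figures can be checked with a short sketch. The class and method names are illustrative only, the rates are the ones quoted in the list above, and amounts are kept in whole cents to avoid floating-point rounding.

```java
// Sketch: reproduces the first-year Amazon S3 estimate from the list above.
// Names are illustrative; rates are the ones quoted in this document.
final class S3FirstYearCost {
    static final long GB_PER_TB = 1000;                 // the estimate treats 1TB as 1000GB
    static final long STORAGE_CENTS_PER_GB_MONTH = 15;  // $0.15 / GB-Month
    static final long INBOUND_CENTS_PER_GB = 10;        // $0.10 / GB uploaded
    static final long OUTBOUND_CENTS_PER_GB = 18;       // $0.18 / GB downloaded

    // Storage grows by 1TB each month, so month m is billed for m TB.
    static long storageCents(int months) {
        long total = 0;
        for (int m = 1; m <= months; m++) {
            total += m * GB_PER_TB * STORAGE_CENTS_PER_GB_MONTH;
        }
        return total;
    }

    // 1TB uploaded every month, plus `restores` downloads of `restoreGb` each.
    static long bandwidthCents(int months, int restores, long restoreGb) {
        long inbound = months * GB_PER_TB * INBOUND_CENTS_PER_GB;
        long outbound = restores * restoreGb * OUTBOUND_CENTS_PER_GB;
        return inbound + outbound;
    }

    static long totalCents(int months, int restores, long restoreGb) {
        return storageCents(months) + bandwidthCents(months, restores, restoreGb);
    }

    public static void main(String[] args) {
        System.out.println("Storage dollars:   " + storageCents(12) / 100);
        System.out.println("Bandwidth dollars: " + bandwidthCents(12, 2, 500) / 100);
        System.out.println("Total dollars:     " + totalCents(12, 2, 500) / 100);
    }
}
```

Changing the month count, restore count, or rates is a quick way to rerun the estimate with your own numbers.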
One of the original requirements of this particular pattern was to lower the cost of our storage backups. SmugMug did a simple analysis of their savings from using the Amazon S3 storage utility vs. purchasing their own disk; they netted about $340,000 in savings in 7 months.
It is _super_ important that you spend time doing a full cost analysis based on your own internal information. With very little imagination, you can see that Amazon S3 charges a recurring rate per gigabyte stored and that, over the course of several years, the cumulative Amazon S3 cost for storing 12TB of data will cross the cost of owning a server with 12TB of storage.
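As a rough illustration of that crossover, the sketch below finds the first month in which cumulative Amazon S3 storage charges exceed the cumulative cost of an owned server. It assumes a steady 12TB stored at the $0.15 / GB-Month rate quoted above ($1,800 per month) and ignores bandwidth. The purchase price and monthly running cost passed in `main` are purely hypothetical placeholders, not figures from this analysis, so substitute your own.

```java
// Sketch: when does cumulative S3 storage cost cross the cost of owning hardware?
// Assumes a constant 12TB stored; the owned-server figures are hypothetical inputs.
final class S3BreakEven {
    // 12,000GB * $0.15 / GB-Month = $1,800 per month.
    static final long S3_MONTHLY_DOLLARS = 12_000L * 15 / 100;

    // Returns the first whole month in which cumulative S3 charges strictly exceed
    // upfront + monthly owned costs, or -1 if owning is never cheaper.
    static long breakEvenMonth(long upfrontDollars, long ownedMonthlyDollars) {
        long monthlyGap = S3_MONTHLY_DOLLARS - ownedMonthlyDollars;
        if (monthlyGap <= 0) {
            return -1; // running the server costs at least as much as S3 every month
        }
        return upfrontDollars / monthlyGap + 1;
    }

    public static void main(String[] args) {
        // Hypothetical example: $30,000 purchase, $800/month for power, hosting and admin.
        System.out.println("S3 becomes more expensive in month " + breakEvenMonth(30_000, 800));
    }
}
```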
Assumptions
- You have a ZFS installation already built. If you don't, a good place to start is the Solaris ZFS Administration Guide in the OpenSolaris ZFS Community.
- You have an Amazon S3 account with proper keys ready to be used. Go to the Amazon S3 home page for more information about this.
Bill of Materials
Implementation Steps
This version of the implementation takes the "simple is elegant" route. We will use the built-in capabilities of ZFS to take snapshots, save them, and restore them, and couple this with simple pipes to move data between our file system and Amazon S3.
We can also restore our file system from Amazon S3 back to ZFS using the reverse flow.
All of this with built-in utilities (cheap, trusted, simple primitives for you to build on), just like Solaris Admins like it.
Basic Setup
The following was set up on our system:
- One storage pool: zpool create media c0t1d0
- A file system within the media storage pool: zfs create media/mypictures
- Mountpoint change to /export: zfs set mountpoint=/export/media/mypictures media/mypictures
- Shared over NFS: zfs set sharenfs=on media/mypictures
- Compression turned on: zfs set compression=on media/mypictures
A set of pictures was then copied into the /export/media/mypictures directory (enough to create an acceptable snapshot).
Snapshot and Store
Our first process will consist of creating a snapshot and sending it to Amazon's S3 for backup. There are some assumptions that must be made at this point:
- You have an existing file system or volume that you would like to back up (as created above)
- The snapshot size will fit within the constraints of Amazon's S3 license or your own service contract with Amazon S3
The steps we will take are:
- Create a snapshot of the file system
- Compress the snapshot
- Send it to Amazon S3 with appropriate metadata
To create a snapshot, use the "zfs snapshot" command. I will snapshot the entire /export/media/mypictures directory and name the snapshot "20070607" with the command:
zfs snapshot media/mypictures@20070607
The snapshot should initially take up no additional space in my file system. As files change, the space used by the snapshot grows as well, since the original data must be kept alongside the changes. Still, saving the snapshot to a file will require the full amount of space, since I am creating a "file" full of the snapshot of the data (which happens to be all of the original data). Turning the snapshot itself into a stream of data is relatively easy. Simply "send" the snapshot to a file:
zfs send media/mypictures@20070607 > ~/backups/20070607
We can also insert compression into the pipe, so the actual command I used is:
zfs send media/mypictures@20070607 | gzip > ~/backups/20070607.gz
Mileage varies with compression on snapshots.
And, finally, we need to uuencode the file to prepare it to be sent over the Internet. The uuencode process expands the file size by about 35%, so it's highly likely that any gains we made through compression are taken back out by uuencoding. Here is the uuencoding process from the command line:
uuencode 20070607.gz 20070607.gz > 20070607.gz.uu
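To estimate how large the uuencoded file will be before creating it, the sketch below applies the traditional uuencode layout: each 45 input bytes become a 62-byte line (one length character, 60 encoded characters, and a newline). That works out to roughly 38% growth, in the same range as the figure mentioned above. The class name is illustrative, and the "644" file mode in the header and the exact trailer are assumptions that can vary by platform.

```java
// Sketch: estimate the size of a uuencoded file from the input size.
// Assumes the traditional layout: 45 input bytes per line, a "begin 644 <name>" header,
// and a terminating line followed by "end". Mode and trailer may differ by platform.
final class UuencodeSize {
    static final long BYTES_PER_LINE = 45;  // input bytes carried by a full line
    static final long FULL_LINE_SIZE = 62;  // 1 length char + 60 encoded chars + newline

    static long uuencodedSize(long inputBytes, String remoteName) {
        long size = ("begin 644 " + remoteName + "\n").length();
        size += (inputBytes / BYTES_PER_LINE) * FULL_LINE_SIZE;
        long rest = inputBytes % BYTES_PER_LINE;
        if (rest > 0) {
            // Every 3 input bytes (padded) become 4 characters, plus length char and newline.
            size += 1 + ((rest + 2) / 3) * 4 + 1;
        }
        size += 2 + 4; // terminating line ("`\n") and "end\n"
        return size;
    }

    public static void main(String[] args) {
        long input = 45_000_000L; // hypothetical 45MB gzipped snapshot
        long output = uuencodedSize(input, "20070607.gz");
        System.out.println("Encoded size in bytes: " + output);
        System.out.println("Growth in percent:     " + (100.0 * (output - input) / input));
    }
}
```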
Finally, we can send it to Amazon S3.
I will assume that a "bucket" is already created and that we are merely sending the final, uuencoded snapshot to the Amazon S3 bucket. To be honest, I tried using Curl, Perl, and a variety of other things and I couldn't quickly get the right libraries to create the signatures and I just hate scrounging around the Internet for the right this or that and changing compilation flags and recompiling and... So, I went with the Java - REST approach.
Use the Amazon S3 Library for REST in Java. It has classes for all of your favorite Amazon S3 operations and was quite easy to use. I created the following "simple" program that takes a key and the location of my uuencoded snapshot for upload (it is based on the samples from Amazon S3):
// Belongs in a class that defines awsAccessKeyId, awsSecretAccessKey, and bucketName.
// Needs java.io.*, java.util.*, and the Amazon S3 REST library classes on the classpath.
public static void main(String args[]) throws Exception {
    if (awsAccessKeyId.startsWith("<INSERT")) {
        System.err.println("Please examine S3Driver.java and update it with your credentials");
        System.exit(-1);
    }
    if (args.length < 2) {
        System.err.println("Send snapshot key and location with program: SendSnapshot key path");
        System.exit(-1);
    }
    AWSAuthConnection conn =
        new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey);
    System.out.println("----- putting object -----");
    // Placeholder so the variable is definitely assigned; replaced with the file contents below
    S3Object object = new S3Object("this is a test".getBytes(), null);
    try {
        File file = new File(args[1]);
        InputStream is = new FileInputStream(file);
        long length = file.length();
        // The whole file is read into one byte array, so it must fit in an int-sized array
        if (length > Integer.MAX_VALUE) {
            System.err.println("File too large: " + args[1]);
            System.exit(-1);
        }
        byte[] bytes = new byte[(int) length];
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length
                && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }
        // Ensure all the bytes have been read in
        if (offset < bytes.length) {
            throw new IOException("Could not completely read file " + args[1]);
        }
        // Close the input stream and wrap the bytes for upload
        is.close();
        object = new S3Object(bytes, null);
    } catch (IOException ioe) {
        System.err.println("Error reading file: " + args[1]);
        System.exit(-1);
    }
    Map headers = new TreeMap();
    headers.put("Content-Type", Arrays.asList(new String[] { "text/plain" }));
    System.out.println(
        conn.put(bucketName, args[0], object, headers).connection.getResponseMessage()
    );
    System.out.println("----- listing bucket -----");
    System.out.println(conn.listBucket(bucketName, null, null, null, null).entries);
}
Send up your snapshot and you are GOOD TO GO!
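The uploader above reads the whole file into memory and rejects anything larger than Integer.MAX_VALUE bytes, and Amazon S3 has its own per-object size limit (see the issues list at the end of this guide). One possible workaround, sketched here with standard Java I/O only, is to split the uuencoded snapshot into numbered parts and upload each part under its own key. The class name, the ".partNNNN" naming scheme, and the part size are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: split a large snapshot file into parts no bigger than maxPartBytes.
// Part names (<source>.part0000, .part0001, ...) are an illustrative convention.
final class SnapshotSplitter {
    static List<Path> split(Path source, long maxPartBytes) throws IOException {
        if (maxPartBytes <= 0) {
            throw new IllegalArgumentException("maxPartBytes must be positive");
        }
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(source)) {
            OutputStream out = null;
            long writtenToPart = 0;
            try {
                int read;
                while ((read = in.read(buffer)) != -1) {
                    int offset = 0;
                    while (offset < read) {
                        // Start a new part when none is open or the current one is full
                        if (out == null || writtenToPart == maxPartBytes) {
                            if (out != null) {
                                out.close();
                            }
                            Path part = Paths.get(String.format("%s.part%04d", source, parts.size()));
                            parts.add(part);
                            out = Files.newOutputStream(part);
                            writtenToPart = 0;
                        }
                        int n = (int) Math.min(read - offset, maxPartBytes - writtenToPart);
                        out.write(buffer, offset, n);
                        offset += n;
                        writtenToPart += n;
                    }
                }
            } finally {
                if (out != null) {
                    out.close();
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SnapshotSplitter.java <file> <maxPartBytes>
        for (Path part : split(Paths.get(args[0]), Long.parseLong(args[1]))) {
            System.out.println(part);
        }
    }
}
```

Because the part names sort in order, the pieces can be concatenated back into the original file after download and before decoding.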
Retrieve and Restore
The process of retrieving and restoring the snapshot when you lose your data or want to return to a previous point in your history is relatively simple as well: reverse the process above. Here is the Java code (using the Amazon S3 Java REST libraries again):
public static void main(String args[]) throws Exception {
    if (awsAccessKeyId.startsWith("<INSERT")) {
        System.err.println("Please examine S3Driver.java and update it with your credentials");
        System.exit(-1);
    }
    if (args.length < 2) {
        System.err.println("Get snapshot and write data needs 2 parameters: GetSnapshot key path");
        System.exit(-1);
    }
    AWSAuthConnection conn =
        new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey);
    System.out.println("----- getting object -----");
    byte[] bytes = conn.get(bucketName, args[0], null).object.data;
    try {
        FileOutputStream fos = new FileOutputStream(args[1]);
        fos.write(bytes);
        fos.flush();
        fos.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Once you have your uuencoded, gzipped snapshot, decode it and decompress the snapshot.
# uudecode 20070607.gz.uu
# gunzip 20070607.gz
Now you have to decide what to do with your snapshot. I renamed the existing mypictures file system and restored my old snapshot into its place to give me a complete time travel back to my snapshot. Here are the commands:
# zfs rename media/mypictures media/mypictures.old
# zfs receive media/mypictures < 20070607
That's it! Going to /export/media/mypictures will bring me to the pictures I snapshotted on June 7, 2007!
Issues with implementation
Feel free to tackle any of these issues and feed them back into the implementation, or add to the list of issues if you perceive there to be more. Occasionally, an issue may spawn a separate BluePrint rather than being tackled within the implementation section of this BluePrint.
- Amazon S3 size limitations - In the "Terms of Use", Amazon S3 specifies the following: "You may not, however, store "objects" (as described in the user documentation) that contain more than 5Gb of data, or own more than 100 "buckets" (as described in the user documentation) at any one time." As a result, one would want to slice the snapshot up appropriately so as to conform to the Amazon S3 limitations, or possibly work with Amazon S3. The limitation IS completely reasonable, though, due to lengths and limitations in the HTTPS protocol itself.
- Encryption - Data should be encrypted appropriately before being sent to a third-party storage site
- Cron Job - Timely snapshots
- Non-Java - It would be nice to do the whole process from scripts, but I got hung up on the key generation so I hopped to my native tongue (Java).
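For anyone tackling the non-Java item, the signature step that the Java library hides is small. The sketch below follows the S3 REST authentication scheme as I understand it from that era (now called Signature Version 2): an HMAC-SHA1 of a "string to sign", Base64-encoded and sent in an Authorization header of the form "AWS accessKeyId:signature". It assumes a request with no x-amz- headers, and the credentials, bucket, and key in `main` are placeholders. Check the current Amazon documentation before relying on it, since newer regions require a different scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: legacy S3 REST request signing (HMAC-SHA1), assuming no x-amz-* headers.
// Credentials, bucket, and key below are placeholders, not real values.
final class S3LegacySigner {
    // Builds the string to sign; contentMd5 and contentType may be empty strings.
    static String stringToSign(String verb, String contentMd5, String contentType,
                               String date, String resource) {
        return verb + "\n" + contentMd5 + "\n" + contentType + "\n" + date + "\n" + resource;
    }

    // Base64(HMAC-SHA1(secretKey, stringToSign))
    static String sign(String secretKey, String stringToSign) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        String toSign = stringToSign("PUT", "", "text/plain",
                "Thu, 07 Jun 2007 12:00:00 GMT", "/my-backup-bucket/20070607.gz.uu");
        String signature = sign("EXAMPLE-SECRET-KEY", toSign);
        System.out.println("Authorization: AWS EXAMPLE-ACCESS-KEY-ID:" + signature);
    }
}
```

The Date value in the string to sign has to match the Date header actually sent with the request, which is the detail a script port will need to get right.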
Comments (2)
Aug 13, 2007
enomaly says:
Paul, this is a great article.
Another way you may want to consider mounting S3 is by using the ElasticDrive application. ElasticDrive is a distributed network storage application based upon the Amazon S3 (Simple Storage Service). ElasticDrive provides a permanent and infinitely large network hard drive which pushes storage blocks to and from Amazon's S3 service as if they were being written to a local block device (hard drive or tape). ElasticDrive is intended to provide seamless backup, RAID target devices, or backing file stores for higher level distributed filesystems.
Please see http://www.elasticdrive.com
Aug 15, 2007
paul.monday says:
Thanks, that's a good pointer, I will definitely look into it. One thing to keep in mind, since this is a Wiki you can always add a section or pattern for this. Since it is similar to this pattern, I would suggest adding a second "Implementation Guide" by using an h1 header. Perhaps "Implementation Guide - xxx to ElasticDrive". You could also add a "References" section to the bottom if you have documentation or stuff at your site.
I don't want this to be an "advertising" venue, but my goal is to create a forum for solutions that people can use.