We get a lot of questions about BAR (Backup, Archive, and Restore). Yes, we here at eBay went our own way a number of years ago, and it was one of the more successful things we’ve done – I haven’t thought much about it in quite some time. I’ll explore a bit about what we’ve done, then wade through some of the architectural definitions and issues.
eBay implemented a GRID system of deep local storage platforms, and directly connected them to our Teradata nodes with GigE – we refer to these as “NearLine” nodes. We also wrote two substantial pieces of software: one to manage the metadata about what to back up when, and a second to coordinate multiple ARC jobs.
Tape was never an option. As a company, the culture was “always online”, so the solution for database backups was replication. There simply was not a culture which could support tape, as we found out when we discovered that no one had added backups for the additional clusters after not one – but two platform upgrades. We essentially went without backups for six months.
The system geometry we use is two Teradata nodes for each NearLine node. Each NearLine node has nearly 30TB of usable storage, connected by a single 1GigE link. In our current implementation, we can sustain data transfer rates of 100MB/sec per link, so if we need to go 2x faster, we can add an additional link per node and get to 200MB/sec. Essentially this makes for peak transfer rates in our system of ~7.8 GB/sec, or about 28TB an hour.
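The arithmetic above is easy to sketch. A minimal back-of-envelope calculation, using only the per-link rate stated in the text (100 MB/sec sustained over 1GigE); the node count is a parameter, since the post doesn’t state it directly:

```python
# Per-link sustained transfer rate from the post: 100 MB/sec over 1GigE.
LINK_MB_PER_SEC = 100

def system_throughput(nearline_nodes, links_per_node=1):
    """Return the aggregate transfer rate as (GB/sec, TB/hour)
    across all NearLine links in the system."""
    mb_per_sec = nearline_nodes * links_per_node * LINK_MB_PER_SEC
    gb_per_sec = mb_per_sec / 1000
    tb_per_hour = gb_per_sec * 3600 / 1000
    return gb_per_sec, tb_per_hour

# The quoted peak of ~7.8 GB/sec implies roughly 78 NearLine nodes,
# which works out to roughly 28 TB/hour at one link per node.
```

Doubling `links_per_node` to 2 doubles the aggregate, matching the 2x upgrade path described above.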
Setting up the hardware environment is actually the easy part. The hard part is managing what needs to be backed up on what schedule, allocating the work units to parallel and balanced work streams, managing and documenting what the state of that infrastructure is at any given time.
Our backup system starts with metadata that tracks every table on the database, with knowledge about its synchronization state on both of our primary systems, how to select incremental rows for each specific table, the last time that table was backed up, and where the table is to be found in the file system image. This has some broad implications about what information is tracked, which is out of scope for this discussion.
These tables are used to generate a “backup instance”, which is comprised of the set of work to be done for that backup. The software we wrote takes into account the size of the tables, the number of nodes on the system, the number of concurrent ARC jobs which can be executed in parallel, and then generates a balanced set of ARC Scripts to be executed in phases on all the nodes.
The system finds the optimal point to split the work into small tables and large tables: the large tables are taken with Cluster Archives, while the small tables go through a knapsack algorithm based on table size to distribute their work evenly across all nodes.
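The small-table balancing can be sketched with a simple greedy heuristic: sort tables largest-first and always assign the next table to the least-loaded node. This is a stand-in for the knapsack-style balancing described above, not eBay’s actual algorithm:

```python
import heapq

def balance_tables(tables, n_nodes):
    """Greedy longest-processing-time balancing: assign each (name, size)
    table to the currently least-loaded node, largest tables first.
    An illustrative stand-in for the knapsack algorithm in the post."""
    heap = [(0, node) for node in range(n_nodes)]  # (load, node_id) pairs
    heapq.heapify(heap)
    assignments = {node: [] for node in range(n_nodes)}
    for name, size in sorted(tables, key=lambda t: -t[1]):
        load, node = heapq.heappop(heap)          # least-loaded node so far
        assignments[node].append(name)
        heapq.heappush(heap, (load + size, node)) # push back with new load
    return assignments
```

Each node’s assignment list then becomes one work stream of ARC jobs, so the phases finish at roughly the same time.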
It then compiles the work sets into Arcmain scripts, creates a new environment across all the nodes for the sets to execute, distributes the compiled scripts, and executes them in parallel. If there is an error, each instance is designed to be restartable from the last known state - very friendly to our operations group.
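The restartability idea can be sketched with a small checkpoint file: record each completed work unit as it finishes, and on restart skip anything already done. This is an illustrative simplification, not the actual instance machinery:

```python
import json
import os

def run_instance(instance_dir, work_units):
    """Run a list of (name, callable) work units, checkpointing completions
    to instance_dir/state.json so a failed instance can be restarted from
    its last known state. Illustrative sketch only."""
    state_path = os.path.join(instance_dir, "state.json")
    done = set()
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = set(json.load(f))          # resume from last known state
    for name, fn in work_units:
        if name in done:
            continue                          # already completed in a prior run
        fn()                                  # e.g. launch one ARC script
        done.add(name)
        with open(state_path, "w") as f:
            json.dump(sorted(done), f)        # checkpoint after each unit
```

Because each “instance” keeps its state in its own directory, several overlapping instances can run and restart independently.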
We originally intended to use the ability of our Dual Active sync infrastructure to generate incremental data sets, but it turns out it is usually cheaper and faster to simply dump the entire contents of a large table…
That much more robust Dual Active sync infrastructure was the starting point for the concepts of isolating work groups into “instances”, since several sets of work may overlap concurrently. It also contained the environment setup, compiling infrastructure, restartability, compression, validation, catalog metadata and data/system state – all supersets of our BAR requirements.
The benefits to eBay of building this infrastructure were substantial. Our operations groups no longer had to fix tape problems off of the 8/5/5 shift schedule. Our backup success rate went from 50% per backup to 100% per quarter. Our restore failure rate went from 20% to 0%, because we were no longer using tape.
But most importantly, our backup schedule went from taking all day Saturday without users to several hours on weekday evenings – with our users blissfully unaware we’re even doing backups. We even dispensed with our offsite backup contracts, because our sync and backup systems were integrated – and those images are 1000 miles apart.