Planning for Lagged Database Copies
If you use multiple database copies and Single Item Recovery, only the extremely rare
catastrophic store logical corruption case remains unaddressed. In the following scenarios
lagged database copies can be used to recover data:
►Recovering an item that was deleted within the last 14 days but is already outside the retention period
►Recovering to a point in time because of virus outbreak
You should deploy lagged copies to mitigate a specific risk; they are usually not
needed if you are also deploying a third-party backup solution.
When planning for lagged database copies, you should carefully consider the implications
this brings to your storage planning. Every lagged database needs sufficient disk space for
holding the database as well as the log files for the configured time.
For example, at a large organization, 14 days of logs for one database result in about 60,000
log files or 60 GB of data. The log storage design for the lagged database copy needs to
accommodate this. In addition to the space requirements, consider the following criteria
when deciding the replay lag time:
►How long does it take you to identify a logical database corruption? This should
include non-working days such as weekends. If you configure a replay lag time
of two days, you might not identify a problem in time when it occurs on
a weekend and you’re back on Monday.
►Consider the maximum time where a replay lag time makes sense. Fourteen days is the
maximum time possible, but do you really need the full 14 days? In most cases, 7 days
should be sufficient to identify a corruption and be able to recover using the lagged
database copy.
►Don’t underestimate how the space requirements grow with the replay lag time.
In the previous example you needed to reserve 60 GB for 14 days;
thus 7 days would save you 30 GB of storage per database that you would otherwise
need to have available.
►The duration of replaying the log files is also worth considering. You should plan a test
to replay all log files; this might take a considerable amount of time. Replaying 14 days
of logs might require several hours before the database is up to date.
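The space trade-off in the bullets above can be checked with a few lines of arithmetic. The following Python sketch scales the example figure (about 60 GB of logs in 14 days) to other lag times. It assumes that logs are generated at an even rate, which real databases rarely do, and it covers only the log files, not the database file itself or any free-space headroom. The function name and constants are illustrative and are not part of any Exchange tooling.

```python
# Planning sketch: log storage needed for a lagged copy at a given replay lag.
# The daily figure is derived from the example above (about 60 GB of logs
# in 14 days); substitute a measured value for your own databases.

EXAMPLE_LOG_GB = 60.0    # log volume from the example above
EXAMPLE_LAG_DAYS = 14    # lag time that volume corresponds to


def log_storage_gb(lag_days, daily_log_gb=EXAMPLE_LOG_GB / EXAMPLE_LAG_DAYS):
    """Return the log space (GB) one lagged copy accumulates over lag_days.

    Assumes an even daily log generation rate (a simplification).
    """
    if not 0 <= lag_days <= 14:
        raise ValueError("replay lag time must be between 0 and 14 days")
    return lag_days * daily_log_gb


if __name__ == "__main__":
    for days in (2, 7, 14):
        print(f"{days:>2} days of lag: about {log_storage_gb(days):.0f} GB of logs")
```

Multiply the result by the number of databases that have a lagged copy on the same volume to get a first estimate for the log volume as a whole.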
Besides the replay lag time and the storage design, you should carefully consider the
following points:
►How many lagged database copies do you need? Normally one lagged copy should
be sufficient, but maybe you want more copies because of your disaster-recovery
requirements.
If lagged database copies are a critical piece of your disaster-recovery
strategy, you will probably want to put them on a RAID system or have multiple
copies of them.
►Where should you store the lagged database copies—at a server at the same site or
offsite? This decision has a direct impact on the time you need to recover the lagged
database copy because you need to consider available bandwidth when storing them
offsite.
►On what Exchange server should you place the lagged database copies? You have the
option to place them on the same server where your active database copies are stored,
or you can use a single server just for all lagged database copies, such as a dedicated
public folder server.
►Lagged database copies should always be activation-disabled and have the highest
activation preference number available. This prevents automatic activation,
whether by mistake or as the result of a system failure.
Deploying Lagged Database Copies
You configure a lagged database copy using the EMS by following these steps:
1. Create a database copy on the target server where you want to store the lagged
database copy.
2. Configure the ReplayLagTime of the database copy. The following cmdlet configures
a lag time of 7 days for the copy of database DAG01-Mumbai-01 located on Mumbai-MB01:
Set-MailboxDatabaseCopy -Identity DAG01-Mumbai-01\Mumbai-MB01 -ReplayLagTime 7.0:0:0
3. Block automatic activation of this database copy to make sure it is not activated by
mistake. Use the following cmdlet to perform this task:
Suspend-MailboxDatabaseCopy <database>\<server> -ActivationOnly -Confirm:$false
4. If you use a dedicated Exchange server that hosts all lagged database copies, you can
also block automatic activation of databases at the server level by using the following
cmdlet:
Set-MailboxServer <mailbox server> -DatabaseCopyAutoActivationPolicy Blocked
When the lagged database copy is configured, you will see that the replay queue length of
the lagged database will increase.
To verify that none of the lagged database copies can be activated automatically, run the
following cmdlet and make sure that the ActivationSuspended property is set to True:
Get-MailboxDatabaseCopyStatus -Server <name> | ft Name, Act*
Recovering Data Using Lagged Database Copies
Using a lagged database copy to get to a specific point in time is rather difficult because you
have to know the exact time frame in which something occurred. In addition, no tools are
available to tell you which log file contains exactly what database change. Thus you have to
estimate which log files need to be replayed so that you get the database to the point in time
that you require. In practice, you make an informed guess, take the database and log files, and
then replay the logs manually before you can recover data from a recovery database.
Recovering a lagged database to a specific point in time is a manual process, so follow
these steps to retrieve the data you’re looking for:
1. Suspend replication to the lagged database copy by using the
Suspend-MailboxDatabaseCopy <database>\<server> cmdlet.
At this point, decide whether you want to back up or copy the database and
log files to a different location so that you still have them available if you don’t get to the
right point in time. Alternatively, you can create a VSS snapshot by using the VSSAdmin
CREATE SHADOW /For=<Volume that includes database> command.
2. Use Windows Explorer to delete or move all log files whose time stamp is newer than
the point in time you want to return to. For example, if you have 14 days of log
files available and you want to go back 10 days, you need to replay only the log files
that are 10 days old or older into the 14-day-old database. To achieve this, delete or
move all log files that have a time stamp newer than 10 days, that is, from day 9
onward.
3. Note the filename of the .chk file for the database, and then delete the file. The name
is normally something like E00.chk.
4. Run the Eseutil.exe /r E00 /a command, replacing E00 with the base name of the .chk
file (the log file prefix). Depending on the number of log files that need to be replayed,
this might take several hours. As a rule of thumb, on normal 7.2K JBOD 3.5-inch disks you
can assume that you’ll replay approximately 7.2 GB of transaction log files per hour. The
exact value, of course, depends on local factors such as storage performance and CPU.
If you want to measure how long replaying the log files to the database takes,
you can use the tool JetStress 2010, which includes a Recovery Performance measure
option for this exact situation.
5. When Eseutil has finished, the database is in a clean shutdown state. You can now decide
how to continue:
a. You can create a recovery database using this database, mount it, and recover the data.
b. You can replace the corrupt database files with the lagged database files and
mount the database.
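Steps 2 and 4 lend themselves to a quick dry run before you touch any files. The following Python sketch splits the log files in a folder into those to replay and those to set aside, based on a cutoff time, and then applies the 7.2 GB per hour rule of thumb from step 4 to the files that remain. It is a planning aid under several assumptions: it uses the file modification time as a stand-in for the log time stamp, it only classifies files and never moves or deletes anything, and the function names are illustrative and not part of any Exchange tooling. Run it against a copy of the log folder rather than the live one.

```python
from datetime import datetime
from pathlib import Path

REPLAY_GB_PER_HOUR = 7.2  # rule of thumb from step 4; measure your own hardware


def split_logs_by_cutoff(log_dir, cutoff):
    """Split the *.log files in log_dir into (keep, set_aside).

    keep      -> last modified at or before cutoff (these get replayed)
    set_aside -> last modified after cutoff (move these away before replay)
    Nothing is moved or deleted; the function only classifies the files.
    """
    keep, set_aside = [], []
    for path in sorted(Path(log_dir).glob("*.log")):
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        (keep if modified <= cutoff else set_aside).append(path)
    return keep, set_aside


def estimated_replay_hours(log_files, gb_per_hour=REPLAY_GB_PER_HOUR):
    """Estimate the replay duration from the total size of the given files."""
    total_gb = sum(p.stat().st_size for p in log_files) / 1024 ** 3
    return total_gb / gb_per_hour
```

With a cutoff ten days in the past, the first list returned by split_logs_by_cutoff holds the logs to replay and the second holds the logs to move out of the folder. Passing the first list to estimated_replay_hours gives a rough duration for the Eseutil run; treat it as an order-of-magnitude figure until you have measured replay speed on your own storage.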
As you can see, several steps are involved here, and the process is time-consuming because
of the large number of logs that must be replayed. The process is not difficult, but it is not
something you want to be doing on a daily or weekly basis because of the operational time
required. Lagged copies were not designed for the deleted item recovery case; they were
designed for the once-in-a-great-while scenario where multiple database copies within a DAG
combined with retention hold are not enough protection in a backup-less environment.