SQL Server Distributed Replay Walkthrough
Distributed Replay is a new feature, available as of SQL Server 2012, which allows the means to capture a production workload and replay it against a target database to provide a repeatable means to test a workload’s performance during system changes.
Setup
Distributed Replay requires two services that are available in the SQL Server installation procedure: the Distributed Replay Controller and the Distributed Replay Client. Although a Controller can manage up to sixteen Clients (Enterprise edition), only one of each may be installed on a single server.
For this proof-of-concept, we will use a single replay machine with one Controller and one Client for simplicity. This configuration is not generally recommended in a live environment for performance reasons.
Prior to starting the installation of the services, it is advisable to pre-create the directories required for the installation. The Client will require a working directory and a result directory. By default, these are placed in C:\Program Files (x86)\Microsoft SQL Server\DReplayClient. Neither are usually automatically created by the installer.
If you prefer to use domain users as the service accounts for the Controller and Client, those accounts should also be pre-created in Active Directory. For security reasons, these should be separate accounts, but no special permissions are required. Per best practices, neither account should be an administrator (either on the domain or on an individual server). Virtual service accounts (NT SERVICE\servicename) may also be used successfully, especially for a simple installation like this proof-of-concept, but keep in mind that the local system account (domainname\servername$) would then be used when configuring inter-server permissions.
Both services are installed from the standard SQL Server installer media. Run Setup, then add features to an existing installation of SQL Server. On the Feature Selection page, select the Distributed Replay Controller and Distributed Replay Client features. If SQL Server Management Studio is not already installed on this machine, also select SQL Server Management Studio (Basic) – the dreplay.exe utility is only installed with SSMS.
On the Server Configuration page, enter the credentials for the service. Domain accounts or virtual service accounts are recommended. Local accounts may be used if the Controller, Client, and Target are all on the same server. The service startup mode should be kept at Manual.
On the Distributed Replay Controller page, select the accounts that will be allowed to access the Controller. This must include the Client service account.
On the Distributed Replay Client page, enter the name of the server hosting the Controller and the paths to the working directory and result directory. If the directories were not pre-created, they will need to be created now in order to continue installation. Both the Controller and Client service accounts should have access to these directories.
Click through the rest of the screens to complete the installation.
If the Controller and Client are on separate machines, verify that the Windows Firewall has been configured to allow communication between the services. This procedure is not covered here since it is not necessary for the proof-of-concept, but can be found at this URL: http://blogs.msdn.com/b/mspfe/archive/2012/11/08/using-distributed-replay-to-load-test-your-sql-server-part-1.aspx.
Start the services, either using the Services Control Panel (services.msc) or using NET START commands at a command prompt. The Controller service must be started first.
Note: During the configuration and usage of the Distributed Replay features, it is advised to create a simple batch or PowerShell file to stop and start the two core services, as the configuration settings are only fetched on service startup. The core services can be restarted quickly with a batch / PoSH file containing the following entries. Administrative privileges are required to execute this script.
NET STOP “SQL Server Distributed Replay Controller” NET START “SQL Server Distributed Replay Controller” NET STOP “SQL Server Distributed Replay Client” NET START “SQL Server Distributed Replay Client” |
After each service has been started, verify its status in its log. The Controller log can be found at C:\Program Files (x86)\Microsoft SQL Server\120\Tools\DReplayController\log. The last line should read “Dump support is ready” if the service started successfully.
The Client log can be found at C:\Program Files (x86)\Microsoft SQL Server\120\Tools\DReplayClient\log. If it starts successfully, the last line should read “Registered with controller ‘controllerName’”.
However, security errors are common when configuring the Client, in which case the last line will read “Failed to connect controller with error 0x80070005”.
The most common reason for failure is that the Client service account was not added properly to the Distributed Replay Controller page in the installer. If an account needs to be granted access to the Controller after installation has completed, it must be done manually through the DCOM permissions in the Component Services MMC snap-in (dcomcnfg.msc). The account must also be added to the Distributed COM Users security group. The procedure and permissions can be found at this URL: https://www.sqlshack.com/how-to-install-configure-and-use-sql-server-distributed-replay/.
Occasionally, the client might have trouble connecting to the controller. The DReplayClient config file can be altered to point directly to the server with the controller running. The following example uses the ‘.’ to point the client to the localhost for the controller registration.
Creating the Workload
Any trace file can be used to run a distributed replay, as long as it contains the events and columns defined by the TSQL Replay template from SQL Server Profiler. (Additional events and columns will discarded by the preprocessing step.) Rollover files are also acceptable.
The Heraflux ServerSideTrace scripts have been modified to capture all events required for a distributed replay, and are included with this document. Load them into a DBA utilities database. Use ServerSideTrace_Create to start a trace, and specify the location of the trace files and the duration of the trace. The trace will automatically end at the specified time.
Once the trace has been started, the workload can be run on the instance being traced. The script replay poc.config.sql contains a basic workload that creates and modifies data in a table to demonstrate that the target is writable. Jonathan Kehayias from SQLskills also created a PowerShell script that runs random SELECT statements against AdventureWorks2008R2. Jonathan’s scripts are included in the proof-of-concept package. The article describing how to use them is at the following URL: https://www.sqlskills.com/blogs/jonathan/the-adventureworks2008r2-books-online-random-workload-generator/. If multiple instances of Jonathan’s generator are run, the workload will be split among multiple Clients (if available).
Keep in mind that the Target database should be restored to a state similar to the starting point of the workload if accurate results are to be derived from the replay.
The freely available HammerDB benchmark utility can also be used to drive a synthetic workload against this database for testing.
Once a sufficient amount of workload has been generated for the test, disable then stop the trace using the ServerSideTrace scripts. Copy (or move) the trace file to a location accessible to the Controller server.
Preprocessing
Preprocessing reformats the commands from the trace file into a format usable by Distributed Replay, and is required to use the Distributed Replay feature. If multiple Clients are specified, the workload will be split among the Clients by SPID or connection (depending on the configuration).
Navigate to the Binn directory (by default, C:\Program Files (x86)\Microsoft SQL Server\120\Tools\Binn). Review the configuration in DReplay.Exe.Preprocess.config and update options as necessary.
(If dreplay.exe and associated files are not installed on your machine, download and install SSMS, as these files were removed as part of the SQL Server Management Studio split from the SQL Server installer in 2016.)
If any changes are made, restart both the Controller and Client services.
Open a command prompt and navigate to the Binn directory. Run the dreplay command as follows:
dreplay preprocess -i traceFilename -d replayDirectory
If the trace used rollover files, only specify the first filename; all subsequent rollover files will be included as long as they are contained in the same directory. replayDirectory is a directory created to hold the resulting working files. It is not the same directory created for the Client during installation. Files in this directory will be overwritten without prompting.
As before, a batch or PowerShell script can be created to streamline this process.
The results should show the trace files being preprocessed and stored in the target directory.
Executing the Replay
During the replay itself, the Controller will command all Clients to execute the commands in the files created during preprocessing. There are two modes for the replay: Synchronization and Stress. In Stress mode, timing between commands is controlled by options in the configuration file and order among separate SPIDs/connections is not guaranteed. In Synchronization mode, Distributed Replay takes more care to ensure that the commands are executed in the same order as they originally executed. Stress mode is the default.
IMPORTANT: In Stress mode, no guarantee is made as to the order of the commands executed between SPIDs/connections. Data modifications may have unpredictable results.
As in Preprocessing, start by reviewing the configuration file, DReplay.Exe.Replay.config, to ensure the options are appropriate for this test. If any changes are made to the settings, restart each Client.
To increase the intensity of the tests, change the ConnectTimeScale and ThinkTimeScale to a lower number. Valid numbers are between 1 and 100, with 1 executing the tests without delay between tests.
Open a command prompt, navigate to the Binn directory, and issue the following command to start the replay:
dreplay replay -s targetInstance -d replayDir -w listOfDReplayClients
For the replay, dreplay has a number of additional options which may be useful:
- Add the -o switch to save traces from each Client’s replay for later comparison to a directory specified in the config file.
- Add -f to change the interval at which dreplay reports its status to the console.
- Add -m to run the replay using a Controller on a remote machine.
- Add -s for the target instance name
Note: Do not use FQDN names for the servers involved. Use the server name only. Failure to do so will throw a message stating that the Distributed Replay library DLLs have not been successfully registered, which is erroneous and misleading.
As before, this process can be scripted.
The syntax of each switch can be found on Books Online.
Ensure that the Client service account has the necessary permissions to execute the replayed commands on the Target before beginning the replay.
The output should result in the test workload being replayed and an end runtime for the test being displayed.
The resultant runtime also is logged in the client log file.
The test runtime is the difference in the start and end times recorded in the log file.
Notes & Best Practices
- The Target database must have the same name as the original.
- Distributed Replay can execute commands across an entire instance, but all database names must match the original.
- Distributed Replay can only preprocess one trace file at this time. If multiple traces are required (for example, to test consolidation), either Profiler or RML Utilities must be used.
- Enterprise Edition is required to use more than a single Client per Controller.
- If multiple Clients are used, install the Controller on a separate server for performance.
- For best performance, do not install either the Controller or a Client on the server containing the Target instance.
- To maintain security, follow the following guidelines:
- Use the minimum necessary permissions for service accounts.
- The client, controller, and target instance should be accessible via FQDN, not just by IP address.
- Leveraging UNC paths for the locations of the trace files is not supported.
- The Controller service account should not be a Windows domain administrator, a Windows local administrator, or a member of the SQL Server sysadmin role.
- The Client service account should not be a Windows domain administrator or a Windows local administrator. If the trace file will affect the entire instance, it may be necessary to make the Client service account a member of sysadmin, but that should be avoided if possible. (NOTE: This is contrary to Microsoft’s recommendation. Books Online recommends that the Client service account be made a sysadmin on the Target.)
- Lock down the directories containing the trace files, the intermediate replay files, and the Client working directories to only those accounts necessary to the operation of the replay. Personally-identifiable and/or confidential information may be contained in the trace and/or replay files in clear text. Delete the trace and replay files as soon as practical after the replay is finished.
- Use SSL and/or VPN connections between the Controller, Clients, and Target to secure the network layer.