Best Practice Recommendations for Windows Hosting SQL Server on VMware
Did you know it's very important to ensure that your VMware environment is properly set when hosting SQL Server instances? Imagine you have a host server with specific specs. Now you need to create virtual machines on it and ensure each machine gets the proper amount of resources. At the same time, to avoid performance decreases, it's vital to make sure that different VMs are not competing with each other on resources in a way that may starve one of them. So what to do? VMware has created a very detailed best-practice document for us, specifically for SQL Server. You may find the latest one here. In case the link doesn't work for you, or you have a different version of VMware, you can search for the proper SQL Server best practices on the VMware site. Here are the main best practices VMware recommends, and the most important based on Pythian's experience (SQL Server on Windows):
BIOS (basic input / output system):
- Ensure latest versions of BIOS and Firmware are in place.
- Enable Turbo Boost and Hyper-Threading (HT) on the BIOS.
- Verify that all ESXi hosts have NUMA (Non-Uniform Memory Access) enabled in the system BIOS.
- Enable advanced CPU features.
- Set power management (or its vendor-specific equivalent label) to “OS controlled.”
- Disable all processor C-states (including the C1E halt state).
- Select the appropriate snoop mode to allow the processor to check the cache contents of both local and remote processors for data.
- Disable any unused devices (for example, serial ports).
- Set Power Option to "High Performance" on host and VM Operating Systems.
- NUMA and vNUMA architecture is strongly recommended for SQL Server environments, as SQL Server is NUMA-aware. Note that SQL Server Enterprise Edition is required to really benefit from NUMA topology. Study this topic, specifically for your vSphere version and SQL Server Editions and configure everything accordingly.
- The SQL Server Soft-NUMA feature was introduced to react on the growing number of cores per pCPU. Soft-NUMA aims to partition available CPU resources inside one NUMA node into so called “Soft-NUMA” nodes. The “Soft-NUMA” feature has no contradiction with vNUMA topology exposed to a VM, but might further optimize scalability and performance of the database engine for most of the workload. Starting with SQL Server 2014 SP2, the “Soft-NUMA” is enabled by default and does not require any further configurations.
- If performance is high priority, ensure the total number of vCPUs assigned to all the virtual machines is no more than the total number of physical cores of the host for the initial sizing.
- After enabling Hyper-Threading in the BIOS, allocate cores to machines in the following way (this example assumes a VM with four virtual CPUs): each virtual CPU will be mapped to a different physical core and not to two logical threads that are part of the same physical core. In the Hyper-Threading Sharing section on the Properties tab of a virtual machine, use the "Any" setting to allow the CPU scheduler the maximum scheduling opportunities.
- To reduce memory access latency, consider the following:
- For small SQL Server virtual machines, allocate virtual machine CPUs equal to or less than the number of cores in each physical NUMA node. When you do this, the guest operating system or SQL server does not need to take into account NUMA because ESXi makes sure memory accesses are as local as possible.
- For wide SQL Server virtual machines, where the number of allocated CPUs is larger than the number of cores in the NUMA node, ESXi will divide the CPU and memory of the VM into two virtual NUMA nodes and will place each vNUMA on a different physical NUMA node. The vNUMA topology will be exposed to the guest OS and SQL to take advantage of memory locality. By default, vSphere NUMA scheduling and related optimizations are enabled only on systems with a total of at least four CPU cores and with at least two CPU cores per NUMA node, setting the number of cores per socket to one when allocating virtual CPUs to VMs on the vSphere platform. That means that a VM that requires eight CPUs will have eight vCPUs with one core each.
- Disable CPU hot plug for virtual machines that require vNUMA and ensure this is the default configuration in VM templates for SQL Server.
- DO NOT use CPU affinity in production because it limits the hypervisor’s ability to efficiently schedule vCPUs on the physical server.
- Do not over-commit memory ( (Host Memory >= Sum of (VMs memory + overhead). VMs require a certain amount of overhead memory to power on, depending on their size. Please refer to the table in the referenced document for this.
- Identify the maximum amount of memory you can assign to a VM without crossing the pNUMA boundaries.
- Set the "Reserve all guest memory (All locked)" configuration with caution. Each option has advantages and disadvantages. Study them before making a decision.
- Ballooning: When designing SQL Server for performance, the goal is to eliminate any chance of paging from happening. Disable the ability for the hypervisor to reclaim memory from the guest OS by setting the memory reservation of the VM to the size of the provisioned memory. The recommendation is to leave the balloon driver installed for corner cases where it might be needed to prevent loss of service.
- Use memory hot plug only with vSphere 6 and later in cases where you cannot easily and accurately predict memory consumption patterns. It's always preferable to right size memory over hot plug.
- Consider using Persistent Memory (PMem) when working with vSphere version 6.7 and higher, Windows Server 2016 guest OS and SQL Server 2016 SP1 and above. This feature is capable of maintaining data in memory DIMM even after a power outage.
- Review all the physical and logical layout and configuration recommendations for the storage setup (Virtual Machine File System — VMFS — on a shared storage, Raw Device Mapping — RDM — volumes, SAN virtualization using vVols, etc.). The following Best Practices apply to all type of storage configurations when performance should be optimized to its best:
- Partition alignment on both physical machines and VMFS partitions prevents performance I/O degradation caused by unaligned I/O. vSphere 5.0 and later automatically aligns VMFS5 partitions along a one MB boundary and most modern OSs do the same. You will only require additional manual efforts in rare cases. Consult with the storage vendor for alignment recommendations on their hardware.
- Place SQL Server data (system and user), transaction log and backup files into separate VMDKs (if not using RDMs) and possibly on separate datastores.
- The SQL Server binaries should be installed on the OS VMDK. Even if you select another drive for binary installation, SQL Server will still install things on the OS drive so there is no real point in installing elsewhere. Separating SQL Server installation files from data and transaction logs also provides better flexibility for backup, management and troubleshooting.
- For the most critical databases where performance requirements supersede all other requirements, maintain 1:1 mapping between VMDKs and LUNs. This will provide better workload isolation and will prevent any chance for storage contention on the datastore level. Of course, the underlying physical disk configuration must accommodate the I/O and latency requirements as well. When manageability is a concern, group VMDKs and SQL Server files with similar I/O characteristics on common LUNs while making sure that the underling physical device can accommodate the aggregated I/O requirements of all the VMDKs.
- For underlying storage, where applicable, RAID 10 can provide the best performance and availability for user data, transaction log files and TempDB.
- Follow the recommended physical and software side setup by VMware and hardware providers.
- Consider separating network traffic on the virtual networks if you host different types of workloads that require separate network bandwidth.
- If using iSCSI, you should dedicate the network adapters to either network communication or iSCSI, but not both.
- VMware highly recommends considering enabling jumbo frames on the virtual switches where you have enabled vSphere vMotion traffic or iSCSI traffic. Make sure the configuration is end-to-end on the physical infrastructure. However, do NOT enable jumbo frames within a vSphere infrastructure unless you configure the underlying physical network devices to support this setting.
- Enable RSS (Receive Side Scaling): This network driver configuration within Windows Server enables distribution of the kernel-mode network processing load across multiple CPUs. For more information about RSS, see https://technet.microsoft.com/en-us/library/hh997036.aspx.
- Follow the guidelines on guest OS networking considerations and hardware networking considerations.
SQL Server-specific best practices:
- Follow the general Best Practices of SQL Server on the Operating System, SQL Server Instance and database levels.
- The most important ones are:
- Do not install Antivirus software on SQL Server environments but if this is a must, follow Microsoft's strict guidelines around what should be excluded from Online Scans.
- Ensure that the guest server is dedicated to SQL Server only (if possible) to deliver highest performance and stability.
- Set min and max SQL Server memory properly to ensure you leave enough RAM for the OS and other processes.
- Setting "Lock Pages in Memory" configuration on the OS level means that SQL Server will not release memory it already acquired no matter what. If you set the SQL Server "Lock Pages in Memory" user correctly, also set the VM’s reservations to match the amount of memory you set in the VM configuration. Used incorrectly and during times of memory over-commit pressure, SQL Server instability could occur as a result. You need to use this setting in conjunction with the max server memory setting to avoid SQL Server taking over all memory on the VM.
- Use Large Pages configuration with caution (SQL Server Enterprise Edition) and following the guidelines in the document. Monitor performance implications of this setup and reconsider if required.
- Set up the proper "Max Degree of Parallelism" and "Cost Threshold for Parallelism" values on the SQL Server instance based on your queries and workload. Monitor performance impact of those settings.
- By default, SQL Server engine will first clean erase the space to be added to newly created files or expanded parts of files by writing zeroes in it (zeroing). This operation requires additional time and disk IO. Enabling "Instant File Initialization" configuration for the SQL Server service account on the OS level will ensure that file creation and growth will not include the zeroing operation, hence improve performance of those operations.
- Additional settings and recommendations can be found in our blog: https://blog.pythian.com/sql-server-default-configurations-change/.
Versions and upgrades:
- Ensure you upgrade your VM software to the latest possible based on your hardware version, compatibility and capabilities.
Implement proper monitoring:
- Monitor the OS and SQL Server (rather than the VMware tools) to get real performance data of the SQL Server environment and base your decision around VM specs accordingly.
- Monitor the ballooned memory both on the host and VMs levels and reconfigure memory allocations if needed.
- Ensure your network is properly set and configured by monitoring network traffic with Windows PerfMon or the VMware tools for network monitoring.
- Adjust configurations based on the monitoring results if needed!