In part 1, I explained the virtual machine settings that are available to us regarding memory. In part 2, I explained how the ESX kernel assigns memory to a VM. In this part I will dive into the ESX memory reclamation techniques.
The ESX kernel uses transparent page sharing, ballooning and swapping to reclaim memory. Ballooning and swapping are used only when the host is running out of machine memory or a VM limit is hit (see also Limits I discussed in part 1).
Transparent Page Sharing (TPS)
One great feature of ESX is that it supports memory overcommitment. This means that the aggregated size of the guests’ physical memory can be greater than the actual size of the physical machine memory. This is possible because assigned memory that is never accessed by a VM isn’t mapped to machine memory, and because of a feature called Transparent Page Sharing, or simply TPS. TPS is a technique that allows VMs to share identical guest physical pages, so only one copy is stored in the host’s machine memory.
The ESX kernel scans VM memory pages regularly and generates a hash value for every scanned page. This hash value is then compared to a global hash table which contains entries for all scanned pages. If a match is found, a full comparison of both pages is made to verify that the pages are identical. If the pages are identical, both physical pages (guest) are mapped to the same machine page (host) and the physical pages are marked “Read-Only”. Whenever a VM wants to write to this physical page, a private copy of the machine page is made and the PPN-to-MPN mapping is changed accordingly.
Remember: TPS is always on. You can however disable memory sharing of a particular VM by setting the advanced parameter “sched.mem.pshare.enable” to “False”.
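For illustration, per-VM advanced settings like this one live in the VM’s .vmx configuration file, so the line would look roughly like:

```
sched.mem.pshare.enable = "False"
```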
To view information about TPS you can use the esxtop command from the COS (see Figure1). From the COS, issue the command “esxtop” and then press “m” to display the memory statistics page. On the top-right you see the overcommit level averages over 1-min, 5-min and 15-min intervals. The value of 0.77 in Figure1 means 77% memory overcommit: there is 77% more memory allocated to virtual machines than there is machine memory available.
The counter to look for is PSHARE. The “shared” value is the total amount of guest physical memory that is being shared and the “common” value is the amount of machine memory actually used to share that amount of guest physical memory. The “saving” value is the amount of machine memory saved due to page sharing.
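As a quick sketch of how these numbers relate (function names are mine, not esxtop’s):

```python
def overcommit_level(vm_allocated_mb: float, machine_mb: float) -> float:
    """Overcommit average as esxtop reports it: 0.77 means 77% more
    guest memory is allocated than there is machine memory."""
    return vm_allocated_mb / machine_mb - 1

def pshare_saving_mb(shared_mb: float, common_mb: float) -> float:
    """Machine memory saved by TPS: shared guest physical memory minus
    the machine memory actually backing it."""
    return shared_mb - common_mb
```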
Ballooning
When the ESX host’s machine memory is scarce or when a VM hits a Limit, the kernel needs to reclaim memory and prefers ballooning over swapping. The balloon driver is installed inside the guest OS as part of the VMware Tools installation and is also known as the vmmemctl driver.
When the ESX kernel wants to reclaim memory, it instructs the balloon driver to inflate. The balloon driver then requests memory from the guest OS. When there is enough memory available, the guest OS will return memory from its “free” list. When there isn’t enough memory, the guest OS will have to use its own memory management techniques to decide which particular pages to reclaim and if necessary page them out to its swap- or page-file.
In the background, the ESX kernel frees the machine memory pages that back the guest physical pages allocated to the balloon driver. When enough memory has been reclaimed, the balloon driver will after some time deflate, returning the physical memory pages to the guest OS.
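The inflate/deflate cycle can be sketched as a toy model. This is my own simplification under the assumption that the guest grants pages from its free list; a real guest would first page memory out to grow that free list, and the ESX kernel, not the guest, frees the backing machine pages.

```python
class BalloonedGuest:
    """Toy model of ballooning: the driver pins guest pages so the ESX
    kernel can free the machine pages backing them."""

    def __init__(self, total_mb: int):
        self.free_mb = total_mb   # guest's free list (simplified)
        self.balloon_mb = 0       # pages currently held by the balloon

    def inflate(self, target_mb: int) -> int:
        """Grow the balloon toward target_mb; returns the MB of machine
        memory the ESX kernel can now reclaim."""
        grant = min(target_mb - self.balloon_mb, self.free_mb)
        self.free_mb -= grant
        self.balloon_mb += grant
        return grant

    def deflate(self, mb: int) -> None:
        """Shrink the balloon, returning pages to the guest OS."""
        mb = min(mb, self.balloon_mb)
        self.balloon_mb -= mb
        self.free_mb += mb
```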
This process will also decrease the Host Memory Usage parameter (discussed in part 2).
Ballooning is only effective if the guest has available space in its swap or page file, because in-use memory pages need to be swapped out in order to allocate the page to the balloon driver. Ballooning can lead to heavy guest memory swapping. This is guest OS swapping inside the VM and is not to be confused with ESX host swapping, which I will discuss later on.
To view balloon activity we use the esxtop utility again from the COS (see Figure2). From the COS, issue the command “esxtop” and then press “m” to display the memory statistics page. Now press “f” and then “i” to show the vmmemctl (ballooning) columns.
On the top (see Figure2) we see the “MEMCTL” counter which shows us the overall ballooning activity. The “curr” and “target” values are the accumulated values of the “MCTLSZ” and “MCTLTGT” as described below. We have to look for the “MCTL” columns to view ballooning activity on a per VM basis:
- “MCTL?”: indicates if the balloon driver is active “Y” or not “N”
- “MCTLSZ”: the amount (in MB) of guest physical memory that is actually reclaimed by the balloon driver
- “MCTLTGT”: the amount (in MB) of guest physical memory that is going to be reclaimed (targeted memory). If this counter is greater than “MCTLSZ”, the balloon driver inflates, causing more memory to be reclaimed. If “MCTLTGT” is less than “MCTLSZ”, the balloon will deflate. This deflating process runs slowly unless the guest requests memory.
- “MCTLMAX”: the maximum amount of guest physical memory that the balloon driver can reclaim. Default is 65% of assigned memory.
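The inflate/deflate rule from the column descriptions above boils down to a simple comparison. A minimal sketch (the function name is mine):

```python
def balloon_action(mctlsz_mb: float, mctltgt_mb: float) -> str:
    """Decide balloon movement from the esxtop counters: compare the
    target (MCTLTGT) against what is already reclaimed (MCTLSZ)."""
    if mctltgt_mb > mctlsz_mb:
        return "inflate"   # more guest memory will be reclaimed
    if mctltgt_mb < mctlsz_mb:
        return "deflate"   # memory is slowly returned to the guest
    return "steady"
```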
You can limit the maximum balloon size by specifying the “sched.mem.maxmemctl” parameter in the .vmx file of the VM. This value must be in MB.
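As a hypothetical example, capping the balloon at 512 MB (the value 512 is purely illustrative) would look like this in the .vmx file:

```
sched.mem.maxmemctl = "512"
```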
Swapping
When ballooning isn’t possible (for example, if the balloon driver isn’t installed) or is insufficient, the ESX kernel falls back to swapping. Swapping is used by the ESX kernel as a last resort when the other techniques fail to satisfy the memory demand. This swapping mechanism pages out machine memory pages which are in use by the VM to the VM’s swap file (.vswp file) on disk. As I explained in part 1, this swap file has the size of the VM’s limit minus its reservation. ESX kernel swapping is done without any guest involvement, which can result in paging out active guest memory. Do not confuse ESX kernel swapping with VM guest OS swapping.
To view swap activity we use the esxtop utility again from the COS (see Figure3). From the COS, issue the command “esxtop” and then press “m” to display the memory statistics page. Now press “f” and then “j” to show the swap columns.
On the top (see Figure3) we see the “SWAP” counter which shows us the overall swap activity. The “curr” and “target” values are the accumulated values of the “SWCUR” and “SWTGT” as described below. We have to look for the “SW” columns to view swap activity on a per VM basis:
- “SWCUR”: the current amount (in MB) of guest physical memory that is swapped out to the ESX kernel VM swap file.
- “SWTGT”: the amount (in MB) of guest physical memory that is going to be swapped (targeted memory). If this counter is greater than “SWCUR”, the ESX kernel will start swapping; if it is less than “SWCUR”, the ESX kernel will stop swapping.
- “SWR/s”: the rate at which memory is being swapped in from disk. A physical memory page only gets swapped in if accessed by the guest OS.
- “SWW/s”: the rate at which memory is being swapped out to disk.
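Analogous to the balloon target, the swap target can be read as "how many MB the kernel still wants out". A minimal sketch (the function name is mine):

```python
def swap_shortfall_mb(swcur_mb: float, swtgt_mb: float) -> float:
    """MB the ESX kernel still wants to swap out to the .vswp file.
    A result of 0 means the kernel stops swapping (SWCUR >= SWTGT)."""
    return max(0.0, swtgt_mb - swcur_mb)
```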
ESX memory state
There is one other parameter you should know about, and that is the ESX memory state. The ESX memory state determines which mechanisms are used to reclaim memory when necessary. In the “high” and “soft” states, ballooning is favored over swapping. In the “hard” and “low” states, swapping is favored over ballooning. To view this counter we use the esxtop utility again from the COS (see Figure4). From the COS, issue the command “esxtop” and then press “m” to display the memory statistics page.
The “state” counter displays the ESX free memory state. Possible values are “high”, “soft”, “hard” and “low”. If the state is “high”, there is enough free machine memory available and there is nothing to worry about. When the state is “soft”, the ESX kernel actively reclaims memory through ballooning and falls back to swapping only when ballooning is not possible. In the “hard” state, the ESX kernel relies on swapping to reclaim memory, and in the “low” state the ESX kernel continues to use swapping to forcibly reclaim memory and additionally blocks the execution of VMs that are above their target allocations.
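The state is derived from how much machine memory is free. As a sketch, the thresholds below are the commonly cited ESX 3.x defaults (6/4/2/1 percent of machine memory); treat them as an assumption, since the text above does not give the exact numbers.

```python
def memory_state(free_pct: float) -> str:
    """Map free machine memory (as % of total) to the esxtop 'state'
    counter. Thresholds are assumed ESX 3.x defaults, not from the text."""
    if free_pct >= 6:
        return "high"   # enough free memory, no reclamation pressure
    if free_pct >= 4:
        return "soft"   # reclaim via ballooning
    if free_pct >= 2:
        return "hard"   # reclaim via swapping
    return "low"        # swap forcibly and block VMs above their targets
```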
Idle memory tax
In part 1, I explained how the proportional share system works and that the more memory you assign to a VM, the more shares it receives. This could mean that a VM with a large amount of assigned memory hoards a lot of idle memory while a VM with less assigned memory is screaming for memory but, according to its share value, is not entitled to use it. This is rather unfair, and therefore ESX throws in a referee called the idle memory tax.
The idle memory tax rate specifies the maximum fraction of idle memory that can be reclaimed from a VM and defaults to 75%. Let me first quote a phrase from the Resource Management Guide: “The default tax rate is 75 percent, that is, an idle page costs as much as four active pages.” Well, that makes perfect sense, right?……NOT. I wonder how many people have read this phrase without understanding it?
First let me state that the phrase is completely correct, but the guide omits the formula which explains why a 75% tax rate equals a four times higher cost for an idle memory page. The idle page cost is defined as:

idle page cost = 1 / (1 − τ)

where τ = the idle memory tax rate. If we fill in the default of 75%, we get the following result:

1 / (1 − 0.75) = 1 / 0.25 = 4

This explains why the phrase from the Resource Management Guide states that an idle memory tax of 75% results in a four times higher cost for an idle page.
To determine how to divide memory between VMs, a share-per-page ratio is calculated and idle pages are charged extra with this idle page cost. This results in a lower share-per-page ratio for VMs with lots of idle memory and the available memory is more fairly distributed amongst all VMs.
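Putting the two pieces together in a small sketch (function names are mine, and the share-per-page calculation is a simplified model of the mechanism described above):

```python
def idle_page_cost(tax_rate: float) -> float:
    """Cost of an idle page relative to an active one: 1 / (1 - tau).
    With the default tau = 0.75, an idle page costs as much as four
    active pages."""
    return 1 / (1 - tax_rate)

def share_per_page(shares: float, active_pages: float,
                   idle_pages: float, tax_rate: float = 0.75) -> float:
    """Share-per-page ratio with idle pages charged at the idle page
    cost. More idle memory -> lower ratio -> memory is reclaimed from
    that VM first."""
    return shares / (active_pages + idle_page_cost(tax_rate) * idle_pages)
```

With the default tax rate, a VM whose memory is half idle ends up with a noticeably lower share-per-page ratio than one using all of its memory actively, which is exactly the fairness correction the tax is meant to provide.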
Best practice is to avoid VMs hoarding idle memory by allocating only the memory that the VM really needs, before thinking about changing the default idle memory tax value.