Tag Archives: Distributed Cache

Changing the Distributed Cache Service Account

So you want to follow the least-privilege security best practice for your SharePoint 2013 farm and decide to create a dedicated service account for distributed cache. You head on over to TechNet and check out Manage the Distributed Cache service in SharePoint Server 2013: Change the service account, where you find the following script:

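$farm = Get-SPFarm
$cacheService = $farm.Services | where {$_.Name -eq "AppFabricCachingService"}
# The account must already be registered as a Managed Account
$accnt = Get-SPManagedAccount -Identity domain_name\user_name
$cacheService.ProcessIdentity.CurrentIdentityType = "SpecificUser"
$cacheService.ProcessIdentity.ManagedAccount = $accnt
# Save the change, then push it out to the cache hosts
$cacheService.ProcessIdentity.Update()
$cacheService.ProcessIdentity.Deploy()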

Provided you’ve already added your dedicated service account as a Managed Account, the script works. The trouble is the documentation is missing one important piece of information: the service account needs to be a local machine administrator on all the cache hosts before running the Deploy() method (the last line).

If the account is not a local machine administrator, you’ll get this exception after waiting a number of minutes:

Exception calling "Deploy" with "0" argument(s): "Error occurred while performing the operation on host
CACHEHOST:22233 : ErrorCode<ERRCAdmin003>:SubStatus<ES0001>:Time-out occurred on
At line:1 char:1
+ $cacheservice.ProcessIdentity.deploy()
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : CmdletInvocationException

What happens is the AppFabricCachingService Windows service gets stuck on starting because the service account doesn’t have the necessary rights on the server to set up the service for the first time. Grant it local admin and Deploy() goes off smoothly.
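
As a rough sketch of that step, the account could be added to the local Administrators group on each cache host before calling Deploy(). The host names and account below are placeholders, and the sketch assumes PowerShell remoting is enabled on the cache hosts; it is one way to do it, not a documented procedure.

```powershell
# Placeholder values: substitute your own cache hosts and service account.
$cacheHosts = @("CACHEHOST1", "CACHEHOST2")
$account    = "domain_name\user_name"

foreach ($server in $cacheHosts) {
    # net.exe ships with Windows; /add puts the account in the local Administrators group.
    Invoke-Command -ComputerName $server -ScriptBlock {
        net localgroup Administrators $using:account /add
    }
}
```

Run it from an elevated shell with an account that is itself an administrator on the cache hosts.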

Remember to remove the local admin rights from the service account and restart the server once distributed cache is running. After all, you’re following least privilege, and the last thing you want is a service account running around as a local administrator.
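
A minimal sketch of that cleanup, assuming PowerShell remoting is enabled and using placeholder host and account names: it only revokes the rights once the AppFabricCachingService Windows service reports Running, and leaves the restart as a commented-out line so nothing reboots by accident.

```powershell
# Placeholder values: substitute your own cache hosts and service account.
$cacheHosts = @("CACHEHOST1", "CACHEHOST2")
$account    = "domain_name\user_name"

foreach ($server in $cacheHosts) {
    Invoke-Command -ComputerName $server -ScriptBlock {
        # Only revoke the rights once the Windows service reports Running.
        $service = Get-Service -Name "AppFabricCachingService"
        if ($service.Status -eq "Running") {
            net localgroup Administrators $using:account /delete
        } else {
            Write-Warning "Service on $env:COMPUTERNAME is $($service.Status); leaving local admin in place."
        }
    }
    # Uncomment to restart the host afterwards:
    # Restart-Computer -ComputerName $server -Force
}
```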

Note as well that when you first set up the farm, distributed cache runs as the farm service account, which also needs to be a local admin for the same reason (the AppFabricCachingService won’t start otherwise).

One last reminder: if you spin up a new server or want to turn on distributed cache on another server in the farm, you’ll first need to grant the current distributed cache service account local admin rights on that server; otherwise you’ll run into the same issue.
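
For a new cache host, the order of operations might look like the sketch below, run from an elevated SharePoint 2013 Management Shell on the new server itself. The account name is a placeholder. Add-SPDistributedCacheServiceInstance is the cmdlet that joins the local server to the cache cluster and starts the service; treat the sequence as an illustration rather than a tested runbook.

```powershell
# Placeholder: the account distributed cache currently runs as.
$account = "domain_name\user_name"

# 1. Grant local admin on this (new) server first.
net localgroup Administrators $account /add

# 2. Then join this server to the cache cluster and start the service.
Add-SPDistributedCacheServiceInstance
```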


Distributed Cache Needs Ping

After setting up a number of SharePoint 2013 farms in different environments, I discovered that the Distributed Cache service only sets up correctly if ICMPv4 (ping) traffic is allowed between the cache hosts. This requirement is partially documented at the bottom of a TechNet page.
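
A quick way to check this is to ping the other cache hosts from each cache host. The sketch below uses Test-Connection, which sends ICMP echo requests; the host names are placeholders. Depending on name resolution it may go out over IPv6, so pass IPv4 addresses instead of names if you want to be sure you are testing ICMPv4.

```powershell
# Placeholder host names: list every cache host in the farm.
$cacheHosts = @("CACHEHOST1", "CACHEHOST2", "CACHEHOST3")

# Run this on each cache host in turn; it pings every other host in the list.
$targets = $cacheHosts | Where-Object { $_ -ne $env:COMPUTERNAME }

foreach ($target in $targets) {
    # -Quiet returns $true/$false instead of raising an error when there is no reply.
    $reachable = Test-Connection -ComputerName $target -Count 2 -Quiet
    $result = if ($reachable) { "ping OK" } else { "no reply" }
    "{0} -> {1}: {2}" -f $env:COMPUTERNAME, $target, $result
}
```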

Check out the full story in the Habanero Insight, Distributed Cache Needs Ping


Distributed Cache bug in SharePoint Server 2013

Distributed Cache is a new component of SharePoint 2013 that is used to cache data for activity feeds, news feeds, search queries, authentication tokens, security trimming, Apps-related data and views. Even though it’s making its debut, it’s a pretty critical component to the functionality of a SharePoint farm.

The Distributed Cache service uses Windows AppFabric caching technology behind the scenes.

The cache can consume a lot of memory and needs constant access to the stored data, so for best performance Microsoft recommends including dedicated Distributed Cache servers in your farm. In large server farms this makes a lot of sense, though in smaller farms you can usually make do without the dedicated servers.

On a recent project, I ran into an issue with Distributed Cache: requests for items in the cache kept timing out, which delayed other components that relied on the cached data. These weren’t occasional timeouts either; there were hundreds every second. Something was up with the service.

Tracing through the logs, we saw that when a user accesses a page, SharePoint attempts to authorize the user to ensure they have access. SharePoint stores the user’s token in the user’s browser session and in the DistributedCacheLogonTokenCache container. When SharePoint tried to retrieve the token from Distributed Cache, the request would time out or no connection would be available, and the comparison would fail. Since it couldn’t validate the presented token, SharePoint had no choice but to log the user out and redirect them to the sign-in page.

One of the interesting things about this issue was that when I consulted MSDN about the timeout values, the documentation didn’t give the units. I had no idea whether the timeouts were in milliseconds or seconds.

What are the units for ChannelOpenTimeOut and RequestTimeout? ChannelInitializationTimeout is much larger at 60000, so maybe it’s milliseconds. Are RequestTimeout and ChannelOpenTimeOut then 20 milliseconds? That seems really small. Maybe it’s 20 seconds? The MSDN page for RequestTimeout doesn’t provide an answer, so we initially had to guess. In our development environments we were able to reproduce the issue when we reduced the timeouts to a value of “5”. So we tried increasing them to 40 in the test environments. Then 60. Then 120. The issue persisted.
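
For reference, these values can be read and changed from the SharePoint 2013 Management Shell with the client-setting cmdlets. The sketch below assumes the logon token cache is the container of interest (its container type is DistributedLogonTokenCache) and reuses 40, one of the values tried above, purely as an illustration. It does not settle the question of units.

```powershell
# Read the current client settings for the logon token cache container.
$settings = Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache
$settings | Select-Object RequestTimeout, ChannelOpenTimeOut, ChannelInitializationTimeout

# Illustrative values only (40 is one of the settings tried in the text).
$settings.RequestTimeout     = 40
$settings.ChannelOpenTimeOut = 40

# Write the modified settings back for the same container.
Set-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache -DistributedCacheClientSettings $settings
```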

With the help of Microsoft Support I sorted out these initial questions but the issue continued even after increasing the timeouts to larger values. Microsoft called in help from their development support team and with some additional logging determined the issue was actually caused by the way AppFabric handles garbage collection. AppFabric 1.1 Cumulative Update 1 is a prerequisite for SharePoint 2013 and in this version garbage collection “takes too long.”

In AppFabric 1.1 CU1, imagine that garbage collection is done by a little man who walks around the computer’s memory with one of those sticks with a nail on the end. When the man finds things lying around that AppFabric no longer needs, he stabs the garbage with the nail-stick and takes it away. He keeps looking for other pieces to clean up, and for a room that is 14 GB in size this can take quite some time. He tells AppFabric once he’s done, and then AppFabric un-pauses and continues where it left off. Since everything is waiting for our garbage collector to finish checking everything lying around, other dependent services get tired of waiting and move on. Sometimes this means the original operation has to be performed again (like a search query), and sometimes it means there is no data available to the requesting service. Sometimes it results in an exception, and sometimes, as in our case, the user gets logged out of the site.

So Microsoft wrote a hotfix that changes the way garbage collection happens in AppFabric. Instead of telling everything to wait for our garbage collector and asking him to go find all of the trash, the hotfix now tells the garbage collector to walk around looking for trash to pick up forever. With our man on the ground always tidying up, AppFabric can now just request things without waiting.

As of this writing, the most recent AppFabric CU is AppFabric Cumulative Update 4. I recommend applying this update to your SharePoint 2013 farms if you’re experiencing lots of timeouts when calling Distributed Cache. Once applied, you need to modify the Distributed Cache configuration file, which is typically found at C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config. Add the following section within the configuration element, between the configSections and dataCacheConfig elements:

<appSettings><add key="backgroundGC" value="true"/></appSettings>

So you end up with something like this:

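<?xml version="1.0" encoding="utf-8"?>
<configuration>
   <configSections> ... other configurations ... </configSections>
   <appSettings><add key="backgroundGC" value="true"/></appSettings>
   <dataCacheConfig> ... other configurations ... </dataCacheConfig>
</configuration>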

(Update January 30, 2014: HT to Aben Samuel and Gavin Barron for discovering that the appSettings element needs to go between configSections and dataCacheConfig.)
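
If you would rather script the edit than do it by hand, something like the following sketch could work. It assumes the typical file path mentioned above and that the file has no appSettings element yet (if one already exists, the sketch leaves the file alone so you can merge the key manually). It keeps a .bak copy before saving and needs an elevated shell to write under Program Files.

```powershell
# Typical location; adjust if AppFabric is installed elsewhere.
$path = "C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config"

$xml = New-Object System.Xml.XmlDocument
$xml.PreserveWhitespace = $true
$xml.Load($path)

# Only touch the file when there is no appSettings element yet.
if ($null -eq $xml.SelectSingleNode("/configuration/appSettings")) {
    # Build <appSettings><add key="backgroundGC" value="true"/></appSettings>
    $appSettings = $xml.CreateElement("appSettings")
    $add = $xml.CreateElement("add")
    $add.SetAttribute("key", "backgroundGC")
    $add.SetAttribute("value", "true")
    [void]$appSettings.AppendChild($add)

    # Insert it directly after configSections, i.e. before dataCacheConfig.
    $configuration  = $xml.SelectSingleNode("/configuration")
    $configSections = $xml.SelectSingleNode("/configuration/configSections")
    [void]$configuration.InsertAfter($appSettings, $configSections)

    # Keep a backup of the original, then save the change.
    Copy-Item -Path $path -Destination "$path.bak"
    $xml.Save($path)
}
```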
