How to deal with costly building operations using MemoryCache?

Tags: c#, .net, memorycache

C# Problem Overview


On an ASP.NET MVC project we have several instances of data that require a good amount of resources and time to build. We want to cache them.

MemoryCache provides a certain level of thread safety, but not enough to avoid running multiple instances of the building code in parallel. Here is an example:

var data = cache["key"];
if(data == null)
{
  data = buildDataUsingGoodAmountOfResources();
  cache["key"] = data;
}

As you can see, on a busy website hundreds of threads could go inside the if statement simultaneously until the data is built, making the building operation even slower and unnecessarily consuming server resources.

There is an atomic AddOrGetExisting implementation in MemoryCache, but it incorrectly requires the "value to set" instead of "code to retrieve the value to set", which I think renders the method almost completely useless.
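To illustrate why (a sketch using the question's hypothetical helper): by the time you can call AddOrGetExisting, the expensive value must already have been built, even when the cache already holds an entry.

```csharp
// the expensive work runs unconditionally...
var builtAnyway = buildDataUsingGoodAmountOfResources();
// ...and is thrown away if another thread already cached an entry;
// AddOrGetExisting returns the existing entry, or null if it added ours
var existing = cache.AddOrGetExisting("key", builtAnyway, ObjectCache.InfiniteAbsoluteExpiration);
var data = existing ?? builtAnyway;
```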

We have been using our own ad-hoc scaffolding around MemoryCache to get it right; however, it requires explicit locks. It's cumbersome to use per-entry lock objects, and we usually get by with shared lock objects, which is far from ideal. That made me think that the reasons to avoid such a convention could be intentional.

So I have two questions:

  • Is it a better practice not to lock the building code? (Not locking could prove more responsive, for one, I wonder.)

  • What's the right way to achieve per-entry locking with MemoryCache? (The strong urge to use the key string as the lock object is dismissed in ".NET locking 101".)

C# Solutions


Solution 1 - C#

We solved this issue by combining Lazy<T> with AddOrGetExisting to avoid the need for a lock object completely. Here is some sample code (which uses infinite expiration):

public T GetFromCache<T>(string key, Func<T> valueFactory) 
{
    var newValue = new Lazy<T>(valueFactory);
    // the line below returns the existing item, or adds the new value if it doesn't exist
    var value = (Lazy<T>)cache.AddOrGetExisting(key, newValue, ObjectCache.InfiniteAbsoluteExpiration);
    return (value ?? newValue).Value; // Lazy<T> handles the locking itself
}

That's not complete. There are gotchas like "exception caching", so you have to decide what you want to do in case your valueFactory throws an exception. One of the advantages, though, is the ability to cache null values too.
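One way to handle the exception-caching gotcha (a sketch, not part of the original answer) is to construct the Lazy<T> with LazyThreadSafetyMode.PublicationOnly, which retries the factory on the next access instead of caching a thrown exception. (Solution 6 below uses the same mode.)

```csharp
public T GetFromCache<T>(string key, Func<T> valueFactory)
{
    // PublicationOnly: if valueFactory throws, the exception is not cached
    // in the Lazy<T>, so the next caller re-runs the factory. The trade-off
    // is that several racing threads may each run the factory once; only
    // the first published result wins.
    var newValue = new Lazy<T>(valueFactory, LazyThreadSafetyMode.PublicationOnly);
    var value = (Lazy<T>)cache.AddOrGetExisting(key, newValue, ObjectCache.InfiniteAbsoluteExpiration);
    return (value ?? newValue).Value;
}
```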

Solution 2 - C#

For the conditional add requirement, I always use ConcurrentDictionary, which has an overloaded GetOrAdd method which accepts a delegate to fire if the object needs to be built.

ConcurrentDictionary<string, object> _cache =
  new ConcurrentDictionary<string, object>();

public object GetOrAdd(string key)
{
  return _cache.GetOrAdd(key, (k) => {
    // here 'k' is actually the same as 'key'
    return buildDataUsingGoodAmountOfResources();
  });
}

In reality I almost always use static concurrent dictionaries. I used to have 'normal' dictionaries protected by a ReaderWriterLockSlim instance, but as soon as I switched to .NET 4 (ConcurrentDictionary is only available from .NET 4 onwards) I started converting any of those that I came across.

ConcurrentDictionary's performance is admirable to say the least :)

Update: a naive implementation with expiration semantics based on age only. It also ensures that individual items are only created once, as per @usr's suggestion. Update again: as @usr has suggested, simply using a Lazy<T> is a lot simpler; you can just forward the creation delegate to it when adding it to the concurrent dictionary. I've changed the code, as my dictionary of locks wouldn't have worked anyway.

I do recommend implementing the IRegisteredObject interface with this, though, and then registering it with the HostingEnvironment.RegisterObject method; doing that would provide a cleaner way to shut down the poller thread when the application pool shuts down or recycles.
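A minimal sketch of that registration (the `_stopped` flag is my addition; the poller loop in the class below would check it instead of looping forever):

```csharp
using System.Web.Hosting;

public class RegisteredConcurrentCache : IRegisteredObject
{
    // the poller thread checks this flag each iteration (my addition)
    private volatile bool _stopped;

    public RegisteredConcurrentCache()
    {
        // ask ASP.NET to notify us when the app domain is going away
        HostingEnvironment.RegisterObject(this);
    }

    // called by the hosting environment on shutdown/recycle
    public void Stop(bool immediate)
    {
        _stopped = true; // the poller thread exits on its next iteration
        HostingEnvironment.UnregisterObject(this);
    }
}
```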

public class ConcurrentCache : IDisposable
{
  private readonly ConcurrentDictionary<string, Tuple<DateTime?, Lazy<object>>> _cache =
    new ConcurrentDictionary<string, Tuple<DateTime?, Lazy<object>>>();

  private readonly Thread _expireThread;

  public ConcurrentCache()
  {
    // a field initializer can't reference an instance method,
    // so the thread is created and started here
    _expireThread = new Thread(ExpireMonitor) { IsBackground = true };
    _expireThread.Start();
  }

  public void Dispose()
  {
    //yeah, nasty, but this is a 'naive' implementation :)
    _expireThread.Abort();
  }

  private void ExpireMonitor()
  {
    while (true)
    {
      Thread.Sleep(1000);
      DateTime expireTime = DateTime.Now;
      var toExpire = _cache.Where(kvp => kvp.Value.Item1 != null &&
        kvp.Value.Item1.Value < expireTime).Select(kvp => kvp.Key).ToArray();
      Tuple<DateTime?, Lazy<object>> removed;
      foreach (var key in toExpire)
      {
        _cache.TryRemove(key, out removed);
      }
    }
  }

  public object CacheOrAdd(string key, Func<string, object> factory,
    TimeSpan? expiry)
  {
    return _cache.GetOrAdd(key, (k) => {
      // here 'k' is actually the same as 'key'; the Lazy<object>
      // ensures the factory runs only once per entry
      return Tuple.Create(
        expiry.HasValue ? DateTime.Now + expiry.Value : (DateTime?)null,
        new Lazy<object>(() => factory(k)));
    }).Item2.Value;
  }
}

Solution 3 - C#

Taking the top answer into C# 7, here's my implementation that allows storage from any source type T to any return type TResult.

/// <summary>
/// Creates a GetOrRefreshCache function with encapsulated MemoryCache.
/// </summary>
/// <typeparam name="T">The type of inbound objects to cache.</typeparam>
/// <typeparam name="TResult">How the objects will be serialized to cache and returned.</typeparam>
/// <param name="cacheName">The name of the cache.</param>
/// <param name="valueFactory">The factory for storing values.</param>
/// <param name="keyFactory">An optional factory to choose cache keys.</param>
/// <returns>A function to get or refresh from cache.</returns>
public static Func<T, TResult> GetOrRefreshCacheFactory<T, TResult>(string cacheName, Func<T, TResult> valueFactory, Func<T, string> keyFactory = null) {
    // a bare lambda has no type of its own in C# 7, so it needs a cast to be an operand of ??
    var getKey = keyFactory ?? ((Func<T, string>)(obj => obj.GetHashCode().ToString()));
    var cache = new MemoryCache(cacheName);
    // Thread-safe lazy cache
    TResult getOrRefreshCache(T obj) {
        var key = getKey(obj);
        var newValue = new Lazy<TResult>(() => valueFactory(obj));
        var value = (Lazy<TResult>) cache.AddOrGetExisting(key, newValue, ObjectCache.InfiniteAbsoluteExpiration);
        return (value ?? newValue).Value;
    }
    return getOrRefreshCache;
}
Usage
/// <summary>
/// Get a JSON object from cache or serialize it if it doesn't exist yet.
/// </summary>
private static readonly Func<object, string> GetJson =
    GetOrRefreshCacheFactory<object, string>("json-cache", JsonConvert.SerializeObject);


var json = GetJson(new { foo = "bar", yes = true });

Solution 4 - C#

Sedat's solution of combining Lazy<T> with AddOrGetExisting is inspiring. I must point out that this solution has a performance issue, which seems very important for a caching solution.

If you look at the code of AddOrGetExisting(), you will find that it is not a lock-free method. Compared to the lock-free Get() method, it wastes one of the advantages of MemoryCache.

I would like to recommend the following solution: use Get() first, and then use AddOrGetExisting() to avoid creating the object multiple times.

public T GetFromCache<T>(string key, Func<T> valueFactory) 
{
    // the cache stores Lazy<T> wrappers, so read one back out
    var value = (Lazy<T>)cache.Get(key);
    if (value != null)
    {
        return value.Value;
    }

    var newValue = new Lazy<T>(valueFactory);
    // the line below returns the existing item, or adds the new value if it doesn't exist
    var oldValue = (Lazy<T>)cache.AddOrGetExisting(key, newValue, ObjectCache.InfiniteAbsoluteExpiration);
    return (oldValue ?? newValue).Value; // Lazy<T> handles the locking itself
}

Solution 5 - C#

Here is a design that follows what you seem to have in mind. The first lock only happens for a short time. The final call to data.Value also locks (underneath), but clients will only block if two of them are requesting the same item at the same time.

public DataType GetData()
{
  // declared outside the lock so it is still in scope for the return
  Lazy<DataType> data;
  lock (_privateLockingField)
  {
    data = cache["key"] as Lazy<DataType>;
    if (data == null)
    {
      data = new Lazy<DataType>(() => buildDataUsingGoodAmountOfResources());
      cache["key"] = data;
    }
  }

  return data.Value;
}

Solution 6 - C#

Here is a simple solution as a MemoryCache extension method.

public static class MemoryCacheExtensions
{
    public static T LazyAddOrGetExistingItem<T>(this MemoryCache memoryCache, string key, Func<T> getItemFunc, DateTimeOffset absoluteExpiration)
    {
        var item = new Lazy<T>(
            () => getItemFunc(),
            LazyThreadSafetyMode.PublicationOnly // Do not cache lazy exceptions
        );

        var cachedValue = memoryCache.AddOrGetExisting(key, item, absoluteExpiration) as Lazy<T>;

        return (cachedValue != null) ? cachedValue.Value : item.Value;
    }
}

And a test for it, which doubles as a usage example.

[TestMethod]
[TestCategory("MemoryCacheExtensionsTests"), TestCategory("UnitTests")]
public void MemoryCacheExtensions_LazyAddOrGetExistingItem_Test()
{
    const int expectedValue = 42;
    const int cacheRecordLifetimeInSeconds = 42;

    var key = "lazyMemoryCacheKey";
    var absoluteExpiration = DateTimeOffset.Now.AddSeconds(cacheRecordLifetimeInSeconds);

    var lazyMemoryCache = MemoryCache.Default;

    #region Cache warm up

    var actualValue = lazyMemoryCache.LazyAddOrGetExistingItem(key, () => expectedValue, absoluteExpiration);
    Assert.AreEqual(expectedValue, actualValue);

    #endregion

    #region Get value from cache

    actualValue = lazyMemoryCache.LazyAddOrGetExistingItem(key, () => expectedValue, absoluteExpiration);
    Assert.AreEqual(expectedValue, actualValue);

    #endregion
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Sedat Kapanoglu | View Question on Stackoverflow
Solution 1 - C# | Sedat Kapanoglu | View Answer on Stackoverflow
Solution 2 - C# | Andras Zoltan | View Answer on Stackoverflow
Solution 3 - C# | cchamberlain | View Answer on Stackoverflow
Solution 4 - C# | Albert Ma | View Answer on Stackoverflow
Solution 5 - C# | Neil | View Answer on Stackoverflow
Solution 6 - C# | Oleg | View Answer on Stackoverflow