alternative to memcached that can persist to disk

Java, Caching, Persistence, Memcached, Distributed

Java Problem Overview


I am currently using memcached with my java app, and overall it's working great.

The features of memcached that are most important to me are:

  • it's fast, since reads and writes are in-memory and don't touch the disk
  • it's just a key/value store (since that's all my app needs)
  • it's distributed
  • it uses memory efficiently by having each object live on exactly one server
  • it doesn't assume that the objects are from a database (since my objects are not database objects)

However, there is one thing that I'd like to do that memcached can't do. I want to periodically (perhaps once per day) save the cache contents to disk. And I want to be able to restore the cache from the saved disk image.

The disk save does not need to be very complex. If a new key/value is added while the save is taking place, I don't care if it's included in the save or not. And if an existing key/value is modified while the save is taking place, the saved value should be either the old value or the new value, but I don't care which one.

Can anyone recommend another caching solution (either free or commercial) that has all (or a significant percentage) of the memcached features that are important to me, and also allows the ability to save and restore the entire cache from disk?

Java Solutions


Solution 1 - Java

I have never tried it, but what about Redis?
Its homepage says (quoting):

> Redis is a key-value database. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements.
>
> In order to be very fast but at the same time persistent the whole dataset is taken in memory and from time to time and/or when a number of changes to the dataset are performed it is written asynchronously on disk. You may lost the last few queries that is acceptable in many applications but it is as fast as an in memory DB (Redis supports non-blocking master-slave replication in order to solve this problem by redundancy).

It seems to answer some points you talked about, so maybe it might be helpful, in your case?

If you try it, I'm pretty interested in what you find out, btw ;-)


As a side note: if you need to write all this to disk, maybe a cache system is not really what you need. After all, if you are using memcached as a cache, you should be able to repopulate it on demand whenever necessary. Still, I admit, there might be some performance problems if your whole memcached cluster goes down at once...

So, maybe some "more" key/value store oriented software could help? Something like CouchDB, for instance?
It will probably not be as fast as memcached, though, as data is not stored in RAM but on disk...

Solution 2 - Java

Maybe your problem is like mine: I have only a few machines for memcached, but with lots of memory. Even if one of them fails or needs to be rebooted, it seriously affects the performance of the system. According to the original memcached philosophy I should add a lot more machines with less memory for each, but that's not cost-efficient and not exactly "green IT" ;)

For our solution, we built an interface layer for the cache system so that the providers for the underlying cache systems can be nested, like you can do with streams, and wrote a cache provider for memcached as well as our own very simple Key-Value-2-disk storage provider. Then we define a weight for each cache item that represents how costly it is to rebuild the item if it cannot be retrieved from cache. The nested disk cache is only used for items with a weight above a certain threshold, maybe around 10% of all items.

When storing an object in the cache, we don't lose time, as saving to one or both caches is queued for asynchronous execution anyway. So writing to the disk cache doesn't need to be fast. Same for reads: first we go to memcached, and only if the item is not there and it is a "costly" object do we check the disk cache (which is orders of magnitude slower than memcached, but still so much better than recalculating 30 GB of data after a single machine went down).

This way we get the best from both worlds, without replacing memcached by anything new.
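To make the nesting idea concrete, here is a minimal sketch in plain Java. It is not our production code: `CacheProvider`, `MemoryProvider`, `DiskProvider` and `TieredCache` are hypothetical names, the in-memory map only stands in for a real memcached client, and writes are synchronous here for brevity, whereas the setup described above queues them asynchronously.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical provider interface; weight = how costly an item is to rebuild. */
interface CacheProvider {
    void put(String key, byte[] value, int weight);
    byte[] get(String key, int weight); // returns null on a miss
}

/** Stand-in for a memcached-backed provider (assumption: a plain in-memory map). */
class MemoryProvider implements CacheProvider {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();
    public void put(String key, byte[] value, int weight) { entries.put(key, value); }
    public byte[] get(String key, int weight) { return entries.get(key); }
    void clear() { entries.clear(); } // simulates a memcached restart
}

/** Very simple key-value-to-disk provider: one file per key. */
class DiskProvider implements CacheProvider {
    private final Path dir;

    DiskProvider(Path dir) {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.dir = dir;
    }

    private Path fileFor(String key) {
        // Hex-encode the key so it is always a valid file name.
        StringBuilder name = new StringBuilder();
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            name.append(String.format("%02x", b & 0xff));
        }
        return dir.resolve(name.toString());
    }

    public void put(String key, byte[] value, int weight) {
        try {
            Files.write(fileFor(key), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public byte[] get(String key, int weight) {
        Path file = fileFor(key);
        try {
            return Files.exists(file) ? Files.readAllBytes(file) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Nests a slow provider behind a fast one; only costly items touch the slow tier. */
class TieredCache implements CacheProvider {
    private final CacheProvider fast;
    private final CacheProvider slow;
    private final int costlyThreshold;

    TieredCache(CacheProvider fast, CacheProvider slow, int costlyThreshold) {
        this.fast = fast;
        this.slow = slow;
        this.costlyThreshold = costlyThreshold;
    }

    public void put(String key, byte[] value, int weight) {
        fast.put(key, value, weight);
        if (weight > costlyThreshold) {
            slow.put(key, value, weight);
        }
    }

    public byte[] get(String key, int weight) {
        byte[] value = fast.get(key, weight);
        if (value == null && weight > costlyThreshold) {
            value = slow.get(key, weight);
            if (value != null) {
                fast.put(key, value, weight); // warm the fast tier again
            }
        }
        return value;
    }
}
```

Because `TieredCache` implements the same interface as the providers it wraps, tiers can be stacked further, which is the stream-like nesting mentioned above.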

Solution 3 - Java

EhCache has a "disk persistent" mode which dumps the cache contents to disk on shutdown and reinstates the data when started back up again. As for your other requirements, when running in distributed mode it replicates the data across all nodes rather than storing each object on just one. Other than that, it should fit your needs nicely. It's also still under active development, which many other Java caching frameworks are not.

Solution 4 - Java

Try go-memcached, a memcache server written in Go. It persists cached data to disk out of the box and is compatible with memcache clients. It has the following features missing in the original memcached:

  • Cached data survive server crashes and/or restarts.
  • Cache size may exceed available RAM size by multiple orders of magnitude.
  • There is no 250-byte limit on key size.
  • There is no 1 MB limit on value size. Value size is actually limited to 2 GB.
  • It is faster than the original memcached. It also uses less CPU when serving incoming requests.

Here are performance numbers obtained via go-memcached-bench:

+------------+-----------------+--------------------+
|            |  go-memcached   | original memcached |
|            |       v1        |      v1.4.13       |
| workerMode +------+----------+--------+-----------+
|            | Kqps | cpu time | Kqps   | cpu time  |
+------------+------+----------+--------+-----------+
| GetMiss    |  648 |       17 |    468 |        33 |
| GetHit     |  195 |       16 |    180 |        17 |
| Set        |  204 |       14 |    182 |        25 |
| GetSetRand |  164 |       16 |    157 |        20 |
+------------+------+----------+--------+-----------+

Statically linked binaries for go-memcached and go-memcached-bench are available on the downloads page.

Solution 5 - Java

Take a look at the Apache Java Caching System (JCS)

> JCS is a distributed caching system written in java. It is intended to speed up applications by providing a means to manage cached data of various dynamic natures. Like any caching system, JCS is most useful for high read, low put applications. Latency times drop sharply and bottlenecks move away from the database in an effectively cached system. Learn how to start using JCS.
>
> The JCS goes beyond simply caching objects in memory. It provides numerous additional features:
>
> * Memory management
> * Disk overflow (and defragmentation)
> * Thread pool controls
> * Element grouping
> * Minimal dependencies
> * Quick nested categorical removal
> * Data expiration (idle time and max life)
> * Extensible framework
> * Fully configurable runtime parameters
> * Region data separation and configuration
> * Fine grained element configuration options
> * Remote synchronization
> * Remote store recovery
> * Non-blocking "zombie" (balking facade) pattern
> * Lateral distribution of elements via HTTP, TCP, or UDP
> * UDP Discovery of other caches
> * Element event handling
> * Remote server chaining (or clustering) and failover
> * Custom event logging hooks
> * Custom event queue injection
> * Custom object serializer injection
> * Key pattern matching retrieval
> * Network efficient multi-key retrieval

Solution 6 - Java

I think membase is what you want.

Solution 7 - Java

In my experience, it is best to write an intermediate layer between the application and the backend storage. This way you can pair up memcached instances with, for example, sharedanced (basically the same kind of key-value store, but disk-based). The most basic way to do this is to always read from memcached and fall back to sharedanced, and to always write to both sharedanced and memcached.

You can scale writes by sharding between multiple sharedance instances. You can scale reads N-fold by using a solution like repcached (replicated memcached).
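Client-side sharding of this kind can be sketched as follows. This is only an illustration under assumptions: `ShardSelector` is a hypothetical helper, the shard names are placeholders for whatever addresses your storage instances have, and a simple CRC32-modulo scheme is used, which is not necessarily what any particular client library does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical helper that maps each key to exactly one storage instance. */
class ShardSelector {
    private final List<String> shards;

    ShardSelector(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = new ArrayList<>(shards); // defensive copy
    }

    /** Returns the shard responsible for the given key. */
    String shardFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the modulo is a valid index.
        return shards.get((int) (crc.getValue() % shards.size()));
    }
}
```

The same key always maps to the same instance, so reads and writes for a key meet on one server. Note that with plain modulo sharding, changing the number of instances remaps most keys, so the instance list should be kept stable.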

If this is not trivial for you, you can still use sharedanced as a basic replacement for memcached. It is fast, and most of the filesystem calls are eventually cached. Using memcached in combination with sharedance only avoids reading from sharedanced until some data expires in memcached. A restart of the memcached servers would cause all clients to read from the sharedance instance at least once, which is not really a problem unless you have extremely high concurrency for the same keys and clients contend for the same key.

There are certain issues if you are dealing with a severely high-traffic environment. One is the choice of filesystem (reiserfs performs 5-10x better than ext3 because of some internal caching of the fs tree). Another is that sharedance does not have UDP support (TCP keepalive is quite an overhead if you use sharedance only; memcached has UDP thanks to the Facebook team). Finally, scaling is usually done in your application (by sharding data across multiple instances of sharedance servers).

If you can leverage these factors, then this might be a good solution for you. In our current setup, a single sharedanced/memcache server can scale up to about 10 million pageviews a day, but this is application-dependent. We don't use caching for everything (like Facebook), so results may vary when it comes to your application.

And now, a good 2 years later, Membase is a great product for this. Or Redis, if you need additional functionality like Hashes, Lists, etc.

Solution 8 - Java

Have you looked at BerkeleyDB?

  • Fast, embedded, in-process data management.
  • Key/value store, non-relational.
  • Persistent storage.
  • Free, open-source.

However, it fails to meet one of your criteria:

  • BDB supports distributed replication, but the data is not partitioned. Each node stores the full data set.

Solution 9 - Java

What about Terracotta?

Solution 10 - Java

Oracle NoSQL is based on BerkeleyDB (the solution that Bill Karwin pointed to), but adds sharding (partitioning of the data set) and elastic scale-out. See: http://www.oracle.com/technetwork/products/nosqldb/overview/index.html

I think it meets all of the requirements of the original question.

For the sake of full disclosure, I work at Oracle (but not on the Oracle NoSQL product). The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.

Solution 11 - Java

memcached can be replaced by Couchbase, an open source and commercial continuation of this product line. It persists data to disk (very efficiently and configurably). The original authors of memcached have also been working on Couchbase, and it's compatible with the memcached protocol, so you don't need to change your client application code! It's a high-performance product and comes with 24/7 clustering and Cross Datacenter Replication (XDCR) built in. See the technical paper.

Solution 12 - Java

You could use Tarantool (http://tarantool.org). It is an in-memory database with persistence, master-master replication and scriptable key expiration rules - https://github.com/tarantool/expirationd

Solution 13 - Java

We are using OSCache. I think it meets almost all your needs except periodically saving the cache to disk, but you should be able to create two cache managers (one memory-based and one disk-based) and periodically run a Java cron job that goes through all in-memory key/value pairs and puts them into the disk cache. What's nice about OSCache is that it is very easy to use.
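The periodic copy could look roughly like the sketch below. It deliberately uses only the JDK rather than the OSCache API: `SnapshottingCache` is a hypothetical class, a `ConcurrentHashMap` stands in for the in-memory cache, and Java serialization to a single file stands in for the disk-based cache. Iterating a `ConcurrentHashMap` is weakly consistent, so an entry changed during the save ends up with either its old or its new value, which matches what the question asks for.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-memory cache that can be saved to and restored from one file. */
class SnapshottingCache {
    private final ConcurrentHashMap<String, String> entries = new ConcurrentHashMap<>();
    private final Path snapshotFile;

    SnapshottingCache(Path snapshotFile) {
        this.snapshotFile = snapshotFile;
    }

    void put(String key, String value) { entries.put(key, value); }

    String get(String key) { return entries.get(key); }

    /** Writes a weakly consistent copy of the cache to disk. */
    void save() {
        // Write to a temporary file first so a crash never leaves a half-written snapshot.
        Path tmp = snapshotFile.resolveSibling(snapshotFile.getFileName() + ".tmp");
        try {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(new HashMap<>(entries));
            }
            Files.move(tmp, snapshotFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the last snapshot, if there is one, into the in-memory cache. */
    void restore() {
        if (!Files.exists(snapshotFile)) {
            return;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(snapshotFile))) {
            @SuppressWarnings("unchecked")
            Map<String, String> saved = (Map<String, String>) in.readObject();
            entries.putAll(saved);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Schedules save() at a fixed rate; the caller shuts the scheduler down. */
    ScheduledExecutorService startPeriodicSave(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::save, period, period, unit);
        return scheduler;
    }
}
```

For a daily save you would call something like `cache.startPeriodicSave(1, TimeUnit.DAYS)` at startup, after calling `restore()` once to load the previous image.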

Solution 14 - Java

You can use GigaSpaces XAP, a mature commercial product that answers your requirements and more. It is the fastest distributed in-memory data grid (cache++), it is fully distributed, and it supports multiple styles of persistence.

Guy Nirpaz, GigaSpaces

Solution 15 - Java

Just to complete this list: I just found Couchbase. However, I haven't tested it yet.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type       | Original Author   |
|--------------------|-------------------|
| Question           | Mike W            |
| Solution 1 - Java  | Pascal MARTIN     |
| Solution 2 - Java  | realMarkusSchmidt |
| Solution 3 - Java  | skaffman          |
| Solution 4 - Java  | valyala           |
| Solution 5 - Java  | Mads Hansen       |
| Solution 6 - Java  | Benjamin Nitlehoo |
| Solution 7 - Java  | Tit Petric        |
| Solution 8 - Java  | Bill Karwin       |
| Solution 9 - Java  | Artyom Sokolov    |
| Solution 10 - Java | cpurdy            |
| Solution 11 - Java | user1697575       |
| Solution 12 - Java | user3666759       |
| Solution 13 - Java | serg              |
| Solution 14 - Java | gnirpaz           |
| Solution 15 - Java | rudi              |