Space Efficiency of memcached and Kyoto Tycoon

ID: 17
creation date: 2010/11/14 00:26
modification date: 2010/11/14 00:26
owner: mikio

The "StashDB" was added by the latest versions of Kyoto Cabinet and Kyoto Tycoon. It improves space efficiency and now surpasses memcached in space efficiency.

image:1:1289662060-kc-mc-graph.png

What's StashDB

Kyoto Cabinet supports a local MapReduce utility, which uses the "TinyHashMap" class. Although it is a naive implementation of static hash table, all data on memory is serialized in a space-saving format so that memory usage is curbed a lot.

StashDB implements the same structure as TinyHashMap. It is the ninth DBM succeeding to ProtoHashDB, ProtoTreeDB, CacheDB, GrassDB, HashDB, TreeDB, DirDB, and ForestDB. Of cource, it implements Visitor interface, Cursor interface, transaction interface, and other all interfaces compatible with other DBM implementations. In addition to such functionality and the best time efficiency by hash table, the space efficiency is the second best of all DBMs. Although GrassDB of B+ tree has the best space efficiency of all, StashDB is the best in hash table implementations.

As the default database of Kyoto Tycoon was CacheDB formerly, it was changed to StashDB by the latest version.

Spece Efficiency

Because StashDB uses static hashing, it is desirable to set the decent number of hash buckets as many as the number of records. One million buckets are desired if you expect to store one million records. The default bucket number is one million. As one bucket takes 8 bytes space in 64-bit environment, one million buckets take about 8MB memory and ten million buckets take about 80MB memory. They don't come to an issue if using recent machines. However, note that performance comes to decline in linear order after the record number exceeds the bucket number.

Space overhead per record is 18 bytes. On the assumption that typical size of a key and a value is less than 128, space required by the size information takes only one byte for each data. The size of the pointer to express the hash chain is 8 bytes. The total of them are 10 bytes. Moreover, if the bucket number is the same as the record number, each record takes 8 bytes in addition. So, the total size becomes 18 bytes.

Other space overhead is caused by the underlying memory manager (malloc). As StashDB allocates a region for each record from the memory manager, total overhead becomes fairly large. Although I don't know detail implementation of the memory manager, from a commonsense standpoint, it would take overhead by several pointers and alignment.

Comparison with memcached

memcached is a popular cache server, which implements its own memory manager called "slab allocator" so as to avoid overhead of time and space. It was designed to solve problems of the underlying malloc package. However, I think newer implementations of malloc are based on slab allocation algorithm and applications' own implementation is not needed any longer. GNU libc malloc is based on dlmalloc.

I did a benchmarking test to compare space efficiency of memcached and Kyoto Tycoon. I stored ten million records whose key and value is 8-byte strings such as "00000001", "00000002", and so on. Memory usage was checked by "cat /proc/(pid)/status".

I ran the server of memcached by the following command. That is, all setting was the default except that the limit memory usage was 1GB.

memcached -m 1073741824

The client to send record data was the following Perl script.

use Cache::Memcached;

my $memd = Cache::Memcached->new();
$memd->set_servers(["localhost:11211"]);

for (my $i = 1; $i <= 10000000; $i++) {
    my $key = sprintf("%08d", $i);
    $memd->set($key, $key);
}

I ran the server of Kyoto Tycoon by the following command. That is, all setting was the default except that the log level was raised to SYSTEM.

ktserver -ls

The client to send record data was the following command.

ktremotetest order -set 10000000

Now, I've got the result.

memory usage
memcached 981 MB
Kyoto Tycoon 553 MB

Space efficiency of Kyoto Tycoon seems better than that of memcached. I don't say that Kyoto Tycoon is better than memcached, but say glibc's malloc seems better than memcachd's slab allocator in space efficiency. BTW, as you might say that the option "-m 1073741824" caused that inefficiency, it is not true. Memcached doesn't take maximum memory to the limitation on start-up but takes required memory gradually.

Conclusion

StashDB is a memory-saving on-memory DBM. Its space efficiency is a kind of the best for associative array of strings. You can use StashDB in scripting languages by opening database objects with the database name of ":". I hope it becomes your favorite.

Space efficiency of Kyoto Tycoon using StashDB is better than that of memcached. Although StashDB allocates respective regions for each record, efficiency of glibc's malloc seems to compensate for it.

2 comments
dyu : No time-efficiency (throughput) benchmark? :-) (2010/11/14 08:49)
mikio : To show reasonable indices about time efficiency (throughput), I should prepare dozens of client machines and use the best optimized client library. However, it is too hard task for this article. I hope to do it in the near future. (2010/11/14 23:50)
riddle for guest comment authorization:
Where is the capital city of Japan? ...