Added Directory Database

ID: 9
creation date: 2010/07/23 11:12
modification date: 2010/07/23 11:12
owner: mikio

I've released Kyoto Cabinet 1.1.0, which features the directory database. It is powered by the directory mechanism of the file system and stores records as respective files in a directory.

What is the Directory Database?

Performance of the directory database is strongly based on the file system implementation and its tuning. Some file systems such as EXT2 are not good at storing a lot of files in a directory. But, other file systems are relatively efficient in that situation. In general, file systems featuring B tree or its variants are more suitable than those featuring linear search algorithms. If you have doubt still, please install the latest version of Kyoto Cabinet and try the following command on a modern file system.

kcdirtest order -set casket 100000

It stores records as 100,000 files in one directory and takes only 2.8 seconds in my environment (Linux 2.6.32, EXT3 file system). It indicates that the storing performance is 35,323 QPS until 100,000 records.

How about one million records? It causes heavy disk access and takes 122 seconds (8196 QPS). I should admit that the performance of the directory database is inferior to the file hash database. However, 8196 QPS seems good enough for a lot of casual usage.

Why you use the Directory Database?

Most DBM implementations including Kyoto Cabinet and Tokyo Cabinet are optimized to store small records. As it is, if you want to handle very large records, using the file system directly is better solution than using DBM. If you store and retrieve large records, the processing time by the `write' and `read' system calls is dominant rather than the `open' and `lseek' system calls. Although typical DBMs reduce the workload to locate the position of each record, they increase the workload to read/write the data of each record.

That is, if you want to handle large records but don't want to use the file system directly, use the directory database. Although the directory database is a mere wrapper of the directory mechanism of the file system, it provides a good abstraction to improve portability and maintainability. I also dislike to write "HANDLE fh = CreateFile(path, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL, OPEN_EXISTING, NULL); ..." in application code.

Performance Comparison

I did some tests to store 100,000 records into various databases and then retrieve/remove the records. Each key is an 8 byte numeric string and the value is the same string. My machine has Core i7 CPU 2.0GHz and 4GB RAM.

set get remove
HashDB (EXT2) 0.113 0.094 0.088
TreeDB (EXT2) 0.071 0.085 0.072
DirDB(EXT2) 261.185 0.417 0.953
DirDB(EXT3) 2.917 0.438 8.356
DirDB(EXT4) 4.821 1.646 3.047
DirDB(XFS) 653.015 2.012 507.505
DirDB(ReiserFS) 5.561 0.795 3.817

As the above, the file hash database and the file tree database are much faster than the directory database. And, the performance of the directory database is blazingly swayed among underlying file systems. I think that using EXT3 and ReiserFS are reasonable solutions.

Especially, ReiserFS seems promising. Like DBM, it is optimized to handle relatively small files. Even when storing one million records, it does not cause heavy disk access and takes only 65 seconds.

Additional Features

As with the other database types of Kyoto Cabinet, the directory database supports visitor interface and trancaction. You can convert each record (each file) atomically by a call back function of visitor interface. You can commit a series of operations atomically or rollback it by transaction.

As with the other database types of Kyoto Cabinet, the mechanism for durability is implemented. While transaction (both of explicit transaction and auto transaction), the original data of every updated records is escaped as another file and replace them atomically by the `rename' system call.

The directory database is available from the polymorphic database and all the script bindings. Open the database with a name whose suffix is ".kcd" or specify the type option as "#type=kcd".

Conclusion

Although the directory database is not very fast, it is very easy to use and has reasonable performance. When I was asked how to handle very large data by users, I answered them to use file systems directly. However, from now, I answer them to use the directory database.

8 comments
Xavier : Did you take a look at btrfs ? Btree file system, with transactions built in. (2010/07/23 18:20)
mikio : I know btrfs. But, I've not tried it yet. I'm waiting for its stable release. (2010/07/23 19:51)
Yegor : Hi. Will you develop Tokyo Cabinet or Kyoto Cabinet is its full successor and people should migrate to it in the future? (2010/07/23 20:00)
mikio : Though I maintain TC, I don't add new feature to TC. I don't think that KC is the full successor of TC. In fact, TC is strongly optimized and much faster than KC in most cases. (2010/07/23 20:22)
mikio : When parallel processors with more than 8 cores are popular, the market value of KC will exceed that of TC. (2010/07/23 20:25)
Yegor : Thanks for clarification. (2010/07/23 21:08)
Pranjul : I need to store billions of records. Presently my algorithm stores 1 million per database file. What would you recommend TC or KC taking into account that I need fast retrieval of records ? (2010/09/01 13:19)
mikio : Now, KC is recommended for almost all use cases. (2010/09/01 19:43)
riddle for guest comment authorization:
Where is the capital city of Japan? ...