In the last few years non-relational data stores have drawn the attention of researchers and pratictioners. Their lack of fixed table schemas and the avoidance of join operators allow these tools to scale horizontally, storing huge amounts of data and providing high-performance access to it.

This class of databases (also known as NoSQL databases) often employs a distributed architecture with the data partitioned and redundantly replicated on several nodes, providing both  and fault tolerance.

Notable examples are HBase, CouchDB, Cassandra and Project Voldemort. In the context of the SeCo project we are currently investigating Project to store the content of the server registries since it provides irremissible features such as:

  • a distributed architecture
  • partitioning
  • replication
  • low latency
  • scalability
  • pluggable persistence
  • a Java client API.

The remainder of this gives an overview of Project Voldemort and its features; we deal with the configuration of a simple cluster of Voldemort nodes and finally we discuss an example of Java client.

Project Voldemort

Project Voldemort is a distributed key-value storage system used at Linkedin “for certain high-scalability storage problems where simple functional partitioning is not sufficient.” It is a distributed, optionally persistent and fault tolerant hash table providing horizontal scalability and high-availability.

Partitioning of data is transparent to the final user and the persistence layer could be plugged-in according to the application needs (currently Berkeley DB and MySQL are supported, but additional persistence back-ends can be plugged-in writing few lines of code).

Serialization in Voldemort is pluggable too. JSON, raw strings, Java Serialization are supported out of the box; additional ad-hoc serializers can be implemented.
At the time of writing this tutorial the latest available release is 0.80.2 and the source code is available under the Apache 2.0 license.

Configuration

The configuration of a Voldemort cluster requires three files to be edited:

cluster.xml

It contains information about the cluster and its nodes (the name of the cluster for identification purposes and for each node its identifier, the hostname, the ports – listening and socket port – and the of the partitions – a comma separated list of identifiers).
The cluster.xml file is exactly the same for all Voldemort nodes.
An example is shown as follows:

<cluster>
   <name>secoCluster</name>
   <server>
      <id>0</id>
      <host>voldemort1.seco.com</host>
      <http-port>8081</http-port>
      <socket-port>6666</socket-port>
      <partitions>0,1,2,3</partitions>
   </server>
   <server>
      <id>1</id>
      <host>voldemort2.seco.com</host>
      <http-port>8081</http-port>
      <socket-port>6666</socket-port>
      <partitions>4,5,6,7</partitions>
   </server>
</cluster>

The example shows the configuration of a cluster with two nodes and eight data partitions.

stores.xml

It contains information about the required number of successful reads to maintain consistency, the
required number of writes, persistence and serialization.
The stores.xml file is exactly the same for all Voldemort nodes.
An example is shown as follows:

<stores>
   <store>
      <name>seco</name>
      <replication-factor>2</replication-factor>
      <preferred-reads>2</preferred-reads>
      <required-reads>1</required-reads>
      <preferred-writes>2</preferred-writes>
      <required-writes>1</required-writes>
      <persistence>bdb</persistence>
      <routing>client</routing>
      <key-serializer>
         <type>string</type>
         <schema-info>utf8</schema-info>
      </key-serializer>
      <value-serializer>
         <type>string</type>
         <schema-info>utf8</schema-info>
      </value-serializer>
   </store>
</stores>

The example shows a store (i.e. table) whose name is seco with a replication factor of 2 (a node failure will be tolerated without data loss). Required read and write values is the least number of reads and writes that must succeed without throwing an exception; preferred values, instead, represent
the number of attempts.
The persistence back-end for the cluster is Berkeley DB and both keys and values will be serialized as strings.

server.properties

The server.properties file is the only server specific file. It contains local configuration details (such as directories and persistence configuration). The minimal property file requires only the node.id property to be set.
An example is shown as follows:

node.id=0
max.threads=100
http.enable=true
socket.enable=true
# BDB
bdb.write.transactions=false
bdb.flush.transactions=false
bdb.cache.size=1G

A maximum amount of 100 threads will be allocated to the server, both the socket and HTTP server will be enabled and persistence (based on Berkeley DB) will not require the immediate writing of transactions to the disk and the flush of the OS cache. 1 GB will be dedicated to the Berkeley DB cache.

Shell

Project Voldemort provides a set of command-line clients to interact with a Voldemort cluster. To run a Voldemort shell, please run the following command from the /bin directory of your installation:

sh voldemort-shell.sh <store> <bootstrapURL>

where

  • <store> is the name of a store (as defined in stores.xml)
  • <bootstrapURL> is the contact URL of the server (e.g. tcp://voldemort1.seco.com:6666).

From the Voldemort interactive shell, the user can:

  • add data to the store – put "key" "value"
  • read data from the store – get "key"
  • delete data from the store – delete "key"

Java client example

Project Voldemort is developed in Java and a Java Client API is available. An example of Java client is described as follows.

package seco.voldemort.example;
import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;
public class VoldemortExample {
   public static void main(String[] args) {
      final String bootstrapUrl = "tcp://voldemort1.seco.com:6666";
      final StoreClientFactory factory =
         new SocketStoreClientFactory(new ClientConfig().setBootstrapUrls(bootstrapUrl));
      final StoreClient client = factory.getStoreClient("seco");
      final Versioned version = client.get("secoKey");
      version.setObject("secoValue");
      client.put("secoKey", version);
   }
}

A StoreClient object is created from the StoreClientFactory, specifying the store name.
Then an object is retrieved from the store (with key secoKey), its value is modified and finally the data is stored.

Additional resources

NoSQL databases
Anti-RDBMS: A list of distributed key-value stores
Project Voldemort