TCB: ScyllaDB
ScyllaDB - NoSQL - wide-column database
ScyllaDB is a NoSQL database written in C++ (Cassandra is written in Java). Its storage model is wide-column; by comparison, MongoDB is a document store and Redis is a key/value store.
-
Benefits
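Some of the benefits of ScyllaDB are high availability, high scalability, low maintenance, and being a drop-in replacement for Cassandra and DynamoDB. A key design decision is the shard-per-core architecture: each CPU core runs its own shard with its own share of the data, instead of sharing work across JVM thread pools as Cassandra does.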
-
Architecture
ScyllaDB runs on nodes, and a cluster usually has at least 3 of them. The cluster is organized as a ring of nodes; each node owns a token range, which lets ScyllaDB know which node holds a given piece of data. Each row has a partition key, and hashing the partition key produces the token that places the row on the ring. Nodes communicate peer-to-peer (p2p) using the gossip protocol, which is decentralized and has no single point of failure.
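The ring lookup described above can be sketched in a few lines of Python. This is a simplified model, not ScyllaDB's implementation: it assumes one token per node and uses MD5 as a stand-in hash (ScyllaDB uses a Murmur3-based partitioner), and the names `token_for`, `TokenRing` and the node names are made up for illustration.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Toy ring: every node owns the range that ends at its token."""

    def __init__(self, node_tokens: dict):
        # Sort (token, node) pairs so the ring can be binary-searched.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        idx = bisect.bisect_left(self._tokens, token_for(partition_key))
        return self._ring[idx % len(self._ring)][1]


ring = TokenRing({"node-a": 2**62, "node-b": 2**63, "node-c": 2**64 - 1})
print(ring.owner("user:42"))  # always the same node for the same key
```

The same key always hashes to the same token, so any node can work out where a row lives without asking a central coordinator.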
-
Data
The number of copies of the same data kept on different nodes is set by the replication factor (RF): with RF=3, every piece of data is replicated on 3 different nodes. When ScyllaDB writes data to a node, the node returns an acknowledgement (ACK) with the result of the operation. Whether the write as a whole counts as successful depends on the consistency level (CL): with CL=ALL every replica must acknowledge the write, while with CL=ONE a single acknowledgement is enough. CL options include: ANY, ONE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL.
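To make the relation between RF and CL concrete, here is a small Python sketch of how many ACKs a write needs. It assumes a single datacenter and covers only ONE, QUORUM and ALL; the function names are illustrative, not part of any driver API.

```python
def quorum(rf: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return rf // 2 + 1


def required_acks(cl: str, rf: int) -> int:
    """ACKs needed for a single-datacenter write at the given consistency level."""
    levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    return levels[cl]


def write_succeeds(cl: str, rf: int, acks: int) -> bool:
    """A write succeeds once enough replicas have acknowledged it."""
    return acks >= required_acks(cl, rf)


# RF=3 and one replica is down, so only 2 ACKs arrive.
for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, required_acks(cl, rf=3), write_succeeds(cl, rf=3, acks=2))
# ONE 1 True
# QUORUM 2 True
# ALL 3 False
```

With RF=3, QUORUM tolerates one unavailable replica while ALL tolerates none, which is why QUORUM is a common middle ground.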
-
Table
RF is set when you create the keyspace and applies to the tables in that keyspace. CL is not a table attribute: the client chooses it per query (or as a driver default), which is what "tunable consistency" means, so it can be changed on the fly for a single query.
-
Query Language
As mentioned, ScyllaDB is a “drop-in replacement” for Cassandra, which also means that you can write queries in CQL and use the same application drivers as with Cassandra.
-
CQL - Cassandra Query Language
SQL vs CQL: the syntax is similar, sometimes identical, but the behavior under the hood is different.
Examples
- INSERT is the same.
SQL:
INSERT INTO users (id, name) VALUES (1, 'John');
CQL:
INSERT INTO users (id, name) VALUES (1, 'John');
- CQL does not have JOIN.
SQL:
SELECT * FROM orders
JOIN users ON orders.user_id = users.id;
CQL:
Does not exist.
- GROUP BY needs to use the partition key.
SQL:
SELECT date, SUM(amount)
FROM payments
GROUP BY date;
CQL:
Not supported in this form; it works only if date is the partition key.
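- WHERE needs to use the partition key.
SQL:
SELECT * FROM orders WHERE price > 100;
CQL:
SELECT * FROM orders WHERE price > 100; -- ERROR: no partition key restriction (rejected without ALLOW FILTERING)
SELECT * FROM logs
WHERE device_id = 'A'
AND timestamp > 1000; -- Correct if device_id is the partition key and timestamp is a clustering key.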
- ORDER BY needs to use a clustering key.
SQL:
SELECT * FROM logs ORDER BY date DESC;
CQL:
SELECT * FROM logs
WHERE device_id = 'A'
ORDER BY timestamp DESC; -- timestamp needs to be a clustering key.
- UPDATE syntax is the same, but it does not behave like SQL: it performs an upsert (put), creating the row if it does not exist.
SQL:
UPDATE users SET age = 30 WHERE id = 1;
CQL:
UPDATE users SET age = 30 WHERE id = 1;
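The difference between the two UPDATE statements can be modeled with plain Python dictionaries. This is only an illustration of the semantics, with hypothetical helper names; it does not talk to any database.

```python
def sql_update(table: dict, key, **columns) -> int:
    """Model SQL UPDATE: change only existing rows and return rows affected."""
    if key not in table:
        return 0
    table[key].update(columns)
    return 1


def cql_update(table: dict, key, **columns) -> None:
    """Model CQL UPDATE: write the columns whether or not the row exists."""
    table.setdefault(key, {}).update(columns)


users_sql, users_cql = {}, {}
sql_update(users_sql, 1, age=30)  # no row with id 1, so nothing changes
cql_update(users_cql, 1, age=30)  # the row with id 1 is created
print(users_sql, users_cql)  # {} {1: {'age': 30}}
```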
-
DELETE and Tombstone
ScyllaDB is a distributed database, so DELETE works differently than in an RDBMS: it does not remove data immediately. Instead, it marks the data as deleted by writing a tombstone, and the tombstone is permanently removed only after a grace period (gc_grace_seconds, 10 days by default and configurable per table).
Too many tombstones can seriously hurt read performance, so avoid the following patterns:
- Running DELETE ... WHERE partition_key = ... on big tables
- Using TTL heavily on big tables (expired data turns into tombstones)
- Setting a column to NULL with UPDATE (this also writes a tombstone)
- Not using bucketing
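The delete lifecycle can be sketched as a toy Python model. It is heavily simplified: the `Table` class and its methods are hypothetical, write timestamps and replicas are ignored, and the only detail taken from the text is the 10-day default grace period.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # default grace period: 10 days


class Table:
    def __init__(self):
        self.rows = {}        # key -> value
        self.tombstones = {}  # key -> deletion time in seconds

    def delete(self, key, now: int) -> None:
        # DELETE only writes a marker; the data itself is still stored.
        self.tombstones[key] = now

    def read(self, key):
        # A tombstone hides the row from reads.
        return None if key in self.tombstones else self.rows.get(key)

    def compact(self, now: int) -> None:
        # Purge tombstones, and the data they cover, once the grace period is over.
        expired = [k for k, t in self.tombstones.items() if now - t >= GC_GRACE_SECONDS]
        for key in expired:
            self.rows.pop(key, None)
            del self.tombstones[key]


table = Table()
table.rows["a"] = 1
table.delete("a", now=0)
print(table.read("a"), "a" in table.rows)  # None True
table.compact(now=GC_GRACE_SECONDS)
print("a" in table.rows, "a" in table.tombstones)  # False False
```

Until compaction purges them, every read has to step over these markers, which is why large numbers of tombstones slow reads down.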
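Bucketing means adding a time (or other) bucket to the partition key so that one logical stream is spread over many smaller partitions. A minimal Python sketch of building such a key, assuming a daily bucket and a hypothetical `device_id`; the right bucket size depends on the workload:

```python
from datetime import datetime, timezone


def bucketed_partition_key(device_id: str, ts: datetime) -> tuple:
    """Return (device_id, day bucket) so each day lands in its own partition."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (device_id, day)


key = bucketed_partition_key("A", datetime(2024, 5, 17, 13, 30, tzinfo=timezone.utc))
print(key)  # ('A', '2024-05-17')
```

Reads for recent data then touch only the recent buckets instead of one ever-growing partition full of old tombstones.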
-
TTL - Time to live
TTL (time to live) is an option you can set on a write (for example INSERT ... USING TTL <seconds>) so that the inserted data is deleted automatically once that time has passed.
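The effect of a TTL can be pictured with a small Python model. `ExpiringStore` is a hypothetical class and time is passed in explicitly as seconds; in ScyllaDB itself, expired data becomes a tombstone and is cleaned up later rather than vanishing at once.

```python
class ExpiringStore:
    def __init__(self):
        self._cells = {}  # key -> (value, expires_at or None)

    def insert(self, key, value, now: int, ttl: int = None) -> None:
        # Without a TTL the cell never expires.
        expires_at = now + ttl if ttl is not None else None
        self._cells[key] = (value, expires_at)

    def read(self, key, now: int):
        value, expires_at = self._cells.get(key, (None, None))
        if expires_at is not None and now >= expires_at:
            return None  # expired cells are treated as deleted
        return value


store = ExpiringStore()
store.insert("session", "abc", now=0, ttl=60)
print(store.read("session", now=59))  # abc
print(store.read("session", now=60))  # None
```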
-
Index
Secondary indexes exist in ScyllaDB, but queries through them are usually slower than queries by partition key, so they are used sparingly.
-
Tablets - Version 6+
Version 6 of ScyllaDB introduced tablets, which replace the vNode-based distribution of token ranges. Each table is split into many small parts (tablets), and tablets can be moved between nodes to even out the load on each node. This enables truly elastic scaling, as expected in a cloud environment.
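The idea of moving tablets to even out load can be sketched as a greedy loop in Python. This is an illustration only: it counts tablets per node and ignores replicas, tablet size and traffic, and the `rebalance` function is hypothetical, not ScyllaDB's load balancer.

```python
def rebalance(placement: dict) -> dict:
    """Move tablets one at a time from the most loaded to the least loaded node."""
    nodes = {node: list(tablets) for node, tablets in placement.items()}
    while True:
        busiest = max(nodes, key=lambda n: len(nodes[n]))
        idlest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[busiest]) - len(nodes[idlest]) <= 1:
            return nodes  # balanced to within one tablet
        nodes[idlest].append(nodes[busiest].pop())


# node-c has just joined the cluster and owns nothing yet.
before = {"node-a": ["t1", "t2", "t3", "t4"], "node-b": ["t5", "t6"], "node-c": []}
after = rebalance(before)
print({node: len(tablets) for node, tablets in after.items()})
# {'node-a': 2, 'node-b': 2, 'node-c': 2}
```

Because only small tablets move, a newly added node can start taking load without reshuffling the whole ring.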