Tarantool 2.2.1 documentation

Overview

An application server together with a database manager

Tarantool is a Lua application server integrated with a database management system. It has a “fiber” model which means that many Tarantool applications can run simultaneously on a single thread, while each instance of the Tarantool server itself can run multiple threads for input-output and background maintenance. It incorporates the LuaJIT – “Just In Time” – Lua compiler, Lua libraries for most common applications, and the Tarantool Database Server which is an established NoSQL DBMS. Thus Tarantool serves all the purposes that have made node.js and Twisted popular, plus it supports data persistence.

The code is free. The open-source license is the BSD license. The supported platforms are GNU/Linux, Mac OS, and FreeBSD.

Tarantool’s creator and biggest user is Mail.Ru, the largest internet company in Russia, with 30 million users, 25 million emails per day, and a web site whose Alexa global rank is in the top 40 worldwide. Tarantool services Mail.Ru’s hottest data, such as the session data of online users, the properties of online applications, the caches of the underlying data, the distribution and sharding algorithms, and much more. Outside Mail.Ru, the software is used by a growing number of projects in the online gaming, digital marketing, and social media industries. Although Mail.Ru is the sponsor for product development, the roadmap, the bug database, and the development process are fully open. The software incorporates patches from dozens of community contributors. The Tarantool community writes and maintains most of the drivers for programming languages. The greater Lua community has hundreds of useful packages, most of which can become Tarantool extensions.

Users can create, modify and drop Lua functions at runtime. Or they can define Lua programs that are loaded during startup for triggers, background tasks, and interacting with networked peers. Unlike popular application development frameworks based on a “reactor” pattern, networking in server-side Lua is sequential, yet very efficient, as it is built on top of the cooperative multitasking environment that Tarantool itself uses.

One of the built-in Lua packages provides an API for the Database Management System. Thus some developers see Tarantool as a DBMS with a popular stored procedure language, while others see it as a Lua interpreter, while still others see it as a replacement for many components of multi-tier Web applications. Performance can be a few hundred thousand transactions per second on a laptop, scalable upwards or outwards to server farms.

Database features

Tarantool can run without the database module, but “The Box” – the DBMS server – is a strong distinguishing feature.

The database API allows for permanently storing Lua objects, managing object collections, creating or dropping secondary keys, making changes atomically, configuring and monitoring replication, performing controlled fail-over, and executing Lua code triggered by database events. Remote database instances are accessible transparently via a remote-procedure-invocation API.

Tarantool’s DBMS server uses the storage engine concept, where different sets of algorithms and data structures can be used for different situations. Two storage engines are built-in: an in-memory engine which has all the data and indexes in RAM, and a two-level B-tree engine for data sets whose size is 10 to 1000 times the amount of available RAM. All storage engines in Tarantool support transactions and replication by using a common write ahead log (WAL). This ensures consistency and crash safety of the persistent state. Changes are not considered complete until the WAL is written. The logging subsystem supports group commit.

Tarantool’s in-memory storage engine (memtx) keeps all the data in random-access memory, and therefore has very low read latency. It also keeps persistent copies of the data in non-volatile storage, such as disk, when users request “snapshots”. If an instance of the server stops and the random-access memory is lost, then restarts, it reads the latest snapshot and then replays the transactions that are in the log – therefore no data is lost.

Tarantool’s in-memory engine is lock-free in typical situations. Instead of the operating system’s concurrency primitives, such as mutexes, Tarantool uses cooperative multitasking to handle thousands of connections simultaneously. There is a fixed number of independent execution threads. The threads do not share state. Instead they exchange data using low-overhead message queues. While this approach limits the number of cores that the instance will use, it removes competition for the memory bus and ensures peak scalability of memory access and network throughput. CPU utilization of a typical highly-loaded Tarantool instance is under 10%. Searches are possible via secondary index keys as well as primary keys.

Tarantool’s disk-based storage engine is a fusion of ideas from modern filesystems, log-structured merge trees and classical B-trees. All data is organized into ranges. Each range is represented by a file on disk. Range size is a configuration option and normally is around 64MB. Each range is a collection of pages, serving different purposes. Pages in a fully merged range contain non-overlapping ranges of keys. A range can be partially merged if there were a lot of changes in its key range recently. In that case some pages represent new keys and values in the range. The disk-based storage engine is append only: new data never overwrites old data. The disk-based storage engine is named vinyl.

Tarantool supports multi-part index keys. The possible index types are HASH, TREE, BITSET, and RTREE.
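
For illustration, here is a minimal sketch of a multi-part TREE index; the space name and fields are hypothetical, not taken from a real schema:

tarantool> s = box.schema.space.create('multipart_example')
tarantool> -- the index key is the pair (field 1, field 2)
tarantool> s:create_index('pk', {type = 'tree', parts = {{field = 1, type = 'unsigned'}, {field = 2, type = 'string'}}})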

Tarantool supports asynchronous replication, locally or to remote hosts. The replication architecture can be master-master, that is, many nodes may both handle the loads and receive what others have handled, for the same data sets.

Tarantool supports basic SQL structures and persistence for SQL operations (with certain limitations). All tables and triggers created in SQL are available after a server restart.

User’s Guide

Preface

Welcome to Tarantool! This is the User’s Guide. We recommend reading it first, and consulting Reference materials for more detail afterwards, if needed.

How to read the documentation

To get started, you can install and launch Tarantool using a Docker container, a binary package, or the online Tarantool server at http://try.tarantool.org. Whichever you choose, as the first tryout, you can follow the introductory exercises from Chapter 2 “Getting started”. If you want more hands-on experience, proceed to Tutorials after you are through with Chapter 2.

Chapter 3 “Database” is about using Tarantool as a NoSQL DBMS, whereas Chapter 4 “Application server” is about using Tarantool as an application server.

Chapter 5 “Server administration” and Chapter 6 “Replication” are primarily for administrators.

Chapter 7 “Connectors” is strictly for users who are connecting from a different language such as C or Perl or Python — other users will find no immediate need for this chapter.

Chapter 8 “FAQ” gives answers to some frequently asked questions about Tarantool.

For experienced users, there are also Reference materials, a Contributor’s Guide and an extensive set of comments in the source code.

Getting in touch with the Tarantool community

Please report bugs or make feature requests at http://github.com/tarantool/tarantool/issues.

You can contact developers directly in Telegram or in a Tarantool discussion group (English or Russian).

Conventions used in this manual

Square brackets [ and ] enclose optional syntax.

Two dots in a row .. mean the preceding tokens may be repeated.

A vertical bar | means the preceding and following tokens are mutually exclusive alternatives.

Getting started

In this chapter, we explain how to install Tarantool, how to start it, and how to create a simple database.


Using a Docker image

For trial and test purposes, we recommend using official Tarantool images for Docker. An official image contains a particular Tarantool version and all popular external modules for Tarantool. Everything is already installed and configured in Linux. These images are the easiest way to install and use Tarantool.

Note

If you’re new to Docker, we recommend going over this tutorial before proceeding with this chapter.

Launching a container

If you don’t have Docker installed, please follow the official installation guide for your OS.

To start a fully functional Tarantool instance, run a container with minimal options:

$ docker run \
  --name mytarantool \
  -d -p 3301:3301 \
  -v /data/dir/on/host:/var/lib/tarantool \
  tarantool/tarantool:2

This command runs a new container named mytarantool. Docker starts it from an official image named tarantool/tarantool:2, with Tarantool version 2.2 and all external modules already installed.

Tarantool will be accepting incoming connections on localhost:3301. You may start using it as a key-value storage right away.

Tarantool persists data inside the container. To make your test data available after you stop the container, this command also mounts the host’s directory /data/dir/on/host (you need to specify here an absolute path to an existing local directory) in the container’s directory /var/lib/tarantool (by convention, Tarantool in a container uses this directory to persist data). So, all changes made in the mounted directory on the container’s side are applied to the host’s disk.

Tarantool’s database module in the container is already configured and started. You needn’t do it manually, unless you use Tarantool as an application server and run it with an application.
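
For example, assuming an application file app.lua in a host directory (both names here are hypothetical), you could mount that directory and tell the container to run the application; by convention, the official images look for applications under /opt/tarantool. This is only a sketch, so check the image documentation on Docker Hub for the exact conventions:

$ docker run \
  --name myapp \
  -d -p 3301:3301 \
  -v /path/to/app/dir:/opt/tarantool \
  tarantool/tarantool:2 \
  tarantool /opt/tarantool/app.lua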

Attaching to Tarantool

To attach to the Tarantool instance running inside the container, say:

$ docker exec -i -t mytarantool console

This command:

  • Instructs Tarantool to open an interactive console port for incoming connections.
  • Attaches to the Tarantool server inside the container under admin user via a standard Unix socket.

Tarantool displays a prompt:

tarantool.sock>

Now you can enter requests on the command line.

Note

On production machines, Tarantool’s interactive mode is for system administration only. But we use it for most examples in this manual, because the interactive mode is convenient for learning.

Creating a database

While you’re attached to the console, let’s create a simple test database.

First, create the first space (named tester):

tarantool.sock> s = box.schema.space.create('tester')

Format the created space by specifying field names and types:

tarantool.sock> s:format({
              > {name = 'id', type = 'unsigned'},
              > {name = 'band_name', type = 'string'},
              > {name = 'year', type = 'unsigned'}
              > })

Create the first index (named primary):

tarantool.sock> s:create_index('primary', {
              > type = 'hash',
              > parts = {'id'}
              > })

This is a primary index based on the id field of each tuple.

Insert three tuples (our name for records) into the space:

tarantool.sock> s:insert{1, 'Roxette', 1986}
tarantool.sock> s:insert{2, 'Scorpions', 2015}
tarantool.sock> s:insert{3, 'Ace of Base', 1993}

To select a tuple using the primary index, say:

tarantool.sock> s:select{3}

The terminal screen now looks like this:

tarantool.sock> s = box.schema.space.create('tester')
---
...
tarantool.sock> s:format({
              > {name = 'id', type = 'unsigned'},
              > {name = 'band_name', type = 'string'},
              > {name = 'year', type = 'unsigned'}
              > })
---
...
tarantool.sock> s:create_index('primary', {
              > type = 'hash',
              > parts = {'id'}
              > })
---
- unique: true
  parts:
  - type: unsigned
    is_nullable: false
    fieldno: 1
  id: 0
  space_id: 512
  name: primary
  type: HASH
...
tarantool.sock> s:insert{1, 'Roxette', 1986}
---
- [1, 'Roxette', 1986]
...
tarantool.sock> s:insert{2, 'Scorpions', 2015}
---
- [2, 'Scorpions', 2015]
...
tarantool.sock> s:insert{3, 'Ace of Base', 1993}
---
- [3, 'Ace of Base', 1993]
...
tarantool.sock> s:select{3}
---
- - [3, 'Ace of Base', 1993]
...

To add a secondary index based on the band_name field, say:

tarantool.sock> s:create_index('secondary', {
              > type = 'hash',
              > parts = {'band_name'}
              > })

To select tuples using the secondary index, say:

tarantool.sock> s.index.secondary:select{'Scorpions'}
---
- - [2, 'Scorpions', 2015]
...

Stopping a container

When the testing is over, stop the container gracefully:

$ docker stop mytarantool

This was a temporary container, and its disk/memory data were flushed when you stopped it. But since you mounted a data directory from the host in the container, Tarantool’s data files were persisted to the host’s disk. Now if you start a new container and mount that data directory in it, Tarantool will recover all data from disk and continue working with the persisted data.

Using a binary package

For production purposes, we recommend official binary packages. You can choose from two Tarantool versions: 1.10 (stable) or 2.2 (beta). An automatic build system creates, tests and publishes packages for every push into a corresponding branch (1.10 or 2.2) at Tarantool’s GitHub repository.

To download and install the package that’s appropriate for your OS, start a shell (terminal) and enter the command-line instructions provided for your OS at Tarantool’s download page.

Starting Tarantool

To start a Tarantool instance, say this:

$ # if you downloaded a binary with apt-get or yum, say this:
$ /usr/bin/tarantool
$ # if you downloaded and untarred a binary tarball to ~/tarantool, say this:
$ ~/tarantool/bin/tarantool

Tarantool starts in the interactive mode and displays a prompt:

tarantool>

Now you can enter requests on the command line.

Note

On production machines, Tarantool’s interactive mode is for system administration only. But we use it for most examples in this manual, because the interactive mode is convenient for learning.

Creating a database

Here is how to create a simple test database after installation.

  1. To let Tarantool store data in a separate place, create a new directory dedicated for tests:

    $ mkdir ~/tarantool_sandbox
    $ cd ~/tarantool_sandbox
    

    You can delete the directory when the tests are over.

  2. Check if the default port the database instance will listen to is vacant.

    Depending on the release, during installation Tarantool may start a demonstration global instance, example.lua, that listens on port 3301 by default. The example.lua file showcases basic configuration and can be found in the /etc/tarantool/instances.enabled or /etc/tarantool/instances.available directories.

    However, we encourage you to perform the instance startup manually, so you can learn.

    Make sure the default port is vacant:

    1. To check if the demonstration instance is running, say:

      $ lsof -i :3301
      COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
      tarantool 6851 root   12u  IPv4  40827      0t0  TCP *:3301 (LISTEN)
      
    2. If it is running, kill the corresponding process. In this example:

      $ kill 6851
      
  3. To start Tarantool’s database module and make the instance accept TCP requests on port 3301, say:

    tarantool> box.cfg{listen = 3301}
    
  4. Create the first space (named tester):

    tarantool> s = box.schema.space.create('tester')
    
  5. Format the created space by specifying field names and types:

    tarantool> s:format({
             > {name = 'id', type = 'unsigned'},
             > {name = 'band_name', type = 'string'},
             > {name = 'year', type = 'unsigned'}
             > })
    
  6. Create the first index (named primary):

    tarantool> s:create_index('primary', {
             > type = 'hash',
             > parts = {'id'}
             > })
    

    This is a primary index based on the id field of each tuple.

  7. Insert three tuples (our name for records) into the space:

    tarantool> s:insert{1, 'Roxette', 1986}
    tarantool> s:insert{2, 'Scorpions', 2015}
    tarantool> s:insert{3, 'Ace of Base', 1993}
    
  8. To select a tuple using the primary index, say:

    tarantool> s:select{3}
    

    The terminal screen now looks like this:

    tarantool> s = box.schema.space.create('tester')
    ---
    ...
    tarantool> s:format({
             > {name = 'id', type = 'unsigned'},
             > {name = 'band_name', type = 'string'},
             > {name = 'year', type = 'unsigned'}
             > })
    ---
    ...
    tarantool> s:create_index('primary', {
             > type = 'hash',
             > parts = {'id'}
             > })
    ---
    - unique: true
      parts:
      - type: unsigned
        is_nullable: false
        fieldno: 1
      id: 0
      space_id: 512
      name: primary
      type: HASH
    ...
    tarantool> s:insert{1, 'Roxette', 1986}
    ---
    - [1, 'Roxette', 1986]
    ...
    tarantool> s:insert{2, 'Scorpions', 2015}
    ---
    - [2, 'Scorpions', 2015]
    ...
    tarantool> s:insert{3, 'Ace of Base', 1993}
    ---
    - [3, 'Ace of Base', 1993]
    ...
    tarantool> s:select{3}
    ---
    - - [3, 'Ace of Base', 1993]
    ...
    
  9. To add a secondary index based on the band_name field, say:

    tarantool> s:create_index('secondary', {
             > type = 'hash',
             > parts = {'band_name'}
             > })
    
  10. To select tuples using the secondary index, say:

    tarantool> s.index.secondary:select{'Scorpions'}
    ---
    - - [2, 'Scorpions', 2015]
    ...
    
  11. Now, to prepare for the example in the next section, try this:

    tarantool> box.schema.user.grant('guest', 'read,write,execute', 'universe')
    

Connecting remotely

In the request box.cfg{listen = 3301} that we made earlier, the listen value can be any form of a URI (uniform resource identifier). In this case, it’s just a local port: port 3301. You can send requests to the listen URI via:

  1. telnet,
  2. a connector,
  3. another instance of Tarantool (using the console module), or
  4. tarantoolctl utility.

Let’s try (4).

Switch to another terminal. On Linux, for example, this means starting another instance of a Bash shell. You can switch to any working directory in the new terminal, not necessarily to ~/tarantool_sandbox.

Start the tarantoolctl utility:

$ tarantoolctl connect '3301'

This means “use tarantoolctl connect to connect to the Tarantool instance that’s listening on localhost:3301”.

Try this request:

localhost:3301> box.space.tester:select{2}

This means “send a request to that Tarantool instance, and display the result”. The result in this case is one of the tuples that was inserted earlier. Your terminal screen should now look like this:

$ tarantoolctl connect 3301
/usr/local/bin/tarantoolctl: connected to localhost:3301
localhost:3301> box.space.tester:select{2}
---
- - [2, 'Scorpions', 2015]
...

You can repeat box.space...:insert{} and box.space...:select{} indefinitely, on either Tarantool instance.
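
Option (3) also works: from a second Tarantool instance started in interactive mode, you can connect with the console module. A minimal sketch:

tarantool> console = require('console')
tarantool> console.connect('localhost:3301')
localhost:3301> box.space.tester:select{2}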

When the testing is over:

  • To drop the space: s:drop()
  • To stop tarantoolctl: Ctrl+C or Ctrl+D
  • To stop Tarantool (an alternative): the standard Lua function os.exit()
  • To stop Tarantool (from another terminal): sudo pkill -f tarantool
  • To destroy the test: rm -r ~/tarantool_sandbox

Database

In this chapter, we introduce the basic concepts of working with Tarantool as a database manager.


Data model

This section describes how Tarantool stores values and what operations with data it supports.

If you tried to create a database as suggested in our “Getting started” exercises, then your test database now looks like this:

[Figure: the data model of the test database]

Space

A space – ‘tester’ in our example – is a container.

When Tarantool is being used to store data, there is always at least one space. Each space has a unique name specified by the user. Besides, each space has a unique numeric identifier which can be specified by the user, but usually is assigned automatically by Tarantool. Finally, a space always has an engine: memtx (default) – in-memory engine, fast but limited in size, or vinyl – on-disk engine for huge data sets.

A space is a container for tuples. To be functional, it needs to have a primary index. It can also have secondary indexes.
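
For example, a minimal sketch of creating a space on each engine (the space names are hypothetical):

tarantool> s1 = box.schema.space.create('hot_data')                      -- memtx, the default
tarantool> s2 = box.schema.space.create('archive', {engine = 'vinyl'})  -- on-disk engine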

Tuple

A tuple plays the same role as a “row” or a “record”, and the components of a tuple (which we call “fields”) play the same role as a “row column” or “record field”, except that:

  • fields can be composite structures, such as arrays or maps, and
  • fields don’t need to have names.

Any given tuple may have any number of fields, and the fields may be of different types. The identifier of a field is the field’s number, base 1 (in Lua and other 1-based languages) or base 0 (in PHP or C/C++). For example, 1 or 0 can be used in some contexts to refer to the first field of a tuple.

The number of tuples in a space is unlimited.

Tuples in Tarantool are stored as MsgPack arrays.

When Tarantool returns a tuple value in the console, by default it uses YAML format, for example: [3, 'Ace of Base', 1993].

Index

An index is a group of key values and pointers.

As with spaces, you should specify the index name, and let Tarantool come up with a unique numeric identifier (“index id”).

An index always has a type. The default index type is ‘TREE’. TREE indexes are provided by all Tarantool engines, can index unique and non-unique values, support partial key searches, comparisons and ordered results. Additionally, memtx engine supports HASH, RTREE and BITSET indexes.

An index may be multi-part, that is, you can declare that an index key value is composed of two or more fields in the tuple, in any order. For example, for an ordinary TREE index, the maximum number of parts is 255.

An index may be unique, that is, you can declare that it would be illegal to have the same key value twice.

The first index defined on a space is called the primary key index, and it must be unique. All other indexes are called secondary indexes, and they may be non-unique.

An index definition may include identifiers of tuple fields and their expected types (see allowed indexed field types below).

In our example, we first defined the primary index (named ‘primary’) based on field #1 of each tuple:

tarantool> i = s:create_index('primary', {type = 'hash', parts = {{field = 1, type = 'unsigned'}}})

The effect is that, for all tuples in space ‘tester’, field #1 must exist and must contain an unsigned integer. The index type is ‘hash’, so values in field #1 must be unique, because keys in HASH indexes are unique.

After that, we defined a secondary index (named ‘secondary’) based on field #2 of each tuple:

tarantool> i = s:create_index('secondary', {type = 'tree', parts = {{field = 2, type = 'string'}}})

The effect is that, for all tuples in space ‘tester’, field #2 must exist and must contain a string. The index type is ‘tree’, so values in field #2 need not be unique, because keys in TREE indexes may be non-unique.

Note

Space definitions and index definitions are stored permanently in Tarantool’s system spaces _space and _index (for details, see reference on box.space submodule).

You can add, drop, or alter the definitions at runtime, with some restrictions. See syntax details in reference on box module.

Data types

Tarantool is both a database and an application server. Hence a developer often deals with two type sets: the programming language types (e.g. Lua) and the types of the Tarantool storage format (MsgPack).

Lua vs MsgPack

Scalar/compound  MsgPack type  Lua type                    Example value
scalar           nil           nil                         msgpack.NULL
scalar           boolean       boolean                     true
scalar           string        string                      'A B C'
scalar           integer       number                      12345
scalar           double        number                      1.2345
scalar           bin           cdata                       [!!binary 3t7e]
compound         map           table (with string keys)    {'a': 5, 'b': 6}
compound         array         table (with integer keys)   [1, 2, 3, 4, 5]
compound         array         tuple (cdata)               [12345, 'A B C']

In Lua, a nil type has only one possible value, also called nil (displayed as null on Tarantool’s command line, since the output is in the YAML format). Nils may be compared to values of any type with == (is-equal) or ~= (is-not-equal), but other operations will not work. Nils may not be used in Lua tables; the workaround is to use msgpack.NULL.

A boolean is either true or false.

A string is a variable-length sequence of bytes, usually represented with alphanumeric characters inside single quotes. In both Lua and MsgPack, strings are treated as binary data, with no attempts to determine a string’s character set or to perform any string conversion – unless there is an optional collation. So, usually, string sorting and comparison are done byte-by-byte, without any special collation rules applied. (Example: numbers are ordered by their point on the number line, so 2345 is greater than 500; meanwhile, strings are ordered by the encoding of the first byte, then the encoding of the second byte, and so on, so ‘2345’ is less than ‘500’.)

In Lua, a number is double-precision floating-point, but Tarantool allows both integer and floating-point values. Tarantool will try to store a Lua number as floating-point if the value contains a decimal point or is very large (greater than 100 trillion = 1e14), otherwise Tarantool will store it as an integer. To ensure that even very large numbers are stored as integers, use the tonumber64 function, or the LL (Long Long) suffix, or the ULL (Unsigned Long Long) suffix. Here are examples of numbers using regular notation, exponential notation, the ULL suffix and the tonumber64 function: -55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').

A bin (binary) value is not directly supported by Lua but there is a Tarantool type VARBINARY which is encoded as MessagePack binary. For an (advanced) example showing how to insert VARBINARY into a database, see the Cookbook Recipe for ffi_varbinary_insert.

Lua tables with string keys are stored as MsgPack maps; Lua tables with integer keys starting from 1 are stored as MsgPack arrays. Nils may not be used in Lua tables; the workaround is to use msgpack.NULL.

A tuple is a light reference to a MsgPack array stored in the database. It is a special type (cdata) to avoid conversion to a Lua table on retrieval. A few functions may return tables with multiple tuples. For more tuple examples, see box.tuple.
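
As a brief sketch of tuple behavior (assuming the ‘tester’ space from the earlier exercises), fields are accessed by number, without converting the whole tuple to a Lua table:

tarantool> t = box.space.tester:get{1}  -- t is a tuple (cdata), not a plain Lua table
tarantool> t[2]                         -- field #2 of the tuple
---
- Roxette
...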

Note

Tarantool uses the MsgPack format for database storage, which is variable-length. So, for example, the smallest number requires only one byte, but the largest number requires nine bytes.

Examples of insert requests with different data types:

tarantool> box.space.K:insert{1,nil,true,'A B C',12345,1.2345}
---
- [1, null, true, 'A B C', 12345, 1.2345]
...
tarantool> box.space.K:insert{2,{['a']=5,['b']=6}}
---
- [2, {'a': 5, 'b': 6}]
...
tarantool> box.space.K:insert{3,{1,2,3,4,5}}
---
- [3, [1, 2, 3, 4, 5]]
...

Indexed field types

Indexes restrict values which Tarantool’s MsgPack may contain. This is why, for example, ‘unsigned’ is a separate indexed field type, compared to ‘integer’ data type in MsgPack: they both store ‘integer’ values, but an ‘unsigned’ index contains only non-negative integer values and an ‘integer’ index contains all integer values.

Here’s how Tarantool indexed field types correspond to MsgPack data types.

unsigned (may also be called ‘uint’ or ‘num’, but ‘num’ is deprecated)
    MsgPack type: integer (between 0 and 18446744073709551615, i.e. about 18 quintillion)
    Index types: TREE, BITSET or HASH
    Examples: 123456

integer (may also be called ‘int’)
    MsgPack type: integer (between -9223372036854775808 and 18446744073709551615)
    Index types: TREE or HASH
    Examples: -2^63

number
    MsgPack types: integer (between -9223372036854775808 and 18446744073709551615); double (single-precision or double-precision floating point number)
    Index types: TREE or HASH
    Examples: 1.234, -44, 1.447e+44

string (may also be called ‘str’)
    MsgPack type: string (any set of octets, up to the maximum length)
    Index types: TREE, BITSET or HASH
    Examples: ‘A B C’, ‘\65 \66 \67’

varbinary
    MsgPack type: bin (any set of octets, up to the maximum length)
    Index types: TREE or HASH
    Examples: ‘\65 \66 \67’

boolean
    MsgPack type: bool (true or false)
    Index types: TREE or HASH
    Examples: true

array
    MsgPack type: array (list of numbers representing points in a geometric figure)
    Index type: RTREE
    Examples: {10, 11}, {3, 5, 9, 10}

scalar
    MsgPack types: null; bool (true or false); integer (between -9223372036854775808 and 18446744073709551615); double (single-precision or double-precision floating point number); string (any set of octets); varbinary (any set of octets)
    Note: when there is a mix of types, the key order is: null, then booleans, then numbers, then strings, then varbinary.
    Index types: TREE or HASH
    Examples: msgpack.NULL, true, -1, 1.234, ‘’, ‘ру’

Collations

By default, when Tarantool compares strings, it uses what we call a “binary” collation. The only consideration here is the numeric value of each byte in the string. Therefore, if the string is encoded with ASCII or UTF-8, then 'A' < 'B' < 'a', because the encoding of ‘A’ (what used to be called the “ASCII value”) is 65, the encoding of ‘B’ is 66, and the encoding of ‘a’ is 98. Binary collation is best if you prefer fast deterministic simple maintenance and searching with Tarantool indexes.

But if you want the ordering that you see in phone books and dictionaries, then you need Tarantool’s optional collations, such as unicode and unicode_ci, which allow for 'a' < 'A' < 'B' and 'a' = 'A' < 'B' respectively.

The unicode and unicode_ci optional collations use the ordering according to the Default Unicode Collation Element Table (DUCET) and the rules described in Unicode® Technical Standard #10 Unicode Collation Algorithm (UTS #10 UCA). The only difference between the two collations is about weights:

  • unicode collation observes L1 and L2 and L3 weights (strength = ‘tertiary’),
  • unicode_ci collation observes only L1 weights (strength = ‘primary’), so for example ‘a’ = ‘A’ = ‘á’ = ‘Á’.

As an example, take some Russian words:

'ЕЛЕ'
'елейный'
'ёлка'
'еловый'
'елозить'
'Ёлочка'
'ёлочный'
'ЕЛь'
'ель'

…and show the difference in ordering and selecting by index:

  • with unicode collation:

    tarantool> box.space.T:create_index('I', {parts = {{field = 1, type = 'str', collation='unicode'}}})
    ...
    tarantool> box.space.T.index.I:select()
    ---
    - - ['ЕЛЕ']
      - ['елейный']
      - ['ёлка']
      - ['еловый']
      - ['елозить']
      - ['Ёлочка']
      - ['ёлочный']
      - ['ель']
      - ['ЕЛь']
    ...
    tarantool> box.space.T.index.I:select{'ЁлКа'}
    ---
    - []
    ...
    
  • with unicode_ci collation:

    tarantool> box.space.T:create_index('I', {parts = {{field = 1, type ='str', collation='unicode_ci'}}})
    ...
    tarantool> box.space.T.index.I:select()
    ---
    - - ['ЕЛЕ']
      - ['елейный']
      - ['ёлка']
      - ['еловый']
      - ['елозить']
      - ['Ёлочка']
      - ['ёлочный']
      - ['ЕЛь']
    ...
    tarantool> box.space.T.index.I:select{'ЁлКа'}
    ---
    - - ['ёлка']
    ...
    

In all, collation involves much more than these simple examples of upper case / lower case and accented / unaccented equivalence in alphabets. We also consider variations of the same character, non-alphabetic writing systems, and special rules that apply for combinations of characters.

For English: use “unicode” and “unicode_ci”. For Russian: use “unicode” and “unicode_ci” (although a few Russians might prefer the Kyrgyz collation which says Cyrillic letters ‘Е’ and ‘Ё’ are the same with level-1 weights). For Dutch, German (dictionary), French, Indonesian, Irish, Italian, Lingala, Malay, Portuguese, Southern Sotho, Xhosa, or Zulu: “unicode” and “unicode_ci” will do.

The tailored optional collations: For other languages, Tarantool supplies tailored collations for every modern language that has more than a million native speakers, and for specialized situations such as the difference between dictionary order and telephone book order. To see a complete list say box.space._collation:select(). The tailored collation names have the form unicode_[language code]_[strength] where language code is a standard 2-character or 3-character language abbreviation, and strength is s1 for “primary strength” (level-1 weights), s2 for “secondary”, s3 for “tertiary”. Tarantool uses the same language codes as the ones in the “list of tailorable locales” on man pages of Ubuntu and Fedora. Charts explaining the precise differences from DUCET order are in the Common Language Data Repository.
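
A hedged sketch of using a tailored collation; the collation name below merely illustrates the naming scheme, so check box.space._collation:select() for the names your build actually ships:

tarantool> box.space._collation:select()  -- list all available collations
tarantool> -- hypothetical: level-1 (case-insensitive) German dictionary ordering
tarantool> box.space.T:create_index('I', {parts = {{field = 1, type = 'string', collation = 'unicode_de_s1'}}})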

Sequences

A sequence is a generator of ordered integer values.

As with spaces and indexes, you should specify the sequence name, and let Tarantool come up with a unique numeric identifier (“sequence id”).

As well, you can specify several options when creating a new sequence. The options determine what value will be generated whenever the sequence is used.

Options for box.schema.sequence.create()

start
    Integer. The value to generate the first time a sequence is used. Default: 1. Example: start=0
min
    Integer. Values smaller than this cannot be generated. Default: 1. Example: min=-1000
max
    Integer. Values larger than this cannot be generated. Default: 9223372036854775807. Example: max=0
cycle
    Boolean. Whether to start again when values cannot be generated. Default: false. Example: cycle=true
cache
    Integer. The number of values to store in a cache. Default: 0. Example: cache=0
step
    Integer. What to add to the previously generated value when generating a new value. Default: 1. Example: step=-1
if_not_exists
    Boolean. If true and a sequence with this name already exists, ignore other options and use the existing values. Default: false. Example: if_not_exists=true

Once a sequence exists, it can be altered, dropped, reset, forced to generate the next value, or associated with an index.

For an initial example, we generate a sequence named ‘S’.

tarantool> box.schema.sequence.create('S',{min=5, start=5})
---
- step: 1
  id: 5
  min: 5
  cache: 0
  uid: 1
  max: 9223372036854775807
  cycle: false
  name: S
  start: 5
...

The result shows that the new sequence has all default values, except for the two that were specified, min and start.

Then we get the next value, with the next() function.

tarantool> box.sequence.S:next()
---
- 5
...

The result is the same as the start value. If we called next() again, we would get 6 (because the previous value plus the step value is 6), and so on.

Then we create a new table, and say that its primary key may be generated from the sequence.

tarantool> s=box.schema.space.create('T');s:create_index('I',{sequence='S'})
---
...

Then we insert a tuple, without specifying a value for the primary key.

tarantool> box.space.T:insert{nil,'other stuff'}
---
- [6, 'other stuff']
...

The result is a new tuple where the first field has a value of 6. This arrangement, where the system automatically generates the values for a primary key, is sometimes called “auto-incrementing” or “identity”.
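
The other sequence operations mentioned above look like this (a brief sketch, continuing with the same sequence ‘S’):

tarantool> box.sequence.S:set(100)           -- force the last generated value to be 100
tarantool> box.sequence.S:reset()            -- start over from the 'start' value
tarantool> box.sequence.S:alter({step = 2})  -- change options of the existing sequence
tarantool> box.sequence.S:drop()             -- remove it (fails while the index on 'T' still uses it)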

For syntax and implementation details, see the reference for box.schema.sequence.

Persistence

In Tarantool, updates to the database are recorded in the so-called write ahead log (WAL) files. This ensures data persistence. When a power outage occurs or the Tarantool instance is killed incidentally, the in-memory database is lost. In this situation, WAL files are used to restore the data. Namely, Tarantool reads the WAL files and redoes the requests (this is called the “recovery process”). You can change the timing of the WAL writer, or turn it off, by setting wal_mode.

Tarantool also maintains a set of snapshot files. These files contain an on-disk copy of the entire data set for a given moment. Instead of reading every WAL file since the databases were created, the recovery process can load the latest snapshot file and then read only those WAL files that were produced after the snapshot file was made. After checkpointing, old WAL files can be removed to free up space.

To force immediate creation of a snapshot file, you can use Tarantool’s box.snapshot() request. To enable automatic creation of snapshot files, you can use Tarantool’s checkpoint daemon. The checkpoint daemon sets intervals for forced checkpoints. It makes sure that the states of both memtx and vinyl storage engines are synchronized and saved to disk, and automatically removes old WAL files.

Snapshot files can be created even if there is no WAL file.
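
A brief sketch of both approaches (the interval value is only an example):

tarantool> box.snapshot()                       -- force an immediate snapshot
tarantool> box.cfg{checkpoint_interval = 3600}  -- let the checkpoint daemon snapshot hourly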

Note

The memtx engine makes only regular checkpoints with the interval set in checkpoint daemon configuration.

The vinyl engine runs checkpointing in the background at all times.

See the Internals section for more details about the WAL writer and the recovery process.

Operations

Data operations

The basic data operations supported in Tarantool are:

  • five data-manipulation operations (INSERT, UPDATE, UPSERT, DELETE, REPLACE), and
  • one data-retrieval operation (SELECT).

All of them are implemented as functions in box.space submodule.

Examples:

  • INSERT: Add a new tuple to space ‘tester’.

    The first field, field[1], will be 999 (MsgPack type is integer).

    The second field, field[2], will be ‘Taranto’ (MsgPack type is string).

    tarantool> box.space.tester:insert{999, 'Taranto'}
    
  • UPDATE: Update the tuple, changing field field[2].

    The clause “{999}”, which has the value to look up in the index of the tuple’s primary-key field, is mandatory, because update() requests must always have a clause that specifies a unique key, which in this case is field[1].

    The clause “{{‘=’, 2, ‘Tarantino’}}” specifies that assignment will happen to field[2] with the new value.

    tarantool> box.space.tester:update({999}, {{'=', 2, 'Tarantino'}})
    
  • UPSERT: Upsert the tuple, changing field field[2] again.

    The syntax of upsert() is similar to the syntax of update(). However, the execution logic of these two requests is different. UPSERT is either UPDATE or INSERT, depending on the database’s state. Also, UPSERT execution is postponed until after transaction commit, so, unlike update(), upsert() doesn’t return data back.

    tarantool> box.space.tester:upsert({999, 'Taranted'}, {{'=', 2, 'Tarantism'}})
    
  • REPLACE: Replace the tuple, adding a new field.

    This is also possible with the update() request, but the update() request is usually more complicated.

    tarantool> box.space.tester:replace{999, 'Tarantella', 'Tarantula'}
    
  • SELECT: Retrieve the tuple.

    The clause “{999}” is still mandatory, although it does not have to mention the primary key.

    tarantool> box.space.tester:select{999}
    
  • DELETE: Delete the tuple.

    In this example, we identify the primary-key field.

    tarantool> box.space.tester:delete{999}
    

Summarizing the examples:

  • Functions insert and replace accept a tuple (where a primary key comes as part of the tuple).
  • Function upsert accepts a tuple (where a primary key comes as part of the tuple), and also the update operations to execute.
  • Function delete accepts a full key of any unique index (primary or secondary).
  • Function update accepts a full key of any unique index (primary or secondary), and also the operations to execute.
  • Function select accepts any key: primary/secondary, unique/non-unique, full/partial.

See reference on box.space for more details on using data operations.

Note

Besides Lua, you can use Perl, PHP, Python or other programming language connectors. The client-server protocol is open and documented. See this annotated BNF.

Index operations

Index operations are automatic: if a data-manipulation request changes a tuple, then it also changes the index keys defined for the tuple.

The simple index-creation operation that we’ve illustrated before is:

box.space.space-name:create_index('index-name')

This creates a unique TREE index on the first field of all tuples (often called “Field#1”), which is assumed to be numeric.

The simple SELECT request that we’ve illustrated before is:

box.space.space-name:select(value)

This looks for a single tuple via the first index. Since the first index is always unique, the maximum number of returned tuples is one.

The following SELECT variations exist:

  1. The search can use comparisons other than equality.

    box.space.space-name:select(value, {iterator = 'GT'})
    

    The comparison operators are LT, LE, EQ, REQ, GE, GT (for “less than”, “less than or equal”, “equal”, “reversed equal”, “greater than or equal”, “greater than” respectively). Comparisons make sense if and only if the index type is ‘TREE’.

    This type of search may return more than one tuple; if so, the tuples will be in descending order by key when the comparison operator is LT or LE or REQ, otherwise in ascending order.

  2. The search can use a secondary index.

    box.space.space-name.index.index-name:select(value)
    

    For a primary-key search, it is optional to specify an index name. For a secondary-key search, it is mandatory.

  3. The search may be for some or all key parts.

    -- Suppose an index has two parts
    tarantool> box.space.space-name.index.index-name.parts
    ---
    - - type: unsigned
        fieldno: 1
      - type: string
        fieldno: 2
    ...
    -- Suppose the space has three tuples
    tarantool> box.space.space-name:select()
    ---
    - - [1, 'A']
      - [1, 'B']
      - [2, '']
    ...
    
  4. The search may be for all fields, using a table for the value:

    box.space.space-name:select({1, 'A'})
    

    or the search can be for one field, using a table or a scalar:

    box.space.space-name:select(1)
    

    In the second case, the result will be two tuples: {1, 'A'} and {1, 'B'}.

    You can specify even zero fields, causing all three tuples to be returned. (Notice that partial key searches are available only in TREE indexes.)

Examples

  • BITSET example:

    tarantool> box.schema.space.create('bitset_example')
    tarantool> box.space.bitset_example:create_index('primary')
    tarantool> box.space.bitset_example:create_index('bitset', {unique = false, type = 'BITSET', parts = {{field = 2, type = 'unsigned'}}})
    tarantool> box.space.bitset_example:insert{1,1}
    tarantool> box.space.bitset_example:insert{2,4}
    tarantool> box.space.bitset_example:insert{3,7}
    tarantool> box.space.bitset_example:insert{4,3}
    tarantool> box.space.bitset_example.index.bitset:select(2, {iterator='BITS_ANY_SET'})
    

    The result will be:

    ---
    - - [3, 7]
      - [4, 3]
    ...
    

    because (7 AND 2) is not equal to 0, and (3 AND 2) is not equal to 0.

  • RTREE example:

    tarantool> box.schema.space.create('rtree_example')
    tarantool> box.space.rtree_example:create_index('primary')
    tarantool> box.space.rtree_example:create_index('rtree', {unique = false, type = 'RTREE', parts = {{field = 2, type = 'array'}}})
    tarantool> box.space.rtree_example:insert{1, {3, 5, 9, 10}}
    tarantool> box.space.rtree_example:insert{2, {10, 11}}
    tarantool> box.space.rtree_example.index.rtree:select({4, 7, 5, 9}, {iterator = 'GT'})
    

    The result will be:

    ---
    - - [1, [3, 5, 9, 10]]
    ...
    

    because a rectangle whose corners are at coordinates 4,7,5,9 is entirely within a rectangle whose corners are at coordinates 3,5,9,10.

Additionally, there exist index iterator operations. They can only be used with code in Lua and C/C++. Index iterators are for traversing indexes one key at a time, taking advantage of features that are specific to an index type, for example evaluating Boolean expressions when traversing BITSET indexes, or going in descending order when traversing TREE indexes.

See also other index operations like alter() and drop() in reference for box.index submodule.

Complexity factors

In reference for box.space and box.index submodules, there are notes about which complexity factors might affect the resource usage of each function.

Complexity factor and its effect:

Index size
    The number of index keys is the same as the number of tuples in the data set. For a TREE index, if there are more keys, then the lookup time will be greater, although of course the effect is not linear. For a HASH index, if there are more keys, then there is more RAM used, but the number of low-level steps tends to remain constant.

Index type
    Typically, a HASH index is faster than a TREE index if the number of tuples in the space is greater than one.

Number of indexes accessed
    Ordinarily, only one index is accessed to retrieve one tuple. But to update the tuple, there must be N accesses if the space has N different indexes.
    Note re storage engine: Vinyl optimizes away such accesses if secondary index fields are unchanged by the update. So this complexity factor applies only to memtx, since it always makes a full-tuple copy on every update.

Number of tuples accessed
    A few requests, for example SELECT, can retrieve multiple tuples. This factor is usually less important than the others.

WAL settings
    The important setting for the write-ahead log is wal_mode. If the setting causes no writing or delayed writing, this factor is unimportant. If the setting causes every data-change request to wait for writing to finish on a slow device, this factor is more important than all the others.

Transaction control

Transactions in Tarantool occur in fibers on a single thread. That is why Tarantool can guarantee execution atomicity, a point that deserves emphasis.

Threads, fibers and yields

How does Tarantool process a basic operation? As an example, let’s take this query:

tarantool> box.space.tester:update({3}, {{'=', 2, 'size'}, {'=', 3, 0}})

This is equivalent to the following SQL statement for a table that stores primary keys in field[1]:

UPDATE tester SET "field[2]" = 'size', "field[3]" = 0 WHERE "field[1]" = 3

This query will be processed with three operating system threads:

  1. If we issue the query on a remote client, then the network thread on the server side receives the query, parses the statement and changes it to a server executable message which has already been checked, and which the server instance can understand without parsing everything again.

  2. The network thread ships this message to the instance’s transaction processor thread using a lock-free message bus. Lua programs execute directly in the transaction processor thread, and do not require parsing and preparation.

    The instance’s transaction processor thread uses the primary-key index on field[1] to find the location of the tuple. It determines that the tuple can be updated (not much can go wrong when you’re merely changing an unindexed field value to something shorter).

  3. The transaction processor thread sends a message to the write-ahead logging (WAL) thread to commit the transaction. When done, the WAL thread replies with a COMMIT or ROLLBACK result, which is returned to the client.

Notice that there is only one transaction processor thread in Tarantool. Some people are used to the idea that there can be multiple threads operating on the database, with (say) thread #1 reading row #x, while thread #2 writes row #y. With Tarantool, no such thing ever happens. Only the transaction processor thread can access the database, and there is only one transaction processor thread for each Tarantool instance.

Like any other Tarantool thread, the transaction processor thread can handle many fibers. A fiber is a set of computer instructions that may contain “yield” signals. The transaction processor thread will execute all computer instructions until a yield, then switch to execute the instructions of a different fiber. Thus (say) the thread reads row #x for the sake of fiber #1, then writes row #y for the sake of fiber #2.

Yields must happen, otherwise the transaction processor thread would stick permanently on the same fiber. There are two types of yields:

  • implicit yields: every data-change operation or network access causes an implicit yield, and every statement that goes through the Tarantool client causes an implicit yield.
  • explicit yields: in a Lua function, you can (and should) add “yield” statements to prevent hogging, as sketched below. This is called cooperative multitasking.
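
Here is a minimal sketch of an explicit yield in a long-running fiber; the function and the loop counts are hypothetical:

tarantool> fiber = require('fiber')
tarantool> function long_task()
         >   for i = 1, 1000000 do
         >     -- ... perform one unit of work here ...
         >     if i % 1000 == 0 then
         >       fiber.yield()  -- let other fibers run
         >     end
         >   end
         > end
tarantool> fiber.create(long_task)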

Cooperative multitasking

Cooperative multitasking means: unless a running fiber deliberately yields control, it is not preempted by some other fiber. But a running fiber will deliberately yield when it encounters a “yield point”: a transaction commit, an operating system call, or an explicit “yield” request. Any system call which can block will be performed asynchronously, and any running fiber which must wait for a system call will be preempted, so that another ready-to-run fiber takes its place and becomes the new running fiber.

This model makes all programmatic locks unnecessary: cooperative multitasking ensures that there will be no concurrency around a resource, no race conditions, and no memory consistency issues.

When requests are small, for example simple UPDATE or INSERT or DELETE or SELECT, fiber scheduling is fair: it takes only a little time to process the request, schedule a disk write, and yield to a fiber serving the next client.

However, a function might perform complex computations or might be written in such a way that yields do not occur for a long time. This can lead to unfair scheduling, when a single client throttles the rest of the system, or to apparent stalls in request processing. Avoiding this situation is the responsibility of the function’s author.

Transactions

In the absence of transactions, any function that contains yield points may see changes in the database state caused by fibers that preempt. Multi-statement transactions exist to provide isolation: each transaction sees a consistent database state and commits all its changes atomically. At commit time, a yield happens and all transaction changes are written to the write ahead log in a single batch. Or, if needed, transaction changes can be rolled back – completely or to a specific savepoint.

To implement isolation, Tarantool uses a simple optimistic scheduler: the first transaction to commit wins. If a concurrent active transaction has read a value modified by a committed transaction, it is aborted.

The cooperative scheduler ensures that, in absence of yields, a multi-statement transaction is not preempted and hence is never aborted. Therefore, understanding yields is essential to writing abort-free code.
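
A hedged sketch of abort-aware code: wrap the statements in pcall and roll back explicitly on failure (the function name is hypothetical; the space is ‘tester’ from the earlier examples):

tarantool> function safe_transfer(from, to, amount)
         >   box.begin()
         >   local ok, err = pcall(function()
         >     box.space.tester:update(from, {{'-', 3, amount}})
         >     box.space.tester:update(to,   {{'+', 3, amount}})
         >   end)
         >   if not ok then
         >     box.rollback()
         >     return nil, err
         >   end
         >   box.commit()
         >   return true
         > end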

Note

You can’t mix storage engines in a transaction today.

Implicit yields

The only explicit yield requests in Tarantool are fiber.sleep() and fiber.yield(), but many other requests “imply” yields because Tarantool is designed to avoid blocking.

Database requests imply yields if and only if there is disk I/O. For memtx, since all data is in memory, there is no disk I/O during the request. For vinyl, since some data may not be in memory, there may be disk I/O for a read (to fetch data from disk) or for a write (because a stall may occur while waiting for memory to be free). For both memtx and vinyl, since data-change requests must be recorded in the WAL, there is normally a commit. A commit happens automatically after every request in default “autocommit” mode, or a commit happens at the end of a transaction in “transaction” mode, when a user deliberately commits by calling box.commit(). Therefore for both memtx and vinyl, because there can be disk I/O, some database operations may imply yields.

Many functions in modules fio, net_box, console and socket (the “os” and “network” requests) yield.

Example #1

  • Engine = memtx
    The sequence select() insert() has one yield, at the end of the insertion, caused by the implicit commit; select() has nothing to write to the WAL and so does not yield.
  • Engine = vinyl
    The sequence select() insert() has between one and three yields, since select() may yield if the data is not in cache, insert() may yield while waiting for available memory, and there is an implicit yield at commit.
  • The sequence begin() insert() insert() commit() yields only at commit if the engine is memtx, and can yield up to three times if the engine is vinyl.

Example #2

Assume that in space ‘tester’ there are tuples in which the third field represents a positive dollar amount. Let’s start a transaction, withdraw from tuple#1, deposit in tuple#2, and end the transaction, making its effects permanent.

tarantool> function txn_example(from, to, amount_of_money)
         >   box.begin()
         >   box.space.tester:update(from, {{'-', 3, amount_of_money}})
         >   box.space.tester:update(to,   {{'+', 3, amount_of_money}})
         >   box.commit()
         >   return "ok"
         > end
---
...
tarantool> txn_example({999}, {1000}, 1.00)
---
- "ok"
...

If wal_mode = ‘none’, then implicit yielding at commit time does not take place, because there are no writes to the WAL.

If a task is interactive – sending requests to the server and receiving responses – then it involves network IO, and therefore there is an implicit yield, even if the request that is sent to the server is not itself an implicit yield request. Therefore, the sequence:

select
select
select

causes blocking (in memtx), if it is inside a function or Lua program being executed on the server instance, but causes yielding (in both memtx and vinyl) if it is done as a series of transmissions from a client, including a client which operates via telnet, via one of the connectors, or via the MySQL and PostgreSQL rocks, or via the interactive mode when using Tarantool as a client.

After a fiber has yielded and then has regained control, it immediately issues fiber.testcancel().

Access control

Understanding security details is primarily an issue for administrators. However, ordinary users should at least skim this section to get an idea of how Tarantool makes it possible for administrators to prevent unauthorized access to the database and to certain functions.

Briefly:

  • There is a method to guarantee with password checks that users really are who they say they are (“authentication”).
  • There is a _user system space, where usernames and password-hashes are stored.
  • There are functions for saying that certain users are allowed to do certain things (“privileges”).
  • There is a _priv system space, where privileges are stored. Whenever a user tries to do an operation, there is a check whether the user has the privilege to do the operation (“access control”).

Details follow.

Users

There is a current user for any program working with Tarantool, local or remote. If a remote connection is using a binary port, the current user, by default, is ‘guest’. If the connection is using an admin-console port, the current user is ‘admin’. When executing a Lua initialization script, the current user is also ‘admin’.

The current user name can be found with box.session.user().

The current user can be changed:

  • For a binary port connection – with the AUTH protocol command, supported by most clients;
  • For an admin-console connection and in a Lua initialization script – with box.session.su;
  • For a binary-port connection invoking a stored function with the CALL command – if the SETUID property is enabled for the function, Tarantool temporarily replaces the current user with the function’s creator, with all the creator’s privileges, during function execution.
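
For the SETUID case, a brief sketch (the function name and grant are hypothetical):

tarantool> box.schema.func.create('get_count', {setuid = true})
tarantool> -- after this grant, 'guest' runs get_count with the creator's privileges:
tarantool> box.schema.user.grant('guest', 'execute', 'function', 'get_count')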

Passwords

Each user (except ‘guest’) may have a password. The password is any alphanumeric string.

Tarantool passwords are stored in the _user system space with a cryptographic hash function so that, if the password is ‘x’, the stored hash-password is a long string like ‘lL3OvhkIPOKh+Vn9Avlkx69M/Ck=’. When a client connects to a Tarantool instance, the instance sends a random salt value which the client must mix with the hashed password before sending it to the instance. Thus the original value ‘x’ is never stored anywhere except in the user’s head, and the hashed value is never passed over the network wire except when mixed with a random salt.

Note

For more details of the password hashing algorithm (e.g. for the purpose of writing a new client application), read the scramble.h header file.

This system prevents malicious onlookers from finding passwords by snooping in the log files or snooping on the wire. It is the same system that MySQL introduced several years ago, which has proved adequate for medium-security installations. Nevertheless, administrators should warn users that no system is foolproof against determined long-term attacks, so passwords should be guarded and changed occasionally. Administrators should also advise users to choose long unobvious passwords, but it is ultimately up to the users to choose or change their own passwords.

There are two functions for managing passwords in Tarantool: box.schema.user.passwd() for changing a user’s password and box.schema.user.password() for getting a hash of a user’s password.
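
For example, a minimal sketch that assumes a hypothetical user named ‘alice’ and a current user privileged to create users:

box.schema.user.create('alice', {password = 'old_secret'})
box.schema.user.passwd('alice', 'new_secret') -- change alice's password
box.schema.user.password('new_secret')        -- returns the hash that would be stored in _user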

Owners and privileges

Tarantool has one database. It may be called “box.schema” or “universe”. The database contains database objects, including spaces, indexes, users, roles, sequences, and functions.

The owner of a database object is the user who created it. The owner of the database itself, and the owner of objects that are created initially (the system spaces and the default users) is ‘admin’.

Owners automatically have privileges for what they create. They can share these privileges with other users or with roles, using box.schema.user.grant requests. The following privileges can be granted:

  • ‘read’, e.g. allow select from a space
  • ‘write’, e.g. allow update on a space
  • ‘execute’, e.g. allow call of a function, or (less commonly) allow use of a role
  • ‘create’, e.g. allow box.schema.space.create (access to certain system spaces is also necessary)
  • ‘alter’, e.g. allow box.space.x.index.y:alter (access to certain system spaces is also necessary)
  • ‘drop’, e.g. allow box.sequence.x:drop (currently this can be granted but has no effect)
  • ‘usage’, e.g. whether any action is allowable regardless of other privileges (sometimes revoking ‘usage’ is a convenient way to block a user temporarily without dropping the user)
  • ‘session’, e.g. whether the user can ‘connect’.

To create objects, users need the ‘create’ privilege and at least ‘read’ and ‘write’ privileges on the system space with a similar name (for example, on the _space system space if the user needs to create spaces).

To access objects, users need an appropriate privilege on the object (for example, the ‘execute’ privilege on function F if the users need to execute function F). See below for some examples of granting specific privileges that a grantor – that is, ‘admin’ or the object creator – can make.

To drop an object, users must be the object’s creator or be ‘admin’. As the owner of the entire database, ‘admin’ can drop any object including other users.

To grant privileges to a user, the object owner says grant(). To revoke privileges from a user, the object owner says revoke(). In either case, there are up to five parameters:

(user-name, privilege, object-type [, object-name [, options]])
  • user-name is the user (or role) that will receive or lose the privilege;

  • privilege is any of ‘read’, ‘write’, ‘execute’, ‘create’, ‘alter’, ‘drop’, ‘usage’, or ‘session’ (or a comma-separated list);

  • object-type is any of ‘space’, ‘index’, ‘sequence’, ‘function’, role-name, or ‘universe’;

  • object-name is what the privilege is for (omitted if object-type is ‘universe’);

  • options is a list inside braces for example {if_not_exists=true|false} (usually omitted because the default is acceptable).

Every update of user privileges is reflected immediately in the existing sessions and objects, e.g. functions.

Example for granting many privileges at once

In this example user ‘admin’ grants many privileges on many objects to user ‘U’, with a single request.

box.schema.user.grant('U','read,write,execute,create,drop','universe')

Examples for granting privileges for specific operations

In these examples the object’s creator grants precisely the minimal privileges necessary for particular operations, to user ‘U’.

-- So that 'U' can create spaces:
  box.schema.user.grant('U','create','universe')
  box.schema.user.grant('U','write', 'space', '_schema')
  box.schema.user.grant('U','write', 'space', '_space')
-- So that 'U' can create indexes (assuming 'U' created the space)
  box.schema.user.grant('U','read', 'space', '_space')
  box.schema.user.grant('U','read,write', 'space', '_index')
-- So that 'U' can create indexes on space T (assuming 'U' did not create space T)
  box.schema.user.grant('U','create','space','T')
  box.schema.user.grant('U','read', 'space', '_space')
  box.schema.user.grant('U','write', 'space', '_index')
-- So that 'U' can alter indexes on space T (assuming 'U' did not create the index)
  box.schema.user.grant('U','alter','space','T')
  box.schema.user.grant('U','read','space','_space')
  box.schema.user.grant('U','read','space','_index')
  box.schema.user.grant('U','read','space','_space_sequence')
  box.schema.user.grant('U','write','space','_index')
-- So that 'U' can create users or roles:
  box.schema.user.grant('U','create','universe')
  box.schema.user.grant('U','read,write', 'space', '_user')
  box.schema.user.grant('U','write','space', '_priv')
-- So that 'U' can create sequences:
  box.schema.user.grant('U','create','universe')
  box.schema.user.grant('U','read,write','space','_sequence')
-- So that 'U' can create functions:
  box.schema.user.grant('U','create','universe')
  box.schema.user.grant('U','read,write','space','_func')
-- So that 'U' can grant access on objects that 'U' created
  box.schema.user.grant('U','read','space','_user')
-- So that 'U' can select or get from a space named 'T'
  box.schema.user.grant('U','read','space','T')
-- So that 'U' can update or insert or delete or truncate a space named 'T'
  box.schema.user.grant('U','write','space','T')
-- So that 'U' can execute a function named 'F'
  box.schema.user.grant('U','execute','function','F')
-- So that 'U' can use the "S:next()" function with a sequence named S
  box.schema.user.grant('U','read,write','sequence','S')
-- So that 'U' can use the "S:set()" or "S:reset()" functions with a sequence named S
  box.schema.user.grant('U','write','sequence','S')

Example for creating users and objects then granting privileges

Here we create a Lua function that will be executed under the user id of its creator, even if called by another user.

First, we create two spaces (‘u’ and ‘i’) and grant a no-password user (‘internal’) full access to them. Then we define a function (‘read_and_modify’) and the no-password user becomes this function’s creator. Finally, we grant another user (‘public_user’) access to execute Lua functions created by the no-password user.

box.schema.space.create('u')
box.schema.space.create('i')
box.space.u:create_index('pk')
box.space.i:create_index('pk')

box.schema.user.create('internal')

box.schema.user.grant('internal', 'read,write', 'space', 'u')
box.schema.user.grant('internal', 'read,write', 'space', 'i')
box.schema.user.grant('internal', 'create', 'universe')
box.schema.user.grant('internal', 'read,write', 'space', '_func')

function read_and_modify(key)
  local u = box.space.u
  local i = box.space.i
  local fiber = require('fiber')
  local t = u:get{key}
  if t ~= nil then
    u:put{key, box.session.uid()}
    i:put{key, fiber.time()}
  end
end

box.session.su('internal')
box.schema.func.create('read_and_modify', {setuid= true})
box.session.su('admin')
box.schema.user.create('public_user', {password = 'secret'})
box.schema.user.grant('public_user', 'execute', 'function', 'read_and_modify')
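
As a usage illustration, here is a hedged sketch of how ‘public_user’ might call the function remotely; it assumes the instance was started with box.cfg{listen = 3301}:

net_box = require('net.box')
conn = net_box.connect('public_user:secret@localhost:3301')
-- the function body runs with the privileges of 'internal', its creator
conn:call('read_and_modify', {1})
conn:close()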

Roles

A role is a container for privileges which can be granted to regular users. Instead of granting or revoking individual privileges, you can put all the privileges in a role and then grant or revoke the role.

Role information is stored in the _user space, but the third field in the tuple – the type field – is ‘role’ rather than ‘user’.

An important feature in role management is that roles can be nested. For example, role R1 can be granted a privilege “role R2”, so users with the role R1 will subsequently get all privileges from both roles R1 and R2. In other words, a user gets all the privileges that are granted to a user’s roles, directly or indirectly.

There are actually two ways to grant or revoke a role: box.schema.user.grant-or-revoke(user-name-or-role-name,'execute', 'role',role-name...) or box.schema.user.grant-or-revoke(user-name-or-role-name,role-name...). The second way is preferable.

The ‘usage’ and ‘session’ privileges cannot be granted to roles.

Example

-- This example will work for a user with many privileges, such as 'admin'
-- or a user with the pre-defined 'super' role
-- Create space T with a primary index
box.schema.space.create('T')
box.space.T:create_index('primary', {})
-- Create user U1 so that later we can change the current user to U1
box.schema.user.create('U1')
-- Create two roles, R1 and R2
box.schema.role.create('R1')
box.schema.role.create('R2')
-- Grant role R2 to role R1 and role R1 to user U1 (order doesn't matter)
-- There are two ways to grant a role; here we use the shorter way
box.schema.role.grant('R1', 'R2')
box.schema.user.grant('U1', 'R1')
-- Grant read/write privileges for space T to role R2
-- (but not to role R1 and not to user U1)
box.schema.role.grant('R2', 'read,write', 'space', 'T')
-- Change the current user to user U1
box.session.su('U1')
-- An insertion to space T will now succeed because, due to nested roles,
-- user U1 has write privilege on space T
box.space.T:insert{1}

For more detail see box.schema.user.grant() and box.schema.role.grant() in the built-in modules reference.

Sessions and security

A session is the state of a connection to Tarantool. It contains:

  • an integer id identifying the connection,
  • the current user associated with the connection,
  • text description of the connected peer, and
  • session local state, such as Lua variables and functions.

In Tarantool, a single session can execute multiple concurrent transactions. Each transaction is identified by a unique integer id, which can be queried at the start of the transaction using box.session.sync().
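
A minimal sketch of that pattern (the variable name is arbitrary):

box.begin()
txn_id = box.session.sync() -- remember the id at the start of the transaction
-- ... transactional statements ...
box.commit()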

Note

To track all connects and disconnects, you can use connection and authentication triggers.

Triggers

Triggers, also known as callbacks, are functions which the server executes when certain events happen.

There are six types of triggers in Tarantool.

All triggers have the following characteristics:

  • Triggers associate a function with an event. The request to “define a trigger” implies passing the trigger’s function to one of the “on_event()” functions.
  • Triggers are defined only by the ‘admin’ user.
  • Triggers are stored in the Tarantool instance’s memory, not in the database. Therefore triggers disappear when the instance is shut down. To make them permanent, put function definitions and trigger settings into Tarantool’s initialization script.
  • Triggers have low overhead. If a trigger is not defined, then the overhead is minimal: merely a pointer dereference and check. If a trigger is defined, then its overhead is equivalent to the overhead of calling a function.
  • There can be multiple triggers for one event. In this case, triggers are executed in the reverse order that they were defined in. (Exception: member triggers are executed in the order that they appear in the member list.)
  • Triggers must work within the event context. However, effects are undefined if a function contains requests which normally could not occur immediately after the event, but only before the return from the event. For example, putting os.exit() or box.rollback() in a trigger function would bring in requests outside the event context.
  • Triggers are replaceable. The request to “redefine a trigger” implies passing a new trigger function and an old trigger function to one of the “on_event()” functions.
  • The “on_event()” functions all have parameters which are function pointers, and they all return function pointers. Remember that a Lua function definition such as “function f() x = x + 1 end” is the same as “f = function () x = x + 1 end” – in both cases f gets a function pointer. And “trigger = box.session.on_connect(f)” is the same as “trigger = box.session.on_connect(function () x = x + 1 end)” – in both cases trigger gets the function pointer which was passed.
  • You can call any “on_event()” function with no arguments to get a list of its triggers. For example, use box.session.on_connect() to return a table of all connect-trigger functions.

Example

Here we log connect and disconnect events into the Tarantool server log.

log = require('log')

function on_connect_impl()
  log.info("connected "..box.session.peer()..", sid "..box.session.id())
end

function on_disconnect_impl()
  log.info("disconnected, sid "..box.session.id())
end

function on_auth_impl(user)
  log.info("authenticated sid "..box.session.id().." as "..user)
end"

function on_connect() pcall(on_connect_impl) end
function on_disconnect() pcall(on_disconnect_impl) end
function on_auth(user) pcall(on_auth_impl, user) end

box.session.on_connect(on_connect)
box.session.on_disconnect(on_disconnect)
box.session.on_auth(on_auth)
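
Because triggers are replaceable, a revised trigger function can later be swapped in by passing the new function together with the old one; a minimal sketch:

function on_connect2() pcall(on_connect_impl) end
box.session.on_connect(on_connect2, on_connect) -- replaces the old trigger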

Limitations

Number of parts in an index

For TREE or HASH indexes, the maximum is 255 (box.schema.INDEX_PART_MAX). For RTREE indexes, the maximum is 1 but the field is an ARRAY of up to 20 dimensions. For BITSET indexes, the maximum is 1.

Number of indexes in a space

128 (box.schema.INDEX_MAX).

Number of fields in a tuple

The theoretical maximum is 2,147,483,647 (box.schema.FIELD_MAX). The practical maximum is whatever is specified by the space’s field_count member, or the maximal tuple length.

Number of bytes in a tuple

The maximal number of bytes in a tuple is roughly equal to memtx_max_tuple_size or vinyl_max_tuple_size (with a metadata overhead of about 20 bytes per tuple, which is added on top of useful bytes). By default, the value of either memtx_max_tuple_size or vinyl_max_tuple_size is 1,048,576. To increase it, specify a larger value when starting the Tarantool instance. For example, box.cfg{memtx_max_tuple_size=2*1048576}.

Number of bytes in an index key

If a field in a tuple can contain a million bytes, then the index key can contain a million bytes, so the maximum is determined by factors such as Number of bytes in a tuple, not by the index support.

Number of spaces

The theoretical maximum is 2,147,483,647 (box.schema.SPACE_MAX), but the practical maximum is around 65,000.

Number of connections

The practical limit is the number of file descriptors that one can set with the operating system.

Space size

The total maximum size for all spaces is in effect set by memtx_memory, which in turn is limited by the total available memory.

Update operations count

The maximum number of operations per tuple that can be in a single update is 4000 (BOX_UPDATE_OP_CNT_MAX).

Number of users and roles

32 (BOX_USER_MAX).

Length of an index name or space name or user name

65000 (box.schema.NAME_MAX).

Number of replicas in a replica set

32 (vclock.VCLOCK_MAX).

Storage engines

A storage engine is a set of very-low-level routines which actually store and retrieve tuple values. Tarantool offers a choice of two storage engines:

  • memtx (the in-memory storage engine) is the default and was the first to arrive.

  • vinyl (the on-disk storage engine) is a working key-value engine and will especially appeal to users who like to see data go directly to disk, so that recovery time might be shorter and database size might be larger.

    On the other hand, vinyl lacks some functions and options that are available with memtx. Where that is the case, the relevant description in this manual contains a note beginning with the words “Note re storage engine”.

Further in this section we discuss the details of storing data using the vinyl storage engine.

To specify that the engine should be vinyl, add the clause engine = 'vinyl' when creating a space, for example:

space = box.schema.space.create('name', {engine='vinyl'})

Differences between memtx and vinyl storage engines

The primary difference between memtx and vinyl is that memtx is an “in-memory” engine while vinyl is an “on-disk” engine. An in-memory storage engine is generally faster (each query typically runs in under 1 ms), and the memtx engine is justifiably the default for Tarantool, but an on-disk engine such as vinyl is preferable when the database is larger than the available memory and adding more memory is not a realistic option.

Option-by-option comparison of memtx and vinyl:

  • Supported index types: memtx – TREE, HASH, RTREE or BITSET; vinyl – TREE only
  • Temporary spaces: memtx – supported; vinyl – not supported
  • random() function: memtx – supported; vinyl – not supported
  • alter() function: memtx – supported; vinyl – supported starting from the 1.10.2 release (the primary index cannot be modified)
  • len() function: memtx – returns the number of tuples in the space; vinyl – returns the maximum approximate number of tuples in the space
  • count() function: memtx – takes a constant amount of time; vinyl – takes a variable amount of time depending on the state of the DB
  • delete() function: memtx – returns the deleted tuple, if any; vinyl – always returns nil
  • yield: memtx – does not yield on select requests unless the transaction is committed to WAL; vinyl – yields on select requests and on their equivalents, get() or pairs()

Storing data with vinyl

Tarantool is a transactional and persistent DBMS that maintains 100% of its data in RAM. The greatest advantages of in-memory databases are their speed and ease of use: they demonstrate consistently high performance, and you never need to tune them.

A few years ago we decided to extend the product by implementing a classical storage engine similar to those used by regular DBMSes: it uses RAM for caching, while the bulk of its data is stored on disk. We decided to make it possible to set a storage engine independently for each table in the database, which is the same way that MySQL approaches it, but we also wanted to support transactions from the very beginning.

The first question we needed to answer was whether to create our own storage engine or use an existing library. The open-source community offered a few viable solutions. The RocksDB library was the fastest growing open-source library and is currently one of the most prominent out there. There were also several lesser-known libraries to consider, such as WiredTiger, ForestDB, NestDB, and LMDB.

Nevertheless, after studying the source code of existing libraries and considering the pros and cons, we opted for our own storage engine. One reason is that the existing third-party libraries expected requests to come from multiple operating system threads and thus contained complex synchronization primitives for controlling parallel data access. If we had decided to embed one of these in Tarantool, we would have made our users bear the overhead of a multithreaded application without getting anything in return. The thing is, Tarantool has an actor-based architecture. The way it processes transactions in a dedicated thread allows it to do away with the unnecessary locks, interprocess communication, and other overhead that accounts for up to 80% of processor time in multithreaded DBMSes.

_images/actor_threads.png

The Tarantool process consists of a fixed number of “actor” threads

If you design a database engine with cooperative multitasking in mind right from the start, it not only significantly speeds up the development process, but also allows the implementation of certain optimization tricks that would be too complex for multithreaded engines. In short, using a third-party solution wouldn’t have yielded the best result.

Algorithm

Once the idea of using an existing library was off the table, we needed to pick an architecture to build upon. There are two competing approaches to on-disk data storage: the older one relies on B-trees and their variations; the newer one advocates the use of log-structured merge-trees, or “LSM” trees. MySQL, PostgreSQL, and Oracle use B-trees, while Cassandra, MongoDB, and CockroachDB have adopted LSM trees.

B-trees are considered better suited for reads, and LSM trees for writes. However, with SSDs becoming more widespread, and given that SSD read throughput is several times greater than write throughput, the advantages of LSM trees in most scenarios became obvious to us.

Before dissecting LSM trees in Tarantool, let’s take a look at how they work. To do that, we’ll begin by analyzing a regular B-tree and the issues it faces. A B-tree is a balanced tree made up of blocks, which contain sorted lists of key-value pairs. (Topics such as filling and balancing a B-tree or splitting and merging blocks are outside of the scope of this article and can easily be found on Wikipedia.) As a result, we get a container sorted by key, where the smallest element is stored in the leftmost node and the largest one in the rightmost node. Let’s have a look at how insertions and searches in a B-tree happen.

_images/classical_b_tree.png

Classical B-tree

If you need to find an element or check its membership, the search starts at the root, as usual. If the key is found in the root block, the search stops; otherwise, the search visits the rightmost block holding the largest element that’s not larger than the key being searched (recall that elements at each level are sorted). If the first level yields no results, the search proceeds to the next level. Finally, the search ends up in one of the leaves and probably locates the needed key. Blocks are stored and read into RAM one by one, meaning the algorithm reads log_B(N) blocks in a single search, where N is the number of elements in the B-tree. In the simplest case, writes are done similarly: the algorithm finds the block that holds the necessary element and updates (inserts) its value.

To better understand the data structure, let’s consider a practical example: say we have a B-tree with 100,000,000 nodes, a block size of 4096 bytes, and an element size of 100 bytes. Thus each block will hold up to 40 elements (all overhead considered), and the B-tree will consist of around 2,570,000 blocks and 5 levels: the first four will have a size of 256 Mb, while the last one will grow up to 10 Gb. Obviously, any modern computer will be able to store all of the levels except the last one in filesystem cache, so read requests will require just a single I/O operation.

But if we change our perspective, B-trees don’t look so good anymore. Suppose we need to update a single element. Since working with B-trees involves reading and writing whole blocks, we would have to read in one whole block, change our 100 bytes out of 4096, and then write the whole updated block to disk. In other words, we were forced to write 40 times more data than we actually modified!

If you take into account the fact that an SSD block has a size of 64 Kb+ and not every modification changes a whole element, the extra disk workload can be greater still.

Authors of specialized literature and blogs dedicated to on-disk data storage have coined two terms for these phenomena: extra reads are referred to as “read amplification” and writes as “write amplification”.

The amplification factor (multiplication coefficient) is calculated as the ratio of the size of actual read (or written) data to the size of data needed (or actually changed). In our B-tree example, the amplification factor would be around 40 for both reads and writes.

The huge number of extra I/O operations associated with updating data is one of the main issues addressed by LSM trees. Let’s see how they work.

The key difference between LSM trees and regular B-trees is that LSM trees don’t just store data (keys and values), but also data operations: insertions and deletions.

_images/lsm.png


LSM tree:

  • Stores statements, not values:
    • REPLACE
    • DELETE
    • UPSERT
  • Every statement is marked by an LSN
  • Append-only files; garbage is collected after a checkpoint
  • Transactional log of all filesystem changes: vylog

For example, an element corresponding to an insertion operation has, apart from a key and a value, an extra byte with an operation code (“REPLACE” in the image above). An element representing the deletion operation contains a key (since storing a value is unnecessary) and the corresponding operation code—“DELETE”. Also, each LSM tree element has a log sequence number (LSN), which is the value of a monotonically increasing sequence that uniquely identifies each operation. The whole tree is first ordered by key in ascending order, and then, within a single key scope, by LSN in descending order.

_images/lsm_single.png

A single level of an LSM tree

Filling an LSM tree

Unlike a B-tree, which is stored completely on disk and can be partly cached in RAM, when using an LSM tree, memory is explicitly separated from disk right from the start. The issue of volatile memory and data persistence is beyond the scope of the storage algorithm and can be solved in various ways—for example, by logging changes.

The part of an LSM tree that’s stored in RAM is called L0 (level zero). The size of RAM is limited, so L0 is allocated a fixed amount of memory. For example, in Tarantool, the L0 size is controlled by the vinyl_memory parameter. Initially, when an LSM tree is empty, operations are written to L0. Recall that all elements are ordered by key in ascending order, and then within a single key scope, by LSN in descending order, so when a new value associated with a given key gets inserted, it’s easy to locate the older value and delete it. L0 can be structured as any container capable of storing a sorted sequence of elements. For example, in Tarantool, L0 is implemented as a B+*-tree. Lookups and insertions are standard operations for the data structure underlying L0, so I won’t dwell on those.

Sooner or later the number of elements in an LSM tree exceeds the L0 size and that’s when L0 gets written to a file on disk (called a “run”) and then cleared for storing new elements. This operation is called a “dump”.

_images/dumps.png


Dumps on disk form a sequence ordered by LSN: LSN ranges in different runs don’t overlap, and the leftmost runs (at the head of the sequence) hold newer operations. Think of these runs as a pyramid, with the newest ones closer to the top. As runs keep getting dumped, the pyramid grows higher. Note that newer runs may contain deletions or replacements for existing keys. To remove older data, it’s necessary to perform garbage collection (this process is sometimes called “merge” or “compaction”) by combining several older runs into a new one. If two versions of the same key are encountered during a compaction, only the newer one is retained; however, if a key insertion is followed by a deletion, then both operations can be discarded.

_images/purge.png


The key choices determining an LSM tree’s efficiency are which runs to compact and when to compact them. Suppose an LSM tree stores a monotonically increasing sequence of keys (1, 2, 3, …,) with no deletions. In this case, compacting runs would be useless: all of the elements are sorted, the tree doesn’t have any garbage, and the location of any key can unequivocally be determined. On the other hand, if an LSM tree contains many deletions, doing a compaction would free up some disk space. However, even if there are no deletions, but key ranges in different runs overlap a lot, compacting such runs could speed up lookups as there would be fewer runs to scan. In this case, it might make sense to compact runs after each dump. But keep in mind that a compaction causes all data stored on disk to be overwritten, so with few reads it’s recommended to perform it less often.

To ensure it’s optimally configurable for any of the scenarios above, an LSM tree organizes all runs into a pyramid: the newer the data operations, the higher up the pyramid they are located. During a compaction, the algorithm picks two or more neighboring runs of approximately equal size, if possible.

_images/compaction.png


  • Multi-level compaction can span any number of levels
  • A level can contain multiple runs

All of the neighboring runs of approximately equal size constitute an LSM tree level on disk. The ratio of run sizes at different levels determines the pyramid’s proportions, which allows optimizing the tree for write-intensive or read-intensive scenarios.

Suppose the L0 size is 100 Mb, the ratio of run sizes at each level (the vinyl_run_size_ratio parameter) is 5, and there can be no more than 2 runs per level (the vinyl_run_count_per_level parameter). After the first 3 dumps, the disk will contain 3 runs of 100 Mb each—which constitute L1 (level one). Since 3 > 2, the runs will be compacted into a single 300 Mb run, with the older ones being deleted. After 2 more dumps, there will be another compaction, this time of 2 runs of 100 Mb each and the 300 Mb run, which will produce one 500 Mb run. It will be moved to L2 (recall that the run size ratio is 5), leaving L1 empty. The next 10 dumps will result in L2 having 3 runs of 500 Mb each, which will be compacted into a single 1500 Mb run. Over the course of 10 more dumps, the following will happen: 3 runs of 100 Mb each will be compacted twice, as will two 100 Mb runs and one 300 Mb run, which will yield 2 new 500 Mb runs in L2. Since L2 now has 3 runs, they will also be compacted: two 500 Mb runs and one 1500 Mb run will produce a 2500 Mb run that will be moved to L3, given its size.

This can go on infinitely, but if an LSM tree contains lots of deletions, the resulting compacted run can be moved not only down, but also up the pyramid due to its size being smaller than the sizes of the original runs that were compacted. In other words, it’s enough to logically track which level a certain run belongs to, based on the run size and the smallest and greatest LSN among all of its operations.

Controlling the form of an LSM tree

If it’s necessary to reduce the number of runs for lookups, then the run size ratio can be increased, thus bringing the number of levels down. If, on the other hand, you need to minimize the compaction-related overhead, then the run size ratio can be decreased: the pyramid will grow higher, and even though runs will be compacted more often, they will be smaller, which will reduce the total amount of work done. In general, write amplification in an LSM tree is described by the formula x \cdot \log_x(\frac{N}{L0}) or, equivalently, x \cdot \frac{\ln(N / L0)}{\ln(x)}, where N is the total size of all tree elements, L0 is the level zero size, and x is the level size ratio (the level_size_ratio parameter). At \frac{N}{L0} = 40 (the disk-to-memory ratio), the plot would look something like this:

_images/curve.png


As for read amplification, it’s proportional to the number of levels. The lookup cost at each level is no greater than that for a B-tree. Getting back to the example of a tree with 100,000,000 elements: given 256 Mb of RAM and the default values of vinyl_level_size_ratio and run_count_per_level, write amplification would come out to about 13, while read amplification could be as high as 150. Let’s try to figure out why this happens.

Range searching

In the case of a single-key search, the algorithm stops after encountering the first match. However, when searching within a certain key range (for example, looking for all the users with the last name “Ivanov”), it’s necessary to scan all tree levels.

_images/range_search.png

Searching within a range of [24,30)

The required range is formed the same way as when compacting several runs: the algorithm picks the key with the largest LSN out of all the sources, ignoring the other associated operations, then moves on to the next key and repeats the procedure.

Deletion

Why would one store deletions? And why doesn’t it lead to a tree overflow in the case of for i=1,10000000 put(i) delete(i) end?

With regards to lookups, deletions signal the absence of a value being searched; with compactions, they clear the tree of “garbage” records with older LSNs.

While the data is in RAM only, there’s no need to store deletions. Similarly, deletions don’t need to be kept after a compaction that involves the lowest tree level, which contains the oldest dump. Indeed, if a value can’t be found at the lowest level, then it doesn’t exist in the tree.

  • We can’t delete from append-only files
  • Tombstones (delete markers) are inserted into L0 instead

_images/deletion_1.png

Deletion, step 1: a tombstone is inserted into L0

_images/deletion_2.png

Deletion, step 2: the tombstone passes through intermediate levels

_images/deletion_3.png

Deletion, step 3: in the case of a major compaction, the tombstone is removed from the tree

If a deletion is known to come right after the insertion of a unique value, which is often the case when modifying a value in a secondary index, then the deletion can safely be filtered out while compacting intermediate tree levels. This optimization is implemented in vinyl.

Advantages of an LSM tree

Apart from decreasing write amplification, the approach that involves periodically dumping level L0 and compacting levels L1-Lk has a few advantages over the approach to writes adopted by B-trees:

  • Dumps and compactions write relatively large files: typically, the L0 size is 50-100 Mb, which is thousands of times larger than the size of a B-tree block.
  • This large size allows efficiently compressing data before writing it. Tarantool compresses data automatically, which further decreases write amplification.
  • There is no fragmentation overhead, since there’s no padding/empty space between the elements inside a run.
  • All operations create new runs instead of modifying older data in place. This allows avoiding those nasty locks that everyone hates so much. Several operations can run in parallel without causing any conflicts. This also simplifies making backups and moving data to replicas.
  • Storing older versions of data allows for the efficient implementation of transaction support by using multiversion concurrency control.

Disadvantages of an LSM tree and how to deal with them

One of the key advantages of the B-tree as a search data structure is its predictability: all operations take no longer than log_B(N) to run. Conversely, in a classical LSM tree, both read and write speeds can differ by a factor of hundreds (best case scenario) or even thousands (worst case scenario). For example, adding just one element to L0 can cause it to overflow, which can trigger a chain reaction in levels L1, L2, and so on. Lookups may find the needed element in L0 or may need to scan all of the tree levels. It’s also necessary to optimize reads within a single level to achieve speeds comparable to those of a B-tree. Fortunately, most disadvantages can be mitigated or even eliminated with additional algorithms and data structures. Let’s take a closer look at these disadvantages and how they’re dealt with in Tarantool.

Unpredictable write speed

In an LSM tree, insertions almost always affect L0 only. How do you avoid idle time when the memory area allocated for L0 is full?

Clearing L0 involves two lengthy operations: writing to disk and memory deallocation. To avoid idle time while L0 is being dumped, Tarantool uses writeaheads. Suppose the L0 size is 256 Mb. The disk write speed is 10 Mbps. Then it would take 26 seconds to dump L0. The insertion speed is 10,000 RPS, with each key having a size of 100 bytes. While L0 is being dumped, it’s necessary to reserve 26 Mb of RAM, effectively slicing the L0 size down to 230 Mb.

Tarantool does all of these calculations automatically, constantly updating the rolling average of the DBMS workload and the histogram of the disk speed. This allows using L0 as efficiently as possible and it prevents write requests from timing out. But in the case of workload surges, some wait time is still possible. That’s why we also introduced an insertion timeout (the vinyl_timeout parameter), which is set to 60 seconds by default. The write operation itself is executed in dedicated threads. The number of these threads (2 by default) is controlled by the vinyl_write_threads parameter. The default value of 2 allows doing dumps and compactions in parallel, which is also necessary for ensuring system predictability.
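
Both parameters are ordinary box.cfg options; a minimal sketch that spells out the defaults at instance startup:

box.cfg{
  vinyl_timeout = 60,      -- seconds a write may wait for L0 space
  vinyl_write_threads = 2, -- lets a dump and a compaction run in parallel
}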

In Tarantool, compactions are always performed independently of dumps, in a separate execution thread. This is made possible by the append-only nature of an LSM tree: after a dump, runs are never changed, and compactions simply create new runs.

Delays can also be caused by L0 rotation and the deallocation of memory dumped to disk: during a dump, L0 memory is owned by two operating system threads, a transaction processing thread and a write thread. Even though no elements are being added to the rotated L0, it can still be used for lookups. To avoid read locks when doing lookups, the write thread doesn’t deallocate the dumped memory, instead delegating this task to the transaction processor thread. Following a dump, memory deallocation itself happens instantaneously: to achieve this, L0 uses a special allocator that deallocates all of the memory with a single operation.

_images/dump_from_shadow.png
  • anticipatory dump
  • throttling

The dump is performed from the so-called “shadow” L0 without blocking new insertions and lookups

Unpredictable read speed

Optimizing reads is the most difficult optimization task with regards to LSM trees. The main complexity factor here is the number of levels: not only does it slow down lookups, but any optimization also tends to require significantly larger RAM resources. Fortunately, the append-only nature of LSM trees allows us to address these problems in ways that would be nontrivial for traditional data structures.

_images/read_speed.png
  • page index
  • bloom filters
  • tuple range cache
  • multi-level compaction

Compression and page index

In B-trees, data compression is either the hardest problem to crack or a great marketing tool—rather than something really useful. In LSM trees, compression works as follows:

During a dump or compaction all of the data within a single run is split into pages. The page size (in bytes) is controlled by the vinyl_page_size parameter and can be set separately for each index. A page doesn’t have to be exactly of vinyl_page_size size—depending on the data it holds, it can be a little bit smaller or larger. Because of this, pages never have any empty space inside.
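
For example, a sketch of setting a custom page size when creating a vinyl index (the space and index names are arbitrary):

s = box.schema.space.create('test', {engine = 'vinyl'})
s:create_index('pk', {page_size = 8 * 1024}) -- 8 Kb pages for this index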

Data is compressed by Facebook’s streaming algorithm called “zstd”. The first key of each page, along with the page offset, is added to a “page index”, which is a separate file that allows the quick retrieval of any page. After a dump or compaction, the page index of the created run is also written to disk.

All .index files are cached in RAM, which allows finding the necessary page with a single lookup in a .run file (in vinyl, this is the extension of files resulting from a dump or compaction). Since data within a page is sorted, after it’s read and decompressed, the needed key can be found using a regular binary search. Decompression and reads are handled by separate threads, and are controlled by the vinyl_read_threads parameter.

Tarantool uses a universal file format: for example, the format of a .run file is no different from that of an .xlog file (log file). This simplifies backup and recovery as well as the usage of external tools.

Bloom filters

Even though using a page index enables scanning fewer pages per run when doing a lookup, it’s still necessary to traverse all of the tree levels. There’s a special case where checking that particular data is absent makes scanning all of the tree levels unavoidable: I’m talking about insertions into a unique index. If the data being inserted already exists, then inserting the same data into a unique index should lead to an error. The only way to throw an error in an LSM tree before a transaction is committed is to do a search before inserting the data. Such reads form a class of their own in the DBMS world and are called “hidden” or “parasitic” reads.

Another operation leading to hidden reads is updating a value in a field on which a secondary index is defined. Secondary keys are regular LSM trees that store differently ordered data. In most cases, in order not to have to store all of the data in all of the indexes, a value associated with a given key is kept in whole only in the primary index (any index that stores both a key and a value is called “covering” or “clustered”), whereas the secondary index stores only the fields on which it is defined, plus the fields of the primary key. Thus, each time a field covered by a secondary index changes, the old key has to be removed from the secondary index first, and only then can the new key be inserted. At update time, the old value is unknown, and it is this value that needs to be read in from the primary index “under the hood”.

For example:

update t1 set city='Moscow' where id=1

To minimize the number of disk reads, especially for nonexistent data, nearly all LSM trees use probabilistic data structures, and Tarantool is no exception. A classical Bloom filter is made up of several (usually 3 to 5) bit arrays. When data is written, several hash functions are calculated for each key in order to get corresponding array positions. The bits at these positions are then set to 1. Due to possible hash collisions, some bits might be set to 1 twice. We’re most interested in the bits that remain 0 after all keys have been added. When looking for an element within a run, the same hash functions are applied to produce bit positions in the arrays. If any of the bits at these positions is 0, then the element is definitely not in the run. The probability of a false positive in a Bloom filter is calculated using Bayes’ theorem: each hash function is an independent random variable, so the probability of a collision simultaneously occurring in all of the bit arrays is infinitesimal.

The key advantage of Bloom filters in Tarantool is that they’re easily configurable. The only parameter that can be specified separately for each index is called bloom_fpr (FPR stands for “false positive ratio”) and it has the default value of 0.05, which translates to a 5% FPR. Based on this parameter, Tarantool automatically creates Bloom filters of the optimal size for partial- key and full-key searches. The Bloom filters are stored in the .index file, along with the page index, and are cached in RAM.
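
A sketch of tightening the ratio for one index (the space and index names are arbitrary):

s = box.schema.space.create('accounts', {engine = 'vinyl'})
s:create_index('pk', {bloom_fpr = 0.01}) -- 1% FPR instead of the default 5%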

Caching

A lot of people think that caching is a silver bullet that can help with any performance issue. “When in doubt, add more cache”. In vinyl, caching is viewed rather as a means of reducing the overall workload and consequently, of getting a more stable response time for those requests that don’t hit the cache. vinyl boasts a unique type of cache among transactional systems called a “range tuple cache”. Unlike, say, RocksDB or MySQL, this cache doesn’t store pages, but rather ranges of index values obtained from disk, after having performed a compaction spanning all tree levels. This allows the use of caching for both single-key and key-range searches. Since this method of caching stores only hot data and not, say, pages (you may need only some data from a page), RAM is used in the most efficient way possible. The cache size is controlled by the vinyl_cache parameter.
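
A minimal sketch of setting the cache size (the value is arbitrary):

box.cfg{vinyl_cache = 256 * 1024 * 1024} -- a 256 Mb range tuple cache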

Garbage collection control

Chances are that by now you’ve started losing focus and need a well-deserved dopamine reward. Feel free to take a break, since working through the rest of the article is going to take some serious mental effort.

An LSM tree in vinyl is just a small piece of the puzzle. Even with a single table (or so-called “space”), vinyl creates and maintains several LSM trees, one for each index. But even a single index can consist of dozens of LSM trees. Let’s try to understand why this might be necessary.

Recall our example with a tree containing 100,000,000 records, 100 bytes each. As time passes, the lowest LSM level may end up holding a 10 Gb run. During compaction, a temporary run of approximately the same size will be created. Data at intermediate levels takes up some space as well, since the tree may store several operations associated with a single key. In total, storing 10 Gb of actual data may require up to 30 Gb of free space: 10 Gb for the last tree level, 10 Gb for a temporary run, and 10 Gb for the remaining data. But what if the data size is not 10 Gb, but 1 Tb? Requiring that the available disk space always be several times greater than the actual data size is financially unpractical, not to mention that it may take dozens of hours to create a 1 Tb run. And in the case of an emergency shutdown or system restart, the process would have to be started from scratch.

Here’s another scenario. Suppose the primary key is a monotonically increasing sequence—for example, a time series. In this case, most insertions will fall into the right part of the key range, so it wouldn’t make much sense to do a compaction just to append a few million more records to an already huge run.

But what if writes predominantly occur in a particular region of the key range, whereas most reads take place in a different region? How do you optimize the form of the LSM tree in this case? If it’s too high, read performance is impacted; if it’s too low—write speed is reduced.

Tarantool “factorizes” this problem by creating multiple LSM trees for each index. The approximate size of each subtree may be controlled by the vinyl_range_size configuration parameter. We call such subtrees “ranges”.
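
A minimal sketch (the value is arbitrary; vinyl_range_size is an ordinary box.cfg option):

box.cfg{vinyl_range_size = 512 * 1024 * 1024} -- target size of one subtree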

_images/factor_lsm.png


Factorizing large LSM trees via ranging

  • Ranges reflect a static layout of sorted runs
  • Slices connect a sorted run into a range

Initially, when the index has few elements, it consists of a single range. As more elements are added, its total size may exceed the maximum range size. In that case a special operation called “split” divides the tree into two equal parts. The tree is split at the middle element in the range of keys stored in the tree. For example, if the tree initially stores the full range of -inf…+inf, then after splitting it at the middle key X, we get two subtrees: one that stores the range of -inf…X, and the other storing the range of X…+inf. With this approach, we always know which subtree to use for writes and which one for reads. If the tree contained deletions and each of the neighboring ranges grew smaller as a result, the opposite operation called “coalesce” combines two neighboring trees into one.

Split and coalesce don’t entail a compaction, the creation of new runs, or other resource-intensive operations. An LSM tree is just a collection of runs. vinyl has a special metadata log that helps keep track of which run belongs to which subtree(s). This log has the .vylog extension, and its format is compatible with that of an .xlog file. Similarly to an .xlog file, the metadata log gets rotated at each checkpoint. To avoid the creation of extra runs with split and coalesce, we have also introduced an auxiliary entity called “slice”. It’s a reference to a run containing a key range and it’s stored only in the metadata log. Once the reference counter drops to zero, the corresponding file gets removed. When it’s necessary to perform a split or to coalesce, Tarantool creates slice objects for each new tree, removes older slices, and writes these operations to the metadata log, which literally stores records that look like this: <tree id, slice id> or <slice id, run id, min, max>.

This way all of the heavy lifting associated with splitting a tree into two subtrees is postponed until a compaction and then is performed automatically. A huge advantage of dividing all of the keys into ranges is the ability to independently control the L0 size as well as the dump and compaction processes for each subtree, which makes these processes manageable and predictable. Having a separate metadata log also simplifies the implementation of both “truncate” and “drop”. In vinyl, they’re processed instantly, since they only work with the metadata log, while garbage collection is done in the background.

Advanced features of vinyl
Upsert

In the previous sections, we mentioned only two operations stored by an LSM tree: deletion and replacement. Let’s take a look at how all of the other operations can be represented. An insertion can be represented via a replacement—you just need to make sure there are no other elements with the specified key. To perform an update, it’s necessary to read the older value from the tree, so it’s easier to represent this operation as a replacement as well—this speeds up future read requests by the key. Besides, an update must return the new value, so there’s no avoiding hidden reads.

In B-trees, the cost of hidden reads is negligible: to update a block, it first needs to be read from disk anyway. Creating a special update operation for an LSM tree that doesn’t cause any hidden reads is really tempting.

Such an operation must contain not only a default value to be inserted if a key has no value yet, but also a list of update operations to perform if a value does exist.

At transaction execution time, Tarantool just saves the operation in an LSM tree, then “executes” it later, during a compaction.

The upsert operation (a sketch follows the list below):

space:upsert(tuple, {{operator, field, value}, ... })
  • Non-reading update or insert
  • Delayed execution
  • Background upsert squashing prevents upserts from piling up
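
For example, a sketch that assumes a space ‘T’ whose tuples look like {id, name, counter}:

-- if key 1 is absent, inserts {1, 'noname', 10};
-- otherwise adds 5 to the third field, leaving the rest unchanged
box.space.T:upsert({1, 'noname', 10}, {{'+', 3, 5}})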

Unfortunately, postponing the operation execution until a compaction doesn’t leave much leeway in terms of error handling. That’s why Tarantool tries to validate upserts as fully as possible before writing them to an LSM tree. However, some checks are only possible with older data on hand, for example when the update operation is trying to add a number to a string or to remove a field that doesn’t exist.

A semantically similar operation exists in many products including PostgreSQL and MongoDB. But anywhere you look, it’s just syntactic sugar that combines the update and replace operations without avoiding hidden reads. Most probably, the reason is that LSM trees as data storage structures are relatively new.

Even though an upsert is a very important optimization and implementing it cost us a lot of blood, sweat, and tears, we must admit that it has limited applicability. If a table contains secondary keys or triggers, hidden reads can’t be avoided. But if you have a scenario where secondary keys are not required and the update following the transaction completion will certainly not cause any errors, then the operation is for you.

I’d like to tell you a short story about an upsert. It takes place back when vinyl was only beginning to “mature” and we were using an upsert in production for the first time. We had what seemed like an ideal environment for it: we had tons of keys, the current time was being used as values; update operations were inserting keys or modifying the current time; and we had few reads. Load tests yielded great results.

Nevertheless, after a couple of days, the Tarantool process started eating up 100% of our CPU, and the system performance dropped close to zero.

We started digging into the issue and found out that the distribution of requests across keys was significantly different from what we had seen in the test environment. It was…well, quite nonuniform. Most keys were updated once or twice a day, so the database was idle for the most part, but there were much hotter keys with tens of thousands of updates per day. Tarantool handled those just fine. But in the case of lookups by key with tens of thousands of upserts, things quickly went downhill. To return the most recent value, Tarantool had to read and “replay” the whole history consisting of all of the upserts. When designing upserts, we had hoped this would happen automatically during a compaction, but the process never even got to that stage: the L0 size was more than enough, so there were no dumps.

We solved the problem by adding a background process that performed readaheads on any keys that had more than a few dozen upserts piled up, so all those upserts were squashed and substituted with the read value.

Secondary keys

Update is not the only operation where optimizing hidden reads is critical. Even the replace operation, given secondary keys, has to read the older value: it needs to be independently deleted from the secondary indexes, and inserting a new element might not do this, leaving some garbage behind.

_images/secondary.png


If secondary indexes are not unique, then collecting “garbage” from them can be put off until a compaction, which is what we do in Tarantool. The append-only nature of LSM trees allowed us to implement full-blown serializable transactions in vinyl. Read-only requests use older versions of data without blocking any writes. The transaction manager itself is fairly simple for now: in classical terms, it implements the MVTO (multiversion timestamp ordering) class, whereby the winning transaction is the one that finished earlier. There are no locks and associated deadlocks. Strange as it may seem, this is a drawback rather than an advantage: with parallel execution, you can increase the number of successful transactions by simply making some of them wait on a lock when necessary. We’re planning to improve the transaction manager soon. In the current release, we focused on making the algorithm behave 100% correctly and predictably. For example, our transaction manager is one of the few on the NoSQL market that supports so-called “gap locks”.

Tarantool Cartridge

In this chapter, we explain how you can benefit from Tarantool Cartridge, a framework for developing, deploying, and managing applications based on Tarantool.

This chapter contains the following sections:

About Tarantool Cartridge

Tarantool Cartridge is the recommended alternative to the old-school practices of application development for Tarantool.

As a software development kit (SDK), Tarantool Cartridge provides you with utilities and templates to help:

  • easily set up a development environment for your applications;
  • plug the necessary Lua modules.

The resulting package can be installed and started on one or multiple servers as one or multiple instantiated services – independent or organized into a cluster.

Note

A Tarantool cluster is a collection of Tarantool instances acting in concert. While a single Tarantool instance can leverage the performance of a single server and is vulnerable to failure, the cluster spans multiple servers, utilizes their cumulative CPU power, and is fault-tolerant.

To fully utilize the capabilities of a Tarantool cluster, you need to develop applications keeping in mind they are to run in a cluster environment.

Further on, Tarantool Cartridge provides your cluster-aware applications with the following benefits:

  • horizontal scalability and load balancing via built-in automatic sharding;
  • asynchronous replication;
  • automatic failover;
  • centralized cluster control via GUI or API;
  • automatic configuration synchronization;
  • instance functionality segregation.

A Tarantool Cartridge cluster can segregate functionality between instances via built-in and custom (user-defined) cluster roles. You can toggle instances on and off on the fly during cluster operation. This allows you to put different types of workloads (e.g., compute- and transaction-intensive ones) on different physical servers with dedicated hardware.

Tarantool Cartridge developer’s guide

For a quick start, skip the details below and jump right away to this detailed guide to creating a cluster-aware Tarantool application.

For a deep dive into what you can do with Tarantool Cartridge, go on with this section.

To develop and start an application, in short, you need to go through the following steps:

  1. Install Tarantool Cartridge and other components of the development environment.
  2. Choose a template for the application and create a project.
  3. Develop the application. In case it is a cluster-aware application, implement its logic in a custom (user-defined) cluster role to initialize the database in a cluster environment.
  4. Deploy the application to target server(s). This includes configuring and starting the instance(s).
  5. In case it is a cluster-aware application, deploy the cluster.

The following sections provide details for each of these steps.

Installing Tarantool Cartridge

  1. Install cartridge-cli, a command-line tool for developing, deploying, and managing Tarantool applications:

    $ tarantoolctl rocks install cartridge-cli
    

    The Cartridge framework will come as a dependency when you create your project.

    Everything will be installed to .rocks/bin, so for convenient usage add .rocks/bin to the executable path:

    $ export PATH=$PWD/.rocks/bin/:$PATH
    
  2. Install git, a version control system.

  3. Install npm, a package manager for node.js.

  4. Install the unzip utility.

Application templates

Tarantool Cartridge provides you with two templates that help instantly set up the application development environment:

  • plain, for developing an application that runs on a single or multiple independent Tarantool instances (e.g. acting as a proxy to third-party databases) – that’s what you could do before, without Tarantool Cartridge, but now it’s more convenient.
  • cartridge, for developing a cluster-aware application – this is an exclusive feature of Tarantool Cartridge.

To create a project based on either template, in any directory say:

# plain application
$ cartridge create --name <app_name> --template plain /path/to/

# - OR -

# cluster application
$ cartridge create --name <app_name> /path/to/

This will automatically set up a Git repository in a new /path/to/<app_name>/ directory, tag it with version 0.1.0, and put the necessary files into it (read about default files for each template below).

In this Git repository, you can develop the application (by simply editing the default files provided by the template), plug the necessary modules, and then easily pack everything to deploy on your server(s).

Plain template

The plain template creates the <app_name>/ directory with the following contents:

  • <app_name>-scm-1.rockspec file where you can specify the application dependencies (see the sketch after this list).
  • deps.sh script that resolves dependencies from the .rockspec file.
  • init.lua file which is the entry point for your application.
  • .git directory necessary for a Git repository.
  • .gitignore file to ignore the unnecessary files.
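
For illustration, the rockspec might look like the following sketch; the package name and the dependency list are assumptions, not the template’s exact defaults:

-- <app_name>-scm-1.rockspec (a minimal sketch)
package = 'my_app'              -- hypothetical application name
version = 'scm-1'
source  = { url = '/dev/null' }
dependencies = {
    'lua >= 5.1',
    'checks == 3.0.1-1',        -- an example third-party rock
}
build = { type = 'none' }
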
Cluster template

In addition to the files listed in the plain template section, the cluster template contains the following:

  • env.lua file that sets common rock paths so that the application can be started from any directory.
  • custom-role.lua file that is a placeholder for a custom (user-defined) cluster role.

The entry point file (init.lua) of the cluster template differs from the plain one. Among other things, it loads the cartridge module and calls its initialization function:

...
local cartridge = require('cartridge')
...
cartridge.cfg({
  -- cartridge options example
  workdir = '/var/lib/tarantool/app',
  advertise_uri = 'localhost:3301',
  cluster_cookie = 'super-cluster-cookie',
  ...
}, {
  -- box options example
  memtx_memory = 1000000000,
  ...
})
...

The cartridge.cfg() call renders the instance operable via the administrative console but does not call box.cfg() to configure instances.

Warning

Calling the box.cfg() function is forbidden.

The cluster itself will do it for you when it is time to:

  • bootstrap the current instance once you:
    • run cartridge.bootstrap() via the administrative console, or
    • click Create in the web interface;
  • join the instance to an existing cluster once you:
    • run cartridge.join_server({uri = 'other_instance_uri'}) via the console, or
    • click Join (an existing replica set) or Create (a new replica set) in the web interface.

Notice that you can specify a cookie for the cluster (cluster_cookie parameter) if you need to run several clusters in the same network. The cookie can be any string value.

Before developing a cluster-aware application, familiarize yourself with the notion of cluster roles and make sure to define a custom role to initialize the database for the cluster application.

Cluster roles

A Tarantool Cartridge cluster segregates instance functionality in a role-based way. Cluster roles are Lua modules that implement some instance-specific functions and/or logic.

Since all instances running cluster applications use the same source code and are aware of all the defined roles (and plugged modules), multiple different roles can be dynamically enabled and disabled on any number of instances without restarts even during cluster operation.

Built-in roles

The cartridge module comes with two built-in roles that implement automatic sharding:

  • vshard-router that handles the vshard’s compute-intensive workload: routes requests to storage nodes.

  • vshard-storage that handles the vshard’s transaction-intensive workload: stores and manages a subset of a dataset.

    Note

    For more information on sharding, see the vshard module documentation.

With the built-in and custom roles, Tarantool Cartridge allows you to develop applications with separated compute and transaction handling. Later, the relevant workload-specific roles can be enabled on different instances running on physical servers with workload-dedicated hardware.

Neither vshard-router nor vshard-storage manages spaces, indexes, or formats. To start developing an application, edit the custom-role.lua placeholder file: add a box.schema.space.create() call to your first cluster role.

Additionally, you can implement several such roles to:

  • define stored procedures;
  • implement functionality on top of vshard;
  • go without vshard at all;
  • implement one or multiple supplementary services such as e-mail notifier, replicator, etc.
Custom roles

To implement a custom cluster role, do the following:

  1. Register the new role in the cluster by modifying the cartridge.cfg() call in the init.lua entry point file:

    ...
    local cartridge = require('cartridge')
    ...
    cartridge.cfg({
      workdir = ...,
      advertise_uri = ...,
      roles = {'custom-role'},
    })
    ...
    

    where custom-role is the name of the Lua module to be loaded.

  2. Implement the role in a file with the appropriate name (custom-role.lua). For example:

    #!/usr/bin/env tarantool
    -- Custom role implementation
    local role_name = 'custom-role'
    
    local function init()
    ...
    end
    
    local function stop()
    ...
    end
    
    return {
        role_name = role_name,
        init = init,
        stop = stop,
    }
    

    The role_name may differ from the module name passed to the cartridge.cfg() function. If the role_name variable is not specified, the module name is used by default.

    Note

    Role names must be unique as it is impossible to register multiple roles with the same name.

The role module does not have required functions but the cluster may execute the following ones during the role’s life cycle:

  • init() is the role’s initialization function.

    Inside the function’s body you can call any box functions: create spaces, indexes, grant permissions, etc. Here is what the initialization function may look like:

    local function init(opts)
        -- The cluster passes an 'opts' Lua table containing an 'is_master' flag.
        if opts.is_master then
            local customer = box.schema.space.create('customer',
                { if_not_exists = true }
            )
            customer:format({
                {'customer_id', 'unsigned'},
                {'bucket_id', 'unsigned'},
                {'name', 'string'},
            })
            customer:create_index('customer_id', {
                parts = {'customer_id'},
                if_not_exists = true,
            })
        end
    end
    

    Note

    The function’s body is wrapped in a conditional statement that lets you call box functions on masters only. This protects against replication collisions as data propagates to replicas automatically.

  • stop() is the role’s termination function. Implement it if initialization starts a fiber that has to be stopped or does any job that has to be undone on termination (see the sketch after this list).

  • validate_config() and apply_config() are validation and application functions that make custom roles configurable. Implement them if some configuration data has to be stored cluster-wide.
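
For instance, a minimal sketch of the init()/stop() pair for a role that runs a background fiber might look like this (the fiber’s job is an assumption):

local fiber = require('fiber')

local watchdog = nil -- the background fiber started by init()

local function init(opts)
    watchdog = fiber.create(function()
        while true do
            -- do some periodic background job here
            fiber.sleep(1)
        end
    end)
end

local function stop()
    -- undo what init() did: terminate the background fiber
    if watchdog ~= nil then
        watchdog:cancel()
        watchdog = nil
    end
end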

Next, get a grip on the role’s life cycle to implement the necessary functions.

Defining role dependencies

You can instruct the cluster to apply some other roles if your custom role is enabled.

For example:

-- Role dependencies defined in custom-role.lua
local role_name = 'custom-role'
...
return {
    role_name = role_name,
    dependencies = {'cartridge.roles.vshard-router'},
    ...
}

Here, the vshard-router role will be initialized automatically on every instance with custom-role enabled.

Using multiple vshard storage groups

Replica sets with vshard-storage roles can belong to different groups. For example, hot or cold groups meant to independently process hot and cold data.

Groups are specified in the cluster’s configuration:

cartridge.cfg({
    vshard_groups = {'hot', 'cold'},
    ...
})

If no groups are specified, the cluster assumes that all replica sets belong to the default group.

With multiple groups enabled, every replica set with a vshard-storage role enabled must be assigned to a particular group. The assignment can never be changed.

Another limitation is that you cannot add groups dynamically (this will become available in a future version).

Finally, mind the new syntax for router access. Every instance with a vshard-router role enabled initializes multiple routers. All of them are accessible through the role:

local router_role = cartridge.service_get('vshard-router')
router_role.get('hot'):call(...)

If you have no groups specified, you can access the static router as before:

local vshard = require('vshard')
vshard.router.call(...)

However, when using the new API, you must call a static router with a colon:

local router_role = cartridge.service_get('vshard-router')
local default_router = router_role.get() -- or router_role.get('default')
default_router:call(...)
Role’s life cycle and the order of function execution

The cluster displays all custom role names along with the built-in vshard ones in the web interface. Cluster administrators can enable and disable them for particular instances either via the web interface or cluster public API. For example:

cartridge.admin.edit_replicaset('replicaset-uuid', {roles = {'vshard-router', 'custom-role'}})

If multiple roles are enabled on an instance at the same time, the cluster first initializes the built-in roles (if any) and then the custom ones (if any) in the order the latter were listed in cartridge.cfg().

If a custom role has dependent roles, the dependencies are registered and validated first, prior to the role itself.

The cluster calls the role’s functions in the following circumstances:

  • The init() function, typically, once: either when the role is enabled by the administrator or at the instance restart. Enabling a role once is normally enough.
  • The stop() function – only when the administrator disables the role, not on instance termination.
  • The validate_config() function, first, before the automatic box.cfg() call (database initialization), then – upon every configuration update.
  • The apply_config() function upon every configuration update.

Hence, if the cluster is tasked with performing the following actions, it will execute the functions listed in the following order:

  • Join an instance or create a replica set, both with an enabled role:
    1. validate_config()
    2. init()
    3. apply_config()
  • Restart an instance with an enabled role:
    1. validate_config()
    2. init()
    3. apply_config()
  • Disable role: stop().
  • Upon the cartridge.confapplier.patch_clusterwide() call:
    1. validate_config()
    2. apply_config()
  • Upon a triggered failover:
    1. validate_config()
    2. apply_config()

Considering the described behavior:

  • The init() function may:
    • Call box functions.
    • Start a fiber and, in this case, the stop() function should take care of the fiber’s termination.
    • Configure the built-in HTTP server.
    • Execute any code related to the role’s initialization.
  • The stop() function must undo any job that has to be undone on the role’s termination.
  • The validate_config() function must validate any configuration change.
  • The apply_config() function may execute any code related to a configuration change, e.g., take care of an expirationd fiber.

The validation and application functions together allow you to customize the cluster-wide configuration as described in the next section.

Configuring custom roles

You can:

  • Store configurations for your custom roles as sections in cluster-wide configuration, for example:

    # YAML configuration file
    my_role:
      notify_url: "https://localhost:8080"
    
    -- init.lua file
    local notify_url = 'http://localhost'
    function my_role.apply_config(conf, opts)
      local my_role_conf = conf['my_role'] or {}
      notify_url = my_role_conf.notify_url or 'default'
    end
    
  • Download and upload cluster-wide configuration using cluster UI or API (via GET/PUT queries to admin/config endpoint like curl localhost:8081/admin/config and curl -X PUT -d "{'my_parameter': 'value'}" localhost:8081/admin/config).

  • Utilize it in your role apply_config() function.

Every instance in the cluster stores a copy of the configuration file in its working directory (configured by cartridge.cfg({workdir = ...})):

  • /var/lib/tarantool/<instance_name>/config.yml for instances deployed from RPM packages and managed by systemd.
  • /home/<username>/tarantool_state/var/lib/tarantool/config.yml for instances deployed from archives.

The cluster’s configuration is a Lua table, downloaded and uploaded as YAML. If some application-specific configuration data, e.g., a database schema as defined by DDL (data definition language), has to be stored on every instance in the cluster, you can implement your own API by adding a custom section to the table. The cluster will help you spread it safely across all instances.

Such a section goes in parallel (in the same file) with the topology-specific and vshard-specific sections the cluster generates automatically. Unlike the generated sections, you have to define the modification, validation, and application logic for the custom section yourself.

The common way is to define two functions:

  • validate_config(conf_new, conf_old) to validate changes made in the new configuration (conf_new) versus the old configuration (conf_old).
  • apply_config(conf, opts) to execute any code related to a configuration change. As input, this function takes the configuration to apply (conf, which is actually the new configuration that you validated earlier with validate_config()) and options (the opts argument that includes is_master, a Boolean flag described later).

Important

The validate_config() function must detect all configuration problems that may lead to apply_config() errors. For more information, see the next section.

If your validation and application functions call box functions for some reason, the following precautions apply:

  • Due to the role’s life cycle, the cluster does not guarantee an automatic box.cfg() call prior to calling validate_config().

    If the validation function is to call any box functions (e.g., to check a format), make sure the calls are wrapped in a protective conditional statement that checks if box.cfg() has already happened:

    -- Inside the validation function:
    
    if type(box.cfg) == 'table' then
    
        -- Here you can call box functions
    
    end
    
  • Unlike the validation function, and similar to the initialization function, apply_config() can call box functions freely, as the cluster applies the custom configuration after the automatic box.cfg() call.

    However, creating spaces, users, etc., can cause replication collisions when performed on both master and replica instances simultaneously. The appropriate way is to call such box functions on masters only and let the changes propagate to replicas automatically.

    Upon the apply_config(conf, opts) execution, the cluster passes an is_master flag in the opts table which you can use to wrap collision-inducing box functions in a protective conditional statement:

    -- Inside the configuration application function:
    
    if opts.is_master then
    
        -- Here you can call box functions
    
    end
    
Custom configuration example

Consider the following code as part of the role’s module (custom-role.lua) implementation:

#!/usr/bin/env tarantool
-- Custom role implementation

local cartridge = require('cartridge')

local role_name = 'custom-role'

-- Modify the config by implementing some setter (an alternative to HTTP PUT)
local function set_secret(secret)
    local custom_role_cfg = cartridge.confapplier.get_deepcopy(role_name) or {}
    custom_role_cfg.secret = secret
    cartridge.confapplier.patch_clusterwide({
        [role_name] = custom_role_cfg,
    })
end
-- Validate
local function validate_config(cfg)
    local custom_role_cfg = cfg[role_name] or {}
    if custom_role_cfg.secret ~= nil then
        assert(type(custom_role_cfg.secret) == 'string', 'custom-role.secret must be a string')
    end
    return true
end
-- Apply
local function apply_config(cfg)
    local custom_role_cfg = cfg[role_name] or {}
    local secret = custom_role_cfg.secret or 'default-secret'
    -- Make use of it
end

return {
    role_name = role_name,
    set_secret = set_secret,
    validate_config = validate_config,
    apply_config = apply_config,
}

Once the configuration is customized, apply it as described in the next section.

Applying custom role’s configuration

With the implementation shown in the example above, you can call the set_secret() function to apply the new configuration via the administrative console – or via an HTTP endpoint, if the role exports one.
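
For example, assuming the role module can be required by its name on the instance (an assumption of this sketch), you could apply a new secret from the administrative console:

-- in the administrative console of any instance
local my_role = require('custom-role') -- the module implemented above
my_role.set_secret('new-secret-value') -- triggers the commit described below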

The set_secret() function calls cartridge.confapplier.patch_clusterwide() which performs a two-phase commit:

  1. It patches the active configuration in memory: copies the table and replaces the "custom-role" section in the copy with the one given by the set_secret() function.
  2. The cluster checks if the new configuration can be applied on all instances except disabled and expelled ones. All instances subject to update must be healthy and alive according to the membership module.
  3. (Preparation phase) The cluster propagates the patched configuration. Every instance validates it with the validate_config() function of every registered role. Depending on the validation’s result:
    • If successful (i.e., returns true), the instance saves the new configuration to a temporary file named config.prepare.yml within the working directory.
    • (Abort phase) Otherwise, the instance reports an error and all other instances roll back the update: remove the file they may have already prepared.
  4. (Commit phase) Upon successful preparation of all instances, the cluster commits the changes. Every instance:
    1. Creates a hard link of the active configuration.
    2. Atomically replaces the active configuration with the prepared one. The replacement is indivisible – it can either succeed or fail entirely, never partially.
    3. Calls the apply_config() function of every registered role.

If any of these steps fails, an error pops up in the web interface next to the corresponding instance. The cluster does not handle such errors automatically; they require manual repair.

You can avoid manual repair if the validate_config() function detects all configuration problems that may lead to apply_config() errors.

Using the built-in HTTP server

The cluster launches an httpd server instance during initialization (cartridge.cfg()). You can bind a port to the instance via an environment variable:

-- Get the port from an environmental variable or the default one:
local http_port = os.getenv('HTTP_PORT') or '8080'

local ok, err = cartridge.cfg({
   ...
   -- Pass the port to the cluster:
   http_port = http_port,
   ...
})

To make use of the httpd instance, access it and configure routes inside the init() function of some role, e.g. a role that exposes API over HTTP:

local function init(opts)

...

   -- Get the httpd instance:
   local httpd = cartridge.service_get('httpd')
   if httpd ~= nil then
       -- Configure a route to, for example, metrics
       -- ('stat' here stands for an application-specific module, not shown):
       httpd:route({
               method = 'GET',
               path = '/metrics',
               public = true,
           },
           function(req)
               return req:render({json = stat.stat()})
           end
       )
   end
end

For more information on the usage of Tarantool’s HTTP server, see its documentation.

Implementing authorization in the web interface

To implement authorization in the web interface of every instance in a Tarantool cluster:

  1. Implement a new, say, auth module with a check_password function. It should check the credentials of any user trying to log in to the web interface.

    The check_password function accepts a username and password and returns an authentication success or failure.

    -- auth.lua
    
    -- Add a function to check the credentials
    local function check_password(username, password)

        -- Check the credentials any way you like;
        -- here a hard-coded pair stands in for a real check (example only)
        local ok = (username == 'admin' and password == 'secret')

        -- Return an authentication success or failure
        if not ok then
            return false
        end
        return true
    end
    ...
    
  2. Pass the implemented auth module name as a parameter to cartridge.cfg(), so the cluster can use it:

    -- init.lua
    
    local ok, err = cartridge.cfg({
        auth_backend_name = 'auth',
        -- The cluster will automatically call 'require()' on the 'auth' module.
        ...
    })
    

    This adds a Log in button to the upper right corner of the web interface but still lets unauthenticated users interact with the interface. This is convenient for testing.

    Note

    Also, to authorize requests to cluster API, you can use the HTTP basic authorization header.

  3. To require the authorization of every user in the web interface even before the cluster bootstrap, add the following line:

    -- init.lua
    
    local ok, err = cartridge.cfg({
        auth_backend_name = 'auth',
        auth_enabled = true,
        ...
    })
    

    With authentication enabled and the auth module implemented, the user will not be able even to bootstrap the cluster without logging in. After a successful login and bootstrap, authentication can be enabled and disabled cluster-wide in the web interface, and the auth_enabled parameter is ignored.

Application versioning

Tarantool Cartridge understands semantic versioning as described at semver.org. When developing an application, create new Git branches and tag them appropriately. These tags are used to calculate version increments for subsequent packaging.

For example, if your application has version 1.2.1, tag your current branch with 1.2.1 (annotated or not).

To retrieve the current version from Git, say:

$ git describe --long --tags
1.2.1-12-g74864f2

This output shows that we are 12 commits after the version 1.2.1. If we are to package the application at this point, it will have a full version of 1.2.1-12 and its package will be named <app_name>-1.2.1-12.rpm.

Non-semantic tags are prohibited. You will not be able to create a package from a branch with the latest tag being non-semantic.

Once you package your application, the version is saved in a VERSION file in the package root.

Using .cartridge-cli.ignore files

You can add a .cartridge-cli.ignore file to your application repository to exclude particular files and/or directories from package builds.

For the most part, the logic is similar to that of .gitignore files. The major difference is that in .cartridge-cli.ignore files the order of exceptions relative to the rest of the patterns does not matter, while in .gitignore files the order does matter.

.cartridge-cli.ignore entry   Ignores
target/                       every folder (due to the trailing /) named target, recursively
target                        every file or folder named target, recursively
/target                       every file or folder named target in the top-most directory (due to the leading /)
/target/                      every folder named target in the top-most directory (leading and trailing /)
*.class                       every file or folder ending with .class, recursively
#comment                      nothing, this is a comment (the first character is a #)
\#comment                     every file or folder named #comment (\ for escaping)
target/logs/                  every folder named logs which is a subdirectory of a folder named target
target/*/logs/                every folder named logs two levels under a folder named target (* doesn’t include /)
target/**/logs/               every folder named logs somewhere under a folder named target (** includes /)
*.py[co]                      every file or folder ending in .pyc or .pyo; however, it doesn’t match .py!
*.py[!co]                     every file or folder ending in anything other than c or o
*.file[0-9]                   every file or folder ending in a digit
*.file[!0-9]                  every file or folder ending in anything other than a digit
*                             every file or folder, recursively
/*                            everything in the top-most directory (due to the leading /)
**/*.tar.gz                   every *.tar.gz file or folder which is one or more levels under the starting folder
!file                         nothing: a file or folder named file will NOT be ignored even if it matches other patterns (! marks an exception)

Deploying an application

You have four options to deploy a Tarantool Cartridge application:

  • as an rpm package (for production);
  • as a deb package (for production);
  • as a tar+gz archive (for testing, or as a workaround for production if root access is unavailable);
  • from sources (for local testing only).
Deploying as an rpm or deb package
  1. Pack the application into a distributable:

    $ cartridge pack rpm /path/to/<app_name>
    # -- OR --
    $ cartridge pack deb /path/to/<app_name>
    

    This will create an RPM package (e.g. ./my_app-0.1.0-1.rpm) or a DEB package (e.g. ./my_app-0.1.0-1.deb).

  2. Upload the package to the target servers, which must support systemctl.

  3. Install:

    $ yum install APP_NAME-VERSION.rpm
    # -- OR --
    $ dpkg -i APP_NAME-VERSION.deb
    
  4. Configure the instance(s).

  5. Start Tarantool instances with the corresponding services. You can do it using systemctl, for example:

    # starts a single instance
    $ systemctl start my_app
    
    # starts multiple instances
    $ systemctl start my_app@router
    $ systemctl start my_app@storage_A
    $ systemctl start my_app@storage_B
    
  6. In case it is a cluster-aware application, proceed to deploying the cluster.

Deploying as a tar+gz archive
  1. Pack the application into a distributable:

    $ cartridge pack tgz /path/to/<app_name>
    

    This will create a tar+gz archive (e.g. ./my_app-0.1.0-1.tgz).

  2. Upload the archive to the target servers, which must have tarantool and (optionally) cartridge-cli installed.

  3. Extract the archive:

    $ tar -xzvf APP_NAME-VERSION.tgz
    
  4. Configure the instance(s).

  5. Start Tarantool instance(s). You can do it using:

    • tarantool, for example:

      $ tarantool init.lua # starts a single instance
      
    • or cartridge, for example:

      # in application directory
      $ cartridge start # starts all instances
      $ cartridge start .router_1 # starts a single instance
      
      # in multi-application environment
      $ cartridge start my_app # starts all instances of my_app
      $ cartridge start my_app.router # starts a single instance
      
  6. In case it is a cluster-aware application, proceed to deploying the cluster.

Deploying from sources

This deployment method is intended for local testing only.

  1. Pull all dependencies to the .rocks directory:

    $ tarantoolctl rocks make

  2. Configure the instance(s).

  3. Start Tarantool instance(s). You can do it using:

    • tarantool, for example:

      $ tarantool init.lua # starts a single instance
      
    • or cartridge, for example:

      # in application directory
      cartridge start # starts all instances
      cartridge start .router_1 # starts a single instance
      
      # in multi-application environment
      cartridge start my_app # starts all instances of my_app
      cartridge start my_app.router # starts a single instance
      
  4. In case it is a cluster-aware application, proceed to deploying the cluster.

Configuring instances

Instance configuration includes two sets of parameters:

  • cartridge.cfg() parameters;
  • Tarantool’s box.cfg() parameters.

You can set any of these parameters in:

  1. Command-line arguments.
  2. Environment variables.
  3. A YAML configuration file.
  4. The init.lua file.

The order here indicates the priority: command-line arguments override environment variables, and so forth.

No matter how you start the instances, you need to set the following cartridge.cfg() parameters for each instance:

  • advertise_uri – either <HOST>:<PORT>, or <HOST>:, or <PORT>. Used by other instances to connect to the current one. DO NOT specify 0.0.0.0 – this must be an external IP address, not a socket bind.
  • http_port – port to open administrative web interface and API on. Defaults to 8081. To disable it, specify "http_enabled": False.
  • workdir – a directory where all data will be stored: snapshots, WAL logs, and the cartridge configuration file. Defaults to . (the current directory).

If you start instances using cartridge CLI or systemctl, save the configuration as a YAML file, for example:

my_app.router: {"advertise_uri": "localhost:3301", "http_port": 8080}
my_app.storage_A: {"advertise_uri": "localhost:3302", "http_enabled": False}
my_app.storage_B: {"advertise_uri": "localhost:3303", "http_enabled": False}

With cartridge CLI, you can pass the path to this file as the --cfg command-line argument to the cartridge start command – or specify the path in cartridge CLI configuration (in ./.cartridge.yml or ~/.cartridge.yml):

cfg: cartridge.yml
run_dir: tmp/run
apps_path: /usr/local/share/tarantool

With systemctl, save the YAML file to /etc/tarantool/conf.d/ (the default systemd path) or to a location set in the TARANTOOL_CFG environment variable.

If you start instances with tarantool init.lua, you need to pass other configuration options as command-line parameters and environment variables, for example:

$ tarantool init.lua --alias router --memtx-memory 100 --workdir "~/db/3301" --advertise_uri "localhost:3301" --http_port "8080"

Starting/stopping instances

Depending on your deployment method, you can start/stop the instances using tarantool, cartridge CLI, or systemctl.

Start/stop using tarantool

With tarantool, you can start only a single instance:

$ tarantool init.lua # the simplest command

You can also specify more options on the command line or in environment variables.

To stop the instance, use Ctrl+C.

Start/stop using cartridge CLI

With cartridge CLI, you can start one or multiple instances:

$ cartridge start [APP_NAME[.INSTANCE_NAME]] [options]

The options are:

--script FILE

Application’s entry point. Defaults to:

  • TARANTOOL_SCRIPT, or
  • ./init.lua when running from the app’s directory, or
  • :apps_path/:app_name/init.lua in a multi-app environment.
--apps_path PATH
Path to apps directory when running in a multi-app environment. Defaults to /usr/share/tarantool.
--run_dir DIR
Directory with pid and sock files. Defaults to TARANTOOL_RUN_DIR or /var/run/tarantool.
--cfg FILE
Cartridge instances YAML configuration file. Defaults to TARANTOOL_CFG or ./instances.yml.
--foreground
Do not daemonize.

For example:

cartridge start my_app --cfg demo.yml --run_dir ./tmp/run --foreground

This starts, in the foreground, all Tarantool instances of my_app listed in the given configuration file, using the specified run directory.

When APP_NAME is not provided, cartridge parses it from the ./*.rockspec filename.

When INSTANCE_NAME is not provided, cartridge reads the cfg file and starts all defined instances:

# in application directory
cartridge start # starts all instances
cartridge start .router_1 # start single instance

# in multi-application environment
cartridge start my_app # starts all instances of my_app
cartridge start my_app.router # start a single instance

To stop the instances, say:

$ cartridge stop [APP_NAME[.INSTANCE_NAME]] [options]

These options from the cartridge start command are supported:

  • --run_dir DIR
  • --cfg FILE
Start/stop using systemctl
  • To run a single instance:

    $ systemctl start APP_NAME
    

    This will start a systemd service that will listen to the port specified in instance configuration (http_port parameter).

  • To run multiple instances on one or multiple servers:

    $ systemctl start APP_NAME@INSTANCE_1
    $ systemctl start APP_NAME@INSTANCE_2
    ...
    $ systemctl start APP_NAME@INSTANCE_N
    

    where APP_NAME@INSTANCE_N is the instantiated service name for systemd, with an incremental N – a number, unique for every instance, determining the port the instance will listen on (e.g., 3301, 3302, etc.)

  • To stop all services on a server, use the systemctl stop command and specify instance names one by one. For example:

    $ systemctl stop APP_NAME@INSTANCE_1 APP_NAME@INSTANCE_2 ... APP_NAME@INSTANCE_<N>
    

Tarantool Cartridge administrator’s guide

This guide explains how to deploy and manage a Tarantool cluster with Tarantool Cartridge.

Note

For more information on managing Tarantool instances, see the Server administration section.

Before deploying the cluster, familiarize yourself with the notion of cluster roles and deploy Tarantool instances according to the desired cluster topology.

Deploying the cluster

To deploy the cluster, first, configure your Tarantool instances according to the desired cluster topology, for example:

my_app.router: {"advertise_uri": "localhost:3301", "http_port": 3301, "workdir": "./tmp/router"}
my_app.storage_A_master: {"advertise_uri": "localhost:3302", "http_enabled": False, "workdir": "./tmp/storage-a-master"}
my_app.storage_A_replica: {"advertise_uri": "localhost:3303", "http_enabled": False, "workdir": "./tmp/storage-a-replica"}
my_app.storage_B_master: {"advertise_uri": "localhost:3304", "http_enabled": False, "workdir": "./tmp/storage-b-master"}
my_app.storage_B_replica: {"advertise_uri": "localhost:3305", "http_enabled": False, "workdir": "./tmp/storage-b-replica"}

Then start the instances, for example using cartridge CLI:

cartridge start my_app --cfg demo.yml --run_dir ./tmp/run --foreground

And bootstrap the cluster. You can do this via the Web interface which is available at http://<instance_hostname>:<instance_http_port> (in this example, http://localhost:3301).

In the web interface, do the following:

  1. Depending on the authentication state:

    • If enabled (in production), enter your credentials and click Login:

      _images/auth_creds-border-5px.png
    • If disabled (for easier testing), simply proceed to configuring the cluster.

  2. Click Configure next to the first unconfigured server to create the first replica set – solely for the router (intended for compute-intensive workloads).

    _images/unconfigured-router-border-5px.png

     

    In the pop-up window, check the vshard-router role – or any custom role that has vshard-router as a dependency (in this example, this is a custom role named app.roles.api).

    (Optional) Specify a display name for the replica set, for example router.

    _images/create-router-border-5px.png

     

    Note

    As described in the built-in roles section, it is a good practice to enable workload-specific cluster roles on instances running on physical servers with workload-specific hardware.

    Click Create replica set and see the newly-created replica set in the web interface:

    _images/router-replica-set-border-5px.png

     

    Warning

    Be careful: after an instance joins a replica set, you CAN NOT revert this or make the instance join any other replica set.

  3. Create another replica set – for a master storage node (intended for transaction-intensive workloads).

    Check the vshard-storage role – or any custom role that has vshard-storage as a dependency (in this example, this is a custom role named app.roles.storage).

    (Optional) Check a specific group, for example hot. Replica sets with vshard-storage roles can belong to different groups. In our example, these are hot or cold groups meant to process hot and cold data independently. These groups are specified in the cluster’s configuration file; by default, a cluster has no groups.

    (Optional) Specify a display name for the replica set, for example hot-storage.

    Click Create replica set.

    _images/create-storage-border-5px.png
  4. (Optional) If required by topology, populate the second replica set with more storage nodes:

    1. Click Configure next to another unconfigured server dedicated for transaction-intensive workloads.

    2. Click Join Replica Set tab.

    3. Select the second replica set, and click Join replica set to add the server to it.

      _images/join-storage-border-5px.png
  5. Depending on cluster topology:

    • add more instances to the first or second replica sets, or
    • create more replica sets and populate them with instances meant to handle a specific type of workload (compute or transactions).

    For example:

    _images/final-cluster-border-5px.png
  6. (Optional) By default, all new vshard-storage replica sets get a weight of 1 before the vshard bootstrap in the next step.

    Note

    In case you add a new replica set after vshard bootstrap, as described in the topology change section, it will get a weight of 0 by default.

    To make different replica sets store different numbers of buckets, click Edit next to a replica set, change its default weight, and click Save:

    _images/change-weight-border-5px.png

     

    For more information on buckets and replica set’s weights, see the vshard module documentation.

  7. Bootstrap vshard by clicking the corresponding button, or by saying cartridge.admin.bootstrap_vshard() via the administrative console.

    This command creates virtual buckets and distributes them among storages.

    From now on, all cluster configuration can be done via the web interface.

Updating the configuration

Cluster configuration is specified in a YAML configuration file. This file includes cluster topology and role descriptions.

All instances in a Tarantool cluster have the same configuration. To this end, every instance stores a copy of the configuration file, and the cluster keeps these copies in sync: as you submit an updated configuration in the web interface, the cluster validates it (and rejects inappropriate changes) and distributes it automatically across the cluster.

To update the configuration:

  1. Click the Configuration files tab.

  2. (Optional) Click Download to get hold of the current configuration file.

  3. Update the configuration file.

    You can add/change/remove any sections except system ones: topology, vshard, and vshard_groups.

    To remove a section, simply remove it from the configuration file.

  4. Compress the configuration file as a .zip archive and click the Upload configuration button to upload it.

    You will see a message in the lower part of the screen saying whether the configuration was uploaded successfully, and an error description if the new configuration was not applied.

Managing the cluster

This chapter explains how to:

  • change the cluster topology,
  • enable automatic failover,
  • switch the replica set’s master manually,
  • deactivate replica sets, and
  • expel instances.
Changing the cluster topology

Upon adding a newly deployed instance to a new or existing replica set:

  1. The cluster validates the configuration update by checking if the new instance is available using the membership module.

    Note

    The membership module works over the UDP protocol and can operate before the box.cfg function is called.

    All the nodes in the cluster must be healthy for the validation to succeed.

  2. The new instance waits until another instance in the cluster receives the configuration update and discovers it, again, using the membership module. At this step, the new instance does not have a UUID yet.

  3. Once the instance realizes its presence is known to the cluster, it calls the box.cfg function and starts living its life.

    For more information, see the box.cfg submodule reference.

An optimal strategy for connecting new nodes to the cluster is to deploy a new zero-weight replica set instance by instance, and then increase the weight. Once the weight is updated and all cluster nodes are notified of the configuration change, buckets start migrating to new nodes.

To populate the cluster with more nodes, do the following:

  1. Deploy new Tarantool instances as described in the deployment section.

    If new nodes do not appear in the Web interface, click Probe server and specify their URIs manually.

    _images/probe-server-border-5px.png

    If a node is accessible, it will appear in the list.

  2. In the Web interface:

    • Create a new replica set with one of the new instances: click Configure next to an unconfigured server, check the necessary roles, and click Create replica set:

      Note

      In case you are adding a new vshard-storage instance, remember that all such instances get a 0 weight by default after the vshard bootstrap which happened during the initial cluster deployment.

      _images/zero-border-5px.png
    • Or add the instances to existing replica sets: click Configure next to an unconfigured server, click Join replica set tab, select a replica set, and click Join replica set.

    If necessary, repeat this for more instances to reach the desired redundancy level.

  3. In case you are deploying a new vshard-storage replica set, populate it with data when you are ready: click Edit next to the replica set in question, increase its weight, and click Save to start data rebalancing.

As an alternative to the web interface, you can view and change cluster topology via GraphQL. The cluster’s endpoint for serving GraphQL queries is /admin/api. You can use any third-party GraphQL client like GraphiQL or Altair.

Examples:

  • listing all servers in the cluster:

    query {
        servers { alias uri uuid }
    }
    
  • listing all replica sets with their servers:

    query {
        replicasets {
            uuid
            roles
            servers { uri uuid }
        }
    }
    
  • joining a server to a new replica set with a storage role enabled:

    mutation {
        join_server(
            uri: "localhost:33003"
            roles: ["vshard-storage"]
        )
    }
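
Some of these operations can presumably also be scripted from the administrative console through the cartridge.admin API used earlier in this guide (a sketch; probe_server and the weight field are assumptions about this API version):

local cartridge = require('cartridge')

-- probe a server that did not appear in the web interface
cartridge.admin.probe_server('localhost:3310')

-- increase a replica set's weight to start data rebalancing
cartridge.admin.edit_replicaset('replicaset-uuid', {weight = 1})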
    
Data rebalancing

Rebalancing (resharding) is initiated periodically and upon adding a new replica set with a non-zero weight to the cluster. For more information, see the rebalancing process section of the vshard module documentation.

The most convenient way to trace through the process of rebalancing is to monitor the number of active buckets on storage nodes. Initially, a newly added replica set has 0 active buckets. After a few minutes, the background rebalancing process begins to transfer buckets from other replica sets to the new one. Rebalancing continues until the data is distributed evenly among all replica sets.

To monitor the current number of buckets, connect to any Tarantool instance over the administrative console, and say:

tarantool> vshard.storage.info().bucket
---
- receiving: 0
  active: 1000
  total: 1000
  garbage: 0
  sending: 0
...

The number of buckets may be increasing or decreasing depending on whether the rebalancer is migrating buckets to or from the storage node.

For more information on the monitoring parameters, see the monitoring storages section.

Deactivating replica sets

To deactivate an entire replica set (e.g., to perform maintenance on it) means to move all of its buckets to other sets.

To deactivate a set, do the following:

  1. Click Edit next to the set in question.

  2. Set its weight to 0 and click Save:

    _images/zero-weight-border-5px.png
  3. Wait for the rebalancing process to finish migrating all the set’s buckets away. You can monitor the current bucket number as described in the data rebalancing section.

Expelling instances

Once an instance is expelled, it can never participate in the cluster again as every instance will reject it.

To expel an instance, click the … button next to it, then click Expel server and Expel:

_images/expelling-instance-border-5px.png
Enabling automatic failover

In a master-replica cluster configuration with automatic failover enabled, if the user-specified master of any replica set fails, the cluster automatically chooses the next replica from the priority list and grants it the active master role (read/write). When the failed master comes back online, its role is restored and the active master, again, becomes a replica (read-only). This works for any roles.

To set the priority in a replica set:

  1. Click Edit next to the replica set in question.

  2. Scroll to the bottom of the Edit replica set box to see the list of servers.

  3. Drag replicas to their place in the priority list, and click Save:

    _images/failover-priority-border-5px.png

The failover is disabled by default. To enable it:

  1. Click Failover:

    _images/failover-border-5px.png
  2. In the Failover control box, click Enable:

    _images/failover-control-border-5px.png

The failover status will change to enabled:

_images/enabled-failover-border-5px.png

 

For more information, see the replication section.

Switching the replica set’s master

To manually switch the master in a replica set:

  1. Click the Edit button next to the replica set in question:

    _images/edit-replica-set-border-5px.png
  2. Scroll to the bottom of the Edit replica set box to see the list of servers. The server on the top is the master.

    _images/switch-master-border-5px.png
  3. Drag a required server to the top position and click Save.

The new master will automatically enter the read/write mode, while the ex-master will become read-only. This works for any roles.

Managing users

On the Users tab, you can enable/disable authentication as well as add, remove, edit, and view existing users who can access the web interface.

_images/users-tab-border-5px.png

 

Notice that the Users tab is available only if authorization in the web interface is implemented.

Also, some features (like deleting users) can be disabled in the cluster configuration; this is regulated by the auth_backend_name option passed to cartridge.cfg().

Resolving conflicts

Tarantool has an embedded mechanism for asynchronous replication. As a consequence, records are distributed among the replicas with a delay, so conflicts can arise.

To prevent conflicts, the special trigger space.before_replace is used. It is executed every time before making changes to the space for which it was configured. The trigger function is implemented in the Lua programming language. This function takes the original and new values of the tuple to be modified as its arguments. The returned value of the function is used to change the result of the operation: this will be the new value of the modified tuple.

For insert operations, the old value is absent, so nil is passed as the first argument.

For delete operations, the new value is absent, so nil is passed as the second argument. The trigger function can also return nil, thus turning this operation into delete.

This example shows how to use the space.before_replace trigger to prevent replication conflicts. Suppose we have a box.space.test space that is modified by multiple replicas at the same time. We store one payload field in this space. To ensure consistency, we also store the last modification time in each tuple of this space and set the space.before_replace trigger, which gives preference to newer tuples. Below is the code in Lua:

fiber = require('fiber')
-- define a function that will modify the tuple
function test_replace(tuple)
        -- add a timestamp to each tuple in the space
        tuple = box.tuple.new(tuple):update{{'!', 2, fiber.time()}}
        box.space.test:replace(tuple)
end
box.cfg{ } -- restore from the local directory
-- set the trigger to avoid conflicts
box.space.test:before_replace(function(old, new)
        if old ~= nil and new ~= nil and new[2] < old[2] then
                return old -- ignore the request
        end
        -- otherwise apply as is
end)
box.cfg{ replication = {...} } -- subscribe
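
To illustrate, with the trigger above installed, a replace carrying an older timestamp is silently ignored; a hypothetical call:

-- test_replace() stamps the tuple with the current time (field 2),
-- so concurrent replicas converge on the newest version:
test_replace({1, 'payload'}) -- stored as {1, <timestamp>, 'payload'}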

Monitoring cluster via CLI

This section describes parameters you can monitor over the administrative console.

Connecting to nodes via CLI

Each Tarantool node (router/storage) provides an administrative console (Command Line Interface) for debugging, monitoring, and troubleshooting. The console acts as a Lua interpreter and displays the result in the human-readable YAML format. To connect to a Tarantool instance via the console, say:

$ tarantoolctl connect <instance_hostname>:<port>

where the <instance_hostname>:<port> is the instance’s URI.

Monitoring storages

Use vshard.storage.info() to obtain information on storage nodes.

Output example
tarantool> vshard.storage.info()
---
- replicasets:
    <replicaset_2>:
      uuid: <replicaset_2>
      master:
        uri: storage:storage@127.0.0.1:3303
    <replicaset_1>:
      uuid: <replicaset_1>
      master:
        uri: storage:storage@127.0.0.1:3301
  bucket: # buckets status
    receiving: 0 # buckets in the RECEIVING state
    active: 2 # buckets in the ACTIVE state
    garbage: 0 # buckets in the GARBAGE state (are to be deleted)
    total: 2 # total number of buckets
    sending: 0 # buckets in the SENDING state
  status: 1 # the status of the replica set
  replication:
    status: disconnected # the status of the replication
    idle: <idle>
  alerts:
  - ['MASTER_IS_UNREACHABLE', 'Master is unreachable: disconnected']
Status list
Code   Critical level   Description
0      Green            A replica set works in a regular way.
1      Yellow           There are some issues, but they don’t affect replica set efficiency (worth noticing, but they don’t require immediate intervention).
2      Orange           A replica set is in a degraded state.
3      Red              A replica set is disabled.
Potential issues
  • MISSING_MASTER — No master node in the replica set configuration.

    Critical level: Orange.

    Cluster condition: Service is degraded for data-change requests to the replica set.

    Solution: Set the master node for the replica set in the configuration using API.

  • UNREACHABLE_MASTER — No connection between the master and the replica.

    Critical level:

    • If idle value doesn’t exceed T1 threshold (1 s.) — Yellow,
    • If idle value doesn’t exceed T2 threshold (5 s.) — Orange,
    • If idle value exceeds T3 threshold (10 s.) — Red.

    Cluster condition: For read requests to replica, the data may be obsolete compared with the data on master.

    Solution: Reconnect to the master: fix the network issues, reset the current master, switch to another master.

  • LOW_REDUNDANCY — Master has access to a single replica only.

    Critical level: Yellow.

    Cluster condition: The data storage redundancy factor is equal to 2. It is lower than the minimal recommended value for production usage.

    Solution: Check cluster configuration:

    • If only one master and one replica are specified in the configuration, it is recommended to add at least one more replica to reach the redundancy factor of 3.
    • If three or more replicas are specified in the configuration, consider checking the replicas’ states and network connection among the replicas.
  • INVALID_REBALANCING — Rebalancing invariant was violated. During migration, a storage node can either send or receive buckets. So it shouldn’t be the case that a replica set sends buckets to one replica set and receives buckets from another replica set at the same time.

    Critical level: Yellow.

    Cluster condition: Rebalancing is on hold.

    Solution: There are two possible reasons for invariant violation:

    • The rebalancer has crashed.
    • Bucket states were changed manually.

    Either way, please contact Tarantool support.

  • HIGH_REPLICATION_LAG — Replica’s lag exceeds T1 threshold (1 sec.).

    Critical level:

    • If the lag doesn’t exceed T2 threshold (5 sec.) — Yellow;
    • If the lag exceeds T2 threshold (5 sec.) — Orange.

    Cluster condition: For read-only requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Check the replication status of the replica. Further instructions are given in the Tarantool troubleshooting guide.

  • OUT_OF_SYNC — The replica has fallen out of sync: the lag exceeds the T3 threshold (10 sec.).

    Critical level: Red.

    Cluster condition: For read-only requests to the replica, the data may be obsolete compared with the data on the master.

    Solution: Check the replication status of the replica. Further instructions are given in the Tarantool troubleshooting guide.

  • UNREACHABLE_REPLICA — One or multiple replicas are unreachable.

    Critical level: Yellow.

    Cluster condition: Data storage redundancy factor for the given replica set is less than the configured factor. If the replica is next in the queue for rebalancing (in accordance with the weight configuration), the requests are forwarded to the replica that is still next in the queue.

    Solution: Check the error message and find out which replica is unreachable. If a replica is disabled, enable it. If this doesn’t help, consider checking the network.

  • UNREACHABLE_REPLICASET — All replicas except for the current one are unreachable.

    Critical level: Red.

    Cluster condition: The replica stores obsolete data.

    Solution: Check if the other replicas are enabled. If all replicas are enabled, consider checking network issues on the master. If the replicas are disabled, check them first: the master might be working properly.

Monitoring routers

Use vshard.router.info() to obtain information on the router.

Output example
tarantool> vshard.router.info()
---
- replicasets:
    <replica set UUID>:
      master:
        status: <available / unreachable / missing>
        uri: # URI of the master
        uuid: # UUID of the instance
      replica:
        status: <available / unreachable / missing>
        uri: # URI of the replica used for slave requests
        uuid: # UUID of the instance
      uuid: # UUID of the replica set
    <replica set UUID>: ...
    ...
  status: # status of the router
  bucket:
    known: # number of buckets with a known destination
    unknown: # number of other buckets
  alerts: [<alert code>, <alert description>], ...
Status list
Code   Critical level   Description
0      Green            The router works in a regular way.
1      Yellow           Some replicas are unreachable (affects the speed of executing read requests).
2      Orange           Service is degraded for changing data.
3      Red              Service is degraded for reading data.
Potential issues

Note

Depending on the nature of the issue, use either the UUID of a replica, or the UUID of a replica set.

  • MISSING_MASTER — The master in one or multiple replica sets is not specified in the configuration.

    Critical level: Orange.

    Cluster condition: Partial degrade for data-change requests.

    Solution: Specify the master in the configuration.

  • UNREACHABLE_MASTER — The router lost connection with the master of one or multiple replica sets.

    Critical level: Orange.

    Cluster condition: Partial degrade for data-change requests.

    Solution: Restore connection with the master. First, check if the master is enabled. If it is, consider checking the network.

  • SUBOPTIMAL_REPLICA — There is a replica for read-only requests, but this replica is not optimal according to the configured weights. This means that the optimal replica is unreachable.

    Critical level: Yellow.

    Cluster condition: Read-only requests are forwarded to a backup replica.

    Solution: Check the status of the optimal replica and its network connection.

  • UNREACHABLE_REPLICASET — A replica set is unreachable for both read-only and data-change requests.

    Critical Level: Red.

    Cluster condition: Partial degrade for read-only and data-change requests.

    Solution: The replica set has an unreachable master and replica. Check the error message to detect this replica set. Then fix the issue in the same way as for UNREACHABLE_REPLICA.

Troubleshooting

Please see the Troubleshooting guide.

Disaster recovery

Please see the section Disaster recovery.

Backups

Please see the section Backups.

Module cartridge

Tarantool framework for distributed applications development.

Cartridge provides you with a simple way to manage the operation of distributed applications. The cluster consists of several Tarantool instances acting in concert. Cartridge does not care about how the instances are started; it only cares about the configuration of already running processes.

Cartridge automates vshard and replication configuration, simplifies custom configuration and administrative tasks.

Functions
cfg (opts, box_opts)

Initialize the cartridge module.

After this call, you can operate the instance via the Tarantool console. Notice that this call does not initialize the database – box.cfg is not called yet. Do not try to call box.cfg yourself: cartridge will do it when it is time.

Both cartridge.cfg and box.cfg options can be configured with command-line arguments or environment variables.

Parameters:

  • opts: Available options are:
    • workdir: (optional string) a directory where all data will be stored: snapshots, wal logs and cartridge config file (default: “.”, overridden by env TARANTOOL_WORKDIR, args --workdir)
    • advertise_uri: (optional string) either “<HOST>:<PORT>”, or “<HOST>:”, or “<PORT>”. Used by other instances to connect to the current one. When <HOST> isn’t specified, it’s detected as the only non-local IP address. If there is more than one IP address available, it defaults to “localhost”. When <PORT> isn’t specified, it’s derived as follows: if TARANTOOL_INSTANCE_NAME has a numeric suffix _<N>, then <PORT> = 3300+<N>; otherwise the default <PORT> = 3301 is used.
    • cluster_cookie: (optional string) a secret used to separate unrelated applications, which prevents them from seeing each other during broadcasts. Also used as the admin password in HTTP and binary connections and for encrypting internal communications. Allowed symbols are [a-zA-Z0-9_.~-] (default: “secret-cluster-cookie”, overridden by env TARANTOOL_CLUSTER_COOKIE, args --cluster-cookie)
    • bucket_count: (optional number) bucket count for the vshard cluster. See the vshard documentation for more details (default: 30000, overridden by env TARANTOOL_BUCKET_COUNT, args --bucket-count)
    • vshard_groups: (optional {[string]=VshardGroup,…}) vshard storage groups; table keys are used as names
    • http_enabled: (optional boolean) whether the http server should be started (default: true, overridden by env TARANTOOL_HTTP_ENABLED, args --http-enabled)
    • http_port: (string or number) port to open the administrative UI and API on (default: 8081, derived from TARANTOOL_INSTANCE_NAME, overridden by env TARANTOOL_HTTP_PORT, args --http-port)
    • alias: (optional string) a human-readable instance name that will be available in the administrative UI (default: argparse instance name, overridden by env TARANTOOL_ALIAS, args --alias)
    • roles: (table) a list of user-defined roles that will be available to enable on the instance
    • auth_enabled: (optional boolean) toggle authentication in the administrative UI and API (default: false)
    • auth_backend_name: (optional string) a user-provided set of callbacks related to authentication
    • console_sock: (optional string) socket to start console listening on (default: nil, overridden by env TARANTOOL_CONSOLE_SOCK, args --console-sock)
    • webui_blacklist: (optional {string,…}) a list of pages to be hidden in the WebUI (added in v2.0.1-54, default: {})
    • upgrade_schema: (optional boolean) run schema upgrade on the leader instance (added in v2.0.2-3, default: false, overridden by env TARANTOOL_UPGRADE_SCHEMA, args --upgrade-schema)
  • box_opts: (optional table) tarantool extra box.cfg options (e.g. memtx_memory),that may require additional tuning

Returns:

true

Or

(nil)

(table) Error description
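
For illustration, a minimal init.lua sketch; the role name and option values are hypothetical examples, not defaults:

local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    workdir = './tmp/db',
    advertise_uri = 'localhost:3301',
    http_port = 8081,
    roles = {'app.roles.custom'},  -- hypothetical user-defined role
}, {
    -- extra box.cfg options
    memtx_memory = 100 * 1024 * 1024,
})

assert(ok, tostring(err))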

is_healthy ()

Check the cluster health. It is healthy if all instances are healthy.

The function is designed mostly for testing purposes.

Returns:

(boolean) true / false

Tables
VshardGroup

Vshard storage group configuration.

Every vshard storage must be assigned to a group.

Fields:

  • bucket_count: (number) Bucket count for the storage group.
Global functions
_G.cartridge_get_schema ()

Get clusterwide DDL schema.

(Added in v1.2.0-28)

Returns:

(string) Schema in YAML format

Or

(nil)

(table) Error description

_G.cartridge_set_schema (schema)

Apply clusterwide DDL schema.

(Added in v1.2.0-28)

Parameters:

  • schema: (string) in YAML format

Returns:

(string) The same new schema

Or

(nil)

(table) Error description

Clusterwide DDL schema
get_schema ()

Get clusterwide DDL schema. It's the same as _G.cartridge_get_schema, but exposed as a module function rather than a global variable.

(Added in v2.0.1-54)

Returns:

(string) Schema in YAML format

Or

(nil)

(table) Error description

set_schema (schema)

Apply clusterwide DDL schema. It's the same as _G.cartridge_set_schema, but exposed as a module function rather than a global variable.

(Added in v2.0.1-54)

Parameters:

  • schema: (string) in YAML format

Returns:

(string) The same new schema

Or

(nil)

(table) Error description
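
As a sketch, applying a schema might look like the following; the customer space is hypothetical, and the YAML layout assumes the DDL format used by the ddl rock:

local cartridge = require('cartridge')

local schema_yml = [[
spaces:
  customer:
    engine: memtx
    is_local: false
    temporary: false
    format:
    - {name: id, type: unsigned, is_nullable: false}
    indexes:
    - name: primary
      type: TREE
      unique: true
      parts: [{path: id, type: unsigned, is_nullable: false}]
]]

-- returns the same schema string on success, or nil and an error table
local schema, err = cartridge.set_schema(schema_yml)
assert(schema, tostring(err))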

Cluster administration
ServerInfo

Instance general information.

Fields:

  • alias: (string) Human-readable instance name.
  • uri: (string)
  • uuid: (string)
  • disabled: (boolean)
  • status: (string) Instance health.
  • message: (string) Auxiliary health status.
  • replicaset: (ReplicasetInfo) Circular reference to a replicaset.
  • priority: (number) Leadership priority for automatic failover.
  • clock_delta: (number) Difference between the remote clock and the current one (in seconds), obtained from the membership module (SWIM protocol). Positive values mean the remote clock is ahead of the local one, and vice versa.
ReplicasetInfo

Replicaset general information.

Fields:

  • uuid: (string) The replicaset UUID.
  • roles: ({string,…}) Roles enabled on the replicaset.
  • status: (string) Replicaset health.
  • master: (ServerInfo) Replicaset leader according to configuration.
  • active_master: (ServerInfo) Active leader.
  • weight: (number) Vshard replicaset weight. Matters only if the vshard-storage role is enabled.
  • vshard_group: (string) Name of vshard group the replicaset belongs to.
  • all_rw: (boolean) A flag indicating that all servers in the replicaset should be read-write.
  • alias: (string) Human-readable replicaset name.
  • servers: ({ServerInfo,…}) Circular reference to all instances in the replicaset.
admin_get_servers ([uuid])

Get servers list. Optionally filter out the server with the given uuid.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

admin_get_replicasets ([uuid])

Get replicasets list. Optionally filter out the replicaset with given uuid.

Parameters:

Returns:

({ReplicasetInfo,…})

Or

(nil)

(table) Error description

admin_probe_server (uri)

Discover an instance.

Parameters:

admin_enable_servers (uuids)

Enable nodes after they were disabled.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

admin_disable_servers (uuids)

Temporarily disable nodes.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

admin_bootstrap_vshard ()

Call vshard.router.bootstrap() . This function distributes all buckets across the replica sets.

Returns:

(boolean) true

Or

(nil)

(table) Error description

Automatic failover management
FailoverParams

Failover parameters.

(Added in v2.0.2-2)

Fields:

  • mode: (string) Supported modes are “disabled”, “eventual” and “stateful”
  • state_provider: (nil or string) Only “tarantool” is supported now
  • tarantool_params: (nil or table) {uri = ‘string’, password = ‘string’}
failover_get_params ()

Get failover configuration.

(Added in v2.0.2-2)

Returns:

(FailoverParams)

failover_set_params (opts)

Configure automatic failover.

(Added in v2.0.2-2)

Parameters:

  • opts:
    • mode: (optional string)
    • state_provider: (optional string)
    • tarantool_params: (optional table)

Returns:

(boolean) true if config applied successfully

Or

(nil)

(table) Error description
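
A sketch of switching the cluster to stateful failover, using the FailoverParams fields documented below; the state provider URI and password are hypothetical:

local cartridge = require('cartridge')

local ok, err = cartridge.failover_set_params({
    mode = 'stateful',
    state_provider = 'tarantool',  -- the only supported provider for now
    tarantool_params = {
        uri = 'localhost:4401',
        password = 'state-provider-secret',
    },
})
assert(ok, tostring(err))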

failover_promote (replicaset_uuid)

Promote leaders in replicasets.

Parameters:

  • replicaset_uuid: (table) { [replicaset_uuid] = leader_uuid }

Returns:

(boolean) true On success

Or

(nil)

(table) Error description
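
A sketch of a manual promotion; both UUIDs are hypothetical placeholders:

local cartridge = require('cartridge')

local ok, err = cartridge.failover_promote({
    -- [replicaset_uuid] = leader (instance) uuid
    ['aaaaaaaa-0000-0000-0000-000000000000'] =
        'aaaaaaaa-aaaa-0000-0000-000000000001',
})
assert(ok, tostring(err))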

admin_get_failover ()

Get current failover state.

(Deprecated since v2.0.2-2)

admin_enable_failover ()

Enable failover. (Deprecated since v2.0.1-95 in favor of cartridge.failover_set_params)

admin_disable_failover ()

Disable failover. (Deprecated since v2.0.1-95 in favor of cartridge.failover_set_params)

Managing cluster topology
admin_edit_topology (args)

Edit cluster topology. This function can be used for:

  • bootstrapping cluster from scratch
  • joining a server to an existing replicaset
  • creating new replicaset with one or more servers
  • editing uri/labels of servers
  • disabling and expelling servers

(Added in v1.0.0-17)

Parameters:

EditReplicasetParams

Replicaset modifications.

Fields:

EditServerParams

Servers modifications.

Fields:

  • uri: (optional string)
  • uuid: (string)
  • labels: (optional table)
  • disabled: (optional boolean)
  • expelled: (optional boolean) Expelling an instance is permanent and can't be undone. It's suitable for situations when the hardware is destroyed, snapshots are lost, and there is no hope to bring it back to life.
JoinServerParams

Parameters required for joining a new server.

Fields:
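
The field lists above were lost in this rendering. As a sketch, under the assumption that EditReplicasetParams accepts alias, roles, and join_servers (with JoinServerParams holding at least uri), bootstrapping a replicaset might look like this:

local cartridge = require('cartridge')

local topology, err = cartridge.admin_edit_topology({
    replicasets = {{
        alias = 'storage-1',  -- hypothetical replicaset alias
        roles = {'vshard-storage'},
        join_servers = {{uri = 'localhost:3302'}},
    }},
})
assert(topology, tostring(err))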

Clusterwide configuration
config_get_readonly ([section_name])

Get a read-only view on the clusterwide configuration.

Returns either conf[section_name] or entire conf . Any attempt to modify the section or its children will raise an error.

Parameters:

  • section_name: (string) (optional)

Returns:

(table)

config_get_deepcopy ([section_name])

Get a read-write deep copy of the clusterwide configuration.

Returns either conf[section_name] or entire conf . Changing it has no effect unless it’s used to patch clusterwide configuration.

Parameters:

  • section_name: (string) (optional)

Returns:

(table)

config_patch_clusterwide (patch)

Edit the clusterwide configuration. Top-level keys are merged with the current configuration. To remove a top-level section, use patch_clusterwide{key = box.NULL} .

The function uses a two-phase commit algorithm with the following steps:

  1. Patches the current configuration.
  2. Validates topology on the current server.
  3. Executes the preparation phase (prepare_2pc) on every server excluding expelled and disabled servers.
  4. If any server reports an error, executes the abort phase (abort_2pc). All servers prepared so far are rolled back and unlocked.
  5. Performs the commit phase (commit_2pc). In case the phase fails, an automatic rollback is impossible, and the cluster should be repaired manually.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description
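
A sketch of patching and later removing a section; the section name my_section is hypothetical:

local cartridge = require('cartridge')

-- add or merge a top-level section
local ok, err = cartridge.config_patch_clusterwide({
    my_section = {enabled = true},
})
assert(ok, tostring(err))

-- remove the same section later
cartridge.config_patch_clusterwide({my_section = box.NULL})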

Inter-role interaction
service_get (module_name)

Get a module from registry.

Parameters:

Returns:

(nil)

Or

(table) instance

service_set (module_name, instance)

Put a module into registry or drop it. This function typically doesn’t need to be called explicitly, the cluster automatically sets all the initialized roles.

Parameters:

Returns:

(nil)

Cross-instance calls
rpc_call (role_name, fn_name[, args[, opts]])

Perform a remote procedure call. Find a suitable healthy instance with an enabled role and perform a net.box conn:call (https://tarantool.io/en/doc/latest/reference/reference_lua/net_box/#net-box-call) on it.

Parameters:

  • role_name: (string)
  • fn_name: (string)
  • args: (table) (optional)
  • opts:
    • prefer_local: (optional boolean) Don't perform a remote call if possible. When the role is enabled locally and the current instance is healthy, the remote net.box call is substituted with a local Lua function call. When the option is disabled, the call is never performed locally and always uses a net.box connection, even to connect to self. (default: true)
    • leader_only: (optional boolean) Perform a call only on the replica set leaders. (default: false)
    • uri: (optional string) Force the call to be performed on this particular uri. Disregards member status and opts.prefer_local. Conflicts with opts.leader_only = true. (added in v1.2.0-63)
    • remote_only: (deprecated) Use prefer_local instead.
    • timeout: passed to net.box conn:call options.
    • buffer: passed to net.box conn:call options.

Returns:

conn:call() result

Or

(nil)

(table) Error description
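
A sketch of a cross-instance call; the role myrole and the function get_state are hypothetical:

local log = require('log')
local cartridge = require('cartridge')

local result, err = cartridge.rpc_call('myrole', 'get_state', {42}, {
    leader_only = true,  -- call only on a replica set leader
    timeout = 3,         -- passed to net.box conn:call options
})
if result == nil then
    log.error('%s', err)
end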

rpc_get_candidates (role_name[, opts])

List instances suitable for performing a remote call.

Parameters:

  • role_name: (string)
  • opts:
    • leader_only: (optional boolean) Filter instances which are leaders now. (default: false)
    • healthy_only: (optional boolean) Filter instances which have membership status healthy. (added in v1.1.0-11, default: true)

Returns:

({string,…}) URIs

Authentication and authorization
http_authorize_request (request)

Authorize an HTTP request.

Get username from cookies or basic HTTP authentication.

(Added in v1.1.0-4)

Parameters:

Returns:

(boolean) Access granted

http_render_response (response)

Render HTTP response.

Inject set-cookie headers into response in order to renew or reset the cookie.

(Added in v1.1.0-4)

Parameters:

Returns:

(table) The same response with cookies injected

http_get_username ()

Get username for the current HTTP session.

(Added in v1.1.0-4)

Returns:

(string or nil)

Deprecated functions
admin_edit_replicaset (args)

Edit replicaset parameters (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

admin_edit_server (args)

Edit an instance (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

admin_join_server (args)

Join an instance to the cluster (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

admin_expel_server (uuid)

Expel an instance (deprecated). Forever.

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.auth

Administrators authentication and authorization.

Local Functions
set_enabled (enabled)

Allow or deny unauthenticated access to the administrator’s page. (Changed in v0.11)

This function affects only the current instance. It can’t be used after the cluster was bootstrapped. To modify clusterwide config use set_params instead.

Parameters:

  • enabled: (boolean)

Returns:

(boolean) true

Or

(nil)

(table) Error description

get_enabled ()

Check if unauthenticated access is forbidden. (Added in v0.7)

Returns:

(boolean) enabled

init ()

Initialize the authentication HTTP API.

Set up login and logout HTTP endpoints.

set_callbacks (callbacks)

Set authentication callbacks.

Parameters:

  • callbacks:
    • add_user: (function)
    • get_user: (function)
    • edit_user: (function)
    • list_users: (function)
    • remove_user: (function)
    • check_password: (function)

Returns:

(boolean) true
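
A sketch of a minimal backend that implements only check_password, assuming the remaining callbacks may be omitted; the hard-coded credentials are hypothetical and stand in for a real credentials store:

local auth = require('cartridge.auth')

auth.set_callbacks({
    check_password = function(username, password)
        -- replace with a lookup against a real credentials store
        return username == 'admin' and password == 'secret'
    end,
})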

get_callbacks ()

Get authentication callbacks.

Returns:

(table) callbacks

Configuration
set_params (opts)

Modify authentication params. (Changed in v0.11)

Can’t be used before the bootstrap. Affects all cluster instances. Triggers cluster.config_patch_clusterwide .

Parameters:

  • opts:
    • enabled: (optional boolean) (Added in v0.11)
    • cookie_max_age: (optional number)
    • cookie_renew_age: (optional number) (Added in v0.11)

Returns:

(boolean) true

Or

(nil)

(table) Error description
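
A sketch of enabling authentication clusterwide; the values are examples:

local auth = require('cartridge.auth')

-- must be called after the cluster is bootstrapped
local ok, err = auth.set_params({
    enabled = true,
    cookie_max_age = 3600,  -- expire cookies after one hour
})
assert(ok, tostring(err))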

get_params ()

Retrieve authentication params.

Returns:

(AuthParams)

AuthParams

Authentication params.

Fields:

  • enabled: (boolean) Whether unauthenticated access is forbidden
  • cookie_max_age: (number) Number of seconds until the authentication cookie expires
  • cookie_renew_age: (number) Update the provided cookie if it's older than this age (in seconds)
Authorization
get_session_username ()

Get username for the current HTTP session.

(Added in v1.1.0-4)

Returns:

(string or nil)

authorize_request (request)

Authorize an HTTP request.

Get username from cookies or basic HTTP authentication.

(Added in v1.1.0-4)

Parameters:

Returns:

(boolean) Access granted

render_response (response)

Render HTTP response.

Inject set-cookie headers into response in order to renew or reset the cookie.

(Added in v1.1.0-4)

Parameters:

Returns:

(table) The same response with cookies injected

User management
UserInfo

User information.

Fields:

add_user (username, password, fullname, email)

Trigger registered add_user callback.

The callback is triggered with the same arguments and must return a table with fields conforming to UserInfo . Unknown fields are ignored.

Parameters:

Returns:

(UserInfo)

Or

(nil)

(table) Error description

get_user (username)

Trigger registered get_user callback.

The callback is triggered with the same arguments and must return a table with fields conforming to UserInfo . Unknown fields are ignored.

Parameters:

Returns:

(UserInfo)

Or

(nil)

(table) Error description

edit_user (username, password, fullname, email)

Trigger registered edit_user callback.

The callback is triggered with the same arguments and must return a table with fields conforming to UserInfo . Unknown fields are ignored.

Parameters:

Returns:

(UserInfo)

Or

(nil)

(table) Error description

list_users ()

Trigger registered list_users callback.

The callback is triggered without any arguments. It must return an array of UserInfo objects.

Returns:

({UserInfo,…})

Or

(nil)

(table) Error description

remove_user (username)

Trigger registered remove_user callback.

The callback is triggered with the same arguments and must return a table with fields conforming to UserInfo , which was removed. Unknown fields are ignored.

Parameters:

Returns:

(UserInfo)

Or

(nil)

(table) Error description

Module cartridge.roles

Role management (internal module).

The module consolidates all the role management functions: register_role , some getters, validate_config and apply_config .

The module is almost stateless; its only state is a collection of registered roles.

(Added in v1.2.0-20)

Local Functions
get_all_roles ()

List all registered roles.

Hidden and permanent roles are listed too.

Returns:

({string,..})

get_known_roles ()

List registered roles names.

Hidden roles are not listed as well as permanent ones.

Returns:

({string,..})

get_enabled_roles (roles)

Roles to be enabled on the server. This function returns all roles that will be enabled, including their dependencies (both hidden and not) and permanent roles.

Parameters:

Returns:

({[string]=boolean,…})

get_role_dependencies (role_name)

List role dependencies. Including sub-dependencies.

Parameters:

Returns:

({string,..})

validate_config (conf_new, conf_old)

Validate configuration by all roles.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

apply_config (conf)

Apply the role configuration.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.argparse

Gather configuration options.

The module tries to read configuration options from multiple sources and then merge them together according to the priority of the source:

  • --<VARNAME> command-line arguments
  • TARANTOOL_<VARNAME> environment variables
  • configuration files

One can specify a configuration file using the --cfg <CONFIG_FILE> option or the TARANTOOL_CFG=<CONFIG_FILE> environment variable.

Configuration files are YAML files, divided into sections like the following:

default:
  memtx_memory: 10000000
  some_option: "default value"
myapp.router:
  memtx_memory: 1024000000
  some_option: "router specific value"

Within the configuration file, argparse looks for multiple matching sections:

  • The section named <APP_NAME>.<INSTANCE_NAME> is parsed first. The application name is derived automatically from the rockspec filename in the project directory, or it can be specified manually with the --app-name command-line argument or the TARANTOOL_APP_NAME environment variable. The instance name can be specified the same way, either as --instance-name or TARANTOOL_INSTANCE_NAME.
  • The common <APP_NAME> section is parsed next.
  • Finally, the section [default] with global configuration is parsed with the lowest priority.
Functions
parse ()

Parse command line arguments, environment variables, and config files.

Returns:

({argname=value,…})

get_opts (filter)

Filter results of parsing and cast variables to a given type.

From all configuration options gathered by parse , select only the ones specified in the filter.

For example, running the application as follows:

./init.lua --alias router --memtx-memory 100

results in:

parse()            -> {memtx_memory = "100", alias = "router"}
get_cluster_opts() -> {alias = "router"} -- a string
get_box_opts()     -> {memtx_memory = 100} -- a number

Parameters:

  • filter: ({argname=type,…})

Returns:

({argname=value,…})

get_box_opts ()

Shorthand for get_opts(box_opts) .

get_cluster_opts ()

Shorthand for get_opts(cluster_opts) .

Tables
cluster_opts

Common cartridge.cfg options.

Options that are not listed here (like roles ) can't be modified with argparse and should be configured in code.

Fields:

  • alias: string
  • workdir: string
  • http_port: number
  • http_enabled: boolean
  • advertise_uri: string
  • cluster_cookie: string
  • console_sock: string
  • auth_enabled: boolean
  • bucket_count: number
  • upgrade_schema: boolean
box_opts

Common [box.cfg](https://www.tarantool.io/en/doc/latest/reference/configuration/) tuning options.

Fields:

  • listen: string
  • memtx_memory: number
  • strip_core: boolean
  • memtx_min_tuple_size: number
  • memtx_max_tuple_size: number
  • slab_alloc_factor: number
  • work_dir: string (deprecated)
  • memtx_dir: string
  • wal_dir: string
  • vinyl_dir: string
  • vinyl_memory: number
  • vinyl_cache: number
  • vinyl_max_tuple_size: number
  • vinyl_read_threads: number
  • vinyl_write_threads: number
  • vinyl_timeout: number
  • vinyl_run_count_per_level: number
  • vinyl_run_size_ratio: number
  • vinyl_range_size: number
  • vinyl_page_size: number
  • vinyl_bloom_fpr: number
  • log: string
  • log_nonblock: boolean
  • log_level: number
  • log_format: string
  • io_collect_interval: number
  • readahead: number
  • snap_io_rate_limit: number
  • too_long_threshold: number
  • wal_mode: string
  • rows_per_wal: number
  • wal_max_size: number
  • wal_dir_rescan_delay: number
  • force_recovery: boolean
  • replication: string
  • instance_uuid: string
  • replicaset_uuid: string
  • custom_proc_title: string
  • pid_file: string
  • background: boolean
  • username: string
  • coredump: boolean
  • checkpoint_interval: number
  • checkpoint_wal_threshold: number
  • checkpoint_count: number
  • read_only: boolean
  • hot_standby: boolean
  • worker_pool_threads: number
  • replication_timeout: number
  • replication_sync_lag: number
  • replication_sync_timeout: number
  • replication_connect_timeout: number
  • replication_connect_quorum: number
  • replication_skip_conflict: boolean
  • feedback_enabled: boolean
  • feedback_host: string
  • feedback_interval: number
  • net_msg_max: number

Module cartridge.twophase

Clusterwide configuration propagation two-phase algorithm.

(Added in v1.2.0-19)

Functions
patch_clusterwide (patch)

Edit the clusterwide configuration. Top-level keys are merged with the current configuration. To remove a top-level section, use patch_clusterwide{key = box.NULL} .

The function uses a two-phase commit algorithm with the following steps:

  1. Patches the current configuration.
  2. Validates topology on the current server.
  3. Executes the preparation phase (prepare_2pc) on every server excluding expelled and disabled servers.
  4. If any server reports an error, executes the abort phase (abort_2pc). All servers prepared so far are rolled back and unlocked.
  5. Performs the commit phase (commit_2pc). In case the phase fails, an automatic rollback is impossible, and the cluster should be repaired manually.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

get_schema ()

Get clusterwide DDL schema.

(Added in v1.2.0-28)

Returns:

(string) Schema in YAML format

Or

(nil)

(table) Error description

set_schema (schema)

Apply clusterwide DDL schema.

(Added in v1.2.0-28)

Parameters:

  • schema: (string) in YAML format

Returns:

(string) The same new schema

Or

(nil)

(table) Error description

Local Functions
prepare_2pc (data)

Two-phase commit - preparation stage.

Validate the configuration and acquire a lock by setting a local variable and writing the "config.prepare.yml" file. If the validation fails, the lock isn't acquired and doesn't have to be aborted.

Parameters:

  • data: (table) clusterwide config content

Returns:

(boolean) true

Or

(nil)

(table) Error description

commit_2pc ()

Two-phase commit - commit stage.

Back up the active configuration, commit the changes to the filesystem by renaming the prepared file, release the lock, and configure roles. If any errors occur, the configuration is not rolled back automatically. Any problem encountered during this call has to be solved manually.

Returns:

(boolean) true

Or

(nil)

(table) Error description

abort_2pc ()

Two-phase commit - abort stage.

Release the lock for further commit attempts.

Returns:

(boolean) true

Module cartridge.failover

Gather information regarding instances leadership.

Failover can operate in two modes:

  • In disabled mode the leader is the first server configured in the topology.replicasets[].master array.
  • In eventual mode the leader isn't elected consistently. Instead, every instance in the cluster thinks the leader is the first healthy server in the replicaset, while instance health is determined according to membership status (the SWIM protocol).
  • In stateful mode leader appointments are polled from the external storage. (Added in v2.0.2-2)

This module behavior depends on the instance state.

From the very beginning it reports is_rw() == false, is_leader() == false , get_active_leaders() == {} .

The module is configured when the instance enters ConfiguringRoles state for the first time. From that moment it reports actual values according to the mode set in clusterwide config.

(Added in v1.2.0-17)

Functions
get_coordinator ()

Get current stateful failover coordinator

Returns:

(table) coordinator

Or

(nil)

(table) Error description

Local Functions
_get_appointments_disabled_mode ()

Generate appointments according to clusterwide configuration. Used in ‘disabled’ failover mode.

_get_appointments_eventual_mode ()

Generate appointments according to membership status. Used in ‘eventual’ failover mode.

_get_appointments_stateful_mode ()

Get appointments from external storage. Used in ‘stateful’ failover mode.

accept_appointments (replicaset_uuid)

Accept new appointments.

Get appointments wherever they come from and put them into cache.

Parameters:

Returns:

(boolean) Whether leadership map has changed

failover_loop ()

Repeatedly fetch new appointments and reconfigure roles.

cfg ()

Initialize the failover module.

get_active_leaders ()

Get map of replicaset leaders.

Returns:

{[replicaset_uuid] = instance_uuid,…}

is_leader ()

Check current instance leadership.

Returns:

(boolean) true / false

is_rw ()

Check current instance writability.

Returns:

(boolean) true / false

Module cartridge.topology

Topology validation and filtering.

Functions
cluster_is_healthy ()

Check the cluster health. It is healthy if all instances are healthy.

The function is designed mostly for testing purposes.

Returns:

(boolean) true / false

Local Functions
get_leaders_orded (topology_cfg, replicaset_uuid, new_order)

Get full list of replicaset leaders.

Full list is composed of:

  • New order array
  • Initial order from topology_cfg (with no repetitions)
  • All other servers in the replicaset, sorted by uuid, ascending

Neither topology_cfg nor new_order tables are modified. New order validity is ignored too.

Parameters:

Returns:

({string,…}) array of leaders uuids

validate (topology_new, topology_old)

Validate topology configuration.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

find_server_by_uri (topology_cfg, uri)

Find the server in topology config.

(Added in v1.2.0-17)

Parameters:

Returns:

(nil or string) instance_uuid found

probe_missing_members (servers)

Send UDP ping to servers missing from membership table.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

get_fullmesh_replication (topology_cfg, replicaset_uuid)

Get replication config to set up full mesh.

(Added in v1.2.0-17)

Parameters:

Returns:

(table)

Module cartridge.clusterwide-config

The abstraction, representing clusterwide configuration.

Clusterwide configuration is more than just a Lua table. It's an object in terms of the OOP paradigm.

On the filesystem, the clusterwide config is represented by a file tree.

In Lua it’s represented as an object which holds both plaintext files content and unmarshalled lua tables. Unmarshalling is implicit and performed automatically for the sections with .yml file extension.

To access plaintext content there are two functions: get_plaintext and set_plaintext .

Unmarshalled Lua tables are accessed without the .yml extension by get_readonly and get_deepcopy , which provide the unmarshalled representation of the corresponding sections.

To avoid ambiguity it’s prohibited to keep both <FILENAME> and <FILENAME>.yml in the configuration. An attempt to do so would result in return nil, err from new() and load(), and an attempt to call get_readonly/deepcopy would raise an error. Nevertheless one can keep any other extensions because they aren’t unmarshalled implicitly.

(Added in v1.2.0-17)

Usage:
tarantool> cfg = ClusterwideConfig.new({
         >     -- two files
         >     ['forex.yml'] = '{EURRUB_TOM: 70.33, USDRUB_TOM: 63.18}',
         >     ['text'] = 'Lorem ipsum dolor sit amet',
         > })
---
...

tarantool> cfg:get_plaintext()
---
- text: Lorem ipsum dolor sit amet
  forex.yml: '{EURRUB_TOM: 70.33, USDRUB_TOM: 63.18}'
...

tarantool> cfg:get_readonly()
---
- forex.yml: '{EURRUB_TOM: 70.33, USDRUB_TOM: 63.18}'
  forex:
    EURRUB_TOM: 70.33
    USDRUB_TOM: 63.18
  text: Lorem ipsum dolor sit amet
...
Functions
new ([data])

Create new object.

Parameters:

Returns:

(ClusterwideConfig)

Or

(nil)

(table) Error description

save (clusterwide_config, filename)

Write configuration to filesystem.

Write atomicity is achieved by splitting the operation into two phases:

  1. The configuration is saved with a random filename in the same directory.
  2. The temporary file is renamed to the destination.

Parameters:

  • clusterwide_config: (ClusterwideConfig)
  • filename: (string)

Returns:

(boolean) true

Or

(nil)

(table) Error description

load (filename)

Load object from filesystem.

This function handles both old-style single YAML and new-style directory with a file tree.

Parameters:

Returns:

(ClusterwideConfig)

Or

(nil)

(table) Error description

Local Functions
load_from_file (filename)

Load old-style config from YAML file.

Parameters:

  • filename: (string) Filename to load.

Returns:

(ClusterwideConfig)

Or

(nil)

(table) Error description

load_from_dir (path)

Load new-style config from a directory.

Parameters:

  • path: (string) Path to the config.

Returns:

(ClusterwideConfig)

Or

(nil)

(table) Error description

remove (path)

Remove config from filesystem atomically.

The atomicity is achieved by splitting the operation into two phases:

  1. The configuration is saved with a random filename in the same directory.
  2. The temporary file is renamed to the destination.

Parameters:

  • path: (string) Directory path to remove.

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.rpc

Remote procedure calls between cluster instances.

Functions
get_candidates (role_name[, opts])

List instances suitable for performing a remote call.

Parameters:

  • role_name: (string)
  • opts:
    • leader_only: (optional boolean) Filter instances which are leaders now. (default: false)
    • healthy_only: (optional boolean) Filter instances which have membership status healthy. (added in v1.1.0-11, default: true)

Returns:

({string,…}) URIs

call (role_name, fn_name[, args[, opts]])

Perform a remote procedure call. Find a suitable healthy instance with an enabled role and perform a net.box conn:call (https://tarantool.io/en/doc/latest/reference/reference_lua/net_box/#net-box-call) on it.

Parameters:

  • role_name: (string)
  • fn_name: (string)
  • args: (table) (optional)
  • opts:
    • prefer_local: (optional boolean) Don't perform a remote call if possible. When the role is enabled locally and the current instance is healthy, the remote net.box call is substituted with a local Lua function call. When the option is disabled, the call is never performed locally and always uses a net.box connection, even to connect to self. (default: true)
    • leader_only: (optional boolean) Perform a call only on the replica set leaders. (default: false)
    • uri: (optional string) Force the call to be performed on this particular uri. Disregards member status and opts.prefer_local. Conflicts with opts.leader_only = true. (added in v1.2.0-63)
    • remote_only: (deprecated) Use prefer_local instead.
    • timeout: passed to net.box conn:call options.
    • buffer: passed to net.box conn:call options.

Returns:

conn:call() result

Or

(nil)

(table) Error description

Local Functions
get_connection (role_name[, opts])

Connect to an instance with an enabled role.

Parameters:

  • role_name: (string)
  • opts:
    • prefer_local: (optional boolean)
    • leader_only: (optional boolean)

Returns:

net.box connection

Or

(nil)

(table) Error description

Module cartridge.tar

Handle basic tar format.

<http://www.gnu.org/software/tar/manual/html_node/Standard.html>

While an archive may contain many files, the archive itself is a single ordinary file. Physically, an archive consists of a series of file entries terminated by an end-of-archive entry, which consists of two 512-byte blocks of zero bytes. A file entry usually describes one of the files in the archive (an archive member), and consists of a file header and the contents of the file. File headers contain file names and statistics, checksum information which tar uses to detect file corruption, and information about file types.

A tar archive file contains a series of blocks. Each block contains exactly 512 (BLOCKSIZE) bytes:

+---------+-------+-------+-------+---------+-------+-----
| header1 | file1 |  ...  |  ...  | header2 | file2 | ...
+---------+-------+-------+-------+---------+-------+-----

All characters in header blocks are represented by using 8-bit characters in the local variant of ASCII. Each field within the structure is contiguous; that is, there is no padding used within the structure. Each character on the archive medium is stored contiguously. Bytes representing the contents of files (after the header block of each file) are not translated in any way and are not constrained to represent characters in any character set. The tar format does not distinguish text files from binary files, and no translation of file contents is performed.

Functions
pack (files)

Create TAR archive.

Parameters:

Returns:

(string) The archive

Or

(nil)

(table) Error description

unpack (tar)

Parse TAR archive.

Only regular files are extracted; directories are omitted.

Parameters:

Returns:

({string=string}) Extracted files (their names and content)

Or

(nil)

(table) Error description
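
A sketch of a round trip through the module; the file name and content are hypothetical:

local tar = require('cartridge.tar')

local blob, err = tar.pack({['config.yml'] = 'a: 1\n'})
assert(blob, tostring(err))

local files = tar.unpack(blob)
-- files is expected to be {['config.yml'] = 'a: 1\n'}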

Module cartridge.pool

Connection pool.

Reuse tarantool net.box connections with ease.

Functions
connect (uri[, opts])

Connect to a remote instance or get a cached connection. The connection is established using net.box.connect() .

Parameters:

  • uri: (string)
  • opts:
    • wait_connected: (boolean or number) By default, connection creation is blocked until the connection is established, but passing wait_connected=false makes it return immediately. Also, passing a timeout makes it wait before returning (e.g. wait_connected=1.5 makes it wait at most 1.5 seconds).
    • connect_timeout: (optional number) (deprecated) Use wait_connected instead
    • user: (deprecated) don't use it
    • password: (deprecated) don't use it
    • reconnect_after: (deprecated) don't use it

Returns:

net.box connection

Or

(nil)

(table) Error description
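
A sketch; the URI is hypothetical:

local pool = require('cartridge.pool')

local conn, err = pool.connect('localhost:3302', {wait_connected = 1.5})
assert(conn, tostring(err))
conn:ping()  -- the pooled object is a regular net.box connection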

Local Functions
format_uri (uri)

Enrich a URI with credentials. Suitable for connecting to other cluster instances.

Parameters:

Returns:

(string) username:password@host:port

map_call (fn_name[, args[, opts]])

Perform a remote call to multiple URIs and map results.

(Added in v1.2.0-17)

Parameters:

  • fn_name: (string)
  • args: (table) function arguments (optional)
  • opts:
    • uri_list: ({string,…}) array of URIs for performing remote call
    • timeout: (optional number) passed to net.box conn:call()

Returns:

({URI=value,…}) Call results mapping for every URI.

(table) United error object, gathering errors for every URI that failed.

Module cartridge.confapplier

Configuration management primitives.

Implements the internal state machine which helps to manage cluster operation and protects from invalid state transitions.

Functions
get_readonly ([section_name])

Get a read-only view on the clusterwide configuration.

Returns either conf[section_name] or entire conf . Any attempt to modify the section or its children will raise an error.

Parameters:

  • section_name: (string) (optional)

Returns:

(table)

get_deepcopy ([section_name])

Get a read-write deep copy of the clusterwide configuration.

Returns either conf[section_name] or entire conf . Changing it has no effect unless it’s used to patch clusterwide configuration.

Parameters:

  • section_name: (string) (optional)

Returns:

(table)

Local Functions
set_state (state[, err])

Perform state transition.

Parameters:

  • state: (string) New state
  • err: (optional)

Returns:

(nil)

wish_state (state[, timeout])

Make a wish for meeting desired state.

Parameters:

  • state: (string) Desired state.
  • timeout: (number) (optional)

Returns:

(string) Final state, may differ from desired.

validate_config (clusterwide_config_new)

Validate configuration by all roles.

Parameters:

  • clusterwide_config_new: (table)

Returns:

(boolean) true

Or

(nil)

(table) Error description

apply_config (clusterwide_config)

Apply the role configuration.

Parameters:

  • clusterwide_config: (table)

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.test-helpers

Helpers for integration testing.

This module extends luatest.helpers with cartridge-specific classes and helpers.

Fields
Server

Extended luatest.server class to run tarantool instance.

See also:

  • cartridge.test-helpers.server
Cluster

Class to run and manage multiple tarantool instances.

See also:

  • cartridge.test-helpers.cluster

Module cartridge.remote-control

Tarantool remote control server.

Allows controlling an instance over TCP via net.box call and eval . The server is designed as a partial replacement for the iproto protocol. It's most useful when box.cfg hasn't been configured yet.

Other net.box features aren’t supported and will never be.

(Added in v0.10.0-2)

Local Functions
bind (host, port)

Init remote control server.

Bind the port but don’t start serving connections yet.

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

accept (credentials)

Start remote control server. To connect the server use regular net.box connection.

Access is restricted to the user with specified credentials, which can be passed as net_box.connect('username:password@host:port') .

Parameters:

unbind ()

Stop the server.

It doesn’t interrupt any existing connections.

drop_connections ()

Explicitly drop all established connections.

Module cartridge.service-registry

Inter-role interaction.

These functions make different roles interact with each other.

The registry stores initialized modules and accesses them within the one and only current instance. For cross-instance access, use the cartridge.rpc module.

Functions
set (module_name, instance)

Put a module into registry or drop it. This function typically doesn’t need to be called explicitly, the cluster automatically sets all the initialized roles.

Parameters:

Returns:

(nil)

get (module_name)

Get a module from registry.

Parameters:

Returns:

(nil)

Or

(table) instance

Module custom-role

User-defined role API.

If you want to implement your own role, it must conform to this API.

Functions
init (opts)

Role initialization callback. Called when the role is enabled on an instance, triggered either by editing the topology or by an instance restart.

Parameters:

  • opts:
    • is_master: (boolean)
stop (opts)

Role shutdown callback. Called when role is disabled on an instance.

Parameters:

  • opts:
    • is_master: (boolean)
validate_config (conf_new, conf_old)

Validate clusterwide configuration callback.

Parameters:

apply_config (conf, opts)

Apply clusterwide configuration callback.

Parameters:

  • conf: (table) Clusterwide configuration
  • opts:
    • is_master: (boolean)
Fields
role_name

Displayed role name. When absent, module name is used instead.

hidden

Hidden role flag. Hidden roles aren't listed in cartridge.admin_get_replicasets().roles and therefore don't appear in the WebUI. A hidden role is supposed to be a dependency for another role.

  • hidden: (boolean)
permanent

Permanent role flag. Permanent roles will be enabled on every instance in the cluster. Implies hidden = true .

  • permanent: (boolean)
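
A sketch of a role module skeleton conforming to this API; the role name is hypothetical, and the dependencies field is an assumption taken from the cartridge source rather than from the fields listed above:

local role_name = 'app.roles.myrole'  -- hypothetical

local function init(opts)
    if opts.is_master then
        -- e.g. create spaces, grant permissions
    end
    return true
end

local function stop(opts)
    return true
end

local function validate_config(conf_new, conf_old)
    return true
end

local function apply_config(conf, opts)
    return true
end

return {
    role_name = role_name,
    init = init,
    stop = stop,
    validate_config = validate_config,
    apply_config = apply_config,
    dependencies = {},  -- assumption: other role modules this one requires
}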

Module cartridge.lua-api.stat

Administration functions ( box.slab.info related).

Local Functions
get_stat (uri)

Retrieve box.slab.info of a remote server.

Parameters:

Returns:

(table)

Or

(nil)

(table) Error description

Module cartridge.lua-api.boxinfo

Administration functions ( box.info related).

Local Functions
get_info (uri)

Retrieve box.cfg and box.info of a remote server.

Parameters:

Returns:

(table)

Or

(nil)

(table) Error description

Module cartridge.lua-api.get-topology

Administration functions ( get-topology implementation).

Tables
ServerInfo

Instance general information.

Fields:

  • alias: (string) Human-readable instance name.
  • uri: (string)
  • uuid: (string)
  • disabled: (boolean)
  • status: (string) Instance health.
  • message: (string) Auxiliary health status.
  • replicaset: (ReplicasetInfo) Circular reference to a replicaset.
  • priority: (number) Leadership priority for automatic failover.
  • clock_delta: (number) Difference between the remote clock and the current one (in seconds), obtained from the membership module (SWIM protocol). Positive values mean the remote clock is ahead of the local one, and vice versa.
ReplicasetInfo

Replicaset general information.

Fields:

  • uuid: (string) The replicaset UUID.
  • roles: ({string,…}) Roles enabled on the replicaset.
  • status: (string) Replicaset health.
  • master: (ServerInfo) Replicaset leader according to configuration.
  • active_master: (ServerInfo) Active leader.
  • weight: (number) Vshard replicaset weight. Matters only if the vshard-storage role is enabled.
  • vshard_group: (string) Name of vshard group the replicaset belongs to.
  • all_rw: (boolean) A flag indicating that all servers in the replicaset should be read-write.
  • alias: (string) Human-readable replicaset name.
  • servers: ({ServerInfo,…}) Circular reference to all instances in the replicaset.
Local Functions
get_topology ()

Get servers and replicasets lists.

Returns:

({servers={ServerInfo,…},replicasets={ReplicasetInfo,…}})

Or

(nil)

(table) Error description

Module cartridge.lua-api.edit-topology

Administration functions ( edit-topology implementation).

Editing topology
edit_topology (args)

Edit cluster topology. This function can be used for:

  • bootstrapping cluster from scratch
  • joining a server to an existing replicaset
  • creating new replicaset with one or more servers
  • editing uri/labels of servers
  • disabling and expelling servers

(Added in v1.0.0-17)

Parameters:

EditReplicasetParams

Replicaset modifications.

Fields:

JoinServerParams

Parameters required for joining a new server.

Fields:

EditServerParams

Servers modifications.

Fields:

  • uri: (optional string)
  • uuid: (string)
  • labels: (optional table)
  • disabled: (optional boolean)
  • expelled: (optional boolean) Expelling an instance is permanent and can't be undone. It's suitable for situations when the hardware is destroyed, snapshots are lost, and there is no hope to bring it back to life.

Module cartridge.lua-api.topology

Administration functions (topology related).

Functions
get_servers ([uuid])

Get servers list. Optionally filter out the server with the given uuid.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

get_replicasets ([uuid])

Get replicasets list. Optionally filter out the replicaset with given uuid.

Parameters:

Returns:

({ReplicasetInfo,…})

Or

(nil)

(table) Error description

probe_server (uri)

Discover an instance.

Parameters:

enable_servers (uuids)

Enable nodes after they were disabled.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

disable_servers (uuids)

Temporarily disable nodes.

Parameters:

Returns:

({ServerInfo,…})

Or

(nil)

(table) Error description

Local Functions
get_self ()

Get alias, uri and uuid of current instance.

Returns:

(table)

Module cartridge.lua-api.failover

Administration functions (failover related).

Functions
get_params ()

Get failover configuration.

(Added in v2.0.2-2)

Returns:

(FailoverParams)

set_params (opts)

Configure automatic failover.

(Added in v2.0.2-2)

Parameters:

  • opts:
    • mode: (optional string)
    • state_provider: (optional string)
    • tarantool_params: (optional table)

Returns:

(boolean) true if config applied successfully

Or

(nil)

(table) Error description

get_failover_enabled ()

Get current failover state.

(Deprecated since v2.0.2-2)

set_failover_enabled (enabled)

Enable or disable automatic failover.

(Deprecated since v2.0.2-2)

Parameters:

  • enabled: (boolean)

Returns:

(boolean) New failover state

Or

(nil)

(table) Error description

promote (replicaset_uuid)

Promote leaders in replicasets.

Parameters:

  • replicaset_uuid: (table) { [replicaset_uuid] = leader_uuid }

Returns:

(boolean) true On success

Or

(nil)

(table) Error description

Tables
FailoverParams

Failover parameters.

(Added in v2.0.2-2)

Fields:

  • mode: (string) Supported modes are “disabled”, “eventual” and “stateful”
  • state_provider: (nil or string) Only “tarantool” is supported now
  • tarantool_params: (nil or table) {uri = ‘string’, password = ‘string’}

Module cartridge.lua-api.vshard

Administration functions (vshard related).

Functions
bootstrap_vshard ()

Call vshard.router.bootstrap() . This function distributes all buckets across the replica sets.

Returns:

(boolean) true

Or

(nil)

(table) Error description

Module cartridge.lua-api.deprecated

Administration functions (deprecated).

Deprecated functions
join_server (args)

Join an instance to the cluster (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

edit_server (args)

Edit an instance (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

expel_server (uuid)

Expel an instance (deprecated). Forever.

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

edit_replicaset (args)

Edit replicaset parameters (deprecated).

(Deprecated since v1.0.0-17 in favor of cartridge.admin_edit_topology)

Parameters:

Returns:

(boolean) true

Or

(nil)

(table) Error description

Class cartridge.test-helpers.cluster

Class to run and manage multiple tarantool instances.

Functions
Cluster:new (object)

Build cluster object.

Parameters:

  • object:
    • datadir: (string) Data directory for all cluster servers.
    • server_command: (string) Command to run server.
    • cookie: (string) Cluster cookie.
    • base_http_port: (int) Value to calculate server’s http_port. (optional)
    • base_advertise_port: (int) Value to calculate server’s advertise_port. (optional)
    • use_vshard: (bool) bootstrap vshard after server is started. (optional)
    • replicasets: (tab) Replicasets configuration. List of replicaset_config

Returns:

object
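
A sketch of a luatest setup, assuming an init.lua entrypoint; uuids, aliases, and ports are hypothetical:

local fio = require('fio')
local helpers = require('cartridge.test-helpers')

local cluster = helpers.Cluster:new({
    datadir = fio.tempdir(),
    server_command = './init.lua',
    cookie = 'test-cluster-cookie',
    use_vshard = false,
    replicasets = {{
        uuid = 'aaaaaaaa-0000-0000-0000-000000000000',
        alias = 'main',
        roles = {},
        servers = {{
            alias = 'main-1',
            instance_uuid = 'aaaaaaaa-aaaa-0000-0000-000000000001',
            advertise_port = 13301,
            http_port = 8081,
        }},
    }},
})

cluster:start()
-- ... perform test assertions ...
cluster:stop()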

Cluster:server (alias)

Find server by alias.

Parameters:

Returns:

cartridge.test-helpers.server

Cluster:apply_topology ()

Execute an edit_topology GraphQL request to set up replicasets, apply roles, and join servers to replicasets.

Cluster:start ()

Bootstraps the cluster if it wasn't bootstrapped before; otherwise starts the servers.

Cluster:stop ()

Stop all servers.

Cluster:join_server (server)

Register running server in the cluster.

Parameters:

  • server: (Server) Server to be registered.
Cluster:wait_until_healthy (server)

Blocks the fiber until cartridge.is_healthy() returns true on the main_server.

Parameters:

  • server:
Cluster:upload_config (config)

Upload application config, shortcut for cluster.main_server:upload_config(config) .

Parameters:

  • config:

See also:

  • cartridge.test-helpers.server.Server:upload_config
Cluster:download_config ()

Download application config, shortcut for cluster.main_server:download_config() .

See also:

  • cartridge.test-helpers.server.Server:download_config
Cluster:retrying (config, fn[, …])

Keeps calling fn until it returns without error. Throws the last error if config.timeout has elapsed.

Parameters:

  • config: (tab) Options for luatest.helpers.retrying .
  • fn: (func) Function to call
  • …: Args to run fn with. (optional)
Tables
cartridge.test-helpers.cluster.replicaset_config

Replicaset config.

Fields:

  • alias: (string) Prefix to generate server alias automatically. (optional)
  • uuid: (string) Replicaset uuid.
  • roles: ({string}) List of roles for servers in the replicaset.
  • vshard_group: (optional string) Name of vshard group.
  • all_rw: (optional boolean) Make all replicas writable.
  • servers: (tab) List of objects to build Server objects with.

Class cartridge.test-helpers.server

Extended luatest.Server class to run tarantool instance.

Functions
Server:build_env ()

Generates the environment to run the process with. The result is merged into os.environ() .

Returns:

map

Server:start ()

Start the server.

Server:stop ()

Stop server process.

Server:graphql (request, http_options)

Perform GraphQL request.

Parameters:

  • request:
    • query: (string) GraphQL query
    • variables: (optional table) variables for the GraphQL query
    • raise: (optional boolean) raise if the response contains an error (default: true)
  • http_options: (table) passed to http_request options. (optional)

Returns:

(table) parsed response JSON.

Raises:

  • HTTPRequest error
  • GraphQL error
Server:join_cluster (main_server[, options])

Advertise this server to the cluster.

Parameters:

  • main_server: Server to perform GraphQL request on.
  • options:
    • timeout: request timeout
Server:setup_replicaset (config)

Update server’s replicaset config.

Parameters:

  • config:
    • uuid: replicaset uuid
    • roles: list of roles
    • master:
    • weight:
Server:upload_config (config)

Upload application config.

Parameters:

  • config: (string or table) A table will be encoded as YAML and posted to /admin/config.
Server:download_config ()

Download application config.

Methods
cartridge.test-helpers.server:new (object)

Build server object.

Parameters:

  • object:
    • command: (string) Command to start server process.
    • workdir: (string) Value to be passed in TARANTOOL_WORKDIR .
    • chdir: (bool) Directory to chdir into before starting the process. (optional)
    • env: (tab) Table to pass as env variables to process. (optional)
    • args: (tab) Args to run command with. (optional)
    • http_port: (int) Value to be passed in TARANTOOL_HTTP_PORT and used to perform HTTP requests. (optional)
    • advertise_port: (int) Value to generate TARANTOOL_ADVERTISE_URI and used for net_box connection.
    • net_box_port: (int) Alias for advertise_port . (optional)
    • net_box_credentials: (tab) Override default net_box credentials. (optional)
    • alias: (string) Instance alias.
    • cluster_cookie: (string) Value to be passed in TARANTOOL_CLUSTER_COOKIE and used as default net_box password.
    • instance_uuid: (string) Server identifier. (optional)
    • replicaset_uuid: (string) Replicaset identifier. (optional)

Returns:

input object

Tarantool Cartridge — a framework for distributed applications development

About Tarantool Cartridge

Tarantool Cartridge allows you to easily develop Tarantool-based applications and run them on one or more Tarantool instances organized into a cluster.

As a cluster management tool, Tarantool Cartridge provides your cluster-aware applications with the following key benefits:

  • horizontal scalability and load balancing via built-in automatic sharding;
  • asynchronous replication;
  • automatic failover;
  • centralized cluster control via GUI or API;
  • automatic configuration synchronization;
  • instance functionality segregation.

A Tarantool Cartridge cluster can segregate functionality between instances via built-in and custom (user-defined) cluster roles. You can toggle instances on and off on the fly during cluster operation. This allows you to put different types of workloads (e.g., compute- and transaction-intensive ones) on different physical servers with dedicated hardware.

Tarantool Cartridge has an external utility called cartridge-cli which provides you with utilities and templates to help:

  • easily set up a development environment for your applications;
  • plug the necessary Lua modules;
  • pack the applications in an environment-independent way: together with module binaries and Tarantool executables.
Getting Started
Create first application

To get a template application that uses Tarantool Cartridge and run it, you need to install the cartridge-cli utility (assuming Tarantool itself is already installed).

Long story short, copy and paste this into the console:

tarantoolctl rocks install cartridge-cli
.rocks/bin/cartridge create --name myapp
cd myapp
../.rocks/bin/cartridge build
../.rocks/bin/cartridge start

That’s all! You can visit http://localhost:8081 and see your application’s admin web UI.

Next stages

See:

Contribution

From a cartridge contributor’s point of view, the workflow differs: it involves building the project from source (documentation, WebUI) and running tests.

Building from source

Prerequisites:

The fastest way to build the project is to skip building the Web UI:

CMAKE_DUMMY_WEBUI=true tarantoolctl rocks make

But if you want to build the frontend too, you’ll also need:

Documentation is generated from source code, but only if the ldoc tool is installed:

tarantoolctl rocks install ldoc --server=http://rocks.moonscript.org
tarantoolctl rocks make
Running demo cluster

There are several example entrypoints which are mostly used for testing, but can also be useful for demo purposes or experiments:

tarantoolctl rocks install cartridge-cli
.rocks/bin/cartridge start

# or select specific entrypoint
# .rocks/bin/cartridge start --script ./test/entrypoint/srv_basic.lua

It can be accessed through Web UI (http://localhost:8081) or with binary protocol:

tarantoolctl connect admin@localhost:3301

For more detailed information about cartridge-cli see here.

Auto-generated sources

After the GraphQL API is changed, don’t forget to re-fetch the schema doc/schema.graphql:

npm install graphql-cli@3.0.14
./fetch-schema.sh
Running tests
# Backend
tarantoolctl rocks install luacheck
tarantoolctl rocks install luatest 0.5.0
.rocks/bin/luacheck .
.rocks/bin/luatest -v --exclude cypress

# Frontend
npm install cypress@3.4.1
./frontend-test.sh
.rocks/bin/luatest -v -p cypress

# Collect coverage
tarantoolctl rocks install luacov
tarantoolctl rocks install luacov-console
.rocks/bin/luatest -v --coverage
.rocks/bin/luacov-console `pwd`
.rocks/bin/luacov-console -s

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

[Unreleased]
Added
  • Show “You are here” marker in webui.
  • Respect box.cfg options wal_dir, memtx_dir, vinyl_dir. They can be either absolute or relative; in the latter case the path is calculated relative to cartridge.workdir.
  • Implement stateful failover mode.
  • Add new GraphQL field cluster{ issues {topic} }
  • Extend issues API for stateful failover
  • New option in cartridge.cfg({upgrade_schema=…}, …) to automatically upgrade the schema to the actual Tarantool version (leader only). It has also been added to argparse.
  • GraphQL validation significantly improved: scalar values can’t have subselections; composite types must have subselections; omitting non-nullable arguments in variable list is forbidden.
  • Indicate replication and failover issues in WebUI.
Deprecated

Lua API:

  • cartridge.admin_get_failover -> cartridge.failover_get_params
  • cartridge.admin_enable/disable_failover -> cartridge.failover_set_params

GraphQL API:

  • query {cluster {failover} } -> query {cluster {failover_params {…} } }
  • mutation {cluster {failover()} } -> mutation {cluster {failover_params() {…} } }
Fixed
  • Fix querying failure of cluster {issues {…} } on an uninitialized instance
[2.0.2] - 2020-03-17
Added
  • Expose membership options in argparse module (edit them with environment variables and command-line arguments).
  • New internal module to handle .tar files.

Lua API:

  • cartridge.cfg({webui_blacklist = {‘/admin/code’, …}}): blacklist certain WebUI pages.
  • cartridge.get_schema() referencing older _G.cartridge_get_schema.
  • cartridge.set_schema() referencing older _G.cartridge_set_schema.

GraphQL API:

  • Make use of GraphQL error extensions: provide additional information about class_name and stack of original error.
  • cluster{ issues{ level message … }}: obtain more details on replication status
  • cluster{ self {…} }: new fields app_name, instance_name.
  • servers{ boxinfo { cartridge {…} }}: new fields version, state, error.

Test helpers:

  • Allow specifying all_rw replicaset flag in luatest helpers.
  • Add cluster({env = …}) option for specifying clusterwide environment variables.
Changed
  • Remove redundant topology availability checks from two-phase commit.
  • Prevent instance state transition from ConnectingFullmesh to OperationError if replication fails to connect or to sync. Now such failures result in staying in the ConnectingFullmesh state until the connection succeeds.
  • The pool.connect() options user, password, and reconnect_after are deprecated and ignored; they never worked as intended and never will. The option connect_timeout is deprecated, but for backward compatibility it is treated as wait_connected.
Fixed
  • Fix DDL failure if spaces field is null in input schema.
  • Check content of cluster_cookie for absence of special characters so it doesn’t break the authorization. Allowed symbols are [a-zA-Z0-9_.~-].
  • Drop remote-control connections after full-featured box.cfg becomes available to prevent clients from using limited functionality for too long. During instance recovery remote-control won’t accept any connections: clients wait for box.cfg to finish recovery.
  • Update errors rock dependency to 2.1.2: eliminate duplicate stack trace from error.str field.
  • Apply custom_proc_title setting without waiting for box.cfg.
  • Make GraphQL compatible with req:read_cached() call in httpd hooks.
  • Avoid “attempt to index nil value” error when using rpc on an uninitialized instance.
Enhanced in WebUI
  • Add an ability to hide certain WebUI pages.
  • Validate YAML in the WebUI code editor.
  • Fix showing errors in Code editor page.
  • Remember last open file in Code editor page. Open first file when local storage is empty.
  • Expand file tree in Code editor page by default.
  • Show Cartridge version in server info dialog.
  • Server alias is clickable in replicaset list.
  • Show networking errors in splash panel instead of notifications.
  • Accept float values for vshard-storage weight.
[2.0.1] - 2020-01-15
Added
  • Expose TARANTOOL_DEMO_URI environment variable in GraphQL query cluster{ self{demo_uri} } for demo purposes.
Fixed
  • Notifications in schema editor WebUI.
  • Fix GraphQL servers query compatibility with old cartridge versions.
  • Two-phase commit backward compatibility with v1.2.0.
[2.0.0] - 2019-12-27
Added
  • Use a single point of configuration for frontend HTTP handlers. For example, you can add your own client HTTP middleware for auth.
  • Built-in DDL schema management. Schema is a part of clusterwide configuration. It’s applied to every instance in cluster.
  • DDL schema editor and code editor pages in WebUI.
  • Instances now have internal state machine which helps to manage cluster operation and protect from invalid state transitions.
  • WebUI checkbox to specify all_rw replicaset property.
  • GraphQL API for clusterwide configuration management.
  • Measure clock difference across instances and provide clock_delta in GraphQL servers query and in admin.get_servers() Lua API.
  • New option in rpc_call(…, {uri=…}) to perform a call on a particular uri.
Changed
  • cartridge.rpc_get_candidates() doesn’t return error “No remotes with role available” anymore, empty table is returned instead.

    (incompatible change)

  • Base advertise port in luatest helpers changed from 33000 to 13300, which is outside ip_local_port_range. Using port from local range usually caused tests failing with an error “address already in use”. (incompatible change, but affects tests only)

  • Whole new way to bootstrap instances. Instead of polling membership for getting the clusterwide config, the instance now starts a Remote Control Server (with limited iproto protocol functionality) on the same port. Two-phase commit is then executed over a net.box connection. (major change, but still compatible)

  • Failover isn’t triggered on suspect instance state anymore

  • Functions admin.get_servers, get_replicasets and similar GraphQL queries now return an error if the instance handling the request is in state InitError or BootError.

  • Clusterwide configuration is now represented with a file tree. All sections that were tables are saved to separate .yml files. Compatibility with the old-style configuration is preserved. Accessing unmarshalled sections with get_readonly/deepcopy methods is provided without .yml extension as earlier. (major change, but still compatible)

  • After an old leader restarts it’ll try to sync with an active one before taking the leadership again so that failover doesn’t switch too early before leader finishes recovery. If replication setup fails the instance enters the OperationError state, which can be avoided by explicitly specifying replication_connect_quorum = 1 (or 0).

    (major change)

  • Option {prefer_local = false} in rpc_call makes it always use a netbox connection, even when connecting to itself. It never tries to perform the call locally.

  • Update vshard dependency to 0.1.14.

Removed
  • Function cartridge.bootstrap is removed. Use admin_edit_topology instead. (incompatible change)
  • Misspelled role callback validate is now removed completely. Keep using validate_config.
Fixed
  • Arrange proper failover triggering: don’t miss events, don’t trigger if nothing changed. Fix races in calling apply_config between failover and two-phase commit.
  • Race condition when creating working directory.
  • Hide the users page in WebUI when the auth backend implements no user management functions. In this case, the “Enable auth” switcher is displayed on the main cluster page.
  • Displaying boolean values in server details.
  • Add deduplication for WebUI notifications: no more spam.
  • Automatically choose default vshard group in create and edit replicaset modals.
  • Enhance WebUI modals scrolling.
[1.2.0] - 2019-10-21
Added
  • ‘Auto’ placeholder to weight input in the Replicaset forms.
  • ‘Select all’ and ‘Deselect all’ buttons to roles field in Replicaset add and edit forms.
  • Refresh replicaset list in UI after topology edit actions: bootstrap, join, expel, probe, replicaset edit.
  • New Lua API cartridge.http_authorize_request() suitable for checking HTTP request headers.
  • New Lua API cartridge.http_render_response() for generating HTTP response with proper Set-Cookie headers.
  • New Lua API cartridge.http_get_username() to check authorization of active HTTP session.
  • New Lua API cartridge.rpc_get_candidates() to get list of instances suitable for performing a remote call.
  • Network error notification in UI.
  • Allow specifying vshard storage group in test helpers.
Changed
  • Get UI components from Tarantool UI-Kit
  • When recovering from snapshot, instances are started read-only. It is still possible to override it by argparse (command line arguments or environment variables)
Fixed
  • Editing topology with failover_priority argument.
  • Now cartridge.rpc.get_candidates() returns a value as specified in the doc. It also accepts a new option healthy_only to filter instances that have the membership status healthy.
  • Replicaset weight tooltip in replicasets list
  • Total buckets count in buckets tooltip
  • Validation error in user edit form
  • Leader flag in server details modal
  • Human-readable error for invalid GraphQL queries: Field “x” is not defined on type “String”
  • User management error “attempt to index nil value” when one of the users has an empty e-mail value
  • Catch rpc_call errors when they are performed locally
[1.1.0] - 2019-09-24
Added
  • New Lua API admin_edit_topology has been added to unite multiple others: admin_edit_replicaset, admin_edit_server, admin_join_server, admin_expel_server. It’s suitable for editing multiple servers/replicasets at once. It can be used for bootstrapping cluster from scratch, joining a server to an existing replicaset, creating new replicaset with one or more servers, editing uri/labels of servers, disabling or expelling servers.
  • Similar API is implemented in a GraphQL mutation cluster{edit_topology()}.
  • New GraphQL mutation cluster { edit_vshard_options } is suitable for fine-tuning vshard options: rebalancer_max_receiving, collect_lua_garbage, sync_timeout, collect_bucket_garbage_interval, rebalancer_disbalance_threshold.
Changed
  • Both bootstrapping from scratch and patching topology in clusterwide config automatically probe servers, which aren’t added to membership yet (earlier it influenced join_server mutation only). This is a prerequisite for multijoin api implementation.
  • WebUI users page is hidden if auth_backend doesn’t provide list_users callback.
Deprecated

Lua API:

  • cartridge.admin_edit_replicaset()
  • cartridge.admin_edit_server()
  • cartridge.admin_join_server()
  • cartridge.admin_expel_server()

GraphQL API:

  • mutation{ edit_replicaset() }
  • mutation{ edit_server() }
  • mutation{ join_server() }
  • mutation{ expel_server() }
Fixed
  • Protect users_acl and auth sections when downloading clusterwide config. Also forbid uploading them.
[1.0.0] - 2019-08-29
Added
  • New parameter topology.replicasets[].all_rw in clusterwide config for configuring all instances in the replicaset as read_only = false. It can be managed with both GraphQL and Lua API edit_replicaset.
  • Remote Control server - a replacement for box.cfg({listen}) with limited functionality, independent of box.cfg. The server is only to be used internally for bootstrapping new instances.
  • New module argparse for gathering configuration options from command-line arguments, environment variables, and configuration files. It is used internally and overrides cluster.cfg and box.cfg options.
  • The auth parameter cookie_max_age is now configurable with the GraphQL API. It’s also stored in the clusterwide config now, so changing it on a single server will affect all others in the cluster.
  • Detect when running under systemd and switch from stderr to syslog logging. This allows filtering log messages by severity with journalctl.
  • Redesign WebUI
Changed
  • The project was renamed to cartridge. Use require('cartridge') instead of require('cluster'). All submodules are renamed too.

    (incompatible change)

  • Submodule cluster.test_helpers renamed to cartridge.test-helpers for consistency.

    (incompatible change)

  • Modifying auth params with GraphQL before the cluster was bootstrapped is now forbidden and returns an error.

  • Introducing a new auth parameter cookie_renew_age. When the cluster handles an HTTP request with a cookie whose age is older than specified, it refreshes the cookie. It may be useful to set cookie_max_age to a small value (for example, 10 minutes), so the user will be logged out after cookie_max_age seconds of inactivity. Otherwise, if the user is active, the cookie will be updated every cookie_renew_age seconds and the session will not be interrupted.

  • Changed configuration options for cluster.cfg(): roles is now a mandatory table, and workdir is now optional (defaults to “.”)

  • Parameter advertise_uri is optional now, default value is derived as follows. advertise_uri is a compound of <HOST> and <PORT>. When <HOST> isn’t specified, it’s detected as the only non-local IP address. If it can’t be determined or there is more than one IP address available it defaults to “localhost”. When <PORT> isn’t specified, it’s derived from numeric suffix _<N> of TARANTOOL_INSTANCE_NAME: <PORT> = 3300+<N>. Otherwise default <PORT> = 3301 is used.

  • Parameter http_port is derived from instance name too. If it can’t be derived it defaults to 8081. New parameter http_enabled = false is used to disable it (by default it’s enabled).

  • Removed user cluster, which was used internally for orchestration over netbox. Tarantool built-in user admin is used instead now. It can also be used for HTTP authentication to access WebUI. Cluster cookie is used as a password in both cases.

    (incompatible change)

Removed

Two-layer table structure in API, which was deprecated earlier, is now removed completely:

  • cartridge.service_registry.*
  • cartridge.confapplier.*
  • cartridge.admin.*

Instead you can use top-level functions:

  • cartridge.config_get_readonly
  • cartridge.config_get_deepcopy
  • cartridge.config_patch_clusterwide
  • cartridge.service_get
  • cartridge.admin_get_servers
  • cartridge.admin_get_replicasets
  • cartridge.admin_probe_server
  • cartridge.admin_join_server
  • cartridge.admin_edit_server
  • cartridge.admin_expel_server
  • cartridge.admin_enable_servers
  • cartridge.admin_disable_servers
  • cartridge.admin_edit_replicaset
  • cartridge.admin_get_failover
  • cartridge.admin_enable_failover
  • cartridge.admin_disable_failover
[0.10.0] - 2019-08-01
Added
  • Cluster can now operate without vshard roles (if you don’t need sharding). Deprecation warning about implicit vshard roles isn’t issued any more, they aren’t registered unless explicitly specified either in cluster.cfg({roles=…}) or in dependencies to one of user-defined roles.
  • New role flag hidden = true. Hidden roles aren’t listed in cluster.admin.get_replicasets().roles and therefore in WebUI. Hidden roles are supposed to be a dependency for another role, yet they still can be enabled with edit_replicaset function (both Lua and GraphQL).
  • New role flag: permanent = true. Permanent roles are always enabled. Also they are hidden implicitly.
  • New functions in cluster test_helpers - Cluster:upload_config(config) and Cluster:download_config()
Fixed
  • cluster.call_rpc used to return ‘Role unavailable’ error as a first argument instead of nil, err. It can appear when role is specified in clusterwide config, but wasn’t initialized properly. There are two reasons for that: race condition, or prior error in either role init or apply_config methods.
[0.9.2] - 2019-07-12
Fixed
  • Update frontend-core dependency which used to litter package.loaded with tons of JS code
[0.9.1] - 2019-07-10
Added
  • Support for vshard groups in WebUI
Fixed
  • Uniform handling vshard group ‘default’ when multiple groups aren’t configured
  • Requesting multiple vshard groups info before the cluster was bootstrapped
[0.9.0] - 2019-07-02
Added
  • User management page in WebUI
  • Configuring multiple isolated vshard groups in a single cluster
  • Support for joining multiple instances in a single call to config_patch_clusterwide
  • Integration tests helpers
Changed
  • GraphQL API known_roles format now includes roles dependencies
  • cluster.rpc_call option remote_only renamed to prefer_local with the opposite meaning
Fixed
  • Don’t display renamed or removed roles in webui
  • Uploading config without a section removes it from clusterwide config
[0.8.0] - 2019-05-20
Added
  • Specifying role dependencies
  • Set read-only option for slave nodes
  • Labels for servers
Changed
  • Admin http endpoint changed from /graphql to /admin/api

  • Graphql output now contains null values for empty objects

  • Deprecate the implicit vshard roles ‘cluster.roles.vshard-storage’ and ‘cluster.roles.vshard-router’. Now they should be specified explicitly in cluster.cfg({roles = …})

  • cluster.service_get(‘vshard-router’) now returns cluster.roles.vshard-router module instead of vshard.router

    (incompatible change)

  • cluster.service_get(‘vshard-storage’) now returns cluster.roles.vshard-storage module instead of vshard.storage

    (incompatible change)

  • cluster.admin.bootstrap_vshard now can be called on any instance

Fixed
  • Operating vshard-storage roles before vshard was bootstrapped
[0.7.0] - 2019-04-05
Added
  • Failover priority configuration using WebUI
  • Remote calls across cluster instances using cluster.rpc module
  • Displaying box.cfg and box.info in WebUI
  • Authorization for HTTP API and WebUI
  • Configuration download/upload via WebUI
  • Lua API documentation, which you can read with tarantoolctl rocks doc cluster command.
Changed
  • Instance restart now triggers config validation before roles initialization
  • Update WebUI design
  • Lua API changed (old functions still work, but issue warnings):
    • cluster.confapplier.* -> cluster.config_*
    • cluster.service_registry.* -> cluster.service_*
[0.6.3] - 2019-02-08
Fixed
  • Cluster used to call the ‘validate()’ role method instead of the documented ‘validate_config()’, so the latter was added. The undocumented ‘validate()’ may still be used for the sake of compatibility, but it issues a deprecation warning.
[0.6.2] - 2019-02-07
Fixed
  • Minor internal corner cases
[0.6.1] - 2019-02-05
Fixed
  • UI/UX: Replace “bootstrap vshard” button with a noticeable panel
  • UI/UX: Replace failover panel with a small button
[0.6.0] - 2019-01-30
Fixed
  • Ability to disable vshard-storage role when zero-weight rebalancing finishes
  • Active master indication during failover
  • Other minor improvements
Changed
  • New frontend core
  • Dependencies update
  • Call to join_server automatically does probe_server
Added
  • Servers filtering by roles, uri, alias in WebUI
[0.5.1] - 2018-12-12
Fixed
  • WebUI errors
[0.5.0] - 2018-12-11
Fixed
  • Graphql mutations order
Changed
  • Callbacks in user-defined roles are called with is_master parameter, indicating state of the instance
  • Combine cluster.init and cluster.register_role api calls in single cluster.cfg
  • Eliminate raising exceptions
  • Absorb http server in cluster.cfg
Added
  • Support of vshard replicaset weight parameter
  • join_server() timeout parameter to make call synchronous
[0.4.0] - 2018-11-27
Fixed/Improved
  • Uncaught exception in WebUI
  • Indicate when backend is unavailable
  • Sort servers in replicaset, put master first
  • Cluster mutations are now synchronous, except joining new servers
Added
  • Lua API for temporarily disabling servers
  • Lua API for implementing user-defined roles
[0.3] - 2018-10-30
Changed
  • Config structure incompatible with v0.2
Added
  • Explicit vshard master configuration
  • Automatic failover (switchable)
  • Unit tests
[0.2] - 2018-10-01
Changed
  • Allow vshard bootstrapping from ui
  • Several stability improvements
[0.1] - 2018-09-25
Added
  • Basic functionality
  • Integration tests
  • Luarock-based packaging
  • Gitlab CI integration

Cartridge Command Line Interface


Installation

RPM package (CentOS, Fedora, ALT Linux)
# Select a Tarantool version (copy one of these lines):
TARANTOOL_VERSION=1_10
TARANTOOL_VERSION=2x
TARANTOOL_VERSION=2_2

# Set up the Tarantool packages repository:
curl -s https://packagecloud.io/install/repositories/tarantool/$TARANTOOL_VERSION/script.rpm.sh | sudo bash

# Install the package:
sudo yum install cartridge-cli

# Check the installation:
cartridge --version
DEB package (Debian, Ubuntu)
# Select a Tarantool version (copy one of these lines):
TARANTOOL_VERSION=1_10
TARANTOOL_VERSION=2x
TARANTOOL_VERSION=2_2

# Set up the Tarantool packages repository:
curl -s https://packagecloud.io/install/repositories/tarantool/$TARANTOOL_VERSION/script.deb.sh | sudo bash

# Install the package:
sudo apt-get install cartridge-cli

# Check the installation:
cartridge --version
Homebrew (MacOS)
brew install cartridge-cli

# Check the installation:
cartridge --version
From luarocks

To install cartridge-cli to the project’s folder (installed Tarantool is required):

tarantoolctl rocks install cartridge-cli

The executable will be available at .rocks/bin/cartridge. Optionally, you can add .rocks/bin to the executable path:

export PATH=$PWD/.rocks/bin/:$PATH

Usage

For more details, say:

cartridge --help

These commands are supported:

  • create - create a new app from template;
  • pack - pack application into a distributable bundle;
  • build - build application for local development;
  • start - start Tarantool instance(s);
  • stop - stop Tarantool instance(s).

Applications lifecycle

Create an application from a template:

cartridge create --name myapp

Build an application:

cartridge build ./myapp

Run instances locally:

cartridge start
cartridge stop

Pack an application into a distributable, for example into an RPM package:

cartridge pack rpm ./myapp

Building an application

You can call the cartridge build [<path>] command to build the application locally. It can be useful for local development.

This command accepts one optional argument: the path to the application. By default, it’s . (the current directory).

These steps will be performed on running this command:

  • running cartridge.pre-build (or [DEPRECATED] .cartridge.pre);
  • running tarantoolctl rocks make.

Application packing details

An application can be packed by running the cartridge pack <type> [<path>] command.

These types of packages are supported: rpm, deb, tgz, rock, and docker.

If the path isn’t specified, the current directory is used by default.

For rpm, deb, and tgz, we also deliver rocks modules and executables specific to the system where the cartridge pack command is running.

For docker, the resulting image will contain rocks modules and executables specific to the base image (centos:8).

Common options:

  • --name: name of the app to pack;
  • --version: application version.

The result will be named <name>-<version>.<type>. By default, the application name is detected from the rockspec, and the application version is detected from git describe.
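For example, a hypothetical invocation that overrides both values:

cartridge pack rpm --name myapp --version 1.2.3 ./myapp

The result will be named myapp-1.2.3.rpm.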

Build directory

By default, the application build is performed in a temporary directory under ~/.cartridge/tmp/, so the packaging process doesn’t affect the contents of your application directory.

You can specify a custom build directory for your project in the CARTRIDGE_BUILDDIR environment variable. If this directory doesn’t exist, it will be created, used for building the application, and then removed. Note that the specified directory can’t be a subdirectory of the project.

If you specify an existing directory in the CARTRIDGE_BUILDDIR environment variable, the CARTRIDGE_BUILDDIR/build.cartridge directory will be used for the build and then removed. This directory will be cleaned before building the application.
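For example, a minimal sketch of a build that uses a custom build directory (the path here is an assumption):

CARTRIDGE_BUILDDIR=/tmp/cartridge-build cartridge build ./myapp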

General packing flow and options

A package build comprises these steps:

1. Forming the distribution directory

On this stage, some files will be filtered out:

  • First, git clean -X -d -f will be called to remove all untracked and ignored files.
  • Then the .rocks and .git directories will be removed.

Note: All application files should have at least a+r permissions (a+rx for directories). Otherwise, the cartridge pack command raises an error. File permissions will be kept “as they are”, and the owner of the code files will be set to root:root in the resulting package.

2. Building an application

Note: When packing in docker, this stage runs in the container itself, so all rocks dependencies will be installed correctly. For other package types, this stage runs on the local machine, so the resulting package will contain rocks modules and binaries specific to the local OS.

  • First, the cartridge.pre-build script is run (if it’s present).
  • Then, the tarantoolctl rocks make command is run to deliver all rocks dependencies specified in the rockspec. It forms the .rocks directory that will be delivered in the resulting package.
  • Finally, the cartridge.post-build script is run (if it’s present).

Special files

You can place these files in your application root to control the application packing flow (see the examples below):

  • cartridge.pre-build: a script to be run before tarantoolctl rocks make. The main purpose of this script is to build some non-standard rocks modules (for example, from a submodule).
  • cartridge.post-build: a script to be run after tarantoolctl rocks make. The main purpose of this script is to remove build artifacts from the resulting package.
  • [DEPRECATED] .cartridge.ignore: here you can specify some files and directories to be excluded from the package build. See the documentation for details.
  • [DEPRECATED] .cartridge.pre: a script to be run before tarantoolctl rocks make. The main purpose of this script is to build some non-standard rocks modules (for example, from a submodule).

Note: You can use any of these approaches (just take care not to mix them): cartridge.pre-build + cartridge.post-build or deprecated .cartridge.ignore + .cartridge.pre.

Note: Packing to docker image isn’t compatible with the deprecated packing flow.

Special files examples

cartridge.pre-build:

#!/bin/sh

# The main purpose of this script is to build some non-standard rocks modules.
# It will be run before `tarantoolctl rocks make` during the application build

tarantoolctl rocks make --chdir ./third_party/my-custom-rock-module

cartridge.post-build:

#!/bin/sh

# The main purpose of this script is to remove build artifacts from result package.
# It will be run after `tarantoolctl rocks make` during the application build

rm -rf third_party
rm -rf node_modules
rm -rf doc

Application packing type-specific details

TGZ

cartridge pack tgz ./myapp will create a .tgz archive containing the application source code and rocks modules described in the application rockspec.

RPM and DEB

cartridge pack rpm|deb ./myapp will create an RPM or DEB package.

If you use an opensource version of Tarantool, the package has a tarantool dependency (version >= <major>.<minor> and < <major+1>, where <major>.<minor> is the version of Tarantool used for packing the application). For example, an application packed with Tarantool 2.2 depends on tarantool >= 2.2 and < 3. You should enable the Tarantool repo so that your package manager can install this dependency correctly.

After package installation:

  • the application code and rocks modules described in the application rockspec will be placed in the /usr/share/tarantool/<app_name> directory (for Tarantool Enterprise, this directory will also contain tarantool and tarantoolctl binaries);
  • unit files for running the application as a systemd service will be delivered in /etc/systemd/system.

These directories will be created:

  • /etc/tarantool/conf.d/ - directory for instance configuration;
  • /var/lib/tarantool/ - directory to store instance snapshots;
  • /var/run/tarantool/ - directory to store PID files and console sockets.

Read the doc to learn more about deploying a Tarantool Cartridge application.

To start the instance-1 instance of the myapp service:

systemctl start myapp@instance-1

This instance will look for its configuration across all sections of the YAML file(s) stored in /etc/tarantool/conf.d/*.
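For illustration, a minimal configuration file, say /etc/tarantool/conf.d/myapp.yml, could look like this (the instance names and ports are assumptions):

myapp.instance-1:
  advertise_uri: localhost:3301
  http_port: 8081

myapp.instance-2:
  advertise_uri: localhost:3302
  http_port: 8082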

Docker

cartridge pack docker ./myapp will build a docker image.

Specific options:

  • --tag - resulting image tag;
  • --from - path to the base Dockerfile for the runtime image (defaults to Dockerfile.cartridge in the project root);
  • --build-from - path to the base Dockerfile for the build image (defaults to Dockerfile.build.cartridge in the project root);
  • --sdk-local - a flag indicating that the SDK from the local machine should be installed in the image;
  • --sdk-path - path to the SDK to be installed in the image (env TARANTOOL_SDK_PATH, has lower priority).

Note that exactly one of the --sdk-local and --sdk-path options must be specified for Tarantool Enterprise.

Image tag

The image is tagged as follows:

  • <name>:<detected_version>: by default;
  • <name>:<version>: if the --version parameter is specified;
  • <tag>: if the --tag parameter is specified.

<name> can be specified in the --name parameter, otherwise it will be auto-detected from the application rockspec.
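For example, a hypothetical invocation with an explicit tag:

cartridge pack docker --tag myapp:dev ./myapp

The resulting image will be tagged myapp:dev.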

Tarantool Enterprise SDK

If you use Tarantool Enterprise, you should explicitly specify the Tarantool SDK to be delivered in the resulting image. If you want to use the SDK from your local machine, just pass the --sdk-local flag to the cartridge pack docker command. You can specify a local path to another SDK using the --sdk-path option (it can also be passed in the TARANTOOL_SDK_PATH environment variable, which has lower priority).

Build and runtime images

In fact, two images are created: a build image and a runtime image. The build image is used to perform the application build. Then the application files are delivered to the runtime image (which is exactly the result of running cartridge pack docker).

Both images are created based on centos:8. All packages required for the default cartridge application build (git, gcc, make, cmake, unzip) are installed on the build image. Opensource Tarantool is installed on both images (if Tarantool Enterprise isn’t used).

If your application requires some other applications for build or runtime, you can specify base layers for build and runtime images:

  • build image: Dockerfile.build.cartridge or --build-from;
  • runtime image: Dockerfile.cartridge or --from.

The base image Dockerfile must start with the FROM centos:8 line (not counting comments).

For example, if your application requires gcc-c++ for build and zip for runtime:

Dockerfile.build.cartridge:

FROM centos:8
RUN yum install -y gcc-c++
# Note, that git, gcc, make, cmake, unzip packages
# will be installed anyway

Dockerfile.cartridge:

FROM centos:8
RUN yum install -y zip
Building the app

If you want the docker build command to be run with custom arguments, you can specify them using the TARANTOOL_DOCKER_BUILD_ARGS environment variable. For example, TARANTOOL_DOCKER_BUILD_ARGS='--no-cache --quiet'.

Using the result image

The application code will be placed in the /usr/share/tarantool/${app_name} directory. An opensource version of Tarantool will be installed to the image.

The run directory is /var/run/tarantool/${app_name}, the workdir is /var/lib/tarantool/${app_name}.

To start the instance-1 instance of the myapp application, say:

docker run -d \
    --name instance-1 \
    -e TARANTOOL_INSTANCE_NAME=instance-1 \
    -e TARANTOOL_ADVERTISE_URI=3302 \
    -e TARANTOOL_CLUSTER_COOKIE=secret \
    -e TARANTOOL_HTTP_PORT=8082 \
    myapp:1.0.0

By default, TARANTOOL_INSTANCE_NAME is set to default.

To check the instance logs:

docker logs instance-1

It is the user’s responsibility to set up a proper advertise URI (<host>:<port>) if the containers are deployed on different machines.

If the user specifies only a port, cartridge will use an auto-detected IP, so the user needs to configure docker networks to set up inter-instance communication.

You can use docker volumes to store instance snapshots and xlogs on the host machine. To start an image with new application code, just stop the old container and start a new one using the new image.
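For example, a sketch of mounting a host directory as the instance workdir (the host path is an assumption; the workdir follows the /var/lib/tarantool/${app_name} convention described above):

docker run -d \
    --name instance-1 \
    -e TARANTOOL_INSTANCE_NAME=instance-1 \
    -v /data/dir/on/host:/var/lib/tarantool/myapp \
    myapp:1.0.0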

Managing instances

cartridge start [APP_NAME[.INSTANCE_NAME]] [options]

Options
    --script FILE       Application's entry point.
                        Defaults to TARANTOOL_SCRIPT,
                        or ./init.lua when running from the app's directory,
                        or :apps_path/:app_name/init.lua in a multi-app env.

    --apps-path PATH    Path to apps directory when running in a multi-app env.
                        Defaults to /usr/share/tarantool

    --run-dir DIR       Directory with pid and sock files.
                        Defaults to TARANTOOL_RUN_DIR or /var/run/tarantool

    --cfg FILE          Cartridge instances config file.
                        Defaults to TARANTOOL_CFG or ./instances.yml

    --daemonize / -d    Start in background

It starts a tarantool instance with enforced environment variables.

With the --daemonize option, it also waits until the app’s main script is finished.

TARANTOOL_INSTANCE_NAME
TARANTOOL_CFG
TARANTOOL_PID_FILE - %run_dir%/%instance_name%.pid
TARANTOOL_CONSOLE_SOCK - %run_dir%/%instance_name%.sock

cartridge.cfg() uses TARANTOOL_INSTANCE_NAME to read the instance’s configuration from the file provided in TARANTOOL_CFG.

Default options for the cartridge command can be overridden in ./.cartridge.yml or ~/.cartridge.yml; options from .cartridge.yml can in turn be overridden by the corresponding TARANTOOL_* environment variables.

Here is an example content of .cartridge.yml:

run_dir: tmp/run
cfg: cartridge.yml
apps_path: /usr/local/share/tarantool
script: init.lua

When APP_NAME is not provided, it is parsed from the ./*.rockspec filename.

When INSTANCE_NAME is not provided, cartridge reads the cfg file and starts all defined instances:

# in the application directory
cartridge start # starts all instances
cartridge start .router_1 # start single instance

# in a multi-application environment
cartridge start app_1 # starts all instances of app_1
cartridge start app_1.router_1 # start single instance
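For reference, an instances.yml for the examples above might look like this (the instance names and ports are assumptions):

app_1.router_1:
  advertise_uri: localhost:3301
  http_port: 8081

app_1.storage_1:
  advertise_uri: localhost:3302
  http_port: 8082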

To stop one or more running instances, say:

cartridge stop [APP_NAME[.INSTANCE_NAME]] [options]

These options from the `start` command are supported:
    --run-dir DIR
    --cfg FILE

Misc

Running end-to-end tests
./test-e2e.sh

Application server

In this chapter, we introduce the basics of working with Tarantool as a Lua application server.

This chapter contains the following sections:

Launching an application

Using Tarantool as an application server, you can write your own applications. Tarantool’s native language for writing applications is Lua, so a typical application would be a file that contains your Lua script. But you can also write applications in C or C++.

Note

If you’re new to Lua, we recommend going over the interactive Tarantool tutorial before proceeding with this chapter. To launch the tutorial, say tutorial() in the Tarantool console:

tarantool> tutorial()
---
- |
 Tutorial -- Screen #1 -- Hello, Moon
 ====================================

 Welcome to the Tarantool tutorial.
 It will introduce you to Tarantool’s Lua application server
 and database server, which is what’s running what you’re seeing.
 This is INTERACTIVE -- you’re expected to enter requests
 based on the suggestions or examples in the screen’s text.
 <...>

Let’s create and launch our first Lua application for Tarantool. Here’s the simplest Lua application, the good old “Hello, world!”:

#!/usr/bin/env tarantool
print('Hello, world!')

We save it in a file. Let it be myapp.lua in the current directory.

Now let’s discuss how we can launch our application with Tarantool.

Launching in Docker

If we run Tarantool in a Docker container, the following command will start Tarantool without any application:

$ # create a temporary container and run it in interactive mode
$ docker run --rm -t -i tarantool/tarantool:1

To run Tarantool with our application, we can say:

$ # create a temporary container and
$ # launch Tarantool with our application
$ docker run --rm -t -i \
             -v `pwd`/myapp.lua:/opt/tarantool/myapp.lua \
             -v /data/dir/on/host:/var/lib/tarantool \
             tarantool/tarantool:1 tarantool /opt/tarantool/myapp.lua

Here two resources on the host get mounted in the container:

  • our application file (myapp.lua) and
  • Tarantool data directory (/data/dir/on/host).

By convention, the directory for Tarantool application code inside a container is /opt/tarantool, and the directory for data is /var/lib/tarantool.

Launching a binary program

If we run Tarantool from a binary package or from a source build, we can launch our application:

  • in the script mode,
  • as a server application, or
  • as a daemon service.

The simplest way is to pass the filename to Tarantool at start:

$ tarantool myapp.lua
Hello, world!
$

Tarantool starts, executes our script in the script mode and exits.

Now let’s turn this script into a server application. We use box.cfg from Tarantool’s built-in Lua module to:

  • launch the database (a database has a persistent on-disk state, which needs to be restored after we start an application) and
  • configure Tarantool as a server that accepts requests over a TCP port.

We also add some simple database logic, using space.create() and create_index() to create a space with a primary index. We use the function box.once() to make sure that our logic will be executed only once when the database is initialized for the first time, so we don’t try to create an existing space or index on each invocation of the script:

#!/usr/bin/env tarantool
-- Configure database
box.cfg {
   listen = 3301
}
box.once("bootstrap", function()
   box.schema.space.create('tweedledum')
   box.space.tweedledum:create_index('primary',
       { type = 'TREE', parts = {1, 'unsigned'}})
end)

Now we launch our application in the same manner as before:

$ tarantool myapp.lua
Hello, world!
2017-08-11 16:07:14.250 [41436] main/101/myapp.lua C> version 2.1.0-429-g4e5231702
2017-08-11 16:07:14.250 [41436] main/101/myapp.lua C> log level 5
2017-08-11 16:07:14.251 [41436] main/101/myapp.lua I> mapping 1073741824 bytes for tuple arena...
2017-08-11 16:07:14.255 [41436] main/101/myapp.lua I> recovery start
2017-08-11 16:07:14.255 [41436] main/101/myapp.lua I> recovering from `./00000000000000000000.snap'
2017-08-11 16:07:14.271 [41436] main/101/myapp.lua I> recover from `./00000000000000000000.xlog'
2017-08-11 16:07:14.271 [41436] main/101/myapp.lua I> done `./00000000000000000000.xlog'
2017-08-11 16:07:14.272 [41436] main/102/hot_standby I> recover from `./00000000000000000000.xlog'
2017-08-11 16:07:14.274 [41436] iproto/102/iproto I> binary: started
2017-08-11 16:07:14.275 [41436] iproto/102/iproto I> binary: bound to [::]:3301
2017-08-11 16:07:14.275 [41436] main/101/myapp.lua I> done `./00000000000000000000.xlog'
2017-08-11 16:07:14.278 [41436] main/101/myapp.lua I> ready to accept requests

This time, Tarantool executes our script and keeps working as a server, accepting TCP requests on port 3301. We can see Tarantool in the current session’s process list:

$ ps | grep "tarantool"
  PID TTY           TIME CMD
41608 ttys001       0:00.47 tarantool myapp.lua <running>

But the Tarantool instance will stop if we close the current terminal window. To detach Tarantool and our application from the terminal window, we can launch it in the daemon mode. To do so, we add some parameters to box.cfg{}:

  • background = true that actually tells Tarantool to work as a daemon service,
  • log = 'dir-name' that tells the Tarantool daemon where to store its log file (other log settings are available in Tarantool log module), and
  • pid_file = 'file-name' that tells the Tarantool daemon where to store its pid file.

For example:

box.cfg {
   listen = 3301,
   background = true,
   log = '1.log',
   pid_file = '1.pid'
}

We launch our application in the same manner as before:

$ tarantool myapp.lua
Hello, world!
$

Tarantool executes our script, gets detached from the current shell session (you won’t see it with ps | grep "tarantool") and continues working in the background as a daemon attached to the global session (with SID = 0):

$ ps -ef | grep "tarantool"
  PID SID     TIME  CMD
42178   0  0:00.72 tarantool myapp.lua <running>

Now that we have discussed how to create and launch a Lua application for Tarantool, let’s dive deeper into programming practices.

Creating an application

Further we walk you through key programming practices that will give you a good start in writing Lua applications for Tarantool. For an adventure, this is a story of implementing… a real microservice based on Tarantool! We implement a backend for a simplified version of Pokémon Go, a location-based augmented reality game released in mid-2016. In this game, players use a mobile device’s GPS capability to locate, capture, battle and train virtual monsters called “pokémon”, who appear on the screen as if they were in the same real-world location as the player.

To stay within the walk-through format, let’s narrow the original gameplay as follows. We have a map with pokémon spawn locations. Next, we have multiple players who can send catch-a-pokémon requests to the server (which runs our Tarantool microservice). The server replies whether the pokémon is caught or not, increases the player’s pokémon counter if yes, and triggers the respawn-a-pokémon method that spawns a new pokémon at the same location in a while.

We leave client-side applications outside the scope of this story. Yet we promise a mini-demo in the end to simulate real users and give us some fun. :-)


First, what would be the best way to deliver our microservice?

Modules, rocks and applications

To make our game logic available to other developers and Lua applications, let’s put it into a Lua module.

A module (called “rock” in Lua) is an optional library which enhances Tarantool functionality. So, we can install our logic as a module in Tarantool and use it from any Tarantool application or module. Like applications, modules in Tarantool can be written in Lua (rocks), C or C++.

Modules are good for two things:

  • easier code management (reuse, packaging, versioning), and
  • hot code reload without restarting the Tarantool instance.

Technically, a module is a file with source code that exports its functions in an API. For example, here is a Lua module named mymodule.lua that exports one function named myfun:

local exports = {}
exports.myfun = function(input_string)
   print('Hello', input_string)
end
return exports

To launch the function myfun() – from another module, from a Lua application, or from Tarantool itself, – we need to save this module as a file, then load this module with the require() directive and call the exported function.

For example, here’s a Lua application that uses myfun() function from mymodule.lua module:

-- loading the module
local mymodule = require('mymodule')

-- calling myfun() from within test() function
local test = function()
  mymodule.myfun()
end

A thing to remember here is that the require() directive takes load paths to Lua modules from the package.path variable. This is a semicolon-separated string, where a question mark is used to interpolate the module name. By default, this variable contains system-wide Lua paths and the working directory. But if we put our modules inside a specific folder (e.g. scripts/), we need to add this folder to package.path before any calls to require():

package.path = 'scripts/?.lua;' .. package.path

For our microservice, a simple and convenient solution would be to put all methods in a Lua module (say pokemon.lua) and to write a Lua application (say game.lua) that initializes the gaming environment and starts the game loop.


Now let’s get down to implementation details. In our game, we need three entities:

  • map, which is an array of pokémons with coordinates of respawn locations; in this version of the game, let a location be a rectangle identified with two points, upper-left and lower-right;
  • player, which has an ID, a name, and coordinates of the player’s location point;
  • pokémon, which has the same fields as the player, plus a status (active/inactive, that is present on the map or not) and a catch probability (well, let’s give our pokémons a chance to escape :-) )

We’ll store these entities as tuples in Tarantool spaces. But to deliver our backend application as a microservice, the good practice would be to send/receive our data in the universal JSON format, thus using Tarantool as a document storage.

Avro schemas

To store JSON data as tuples, we will apply a savvy practice which reduces data footprint and ensures all stored documents are valid. We will use Tarantool module avro-schema which checks the schema of a JSON document and converts it to a Tarantool tuple. The tuple will contain only field values, and thus take a lot less space than the original document. In avro-schema terms, converting JSON documents to tuples is “flattening”, and restoring the original documents is “unflattening”.

First you need to install the module with tarantoolctl rocks install avro-schema.

Further usage is quite straightforward:

  1. For each entity, we need to define a schema in Apache Avro schema syntax, where we list the entity’s fields with their names and Avro data types.
  2. At initialization, we call avro-schema.create() that creates objects in memory for all schema entities, and compile() that generates flatten/unflatten methods for each entity.
  3. Further on, we just call flatten/unflatten methods for a respective entity on receiving/sending the entity’s data.
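To illustrate the idea, here is a hypothetical sketch (it assumes the compiled_player model produced by compile() as shown below). Flattening turns a nested JSON document into a flat tuple of values, and unflattening restores it:

-- a hypothetical illustration of flattening a player document
local ok, tuple = compiled_player.flatten({
    id = 1,
    name = 'Player1',
    location = {x = 1.0, y = 2.0}
})
-- tuple is now {1, 'Player1', 1.0, 2.0}: field values only, no field names

local ok2, doc = compiled_player.unflatten(tuple)
-- doc is the original nested document again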

Here’s what our schema definitions for the player and pokémon entities look like:

local schema = {
    player = {
        type="record",
        name="player_schema",
        fields={
            {name="id", type="long"},
            {name="name", type="string"},
            {
                name="location",
                type= {
                    type="record",
                    name="player_location",
                    fields={
                        {name="x", type="double"},
                        {name="y", type="double"}
                    }
                }
            }
        }
    },
    pokemon = {
        type="record",
        name="pokemon_schema",
        fields={
            {name="id", type="long"},
            {name="status", type="string"},
            {name="name", type="string"},
            {name="chance", type="double"},
            {
                name="location",
                type= {
                    type="record",
                    name="pokemon_location",
                    fields={
                        {name="x", type="double"},
                        {name="y", type="double"}
                    }
                }
            }
        }
    }
}

And here’s how we create and compile our entities at initialization:

-- load avro-schema module with require()
local avro = require('avro_schema')

-- create models
local ok_m, pokemon = avro.create(schema.pokemon)
local ok_p, player = avro.create(schema.player)
if ok_m and ok_p then
    -- compile models
    local ok_cm, compiled_pokemon = avro.compile(pokemon)
    local ok_cp, compiled_player = avro.compile(player)
    if ok_cm and ok_cp then
        -- start the game
        <...>
    else
        log.error('Schema compilation failed')
    end
else
    log.info('Schema creation failed')
end
return false

As for the map entity, it would be overkill to introduce a schema for it, because we have only one map in the game, it has very few fields, and, most importantly, we use the map only inside our logic, never exposing it to external users.


Next, we need methods to implement the game logic. To simulate object-oriented programming in our Lua code, let’s store all Lua functions and shared variables in a single local variable (let’s name it game). This will allow us to address functions or variables from within our module as self.func_name or self.var_name. Like this:

local game = {
    -- a local variable
    num_players = 0,

    -- a method that prints a local variable
    hello = function(self)
      print('Hello! Your player number is ' .. self.num_players .. '.')
    end,

    -- a method that calls another method and returns a local variable
    sign_in = function(self)
      self.num_players = self.num_players + 1
      self:hello()
      return self.num_players
    end
}

In OOP terms, we can now regard local variables inside game as object fields, and local functions as object methods.
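For example, calling these methods with Lua’s colon syntax passes game itself as self (a quick sketch of the expected behavior):

game:hello()              -- prints 'Hello! Your player number is 0.'
local n = game:sign_in()  -- prints 'Hello! Your player number is 1.' and returns 1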

Note

In this manual, Lua examples use local variables. Use global variables with caution, since the module’s users may be unaware of them.

To enable/disable the use of undeclared global variables in your Lua code, use Tarantool’s strict module.

So, our game module will have the following methods:

  • catch() to calculate whether the pokémon was caught (besides the coordinates of both the player and pokémon, this method will apply a probability factor, so not every pokémon within the player’s reach will be caught);
  • respawn() to add missing pokémons to the map, say, every 60 seconds (we assume that a frightened pokémon runs away, so we remove a pokémon from the map on any catch attempt and add it back to the map in a while);
  • notify() to log information about caught pokémons (like “Player 1 caught pokémon A”);
  • start() to initialize the game (it will create database spaces, create and compile avro schemas, and launch respawn()).

Besides, it would be convenient to have methods for working with Tarantool storage. For example:

  • add_pokemon() to add a pokémon to the database, and
  • map() to populate the map with all pokémons stored in Tarantool.

We’ll need these two methods primarily when initializing our game, but we can also call them later, for example to test our code.

Bootstrapping a database

Let’s discuss game initialization. In start() method, we need to populate Tarantool spaces with pokémon data. Why not keep all game data in memory? Why use a database? The answer is: persistence. Without a database, we risk losing data on power outage, for example. But if we store our data in an in-memory database, Tarantool takes care to persist it on disk whenever it’s changed. This gives us one more benefit: quick startup in case of failure. Tarantool has a smart algorithm that quickly loads all data from disk into memory on startup, so the warm-up takes little time.

We’ll be using functions from Tarantool built-in box module:

  • box.schema.create_space('pokemons') to create a space named pokemons for storing information about pokémons (we don’t create a similar space for players, because we intend to only send/receive player information via API calls, so we needn’t store it);
  • box.space.pokemons:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}}) to create a primary HASH index by pokémon ID;
  • box.space.pokemons:create_index('status', {type = 'tree', parts = {2, 'str'}}) to create a secondary TREE index by pokémon status.

Notice the parts = argument in the index specification. The pokémon ID is the first field in a Tarantool tuple since it’s the first member of the respective Avro type; likewise, the pokémon status is the second field. The actual JSON document may have the ID or status fields at any position of the JSON map.

The implementation of start() method looks like this:

-- create game object
start = function(self)
    -- create spaces and indexes
    box.once('init', function()
        box.schema.create_space('pokemons')
        box.space.pokemons:create_index(
            "primary", {type = 'hash', parts = {1, 'unsigned'}}
        )
        box.space.pokemons:create_index(
            "status", {type = "tree", parts = {2, 'str'}}
        )
    end)

    -- create models
    local ok_m, pokemon = avro.create(schema.pokemon)
    local ok_p, player = avro.create(schema.player)
    if ok_m and ok_p then
        -- compile models
        local ok_cm, compiled_pokemon = avro.compile(pokemon)
        local ok_cp, compiled_player = avro.compile(player)
        if ok_cm and ok_cp then
            -- start the game
            <...>
        else
            log.error('Schema compilation failed')
        end
    else
        log.info('Schema creation failed')
    end
    return false
end

GIS

Now let’s discuss catch(), which is the main method in our gaming logic.

Here we receive the player’s coordinates and the target pokémon’s ID number, and we need to answer whether the player has actually caught the pokémon or not (remember that each pokémon has a chance to escape).

First thing, we validate the received player data against its Avro schema. And we check whether such a pokémon exists in our database and is displayed on the map (the pokémon must have the active status):

catch = function(self, pokemon_id, player)
    -- check player data
    local ok, tuple = self.player_model.flatten(player)
    if not ok then
        return false
    end
    -- get pokemon data
    local p_tuple = box.space.pokemons:get(pokemon_id)
    if p_tuple == nil then
        return false
    end
    local ok, pokemon = self.pokemon_model.unflatten(p_tuple)
    if not ok then
        return false
    end
    if pokemon.status ~= self.state.ACTIVE then
        return false
    end
    -- more catch logic to follow
    <...>
end

Next, we calculate the answer: caught or not.

To work with geographical coordinates, we use Tarantool gis module.

To keep things simple, we don’t load any specific map, assuming that we deal with a world map. And we do not validate incoming coordinates, assuming again that all received locations are within the planet Earth.

We use two geo-specific variables:

  • wgs84, which stands for the latest revision of the World Geodetic System standard, WGS84. Basically, it comprises a standard coordinate system for the Earth and represents the Earth as an ellipsoid.
  • nationalmap, which stands for the US National Atlas Equal Area. This is a projected coordinates system based on WGS84. It gives us a zero base for location projection and allows positioning our players and pokémons in meters.

Both these systems are listed in the EPSG Geodetic Parameter Registry, where each system has a unique number. In our code, we assign these listing numbers to respective variables:

wgs84 = 4326,
nationalmap = 2163,

For our game logic, we need one more variable, catch_distance, which defines how close a player must get to a pokémon before trying to catch it. Let’s set the distance to 100 meters.

catch_distance = 100,

Now we’re ready to calculate the answer. We need to project the current location of both the player (p_pos) and the pokémon (m_pos) on the map, check whether the player is close enough to the pokémon (using catch_distance), and calculate whether the player has caught the pokémon (here we generate a random value and let the pokémon escape if the random value happens to be less than 100 minus the pokémon’s chance value):

-- project locations
local m_pos = gis.Point(
    {pokemon.location.x, pokemon.location.y}, self.wgs84
):transform(self.nationalmap)
local p_pos = gis.Point(
    {player.location.x, player.location.y}, self.wgs84
):transform(self.nationalmap)

-- check catch distance condition
if p_pos:distance(m_pos) > self.catch_distance then
    return false
end
-- try to catch pokemon
local caught = math.random(100) >= 100 - pokemon.chance
if caught then
    -- update and notify on success
    box.space.pokemons:update(
        pokemon_id, {{'=', self.STATUS, self.state.CAUGHT}}
    )
    self:notify(player, pokemon)
end
return caught

Index iterators

By our gameplay, all caught pokémons are returned to the map. We do this for all pokémons on the map every 60 seconds using the respawn() method. We iterate through pokémons by status using the Tarantool index iterator function index:pairs and reset the statuses of all “caught” pokémons back to “active” using box.space.pokemons:update().

respawn = function(self)
    fiber.name('Respawn fiber')
    for _, tuple in box.space.pokemons.index.status:pairs(
           self.state.CAUGHT) do
        box.space.pokemons:update(
            tuple[self.ID],
            {{'=', self.STATUS, self.state.ACTIVE}}
        )
    end
end

For readability, we introduce named fields:

ID = 1, STATUS = 2,

The complete implementation of start() now looks like this:

-- create game object
start = function(self)
    -- create spaces and indexes
    box.once('init', function()
       box.schema.create_space('pokemons')
       box.space.pokemons:create_index(
           "primary", {type = 'hash', parts = {1, 'unsigned'}}
       )
       box.space.pokemons:create_index(
           "status", {type = "tree", parts = {2, 'str'}}
       )
    end)

    -- create models
    local ok_m, pokemon = avro.create(schema.pokemon)
    local ok_p, player = avro.create(schema.player)
    if ok_m and ok_p then
        -- compile models
        local ok_cm, compiled_pokemon = avro.compile(pokemon)
        local ok_cp, compiled_player = avro.compile(player)
        if ok_cm and ok_cp then
            -- start the game
            self.pokemon_model = compiled_pokemon
            self.player_model = compiled_player
            self.respawn()
            log.info('Started')
            return true
         else
            log.error('Schema compilation failed')
         end
    else
        log.info('Schema creation failed')
    end
    return false
end

Fibers

But wait! If we launch it as shown above – self.respawn() – the function will be executed only once, just like all the other methods. But we need to execute respawn() every 60 seconds. Creating a fiber is the Tarantool way of making application logic work in the background at all times.

A fiber exists for executing instruction sequences but it is not a thread. The key difference is that threads use preemptive multitasking, while fibers use cooperative multitasking. This gives fibers the following two advantages over threads:

  • Better controllability. Threads often depend on the kernel’s thread scheduler to preempt a busy thread and resume another, so preemption may occur unpredictably. Fibers yield themselves to run another fiber while executing, so yields are fully controlled by application logic.
  • Higher performance. Preempting a thread is relatively expensive, since it involves the system kernel. Fibers are lighter and faster, as yielding does not involve the kernel.

Yet fibers have some limitations as compared with threads, the main one being the lack of a multi-core mode. All fibers in an application belong to a single thread, so they all use the same CPU core as the parent thread. However, this limitation is not really serious for Tarantool applications, because a typical bottleneck for Tarantool is the HDD, not the CPU.

A fiber has all the features of a Lua coroutine, and all programming concepts that apply to Lua coroutines apply to fibers as well. However, Tarantool has made some enhancements for fibers and uses fibers internally. So, although using coroutines is possible and supported, we recommend using fibers.
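
To see cooperative scheduling in action, here is a minimal sketch, separate from our game: two fibers take turns only because each of them yields explicitly; nothing preempts them in between:

local fiber = require('fiber')

-- each fiber runs until it yields; fiber.yield() hands control
-- over to the next ready fiber
local function worker(name)
    for i = 1, 3 do
        print(name .. ' step ' .. i)
        fiber.yield()
    end
end

fiber.create(worker, 'A')
fiber.create(worker, 'B')
fiber.sleep(0.1) -- let both fibers finish before the script ends
-- prints: A step 1, B step 1, A step 2, B step 2, A step 3, B step 3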

In our case, performance and controllability are of little importance; what matters is that respawn() keeps working in the background all the time. So we’ll launch it in a fiber. To do so, we’ll need to amend respawn():

respawn = function(self)
    -- let's give our fiber a name;
    -- this will produce neat output in fiber.info()
    fiber.name('Respawn fiber')
    while true do
        for _, tuple in box.space.pokemons.index.status:pairs(
                self.state.CAUGHT) do
            box.space.pokemons:update(
                tuple[self.ID],
                {{'=', self.STATUS, self.state.ACTIVE}}
            )
        end
        fiber.sleep(self.respawn_time)
    end
end

and call it as a fiber in start():

start = function(self)
    -- create spaces and indexes
        <...>
    -- create models
        <...>
    -- compile models
        <...>
    -- start the game
       self.pokemon_model = compiled_pokemon
       self.player_model = compiled_player
       fiber.create(self.respawn, self)
       log.info('Started')
    -- errors if schema creation or compilation fails
       <...>
end

Logging

One more helpful function that we used in start() is log.info() from the Tarantool log module. We also need this function in notify() to add a record to the log file on every successful catch:

-- event notification
notify = function(self, player, pokemon)
    log.info("Player '%s' caught '%s'", player.name, pokemon.name)
end

We use the default Tarantool log settings, so we’ll see the log output in the console when we launch our application in script mode.
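
We rely on the defaults here, but verbosity can be tuned if needed: log_level is a box.cfg option, and 5 (“info”) is the default. A sketch of a quieter configuration would be:

box.cfg{
    log_level = 4 -- 4 shows only warnings and errors; the default 5 also shows info
}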

Great! We’ve discussed all programming practices used in our Lua module (see pokemon.lua).

Now let’s prepare the test environment. As planned, we write a Lua application (see game.lua) to initialize Tarantool’s database module, initialize our game, call the game loop and simulate a couple of player requests.

To launch our microservice, we put both the pokemon.lua module and the game.lua application in the current directory, install all external modules, and launch the Tarantool instance running our game.lua application (this example is for Ubuntu):

$ ls
game.lua  pokemon.lua
$ sudo apt-get install tarantool-gis
$ sudo apt-get install tarantool-avro-schema
$ tarantool game.lua

Tarantool starts and initializes the database. Then Tarantool executes the demo logic from game.lua: adds a pokémon named Pikachu (its chance to be caught is very high, 99.1), displays the current map (it contains one active pokémon, Pikachu) and processes catch requests from two players. Player1 is located right next to the lonely Pikachu, while Player2 is located far away from it. As expected, the catch result in this output is “true” for Player1 and “false” for Player2. Finally, Tarantool displays the current map, which is empty, because Pikachu is caught and temporarily inactive:

$ tarantool game.lua
2017-01-09 20:19:24.605 [6282] main/101/game.lua C> version 1.7.3-43-gf5fa1e1
2017-01-09 20:19:24.605 [6282] main/101/game.lua C> log level 5
2017-01-09 20:19:24.605 [6282] main/101/game.lua I> mapping 1073741824 bytes for tuple arena...
2017-01-09 20:19:24.609 [6282] main/101/game.lua I> initializing an empty data directory
2017-01-09 20:19:24.634 [6282] snapshot/101/main I> saving snapshot `./00000000000000000000.snap.inprogress'
2017-01-09 20:19:24.635 [6282] snapshot/101/main I> done
2017-01-09 20:19:24.641 [6282] main/101/game.lua I> ready to accept requests
2017-01-09 20:19:24.786 [6282] main/101/game.lua I> Started
---
- {'id': 1, 'status': 'active', 'location': {'y': 2, 'x': 1}, 'name': 'Pikachu', 'chance': 99.1}
...

2017-01-09 20:19:24.789 [6282] main/101/game.lua I> Player 'Player1' caught 'Pikachu'
true
false
--- []
...

2017-01-09 20:19:24.789 [6282] main C> entering the event loop

nginx

In real life, this microservice would work over HTTP. Let’s add an nginx web server to our environment and make a similar demo. But how do we make Tarantool methods callable via a REST API? We use nginx with the Tarantool nginx upstream module and create one more Lua script (app.lua) that exports three of our game methods – add_pokemon(), map() and catch() – as REST endpoints of the nginx upstream module:

local game = require('pokemon')
box.cfg{listen=3301}
game:start()

-- add, map and catch functions exposed to REST API
function add(request, pokemon)
    return {
        result=game:add_pokemon(pokemon)
    }
end

function map(request)
    return {
        map=game:map()
    }
end

function catch(request, pid, player)
    local id = tonumber(pid)
    if id == nil then
        return {result=false}
    end
    return {
        result=game:catch(id, player)
    }
end

An easy way to configure and launch nginx would be to create a Docker container based on a Docker image with nginx and the upstream module already installed (see http/Dockerfile). We take a standard nginx.conf, where we define an upstream with our Tarantool backend running (this is another Docker container, see details below):

upstream tnt {
      server pserver:3301 max_fails=1 fail_timeout=60s;
      keepalive 250000;
}

and add some Tarantool-specific parameters (see descriptions in the upstream module’s README file):

server {
  server_name tnt_test;

  listen 80 default deferred reuseport so_keepalive=on backlog=65535;

  location = / {
      root /usr/local/nginx/html;
  }

  location /api {
    # wait for answers with an effectively infinite timeout
    tnt_read_timeout 60m;
    if ( $request_method = GET ) {
       tnt_method "map";
    }
    tnt_http_rest_methods get;
    tnt_http_methods all;
    tnt_multireturn_skip_count 2;
    tnt_pure_result on;
    tnt_pass_http_request on parse_args;
    tnt_pass tnt;
  }
}

Likewise, we put Tarantool server and all our game logic in a second Docker container based on the official Tarantool 1.9 image (see src/Dockerfile) and set the container’s default command to tarantool app.lua. This is the backend.

Non-blocking IO

To test the REST API, we create a new script (client.lua), which is similar to our game.lua application, but makes HTTP POST and GET requests rather than calling Lua functions:

local http = require('curl').http()
local json = require('json')
local URI = os.getenv('SERVER_URI')
local fiber = require('fiber')

local player1 = {
    name="Player1",
    id=1,
    location = {
        x=1.0001,
        y=2.0003
    }
}
local player2 = {
    name="Player2",
    id=2,
    location = {
        x=30.123,
        y=40.456
    }
}

local pokemon = {
    name="Pikachu",
    chance=99.1,
    id=1,
    status="active",
    location = {
        x=1,
        y=2
    }
}

function request(method, body, id)
    local resp = http:request(
        method, URI, body
    )
    if id ~= nil then
        print(string.format('Player %d result: %s',
            id, resp.body))
    else
        print(resp.body)
    end
end

local players = {}
function catch(player)
    fiber.sleep(math.random(5))
    print('Catch pokemon by player ' .. tostring(player.id))
    request(
        'POST',
        '{"method": "catch", "params": [1, ' ..
            json.encode(player) .. ']}',
        player.id
    )
    table.insert(players, player.id)
end

print('Create pokemon')
request('POST', '{"method": "add", "params": [' ..
    json.encode(pokemon) .. ']}')
request('GET', '')

fiber.create(catch, player1)
fiber.create(catch, player2)

-- wait for players
while #players ~= 2 do
    fiber.sleep(0.001)
end

request('GET', '')
os.exit()

When you run this script, you’ll notice that both players have equal chances to make the first attempt at catching the pokémon. In a classical Lua script, a networked call blocks the script until it’s finished, so the first catch attempt can only be done by the player who entered the game first. In Tarantool, both players play concurrently, since all modules are integrated with Tarantool cooperative multitasking and use non-blocking I/O.

Indeed, when Player1 makes its first REST call, the script doesn’t block. The fiber running the catch() function on behalf of Player1 issues a non-blocking call to the operating system and yields control to the next ready fiber, which happens to be the fiber of Player2. Player2’s fiber does the same. When a network response is received, Player1’s fiber is activated by the Tarantool cooperative scheduler and resumes its work. All Tarantool modules use non-blocking I/O and are integrated with the Tarantool cooperative scheduler. For module developers, Tarantool provides an API.
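
The same overlap is easy to reproduce outside the game. In the sketch below, three fibers each “wait for a response” for one second, yet the total elapsed time is about one second rather than three, because every wait yields control instead of blocking (here fiber.sleep() stands in for a real non-blocking network call):

local fiber = require('fiber')
local clock = require('clock')

local done = 0
local function fake_request()
    fiber.sleep(1) -- yields, like a non-blocking socket read would
    done = done + 1
end

local start = clock.monotonic()
for _ = 1, 3 do
    fiber.create(fake_request)
end
while done < 3 do
    fiber.sleep(0.01) -- let the worker fibers run
end
print(clock.monotonic() - start) -- roughly 1, not 3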

For our HTTP test, we create a third container based on the official Tarantool 1.9 image (see client/Dockerfile) and set the container’s default command to tarantool client.lua.

To run this test locally, download our pokemon project from GitHub and say:

$ docker-compose build
$ docker-compose up

Docker Compose builds and runs all three containers: pserver (Tarantool backend), phttp (nginx) and pclient (demo client). You can see log messages from all these containers in the console, with pclient reporting that it made an HTTP request to create a pokémon, made two catch requests, requested the map (empty, since the pokémon is caught and temporarily inactive) and exited:

pclient_1  | Create pokemon
<...>
pclient_1  | {"result":true}
pclient_1  | {"map":[{"id":1,"status":"active","location":{"y":2,"x":1},"name":"Pikachu","chance":99.100000}]}
pclient_1  | Catch pokemon by player 2
pclient_1  | Catch pokemon by player 1
pclient_1  | Player 1 result: {"result":true}
pclient_1  | Player 2 result: {"result":false}
pclient_1  | {"map":[]}
pokemon_pclient_1 exited with code 0

Congratulations! This is the end of our walk-through. For further reading, see more about installing and contributing a module.

See also reference on Tarantool modules and C API, and don’t miss our Lua cookbook recipes.

Installing a module

Modules in Lua and C that come from Tarantool developers and community contributors are available in the following locations:

  • Tarantool modules repository (see below)
  • Tarantool deb/rpm repositories (see below)

Installing a module from a repository

See README in tarantool/rocks repository for detailed instructions.

Installing a module from deb/rpm

Follow these steps:

  1. Install Tarantool as recommended on the download page.

  2. Install the module you need. Look up the module’s name on Tarantool rocks page and put the prefix “tarantool-” before the module name to avoid ambiguity:

    $ # for Ubuntu/Debian:
    $ sudo apt-get install tarantool-<module-name>
    
    $ # for RHEL/CentOS/Amazon:
    $ sudo yum install tarantool-<module-name>
    

    For example, to install the module shard on Ubuntu, say:

    $ sudo apt-get install tarantool-shard
    

Once these steps are complete, you can:

  • load any module with

    tarantool> name = require('module-name')
    

    for example:

    tarantool> shard = require('shard')
    
  • search locally for installed modules using package.path (Lua) or package.cpath (C):

    tarantool> package.path
    ---
    - ./?.lua;./?/init.lua; /usr/local/share/tarantool/?.lua;/usr/local/share/
    tarantool/?/init.lua;/usr/share/tarantool/?.lua;/usr/share/tarantool/?/ini
    t.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua;/
    usr/share/lua/5.1/?.lua;/usr/share/lua/5.1/?/init.lua;
    ...
    
    tarantool> package.cpath
    ---
    - ./?.so;/usr/local/lib/x86_64-linux-gnu/tarantool/?.so;/usr/lib/x86_64-li
    nux-gnu/tarantool/?.so;/usr/local/lib/tarantool/?.so;/usr/local/lib/x86_64
    -linux-gnu/lua/5.1/?.so;/usr/lib/x86_64-linux-gnu/lua/5.1/?.so;/usr/local/
    lib/lua/5.1/?.so;
    ...
    

    Note

    The question marks stand for the module name that is specified when saying require('module-name').
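
    To check where require() would find a given module, you can use package.searchpath(), a Lua 5.2 extension that LuaJIT (and hence Tarantool) provides ('shard' below is just an example):

    tarantool> package.searchpath('shard', package.path)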

Contributing a module

We have already discussed how to create a simple module in Lua for local usage. Now let’s discuss how to create a more advanced Tarantool module and then get it published on Tarantool rocks page and included in official Tarantool images for Docker.

To help our contributors, we have created modulekit, a set of templates for creating Tarantool modules in Lua and C.

Note

As a prerequisite for using modulekit, install tarantool-dev package first. For example, in Ubuntu say:

$ sudo apt-get install tarantool-dev

Contributing a module in Lua

See README in “luakit” branch of tarantool/modulekit repository for detailed instructions and examples.

Contributing a module in C

In some cases, you may want to create a Tarantool module in C rather than in Lua, for example, to work with specific hardware or low-level system interfaces.

See README in “ckit” branch of tarantool/modulekit repository for detailed instructions and examples.

Note

You can also create modules with C++, provided that the code does not throw exceptions.

Reloading a module

You can reload any Tarantool application or module with zero downtime.

Reloading a module in Lua

Here’s an example that illustrates the most typical case – “update and reload”.

Note

In this example, we use recommended administration practices based on instance files and tarantoolctl utility.

  1. Update the application file.

    For example, a module in /usr/share/tarantool/app.lua:

    local function start()
      -- initial version
      box.once("myapp:v1.0", function()
        box.schema.space.create("somedata")
        box.space.somedata:create_index("primary")
        ...
      end)
    
      -- migration code from 1.0 to 1.1
      box.once("myapp:v1.1", function()
        box.space.somedata.index.primary:alter(...)
        ...
      end)
    
      -- migration code from 1.1 to 1.2
      box.once("myapp:v1.2", function()
        box.space.somedata.index.primary:alter(...)
        box.space.somedata:insert(...)
        ...
      end)
    end
    
    -- start background fibers here if you need them
    
    local function stop()
      -- stop all background fibers and clean up resources
    end
    
    local function api_for_call(xxx)
      -- do some business
    end
    
    return {
      start = start,
      stop = stop,
      api_for_call = api_for_call
    }
    
  2. Update the instance file.

    For example, /etc/tarantool/instances.enabled/my_app.lua:

    #!/usr/bin/env tarantool
    --
    -- hot code reload example
    --
    
    local log = require('log')

    box.cfg({listen = 3302})
    
    -- ATTENTION: unload it all properly!
    local app = package.loaded['app']
    if app ~= nil then
      -- stop the old application version
      app.stop()
      -- unload the application
      package.loaded['app'] = nil
      -- unload all dependencies
      package.loaded['somedep'] = nil
    end
    
    -- load the application
    log.info('require app')
    app = require('app')
    
    -- start the application
    app.start({some app options controlled by sysadmins})
    

    The important thing here is to properly unload the application and its dependencies.
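
    If the application pulls in many dependencies, unloading them one by one is error-prone. A small helper like the sketch below (unload() is our own name, not a Tarantool API) drops a module together with every submodule loaded under its name:

    -- drop 'name' plus every loaded submodule such as 'name.utils'
    local function unload(name)
        for loaded in pairs(package.loaded) do
            if loaded == name or
               loaded:find(name .. '.', 1, true) == 1 then
                package.loaded[loaded] = nil
            end
        end
    end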

  3. Manually reload the application file.

    For example, using tarantoolctl:

    $ tarantoolctl eval my_app /etc/tarantool/instances.enabled/my_app.lua
    

Reloading a module in C

After you have compiled a new version of a C module (a *.so shared library), call box.schema.func.reload('module-name') from your Lua script to reload the module.
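
For example, if the rebuilt library is mymodule.so (mymodule is a hypothetical name), the call would be:

box.schema.func.reload('mymodule')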

Developing with an IDE

You can use IntelliJ IDEA as an IDE to develop and debug Lua applications for Tarantool.

  1. Download and install the IDE from the official web-site.

    JetBrains provides specialized editions for particular languages: IntelliJ IDEA (Java), PHPStorm (PHP), PyCharm (Python), RubyMine (Ruby), CLion (C/C++), WebStorm (Web) and others. So, download a version that suits your primary programming language.

    Tarantool integration is supported for all editions.

  2. Configure the IDE:

    1. Start IntelliJ IDEA.

    2. Click Configure button and select Plugins.

      _images/ide_1.png
    3. Click Browse repositories.

      _images/ide_2.png
    4. Install EmmyLua plugin.

      Note

      Don’t confuse it with the Lua plugin, which is less powerful than EmmyLua.

      _images/ide_3.png
    5. Restart IntelliJ IDEA.

    6. Click Configure, select Project Defaults and then Run Configurations.

      _images/ide_4.png
    7. Find Lua Application in the sidebar at the left.

    8. In Program, type a path to an installed tarantool binary.

      By default, this is tarantool or /usr/bin/tarantool on most platforms.

      If you installed tarantool from sources to a custom directory, please specify the proper path here.

      _images/ide_5.png

      Now IntelliJ IDEA is ready to use with Tarantool.

  3. Create a new Lua project.

    _images/ide_6.png
  4. Add a new Lua file, for example init.lua.

    _images/ide_7.png
  5. Write your code, save the file.

  6. To run your application, click Run -> Run in the main menu and select your source file in the list.

    _images/ide_8.png

    Or click Run -> Debug to start debugging.

    Note

    To use Lua debugger, please upgrade Tarantool to version 1.7.5-29-gbb6170e4b or later.

    _images/ide_9.png

Cookbook recipes

Here are contributions of Lua programs for some frequent or tricky situations.

You can execute any of these programs by copying the code into a .lua file, then entering chmod +x ./program-name.lua followed by ./program-name.lua in the terminal.

The first line is a “hashbang”:

#!/usr/bin/env tarantool

This runs the Tarantool Lua application server, which should be on the execution path.

This section contains the recipes listed below. Use them freely.

hello_world.lua

The standard example of a simple program.

#!/usr/bin/env tarantool

print('Hello, World!')

console_start.lua

Use box.once() to initialize a database (creating spaces) if this is the first time the server has been run. Then use console.start() to start interactive mode.

#!/usr/bin/env tarantool

-- Configure database
box.cfg {
    listen = 3313
}

box.once("bootstrap", function()
    box.schema.space.create('tweedledum')
    box.space.tweedledum:create_index('primary',
        { type = 'TREE', parts = {1, 'unsigned'}})
end)

require('console').start()

fio_read.lua

Use the fio module to open, read, and close a file.

#!/usr/bin/env tarantool

local fio = require('fio')
local errno = require('errno')
local f = fio.open('/tmp/xxxx.txt', {'O_RDONLY' })
if not f then
    error("Failed to open file: "..errno.strerror())
end
local data = f:read(4096)
f:close()
print(data)

fio_write.lua

Use the fio module to open, write, and close a file.

#!/usr/bin/env tarantool

local fio = require('fio')
local errno = require('errno')
local f = fio.open('/tmp/xxxx.txt', {'O_CREAT', 'O_WRONLY', 'O_APPEND'},
    tonumber('0666', 8))
if not f then
    error("Failed to open file: "..errno.strerror())
end
f:write("Hello\n");
f:close()

ffi_printf.lua

Use the LuaJIT ffi library to call a C built-in function: printf(). (For help understanding ffi, see the FFI tutorial.)

#!/usr/bin/env tarantool

local ffi = require('ffi')
ffi.cdef[[
    int printf(const char *format, ...);
]]

ffi.C.printf("Hello, %s\n", os.getenv("USER"));

ffi_gettimeofday.lua

Use the LuaJIT ffi library to call a C function: gettimeofday(). This delivers time with millisecond precision, unlike the time function in Tarantool’s clock module.

#!/usr/bin/env tarantool

local ffi = require('ffi')
ffi.cdef[[
    typedef long time_t;
    typedef struct timeval {
        time_t tv_sec;
        time_t tv_usec;
    } timeval;
    int gettimeofday(struct timeval *t, void *tzp);
]]

local timeval_buf = ffi.new("timeval")
local now = function()
    ffi.C.gettimeofday(timeval_buf, nil)
    return tonumber(timeval_buf.tv_sec * 1000 + (timeval_buf.tv_usec / 1000))
end
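
The recipe above only defines now(). Appended to the end of the same file, a quick check could be:

print(now()) -- the current Unix time in milliseconds, e.g. 1484000000123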

ffi_zlib.lua

Use the LuaJIT ffi library to call a C library function. (For help understanding ffi, see the FFI tutorial.)

#!/usr/bin/env tarantool

local ffi = require("ffi")
ffi.cdef[[
    unsigned long compressBound(unsigned long sourceLen);
    int compress2(uint8_t *dest, unsigned long *destLen,
    const uint8_t *source, unsigned long sourceLen, int level);
    int uncompress(uint8_t *dest, unsigned long *destLen,
    const uint8_t *source, unsigned long sourceLen);
]]
local zlib = ffi.load(ffi.os == "Windows" and "zlib1" or "z")

-- Lua wrapper for compress2()
local function compress(txt)
    local n = zlib.compressBound(#txt)
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    local res = zlib.compress2(buf, buflen, txt, #txt, 9)
    assert(res == 0)
    return ffi.string(buf, buflen[0])
end

-- Lua wrapper for uncompress
local function uncompress(comp, n)
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    local res = zlib.uncompress(buf, buflen, comp, #comp)
    assert(res == 0)
    return ffi.string(buf, buflen[0])
end

-- Simple test code.
local txt = string.rep("abcd", 1000)
print("Uncompressed size: ", #txt)
local c = compress(txt)
print("Compressed size: ", #c)
local txt2 = uncompress(c, #txt)
assert(txt2 == txt)

ffi_meta.lua

Use the LuaJIT ffi library to access a C object via a metamethod (a method which is defined with a metatable).

#!/usr/bin/env tarantool

local ffi = require("ffi")
ffi.cdef[[
typedef struct { double x, y; } point_t;
]]

local point
local mt = {
  __add = function(a, b) return point(a.x+b.x, a.y+b.y) end,
  __len = function(a) return math.sqrt(a.x*a.x + a.y*a.y) end,
  __index = {
    area = function(a) return a.x*a.x + a.y*a.y end,
  },
}
point = ffi.metatype("point_t", mt)

local a = point(3, 4)
print(a.x, a.y)  --> 3  4
print(#a)        --> 5
print(a:area())  --> 25
local b = a + point(0.5, 8)
print(#b)        --> 12.5

ffi_varbinary_insert.lua

Use the LuaJIT ffi library to insert a tuple which has a VARBINARY field. Lua does not have direct support for VARBINARY, so using C is one way to put in data which in MessagePack is stored as bin (MP_BIN). If the tuple is retrieved later, field “b” will have type = 'cdata'.

#!/usr/bin/env tarantool

-- box.cfg{} should be here

s = box.schema.space.create('withdata')
s:format({{"b", "varbinary"}})
s:create_index('pk', {parts = {1, "varbinary"}})

buffer = require('buffer')
ffi = require('ffi')

-- declare box_insert() once: repeating the same ffi.cdef()
-- inside the function would raise an "attempt to redefine"
-- error on the second call
ffi.cdef[[int box_insert(uint32_t space_id,
                         const char *tuple,
                         const char *tuple_end,
                         box_tuple_t **result);]]

function varbinary_insert(space, bytes)
    local tmpbuf = buffer.IBUF_SHARED
    tmpbuf:reset()
    local p = tmpbuf:alloc(3 + #bytes)
    p[0] = 0x91 -- MsgPack code for "array-1"
    p[1] = 0xC4 -- MsgPack code for "bin-8", so up to 255 bytes
    p[2] = #bytes
    for i, c in ipairs(bytes) do p[i + 3 - 1] = c end
    ffi.C.box_insert(space.id, tmpbuf.rpos, tmpbuf.wpos, nil)
end

varbinary_insert(s, {0xDE, 0xAD, 0xBE, 0xAF})
varbinary_insert(s, {0xFE, 0xED, 0xFA, 0xCE})

-- if successful, Tarantool enters the event loop now

count_array.lua

Use the ‘#’ operator to get the number of items in an array-like Lua table. This operation has O(log(N)) complexity.

#!/usr/bin/env tarantool

array = { 1, 2, 3}
print(#array)

count_array_with_nils.lua

Missing elements in arrays, which Lua treats as “nil”s, cause the simple “#” operator to deliver improper results. The “print(#t)” instruction will print “4”; the “print(counter)” instruction will print “3”; the “print(max)” instruction will print “10”. Other table functions, such as table.sort(), will also misbehave when “nils” are present.

#!/usr/bin/env tarantool

local t = {}
t[1] = 1
t[4] = 4
t[10] = 10
print(#t)
local counter = 0
for k,v in pairs(t) do counter = counter + 1 end
print(counter)
local max = 0
for k,v in pairs(t) do if k > max then max = k end end
print(max)

count_array_with_nulls.lua

Use explicit NULL values to avoid the problems caused by Lua’s nil == missing value behavior. Although json.NULL == nil is true, all the print instructions in this program will print the correct value: 10.

#!/usr/bin/env tarantool

local json = require('json')
local t = {}
t[1] = 1; t[2] = json.NULL; t[3] = json.NULL
t[4] = 4; t[5] = json.NULL; t[6] = json.NULL
t[7] = 7; t[8] = json.NULL; t[9] = json.NULL
t[10] = 10
print(#t)
local counter = 0
for k,v in pairs(t) do counter = counter + 1 end
print(counter)
local max = 0
for k,v in pairs(t) do if k > max then max = k end end
print(max)

count_map.lua

Get the number of elements in a map-like table.

#!/usr/bin/env tarantool

local map = { a = 10, b = 15, c = 20 }
local size = 0
for _ in pairs(map) do size = size + 1; end
print(size)

swap.lua

Use Lua’s multiple assignment to swap the values of two variables without needing a third variable.

#!/usr/bin/env tarantool

local x = 1
local y = 2
x, y = y, x
print(x, y)

class.lua

Create a class, create a metatable for the class, create an instance of the class. Another illustration is at http://lua-users.org/wiki/LuaClassesWithMetatable.

#!/usr/bin/env tarantool

-- define class objects
local myclass_somemethod = function(self)
    print('test 1', self.data)
end

local myclass_someothermethod = function(self)
    print('test 2', self.data)
end

local myclass_tostring = function(self)
    return 'MyClass <'..self.data..'>'
end

local myclass_mt = {
    __tostring = myclass_tostring;
    __index = {
        somemethod = myclass_somemethod;
        someothermethod = myclass_someothermethod;
    }
}

-- create a new object of myclass
local object = setmetatable({ data = 'data'}, myclass_mt)
print(object:somemethod())
print(object.data)

garbage.lua

Activate the Lua garbage collector with the collectgarbage function.

#!/usr/bin/env tarantool

collectgarbage('collect')

fiber_producer_and_consumer.lua

Start one fiber for producer and one fiber for consumer. Use fiber.channel() to exchange data and synchronize. One can tweak the channel size (ch_size in the program code) to control the number of simultaneous tasks waiting for processing.

#!/usr/bin/env tarantool

local fiber = require('fiber')
local function consumer_loop(ch, i)
    -- initialize consumer synchronously or raise an error()
    fiber.sleep(0) -- allow fiber.create() to continue
    while true do
        local data = ch:get()
        if data == nil then
            break
        end
        print('consumed', i, data)
        fiber.sleep(math.random()) -- simulate some work
    end
end

local function producer_loop(ch, i)
    -- initialize the producer synchronously or raise an error()
    fiber.sleep(0) -- allow fiber.create() to continue
    while true do
        local data = math.random()
        ch:put(data)
        print('produced', i, data)
    end
end

local function start()
    local consumer_n = 5
    local producer_n = 3

    -- Create a channel
    local ch_size = math.max(consumer_n, producer_n)
    local ch = fiber.channel(ch_size)

    -- Start consumers
    for i=1, consumer_n,1 do
        fiber.create(consumer_loop, ch, i)
    end

    -- Start producers
    for i=1, producer_n,1 do
        fiber.create(producer_loop, ch, i)
    end
end

start()
print('started')

socket_tcpconnect.lua

Use socket.tcp_connect() to connect to a remote host via TCP. Display the connection details and the result of a GET request.

#!/usr/bin/env tarantool

local s = require('socket').tcp_connect('google.com', 80)
print(s:peer().host)
print(s:peer().family)
print(s:peer().type)
print(s:peer().protocol)
print(s:peer().port)
print(s:write("GET / HTTP/1.0\r\n\r\n"))
print(s:read('\r\n'))
print(s:read('\r\n'))

socket_tcp_echo.lua

Use socket.tcp_server() to set up a simple TCP server: create a function that handles requests and echoes them, and pass the function to socket.tcp_server(). This program has been used to test with 100,000 clients, with each client getting a separate fiber.

#!/usr/bin/env tarantool

local function handler(s, peer)
    s:write("Welcome to test server, " .. peer.host .."\n")
    while true do
        local line = s:read('\n')
        if line == nil then
            break -- error or eof
        end
        if not s:write("pong: "..line) then
            break -- error or eof
        end
    end
end

local server, addr = require('socket').tcp_server('localhost', 3311, handler)

getaddrinfo.lua

Use socket.getaddrinfo() to perform non-blocking DNS resolution, getting both the AF_INET6 and AF_INET information for ‘google.com’. This technique is not always necessary for TCP connections, because socket.tcp_connect() performs socket.getaddrinfo() under the hood before trying to connect to the first available address.

#!/usr/bin/env tarantool

local s = require('socket').getaddrinfo('google.com', 'http', { type = 'SOCK_STREAM' })
print('host=',s[1].host)
print('family=',s[1].family)
print('type=',s[1].type)
print('protocol=',s[1].protocol)
print('port=',s[1].port)
print('host=',s[2].host)
print('family=',s[2].family)
print('type=',s[2].type)
print('protocol=',s[2].protocol)
print('port=',s[2].port)

socket_udp_echo.lua

Tarantool does not currently have a udp_server function, so socket_udp_echo.lua is more complicated than socket_tcp_echo.lua. Nevertheless, a UDP echo server can be implemented with sockets and fibers, as shown below.

#!/usr/bin/env tarantool

local socket = require('socket')
local errno = require('errno')
local fiber = require('fiber')

local function udp_server_loop(s, handler)
    fiber.name("udp_server")
    while true do
        -- try to read a datagram first
        local msg, peer = s:recvfrom()
        if msg == "" then
            -- socket was closed via s:close()
            break
        elseif msg ~= nil then
            -- got a new datagram
            handler(s, peer, msg)
        else
            if s:errno() == errno.EAGAIN or s:errno() == errno.EINTR then
                -- socket is not ready
                s:readable() -- yield, epoll will wake us when new data arrives
            else
                -- socket error
                local msg = s:error()
                s:close() -- save resources and don't wait GC
                error("Socket error: " .. msg)
            end
        end
    end
end

local function udp_server(host, port, handler)
    local s = socket('AF_INET', 'SOCK_DGRAM', 0)
    if not s then
        return nil -- check errno:strerror()
    end
    if not s:bind(host, port) then
        local e = s:errno() -- save errno
        s:close()
        errno(e) -- restore errno
        return nil -- check errno:strerror()
    end

    fiber.create(udp_server_loop, s, handler) -- start a new background fiber
    return s
end

A handler function for this server, together with the code that starts the server, could look something like this …

local function handler(s, peer, msg)
    -- You don't have to wait until socket is ready to send UDP
    -- s:writable()
    s:sendto(peer.host, peer.port, "Pong: " .. msg)
end

local server = udp_server('127.0.0.1', 3548, handler)
if not server then
    error('Failed to bind: ' .. errno.strerror())
end

print('Started')

require('console').start()

http_get.lua

Use the http module to get data via HTTP.

#!/usr/bin/env tarantool

local http_client = require('http.client')
local json = require('json')
local r = http_client.get('http://api.openweathermap.org/data/2.5/weather?q=Oakland,us')
if r.status ~= 200 then
    print('Failed to get weather forecast ', r.reason)
    return
end
local data = json.decode(r.body)
print('Oakland wind speed: ', data.wind.speed)

http_send.lua

Use the http module to send data via HTTP.

#!/usr/bin/env tarantool

local http_client = require('http.client')
local json = require('json')
local data = json.encode({ Key = 'Value'})
local headers = { Token = 'xxxx', ['X-Secret-Value'] = 42 }
local r = http_client.post('http://localhost:8081', data, { headers = headers})
if r.status == 200 then
    print 'Success'
end

http_server.lua

Use the http rock (which must first be installed) to turn Tarantool into a web server.

#!/usr/bin/env tarantool

local function handler(self)
    return self:render{ json = { ['Your-IP-Is'] = self.peer.host } }
end

local server = require('http.server').new(nil, 8080) -- listen *:8080
server:route({ path = '/' }, handler)
server:start()
-- connect to localhost:8080 and see json

http_generate_html.lua

Use the http rock (which must first be installed) to generate HTML pages from templates. The http rock has a fairly simple template engine which allows execution of regular Lua code inside text blocks (like PHP). Therefore there is no need to learn new languages in order to write templates.

#!/usr/bin/env tarantool

local function handler(self)
    local fruits = { 'Apple', 'Orange', 'Grapefruit', 'Banana' }
    return self:render{ fruits = fruits }
end

local server = require('http.server').new(nil, 8080) -- nil means '*'
server:route({ path = '/', file = 'index.html.lua' }, handler)
server:start()

An “HTML” file for this server, including Lua, could look like this (it would produce “1 Apple | 2 Orange | 3 Grapefruit | 4 Banana”).

<html>
<body>
    <table border="1">
        % for i,v in pairs(fruits) do
        <tr>
            <td><%= i %></td>
            <td><%= v %></td>
        </tr>
        % end
    </table>
</body>
</html>

Server administration

Tarantool is designed to have multiple running instances on the same host.

Here we show how to administer Tarantool instances using any of the following utilities:

  • systemd native utilities, or
  • tarantoolctl, a utility shipped and installed as part of Tarantool distribution.

Note

  • Unlike the rest of this manual, here we use system-wide paths.
  • Console examples here are for Fedora.

This chapter includes the following sections:

Instance configuration

For each Tarantool instance, you need two files:

  • [Optional] An application file with instance-specific logic. Put this file into the /usr/share/tarantool/ directory.

    For example, /usr/share/tarantool/my_app.lua (here we implement it as a Lua module that bootstraps the database and exports start() function for API calls):

    local function start()
        box.schema.space.create("somedata")
        box.space.somedata:create_index("primary")
        <...>
    end
    
    return {
      start = start;
    }
    
  • An instance file with instance-specific initialization logic and parameters. Put this file, or a symlink to it, into the instance directory (see instance_dir parameter in tarantoolctl configuration file).

    For example, /etc/tarantool/instances.enabled/my_app.lua (here we load my_app.lua module and make a call to start() function from that module):

    #!/usr/bin/env tarantool
    
    box.cfg {
        listen = 3301;
    }
    
    -- load my_app module and call start() function
    -- with some app options controlled by sysadmins
    local m = require('my_app').start({...})
    

Instance file

After this short introduction, you may wonder what an instance file is, what it is for, and how tarantoolctl uses it. After all, Tarantool is an application server, so why not start the application stored in /usr/share/tarantool directly?

A typical Tarantool application is not a script, but a daemon running in background mode and processing requests, usually sent to it over a TCP/IP socket. This daemon needs to be started automatically when the operating system starts, and managed with the operating system standard tools for service management – such as systemd or init.d. To serve this very purpose, we created instance files.

You can have more than one instance file. For example, a single application in /usr/share/tarantool can run in multiple instances, each of them having its own instance file. Or you can have multiple applications in /usr/share/tarantool – again, each of them having its own instance file.

An instance file is typically created by a system administrator. An application file is often provided by a developer, in a Lua rock or an rpm/deb package.

An instance file is designed to not differ in any way from a Lua application. It must, however, configure the database, i.e. contain a call to box.cfg{} somewhere in it, because it’s the only way to turn a Tarantool script into a background process, and tarantoolctl is a tool to manage background processes. Other than that, an instance file may contain arbitrary Lua code, and, in theory, even include the entire application business logic in it. We, however, do not recommend this, since it clutters the instance file and leads to unnecessary copy-paste when you need to run multiple instances of an application.

tarantoolctl configuration file

While instance files contain instance configuration, the tarantoolctl configuration file contains the configuration that tarantoolctl uses to override instance configuration. In other words, it contains system-wide configuration defaults. If tarantoolctl fails to find this file with the method described in section Starting/stopping an instance, it uses default settings.

Most of the parameters are similar to those used by box.cfg{}. Here are the default settings (possibly installed in /etc/default/tarantool or /etc/sysconfig/tarantool as part of Tarantool distribution – see OS-specific default paths in Notes for operating systems):

default_cfg = {
    pid_file  = "/var/run/tarantool",
    wal_dir   = "/var/lib/tarantool",
    memtx_dir = "/var/lib/tarantool",
    vinyl_dir = "/var/lib/tarantool",
    log       = "/var/log/tarantool",
    username  = "tarantool",
    language  = "Lua",
}
instance_dir = "/etc/tarantool/instances.enabled"

where:

  • pid_file
    Directory for the pid file and control-socket file; tarantoolctl will add “/instance_name” to the directory name.
  • wal_dir
    Directory for write-ahead .xlog files; tarantoolctl will add “/instance_name” to the directory name.
  • memtx_dir
    Directory for snapshot .snap files; tarantoolctl will add “/instance_name” to the directory name.
  • vinyl_dir
    Directory for vinyl files; tarantoolctl will add “/instance_name” to the directory name.
  • log
    The place where the application log will go; tarantoolctl will add “/instance_name.log” to the name.
  • username
    The user that runs the Tarantool instance. This is the operating-system user name rather than the Tarantool-client user name. Tarantool will change its effective user to this user after becoming a daemon.
  • language
    The interactive console language. Can be either Lua or SQL.
  • instance_dir
    The directory where all instance files for this host are stored. Put instance files in this directory, or create symbolic links.

    The default instance directory depends on Tarantool’s WITH_SYSVINIT build option: when ON, it is /etc/tarantool/instances.enabled, otherwise (OFF or not set) it is /etc/tarantool/instances.available. The latter case is typical for Tarantool builds for Linux distros with systemd.

    To check the build options, say tarantool --version.

As a full-featured example, you can take example.lua script that ships with Tarantool and defines all configuration options.

Starting/stopping an instance

While a Lua application is executed by Tarantool, an instance file is executed by tarantoolctl which is a Tarantool script.

Here is what tarantoolctl does when you issue the command:

$ tarantoolctl start <instance_name>

  1. Read and parse the command line arguments. The last argument, in our case, contains an instance name.

  2. Read and parse its own configuration file. This file contains tarantoolctl defaults, like the path to the directory where instances should be searched for.

    When tarantoolctl is invoked by root, it looks for a configuration file in /etc/default/tarantool. When tarantoolctl is invoked by a local (non-root) user, it looks for a configuration file first in the current directory ($PWD/.tarantoolctl), and then in the current user’s home directory ($HOME/.config/tarantool/tarantool). If no configuration file is found there, or in the /usr/local/etc/default/tarantool file, then tarantoolctl falls back to built-in defaults.

  3. Look up the instance file in the instance directory, for example /etc/tarantool/instances.enabled. To build the instance file path, tarantoolctl takes the instance name, prepends the instance directory and appends the “.lua” extension to the instance name.

  4. Override box.cfg{} function to pre-process its parameters and ensure that instance paths are pointing to the paths defined in the tarantoolctl configuration file. For example, if the configuration file specifies that instance work directory must be in /var/tarantool, then the new implementation of box.cfg{} ensures that work_dir parameter in box.cfg{} is set to /var/tarantool/<instance_name>, regardless of what the path is set to in the instance file itself.

  5. Create a so-called “instance control file”. This is a Unix socket with Lua console attached to it. This file is used later by tarantoolctl to query the instance state, send commands to the instance and so on.

  6. Set the TARANTOOLCTL environment variable to ‘true’. This allows the user to know that the instance was started by tarantoolctl.

  7. Finally, use Lua dofile command to execute the instance file.

If you start an instance using systemd tools, like this (the instance name is my_app):

$ systemctl start tarantool@my_app
$ ps axuf|grep my_ap[p]
taranto+  5350  1.3  0.3 1448872 7736 ?        Ssl  20:05   0:28 tarantool my_app.lua <running>

… this actually calls tarantoolctl, just as tarantoolctl start my_app does.

To check the instance file for syntax errors prior to starting my_app instance, say:

$ tarantoolctl check my_app

To enable my_app instance for auto-load during system startup, say:

$ systemctl enable tarantool@my_app

To stop a running my_app instance, say:

$ tarantoolctl stop my_app
$ # - OR -
$ systemctl stop tarantool@my_app

To restart (i.e. stop and start) a running my_app instance, say:

$ tarantoolctl restart my_app
$ # - OR -
$ systemctl restart tarantool@my_app

Running Tarantool locally

Sometimes you may need to run a Tarantool instance locally, e.g. for test purposes. Let’s configure a local instance, then start and monitor it with tarantoolctl.

First, we create a sandbox directory on the user’s path:

$ mkdir ~/tarantool_test

… and set default tarantoolctl configuration in $HOME/.config/tarantool/tarantool. Let the file contents be:

default_cfg = {
    pid_file  = "/home/user/tarantool_test/my_app.pid",
    wal_dir   = "/home/user/tarantool_test",
    memtx_dir = "/home/user/tarantool_test",
    vinyl_dir = "/home/user/tarantool_test",
    log       = "/home/user/tarantool_test/log",
}
instance_dir = "/home/user/tarantool_test"

Note

  • Specify a full path to the user’s home directory instead of “~/”.
  • Omit username parameter.