Blog

ZDB as historical trade database – Part 2

Posted by Mattias on 2012-10-28

In a series of posts we will examine how you can use ZDB to build a historical trade database, and how to perform queries and aggregations in it. In the previous part we discussed how to create the database, and in this part we will discuss how to write data to it.

There are three different ways to write to a database, each with its own advantages and usage patterns: a getTable-update-putTable sequence, a plain putTable, and a TableAppender as returned by directAppend.

Using getTable-update-putTable is the most generic way to update a table, since it allows you to update and delete data as well as append it. In this series we only append data, never update or delete it, so either of the other two methods will be more efficient, since they avoid unnecessary read operations.

The choice between putTable and directAppend is fairly straightforward: if all the data is available up front, use putTable; if it becomes available a bit at a time, use directAppend. For market data, where updates arrive one row at a time, the most natural approach is to use directAppend and write each update as it arrives. Unfortunately, the currently available implementations of ColumnLoader perform an open-write-close cycle for each column on every append, so it is always a good idea to collect updates in batches, paying that cost once per batch rather than once per trade.
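A minimal sketch of such batching, written in plain Java so that it does not depend on any particular ZDB call. BatchSink and BatchingWriter are hypothetical names of our own; in a real feed handler the sink would wrap the TableAppender returned by directAppend, whose methods we do not cover here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for whatever writes one batch, e.g. a wrapper
// around the TableAppender returned by directAppend (not ZDB's API).
interface BatchSink<T> {
    void write(List<T> batch);
}

// Buffers incoming rows and hands them to the sink in fixed-size batches.
class BatchingWriter<T> {
    private final BatchSink<T> sink;
    private final int batchSize;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(BatchSink<T> sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    // Adds one row and writes the buffer as soon as it holds batchSize rows.
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Writes whatever is buffered; call it periodically and at shutdown so
    // that a partial batch is not held back indefinitely.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.write(new ArrayList<>(buffer)); // give the sink its own copy
        buffer.clear();
    }
}
```

The trade-off is latency: rows sit in the buffer until the batch fills or flush() is called, which is why a periodic flush (for example from a timer) is worth having alongside the size-based one.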

Now that we know how to write a table, it is time to actually create the table we will be writing. Again, there are several possible ways to create a table, or rather to create the columns holding the actual data, which are then combined to form a table. The simplest way is to use the implementations found in the package com.zolltov.zdb.memory.

There are several ways to create a memory column containing data. Data can be copied from a java.util.Collection, an array or another column of the same type, or one can create a column of the appropriate size and set each value in it. All methods have roughly the same complexity, so choose the one that best suits the data you have available.

Putting all of this together gives code similar to this:

// Assumes db is a FileDataBase field opened as in Part 1, and that
// Schema holds the column names used when the table was created.
private static void writeData(List<TradeBean> beans) {
    // One in-memory column per field, sized to the number of trades
    DateColumn date = new MemoryDateColumn(beans.size());
    StringColumn symbol = new MemoryStringColumn(beans.size());
    StringColumn market = new MemoryStringColumn(beans.size());
    StringColumn currency = new MemoryStringColumn(beans.size());
    StringColumn isin = new MemoryStringColumn(beans.size());
    StringColumn name = new MemoryStringColumn(beans.size());
    DoubleColumn price = new MemoryDoubleColumn(beans.size());
    DoubleColumn volume = new MemoryDoubleColumn(beans.size());
    BoolColumn onexchange = new MemoryBoolColumn(beans.size());
    TimeColumn time = new MemoryTimeColumn(beans.size());

    // Copy each trade (row) into the columns
    for (int i = 0; i < beans.size(); i++) {
        TradeBean tb = beans.get(i);
        date.setInt(i, tb.getDate().toInteger());
        symbol.setString(i, tb.getSymbol());
        market.setString(i, tb.getMarket());
        currency.setString(i, tb.getCurrency());
        isin.setString(i, tb.getIsin());
        name.setString(i, tb.getName());
        price.setDouble(i, tb.getPrice());
        volume.setDouble(i, tb.getVolume());
        onexchange.setBoolean(i, tb.isOnExchange());
        time.setInt(i, tb.getTime().toInteger());
    }

    // Combine the columns into a table and write it to the database
    MemoryTable table = new MemoryTable();
    table.addColumn(Schema.date, date);
    table.addColumn(Schema.symbol, symbol);
    table.addColumn(Schema.market, market);
    table.addColumn(Schema.currency, currency);
    table.addColumn(Schema.isin, isin);
    table.addColumn(Schema.name, name);
    table.addColumn(Schema.price, price);
    table.addColumn(Schema.volume, volume);
    table.addColumn(Schema.onexchange, onexchange);
    table.addColumn(Schema.time, time);
    db.putTable("trade", table, new Date(), new Date());
}

ZDB as historical trade database – Part 1

Posted by Mattias on 2012-09-02

In a series of posts we will examine how you can use ZDB to build a historical trade database, and how to perform queries and aggregations in it; in this part we are going to discuss how to create the database. In the examples we will use equity market data as reported by trading venues, such as stock exchanges, but the approach can fairly easily be adapted to other types of instruments.

First off is the instrument data, for which we will be using the following fields:

Symbol

This is the unique symbol, or ticker, that your market data vendor uses to identify a listing. The content and format of this varies from vendor to vendor, so we will just save it as a string.

ISIN

This is the ISIN, or International Securities Identification Number, which identifies the traded security. It will also be saved as a string, since it is alphanumeric and starts with a two-letter country code.

MIC

The Market Identifier Code identifies the trading venue. It is also saved as a string, in the market column.

Currency

The currency the instrument is traded in.

Name

This is a string that contains the name of the company.
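Keeping the ISIN as a string also makes it easy to sanity-check values from the feed before they are written. An ISIN is 12 characters: a two-letter country code, nine alphanumeric characters and a final check digit, computed with the Luhn algorithm after letters are expanded to numbers (A=10 up to Z=35). The IsinCheck class below is a hypothetical helper of our own, not part of ZDB.

```java
// Hypothetical helper (not part of ZDB): checks the ISIN format and its
// Luhn check digit.
final class IsinCheck {
    private IsinCheck() {}

    // True if s is 2 letters, 9 alphanumerics and 1 digit with a valid check digit.
    static boolean isValid(String s) {
        if (s == null || !s.matches("[A-Z]{2}[A-Z0-9]{9}[0-9]")) {
            return false;
        }
        // Expand letters to two-digit numbers (A=10 ... Z=35); digits stay as they are.
        StringBuilder digits = new StringBuilder();
        for (char c : s.toCharArray()) {
            digits.append(Character.digit(c, 36));
        }
        // Luhn: starting from the rightmost digit, double every second digit
        // and add up the digit sums; a valid ISIN gives a multiple of 10.
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of d
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

For example, IsinCheck.isValid("US0378331005") returns true, while changing the last digit to 6 makes it return false.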

At the root of the hierarchy is the company name, for example 'AstraZeneca PLC'. Each company has one or more ISIN codes. Each ISIN is in turn traded on one or more markets, and on each market it can be traded in one or more currencies.
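To make the hierarchy concrete, here is a sketch that builds it from flat listing rows using nested sorted maps: company name, then ISIN, then MIC, then the set of currencies. Listing and InstrumentHierarchy are hypothetical helpers for illustration only; the database itself stores these fields as flat string columns.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical flat row for one tradeable line: company, ISIN, market, currency.
final class Listing {
    final String name;
    final String isin;
    final String mic;
    final String currency;

    Listing(String name, String isin, String mic, String currency) {
        this.name = name;
        this.isin = isin;
        this.mic = mic;
        this.currency = currency;
    }
}

// company name -> ISIN -> MIC -> currencies, kept in sorted maps and sets.
final class InstrumentHierarchy {
    private final Map<String, Map<String, Map<String, Set<String>>>> tree = new TreeMap<>();

    // Adds a listing, creating the intermediate levels as needed.
    void add(Listing l) {
        tree.computeIfAbsent(l.name, k -> new TreeMap<>())
            .computeIfAbsent(l.isin, k -> new TreeMap<>())
            .computeIfAbsent(l.mic, k -> new TreeSet<>())
            .add(l.currency);
    }

    // ISINs belonging to a company (empty if the company is unknown).
    Set<String> isins(String company) {
        return tree.getOrDefault(company, Collections.emptyMap()).keySet();
    }

    // Currencies an ISIN trades in on a given market (empty if unknown).
    Set<String> currencies(String company, String isin, String mic) {
        return tree.getOrDefault(company, Collections.emptyMap())
                   .getOrDefault(isin, Collections.emptyMap())
                   .getOrDefault(mic, Collections.emptySet());
    }
}
```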

For the actual trades, we will be using the following fields:

Date

The date of the trade. To keep things simple, we assume this is always today's date on the input feed.

Time

The time of the trade. Again, to keep things simple, we assume that there are no late trades, and we ignore time zones and daylight saving time.

Quantity

The number of shares traded. This will be stored as a double, in the volume column.

Price

The price per share the trade took place at. This will also be stored as a double.

OnExchange

This boolean will be set to true if this trade was crossed on the exchange, or false if it was done off exchange for various reasons.

Also, we are going to ignore trade corrections and cancellations to keep things simple and tidy.
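The fields above map naturally onto a small value class, one instance per trade. The class below is a hypothetical sketch: the name TradeBean and its getters match the calls made in Part 2's writeData, but the java.time types for date and time are our own assumption, so the toInteger() conversions used there would have to be replaced by whatever encoding ZDB's date and time columns expect.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.Objects;

// Hypothetical value class for one reported trade (field types are assumptions).
final class TradeBean {
    private final LocalDate date;     // trade date, assumed to be today's date on the feed
    private final String symbol;      // vendor-specific ticker
    private final String market;      // MIC of the trading venue
    private final String currency;    // currency the trade was done in
    private final String isin;        // identifies the traded security
    private final String name;        // company name
    private final double price;       // price per share
    private final double volume;      // quantity: number of shares traded
    private final boolean onExchange; // true if crossed on the exchange
    private final LocalTime time;     // trade time, no time zone or DST handling

    TradeBean(LocalDate date, String symbol, String market, String currency, String isin,
              String name, double price, double volume, boolean onExchange, LocalTime time) {
        this.date = Objects.requireNonNull(date, "date");
        this.symbol = Objects.requireNonNull(symbol, "symbol");
        this.market = Objects.requireNonNull(market, "market");
        this.currency = Objects.requireNonNull(currency, "currency");
        this.isin = Objects.requireNonNull(isin, "isin");
        this.name = Objects.requireNonNull(name, "name");
        this.price = price;
        this.volume = volume;
        this.onExchange = onExchange;
        this.time = Objects.requireNonNull(time, "time");
    }

    LocalDate getDate() { return date; }
    String getSymbol() { return symbol; }
    String getMarket() { return market; }
    String getCurrency() { return currency; }
    String getIsin() { return isin; }
    String getName() { return name; }
    double getPrice() { return price; }
    double getVolume() { return volume; }
    boolean isOnExchange() { return onExchange; }
    LocalTime getTime() { return time; }
}
```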

Now that we have defined all the data that will be stored in the database, it is time for the implementation. At the center of any file database is the FileDataBase class, which manages all tables and also reads and writes data to file when needed. It is aided by an implementation of the ColumnLoader interface, which performs the actual reading and writing of column files. In this case we are going to use the existing class SimpleColumnLoader, which is suitable for most needs.

The first step is to create the database directory structure. The quickest and easiest way is to instantiate a SimpleColumnLoader with the (preferably empty) root directory for the database, and then call its createDataBase() method. This will create all the files and directories needed for an empty database, in which we can later store all our data.

Then we use the SimpleColumnLoader to create a FileDataBase object, which will be used to create the table. It is important to call createDataBase() before passing the loader to FileDataBase, since the latter fetches, among other things, the list of tables and available dates when it is constructed.

The table is created by a call to createPartitionedTable, which takes a name for the table and a list of the columns it will contain. The class ColumnInfo is used when creating a table to provide a name and type for each column. The types we will be using in this series are boolean, date, double, string and time, but several other types are available as well.

Putting this together gives the following code snippet for creating the database in the directory given by the argument root:

// root is a java.io.File pointing at the (preferably empty) database directory
public void createDataBase(File root) {
    // Create the files and directories for an empty database under root
    SimpleColumnLoader loader = new SimpleColumnLoader(root);
    loader.createDataBase();

    // Must come after createDataBase(): FileDataBase reads the tables and dates on construction
    FileDataBase db = new FileDataBase(loader);

    // The trade table, with date as the partition column
    db.createPartitionedTable("trade",
        new ColumnInfo(Type.DatePartition, "date"),
        new ColumnInfo(Type.String, "symbol"),
        new ColumnInfo(Type.String, "market"),
        new ColumnInfo(Type.String, "currency"),
        new ColumnInfo(Type.String, "isin"),
        new ColumnInfo(Type.String, "name"),
        new ColumnInfo(Type.Double, "price"),
        new ColumnInfo(Type.Double, "volume"),
        new ColumnInfo(Type.Bool, "onExchange"),
        new ColumnInfo(Type.Time, "time"));
}

ZDB 1.0 is completely implemented

Posted by Mattias on 2012-08-31

As of today's release, the ZDB 1.0 API is completely implemented. Even though previous versions have been well tested and very usable, you can now download and explore the entire API knowing that you will not get an exception unless there is an actual error, of course.

As a consequence, the release schedule will change from once every 5-10 days to whenever a bug fix or performance enhancement is available. While this could in theory mean a new release every day, we do not expect that to happen very often. As there are no known bugs, it is impossible to predict when, or how many, further releases there will be.

And of course there is already a 1.1 version of the API under development, with some new features.

Another ZDB 1.0 release is out

Posted by Mattias on 2012-08-11

ZDB 1.0 20120811 is available for download containing the following updates:

- Implemented methods
    MemoryTimeColumn(Collection

The methods that are still unimplemented are most aggregations, some predicates and a few scattered methods in the multiple and subselect columns.

Latest ZDB 1.0 release is out

Posted by Mattias on 2012-08-04

Another release of ZDB 1.0 is out today. The changes since the last release are:

- Implemented methods
    Object Addition.aggregate(Table);
    Aggregation Aggregation.average(Aggregation);
    Table AggregationEngineBy.apply(Table);
    DoubleColumn MultipleDoubleColumn.pack();
    DateIterator SubDateColumn.dateIterator();
    DateTimeIterator SubDateTimeColumn.dateTimeIterator();
    LongIterator SubDateTimeColumn.longIterator();
    LongIterator SubLongColumn.longIterator();
    SubTimeColumn(TimeColumn, BitSet);
    IntIterator SubTimeColumn.intIterator();
    int SubTimeColumn.length();
    TimeIterator SubTimeColumn.timeIterator();