Import Formats

TubeDB contains readers for several timeseries data source file formats:

type      description                                text/binary
csv       CSV - generic comma-separated values       text
asc       ASC - specific logger format               text
toa5      TOA5 - Campbell Scientific data loggers    text
udbf_be   UDBF - Gantner Universal-Data-Bin-File     binary
tsa       TSA - TubeDB timeseries archive            binary

CSV - Comma Separated Values

Generic CSV text format for transferring timeseries data.

This format can serve as an intermediate representation of station timeseries data: external tools (e.g. an R script) first transform the logger-specific format (e.g. a specific CSV format) into this generic CSV, which is then imported into TubeDB. Files created by the TubeDB API method query_csv are in generic CSV format.

typical file pattern: *.csv

reader class: tsdb.run.ImportGenericCSV

format specification

filename:

The station name of the timeseries is extracted from the filename: it is the text up to the first underscore, or the whole name without extension if there is no underscore.

format: STATIONNAME_SOME_TEXT.csv

examples of station name extraction from filename:

  • HEG01.csv -> HEG01
  • MyPlot_2010.csv -> MyPlot
  • 123_old.csv -> 123
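The extraction rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the code used by tsdb.run.ImportGenericCSV; the function name station_name_from_filename is hypothetical.

```python
from pathlib import Path


def station_name_from_filename(filename: str) -> str:
    """Return the text before the first underscore of the file name.

    Hypothetical helper: it mirrors the rule described in this section,
    not the actual TubeDB implementation.
    """
    stem = Path(filename).stem  # drop the directory part and the ".csv" extension
    return stem.split("_", 1)[0]  # whole stem if there is no underscore


if __name__ == "__main__":
    for name in ("HEG01.csv", "MyPlot_2010.csv", "123_old.csv"):
        print(name, "->", station_name_from_filename(name))
```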

The first line of the file content is the header. It starts with the datetime column, followed by one column per sensor name.

format: datetime,SENSOR1,SENSOR2,SENSOR3,...

examples of header:

  • datetime,Ta_200,rH_200,p_QNH,WV,WD,P_RT_NRT_01,Trad,SWDR_300,SWUR_300,LWDR_300,LWUR_300,Rn_300
  • datetime,Ta_200,rH_200

data-rows:

format: DATETIME,VALUE1,VALUE2,VALUE3,...

Datetime is in the format yyyy-mm-ddThh:MM (ISO 8601: year-month-day, then T, then hour:minute), e.g. 2014-10-12T09:50

complete example:

filename: aet1_2014__2015_11_05.csv -> plot: aet1

datetime,Ta_200,rH_200
2014-01-01T00:10,-9,86.1
2014-01-01T00:20,-9.1,86
2014-01-01T00:30,-9.1,86
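As a rough illustration of the layout above, the following Python sketch parses generic CSV content into sensor names and rows. It is not the TubeDB reader; the function name read_generic_csv is hypothetical, and treating an empty cell as a missing value (NaN) is an assumption, since this section does not say how gaps are written.

```python
import csv
import io
from datetime import datetime

EXAMPLE = """datetime,Ta_200,rH_200
2014-01-01T00:10,-9,86.1
2014-01-01T00:20,-9.1,86
2014-01-01T00:30,-9.1,86
"""


def read_generic_csv(text: str):
    """Parse generic CSV text into (sensor_names, rows).

    Each row is (datetime, [float, ...]) with one value per sensor.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header[0] != "datetime":
        raise ValueError("first header column must be 'datetime'")
    sensor_names = header[1:]
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        timestamp = datetime.strptime(record[0], "%Y-%m-%dT%H:%M")
        # Assumption: an empty cell means "no value" and is mapped to NaN.
        values = [float(cell) if cell else float("nan") for cell in record[1:]]
        rows.append((timestamp, values))
    return sensor_names, rows


if __name__ == "__main__":
    sensors, rows = read_generic_csv(EXAMPLE)
    print(sensors)
    for timestamp, values in rows:
        print(timestamp.isoformat(timespec="minutes"), values)
```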

ASC

typical file pattern: *.asc

reader class: tsdb.loader.ki.AscParser


TOA5 - Campbell Scientific data loggers

typical file pattern: *.dat

File format of Campbell Scientific data loggers (see format specification, Appendix B).


UDBF - Universal-Data-Bin-File

Binary file-format of Gantner loggers. Files of version 1.07 can be read.

typical file pattern: *.dat

reader class: tsdb.loader.be.UniversalDataBinFile


TSA - Time-Series-Archiv

TubeDB binary timeseries archive format.

This format can be used to write all or parts of the TubeDB content to a file for archiving and to read it later into another TubeDB instance.

This format can be used as an alternative to CSV. Its advantages over CSV are a more compact representation (smaller file size, one file instead of one file per plot), much faster writing and reading, and better integration of metadata (plot names, sensor names).

typical file pattern: *.tsa

reader class: tsdb.TimeSeriesArchivReader

writer class: tsdb.TimeSeriesArchivWriter

format specification

data type definitions:

  • (byte) one byte
  • (int) 32 bit integer (four bytes big-endian)
  • (packed_int) packed integer number (one to five bytes): a sequence of bytes. If the highest bit of the current byte is set, the next byte is part of this sequence. The lower 7 bits of each byte code the value. With bn denoting the 7 value bits of byte n: integer value == b1 | (b2<<7) | (b3<<14) | ...
  • (text) sequence of characters coded as the count of characters followed by the characters as bytes: (packed_int) (byte) (byte) (byte) ...
  • a marker is coded as (text)
  • (float) IEEE 754 single-precision binary floating-point format 32 bit (four bytes big-endian)
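The data types above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of tsdb.TimeSeriesArchivReader or tsdb.TimeSeriesArchivWriter: all function names are hypothetical, packed_int values are assumed to be non-negative, (int) is assumed to be signed, and characters are assumed to be single bytes (decoded here as Latin-1).

```python
import struct


def encode_packed_int(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, lowest bits first."""
    if value < 0:
        raise ValueError("this sketch assumes non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # highest bit set: another byte follows
        else:
            out.append(low7)  # highest bit clear: last byte of the sequence
            return bytes(out)


def decode_packed_int(data: bytes, pos: int = 0):
    """Return (value, position after the packed_int)."""
    value = 0
    for index in range(5):  # the definition above allows one to five bytes
        byte = data[pos + index]
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return value, pos + index + 1
    raise ValueError("packed_int longer than five bytes")


def encode_text(text: str) -> bytes:
    """Encode (text): character count as packed_int, then one byte per character."""
    raw = text.encode("latin-1")  # assumption: single-byte characters
    return encode_packed_int(len(raw)) + raw


def decode_text(data: bytes, pos: int = 0):
    """Return (text, position after the text)."""
    count, pos = decode_packed_int(data, pos)
    return data[pos:pos + count].decode("latin-1"), pos + count


def encode_int(value: int) -> bytes:
    return struct.pack(">i", value)  # four bytes big-endian, assumed signed


def encode_float(value: float) -> bytes:
    return struct.pack(">f", value)  # IEEE 754 single precision, big-endian


if __name__ == "__main__":
    print(encode_packed_int(300).hex())
    print(decode_packed_int(encode_packed_int(300)))
    print(encode_text("Entry"))
```

Markers such as TOC_ENTRY are written with the same encode_text call, since a marker is coded as (text).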

marker definitions:

  • TOC_HEAD = “Time_Series_Archiv_v_1_0_0”
  • TOC_START = “TimeSeriesArchiv:start”
  • TOC_END = “TimeSeriesArchiv:end”
  • TOC_ENTRY = “Entry”
  • TOC_TYPE_TIMESTAMPSERIES = “TimestampSeries”
  • TOC_TYPE_DATAENTRYARRAY = “DataEntryArray”

File content starts (at position 0) with TOC_HEAD, then TOC_START, contains one or more entries and ends with TOC_END.

An entry starts with TOC_ENTRY, followed by the entry type: TOC_TYPE_TIMESTAMPSERIES or TOC_TYPE_DATAENTRYARRAY.

entry TOC_TYPE_DATAENTRYARRAY:

This entry stores one time series of one station and one sensor.

marker definitions:

  • TOC_START = “DataEntryArray:start”
  • TOC_END = “DataEntryArray:end”

content:

(text)[station name]

(text)[sensor name]

TOC_START

(packed_int)[count of entries]

(int)[timestamp]

(float)[value]

(int)[timestamp]

(float)[value]

(int)[timestamp]

(float)[value]

TOC_END
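The content layout of a DataEntryArray entry can be sketched as a small Python writer. This follows the field order listed above and is not the code of tsdb.TimeSeriesArchivWriter; the function names are hypothetical, and the sketch assumes single-byte characters, a signed (int) and non-negative packed_int values. The timestamps in the usage part are arbitrary sample integers, because the unit of the timestamp is not stated in this section.

```python
import struct


def encode_packed_int(value: int) -> bytes:
    """packed_int: 7 value bits per byte, highest bit set while more bytes follow."""
    if value < 0:
        raise ValueError("this sketch assumes non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_text(text: str) -> bytes:
    """(text): character count as packed_int, then the characters as bytes."""
    raw = text.encode("latin-1")  # assumption: single-byte characters
    return encode_packed_int(len(raw)) + raw


def encode_data_entry_array(station: str, sensor: str, entries) -> bytes:
    """Build the content of one DataEntryArray entry.

    entries is a sequence of (timestamp, value) pairs.
    """
    out = bytearray()
    out += encode_text(station)
    out += encode_text(sensor)
    out += encode_text("DataEntryArray:start")  # entry-local TOC_START marker
    out += encode_packed_int(len(entries))
    for timestamp, value in entries:
        # (int) timestamp and (float) value, both four bytes big-endian
        out += struct.pack(">if", timestamp, value)
    out += encode_text("DataEntryArray:end")  # entry-local TOC_END marker
    return bytes(out)


if __name__ == "__main__":
    content = encode_data_entry_array("aet1", "Ta_200", [(100, -9.0), (110, -9.5)])
    print(len(content), content.hex())
```

In a complete file this content would follow the TOC_ENTRY marker and the TOC_TYPE_DATAENTRYARRAY type marker.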

entry TOC_TYPE_TIMESTAMPSERIES:

This entry stores one time series of one station and several sensors.

marker definitions:

  • TOC_START = “TimestampSeries:start”
  • TOC_END = “TimestampSeries:end”

content:

TOC_START

(text)[station name]

(packed_int)[count of sensors]

(text)[sensor name 1]

(text)[sensor name 2]

(packed_int)[count of entries]

(int)[timestamp]

(float)[value 1]

(float)[value 2]

(int)[timestamp]

(float)[value 1]

(float)[value 2]

TOC_END
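Reading a TimestampSeries entry can be sketched in Python as shown below. The sketch follows the field order listed above and is not the code of tsdb.TimeSeriesArchivReader; the function names are hypothetical, characters are assumed to be single bytes, and (int) is assumed to be signed. The stream is expected to be positioned at the start of the entry content, that is, after the TOC_ENTRY marker and the entry type.

```python
import io
import struct


def read_packed_int(stream) -> int:
    """Read a packed_int: 7 value bits per byte, lowest bits first."""
    value = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of data inside packed_int")
        value |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:  # highest bit clear: last byte
            return value
        shift += 7


def read_text(stream) -> str:
    """Read (text): character count, then the characters as bytes."""
    count = read_packed_int(stream)
    return stream.read(count).decode("latin-1")  # assumption: single-byte characters


def expect_marker(stream, marker: str) -> None:
    found = read_text(stream)
    if found != marker:
        raise ValueError(f"expected marker {marker!r}, found {found!r}")


def read_timestamp_series(stream):
    """Return (station, sensor_names, rows); each row is (timestamp, [values])."""
    expect_marker(stream, "TimestampSeries:start")
    station = read_text(stream)
    sensor_names = [read_text(stream) for _ in range(read_packed_int(stream))]
    row_count = read_packed_int(stream)
    # One row: (int) timestamp, then one (float) per sensor, all big-endian.
    row_format = ">i" + "f" * len(sensor_names)
    row_size = struct.calcsize(row_format)
    rows = []
    for _ in range(row_count):
        timestamp, *values = struct.unpack(row_format, stream.read(row_size))
        rows.append((timestamp, values))
    expect_marker(stream, "TimestampSeries:end")
    return station, sensor_names, rows


if __name__ == "__main__":
    def text(s):  # helper for building sample bytes in this demo only
        return bytes([len(s)]) + s.encode("latin-1")

    sample = (text("TimestampSeries:start") + text("aet1") + b"\x02"
              + text("Ta_200") + text("rH_200") + b"\x01"
              + struct.pack(">iff", 100, -9.0, 86.0) + text("TimestampSeries:end"))
    print(read_timestamp_series(io.BytesIO(sample)))
```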
