Skip to content

Wide-column design

Like Bigtable, Smoltable is a wide-column [1] database. It could be described as a sparse, persistent multi-dimensional sorted map. Every table is sorted by the row key, which is a unique identifier for each row. There are no secondary indexes. This may seem limiting but surprisingly many access patterns can be modelled this way.

The row key is not necessarily the same as the primary key in a relational database: A row in general is not a fixed structure like in a relational database. Each row may contain arbitrarily many columns, grouped into column families. Column families must be defined upfront, but column qualifiers (column names) can be defined dynamically, and do not need to adhere to a fixed schema. Because the table is sparse, unused columns do not cost any disk space. Retrieving specific columns does not require retrieving the entire row.

Wide column table

Each row’s cells are sorted by the column key (family + qualifier), and a timestamp: this results in a multi-dimensional key:

row key + col family + col qualifier + ts

which maps to some value, the cell value. The cell value, unlike in Bigtable, can be a certain type:

  • string (UTF-8 encoded string)
  • boolean (like Byte, but is unmarshalled as boolean)
  • byte (unsigned integer, 1 byte)
  • i32 (signed integer, 4 bytes)
  • i64 (signed integer, 8 bytes)
  • f32 (floating point, 4 bytes)
  • f64 (floating point, 8 bytes)

The timestamp allows storing multiple versions of the same cell.

Columns as data

Consider this scenario: We want to store a fleet of aircrafts, each having a list of flights and some metadata. One possible naive implementation would be:

row keyvalue:
plane#TF-FIR#flight#FI3182024-01-25
plane#TF-FIR#flight#FI3192024-01-25
plane#TF-FIR#meta#miles51000000
plane#TF-FIR#meta#modelBoeing 757-256
plane#TF-FIR#meta#operatorIcelandair
plane#D-AIQN#flight#EW70332019-10-31
plane#D-AIQN#flight#EW70362019-10-31
plane#D-AIQN#meta#miles52142142
plane#D-AIQN#meta#modelAirbus A320-211
plane#D-AIQN#meta#operatorGermanwings

For every attribute, a composite row key is used, and maps to a very generic value: column family, using the empty column as column qualifier. To get flights, a prefix scan over plane#TF-FIR#flight# could be used. To get metadata, plane#TF-FIR#meta#.

Naturally, this is an awkward, suboptimal schema. If you find yourself working exclusively with row keys and an vertically growing table, take a step back for some minutes and try to refactor your schema.

As one plane is a single entity, that should ideally be the row key. We can use column families and columns to restructure our data:

row keyflight:FI318flight:FI319flight:EW7033flight:EW7036meta:milesmeta:modelmeta:operator
plane#TF-FIR2024-01-252024-01-2551000000Boeing 757-256Icelandair
plane#D-AIQN2019-10-312019-10-3152142142Airbus A320-211Germanwings

We can easily store all data in a single row, using two column families, flight and meta. Meta contains arbitrary columns of different kind of metadata, while each column inside flights is a flight number, with the cell value being the flight data. Remember, unused columns do not account into space usage. To retrieve data, we can a specific column and use a column filter.

Both layout use 10 cells, however the column-oriented one uses much simpler row keys, which need less disk space, are more readable and can be retrieved by row key instead of resorting to prefix queries.

Now, we have been tasked to extend flights to store their start and destination airport. We could keep the flight columns and store structured data (e.g. JSON) into each cell, or refactor the table to store each flight as a separate row:

row keyloc:startloc:destmeta:datemeta:milesmeta:modelmeta:operator
plane#TF-FIR51000000Boeing 757-256Icelandair
plane#D-AIQN52142142Airbus A320-211Germanwings
flight#TF-FIR#FI318KEFOSL2024-01-25
flight#TF-FIR#FI319OSLKEF2024-01-25
flight#D-AIQN#EW7033CGNHAM2019-10-31
flight#D-AIQN#EW7036HAMCGN2019-10-31

Now, using a scan with prefix flight#TF-FIR# we can get all flights of TF-FIR.

Row key design

As mentioned before, there are no secondary indices and no query planner. The ordering of the table is defined by the row key, which in return will determine how efficient scans over specific rows and columns are.

When using multiple components in a row key, sorting them by cardinality generally optimizes locality and allows better querying. Compare:

<CONTINENT>#<COUNTRY>#<CITY> and <CITY>#<COUNTRY>#<CONTINENT>.

The second row key will always perform a full table scan if we search only for a specific continent or country, because we cannot use a prefix.

Customizing sorting behaviour

By using a multi-component date or a fixed-length integer, data can be sorted.

Using the above example, by adding a YYYY-MM-DD component to the row key, we can sort flights per plane in ascending order:

row keyloc:startloc:destmeta:milesmeta:modelmeta:operator
plane#TF-FIR51000000Boeing 757-256Icelandair
plane#D-AIQN52142142Airbus A320-211Germanwings
flight#TF-FIR#2024-01-25#FI318KEFOSL
flight#TF-FIR#2024-01-25#FI319OSLKEF
flight#D-AIQN#2019-10-31#EW7033CGNHAM
flight#D-AIQN#2019-10-31#EW7036HAMCGN

Now, using a scan with prefix flight#TF-FIR#2024-01- we can get all flights of TF-FIR in January 2024.

Real-life example: Webtable

The webtable, the heart of the Google search engine, is stored in Bigtable. It stores web pages and references (anchors) between said pages.

  • language contains a single column containing the language code (e.g. DE).
  • anchor are links that point to the given website (backlinks), where each column qualifier is an anchor’s href attribute. The cell value is the anchor text (el.textContent in JavaScript).
  • contents contains a single column containing the raw HTML document. By moving it into a separate locality group the rather large documents will not slow down frequent queries accessing the other columns. By using cell versions, the table can store a history of the web page.

The row key is the reversed domain key. This maximizes locality of pages under the same (sub-)domain.

Webtable

By listing the anchors of a row, we can count how many websites link to this specific page (for example to calculate the PageRank).

For simplicity the examples do not show full row keys. Domains alone are not enough to store the entire internet, so a real row key would also contain the pathname, like:

  • com.github/
  • com.github/about
  • com.github/pricing

Summary

This is a summary based on parts of Bigtable’s documentation that apply to Smoltable. Read more here, but not everything applies to Smoltable.

Data schema

  • Each table has only one index, the row key
  • Rows are sorted lexicographically by row key
  • Column families are stored in any specific order (this differs from Bigtable)
  • Columns are grouped by column family and sorted in lexicographic order within the column family

Column families

  • Put related columns in the same column family
  • Choose short names for your column families
  • Put columns that have different data retention needs in different column families

Columns

  • Treat column qualifiers as data, if possible
  • Give column qualifiers short but meaningful names

Rows

  • Keep all information for an entity in a single row
  • Design your row key based on the queries you will use to retrieve the data (see row key design)
    • Store related entities in adjacent rows
    • Store multiple delimited values in each row key
    • Keep your row keys short
  • Pad integers with leading zeroes: important for timestamps where range-based queries are used
  • Do not use sequential numeric IDs as row key

[1] The origin of the term “wide-column” is not exactly easy to narrow down. It was mentioned in “Scalable SQL and NoSQL Data Stores” by Rick Cattell in 2010. If you know an earlier mention of the term, please let us now!