This is the Golden Age of the Thick Database

Felix Holmgren
Date : August 25, 2020
Categories : Background
Tags : Postgres

Not only is Thick Database the best architecture for most web apps, but recent improvements in tooling mean there has never been a better time to get started with a thick db stack.

Crucial innovations have been made relatively recently at several layers of the web stack: in the database itself, in the backend application layer, and in the web client layer. Let’s look at each of these to find out what makes this the Golden Age of thick databases.

CREATE OR REPLACE FUNCTION getrecommendations (id integer, orderids int[], orderedproducts int[], productids int[], productnames text[])
RETURNS json
AS $$
    import pandas as pd
    o = {'order_id': orderids, 'product_id': orderedproducts}
    orders = pd.DataFrame(data=o)
    orders_for_product = orders[orders.product_id == id].order_id.unique();

    relevant_orders = orders[orders.order_id.isin(orders_for_product)]

    accompanying_products_by_order = relevant_orders[relevant_orders.product_id != id]
    num_instance_by_accompanying_product = accompanying_products_by_order.groupby("product_id")["product_id"].count().reset_index(name="instances")

    num_orders_for_product = orders_for_product.size
    product_instances = pd.DataFrame(num_instance_by_accompanying_product)
    product_instances["frequency"] = product_instances["instances"]/num_orders_for_product

    recommended_products = pd.DataFrame(product_instances.sort_values("frequency", ascending=False).head(3))

    p = {'product_id': productids, 'name': productnames}
    products = pd.DataFrame(data=p)

    recommended_products = pd.merge(recommended_products, products, on="product_id")

    return recommended_products.to_json(orient="table")
$$ LANGUAGE 'plpython3u';

Database

When I say “db”, I mean Postgres.

RLS json generation

delegating work to db is in general not bad for perf
very hard to avoid extra round trips to app layer Replication - read replicas
also, cloud hosts, although trickier issue

Not that new

pl/Python New SQL standard Rich datatypes

Fan of graph, but…

Backend application layer

Replaces your entire MVC framework and all that code you’ve painstakingly put together for the sole purpose of requesting data from the database, deserializing it, maybe throwing some of it away and restructuring other parts, then serializing it again into JSON or something else. Does this by delegating down to the db, and up to the client. (Maybe in the cloud era, we should imagine the backend and data as being above the client?) Delegating more work to the db might sound like a bad idea since the db often ends up being the bottleneck in any moderately complex application. But keep in mind:

All the work you are now not doing
- de-serializing SQL queries
- multiple round-trips - this is a massive cost
- cache flushes
Horizontal read scalability, not harder for db than app layer
For write transactions, the db is in any case the bottleneck, nothing you can do to change that except moving to eventually consistent storage, or cutting-edge systems like FoundationDb (and CockroachDb?).

Worth dwelling on the issue of transactions a bit more: this is the crucial characteristic of OLTP databases, and the foundation upon which all of your business logic and data integrity is built. Wrapping this in weaker abstractions where people can easily break transactional guarantees is a terrible and unnessesary step backwards. For the simplest, but by no means the only, example, consider the simple case of unique emails. Can’t do it w/o relying on db:s capabilities.

Thick db descriptions from a decade or so back recommend one view per app page - strict coupling between db/back end and fe. REST definitely better but also restricted. If FE and BE are different teams, you get a massive bottleneck. Even with full-stack team, you constantly have to make choices about what to include in various payloads. Either lots of ad hoc, hard-to-predict endpoints, or very regular API but hard to avoid multiple round-trips, returning too much data etc.

GraphQL (or equivalent) strikes the right balance between restricted data access and flexibility for clients. It acts as a strict contract between server and client, with strong types, and at the same time gives client code a lot of freedom to evolve queries that fits the UI as it goes through iterations.

SPA

But you can still serve server-side template-based HTML if you prefer.

Enormous amt of duplication in modern web/api systems. Dup of effort, of logic, of trying to outsmart some of the smartest systems on the planet, name rdbmses. What we need are systems built on a heavy foundation of thick db.

You might have heard of thick clients -blabla… My favorite book title from the heyday of java swing: filthy rich clients … I’m here to tell you the time has come for filthy rich databases. Not rich as in private-jet-plane, but rich as in generous servings of fried food, full of good carbs and fat.

Why now?

Postgres GraphQL

A lot of what goes on in a typical rails/node/django app is a poor, slow, confused reinvention of the capabilit