Chapter 04: General Ledger as a database

Status: WOW (early adopter pass)

This chapter treats a general ledger like what it really is for developers:

  • an append-only fact table (postings)

  • a few dimension tables (accounts, segments)

  • a set of derived views (balances) built with grouping + aggregation

  • a set of constraints that keep everything true (double-entry invariants)

If you can think in SQL, you can think in ledgers.

What you will build

You will run a deterministic chapter runner that writes a small “mini-database” worth of artifacts under outputs/ledgerloom/ch04:

  • postings.csv — the fact table (one row per posting)

  • balances_by_account.csv — a materialized view (group by account)

  • balances_by_period.csv — group by period + root + account

  • balances_by_department.csv — group by segment (department) + root

  • running_balance_by_posting.csv — window function: running balance per account

  • invariants.json — constraint checks (double-entry + basic DB rules)

  • gl_schema.json — table schemas + index suggestions

  • sql_mental_model.md — “developer translation” (pandas ↔ SQL)

  • manifest.json — SHA-256 file digests for reproducibility

  • lineage.mmd — a small DAG of how artifacts are derived

  • run_meta.json — seed + environment info (reproducibility)

Why this is “database thinking”

A database mindset is a superpower for accounting systems:

  • Fact table: a posting is the atomic event you query.

  • Constraints: each entry must balance (sum debits == sum credits).

  • Derived views: everything you report is computed from postings.

  • Immutability: treat postings as append-only; rebuild views as needed.

The runner in this chapter builds these ideas explicitly and writes out both the raw table and the views.

SQL mental model

The artifacts are produced using grouping/aggregation patterns that map directly to SQL.

Example: balances by account

SELECT
  account,
  SUM(debit_cents)  AS debit_cents,
  SUM(credit_cents) AS credit_cents
FROM postings
GROUP BY account
ORDER BY account;

In the runner we do the same operation with pandas groupby, then write the view to balances_by_account.csv.

Example: balances by period + root + account

SELECT
  period,
  root,
  account,
  SUM(signed_delta_cents) AS balance_cents
FROM postings
GROUP BY period, root, account
ORDER BY period, root, account;

Example: running balance (window function)

SELECT
  *,
  SUM(signed_delta_cents) OVER (
    PARTITION BY account
    ORDER BY date, entry_id, line_no
  ) AS running_balance_cents
FROM postings
ORDER BY date, entry_id, line_no;

This view is written to running_balance_by_posting.csv.

How to run

From the project root:

# Run the chapter runner (deterministic seed)
make ll-ch04

Or call the module directly:

python -m ledgerloom.chapters.ch04_general_ledger_database \
  --outdir outputs/ledgerloom \
  --seed 123

You should see:

  • Wrote LedgerLoom Chapter 04 artifacts -> outputs/ledgerloom/ch04

Invariant checks you can trust

This chapter generates invariants.json that validates:

  • Double-entry: every entry balances (debits == credits)

  • Ledger-wide balance: total (debit - credit) is exactly zero

  • Primary key uniqueness: posting_id is unique

  • Allowed roots: account roots are one of Assets/Liabilities/Equity/Revenue/Expenses

The checks are designed to fail loudly if anything drifts.

Contributor notes

  • All CSV outputs are forced to LF (n) line endings to keep golden-file tests stable across Windows/macOS/Linux.

  • Golden tests compare bytes (not “parsed” tables) on purpose — the goal is reproducible artifacts.

Next

Chapter 05 will start turning these “database views” into statements with more business realism (period closing, adjusting entries, and statement layout).