Chapter 04: General Ledger as a database

Status: WOW (early adopter pass)

This chapter treats a general ledger like what it really is for developers:

an append-only fact table (postings)
a few dimension tables (accounts, segments)
a set of derived views (balances) built with grouping + aggregation
a set of constraints that keep everything true (double-entry invariants)

If you can think in SQL, you can think in ledgers.

What you will build

You will run a deterministic chapter runner that writes a small “mini-database” worth of artifacts under outputs/ledgerloom/ch04:

postings.csv — the fact table (one row per posting)
balances_by_account.csv — a materialized view (group by account)
balances_by_period.csv — group by period + root + account
balances_by_department.csv — group by segment (department) + root
running_balance_by_posting.csv — window function: running balance per account
invariants.json — constraint checks (double-entry + basic DB rules)
gl_schema.json — table schemas + index suggestions
sql_mental_model.md — “developer translation” (pandas ↔ SQL)
manifest.json — SHA-256 file digests for reproducibility
lineage.mmd — a small DAG of how artifacts are derived
run_meta.json — seed + environment info (reproducibility)

Why this is “database thinking”

A database mindset is a superpower for accounting systems:

Fact table: a posting is the atomic event you query.
Constraints: each entry must balance (sum debits == sum credits).
Derived views: everything you report is computed from postings.
Immutability: treat postings as append-only; rebuild views as needed.

The runner in this chapter builds these ideas explicitly and writes out both the raw table and the views.

SQL mental model

The artifacts are produced using grouping/aggregation patterns that map directly to SQL.

Example: balances by account

SELECT
  account,
  SUM(debit_cents)  AS debit_cents,
  SUM(credit_cents) AS credit_cents
FROM postings
GROUP BY account
ORDER BY account;

In the runner we do the same operation with pandas groupby, then write the view to balances_by_account.csv.

Example: balances by period + root + account

SELECT
  period,
  root,
  account,
  SUM(signed_delta_cents) AS balance_cents
FROM postings
GROUP BY period, root, account
ORDER BY period, root, account;

Example: running balance (window function)

SELECT
  *,
  SUM(signed_delta_cents) OVER (
    PARTITION BY account
    ORDER BY date, entry_id, line_no
  ) AS running_balance_cents
FROM postings
ORDER BY date, entry_id, line_no;

This view is written to running_balance_by_posting.csv.

How to run

From the project root:

# Run the chapter runner (deterministic seed)
make ll-ch04

Or call the module directly:

python -m ledgerloom.chapters.ch04_general_ledger_database \
  --outdir outputs/ledgerloom \
  --seed 123

You should see:

Wrote LedgerLoom Chapter 04 artifacts -> outputs/ledgerloom/ch04

Invariant checks you can trust

This chapter generates invariants.json that validates:

Double-entry: every entry balances (debits == credits)
Ledger-wide balance: total (debit - credit) is exactly zero
Primary key uniqueness: posting_id is unique
Allowed roots: account roots are one of Assets/Liabilities/Equity/Revenue/Expenses

The checks are designed to fail loudly if anything drifts.

Contributor notes

All CSV outputs are forced to LF (n) line endings to keep golden-file tests stable across Windows/macOS/Linux.
Golden tests compare bytes (not “parsed” tables) on purpose — the goal is reproducible artifacts.

Next

Chapter 05 will start turning these “database views” into statements with more business realism (period closing, adjusting entries, and statement layout).