Chapter 04: General Ledger as a database
========================================

Status: **WOW (early adopter pass)**

This chapter treats a *general ledger* like what it really is for developers:

- an **append-only fact table** (postings)
- a few **dimension tables** (accounts, segments)
- a set of **derived views** (balances) built with grouping + aggregation
- a set of **constraints** that keep everything true (double-entry invariants)

If you can think in SQL, you can think in ledgers.

What you will build
-------------------

You will run a deterministic chapter runner that writes a small “mini-database”
worth of artifacts under ``outputs/ledgerloom/ch04``:

- ``postings.csv`` — the **fact table** (one row per posting)
- ``balances_by_account.csv`` — a **materialized view** (group by account)
- ``balances_by_period.csv`` — group by period + root + account
- ``balances_by_department.csv`` — group by segment (department) + root
- ``running_balance_by_posting.csv`` — window function: running balance per account
- ``invariants.json`` — constraint checks (double-entry + basic DB rules)
- ``gl_schema.json`` — table schemas + index suggestions
- ``sql_mental_model.md`` — “developer translation” (pandas ↔ SQL)
- ``manifest.json`` — SHA-256 file digests for reproducibility
- ``lineage.mmd`` — a small DAG of how artifacts are derived
- ``run_meta.json`` — seed + environment info (reproducibility)

Why this is “database thinking”
-------------------------------

A database mindset is a superpower for accounting systems:

- **Fact table**: a posting is the atomic event you query.
- **Constraints**: each entry must balance (sum debits == sum credits).
- **Derived views**: everything you *report* is computed from postings.
- **Immutability**: treat postings as append-only; rebuild views as needed.

The runner in this chapter builds these ideas explicitly and writes out both the
raw table and the views.

SQL mental model
----------------

The artifacts are produced using grouping/aggregation patterns that map
directly to SQL.

Example: balances by account

.. code-block:: sql

   SELECT
     account,
     SUM(debit_cents)  AS debit_cents,
     SUM(credit_cents) AS credit_cents
   FROM postings
   GROUP BY account
   ORDER BY account;

In the runner we do the same operation with pandas groupby, then write the view
to ``balances_by_account.csv``.

Example: balances by period + root + account

.. code-block:: sql

   SELECT
     period,
     root,
     account,
     SUM(signed_delta_cents) AS balance_cents
   FROM postings
   GROUP BY period, root, account
   ORDER BY period, root, account;

Example: running balance (window function)

.. code-block:: sql

   SELECT
     *,
     SUM(signed_delta_cents) OVER (
       PARTITION BY account
       ORDER BY date, entry_id, line_no
     ) AS running_balance_cents
   FROM postings
   ORDER BY date, entry_id, line_no;

This view is written to ``running_balance_by_posting.csv``.

How to run
----------

From the project root:

.. code-block:: bash

   # Run the chapter runner (deterministic seed)
   make ll-ch04

Or call the module directly:

.. code-block:: bash

   python -m ledgerloom.chapters.ch04_general_ledger_database \
     --outdir outputs/ledgerloom \
     --seed 123

You should see:

- ``Wrote LedgerLoom Chapter 04 artifacts -> outputs/ledgerloom/ch04``

Invariant checks you can trust
------------------------------

This chapter generates ``invariants.json`` that validates:

- **Double-entry**: every entry balances (debits == credits)
- **Ledger-wide balance**: total (debit - credit) is exactly zero
- **Primary key uniqueness**: posting_id is unique
- **Allowed roots**: account roots are one of Assets/Liabilities/Equity/Revenue/Expenses

The checks are designed to fail loudly if anything drifts.

Contributor notes
-----------------

- All CSV outputs are forced to **LF (\n)** line endings to keep golden-file tests
  stable across Windows/macOS/Linux.
- Golden tests compare bytes (not “parsed” tables) on purpose — the goal is
  reproducible artifacts.

Next
----

Chapter 05 will start turning these “database views” into statements with more
business realism (period closing, adjusting entries, and statement layout).