Chapter 02 — Debits & Credits as Encodings (Wide, Long, Signed)
===============================================================

Chapter 01 introduced the idea that accounting can be represented as a **canonical journal**
(entries + postings) and then reported in consistent ways (trial balance, income statement,
balance sheet).

This chapter takes the next step:

**The same accounting facts can be stored in different table shapes — and still compile into the exact same canonical journal.**

In other words: *accounting is defined by invariants, not by column names.*

What you will build
-------------------

You will generate a tiny, meaningful demo dataset and produce:

Three encodings of the same transactions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Wide**: one row per transaction with explicit debit/credit columns
- **Long**: many rows per transaction with a ``side`` column (debit/credit)
- **Signed**: many rows per transaction with a single ``signed_amount`` column

A canonical journal
^^^^^^^^^^^^^^^^^^^

From each encoding we compile **the same** canonical journal:

- ``journal_from_wide.jsonl``
- ``journal_from_long.jsonl``
- ``journal_from_signed.jsonl``

Reports from the journal
^^^^^^^^^^^^^^^^^^^^^^^^

- ``trial_balance.csv``
- ``income_statement.csv``
- ``balance_sheet.csv``

Proof + "wow" artifacts
^^^^^^^^^^^^^^^^^^^^^^^

- ``checks.md`` — PASS/FAIL invariant checks
- ``diagnostics.md`` — narrative explanation + hashes
- ``tables.md`` — the encodings and reports as readable Markdown tables
- ``lineage.mmd`` — Mermaid lineage diagram (encodings → journal → reports)
- ``manifest.json`` / ``run_meta.json`` — reproducibility metadata + hashes

The core idea: facts vs encoding
--------------------------------

A transaction like:

- Debit ``Assets:Cash`` $5000
- Credit ``Equity:OwnerCapital`` $5000

is a *fact*. How you store that fact in a table is an *encoding choice*.
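
To make the fact/encoding distinction concrete, here is a minimal sketch that stores the
owner-capital transaction in all three shapes and reduces each one to the same list of
signed postings. The column names follow this chapter's CSV descriptions, but the
``tx_id``, date, and narration values and the helper functions ``from_wide``,
``from_long``, and ``from_signed`` are hypothetical illustrations, not LedgerLoom's
actual API.

.. code-block:: python

    from decimal import Decimal

    # One fact (owner invests $5000 cash), stored three ways.
    # tx_id, dt and narration values are made up for illustration;
    # dt and narration are omitted from the long/signed rows for brevity.
    wide_row = {
        "tx_id": "T1", "dt": "2024-01-01", "narration": "Owner investment",
        "debit_account": "Assets:Cash", "debit_amount": "5000",
        "credit_account": "Equity:OwnerCapital", "credit_amount": "5000",
    }
    long_rows = [
        {"tx_id": "T1", "side": "debit", "account": "Assets:Cash", "amount": "5000"},
        {"tx_id": "T1", "side": "credit", "account": "Equity:OwnerCapital", "amount": "5000"},
    ]
    signed_rows = [
        {"tx_id": "T1", "account": "Assets:Cash", "signed_amount": "5000"},
        {"tx_id": "T1", "account": "Equity:OwnerCapital", "signed_amount": "-5000"},
    ]

    def from_wide(row):
        """One wide row becomes a debit (+) posting and a credit (-) posting."""
        return sorted([
            (row["debit_account"], Decimal(row["debit_amount"])),
            (row["credit_account"], -Decimal(row["credit_amount"])),
        ])

    def from_long(rows):
        """The side column decides the sign of each posting."""
        sign = {"debit": 1, "credit": -1}
        return sorted((r["account"], sign[r["side"]] * Decimal(r["amount"])) for r in rows)

    def from_signed(rows):
        """The sign is already carried by the amount itself."""
        return sorted((r["account"], Decimal(r["signed_amount"])) for r in rows)

    postings = from_wide(wide_row)
    # Three table shapes, one set of postings, and the postings balance.
    assert postings == from_long(long_rows) == from_signed(signed_rows)
    assert sum(amount for _, amount in postings) == 0

The sketch parses amounts with ``Decimal`` rather than ``float`` so that the balance
check is exact.
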
LedgerLoom treats the encoding as input, compiles it into canonical journal entries, and
then enforces accounting correctness through invariants.

Encoding 1: wide (classic debit/credit columns)
-----------------------------------------------

The wide encoding is common in exports:

- one row per transaction
- two posting columns: debit side and credit side

Columns in ``encoding_wide.csv`` include:

- ``tx_id`` — transaction id (groups postings)
- ``dt`` — ISO date
- ``narration`` — description
- ``debit_account``, ``debit_amount``
- ``credit_account``, ``credit_amount``

This is extremely readable for humans, but it is not always the best shape for analytics
(because debits and credits are split across columns).

Encoding 2: long (one posting per row)
--------------------------------------

The long encoding is common in databases and analytics pipelines:

- one row per posting
- a ``side`` column indicates debit vs credit

Columns in ``encoding_long.csv`` include:

- ``tx_id``, ``dt``, ``narration``
- ``side`` — ``debit`` or ``credit``
- ``account``
- ``amount``

This is more "relational" and is easy to group, filter, and join.

Encoding 3: signed (single numeric measure)
-------------------------------------------

The signed encoding is a long table where the numeric measure carries direction:

- debits are positive
- credits are negative

Columns in ``encoding_signed.csv`` include:

- ``tx_id``, ``dt``, ``narration``
- ``account``
- ``signed_amount``

Why this is powerful:

- you can aggregate with *one* numeric column
- you can build models on postings without pivot/unpivot steps
- correctness is enforced by invariants (the sum of ``signed_amount`` must be zero per transaction)

Compiling encodings into the canonical journal
----------------------------------------------

Each encoding is compiled into a list of ``Entry`` objects, each containing a date,
narration, and a list of ``Posting`` objects.
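
The compile step can be sketched as follows for the signed encoding. This is an
illustration under stated assumptions, not LedgerLoom's real implementation: the field
names on ``Entry`` and ``Posting`` (including the extra ``tx_id``), the functions
``compile_signed`` and ``to_jsonl``, the JSON record layout, and the sample rows are all
hypothetical.

.. code-block:: python

    import hashlib
    import json
    from dataclasses import dataclass
    from decimal import Decimal
    from itertools import groupby

    @dataclass(frozen=True)
    class Posting:
        account: str
        amount: Decimal  # signed: debit > 0, credit < 0

    @dataclass(frozen=True)
    class Entry:
        tx_id: str       # assumption: kept on the entry for grouping and sorting
        dt: str
        narration: str
        postings: tuple

    def compile_signed(rows):
        """Group signed rows by tx_id into balanced Entry objects."""
        entries = []
        ordered = sorted(rows, key=lambda r: r["tx_id"])
        for tx_id, group in groupby(ordered, key=lambda r: r["tx_id"]):
            group = list(group)
            # Sort postings so the result does not depend on input row order.
            postings = tuple(sorted(
                (Posting(r["account"], Decimal(r["signed_amount"])) for r in group),
                key=lambda p: (p.account, p.amount),
            ))
            if sum(p.amount for p in postings) != 0:
                raise ValueError(f"transaction {tx_id} does not balance")
            entries.append(Entry(tx_id, group[0]["dt"], group[0]["narration"], postings))
        entries.sort(key=lambda e: (e.dt, e.tx_id))
        return entries

    def to_jsonl(entries):
        """Serialize entries as one JSON object per line, with sorted keys."""
        lines = []
        for e in entries:
            record = {
                "tx_id": e.tx_id, "dt": e.dt, "narration": e.narration,
                "postings": [{"account": p.account, "amount": str(p.amount)}
                             for p in e.postings],
            }
            lines.append(json.dumps(record, sort_keys=True, separators=(",", ":")))
        return "\n".join(lines) + "\n"

    # Sample rows (made-up values), deliberately listed credit-first.
    rows = [
        {"tx_id": "T1", "dt": "2024-01-01", "narration": "Owner investment",
         "account": "Equity:OwnerCapital", "signed_amount": "-5000"},
        {"tx_id": "T1", "dt": "2024-01-01", "narration": "Owner investment",
         "account": "Assets:Cash", "signed_amount": "5000"},
    ]
    journal = to_jsonl(compile_signed(rows))
    shuffled = to_jsonl(compile_signed(list(reversed(rows))))
    digest = hashlib.sha256(journal.encode("utf-8")).hexdigest()
    assert journal == shuffled  # row order does not change the output bytes

Sorting rows, postings, and JSON keys is what makes the output stable enough to hash.
A fuller compiler would also need to normalize amount formatting (for example ``5000``
vs ``5000.00``) before serializing, which this sketch does not attempt.
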
The canonical journal is written as deterministic JSONL so you can diff it, hash it,
and treat it like a proper artifact.

Key point:

**All three compiled journals are byte-identical.**

That is the chapter's "proof of equivalence".

Invariants (the accounting "safety rails")
------------------------------------------

LedgerLoom enforces the invariants that make double-entry bookkeeping work:

- Each transaction balances (total debits == total credits)
- In signed form: each transaction sums to zero (sum(signed_amount) == 0)
- Trial balance is consistent with the journal
- Financial statements are consistent with the trial balance

These invariants are captured for humans in:

- ``checks.md`` (PASS/FAIL)
- ``diagnostics.md`` (hashes + explanation)

and for machines in:

- ``run_meta.json`` / ``manifest.json``

How to run
----------

From the repo root:

.. code-block:: bash

   # Run Chapter 02 demo (writes into outputs/ledgerloom/ch02)
   python -m ledgerloom.chapters.ch02_debits_credits_encoding --outdir outputs/ledgerloom --seed 123

Or using the Makefile target (if available):

.. code-block:: bash

   make ll-ch02

Where to look after running:

.. code-block:: text

   outputs/ledgerloom/ch02/
     encoding_wide.csv
     encoding_long.csv
     encoding_signed.csv
     journal_from_wide.jsonl
     journal_from_long.jsonl
     journal_from_signed.jsonl
     trial_balance.csv
     income_statement.csv
     balance_sheet.csv
     checks.md
     diagnostics.md
     tables.md
     lineage.mmd
     manifest.json
     run_meta.json
     summary.md

Recommended reading order
-------------------------

If you want the fastest "wow":

1. Open ``summary.md`` (high-level tour)
2. Open ``tables.md`` (see the data)
3. Open ``checks.md`` (PASS/FAIL invariants)
4. Open ``diagnostics.md`` (hashes + reasoning)
5. Open ``manifest.json`` (artifact hashes + sizes)

Exercises
---------

1. **Add a new transaction**

   - Add a new wide row in the chapter script demo dataset.
   - Regenerate outputs and verify all checks still PASS.

2. **Create a multi-posting transaction**

   - Extend the demo so that one transaction has *three* postings.
   - Hint: wide encoding becomes awkward; long and signed remain natural.

3. **Stress-test your intuition**

   - Change only the ordering of rows in ``encoding_long.csv`` and rerun.
   - The canonical journal should remain identical (stable grouping rules).

Developer notes
---------------

- This chapter deliberately keeps the demo dataset small enough to read in one sitting.
- Outputs are deterministic for a fixed seed to keep tests stable and diffs meaningful.
- The canonical journal and reports are the "source of truth"; encodings are just views.

Next
----

Chapter 03 will introduce a Chart of Accounts schema so that account strings can be validated
(and later, used for roll-ups and richer reporting).