Chapter 02 — Debits & Credits as Encodings (Wide, Long, Signed)

Chapter 01 introduced the idea that accounting can be represented as a canonical journal (entries + postings) and then reported in consistent ways (trial balance, income statement, balance sheet).

This chapter takes the next step:

The same accounting facts can be stored in different table shapes — and still compile into the exact same canonical journal.

In other words: accounting is defined by invariants, not by column names.

What you will build

You will generate a tiny, meaningful demo dataset and produce:

Three encodings of the same transactions

  • Wide: one row per transaction with explicit debit/credit columns

  • Long: many rows per transaction with a side column (debit/credit)

  • Signed: many rows per transaction with a single signed_amount column

A canonical journal

From each encoding we compile the same canonical journal:

  • journal_from_wide.jsonl

  • journal_from_long.jsonl

  • journal_from_signed.jsonl

Reports from the journal

  • trial_balance.csv

  • income_statement.csv

  • balance_sheet.csv

Proof + “wow” artifacts

  • checks.md — PASS/FAIL invariant checks

  • diagnostics.md — narrative explanation + hashes

  • tables.md — the encodings and reports as readable Markdown tables

  • lineage.mmd — Mermaid lineage diagram (encodings → journal → reports)

  • manifest.json / run_meta.json — reproducibility metadata + hashes

The core idea: facts vs encoding

A transaction like:

  • Debit Assets:Cash $5000

  • Credit Equity:OwnerCapital $5000

is a fact.

How you store that fact in a table is an encoding choice.

LedgerLoom treats the encoding as input, compiles it into canonical journal entries, and then enforces accounting correctness through invariants.
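
As a minimal sketch of the distinction, the fact above can be held once in memory and then emitted in each of the three shapes described in this chapter. The dictionary layout, the function names, and the tx_id, date, and narration values are assumptions made for illustration; they are not LedgerLoom's own types or data.

```python
from decimal import Decimal

# Hypothetical in-memory form of the fact. The tx_id, date, and narration
# are invented for illustration; this is not LedgerLoom's own type.
fact = {
    "tx_id": "T1",
    "dt": "2024-01-01",
    "narration": "Owner investment",
    "postings": [
        ("Assets:Cash", "debit", Decimal("5000")),
        ("Equity:OwnerCapital", "credit", Decimal("5000")),
    ],
}

HEADER_KEYS = ("tx_id", "dt", "narration")


def encode_long(tx):
    """One row per posting, with an explicit side column."""
    head = {key: tx[key] for key in HEADER_KEYS}
    return [
        {**head, "side": side, "account": account, "amount": amount}
        for account, side, amount in tx["postings"]
    ]


def encode_signed(tx):
    """One row per posting; debits are positive, credits are negative."""
    head = {key: tx[key] for key in HEADER_KEYS}
    return [
        {**head, "account": account,
         "signed_amount": amount if side == "debit" else -amount}
        for account, side, amount in tx["postings"]
    ]


def encode_wide(tx):
    """One row per transaction; needs exactly one debit and one credit."""
    debit_rows = [p for p in tx["postings"] if p[1] == "debit"]
    credit_rows = [p for p in tx["postings"] if p[1] == "credit"]
    if len(debit_rows) != 1 or len(credit_rows) != 1:
        raise ValueError("wide encoding needs exactly one debit and one credit")
    head = {key: tx[key] for key in HEADER_KEYS}
    return {
        **head,
        "debit_account": debit_rows[0][0], "debit_amount": debit_rows[0][2],
        "credit_account": credit_rows[0][0], "credit_amount": credit_rows[0][2],
    }
```

The fact never changes; only the row layout does.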

Encoding 1: wide (classic debit/credit columns)

The wide encoding is common in exports:

  • one row per transaction

  • two posting columns: debit side and credit side

Columns in encoding_wide.csv include:

  • tx_id — transaction id (groups postings)

  • dt — ISO date

  • narration — description

  • debit_account, debit_amount

  • credit_account, credit_amount

This is easy for humans to read, but it is awkward for analytics: debits and credits live in separate columns, so per-account totals need an unpivot step first, and each row can hold only one debit and one credit.
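
To make the wide shape concrete, here is a hedged sketch that reads one wide row and splits it into two postings. The column names follow the list above; the sample row values and the wide_row_to_postings helper are hypothetical and are not part of LedgerLoom.

```python
import csv
import io
from decimal import Decimal

# Sample data invented for illustration; only the column names come from
# the chapter text.
WIDE_CSV = """tx_id,dt,narration,debit_account,debit_amount,credit_account,credit_amount
T1,2024-01-01,Owner investment,Assets:Cash,5000,Equity:OwnerCapital,5000
"""


def wide_row_to_postings(row):
    """Split one wide row into a debit posting and a credit posting."""
    debit = Decimal(row["debit_amount"])
    credit = Decimal(row["credit_amount"])
    # A wide row carries both sides, so it can be checked on its own.
    if debit != credit:
        raise ValueError(f"{row['tx_id']}: debit {debit} != credit {credit}")
    return [
        {"account": row["debit_account"], "side": "debit", "amount": debit},
        {"account": row["credit_account"], "side": "credit", "amount": credit},
    ]


rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
postings = wide_row_to_postings(rows[0])
```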

Encoding 2: long (one posting per row)

The long encoding is common in databases and analytics pipelines:

  • one row per posting

  • a side column indicates debit vs credit

Columns in encoding_long.csv include:

  • tx_id, dt, narration

  • side: debit or credit

  • account

  • amount

This is more “relational” and is easy to group, filter, and join.
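
As an illustration of that grouping, the sketch below totals debits and credits per tx_id and lists any transaction that does not balance. The rows are invented sample data (the second transaction, a rent payment, does not appear in the chapter), and side_totals is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal

# Invented sample rows in the long shape; dt and narration are omitted
# to keep the example short.
long_rows = [
    {"tx_id": "T1", "side": "debit", "account": "Assets:Cash", "amount": Decimal("5000")},
    {"tx_id": "T1", "side": "credit", "account": "Equity:OwnerCapital", "amount": Decimal("5000")},
    {"tx_id": "T2", "side": "debit", "account": "Expenses:Rent", "amount": Decimal("1200")},
    {"tx_id": "T2", "side": "credit", "account": "Assets:Cash", "amount": Decimal("1200")},
]


def side_totals(rows):
    """Return {tx_id: {"debit": total, "credit": total}}."""
    totals = defaultdict(lambda: {"debit": Decimal("0"), "credit": Decimal("0")})
    for row in rows:
        totals[row["tx_id"]][row["side"]] += row["amount"]
    return dict(totals)


totals = side_totals(long_rows)
# A transaction balances when its debit total equals its credit total.
unbalanced = [tx_id for tx_id, t in totals.items() if t["debit"] != t["credit"]]
```

Because each posting is its own row, a transaction with three or more postings needs no change to this code.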

Encoding 3: signed (single numeric measure)

The signed encoding is a long table where the numeric measure carries direction:

  • debits are positive

  • credits are negative

Columns in encoding_signed.csv include:

  • tx_id, dt, narration

  • account

  • signed_amount

Why this is powerful:

  • you can aggregate with one numeric column

  • you can build models on postings without pivot/unpivot steps

  • correctness is enforced by invariants (the sum of signed_amount must be zero per transaction)
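
A short sketch of these points, assuming invented sample rows and a hypothetical sum_by helper: the same one-column sum gives the per-transaction invariant when grouped by tx_id and the account balances when grouped by account.

```python
from collections import defaultdict
from decimal import Decimal

# Invented sample rows: (tx_id, account, signed_amount).
# Debits are positive, credits are negative.
signed_rows = [
    ("T1", "Assets:Cash", Decimal("5000")),
    ("T1", "Equity:OwnerCapital", Decimal("-5000")),
    ("T2", "Expenses:Rent", Decimal("1200")),
    ("T2", "Assets:Cash", Decimal("-1200")),
]


def sum_by(rows, key_index):
    """Sum signed_amount grouped by the column at key_index."""
    totals = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


per_tx = sum_by(signed_rows, 0)       # invariant: every value is zero
per_account = sum_by(signed_rows, 1)  # signed balance per account

assert all(total == 0 for total in per_tx.values())
```

With these rows, Assets:Cash nets to 5000 - 1200 = 3800, and no side column or pivot is involved.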

Compiling encodings into the canonical journal

Each encoding is compiled into a list of Entry objects, each containing a date, narration, and a list of Posting objects.

The canonical journal is written as deterministic JSONL (the same input always produces the same bytes), so you can diff it, hash it, and version it like any other build artifact.

Key point:

All three compiled journals are byte-identical.

That is the chapter’s “proof of equivalence”.
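
One way such a byte-level comparison could work is sketched below. The entry layout, the rule of sorting postings by account, and the function names are assumptions for illustration; LedgerLoom's actual serialization rules may differ.

```python
import hashlib
import json


def canonical(entry):
    """Return a copy of the entry with postings in a fixed order (assumed rule)."""
    postings = sorted(entry["postings"], key=lambda p: (p["account"], p["amount"]))
    return {**entry, "postings": postings}


def to_jsonl(entries):
    """Serialize entries to bytes, one JSON object per line."""
    lines = [
        # sort_keys and fixed separators remove key-order and spacing noise.
        json.dumps(canonical(entry), sort_keys=True, separators=(",", ":"))
        for entry in entries
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


def digest(entries):
    """SHA-256 hex digest of the serialized journal."""
    return hashlib.sha256(to_jsonl(entries)).hexdigest()


# The same entry as two different compilers might emit it: different key
# order and different posting order. Amounts are strings to avoid floats.
entry_a = {
    "dt": "2024-01-01",
    "narration": "Owner investment",
    "postings": [
        {"account": "Assets:Cash", "amount": "5000"},
        {"account": "Equity:OwnerCapital", "amount": "-5000"},
    ],
}
entry_b = {
    "postings": [
        {"amount": "-5000", "account": "Equity:OwnerCapital"},
        {"amount": "5000", "account": "Assets:Cash"},
    ],
    "narration": "Owner investment",
    "dt": "2024-01-01",
}

assert to_jsonl([entry_a]) == to_jsonl([entry_b])
assert digest([entry_a]) == digest([entry_b])
```

Comparing digests is enough to show equivalence only because serialization removes every incidental difference first.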

Invariants (the accounting “safety rails”)

LedgerLoom enforces the invariants that make double-entry bookkeeping work:

  • Each transaction balances (total debits == total credits)

  • In signed form: each transaction sums to zero (sum(signed_amount) == 0)

  • Trial balance is consistent with the journal

  • Financial statements are consistent with the trial balance

These invariants are captured for humans in:

  • checks.md (PASS/FAIL)

  • diagnostics.md (hashes + explanation)

and for machines in:

  • run_meta.json / manifest.json
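
The invariants listed above can be sketched as a small check function that returns PASS/FAIL per rule. This is an illustration under stated assumptions, not LedgerLoom's implementation: the journal layout, the function names, the sample data, and the account roots Liabilities and Revenue are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

ZERO = Decimal("0")

# Invented sample journal with signed postings (debit +, credit -).
journal = [
    {"tx_id": "T1", "postings": [("Assets:Cash", Decimal("5000")),
                                 ("Equity:OwnerCapital", Decimal("-5000"))]},
    {"tx_id": "T2", "postings": [("Expenses:Rent", Decimal("1200")),
                                 ("Assets:Cash", Decimal("-1200"))]},
]


def trial_balance(entries):
    """Signed balance per account across all entries."""
    balances = defaultdict(Decimal)
    for entry in entries:
        for account, signed_amount in entry["postings"]:
            balances[account] += signed_amount
    return dict(balances)


def root_total(balances, root):
    """Sum balances for accounts whose first segment equals root."""
    return sum((v for a, v in balances.items() if a.split(":")[0] == root), ZERO)


def run_checks(entries):
    tb = trial_balance(entries)
    # Flip signs so credit-normal totals read as positive numbers.
    net_income = -(root_total(tb, "Revenue") + root_total(tb, "Expenses"))
    claims = -(root_total(tb, "Liabilities") + root_total(tb, "Equity"))
    results = {
        "each entry sums to zero": all(
            sum((amt for _, amt in e["postings"]), ZERO) == 0 for e in entries
        ),
        "trial balance sums to zero": sum(tb.values(), ZERO) == 0,
        "assets == liabilities + equity + net income":
            root_total(tb, "Assets") == claims + net_income,
    }
    return {name: "PASS" if ok else "FAIL" for name, ok in results.items()}


results = run_checks(journal)
```

The third check follows from the second when every account sits under one of the five assumed roots, so in practice it mainly catches accounts filed under an unexpected root.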

How to run

From the repo root:

# Run Chapter 02 demo (writes into outputs/ledgerloom/ch02)
python -m ledgerloom.chapters.ch02_debits_credits_encoding --outdir outputs/ledgerloom --seed 123

Or using the Makefile target (if available):

make ll-ch02

Where to look after running:

outputs/ledgerloom/ch02/
  encoding_wide.csv
  encoding_long.csv
  encoding_signed.csv
  journal_from_wide.jsonl
  journal_from_long.jsonl
  journal_from_signed.jsonl
  trial_balance.csv
  income_statement.csv
  balance_sheet.csv
  checks.md
  diagnostics.md
  tables.md
  lineage.mmd
  manifest.json
  run_meta.json
  summary.md

Exercises

  1. Add a new transaction

       • Add a new wide row in the chapter script demo dataset.

       • Regenerate outputs and verify all checks still PASS.

  2. Create a multi-posting transaction

       • Extend the demo so that one transaction has three postings.

       • Hint: wide encoding becomes awkward; long and signed remain natural.

  3. Stress-test your intuition

       • Change only the ordering of rows in encoding_long.csv and rerun.

       • The canonical journal should remain identical (stable grouping rules).

Developer notes

  • This chapter deliberately keeps the demo dataset small enough to read in one sitting.

  • Outputs are deterministic for a fixed seed to keep tests stable and diffs meaningful.

  • The canonical journal and reports are the “source of truth”; encodings are just views.

Next

Chapter 03 will introduce a Chart of Accounts schema so that account strings can be validated (and later, used for roll-ups and richer reporting).