Chapter 02 — Debits & Credits as Encodings (Wide, Long, Signed)
Chapter 01 introduced the idea that accounting can be represented as a canonical journal (entries + postings) and then reported in consistent ways (trial balance, income statement, balance sheet).
This chapter takes the next step:
The same accounting facts can be stored in different table shapes — and still compile into the exact same canonical journal.
In other words: accounting is defined by invariants, not by column names.
What you will build
You will generate a tiny, meaningful demo dataset and produce:
Three encodings of the same transactions
Wide: one row per transaction with explicit debit/credit columns
Long: many rows per transaction with a side column (debit/credit)
Signed: many rows per transaction with a single signed_amount column
A canonical journal
From each encoding we compile the same canonical journal:
journal_from_wide.jsonl
journal_from_long.jsonl
journal_from_signed.jsonl
Reports from the journal
trial_balance.csv
income_statement.csv
balance_sheet.csv
Proof + “wow” artifacts
checks.md: PASS/FAIL invariant checks
diagnostics.md: narrative explanation + hashes
tables.md: the encodings and reports as readable Markdown tables
lineage.mmd: Mermaid lineage diagram (encodings → journal → reports)
manifest.json / run_meta.json: reproducibility metadata + hashes
The core idea: facts vs encoding
A transaction like:
Debit Assets:Cash $5000
Credit Equity:OwnerCapital $5000
is a fact.
How you store that fact in a table is an encoding choice.
LedgerLoom treats the encoding as input, compiles it into canonical journal entries, and then enforces accounting correctness through invariants.
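To make the distinction concrete, here is a minimal sketch of that one fact written in each of the three shapes this chapter uses. The column names follow the chapter; the tx_id, date, and narration values are placeholders invented for illustration, not values from the demo dataset.

```python
from decimal import Decimal

# Shared descriptive fields. These values are illustrative placeholders.
header = {"tx_id": "T1", "dt": "2024-01-01", "narration": "Owner investment"}
amount = Decimal("5000")

# Wide: one row, with a debit pair and a credit pair of columns.
wide = [{
    **header,
    "debit_account": "Assets:Cash", "debit_amount": amount,
    "credit_account": "Equity:OwnerCapital", "credit_amount": amount,
}]

# Long: one row per posting, direction stored in `side`.
long_rows = [
    {**header, "side": "debit", "account": "Assets:Cash", "amount": amount},
    {**header, "side": "credit", "account": "Equity:OwnerCapital", "amount": amount},
]

# Signed: one row per posting, direction stored in the sign (debit positive).
signed_rows = [
    {**header, "account": "Assets:Cash", "signed_amount": amount},
    {**header, "account": "Equity:OwnerCapital", "signed_amount": -amount},
]
```

The row counts differ (one, two, two), but each structure carries exactly the same two postings.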
Encoding 1: wide (classic debit/credit columns)
The wide encoding is common in exports:
one row per transaction
two pairs of posting columns (account and amount): one pair for the debit side, one for the credit side
Columns in encoding_wide.csv include:
tx_id: transaction id (groups postings)
dt: ISO date
narration: description
debit_account, debit_amount
credit_account, credit_amount
This is extremely readable for humans, but it is not always the best shape for analytics (because debits and credits are split across columns).
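The analytics cost shows up as soon as you want a per-account balance: every row has to be visited through both column pairs. The sketch below assumes the column names listed above and uses two made-up rows (the second transaction, dates, and narrations are hypothetical, not the demo dataset).

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample in the wide shape; not the real encoding_wide.csv.
WIDE_CSV = """tx_id,dt,narration,debit_account,debit_amount,credit_account,credit_amount
T1,2024-01-01,Owner investment,Assets:Cash,5000,Equity:OwnerCapital,5000
T2,2024-01-05,Buy supplies,Expenses:Supplies,200,Assets:Cash,200
"""

def account_balances_from_wide(text):
    """Net balance per account (debit positive), read from wide rows."""
    balances = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(text)):
        # The same account can appear in either pair, so both must be handled.
        balances[row["debit_account"]] += Decimal(row["debit_amount"])
        balances[row["credit_account"]] -= Decimal(row["credit_amount"])
    return dict(balances)

balances = account_balances_from_wide(WIDE_CSV)
```

Here Assets:Cash is debited in one row and credited in another, so a single column scan would miss half of its activity.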
Encoding 2: long (one posting per row)
The long encoding is common in databases and analytics pipelines:
one row per posting
a side column indicates debit vs credit
Columns in encoding_long.csv include:
tx_id, dt, narration
side: debit or credit
account
amount
This is more “relational” and is easy to group, filter, and join.
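As a rough illustration of that grouping and filtering, the sketch below totals debits and credits per transaction and selects the postings for one account. The rows are hypothetical and the helper names are invented for this example; they are not part of LedgerLoom.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical long rows (dt and narration omitted for brevity).
long_rows = [
    {"tx_id": "T1", "side": "debit", "account": "Assets:Cash", "amount": "5000"},
    {"tx_id": "T1", "side": "credit", "account": "Equity:OwnerCapital", "amount": "5000"},
    {"tx_id": "T2", "side": "debit", "account": "Expenses:Supplies", "amount": "200"},
    {"tx_id": "T2", "side": "credit", "account": "Assets:Cash", "amount": "200"},
]

def side_totals_by_tx(rows):
    """Group by tx_id and total each side."""
    totals = defaultdict(lambda: {"debit": Decimal("0"), "credit": Decimal("0")})
    for row in rows:
        totals[row["tx_id"]][row["side"]] += Decimal(row["amount"])
    return dict(totals)

def postings_for(rows, account):
    """Filter: every posting that touches one account, whatever its side."""
    return [row for row in rows if row["account"] == account]

totals = side_totals_by_tx(long_rows)
cash_rows = postings_for(long_rows, "Assets:Cash")
```

Because the account lives in one column, the filter is a single comparison rather than a check against two column pairs.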
Encoding 3: signed (single numeric measure)
The signed encoding is a long table where the numeric measure carries direction:
debits are positive
credits are negative
Columns in encoding_signed.csv include:
tx_id, dt, narration
account
signed_amount
Why this is powerful:
you can aggregate with one numeric column
you can build models on postings without pivot/unpivot steps
correctness is enforced by invariants (the sum of signed_amount must be zero per transaction)
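The zero-sum rule can be sketched as a single grouped sum. The rows below are hypothetical, and the second transaction is deliberately broken so the check has something to report; the function name is invented for this example.

```python
from collections import defaultdict
from decimal import Decimal

# (tx_id, account, signed_amount); debits positive, credits negative.
signed_rows = [
    ("T1", "Assets:Cash", Decimal("5000")),
    ("T1", "Equity:OwnerCapital", Decimal("-5000")),
    ("T2", "Expenses:Supplies", Decimal("200")),
    ("T2", "Assets:Cash", Decimal("-150")),  # deliberately wrong: should be -200
]

def unbalanced_transactions(rows):
    """Return {tx_id: residual} for every transaction that does not sum to zero."""
    sums = defaultdict(Decimal)
    for tx_id, _account, signed_amount in rows:
        sums[tx_id] += signed_amount
    return {tx_id: total for tx_id, total in sums.items() if total != 0}

problems = unbalanced_transactions(signed_rows)
```

The residual also tells you how far off the transaction is and in which direction, which a plain PASS/FAIL would not.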
Compiling encodings into the canonical journal
Each encoding is compiled into a list of Entry objects, each containing a date,
narration, and a list of Posting objects.
The canonical journal is written as deterministic JSONL so you can diff it, hash it, and treat it like a proper artifact.
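One common way to get deterministic JSONL is to fix the key order and separators before hashing. The sketch below assumes a simple entry shape (dt, narration, postings with account and amount as strings); LedgerLoom's actual field names and serialization settings may differ.

```python
import hashlib
import json

def to_canonical_jsonl(entries):
    """Serialize entries one per line with sorted keys and fixed separators."""
    lines = [
        json.dumps(entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for entry in entries
    ]
    return "".join(line + "\n" for line in lines).encode("utf-8")

def sha256_hex(data):
    """Hex digest of the serialized bytes, suitable for a manifest."""
    return hashlib.sha256(data).hexdigest()

# Same entry, built with different key insertion order (assumed shape).
entry_a = {
    "dt": "2024-01-01",
    "narration": "Owner investment",
    "postings": [
        {"account": "Assets:Cash", "amount": "5000"},
        {"account": "Equity:OwnerCapital", "amount": "-5000"},
    ],
}
entry_b = {
    "postings": [
        {"amount": "5000", "account": "Assets:Cash"},
        {"amount": "-5000", "account": "Equity:OwnerCapital"},
    ],
    "narration": "Owner investment",
    "dt": "2024-01-01",
}

digest_a = sha256_hex(to_canonical_jsonl([entry_a]))
digest_b = sha256_hex(to_canonical_jsonl([entry_b]))
```

Amounts are kept as strings here so that no floating-point formatting can leak into the bytes being hashed.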
Key point:
All three compiled journals are byte-identical.
That is the chapter’s “proof of equivalence”.
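A stripped-down version of that proof might look like the following. The three compile functions and the canonical ordering rule (sort by tx_id, then by account and amount) are assumptions made for this sketch, not LedgerLoom's real compiler; the sample rows are hypothetical.

```python
import json
from decimal import Decimal

def from_wide(rows):
    out = {}
    for r in rows:
        out.setdefault(r["tx_id"], []).extend([
            (r["debit_account"], Decimal(r["debit_amount"])),
            (r["credit_account"], -Decimal(r["credit_amount"])),
        ])
    return out

def from_long(rows):
    out = {}
    for r in rows:
        sign = 1 if r["side"] == "debit" else -1
        out.setdefault(r["tx_id"], []).append((r["account"], sign * Decimal(r["amount"])))
    return out

def from_signed(rows):
    out = {}
    for r in rows:
        out.setdefault(r["tx_id"], []).append((r["account"], Decimal(r["signed_amount"])))
    return out

def canonical_bytes(tx_postings):
    """Assumed canonical form: transactions and postings in sorted order."""
    lines = []
    for tx_id in sorted(tx_postings):
        postings = [{"account": a, "amount": str(s)} for a, s in sorted(tx_postings[tx_id])]
        lines.append(json.dumps({"tx_id": tx_id, "postings": postings}, sort_keys=True))
    return "".join(line + "\n" for line in lines).encode("utf-8")

wide = [{"tx_id": "T1", "debit_account": "Assets:Cash", "debit_amount": "5000",
         "credit_account": "Equity:OwnerCapital", "credit_amount": "5000"}]
long_rows = [
    {"tx_id": "T1", "side": "credit", "account": "Equity:OwnerCapital", "amount": "5000"},
    {"tx_id": "T1", "side": "debit", "account": "Assets:Cash", "amount": "5000"},
]
signed_rows = [
    {"tx_id": "T1", "account": "Assets:Cash", "signed_amount": "5000"},
    {"tx_id": "T1", "account": "Equity:OwnerCapital", "signed_amount": "-5000"},
]

journals = [canonical_bytes(f(rows)) for f, rows in
            [(from_wide, wide), (from_long, long_rows), (from_signed, signed_rows)]]
all_identical = journals[0] == journals[1] == journals[2]
```

Note that the long rows are listed credit-first on purpose: the comparison only succeeds because the canonical form does not depend on input row order.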
Invariants (the accounting “safety rails”)
LedgerLoom enforces the invariants that make double-entry bookkeeping work:
Each transaction balances (total debits == total credits)
In signed form: each transaction sums to zero (sum(signed_amount) == 0)
Trial balance is consistent with the journal
Financial statements are consistent with the trial balance
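The link between the journal and the trial balance can be sketched as follows: net each account's signed postings, place the result in a debit or credit column, and confirm the two column totals match. The postings and the function name are hypothetical, and the real trial_balance.csv layout may differ.

```python
from collections import defaultdict
from decimal import Decimal

# Journal postings as (account, signed_amount); debits positive.
postings = [
    ("Assets:Cash", Decimal("5000")),
    ("Equity:OwnerCapital", Decimal("-5000")),
    ("Expenses:Supplies", Decimal("200")),
    ("Assets:Cash", Decimal("-200")),
]

def trial_balance(rows):
    """Return sorted (account, debit_balance, credit_balance) rows."""
    net = defaultdict(Decimal)
    for account, signed_amount in rows:
        net[account] += signed_amount
    zero = Decimal("0")
    return [(account, max(net[account], zero), max(-net[account], zero))
            for account in sorted(net)]

tb = trial_balance(postings)
total_debits = sum(debit for _account, debit, _credit in tb)
total_credits = sum(credit for _account, _debit, credit in tb)
```

If every transaction in the journal sums to zero, the column totals must agree; a mismatch here means the trial balance was not derived from the same postings.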
These invariants are captured for humans in:
checks.md (PASS/FAIL)
diagnostics.md (hashes + explanation)
and for machines in:
run_meta.json/manifest.json
How to run
From the repo root:
# Run Chapter 02 demo (writes into outputs/ledgerloom/ch02)
python -m ledgerloom.chapters.ch02_debits_credits_encoding --outdir outputs/ledgerloom --seed 123
Or using the Makefile target (if available):
make ll-ch02
Where to look after running:
outputs/ledgerloom/ch02/
  encoding_wide.csv
  encoding_long.csv
  encoding_signed.csv
  journal_from_wide.jsonl
  journal_from_long.jsonl
  journal_from_signed.jsonl
  trial_balance.csv
  income_statement.csv
  balance_sheet.csv
  checks.md
  diagnostics.md
  tables.md
  lineage.mmd
  manifest.json
  run_meta.json
  summary.md
Recommended reading order
If you want the fastest “wow”:
Open summary.md (high-level tour)
Open tables.md (see the data)
Open checks.md (PASS/FAIL invariants)
Open diagnostics.md (hashes + reasoning)
Open manifest.json (artifact hashes + sizes)
Exercises
Add a new transaction
- Add a new wide row to the demo dataset in the chapter script.
- Regenerate the outputs and verify that all checks still PASS.
Create a multi-posting transaction
- Extend the demo so that one transaction has three postings.
- Hint: wide encoding becomes awkward; long and signed remain natural.
Stress-test your intuition
- Change only the ordering of rows in encoding_long.csv and rerun.
- The canonical journal should remain identical (stable grouping rules).
Developer notes
This chapter deliberately keeps the demo dataset small enough to read in one sitting.
Outputs are deterministic for a fixed seed to keep tests stable and diffs meaningful.
The canonical journal and reports are the “source of truth”; encodings are just views.
Next
Chapter 03 will introduce a Chart of Accounts schema so that account strings can be validated (and later, used for roll-ups and richer reporting).