The Halloween Problem
14 Aug 2023

This post first appeared in my new newsletter: Null Bitmap. If you're interested in getting smaller, more rambly posts directly in your inbox, please consider signing up.
What if you had an isolation anomaly with yourself?
The year is 1976, Don’t Go Breaking My Heart is blasting on the speakers of the IBM San Jose Research Center, and you’re Patricia Selinger.
You’re working on your research database System R, and you’ve got a table that looks something like this:
| employee         | salary |
+------------------+--------+
| Don Chamberlin   | 10000  |
| Pat Selinger     | 15000  |
| Morton Astrahan  | 21000  |
You want to run an update that will give every employee earning less than $20,000 a 10% raise. You concoct a query that will do this in the language you've designed, SQL (which is just a rough cut at a relational language, and which you're sure will be superseded before too long). It looks like this:
UPDATE employee SET salary = salary * 1.1 WHERE salary < 20000
You expect, of course, the result to look something like this:
| employee         | salary |
+------------------+--------+
| Don Chamberlin   | 11000  |
| Pat Selinger     | 16500  |
| Morton Astrahan  | 21000  |
To your surprise, it instead looks like this:
| employee         | salary |
+------------------+--------+
| Morton Astrahan  | 21000  |
| Don Chamberlin   | 21436  |
| Pat Selinger     | 21961  |
To understand why this happened, we have to look into how the `UPDATE` statement, and query engines at large, work.
Query Engines
Relational queries are typically compiled into tree-shaped plans, which then get executed, with data flowing from the leaves up towards the root. If we were planning a query like
SELECT employee, salary FROM employees WHERE salary < 20000
we might have an `IndexScan` operator that reads from a particular index and yields all of its rows in sequence, that feeds into a `Select` that filters out particular rows:
Select [ salary < 20000 ]
^
|
IndexScan [ employees_primary_index ]
Conceptually, how you might think of this is that we compute the value of each subtree entirely and then feed it upwards to its parent. This might take a lot of memory to do explicitly, so oftentimes the implementations are pipelined, where each row gets sent one at a time to the parent to get processed before the following row is processed.
Each row from the `IndexScan` individually travels up to the `Select`, where the filter condition gets evaluated, and the `Select` either emits the row or doesn't. These nodes at each level of the tree are called "operators."
Executing this query would give us the following result set:
| employee         | salary |
+------------------+--------+
| Don Chamberlin   | 10000  |
| Pat Selinger     | 15000  |
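To make the pipelining concrete, here's a toy sketch in Python. The operator and index names are my own invention, not System R's: each operator is a generator that pulls one row at a time from its child.

```python
# Toy model of a pipelined query plan: each operator is a generator that
# pulls rows from its child one at a time. A real engine is far more
# elaborate; these names are purely illustrative.

def index_scan(index):
    # Yield every row of the index, in order.
    for row in index:
        yield row

def select(child, predicate):
    # Emit only the rows that satisfy the filter condition.
    for row in child:
        if predicate(row):
            yield row

employees_primary_index = [
    ("Don Chamberlin", 10000),
    ("Pat Selinger", 15000),
    ("Morton Astrahan", 21000),
]

plan = select(index_scan(employees_primary_index),
              lambda row: row[1] < 20000)
print(list(plan))  # [('Don Chamberlin', 10000), ('Pat Selinger', 15000)]
```

Note that no operator ever holds more than one row at a time; the `list()` at the root is the only place the result set is materialized.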
Mutations, like `INSERT`, `UPDATE`, and `DELETE`, have their own operators, but instead of fulfilling the job of transforming the data and then passing it along, they act on it. So, when I write
INSERT INTO employees VALUES ('Don Chamberlin', 11000), ('Morton Astrahan', 16500), ('Pat Selinger', 21000);
That gets planned as
Insert [ employees ]
^
|
Values [
('Don Chamberlin', 11000),
('Morton Astrahan', 16500),
('Pat Selinger', 21000),
]
Where the job of the `INSERT` is that whenever it receives a row, it directly performs the mutation on the table.
The structure of this tree suggests something about what else might be possible here. It's possible that this limited form of `INSERT`, with an explicit list of `VALUES`, is the only one that you've been exposed to. If you're the kind of pervert reading this kind of deep dive on SQL, I doubt that very much.
`INSERT` is indeed more general than this. It's actually shaped like so:
INSERT INTO <table> <relational expression>
Of which `VALUES` is just one of many possibilities. `VALUES` itself is just a way of constructing a relational expression, and if I punch `VALUES` into my System R command line interface [1], I can execute it directly:
> values (1,2,3), (4,5,6), (7,8,9);
1|2|3
4|5|6
7|8|9
And so we can in fact substitute any expression into that `INSERT` statement.
> create table abc (a int, b int, c int);
> insert into abc values (1, 2, 3), (4, 5, 6), (7, 8, 9);
> create table def (d int, e int, f int);
> insert into def select * from abc;
> select * from def;
1|2|3
4|5|6
7|8|9
The sunny San Jose sky is the limit.
`UPDATE` is similar, but rather than taking a set to insert, it takes a set to issue a deletion and an insertion for.
UPDATE employee SET salary = salary * 1.1 WHERE salary < 20000
takes the set of rows in the table with `salary < 20000` and, for each one,

- issues a deletion of the row as it exists now, and
- issues an insertion of that same row with `salary * 1.1`.
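Here's a minimal sketch of that decomposition in Python. The table layout and the `plan_update` helper are hypothetical, purely for illustration:

```python
# Sketch: an UPDATE decomposed into (delete, insert) pairs. The table is
# modeled as a list of (employee, salary) tuples; names are hypothetical.

def plan_update(rows, predicate, transform):
    # For each matching row, emit a deletion of the old row followed by
    # an insertion of the transformed row.
    for row in rows:
        if predicate(row):
            yield ("delete", row)
            yield ("insert", transform(row))

table = [("Don Chamberlin", 10000), ("Morton Astrahan", 21000)]
ops = list(plan_update(table,
                       lambda r: r[1] < 20000,
                       lambda r: (r[0], round(r[1] * 1.1))))
# Only Don Chamberlin matches: one delete of the 10000 row, then one
# insert of the 11000 row. Morton Astrahan is left alone.
```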
`UPDATE` and `DELETE` are a bit trickier than `INSERT`, because they necessarily require reading from a table in order to do said operation.
We might compile this like so:
Update [ (employee, salary * 1.1) ]
^
|
Select [ salary < 20000 ]
^
|
IndexScan [ employees_primary_index ]
However, it seems Pat Selinger expected to be doing a number of queries that did lookups based on the `salary` of an employee, and added an index to that effect.
If she had such an index that supported range queries on `salary`, the query planner for System R could then avoid doing a full scan of `employees_primary_index` and scan this more suitable index instead. It could then avoid running the `Select` on every row and just read a subset of the index directly.
Update [ (employee, salary * 1.1) ]
^
|
IndexScan [ employees_salary_index, [0, 20000) ]
You might be able to see where we're going with this. In hindsight, "operators that, when you evaluate them, change their own inputs" sound pretty dangerous!
Indeed, the problem was that each time the `UPDATE` statement would update a row, it would insert the changed row into the `employees_salary_index` index, which would then be eligible for reading by the `IndexScan`. The result was that rows would be continually processed by the `Update` until they fell out of the range of the `IndexScan`.
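Here's a toy reproduction in Python, assuming the salary index is just a sorted list that the scan walks in order while the update deletes and re-inserts entries into that very same list (a hypothetical model, not System R's actual storage):

```python
# Toy reproduction of the Halloween problem. The salary index is a sorted
# list of (salary, employee) pairs; the scan walks it in order while the
# update deletes and re-inserts rows into the same list.
import bisect

salary_index = [(10000, "Don Chamberlin"),
                (15000, "Pat Selinger"),
                (21000, "Morton Astrahan")]

i = 0  # the scan's current position in the index
while i < len(salary_index):
    salary, name = salary_index[i]
    if salary < 20000:
        # The "update": delete the old entry and insert the raised one.
        # The new entry has a higher salary, so it lands *ahead* of the
        # scan, which will encounter it again.
        salary_index.pop(i)
        bisect.insort(salary_index, (salary * 1.1, name))
        # Don't advance i: the scan resumes from the same position.
    else:
        i += 1

# Every row gets raised repeatedly until it crosses 20000: Don Chamberlin
# receives eight consecutive "raises" before escaping the scan's range.
```

Run it, and every salary ends up above $20,000, reproducing the surprising result from the top of the post.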
Amateur Mistake?
When I first described this problem to a friend, her reaction was something like, “isn’t not modifying a collection while you’re iterating over it like, CS 101? How did anyone ever make that mistake?”
I want to defend the IBM team because I think this is actually more subtle than it first appears.
First of all, this behaviour is implementation-dependent. You could imagine a version of the `UPDATE` operator that didn't pipeline, and computed all (or even batches) of its input before it began processing any of it. I think it's not obvious that working off of this mental model is wrong, and indeed it isn't wrong when there are no mutation operations.
What’s more though, I think the actually desired semantics here are not obvious! You might say “well, what do you mean, surely a mutation shouldn’t see its own changes!” But I would expect this to work:
BEGIN
INSERT INTO t VALUES (1);
SELECT * FROM t; -- should return the 1 just inserted
COMMIT
This is something like a read-your-own-writes guarantee for transactions.
One of the fundamental tenets of relational databases was transactions, and to me transactions are fundamentally about composability: if I bundle together a collection of writes, that should have the same isolation guarantees as a single write by itself.
So it seems we have two conflicting desires:
- Encountering the Halloween problem [2] seems obviously broken (especially because, as we said before, said behaviour is implementation-dependent), but
- sometimes reads should see their own writes.
And of course you've probably guessed what the solution is: define the semantics to be that an individual statement's writes are hidden from the rest of the transaction until that statement completes.
There’s a couple common ways I know of to handle this. The simplest is spooling. This is where we introduce a special operator into our tree that buffers inputs until there’s no more, at which point it feeds the inputs to its parent:
Update [ (employee, salary * 1.1) ]
^
|
Spool
^
|
IndexScan [ employees_salary_index, [0, 20000) ]
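In the same toy Python model as before (again, hypothetical names and layout), spooling amounts to draining the scan completely before the update writes anything:

```python
# Spool: buffer the child's entire output before yielding anything, so
# the scan below finishes before the update above performs any writes.
import bisect

def spool(child):
    buffered = list(child)  # drain the scan to completion first
    for row in buffered:
        yield row

salary_index = [(10000, "Don Chamberlin"),
                (15000, "Pat Selinger"),
                (21000, "Morton Astrahan")]

def scan_below(limit):
    # Range scan over the salary index.
    for salary, name in salary_index:
        if salary < limit:
            yield salary, name

# The update reads from the spool's snapshot, so the rows it re-inserts
# into the index are never re-scanned.
for salary, name in spool(scan_below(20000)):
    salary_index.remove((salary, name))
    bisect.insort(salary_index, (salary * 1.1, name))
```

Because `spool` snapshots the scan's output up front, each qualifying row is raised exactly once, and the problem disappears at the cost of buffering the intermediate result.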
Another is to control the way writes become visible to reads. One way of thinking about visibility is that every transaction does its writes at a particular timestamp.
One operation that you can do on a timestamp-like thing is to subdivide it into smaller timestamp-like things by taking its lexicographic product with another timestamp-like thing. In this way, you can create a sort of barrier between successive statements.
BEGIN -- (transaction occurs at timestamp 36)
write a = 4 -- this writes at timestamp (36, 1)
write a = a + 1 -- this writes at timestamp (36, 2) and reads at timestamp (36, 1)
read a -- this reads at timestamp (36, 2) and sees a = 5
COMMIT
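A toy sketch of this scheme in Python (the store's structure is my invention): Python tuples already compare lexicographically, so a pair like `(36, 1)` serves directly as a subdivided timestamp.

```python
# Toy multi-versioned store. Each write is tagged with a (txn_ts, stmt_ts)
# timestamp; a read at timestamp t sees the latest version written at or
# before t. Tuple comparison gives us the lexicographic product for free.

versions = {}  # key -> list of (timestamp, value), in write order

def write(key, value, ts):
    versions.setdefault(key, []).append((ts, value))

def read(key, ts):
    # Latest version visible at or before ts (writes arrive in timestamp
    # order in this toy model, so the last visible one is the newest).
    visible = [v for wts, v in versions.get(key, []) if wts <= ts]
    return visible[-1] if visible else None

# BEGIN -- transaction at timestamp 36
write("a", 4, (36, 1))                        # statement 1
write("a", read("a", (36, 1)) + 1, (36, 2))   # statement 2 reads at (36, 1)
assert read("a", (36, 2)) == 5                # statement 3 sees a = 5
```

Each statement reads at the previous statement's timestamp, so it can see earlier statements' writes but never its own mid-statement writes.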
The third way is to be sneaky: if the original query hadn’t been reading from a problematic index (one that reorders the rows upon mutation), then we wouldn’t have this problem. So we might observe that if we can find an acceptable index to read from, we don’t have to do any mitigation at all, and everything will just work out.
[1]: That’s a joke, but if you work at IBM and are able to smuggle me out a copy of the System R source for historical purposes I will pay you handsomely.
[2]: Of course, it’s not called that because it’s “spooky,” it’s because it was discovered on Halloween. There is a lovely interview with Don Chamberlin on the development of System R that talks about the discovery of the Halloween Problem.