# Our Blog

Ongoing observations by End Point people

## Inside PostgreSQL - Clause selectivity

By Josh Tolley
June 5, 2009

One of the more valuable features of any conference is the so-called â€śhall trackâ€ť, or in other words, the opportunity to talk to all sorts of people about all sorts of things. PGCon was no exception, and I found the hall track particularly interesting because of suggestions I was able to gather regarding multi-column statistics, not all of which boiled down to â€śYouâ€™re dreamingâ€”â€‹give it a restâ€ť. One of the problems Iâ€™d been trying to solve was where, precisely, to put the code that actually applies the statistics to a useful problem. There are several candidate locations, and certainly quite a few places where we could make use of such statistics. The lowest-hanging fruit, however, seems to be finding groups of query clauses that arenâ€™t as independent as we would normally assume. Between PGCon sessions one day, Tom Lane pointed me to a place where we already do something very similar: clausesel.c

â€śClause selectivityâ€ť means much the same thing as any other selectivity: itâ€™s the proportion of rows from a relation that an operation will return. A â€śclauseâ€ť, in this case, is a filter on a relation, such as the â€śX = 1â€ť and the â€śY < 10â€ť in â€śWHERE X = 1 AND Y < 10â€ť. PostgreSQL uses functions in clausesel.c to find clauses whose combined selectivity differs from the product of their individual selectivities. For instance, in â€śWHERE X < 4 AND X < 5â€ť, the â€śX < 5â€ť is redundant; the clausesâ€™ combined selectivity is simply that of â€śX < 4â€ť. With â€śWHERE Y > 4 AND Y < 10â€ť, clausesel.c can determine that we really want the selectivity of the clause â€ś4 < Y < 10â€ť. Itâ€™s also smart enough to recognize â€śpseudo-constantsâ€ť: values from non-volatile functions, or from the outer relation of a nested loop. Although these values arenâ€™t truly constants, they remain constant at the level of the query where the clause will be applied, and can be treated as constants.

With any luck, one day clausesel.c will also know enough to notice cases where, for instance, although â€śfoo.x = 3â€ť and â€śfoo.y > 10â€ť are individually true for much of table â€śfooâ€ť, there are very few rows where both conditions are true.