[GRLUG] Database query logic

Fri Feb 3 20:51:49 EST 2012

On Fri, 2012-02-03 at 17:44 -0500, Eric Beversluis wrote:
> I've been doing some reading on The Raiser's Edge. I found it
> interesting  that, when explaining queries, they emphasized that the
> queries do "filtering." This raised the question for me whether
> relational DB queries actually "filter" or whether the RE talk of
> "filtering" is heuristic.

The answer in terms of modern relational databases is "both" and
"neither".  A modern engine like PostgreSQL/Oracle/DB2/Informix has a
sophisticated optimized that determines the most efficient path for
evaluating a query;  this can certainly include filtering [which is
essentially what a "sequential scan" is, but it also uses indexes and
statistics to skip around [discard] whole chunks of data].  This gets
even more complicated when multiple sets [tables] are joined.  One could
be pedantic and say 'all that is filtering' but it would push the
concept of what-is-a-filter pretty far.

A "filter" is essentially a point where an input stream is reduced.

> In other words, which of these two descriptions fits how a database
> query works?
> 1. The "selection" method: If I say "select FName from tblPeople," the
> programs opens a file and "copies" all the FName values to a new file
> (or into memory). 

It is better to think in terms of "streams" rather than "files".  The
term "file" brings in some baggage such as being random access [you can
skip around inside it].  Moving the pointer in the source violates the
concept of a "filter", IMNSO.  A filter receives a stream and reduces
it; if it influences the stream itself it is something else [not a
filter].

> 2.  The "filter" method: If I say "select FName from tblPeople," the
> program opens a file, deletes everything that is not "FName," and saves
> the result to a new file (or into memory). 

Urg, that would be weird, it would certainly not be an efficient
approach in a transactional system.

> It seems to me that these two approaches are in fact different and would
> be programmed very differently.

Yes.

> I've never thought of DBs as using the "filter" approach, but then I
> haven't thought very hard about it either.

You can have the database tell you exactly what it is doing - this is
referred to in SQL as "EXPLAIN".  So "SELECT name FROM people WHERE
email ILIKE '%@whitemice.org'" performs the query and gives you the
result, while "EXPLAIN SELECT name FROM people WHERE email ILIKE '%
@whitemice.org'" returns *how* the database would come up with the
answer and the anticipated cost.  Depending on the size of the table,
partitioning, fragmentation, cardinality, indexes, and sort order the
database may approach even this simple query in different ways.

See page 132 of
<https://sourceforge.net/projects/coils/files/WMOGAG-Coils.pdf/download>
for a non-trivial EXPLAIN example.

> Are these in fact two different logical approaches? If so, which do
> relational DBs use--or do some use one approach and some the other?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://shinobu.grlug.org/pipermail/grlug/attachments/20120203/9a2fff7f/attachment.pgp>