AssignmentTwo

Image(http://control.cs.berkeley.edu/stride.gif)

Assignment Two

== Administrative Details ==

Do this assignment in groups of 2, same as last time. The assignment is due Thu. 3/5.

Motivation

For your second assignment, you'll be cracking into the guts of the Postgres query executor to implement a special index scanning operator that can perform "Index Stride."

Index Stride was first introduced by Professor Hellerstein's research group at Berkeley, as part of a technique called Online Aggregation, implemented in the CONTROL project. If you are curious about the roots of this project, you can read the original research paper, or peek at a description of what index stride during online aggregation and how it might look like to a user of Online Aggregation (see here for a screenshot of a more professional user interface).

For our purposes in this assignment, full-function Online Aggregation is a bit too ambitious, so you are not responsible to read through all the links above. Instead, we will be implementing a simpler idea: a new SQL clause we call SHUFFLE BY. The idea is to return a query's output rows in an "shuffled" order that rotates between values of some column in the output. Wee are going to implement the SHUFFLE BY clause with a new Index Stride operator. For example, the query:

SELECT name, grade
  FROM Enrolled
SHUFFLE BY grade;

should return a tuple with an A grade, a tuple with a B grade, a tuple with a C grade, a tuple with a D grade, a tuple with an F grade, and begin with an A again until all tuples have been returned. The order of the names can be arbitrary. Adding the clause LIMIT 10 to the end of the query will cause it to return the first 10 of those tuples and then quit.

This is handy for providing a UI with an example tuple for each different value in the SHUFFLE BY key.

Technique

Index Stride is a new query operator ("iterator" in the terminology from class) that can achieve this behavior. The basic idea is to use a B+-tree index on the SHUFFLE BY keys (grade in the example above), to walk over the table "visiting" each key value in round robin order. For example, we visit grades "A" then "B" then "C".

We begin by opening a normal index scan whose parameters (its scan descriptor in Postgres terminology) are to find tuples "> -infinity". This should find the first tuple in the index order, with some key value K. In our example, we would find the first record in the index whose grade is "A".

Once we position the index scan on that tuple, we modify the scan descriptor to have a "search argument" (SARG) of "=K", save the scan descriptor for later in our executor's private state, and return the tuple we found.
In the example, we return the first "A" record, but save the scan descriptor, so when it's time to return another "A" record, we can find the second record.

On a subsequent request for a tuple, we open a new scan descriptor with a SARG of ">K", and locate the first tuple that matches -- it has a key with some value K2. We modify the newly opened scan descriptor to have a SARG of "=K2" and return the tuple we found.
For example, we find on the subsequent request, a record with its grade value "B", which we return. And like before, we save the scan descriptor for "B" records.

We repeat this process until we reach the end (highest key value) of the index (think about how we figure out we're there!). From this point on, as more tuples are requested, we visit the saved scan descriptors in our state in a round-robin fashion, until all of the table rows pointed to by the index have been visited. For example, now we have scan descriptors for each grade, "A" through "F". We can return grades "shuffed" by A, then B, then C, until F, and then A and B..., and so on.

Simple, huh? It is pretty simple, though you have to work out all the corner cases. Also, the Postgres executor structures are relatively complex, and it will take some time to familiarize yourself with them enough to do the assignment. We have tried to limit the pain involved by providing some handy skeleton code in the solution.

Some important themes of this assignment include:

Memory allocation using contexts: To write well-behaved Postgres code, we need to avoid using malloc()/free(), and instead allocate/free memory from postgres contexts that can be free'd en masse. This technique is sometimes called region-based memory management and is used in many server-style software systems. It hugely simplifies memory management and reduces the likelihood of leaks, double-frees, and other boogeymen of C programming.
Careful use of the buffer pool: in general, there are two implementation approaches we could take to the "save the current scan" as described above. We could simply store a pointer to the scan location in the buffer pool, or we could copy the information about the scan state into the heap memory of the IndexStride operator. The former would seem like the obvious right approach, except it means that we'd need to keep the buffer pages pinned for the duration of the scan. This might not be a big deal if the cardinality of the index keys is low...
Deep copying of structs: The latter approach above implies copying a C structure with an attribute of type pointer. We need to do this manually in C, and carefully.

To do

We've already taken care of the details of adding support to SQL for the SHUFFLE BY clause, and making the selected column available to the executor. You will be required to implement the IndexStride node in the Postgres query executor. Operators in Postgres are implemented as nodes, which encompass operators and more.

Recall Professor Hellerstein's lecture about query execution and the iterator model: each operator implements the iterator model (with init(), next(), close() and so on). In Postgres, each "executor node" corresponds to a particular query operator and implements an iterator "class" in a C design pattern. This provides a convenient encapsulation of the operator logic, but requires that we express the algorithm we want to implement in terms of the sequence of calls for init(), [next(), [...]) close().

IndexStride, like a table scan or index scan operator, sits at the bottom of the operator tree. In other words, it acts like a table scan or index scan, piping tuples up to other operators. These traditional scans (table / index) need to store only enough state to know how to fetch the next tuple. Our IndexStride operator is different, and will need to accumulate a potentially large amount of state: enough to know how to fetch the next tuple for each key in the SHUFFLE BY clause. In other words, IndexStride is a new type of scan operator that keeps more state in order to implement its shuffle behavior.

We have provided you with a skeleton file that contains the basic superstructure we've described, along with many helper functions to make it easier for you to navigate the internal structures.

Yikes, where do I begin??

We recommend that you begin by figuring out the basic algorithm you need, given the context of the iterator model.

Answer the following question: how does the prose description of the Index Stride operator and algorithm, when accessed by iterator methods, translate to pseudocode? It will be helpful to write down a pseudocode version that implements init(), next() and close(). Peeking at the skeleton code will provide some hints...

We've already created the struct definition you'll be using to store the operator state: IndexStrideState in ~~src/nodes/execnodes.h~~ src/include/nodes/execnodes.h. You should look over that whole file, and IndexStrideState in particular.
Looking at the implementations of other operators is a good idea. src/backend/executor/nodeIndexScan.c is the basic Postgres code for a traditional Index Scan (with a single scan descriptor). It will provide a lot of context for the structures we're manipulating, but is a very complicated operator! To get the basic idea of how operators are implemented, reading a simple operator like nodeLimit.c (which implements the LIMIT clause in SQL) will help.

Step-by-step

Prepare the environment

Check out the postgres v8.2.4 sources.

In order to avoid going over quota, you will want to start by deleting the large directories we used in the last project (~/pgdata, /home/tmp//postgres-8.0.3). This can be done with 'rm -rf <file/dir>'. If that still prompts you for every file to remove, try typing 'unalias rm' first. Clean up up any other temporary files you may have created: the quotas are very strict. When you're done:

> cd ~
> mkdir hw2
> cd hw2

> svn co file:///home/ff/cs186/sp09/postgres-8.2.4 /home/tmp/$USER/postgresql-8.2.4
> svn co file:///home/ff/cs186/sp09/hw2

> ln -s /home/tmp/$USER/postgresql-8.2.4

Patch the sources

We provided you with a patch called indexstride.patch. This contains the changes that we made to postgres (outside the executor) to support the SHUFFLE BY clause, and should be applied to the vanilla version you just checked out.

> gpatch -p0 -i hw2/indexstride.patch

Install the toy solution

cp hw2/nodeIndexstride.toy.c postgresql-8.2.4/src/backend/executor/nodeIndexstride.c cp hw2/nodeIndexstride.h postgresql-8.2.4/src/include/executor/ cd postgresql-8.2.4 ./configure --enable-debug --enable-cassert --prefix=$HOME/pgsql make make install

You can see that in the hw2 repository, we've provided you with 2 source files: nodeIndexstride.skel.c and nodeIndexstride.toy.c.

The first is a skeleton file containing method definitions and suggestions of where you should add code: this file corresponds in spirit to freelist.skel.c in HW1.
The second file is a working Postgres node / operator that implements a similar, but much simpler algorithm than Index Stride: it walks over the index keys one time only, and exits, providing a round-robin sampler that gives a single tuple for each key of the index. This is similar in spirit to freelist.lru.c in that (aside from actually compiling) it illustrates by example how to use and manipulate the structures we need to implement the real index stride.
You can choose whether you prefer to start by filling in the clean but bare skeleton, or modifying the illustrative but cluttered toy implementation.

== Try out the toy executor ==

Connect to Postgres, and put in the sample data in your hw2 directory. You must use an absolute path.

test=# DROP TABLE stride_test;
test=# CREATE TABLE stride_test ( country varchar(50), year int, pop float);
test=# CREATE INDEX i_stride_country ON stride_test(country);
test=# COPY stride_test FROM '<absolute path to svn-checkout hw2 directory>/a2_data.csv' DELIMITERS ',';

We have implemented a very simple (and incorrect!) version of SHUFFLE BY. You can try it out:

test=# SELECT * FROM stride_test SHUFFLE BY country;

You will probably notice that the toy operator has a buffer leak. See if you can fix that first.

Debugging HW2

HW2 is quite a bit more complex than HW1, and chances are you should learn to use the gdb, the GNU debugger. The following is a light introduction for folks who are new to gdb and/or Emacs. Details abound on the internets.

To set up the debugger, we essentially 1) run Postgres server as normal. 2) Find its pid and attach a gdb session to the process. 3) Set breakpoints. 4) Issue a SQL query that will fire the breakpoint.

First, get the PID of your Postgres backend. Run this query inside psql.

test-# SELECT * FROM pg_backend_pid();

(The following is just one way of setting up the debugger... your results and mileage may vary).

It's nice to ssh to a server with X forwarding. With X, you can run X-based/GUI Emacs. I also like to have vertically split panels, with gdb on one side and your source file on the other.

$ emacs &

In emacs...

Set up two windows, as vertical split panes
Launch gdb. Alternatively, in the Emacs GUI select Tools->Debugger(GDB) does the same. This should show a gdb prompt: (gdb)
In gdb, attach to the Postgres backend
Change to the directory with the relevant source file.
Set a break point at the function ExecIndexStride
Continue execution of Postgres backend

Ctrl-x 3 M-x gdb (gdb) attach (gdb) cd /home/tmp/$USER/postgresql-8.2.4/src/backend/executor (gdb) break ExecIndexStride (gdb) continue

In your terminal, back in psql land, enter the same query as above.

test=# SELECT * FROM stride_test SHUFFLE BY country;

This time, in gdb, your breakpoint will fire, and you're all set to debug.

For more information, check out our debugging tutorials. Or the GDB Manual

Requirements

You are expected to implement a working index stride operator than scans the table in a key-by-key fashion as described above. It must be stable over scans of large tables with high cardinality index keys, so you'll need to be careful about not leaking memory or pins. As we discussed in class, the operator needs to store state relative to the cardinality of the keys, so even a perfect implementation will have problems with certain data sets.

To achieve these goals, it's likely that you'll need to add code to nodeIndexStride in the places we've indicated in nodeIndexStride.skel.c:

/*************** SNIP ********************************/
	// your code goes here
/*************** SNIP ********************************/

Of course, it is not strictly necessary to do so, and you may find a better way. These are merely the places where we removed code from our working implementation. You'll need to write code to:

allocate and extend data structures to store index scan state, using region-based memory functions
manually copy c datastructures, to clone the scans
unpin pages when you're done with them
implement the basic multi-pass index stride algorithm

If you're a top-down thinker and want to look at how indexstride is invoked through the iterator API, and dig downwards to the datastructures, begin reading nodeIndexStride..c at the section header "executor boilerplate," and read downwards through the rest of the file. If you think bottom-up, and want to understand the basic data structure we'll be using to represent our state and work your way up from there, consider studying the definition of IndexStrideState in execnodes.h, and then see how the structure is manipulated in nodeIndexstride..c.

What to turn in

nodeIndexStride.c (your implementation of a fully-functional indexstride implementation)
MY.PARTNERS (list of the logins of your partners)

Good luck!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly