Skip to content

assignment.1.2

Joe Hellerstein edited this page Dec 8, 2015 · 1 revision

Part 2: Implementing the CLOCK and MRU Buffer Replacement Strategies. (70%)

Setting up PostgreSQL and Other Files/Scripts

Everything needed for the second part of the assignment is available in an svn repos at ~cs186/sp09/postgress-8.0.3 and ~cs186/sp09/hw1 on the instructional machines.

Note on conventions: In the labs we will use gray boxes with the '$' prompt to signify text that you should type in verbatim to the shell on one of the instructional machines using your cs186 account. Below we will provide an explanation of the commands you are entering.

To set up the environment start by:

$ /share/b/bin/mkhometmpdir
$ svn co file:///home/ff/cs186/sp09/postgres-8.0.3 /home/tmp/$USER/postgres-8.0.3
$ mkdir ~/hw1
$ cd ~/hw1
$ svn co file:///home/ff/cs186/sp09/hw1 hw1-code
$ ln -s /home/tmp/$USER/postgres-8.0.3 ./
  1. This is a script set up by the instructional computing support to create a special temporary directory. We will keep the source code and other large things here to prevent problems with disk quotas. You should clear this directory out (since it won't contain anything you can't recreate easily) after you are done with the project
  2. This checks out the slightly modified (for space reasons) version of the postgres source tree from Subversion.
  3. Create a directory for homework one in your home directory (~ is short hand for the current users home dir)
  4. Change into the directory we just created
  5. Check out the skeleton for the actual assignment (skeleton file of freelist.c). We are keeping this in subversion as well to ease deployment of possible updated that might need to be given out after the project is assigned. Typing 'svn update' from inside this directory will check for the latest version of the skeleton file at anytime.
  6. Create a symbolic link (like a pointer) to the temp directory copy of the postgres source. This is purely for your convenience.

You can build and install PostgrSQL by using these commands:

$ cd ~/hw1/postgres-8.0.3
$ ./configure --prefix=$HOME/pgsql --enable-debug --enable-cassert
$ make
$ make install
  1. Move to the checked out source tree.
  2. ./configure is a script found in many unix/linux programs that automatically customizes the build process for your specific computer. It is auto generated by autoconf (This might take a while).
  3. Build the source using make (This might take a while).
  4. Install postgres into the location you specified during configuration (--prefix=$HOME/pgsql).

Now that you have built and installed postgres, you can test your installation by running the following commands:

$ ~/pgsql/bin/initdb -D ~/pgdata
$ ~/pgsql/bin/pg_ctl -D ~/pgdata -l pre.log start
$ ~/pgsql/bin/psql template1
  1. Initializes a default empty database for postgres to use at ~/pgdata.
  2. Start the background server using the the blank environment you just initialized and the logfile ./pre.log (You can look in here for debugging information about the running server).
  3. psql is a shell that you can type SQL commands or special postgres commands into (starting with a ). Since we have not yet created a database yet, we will use the default one "template1".

Note: If you encounter an error like "FATAL: user "" does not exist", you may be trying to connect to another student's Postgres instance on the same port number. See the troubleshooting section below.

This will open a postgres prompt from which you can type SQL commands to the database. Below are some example commands to try out. Feel free to play around to get a good feel for SQL. Can you implement the examples we talked about in discussion section? To quit simply press ctl-d or \q. Note that the prompt below reflects that we are running not in the shell, but the Postgres interpreter.

template1=# CREATE DATABASE test;
template1=# \connect test;
test=# CREATE TABLE test_table (field VARCHAR(25));
test=# INSERT INTO test_table (field) VALUES('Hello World');
test=# SELECT * FROM test_table;
test=# DROP TABLE test_table;
  1. This is DDL to create a new database
  2. \connect lets you switch switch the connection to a new database (in this case from template1 to test)
  3. Create a table 'test_table" with a single field named 'field'.
  4. Insert the string 'Hello World' into the database. Note the use of single quotes instead of double quotes.
  5. Retrieve all the data from test_table
  6. Delete our experimental table.

For more information about PostgreSQL usage, you can consult the version 8.0 manual.

Modifying the Buffer Pool Replacement Strategy

In version 8.0.3 a 2Q strategy is implemented. If you would like to check it out, look at ~/hw1/postgres-8.0.3/src/backend/storage/buffer/freelist.c. For this part of the assignment you will be implementing two different strategies: MRU and Clock. You will then compare the performance of your implementation to the original. To help you out, we have provided the implementation for LRU.

== Files to peruse ==

You can add and manage any new data structures that you need. The existing code is not extremely clear, but it is understandable. It may take you a few hours (or more) to digest it. Since understanding the existing code is a significant part of the assignment, the TAs and Professors will not assist you in your understanding of the code base (beyond what we discuss here).

The actual buffer manager code is neatly separated from the rest of the code base. It is located in the postgres-8.0.3/src/backend/storage/buffer directory, and primarily made up of the files bufmgr.c and freelist.c. While bufmgr.c contains the implementation of the buffer manager, we are only interested in freelist.c, which defines the buffer manager strategy (e.g., LRU, MRU, CLOCK, etc.).

We have provided you with a commented example freelist.lru.c which details the functions you will need to implement. We recommend that you copy this to freelist.mru.c and freelist.clock.c in order to create your implementations. There is also stubs.c (which contains helper code needed to test) and buftest.c (a simple testcase). You may want to add additional tests to buftest.c as we will be doing this when we grade you. The provided Makefile will compile your implementations and generate three binaries (testlru, testmru and testclock). Check the output to make sure you have implemented the policy correctly.

Rebuilding Postgres With Your Strategies

Use the following commands to rebuild and install a modified version of postgres that uses the clock or MRU replacement policy that you created, assuming you named your files freelist.mru.c and freelist.clock.c

$ cp freelist.<clock or mru>.c ~/hw1/postgres-8.0.3/src/backend/storage/buffer/freelist.c
$ cd ~/hw1/postgres-8.0.3/
$ gmake
$ gmake install

Note: The make file contains an example of how to automate this.

Performance Testing

After you have finished implementing and testing your strategies with the simple harness, it is time to test their performance in the real server. We first start the server using a log file named for the test we are doing (Ex: mru16.log), with additional arguments to see more statistics in the log.

$ ~/pgsql/bin/initdb -D ~/pgdata
$ ~/pgsql/bin/pg_ctl start -D ~/pgdata -l <log name>.log -o "-B <numbuffers> -N 8 -o '-te -fm -fh'"
$ ~/pgsql/bin/psql test

You can see the output of this log in realtime in another window by typing:

$ tail -f <log file>

For the performance testing you will be creating two tables (raw_s_tuples and raw_r_tuples) and populating them with the provided data.

test=# CREATE TABLE raw_r_tuples (tname varchar(2), pkey int, num2 int, num3 int, num1 int, node varchar(16), inserttime float8, lifetime float8, testname varchar(64));
test=# CREATE TABLE raw_s_tuples (tname varchar(2), pkey int, num2 int, num3 int, num1 int, node varchar(16), inserttime float8, lifetime float8, testname varchar(64));

test=# COPY raw_r_tuples FROM '/home/ff/cs186/fa08/data/R' DELIMITERS ',';
test=# COPY raw_s_tuples FROM '/home/ff/cs186/fa08/data/S' DELIMITERS ',';

=== Testing Matrix ===

For the final part of the lab, we are going to have you run two different queries under different circumstances and report your results. The statistics will be output in the log file you specify for the server and will be in the following format:

LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
	!	0.065834 elapsed 0.068005 user 0.000000 system sec
	!	[36.306269 user 0.016001 sys total]
	!	0/0 [0/0] filesystem blocks in/out
	!	0/0 [0/1915] page faults/reclaims, 0 [0] swaps
	!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
	!	0/0 [134/88] voluntary/involuntary context switches
	! buffer usage stats:
	!	Shared blocks:         71 read,          0 written, buffer hit rate = 99.26%
	!	Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
	!	Direct blocks:          0 read,          0 written

Take special note of the buffer hit rate (99.26% in this example). Run each of the following queries at buffer sizes of 16, 32, and 64. Repeat the results for LRU, MRU and CLOCK. Record your results and the answers to the following questions in a text file performance.txt and turn it in with your submission. You'll also turn in a text file analysis.txt with a brief analsis of the results (see below).

Queries:

  1. SELECT * FROM raw_r_tuples;
  2. SELECT * FROM raw_r_tuples r, raw_s_tuples s WHERE r.pkey = s.pkey;

Run each query three (3) times for each of the three buffer sizes, for each of LRU, MRU, and Clock. You will be recording your minimum and maximum hit rates for the combinations of buffer sizes and policies, for both queries. Record the hit rate as a decimal to two places, without the (%) sign, e.g. 99.26. You'll be filling in the following table for both queries.

Clarification: Run the first query several times for each replacement policy and record your hit rate results. Then do the same for the second query. You don't need to clear your buffer between the two queries. The important aspect of this is that you detail in your analysis what you did and why you think you got the results that you got.

||||16,min||16,max||32,min||32,max||64,min||64,max|| ||LRU|||||||||||||| ||MRU|||||||||||||| ||CLOCK||||||||||||||

== Submission ==

What To Turn In

  1. performance.txt -- TEXT ONLY! Two (one for each query) performance comparison tables: three rows, six columns separated by space, which represent the two tables we asked you to fill. Put the second table below the first.
  2. analysis.txt -- In 1-2 paragraphs, discuss whether you think the hit rates for MRU and Clock are as expected from the discussion in class.
  3. freelist.mru.c -- Your implementation of the MRU policy
  4. freelist.clock.c -- Your implementation of the Clock policy
  5. MY.PARTNERS - Listing the other person that you worked with

How to Submit

  1. Save your five files performance.txt, analysis.txt, freelist.mru.c, freelist.clock.c,and MY.PARTNERS in a directory called "hw1p2" within your cs186 home directory.
  2. cd into hw1p2.
  3. Run: submit hw1p2