homeWork1

Overview

In the previous assignment we learned how to set up a database cluster (using initdb), how to start the postgres master process (using pg_ctl), how to create a specific database (using createdb), and how to interact with the database through the Object-Relational Mapping of Ruby on Rails.

Now we start with the actual C code hacking. The aim is to modify functionality in the backend server of PostgreSQL. For this first assignment, we focus on just one core module - the buffer manager. The actual coding for this assignment is minimal, given the highly organized and modular PostgreSQL code (the code for the buffer manager policy is cleanly separated from the rest of the code base). However, you have to figure out what code to modify, which is non-trivial! There are three major parts to this project:

* Understanding different buffer management strategies
* Examining, understanding and modifying existing code
* Testing your code in a real system

Please skim through this document to the end before getting to work!

Partners

There are two parts to this assignment, the first of which is to be done individually and the second in teams of two people. Please see the deadlines below for when and how to register your team.

Deadlines

You have to submit this assignment in two main parts:

Register your team of two at the Group Registration page as soon as possible.
The first part will test your understanding of different buffer replacement strategies. This part must be done individually and is due at 11:59 pm, Friday, Sept. 14.
The second phase requires that you and your partner code a buffer replacement policy and create tests code. The due date for the part is 11:59 pm, Friday, Sept. 21.

The second part is expected to take much more time than the first, so manage your time accordingly. The rest of this document describes your tasks in each phase.

Tasks and Grade Percentage

Understanding different buffer replacement strategies. (20%)
Implement the LRU buffer replacement strategy. (80%)
Add test code using the provided test harness. (0%)

The rest of this document describes these steps in detail. Note that in the code and in this document a "buffer" or "slot" is what the text and course notes refer as a "buffer frame". Similarly, the terms "refcount" and "pincount" are used interchangeably.

Part 1: Understanding Different Buffer Replacement Strategies. (20%)

The textbook describes LRU, MRU and CLOCK strategies in section 9.4.1. However, you may need to read the entire section (9.4) to get a full image of a buffer manager in the DBMS. A fourth (more recent) strategy is called 2Q, and is used in Postgres v8.0.3. LRU, MRU, and CLOCK strategies work well enough for most systems so we are going to replace 2Q with a LRU strategy. But first you'll need to exercise your understanding of the LRU, MRU, and CLOCK buffer replacement strategies.

This part of the assignment requires no code. We will give you the number of page slots that the buffer manager must manage and a sequence of data blocks accessed by the different DBMS layers above the buffer manager layer. You will have to track the behavior of the buffer manager (which pages it has to replace, when, etc.) using your, and yours alone, knowledge of the three aforementioned buffer replacement strategies.

Solution format for this step

You must record your answers in three plain text files called strategy.lru, strategy.mru, strategy.clock. These files will contain the buffer/slot state and buffer manager replacement decisions in accordance to the LRU, MRU, and CLOCK policies respectively. The format of each file is as follows:

* Three columns of text, separated by spaces
* One line of text each time a page miss occurs
* Each row lists the time of a page miss, a space, the buffer pool slot (P1, P2, P3, P4), a space and, if an eviction occurs, the page evicted from memory

Example workload and solution

For example, assume there are four page slots your buffer manager must manage: P1, P2, P3, P4. All four slots are initially empty. When the buffer pool has unused slots (e.g., in the beginning when all four slots are empty), it will put the newly read data in the first unused slot. The blocks to be read from disk are labeled A through F. For simplicity here, we assume that after each access the page is pinned, and then immediately unpinned. (You cannot make this same assumption when writing the actual code. A page may be pinned for any length of time in a DBMS.)

Given this information and the following workload:

||Time||T1||T2||T3||T4||T5||T6||T7||T8||T9||T10||T11|| ||Block Accessed||A||A||B||C||D||E||F||A||B||C||D||

the files should be as follows:

||strategy.lru ||strategy.mru ||strategy.clock ||T1 P1 ||T1 P1 ||T1 P1 ||T3 P2 ||T3 P2 ||T3 P2 ||T4 P3 ||T4 P3 ||T4 P3 ||T5 P4 ||T5 P4 ||T5 P4 ||T6 P1 A ||T6 P4 D ||T6 P1 A ||T7 P2 B ||T7 P4 E ||T7 P2 B ||T8 P3 C ||T11 P3 C ||T8 P3 C ||T9 P4 D || ||T9 P4 D ||T10 P1 E || ||T10 P1 E ||T11 P2 F || ||T11 P2 F

Several points deserve elaboration:

Certain times (e.g., T2) are missing for each strategy. This is due to the different buffer miss patterns exhibited by the given replacement policy.
The third column will be blank when the buffer miss does not result in an eviction.
When there are multiple FREE pages available, the page with the lowest id is chosen (e.g. P2 before P3 and P4 when all are free at T3). You should conform to this tie-breaking criteria.

#!html (Updated tie-breaking clarification 8:10PM, 10 Sep).
In this case, for the given number of pages and sequence of block accesses, MRU emerges the winner (least number of misses).

Now we come to the problem that you will have to solve. Suppose the buffer manager has five page slots to manage: P1, P2, P3, P4, P5. All slots are initially empty. Suppose seven disk blocks (A, B, ..., G) are accessed by the layers above the buffer manager layer in the DBMS.

To get your access pattern, login to your cs186 account and type mypattern at the prompt. Remember that you must use your cs186 class account to get your access pattern. Based on this information, you should perform a similar analysis as the one above.

Generate the three files strategy.lru, strategy.mru, and strategy.clock for this problem. Stick to the specified format (e.g., no extra line at the end of a file), as we will use file comparison scripts to evaluate your answers.

How to Submit

You will need to use the unix submit program to hand in your assignment:

Save your three files strategy.lru, strategy.mru, strategy.clock in a directory called "hw1p1" within your cs186 home directory.
cd into hw1p1.
Run: submit hw1p1

Part 2: Implementing the LRU Buffer Replacement Strategy. (80%)

Setting up PostgreSQL and Other Files/Scripts

Everything needed for the second part of the assignment is available at: ~cs186/Hw1/ on x86 solaris instructional machines. Because the source tree of Postgres is quite large, you'll need to delete the database you have setup for homework 0 before you begin. So start by:

 $ \\rm -rf ~/pgdata

Be careful! This command will not prompt you for delete confirmation. Once you have removed homework 0's database, run:

 $ hw1quickinstall.sh              # quick still takes around 10 minutes
 ...                               # lots and lots of output
 PostgreSQL installation complete. # this is the last message that appears if all went well

Running hw1quickinstall.sh again WILL destroy any of your existing work in ~/Hw1 so please be careful.

If installation is successful, this will perform the entire installation of Postgres to your account:

Copies all needed hw1 files ~~from ~cs186~~ from our subversion code repository. Note that you are placing the main Postgres portion in the /home/tmp directory (which has no disk quota) while placing your personal files in your home directory (which has a very limited disk quota).
Configures and makes Postgres from source and installs binaries into ~/pgsql.

#!html
<span style="color: blue">

The hw1quickinstall.sh script has been modified to use subversion version control to ease possible future rollouts of updates/corrections and also simplify the compilation process. The main changes are:

freelist.skel.c and freelist.postgres-original.c are provided for your reference in /MyCode, are part of version control, and should not be modified directly.
You should create your own freelist.c (e.g. by copying freelist.skel.c) within /MyCode directory.
To compile test harness: gmake from /MyCode (no change from before)
To compile postgres: gmake mypgsql from /MyCode
To compile original postgres (assuming you only modified freelist.c): gmake origpgsql from /MyCode

To get these new changes, please rerun hw1quickinstall.sh. Warning: It will wipe out everything in ~/Hw1 so backup any changes you have made.

Note that you can [source:hw/hw1/student/Hw1 browse the code] -- and any modifications we have made -- via trac's interface to subversion.

(Updated 12:40AM, 11 Sep).

#!html
</span>

You should see the following directory structure within ~/Hw1.

postgresql-8.0.3/: Source code of PostgreSQL version 8.0.3.
MyStuff/: This is the directory in which you will spend most of your time. It has a couple of subdirectories:
- MyCode/:
  - freelist.c: While this file does not exist yet, you will need to implement it, most likely starting from copying freelist.skel.c.
  - freelist.postgres-original.c: The buffer policy 2Q used by PostgreSQL. It can serve as example code for each of the routines required by the freelist interface. A copy can always be found in ~cs186/hw1/Hw1/postgresql-8.0.3/src/backend/storage/buffer/freelist.c
  - freelist.skel.c: The skeleton code that you will need to fill in. Again, PostgreSQL ships with 2Q. HINT: Focus on the comments in the skeleton code. They tell you the purpose of a particular routine along with a high level description of what the code should look like.
- PerformanceEvaluationScripts/: This has the scripts that you can use to carry out PostgreSQL experiments after you have written completed freelist.skel.c and compiled it successfully. The arguments to the provided shell scripts are self explanatory (all you need to do is supply the location of where you want your data directory to be placed e.g., DoInitDB.sh ~/pgdata, PrepareTables.sh ~/pgdata, Query.sh ~/pgdata). The Query.sh shell script will execute a query, producing output to stdout.

To test that your new install works:

 $ cd ~/Hw1/MyStuff/PerformanceEvaluationScripts
 $ ./DoInitDB.sh ~/pgdata
 $ ./PrepareTables.sh ~/pgdata
 $ ./Query.sh ~/pgdata

After you have prepared your version of freelist.c (using freelist.skel.c as a starting point), and have tested it using the test harness code, you will need to install it in the PostgreSQL source tree. The path to the buffer manager policy is postgresql-8.0.3/src/backend/storage/buffer/freelist.c, which gmake mypgsql will replace for you, in addition to compiling and installing postgres.

  $ cd ~/Hw1/MyStuff/MyCode
  $ gmake mypgsql
  ...
  PostgreSQL installation complete.
  gmake[1]: Leaving directory `/home/tmp/cs186-dk/postgresql-8.0.3'

The original PostgreSQL 8.0.3 freelist.c is available in /MyCode as freelist.postgres-original.c. You can also compile and install it as a point of reference (this was done for you automatically as part of hw1quickinstall.sh):

  $ cd ~/Hw1/MyStuff/MyCode
  $ gmake origpgsql
  ...
  PostgreSQL installation complete.
  gmake[1]: Leaving directory `/home/tmp/cs186-dk/postgresql-8.0.3'

NOTE: If you're working on the instructional machines and need to kill you postmaster instance (due to it not shutting down properly) you can use the kill_postmaster command. In most cases you should be able to cleanly shutdown using pg_ctl -D <data directory> stop. Also, we STRONGLY recommend that you do not modify any other files in the postgresql-8.0.3 directory. Moreover, we suggest you work from your /MyCode directory and replace freelist.c in src/backend/storage/buffer/ only when you need to test your freelist.skel.c changes.

About Subversion

Subversion is a flexible version control system which is primarily used here to push out corrections/modifications to the homework as they may occur. In this way, the command that will be most useful for you is:

svn update: Receive all the updates to the current directory tree, if any. If a correction is made, an blog/newsgroup announcement will most likely instruct you to use this command to synchronize your files with the master.

Implementing LRU

The description of the LRU algorithm is available in the course textbook (Section 9.4). While there are many possible correct LRU implementations, two reasonable implementations are:

Stack: When a page is referenced, it is moved to the top of the stack. When it is time to replace a page, the page at the bottom of the stack is selected for replacement. You should implement Stack LRU.
Counters: Each page entry keeps a last-use attribute. When a page is referenced, its last-use attribute is assigned to the current logical clock or counter, and the logical clock or counter is then incremented. When it is time to replace a page, the pages are searched and the page with the smallest last-use value is selected for replacement. You should NOT implement Counters LRU.

For this assignment, you should implement the first option Stack LRU. The Stack approach is typically regarded as the better choice for software-based LRU implementations. For more information on these implementation choices, you may consult The Dinosaur Book.

Files of interest

You can add and manage any new data structures that you need. The existing code is not extremely clear, but it is understandable. It may take you a few hours (or more) to digest it. Since understanding the existing code is a significant part of the assignment, the TAs and Professors will not assist you in your understanding of the code base (beyond what we discuss here).

The actual buffer manager code is neatly separated from the rest of the code base. It is located in the postgres-8.0.3/src/backend/storage/buffer directory, and primarily made up of the files bufmgr.c and freelist.c. While bufmgr.c contains the implementation of the buffer manager, we are only interested in freelist.c, which defines the buffer manager strategy (e.g., LRU, MRU, CLOCK, etc.).

src/backend/storage/buffer/freelist.c: Has functions that implement the replacement strategy. This is the only file you need to replace using your filled in freelist.skel.c. While we do not cover the default Postgresql 2Q implementation in class, it is not critical, but may be helpful to understand this code as you begin your LRU implementation.
src/include/storage/buf_internals.h: Contains the definition of each buffer frame (called BufferDesc). Most of the fields in BufferDesc are managed by other parts of the code. The fields that you'll be interested in are the BufferDesc.refcount. The reference field is set to true by the buffer manager when the refcount (a.k.a. pin count) transitions from 1 to 0. You should convince yourself that this yields the proper semantics needed to successfully execute the LRU algorithm.
src/backend/storage/buffer/buf_init.c: Some initialization of the buffer frames occur in this file. However, you should do all your initialization in freelist.c (see StrategyInitialize routine in freelist.c).
src/backend/storage/buffer/README: Useful description of the Strategy interface implemented by freelist.c

How to debug

You should ONLY proceed to debugging postgresql after you ensure that the logic of your freelist.skel.c is correct with the Test Harness discussed after this section!

We suggest that you debug using the logging/printing feature: elog(DEBUG1,<format>,<arg>). Please see bufmgr.c for example uses of this routine.

Also, you may use gdb with the src/backend/postgres standalone executable. Make sure you run your version of postgres and not the one in the class account directory! To start the backend server in stand-alone mode, run src/backend/postgres -D <DATADIR> test. It will use the data directory (and data) you prepared using initdb -D <DATADIR> and database test you created using createdb test. Alternatively, you can use the scripts in /MyCode/PerformanceEvaluationScripts to quickly setup your database. The postgres interface is directly to the backend, so psql features will not work. If your binary does not work correctly, it may corrupt the data, so be warned.

Through the backend, you can directly type SQL statements, although the output is somewhat painful to read. To exit, type CTRL-D. Finally, you can restrict the number of buffers PostgreSQL will use by adding the -B <nbuffers> option (<nbuffers> should not be smaller than 16.) so that even small queries generate buffer misses.

In addition, there are self-guided debugging tutorials if you are interested in learning how to use gdb, cscope and other tools.

More information on using the backend in stand-alone executable can be found on the PostgreSQL web site: http://www.postgresql.org/.

Test harness code. (0%)

A small test harness stub program can be found in Hw1/MyStuff/MyCode/buftest.c. This stub file tests the correctness of your buffer policy (freelist.c) by bypassing the PostgreSQL code and directly testing your freelist.c implementation.

Copy your version of freelist.skel.c to freelist.c and run gmake to compile the test harness stub. Full details about using the test stub are available in a README file within the /MyCode directory.

For the final task of this assignment, you will create buffer access patterns in the buftest.c file (part 1 of this assignment should assist you in this effort). The goal is to add access patterns that ensure your code provides the correct functionality (e.g., runs LRU algorithm, returns unpinned and unreferenced pages on request, etc.). You will receive no credit for this part, but we ask that you submit your buftest.c anyway. It should be clear from looking at the code in buftest.c, how to go about writing extra test cases. We strongly recommend that you make sure the test harness runs properly on your code before installing it into PostgreSQL. However, a working test harness does not imply a working version of your buffer manager within the confines of PostgreSQL.

Please be sure to submit both buftest.c and freelist.c in the final submission. Again we emphasize, there is no credit for this part (so you don't have to do anything with buftest.c if you don't want) but adding (sound) tests to buftest.c will help verify the correctness of your buffer manager.

Final submission

How to submit: You will need to use the unix submit program to hand in your assignment:

Save your freelist.c and buftest.c files in a directory called "hw1p2" in your cs186 home directory.
cd into hw1p2.
Run: submit hw1p2

Warning: Compilation errors in freelist.c or buftest.c will be harmful to your credit on the respective parts. If your code files depend on changes that you made to other files (not freelist.c or buftest.c), then you should redesign your solution without such modifications.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly