Hw4Description

Joe Hellerstein edited this page Dec 8, 2015 · 1 revision

0. Late-Breaking Updates!!

  • In case you didn't hear in class, this project is to be done in teams.

  • A Tip: Here are some PostgreSQL tricks to convert between floats and boxes.

    1. Box to Floats: Recall the cali table from HW2 that had a location field of type box. The following query returns the four components of the box as 4 floats:

    select (location[0]::point)[0] as lower_x, (location[0]::point)[1] as lower_y, (location[1]::point)[0] as upper_x, (location[1]::point)[1] as upper_y from cali limit 1;

    2. Floats to Box: Consider a table tmp defined as follows:

    create table tmp(lx float, ly float, hx float, hy float);

    The following query turns each row of tmp into a box:

    select box(point(lx,ly), point(hx, hy)) from tmp;

  • Another potentially useful tidbit is that Postgres supports indexes on expressions, not just columns.

1. Overview

In this assignment you will do some database-backed Ruby on Rails programming. The goal is to expose you to more of the software engineering issues involved in developing a working web application, including the use of Web Services, caching, and Test-Driven Development. You might also make use of your previous PostHacking, if you like, in the context of a simple Geographic Information Systems interface.

We recommend that you read through this project description all the way, figure out what other resources you may need to read, and get going ASAP! The project is not hard; in fact, we think it should be a fun and useful experience. But it will require you to take the initiative and pick up a bunch of Rails skills on your own, and that's best done before the last minute.

The core task of the assignment is to improve the performance of a web application that is running too slowly. The current application, Inspirational Libraries, builds on Hw3, and shows a single library and multiple authors on a Google map. The authors are placed at their place of birth, and ordered by distance from the library (sound familiar?). Your (imaginary) customer is excited to see how prestigious a given library is by how many famous authors were born near it. A screen shot of Gustav Library in Rome, along with the 10 closest authors, is shown below.

[Figure: gustav.jpg, screen shot of Gustav Library in Rome with the 10 closest authors]

The problem is that the application's latency is too high -- it takes over 3 seconds per page-load with a tiny database of 17 authors, and nearly 100 seconds for a small database with 500 authors. The customer wants to scale that up to many thousands of authors, and has solicited your help with improving the webapp's performance.

2. Quick Install + Manual Step

You can use a quick install script to set up the assignment, by typing:

% hw4quickinstall.sh

at the prompt. Even when the installation succeeds, you may see some error messages (refer to Hw3). This is normal.

Now, you need to do the following steps yourself to access Google Maps programmatically:

  1. Make a note of your class account's WEBPORT, by running the command "echo $WEBPORT".
  2. Sign up for a Google Maps API key at: http://www.google.com/apis/maps/signup.html
    a. If you do not have a Google account already, you will have to sign up for one (they're free) at: https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=en
    b. Type in: http://rhombus.cs.berkeley.edu:$WEBPORT into the box that says "My web site URL:" where $WEBPORT is your assigned port number.
  3. Google will assign you a key, something like "ABQIAAAAomFGGGfxhNe0GlMO8JxKKRQRVYwNglGoES86b9QDHVwSt2ocYBQeNyAxuhfhJx2OjU_39g_jEKmiDQ". Save this key somewhere handy, in case you ever need it again!
  4. Edit ~/Hw4/library/config/gmaps_api_key.yml, replacing whatever key is there with the key Google assigned you (you will need to do it three times).

2.1 Library Database

The first time you install, you do not have to start Postgres, create the database, or load the database. The install script does this for you. However, here are some instructions in case you want to redo this yourself, and/or modify things.

These commands should restart postgres successfully in case things get wacky:

% kill_postmaster                # shouldn't be necessary right after hw4quickinstall.sh
% pg_ctl -D ~/Hw4/pgdata start

The schema for your database is (of course!) stored in the Rails migration files in Hw4/library/db/migrate. If you need to recreate your database and build the tables, recall the command to run migrations that we learned in Homework 0:

% rake db:migrate

2.2 Loading Data

Sample data for your database has been stored in Rails fixtures, in the Hw4/library/test/fixtures directory. Fixtures are text files (in our case, comma-separated-value files ending with .csv) that can be used to load the database for testing. If you lost your data for some reason, you can reload it from the fixtures via the command:

% rake db:fixtures:load

In addition to the fixtures, we have provided you more phony authors to load in for testing, courtesy of http://www.fakenamegenerator.com via Swivel. There are files with 500 and 20,000 authors respectively, stored in Hw4/library/db. To load the 500-author file, for example, you can do something like the following:

% psql hw4
hw4=# copy authors from '/home/cc/cs186/fa07/class/cs186-XX/Hw4/library/db/500authors.csv' with csv header;
hw4=# \q
%

replacing XX with your class account suffix.

3. Library Rails Application

The skeleton Rails application is in the ~/Hw4/library directory. Depending on which machine you have your DBMS running on (probably the machine you are working on), you might need to change the host: rhombus.cs.berkeley.edu line of the ~/Hw4/library/config/database.yml to reflect the correct server name. After you do that, you should be ready to fire up the application, by typing:

% cd ~/Hw4/library
% server

As a running example, we will use http://rhombus.cs.berkeley.edu:16157 as the URL of our application; the number at the end is your WEBPORT environment variable (which you can find from a shell prompt via echo $WEBPORT.)

Once the webserver boots up, try visiting:

http://rhombus.cs.berkeley.edu:16157/librarymaps

This is the only view you have to consider in this assignment. Notice how the page took a long time to load (more than 3 seconds even for the small fixtures database). Use the pull-down menu at the bottom of the page to try out a different library as well. (If you received Google Maps warnings about an invalid key, you probably missed the key signup step above.)

4. Your Task

You've been given the broad objective to improve the responsiveness of the website. Start by reading the brief code of librarymaps_controller.rb. Lucky for you, there are plenty of options to make this simple web application go faster. (Many details are in the comments.) Two options that will probably get you a long way are:

  • The server is doing too many Remote Procedure Calls (RPCs) to Google Maps on every page request. A cache using the local database might really speed things up. In practice, such a cache might be required to keep your site from going down -- Google only allows 50,000 geocode requests per day, and with a database of 20,000+ place names, that means you had better cache!
  • The server is recomputing relative distances of all the authors' birth places to a library on every page request. These relative distances could be cached, or for a really large number of authors, a GiST index supporting nearest neighbor search would help.
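The first option can be sketched in plain Ruby as a read-through cache. This is only an illustration, not the structure of the provided application: the class name GeocodeCache, the fake_geocoder lambda, and the Hash standing in for a database table are all assumptions, and the coordinates returned by the stand-in geocoder are made up.

```ruby
# Hypothetical read-through cache for geocoding results.
# A Hash stands in for the database table a real solution would use.
class GeocodeCache
  attr_reader :remote_calls

  # geocoder: any object responding to call(place_name) that returns
  # [latitude, longitude]. store: a Hash used as the backing store.
  def initialize(geocoder, store = {})
    @geocoder = geocoder
    @store = store
    @remote_calls = 0
  end

  # Return the cached coordinates if present; otherwise make one remote
  # call, remember the answer, and return it.
  def lookup(place_name)
    key = place_name.strip.downcase   # normalize so near-duplicates share an entry
    return @store[key] if @store.key?(key)

    @remote_calls += 1
    @store[key] = @geocoder.call(place_name)
  end
end

# Stand-in for the real Google Maps RPC; the returned numbers are fake.
fake_geocoder = lambda { |place| [place.length.to_f, -place.length.to_f] }

cache = GeocodeCache.new(fake_geocoder)
first  = cache.lookup("Rome, Italy")      # miss: one remote call
second = cache.lookup("  rome, italy ")   # hit: served from the store
puts "remote calls: #{cache.remote_calls}"
```

Using key? rather than a truthiness check means a place that geocodes to nil is also remembered, so a bad place name does not trigger a fresh remote call on every page load.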

Designing a local database cache of Google Maps RPC results and distance computations means you will likely need to modify and/or add RoR models. This means you should also modify the database schema in any way you see fit: as the designer, you have a lot of flexibility here. Just remember to make any schema changes by adding Rails migrations. Additionally, you will likely need to modify librarymaps_controller.rb to take advantage of the modifications to the models. You don't need to modify any other controllers. While you are encouraged to spruce up the .rhtml views in any way you see fit, changing anything under app/views is not required. If you do change any views, ensure that your submitted files work with the original app/views/librarymaps/index.rhtml view.
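As a rough illustration of caching distance computations, here is a plain-Ruby sketch. It makes several assumptions that are not taken from the provided code: the class name DistanceCache, the nested Hash standing in for a database table keyed by library and author, and the use of straight-line distance between [x, y] points. The metric the application actually uses may differ.

```ruby
# Hypothetical cache of library-to-author distances.
# A nested Hash stands in for a table with (library_id, author_id, distance).
class DistanceCache
  def initialize
    @distances = Hash.new { |hash, library_id| hash[library_id] = {} }
  end

  # Straight-line distance between two [x, y] points (an assumption).
  def self.distance(a, b)
    Math.hypot(a[0] - b[0], a[1] - b[1])
  end

  # Compute and store any distances for this library that are not cached yet.
  # author_points maps author_id => [x, y].
  def warm(library_id, library_point, author_points)
    author_points.each do |author_id, point|
      @distances[library_id][author_id] ||=
        self.class.distance(library_point, point)
    end
  end

  # Ids of the k closest authors, using only cached distances.
  def nearest(library_id, k)
    @distances[library_id].sort_by { |_author_id, d| d }.first(k).map(&:first)
  end
end

cache = DistanceCache.new
authors = { 10 => [3.0, 4.0], 11 => [1.0, 0.0], 12 => [6.0, 8.0] }
cache.warm(1, [0.0, 0.0], authors)
p cache.nearest(1, 2)
```

In a Rails solution the same idea would live in a model backed by a table created through a migration, so that the cached distances survive server restarts.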

Your customer will be happy as long as the libraries and the 500-author version of the database load on the map within a reasonable amount of time (ballpark less than 1 second) and with the correct set of authors, distances and libraries. If you can support the 20,000-author version efficiently, that would be much more realistic, but is not required to get full credit (in part because it might be very slow for all of you to populate your caches from Google.) And beware: if your cache doesn't work, you will make 20,000 RPC calls to Google for each page load. After 2 page loads, you will not be able to call Google for the rest of the day!
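The warning about the daily limit is simple integer arithmetic, sketched below. The helper name page_loads_before_limit is hypothetical, and the sketch assumes one geocode request per author on every uncached page load, as described above.

```ruby
DAILY_GEOCODE_LIMIT = 50_000   # requests per day, from the text above

# Number of complete page loads that fit within the daily limit when every
# page load issues calls_per_page geocode requests (integer division).
def page_loads_before_limit(daily_limit, calls_per_page)
  daily_limit / calls_per_page
end

large = page_loads_before_limit(DAILY_GEOCODE_LIMIT, 20_000)
small = page_loads_before_limit(DAILY_GEOCODE_LIMIT, 500)
puts "20,000 authors, no cache: #{large} page loads per day"
puts "500 authors, no cache: #{small} page loads per day"
```

With a working cache, each distinct place name costs one request in total rather than one per page load, so the limit only matters while the cache is warming up.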

4.1 Generating New Models

Recall from Homework 0 that the proper way to generate new Rails models and associated Postgres tables is to use the script/generate command. You should not create tables in Postgres directly; this will make it difficult for you to manage your code (and impossible for us to grade it!).

Here is the usual drill for creating a new model called newmodel:

% cd Hw4/library
% script/generate model newmodel
      exists  app/models/
      exists  test/unit/
      exists  test/fixtures/
      create  app/models/newmodel.rb
      create  test/unit/newmodel_test.rb
      create  test/fixtures/newmodels.yml
      exists  db/migrate
      create  db/migrate/006_create_newmodels.rb
%

Note the files it generates:

  • app/models/newmodel.rb is where your ruby code for the model goes
  • test/unit/newmodel_test.rb is where your unit test for the model goes (see discussion on Test Driven Development in the next section). It has been pre-filled with a simple test already.
  • test/fixtures/newmodels.yml is a place to put example data for testing your model (more on this in the next section as well).
  • db/migrate/006_create_newmodels.rb is where you put the schema information for the newmodels table in the database. You can look at the other files in that directory for examples of migrations.

4.2 Near Neighbor GiST Support

Your CS186 class accounts are now configured to use a version of PostgreSQL that has GiST Near-Neighbor Search support, as in Homework 2. Since we never coded the Postgres optimizer to understand the correct cost model for a near-neighbor search, your environment variables are set up to tell the optimizer to always favor index scans over sequential scans (whenever an index is available). To make sure you've got things set up properly, check that the output of the following commands matches what we see in our account:

% which postgres
/home/ff/cs186/hw4/pgsql/bin/postgres
% echo $PGOPTIONS
-fs -fb
%

4.3 Test-Driven Development (TDD)

An important software engineering lesson that's stressed by the Rails framework is Test-Driven Development or TDD, which is the idea that you write tests for your code as a standard part of writing the code itself. Rails encourages this behavior through the script/generate approach to generating code -- these scripts not only set up skeleton code for you, they also set up skeleton tests. As a good Rails developer, you should get used to the idea of writing lots of tests ... perhaps as many lines of tests as lines of real code. Yes, this is time well spent, since your operational Rails code is usually pretty short anyhow if it's done well.

As a rule of thumb:

  • Models are covered by unit tests in the test/unit directory.
  • Controllers are covered by functional tests in the test/functional directory.
  • More complex scenarios are covered by integration tests in the test/integration directory.

For this homework, you only need to make sure that test/functional/librarymaps_controller_test.rb covers all the code in the librarymaps controller, and that you have tests in test/unit that cover any new models that you write.
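To give a feel for what a unit test looks like, here is a small self-contained sketch. It is not one of the provided tests: the CachedPlace class and its valid? rule are hypothetical, and it uses Minitest from Ruby's standard distribution as a stand-in, whereas the skeleton files created by script/generate subclass the test base class that your Rails version provides.

```ruby
require "minitest/autorun"

# Hypothetical plain-Ruby stand-in for a model that caches a geocoded place.
class CachedPlace
  attr_reader :name, :lat, :lng

  def initialize(name, lat, lng)
    @name = name
    @lat = lat
    @lng = lng
  end

  # A place is usable only if it has a name and in-range coordinates.
  def valid?
    !name.to_s.strip.empty? &&
      (-90.0..90.0).cover?(lat) &&
      (-180.0..180.0).cover?(lng)
  end
end

# One test method per behavior; each method name starts with test_.
class CachedPlaceTest < Minitest::Test
  def test_place_with_name_and_coordinates_is_valid
    assert CachedPlace.new("Rome", 41.9, 12.5).valid?
  end

  def test_blank_name_is_invalid
    refute CachedPlace.new("   ", 41.9, 12.5).valid?
  end

  def test_out_of_range_latitude_is_invalid
    refute CachedPlace.new("Nowhere", 123.0, 12.5).valid?
  end
end
```

Each test exercises a different branch of valid?, which is the habit a coverage tool rewards: every line of the model is reached by at least one test.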

To write your tests, you need to have a test database full of sample data. If you look in config/database.yml you will see that it assumes a test database called hw4_test; this was set up for you by hw4quickinstall.sh. The test database schema will be automatically generated from your migrations. The test database's data is loaded from the files in test/fixtures. By default, Rails creates fixture files in .yml format. Many people find that format awkward, so for each model you create, we encourage you to do the following:

% rm Hw4/library/test/fixtures/newmodels.yml
% touch Hw4/library/test/fixtures/newmodels.csv

This will leave you with an empty comma-separated-value fixture for your model. You can look at the examples in the fixtures directory to figure out how to format your own fixtures.

4.4 Running Tests and Code Coverage

Your code must pass all your tests. But that alone is not enough. To help ensure your tests are thorough, a code coverage tool will tell you which lines of code your tests actually exercised, and which ones remain untested. Ruby has a pretty nice coverage tool called rcov that generates web-based reports for you. In this assignment, you will have to use rcov to ensure that your code is fully covered by tests.

To run all your tests, you do the following:

% cd Hw4/library/
% rake

This will load the test database, run all the tests, report any errors, and delete the test data.

If you want to run a specific test, you can prepare the test database manually, and run the tests directly:

% rake db:test:prepare
% ruby test/functional/librarymaps_controller_test.rb

To run rcov, use the alias we have set up; it runs your tests and then shows you the files we care about for this assignment, without a lot of distracting files from the Rails framework:

% rcov_hw4

Once you run that command, start your rails server, and point your browser at http://rhombus.cs.berkeley.edu:16157/coverage/ (of course replacing 16157 with your WEBPORT value). You will see a report like the one below; if you have perfect coverage, you will be all green (100% coverage for everything)!

[Figure: rcov.jpg, sample rcov coverage report]

You can click through the filenames to see your source code colored with covered and uncovered lines.

We have set up tests that cover the default code we give you; you just need to extend them to cover your changes.

Note: you need to have your server running to view the coverage report.

Note 2 for Mac users: In Safari (on Leopard anyway) the source code is colored incorrectly -- no red shows up. Use Firefox if you don't see the source code colored properly.

5. Deliverables

You should have an implementation with full coverage of your code, passing all your tests, and giving correct answers, with performance on the 500-author data that is not much more than 1 second per page once your cache has "warmed up".

The turnin instructions here are tentative!

The code you turn in for this assignment must include fixtures that contain any database data you need in order to run fast. To dump a Postgres table into a csv fixture, you can use the PostgreSQL copy command:

% psql hw4
hw4=# COPY bars TO
      '/home/cc/cs186/fa07/class/cs186-ee/Hw4/library/test/fixtures/bars.csv' 
      WITH CSV HEADER;
COPY 58
hw4=# \q
%

To submit your files, make sure all your relevant tables have been dumped properly into fixtures, and then do this:

% hw4preparesubmit.sh
% cd ~/Hw4/hw4submitdir
% submit hw4 # should only be submitting one file, library.tar.gz

6. Links of interest