Merge branch 'master' into stable
desilinguist committed Jun 27, 2016
2 parents b4a5389 + 3aed46e commit ae20af7
Showing 24 changed files with 1,021 additions and 108 deletions.
25 changes: 0 additions & 25 deletions .travis.yml

This file was deleted.

37 changes: 22 additions & 15 deletions README.md
@@ -2,11 +2,11 @@

## Introduction

RSMTool is a python package for facilitating research on building and evaluating scoring models (SMs) for automated scoring engines. It allows the integration of educational measurement practices with the automated scoring and model building process.
RSMTool is a python package for facilitating research on building and evaluating scoring models (SMs) for automated scoring engines. It allows the integration of educational measurement practices with the automated scoring and model building process. See [rsmtool.pdf](doc/rsmtool.pdf) for background information.

Specifically, RSMTool takes a feature file with numeric, non-sparse features and a human score as input and lets you try several different regression models to predict the human score from the features. The primary output of RSMTool is a comprehensive, customizable HTML statistical report that contains feature descriptives, subgroup analyses, model statistics, as well as several different evaluation measures illustrating model efficacy. The various numbers and figures in the report are highlighted based on whether they exceed or fall short of the recommendations laid out by Williamson et al. (2012). However, these can be easily customized if the user wishes to use a different set of recommendations.

Finally, since the report is based on IPython notebooks, it can be easily customized. In addition, RSMTool explicitly provides support for adding custom notebooks to the report. [Here's](http://bit.ly/rsmtool) an example RSMTool report for a simple scoring system built to automatically score the responses from the [2012 Kaggle Automated Student Assessment Prize competition](https://www.kaggle.com/c/asap-aes).
Finally, since the report is based on IPython notebooks, it can be easily customized. In addition, RSMTool explicitly provides support for adding custom notebooks to the report.


RSMTool provides the following main scripts:
@@ -25,10 +25,6 @@ David M. Williamson, Xiaoming Xi, and F. Jay Breyer. 2012. A Framework for Evalu

## Installation

If you want to use RSMTool on your own machine, either as a user or a developer, follow the appropriate instructions below. Note that RSMTool only works with Python 3.4 and higher.

### For users

Currently, the best way to install RSMTool is by using the `conda` package manager. If you have the `conda` package manager already installed, you can skip straight to Step 2.

1. To install the `conda` package manager, follow the instructions on [this page](http://conda.pydata.org/docs/install/quick.html).
@@ -39,25 +35,36 @@ Currently, the best way to install RSMTool is by using the `conda` package manag

4. From now on, you will need to activate this conda environment whenever you want to use RSMTool. This will ensure that the packages required by `rsmtool` will only be used when you want to run `rsmtool` experiments and will not affect other projects.

### For developers
Note that RSMTool only works with Python 3.4 and higher.

## Example

You can try out RSMTool as follows:

1. Go to the `example` folder. This folder contains the training and test set features for a simple scoring system built to automatically score the responses from the [2012 Kaggle Automated Student Assessment Prize competition](https://www.kaggle.com/c/asap-aes).
2. Make sure to activate the conda environment where you installed rsmtool (e.g., `source activate rsmtool`)
3. Run RSMTool: `rsmtool config.json`
4. Since no output directory was specified, `rsmtool` will create the three output folders in the current directory: `figure`, `output`, and `report`. You can examine the HTML report `report/ASAP2_report.html`. It should look like [this](https://s3.amazonaws.com/sample-rsmtool-report/ASAP2_report.html).
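
The steps above can also be scripted. The following is a minimal Python sketch, assuming `rsmtool` is on the `PATH` of the active conda environment. The helper names `build_command`, `expected_report_path`, and `run_example` are hypothetical, and the `<experiment_id>_report.html` naming pattern is inferred from the single `ASAP2` example above rather than from documented behavior.

```python
import json
import subprocess
from pathlib import Path


def build_command(config_path):
    """Return the command line used in step 3 (`rsmtool config.json`)."""
    return ["rsmtool", str(config_path)]


def expected_report_path(experiment_id, base_dir="."):
    """Guess where the HTML report lands when no output directory is given.

    Assumption: the report is written to `report/<experiment_id>_report.html`,
    generalized from the `report/ASAP2_report.html` example.
    """
    return Path(base_dir) / "report" / f"{experiment_id}_report.html"


def run_example(config_path="config.json"):
    """Run rsmtool on the given config and return the expected report path."""
    config_path = Path(config_path)
    with config_path.open() as config_file:
        experiment_id = json.load(config_file)["experiment_id"]
    # Run from the config's folder so the output folders are created there
    # (assumption: relative paths in the config resolve against that folder).
    subprocess.run(
        build_command(config_path.name), cwd=config_path.parent, check=True
    )
    return expected_report_path(experiment_id, config_path.parent)
```

Calling `run_example("example/config.json")` from the repository root would then be roughly equivalent to steps 1 through 3, provided the environment from step 2 is already active.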

## Contributing

The instructions below are only if you are developing new features or functionality for RSMTool.
Contributions to RSMTool are very welcome. You can use the instructions below to get started on developing new features or functionality for RSMTool.

1. Pull the latest version of rsmtool from github and switch to the develop branch.
1. Pull the latest version of rsmtool from github and switch to the `master` branch.

2. If you already have the `conda` package manager installed, skip to the next step. If you do not, follow the instructions on [this page](http://conda.pydata.org/docs/install/quick.html) to install `conda`.

3. Create a new conda environment (say, `rsmtool`) and install the packages specified in the `conda_requirements.txt` file by running `conda create -n rsmtool -c desilinguist --file conda_requirements.txt`. Use `conda_requirements_windows.txt` if you are on Windows. The two conda requirements file will be consolidated with the next version.
3. Create a new conda environment (say, `rsmtool`) and install the packages specified in the `conda_requirements.txt` file by running `conda create -n rsmtool -c desilinguist --file conda_requirements.txt`. Use `conda_requirements_windows.txt` if you are on Windows. There are two versions because RSMTool currently does not use MKL on non-Windows platforms.

4. Activate the environment using `source activate rsmtool` (use `activate rsmtool` if you are on Windows).

5. Run `pip install -e .` to install rsmtool into the environment in editable mode which is what we need for development.

6. Run `nosetests -v tests` to run the tests.
6. Run `nosetests -v tests` to run the tests.

## Available documentation

## Usage documentation for main scripts
### Usage documentation for main scripts

* [rsmtool](doc/rsmtool.md)

@@ -67,7 +74,7 @@ The instructions below are only if you are developing new features or functional

* [rsmcompare](doc/rsmcompare.md)

## Description of configuration files
### Description of configuration files

* [RSMTool configuration file](doc/config_file.md) - main configuration file for `rsmtool`

@@ -79,7 +86,7 @@ The instructions below are only if you are developing new features or functional

* [Feature file](doc/feature_file.md) - feature file

## Lists of available options
### Lists of available options

* [Available models](doc/available_models.md) - list of models available to `rsmtool`

@@ -89,7 +96,7 @@ The instructions below are only if you are developing new features or functional

* [Output CSV files](doc/output_csv.md) - .csv files generated by `rsmtool` and `rsmeval`

## Documentation for developers
### Documentation for developers

* [New notebooks](doc/new_notebooks.md) - the variables and data frames available for use in custom report sections.

2 changes: 1 addition & 1 deletion conda-recipe/unix/rsmtool/meta.yaml
@@ -1,6 +1,6 @@
package:
name: rsmtool
version: 5.0.1
version: 5.0.2

source:
path: ../../../../rsmtool
2 changes: 1 addition & 1 deletion conda-recipe/windows/rsmtool/meta.yaml
@@ -1,6 +1,6 @@
package:
name: rsmtool
version: 5.0.1
version: 5.0.2

source:
path: ../../../../rsmtool
4 changes: 4 additions & 0 deletions doc/CHANGELOG.md
@@ -1,5 +1,9 @@
# Release Notes

## v5.0.2 (June 27, 2016)

Added files needed for [Journal of Open Source Software](http://joss.theoj.org/) submission.

## v5.0.1 (June 7, 2016)

### Bugfixes
9 changes: 3 additions & 6 deletions doc/new_notebooks.md
@@ -1,6 +1,6 @@
# Guide to writing new IPython notebooks

`rsmtool` allows developers and users to contribute new analysis sections to `rsmtool` and `rsmeval` as *custom sections* (see report_sections.md for further information).
RSMTool allows developers and users to contribute new analysis sections to `rsmtool` and `rsmeval` as *custom sections* (see report_sections.md for further information).

When writing such notebooks, some or all of the Python variables below will be available for use in the notebook.

@@ -44,20 +44,17 @@ When writing such notebooks, some or all of the python variables below will be a

- `df_train_metadata` and `df_test_metadata`: Data frames containing the `*_train_metadata.csv` and `*_test_metadata.csv` files respectively as explained in `doc/output_csv.md`. [`rsmtool`: both data frames, `rsmeval`: test data only]


- `df_train_length`: A data frame containing `spkitemid` and response lengths (`length`) for the training data. This frame is *only* available if (a) `length_column` was specified in the config file, (b) no values in that column are missing, and (c) the standard deviation of the values in that column is greater than 0. [`rsmtool` only]

- `df_test_human_scores`: A data frame containing `spkitemid`, the test label (`sc1`), and the second human score (`sc2`) for the test data. This frame is *only* available if `second_human_score_column` was specified in the config file. Note that the data frame will contain `NaN`s for responses for which no numeric second human score was available, or for which the second score was 0 and `exclude_zero_scores` was set to `True`.

- `df_pred_preproc`: A data frame containing the `*_pred_processed.csv` file as explained in `doc/output_csv.md`.

- `df_feature_subset_specs`: a data frame containing the content of `feature_subset_file` if it was specified in config file. `None` if not specified in the config file.
[`rsmtool` only]
- `df_feature_subset_specs`: a data frame containing the content of `feature_subset_file` if it was specified in config file. `None` if not specified in the config file. [`rsmtool` only]

In addition, the following variables are also available but you should *not* re-read the files under these directories which are already available as data frames.

- `output_dir`: The output directory for the experiment that contains all the generated CSV files.
-

- `figure_dir`: The figure directory for the experiment that contains all the generated SVG and PNG figures.
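
As an illustration of how a custom section might use these variables, here is a minimal sketch of a notebook cell that summarizes `df_train_length`. The function name `summarize_lengths` is hypothetical and not part of RSMTool. Because the frame is only defined under the conditions listed above, the sketch looks it up defensively instead of assuming it exists.

```python
import statistics


def summarize_lengths(df_length):
    """Summarize the `length` column of a frame such as `df_train_length`.

    Returns None when the frame is unavailable. Needs at least two values,
    which should hold whenever the standard deviation is greater than 0.
    """
    if df_length is None:
        return None
    # Convert to plain floats so the standard library can handle the values.
    lengths = [float(value) for value in df_length["length"]]
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "sd": statistics.stdev(lengths),
    }


# In a custom section, fall back to None if the frame was not created.
summary = summarize_lengths(globals().get("df_train_length"))
```

The same pattern applies to the other conditionally available frames, such as `df_test_human_scores`.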

## Notes:
Binary file added doc/rsmtool.pdf
Binary file not shown.
16 changes: 16 additions & 0 deletions example/config.json
@@ -0,0 +1,16 @@
{
"test_label_column": "score",
"train_file": "train.csv",
"description": "Using all features with a LinearRegression model.",
"use_scaled_predictions": true,
"trim_min": 1,
"id_column": "ID",
"model": "LinearRegression",
"train_label_column": "score",
"second_human_score_column": "score2",
"length_column": "LENGTH",
"features": "features.json",
"experiment_id": "ASAP2",
"trim_max": 6,
"test_file": "test.csv"
}
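
As a rough illustration of how the fields in this configuration relate to each other, the sketch below loads the config above and runs a few sanity checks. The function name `check_config`, the `EXPECTED_FIELDS` list, and the checks themselves are hypothetical and not part of RSMTool; the fields RSMTool actually requires are described in `doc/config_file.md`.

```python
import json

# The example configuration from this commit, inlined so the sketch is
# self-contained.
EXAMPLE_CONFIG = """
{
    "test_label_column": "score",
    "train_file": "train.csv",
    "description": "Using all features with a LinearRegression model.",
    "use_scaled_predictions": true,
    "trim_min": 1,
    "id_column": "ID",
    "model": "LinearRegression",
    "train_label_column": "score",
    "second_human_score_column": "score2",
    "length_column": "LENGTH",
    "features": "features.json",
    "experiment_id": "ASAP2",
    "trim_max": 6,
    "test_file": "test.csv"
}
"""

# Hypothetical checklist based on the fields used in the example above;
# this is not an official list of required fields.
EXPECTED_FIELDS = [
    "experiment_id",
    "model",
    "train_file",
    "test_file",
    "features",
    "train_label_column",
    "test_label_column",
    "id_column",
]


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = [
        f"missing field: {name}" for name in EXPECTED_FIELDS if name not in config
    ]
    trim_min, trim_max = config.get("trim_min"), config.get("trim_max")
    if trim_min is not None and trim_max is not None and trim_min >= trim_max:
        problems.append("trim_min should be smaller than trim_max")
    return problems


config = json.loads(EXAMPLE_CONFIG)
print(check_config(config))  # prints [] for the example config
```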
44 changes: 44 additions & 0 deletions example/features.json
@@ -0,0 +1,44 @@
{
"features": [
{
"transform": "raw",
"feature": "FEATURE1",
"sign": 1
},
{
"transform": "raw",
"feature": "FEATURE2",
"sign": 1
},
{
"transform": "raw",
"feature": "FEATURE3",
"sign": 1
},
{
"transform": "raw",
"feature": "FEATURE4",
"sign": 1
},
{
"transform": "raw",
"feature": "FEATURE5",
"sign": 1
},
{
"transform": "raw",
"feature": "FEATURE6",
"sign": 1
},
{
"transform": "raw",
"feature": "FEATURE7",
"sign": 1
},
{
"transform": "raw",
"feature": "FEATURE8",
"sign": 1
}
]
}
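
Since every entry in this file follows the same pattern, a file like it could be generated rather than written by hand. The sketch below shows one way to do that; the helper name `make_feature_specs` is hypothetical, and the defaults simply mirror the `raw` transform and `sign` of 1 used for all eight features above.

```python
import json


def make_feature_specs(names, transform="raw", sign=1):
    """Build a dict with the same layout as the features.json file above."""
    return {
        "features": [
            {"transform": transform, "feature": name, "sign": sign}
            for name in names
        ]
    }


# FEATURE1 through FEATURE8, as in the example file.
specs = make_feature_specs([f"FEATURE{i}" for i in range(1, 9)])
print(json.dumps(specs, indent=4))
```

Writing the result to disk with `json.dump(specs, open_file, indent=4)` would give a file with the same entries, though key order and whitespace may differ from the committed version.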