
VisionEval 3.0.0 Initial Release

@jrawbits jrawbits released this 11 Apr 21:04
· 75 commits to development since this release

This release includes VisionEval 3.0, the "Next Generation" framework, referred to here as VE-3.

Installers for the end user version, built for R 4.1.3, R 4.2.2 and R 4.2.3, are available in the "assets" section of this release. Alternatively, to get the most recent changes, you may clone the Github "development" branch and build VE-3 using the standard VisionEval build tools.

Please post any issues you encounter on the VisionEval/VisionEval-dev Github repository. Numerous previous issues have been resolved in this release.

Emerging documentation for this release is located at docs.visioneval.org. You can build the documentation "book" using ve.build('book') from the Github development environment (start VisionEval-dev.Rproj from the VisionEval git root).

Interactive quick start documentation is available by running the "walkthrough()" function and stepping through the various scripts provided there. The walkthrough provides a rudimentary view of VE-3 operations.

Key features of VisionEval 3.0 (VE-3)

These extensive release notes are intended more as a "teaser" than as documentation. See the walkthrough, and docs.visioneval.org for more concrete examples.

Setting up a runtime environment

The VE-3 runtime environment has three parts:

  1. The "load directory" containing the startup scripts
    • By default, the installation location of VisionEval
    • VisionEval should be started from this folder (but it need not run models there; see "runtime directory" below)
  2. The directory containing the VisionEval R package library (ve-lib)
    • Usually located within the load directory (end user installation)
    • May instead be located adjacent to the load directory (as in the built artifacts of the Github)
  3. The "runtime directory" which contains models to be run and their results
    • You can change the runtime directory at startup by setting the VE_RUNTIME operating system environment variable
    • This scheme allows you to install new VisionEval runtimes without overwriting (or having to copy) your previous models. You may still want to copy those models within the VE-3 environment before running them again (or just use mod$run("save") the first time you run them).
    • In the development environment, what is called the "runtime" directory is used as the "load directory" and if you use "ve.run()" to launch a VisionEval run, the development environment will create a separate "runtime.test" directory where it does its work. That is to avoid accidentally overwriting models you may be developing within the Github tree.
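As a minimal sketch of the runtime directory choice described above, the base R snippet below shows one way VE_RUNTIME could be set and resolved. The helper name resolve_runtime and the fallback to the load directory are illustrative assumptions, not VE-3's actual startup code, and the paths are examples only.

```r
# Hypothetical helper (not part of VisionEval): pick the runtime directory
# from VE_RUNTIME, falling back to the load directory when it is unset.
resolve_runtime <- function(load_dir = getwd()) {
  runtime <- Sys.getenv("VE_RUNTIME", unset = "")
  if (nzchar(runtime)) runtime else load_dir
}

# Set the variable for this R session only (example path)
Sys.setenv(VE_RUNTIME = file.path(tempdir(), "my-ve-runtime"))
print(resolve_runtime())

# Unset it again: the sketch now falls back to the load directory
Sys.unsetenv("VE_RUNTIME")
print(resolve_runtime(load_dir = "/path/to/VisionEval"))
```

Setting the variable at the operating system level (before R starts) is what the notes above describe; Sys.setenv is used here only to keep the sketch self-contained.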

Sample / Prototype Models

The VisionEval sample models are stored within the VisionEval packages, and copies of those standard models, with a small sample dataset, can be installed using the VE-3 "installModel" function. You can access help for that function by loading the VEModel package and running help("installModel"). A default name is provided for the model when it is installed, and you can use that name later to re-open the model (e.g. in a new VisionEval session). The default name is constructed from the model (VERSPM, VE-State, others) and the model "variant", which describes a specific set of sample files that illustrate model structures and operations that you can adapt for your own models.

Installed models are set up in subdirectories of the "models" subdirectory of the VisionEval runtime directory. The name of the model subdirectory is the model name used by VE-3. So if your "models" directory contains a subdirectory called "myRSPM", you can use the mod <- openModel('myRSPM') function to open the model. See the R help for openModel.

Both installModel and openModel functions return model objects that contain the functions and data needed to configure, run and report results from the model. See below and the walkthrough for more information.

Setting Up and Running Models

A typical VE-3 model development process would be to install a prototype model that does what you want, and then adjust the model structure by creating your own model geography (zones), setting the model base year and run years (and adjusting the deflators.csv file), creating all the module inputs for base and future years in your baseline scenario, and once that is all running, creating scenarios (subsets of the main model that make changes to the inputs for the scenario). See docs.visioneval.org for more information.

Running existing VisionEval models: If you already have a VisionEval model, you can still run it the old way, using source("run_model.R") from the directory containing the inputs, defs, and run_model.R script. However, that approach does not put full information into the model results, so some of the advanced VE-3 functions for examining the model will not be available.

A better way to get your old models into VE-3 is to create a subdirectory of "models" in the runtime directory with the name of your model (say myOldModel), then use mod <- openModel('myOldModel') to open the model, and mod$run() to run it. It should (fingers crossed, file an issue if it doesn't) "just work". You don't have to change a thing, and you can still "extract" or "query" the results as usual (see below).

Re-running models: VE-3 notices if you've already run your model, and if you run it again, it will do nothing and report the model status (print(mod) will show you the status at any time). If the run failed for some reason, it will try to run again. To force it to throw away the previous run, do mod$run('reset'). If you want to save the previous run, do mod$run('save') and the previous results will be moved to a new timestamped directory prior to starting the whole model over again.
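The install, run and re-run steps described above can be sketched as a single function. This is an illustration only: it uses just the calls named in these notes (installModel, mod$run, print), it must be called from a started VisionEval session with VEModel available, and the wrapper name run_sample_model is hypothetical. Whether installModel succeeds when the default model name already exists is not covered here.

```r
# Sketch only: call this from a started VisionEval runtime.
run_sample_model <- function() {
  if (!requireNamespace("VEModel", quietly = TRUE)) {
    stop("VEModel is not installed; start VisionEval first")
  }
  # Copy the "base" variant of VERSPM into the runtime "models" folder
  mod <- VEModel::installModel("VERSPM", var = "base")
  mod$run()         # first run; a repeat call does nothing if the run completed
  print(mod)        # show the model status
  mod$run("save")   # move previous results to a timestamped directory, then rerun
  mod$run("reset")  # throw away the previous results, then rerun
  invisible(mod)
}
```

Each mod$run call here is a full model run, so in practice you would pick only the one you need.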

Parallel Processing: VE-3 has a basic implementation of parallel processing. You enable it by calling the R function mod$plan(workers=3) before you do mod$run(). VE-3 will group the model stages (see below) into sets that have the same starting point (see below, StartFrom) and distribute them across the number of "workers" (CPUs) available on your computer. Note that a large VisionEval model will consume up to 7 or 8 gigabytes of memory per running stage, so the limiting factor on parallel processing is more likely to be how much RAM you have than how many processors. The parallel processing is implemented through R's "future" package (with support from "parallelly"), and any type of connection you can set up that way (including clusters of machines) should work "in principle"; if you'd like to figure out how to set that up, get in touch with us. The stock parallel processing implementation uses the "callr" package to set up multiple R sessions.
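Because memory rather than core count is usually the constraint, a rough worker count can be estimated before calling mod$plan. The helper below is a hypothetical sketch, not part of VisionEval; it assumes the upper figure of 8 gigabytes per running stage quoted above and takes the available RAM as an argument, since base R has no portable way to detect it.

```r
# Hypothetical helper: cap the worker count by both CPU cores and RAM.
suggest_workers <- function(ram_gb,
                            cores = parallel::detectCores(),
                            gb_per_stage = 8) {
  if (is.na(cores)) cores <- 1L              # detectCores() may return NA
  by_memory <- floor(ram_gb / gb_per_stage)  # stages that fit in RAM at once
  max(1L, min(cores, by_memory))             # always allow at least one worker
}

print(suggest_workers(ram_gb = 32, cores = 8))   # memory-bound
print(suggest_workers(ram_gb = 128, cores = 6))  # CPU-bound
```

The result could then be passed along as mod$plan(workers = suggest_workers(ram_gb = 32)) before mod$run().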

VE-3 Model Configuration

To enable the new VE-3 features, particularly model stage and scenario management as well as efficient extraction of results, VE-3 uses a new configuration scheme. Parameters that used to be kept separately in "defs/run_parameters.json" and in the arguments to the initializeModel function in the run_model.R script are now maintained in one or more YAML files called "visioneval.cnf" (which, if you're seriously into nostalgia, could still be a JSON file called "run_parameters.json"; it should still work). Plus there are a lot more parameters to control other features of VE-3.

You can have a global "visioneval.cnf" (where you might set your preferred DatastoreType or a common random Seed). Each model must have a "visioneval.cnf" in its root directory (i.e. inside "models\myModel") that describes the base model, and if the model has stages or scenarios defined in subdirectories of the model (see below), those can also have a "visioneval.cnf" describing how they differ from the base scenario. You can also configure scenarios directly in the main model "visioneval.cnf", but that gets inconvenient if you have lots of them, or if they change frequently; see below on setting up scenarios.

Model Stages

The most radical change in VE-3 (which is fully backward compatible with old models) is that models are reconceived as a series of "model stages" - units that can be run to generate output in a Datastore. A model stage can be part of a model (e.g. population synthesis, or just the base year run of the model, or whatever). A stage can have its own run_model.R script (which can have some informative name if you prefer), its own inputs, and its own outputs. All the stages share the basic model structural information: everything that is in the "defs" directory, notably "geo.csv", "units.csv" and "deflators.csv".

Stages can be connected to each other, using the "StartFrom" parameter in the stage's "visioneval.cnf" file. When the stage is run, any information that is not present in that stage will be sought in the "StartFrom" stage (and if that stage also has a StartFrom, the run will keep looking up the "ladder" of StartFrom stages until it has found everything it needs). A stage without a StartFrom needs to have all the inputs and scripts it needs in its own directories. If you have multiple stages that share the same StartFrom (e.g. future year scenarios that StartFrom the default future scenario), those can easily be run in parallel (see above); the default is to run them sequentially, which is better the first time through, since it is a bit harder to find the murder weapon if one or more of the stages comes to an untimely demise.
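The StartFrom "ladder" lookup can be illustrated with a toy model in base R. This is a conceptual sketch, not the framework's implementation: the stage names, file names and the find_input helper are all made up for the example.

```r
# Toy stage definitions (illustrative names only)
stages <- list(
  BaseYear   = list(StartFrom = NULL,         inputs = c("inputA.csv", "inputB.csv")),
  FutureYear = list(StartFrom = "BaseYear",   inputs = c("inputA.csv")),
  Scenario1  = list(StartFrom = "FutureYear", inputs = c("inputC.csv"))
)

# Walk up the StartFrom ladder until a stage supplies the requested file
find_input <- function(stages, stage, file) {
  while (!is.null(stage)) {
    if (file %in% stages[[stage]]$inputs) return(stage)
    stage <- stages[[stage]]$StartFrom   # NULL once the top of the ladder is passed
  }
  NA_character_                          # not found in any stage
}

print(find_input(stages, "Scenario1", "inputC.csv"))  # found locally
print(find_input(stages, "Scenario1", "inputA.csv"))  # found one rung up
print(find_input(stages, "Scenario1", "inputB.csv"))  # found at the top
```

The nearest stage wins: Scenario1 takes inputA.csv from FutureYear even though BaseYear also has a copy.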

It is possible to start a model from another model (in effect, turning the other model into a "stage") but that is intended mostly for use in debugging a large model that has crashed many minutes (or hours) into a run, and the key difference is that it copies the previous model's Datastore, rather than just accessing it in place. To use that, you can set up another model to load the partially-formed carcass with a script that starts just before the old model crashed and then play around with the inputs and not have to wait hours to find out if it worked or not. Use the LoadModel configuration parameter in the subsequent model's "visioneval.cnf" to copy over the datastore (and the LoadStage if you need to load a stage other than the very last one in the previous model).

VE-3 puts its results in a subdirectory of the model called "results". Each stage, if any are defined, will go in a subdirectory of "results" named after the stage.

One downside of the stages is that each stage has its own separate Datastore, which VE-3 links internally to the StartFrom stage Datastores (without copying them). So to get at the results from your full model run, you'll need to use the VE-3 extraction and query mechanisms (see below). Alternatively, if you already have R scripts that run on one big Datastore, you can merge the stage Datastores into a single Datastore through a process called "flattening" the Datastore. You can just copy the results from your stage and add the Flatten parameter, like this:

rs <- mod$results("stageToFlatten")   # stageToFlatten is the name of the final stage you would like all the data from
rs$copy("OtherDirectory",Flatten=TRUE)  # Generates a Datastore with all data available to the stage from its StartFrom stages

Note that you can open "OtherDirectory" using the openResults function (see its R help), and then extract or query the results using the machinery described below.

Building scenarios

Scenarios are just model stages that have the special property of being "Reportable" (that is, they will automatically be included in extracts or in queries). You can manually mark a stage as "Reportable" in its visioneval.cnf, but VE-3 marks any stage Reportable that does not have another stage starting from it (i.e. a terminal stage).
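The default rule (a stage is Reportable when no other stage starts from it) amounts to finding the terminal stages of the StartFrom graph. A minimal base R sketch, using made-up stage names and a plain named vector in place of the real configuration:

```r
# Each element names the stage that the (named) stage starts from;
# NA marks a stage with no StartFrom. All names are illustrative.
start_from <- c(
  BaseYear   = NA,
  FutureYear = "BaseYear",
  Scenario1  = "FutureYear",
  Scenario2  = "FutureYear"
)

# Terminal stages: those that never appear as another stage's StartFrom
reportable <- setdiff(names(start_from), start_from)
print(reportable)
```

Here BaseYear and FutureYear are excluded because other stages start from them; they could still be marked Reportable manually, as noted above.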

So to make a scenario, you define a model stage. If the changes do not involve different inputs (e.g. separating base and future years into different stages), you can just define the model stage in the model's visioneval.cnf.

For scenario-type stages, you might want to define different inputs, or perhaps even a different run_model.R script (though be careful with that - if the same data doesn't emerge from each Reportable stage, the query process may leave you with "NA's" in some of your metrics). That's most easily done by creating a sub-directory and putting a visioneval.cnf with the stage particulars into that directory. It's probably better to put run_model.R scripts under different names in the single "scripts" directory for the overall model than to bury them in the scenario stage. The stage (scenario) sub-directory should just contain input files that are different from what is available to its StartFrom stages.

So to set up scenarios efficiently, you'll create your default future year (with complete inputs), then just create a few altered input files in the scenario subdirectory of the model (the subdirectory is named after the "stage" according to its - or the model's - visioneval.cnf). When the stage runs, it looks for input files locally, and anything it doesn't find it searches for in its StartFrom stage (and on up the ladder, as described earlier). Building scenarios as stages in that way makes it very easy to keep track of what's different in each scenario. Plus, when you run the model, you just run the model - each scenario (stage) gets run in its turn automatically. And when you extract or query the results, you get the results for all the (Reportable) scenarios, though you can also get at the results for stages that are not Reportable; you just have to ask for them explicitly by name or index.

Instead of defining individual subdirectories for your stages, you can push them all down into a single subdirectory of your model called "scenarios". Inside that "scenarios" directory, you can construct a set of manual scenarios (and they will all be Reportable by default, even if some of the scenarios you define start from others - that's the key behavioral difference). Or you can construct variant inputs and have VE-3 combine them into all possible permutations and combinations. That reproduces in essence the behavior of the old VEScenario package that no longer exists. If you do combination scenarios, you can visualize them easily with the (now long-in-the-tooth) R HTML visualizer. More modern approaches to dumping VisionEval outputs into Access, Excel, SQL generally or using various Tableau or Power BI templates are under development and will go into another minor update release shortly.
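To get a feel for how quickly combination scenarios multiply, the enumeration can be sketched in base R with expand.grid. The category names and levels below are hypothetical, and this only illustrates the combinatorics, not how VE-3 builds the scenarios internally.

```r
# Hypothetical scenario categories, each with a few input variants ("levels")
categories <- list(
  LandUse = c("L1", "L2"),
  Transit = c("T1", "T2", "T3"),
  Cost    = c("C1", "C2")
)

# One row per combination of one level from every category
combos <- expand.grid(categories, stringsAsFactors = FALSE)

# Build a label for each combined scenario, e.g. "L1+T1+C1"
combos$Scenario <- do.call(paste, c(combos, sep = "+"))

print(nrow(combos))     # 2 * 3 * 2 combinations
print(head(combos, 3))
```

Adding one more three-level category would triple the count, which is why commenting out categories is a practical way to keep a trial run short.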

Extracting model results

Extracting results is pretty simple. You run the model, you get a "VEResults" object by calling rs <- mod$results(), and then you do rs$extract(). By default, that creates R data.frames, but you can tell VE-3 where to save the results and it will (for now) create .csv files with whatever you chose to extract. Note that the results are automatically "flattened" (see above on collapsing Datastores from multiple stages), so you get a full set of data including any StartFrom stages.

A forthcoming minor release will let you put those results directly into SQL, though you can do that now just by copying the results data.frames that are generated by the "extract" function into SQL tables using the R DBI interface.
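A minimal sketch of the DBI route mentioned above, writing a set of data.frames into an in-memory SQLite database. The stand-in list replaces the real output of rs$extract(); treating that output as a named list of data.frames is an assumption here, as are the table and column names. The DBI and RSQLite packages are needed, so the database step is skipped when they are not installed.

```r
# Stand-in for extracted results (assumed shape: named list of data.frames)
extracted <- list(
  Household = data.frame(HhId = 1:3, Dvmt = c(10.5, 22.1, 7.8)),
  Bzone     = data.frame(Bzone = c("B1", "B2"), Pop = c(1200, 850))
)

# Write each data.frame to a table named after its list element
write_tables <- function(tables, con) {
  for (name in names(tables)) {
    DBI::dbWriteTable(con, name, tables[[name]], overwrite = TRUE)
  }
  DBI::dbListTables(con)
}

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  print(write_tables(extracted, con))
  DBI::dbDisconnect(con)
}
```

Swapping the dbConnect call for another DBI backend should be the only change needed to target a different database.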

Querying model results

A spiffy mechanism for generating summary queries was built a few years back by Brian Gregor (the original author of VisionEval). That mechanism was reworked into VE-3 to query model results and generate tables of outputs for multiple scenarios within a model (all the "Reportable" ones). The idea is to generate summary metrics from simple one-line computations (e.g. household DVMT per capita) and build a table of all the metrics for each scenario. It's more easily shown in examples than explained briefly in text, so check out the "walkthrough" and also the "queries" subdirectory in many of the sample models (see "installModel" above).

The metrics can be split out by grouping variables (e.g. Income or some adjacent characteristic, such as Households in urbanized areas). In VE-3, Bzones can be tagged with new properties (e.g. identifying EJ zones) and those can also be used to subdivide the query metrics - just add your tags as columns in the model's geo.csv file before you run the model.

Queries can generate two types of output format: "wide format" which produces one column for each year of each scenario and one row for each metric, and "long format" which produces one row for each scenario for each year for each metric (so there is only one column of metric values in the resulting output). See the walkthrough for examples of how to generate those outputs.
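The difference between the two layouts can be shown with a small made-up table in base R. The metric names, scenario names and values are invented for illustration; this is not the query machinery itself, only the shape of its two output formats as described above.

```r
# Long format: one row per metric per scenario per year, one value column
long <- data.frame(
  Metric   = rep(c("DVMT_per_capita", "Transit_trips"), times = 4),
  Scenario = rep(c("Base", "Base", "S1", "S1"), times = 2),
  Year     = rep(c(2020, 2040), each = 4),
  Value    = c(20.1, 5.0, 19.5, 5.4, 21.3, 5.2, 18.7, 6.1)
)

# Wide format: one row per metric, one column per scenario-year
long$Column <- paste(long$Scenario, long$Year, sep = "_")
wide <- reshape(
  long[, c("Metric", "Column", "Value")],
  idvar = "Metric", timevar = "Column", v.names = "Value",
  direction = "wide"
)

print(dim(long))  # rows x columns in long format
print(dim(wide))  # rows x columns in wide format
print(wide)
```

Long format tends to suit databases and plotting tools, while wide format is easier to read as a report table.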

Interior changes

As noted above, model stages can have different run_model.R scripts. That will support using (for example) different versions of the PowertrainsAndFuels module in different scenarios.

The framework now supports modules with "dynamic" specifications (generated by a function call at runtime, rather than being built as static data into an R package). See the VESnapshot package and help for its functions "Dynamic" and "Snapshot" for more details.

Earlier pre-release versions

The following notes were written at the time of earlier pre-releases and contain additional information about VE-3.

In beta-release-0.8: Updated 01-getting-started.md in VisionEval-docs (and configuration to build it into the installer) - currently a pull request. Fixed a variety of problems with queries and indexing model contents. Fixed runtime and development startup (including functional access to the walkthroughs in their own independent runtime folder). See the getting started document for some instructions.

Updated tests and walkthrough, and in the process fixed a bunch of bugs (beta-release-0.7).

In beta-release-0.7: Updated with a new test architecture (see the changelog, test-architecture.md, for a description) (beta-release-0.6). Also fixed some bugs in the VEModel results extraction code where earlier stages in a staged model were not being included in the results.

Updated to improve walkthrough and test access and clean up runtime build process (beta-release-0.4). This release (which skips beta-release-0.3) includes the patched MultiModal module, restructuring of the tutorials and vignettes, some fixes to the build process, the locate employment bug patch, and (most exciting) the fully functional dynamic visualizer that uses the VEModel scenario and query features to let you configure exactly what scenario categories and metrics you want to display.

Updated to beta-release-0.5, which makes a variety of minor fixes (including updates to the scenario functionality so that it works better) and adds a new scheme for managing pull requests and the changelog.

The internal VEModel test.R script, which I've used to develop the framework, is also included in (and will run from) the installed runtime. Just source("tools/tests/VEModel/test.R") once you have launched the runtime. I will be updating walkthrough.R to demonstrate the test features in a more tutorial fashion.

The revised walkthrough will be part of beta-release-0.5, which will also include updating the scenarios and visualizer so you can visualize an arbitrary set of manually constructed scenarios, rather than just the category-permutation scenarios that track the old VEScenario functionality. That will make the visualizer a fully-supported element of VisionEval.

The installers included as assets below have been updated to tag beta-release-0.4.

Key changes include the following:

  • New framework VEModel package to manage the User Interface
  • Extensive changes to the visioneval framework package itself to support the new UI
  • An extensive runtime "walkthrough" describing and running most of the new User Interface features, plus access to the VEModel tests.R script, which is the definitive statement of "what works".

Though it is still feasible to run a "classic" VisionEval model by doing source("run_model.R"), the full power of the new framework requires a few simple modifications to the model structures. We should probably have a vignette on converting an old model. The basic strategy is simple: move the model into a subdirectory of the "models" directory in the runtime (calling it, say, "myModel"), and create a visioneval.cnf file that reproduces key elements from defs/run_parameters.json and the initializeModel function in run_model.R. You can remove initializeModel from the run_model.R script (though you can also leave it - it will be ignored if you use the VEModel run function). Then just use mod <- openModel("myModel") and then mod$run(). You'll also eventually want to restructure the various scenarios you created by varying inputs to that model, and there will soon be a tutorial on how to do that.

Aside from the above, here are the key user-visible changes in this VisionEval version (there are probably many more, but they'll be intuitive or optional for basic model setup and runs).

  • Pre-packaged sample models are delivered in packages either as standalones (see the new VEState package) or as add-ons to a package that also delivers modules (see the updated VETravelDemandMM package). Use installModel() to list available models, installModel("VERSPM",var="") to list "variants" (the same sample model set up with different staging and configuration strategies), or pick a variant and do installModel("VERSPM",var="base") to copy one of the models into the runtime "models" folder.
  • The new model packaging scheme provides a new recommended way to generate a model: just create an R package like what you see in VEState, VERPAT, or VETravelDemandMM that has a suitably configured "inst/models" sub-directory. Once you install such a package (using standard R install.packages), the model automatically becomes visible and available to installModel.
  • As you will see in VERPAT and VETravelDemandMM, an inst/models package can also include modules specific to that model (and the model thus functions like a vignette that shows how the package is used and what it does).
  • Models can be organized into "stages", or they can be "loaded" from another model. Stages can start from another stage, so you can run a base model then run future scenarios without re-running the base. Try "installModel("VERSPM",var="pop")" to look at a fully-staged sample.
  • Stages are "cascading" in the sense that key features like the defs/geo.csv file, any of the inputs, or the model script are passed from earlier stages to later stages (unless the later stage explicitly redefines it).
  • Stages sound complicated, but they greatly simplify scenario management and documentation: your new future scenario may only change three input files, so you just set the new scenario StartFrom to the existing future scenario and then drop just the three changed input files into the new scenario's sub-directory. Then you can run the model as usual.
  • And when you run a model, if any of its stages have already run to completion, they won't get re-run unless you "reset" the run. New scenario stages you may have added will get run.
  • Before or after running a model, you can use its "list" function to inspect all the inputs and outputs of each stage of the model, identifying which exact file was used on input, and which module wrote the various output datasets. That function is poorly documented still, but see the walkthrough and also "tests/test.r" in the VEModel package source code for more walkthrough-like testing.
  • You can use multiprocessing without any special changes. At a minimum you can do model$plan() just before doing model$run() which will use all available cores, or you can do model$plan(workers=3) to pick a specific number (smaller is sometimes better). When multiprocessing is enabled, all the model stages that StartFrom the same place will run in parallel. See the walkthrough and VEModel/tests.R. Be aware however that the limiting factor on how useful multiprocessing will be is often the amount of RAM you have, not the number of cores - VE models that have many zones (or that simulate a lot of zones, like VE-State) use a LOT of memory. If you exceed available RAM, the memory will get swapped onto your hard disk, leading to a condition called "thrashing" that is ultimately just as bad as (or worse than) it sounds.
  • In principle (that's what "beta" means in Greek) the multiprocessing should be trivial to extend across multiple servers or into the cloud using standard R future "plans".
  • Once you have run a model, you can get at its results easily using the "extract" and "export" functions to generate data frames and save the results into tabular formats. Right now, only .csv files are supported, but the forthcoming ViEIO package will allow the output formats to be readily extended to any tabular environment (SQLite, Excel, Access, MySQL, Apache feather).
  • An extensible query facility can also be applied to model results to generate performance metrics either for the entire region, broken out by Geography (Marea, Azone, Bzone), or set up by breakpoints (e.g. Household income brackets). You have to write the queries using framework tools or just by editing an R script file, but there are working examples included for most of the models that you can easily adapt.
  • The Query specification supports some extra fields that help control the visualizer display so it looks nicer. See the "scenarios-cat" or "scenarios-ms" variants of VERSPM and look in the "queries" sub-folder for a fully-fleshed-out example.
  • Query results are cached (but can be regenerated if you like), and can be extracted into a data.frame (query$extract()) or exported into a tabular file (like the raw model results), again eventually using ViEIO.
  • When a model is run in stages, each stage has its own Datastore. No data is duplicated (new data from a stage is always written into its own Datastore). If a dataset is not present in the current stage, each StartFrom stage is searched until the dataset is located.
  • A Datastore can be manually copied to a new location (so can a model, with or without its results). A copied Datastore will always be associated with a single stage, but you can request that the copy operation "flatten" the Datastore, so that all the datasets are in one place.
  • When a Datastore is copied or accessed, the Datastore type (currently just "RD" or "H5" for hdf5, but be on the lookout for ViEIO version 2) can be changed. You can specify different Datastore types for each stage of a model if you like, or load a Datastore from another model with a different type and (in beta/principle) it "just works".
  • You can change the names of pretty much any of the standard files or directory locations. In the new version, model results go into a "Results" sub-directory, but you can set that back to "." to get the classic behavior. Such changes are managed in various places by files called "visioneval.cnf" that can be either in YAML (simple new style) or in JSON (like the old run_parameters.json file). See the visioneval.cnf.sample file in the root of the runtime folder for details (though there are currently a few more configurable parameters than appear there, mostly related to configuring scenarios).
  • A first cut at a Tableau prototype is included in the "tools" directory (which can be extended with new scripts if you like). The "visualizer.js" file that is created if you save out the visualizer setup should eventually form the basis for Tableau visualization - we'll come back to that.
  • The old VEScenario functionality, now (as of beta-release-0.4) including the R visualizer, is fully integrated into the framework. You can create model stages in a sub-directory of the model, either by explicitly putting them there (and remember you only have to write the parts that differ from the base scenario), or by using the category combination approach that VEScenario made available. Use installModel("VERSPM",var="scenario-ms") to see examples of setting up manual scenarios as model stages (visualizer may not yet be working here), and installModel("VERSPM",var="scenario-cat") (visualizer definitely works here) to see an example of setting up combination categories (as well as an example of how to comment out some categories so you can try out such a model in a finite amount of time without having to run 637 scenarios - though you could if you wanted to).
  • The old R visualizer is now integrated into the framework. Currently, you get there by running a visualizer-compatible query (which is really any query, except it will look nicer if you set some of the "Export" fields in the query specification; see the scenario sample models) and then doing query$visual(). By default it will launch a browser window, load the visualizer, and populate it with scenario categories and results. If you set SaveTo=NULL, it will dump a working set of files (like the old VEScenario visualizer) into a sub-directory of results/outputs. The walkthrough will get updated to demonstrate all of that. The upshot is that you will have full control over the input categories as well as what metrics appear in the output histograms just by configuring the model and its query (the visualizer HTML, Javascript and CSS need never be touched).