-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trial sharing simulations across tests #1324
base: master
Are you sure you want to change the base?
Conversation
Performance Summary
For the implementation on this branch, durations <0.01s are not listed, but are broken down into So for example, a "post" time of Test names have been truncated so that the table formats properly, see the list below for full names (in row order).
Conclusions
As such, there appears to be a net time save to be obtained from sharing simulations. Full test names
|
3501a70
to
7230953
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @willGraham01 for your work on this, there are some interesting ideas here!
I've added some initial thoughts and comments. Overall I think the idea of providing standardized infrastructure for allowing sharing simulations across tests safely when the test functions involves manipulating the simulation object is nice. If we could avoid requiring refactoring test functions into test classes I think that would be good, as we generally have not used test classes, so this would be both quite a lot code refactoring to apply to existing tests and also make it less familiar for existing users when adding new tests.
Also as our biggest test computational burden are the GitHub Actions runs which run a matrix with one test module per runner, a session scoped shared simulation fixture will in reality have the same computational burden as a module scoped one. This to some degree limits how much computational gain we could get from this sort of approach, but also means we don't necessarily need to find a 'lowest common denominator' shared simulation across multiple modules, but can instead just look at sharing simulations within a single module, which presumably should be easier to manage in terms of specifying the Module
subclasses to register, population size etc. appropriately. In this case having standard fixtures to allow safely updating properties of a shared_sim
fixture within a test function without persisting those changes would still be useful.
@pytest.fixture(scope="session") | ||
def jan_1st_2010() -> Date: | ||
return Date(2010, 1, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly naming start_date
or similar would make a bit more descriptive (and also make more sense if we later changed default start date).
old_param_values = dict(self.this_module.parameters) | ||
yield | ||
self.this_module.parameters = dict(old_param_values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is dict
being used here to force a copy? If so using the copy
method would perhaps be more explicit. Also is there any need for calling dict
with old_param_values
as presumably we can just restore the original (copy)?
sim.register( | ||
demography.Demography(resourcefilepath=resource_filepath), | ||
diarrhoea.Diarrhoea(resourcefilepath=resource_filepath, do_checks=True), | ||
diarrhoea.DiarrhoeaPropertiesOfOtherModules(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using this module with stunting.Stunting
may be problematic as both define some of the same properties ( diarrhoea.DiarrhoeaPropertiesOfOtherModules
is intended as a stand-in for test purposes for a number of modules, including Alri
, Epi
, Hiv
, Stunting
and Wasting
).
@pytest.fixture(scope="function") | ||
def changes_module_properties(self): | ||
""" | ||
The Diarrhoea.models property takes a copy of the | ||
module's parameters on initialisation, so is not | ||
affected by our temporary parameter change from the | ||
base class. | ||
|
||
Overwriting the base fixture to remedy this. | ||
""" | ||
old_param_values = dict(self.this_module.parameters) | ||
yield | ||
self.this_module.parameters = dict(old_param_values) | ||
self.this_module.models.p = self.this_module.parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pytest.fixture(scope="function") | |
def changes_module_properties(self): | |
""" | |
The Diarrhoea.models property takes a copy of the | |
module's parameters on initialisation, so is not | |
affected by our temporary parameter change from the | |
base class. | |
Overwriting the base fixture to remedy this. | |
""" | |
old_param_values = dict(self.this_module.parameters) | |
yield | |
self.this_module.parameters = dict(old_param_values) | |
self.this_module.models.p = self.this_module.parameters |
Don't think this is necessary, though maybe I'm wrong if this was specifically added because you found things weren't working as expected. tlo.diarrhoea.Models
has an attribute p
which points to the same dict
instance as the parameters
attribute of the tlo.diarrhoea.Diarrhoea
instance from which it is initialised - as far as I can see there is no copy involved.
return sim | ||
|
||
|
||
class _BaseSharedSim: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of using fixtures to safely mediate updates to the shared simulation fixture is cool, I hadn't come across the use of yield
in fixtures to allow setup / teardown before.
What I'm less sure of is if we need the _BaseSharedSim
class and corresponding subclasses? We could do something similar with just test functions, for example
@pytest.fixture(scope="session")
def start_date() -> Date:
return Date(2010, 1, 1)
@pytest.fixture(scope="session")
def resource_filepath() -> Path:
return (Path(os.path.dirname(__file__)) / "../resources").resolve()
@pytest.fixture(scope="session")
def shared_sim(seed, start_date, resource_filepath):
...
@pytest.fixture
def clear_shared_sim_hsi_event_queue(shared_sim):
cached_queue = shared_sim_healthsystem.HSI_EVENT_QUEUE.copy()
shared_sim.modules["HealthSystem"].reset_queue()
yield
shared_sim.modules["HealthSystem"].HSI_EVENT_QUEUE = cached_queue
@pytest.fixture
def changes_shared_sim_event_queue(shared_sim):
cached_event_queue = shared_sim.event_queue.copy()
yield
shared_sim.event_queue = cached_event_queue
@pytest.fixture
def changes_shared_sim_module_parameters(shared_sim, changed_module_name):
cached_parameter_values = shared_sim.modules[changed_module_name].parameters.copy()
yield
shared_sim.modules[changed_module_name] = cached_parameter_values
@pytest.fixture
def changes_shared_sim_person_properties(shared_sim, changed_person_id):
cached_values = shared_sim_df.loc[changed_person_id].copy()
yield
shared_sim_df.loc[changed_person_id] = cached_values
This assumes a changed_module_name
/ changed_person_id
fixture would be defined in the scope of the test function using the changes_shared_sim_module_parameters
/ changes_shared_sim_person_properties
fixture, which would work when we want to use the same module / person ID values across a test module (or test class). If we wanted to also be able to set these at a per test function level we could use the ability to use marks to pass data to fixtures and have defaults for the fixtures like
@pytest.fixture
def changed_module_name(request):
return request.node.get_closest_marker("changed_module_name").args[0]
@pytest.fixture
def changed_person_id(request):
return request.node.get_closest_marker("changed_person_id").args[0]
with an example test function usage then something like
@pytest.mark.changed_module_name("Diarrhoea")
@pytest.mark.changed_person_id(0)
def test_do_when_presentation_with_diarrhoea_severe_dehydration(
shared_sim,
changed_person_id,
changes_shared_sim_module_parameters,
changes_shared_sim_person_properties,
clear_shared_sim_hsi_event_queue,
):
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if we do keep the base class approach, I would say we don't want to use an underscrore prefixed name as it is intended for use outside this module?
Related to #1144 | Leaving as a draft for now since this is mainly a proof of concept.
This is a proof of concept for how we might decide to share simulation objects across tests, with a view to speeding up the execution of the test suite.
This PR creates the
_BaseSharedSim
class, whose purpose is to be used as a base class for the module-level tests. We also create the session-scopedshared_small_sim
fixture, which is a simulation object that is created and run once right at the start of the test session._BaseSharedSim
then defines anautouse
fixture_attach_to_shared_sim
that allows any tests within the class (and any tests in derived classes) to access the shared simulation object.As a proof of concept, this idea has been implemented in two modules (
Diarrhoea
andStunting
) for the "0-day-simulations" that appear semi-frequently across the test suite: simulations that have the same start date as end date, and a particularly low initial population (100). Two new test classes,TestStunting_SharedSim
andTestDiarrhoea_SharedSim
are defined, which inherit from the base class.Complications due to simulation sharing
There are a number of complications regarding sharing simulation objects.
The most obvious is that the shared simulation object has to register all the modules (and their dependencies) that then wish to hook into this shared simulation to run tests. This isn't particularly problematic beyond remembering to adjust the dependencies if a new module is created and wishes to hook into the shared object.
Multiple Persistent Simulation Objects
If we want several different persistent simulation objects, we will need a
_BaseSharedSim
class for each one (this is due to how pytest handles setup and teardown fixtures). The current implementation has been designed with this possible generalisation in mind; everything except the_attach_to_shared_sim
method just needs to be moved into a further abstract class - say__AbstractSharedSim
which is then derived from:Why is there a need for the
_BaseSharedSim
class at all given thesmall_shared_sim
fixture is session-scoped?Technically there isn't, but:
pytest
internals are a mystery and this was the only way I could get this to work.Changing Simulation Properties / State
A lot of tests need a simulation object for the sole reason that they want to check if a particular event is scheduled under certain conditions.
Setting up these conditions can involve a number of changes to the simulation object, including:
As such, it is necessary for us to have "setup/teardown" fixtures that allow us to restore any potential edits to the shared object state. This is cheaper than recreating the simulation object from scratch, but still introduces some overhead. It does have the effect of forcing each test to highlight which aspects of the simulation object it expects to change though, which is quite useful from a debugging perspective.
These safe "setup/teardown" fixtures are defined in the
_BaseSharedSim
class so that they can be inherited by module-level test classes.