[WIP] Use an epsilon independent of dt when comparing times #1057
base: master
Conversation
Will look in more detail later, but a quick comment. This solution makes me feel a bit uncomfortable because it's hard-coding a relevant time scale. What about, as an alternative, taking 0.01% of the minimum dt across all clocks in the Network?
I thought so before as well, but I now wonder whether this isn't rather a feature than a bug... The problem I'm seeing with your alternative is that it potentially makes the behaviour of one object depend on the presence of another object. Say I have a few things running on a very slow clock and a few others on a fast clock (i.e. the situation that triggered #1054). With the "minimum dt"-based solution, you'd again trigger the bug if you removed/deactivated the fast objects, e.g. for debugging. I find this more difficult to explain than "you have to manually change …". But then, all this is about extreme corner cases, and I'm happy to think this over a bit more before merging.
That's a fair point, but I'm still uncomfortable with the global time scale. The reason for having an epsilon relative to dt is that it relates more or less to floating point equality tests. The choice of epsilon also puts an upper bound on how large you can make the duration. With dt = 0.1ms we want to convert times into 32-bit signed ints if I remember correctly, which means our max duration is about 60 hours, whereas if we have a hardcoded epsilon based essentially on a dt of 0.01ms, then that drops to 6 hours. Admittedly, not that many people run simulations for that much simulated time, but some do, I guess! Can you describe the origin of the bug in #1054 in more detail? Maybe there's another solution here.
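A quick back-of-the-envelope check of the duration figures above (plain Python arithmetic, not Brian code):

```python
# Maximum simulated duration if times are stored as 32-bit signed
# integer timesteps, for the default dt of 0.1 ms.
INT32_MAX = 2**31 - 1

dt = 0.1e-3  # 0.1 ms, in seconds
max_hours = INT32_MAX * dt / 3600
print(round(max_hours, 1))       # about 59.7 hours, i.e. the "60 hours" above
print(round(max_hours / 10, 1))  # about 6.0 hours with a tenfold smaller dt
```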
I don't think this is the case. The epsilon is only relevant when comparing two times; we don't use it to convert anything into integers (also: why 0.01ms? In the current PR, epsilon is 10ns = 0.00001ms). I think in practice there are only differences to a solution based on …
The details are actually quite tricky because our system handling the clocks is quite convoluted. I'd say the main issue is the following: if the dt is 100000s (as in #1054), and we request a run of 1s, then the end timestep will be calculated to be 0, because it is "good enough" with respect to the requested time. This is the main issue; later in the run process, similar issues will then make the simulation not run at all or run things too often. Just to make sure there is no misunderstanding, your proposed solution would definitely fix the bug; I'm just worried about introducing hard-to-debug issues where the behaviour of one object depends on the presence of another object.
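A minimal sketch (not Brian's actual implementation) of how a dt-relative tolerance can round a 1s run down to zero timesteps when dt is 100000s:

```python
import math

def n_timesteps(run_time, dt, eps_rel=1e-4):
    """Hypothetical helper: number of timesteps for a run, treating run
    times within 0.01% of dt of a step boundary as falling exactly on it."""
    steps = run_time / dt
    nearest = round(steps)
    # eps_rel is in units of steps, i.e. a tolerance of eps_rel * dt in time
    if abs(steps - nearest) < eps_rel:
        return nearest
    return math.ceil(steps)

print(n_timesteps(1.0, 0.0001))    # 10000 steps for dt = 0.1 ms, as expected
print(n_timesteps(1.0, 100000.0))  # 0 steps: the 1 s run "disappears" (#1054)
```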
Thank you for addressing this so quickly! I cloned the branch and can confirm that this also fixes the behaviour in my full network. In the meantime, however, I discovered a new and related(?) problem that does not seem to be fixed by the suggested changes. Even when the …
@felix11h uh, so this is indeed not completely fixed, thanks for making us aware! I'm away the coming week, but I'll have a closer look at this when I'm back.
Also cast integers to 64 bit instead of 32 bit
I had another look at this; the latest issue that @felix11h mentioned was actually unrelated. When you set …
Can you hack the clock to jump forward to time step 2^31-5 or something like that?
Yes, I think that should work, I'll give it a try.
It is a little bit ugly (I had to use …). All tests pass on my machine. @felix11h, can you give it another try as well? Of course this still leaves open the question of whether the fix I implemented (a fixed epsilon instead of a relative epsilon) is a good one. I also just realized that the user can change the epsilon rather easily for runtime (by setting …
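The wrap-around that a time step near 2^31 would trigger can be illustrated with plain two's-complement arithmetic (illustrative only, not Brian code):

```python
def wrap_int32(x):
    """Interpret x modulo 2**32 as a signed 32-bit integer,
    the way a C int32 timestep counter would overflow."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

t = 2**31 - 5            # a timestep counter near the 32-bit limit
print(wrap_int32(t + 10))  # -2147483643: the counter wrapped to a negative
                           # value, which is why casting to 64 bit matters
```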
@thesamovar What do you think? Do you still prefer your solution of using the smallest …
I think I still slightly prefer my solution because it doesn't force a natural timescale, so maybe we should get comments from others. @romainbrette what do you think? Issue summarised in brief below so that you don't need to read the whole thread (Marcel, feel free to correct me if I've misrepresented it).

We have a bug where objects with a very large value of dt don't work correctly, because the code for checking whether two things happen in the same timestep uses a tolerance relative to dt (I think 0.01% of dt?) in order to work with floating point values, and having two objects with very different dts introduces issues (because what seems small to one object can be large to the other one). Large dt does make sense for some objects, e.g. a network_operation that only runs infrequently.

Marcel proposes a fixed global epsilon that is used for these "same timestep" calculations, which can be changed. My alternative was using the minimum value of dt across all objects in the Network, because this is time scale invariant. Marcel's problem with this is that it introduces a potentially hard-to-debug situation where the behaviour of one object depends on the inclusion or not of a potentially unrelated object. I can see the point here, but I also feel that a dependence on a (to the user) unknown global epsilon could also be confusing. Any thoughts?
Would be happy to hear @felix11h's opinion as well!
Hello everyone, I read the whole thread, and I think I understood the issue we're facing. I too feel that hardcoding the epsilon and forcing the user to set a manual epsilon value for the special case is not the best choice, but it solves the issue for almost all reasonable values. Last week in class, my professor said that comparing two numbers with a fixed tolerance epsilon can give misleading results (e.g. abs(0.0000000000012 - 0.0000000000013) < 0.000000001 will give unwanted output). To overcome this problem we can use abs(a-b) < epsilon*abs(a) && abs(a-b) < epsilon*abs(b) instead of abs(a-b) < epsilon. I hope this can avoid the manual setting. Regarding issue #1054, I guess that when we compare two time differences, viewing them at their relative scale would be effective and reduce rounding issues. So would using "less than (0.01% of dt)*abs(dt) away from the requested run time" instead of "0.01% of dt" solve the problem? The bottom line, I feel, is that comparing, taking differences, and rounding should be done at the scale of the variables considered rather than at a fixed global scale.
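The two-sided relative test suggested above can be sketched in plain Python (Python's math.isclose implements a similar symmetric relative tolerance):

```python
def times_close(a, b, eps=1e-4):
    """Relative comparison on both operands, as suggested above."""
    return abs(a - b) < eps * abs(a) and abs(a - b) < eps * abs(b)

# For the tiny values in the example, a fixed epsilon of 1e-9 would
# wrongly call these "equal"; the relative test does not:
a, b = 1.2e-12, 1.3e-12
print(abs(a - b) < 1e-9)  # True (the unwanted output)
print(times_close(a, b))  # False
```

One caveat of a purely relative test: when either operand is exactly 0 (e.g. comparing against the start time of a run), both bounds collapse to zero and nothing compares equal, which is one reason an absolute epsilon can still be attractive for times.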
@mstimberg Thank you for your updates on this issue! I tested the branch with your changes and my full simulation now runs without issue (of course it will take a while to complete). I don't believe I can adequately comment on the best implementation. From the user standpoint, however, I am not worried about the hardcoded 10*ns, especially since (if I understand you correctly) this value could be set by the user if needed.
I removed the …
I'm not super happy with this fix, but I did not see any other obvious solution.
As a bit of context: in Brian, we are making our life hard by being very flexible on everything related to clocks. Users can use multiple clocks with varying dt's (which do not have to be multiples of each other), dt's can change between runs, and runs do not need to express times in multiples of dt (e.g. with the default dt of 0.1ms, you can run a network twice for 0.05ms and it will work fine, running everything for a single time step). Since this entails comparing floating point time values (instead of just integer time steps), we have to deal with floating point inaccuracies and cannot simply do exact comparisons to figure out whether two clocks with differing dt's should be executed within the same time step, etc.
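A minimal illustration (plain Python, not Brian code) of why such floating point time comparisons need a tolerance at all:

```python
dt = 0.1  # in ms
t = 0.0
for _ in range(3):
    t += dt  # advance the clock three steps

print(t == 0.3)                  # False: accumulated rounding error
print(abs(t - 0.3) < 1e-4 * dt)  # True with a 0.01%-of-dt tolerance
```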
Our previous solution was to consider two times as being equal iff they differ by less than 0.01% of dt. The problem is that 0.01% of dt can be a meaningful amount of time if dt is very long. In #1054, a run_regularly operation had a very long dt, and a run had a runtime of less than 0.01% of this dt before doing a long run. From the point of view of the run_regularly operation, the start and end time of the short run were identical, leading to weird behaviour (which differed between standalone and runtime...).

The fix I went for is to simply use an absolute difference when comparing times, i.e. 10ns (which is 0.01% of the default dt). This has the advantage of being easy to explain and should avoid issues when having big differences in dt. It is still far away from reasonably chosen dts, and if someone wants to simulate stuff in the nanosecond range for some reason, they could always change Clock.epsilon_dt manually.

Fixes #1054 (@felix11h, I tested with the code you provided, but please give it a go with your actual code as well)
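The change can be sketched as follows (illustrative plain Python, not Brian's actual internals):

```python
EPSILON = 10e-9  # 10 ns in seconds: 0.01% of the default dt of 0.1 ms

def same_time(t1, t2, epsilon=EPSILON):
    """Absolute comparison, independent of any clock's dt."""
    return abs(t1 - t2) < epsilon

# With dt = 100000 s, the old dt-relative tolerance (0.01% of dt = 10 s)
# considered a 1 s run to start and end "at the same time":
dt = 100000.0
print(abs(1.0 - 0.0) < 1e-4 * dt)  # True: the behaviour behind #1054
print(same_time(1.0, 0.0))         # False with the absolute epsilon
```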