WIP: Ratcliff Diffusion Model pdf and simulators #29

kiante-fernandez · 2023-06-22T03:20:04Z

I have added a PDF file containing the Ratcliff Diffusion Model and two methods for simulating the diffusion process. However, the PDF code still needs testing as there are some remaining bugs. I wanted to share the progress so that we can identify the tests that need to be conducted and address any issues.

codecov-commenter · 2023-06-22T03:22:45Z

Codecov Report

Patch coverage has no change and project coverage change: -30.14 ⚠️

Comparison is base (4414161) 82.93% compared to head (7fd93cd) 52.80%.

Additional details and impacted files

@@             Coverage Diff             @@
##           master      #29       +/-   ##
===========================================
- Coverage   82.93%   52.80%   -30.14%     
===========================================
  Files          11       12        +1     
  Lines         545      856      +311     
===========================================
  Hits          452      452               
- Misses         93      404      +311

Impacted Files	Coverage Δ
src/DDM.jl	`82.68% <ø> (ø)`
src/RatcliffDDM.jl	`0.00% <0.00%> (ø)`
src/SequentialSamplingModels.jl	`100.00% <ø> (ø)`

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/RatcliffDDM.jl

itsdfish · 2023-06-22T09:50:46Z

src/RatcliffDDM.jl

+    end
+
+    if η == 0


I like the idea of suggesting typical parameter ranges in the documentation, but I don't think we should enforce bounds unless they are strict (e.g., a standard deviation must be non-negative). I think the best approach is to allow users to enforce their own parameter bounds. This will also be consistent with the approach taken for the other models in the package. .

Something that might be good to add is argument checks for parameters that cannot take certain values (e.g., negative values). Here is the best approach, which could be used throughout the package:

struct M{T<:Real} x::T y::T function M(x::T, y::T; check=true) where {T} check ? check_args(M, x, y) : nothing return new{T}(x, y) end end function M(x, y; check=true) return M(promote(x, y)...; check) end function M(; x, y, check=true) return M(x, y; check) end function check_args(m::Type{<:M}, x, y) if (x < 0) || (y < 0) error("x and y must be positive") end end m = M(1, 2) m = M(; y=1, x=2) m = M(; y=1, x=-2, check=false) m = M(3.0, 1) m = M(-2, 1) m = M(-2, 1.0)

The inner constructor is kind of ugly, but it gives us a lot of flexibility.

Okay. I will make a separate PR that contains an implementation for this across models.

src/RatcliffDDM.jl

itsdfish · 2023-06-22T09:59:29Z

@kiante-fernandez, thanks for the progress update. I realize this is a WIP, but I left a few comments that may still be applicable.

My approach to tests is to start by plotting the pdf over the histogram of simulated data. That is part of the reason I include those plots in the documentation. I also like to compare the pdf to a kernel density of the pdf. It's not perfect, but it gives you some idea of whether the generative model and the pdf are consistent with each other. I also test against established libraries when available. Another thing I do is develop a test whenever I find a bug to ensure there are no regressions in the future.

DominiqueMakowski · 2023-06-22T11:03:58Z

Could we also add (in the docstrings?) a brief mention about the difference between DDM and RatcliffDDM and maybe a cross-link ("See also DDM()")

itsdfish · 2023-06-22T12:16:40Z

Could we also add (in the docstrings?) a brief mention about the difference between DDM and RatcliffDDM and maybe a cross-link ("See also DDM()")

If we can get this model working, another option would be to integrate everything into one model given that DDM is currently a special case in which the cross-trial variability is zero. The integration procedures can be skipped in the special case. We could avoid breaking changes by using the type DDM for the general model.

…alSamplingModels.jl into ssmDev

kiante-fernandez · 2023-07-03T13:46:47Z

@itsdfish I have added some updated PDF along with the CDF code, but I am getting a precompile error that might be related to some bugs with Bijectors.

itsdfish · 2023-07-03T14:25:58Z

Thanks @kiante-fernandez for your hard work. It is difficult for me to look at a fork locally. Based on the unit tests, it seems like there is a variable called x that is not defined somewhere. It's not clear to me where its at based on the error trace. I recommend using Revise.jl and commenting out files or parts of files until you can locate the issue. If you continue to have issues, I can make a new branch and you can merge into that. I don't want to merge code with errors into main.

I'll leave a few comments for now and look through the code in more detail probably tomorrow.

src/RatcliffDDM.jl

.vscode/settings.json

…alSamplingModels.jl into ssmDev

itsdfish · 2023-12-18T21:58:34Z

@kiante-fernandez, I hope you are doing well. I wanted to follow up with you about the status of the PR and a related question about rand. I was hoping to use rand, but I am having trouble reproducing results with RT dists. I thought maybe I am doing something incorrectly, or I am misunderstanding the parameterization of the model. Here is what I have:

Julia

using SequentialSamplingModels
using SequentialSamplingModels: RatcliffDDM

dist = RatcliffDDM(ν = 1.0,α = 0.80,τ = 0.30,z = 0.25,η = 0.16,sz = 0.05, st = .10,σ = 1.0)
choice,rt = rand(dist, 10_000)

mean1 = mean(rt[choice .== 1])
prob1 = mean(choice .== 1)

mean2 = mean(rt[choice .== 2])
prob2 = mean(choice .== 2)

[mean1, mean2, prob1, prob2]

4-element Vector{Float64}:
 0.49065225546420305
 0.3867084522660908
 0.4116
 0.5884

R

library("rtdists")

# z = a * .25
sim_data = rdiffusion(100000, a=.8, v=1, t0=.30, z = .20, sz = .05, sv = .16,
                      st0 = .10, s = 1)

data_lower = sim_data$rt[sim_data$response == "lower"]
mean_lower = mean(data_lower)
prob_lower = mean(sim_data$response == "lower")

data_upper = sim_data$rt[sim_data$response == "upper"]
mean_upper = mean(data_upper)
prob_upper = mean(sim_data$response == "upper")
c(mean_upper,mean_lower, prob_upper, prob_lower)

0.5408594 0.4371890 0.4108300 0.5891700

The response probabilities are similar, but the rts are different. Any ideas?

kiante-fernandez · 2023-12-21T20:45:57Z

Thanks for following up. With the holiday here, I should have some time to take a look. Also, from the example you gave, you have set z to two different values (Julia: z = 0.25 and R: z = 0.20).

itsdfish · 2023-12-21T22:47:05Z

Thanks for taking a look.

You are right that the values for z are different. The documentation for rtdist was a bit unclear. However, the examples lead me to believe it is absolute, so the appropriate value is z = .80 * .25 = .20. I did try the absolute value, but the results diverged even more. I created a random walk version because it is easier to understand. Interestingly, the results between that and your code are similar. I will share that code in case it is helpful for determining a ground truth.

Anyways, have a good break!

function rand(dist::DDM; Δt=.001)
    (;ν,α,z,σ,τ,η,st,sz) = dist
    t = 0.0
    z′ = α * z 
    Δx = σ * √(Δt)
    x = sz == 0 ? z′ : rand(Uniform(z′ - sz / 2, z′ + sz / 2))
    ν′ = rand(Normal(ν, η))
    p = .50 * (1 + (ν′ * √Δt) / σ)
    while (x < α) && (x > 0)
        x += rand() ≤ p ? Δx : -Δx
        t += Δt
    end
    choice = (x < α) + 1
    t += st == 0 ?  τ : rand(Uniform(τ - st / 2, τ + st / 2))
    return choice,t
end

function rand(dist::DDM, n::Int; Δt=.001)
    choices = fill(0, n)
    rts = fill(0.0, n)
    for i ∈ 1:n 
        choices[i],rts[i] = rand(dist; Δt)
    end
    return (;choices,rts)
end

Update

After further investigation, I found that the culprit is st0 and st. When I set the variability parameters to zero, the results agreed. However, the only diverged when st0 and st > 0:

sim_data = rdiffusion(100000, a=.8, v=1, t0=.30, z = .20, sz = .0, sv = .00,
                      st0 = .2, s = 1)
dist = RatcliffDDM(;ν = 1.0,α = 0.80,τ = 0.3,z = 0.25,η = 0.00,sz = 0.0, st = .20,σ = 1.0)

The mean reaction time increased for rtdists, but stayed the same for SSM.jl. If I understand correctly, st0 should not change the mean reaction time. Is that your understanding too?

kiante-fernandez · 2023-12-26T01:05:19Z

@itsdfish That is my understanding as well. I tried to hunt down the sampling function rtdist uses. But I also confirmed this using a few different simulation functions across both R and Python.


# Load necessary libraries
library(ggplot2)
library(rtdists)
library(Rhddmjags)

# Function to simulate and get mean RT for a given st0 value
simulate_mean_rt <- function(st0, sample_method = 1) {
  if (sample_method == 1) {
    sim_data <- rdiffusion(100000, a = 0.8, v = 1, t0 = 0.30, z = 0.20, 
                           sz = 0.0, sv = 0.0, st0 = st0, s = 1)
    mean_rt <- mean(sim_data$rt)
  } else if (sample_method == 2) {
    sim_data <- Rhddmjags::simulratcliff(100000, Alpha = 0.8, Nu = 1, Tau = 0.30, Beta = 0.20, 
                                         rangeBeta = 0.0, Eta = 0.0, rangeTau = st0, Varsigma = 1)
    mean_rt <- mean(abs(sim_data))
  } else {
    sim_data <- Rhddmjags::simul_ratcliff_slow(100000, Alpha = 0.8, Nu = 1, Tau = 0.30, Beta = 0.20, 
                                               rangeBeta = 0.0, Eta = 0.0, rangeTau = st0, Varsigma = 1, nsteps = 1000)
    mean_rt <- mean(abs(sim_data))
  }

  return(mean_rt)
}

st0_values <- seq(0, 0.7, by = 0.05)
# Define a range of st0 values and store results
results1 <- data.frame(st0 = st0_values, mean_rt = sapply(st0_values, simulate_mean_rt))
results2 <- data.frame(st0 = st0_values, mean_rt = sapply(st0_values, simulate_mean_rt, sample_method = 2))
results3 <- data.frame(st0 = st0_values, mean_rt = sapply(st0_values, simulate_mean_rt, sample_method = 3))

results1$method <- "fast_dm"
results2$method <- "rejection_rhddmjags"
results3$method <- "euler_rhddmjags"

results <- rbind(results1, results2, results3)

# Plot using ggplot2
ggplot(results, aes(x = st0, y = mean_rt, group = method, color = method)) +
  geom_line() +
  geom_point() +
  labs(title = "Mean Response Time by st0", 
       x = "st0 Value", 
       y = "Mean RT") +
  theme_minimal()

Here fast_dm is the rdiffusion function

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import hssm
from ssms.basic_simulators.simulator import simulator

v_true, a_true, z_true, t_true, sz_true, sv_true, st_true, s  = [1, 0.8, 0.2, 0.3, 0, 0, 0, 1]

# Define the range of st_true values
st_true_values = np.linspace(0, 1, 10)

# Initialize lists to store the mean response times and st_true values
mean_rts = []
st_values = []

# Loop over the range of st_true values
for st_true in st_true_values:
    # Simulate data
    sim_out = simulator([v_true, a_true, z_true, t_true, sz_true, sv_true, st_true, s], model="full_ddm", n_samples=10000)
    
    # Turn data into a pandas dataframe
    dataset_plots = pd.DataFrame(
        np.column_stack([sim_out["rts"][:, 0], sim_out["choices"][:, 0]]),
        columns=["rt", "response"],
    )
    
    # Round the response times to 4 decimal places
    dataset_plots['rt'] = dataset_plots['rt'].round(4)
    
    # Calculate the mean response time and store it
    mean_rt = dataset_plots['rt'].mean()
    mean_rts.append(mean_rt)
    
    # Store the st_true value
    st_values.append(st_true)

# Plot the mean response times by st_true values
plt.plot(st_values, mean_rts)
plt.xlabel('st_true')
plt.ylabel('Mean response time')

# Make the y-axis larger
plt.ylim(0, max(mean_rts) * 1.1)

plt.show()

So at first glance, we might have found something wrong with rdiffusion?

Also, just so I am clear on the PR, I am still writing DDM.jl to replace RatcliffDDM.jl. So, when we have non-zero values of the across-trial parameters, it just calls those functions (the RatcliffDDM) instead?

itsdfish · 2023-12-26T16:25:30Z

@kiante-fernandez, thanks for looking through the issue above. It does appear to be a bug. I will submit an issue later today or tomorrow. I think the code above can be repurposed for unit tests at some point. It would be good to include a few tests to make sure changes in parameters have the expected effect (or lack of effect). It goes to show that it is easy to make a simple error. I would be happy to add those tests if it would help.

Yeah. I think it would be good to remove the DDM code and use the RatcliffDDM code because it subsumes the DDM code. There are probably different ways to approach it. Perhaps the simplest way is to use conditional checks to skip calculations needed for st, eta and sz if the value is equal to zero. For example, if you perform an expensive operation (e.g., a convolution) in the RatcliffDDM code, we can potentially skip it if eta = 0. Of course, you will probably have a better idea of what approach is the best in terms readability, efficiency, and maintainability. But I think this will be simpler overall because DDM is a special case where st, eta,sz = 0 in RatcliffDDM. Once all of that is done, I recommend switching the name from RatcliffDDM to DDM, which will prevent a breaking change (e.g., we are just making DDM more general). Let me know if you have any questions or concerns.

Update

I looked through the documentation for rtdists again and noticed I may have misunderstood the parameterization. Here is what it says

"inter-trial-variability of non-decisional components. Range of a uniform distribution with mean t0 + st0/2 describing the distribution of actual t0 values across trials. Accounts for response times below t0."

In contrast to sz, st0/2 is added to mean non-decision time. That seems a bit unusual to me, but is consistent with the observed behavior. The only thing that does not make sense to me is "Accounts for response times below t0". Maybe that is not important. Anyways, sorry for the oversight. I thought I read something else.

kiante-fernandez · 2023-12-27T02:29:02Z

@itsdfish I have finally had a chance to take a look through the new codebase; great work on organizing things! Following the conventions, I added the AbstractDDM and moved everything we needed to DDM.jl. I am still failing some of the tests I set up, but I thought I would ping for feedback while I have time to have eyes on this for a few days.

.vscode/settings.json

src/DDM.jl

itsdfish · 2023-12-27T20:48:23Z

@itsdfish I have finally had a chance to take a look through the new codebase; great work on organizing things! Following the conventions, I added the AbstractDDM and moved everything we needed to DDM.jl. I am still failing some of the tests I set up, but I thought I would ping for feedback while I have time to have eyes on this for a few days.

@kiante-fernandez, thanks! I think we are converging on a good system and API.

I made a few comments throughout. Everything looks good for the most part. I think the primary item is getting the unit tests to pass and adding more if needed.

kiante-fernandez · 2024-01-01T18:54:41Z

@itsdfish I have finally had a chance to take a look through the new codebase; great work on organizing things! Following the conventions, I added the AbstractDDM and moved everything we needed to DDM.jl. I am still failing some of the tests I set up, but I thought I would ping for feedback while I have time to have eyes on this for a few days.

@kiante-fernandez, thanks! I think we are converging on a good system and API.

I made a few comments throughout. Everything looks good for the most part. I think the primary item is getting the unit tests to pass and adding more if needed.

@itsdfish Thank you for the comments. I have been working to resolve some issues, but I have not yet identified what is wrong with the implementation. I thought it might be due to the numerical integration implementation (inspired by [https://github.com/hddm-devs/hddm/blob/master/src/integrate.pxi]), though I have not pinned it down yet. That is just my sense from some of the test results so far.

github-actions · 2024-01-01T18:57:20Z

Benchmark Results

	master	`8515612`...	t[master]/t[`8515612`...]
logpdf/("SequentialSamplingModels.DDM", 10)	1.67 ± 0.11 μs	14.5 ± 3.5 μs	0.115
logpdf/("SequentialSamplingModels.DDM", 100)	17.2 ± 1.2 μs	0.136 ± 0.0071 ms	0.126
logpdf/("SequentialSamplingModels.LBA", 10)	2.45 ± 0.18 μs	2.45 ± 0.18 μs	1
logpdf/("SequentialSamplingModels.LBA", 100)	23.5 ± 0.63 μs	23.5 ± 0.66 μs	0.999
logpdf/("SequentialSamplingModels.LNR", 10)	1.01 ± 0.17 μs	1 ± 0.15 μs	1.01
logpdf/("SequentialSamplingModels.LNR", 100)	8.54 ± 0.29 μs	8.56 ± 0.28 μs	0.998
logpdf/("SequentialSamplingModels.RDM", 10)	2.61 ± 0.25 μs	2.54 ± 0.19 μs	1.03
logpdf/("SequentialSamplingModels.RDM", 100)	24.9 ± 0.71 μs	24.9 ± 0.73 μs	1
logpdf/("SequentialSamplingModels.Wald", 10)	0.227 ± 0.16 μs	0.227 ± 0.17 μs	1
logpdf/("SequentialSamplingModels.Wald", 100)	2 ± 0.17 μs	2 ± 0.18 μs	1
logpdf/("SequentialSamplingModels.WaldMixture", 10)	1.09 ± 0.16 μs	1.09 ± 0.16 μs	0.999
logpdf/("SequentialSamplingModels.WaldMixture", 100)	10.6 ± 0.16 μs	10.6 ± 0.17 μs	1
rand/("SequentialSamplingModels.DDM", 10)	3.92 ± 0.54 μs	9.39 ± 1.3 μs	0.418
rand/("SequentialSamplingModels.DDM", 100)	0.0383 ± 0.0019 ms	0.088 ± 0.0058 ms	0.435
rand/("SequentialSamplingModels.LBA", 10)	3.21 ± 1.3 μs	3.23 ± 0.39 μs	0.993
rand/("SequentialSamplingModels.LBA", 100)	16.5 ± 0.63 μs	16.5 ± 1 μs	1
rand/("SequentialSamplingModels.LCA", 10)	0.769 ± 0.24 ms	0.77 ± 0.24 ms	0.999
rand/("SequentialSamplingModels.LCA", 100)	8.24 ± 0.24 ms	8.3 ± 0.26 ms	0.992
rand/("SequentialSamplingModels.LNR", 10)	1.06 ± 0.18 μs	1.09 ± 0.27 μs	0.972
rand/("SequentialSamplingModels.LNR", 100)	7.34 ± 3.8 μs	7.59 ± 3.8 μs	0.967
rand/("SequentialSamplingModels.RDM", 10)	1.29 ± 0.18 μs	1.33 ± 0.2 μs	0.971
rand/("SequentialSamplingModels.RDM", 100)	10.7 ± 3.8 μs	10.8 ± 3.9 μs	0.998
rand/("SequentialSamplingModels.Wald", 10)	0.46 ± 0.16 μs	0.464 ± 0.16 μs	0.991
rand/("SequentialSamplingModels.Wald", 100)	2.89 ± 0.21 μs	2.91 ± 0.2 μs	0.991
rand/("SequentialSamplingModels.WaldMixture", 10)	1.2 ± 0.15 μs	1.19 ± 0.15 μs	1
rand/("SequentialSamplingModels.WaldMixture", 100)	11.4 ± 0.17 μs	11.4 ± 0.19 μs	0.997
simulate/SequentialSamplingModels.DDM	3.7 ± 1.5 μs	3.32 ± 1.4 μs	1.12
simulate/SequentialSamplingModels.LBA	3.8 ± 0.5 μs	3.88 ± 1.9 μs	0.979
simulate/SequentialSamplingModels.LCA	0.102 ± 0.021 ms	0.103 ± 0.023 ms	0.989
simulate/SequentialSamplingModels.RDM	0.0693 ± 0.03 ms	0.0674 ± 0.025 ms	1.03
simulate/SequentialSamplingModels.Wald	9.4 ± 4.7 μs	9.23 ± 4.8 μs	1.02
simulate/SequentialSamplingModels.WaldMixture	4.05 ± 1.6 μs	4.02 ± 1.6 μs	1.01
time_to_load	3.45 ± 0.0097 s	3.45 ± 0.013 s	0.999

Benchmark Plots

A plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

DominiqueMakowski · 2024-03-08T18:00:17Z

Should we aim at merging this before @kiante-fernandez's talk ^^ ?

itsdfish · 2024-07-12T21:41:33Z

I'll close this now that it has been moved to #90

kiante-fernandez added 2 commits June 21, 2023 16:26

pdf for Ratcliff DDM first pass

cd364ee

added rand simulators for Ratcliff Diffusion Model

342f6bb

itsdfish reviewed Jun 22, 2023

View reviewed changes

src/RatcliffDDM.jl Outdated Show resolved Hide resolved

itsdfish reviewed Jun 22, 2023

View reviewed changes

src/RatcliffDDM.jl Outdated Show resolved Hide resolved

itsdfish changed the title ~~Ratcliff Diffusion Model pdf and simulators~~ WIP: Ratcliff Diffusion Model pdf and simulators Jun 29, 2023

kiante-fernandez and others added 8 commits July 1, 2023 18:56

Merge branch 'master' into ssmDev

2a33d5f

Merge branch 'itsdfish:master' into ssmDev

217bc6e

Merge branch 'ssmDev' of https://github.com/kiante-fernandez/Sequenti…

b8f998f

…alSamplingModels.jl into ssmDev

Merge branch 'itsdfish:master' into ssmDev

9339d77

changes to Ratcliff to for SSM format

b5cca7d

cdf fast dm implementation

244a24d

Merge branch 'itsdfish:master' into ssmDev

0bb7df6

_pdf change

8014ab0

itsdfish reviewed Jul 3, 2023

View reviewed changes

src/RatcliffDDM.jl Outdated Show resolved Hide resolved

itsdfish reviewed Jul 3, 2023

View reviewed changes

src/RatcliffDDM.jl Outdated Show resolved Hide resolved

kiante-fernandez added 4 commits July 3, 2023 18:51

cdf code must be redone. Wrong in a few ways

c8f1c9d

added non license breaking implementation of cdf

dcccdf6

bugs

1560760

minor bugs

1e7669c

itsdfish reviewed Jul 4, 2023

View reviewed changes

.vscode/settings.json Outdated Show resolved Hide resolved

kiante-fernandez added 3 commits July 4, 2023 19:43

rm vscode settings

dc8cc16

settings change

6aa64a2

rm vscode settings

1e6f316

kiante-fernandez added 4 commits October 21, 2023 19:01

merged Ratcliff docs

f7f29ad

make simulate use across trial parameters

a025829

working on test changes

538dbad

Merge branch 'ssmDev' of https://github.com/kiante-fernandez/Sequenti…

b09c9ef

…alSamplingModels.jl into ssmDev

kiante-fernandez added 2 commits December 25, 2023 17:18

Merge remote-tracking branch 'upstream/master' into ssmDev

57d48b7

Add random number generator to DDM simulation

c530353

Add AbstractDDM and remove/port RatcliffDDM

bd36ea8

kiante-fernandez requested a review from itsdfish December 27, 2023 02:26