Add docstrings to some of the distributions #15

Open
wants to merge 5 commits into
base: master
5 changes: 5 additions & 0 deletions src/distributions/add_noise.jl
@@ -1,3 +1,8 @@
"""
noisy_value::Float64 ~ AddNoise(mean::Float64, std::Float64)

Adds normally-distributed random noise (with standard deviation `std`) to the value `mean`.
"""
struct AddNoise <: PCleanDistribution end

has_discrete_proposal(::AddNoise) = false
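
As a rough illustration of what the `AddNoise` docstring describes, the sketch below draws a noisy value and evaluates the matching normal log-density. The helper names `sample_add_noise` and `logpdf_add_noise` are hypothetical and are not part of PClean; this is only an assumption about the semantics implied by the docstring, not the package's implementation.

```julia
using Random

# Hypothetical helper (not PClean's API): mean + std * z, with z ~ Normal(0, 1).
sample_add_noise(rng::AbstractRNG, mean::Float64, std::Float64) =
    mean + std * randn(rng)

# Log-density of Normal(mean, std) evaluated at x, written out explicitly.
function logpdf_add_noise(x::Float64, mean::Float64, std::Float64)
    z = (x - mean) / std
    return -0.5 * z^2 - log(std) - 0.5 * log(2π)
end

rng = MersenneTwister(1)
noisy = sample_add_noise(rng, 10.0, 0.5)   # a value scattered around 10.0
lp = logpdf_add_noise(noisy, 10.0, 0.5)    # its log-density under the same parameters
```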
21 changes: 21 additions & 0 deletions src/distributions/add_typos.jl
@@ -1,5 +1,26 @@
import StringDistances: DamerauLevenshtein, evaluate

"""
word_with_typos::String ~ AddTypos(word::String, max_typos=nothing)
Contributor

I am slightly concerned that this style of documentation implies that we support this syntax, when in fact I don't believe PClean's parser can handle type annotations on the LHS of a ~. We could just fix this in PClean (and probably should, eventually)—or we could find another way to communicate the return type?


Add a random number of random typos to `word`.

The distribution on the number of typos added to a word depends on the word
length. On average there is approximately 1 typo for every 45 characters in the
input word when `max_typos` is large or not provided.

The typos can be one of several types:

- insertion: insert a random lower-case letter at a random location

- deletion: delete a random character

- substitution: replace a random character with a random lower-case letter

- transposition: swap a random pair of consecutive letters

NOTE: The log-density is approximate
"""
struct AddTypos <: PCleanDistribution end

has_discrete_proposal(::AddTypos) = false
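
The four typo types listed in the `AddTypos` docstring could be sketched as follows. `apply_random_typo` is a hypothetical helper written for illustration; choosing the typo type uniformly is an assumption, and the sketch does not model how the number of typos is chosen or how PClean computes its approximate log-density.

```julia
using Random

# Hypothetical helper (not PClean's API): apply exactly one random typo to `word`.
function apply_random_typo(rng::AbstractRNG, word::String)
    letters = 'a':'z'
    chars = collect(word)
    n = length(chars)

    # Deletion and substitution need at least one character; transposition needs two.
    kinds = [:insertion]
    n >= 1 && append!(kinds, [:deletion, :substitution])
    n >= 2 && push!(kinds, :transpose)
    kind = rand(rng, kinds)   # assumption: each applicable typo type is equally likely

    if kind == :insertion
        # Insert a random lower-case letter at a random location (including the end).
        insert!(chars, rand(rng, 1:n+1), rand(rng, letters))
    elseif kind == :deletion
        # Delete a random character.
        deleteat!(chars, rand(rng, 1:n))
    elseif kind == :substitution
        # Replace a random character with a random lower-case letter.
        chars[rand(rng, 1:n)] = rand(rng, letters)
    else
        # Swap a random pair of consecutive characters.
        i = rand(rng, 1:n-1)
        chars[i], chars[i+1] = chars[i+1], chars[i]
    end
    return String(chars)
end

rng = MersenneTwister(42)
typo = apply_random_typo(rng, "hello")
```

Each of these operations changes the Damerau-Levenshtein distance to the original word by at most one, which is presumably why the file imports `DamerauLevenshtein` from StringDistances.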
5 changes: 5 additions & 0 deletions src/distributions/maybe_swap.jl
@@ -1,3 +1,8 @@
"""
MaybeSwap(val, options, prob)

With probability `prob`, return a random element from `options`; otherwise, return `val`.
"""
struct MaybeSwap <: PCleanDistribution end

supports_explicitly_missing_observations(::MaybeSwap) = true
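
A minimal sketch of the sampling behaviour the `MaybeSwap` docstring describes is given below. `sample_maybe_swap` is a hypothetical name, and drawing the replacement uniformly from `options` is an assumption; the docstring says only "a random element".

```julia
using Random

# Hypothetical helper (not PClean's API): with probability `prob`, return a
# uniformly chosen element of `options`; otherwise return `val` unchanged.
function sample_maybe_swap(rng::AbstractRNG, val, options, prob::Real)
    return rand(rng) < prob ? rand(rng, options) : val
end

rng = MersenneTwister(7)
kept = sample_maybe_swap(rng, "Boston", ["Austin", "Denver"], 0.0)     # never swaps
swapped = sample_maybe_swap(rng, "Boston", ["Austin", "Denver"], 1.0)  # always swaps
```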
8 changes: 8 additions & 0 deletions src/distributions/string_prior.jl
@@ -1,5 +1,13 @@
using CSV

"""
str::String ~ StringPrior(min_length, max_length, proposal_atoms::Vector{String})

Sample a string of random length from a simple bigram model fit to English text.

The string length is uniformly distributed between `min_length` and `max_length` (inclusive).
The alphabet is the set {'a', 'b', .., 'z', ' ', '.'}.
"""
struct StringPrior <: PCleanDistribution end

letter_probs_file = joinpath(dirname(pathof(PClean)), "distributions", "lmparams", "letter_probabilities.csv")
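
To make the bigram description in the `StringPrior` docstring concrete, here is a sketch of sampling from a generic bigram model over the 28-symbol alphabet. The helper names are hypothetical, and the uniform `initial` and `transition` tables are placeholders: the fitted parameters live in the package's CSV files and are not reproduced here.

```julia
using Random

# Hypothetical helper: draw an index from a vector of probabilities.
function sample_categorical(rng::AbstractRNG, probs)
    u = rand(rng)
    cumulative = 0.0
    for (i, p) in enumerate(probs)
        cumulative += p
        u < cumulative && return i
    end
    return length(probs)   # guard against floating-point round-off
end

# Hypothetical helper (not PClean's API): sample a string from a bigram model.
function sample_bigram_string(rng::AbstractRNG, min_length::Int, max_length::Int,
                              initial::Vector{Float64}, transition::Matrix{Float64})
    alphabet = vcat(collect('a':'z'), [' ', '.'])   # the 28 symbols from the docstring
    n = rand(rng, min_length:max_length)            # uniform length, inclusive bounds
    n == 0 && return ""
    idx = sample_categorical(rng, initial)          # first character
    chars = [alphabet[idx]]
    for _ in 2:n
        # Each later character depends only on the previous one.
        idx = sample_categorical(rng, view(transition, idx, :))
        push!(chars, alphabet[idx])
    end
    return String(chars)
end

# Placeholder parameters (assumption): uniform tables instead of the fitted ones.
initial = fill(1 / 28, 28)
transition = fill(1 / 28, 28, 28)

rng = MersenneTwister(0)
s = sample_bigram_string(rng, 3, 8, initial, transition)
```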
7 changes: 7 additions & 0 deletions src/distributions/time_prior.jl
@@ -1,5 +1,12 @@
using CSV

"""
timestamp::String ~ TimePrior(proposal_atoms::Vector{String})

Return a random time stamp of the form `@sprintf("%d:%02d %s", hours, minutes, ampm)`.

The `hours`, `minutes` and `ampm` are drawn uniformly from {1, .., 12}, {0, .., 59}, and {"a.m.", "p.m."} respectively.
"""
struct TimePrior <: PCleanDistribution end

has_discrete_proposal(::TimePrior) = true
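
The `TimePrior` docstring can be read as the following sampling procedure. `sample_time` is a hypothetical helper, not PClean's API. Under this reading there are 12 × 60 × 2 = 1440 equally likely time stamps, so each has log-probability `-log(1440)`.

```julia
using Printf
using Random

# Hypothetical helper (not PClean's API): draw hours, minutes and the a.m./p.m.
# marker uniformly and format them as described in the docstring.
function sample_time(rng::AbstractRNG)
    hours = rand(rng, 1:12)
    minutes = rand(rng, 0:59)
    ampm = rand(rng, ["a.m.", "p.m."])
    return @sprintf("%d:%02d %s", hours, minutes, ampm)
end

rng = MersenneTwister(2024)
stamp = sample_time(rng)

# The format string pads minutes to two digits but leaves hours unpadded.
example = @sprintf("%d:%02d %s", 9, 5, "a.m.")   # "9:05 a.m."
```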