Add docstrings to some of the distributions #15

Open
wants to merge 5 commits into
base: master
Changes from 3 commits
21 changes: 21 additions & 0 deletions src/distributions/add_typos.jl
@@ -1,5 +1,26 @@
import StringDistances: DamerauLevenshtein, evaluate

"""
word_with_typos::String ~ AddTypos(word::String, max_typos=nothing)
Review comment (Contributor): I am slightly concerned that this style of documentation implies that we support this syntax, when in fact I don't believe PClean's parser can handle type annotations on the LHS of a ~. We could just fix this in PClean (and probably should, eventually), or we could find another way to communicate the return type?


Add a random number of random typos to the given string.

The distribution over the number of typos added to a word depends on the word
length. On average there is approximately 1 typo for every 45 characters in the
input word when max_typos is large or not provided.

The typos can be one of several types:

- insertion: insert a random lower-case letter at a random location

- deletion: delete a random character

- substitution: replace a random character with a random lower-case letter

- transpose: swap a random pair of two consecutive letters

NOTE: The log-density is approximate.
"""
struct AddTypos <: PCleanDistribution end

has_discrete_proposal(::AddTypos) = false
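
The four typo kinds listed in the docstring can be illustrated with a small standalone sketch. This is not PClean's implementation: `apply_random_typo` is a hypothetical helper, and choosing uniformly among the applicable typo kinds is an assumption made only for illustration. It applies exactly one typo and does not model how many typos are added.

```julia
using Random

# Hypothetical helper (not part of PClean): apply one random typo to `word`.
# Assumption: the typo kind is chosen uniformly among the kinds that apply.
function apply_random_typo(rng::AbstractRNG, word::String)
    chars = collect(word)
    n = length(chars)
    # Deletion and substitution need at least one character; transposition needs two.
    kinds = [:insertion]
    n >= 1 && append!(kinds, [:deletion, :substitution])
    n >= 2 && push!(kinds, :transpose)
    kind = rand(rng, kinds)
    if kind == :insertion
        # Insert a random lower-case letter at a random location (including the end).
        insert!(chars, rand(rng, 1:n+1), rand(rng, 'a':'z'))
    elseif kind == :deletion
        # Delete a random character.
        deleteat!(chars, rand(rng, 1:n))
    elseif kind == :substitution
        # Replace a random character with a random lower-case letter.
        chars[rand(rng, 1:n)] = rand(rng, 'a':'z')
    else
        # Swap a random pair of consecutive characters.
        i = rand(rng, 1:n-1)
        chars[i], chars[i+1] = chars[i+1], chars[i]
    end
    return String(chars)
end
```

Note that a substitution may draw the letter already present, so the output can equal the input.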
5 changes: 5 additions & 0 deletions src/distributions/maybe_swap.jl
@@ -1,3 +1,8 @@
"""
MaybeSwap(val, options, prob)

With probability prob, return a random element from options; otherwise, return val.
"""
struct MaybeSwap <: PCleanDistribution end

supports_explicitly_missing_observations(::MaybeSwap) = true
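
The sampling behavior described in the MaybeSwap docstring can be sketched as follows. `maybe_swap_sample` is a hypothetical standalone function, not PClean's sampler, and it assumes that "a random element" means a uniform draw from `options`.

```julia
using Random

# Hypothetical stand-in for illustration; PClean's own sampler may differ.
# Assumption: the replacement is drawn uniformly from `options`.
function maybe_swap_sample(rng::AbstractRNG, val, options, prob::Real)
    # rand(rng) is uniform on [0, 1), so the swap branch is taken with probability `prob`.
    return rand(rng) < prob ? rand(rng, options) : val
end
```

With `prob = 0` the function always returns `val`, and with `prob = 1` it always returns an element of `options`.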
8 changes: 8 additions & 0 deletions src/distributions/string_prior.jl
@@ -1,5 +1,13 @@
using CSV

"""
str::String ~ StringPrior(min_length, max_length, proposal_atoms::Vector{String})

Sample a string of random length from a simple bigram model fit to English text.

The string length is uniformly distributed between min_length and max_length (inclusive).
The alphabet is {'a', 'b', .., 'z', ' ', '.'}.
"""
struct StringPrior <: PCleanDistribution end

letter_probs_file = joinpath(dirname(pathof(PClean)), "distributions", "lmparams", "letter_probabilities.csv")
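
As a rough sketch of the generative process the StringPrior docstring describes, the code below draws a uniform length and then samples characters one at a time from a bigram table. All names here (`ALPHABET`, `sample_index`, `sample_bigram_string`, `initial`, `transitions`) are hypothetical, and the fitted probabilities from the CSV files are not reproduced; the caller supplies the tables. Whether PClean draws the first character from a separate initial distribution is an assumption.

```julia
using Random

# The 28-symbol alphabet named in the docstring.
const ALPHABET = vcat(collect('a':'z'), [' ', '.'])

# Draw an index from a vector of probabilities that sums to 1.
function sample_index(rng::AbstractRNG, probs::AbstractVector{<:Real})
    u = rand(rng)
    acc = 0.0
    for (i, p) in enumerate(probs)
        acc += p
        u < acc && return i
    end
    return length(probs)  # guard against floating-point round-off
end

# `initial[i]` is the probability of starting with ALPHABET[i];
# `transitions[i, j]` is the probability of ALPHABET[j] following ALPHABET[i].
function sample_bigram_string(rng::AbstractRNG, min_length::Int, max_length::Int,
                              initial::AbstractVector, transitions::AbstractMatrix)
    len = rand(rng, min_length:max_length)  # uniform, inclusive on both ends
    chars = Char[]
    idx = 0
    for pos in 1:len
        probs = pos == 1 ? initial : view(transitions, idx, :)
        idx = sample_index(rng, probs)
        push!(chars, ALPHABET[idx])
    end
    return String(chars)
end
```

For a quick check without real parameters, uniform tables such as `fill(1 / 28, 28)` and `fill(1 / 28, 28, 28)` can be passed in place of fitted ones.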
7 changes: 7 additions & 0 deletions src/distributions/time_prior.jl
@@ -1,5 +1,12 @@
using CSV

"""
TimePrior(

Return a random timestamp of the form @sprintf("%d:%02d %s", hours, minutes, ampm).

The hours, minutes and ampm are drawn uniformly from {1, .., 12}, {0, .., 59}, and {"a.m.", "p.m."} respectively.
"""
struct TimePrior <: PCleanDistribution end

has_discrete_proposal(::TimePrior) = true
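
The TimePrior docstring translates almost directly into a sampler. The sketch below is a standalone illustration, not PClean's code; `sample_time` is a hypothetical name.

```julia
using Printf
using Random

# Hypothetical sampler mirroring the docstring: each component is drawn uniformly.
function sample_time(rng::AbstractRNG)
    hours = rand(rng, 1:12)
    minutes = rand(rng, 0:59)
    ampm = rand(rng, ["a.m.", "p.m."])
    return @sprintf("%d:%02d %s", hours, minutes, ampm)
end
```

Under these uniform draws there are 12 × 60 × 2 = 1440 distinct strings, each with probability 1/1440.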