Incorporating binary labels into kernel distance #133

helenhe96 · 2018-02-15T21:09:55Z

No description provided.

ArtPoon · 2018-02-16T20:48:43Z

Make a new branch
Move tree-processing level functions from tree-kernel.R to a new file
tree.kernel() should take regular expressions as label arguments instead of expecting character vectors
The regexes should be applied within tree.kernel() to classify the tip labels of each tree into a finite number of categories, which can be represented by an integer-valued vector. The two integer vectors (one per tree) will then be passed to the C-level kernel computation.
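
A minimal sketch of the classification step described above, assuming tip labels of the form `<state>_<id>`. The helper `classify.tips()`, the example labels, and the pattern are hypothetical and are not part of Kaphi:

```r
# Hypothetical helper (not part of Kaphi): classify tip labels into integer
# categories using a single regular expression.
classify.tips <- function(tip.labels, pattern, categories) {
  # extract the first capture group from each tip label
  states <- sub(pattern, "\\1", tip.labels)
  # sub() leaves non-matching labels unchanged, so flag them explicitly
  states[!grepl(pattern, tip.labels)] <- NA
  # map each state to its position in the shared category set
  match(states, categories)
}

tips1 <- c("gut_001", "blood_002", "gut_003")
tips2 <- c("blood_101", "blood_102", "gut_103")
pattern <- "^([a-z]+)_.*$"

# use one category set for both trees so the integer codes are comparable
categories <- sort(unique(sub(pattern, "\\1", c(tips1, tips2))))
codes1 <- classify.tips(tips1, pattern, categories)
codes2 <- classify.tips(tips2, pattern, categories)
```

Building the category set from both trees at once is a choice made for this sketch; it keeps a given integer from meaning different states in the two vectors.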

ArtPoon · 2018-02-20T18:17:26Z

@gtng92 pointed out that the kernel distance can be called on trees x and y as k(x,y) or k(y,x), and that if we define two regular expressions then these trees could potentially be processed differently. After discussion we decided to use just one regex for kernel distances.
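
To illustrate the concern, here is a toy sketch in which the first regex is always applied to the first argument and the second regex to the second. `toy.kernel()` is a hypothetical stand-in that only counts matching labels; it is not the Kaphi kernel:

```r
# Toy stand-in for the kernel (not Kaphi's implementation): counts pairs of
# tips, one from each tree, whose extracted labels are identical.
toy.kernel <- function(tips1, tips2, pattern1, pattern2 = pattern1) {
  lab1 <- sub(pattern1, "\\1", tips1)   # first regex applied to the first tree
  lab2 <- sub(pattern2, "\\1", tips2)   # second regex applied to the second tree
  sum(outer(lab1, lab2, "=="))
}

x <- c("AB_1", "AC_2")
y <- c("A_1", "AB_2")
first.letter <- "^([A-Z]).*$"
whole.prefix <- "^([A-Z]+)_.*$"

# two regexes: the result depends on the order of the arguments
k.xy <- toy.kernel(x, y, first.letter, whole.prefix)
k.yx <- toy.kernel(y, x, first.letter, whole.prefix)

# one regex: both trees are processed identically, so the order is irrelevant
k1.xy <- toy.kernel(x, y, first.letter)
k1.yx <- toy.kernel(y, x, first.letter)
```

With two patterns the two call orders give different scores for these labels, whereas a single pattern gives the same score either way.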

ArtPoon · 2018-02-26T18:16:26Z

Please write unit tests to check whether the labeled kernel function behaves properly before closing this issue.

ArtPoon · 2018-02-27T15:49:50Z

On branch issue133, we presently have this in treekernel.R (dropping commented lines):

tree.kernel <- function(tree1, tree2,
                        lambda,               # decay factor
                        sigma,                # RBF variance parameter
                        rho=1.0,              # SST control parameter; 0 = subtree kernel, 1 = subset tree kernel
                        normalize=0,          # normalize kernel score by sqrt(k(t1,t1) * k(t2,t2))
                        regexPattern="",      # arguments for labeled tree kernel
                        regexReplacement="",
                        gamma=0               # label factor
                        ) {
  # make labels
  use.label <- if (any(is.na(label1)) || any(is.na(label2)) || is.null(label1) || is.null(label2)) {
    FALSE
  } else {
    new_label1 <- gsub(regexPattern, regexReplacement, tree1$tip.label)
    new_label2 <- gsub(regexPattern, regexReplacement, tree2$tip.label)
    TRUE
  }

  nwk1 <- .to.newick(tree1)
  nwk2 <- .to.newick(tree2)

  res <- .Call("R_Kaphi_kernel",
               nwk1, nwk2, lambda, sigma, as.double(rho), use.label, gamma, normalize,
               PACKAGE="Kaphi")
  return (res)
}

We want to make these changes:

The user provides regular expressions that determine how the substrings that define states are extracted from tip labels. Tip labels have to be unique, but must also share a common substring that tells us whether two tips share the same state, e.g., were sampled from the same compartment.
Instead of gamma, the user should pass a matrix of weights with row and column names. These names should correspond to the substrings that the regular expression extracts from the tip labels.
This function should use both arguments to convert the tip labels in either tree into integer-valued vectors, where the integers are indices into the weight matrix. The two integer vectors and the weight matrix (without row/column names) are passed to the C function as vectors; for the matrix, the number of rows and columns is given by the maximum integer values in the respective integer vectors.
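
The conversion proposed above could look roughly like the following sketch. The helper `labels.to.index()`, the state names, and the weight values are all hypothetical, and the final lookup only imitates in R how a C routine might address the flattened matrix:

```r
# Hypothetical weight matrix; its dimnames must equal the states that the
# regular expression extracts from the tip labels.
weights <- matrix(c(1.0, 0.2,
                    0.2, 1.0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("blood", "gut"), c("blood", "gut")))

# Hypothetical helper: convert tip labels to indices into `weights`.
labels.to.index <- function(tip.labels, pattern, replacement, state.names) {
  states <- gsub(pattern, replacement, tip.labels)
  idx <- match(states, state.names)
  if (anyNA(idx)) {
    stop("tip labels with no matching state: ",
         paste(tip.labels[is.na(idx)], collapse = ", "))
  }
  idx
}

tree1.tips <- c("blood_001", "gut_002", "gut_003")
tree2.tips <- c("gut_101", "blood_102")
pattern <- "^([a-z]+)_.*$"

idx1 <- labels.to.index(tree1.tips, pattern, "\\1", rownames(weights))
idx2 <- labels.to.index(tree2.tips, pattern, "\\1", colnames(weights))

# strip the dimnames and flatten (column-major) before passing to C
flat.weights <- as.vector(unname(weights))

# look up one entry by column-major offset (1-based in R; C would be 0-based)
i <- idx1[2]   # "gut"
j <- idx2[2]   # "blood"
w <- flat.weights[(j - 1) * nrow(weights) + i]
```

Note that this sketch uses `nrow(weights)` directly. Inferring the dimensions from the maximum index, as proposed above, gives the same result only if the last state of each dimension actually occurs among the tips.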

ArtPoon · 2018-02-27T15:55:50Z

regexReplacement should be "\\1" by default (capture a single group). There may be situations where we want to concatenate two or more groups, so I guess we can let the user define a more complex label like "\\1\\2".
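
As a small illustration of the two replacement strings, assuming hypothetical tip labels of the form `<patient>_<compartment>_<year>`:

```r
# Hypothetical tip labels and pattern, for illustration only
tips <- c("patient12_blood_2015", "patient07_gut_2016")
pattern <- "^[^_]+_([a-z]+)_([0-9]+)$"

# default replacement: keep only the first capture group
single <- gsub(pattern, "\\1", tips)

# concatenate two capture groups into a compound label
compound <- gsub(pattern, "\\1\\2", tips)
```

Here `single` holds the compartment alone, while `compound` joins compartment and year, so tips share a state only when both agree.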

gtng92 · 2018-03-02T17:01:18Z

On branch issue133, the implementation has changed so that the weight matrix is no longer necessary.

The user provides a regex to extract the substrings from the tip labels
The user provides a character vector of all possible states
Each tip label is assigned a binary-encoded integer value reflecting the state(s) of that tip label (5fe35e1)
The integer vectors are passed down to the C level (24aed96), where each integer value is decoded and the different states are matched or mismatched
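
One way such a binary encoding might work is sketched below, with one bit per state. This is an assumption for illustration and not necessarily the scheme used in commits 5fe35e1 and 24aed96. The helpers `encode.states()` and `decode.states()` are hypothetical, as is the convention that a tip carrying several states separates them with "+":

```r
# Hypothetical encoder: state k in `all.states` sets bit k-1 of the code.
encode.states <- function(tip.labels, pattern, all.states) {
  extracted <- gsub(pattern, "\\1", tip.labels)
  vapply(extracted, function(s) {
    # assumed convention: multiple states are separated by "+"
    present <- all.states %in% strsplit(s, "+", fixed = TRUE)[[1]]
    as.integer(sum(2^(which(present) - 1)))
  }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical decoder: recover the state names from an encoded integer.
decode.states <- function(code, all.states) {
  masks <- as.integer(2^(seq_along(all.states) - 1))
  all.states[bitwAnd(rep(code, length(masks)), masks) > 0]
}

all.states <- c("blood", "gut", "lymph")
tips <- c("t1_blood", "t2_gut", "t3_blood+lymph")
codes <- encode.states(tips, "^[^_]+_(.*)$", all.states)

# two tips share at least one state if their codes have a bit in common
shared <- bitwAnd(codes[1], codes[3]) > 0
```

With this layout, matching two tips reduces to a single bitwise AND, which is cheap to do at the C level.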

ArtPoon · 2018-03-12T17:24:04Z

We want to refactor the kernel to encode labels in each node's production. Whereas productions could previously take only one of four values (0 for terminal node, 1 for node with two non-terminal descendants, etc.), we now want each internal node to have a tuple (pair) of integers for its production, reserving the integer value -1 for a descendant that is an internal node.
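
A rough sketch of what label-aware productions might look like, assuming a binary tree stored as a (parent, child) edge matrix with tips numbered 1..n, as in ape's "phylo" objects. The helper `node.productions()` is hypothetical, and sorting each pair is a choice made here so that child order does not matter:

```r
# Hypothetical helper: one production (pair of integers) per internal node.
# `edge` is a two-column (parent, child) matrix; `tip.codes` holds the integer
# label of each tip, indexed by tip number.
node.productions <- function(edge, tip.codes) {
  n.tips <- length(tip.codes)
  parents <- sort(unique(edge[, 1]))
  prods <- t(vapply(parents, function(p) {
    children <- edge[edge[, 1] == p, 2]
    stopifnot(length(children) == 2)   # this sketch assumes a binary tree
    # a tip contributes its label code; an internal child contributes -1
    codes <- ifelse(children <= n.tips, tip.codes[children], -1L)
    sort(as.integer(codes))            # so (a, b) and (b, a) compare equal
  }, integer(2)))
  rownames(prods) <- parents
  prods
}

# Tree ((A,B),C): tips A=1, B=2, C=3; root is node 4, inner node is 5
edge <- matrix(c(4L, 5L,
                 5L, 1L,
                 5L, 2L,
                 4L, 3L), ncol = 2, byrow = TRUE)
tip.codes <- c(1L, 2L, 1L)   # hypothetical label codes for A, B, C

prods <- node.productions(edge, tip.codes)
```

In this example the root gets the pair (-1, 1), since one child is internal and the other is a tip with code 1, and node 5 gets (1, 2).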

ArtPoon · 2018-03-26T17:17:38Z

New labeled kernel is being prototyped in Python, see PoonLab/coevolution phyloK3.py

ArtPoon · 2019-06-12T15:51:25Z

Need to port Python implementation into R

helenhe96 added a commit that referenced this issue Feb 20, 2018

Changed the code to using just one regular expression as discussed in issue #133, need to fix bugs

52e80f6

ArtPoon added a commit that referenced this issue Feb 23, 2018

Issue #133, moved kernel.dist() to treekernel.R

78cf900

Deleted deprecated config parsing code from smcConfig.R. Eliminated caching of "self" kernel scores to trees in treekernel.R.

ArtPoon mentioned this issue Feb 26, 2018

Validate ABC-SMC parameter estimation on BiSSE model #101

Open

ArtPoon added this to the version 0.3 milestone Mar 19, 2018

ArtPoon added the help wanted label Jun 12, 2019
