Error in PCC.disagreement: Input is not a numeric matrix. #51

Open
GGoetzelmann opened this issue Jun 14, 2018 · 5 comments

@GGoetzelmann

Thank you for this R package, it looks like an interesting project.
I have started playing around with it (without much insight into the stemma creation method yet and with no experience in R at all).

I have tried to use data with multiple readings and the parameter alternateReadings=TRUE.

In the interactive mode the first few steps work fine but then the error

Error in PCC.disagreement(tableVariantes, omissionsAsReadings = omissionsAsReadings) :
Input is not a numeric matrix.

is thrown.

I have tried with a real data set first but then used the example matrix from the documentation as test data.

I had to duplicate the matrix a few times, otherwise I got the error

Error in cluster::pam(ordConflTot[, 1], numberOfClasses) :
Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2

So my minimal data example for the error would be:

   A     D     F     T   P
1  "1"   "2"   "2"   "2" "1,2"
2  "1"   "2"   "1,2" "2" "1"
3  "1"   "1"   "1"   "1" "2"
4  "1,3" "1,2" "1"   "2" "3"
5  "1"   "2"   "2"   "2" "1,2"
6  "1"   "2"   "1,2" "2" "1"
7  "1"   "1"   "1"   "1" "2"
8  "1,3" "1,2" "1"   "2" "3"
9  "1"   "2"   "2"   "2" "1,2"
10 "1"   "2"   "1,2" "2" "1"
11 "1"   "1"   "1"   "1" "2"
12 "1,3" "1,2" "1"   "2" "3"
13 "1"   "2"   "2"   "2" "1,2"
14 "1"   "2"   "1,2" "2" "1"
15 "1"   "1"   "1"   "1" "2"
16 "1,3" "1,2" "1"   "2" "3"

I have loaded it from a txt file with mydata = read.table("filename.txt") and mydata = as.matrix(mydata) and then used PCC(mydata,alternateReadings=TRUE).
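
For reference, a minimal sketch of why this input is not a numeric matrix. It assumes the first four rows of the table above are typed in directly instead of being read from a file, and the names `is_numeric_matrix` and `has_alt` are only illustrative; nothing here calls the package itself.

```r
# First four rows of the example, entered as a character matrix.
mydata <- matrix(
  c("1",   "2",   "2",   "2", "1,2",
    "1",   "2",   "1,2", "2", "1",
    "1",   "1",   "1",   "1", "2",
    "1,3", "1,2", "1",   "2", "3"),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("A", "D", "F", "T", "P"))
)

# A cell such as "1,2" cannot be stored as a number, so the whole
# matrix is of type character rather than numeric.
is_numeric_matrix <- is.matrix(mydata) && is.numeric(mydata)

# Mark the cells that hold more than one reading (they contain a comma).
# grepl() drops the dimensions, so they are restored with matrix().
has_alt <- matrix(grepl(",", mydata, fixed = TRUE),
                  nrow = nrow(mydata), dimnames = dimnames(mydata))

print(is_numeric_matrix)
print(has_alt)
```

Any cell flagged in `has_alt` is enough to make the matrix non-numeric as a whole.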

@Jean-Baptiste-Camps
Owner

Hi, thanks for your interest in the package! I have not managed to reproduce this bug so far. Did you install from CRAN? Perhaps you are still on a version < 3?

For k-medoids, I will correct that. The error occurs because you cannot have more clusters than individuals.
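
To illustrate the constraint stated in the pam error message (k must be in {1, 2, ..., n-1}), here is a small sketch in base R. `valid_k` is a hypothetical helper written for this example, not a function of the package or of the cluster package.

```r
# Hypothetical helper (not part of any package): TRUE when k is a legal
# number of clusters for n individuals, i.e. k is in {1, ..., n - 1}.
valid_k <- function(n, k) {
  n >= 2 && k >= 1 && k <= n - 1
}

print(valid_k(n = 4, k = 2))  # four individuals, two clusters
print(valid_k(n = 1, k = 1))  # a single individual cannot be clustered
```

A check of this kind before the clustering step would explain why duplicating the rows made the error go away: it simply raised n.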

@Jean-Baptiste-Camps
Owner

PS: as a side note, the handling of alternateReadings is not fully implemented in the stemma-building functions, because I do not have a good algorithm for that yet (and also because cases with alternateReadings on the same witness are extremely rare in the Romance texts I mostly work with).

@GGoetzelmann
Author

@Jean-Baptiste-Camps
thank you for your reply. Yes, I installed from CRAN, the installed version was 0.3.1
I tried today to install the github version but I am not sure I succeeded. Both show the same version (via sessionInfo()), I think.

Since you cannot reproduce the problem, I tried to use the steps in the example from the readme instead of the interactive PCC function. On my minimal example above, I used a threshold of 0.1 and skipped the step with "myNewData = PCC.equipollent"

So basically:

> myConflicts = PCC.conflicts(mydata,alternateReadings=TRUE)
> myConflicts = PCC.overconflicting(myConflicts, ask = FALSE, threshold = 0.1)
> myNewData = PCC.elimination(myConflicts)
> myConflicts = PCC.conflicts(myNewData,alternateReadings=TRUE)

myNewData then is

   A   D   F     T   P
2  "1" "2" "1,2" "2" "1"
3  "1" "1" "1"   "1" "2"
6  "1" "2" "1,2" "2" "1"
7  "1" "1" "1"   "1" "2"
10 "1" "2" "1,2" "2" "1"
11 "1" "1" "1"   "1" "2"
14 "1" "2" "1,2" "2" "1"
15 "1" "1" "1"   "1" "2"

and PCC.Stemma(myNewData) fails with Error in PCC.disagreement(tableVariantes, omissionsAsReadings = omissionsAsReadings) : Input is not a numeric matrix. That is true, since the matrix still contains "1,2" readings, but there does not seem to be a parameter to tell PCC.Stemma otherwise?

I find alternateReadings interesting, because I deal with ancient data sets where it is very likely that a word or part is only partially readable. So in my opinion this looks like a way to say something like "this is either (one of) the word(s) in other witnesses at this position or something different". At least it would be worth comparing to always using '?' for damaged words. And very fragmentary and therefore uncertain/fuzzy data is a problem for a lot of (phylogenetic) approaches anyway.

@Jean-Baptiste-Camps
Owner

OK, many thanks for your very clear message.
I understand now. Indeed, there is currently no algorithm that allows alternative readings for stemma building (PCC.Stemma is very strict and allows only a single reading per witness; I would have to implement other algorithms to allow that).
BUT there is actually a way to do exactly what you want, if I understand correctly, which is to use NA for 'not available'/'no answer' (it is a basic R type for missing values, cf. https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/NA). NAs are handled by PCC.Stemma. As you say, they can be problematic, as the algorithm will take into account only the information it has, but it is manageable up to a certain point, as long as there are enough places where the witnesses can be compared.
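
As a rough sketch of that NA encoding, assuming multiple readings are written with a comma as in the example above: every cell with more than one reading is replaced by NA and the result is converted to a numeric matrix. The variable names are illustrative, and the sketch only prepares the data in base R; it does not call PCC.Stemma.

```r
# First four rows of the example, with alternate readings such as "1,2".
alt <- matrix(
  c("1",   "2",   "2",   "2", "1,2",
    "1",   "2",   "1,2", "2", "1",
    "1",   "1",   "1",   "1", "2",
    "1,3", "1,2", "1",   "2", "3"),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("A", "D", "F", "T", "P"))
)

# Replace every cell holding several readings (it contains a comma) by NA.
na_version <- alt
na_version[grepl(",", na_version, fixed = TRUE)] <- NA

# Convert the remaining single readings to numbers. as.numeric() drops
# the dimensions, so the matrix shape and names are restored explicitly.
numeric_version <- matrix(as.numeric(na_version),
                          nrow = nrow(na_version),
                          dimnames = dimnames(na_version))

print(numeric_version)
```

The price of this encoding is that the uncertain cells carry no information at all, so sparse data becomes sparser.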

@GGoetzelmann
Author

I see, thanks for the clarification.

I am aware of the NA feature and I find it very useful. In fact I wanted to compare a matrix with NA readings with an alternative encoding which tries to assign multiple readings to fragmented words. That is where I encountered the issue. My data set right now is already quite sparse, so every reading would help, I guess. But atm it is just on-the-side playing around out of curiosity. I'll watch the project with interest for sure.
