Description
[See also https://github.com//issues/10 ]
Until recently CTrees were generated either locally or through getpapers or quickscrape. The automatically generated files contain at least one reserved file such as fulltext.pdf, and CMine software used this to determine which directories in a CProject are actually CTrees. This was always recognised to be a heuristic, and recently, with bulk download of metadata from Crossref, we see many potential CTrees without reserved files or even without any files. Here's a simple example:
├── PMC4678086
│ ├── eupmc_result.json
│ ├── fulltext.pdf
│ └── fulltext.xml
├── http_dx.doi.org_10.1001_jama.2016.7992
│ └── results.json
└── http_dx.doi.org_10.1007_s13201-016-0429-9
The first directory is retrieved by quickscrape from EPMC, and the heuristics indicate it to be a potential CTree. The other two are retrieved by getpapers on Crossref followed by quickscrape, which creates only metadata; they are currently not flagged as CTrees. The empty directory is created (I think) by quickscrape, which then fails to retrieve anything.
The original motivation for the heuristics is that we may introduce new reserved directories into a CProject, and users might also introduce non-CTree directories. There was also the idea that we have a reserved file (e.g. metadata.json or log.xml) in any CTree directory. At present I favour this, and we should discuss what is in it.
Currently I have added a switch
cProject.setTreatAllChildDirectoriesAsCTrees(true);
which allows users to toggle this behaviour. I will also add results.json to the reserved files that flag "CTree-ness".
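
To make the discussion concrete, here is a minimal sketch of how the reserved-file heuristic and the switch could interact. It is not the CMine implementation: the class `CTreeHeuristicSketch`, the methods `looksLikeCTree` and `findCTrees`, and the contents of `RESERVED_FILES` are all hypothetical. Only fulltext.pdf is named above as a reserved file; fulltext.xml is an assumption, and results.json is included to model the planned behaviour rather than the current one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the reserved-file heuristic; not the CMine code. */
class CTreeHeuristicSketch {

    // Assumed reserved names. fulltext.pdf comes from the description above;
    // fulltext.xml is a guess; results.json is the planned addition.
    static final Set<String> RESERVED_FILES =
            Set.of("fulltext.pdf", "fulltext.xml", "results.json");

    /** Decide from the file names in one child directory whether it counts as a CTree. */
    public static boolean looksLikeCTree(Collection<String> fileNames,
                                         boolean treatAllChildDirectoriesAsCTrees) {
        if (treatAllChildDirectoriesAsCTrees) {
            return true; // switch on: every child directory counts, even an empty one
        }
        // switch off: require at least one reserved file
        return fileNames.stream().anyMatch(RESERVED_FILES::contains);
    }

    /** Return the child directories of projectDir that pass the heuristic, sorted by path. */
    public static List<Path> findCTrees(Path projectDir, boolean treatAll) throws IOException {
        try (Stream<Path> children = Files.list(projectDir)) {
            return children.filter(Files::isDirectory)
                    .filter(dir -> looksLikeCTree(fileNamesIn(dir), treatAll))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Collect the names of the entries directly inside dir.
    private static List<String> fileNamesIn(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, the three directories in the example behave as follows: PMC4678086 passes because of fulltext.pdf, the jama directory passes only once results.json is reserved, and the empty directory passes only when the switch is set to true.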