CTrees without (reserved) child files #14

[See also https://github.com/ContentMine/cmine/issues/10 ]

Until recently `CTree`s were generated either locally or through `getpapers` or `quickscrape`. The automatically generated files contain at least one reserved file such as `fulltext.pdf`, and `CMine` software used this to determine which directories in a `CProject` are actually `CTree`s. This was always recognised to be a heuristic. Recently, with bulk download of metadata from `Crossref`, we see many potential `CTree`s without reserved files or even without any files. Here's a simple example:

```
├── PMC4678086
│   ├── eupmc_result.json
│   ├── fulltext.pdf
│   └── fulltext.xml
├── http_dx.doi.org_10.1001_jama.2016.7992
│   └── results.json
└── http_dx.doi.org_10.1007_s13201-016-0429-9
```

The first directory was retrieved by `quickscrape` from `EPMC`, and the heuristics indicate it to be a potential `CTree`. The other two came from `getpapers` on `Crossref` followed by `quickscrape`, which created only metadata; they are currently not flagged as `CTree`s. The empty directory was created (I think) by `quickscrape`, which then failed to retrieve anything.

The original motivation for the heuristics is that we _may_ introduce new reserved directories into a `CProject`, and users might also introduce non-`CTree` directories. There was also the idea that we have a reserved file (e.g. `metadata.json` or `log.xml`) in any `CTree` directory. At present I favour this, and we should discuss what is in it.

Currently I have added a switch 

```
        cProject.setTreatAllChildDirectoriesAsCTrees(true);
```

which allows users to toggle this behaviour. I will also add `results.json` to the reserved files which flag "`CTree`-ness".
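To make the discussion concrete, here is a minimal sketch of how the reserved-file heuristic and the switch could interact. It is not the `CMine` implementation: the class `CTreeHeuristicSketch`, the methods `looksLikeCTree` and `listCTreeNames`, and the exact contents of `RESERVED_FILES` are all hypothetical (I have assumed `fulltext.xml` counts as reserved, and included `results.json` as proposed above). Only the setter name is taken from the snippet above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the CTree-ness heuristic; not the CMine code. */
class CTreeHeuristicSketch {

    // Assumed reserved names: fulltext.pdf is named in this issue,
    // fulltext.xml is an assumption, results.json is the proposed addition.
    static final Set<String> RESERVED_FILES =
            Set.of("fulltext.pdf", "fulltext.xml", "results.json");

    private boolean treatAllChildDirectoriesAsCTrees = false;

    public void setTreatAllChildDirectoriesAsCTrees(boolean value) {
        this.treatAllChildDirectoriesAsCTrees = value;
    }

    /** True if dir is a directory and either the switch is on or it holds a reserved file. */
    public boolean looksLikeCTree(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        if (treatAllChildDirectoriesAsCTrees) {
            return true; // switch on: empty directories count too
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                if (Files.isRegularFile(child) && RESERVED_FILES.contains(name)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Sorted names of the child directories of projectDir that pass the heuristic. */
    public List<String> listCTreeNames(Path projectDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(projectDir)) {
            for (Path child : children) {
                if (looksLikeCTree(child)) {
                    names.add(child.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Under these assumptions, the example project above would yield `PMC4678086` and `http_dx.doi.org_10.1001_jama.2016.7992` with the switch off, and all three directories (including the empty one) with the switch on.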

