Commit 32ca5a6

Added aggregation pipelines and added "regional" breakdown.
Set the "regional" breakdown category to all "background" breakdowns that had "region" as their category. Added a set of aggregation pipelines to check for duplicates and other stuff. Need to study the actual surveys, since this data is mostly summary. Will launch ArangoDB import.
1 parent 0b34a47 commit 32ca5a6

5 files changed

+204
-62
lines changed

.idea/workspace.xml

+29-61

batch/DHS.php

+5-1
@@ -1262,7 +1262,11 @@ public function loadSurveyData( $theSurvey )
 			//
 			// Set descriptor breakdown.
 			//
-			$document[ kTAG_BREAKDOWN ] = $domain;
+			$document[ kTAG_BREAKDOWN ]
+				= ( strlen( trim( $line[ 'RegionId' ] ) )
+				 && (trim( $line[ 'CharacteristicCategory' ] ) == 'Region') )
+				? 'regional'
+				: $domain;
 
 			//
 			// Set other data.
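
The breakdown rule in this hunk can be restated as a small standalone function. This is only a sketch of the ternary's logic, written in JavaScript to match the aggregation script in this commit. The function name `selectBreakdown` and the sample value `"background"` are assumptions for illustration and do not appear in `batch/DHS.php`.

```javascript
// Hypothetical helper mirroring the PHP ternary in the hunk above.
// A row is tagged "regional" only when it has a non-empty RegionId
// and its CharacteristicCategory is exactly "Region" after trimming;
// otherwise the breakdown passed in by the caller is kept.
function selectBreakdown(line, domain) {
  const regionId = String(line.RegionId ?? "").trim();
  const category = String(line.CharacteristicCategory ?? "").trim();
  return regionId.length > 0 && category === "Region" ? "regional" : domain;
}

// Usage with made-up rows:
const regionalRow = { RegionId: " 12 ", CharacteristicCategory: "Region " };
const otherRow = { RegionId: "", CharacteristicCategory: "Region" };
const tagged = [
  selectBreakdown(regionalRow, "background"),
  selectBreakdown(otherRow, "background"),
];
```

Both conditions must hold, so a row whose category is "Region" but whose `RegionId` is blank keeps its original breakdown.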
@@ -0,0 +1,37 @@
+db._data.aggregate(
+
+	// Pipeline
+	[
+		// Stage 1
+		{
+			$group: {
+				_id: "$@b",
+				"count": { $sum: 1 }
+			}
+		},
+
+		// Stage 2
+		{
+			$match: {
+				"count": { $gt: 1 }
+			}
+		},
+
+		// Stage 3
+		{
+			$out: "checks_DuplicateDataIdentifiers"
+		}
+	],
+
+	// Options
+	{
+		cursor: {
+			batchSize: 50
+		},
+
+		allowDiskUse: true
+	}
+
+	// Created with 3T MongoChef, the GUI for MongoDB - http://3t.io/mongochef
+
+);
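
To make the intent of the pipeline concrete, the sketch below reproduces its first two stages (`$group` by the `@b` field with a count, then `$match` on counts above 1) over an in-memory array. It is plain JavaScript, not a MongoDB call. The function name `findDuplicates` and the sample documents are assumptions, and reading `@b` as the data identifier is inferred only from the output collection name `checks_DuplicateDataIdentifiers`.

```javascript
// In-memory sketch of the $group + $match stages; it does not talk to MongoDB.
// Assumes the grouped field holds scalar values (strings or numbers).
function findDuplicates(docs, field = "@b") {
  const counts = new Map();
  for (const doc of docs) {
    // $group places documents that lack the field under a null key.
    const key = doc[field] ?? null;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const result = [];
  for (const [key, count] of counts) {
    // Equivalent of { $match: { count: { $gt: 1 } } }.
    if (count > 1) result.push({ _id: key, count });
  }
  return result;
}

// Made-up sample: "A1" appears twice, so it is the only value reported.
const sample = [{ "@b": "A1" }, { "@b": "B2" }, { "@b": "A1" }];
const duplicates = findDuplicates(sample);
```

In the real pipeline the `$out` stage then writes these rows to a collection, so an empty result means no identifier is shared by more than one document.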
