Skip to content

Latest commit

 

History

History
326 lines (281 loc) · 13.8 KB

examples.md

File metadata and controls

326 lines (281 loc) · 13.8 KB

Example usage of bcstats

Let’s apply bcstats in a few examples.

Toy example

First, consider the following minimal working example. bcstats comes with two example two data sets. Load the library to get started.

library(bcstatsR)

And then load the two datasets that come bundled with the library.

data(survey)
data(bc)

Let’s take a look at the survey data.

print(survey)
id date enum enumteam gender gameresult itemssold
1 15apr2011 hana 3 female 10 2
2 15apr2011 mark 1 female 9 7
3 16may2011 lisa 2 female 12 1
4 13jun2011 ife 2 female 10 5
5 10jun2011 dean 1 female 12 3
6 17may2011 annie 1 NA 10 7
7 07jul2011 dean 1 female 14 1
8 16jun2011 annie 1 female 7 3
9 14jul2011 brooke 2 female 10 3
10 25jul2011 brooke 2 female 14 1
11 16jun2011 lisa 2 female 14 1
12 13jun2011 hana 3 female 9 3
13 21jul2011 mateo 3 female 14 1
14 15aug2011 rohit 3 female 11 1
15 30may2011 annie 1 female 10 4
16 04may2011 mark 1 female 10 4
17 19apr2011 brooke 2 female 12 5
18 01apr2011 mateo 3 female 9 1
19 11may2011 ife 2 female 10 5
20 28mar2011 mateo 3 female 10 4
21 09may2011 ife 2 female 12 7
22 06apr2011 brooke 2 female 10 3
23 07jun2011 annie 1 female 9 2
24 01apr2011 annie 1 NA 9 5
25 18may2011 hana 3 female 10 6
26 12may2011 hana 3 female 9 4
27 31may2011 mark 1 female 9 3
28 29apr2011 mateo 3 female 10 1
29 11may2011 annie 1 female 6 8
30 10may2011 ife 2 female 8 4
31 20apr2011 dean 1 female 9 6
32 30mar2011 mateo 3 female 10 5
33 06may2011 ife 2 female 9 5

Now, take a look at the back check data (i.e., the follow up where highly trained surveyors interview the same households).

print(bc)
id date bcer gender gameresult itemssold
1 18apr2011 wendy female NA NA
2 18apr2011 wendy NA 9 10
3 17may2011 rebecca female NA NA
4 14jun2011 wendy female NA NA
5 13jun2011 wendy NA 12 5
6 18may2011 wendy male NA NA
7 08jul2011 wendy NA 14 1
8 17jun2011 wendy female NA NA
9 15jul2011 wendy NA 10 3
10 26jul2011 rebecca female NA NA
11 17jun2011 wendy NA 14 1
12 14jun2011 wendy NA 9 6
13 22jul2011 rebecca female NA NA
14 16aug2011 rebecca NA 14 1

In this example, gender, gameresult and itemssold are the variables collected in both the survey and the back check. Note that id identifies the respondent in both the survey and the back check. In the survey, enum and enumteam tells us the surveyor and the team of the surveyor. We’ll want to know whether or not these surveyors and teams collected the data correctly in the survey. Similarly, in the back check, we’ll want to summarize the data by back checker to see if we notice unusual patterns.

Now, let’s run the back check!

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer")

And auto-magically, you’ve created a bunch of results stored in result. Let’s take a look at back check, which has been stored in result$backcheck.

print(result$backcheck)
id enum enumteam bcer type variable value.survey value.backcheck
1 hana 3 wendy type 2 gameresult 10 NA
1 hana 3 wendy type 3 itemssold 2 NA
2 mark 1 wendy type 1 gender female NA
2 mark 1 wendy type 3 itemssold 7 10
3 lisa 2 rebecca type 2 gameresult 12 NA
3 lisa 2 rebecca type 3 itemssold 1 NA
4 ife 2 wendy type 2 gameresult 10 NA
4 ife 2 wendy type 3 itemssold 5 NA
5 dean 1 wendy type 1 gender female NA
5 dean 1 wendy type 3 itemssold 3 5
6 annie 1 wendy type 2 gameresult 10 NA
6 annie 1 wendy type 1 gender NA male
6 annie 1 wendy type 3 itemssold 7 NA
7 dean 1 wendy type 1 gender female NA
8 annie 1 wendy type 2 gameresult 7 NA
8 annie 1 wendy type 3 itemssold 3 NA
9 brooke 2 wendy type 1 gender female NA
10 brooke 2 rebecca type 2 gameresult 14 NA
10 brooke 2 rebecca type 3 itemssold 1 NA
11 lisa 2 wendy type 1 gender female NA
12 hana 3 wendy type 1 gender female NA
12 hana 3 wendy type 3 itemssold 3 6
13 mateo 3 rebecca type 2 gameresult 14 NA
13 mateo 3 rebecca type 3 itemssold 1 NA
14 rohit 3 rebecca type 2 gameresult 11 14
14 rohit 3 rebecca type 1 gender female NA

Each row contains the difference between the survey and the back check by each household and variable. Cases where nothing changed have not been included in this data.frame. Now let’s take a look at the error rates for Type 1 variables by each surveyor (enumerator).

print(result[["enum1"]]$summary)
enum error.rate differences total
annie 0.5 1 2
brooke 0.5 1 2
dean 1.0 2 2
hana 0.5 1 2
ife 0.0 0 1
lisa 0.5 1 2
mark 1.0 1 1
mateo 0.0 0 1
rohit 1.0 1 1

We can also take at the error rate for each Type 1 variable by enumerator.

print(result[["enum1"]]$each)
enum variable error.rate differences total
annie gender 0.5 1 2
brooke gender 0.5 1 2
dean gender 1.0 2 2
hana gender 0.5 1 2
ife gender 0.0 0 1
lisa gender 0.5 1 2
mark gender 1.0 1 1
mateo gender 0.0 0 1
rohit gender 1.0 1 1

And we can do the same thing for Type 2 variables.

print(result[["enum2"]]$summary)
print(result[["enum2"]]$each)
enum error.rate differences total
annie 1.0 2 2
brooke 0.5 1 2
dean 0.0 0 2
hana 0.5 1 2
ife 1.0 1 1
lisa 0.5 1 2
mark 0.0 0 1
mateo 1.0 1 1
rohit 1.0 1 1
enum variable error.rate differences total
annie gameresult 1.0 2 2
brooke gameresult 0.5 1 2
dean gameresult 0.0 0 2
hana gameresult 0.5 1 2
ife gameresult 1.0 1 1
lisa gameresult 0.5 1 2
mark gameresult 0.0 0 1
mateo gameresult 1.0 1 1
rohit gameresult 1.0 1 1

Now let’s redo the back check where this time we do a t-test for the differences between the survey data and the back check.

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer",
                  ttest       = "itemssold")

You can find the results for the t-test as an element of the results list.

print(result[["ttest"]]$itemssold)
## 
##  Paired t-test
## 
## data:  as.numeric(pairwise.var$value.survey) and as.numeric(pairwise.var$value.backcheck)
## t = -2.0656, df = 6, p-value = 0.0844
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4966927  0.2109784
## sample estimates:
## mean of the differences 
##               -1.142857

We could have choosen to not code some changes as errors as follows,

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer",
                  nodiff      = list(itemssold = c(0)))

or specify an acceptable range,

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer",
                  okrange     = list(itemssold = c(0, 5)))

or exclude them all together.

result <- bcstats(surveydata  = survey,
                  bcdata      = bc,
                  id          = "id",
                  t1vars      = "gender",
                  t2vars      = "gameresult",
                  t3vars      = "itemssold",
                  enumerator  = "enum",
                  enumteam    = "enumteam",
                  backchecker = "bcer",
                  exclude     = list(itemssold = c(0)))

Multiple variables

Of course, you’ll want to check multiple variables within any given type. You can just pass those as variable names as a list. For example, if you want to run the back check with both gender and gameresult as Type 1 variables, you could do the following:

result.mv <- bcstats(surveydata  = survey,
                     bcdata      = bc,
                     id          = "id",
                     t1vars      = c("gender", "gameresult"),
                     t3vars      = "itemssold",
                     enumerator  = "enum",
                     enumteam    = "enumteam",
                     backchecker = "bcer",
                     exclude     = list(itemssold = c(0)))

Need help?

Check out all the features of bcstats in the help page and post an issue on GitHub if you encounter any problems.

help(bcstats)