Merge pull request #653 from worldbank/arm_lfs_update
Arm lfs update
gronert-m authored Oct 10, 2024
2 parents edffbbd + ff6ecba commit 72a4a79
Showing 15 changed files with 16,442 additions and 11 deletions.

Large diffs are not rendered by default.


@@ -75,6 +75,18 @@ This change leads to an apparent decrease in non-paid employees after 2017.

In addition, please note that it is not feasible to convert between different ICLS versions in ARM LFS. Although the questionnaire screenshots above show that the employment status is composed of several sub-questions on "for-sale" and "for-own-consumption" work, the raw datasets only present the final decision on whether a given observation is employed or not, based on answers to sub-questions that we could not see. Therefore, we were not able to convert between ICLS-13 and ICLS-19 by "switching on and off" the "for-own-consumption jobs".

### Changes to the weight variable

At the same time as the change in the employment definition was implemented, a change in the weight calculation was introduced. From 2018 onwards a new weight (referred to in the raw data as "_calibrated_") is used. This weight results in a higher extended population. For 2018 and 2019, both the old weights (used from 2014 to 2017) and the calibrated weights (used since 2018) are available in the raw data.

The GLD harmonization includes by default the calibrated weights as the standard weights (variables `weight` and `weight_q`). For the years 2018 and 2019 the old weights are kept in special variables (i.e., not part of the common GLD dictionary) called `weight_old_series` and `weight_q_old_series`.

The graph below plots labor force participation (LFP) using the old series (blue line) and the new weights (red line). The green line denotes the data as reported by the Armenian NSO. The change in employment definition should decrease LFP, as own-consumption workers are no longer counted as employed. This decrease is only visible with the old weights: the new weights increase the population, which produces a positive jump in LFP.

![LFP plot by weights](utilities/weight_timelines.png)

Users are advised to take these changes into account when trying to create a series over all LFS years. The code to produce the above graph [can be found here](utilities/Comparison_weights.do).
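
As a rough illustration of how the two weight variables can be used side by side, the sketch below computes a weighted LFP rate by year under each series. It is not the linked do-file: the names `in_lf`, `w_old`, `lfp_rate_old`, and `lfp_rate_new` are made up for this example, and it assumes the harmonized 2014 to 2022 files have already been appended in memory and that `weight_old_series` is missing outside 2018 and 2019.

```stata
* Sketch only. Assumes the harmonized GLD files for 2014-2022 are already
* appended in memory and that weight_old_series exists only for 2018-2019.

* Labor force indicator: 1 if employed or unemployed, 0 if outside the
* labor force, missing if lstatus is missing.
gen byte in_lf = inlist(lstatus, 1, 2) if !missing(lstatus)

* Old-series weight: weight up to 2017, weight_old_series in 2018-2019.
* It stays missing from 2020 onwards, where no old weight exists.
gen double w_old = weight if inrange(year, 2014, 2017)
replace w_old = weight_old_series if inrange(year, 2018, 2019)

* Weighted LFP rate by year using the old series
preserve
collapse (mean) lfp_rate_old = in_lf [pw = w_old], by(year)
tempfile old
save `old'
restore

* Weighted LFP rate by year using the calibrated weights (2018 onwards)
preserve
collapse (mean) lfp_rate_new = in_lf if year >= 2018 [pw = weight], by(year)
merge 1:1 year using `old', nogenerate
sort year
list year lfp_rate_old lfp_rate_new
restore
```

For 2018 and 2019 the listing shows both rates next to each other, which is where the effect of the weight change alone can be read off.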

### Coding of industry and occupation codes

In terms of classifications of industry and occupation, ARM LFS used ISIC-4 and ISCO-08 respectively for all years. No national classifications applied.
@@ -83,10 +95,13 @@ However, regarding the level of classification, both industry and occupation onl

### Coding PSU and strata

Based on the description of sampling methodology in the annual report of 2018 (refer to the screenshot below), the sampling is based on the 2011 census. The PSUs are census enumeration areas to which we currently do not have access. In this case, we didn't code `psu` in GLD. We may update this variable in the future if information concerned becomes availabel.
Based on the description of sampling methodology in the annual report of 2018 (refer to the screenshot below), the sampling is based on the 2011 census. The PSUs are census enumeration areas to which we currently do not have access. In this case, we didn't code `psu` in GLD. We may update this variable in the future if information concerned becomes available.

![PSU](utilities/PSU.png)

### Age restriction of the questionnaire labor market module

While most surveys limit the labor market module to people above a lower age threshold (commonly 15 years of age), the Armenian ILCS also applies an upper age threshold. Only people aged 15 to 75 respond to the labor market module and thus have labor variables. Given the low labor force participation of people aged 76 and older, it can be assumed that the labor force participation of the population 15+ is lower than the one that can be calculated from the data.

### Coding of education

@@ -119,13 +134,6 @@ With this information, the coding choice is shown below. The key distinction is
| | | | 10. Certified specialist | 7 University incomplete or complete |
| | | | 11. Post-graduate (Ph.D, doctorate, internship, residency) | 7 University incomplete or complete |
### Employment rate comparison

As part of our harmonization quality checks, we conducted a cross examination of employment rate with official annual reports. The figure below shows the comparison which indicates that our `lstatus` was correctly harmonized and the results align with official results.

<br></br>

![employment_rate](utilities/employment_comparison.png)

### Age variables in 2021 and 2022

@@ -137,10 +145,22 @@ Different from previous years, the last two years, 2021 and 2022 do not have the

In this case, we harmonized our `age` variable using the midpoint of each sub-group of the original `Age_16groups` so that our quality checks, which require numeric values, can proceed (i.e., all people between 0 and 4 are coded as 2-year-olds, all those aged 65 to 69 are coded as 67). Hence, for 2021 and 2022, `age` in the GLD harmonization of ARM LFS does not represent the actual age of a given observation. Instead, it indicates the age group the observation belongs to.
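
The group-to-age mapping can be illustrated with a small, self-contained sketch. The variables `age_lo` and `age_hi` are hypothetical helpers holding the bounds of an age group; they are not part of the raw data or of the GLD harmonization code, and only the two groups quoted above are entered.

```stata
* Sketch only: toy data with the two age groups mentioned in the text.
* age_lo and age_hi are hypothetical helper variables, not raw variables.
clear
input age_lo age_hi
0 4
65 69
end

* Code each group as the integer in the middle of its bounds
gen age = floor((age_lo + age_hi) / 2)

* Lists 2 for the 0-4 group and 67 for the 65-69 group
list
```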

### Migration from other countries
### Absent household members

In the general household information, Armenia's LFS lists all members the household considers to be part of it. However, many of the people a household may list as members for emotional reasons are absent (e.g., a child who has left to work in another country) and thus are not to be included.

The logic in the questionnaires from 2014 to 2017 is to have the enumerator skip any members who have been absent for 3 or more months (except for military service), as shown below.

![Skipping absent HH members until 2017](utilities/in-exclusion_til_2017.png)

From 2018, a special section (section **C**) is introduced to collect more information on absent household members and determine their exclusion. Their general status is determined in questions C3 and C4 (green box in the screenshot below), their eligibility is determined in question C12 (red box) following the parameters outlined below (yellow box).

![Skipping absent HH members from 2018](utilities/in-exclusion_from_2018.png)

The above screenshot is from the 2019 questionnaire and questions as well as instructions change slightly over the years. In 2022, all people aged 2022 are coded as eligible, even though 253 individuals claim that they have not usually resided in the household in the previous year (C3 == 2) and have been absent for at least 3 months (C10 == 4 or 5).
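
A minimal sketch of how such cases could be flagged is given below. It assumes the raw 2022 file is in memory and that the variables are literally named `C3` and `C10` after the question numbers quoted above, which may not match the actual raw variable names. The flag name `absent_flag` is made up for this example.

```stata
* Sketch only, on the raw 2022 data. C3 and C10 are assumed variable names
* taken from the question numbers; the raw files may name them differently.

* Flag people who did not usually reside in the household in the previous
* year (C3 == 2) and have been absent for at least 3 months (C10 == 4 or 5)
gen byte absent_flag = (C3 == 2) & inlist(C10, 4, 5)

* Number of flagged individuals
count if absent_flag == 1
```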

Since 2018, ARM LFS includes a migration section. From 2019 to 2022, one of the "last settlement" options is *Artsakh* (also known as Nagorno-Karabakh), an enclave of mostly ethnic Armenian contested between Armenia and Azerbaijan. The international community’s position as of December 2023 is that the area is official part of Azerbaijan. Consequently, and as there is no ISO three-letter country code for Artsakh, respondents with this answer are coded as "AZE", the ISO code of Azerbaijan. This is purely to keep answers within the alpha-3 ISO codes and in no way an endorsement of any political position.
Users are advised to review the logic and take into account how it may affect any calculation of the working-age population (as absentees are present in the data even though they lack labor module answers). Finally, users are also advised to review the translations, as there may be errors in the English translation. The only error we have found in this section is shown below, where the English translation has a different skip pattern from the Armenian version.

<br></br>

![migration_country](utilities/migration.png)
![Difference in skip pattern between English and Armenian version in 2018](utilities/Error_pop_move_2018_eng.png)
@@ -0,0 +1,70 @@
clear

*---- Append 2014 to 2022 data
foreach year of numlist 2014/2022 {

append using "Y:/GLD/ARM/ARM_`year'_LFS/ARM_`year'_LFS_V01_M_V03_A_GLD/Data/Harmonized/ARM_`year'_LFS_V01_M_V03_A_GLD_ALL.dta"


}

*---- Create variables for comparison
gen lfp_old_series = .
replace lfp_old_series = weight if inlist(lstatus,1,2) & inrange(year,2014,2017)
replace lfp_old_series = weight_old_series if inlist(lstatus,1,2) & inrange(year,2018,2019)

gen wap_old_series = .
replace wap_old_series = weight if inlist(lstatus,1,2,3) & inrange(year,2014,2017)
replace wap_old_series = weight_old_series if inlist(lstatus,1,2,3) & inrange(year,2018,2019)

gen lfp_new_series = .
replace lfp_new_series = weight if inlist(lstatus,1,2) & inrange(year,2018,2022)

gen wap_new_series = .
replace wap_new_series = weight if inlist(lstatus,1,2,3) & inrange(year,2018,2022)

gen lfp_cmb_series = .
replace lfp_cmb_series = weight if inlist(lstatus,1,2)

gen wap_cmb_series = .
replace wap_cmb_series = weight if inlist(lstatus,1,2,3)


*---- Collapse data to get by year
collapse (sum) lfp_* wap_*, by(year)

// Loop through all variables in the dataset
foreach var of varlist * {
// Replace 0 with missing value (.)
replace `var' = . if `var' == 0
}
*list

*---- Add in official data

gen lfp_off = .
replace lfp_off = 1375700 if year == 2014
replace lfp_off = 1316400 if year == 2015
replace lfp_off = 1226300 if year == 2016
replace lfp_off = 1230700 if year == 2017
replace lfp_off = 1293800 if year == 2018
replace lfp_off = 1318100 if year == 2019
replace lfp_off = 1286700 if year == 2020
replace lfp_off = 1287300 if year == 2021
replace lfp_off = 1311300 if year == 2022


*---- Plot

// Determine the range of years
summarize year
local min_year = r(min)
local max_year = r(max)

twoway (line lfp_old_series year) (line lfp_new_series year) (line lfp_off year), ///
title("Plot of ARM LFP over years by weight variable") ///
xlabel(`min_year'(1)`max_year', grid) ylabel(, grid) ///
legend(order(1 "LFP with weight 2014-17, old weight 2018-19" 2 "LFP with weight 2018-2022" 3 "LFP ARMSTAT Data") ///
position(6) row(3))

gr export "Y:/GLD/ARM/ARM_2022_LFS/ARM_2022_LFS_V01_M_V03_A_GLD/Work/weight_timelines.png", replace