stocksCRSP and factorsSPGMI sectors misspelled and official GICS naming conventions not used #86

Open
spinnj opened this issue Apr 2, 2022 · 2 comments

Comments

@spinnj
Contributor

spinnj commented Apr 2, 2022

factorsSPGMI and stocksCRSP do not use official GICS naming conventions in sector names. In addition, several securities have obvious misspellings in Sector assignment, resulting in incorrect groupings. For example:

unique(factorsSPGMI$Sector) # contains 16 sectors with dupes

 [1] "InfoTech"               "Industrials"            "HealthCare"
 [4] "ConsumStap"             "Energy"                 "Materials"
 [7] "ConsumDisc"             "TelcoServices"          "Utilities"
[10] "RealEstate"             "Health Care"            "Financials"
[13] "Consumer Discretionay"  "Information Technology" "Consumer Staple"
[16] "Communication Services"

For those 16 "Sectors" above, the following mapping would be correct:

| Sector (in factorsSPGMI) | Official GICS Sector Name |
| --- | --- |
| InfoTech | Information Technology |
| Industrials | Industrials |
| HealthCare | Health Care |
| ConsumStap | Consumer Staples |
| Energy | Energy |
| Materials | Materials |
| ConsumDisc | Consumer Discretionary |
| TelcoServices | Communication Services |
| Utilities | Utilities |
| RealEstate | Real Estate |
| Health Care | Health Care |
| Financials | Financials |
| Consumer Discretionay | Consumer Discretionary |
| Information Technology | Information Technology |
| Consumer Staple | Consumer Staples |
| Communication Services | Communication Services |
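
As a rough illustration of how the mapping in the table above could be applied, here is a minimal sketch using a named character vector as a lookup. The names `gics_name_map` and `normalize_sector` are hypothetical and not part of the package, and `sector_raw` is a small stand-in for `factorsSPGMI$Sector` so the snippet runs on its own.

```r
# Lookup from the labels observed in factorsSPGMI$Sector to official GICS
# sector names, copied from the mapping table in this issue.
gics_name_map <- c(
  "InfoTech"               = "Information Technology",
  "Industrials"            = "Industrials",
  "HealthCare"             = "Health Care",
  "ConsumStap"             = "Consumer Staples",
  "Energy"                 = "Energy",
  "Materials"              = "Materials",
  "ConsumDisc"             = "Consumer Discretionary",
  "TelcoServices"          = "Communication Services",
  "Utilities"              = "Utilities",
  "RealEstate"             = "Real Estate",
  "Health Care"            = "Health Care",
  "Financials"             = "Financials",
  "Consumer Discretionay"  = "Consumer Discretionary",
  "Information Technology" = "Information Technology",
  "Consumer Staple"        = "Consumer Staples",
  "Communication Services" = "Communication Services"
)

# Hypothetical helper: translate raw labels to official names and fail
# loudly on any label that is missing from the lookup.
normalize_sector <- function(sector) {
  sector <- as.character(sector)
  out <- unname(gics_name_map[sector])
  unmapped <- unique(sector[is.na(out) & !is.na(sector)])
  if (length(unmapped) > 0) {
    stop("Unmapped sector labels: ", paste(unmapped, collapse = ", "))
  }
  out
}

# Stand-in for factorsSPGMI$Sector (assumption: a character or factor column).
sector_raw <- c("InfoTech", "Consumer Discretionay", "Health Care", "TelcoServices")
normalize_sector(sector_raw)
```

Stopping on unmapped labels, rather than silently returning NA, is a design choice meant to surface any new misspellings when the data sets are refreshed.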
@spinnj
Contributor Author

spinnj commented Apr 2, 2022

One additional note: @braverock has identified the good practice of following official naming conventions for data, to facilitate merges with other data sets. I would suggest that in the case of GICS sectors, it is actually better to use the sector number (e.g. "10" instead of "Energy") and then use a function to map numbers to names, since S&P/MSCI have in the past changed the name of a sector (Telecommunication Services became Communication Services, but the level 1 number remained 50 before and after). That rename happened in 2018 and the factorsSPGMI data set ends in 2015, so for a true point-in-time data set we should be using Telecommunication Services instead of Communication Services (the current name for sector 50 since 2018).
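
A minimal sketch of the code-based approach might look like the following. The two-digit codes are the level 1 sector codes from the published GICS structure; only 10 and 50 are mentioned in this thread, so the rest should be checked against the official source. The names `gics_sectors`, `gics_code_to_name`, and `gics_name_to_code` are hypothetical.

```r
# Level 1 GICS sector codes with their current names (assumption: taken from
# the published GICS structure, not from the package data).
gics_sectors <- data.frame(
  code = c("10", "15", "20", "25", "30", "35", "40", "45", "50", "55", "60"),
  name = c("Energy", "Materials", "Industrials", "Consumer Discretionary",
           "Consumer Staples", "Health Care", "Financials",
           "Information Technology", "Communication Services",
           "Utilities", "Real Estate"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sector code -> current sector name (NA if unknown).
gics_code_to_name <- function(code) {
  gics_sectors$name[match(as.character(code), gics_sectors$code)]
}

# Hypothetical helper: official sector name -> sector code (NA if unknown).
gics_name_to_code <- function(name) {
  gics_sectors$code[match(as.character(name), gics_sectors$name)]
}

gics_code_to_name(c(10, 50))
gics_name_to_code("Real Estate")
```

Storing the code in the data set and deriving the label on demand keeps merges stable across renames, since the code for a sector does not change when its name does.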

JustinMShea pushed a commit that referenced this issue Jul 16, 2022
…ded compression algorithms, and updated the data sets
@JustinMShea
Collaborator

I agree @spinnj, the original number would be much better, and we can create a function for labels if needed! Along the lines of creating a function to transform the sector number into a human-readable name, we could find the dates when name changes happened and store them in a small data.frame that could be incorporated into the function... or simply document them in the /man pages.
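
One way the small data.frame idea could be sketched is shown below. All names here (`gics_name_history`, `gics_current_names`, `gics_name_asof`) are hypothetical. The `valid_from` dates are placeholders: the thread only says the sector 50 rename happened in 2018, so the exact effective date would need to be confirmed against the S&P/MSCI announcement. The earlier name is assumed to be "Telecommunication Services".

```r
# Hypothetical history table: one row per (code, name) with the first date
# on which that name applies. Dates are PLACEHOLDERS, not verified.
gics_name_history <- data.frame(
  code       = c("50", "50"),
  name       = c("Telecommunication Services", "Communication Services"),
  valid_from = as.Date(c("1999-01-01", "2018-10-01")),
  stringsAsFactors = FALSE
)

# Current names, used as the fallback for codes with no recorded history.
# Only two sectors are listed to keep the example short.
gics_current_names <- c("10" = "Energy", "50" = "Communication Services")

# Hypothetical helper: name of a sector code as of a given date.
gics_name_asof <- function(code, date, current_names = gics_current_names) {
  code <- as.character(code)
  date <- rep(as.Date(date), length.out = length(code))
  out  <- unname(current_names[code])
  for (i in seq_along(code)) {
    # Rows for this code whose validity started on or before the date.
    idx <- which(gics_name_history$code == code[i] &
                   gics_name_history$valid_from <= date[i])
    if (length(idx) > 0) {
      latest <- idx[which.max(as.numeric(gics_name_history$valid_from[idx]))]
      out[i] <- gics_name_history$name[latest]
    }
  }
  out
}

gics_name_asof(c(50, 50, 10),
               as.Date(c("2015-12-31", "2020-01-31", "2015-12-31")))
```

With these placeholder dates, a 2015 observation for sector 50 resolves to the earlier name and a 2020 observation to the current one, while sectors without a history row fall back to the current name.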
