factorsSPGMI and stocksCRSP incorrect sector assignments #85

Open
spinnj opened this issue Mar 26, 2022 · 1 comment


spinnj commented Mar 26, 2022

A subset of securities have incorrect Sector and GICS data. This appears to be the result of human error in creating the data sets. My best guess is that S&P used the ticker (or TickerLast) to match CRSP data to their own data sets, and a subset of securities were mismatched due to ticker recycling, etc.

This can be seen in the file stocksTickers310GICSgovindSPGMI.xlsx, in the Sandbox, which has incorrect mappings for several securities (for example, row 280, ticker STJ, matches "St Jude Medical Inc" to "St James's Place plc").
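A ticker-only match like the STJ case could be caught by comparing company names on both sides of the join. The following is a minimal sketch, not the method used to build the data sets: the helper names, the suffix list, the 0.5 threshold, and the `XYZ` control row are all assumptions made up for illustration.

```python
import re

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "company", "plc", "ltd", "llc"}

def name_tokens(name):
    """Lowercase a company name, drop punctuation and corporate suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return {tok for tok in cleaned.split() if tok not in SUFFIXES}

def name_similarity(a, b):
    """Jaccard similarity of the two names' token sets (0.0 to 1.0)."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def flag_suspect_matches(pairs, threshold=0.5):
    """Return tickers whose CRSP and vendor names look like different companies."""
    return [ticker for ticker, crsp_name, vendor_name in pairs
            if name_similarity(crsp_name, vendor_name) < threshold]

pairs = [
    ("STJ", "St Jude Medical Inc", "St James's Place plc"),  # pair from this issue
    ("XYZ", "Example Widgets Inc", "Example Widgets Corp"),  # made-up control row
]
suspects = flag_suspect_matches(pairs)
```

Token overlap is crude and would need tuning against real names, but it is cheap enough to run over every matched row as a first-pass filter before manual review.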

Subsequent to the original data set creation, however, there appears to have been some human intervention to clean up a few securities. I identified 11 securities with issues in the original stocksTickers310GICSgovindSPGMI.xlsx, but a few of them are now correct in stocksCRSP and factorsSPGMI, so they were presumably fixed later.

Note that both the Sector name and the GICS number appear to be incorrect in some cases.

Fixing this will result in cleaner data in the FactorAnalytics package, a better ability to merge data sets with vendor sources later, and will allow resolution of additional issues with factorsSPGMI and stocksCRSP. The following table lists the securities whose data remain uncorrected.

| TickerLast | Assigned Sector | Correct Sector | Assigned GICS | Correct GICS |
| --- | --- | --- | --- | --- |
| AVP | Industrials | Consumer Staples | 20101010 | 30302010 |
| CSH | Information Technology | Financials | 45103010 | 40202010 |
| CTS | Information Technology | Information Technology | 45102010 | 45203020 |
| PIR | Financials | Consumer Discretionary | 40301040 | 25504060 |
| RTN | Consumer Discretionary | Industrials | 25301040 | 20101010 |
| STJ | Financials | Healthcare | 40203010 | 35101010 |
| TSS | Industrials | Information Technology | 20102010 | 45102020 |
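
One way the corrections in the table above might be applied is sketched below. This is only an illustration under assumptions: it treats each record as a dict with `TickerLast`, `Sector`, and `GICS` keys (the actual column names in stocksCRSP and factorsSPGMI may differ), and `apply_corrections` is a hypothetical helper, not part of FactorAnalytics. A row is changed only if it still carries the known-bad GICS code, so securities that were already fixed are left alone.

```python
# Corrections transcribed from the table above:
# TickerLast -> (assigned GICS, correct sector, correct GICS)
CORRECTIONS = {
    "AVP": ("20101010", "Consumer Staples", "30302010"),
    "CSH": ("45103010", "Financials", "40202010"),
    "CTS": ("45102010", "Information Technology", "45203020"),
    "PIR": ("40301040", "Consumer Discretionary", "25504060"),
    "RTN": ("25301040", "Industrials", "20101010"),
    "STJ": ("40203010", "Healthcare", "35101010"),
    "TSS": ("20102010", "Information Technology", "45102020"),
}

def apply_corrections(rows):
    """Return corrected copies of rows without mutating the input.

    Assumes each row is a dict with 'TickerLast', 'Sector', and 'GICS' keys
    (hypothetical names chosen for this sketch).
    """
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is untouched
        entry = CORRECTIONS.get(row["TickerLast"])
        # Only overwrite rows that still hold the known-bad GICS code.
        if entry and str(row["GICS"]) == entry[0]:
            row["Sector"], row["GICS"] = entry[1], entry[2]
        fixed.append(row)
    return fixed
```

Because a recycled ticker can refer to different companies over time, a real fix would probably key on a permanent identifier rather than TickerLast alone; the ticker is used here only because it is the key the table provides.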

Once this is fixed, it ties into issue #73: the final sample will contain 1 real estate, ~1 financials, and ~3 utilities stocks, which requires replacing ~5 securities from the additional 10 securities. The solution to #73 will therefore also need to account for the financials name(s).

Edited for completeness.

JustinMShea pushed a commit that referenced this issue Jul 16, 2022
…ded compression alogorithms, and updated the data sets
@JustinMShea

Great catch @spinnj, yes we should stamp out these errors!
