patent_extraction

Project: A set of Python scripts that read and extract patent data from Google's bulk patent data and match it with Compustat firm data.

Details: This package reads the XML files of the bulk patent data and extracts patent numbers and the dates of application and issuance. It then cross-matches the datasets to join each patent to its forward references, which allows further analysis of citations and patent value. Finally, I use several name-matching strategies to match patent assignees to the list of Compustat/CRSP firms. In contrast to similar data available on the internet, I use the most recent bulk data, so the user can analyze recently published patents, including the period of the financial crisis and recovery (2007-2014). This is useful for researchers interested in the interaction between business or economic fluctuations and innovation.
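As a rough illustration of the two matching steps described above, here is a minimal sketch in Python. It is not the code in this repository: the record layout, the function names (`forward_references`, `normalize_name`, `match_assignees`), the suffix list, and the sample firm ID are all assumptions made for the example. It assumes the XML has already been parsed into records that list the patents each patent cites, then inverts those citations into forward references and matches assignees to firms by exact comparison of normalized names, which is only one of many possible name-matching strategies.

```python
import re
from collections import defaultdict

# Hypothetical parsed records: each patent lists the earlier patents it cites.
patents = [
    {"number": "7000001", "assignee": "Acme Widgets, Inc.", "cites": []},
    {"number": "7000002", "assignee": "ACME WIDGETS INC", "cites": ["7000001"]},
    {"number": "7000003", "assignee": "Beta Labs Corp.", "cites": ["7000001", "7000002"]},
]

# Assumed (incomplete) list of legal suffixes to drop before comparing names.
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "llc"}


def forward_references(records):
    """Invert backward citations: map each cited patent to the patents citing it."""
    forward = defaultdict(list)
    for rec in records:
        for cited in rec["cites"]:
            forward[cited].append(rec["number"])
    return dict(forward)


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_assignees(records, firms):
    """Match assignees to firms by exact equality of normalized names.

    `firms` maps a hypothetical firm ID to a firm name.
    Returns a dict of patent number -> firm ID for matched patents only.
    """
    lookup = {normalize_name(name): firm_id for firm_id, name in firms.items()}
    matches = {}
    for rec in records:
        key = normalize_name(rec["assignee"])
        if key in lookup:
            matches[rec["number"]] = lookup[key]
    return matches


if __name__ == "__main__":
    firms = {"F001": "Acme Widgets Inc"}  # hypothetical firm list
    print(forward_references(patents))
    print(match_assignees(patents, firms))
```

Exact matching on normalized names is cheap and transparent but misses misspellings and abbreviations; a fuzzy comparison could be layered on top for the names that remain unmatched.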

The code is currently under revision, and this repository will be updated gradually. For more information or to contribute, contact me at sbyasin(at)gmail.com.
