Include code examples for use cases #159

Open
7 tasks
thcrock opened this issue Apr 18, 2018 · 0 comments

thcrock commented Apr 18, 2018

We would like basic use cases to be covered by examples or packaged in an easy-to-use format. This may involve pythonflow (#157); if not, we should make sure simple code examples exist that fulfill each use case with minimal dependencies (e.g. needing only some specific dataset to be present on the user's system or in S3).

We do have some examples in https://github.com/workforce-data-initiative/skills-ml/tree/master/examples but this format may or may not be enough.

A bunch of possible use cases (by no means exhaustive, feel free to add):

  • Train SOC classifier
  • Use trained SOC classifier on a job postings dataset
  • Train SOC classifier and use that classifier on another job postings dataset
  • Extract skills from a job postings dataset using noun phrase rules
  • Extract skills from a job postings dataset using ONET dictionary and exact matching
  • Extract skills from a job postings dataset using ONET dictionary and fuzzy matching
  • Extract skills from a job postings dataset using DICE dictionary and exact matching
  • Extract skills from a job postings dataset using DICE dictionary and fuzzy matching
  • Extract skills from a job postings dataset using ESCO dictionary and exact matching
  • Extract skills from a job postings dataset using ESCO dictionary and fuzzy matching
  • Upload skill candidates from some skill extractor in common schema skill candidate format, for human labeling
  • Clean job titles on a job postings dataset
  • Find the CBSA for job titles in a job postings dataset
  • Generate Research Hub-ish aggregate dataset (cleaned job title counts and skill counts by CBSA and quarter) from job postings dataset
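
To make the exact vs. fuzzy dictionary matching use cases concrete, here is a rough sketch of the difference using only the Python standard library. It does not use skills-ml itself: the `SKILLS` list, the function names, and the similarity threshold are all assumptions made up for illustration, and a real example would load the ONET, DICE, or ESCO dictionary and call the library's own extractors.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stand-in for a skill dictionary; not real ONET/DICE/ESCO data.
SKILLS = ["project management", "data analysis", "python"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens, joined by spaces."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def exact_matches(text, skills=SKILLS):
    """Return the skills whose words appear verbatim, in order, in the text."""
    tokens = tokenize(text)
    found = []
    for skill in skills:
        n = len(skill.split())
        if skill in set(ngrams(tokens, n)):
            found.append(skill)
    return found


def fuzzy_matches(text, skills=SKILLS, threshold=0.85):
    """Return the skills that are similar enough to some n-gram of the text.

    Similarity is difflib's SequenceMatcher ratio; the 0.85 threshold is an
    arbitrary illustrative choice, not a tuned value.
    """
    tokens = tokenize(text)
    found = []
    for skill in skills:
        n = len(skill.split())
        if any(
            SequenceMatcher(None, skill, gram).ratio() >= threshold
            for gram in ngrams(tokens, n)
        ):
            found.append(skill)
    return found
```

On a posting with misspellings such as "projct management", `exact_matches` would miss the skill while `fuzzy_matches` would still pick it up, which is the trade-off the paired use cases above are meant to show.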

First draft of the use cases from the above list that seem important enough to put on a checklist and include as part of this issue:

  • Train SOC classifier and apply to new dataset
  • Skill extraction using noun phrase rules
  • Skill extraction using ONET dictionary and fuzzy matching
  • Take skill candidates for labeling
  • Clean job titles
  • Geocode and find CBSA for job postings
  • Generate Research Hub-ish aggregate dataset
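
As a sense of scale for the last checklist item, a minimal sketch of counting cleaned job titles by CBSA and quarter, again with only the standard library. The record fields (`title`, `cbsa`, `posted`), the placeholder CBSA labels, and the cleaning rule are assumptions for illustration; this is not the skills-ml job title cleaner or geocoder, and it assumes the CBSA has already been found for each posting.

```python
import re
from collections import Counter
from datetime import date


def clean_title(title):
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    letters_only = re.sub(r"[^a-z\s]", " ", title.lower())
    return " ".join(letters_only.split())


def quarter_of(iso_date):
    """Map an ISO date string such as '2018-04-18' to a label such as '2018Q2'."""
    d = date.fromisoformat(iso_date)
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def title_counts(postings):
    """Count postings per (cbsa, quarter, cleaned title) key."""
    counts = Counter()
    for p in postings:
        key = (p["cbsa"], quarter_of(p["posted"]), clean_title(p["title"]))
        counts[key] += 1
    return counts


# Hypothetical postings; the CBSA values are placeholders, not real codes.
postings = [
    {"title": "Sr. Data Analyst II", "cbsa": "cbsa-1", "posted": "2018-04-18"},
    {"title": "sr data analyst ii", "cbsa": "cbsa-1", "posted": "2018-06-01"},
    {"title": "Nurse (RN)", "cbsa": "cbsa-2", "posted": "2018-01-15"},
]

for key, n in sorted(title_counts(postings).items()):
    print(key, n)
```

Skill counts could be aggregated the same way by swapping the cleaned title in the key for each extracted skill.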