How does morphology on yoastseo work?
Morphological analysis of the keyphrase and synonyms is implemented as a research. This means that:
- The relevant script is added in the researches folder, next to the other, more conventional researches.
- The results of the morphological analysis can be requested through researcher.getResearch( "morphology" ).
The language-specific information about how the morphological forms of words should be built is supplied separately from the researcher, in a data file in the private repository Yoast/YoastSEO.js-premium-configuration. This makes it possible to control who has access to this file (Premium, but not Free) and, in the future, makes data distribution more efficient, because a user only needs to access the data for their own language.
Currently, morphological analysis is available for Premium users and for the English language only. More guidelines on how morphology for new languages should be added will follow shortly.
The morphological research receives a paper with a keyword and, optionally, synonyms in it. It relies on the language of the paper (parsed from the locale) for the subsequent analysis. The default language is English.
The research
- Splits the keyphrase or a synonym phrase into words: A boy reads a book > A, boy, reads, a, book.
- Filters out function words (words with little or no conceptual meaning, e.g. prepositions, enumerations) if a list of function words is available; otherwise, keeps all words > boy, reads, book.
- For English, for Premium: builds all possible forms of the remaining words, including hypothetical ones, as if each word were a noun, an adjective, an adverb and a verb > [boy, boys, boying, boyed], [read, reads, reading], [book, books, booking, booked, bookly]. The research makes use of regexes and lists of exceptions.
- For English Free and for all other languages, each array of forms contains only one word form.
- Collects keyphrase and synonyms forms into one structure:
{
    keyphraseForms: [
        // forms of every word from the keyphrase
        [ form1, form2, ... ], // 1st content word from the keyphrase
        [ form1, form2, ... ], // 2nd content word from the keyphrase
        ...
    ],
    synonymsForms: [
        [ // forms of every word from the 1st synonym
            [ form1, form2, ... ], // 1st content word from the 1st synonym
            [ form1, form2, ... ], // 2nd content word from the 1st synonym
            ...
        ],
        [ // forms of every word from the 2nd synonym
            [ form1, form2, ... ], // 1st content word from the 2nd synonym
            [ form1, form2, ... ], // 2nd content word from the 2nd synonym
            ...
        ],
        ...
    ],
}
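The steps above can be sketched in a few lines of JavaScript. This is only a toy illustration, not the actual research: the function-word list, the function names (buildForms, buildPhraseForms, collectForms) and the suffix rules are all assumptions made for this example, whereas the real implementation relies on language-specific regexes and exception lists from the data file.

```javascript
// Hypothetical function-word list; the real lists are language-specific.
const FUNCTION_WORDS = new Set( [ "a", "an", "the", "of", "in", "and" ] );

// Naive stand-in for the real form builder (assumption: no exception lists,
// only a crude "strip a final s" step followed by three suffixes).
function buildForms( word ) {
    const base = word.length > 3 && word.endsWith( "s" ) ? word.slice( 0, -1 ) : word;
    return [ base, base + "s", base + "ing", base + "ed" ];
}

// Splits a phrase into lower-cased words, drops function words and
// returns one array of forms per remaining (content) word.
function buildPhraseForms( phrase, useMorphology ) {
    const words = phrase.toLowerCase().split( /\s+/ ).filter( Boolean );
    const contentWords = words.filter( ( word ) => ! FUNCTION_WORDS.has( word ) );
    return contentWords.map( ( word ) => ( useMorphology ? buildForms( word ) : [ word ] ) );
}

// Collects the forms of the keyphrase and of every synonym into one structure.
function collectForms( keyphrase, synonyms, useMorphology ) {
    return {
        keyphraseForms: buildPhraseForms( keyphrase, useMorphology ),
        synonymsForms: synonyms.map( ( synonym ) => buildPhraseForms( synonym, useMorphology ) ),
    };
}

const result = collectForms( "A boy reads a book", [ "the kid" ], true );
console.log( JSON.stringify( result, null, 2 ) );
```

With useMorphology set to false, every content word keeps a single form, which mirrors the behaviour described for English Free and for other languages. Note that this naive builder also produces forms such as "readed", which a real exception list would be expected to handle.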
- The plugin requires morphological data from the private repository Yoast/YoastSEO.js-premium-configuration and supplies these data to the webworker as researchData.
- The webworker creates a Researcher with the provided morphological data and supplies this Researcher as an argument to the SEO assessors (regular and cornerstone) that it calls.
Right now, content assessors do not receive this Researcher as input and create a new one (without morphological data available) on the fly every time it is needed. As soon as word lists for readability analysis (e.g., transition words) are transferred to data (on-demand functionality), this will have to be adjusted.
- SEO assessor calls SEO assessments and SEO assessments call their specific researches as normal.
- Some SEO-specific researches require morphological analysis of keyphrase and synonyms, and some do not. Almost all researches that search for keyword or synonyms (in text, headings, tags, metadescription, etc.) require morphological analysis. You can see here if your research in question requires morphological analysis.
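The data flow described in this list can be illustrated with a small self-contained sketch. Everything in it is a stand-in: ToyResearcher, its method names, the shape of the morphology data (a plain word-to-forms map) and the keywordInTitle research are assumptions made for this example and do not reproduce the actual Researcher or data format.

```javascript
// Hypothetical stand-in for the real Researcher; names are illustrative only.
class ToyResearcher {
    constructor( paper ) {
        this.paper = paper;
        this.data = {};
        this.researches = {};
    }

    // Stores externally supplied data (e.g. morphology data) under a research name.
    addResearchData( name, data ) {
        this.data[ name ] = data;
    }

    getData( name ) {
        return this.data[ name ] || null;
    }

    addResearch( name, research ) {
        this.researches[ name ] = research;
    }

    // Runs the named research on the paper, passing the researcher itself along.
    getResearch( name ) {
        return this.researches[ name ]( this.paper, this );
    }
}

// Toy morphology research: uses the supplied data if present, else one form per word.
function toyMorphology( paper, researcher ) {
    const data = researcher.getData( "morphology" );
    return paper.keyword.toLowerCase().split( /\s+/ ).filter( Boolean ).map( ( word ) => {
        const extraForms = data && data[ word ] ? data[ word ] : [];
        return [ word, ...extraForms ];
    } );
}

// Toy SEO research: true if every keyword word has at least one form in the title.
function keywordInTitle( paper, researcher ) {
    const forms = researcher.getResearch( "morphology" );
    const titleWords = paper.title.toLowerCase().split( /\W+/ );
    return forms.every( ( wordForms ) => wordForms.some( ( form ) => titleWords.includes( form ) ) );
}

const paper = { keyword: "book", title: "Ten books to read" };

const withData = new ToyResearcher( paper );
withData.addResearch( "morphology", toyMorphology );
withData.addResearchData( "morphology", { book: [ "books" ] } );

const withoutData = new ToyResearcher( paper );
withoutData.addResearch( "morphology", toyMorphology );

console.log( keywordInTitle( paper, withData ) );    // true
console.log( keywordInTitle( paper, withoutData ) ); // false
```

The second call shows why a researcher created on the fly, without morphological data, only finds exact matches.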
In order for an SEO research to use keyphrase or synonym word-forms, it should call the morphological research within itself. Something like:
export default function( paper, researcher ) {
    const topicForms = researcher.getResearch( "morphology" );
    ...
}
The function that builds morphological forms is memoized, so do not worry about inefficiency.
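To show what memoization buys here, the following sketch wraps a deliberately naive form builder in a hand-written cache. How the actual function is memoized is not stated in this page; memoizeByFirstArgument and buildFormsUncached are hypothetical names, and the cache is keyed on the first argument only, which is also the default behaviour of lodash's memoize.

```javascript
// Counts how often the underlying builder actually runs.
let buildCalls = 0;

// Hypothetical, deliberately naive form builder.
function buildFormsUncached( word ) {
    buildCalls += 1;
    return [ word, word + "s", word + "ing", word + "ed" ];
}

// Minimal memoizer: caches results in a Map keyed on the first argument.
function memoizeByFirstArgument( fn ) {
    const cache = new Map();
    return function( key ) {
        if ( ! cache.has( key ) ) {
            cache.set( key, fn( key ) );
        }
        return cache.get( key );
    };
}

const buildFormsMemoized = memoizeByFirstArgument( buildFormsUncached );

buildFormsMemoized( "book" );
buildFormsMemoized( "book" ); // served from the cache
buildFormsMemoized( "boy" );

console.log( buildCalls ); // 2
```

Several researches can therefore request forms for the same word without the forms being rebuilt each time.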
Depending on the exact functionality of the SEO research, it can make use of one of the helper functions, which were created to search for keyphrase forms or synonym forms in any supplied text string.
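As a rough idea of what such a helper could look like, the sketch below counts how many keyphrase words have at least one form in a text and falls back to synonyms when they match better. The function names, the returned property names and the simple ASCII-only tokenization are assumptions for this example, not the signatures of the actual helper functions.

```javascript
// Counts how many words (each given as an array of forms) occur in the text.
// Assumption: crude tokenization that only handles ASCII letters and digits.
function matchForms( wordForms, text ) {
    const textWords = text.toLowerCase().split( /[^a-z0-9]+/ ).filter( Boolean );
    const matched = wordForms.filter( ( forms ) => forms.some( ( form ) => textWords.includes( form ) ) );
    const count = matched.length;
    const percent = wordForms.length === 0 ? 0 : Math.round( ( count / wordForms.length ) * 100 );
    return { countWordMatches: count, percentWordMatches: percent };
}

// Returns the best match among the keyphrase and, optionally, its synonyms.
function findTopicForms( topicForms, text, useSynonyms ) {
    let best = { ...matchForms( topicForms.keyphraseForms, text ), keyphraseOrSynonym: "keyphrase" };
    if ( useSynonyms ) {
        for ( const synonymForms of topicForms.synonymsForms ) {
            const candidate = matchForms( synonymForms, text );
            // A synonym only wins if it matches strictly better than the current best.
            if ( candidate.percentWordMatches > best.percentWordMatches ) {
                best = { ...candidate, keyphraseOrSynonym: "synonym" };
            }
        }
    }
    return best;
}

const topicForms = {
    keyphraseForms: [ [ "boy", "boys" ], [ "book", "books" ] ],
    synonymsForms: [ [ [ "kid", "kids" ], [ "novel", "novels" ] ] ],
};

// One of the two keyphrase words is found: 50% matched via the keyphrase.
console.log( findTopicForms( topicForms, "Two boys play.", true ) );
// Both synonym words are found: 100% matched via the synonym.
console.log( findTopicForms( topicForms, "The kids love novels.", true ) );
```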
- Pick an SEO assessment to work on. All specifications are available in the overview issue of this project.
- In the research of the assessment:
  - Pass researcher as an argument to the main research function that is being exported.
  - Request the results of the morphological research.
  - Adjust the content of the research function to match the specification. Notice that the helper functions can return
    - the number of words (word forms) matched,
    - the percentage of the words (word forms) matched,
    - whether the match was found with the keyword or a synonym.
  - Remember that we import from lodash-es instead of lodash and that we use export default function instead of module.exports.
- In the spec of your research:
  - Require the researcher and the morphology data. For now, the morphology data can be supplied as an internal json file, but this will change soon.
  - For every spec (or every time you create a new paper), create a Researcher, supply the morphology data and use this researcher as an argument for the SEO research that you are testing.
  - Adjust the expected values of the tests.
- In the SEO assessment file: Adjust the criteria, boundaries and feedback strings to match the specifications.
- In the spec file of the SEO assessment: Adjust the expected scores and feedback strings.
- In the full-text specs: Adjust the expected scores and feedback strings for your assessment.
- In the full-text specs runner: Add researcher as a second parameter to the call of your SEO research.