@@ -152,9 +152,124 @@ protected static function plugin_json_jd_schema( $plugin ) {

		$schema[] = $software_application;

		// FAQPage schema.
		// get_the_content() takes the post as its third argument.
		$content = Plugin_Directory::instance()->split_post_content_into_pages( get_the_content( null, false, $plugin ) );
		$faq_content = isset( $content['faq'] ) ? $content['faq'] : '';
		$faq_schema = self::build_plugin_faq_schema( $faq_content, get_permalink( $plugin ) );

		if ( $faq_schema ) {
			$schema[] = $faq_schema;
		}

		return $schema;
	}

	/**
	 * Builds an FAQPage schema object from plugin content structured with <dl><dt><dd>.
	 *
	 * @param string $faq_content Raw plugin content containing FAQ markup.
	 * @param string $plugin_url  The URL of the plugin page.
	 * @return array|null FAQPage schema or null if none found.
	 */
	protected static function build_plugin_faq_schema( $faq_content, $plugin_url ) {
		if ( empty( $faq_content ) || false === strpos( $faq_content, '<dl' ) ) {
			return null;
		}

		$document_internal_errors = libxml_use_internal_errors( true );

		$document = new \DOMDocument();
Member

I'm not a fan of using DOMDocument like this on every page load.

We should either: Store the FAQs as post_meta, or use WP_HTML_Tag_Processor if absolutely required to extract from the HTML.

We can run a migration process to backfill the postmeta, so no need for doing both.
We could also migrate the existing FAQ markup to a block that pulls from that metadata.

Author

@dd32 I agree with your suggestion; the postmeta approach seems like the cleanest long-term solution.

For new uploads, we can parse the readme during the upload process (e.g. in class-upload-handler.php) and store _plugin_faqs postmeta immediately.

For existing plugins, have we handled this kind of migration process in the past?
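
A minimal sketch of the upload-time step described above, assuming the parsed readme exposes FAQs as a question => answer array (an assumption, not something confirmed in this thread). `normalize_plugin_faqs()` and `store_plugin_faqs()` are hypothetical helper names; `_plugin_faqs` is the meta key proposed above. The `update_post_meta()` call is guarded so the sketch also runs outside WordPress.

```php
<?php
/**
 * Normalizes a question => answer map into a list of FAQ items.
 * Hypothetical helper: the input shape is an assumption about the parsed readme.
 *
 * @param array $faq Map of question text to answer HTML.
 * @return array List of [ 'question' => string, 'answer' => string ] items.
 */
function normalize_plugin_faqs( array $faq ) {
	$items = array();

	foreach ( $faq as $question => $answer ) {
		$question = trim( (string) $question );
		$answer   = trim( (string) $answer );

		// Skip incomplete entries so no empty question or answer is stored.
		if ( '' === $question || '' === $answer ) {
			continue;
		}

		$items[] = array(
			'question' => $question,
			'answer'   => $answer,
		);
	}

	return $items;
}

/**
 * Stores normalized FAQs as post meta (hypothetical helper).
 *
 * @param int   $post_id Plugin post ID.
 * @param array $faq     Map of question text to answer HTML.
 * @return array The normalized items that were (or would be) stored.
 */
function store_plugin_faqs( $post_id, array $faq ) {
	$items = normalize_plugin_faqs( $faq );

	// Guarded so the sketch can run outside WordPress.
	if ( function_exists( 'update_post_meta' ) ) {
		update_post_meta( $post_id, '_plugin_faqs', $items );
	}

	return $items;
}
```

Storing a normalized list rather than raw HTML would let both the front-end markup and the schema read from the same data without re-parsing on each page load.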

Member

We've done migrations for data regularly enough. It's a PITA but it's easier than building legacy code upon legacy code upon legacy code.

Author

@dd32 How were these migrations typically handled in the past? Is there any sample code or a PR I can look at as a reference to follow the same approach for FAQ?

Member

@nikunj8866 Generally they're either run manually or via a bin script; I don't really have any good references for you.

One option in this case would be to just run the plugin import process for each plugin with FAQ items.

What I'd suggest is ignoring back-compat and just working with:

  • Store FAQ items as meta values
  • Add a block, or shortcode, in place of the FAQ HTML markup in the post_content when it's built
  • Leave older imported plugins without FAQPage schema items to start with, with the assumption that either a) a re-import will update it or b) a migration will be run.
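
As a rough sketch of how the schema could then be built from stored meta instead of parsing HTML on each request: `build_faq_schema_from_meta()` is a hypothetical name, and the item shape (`question` / `answer` keys) is an assumption about how the meta would be stored. Sanitizing the answer HTML is left out here and would still be needed.

```php
<?php
/**
 * Builds an FAQPage schema array from stored FAQ items (hypothetical helper).
 *
 * @param array  $faq_items  List of [ 'question' => string, 'answer' => string ] items.
 * @param string $plugin_url The URL of the plugin page.
 * @return array|null FAQPage schema, or null when there are no usable items.
 */
function build_faq_schema_from_meta( array $faq_items, $plugin_url ) {
	$entities = array();

	foreach ( $faq_items as $item ) {
		// Skip items that are missing a question or an answer.
		if ( empty( $item['question'] ) || empty( $item['answer'] ) ) {
			continue;
		}

		$entities[] = array(
			'@type'          => 'Question',
			'name'           => $item['question'],
			'acceptedAnswer' => array(
				'@type' => 'Answer',
				'text'  => $item['answer'],
			),
		);
	}

	if ( ! $entities ) {
		return null;
	}

	return array(
		'@context'   => 'https://schema.org',
		'@type'      => 'FAQPage',
		'@id'        => $plugin_url,
		'url'        => $plugin_url,
		'mainEntity' => $entities,
	);
}
```

Inside WordPress the items would presumably come from something like `get_post_meta( $plugin->ID, '_plugin_faqs', true )`; plugins without that meta would simply produce no FAQPage entry until a re-import or migration fills it in.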

Author

@dd32 Thanks for the guidance.

As I understand it:

  • I'll add code to store FAQ content in post meta when plugins are uploaded.
  • I'll create a block for FAQs. Just to confirm, should this block output both the FAQ HTML markup and the FAQ schema, or do you prefer to keep schema generation separate?
  • For older plugins, we could hook into import_from_svn so that FAQs are stored into post meta whenever a plugin is updated.

Does that align with what you had in mind?

		// Declare UTF-8 explicitly; without an encoding declaration, loadHTML() may treat the input as ISO-8859-1.
		$document->loadHTML( '<?xml encoding="UTF-8">' . $faq_content );

		// Discard buffered parse errors before restoring the previous setting.
		libxml_clear_errors();
		libxml_use_internal_errors( $document_internal_errors );

		$faq_entities = [];
		$dts = $document->getElementsByTagName( 'dt' );

		if ( 0 === $dts->length ) {
			return null;
		}

		foreach ( $dts as $dt ) {
			$question = sanitize_text_field( $dt->textContent );

			// Find the next <dd> sibling.
			$dd = $dt->nextSibling;
			while ( $dd && 'dd' !== $dd->nodeName ) {
				$dd = $dd->nextSibling;
			}
			if ( ! $dd ) {
				continue;
			}

			// Collect and sanitize answer HTML.
			$answer_html = '';
			foreach ( $dd->childNodes as $child ) {
				$answer_html .= $dd->ownerDocument->saveHTML( $child );
			}

			$faq_entities[] = [
				'@type' => 'Question',
				'name' => $question,
				'acceptedAnswer' => [
					'@type' => 'Answer',
					'text' => self::sanitize_faq_answer_html( $answer_html ),
				],
			];
		}

		if ( ! $faq_entities ) {
			return null;
		}

		return [
			'@context' => 'https://schema.org',
			'@type' => 'FAQPage',
			'@id' => $plugin_url,
			'url' => $plugin_url,
			'mainEntity' => $faq_entities,
		];
	}

	/**
	 * Sanitizes FAQ answer HTML for use in FAQPage schema.
	 *
	 * Allows only tags supported by Google rich results:
	 * <h1>-<h6>, <p>, <div>, <ul>, <ol>, <li>, <a href>, <br>, <b>, <strong>, <i>, <em>.
	 *
	 * @link https://developers.google.com/search/docs/appearance/structured-data/faqpage#answer
	 *
	 * @param string $html Raw FAQ answer HTML.
	 * @return string Sanitized HTML.
	 */
	protected static function sanitize_faq_answer_html( $html ) {
		$allowed_tags = array(
			'h1' => array(),
			'h2' => array(),
			'h3' => array(),
			'h4' => array(),
			'h5' => array(),
			'h6' => array(),
			'br' => array(),
			'ol' => array(),
			'ul' => array(),
			'li' => array(),
			'p' => array(),
			'div' => array(),
			'b' => array(),
			'strong' => array(),
			'i' => array(),
			'em' => array(),
			'a' => array(
				'href' => true,
			),
		);

		return wp_kses( $html, $allowed_tags );
	}

	/**
	 * Prints meta tags in the head of a page.
	 *