MWScrape: Don't filter translation unit ID comments.
nmlgc committed Dec 22, 2022
1 parent ad1a7f4 commit e9db3e2
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions MWScrape.php
@@ -179,8 +179,11 @@ protected static function getMWTokenArray( &$str ) {
 public static function toArray( &$page ) {
 $temps = array();
 
-// Apply basic regex
-$page = preg_replace( '/<!--.*?-->/s', '', $page );
+// Apply basic regex. We leave translation unit ID removal to
+// TPCUtil::sanitize(); if we did it here, we'd leave empty lines in
+// place of these ID comments, and couldn't distinguish intended line
+// breaks after <translate> from unintended ones anymore.
+$page = preg_replace( '/<!--(?!T:).*?-->/s', '', $page );
 $page = preg_replace( '/\[\[[Cc]ategory:.*?\]\]/', '', $page );
 $page = preg_replace( self::MW_PAGE_LINK_REGEX, "$3", $page );
 
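The effect of the added `(?!T:)` negative lookahead can be checked in isolation. The following is a minimal sketch, not code from the repository: the helper names `stripCommentsOld()` and `stripCommentsNew()` and the sample page text are hypothetical, and only the two regex patterns are taken from the diff above.

```php
<?php
// Hypothetical helpers wrapping the old and new patterns from the diff.
function stripCommentsOld( string $page ): string {
    // Old pattern: removes every HTML comment, including unit ID markers.
    return preg_replace( '/<!--.*?-->/s', '', $page );
}

function stripCommentsNew( string $page ): string {
    // New pattern: the negative lookahead (?!T:) prevents a match from
    // starting at any comment whose body begins with "T:".
    return preg_replace( '/<!--(?!T:).*?-->/s', '', $page );
}

// Made-up sample: one translation unit ID comment and one ordinary comment.
$page = "<translate><!--T:1-->\nFirst unit.<!-- editor note -->\n</translate>";

// Old behavior: both comments are gone, so the line break after
// <translate> can no longer be attributed to a removed ID comment.
var_dump( stripCommentsOld( $page ) === "<translate>\nFirst unit.\n</translate>" );

// New behavior: the ordinary comment is removed, the unit ID stays.
var_dump( stripCommentsNew( $page ) === "<translate><!--T:1-->\nFirst unit.\n</translate>" );
```

Both `var_dump()` calls print `bool(true)`. With the new pattern, the `<!--T:1-->` marker reaches the later sanitizing step unchanged, which according to the commit comment is where `TPCUtil::sanitize()` takes over its removal.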
