Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port-StringTools #56

Merged
merged 12 commits into from
Aug 19, 2019
Merged

Port-StringTools #56

merged 12 commits into from
Aug 19, 2019

Conversation

MakoEnergy
Copy link
Member

A port of the StringTools from the monolithic repo.

@MakoEnergy MakoEnergy added enhancement mezzanine port The code for this exists in the old Mezzanine and should be copied and modified from there. labels Jul 22, 2019
@MakoEnergy MakoEnergy requested a review from Sqeaky July 22, 2019 01:26
@MakoEnergy MakoEnergy self-assigned this Jul 22, 2019
/// @brief Checks if a character is a space.
/// @param ToCheck The character to be checked.
/// @return Returns true if the character is a space, false otherwise.
Boole MEZZ_LIB IsSpace(const Char8 ToCheck);
Copy link
Member

@Sqeaky Sqeaky Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have signed and unsigned versions of this? I get that Cha8 is our type and we will set it appropriately, but someone using and external type could benefit.

EDIT - Same question applies to any function accepting a char.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be an enormous amount of extra boilerplate and I'm not sure what the use case for it would be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. I suggested it for the cornercase where the game dev cared about signedness of their chars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made all those functions templates with a type trait for sanity.

Boole MEZZ_LIB IsTab(const Char8 ToCheck);
/// @brief Checks if a character is a newline.
/// @param ToCheck The character to be checked.
/// @return Returns true if the character is a newline, false otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do with a \r? It is part of a newline on some platforms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the source, the function checks for both \r and \n.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was implying that should be in the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

/// @brief Gets a copy of a String with the first letter of each word upper cased and other letters lower cased.
/// @param Source The String that will be the basis for the modified String.
/// @return Returns a copy of the String that has been modified to be in Camel Case.
String MEZZ_LIB CamelCaseCopy(String Source);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be PascalCase or DragonCase. See #57

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

/// @param Source The String that will be the basis for the modified String.
/// @param Left Whether or not to trim the left side of the String.
/// @param Right Whether or not to trim the right side of the String.
String MEZZ_LIB TrimCopy(String Source, const Boole Left = true, const Boole Right = true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am used to seeing this as 3 different functions instead of parameters on one. LeftTrim, RightTrim and Trim.

Also a version of this could operate on iterator ranges like std::erase and return an iterator to the new end of the range or the end of the erased range,.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified it to have 3 versions. The Non-copy version operates like std::erase as you suggested.

void MEZZ_LIB RemoveDuplicateWhitespaces(String& Source);
/// @brief Gets a copy of a string with all duplicate whitespaces removed.
/// @param Source The String that will be the basis for the modified String.
String MEZZ_LIB RemoveDuplicateWhitespacesCopy(String Source);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again there could be an iterator range version like std::erase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified.

/// @param Delims The characters to look for and use as split points in the source String.
/// @param MaxSplits The maximum number of splits to perform on this String. Value of zero means unlimited splits.
/// @return Returns a vector containing all the substrings generated from the source String.
StringVector MEZZ_LIB Split(const String& Source, const String& Delims = " \t\n", const Whole MaxSplits = 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a matching: String Join(StringVector ToJoin, const String& Delim=" ");

Js, Ruby, Python, and PHP all have some function to match their split/explode functions that join groups into a single string following a simple pattern of ToJoin[0] + Delim +ToJoin[1] + Delim ... ToJoin[size-1];

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do now!

///////////////////////////////////////////////////////////////////////////////
// Convert-From-String Functions

/// @brief Converts a String to any.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this needs the phrase "Lexicographical Conversion"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you highlighted multiple parts here....do you mean the section header or the method brief doc?

StringStream Converter;
Converter << ToConvert;
return Converter.str();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should This delegate to std::to_string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. std::to_string is intended for numeric conversions. Whereas this will work with any type that has a streaming operator.

const String Delims = " \t\r";
if( Left ) {
size_t Forward = 0;
while( Forward < Source.size() )
Copy link
Member

@Sqeaky Sqeaky Aug 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like a reasonable implementation. Why not use std::find_first_of or std::find_if_not to find the range for std::erase.

Also, this is just conjecture, but I cannot imagine that the branching helps with optimization, if this were just a call RightTrim and LeftTrim then the game dev could decide which of the 3 to call and just never branch.

EDIT - Strings have a find_first_not_of on them. This could make ltrim super easy to implement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation has been updated as per a previous review comment.

CurrIndex = Source.find_first_of(Delims,CurrIndex,2) )
{
size_t EndIndex = CurrIndex;
Char8 CurrWhite = Source[EndIndex];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work on a string with several alternating tabs and spaces in between words?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nor was it meant to when it was written. It could be updated to handle such cases...although I'm curious what the use case for that would be and which of the alternating whitespace characters should be prioritized when replacing the sequence with one character.

Real ConvertHexToColourChannel(const StringView Hex)
{
if( Hex.size() != 2 ) {
String ExceptionString("Hex code requires 2 characters to express a Colour channel.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSS has color hex code triplets. It just omits the lower precision number. If a single digit is passed we could simply duplicate that digit before resuming.

return Ret;
}

Ret.reserve( MaxSplits ? MaxSplits + 1 : 10 );
Copy link
Member

@Sqeaky Sqeaky Aug 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for limiting the amount of splits? I have not seen this before.

EDIT - I get this is just reserving and the capacity, but it I clicked add comment before I finished thinking about where to put this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value is no maximum. I want to say I have used the maximum somewhere in the Mezzanine, but I can't recall where. I think this is a relic from Ogre, but unlike other Ogre relics this is doing no harm.

Boole ConvertToBool(const StringView ToConvert, const Boole Default)
{
auto Checker = [ToConvert](const StringView Str) {
if( Str.size() == ToConvert.size() ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better as a call to std::equal which likely does a range length check on its own.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given it's parameters, I don't see how it could:
https://en.cppreference.com/w/cpp/algorithm/equal

}

template<>
UInt8 ConvertFromString<UInt8>(const StringView ToConvert)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this specialized?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implicit conversions turn it into an int, which produces unexpected results when interacting with streams. The specializations work around that.

TEST_EQUAL("IsLowerHexLetter(const_Char8)-Fail",false,StringTools::IsLowerHexLetter('g'));
TEST_EQUAL("IsUpperHexLetter(const_Char8)-Pass",true,StringTools::IsUpperHexLetter('F'));
TEST_EQUAL("IsUpperHexLetter(const_Char8)-Fail",false,StringTools::IsUpperHexLetter('1'));
TEST_EQUAL("IsHexLetter(const_Char8)-Pass",true,StringTools::IsHexLetter('B'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a lowercase 'b' test expecting a true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


{//String Manipulation
String ToUpperString("This is typed in a reasonable, soft-spoken tone.");
String ToUpperResult("THIS IS TYPED IN A REASONABLE, SOFT-SPOKEN TONE.");
Copy link
Member

@Sqeaky Sqeaky Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/Nit Results for test should be const, the strings to be copied could be too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I got them all but I made a large number of the source strings const.

TEST_EQUAL_EPSILON("ConvertHexToColourChannel(const_StringView)-Fourth",
Real(6.0 / 255.0),StringTools::ConvertHexToColourChannel(HexStr4));

Real ChannelVal1 = 0.8675f;//221.2125
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/Nit - The fractions above are more readable in my opinion.

StringVector ISplitResult = StringTools::Split(SplitSourceString,"i",5);// Should normally split 7 times.
TEST_EQUAL("Split(const_String&,const_String&,const_Whole)-DefaultCount",
size_t(11),DefaultSplitResult.size());
TEST_EQUAL("Split(const_String&,const_String&,const_Whole)-DefaultElement1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mixes two different tests. It is a minor nit, but 2 sets of tests, one testing the split char and another testing split limit would better follow best practices. That said, this is simple enough its not a real issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two sets of tests for Split as is. One uses the default value for splits and the other doesn't. Although the one that doesn't (the ISplit tests) also happens to use a different delimiter. I guess it's still not ideal, but we do have one default value test and one non-default value test.

String ThirdSource("Test\\Data\\Results.txt");
String FourthSource("*Star*");

String FirstPattern("Test/Data/*.txt");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern names that indicate what they match would improve readabilty. This could "PatternMatchesFirst" or "PatternMatchesUnixTextFile" for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

TEST_EQUAL("ConvertToBool(const_StringView,const_Boole)-Default",
true,StringTools::ConvertToBool("Yes"));
TEST_EQUAL("ConvertToBool(const_StringView,const_Boole)-Gibberish",
false,StringTools::ConvertToBool("Gibberish"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a ("gibberish", true) test should be added to specifically test the defaulting behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Replaced all Size Modifying String Manipulations that accepted a String with versions that accept an iterator range.
Renamed CamelCase to Pascal Case.
Added minor documentation.
Changed String parameters in Split to StringViews.
@MakoEnergy
Copy link
Member Author

Latest round of tests only failed because of CountedPtr TestWarns.

@MakoEnergy
Copy link
Member Author

More TestWarn "failures".

@MakoEnergy MakoEnergy merged commit b0dcb1e into master Aug 19, 2019
@MakoEnergy MakoEnergy deleted the Port-StringTools branch August 19, 2019 05:51
@MakoEnergy MakoEnergy mentioned this pull request Aug 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement mezzanine port The code for this exists in the old Mezzanine and should be copied and modified from there.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants