Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: DRAFT ! equate empty string to null for more intuitive results #11363

Draft
wants to merge 1 commit into
base: 5
Choose a base branch
from

Conversation

sunnysideup
Copy link
Contributor

@sunnysideup sunnysideup commented Sep 8, 2024

Description

In the real world, there is often little difference between null and an empty string (''). For a developer it is not easy to know if a field is saved as an empty string or null or how that works, especially if you are new to Silverstripe (or if you are focussing on the actual problem and do not really care about these sort of details).

In my case, I wrote:

MyDataObject::get()->filter(['MyField:not' => ['', null, 0]);

In my simple thinking, I was like, anything with a real entry (i.e. not NULL, EMPTY STRING, or ZERO). What I noticed though is that LESS rows were included than expected.

Now, also consider this: For example, if you write:

foreach(MyDataObject::get() as $myObject) {
    $myObject->MyField = '';
    $myObject->write();
}
echo 
    MyDataObject::get()->count() .
    '=' .
    MyDataObject::get()->filter(['MyField' => ''])->count()
;

You will see that the answer is foo=0, where foo is obviously the number of records. That is not intuitive.

I have investigated this a little and by taking two random DataObjects with a Varchar field and filtering for "falsy" vs non-falsy. With interesting results. I will list the results below.

I tried to fix this by just poking around and what I got up to so far is a the attached pull requests. This is just a trial pull request to see if it works! Obviously this is a super risky "breaking" change so improvements are:

  • bury deeper into query model
  • consider other field types
  • consider more complex queries

An alternative would be too simply throw an error for any query where the value is an empty string to warn developers that the results may not be as expected.

Another alternative would be to keep the code as-is, but to make the NULL filter requirement a lot more obvious in the documentation.

Of course, one of the more "hidden" scenarios is, for example, when filtering using a variable that can be any string (including an empty one). In this scenario, as developer, you would simply forget to turn an empty string (one of many strings) into a NULL value.

Issues

These may be related:
#10930
#8135

From Nightjarnz:
https://github.com/silverstripe/silverstripe-framework/blob/5/src/ORM/Filters/ExactMatchFilter.php#L118
https://github.com/silverstripe/silverstripe-framework/blob/5/src/ORM/Filters/ExactMatchFilter.php#L221-L223

Pull request checklist

  • [N ] The target branch is correct
  • [Y] All commits are relevant to the purpose of the PR (e.g. no debug statements, unrelated refactoring, or arbitrary linting)
    • Small amounts of additional linting are usually okay, but if it makes it hard to concentrate on the relevant changes, ask for the unrelated changes to be reverted, and submitted as a separate PR.
  • [Y ] The commit messages follow our commit message guidelines
  • [ ?] The PR follows our contribution guidelines
  • [ ?] Code changes follow our coding conventions
  • [ N] This change is covered with tests (or tests aren't necessary for this change)
  • [ N] Any relevant User Help/Developer documentation is updated; for impactful changes, information is added to the changelog for the intended release
  • [ ?] CI is green

Manual testing steps

Click to see the very long details
<?php

use SilverStripe\Control\Director;
use SilverStripe\Core\Config\Config;
use SilverStripe\Core\Environment;
use SilverStripe\Dev\BuildTask;
use SilverStripe\ORM\DataObject;
use SilverStripe\ORM\DB;


class Test extends BuildTask
{
    protected $title = 'Test';

    protected $description = 'Test';

    protected $tablesToTest = [
// ADD YOU OWN TABLES HERE!
    ];

    private static $segment = 'test';
    protected $api = null;

    public function run($request)
    {
        Environment::increaseTimeLimitTo(3600);
        Environment::increaseMemoryLimitTo('512M');
        $this->header("====================================");
        $this->header("====================================");
        $this->header("HAS VALUE");
        $this->header("====================================");
        $this->header("====================================");

        $this->runExcludeOrExclude('buildExcludeScenarioArray', 'not', 'HasValue');

        $this->header("====================================");
        $this->header("====================================");
        $this->header("DOES NOT HAVE VALUE");
        $this->header("====================================");
        $this->header("====================================");

        $this->runExcludeOrExclude('buildIncludeScenarioArray', '', 'DoesNotHaveValue');

    }

    /**
     * optional method to reset all to empty string
     */
    protected function resetToEmptyString(string $className, string $fieldName)
    {
        Config::modify()->set(DataObject::class, 'validation_enabled', false);
        foreach ($className::get() as $object) {
            $object->$fieldName = '';
            $object->write();
        }
        echo $className::get()->count() .'='. $className::get()->filter([$fieldName => ''])->count();
    }

    protected function runExcludeOrExclude(string $method, string $phrase, string $rightAnswer)
    {
        $testCount = 0;
        foreach ($this->tablesToTest as $tableName => $details) {
            $className = $details['ClassName'];
            $fieldName = $details['Field'];
            $answers = [
                'HasValue' => 0,
                'DoesNotHaveValue' => 0,
            ];
            foreach ($className::get() as $record) {
                if (trim((string) $record->$fieldName)) {
                    $answers['HasValue']++;

                } else {
                    $answers['DoesNotHaveValue']++;
                }
            }
            $answer = $answers[$rightAnswer];
            $tableSafe = $this->replaceTableAndFieldName($tableName, $tableName, $fieldName);
            $fieldSafe = $this->replaceTableAndFieldName($fieldName, $tableName, $fieldName);
            $this->header(
                'testing '.$tableSafe . '.'.$fieldSafe.'
                ('.$className::get()->count().' records)
                for not falsy values.
                Expected answer: '.$answer.'
                (derived by looping through dataobjects)'
            );
            $scenarios = $this->$method($tableName, $fieldName, $className);
            foreach ($scenarios as $i => $results) {
                $testCount++;
                $count = $results['v'];
                $isCorrect = ($count === $answer);
                $isCorrectPhrase = $isCorrect ? 'CORRECT' : 'WRONG ANSWER';
                foreach ($results as $key => $sql) {
                    if ($key !== 'v' && $key !== 'sql') {
                        $sql = $this->recursiveReplaceValues($sql, $tableName, $fieldName);
                        $sql = $this->normalizeWhitespace(print_r($sql, 1));
                        $this->outcome('===== TEST '.$testCount.'====='.PHP_EOL.$isCorrectPhrase. '('. $count.  '), PRODUCED BY: ');
                        $this->outcome($key.' = '.$sql);
                    }
                }
                if (!$isCorrect) {
                    $whereOnlySQL = $cleanSQL = $this->whereOnlyPhrase($results['sql']);
                    $this->outcome('SQL: '.$this->replaceTableAndFieldName($whereOnlySQL, $tableName, $fieldName));
                }
            }
        }
    }



    protected function buildExcludeScenarioArray(string $tableName, string $fieldName, string $className): array
    {
        $scenario = [];

        // SQL based scenarios
        $sqlQueries = [
            'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" IS NOT NULL',
            'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" <> \'\'',
            'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" IS NOT NULL AND "' . $fieldName . '" <> \'\'',
            // 'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" <> 0',
            // 'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" NOT IN (\'\', NULL, 0)',
            // 'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" IS NOT NULL AND "' . $fieldName . '" <> \'\' AND "' . $fieldName . '" <> 0'
        ];

        foreach ($sqlQueries as $key => $sql) {
            $scenario[] = [
                'DB::query' => $sql,
                'v' => (int) DB::query($sql)->value(),
                'sql' => $sql,
            ];
        }

        // ORM based scenarios
        $filters = [
            [$fieldName . ':not' => ['']],
            [$fieldName . ':not' => [null]],
            [$fieldName . ':not' => ['', null]],
            [$fieldName . ':not' => ''],
            [$fieldName . ':not' => null],
            // [$fieldName . ':not' => [0]],
            // [$fieldName . ':not' => [0, null]],
            // [$fieldName . ':not' => 0],
            // [$fieldName . ':not' => ['', 0]],
            // [$fieldName . ':not' => ['', null, 0]],
        ];

        foreach ($filters as $key => $filter) {
            $scenario[] = [
                'filter' => $filter,
                'v' => (int) $className::get()->filter($filter)->count(),
                'sql' => $className::get()->filter($filter)->sql(),
            ];
        }

        // Exclude based scenarios
        $excludes = [
            [$fieldName  => ['']],
            [$fieldName  => [null]],
            [$fieldName  => ['', null]],
            [$fieldName  => ''],
            [$fieldName  => null],
            // [$fieldName  => [0]],
            // [$fieldName  => [0, null]],
            // [$fieldName  => ['', 0]],
            // [$fieldName  => ['', null, 0]],
            // [$fieldName  => 0],
        ];

        foreach ($excludes as $key => $exclude) {
            $scenario[] = [
                'exclude' => $exclude,
                'v' => (int) $className::get()->exclude($exclude)->count(),
                'sql' => $className::get()->exclude($exclude)->sql(),
            ];
        }

        return $scenario;
    }

    protected function buildIncludeScenarioArray(string $tableName, string $fieldName, string $className): array
    {
        $scenario = [];

        // SQL based scenarios
        $sqlQueries = [
            'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" IS NULL',
            'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" <> \'\'',
            'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" IS NULL OR "' . $fieldName . '" = \'\'',
            // 'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" <> 0',
            // 'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" NOT IN (\'\', NULL, 0)',
            // 'SELECT COUNT(*) FROM "' . $tableName . '" WHERE "' . $fieldName . '" IS NOT NULL AND "' . $fieldName . '" <> \'\' AND "' . $fieldName . '" <> 0'
        ];

        foreach ($sqlQueries as $key => $sql) {
            $scenario[] = [
                'DB::query' => $sql,
                'v' => (int) DB::query($sql)->value(),
                'sql' => $sql,
            ];
        }

        // ORM based scenarios
        $filters = [
            [$fieldName  => ['']],
            [$fieldName  => [null]],
            [$fieldName  => ['', null]],
            [$fieldName  => ''],
            [$fieldName  => null],
            // [$fieldName  => [0]],
            // [$fieldName  => [0, null]],
            // [$fieldName  => ['', 0]],
            // [$fieldName  => ['', null, 0]],
            // [$fieldName  => 0],
        ];

        foreach ($filters as $key => $filter) {
            $scenario[] = [
                'filter' => $filter,
                'v' => (int) $className::get()->filter($filter)->count(),
                'sql' => $className::get()->filter($filter)->sql(),
            ];
        }

        // Exclude based scenarios
        $excludes = [
            [$fieldName . ':not' => ['']],
            [$fieldName . ':not' => [null]],
            [$fieldName . ':not' => ['', null]],
            [$fieldName . ':not' => ''],
            [$fieldName . ':not' => null],
            // [$fieldName . ':not' => [0]],
            // [$fieldName . ':not' => [0, null]],
            // [$fieldName . ':not' => 0],
            // [$fieldName . ':not' => ['', 0]],
            // [$fieldName . ':not' => ['', null, 0]],
        ];

        foreach ($excludes as $key => $exclude) {
            $scenario[] = [
                'exclude' => $exclude,
                'v' => (int) $className::get()->exclude($exclude)->count(),
                'sql' => $className::get()->exclude($exclude)->sql(),
            ];
        }

        return $scenario;
    }

    protected function recursiveReplaceValues($input, $tableName, $fieldName): array|string
    {
        if (is_null($input)) {
            return '[NULL]';
        }
        if ($input === '') {
            return '[EMPTY STRING]';
        }
        if (!is_array($input)) {
            return $this->replaceTableAndFieldName($input, $tableName, $fieldName);
        }
        foreach ($input as $key => $value) {
            if (is_array($value)) {
                // Recursively apply the function to nested arrays
                $input[$key] = $this->recursiveReplaceValues($value, $tableName, $fieldName);
            } elseif (is_null($value)) {
                // Replace null with [NULL]
                $input[$key] = '[NULL]';
            } elseif ($value === '') {
                // Replace empty string with [EMPTY STRING]
                $input[$key] = '[EMPTY STRING]';
            } elseif ($value === 0) {
                // Replace 0 with [ZERO]
                $input[$key] = '[ZERO]';
            } else {
                $input = $this->replaceTableAndFieldName($input, $tableName, $fieldName);
            }
            $newKey = $this->replaceTableAndFieldName($key, $tableName, $fieldName);
            $newVal = $input[$key];
            unset($input[$key]);
            $input[$newKey] = $newVal;
        }

        return $input;
    }

    protected function replaceTableAndFieldName($input, $tableName, $fieldName): string
    {
        $tableKeys = array_keys($this->tablesToTest);
        $tablePos = (array_search($tableName, $tableKeys, true)) + 1;
        $replaceTable = 'TABLE'.$tablePos;
        $input = str_replace($tableName, $replaceTable, $input);
        $input = str_replace($fieldName, 'MYFIELD', $input);
        return $input;
    }

    protected function normalizeWhitespace(string $input): string
    {
        // Replace multiple spaces, tabs, newlines with a single space
        return preg_replace('/\s+/', ' ', $input);
    }
    protected function header($message)
    {
        if (Director::is_cli()) {
            echo PHP_EOL;
            echo PHP_EOL;
            echo PHP_EOL;
            echo PHP_EOL;
            echo '========================='.PHP_EOL;
            echo $message . PHP_EOL;
            echo '========================='.PHP_EOL;
            ;
        } else {
            echo '<h2>' . $message . '</h2>';
        }
    }

    protected function outcome($mixed)
    {
        if (Director::is_cli()) {
            echo PHP_EOL;
            print_r($mixed);
            echo PHP_EOL;
        } else {
            echo '<pre>';
            print_r($mixed);
            echo '</pre>';
        }
    }

    protected function whereOnlyPhrase(string $sql): string
    {
        // Remove everything before WHERE
        $pos = stripos($sql, ' WHERE ');
        if ($pos !== false) {
            // Return the part of the string starting from WHERE
            $sql = substr($sql, $pos);
        }

        // Remove everything from ORDER BY onward
        $sql = preg_replace('/\sORDER\sBY.*/i', '', $sql);

        return $sql;
    }


}

results from test before changing code (search for WRONG to see 18 unexpected results):

Running Task Test
====================================
====================================
HAS VALUE
====================================
====================================
testing MYTABLE.MYFIELD (20 records) for not falsy values. Expected answer: 12 (derived by looping through dataobjects)
===== TEST 1=====
CORRECT(12), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" IS NOT NULL
===== TEST 2=====
CORRECT(12), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" <> ''
===== TEST 3=====
CORRECT(12), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" IS NOT NULL AND "MYFIELD" <> ''
===== TEST 4=====
WRONG ANSWER(20), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] ) ) 
SQL:  WHERE ("MYTABLE"."MYFIELD" NOT IN (?) OR "MYTABLE"."MYFIELD" IS NULL)
===== TEST 5=====
CORRECT(12), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [NULL] ) ) 
===== TEST 6=====
CORRECT(12), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 7=====
WRONG ANSWER(20), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => [EMPTY STRING] ) 
SQL:  WHERE ("MYTABLE"."MYFIELD" != ? OR "MYTABLE"."MYFIELD" IS NULL)
===== TEST 8=====
CORRECT(12), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => [NULL] ) 
===== TEST 9=====
WRONG ANSWER(20), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] ) ) 
SQL:  WHERE (("MYTABLE"."MYFIELD" NOT IN (?) OR "MYTABLE"."MYFIELD" IS NULL))
===== TEST 10=====
CORRECT(12), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [NULL] ) ) 
===== TEST 11=====
CORRECT(12), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 12=====
WRONG ANSWER(20), PRODUCED BY: 
exclude = Array ( [MYFIELD] => [EMPTY STRING] ) 
SQL:  WHERE (("MYTABLE"."MYFIELD" != ? OR "MYTABLE"."MYFIELD" IS NULL))
===== TEST 13=====
CORRECT(12), PRODUCED BY: 
exclude = Array ( [MYFIELD] => [NULL] ) 
testing MYOTHERTABLE.MYFIELD (58 records) for not falsy values. Expected answer: 43 (derived by looping through dataobjects)
===== TEST 14=====
CORRECT(43), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" IS NOT NULL
===== TEST 15=====
CORRECT(43), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" <> ''
===== TEST 16=====
CORRECT(43), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" IS NOT NULL AND "MYFIELD" <> ''
===== TEST 17=====
WRONG ANSWER(58), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] ) ) 
SQL:  WHERE ("MYOTHERTABLE"."MYFIELD" NOT IN (?) OR "MYOTHERTABLE"."MYFIELD" IS NULL)
===== TEST 18=====
CORRECT(43), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [NULL] ) ) 
===== TEST 19=====
CORRECT(43), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 20=====
WRONG ANSWER(58), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => [EMPTY STRING] ) 
SQL:  WHERE ("MYOTHERTABLE"."MYFIELD" != ? OR "MYOTHERTABLE"."MYFIELD" IS NULL)
===== TEST 21=====
CORRECT(43), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => [NULL] ) 
===== TEST 22=====
WRONG ANSWER(58), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] ) ) 
SQL:  WHERE (("MYOTHERTABLE"."MYFIELD" NOT IN (?) OR "MYOTHERTABLE"."MYFIELD" IS NULL))
===== TEST 23=====
CORRECT(43), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [NULL] ) ) 
===== TEST 24=====
CORRECT(43), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 25=====
WRONG ANSWER(58), PRODUCED BY: 
exclude = Array ( [MYFIELD] => [EMPTY STRING] ) 
SQL:  WHERE (("MYOTHERTABLE"."MYFIELD" != ? OR "MYOTHERTABLE"."MYFIELD" IS NULL))
===== TEST 26=====
CORRECT(43), PRODUCED BY: 
exclude = Array ( [MYFIELD] => [NULL] ) 
====================================
====================================
DOES NOT HAVE VALUE
====================================
====================================
testing MYTABLE.MYFIELD (20 records) for not falsy values. Expected answer: 8 (derived by looping through dataobjects)
===== TEST 1=====
CORRECT(8), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" IS NULL
===== TEST 2=====
WRONG ANSWER(12), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" <> ''
SQL:  WHERE "MYFIELD" <> ''
===== TEST 3=====
CORRECT(8), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" IS NULL OR "MYFIELD" = ''
===== TEST 4=====
WRONG ANSWER(0), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] ) ) 
SQL:  WHERE ("MYTABLE"."MYFIELD" IN (?))
===== TEST 5=====
CORRECT(8), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [NULL] ) ) 
===== TEST 6=====
CORRECT(8), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 7=====
WRONG ANSWER(0), PRODUCED BY: 
filter = Array ( [MYFIELD] => [EMPTY STRING] ) 
SQL:  WHERE ("MYTABLE"."MYFIELD" = ?)
===== TEST 8=====
CORRECT(8), PRODUCED BY: 
filter = Array ( [MYFIELD] => [NULL] ) 
===== TEST 9=====
WRONG ANSWER(0), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] ) ) 
SQL:  WHERE (("MYTABLE"."MYFIELD" IN (?)))
===== TEST 10=====
CORRECT(8), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [NULL] ) ) 
===== TEST 11=====
CORRECT(8), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 12=====
WRONG ANSWER(0), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => [EMPTY STRING] ) 
SQL:  WHERE (("MYTABLE"."MYFIELD" = ?))
===== TEST 13=====
CORRECT(8), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => [NULL] ) 
testing MYOTHERTABLE.MYFIELD (58 records) for not falsy values. Expected answer: 15 (derived by looping through dataobjects)
===== TEST 14=====
CORRECT(15), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" IS NULL
===== TEST 15=====
WRONG ANSWER(43), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" <> ''
SQL:  WHERE "MYFIELD" <> ''
===== TEST 16=====
CORRECT(15), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" IS NULL OR "MYFIELD" = ''
===== TEST 17=====
WRONG ANSWER(0), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] ) ) 
SQL:  WHERE ("MYOTHERTABLE"."MYFIELD" IN (?))
===== TEST 18=====
CORRECT(15), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [NULL] ) ) 
===== TEST 19=====
CORRECT(15), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 20=====
WRONG ANSWER(0), PRODUCED BY: 
filter = Array ( [MYFIELD] => [EMPTY STRING] ) 
SQL:  WHERE ("MYOTHERTABLE"."MYFIELD" = ?)
===== TEST 21=====
CORRECT(15), PRODUCED BY: 
filter = Array ( [MYFIELD] => [NULL] ) 
===== TEST 22=====
WRONG ANSWER(0), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] ) ) 
SQL:  WHERE (("MYOTHERTABLE"."MYFIELD" IN (?)))
===== TEST 23=====
CORRECT(15), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [NULL] ) ) 
===== TEST 24=====
CORRECT(15), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 25=====
WRONG ANSWER(0), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => [EMPTY STRING] ) 
SQL:  WHERE (("MYOTHERTABLE"."MYFIELD" = ?))
===== TEST 26=====
CORRECT(15), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => [NULL] ) 

results from test after changing code (note only two, "expected" errors):

Running Task Test
====================================
====================================
HAS VALUE
====================================
====================================
testing MYTABLE.MYFIELD (20 records) for not falsy values. Expected answer: 12 (derived by looping through dataobjects)
===== TEST 1=====
CORRECT(12), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" IS NOT NULL
===== TEST 2=====
CORRECT(12), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" <> ''
===== TEST 3=====
CORRECT(12), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" IS NOT NULL AND "MYFIELD" <> ''
===== TEST 4=====
CORRECT(12), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] ) ) 
===== TEST 5=====
CORRECT(12), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [NULL] ) ) 
===== TEST 6=====
CORRECT(12), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 7=====
CORRECT(12), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => [EMPTY STRING] ) 
===== TEST 8=====
CORRECT(12), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => [NULL] ) 
===== TEST 9=====
CORRECT(12), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] ) ) 
===== TEST 10=====
CORRECT(12), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [NULL] ) ) 
===== TEST 11=====
CORRECT(12), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 12=====
CORRECT(12), PRODUCED BY: 
exclude = Array ( [MYFIELD] => [EMPTY STRING] ) 
===== TEST 13=====
CORRECT(12), PRODUCED BY: 
exclude = Array ( [MYFIELD] => [NULL] ) 
testing MYOTHERTABLE.MYFIELD (58 records) for not falsy values. Expected answer: 43 (derived by looping through dataobjects)
===== TEST 14=====
CORRECT(43), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" IS NOT NULL
===== TEST 15=====
CORRECT(43), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" <> ''
===== TEST 16=====
CORRECT(43), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" IS NOT NULL AND "MYFIELD" <> ''
===== TEST 17=====
CORRECT(43), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] ) ) 
===== TEST 18=====
CORRECT(43), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [NULL] ) ) 
===== TEST 19=====
CORRECT(43), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 20=====
CORRECT(43), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => [EMPTY STRING] ) 
===== TEST 21=====
CORRECT(43), PRODUCED BY: 
filter = Array ( [MYFIELD:not] => [NULL] ) 
===== TEST 22=====
CORRECT(43), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] ) ) 
===== TEST 23=====
CORRECT(43), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [NULL] ) ) 
===== TEST 24=====
CORRECT(43), PRODUCED BY: 
exclude = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 25=====
CORRECT(43), PRODUCED BY: 
exclude = Array ( [MYFIELD] => [EMPTY STRING] ) 
===== TEST 26=====
CORRECT(43), PRODUCED BY: 
exclude = Array ( [MYFIELD] => [NULL] ) 
====================================
====================================
DOES NOT HAVE VALUE
====================================
====================================
testing MYTABLE.MYFIELD (20 records) for not falsy values. Expected answer: 8 (derived by looping through dataobjects)
===== TEST 1=====
CORRECT(8), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" IS NULL
===== TEST 2=====
WRONG ANSWER(12), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" <> ''
SQL:  WHERE "MYFIELD" <> ''
===== TEST 3=====
CORRECT(8), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYTABLE" WHERE "MYFIELD" IS NULL OR "MYFIELD" = ''
===== TEST 4=====
CORRECT(8), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] ) ) 
===== TEST 5=====
CORRECT(8), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [NULL] ) ) 
===== TEST 6=====
CORRECT(8), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 7=====
CORRECT(8), PRODUCED BY: 
filter = Array ( [MYFIELD] => [EMPTY STRING] ) 
===== TEST 8=====
CORRECT(8), PRODUCED BY: 
filter = Array ( [MYFIELD] => [NULL] ) 
===== TEST 9=====
CORRECT(8), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] ) ) 
===== TEST 10=====
CORRECT(8), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [NULL] ) ) 
===== TEST 11=====
CORRECT(8), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 12=====
CORRECT(8), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => [EMPTY STRING] ) 
===== TEST 13=====
CORRECT(8), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => [NULL] ) 
testing MYOTHERTABLE.MYFIELD (58 records) for not falsy values. Expected answer: 15 (derived by looping through dataobjects)
===== TEST 14=====
CORRECT(15), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" IS NULL
===== TEST 15=====
WRONG ANSWER(43), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" <> ''
SQL:  WHERE "MYFIELD" <> ''
===== TEST 16=====
CORRECT(15), PRODUCED BY: 
DB::query = SELECT COUNT(*) FROM "MYOTHERTABLE" WHERE "MYFIELD" IS NULL OR "MYFIELD" = ''
===== TEST 17=====
CORRECT(15), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] ) ) 
===== TEST 18=====
CORRECT(15), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [NULL] ) ) 
===== TEST 19=====
CORRECT(15), PRODUCED BY: 
filter = Array ( [MYFIELD] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 20=====
CORRECT(15), PRODUCED BY: 
filter = Array ( [MYFIELD] => [EMPTY STRING] ) 
===== TEST 21=====
CORRECT(15), PRODUCED BY: 
filter = Array ( [MYFIELD] => [NULL] ) 
===== TEST 22=====
CORRECT(15), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] ) ) 
===== TEST 23=====
CORRECT(15), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [NULL] ) ) 
===== TEST 24=====
CORRECT(15), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => Array ( [0] => [EMPTY STRING] [1] => [NULL] ) ) 
===== TEST 25=====
CORRECT(15), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => [EMPTY STRING] ) 
===== TEST 26=====
CORRECT(15), PRODUCED BY: 
exclude = Array ( [MYFIELD:not] => [NULL] ) 

@GuySartorelli
Copy link
Member

Is the "DRAFT" in the pr title part of the context, or do you mean the pr is a draft and shouldn't be reviewed yet?
If the latter, please click "Convert to draft"

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Sep 8, 2024

DRAFT as in: I would like for someone to have a look at it for some feedback. I know it can not be merged like this, but I can't progress, because I need it to be reviewed.

@GuySartorelli GuySartorelli marked this pull request as draft September 8, 2024 07:57
@sunnysideup sunnysideup marked this pull request as ready for review September 8, 2024 21:31
@GuySartorelli GuySartorelli marked this pull request as draft September 10, 2024 22:36
@GuySartorelli
Copy link
Member

GuySartorelli commented Sep 10, 2024

Now, also consider this: For example, if you write:

foreach(MyDataObject::get() as $myObject) {
    $myObject->MyField = '';
    $myObject->write();
}
echo 
    MyDataObject::get()->count() .
    '=' .
    MyDataObject::get()->filter(['MyField' => ''])->count()
;

You will see that the answer is foo=0, where foo is obviously the number of records. That is not intuitive.

That seems like an outright bug. What's actually stored in the database? Is it set to null or is it actually an empty string?

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Sep 10, 2024

Now, also consider this: For example, if you write:

foreach(MyDataObject::get() as $myObject) {
    $myObject->MyField = '';
    $myObject->write();
}
echo 
    MyDataObject::get()->count() .
    '=' .
    MyDataObject::get()->filter(['MyField' => ''])->count()
;

You will see that the answer is foo=0, where foo is obviously the number of records. That is not intuitive.

That seems like an outright bug. What's actually stored in the database? Is it set to null or is it actually an empty string?

It is stored as a NULL value. I will run it again on a different project, just to be sure it is correct.

image

image

@andrewandante
Copy link
Contributor

My general feeling on this is that is dangerous to assume your opening statement:

there is often little difference between null and an empty string ('').

While I appreciate that you are trying to make things more "intuitive", there's a balance to be had here between things that feel right, and things that are objectively correct.

I think an approach to take would be to be able to pass a specific token into the filter that means "empty" - to your specific example, something like:

foreach(MyDataObject::get() as $myObject) {
    $myObject->MyField = '';
    $myObject->write();
}
echo 
    MyDataObject::get()->count() .
    '=' .
    MyDataObject::get()->filter(['MyField' => DB::EMPTY_TOKEN])->count()
;

where DB:EMPTY_TOKEN is some sort of special combination of characters that is transposed into null || '' || 0, or something sensible that matches the "intuitive" matching you are after, while still preserving the "technically accurate" behaviour in place. Do you think that would work for your use-case?

@GuySartorelli
Copy link
Member

GuySartorelli commented Sep 10, 2024

It is stored as a NULL value

Ahh okay that makes sense then. So yeah filtering by empty string and treating it as null would (usually) make sense if that's what the storage is currently doing.
We do have to consider however that people can decide they don't want empty strings to be stored as null and implement a custom DBField for that - in which case changing the filter logic would break their custom DBField.

I think at best this would have to be configurable.
It should either target CMS 6 and be on by default, or if it targets CMS 5 as this PR currently is it should be off by default.

@sunnysideup
Copy link
Contributor Author

My general feeling on this is that is dangerous to assume your opening statement:

there is often little difference between null and an empty string ('').

No, that is not dangerous, that is a fact. Just ask 100 content editors - do you care about the difference between null and an empty string. In real life, they are usually the same thing. I appreciate that NULL basically means: "no set" and an empty string a particular value that is set. That is not lost on me, but Silverstripe is for the real world.

@sunnysideup
Copy link
Contributor Author

I think an approach to take would be to be able to pass a specific token into the filter that means "empty" - to your specific example, something like:

The problem with this is that we do not always control what we filter for, if you filter for a variable (e.g. from a form input), then you need to consider this every time. I guess what I am saying is that the NULL vs EMPTY STRING is an edge case, not the other way around.

@GuySartorelli
Copy link
Member

I like @andrewandante's suggestion of having a const that people can use - that helps to remove the ambiguity. Though it does depend on developers knowing that's there to use which I think will be the hard part.

@GuySartorelli
Copy link
Member

GuySartorelli commented Sep 10, 2024

Just ask 100 content editors - do you care about the difference between null and an empty string

Content editors aren't the ones setting up the data model and using the ORM :p There are times where programatically the value difference is important.

@sunnysideup
Copy link
Contributor Author

I like @andrewandante's suggestion of having a const that people can use - that helps to remove the ambiguity. Though it does depend on developers knowing that's there to use which I think will be the hard part.

I think it should be the other way around. If you filter for NULL or EMPTY STRING it should yield the same result, unless you set some "strict" setting, in which case it would go for NULL or EMPTY STRING only.

@andrewandante
Copy link
Contributor

To sum up what's being discussed in the Silverstripe User's Slack at the moment:

  • it appears the SS is doing some "magic" to set an empty string (in this case) as null to represent an "empty" value. There's some debate as to whether this is desired, but it is currently the case
  • the problem that is occurring at the moment is a disconnect between the filtering behaviour and the storage behaviour - if I can set '' and get null, I should be able to filter for '' and return null results. It is this inconsistency that is the core of the issue
  • so any proposed changes should either:
    • lean into the magic, and have the filter understand it
    • lean away from the magic, and have things stored as they are put in

For more of my own opinions 😅

Content editors aren't the ones setting up the data model and using the ORM

^ I agree with this sentiment entirely. Yes, content authors aren't expect to know the difference between null and '' in the case of the filter, but equally they aren't writing the code. Devs are expected to know the difference.

I think it should be the other way around. If you filter for NULL or EMPTY STRING it should yield the same result, unless you set some "strict" setting, in which case it would go for NULL or EMPTY STRING only.

I think the preference here comes down to which way we lean. If we go for the magic route, we want to lean towards a loose match, with strict as an option. Otherwise, the other way around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants