Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] [ZF3] Refactor input filter component #5

Open
weierophinney opened this issue Dec 31, 2019 · 20 comments
Open

[WIP] [ZF3] Refactor input filter component #5

weierophinney opened this issue Dec 31, 2019 · 20 comments

Comments

@weierophinney
Copy link
Member

This issue has been moved from the zendframework repository as part of the bug migration program as outlined here - http://framework.zend.com/blog/2016-04-11-issue-closures.html


Original Issue: https://api.github.com/repos/zendframework/zendframework/issues/4772
User: @bakura10
Created On: 2013-07-02T12:52:46Z
Updated At: 2014-09-28T00:34:50Z
Body
Hi everyone,

This is a branch I'm working on on my free time, to refactor the input filter component for ZF 3. I mostly push this for people who want to see the code and help.

It is not yet usable because it relies on some refactoring I'm doing in both Validator and Filter components. It completely embraces the service manager, so now we should not have the VERY annoying problem of adding custom validators/filters, and not being able to use them using the short name because it uses plugin managers that were not fetched from SM.

Under the hood, the component is completely rewritten and now uses SPL RecursiveFilter. From my first experiments, this allow an around x5-8 performance improvements and a memory consumption divided by 13 for complex input filters (magic happen here: https://github.com/bakura10/zf2/blob/943abe025ed578c4068cf19df079b34726d449a3/library/Zend/InputFilter/BaseInputFilter.php#L274-L300)

What I've seen is that the input filter component got more and more bloated, and it is now slow as hell. Checking the isValid method as it is now makes it quite clear where the problem is.

Therefore, for ZF3, I suggest that the base input filter to stay as simple as possible, and people wanting to have custom behaviors to extend the class and override the isValid for their use-cases (as a more general rule, I'd like this for most components too).

To help this, it also uses internally SPL RecursiveFilter. Basically, it allows to have very powerful validation group rules. For instance, this branch features a ValidationGroupRegex filter, and usage would be as follow:

$inputFilter = new InputFilter();
// … populate the input filter with lot of fields

// You can use the as-usual validation group:
$inputFilter->setValidationGroup(array('username', 'email'));

// But also some more complex rules. Here it only validate fields that contain 'registration-'
// in their names
$validationGroupRegex = new ValidationGroupRegexFilter($inputFilter, '/registration-/i')
$inputFilter->setValidationGroupFilter($validationGroupRegex);

Those filters are really easy to write. They must extend AbstractValidationGroupFilter and implements the "accept" method (like this: https://github.com/bakura10/zf2/blob/943abe025ed578c4068cf19df079b34726d449a3/library/Zend/InputFilter/Filter/ValidationGroupRegexFilter.php)

What do you think ? :)

NOTE : not all features have been back ported yet.


Comment

User: @desem
Created On: 2013-07-02T13:33:40Z
Updated At: 2013-07-02T13:33:40Z
Body
And I like it :)
http://www.ivangospodinow.com/simple-form-validator-for-zend-framework-2-forms


Comment

User: @bakura10
Created On: 2013-07-02T13:38:22Z
Updated At: 2013-07-02T13:38:22Z
Body
Thanks :). I change a little bit the logic so now any recursive iterator filter can be specific, so you can use the RecursiveCallbackFilterIterator:

$filter = new RecursiveCallbackFilterIterator($inputFilter, function($value, $key)) {
    if ($key === 'toto') {
        return false;
    }

    return true;
}

This will only validate fields except the one different than "toto".

Pretty powerful :D.


Comment

User: @desem
Created On: 2013-07-02T13:56:33Z
Updated At: 2013-07-02T13:56:33Z
Body
It's interesting. But I would start with a common forum for zend :)


Comment

User: @texdc
Created On: 2013-07-04T19:13:03Z
Updated At: 2013-07-04T19:13:03Z
Body
Please see my notes.


Comment

User: @bakura10
Created On: 2013-07-04T19:23:45Z
Updated At: 2013-07-04T19:23:45Z
Body
I answered :).


Comment

User: @bakura10
Created On: 2013-08-09T09:51:01Z
Updated At: 2013-08-09T09:51:01Z
Body
I tried to add more exciting stuff to this refactor. Now you can add a callback recursive filter to act as a validation group:

$inputFilter = new InputFilter();
$recursiveFilter = new ValidationGroupCallbackFilter($inputFilter, function($value, $key, $iterator) {
   // This get called for each element, so you can reject easily
   if ($key === 'lol') {
       return false;
   }

   return true;
});

$inputFilter->isValid();

Comment

User: @bakura10
Created On: 2013-08-09T10:17:43Z
Updated At: 2013-08-09T10:17:43Z
Body
Based on ocramius feedback, I've renamed the InputFilter class to InputCollection, which may be clearer for people. What do you think?

The only problem is that people may be confused between the InputCollection and CollectionInputCollection (previously CollectionInputFilter). I'll try to have a better idea for this.


Comment

User: @texdc
Created On: 2013-08-09T15:06:32Z
Updated At: 2013-08-09T15:06:32Z
Body
I like the 'InputCollection' name! I've just started to use 'CollectionInputFilter', so I understand the issue. Why not just 'CollectionInput'?

On Aug 9, 2013, at 5:17 AM, Michaël Gallego [email protected] wrote:

Based on ocramius feedback, I've renamed the InputFilter class to InputCollection, which may be clearer for people. What do you think?

The only problem is that people may be confused between the InputCollection and CollectionInputCollection (previously CollectionInputFilter). I'll try to have a better idea for this.


Reply to this email directly or view it on GitHub.


Comment

User: @bakura10
Created On: 2013-08-10T08:21:41Z
Updated At: 2013-08-10T08:21:41Z
Body
@texdc , I'm not even sure the CollectionInputFilter is really needed. After all, the collection input filter does nothing more than applying the same input filter to a collection. Can't we simply run the same input filter against different set of data ? (if I managed to make the input filter stateless it's going to be even more true… but turning it stateless is really hard…)


Comment

User: @bakura10
Created On: 2013-08-10T08:39:19Z
Updated At: 2013-08-10T08:42:23Z
Body
By the way, here are some performance figures. I simply created a single input filter. Please note that my current implementation does a little more than the current one as each element are created using a factory (both input and input collection), so that now the global validator and filter plugin managers are always correctly injected (this was always hard in current implementation). Please note also that a major time during construction is instead spent on the service locator, therefore I've decided to create the element using new instead of using the plugin manager (after doing this test I realized how service manager is slow).

In order not to make the test wrong, no validator nor filters are attached to elements. This test just take into account the traversal time.

With 50 elements

ZF2 :
Input filter construction time: 0.0015020370483398 msec
Traversal time: 0.0062439441680908 msec
Memory consumption: 3 Mb

ZF3 :
Input filter construction time: 0.00066399574279785 msec
Traversal time: 0.0010840892791748 msec
Memory consumption: 2,75 Mb

With 500 elements

ZF2 :
Input filter construction time: 0.010568141937256 msec
Traversal time: 0.058469772338867 msec
Memory consumption: 6 Mb

ZF3 :
Input filter construction time: 0.0047299861907959 msec
Traversal time: 0.0099430084228516 msec
Memory consumption: 3,75 Mb

With 5000 elements

ZF2 :
Input filter construction time: 0.1059877872467 msec
Traversal time: 0.60562801361084 msec
Memory consumption: 37 Mb

ZF3 :
Input filter construction time: 0.048290014266968 msec
Traversal time: 0.10112690925598 msec
Memory consumption: 11,75 Mb

We can expect the difference to be even more when using nested input filters (which is quite common), but already, the implementation that uses RecursiveIteratorIterator, once most important features were back ported, traversal is around 6 times faster and construction 2 times faster, and consumes a lot less memory with a lot of elements.

Furthermore, all the validation group features, even with complex ones like Callback, come nearly for free with RecursiveIteratorIterator.


Comment

User: @bakura10
Created On: 2013-09-02T15:54:03Z
Updated At: 2013-09-02T15:54:03Z
Body
Hi everyone,

I've taken into account a lot of feedbacks and I've completely rewritten some parts of it.

The big news is that now input filter component is completely stateless. Both input and input collection. Because of this, I had to abandon the RecursiveIteratorIterator implementation that was not flexible enough. Now it uses IteratorIterator instead. As a consequence, perf improvements are a bit less drastic (especially in construction time).

How to use it

Now, input filter component is stateless. This means that there is no longer "isValid" method. Instead, you are expected to use the "validate" method:

// Note: the Factory dependency will be hanlded by a ZF factory
$inputCollection = new InputCollection(new Factory(new InputFilterManager()));

$inputCollection->add(array(
    'name' => 'foo'
));

$data = array('foo' => 'myValue');

$validationResult = $inputCollection->validate($data, $optionalContext);

if ($validationResult->isValid()) {
    $values = $validationResult->getValues();
    // or $rawValues = $validationResult->getRawValues()
} else {
    $error = $validationResult->getErrorMessages();
}

Because validation result is completely decoupled, it can now be serialized or converted to JSON:

json_encode($validationResult); // Output error messages in JSON

Validation result

One of the main goal of this refactor was, as I said it, to simplify it and remove a lot of crap because of everyone adding its edge cases.

Instead, I've written several methods that make it easy to extend default behaviour:

  • People can extend InputCollection and override the "buildValidationResult", if they need to construct a custom validation result that also contains unknown inputs, for instance.
  • You can also extend "buildErrorMessages", which is called for each input collection.

Also, the validation group feature got more powerful thanks to SPL filters. Iteration is not made using a simple foreach, but using an IteratorIterator:

$iterator         = $this->getValidationGroupFilter();
$iteratorIterator = new IteratorIterator($iterator);
$errorMessages    = array();

/** @var InputInterface|InputCollectionInterface $inputOrInputCollection */
foreach ($iteratorIterator as $inputOrInputCollection) {

SPL filter iterators are automatically executed for us and allow to accept or refuse a specific key.

By default, ZF attaches a "Zend\InputFilter\ValidationGroup\NoOpFilterIterator" filter that accepts all fields (it is the same as setting an empty array as validation group). But you can also attach more complex filter. For instance using an array:

$arrayFilter = new ArrayFilterIterator($inputCollection, array('foo'));
$inputCollection->setValidationGroupFilter($arrayFilter);

Now, only the "foo" input of this input collection will be validated.

It comes with other filter:

$callableFilter = new CallbackFilterIterator($inputCollection, function($key, $value) {});
$regexFilter = new RegexFilterIterator($inputCollection, '/user/');

This should open the way to very fine filtering, as an input collection can contain another input collection that contains its own validation group filter. In the future, I'll write the code to be able to specifcy an input filter through config:

$inputCollection->add(
    'type' => 'Zend\InputFilter\InputCollection',
    'name' => 'top',
    'children' => array(
        array('name' => 'bar')
    ),
    'validation_group_filter' => array(
        'type' => 'Zend\InputFilter\ValidationGroup\RegexFilterIterator',
        'options' => array(
            'regex' => '/blabla/'
        )
    )
));

Performance

I've made some performance test, once again without adding any validators/filters so that the test focus only on input filter:

ZF2:

5 elements,
3 input collections with 5 elements inside

Construction time : 0.00055599212646484 secondes
Traversal time : 0.0064549446105957 secondes
Memory: 3 Mb

50 elements,
10 input collections with 50 elements inside

Construction time : 0.0096118450164795 secondes
Traversal time : 0.11807513237 secondes
Memory: 6,5 Mb

100 elements,
25 input collections with 100 elements inside

Construction time : 0.033029079437256 secondes
Traversal time : 0.51178097724915 secondes
Memory: 19,75 Mb

ZF3

5 elements,
3 input collections with 5 elements inside

Construction time : 0.00034809112548828 secondes
Traversal time : 0.0011918544769287 secondes
Memory: 2,5 Mb

50 elements,
10 input collections with 50 elements inside

Construction time : 0.0085010528564453 secondes
Traversal time : 0.01695990562439 secondes
Memory: 2,75 Mb

100 elements,
25 input collections with 100 elements inside

Construction time : 0.031938076019287 secondes
Traversal time : 0.060458183288574 secondes
Memory: 4,25 Mb


Comment

User: @Ocramius
Created On: 2013-09-02T16:17:17Z
Updated At: 2013-09-02T16:17:17Z
Body
As discussed with @bakura10 on IRC, we should have the same interface for Input and InputFilter. An InputFilter is just a normal Input with an array or Traversable value. Otherwise, I see no difference with the Input API.

Any pluralization of method names in the API shall be removed.


Comment

User: @bakura10
Created On: 2013-09-02T16:22:36Z
Updated At: 2013-09-02T16:24:21Z
Body
Done.

So now InputCollection extends Input, and have the same "validate" method. I didn't change it yet but Input should return a ValidationResult too.

The question now is that InputFilter will receive an array of ValidationResult, so we must find an efficient way to inject the result into the main ValidationResult.

As @Ocramius suggested, I've alos removed the fallback value capability. In fact this is a Filter job. So maybe we should create a FallbackValueFilter, but this sohuld not be handled in Input Filter component.


Comment

User: @bakura10
Created On: 2013-09-02T17:47:51Z
Updated At: 2013-09-02T17:48:47Z
Body
I have integrated several new things:

  • Input collection can now have their own validators and filters. When an input collection is validated, it FIRST execute its own validators, and then execute the validators of all its children. On the other hand, when data is filtered, all children are filtered, THEN the filters attached to the collection is executed. This allow to have, for instance, a Date input collection with three inputs, and validate or filter based on this context. This should open some nice simplifications to the form component, especially with the Collection element.
  • The validation process is a bit complicatd, I'm not sure about hte best way to handle it.

Currently, whenever an input collection is validated, a new ValidationResult is built. If the input collection contains no errors, then the values are filtered. This means that in the following hiearrchy:

Input filter 1
    input 1
    input 2
    input filter 2
        input 3
        input 4

if input 3, 4 and input filter 2 are valid, then the values are filtered EVEN THOUGH input 1 failed. The filter step is "on an input collection" basis.

This may not be the most efficient way because we need to do the recursion twice... If you have any better idea, please shout!

I also need people to evaluate this: https://github.com/bakura10/zf2/blob/d0cfc2e6643732b1e5e020f6154a6617e7e4d20d/library/Zend/InputFilter/InputCollection.php#L157


Comment

User: @bakura10
Created On: 2013-09-02T19:14:05Z
Updated At: 2013-09-02T19:14:05Z
Body
I've been using IteratorAggregate, which is around 30% faster than implement IteratorIterator. We are now even faster than when using RecursiveIteratorIterator :D. Some funny benchmark with HEAVY input filters (450 inputs, and 50 sub-input collections with 450 elements inside):

ZF2:

Construction time : 0.47534704208374 secondes
Traversal time : 7.0274820327759 secondes
Memory consumption: 154 Mb

ZF3:

Construction time : 0.48701810836792 secondes
Traversal time : 0.83196616172791 secondes
Memory consumption: 18 Mb


Comment

User: @Netiul
Created On: 2013-09-03T06:58:37Z
Updated At: 2013-09-03T06:58:37Z
Body
Impressive, I must say. Keep up the great work!


Comment

User: @bakura10
Created On: 2013-09-03T09:58:39Z
Updated At: 2013-09-03T09:58:39Z
Body
Thanks @Netiul .

I've just noticed something strange, I'd like your feedback. Since this time, I've always thought that the input filters does : validation using the raw data, then filtering. However, it seems that I was wrong: https://github.com/zendframework/zf2/blob/master/library/Zend/InputFilter/Input.php#L309

The filtering is done prior to the validation. Is this something expected? Is this normal behaviour and what we want? I'm always lost about this.


Comment

User: @bakura10
Created On: 2013-09-03T12:26:28Z
Updated At: 2013-09-03T12:26:28Z
Body
Another massive performance improvement for today (around 20%), because the whole input collection is now traversed only once instead of twice.

This is possible thanks to the previous remark about filtering data, some talk with @Ocramius, and the new stateless that allows interesting optimizations.

First, the ValidationResult has been renamed to InputFilterResult. This is because I'll try to make Validator component completely stateless too. It will return ValidationResult, so it was important to make the two different by the naming.

About the performance improvement it comes from the reason of my previous comment:

I realized that current implementation of Input, the "isValid" method filter data before validating it. In fact, it makes sense (if you have an input with a "Alnum" validator and a trimming Filter, you want " abc " to be valid). The thing is that in ZF2, values were filtered twice: once when the "isValid" method of the input filter is called, and another time you do "getValues". This is useless work.

Now, when the "validate" method for an input is called, as the value is filtered anyway, it's stored directly in the ValidationResult, and aggregated by the ValidationResult of the Input Collection. So calling "getData" from the input collection does not incur another recursion.

The second thing is that as I said before, InputCollection now can have their validators and filters (which shuld remove the need of the previous CollectionInputFilter class). I'm not sure about the workflow:

Let's assume the following input collection, called "coll", which contains two inputs "a", "b".

When calling validate, the following happens:

  1. The InputCollection validators (if any) are run. The array that the validator will receive will by ["a" => ..., "b" => ...].
    1.1 If invalid, register error messages.
    1.2 If input collection is configured to break on failure, we immediately return an input filter result.
  2. Otherwise, each inputs (a and b) are filtered and validated.
  3. If everything is correct, the input filter result is built, and the input collection filters are run for the value (in this case, the filtered values of a and b).

Does the order make sense to everyone?

Another question I have is: if an input has failed (not valid), should we insert the filtered value into the input filter result ?

Remark: because InputFilter is a stateless component now, I think we can safely switch the plugin manager to share instance by default. This should be a pretty memory consumption improvment for people using a lot of input filters.


Comment

User: @dphn
Created On: 2013-09-03T12:35:53Z
Updated At: 2013-09-03T12:35:53Z
Body
Tell me, please, why do not you rename InputFilter to InputSpecification? After all, it contains both filters and validators?


Comment

User: @bakura10
Created On: 2013-09-03T12:37:35Z
Updated At: 2013-09-03T12:37:35Z
Body
??? You're mixing everything again... InputSpecifciation does not exist. No classes have been renamed to InputSpecification.

InputFilter has been however renamed to InputCollection, because it represents more what it is (a collection of inputs).


Comment

User: @dphn
Created On: 2013-09-03T12:38:54Z
Updated At: 2013-09-03T12:38:54Z
Body
Thank you, I get it. So really better.


Comment

User: @weierophinney
Created On: 2013-09-03T16:10:05Z
Updated At: 2013-09-03T16:10:05Z
Body

The filtering is done prior to the validation. Is this something expected? Is this normal behaviour and what we want? I'm always lost about this.

@bakura10 :

Yes, this is normal behavior, and has been since ZF1. The idea is that you do normalization tasks, and then validate the normalized data (this is particularly useful for things like phone numbers and CC numbers, to allow stripping non-digit input such as spaces, dots, parens, etc). I've seen some requests in the past for a post-filter process as well, but the idea has never gained traction.


Comment

User: @bakura10
Created On: 2013-09-03T16:11:22Z
Updated At: 2013-09-03T16:11:22Z
Body
Ok thanks to confirm this then :).


Comment

User: @dphn
Created On: 2013-09-05T16:25:15Z
Updated At: 2013-09-05T16:25:15Z
Body
I wonder, in the third version will remain a mass func_get_arg () in the constructor of each validator?


Comment

User: @Ocramius
Created On: 2013-09-05T16:35:37Z
Updated At: 2013-09-05T16:35:37Z
Body
@dphn not sure if that's relevant to this PR - the main focus here is moving to a stateless interface first. Otherwise we'd need an Options object for every validator, which is probably a further step for a different PR


Comment

User: @dphn
Created On: 2013-09-05T16:42:16Z
Updated At: 2013-09-05T16:42:16Z
Body
I'm sorry, I just took the most active channel of communication :) Not anymore.


Comment

User: @dphn
Created On: 2013-09-05T22:00:49Z
Updated At: 2013-09-05T22:00:49Z
Body

The filtering is done prior to the validation. Is this something expected? Is this normal behaviour and what we want? I'm always lost about this.

It would be nice if it were true. I'm trying to add a filter in the multiupload form, but it is always executed after validators.


Comment

User: @bakura10
Created On: 2013-09-05T22:02:35Z
Updated At: 2013-09-05T22:02:35Z
Body
Yes, this is expected for file upload though and won't change. For file it makes sense to first validate and then filter (which include things like file upload).


Comment

User: @dphn
Created On: 2013-09-05T22:11:39Z
Updated At: 2013-09-05T22:11:39Z
Body
I think that uploading files is not a task to be performed by the filter. If the filters and validators can be added to Form as listeners, this problem would not exist. But what can I do in the current situation?


Comment

User: @bakura10
Created On: 2013-09-05T22:13:49Z
Updated At: 2013-09-05T22:13:49Z
Body
In fact, I was a bit vague. Filters does not "upload", they rename. Eventually, they could rename and move to something else, which would upload the file to another server. But if you think at it, "moving" or "renaming" can be the job of a filter.



Originally posted by @GeeH at zendframework/zend-inputfilter#111

@weierophinney
Copy link
Member Author

@bakura10 , @weierophinney

I think validators should always be run before the filters.

Four years ago, I had an issue with the DateTimeFormatter filter. I haven't tested whether this issue still occurs (this should always be, if I rely on the its code), the DateTimeFormatter filter raised an error because the value wasn't a date, and it could not convert it. This issue occurred because it tried to filter the value before it was validated.

If I take your example on phone numbers, the user may have intentionally put a space and expects to get it in this field. If you filter it before validating it, the InputFilter will say "Everything is ok" without informing the user of the problem, and will process with a value that the user didn't expect to send. If you want to avoid this space in phone number's field, return an error during validation informing the user about the allowed formats.

It doesn't make sense to filter the value before its validation. If I take the example of a specific input type, such as dates, we should check if the value is really a date before converting it to the expected date format. If we filter it before, it makes no sense to validate it: if the value is converted to a date, it will be a date anyway.

For these reasons I think the value should be validated before being filtered. The FileInputFilter already works in this way, it validates that the value is a file before working with it.


Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

Dates and files are special cases. Most other form data often can have superfluous characters and/or require some form of casting before it can be tested for validity. As examples:

  • radio buttons that need to evaluate to booleans
  • phone numbers (which can be formatted numerous ways, even within the same country)
  • credit card numbers (where white space or hyphens are irrelevant, yet often included in input)
  • textual input, where whitespace may be accidently introduced before or after the value
  • numbers, which are submitted as strings

Even with dates, the input can vary based on the input type (is it a time field? A date field? A month?).

It was for these reasons we introduced filters as a pre-validation workflow in zf1's form component, and why the practice persisted into this component in zf2. These normalization filters are primarily intended to strip unwanted characters prior to validation; if they cannot handle the input, they are expected to return the original verbatim.

For things like dates, I would likely validate that the value is correctly formatted, and can be car to a DateTime instance without error; if cast it after validation. For this reason, I'd recommend a factor of the component provide post filtering capabilities, and that we rename the current filters to be called normalizers instead.

However, removing pre-approval filtering/normalization is a non-starter, for the reasons explained above.


Originally posted by @weierophinney at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

radio buttons that need to evaluate to booleans

There is no radio button validator. If you are talking about the Radio element of Zend\Form, it uses the InArray validator which already casts data internally, as it's a requirement for this validator. Instead of running filters before validation for this use case, I would rather say "this validator depends on a filter", and therefore, inject the filter as a dependency for this validator.

phone numbers (which can be formatted numerous ways, even within the same country)

Same as radio button, however it uses the Regex validator that is too generic to inject it a dependency. In this case, it would be better to create a new Tel validator with its dependencies.

credit card numbers (where white space or hyphens are irrelevant, yet often included in input)

I never used the CreditCard validator, but if it requires filtering, inject a filter.

textual input, where whitespace may be accidently introduced before or after the value

As I said before, whitespaces may have been accidentally introduced ... or not. Although they are often unwanted, we can't assume that this is always the case. The question is more "Are whitespaces before or after input allowed?" than "Were the whitespaces wanted?".

numbers, which are submitted as strings

The validator may have an option to force strict comparison, and if it is not used, it must filter the value before processing using its dependencies. In PHP, number validators filter the value internally before proceeding with the validation (is_int, is_float, is_numeric, ...).

Even with dates, the input can vary based on the input type (is it a time field? A date field? A month?).

I don't understand what you mean. If we need a date, we ask the user to input a date. Once we know the date is valid, we convert it to the expected format.
In HTML there is no filter before validation (maxlength attribute, pattern attribute, number input, date input,...).

These normalization filters are primarily intended to strip unwanted characters prior to validation; if they cannot handle the input, they are expected to return the original verbatim.

Returning the original value when we can't filter it can lead to security issue if no validator is run after the filter.

Of course, doing pre- and post-filtering will correct these issues, but in my opinion the pre-filtering process is unnecessary.


Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@Seryus

Although they are often unwanted, we can't assume that this is always the case.

This sentence is irrelevant, because zend-inputfilter does not force you to use a filter to remove the whitespace. It's your choice!

…in my opinion the pre-filtering process is unnecessary.

A simple example: a text field for the age of a person. Many values are possible:
(_ for whitespace)

  1. 40
  2. _40
  3. 40_
  4. _40_

With pre-filtering all values will pass the validator.
Without pre-filtering only the number 1. will pass the validator.

Now the question: which filtering is user-friendly? 😉


@weierophinney

The current problem is the "background magic" like the FileInput class, which runs the filter after validation. It's correct, but too much magic and it is a different behaviour opposite to all other input classes.

For this reason, I'd recommend a factor of the component provide post filtering capabilities, and that we rename the current filters to be called normalizers instead.

This will remove the "background magic" and introduce many more possibilities for the InputFilter. 👍


Originally posted by @froschdesign at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@froschdesign
Indeed, filtering before validation is more user-friendly in this case, but this is not the normal behavior. If you test these values in a input[type="number"], only the number 1 will pass form submission.


Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@Seryus

If you test these values in a input[type="number"], only the number 1 will pass form submission.

Never trust any input! You do not know if the value comes from this form field.


Originally posted by @froschdesign at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@froschdesign
Of course, this was just a comparison with existing features.


Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

Filtering should always come before validation, otherwise, validation may pass pre-filtering, but fail post-filtering.


Originally posted by @GeeH at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@GeeH depends on use case. I had situations when I had to change the value after validation, but also had situations when I had to normalize input to validate. So to have both would be perfect.


Originally posted by @svycka at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@GeeH
To me, filters are there to secure user inputs and/or to convert them into values that we will work with. Put a filter that will make the validation failed has no meaning.

When I code, I write:

function($arg)
{
    if(!is_int($arg)) {
        throw new Exception('Integer expected.');
    }

    $arg = intval($arg);
    // ...
}

and not:

function($arg)
{
    $arg = intval($arg);

    if(!is_int($arg)) {
        throw new Exception('Integer expected.');
    }
    //...
}

Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

I think you need to consider that the framework is written for the 90% usecase and not for your specific uses. I agree, your examples are totally valid, but what you are talking about is type casting and not filtering here. Considering that all form data gets passed as a string, your example of using is_int as a validator would be useless anyway.

I'm not saying that there are no situations when you would need to validate before filtering, I'm saying that the 90% use case is to filter before validating, and that's what we'll have to support.

With that said and after thinking more about it I think the ability to attach pre-filters for cleaning input, and post-filters for casting etc is a good idea and I'd like to see it implemented if there's no BC break.


Originally posted by @GeeH at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@GeeH

…I'd like to see it implemented if there's no BC break.

The milestone for this issue is 3.0.0 and so BC breaks are allowed. 😃
It would be great if an input filter configuration of version 2 also works in version 3.


Originally posted by @froschdesign at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@GeeH
Type casting is a form of filtering.
I can't see any valid example of the need to perform the whole filtering process prior to validation, that's what I'm trying to say. Could you give me one valid use case ?


Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@Seryus
See my example above with the age of a person. This is a typical and valid use case. Or what do you mean?


Originally posted by @froschdesign at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

In your example, the validator won't validate the user input, it will validate the filter output. If the validation passes, the user won't be notified that its input was wrong and has been modified. For this reason, filtering before validation is a bad pratice.

Of course, if the field expects to get the age of a person, whitespaces were probably not wanted. In this case, it would be better to configure the validator to allow whitespaces, and then filter the value.


Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

For this reason, filtering before validation is a bad pratice.

Without pre-filtering you will generate frustration on the user side. One whitespace at the beginning of an email address, a whitespace in the middle of a bank account number, empty lines at the end of a user message and so on.
Pre-filtering should help the user, that's the goal. On the other side pre-filtering should not convert the input to pass any validator!
Therefore I like and prefer @weierophinney suggestion to name it "normalization". (…and add post-filtering)

Of course, if the field expects to get the age of a person, whitespaces were probably not wanted. In this case, it would be better to configure the validator to allow whitespaces, and then filter the value.

You want really allow whitespace in an email validator?
How should this work with PHP's filter_var function?

var_dump(filter_var('[email protected]', FILTER_VALIDATE_EMAIL)); // '[email protected]'
var_dump(filter_var(' [email protected]', FILTER_VALIDATE_EMAIL)); // false
var_dump(filter_var('[email protected] ', FILTER_VALIDATE_EMAIL)); // false
var_dump(filter_var(' [email protected] ', FILTER_VALIDATE_EMAIL)); // false

Originally posted by @froschdesign at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

I would prefer to use the HTML5 form input or JavaScript to avoid this frustration on the user side.

About allowing whitespaces in email validator, I meant if the developer wants, it can do so by extending the Zend Validator and using dependency injection. We shouldn't allow whitespaces in the validator in general.

However, I understand your point.

So if we rename it and add post-filtering, should we create a new component? Or should we just use a new NormalizationChain in Input/InputCollection?
I would like to contribute, to make things happens. Or maybe @bakura10 will do it ?


Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

I would prefer to use the HTML5 form input or JavaScript to avoid this frustration on the user side.

You can not trust the client side, because you don't know them.

So if we rename it and add post-filtering, should we create a new component?

In my opinion a new component should not the first goal. I will quote myself:

It would be great if an input filter configuration of version 2 also works in version 3.

Add your ideas here or provide a some code snippets. For example:

  • the internal workflow of the input filter
  • the new configuration structure

Please look also at the RFC for the new validator component. Some code can you find in this repository: https://github.com/weierophinney/zend-datavalidator

Thanks for your contribution! 👍


Originally posted by @froschdesign at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

@Seryus
Btw. you can also use the forum: https://discourse.zendframework.com/


Originally posted by @froschdesign at zendframework/zend-inputfilter#111 (comment)

@weierophinney
Copy link
Member Author

I'm looking at it, I didn't know this forum existed, thank you!


Originally posted by @ghost at zendframework/zend-inputfilter#111 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants