Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

submissionXmlToFieldStream: handle stream end properly #1256

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
15 changes: 10 additions & 5 deletions lib/data/submission.js
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,14 @@ const submissionXmlToFieldStream = (fields, xml, includeStructuralAttrs = false,
const stack = new SchemaStack(fields, true);
let textBuffer = ''; // agglomerates text nodes that come as multiple events.
const parser = new hparser.Parser({
onend: () => {
if (!stack.hasExited()) {
outStream.destroy(new Error('Stack has not exited yet'));
} else {
parser.reset();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel quite sure about what parser.reset() is doing here. I don't see reset() mentioned in the docs for htmlparser2 (though I know we're using an old version). If the parser has ended, and we're not reusing the parser, why do we need to reset it? Part of the reason I'm asking is that I wonder whether reset() had more of an effect before. Could reset() have had more of an effect when it was called in onclosetag, like causing the parser to stop parsing early?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel quite sure about what parser.reset() is doing here.

Copied from current line 70. I'm also not sure why it's required, but my guess is similar to yours. Would you rather remove it?

outStream.push(null);
}
},
onopentag: (name, attrs) => {
const field = stack.push(name);
if (field != null) {
Expand All @@ -58,18 +66,15 @@ const submissionXmlToFieldStream = (fields, xml, includeStructuralAttrs = false,
textBuffer += text;
},
onclosetag: () => {
if (stack.hasExited()) throw new Error('Stack has already exited');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel totally sure about throwing here. Is it impossible for there to be XML after the last tag for the fields in the SchemaStack? Could this function (as well as onopentag and ontext) just early-return if the stack has exited?

If it is the case that we should throw, I feel like the throw should happen in onopentag. Rather than waiting until the close tag, if we can tell at the open tag that there will be a problem, I think it'd be better to throw sooner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel totally sure about throwing here. Is it impossible for there to be XML after the last tag for the fields in the SchemaStack?

I'm not sure, but I expect having a close tag after the stack is empty represents something very unexpected happening.

If it is the case that we should throw, I feel like the throw should happen in onopentag.

I don't think there's any guarantee there will be an opentag after a dangling closetag, but maybe both cases should throw?


const field = stack.pop();

if (textBuffer !== '' || includeEmptyNodes) {
if ((field != null) && !field.isStructural()) // don't output useless whitespace
outStream.push({ field, text: textBuffer });
textBuffer = '';
}

if (stack.hasExited()) {
parser.reset();
outStream.push(null);
}
}
}, { xmlMode: true, decodeEntities: true });

Expand Down
27 changes: 27 additions & 0 deletions test/unit/data/submission.js
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,33 @@ describe('submission field streamer', () => {
});
});

it('should throw given random text', (done) => {
fieldsFor(testData.forms.simple).then((fields) => {
const stream = submissionXmlToFieldStream(fields, 'this is not an XML');
stream.on('data', () => () => {});
stream.on('error', () => done());
stream.on('end', () => done(new Error('should have emitted error event')));
});
});

it('should throw given null xml', (done) => {
fieldsFor(testData.forms.simple).then((fields) => {
const stream = submissionXmlToFieldStream(fields, null);
stream.on('data', () => () => {});
stream.on('error', () => done());
stream.on('end', () => done(new Error('should have emitted error event')));
});
});

it('should throw given undefined xml', (done) => {
fieldsFor(testData.forms.simple).then((fields) => {
const stream = submissionXmlToFieldStream(fields, undefined);
stream.on('data', () => () => {});
stream.on('error', () => done());
stream.on('end', () => done(new Error('should have emitted error event')));
});
});

it('should not crash given malformed over-closing xml', (done) => {
fieldsFor(testData.forms.simple).then((fields) => {
const stream = submissionXmlToFieldStream(fields, '<data></goodbye></goodbye></goodbye>');
Expand Down
Loading