Skip to content

Conversation

@furesoft
Copy link
Contributor

@furesoft furesoft commented Nov 10, 2024

Added barebones of matching

ToDo:

  • Outputs
  • Matching
    • Constants
      • NumberMatching
      • String Matching 'world'
      • String Operations
        • StartsWith: 'hello'*
        • EndsWith: *'hello'
        • Contains: 'hello'
        • Match instructions already seen (e.g. "(sub {lhs: (add {x} $y)} $y)") - the first occurance of an $ will be saved and later will be compared with it and make it available to output subexpressions that matches a pattern
    • Ignore Argument "_"
    • Not Operator "!"
    • Match Sub Pattern
    • Match Binary Instructions
    • Match Other Instructions
    • Match Type Specifier (add {x}#const (mul {y}#instr)) and (add #const {x})
    • Number Operations
      • > <
    • Update Docs
    • Replacement
      • Replace with buffered values
      • Replace with constant
      • Replace with new instruction
      • replace with evaluation like (add $x#const $y#const) -> eval($x + $y)

{
if (name != null)
{
Outputs[name] = (IntPtr)Unsafe.AsPointer(ref Unsafe.AsRef(in value));
Copy link

@neon-sunset neon-sunset Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no. This should never be done! Unsafe.AsPointer is the most dangerous method on Unsafe class because this is guaranteed to introduce random crashes because it is illegal to reinterpret a byref that points to managed memory without pinning that said memory. It's unfortunate it does not come with a compiler warning/error.

(otherwise thank you for contributing to DistIL as long as Unsafe.AsPointer is not used like this :D )

Copy link
Contributor Author

@furesoft furesoft Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an idea how to implement the output of values else?

btw: dubiousconst282 suggested this

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning here was that this method is only supposed to be used with local variables, which are guaranteed to not be moved around by the GC, and where I expect this should be mostly fine.

It doesn't change the fact that it's a scary hack, but the long-term plan is to use a source interceptor to replace the interpolation handlers with specialized matchers to avoid the overhead from parsing/matching.

Copy link

@neon-sunset neon-sunset Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. The issue with byrefs is they can e.g. point to variables lifted into a display class by a closure, which means they would no longer point to stack. Byrefs are also necessary to keep objects alive - pointers will not and because .NET has precise GC tracking it means that the array of pointers to stack locations is insufficient to keep objects stored at those locations alive. This is often not obvious and not immediately observable making it difficult to overstate just how dangerous .AsPointer is.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info, I was not aware of these problems. We'll probably have to look for another solution to get the output values then.

pointers to stack locations is insufficient to keep objects stored at those locations alive

Just to clarify, does this mean that object x; *(&x) = new(); (e.g. a roundtrip through ldloca/conv.u/stind) is effectively a GC hole and it's only safe to store managed objects to refs/tracked pointers?

Copy link

@neon-sunset neon-sunset Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, does this mean that object x; *(&x) = new(); (e.g. a roundtrip through ldloca/conv.u/stind) is effectively a GC hole and it's only safe to store managed objects to refs/tracked pointers?

Yes, even though the simplified snippet like this won't easily trigger it. The above example is more complex and is definitely a GC hole. Even the snippet itself - if this local is the last place where it is observed, then the GC will be free to collect the object since there is no object or managed reference to keep it alive.

By reinterpreting byrefs into pointers it completely breaks the tracking and allows GC to collect Value objects the T here is constrained on. This could theoretically be less problematic for T satisfying unmanaged constraint but it does not address a scenario where a byref points to a field on the heap.

Copy link
Contributor Author

@furesoft furesoft Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like this?

if (instr.Match("(add {x} (mul _ {y}))", out var outputs) {
    var x = outputs.Get<ConstInt>("x");
    var y = outputs.Get<Instruction>("y");
}

or

if (instr.Match("(add {x} (mul _ {y}))", out var outputs) {
    var x = outputs["x"];
    var y = outputs[1];
}

@furesoft furesoft force-pushed the feature/matching-dsl branch 4 times, most recently from 4a32e84 to 5247832 Compare November 10, 2024 17:33
@furesoft furesoft changed the title Feature/matching dsl Feature/matching dsl related to #20 Nov 10, 2024
@furesoft furesoft force-pushed the feature/matching-dsl branch from 5247832 to 66c5717 Compare November 10, 2024 17:56
@furesoft furesoft force-pushed the feature/matching-dsl branch from 66c5717 to f624f90 Compare November 10, 2024 18:54
@furesoft furesoft force-pushed the feature/matching-dsl branch from f18a03f to 4a21df8 Compare November 16, 2024 12:32
@furesoft
Copy link
Contributor Author

furesoft commented Nov 26, 2024

There is a problem with the replacement and I am not sure if its relevant or how we could handle it.

If I have this expression: (sub {lhs:(add $x $y)} $y) -> $x that matches and replace the sub-instruction with the value of $y. But if the value is a constant it cannot be replaced, because constants are no valid instructions therefor there forbidden in the root level of a basic block. The ReplaceWith method cannot handle non Instruction values

if (insertIfInst && newValue is Instruction newInst && newInst.Block == null) {

If I run the pattern it deletes the binary instruction. I had the idea to replace it with an + 0 but this wouldn't be optimal. Should we disallow this or do you have other ideas?

@dubiousconst282
Copy link
Owner

dubiousconst282 commented Nov 26, 2024

You can use ReplaceUses() instead of ReplaceWith() to replace uses of the instruction and keep it in the block, constants still derive from Value and they will be handed correctly. To avoid leaving unused instructions behind, I suggest adding a small check like if (inst.NumUses == 0) inst.Remove().

@furesoft furesoft marked this pull request as ready for review November 30, 2024 13:21
@furesoft
Copy link
Contributor Author

Closes #20


for (int index = 0; index < pattern.Arguments.Count; index++) {
Value? operand = instruction.Operands[index];
matched &= MatchValue(operand, pattern.Arguments[index], outputs);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: matching can stop as soon as a false match is found (not super important, but maybe useful until we get to do the source generator thing)

private static bool MatchUnary(UnaryInst un, InstructionPattern pattern, OutputPattern outputs)
{
var operation = pattern.OpCode;
var op = (UnaryOp)(operation - (Opcode._Bin_First + 1));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add the unary opcodes in the enum, otherwise this should probably be commented as todo.

(also this enum is a mess, purely my fault, it would probably be nice to have helpers like Get/Is BinaryOp() and etc.)

Comment on lines 21 to 22
_module = _modResolver.Create("Test");
_testType = _module.CreateType("Test", "Stub");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be causing tests to fail when running them all at once (class is instantiated in parallel once per run(?), then gets a duplicate key exception).

I'm not sure how to best approach this but i think it would be easier to just pre-declare declare stub types/methods in the tests/misc/TestAsm project instead

Comment on lines 20 to 22
_module = _modResolver.Create("Test");
_testType = _modResolver.Resolve("TestAsm").FindType(null, "Stub")!;
_stub = _testType.FindMethod("StubMethod");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still crashing for me, I got it working with the following but some of the tests are still failing

Suggested change
_module = _modResolver.Create("Test");
_testType = _modResolver.Resolve("TestAsm").FindType(null, "Stub")!;
_stub = _testType.FindMethod("StubMethod");
_module = _modResolver.Resolve("TestAsm");
_testType = _module.FindType(null, "MatcherStub")!;
_stub = _testType.FindMethod("StubMethod");

[xUnit.net 00:00:01.06] DistIL.Tests.IR.MatchingTests.TestNumberOperator [FAIL]
[xUnit.net 00:00:01.06] DistIL.Tests.IR.MatchingTests.TestCallMatchOutput [FAIL]
[xUnit.net 00:00:01.06] DistIL.Tests.IR.MatchingTests.TestUnary [FAIL]
[xUnit.net 00:00:01.07] DistIL.Tests.IR.MatchingTests.Test_Strings [FAIL]

Comment on lines 1 to 3
namespace TestAsm;

public class Stub {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
namespace TestAsm;
public class Stub {
public class MatcherStub
{

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants