instr_dict should contain instruction format #149

Open
MatthewNielsen27 opened this issue Dec 12, 2022 · 9 comments

Comments

@MatthewNielsen27

This repo has been very useful for gathering information about the ISA programmatically. One shortcoming is that it doesn't contain any information about the instruction format (e.g. I, J, U ...). That information could be used to programmatically verify that implementations handle each instruction correctly.

Currently I do this manually by cross-referencing the spec itself. Is there a better way to do this?

@aswaterman
Member

The basic formats are an illustrative abstraction. There are so many minor formats (see the C and Zcb extensions) that doing this fully and correctly would require naming several currently unnamed formats and labeling all of the instructions manually. It’s not an insurmountable task, and it does have some value, but it isn’t quick or automatable.

@MatthewNielsen27
Author

@aswaterman Thanks for your insight. It makes sense that my proposed solution is not robust because there are lots of unnamed formats. With this in mind, do you think the better approach is to support #136? I believe that these attempt to solve the same issue.

I'm willing to contribute in whatever way is needed. I think this is important for the RISC-V spec. Let me know if this is something you'd be interested in.

@aswaterman
Member

Something along those lines would allow programmatically indicating which instruction bits map to which immediate bits without labeling every instruction format, which might be more appropriate. I think we'd need a volunteer to propose a concrete format and get feedback on it before proceeding. @neelgala and I should be in the loop on such an effort.

@MatthewNielsen27
Author

Sure thing. If you're open to it, I can take some time to come up with a complete schema and we can begin iteration from there.

@aswaterman
Member

Go for it. Thanks for volunteering!

@MatthewNielsen27
Author

One option is to define a JSON schema. The benefit is that the data can be trivially parsed and validated, so users only have to worry about using it. The downside of this approach is its verbosity.

Data example:

{
  "JAL": {
    "rd": [
      { "dst": [0, 4],    "src": [7, 11]}
    ],
    
    "imm20": [
      { "dst": [ 0],      "tie": 0 },
      { "dst": [ 1, 10],  "src": [21, 30] },
      { "dst": [11],      "src": [20] },
      { "dst": [12, 19],  "src": [12, 19] },
      { "dst": [20, 31],  "src": [31] }
    ]
  }, ...
}
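To make the intent of the data concrete, here is a minimal sketch of how a consumer might apply the `imm20` entries to a raw instruction word. The helper names (`bit_range`, `extract`, `to_signed32`) are hypothetical, and the sketch assumes that a single-bit `src` paired with a multi-bit `dst` means "replicate that bit" (sign extension), which is how the last entry reads.

```python
def bit_range(r):
    """Expand [lo] or [lo, hi] (inclusive) into a list of bit positions."""
    lo, hi = (r[0], r[0]) if len(r) == 1 else r
    return list(range(lo, hi + 1))


def extract(word, encoding):
    """Assemble a value from `word` by following the mapping entries."""
    value = 0
    for entry in encoding:
        dst = bit_range(entry["dst"])
        if "tie" in entry:
            # Constant bits: every destination bit takes the tied value.
            bits = [entry["tie"]] * len(dst)
        else:
            src = bit_range(entry["src"])
            if len(src) == 1:
                # Assumption: one source bit fans out over the whole range.
                src = src * len(dst)
            if len(src) != len(dst):
                raise ValueError(f"width mismatch in {entry}")
            bits = [(word >> s) & 1 for s in src]
        for d, b in zip(dst, bits):
            value |= b << d
    return value


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as two's complement."""
    return value - (1 << 32) if value & (1 << 31) else value


# The imm20 entries from the JAL example above.
JAL_IMM = [
    {"dst": [0], "tie": 0},
    {"dst": [1, 10], "src": [21, 30]},
    {"dst": [11], "src": [20]},
    {"dst": [12, 19], "src": [12, 19]},
    {"dst": [20, 31], "src": [31]},
]

print(extract(0x008000EF, JAL_IMM))               # jal ra, 8   -> 8
print(to_signed32(extract(0xFFDFF06F, JAL_IMM)))  # jal x0, -4  -> -4
```

Because the decoder is driven entirely by the data, the same `extract` function would work for any other instruction whose mapping is listed, without naming its format.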

Schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "risc-v://variable-encodings.schema.json",
  "type": "object",
  "additionalProperties": {
    "$ref": "#/$defs/VariableEncoding"
  },
  "$defs": {
    "VariableEncoding": {
      "type": "object",
      "additionalProperties": {
        "$ref": "#/$defs/Encoding"
      }
    },
    "Encoding": {
      "type": "array",
      "items": {
        "oneOf": [
          {
            "$ref": "#/$defs/ConstMapping"
          },
          {
            "$ref": "#/$defs/RangeMapping"
          }
        ]
      }
    },
    "RangeMapping": {
      "type": "object",
      "properties": {
        "src": {
          "$ref": "#/$defs/InclusiveRange"
        },
        "dst": {
          "$ref": "#/$defs/InclusiveRange"
        }
      },
      "required": [
        "src",
        "dst"
      ]
    },
    "ConstMapping": {
      "type": "object",
      "properties": {
        "tie": {
          "type": "integer",
          "minimum": 0,
          "maximum": 1
        },
        "dst": {
          "$ref": "#/$defs/InclusiveRange"
        }
      },
      "required": [
        "tie",
        "dst"
      ]
    },
    "InclusiveRange": {
      "type": "array",
      "items": {
        "type": "integer"
      },
      "minItems": 1,
      "maxItems": 2
    }
  }
}
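As a rough illustration of what the schema is meant to enforce, the sketch below hand-checks the same structural rules using only the standard library. It is not a JSON Schema implementation, and the function names (`is_range`, `check_entry`, `validate`) are hypothetical. It assumes the intended rules are: every entry has a `dst` range, exactly one of `src` or `tie`, and `tie` is 0 or 1.

```python
def is_range(value):
    """InclusiveRange: a list of one or two integers."""
    return (
        isinstance(value, list)
        and 1 <= len(value) <= 2
        and all(type(v) is int for v in value)
    )


def check_entry(entry):
    """Return an error string for one mapping entry, or None if it is valid."""
    if not isinstance(entry, dict) or not is_range(entry.get("dst")):
        return "missing or malformed 'dst'"
    has_src, has_tie = "src" in entry, "tie" in entry
    if has_src == has_tie:
        # Mirrors "oneOf": a RangeMapping or a ConstMapping, never both.
        return "need exactly one of 'src' or 'tie'"
    if has_src and not is_range(entry["src"]):
        return "malformed 'src'"
    if has_tie and (type(entry["tie"]) is not int or entry["tie"] not in (0, 1)):
        return "'tie' must be 0 or 1"
    return None


def validate(doc):
    """Check a whole {instruction: {variable: [entries]}} document."""
    errors = []
    for instr, variables in doc.items():
        for var, entries in variables.items():
            for i, entry in enumerate(entries):
                problem = check_entry(entry)
                if problem:
                    errors.append(f"{instr}.{var}[{i}]: {problem}")
    return errors


good = {"JAL": {"rd": [{"dst": [0, 4], "src": [7, 11]}]}}
bad = {"JAL": {"imm20": [{"dst": [0], "tie": 2}, {"dst": [1, 10]}]}}
print(validate(good))
print(validate(bad))
```

In practice a real JSON Schema validator would replace this; the point is only to show which malformed entries the schema is expected to reject.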

@ghost

ghost commented May 9, 2023

Hello, I'd like this to happen too.

How about adding another file (e.g. formats)? For example:

# type
$type:u rd  imm        31..12=imm[19..0] 11..7=rd[4..0] 6..0=opcode[6..0]
$type:j rd  imm        31=imm[20] 30..21=imm[10..1] 20=imm[11] 19..12=imm[19..12] 11..7=rd[4..0] 6..0=opcode[6..0]
$type:i rd  rs1 imm    31..20=imm[11..0] 19..15=rs1[4..0] 14..12=funct3[2..0] 11..7=rd[4..0] 6..0=opcode[6..0]
$type:b rs1 rs2 imm    31=imm[12] 30..25=imm[10..5] 24..20=rs2[4..0] 19..15=rs1[4..0] 14..12=funct3[2..0] 11..8=imm[4..1] 7=imm[11] 6..0=opcode[6..0]
$type:r rd  rs1 rs2    31..25=funct7[6..0] 24..20=rs2[4..0] 19..15=rs1[4..0] 14..12=funct3[2..0] 11..7=rd[4..0] 6..0=opcode[6..0]
$type:s rs1 rs2 imm    31..25=imm[11..5] 24..20=rs2[4..0] 19..15=rs1[4..0] 14..12=funct3[2..0] 11..7=imm[4..0] 6..0=opcode[6..0]

# rv_i
lui     $type:u  opcode=0b0110111
auipc   $type:u  opcode=0b0010111
jal     $type:j  opcode=0b1101111
jalr    $type:i  opcode=0b1100111  funct3=0b000
beq     $type:b  opcode=0b1100011  funct3=0b000
bne     $type:b  opcode=0b1100011  funct3=0b001
blt     $type:b  opcode=0b1100011  funct3=0b100
bge     $type:b  opcode=0b1100011  funct3=0b101
bltu    $type:b  opcode=0b1100011  funct3=0b110
bgeu    $type:b  opcode=0b1100011  funct3=0b111
lb      $type:i  opcode=0b0000011  funct3=0b000
lh      $type:i  opcode=0b0000011  funct3=0b001
lw      $type:i  opcode=0b0000011  funct3=0b010
lbu     $type:i  opcode=0b0000011  funct3=0b100
lhu     $type:i  opcode=0b0000011  funct3=0b101
sb      $type:s  opcode=0b0100011  funct3=0b000
sh      $type:s  opcode=0b0100011  funct3=0b001
sw      $type:s  opcode=0b0100011  funct3=0b010
addi    $type:i  opcode=0b0010011  funct3=0b000
slti    $type:i  opcode=0b0010011  funct3=0b010
sltiu   $type:i  opcode=0b0010011  funct3=0b011
xori    $type:i  opcode=0b0010011  funct3=0b100
ori     $type:i  opcode=0b0010011  funct3=0b110
andi    $type:i  opcode=0b0010011  funct3=0b111
add     $type:r  opcode=0b0110011  funct3=0b000  funct7=0b0000000
sub     $type:r  opcode=0b0110011  funct3=0b000  funct7=0b0100000
sll     $type:r  opcode=0b0110011  funct3=0b001  funct7=0b0000000
slt     $type:r  opcode=0b0110011  funct3=0b010  funct7=0b0000000
sltu    $type:r  opcode=0b0110011  funct3=0b011  funct7=0b0000000
xor     $type:r  opcode=0b0110011  funct3=0b100  funct7=0b0000000
srl     $type:r  opcode=0b0110011  funct3=0b101  funct7=0b0000000
sra     $type:r  opcode=0b0110011  funct3=0b101  funct7=0b0100000
or      $type:r  opcode=0b0110011  funct3=0b110  funct7=0b0000000
and     $type:r  opcode=0b0110011  funct3=0b111  funct7=0b0000000
fence   $type:i  opcode=0b0001111  funct3=0b000
ecall   $type:i  opcode=0b1110011  funct3=0b000  imm=0  rs1=0  rd=0
ebreak  $type:i  opcode=0b1110011  funct3=0b000  imm=1  rs1=0  rd=0

# rv_m
mul     $type:r  opcode=0b0110011  funct3=0b000  funct7=0b0000001
mulh    $type:r  opcode=0b0110011  funct3=0b001  funct7=0b0000001
mulhsu  $type:r  opcode=0b0110011  funct3=0b010  funct7=0b0000001
mulhu   $type:r  opcode=0b0110011  funct3=0b011  funct7=0b0000001
div     $type:r  opcode=0b0110011  funct3=0b100  funct7=0b0000001
divu    $type:r  opcode=0b0110011  funct3=0b101  funct7=0b0000001
rem     $type:r  opcode=0b0110011  funct3=0b110  funct7=0b0000001
remu    $type:r  opcode=0b0110011  funct3=0b111  funct7=0b0000001

# rv_a
$type:a  rd rs1 rs2 aq rl  31..27=funct5[4..0] 26=aq[0] 25=rl[0] 24..20=rs2[4..0] 19..15=rs1[4..0] 14..12=funct3[2..0] 11..7=rd[4..0] 6..0=opcode[6..0]
lr.w       $type:a  opcode=0b0101111  funct3=0b010  funct5=0b00010  rs2=0
sc.w       $type:a  opcode=0b0101111  funct3=0b010  funct5=0b00011
amoswap.w  $type:a  opcode=0b0101111  funct3=0b010  funct5=0b00001
amoadd.w   $type:a  opcode=0b0101111  funct3=0b010  funct5=0b00000
amoxor.w   $type:a  opcode=0b0101111  funct3=0b010  funct5=0b00100
amoand.w   $type:a  opcode=0b0101111  funct3=0b010  funct5=0b01100
amoor.w    $type:a  opcode=0b0101111  funct3=0b010  funct5=0b01000
amomin.w   $type:a  opcode=0b0101111  funct3=0b010  funct5=0b10000
amomax.w   $type:a  opcode=0b0101111  funct3=0b010  funct5=0b10100
amominu.w  $type:a  opcode=0b0101111  funct3=0b010  funct5=0b11000
amomaxu.w  $type:a  opcode=0b0101111  funct3=0b010  funct5=0b11100

Similar to the rv_* files:

  • '#' starts a comment.
  • '$type:X' defines a macro that is substituted into each instruction line that references it.
  • An instruction line has the form INST ARG1 ARG2 .. ARGX <bit encoding assignments> <constant/argument assignments>.
    For lr.w, the arguments are rd, rs1, aq and rl; rs2 is assigned as a constant.

In parser.py we can look up the format by name for each instruction and, if it exists, attach its arguments and bit encoding assignments to the instruction's entry.
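A minimal sketch of that lookup, to show the proposed file is straightforward to consume. This is not the repository's actual parser.py; `parse_format` and `match_mask` are hypothetical names, and the sketch assumes every constant on an instruction line is written as `field=value` with a Python-style integer literal.

```python
import re

# Matches "31..25=funct7[6..0]" as well as single-bit forms like "7=imm[11]".
FIELD_RE = re.compile(r"(\d+)(?:\.\.(\d+))?=(\w+)\[(\d+)(?:\.\.(\d+))?\]")


def parse_format(line):
    """Parse '$type:X arg... hi..lo=field[hi..lo] ...' into (name, args, slices)."""
    head, *tokens = line.split()
    name = head.split(":", 1)[1]
    args = [t for t in tokens if "=" not in t]
    slices = []
    for tok in tokens:
        m = FIELD_RE.fullmatch(tok)
        if m:
            hi, lo, field, fhi, flo = m.groups()
            hi, fhi = int(hi), int(fhi)
            lo = hi if lo is None else int(lo)
            flo = fhi if flo is None else int(flo)
            slices.append((hi, lo, field, fhi, flo))
    return name, args, slices


def match_mask(instr_line, formats):
    """Expand an instruction line into (mnemonic, match, mask)."""
    mnemonic, type_ref, *assigns = instr_line.split()
    _, _, slices = formats[type_ref.split(":", 1)[1]]
    consts = {k: int(v, 0) for k, v in (a.split("=") for a in assigns)}
    match = mask = 0
    for hi, lo, field, fhi, flo in slices:
        if field not in consts:
            continue  # a real operand (rd, rs1, ...), not a fixed bit pattern
        width = hi - lo + 1
        bits = (consts[field] >> flo) & ((1 << width) - 1)
        match |= bits << lo
        mask |= ((1 << width) - 1) << lo
    return mnemonic, match, mask


R_TYPE = ("$type:r rd rs1 rs2 31..25=funct7[6..0] 24..20=rs2[4..0] "
          "19..15=rs1[4..0] 14..12=funct3[2..0] 11..7=rd[4..0] 6..0=opcode[6..0]")
FORMATS = {fmt[0]: fmt for fmt in [parse_format(R_TYPE)]}

name, match, mask = match_mask(
    "sub $type:r opcode=0b0110011 funct3=0b000 funct7=0b0100000", FORMATS)
print(name, hex(match), hex(mask))  # sub 0x40000033 0xfe00707f
```

The fields that are left unassigned on an instruction line (here rd, rs1 and rs2) fall out as the instruction's operands, together with the bit positions they occupy.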

@MatthewNielsen27
Author

I'm still fine to take this on if we can settle on a format. I know my json schema was probably a little too verbose!

@ghost

ghost commented May 10, 2023

> I'm still fine to take this on if we can settle on a format.

Yes, let's see how it will work.

@neelgala @aswaterman a friendly ping.
