Support verifying test implementation on both ARM and x86 #31

Open
howjmay opened this issue Nov 3, 2023 · 15 comments
Labels
help wanted Extra attention is needed

Comments

@howjmay
Owner

howjmay commented Nov 3, 2023

No description provided.

@howjmay howjmay added the help wanted Extra attention is needed label Nov 3, 2023
@OMaghiarIMG
Contributor

OMaghiarIMG commented Feb 20, 2024

Hello @howjmay, nice work with this project!
I've built the tests (on an x86 host) and got the following results:
Using GCC 14.0.1 (g7af0f1e107a):

NEON2RVV_TEST Complete!
Passed:  1481
Failed:  1
Ignored: 209
Coverage rate: 87.58%

Using Clang 19.0 (4cf458c696047d6d2991c121da7a5c165ff747ce):

NEON2RVV_TEST Complete!
Passed:  1276
Failed:  206
Ignored: 209
Coverage rate: 75.46%

The tests were run on QEMU v8.1.1.
I've also seen some additional failures when building with different optimization levels.
I've identified some of the issues and can provide fixes in a couple of days.
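
For reference, both coverage figures are consistent with passed / (passed + failed + ignored). That formula is inferred from the numbers above rather than taken from the test harness, so treat the sketch below as a sanity check only; `coverage_rate` is a hypothetical helper name.

```python
def coverage_rate(passed: int, failed: int, ignored: int) -> float:
    """Return passed tests as a percentage of all tests (assumed formula)."""
    total = passed + failed + ignored
    return 100.0 * passed / total


# Counts taken from the two runs reported above.
print(f"GCC:   {coverage_rate(1481, 1, 209):.2f}%")
print(f"Clang: {coverage_rate(1276, 206, 209):.2f}%")
```

Under this assumption the ignored tests count against coverage, which is why neither run can exceed roughly 87.6% until they are enabled.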

@howjmay
Owner Author

howjmay commented Feb 22, 2024

Thank you! Looking forward to your PR.
Could you also share how you ran the tests?

@OMaghiarIMG
Contributor

OMaghiarIMG commented Feb 28, 2024

> Thank you! Looking forward to your PR. Could you also share how you ran the tests?

Hi @howjmay, I've opened PR #309.
I've been running the tests like so:

CROSS_COMPILE=/path/to/toolchain/riscv64-unknown-linux-gnu- make CC=/path/to/toolchain/clang CXX=/path/to/toolchain/clang++ SIMULATOR_TYPE=qemu ENABLE_TEST_ALL=1 test

@howjmay
Owner Author

howjmay commented Feb 28, 2024

Thank you for sharing!

@OMaghiarIMG
Contributor

> Thank you for sharing!

No problem.
I've got a couple of questions, the first about the number of Neon intrinsics.
According to this website, there are 2185 intrinsics for v7, 2754 for A32, and 4344 for A64. I presume A32 contains all of v7, and A64 contains all of A32?
Looking at this GCC header file (for A32?), there are around 2700 intrinsics. The header file for AArch64 has around 3800, plus a couple for f16/bf16 defined separately, but that still falls short of 4344.

Do you know where the complete list of 4344 is defined? And what is this project going to cover?
Do you eventually plan to include Zvfh/Zvfbfwma? Vector crypto would also help, when available, with vclz/vcpop/carryless multiplication.

@howjmay
Owner Author

howjmay commented Feb 29, 2024

> According to this website, there are 2185 intrinsics for v7, 2754 for A32, and 4344 for A64. I presume A32 contains all of v7, and A64 contains all of A32?

I'm not sure whether I misunderstood you, but I think some intrinsics are A64-only.

> Looking at this GCC header file (for A32?), there are around 2700 intrinsics. The header file for AArch64 has around 3800, plus a couple for f16/bf16 defined separately, but that still falls short of 4344.

This is my fault. At the beginning of this project I directly copied my local arm_neon.h file from an M1 machine. I noticed it was missing some intrinsics, but I didn't have time to add them, and I'm not sure whether I accidentally deleted those intrinsics at the beginning.

> Do you eventually plan to include Zvfh/Zvfbfwma

I am not sure the f16/bf16 parts are necessary. What do you think? If they are necessary, I am happy to work on them.
I don't think the poly part will be a first priority either.

@OMaghiarIMG
Contributor

> I'm not sure whether I misunderstood you, but I think some intrinsics are A64-only.

No, I meant the other way around: I hope there isn't anything that is not included in A64.

> I am not sure the f16/bf16 parts are necessary. What do you think? If they are necessary, I am happy to work on them.
> I don't think the poly part will be a first priority either.

I wouldn't say they are a priority, and I don't think you can do poly without vector crypto anyway.
My concern is that the neon2rvv header currently contains ~1700 intrinsics; even if we add 278 for f16, 81 for bf16, and 115 for poly, we're still a long way from 4344.
It would be good to understand what exactly isn't covered.
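
To put a number on that gap, here is a small sketch of the arithmetic using only the counts quoted in this comment. The ~1700 figure is approximate, and `remaining_gap` is a hypothetical helper, not something from the project.

```python
def remaining_gap(total: int, implemented: int, extras: dict) -> int:
    """Return how many intrinsics would still be missing after adding the extras."""
    return total - (implemented + sum(extras.values()))


TOTAL_A64 = 4344   # Neon intrinsics listed for A64
IMPLEMENTED = 1700  # approximate count currently in the neon2rvv header
EXTRAS = {"f16": 278, "bf16": 81, "poly": 115}

print(IMPLEMENTED + sum(EXTRAS.values()))           # 2174 covered after additions
print(remaining_gap(TOTAL_A64, IMPLEMENTED, EXTRAS))  # 2170 still unaccounted for
```

So even with the f16, bf16, and poly groups added, only about half of the A64 list would be covered.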

@howjmay
Owner Author

howjmay commented Mar 1, 2024

OK, I will check exactly what I missed this weekend. Thank you!

@howjmay
Owner Author

howjmay commented Mar 2, 2024

I roughly checked it. It is my fault that I didn't copy all the functions to neon2rvv.h. I need to implement a proper parser to do it.

@howjmay
Owner Author

howjmay commented Mar 3, 2024

The crawler is in PR #310.

@OMaghiarIMG
Contributor

Nice, I was thinking of doing the same. But apparently the website displays the wrong results when the requested `first` value is too high:
https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&first=4000
This should have shown results 4001-4020, but instead it shows 4201-4220.
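
As a rough illustration of what the `first` parameter was expected to do, assuming it is a zero-based offset and that the table shows 20 rows per page (both inferred from the "4001-4020" range above, not from any Arm documentation):

```python
import math

PAGE_SIZE = 20  # assumed rows per page, inferred from "4001-4020"


def expected_range(first: int, page_size: int = PAGE_SIZE) -> tuple:
    """Return the 1-based (start, end) row numbers expected for an offset."""
    return (first + 1, first + page_size)


def page_count(total: int, page_size: int = PAGE_SIZE) -> int:
    """Return how many pages are needed to list `total` rows."""
    return math.ceil(total / page_size)


print(expected_range(4000))  # (4001, 4020)
print(page_count(4344))      # 218
```

The page count of 218 matches the loop bound used in the scraper script posted further down the thread.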

I made a scraper that clicks through to the next page of the table instead, which seems to have worked.
CSV file with all intrinsics:
neon_intrinsics.csv
And a breakdown of what is currently covered:

Neon2RVV coverage:
Total 1643 / 4344
Bit manipulation 	 39 / 74
	 Bit manipulation / Bitwise clear 	 16 / 16
	 Bit manipulation / Bitwise select 	 18 / 28
	 Bit manipulation / Count leading sign bits 	 0 / 12
	 Bit manipulation / Count leading zeros 	 1 / 12
	 Bit manipulation / Population count 	 4 / 6

Compare 	 90 / 300
	 Compare / Absolute greater than 	 2 / 9
	 Compare / Absolute greater than or equal to 	 2 / 9
	 Compare / Absolute less than 	 2 / 9
	 Compare / Absolute less than or equal to 	 2 / 9
	 Compare / Bitwise equal 	 14 / 28
	 Compare / Bitwise equal to zero 	 0 / 31
	 Compare / Bitwise not equal to zero 	 12 / 22
	 Compare / Equal to 	 0 / 3
	 Compare / Greater than 	 14 / 42
	 Compare / Greater than or equal to 	 14 / 42
	 Compare / Greater than or equal to zero 	 0 / 3
	 Compare / Greater than zero 	 0 / 3
	 Compare / Less than 	 14 / 42
	 Compare / Less than or equal to 	 14 / 42
	 Compare / Less than or equal to zero 	 0 / 3
	 Compare / Less than zero 	 0 / 3

Complex arithmetic 	 0 / 62
	 Complex arithmetic / Complex addition 	 0 / 10
	 Complex arithmetic / Complex multiply-accumulate 	 0 / 20
	 Complex arithmetic / Complex multiply-accumulate by scalar 	 0 / 32

Cryptography 	 0 / 35
	 Cryptography / AES 	 0 / 4
	 Cryptography / CRC32 	 0 / 8
	 Cryptography / SHA1 	 0 / 6
	 Cryptography / SHA256 	 0 / 4
	 Cryptography / SHA512 	 0 / 4
	 Cryptography / SM3 	 0 / 7
	 Cryptography / SM4 	 0 / 2

Data type conversion 	 153 / 635
	 Data type conversion / Conversions 	 9 / 195
	 Data type conversion / Reinterpret casts 	 144 / 440

Load 	 165 / 451
	 Load / Load 	 0 / 1
	 Load / Stride 	 165 / 450

Logical 	 90 / 124
	 Logical / AND 	 16 / 16
	 Logical / Bit clear and exclusive OR 	 0 / 8
	 Logical / Bitwise NOT 	 12 / 14
	 Logical / Exclusive OR 	 16 / 24
	 Logical / Exclusive OR and rotate 	 0 / 1
	 Logical / Negate 	 8 / 16
	 Logical / OR 	 16 / 16
	 Logical / OR-NOT 	 16 / 16
	 Logical / Rotate and exclusive OR 	 0 / 1
	 Logical / Saturating Negate 	 6 / 12

Move 	 21 / 53
	 Move / Narrow 	 6 / 12
	 Move / Saturating narrow 	 9 / 27
	 Move / Vector move 	 0 / 2
	 Move / Widen 	 6 / 12

Scalar arithmetic 	 84 / 184
	 Scalar arithmetic / Fused multiply-accumulate by scalar 	 0 / 8
	 Scalar arithmetic / Vector multiply by scalar 	 20 / 40
	 Scalar arithmetic / Vector multiply by scalar and widen 	 8 / 24
	 Scalar arithmetic / Vector multiply-accumulate by scalar 	 24 / 50
	 Scalar arithmetic / Vector multiply-accumulate by scalar and widen 	 18 / 26
	 Scalar arithmetic / Vector multiply-subtract by scalar 	 14 / 36

Shift 	 232 / 348
	 Shift / Left / Vector rounding shift left 	 16 / 18
	 Shift / Left / Vector saturating rounding shift left 	 12 / 24
	 Shift / Left / Vector saturating shift left 	 40 / 60
	 Shift / Left / Vector shift left 	 32 / 36
	 Shift / Left / Vector shift left and insert 	 16 / 24
	 Shift / Left / Vector shift left and widen 	 6 / 12
	 Shift / Right / Vector rounding shift right 	 16 / 18
	 Shift / Right / Vector rounding shift right and accumulate 	 16 / 18
	 Shift / Right / Vector rounding shift right and narrow 	 6 / 12
	 Shift / Right / Vector saturating rounding shift right and narrow 	 9 / 27
	 Shift / Right / Vector saturating shift right and narrow 	 9 / 27
	 Shift / Right / Vector shift right 	 16 / 18
	 Shift / Right / Vector shift right and accumulate 	 16 / 18
	 Shift / Right / Vector shift right and insert 	 16 / 24
	 Shift / Right / Vector shift right and narrow 	 6 / 12

Store 	 120 / 331
	 Store / Store 	 0 / 1
	 Store / Stride 	 120 / 330

Table lookup 	 16 / 72
	 Table lookup / Extended table lookup 	 6 / 33
	 Table lookup / Table lookup 	 10 / 39

Vector arithmetic 	 421 / 1081
	 Vector arithmetic / Absolute / Absolute difference 	 14 / 21
	 Vector arithmetic / Absolute / Absolute difference and accumulate 	 12 / 12
	 Vector arithmetic / Absolute / Absolute value 	 8 / 16
	 Vector arithmetic / Absolute / Saturating absolute value 	 6 / 12
	 Vector arithmetic / Absolute / Widening absolute difference 	 6 / 12
	 Vector arithmetic / Absolute / Widening absolute difference and accumulate 	 6 / 12
	 Vector arithmetic / Across vector arithmetic / Addition across vector 	 0 / 17
	 Vector arithmetic / Across vector arithmetic / Addition across vector widening 	 0 / 12
	 Vector arithmetic / Across vector arithmetic / Maximum across vector 	 0 / 15
	 Vector arithmetic / Across vector arithmetic / Maximum across vector (IEEE754) 	 0 / 3
	 Vector arithmetic / Across vector arithmetic / Minimum across vector 	 0 / 15
	 Vector arithmetic / Across vector arithmetic / Minimum across vector (IEEE754) 	 0 / 3
	 Vector arithmetic / Add / Addition 	 18 / 25
	 Vector arithmetic / Add / Narrowing addition 	 36 / 48
	 Vector arithmetic / Add / Saturating addition 	 16 / 48
	 Vector arithmetic / Add / Widening addition 	 12 / 24
	 Vector arithmetic / Division 	 0 / 7
	 Vector arithmetic / Dot product 	 0 / 28
	 Vector arithmetic / Matrix multiply 	 0 / 4
	 Vector arithmetic / Maximum 	 14 / 26
	 Vector arithmetic / Minimum 	 18 / 34
	 Vector arithmetic / Multiply / Fused multiply-accumulate 	 4 / 78
	 Vector arithmetic / Multiply / Multiplication 	 14 / 28
	 Vector arithmetic / Multiply / Multiply extended 	 0 / 29
	 Vector arithmetic / Multiply / Multiply-accumulate 	 28 / 34
	 Vector arithmetic / Multiply / Multiply-accumulate and widen 	 12 / 24
	 Vector arithmetic / Multiply / Saturating multiply 	 10 / 18
	 Vector arithmetic / Multiply / Saturating multiply by scalar and widen 	 20 / 48
	 Vector arithmetic / Multiply / Saturating multiply-accumulate 	 16 / 48
	 Vector arithmetic / Multiply / Saturating multiply-accumulate by element 	 8 / 24
	 Vector arithmetic / Multiply / Saturating multiply-accumulate by scalar and widen 	 4 / 8
	 Vector arithmetic / Multiply / Widening multiplication 	 6 / 11
	 Vector arithmetic / Pairwise arithmetic / Pairwise addition 	 7 / 23
	 Vector arithmetic / Pairwise arithmetic / Pairwise addition and widen 	 24 / 24
	 Vector arithmetic / Pairwise arithmetic / Pairwise maximum 	 7 / 23
	 Vector arithmetic / Pairwise arithmetic / Pairwise maximum (IEEE754) 	 0 / 3
	 Vector arithmetic / Pairwise arithmetic / Pairwise minimum 	 7 / 20
	 Vector arithmetic / Pairwise arithmetic / Pairwise minimum (IEEE754) 	 0 / 6
	 Vector arithmetic / Polynomial / Polynomial addition 	 0 / 7
	 Vector arithmetic / Polynomial / Polynomial multiply 	 0 / 6
	 Vector arithmetic / Reciprocal / Reciprocal estimate 	 4 / 18
	 Vector arithmetic / Reciprocal / Reciprocal exponent 	 0 / 2
	 Vector arithmetic / Reciprocal / Reciprocal square-root estimate 	 4 / 20
	 Vector arithmetic / Reciprocal / Reciprocal step 	 0 / 3
	 Vector arithmetic / Rounding 	 10 / 66
	 Vector arithmetic / Square root 	 0 / 7
	 Vector arithmetic / Subtract / Narrowing subtraction 	 24 / 36
	 Vector arithmetic / Subtract / Saturating subtract 	 16 / 24
	 Vector arithmetic / Subtract / Subtraction 	 18 / 25
	 Vector arithmetic / Subtract / Widening subtraction 	 12 / 24

Vector manipulation 	 212 / 594
	 Vector manipulation / Combine vectors 	 9 / 15
	 Vector manipulation / Copy vector lane 	 0 / 56
	 Vector manipulation / Create vector 	 9 / 15
	 Vector manipulation / Extract one element from vector 	 18 / 52
	 Vector manipulation / Extract vector from a pair of vectors 	 18 / 28
	 Vector manipulation / Reverse bits within elements 	 0 / 6
	 Vector manipulation / Reverse elements 	 26 / 38
	 Vector manipulation / Set all lanes to the same value 	 54 / 118
	 Vector manipulation / Set vector lane 	 18 / 30
	 Vector manipulation / Split vectors 	 18 / 32
	 Vector manipulation / Transpose elements 	 14 / 68
	 Vector manipulation / Unzip elements 	 14 / 68
	 Vector manipulation / Zip elements 	 14 / 68
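
To see where the largest gaps are, the primary-group lines of this report can be parsed and ranked. The sketch below copies those 14 lines from the breakdown above (with whitespace normalized); `parse_report` and `largest_gaps` are hypothetical helper names.

```python
import re

# Primary-group totals copied from the breakdown above (whitespace normalized).
REPORT = """\
Bit manipulation 39 / 74
Compare 90 / 300
Complex arithmetic 0 / 62
Cryptography 0 / 35
Data type conversion 153 / 635
Load 165 / 451
Logical 90 / 124
Move 21 / 53
Scalar arithmetic 84 / 184
Shift 232 / 348
Store 120 / 331
Table lookup 16 / 72
Vector arithmetic 421 / 1081
Vector manipulation 212 / 594
"""

# "<group name> <implemented> / <total>"; also tolerates the tab-separated original.
LINE_RE = re.compile(r"^(.+?)\s+(\d+) / (\d+)$")


def parse_report(text):
    """Return a list of (group, implemented, total) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, done, total = match.groups()
            rows.append((name, int(done), int(total)))
    return rows


def largest_gaps(rows, n=3):
    """Return the n groups with the most missing intrinsics."""
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)[:n]


rows = parse_report(REPORT)
implemented = sum(done for _, done, _ in rows)
available = sum(total for _, _, total in rows)
print(f"Total {implemented} / {available} ({100 * implemented / available:.1f}%)")
for name, done, total in largest_gaps(rows):
    print(f"{name}: {total - done} missing")
```

The per-group sums reproduce the 1643 / 4344 total, and the three largest gaps are Vector arithmetic (660 missing), Data type conversion (482), and Vector manipulation (382).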

@howjmay
Owner Author

howjmay commented Mar 4, 2024

That is super useful!!! Thank you!! Are you going to add them to neon2rvv?

@OMaghiarIMG
Contributor

Here are the scripts. I don't know Go, so I used Python.
Scraper:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

# Open in write mode so that re-running the script does not append
# a second header and duplicate rows to an existing file.
with open("neon_intrinsics.csv", 'w') as file:
    file.write("ReturnType,Name,Arguments,Group\n")
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

    driver.get("https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]")
    driver.maximize_window()
    # Dismiss the banner, then wait for the results table to appear.
    driver.find_element(By.XPATH, "//button[text()='Accept and hide this message ']").click()
    wait = WebDriverWait(driver, 5)
    wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'c-table')))

    total_rows = 0  # running row count (renamed from `sum`, which shadowed the builtin)
    # 218 pages of 20 rows each cover all 4344 intrinsics.
    for i in range(0, 218):
        data = driver.page_source
        soup = BeautifulSoup(data, 'html.parser')
        table = soup.find_all(lambda tag: tag.name == "table" and tag.has_attr("class") and ("c-table" in tag.get("class")))[0]
        all_tr = table.find('tbody').find_all('tr')
        total_rows += len(all_tr)
        print(i, total_rows)
        for tr in all_tr:
            td = tr.find_all('td')
            # Arguments contain commas, so that field is wrapped in double quotes.
            file.write(f"{td[2].string},{td[3].string},\"{td[4].string}\",{td[5].string}\n")

        # The "next" button lives inside the shadow DOM of the pagination component.
        element = driver.find_element(By.TAG_NAME, "ads-pagination").shadow_root.find_element(By.CLASS_NAME, "c-pagination-action--next")
        # element.click()
        # Click through JavaScript instead of the direct click left disabled above.
        driver.execute_script("arguments[0].click();", element)
        # Note: the table may already be present, so this wait can return before the new rows load.
        wait = WebDriverWait(driver, 10)
        wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'c-table')))
    driver.quit()
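
The scraper wraps the Arguments column in double quotes by hand because argument lists contain commas. As an alternative sketch (not what the script above does), Python's standard `csv` module applies the same quoting automatically and reads it back intact. The row below is illustrative and not copied from neon_intrinsics.csv; `to_csv_line` and `from_csv_line` are hypothetical helpers.

```python
import csv
import io


def to_csv_line(row):
    """Serialize one row, quoting only the fields that need it."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


def from_csv_line(line):
    """Parse one CSV line back into a list of fields."""
    return next(csv.reader(io.StringIO(line)))


# Illustrative row in the ReturnType,Name,Arguments,Group layout.
row = ["int8x8_t", "vadd_s8", "int8x8_t a, int8x8_t b",
       "Vector arithmetic / Add / Addition"]

line = to_csv_line(row)
print(line, end="")
# The comma inside the Arguments field survives the round trip.
print(from_csv_line(line) == row)
```

Letting the `csv` module handle the quoting would also cope with a field that happens to contain a double quote, which hand-written quoting would not escape.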

Coverage:

import re

import pandas as pd

# Collect the names of all intrinsics defined on lines that start with FORCE_INLINE.
with open("neon2rvv.h", 'r') as file:
    data = file.read()
    result = re.findall(r"^FORCE_INLINE .+? (v.+?)\(.*\)", data, flags=re.MULTILINE)
    intrinsics = set(result)

df = pd.read_csv("neon_intrinsics.csv")

# Optional: drop the f16/poly intrinsics and dump the unimplemented ones.
# for data_type in ["float16_t", "float16x4_t", "float16x8_t", "poly8_t", "poly8x8_t", "poly8x16_t", "poly16_t", "poly16x4_t", "poly16x8_t", "poly64_t", "poly64x1_t", "poly64x2_t", "poly128_t"]:
#     df = df[~df["ReturnType"].str.contains(data_type)]
#     df = df[~df["Arguments"].str.contains(data_type)]
# df = df.reset_index(drop=True)
# df.to_csv("neon_filtered.csv", index=False)
# df_unimplemented = df[~df["Name"].isin(intrinsics)]
# df_unimplemented.to_csv("neon_unimplemented.csv", index=False)

# The first component of "A / B / C" is the primary group.
df["Primary"] = df["Group"].str.split(" / ").str[0]

print("Neon2RVV coverage:")
print("Total", len(intrinsics), "/", len(set(df["Name"].to_list())))

for primary_group in sorted(df["Primary"].unique()):
    # Match the primary group exactly; a substring match could also pick up
    # other groups whose names merely contain this one.
    df_primary = df[df["Primary"] == primary_group]
    primary_set = set(df_primary["Name"].to_list())
    intrinsics_count = len(primary_set)
    intersection = len(intrinsics.intersection(primary_set))
    print(primary_group, "\t", intersection, "/", intrinsics_count)

    for secondary_group in sorted(df_primary["Group"].unique()):
        df_secondary = df_primary[df_primary["Group"] == secondary_group]
        secondary_set = set(df_secondary["Name"].to_list())
        intrinsics_count = len(secondary_set)
        intersection = len(intrinsics.intersection(secondary_set))
        print("\t", secondary_group, "\t", intersection, "/", intrinsics_count)
    print()
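
To make it clearer what the regular expression in the coverage script picks up, here is a minimal sketch that runs the same pattern on a made-up header excerpt. The excerpt is not copied from neon2rvv.h, and it assumes unimplemented intrinsics appear as commented-out declarations; `implemented_intrinsics` is a hypothetical helper.

```python
import re

# Made-up excerpt in the style the pattern expects; not the real header.
SAMPLE_HEADER = """\
FORCE_INLINE int8x8_t vadd_s8(int8x8_t a, int8x8_t b) {
  return a;
}
// FORCE_INLINE int8x8_t vqadd_s8(int8x8_t a, int8x8_t b);
FORCE_INLINE void vst1_s8(int8_t *ptr, int8x8_t val) {
}
"""

# Same pattern as the coverage script: the line must start with FORCE_INLINE,
# and the captured function name must start with "v".
PATTERN = r"^FORCE_INLINE .+? (v.+?)\(.*\)"


def implemented_intrinsics(header_text):
    """Return the set of intrinsic names defined in the header text."""
    return set(re.findall(PATTERN, header_text, flags=re.MULTILINE))


print(sorted(implemented_intrinsics(SAMPLE_HEADER)))  # ['vadd_s8', 'vst1_s8']
```

Because the pattern is anchored at the start of the line, the commented-out `vqadd_s8` declaration is not counted as implemented.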

@howjmay
Owner Author

howjmay commented Mar 15, 2024

I have been busy recently. I will add the missing intrinsics in the coming week.

@howjmay
Owner Author

howjmay commented Mar 27, 2024

All added.
