Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove merge-files option #1407

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open

Conversation

ellosel
Copy link
Contributor

@ellosel ellosel commented Nov 28, 2024

In efforts to simplify TensileCreateLibrary and recognizing that primary consumers exclusively use merge-files, we are removing the merge-files option along with num-merge-files. If this option is needed in the future, we can restore this behavior but in such a way that we follow the open/closed principle.

@KKyang
Copy link
Contributor

KKyang commented Nov 28, 2024

We still need mergeFiles=False feature when tuning using config yaml in TensileLite.

@ellosel
Copy link
Contributor Author

ellosel commented Nov 28, 2024

We still need mergeFiles=False feature when tuning using config yaml in TensileLite.

Can you explain how it is used in that context? What do you mean by

tuning using config yaml in TensileLite

There are tests that specify MergeFiles=False but they pass with MergeFiles=True. Is MergeFiles used during the current tuning process? We may still want to remove it from TensileCreateLibrary but expose that option through Tensile.

@bstefanuk
Copy link
Contributor

I am also interested in learning more about how MergeFiles=False is used during tuning.

Either way, this update cleans up a lot of (what appears to be) unused code in TensileCreateLibrary.

@KKyang
Copy link
Contributor

KKyang commented Nov 29, 2024

MergeFiles=False separates the code object, and makes us easier to modify and compile them individually when we modify the assembly files directly.

@ellosel
Copy link
Contributor Author

ellosel commented Nov 29, 2024

MergeFiles=False separates the code object, and makes us easier to modify and compile them individually when we modify the assembly files directly.

A few more questions

  • "separates the code object" does that mean one code object per kernel?
  • "easier to modify and compile them" can you provide some examples of what this looks like with and without merge files? What commands are you running to compile them individually?

@ellosel
Copy link
Contributor Author

ellosel commented Dec 1, 2024

MergeFiles=False separates the code object, and makes us easier to modify and compile them individually when we modify the assembly files directly.

I ran the NN problem from Tensile/Tests/common/f64.yaml and the assembly directory is the same for MergeFiles: False and MergeFiles: True. This is the tree result for part of 1_BenchmarkProblems/Cijk_Ailk_Bljk_DB_UserArgs_00/source with MergeFiles: True:

|   -- source
|       |-- ClientParameters.ini
|       |-- KernelHeader.h
|       |-- Kernels.cpp
|       |-- Kernels.h
|       |-- ReductionTemplate.h
|       |-- TensileTypes.h
|       |-- build_tmp
|       |   -- SOURCE
|       |       |-- assembly
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_TiYsBK1qo5u_rR8bHXARgCxwG4aIxobG-s_2dvbO5-I=.co
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_TiYsBK1qo5u_rR8bHXARgCxwG4aIxobG-s_2dvbO5-I=.o
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_TiYsBK1qo5u_rR8bHXARgCxwG4aIxobG-s_2dvbO5-I=.s
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_x7ORMQlR_9_oGuVRV9GQ5Ynxdwb4dbXm11s-9-V_ARE=.co
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_x7ORMQlR_9_oGuVRV9GQ5Ynxdwb4dbXm11s-9-V_ARE=.o
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_x7ORMQlR_9_oGuVRV9GQ5Ynxdwb4dbXm11s-9-V_ARE=.s
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_S7tQReIL4OeSmZJFDvasrLnCParJHbrixdWXVTjawBRk=.co
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_S7tQReIL4OeSmZJFDvasrLnCParJHbrixdWXVTjawBRk=.o
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_S7tQReIL4OeSmZJFDvasrLnCParJHbrixdWXVTjawBRk=.s
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_SS43WgNuDDfcoUUtYuPMhpvaPoamjTmjo3qLYaESotP0=.co
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_SS43WgNuDDfcoUUtYuPMhpvaPoamjTmjo3qLYaESotP0=.o
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_SS43WgNuDDfcoUUtYuPMhpvaPoamjTmjo3qLYaESotP0=.s
|       |       |   |-- asm-new.sh
|       |       |   -- insert_byte_array.py

This is the same result for MergeFiles: False:

|   -- source
|       |-- ClientParameters.ini
|       |-- KernelHeader.h
|       |-- Kernels
|       |   |-- Cijk_DD_PostGSU16_VW1.cpp
|       |   |-- Cijk_DD_PostGSU16_VW1.h
|       |   |-- Cijk_DD_PostGSU16_VW2.cpp
|       |   |-- Cijk_DD_PostGSU16_VW2.h
|       |   |-- Cijk_D_GA.cpp
|       |   `-- Cijk_D_GA.h
|       |-- ReductionTemplate.h
|       |-- Solutions
|       |-- TensileTypes.h
|       |-- build_tmp
|       |   -- SOURCE
|       |       |-- assembly
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_TiYsBK1qo5u_rR8bHXARgCxwG4aIxobG-s_2dvbO5-I=.co
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_TiYsBK1qo5u_rR8bHXARgCxwG4aIxobG-s_2dvbO5-I=.o
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_TiYsBK1qo5u_rR8bHXARgCxwG4aIxobG-s_2dvbO5-I=.s
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_x7ORMQlR_9_oGuVRV9GQ5Ynxdwb4dbXm11s-9-V_ARE=.co
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_x7ORMQlR_9_oGuVRV9GQ5Ynxdwb4dbXm11s-9-V_ARE=.o
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_x7ORMQlR_9_oGuVRV9GQ5Ynxdwb4dbXm11s-9-V_ARE=.s
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_S7tQReIL4OeSmZJFDvasrLnCParJHbrixdWXVTjawBRk=.co
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_S7tQReIL4OeSmZJFDvasrLnCParJHbrixdWXVTjawBRk=.o
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_S7tQReIL4OeSmZJFDvasrLnCParJHbrixdWXVTjawBRk=.s
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_SS43WgNuDDfcoUUtYuPMhpvaPoamjTmjo3qLYaESotP0=.co
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_SS43WgNuDDfcoUUtYuPMhpvaPoamjTmjo3qLYaESotP0=.o
|       |       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_SS43WgNuDDfcoUUtYuPMhpvaPoamjTmjo3qLYaESotP0=.s
|       |       |   |-- asm-new.sh
|       |       |   -- insert_byte_array.py

Note that the primary difference is that with MergeFiles: True we have Kernels.cpp and Kernels.h because we merge all of the kernel helpers into one translation unit; whereas for MergeFiles: False, each of the kernel helpers has a dedicated translation unit.

MergeFiles also impacts the .hsaco and .co output. When MergeFiles: True we have the following library directory:

|       |-- library
|       |   |-- Kernels.so-000-gfx1010.hsaco
|       |   |-- Kernels.so-000-gfx1011.hsaco
|       |   |-- Kernels.so-000-gfx1012.hsaco
|       |   |-- Kernels.so-000-gfx1030.hsaco
|       |   |-- Kernels.so-000-gfx1100.hsaco
|       |   |-- Kernels.so-000-gfx1101.hsaco
|       |   |-- Kernels.so-000-gfx1102.hsaco
|       |   |-- Kernels.so-000-gfx1200.hsaco
|       |   |-- Kernels.so-000-gfx1201.hsaco
|       |   |-- Kernels.so-000-gfx803.hsaco
|       |   |-- Kernels.so-000-gfx900.hsaco
|       |   |-- Kernels.so-000-gfx906-xnack-.hsaco
|       |   |-- Kernels.so-000-gfx908-xnack-.hsaco
|       |   |-- Kernels.so-000-gfx90a-xnack+.hsaco
|       |   |-- Kernels.so-000-gfx90a-xnack-.hsaco
|       |   |-- Kernels.so-000-gfx940-xnack-.hsaco
|       |   |-- Kernels.so-000-gfx941-xnack-.hsaco
|       |   |-- Kernels.so-000-gfx942-xnack-.hsaco
|       |   |-- TensileLibrary.yaml
|       |   `-- TensileLibrary_gfx90a.co

which we can contrast with MergeFiles: False and we have:

       |-- library
|       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_TiYsBK1qo5u_rR8bHXARgCxwG4aIxobG-s_2dvbO5-I=_gfx90a.co
|       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x16_MI16x16x1_x7ORMQlR_9_oGuVRV9GQ5Ynxdwb4dbXm11s-9-V_ARE=_gfx90a.co
|       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_S7tQReIL4OeSmZJFDvasrLnCParJHbrixdWXVTjawBRk=_gfx90a.co
|       |   |-- Cijk_Ailk_Bljk_DB_UserArgs_MT16x16x8_MI16x16x1_SS43WgNuDDfcoUUtYuPMhpvaPoamjTmjo3qLYaESotP0=_gfx90a.co
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1010.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1011.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1012.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1030.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1100.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1101.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1102.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1200.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx1201.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx803.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx900.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx906-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx908-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx90a-xnack+.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx90a-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx940-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx941-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW1.so-000-gfx942-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1010.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1011.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1012.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1030.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1100.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1101.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1102.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1200.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx1201.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx803.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx900.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx906-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx908-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx90a-xnack+.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx90a-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx940-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx941-xnack-.hsaco
|       |   |-- Cijk_DD_PostGSU16_VW2.so-000-gfx942-xnack-.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1010.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1011.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1012.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1030.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1100.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1101.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1102.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1200.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx1201.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx803.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx900.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx906-xnack-.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx908-xnack-.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx90a-xnack+.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx90a-xnack-.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx940-xnack-.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx941-xnack-.hsaco
|       |   |-- Cijk_D_GA.so-000-gfx942-xnack-.hsaco
|       |   `-- TensileLibrary.yaml

Note that we now have many more hsaco files and four co files rather than one.

I don't understand how the difference in these outputs makes it easier to modify an assembly file and compile. Are you using asm-new.sh? Is it easier because you have fewer files that you have to pass to asm-new.sh? It seems to me that you want to be able to incrementally modify and build assembly and maybe merge files isn't the only way to solve that problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants