oneAPI backend update: kernel and layer optimizations #1246


Draft
wants to merge 23 commits into main

Commits (23)
70323c9
Init: add examples
haoyanwa Feb 20, 2025
4162599
Input and output DMA.
haoyanwa Feb 21, 2025
34f0d82
Added streaming beat control signal.
haoyanwa Feb 21, 2025
951a1f6
Restartable kernel for io_parallel.
haoyanwa Feb 21, 2025
8445de7
Updated oneAPI backend testbench.
haoyanwa Feb 24, 2025
0d21e99
Updated oneAPI template: io_stream kernel template.
haoyanwa Feb 24, 2025
257385a
Remove temp files.
haoyanwa Feb 24, 2025
0b8ef13
Refactoring oneAPI backend myproject_test.
haoyanwa Feb 24, 2025
cf98216
Merge branch 'fastmachinelearning:main' into oneapi_backend/experiment
haoyanwa Feb 24, 2025
70054aa
Cosmetic change.
haoyanwa Feb 24, 2025
c307715
oneAPI backend simulation support.
haoyanwa Feb 25, 2025
454d556
Merge branch 'main' into oneapi_backend/experiment
jmitrevs Mar 6, 2025
7e028e6
pre-commit fixes
jmitrevs Mar 26, 2025
97c187d
Merge branch 'main' into oneapi_backend/experiment
jmitrevs Mar 26, 2025
00f82a3
oneAPI BSP support.
haoyanwa Apr 1, 2025
496846d
User API and documentation.
haoyanwa Apr 1, 2025
84ad787
Merge pull request #1254 from haoyanwa/oneapi_backend/experiment
jmitrevs Apr 2, 2025
120c2e4
pre-commit fixes
jmitrevs Apr 2, 2025
e2cec76
Merge branch 'main' into oneapi_backend/experiment
jmitrevs Apr 16, 2025
d869a5c
update convert_data and convert_data_back to use packets
jmitrevs Apr 17, 2025
7e2e747
consolidate convert_data and DMA_convert_data in nnet_data_movement.h
jmitrevs May 1, 2025
0b3dbeb
update all the activations
jmitrevs May 2, 2025
36881e0
migrate batchnorm to restartable
jmitrevs May 2, 2025
21 changes: 21 additions & 0 deletions docs/api/configuration.rst
@@ -101,6 +101,20 @@ Finally, one then uses the configuration to create an hls model:
backend='Vitis'
)

To target a oneAPI Board Support Package (BSP) enabled FPGA for offload acceleration, set the ``part`` argument to the path to your BSP followed by the BSP variant, and set ``use_oneapi_bsp=True``.

.. code-block:: python

hls_model = hls4ml.converters.convert_from_keras_model(
model,
hls_config=config,
output_dir="my_project_dir",
io_type="io_parallel",
backend="oneAPI",
part="/path/to/my/bsp:bsp_variant",
use_oneapi_bsp=True
)

See :py:class:`~hls4ml.converters.convert_from_keras_model` for more information on the various options. Similar functions exist for ONNX and PyTorch.
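
The PyTorch converter accepts the same backend-related arguments; a minimal sketch, assuming the usual ``input_shape`` argument for PyTorch models and a placeholder BSP path:

.. code-block:: python

    hls_model = hls4ml.converters.convert_from_pytorch_model(
        model,
        input_shape,
        hls_config=config,
        output_dir="my_project_dir",
        io_type="io_parallel",
        backend="oneAPI",
        part="/path/to/my/bsp:bsp_variant",
        use_oneapi_bsp=True,
    )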

----
@@ -132,6 +146,9 @@ It looks like this:
ClockPeriod: 5
IOType: io_parallel # options: io_parallel/io_stream

# oneAPI Offload Acceleration flag.
UseOneAPIBSP: True

HLSConfig:
Model:
Precision: fixed<16,6>
@@ -168,6 +185,10 @@ For Vivado backend the options are:
* **PipelineInterval**\ : Optionally override the desired initiation interval of the design. Only valid in combination with "pipeline" style. If unspecified, it is left to the compiler to decide, ideally matching the largest reuse factor of the network.
* **Precision**\ : this defines the precision of your inputs, outputs, weights and biases. It is denoted by ``fixed<X,Y>``\ , where ``Y`` is the number of bits representing the signed number above the binary point (i.e. the integer part), and ``X`` is the total number of bits. Additionally, integers in the type (\ ``int<N>``\ , where ``N`` is a bit-size from 1 to 1024) can also be used. The format follows ``ap_fixed`` and ``ap_int`` conventions. You have a chance to further configure this more finely with per-layer configuration described below. In the per-layer configuration (but not globally) one can also use ``'auto'`` precision.

For the oneAPI backend the options are similar but also include:

* **UseOneAPIBSP**\ : set to ``True`` to enable offload acceleration with a oneAPI Board Support Package (BSP) enabled Altera FPGA; the BSP path and variant are supplied through the **Part** setting. This is only needed if you are using oneAPI in the accelerator style.

2.2 Per-Layer Configuration
---------------------------

10 changes: 6 additions & 4 deletions hls4ml/backends/oneapi/oneapi_backend.py
@@ -130,11 +130,13 @@ def get_default_flow(self):
def get_writer_flow(self):
return self._writer_flow

def create_initial_config(self, part='Arria10', clock_period=5, io_type='io_parallel', write_tar=False, **_):
def create_initial_config(
self, part='Agilex7', clock_period=5, io_type='io_parallel', write_tar=False, use_oneapi_bsp=False, **_
):
"""Create initial configuration of the oneAPI backend.

Args:
part (str, optional): The FPGA part to be used. Defaults to 'Arria10'.
part (str, optional): The FPGA part to be used. Defaults to 'Agilex7'.
clock_period (int, optional): The clock period. Defaults to 5.
io_type (str, optional): Type of implementation used. One of
'io_parallel' or 'io_stream'. Defaults to 'io_parallel'.
@@ -146,15 +148,15 @@ def create_initial_config(self, part='Arria10', clock_period=5, io_type='io_para

config = {}

config['Part'] = part if part is not None else 'Arria10'
config['Part'] = part if part is not None else 'Agilex7'
config['ClockPeriod'] = clock_period
config['IOType'] = io_type
config['HLSConfig'] = {}
config['WriterConfig'] = {
# TODO: add namespace
'WriteTar': write_tar,
}

config['UseOneAPIBSP'] = use_oneapi_bsp
return config

def compile(self, model):
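
For reference, a minimal sketch of how the new defaults and the ``use_oneapi_bsp`` flag surface when building a configuration by hand (assuming the backend object is obtained through ``hls4ml.backends.get_backend``; the BSP path is a placeholder):

    # Sketch only: fetch the oneAPI backend and build an initial config.
    import hls4ml

    backend = hls4ml.backends.get_backend('oneAPI')
    config = backend.create_initial_config(
        part='/path/to/bsp:variant',  # defaults to 'Agilex7' if not given
        clock_period=5,
        io_type='io_stream',
        use_oneapi_bsp=True,
    )
    assert config['UseOneAPIBSP'] is True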
28 changes: 19 additions & 9 deletions hls4ml/backends/oneapi/oneapi_types.py
@@ -170,11 +170,18 @@ def definition_cpp(self, name_suffix='', as_reference=False):
else:
return f'{self.type.name} {self.name}{name_suffix}'

def declare_cpp(self, pipe_min_size=0, indent=''):
lines = indent + f'class {self.pipe_id};\n'
lines += indent + (
f'using {self.pipe_name} = sycl::ext::intel::experimental::pipe<{self.pipe_id}, '
+ f'{self.type.name}, {pipe_min_size}, PipeProps>;\n'
# Updated pipe min size to be 32 for simulation.
def declare_cpp(self, pipe_min_size=32, indent=''):
# Updated to use streaming beat for restartable streaming kernel.
# Streaming beat is a wrapper type of the actual type with sideband control signals.
# Syntax: using BeatT = sycl::ext::intel::experimental::StreamingBeat<DataT, eop, empty>;
streaming_beat_t = f"{self.pipe_name}BeatT"
lines = (
f"{indent}class {self.pipe_id};\n"
f"{indent}using {streaming_beat_t} = "
f"sycl::ext::intel::experimental::StreamingBeat<{self.type.name}, true, true>;\n"
f"{indent}using {self.pipe_name} = sycl::ext::intel::experimental::pipe<"
f"{self.pipe_id}, {streaming_beat_t}, {pipe_min_size}, HostPipePropertiesT>;\n"
)
return lines

@@ -193,10 +200,13 @@ def definition_cpp(self, name_suffix='', as_reference=True):
return f'{self.name}{name_suffix}'

def declare_cpp(self, indent=''):
lines = indent + f'class {self.pipe_id};\n'
lines += indent + (
f'using {self.pipe_name} = sycl::ext::intel::experimental::pipe<{self.pipe_id}, '
+ f'{self.type.name}, {self.pragma[-1]}>;\n'
streaming_beat_t = f"{self.pipe_name}BeatT"
lines = (
f"{indent}class {self.pipe_id};\n"
f"{indent}using {streaming_beat_t} = "
f"sycl::ext::intel::experimental::StreamingBeat<{self.type.name}, true, true>;\n"
f"{indent}using {self.pipe_name} = "
f"sycl::ext::intel::experimental::pipe<{self.pipe_id}, {streaming_beat_t}, {self.pragma[-1]}>;\n"
)
return lines

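
To make the generated text concrete, a rough sketch of the C++ declaration the first ``declare_cpp`` above is expected to emit, for a hypothetical pipe with ``pipe_id='Layer1OutPipeID'``, ``pipe_name='Layer1OutPipe'``, data type ``layer1_t``, and the default depth of 32 (all names are illustrative, not taken from a real model):

    # Illustrative only: expected output of the host-pipe declare_cpp template above.
    expected = (
        "class Layer1OutPipeID;\n"
        "using Layer1OutPipeBeatT = "
        "sycl::ext::intel::experimental::StreamingBeat<layer1_t, true, true>;\n"
        "using Layer1OutPipe = sycl::ext::intel::experimental::pipe<"
        "Layer1OutPipeID, Layer1OutPipeBeatT, 32, HostPipePropertiesT>;\n"
    )
    print(expected)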
11 changes: 8 additions & 3 deletions hls4ml/templates/oneapi/CMakeLists.txt
@@ -38,15 +38,20 @@ set(LIBRARY_NAME myproject-${LIB_STAMP})
# You can also specify a device family (E.g. "Arria10" or "Stratix10") or a
# specific part number (E.g. "10AS066N3F40E2SG") to generate a standalone IP.
if(NOT DEFINED FPGA_DEVICE)
set(FPGA_DEVICE "Arria10")
set(FPGA_DEVICE "Agilex7")
set(BSP_FLAG "")
endif()

# Set the target to a BSP if we target an actual accelerator board.
# hls-fpga-machine-learning insert oneapi_bsp_cmake_flag

# Use cmake -DUSER_FPGA_FLAGS=<flags> to set extra flags for FPGA backend
# compilation.
set(USER_FPGA_FLAGS -Wno-unused-label ${USER_FPGA_FLAGS})
# -Xsoptimize=latency turns off the hyper-optimized handshake.
set(USER_FPGA_FLAGS -Wno-unused-label;${USER_FPGA_FLAGS};-Xsoptimize=latency)

# Use cmake -DUSER_FLAGS=<flags> to set extra flags for general compilation.
set(USER_FLAGS -Wno-unused-label -fconstexpr-steps=134217728 ${USER_FLAGS})
set(USER_FLAGS -Wno-unused-label -fconstexpr-steps=134217728 ${USER_FLAGS} ${BSP_FLAG})

# Use cmake -DUSER_INCLUDE_PATHS=<paths> to set extra paths for general
# compilation.
13 changes: 10 additions & 3 deletions hls4ml/templates/oneapi/firmware/myproject.h
@@ -5,8 +5,12 @@

// This file defines the interface to the kernel

// currently this is fixed
using PipeProps = decltype(sycl::ext::oneapi::experimental::properties(sycl::ext::intel::experimental::ready_latency<0>));
// Pipe properties for host pipes. Host pipes connect to the data source DMA and sink DMA.
// They are connected to the first and the last layer to stream data into and out of the kernel.
using HostPipePropertiesT = decltype(sycl::ext::oneapi::experimental::properties(
sycl::ext::intel::experimental::ready_latency<0>, sycl::ext::intel::experimental::bits_per_symbol<16>,
sycl::ext::intel::experimental::uses_valid<true>, sycl::ext::intel::experimental::first_symbol_in_high_order_bits<true>,
sycl::ext::intel::experimental::protocol_avalon_streaming_uses_ready));

// Need to declare the input and output pipes

@@ -16,12 +20,15 @@ using PipeProps = decltype(sycl::ext::oneapi::experimental::properties(sycl::ext
class MyProjectID;

struct MyProject {

#ifndef IS_BSP
// Kernel property method to configure the invocation interface.
auto get(sycl::ext::oneapi::experimental::properties_tag) {
return sycl::ext::oneapi::experimental::properties{sycl::ext::intel::experimental::streaming_interface<>,
sycl::ext::intel::experimental::pipelined<>};
}
#else
// Kernel properties and pipelining are not supported when targeting a BSP (accelerator style).
#endif

SYCL_EXTERNAL void operator()() const;
};
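
From the user's side, the host-pipe and DMA plumbing declared above stays behind the usual hls4ml API; a minimal sketch of exercising the generated oneAPI kernel in simulation (input shape and data are placeholders):

    # Sketch only: compile the generated oneAPI project and run inference,
    # which streams data through the host pipes declared above.
    import numpy as np

    hls_model.compile()
    X = np.random.rand(10, 16).astype(np.float32)  # placeholder batch of 10 samples, 16 features
    y_sim = hls_model.predict(X)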