XuehaoSun/test-azure

  • Step 2: Enable pruning functionalities

    [Experimental option] Modify model and optimizer.

Task request description

  • script_url (str): The URL to download the model archive.
  • optimized (bool): If True, the model script has already been optimized by Neural Coder.
  • arguments (List[Union[int, str]], optional): Arguments that are needed for running the model.
  • approach (str, optional): The optimization approach supported by Neural Coder.
  • requirements (List[str], optional): The environment requirements.
  • priority (int, optional): The importance of the task. Valid values are 1, 2, and 3, where 1 is the highest priority.
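
As an illustration of the fields above, here is a minimal sketch of how a client might assemble and sanity-check a task request before submitting it. The helper name `build_task_request`, the example URL, the argument values, and the `approach` placeholder are all assumptions made for this example; the service's actual client code and accepted `approach` values are not described here.

```python
import json
from typing import List, Optional, Union

VALID_PRIORITIES = (1, 2, 3)  # 1 is the highest priority


def build_task_request(
    script_url: str,
    optimized: bool,
    arguments: Optional[List[Union[int, str]]] = None,
    approach: Optional[str] = None,
    requirements: Optional[List[str]] = None,
    priority: Optional[int] = None,
) -> dict:
    """Hypothetical helper: build a request dict, leaving out unset optional fields."""
    if not isinstance(script_url, str) or not script_url:
        raise ValueError("script_url must be a non-empty string")
    if not isinstance(optimized, bool):
        raise TypeError("optimized must be a bool")
    if priority is not None and priority not in VALID_PRIORITIES:
        raise ValueError("priority must be 1, 2, or 3")

    request = {"script_url": script_url, "optimized": optimized}
    optional_fields = {
        "arguments": arguments,
        "approach": approach,
        "requirements": requirements,
        "priority": priority,
    }
    # Only include optional fields the caller actually provided.
    request.update({k: v for k, v in optional_fields.items() if v is not None})
    return request


example = build_task_request(
    script_url="https://example.com/model_archive.zip",  # placeholder URL
    optimized=False,
    arguments=["--batch_size", 8],        # placeholder arguments
    approach="placeholder_approach",      # placeholder, not a confirmed value
    requirements=["torch"],               # placeholder requirement
    priority=1,
)
print(json.dumps(example, indent=2))
```

Optional fields are omitted rather than sent as `null`, which keeps the payload aligned with the "optional" markers in the field list; whether the service expects that is an assumption.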

Design Doc for Optimization as a Service [WIP]

Security Policy

Report a Vulnerability

Please report security issues or vulnerabilities to the Intel® Security Center.

For more information on how Intel® works to resolve security issues, see Vulnerability Handling Guidelines.

Model inference: Roughly speaking, two key steps are required to get the model's result. The first is moving the model weights from memory to cache piece by piece, where memory bandwidth $B$ and parameter count $P$ are the key factors; theoretically, the time cost is $4P/B$ (with 4 bytes per parameter, as for FP32 weights). The second is computation, where the device's compute capacity $C$, measured in FLOPS, and the forward FLOPs $F$ play the key roles; theoretically, the time cost is $F/C$.

Text generation: The best-known application of LLMs is text generation, which predicts the next token/word based on the inputs/context. To generate a sequence of text, the tokens must be predicted one by one. In this scenario, $F\approx P$ if some operations such as bmm are ignored and past key values have been cached. However, the $C/B$ ratio of a modern device can be as high as 100X, which makes memory bandwidth the bottleneck in this scenario.
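
To make the two cost terms concrete, the following back-of-the-envelope sketch evaluates $4P/B$ and $F/C$ for one generated token. All numbers are hypothetical and chosen only so that $C/B = 100$; they do not describe any particular device or model.

```python
def memory_time_s(param_count, bandwidth_bytes_per_s, bytes_per_param=4):
    """Time to stream all weights once: P * bytes_per_param / B."""
    return param_count * bytes_per_param / bandwidth_bytes_per_s


def compute_time_s(forward_flops, device_flops_per_s):
    """Time spent on arithmetic: F / C."""
    return forward_flops / device_flops_per_s


# Hypothetical figures, for illustration only.
P = 7e9      # parameter count
B = 100e9    # memory bandwidth in bytes/s (100 GB/s)
C = 10e12    # compute capacity in FLOPS (10 TFLOPS), so C/B = 100
F = P        # forward FLOPs per generated token, using F ~ P from the text

t_mem = memory_time_s(P, B)    # about 0.28 s
t_comp = compute_time_s(F, C)  # about 0.0007 s

print(f"weight movement: {t_mem:.4f} s per token")
print(f"computation:     {t_comp:.4f} s per token")
print(f"ratio:           {t_mem / t_comp:.0f}x")
```

Under these assumptions, moving the weights takes roughly 400 times longer than the arithmetic (the ratio is $4C/B$ when $F = P$), which is why reducing the number of bytes per parameter directly shortens per-token latency in this regime.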

| Tables   | Are           | Cool  |
|----------|---------------|-------|
| col 1 is | left-aligned  | $1600 |
| col 2 is | centered      | $12   |
| col 3 is | right-aligned |       |
failed log testtest
testtest
failed log testtest
testtest
failed log testtest
testtest
testtest
testtest
|          | Base coverage | PR coverage | Diff   |
|----------|---------------|-------------|--------|
| Lines    | 86.965%       | 86.973%     | 0.008% |
| Branches | 76.279%       | 76.302%     | 0.023% |
