Rfa parallel #6595

Draft: wants to merge 10 commits into master


@SAtacker (Contributor) commented Dec 25, 2024

Proposed Changes

Any background context you want to provide?

RFA + RFA: ~39 floating-point operations
RFA + Decimal: ~33 floating-point operations
RFA → float conversion: ~12-16 floating-point operations
  • For reduction, we use the following scheme:
    1. Sequentially reduce each vector partition of a given size.
    2. Store the result of step 1 in an RFA object without converting it to a float (an intermediate RFA).
    3. Add the intermediate RFAs together and convert to a float only at return.
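The three reduction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `fixed_point_accumulator` and `reproducible_sum` are hypothetical names, and the accumulator is a simple fixed-point stand-in (each value is rounded once to a 2^-30 grid and the grid units are summed exactly as 64-bit integers) rather than a binned RFA. It shares the property the scheme relies on: its addition is associative, so partial results can be combined in any order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an RFA, NOT the binned accumulator from this PR:
// every value is rounded once to a fixed grid of 2^-30, and the grid units are
// summed exactly in a 64-bit integer, so the total does not depend on the
// order of additions. Assumes finite inputs and |sum| < 2^33 (no overflow).
struct fixed_point_accumulator
{
    static constexpr double scale = 1073741824.0;    // 2^30
    std::int64_t units = 0;

    void add(double x) { units += std::llround(x * scale); }
    void add(fixed_point_accumulator const& other) { units += other.units; }
    double to_double() const { return static_cast<double>(units) / scale; }
};

// Steps 1-3 of the reduction; `partition_size` is a hypothetical parameter.
// Partitions are reduced one after another here, but each could be its own task.
double reproducible_sum(std::vector<double> const& data, std::size_t partition_size)
{
    assert(partition_size > 0);
    std::vector<fixed_point_accumulator> partials;
    for (std::size_t begin = 0; begin < data.size(); begin += partition_size)
    {
        std::size_t const end = (std::min)(begin + partition_size, data.size());
        fixed_point_accumulator acc;    // step 1: sequential reduce of one partition
        for (std::size_t i = begin; i != end; ++i)
            acc.add(data[i]);
        partials.push_back(acc);    // step 2: keep the intermediate, do not convert
    }
    fixed_point_accumulator total;    // step 3: add the intermediates...
    for (auto const& partial : partials)
        total.add(partial);
    return total.to_double();    // ...and convert to double once, at return
}
```

Step 2 is what preserves determinism: converting each partial to `double` before step 3 would reintroduce order-dependent rounding. A binned RFA typically avoids this sketch's fixed range and precision by adapting its bins to the magnitudes it sees, which is part of why its additions cost the ~33-39 operations listed above.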

Checklist

Not all points below apply to all pull requests.

  • I have added a new algorithm and have added tests to go along with it.
  • I have added a test using random numbers; I have made sure it uses a seed, and that random numbers generated are valid inputs for the tests.

Signed-off-by: Shreyas Atre <[email protected]>
- Add Kate's CUDA implementation of RFA

TODO:
- Use the original RFA instead of Kate's
- Make a parallel version of it
- Make a version suitable for partitioned vectors

Signed-off-by: Shreyas Atre <[email protected]>
- Also perform renormalization and updates only when necessary

Signed-off-by: Shreyas Atre <[email protected]>
- Currently supports the sequential version of the reduction
- Tests determinism by shuffling the order of the floating-point numbers
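A seeded shuffle test along the lines of this commit might look like the minimal sketch below. `is_order_independent` and `naive_sum` are hypothetical helpers, not functions from the PR or from HPX; plain `double` summation serves as a negative control because floating-point addition is not associative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Hypothetical test helper: returns true if `sum` gives the same result
// (compared with ==) for `trials` seeded shuffles of `data`. The fixed seed
// makes a failure repeatable with a given standard library; the exact
// sequence produced by std::shuffle may differ between implementations.
bool is_order_independent(
    std::function<double(std::vector<double> const&)> const& sum,
    std::vector<double> data, std::uint32_t seed, int trials)
{
    assert(trials > 0 && "zero trials would pass vacuously");
    double const reference = sum(data);
    std::mt19937 gen(seed);
    for (int t = 0; t < trials; ++t)
    {
        std::shuffle(data.begin(), data.end(), gen);
        if (sum(data) != reference)    // exact comparison on purpose
            return false;
    }
    return true;
}

// Plain left-to-right summation: a non-reproducible baseline.
double naive_sum(std::vector<double> const& values)
{
    double total = 0.0;
    for (double x : values)
        total += x;
    return total;
}
```

For example, with `{1e16, 1.0, -1e16}` the 1.0 is lost to rounding whenever it is added to ±1e16, so the naive sum is 1 only when the 1.0 is added last and 0 otherwise; small integers, by contrast, sum exactly in any order.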

Signed-off-by: Shreyas Atre <[email protected]>
- add missing includes
- prevent max/min from being expanded as macros
- minor spelling correction
- remove #pragma once from the .cpp file
- resolve implicit type conversions from the RFA type to single and double precision, and in other places
- add dual license
- remove unnecessary command from macOS CI
- use HPX_UNROLL instead of a plain pragma
- clang-17 cannot unroll, so use checks
- add typename qualifier for the iterator type
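Two of the portability fixes listed above correspond to standard C++ idioms. The sketch below shows them with hypothetical functions (`clamp_magnitude`, `largest`) rather than the PR's code: parenthesizing `std::min`/`std::max` so the `min`/`max` macros from `<windows.h>` cannot expand, and qualifying a dependent nested type with `typename`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// <windows.h> defines function-like `min`/`max` macros unless NOMINMAX is
// defined. Parenthesizing the name, (std::max)(a, b), keeps the preprocessor
// from treating it as a macro invocation. `clamp_magnitude` is hypothetical.
template <typename T>
T clamp_magnitude(T value, T limit)
{
    return (std::min)((std::max)(value, -limit), limit);
}

// In a template, a nested name of a dependent type such as
// std::iterator_traits<Iter>::value_type must be prefixed with `typename`
// so the compiler parses it as a type (C++20 relaxed this in a few contexts).
// `largest` is hypothetical.
template <typename Iter>
typename std::iterator_traits<Iter>::value_type largest(Iter first, Iter last)
{
    assert(first != last);    // precondition: non-empty range
    using value_type = typename std::iterator_traits<Iter>::value_type;
    value_type best = *first;
    for (++first; first != last; ++first)
        best = (std::max)(best, *first);
    return best;
}
```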

Signed-off-by: Shreyas Atre <[email protected]>
@StellarBot

Performance test report

HPX Performance

Comparison

| Benchmark | Results: FORK_JOIN_EXECUTOR / PARALLEL_EXECUTOR / SCHEDULER_EXECUTOR |
|---|---|
| For Each | `------` |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-12-25T15:11:05+00:00 |
| HPX Commit | d27ac2e | c7b41d9 |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:18:04.949759-05:00 | 2024-12-31T09:19:36.219213-06:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.8/bin/clang++ 18.1.8 |
| Envfile | | |

Comparison

| Benchmark | NO-EXECUTOR |
|---|---|
| Future Overhead - Create Thread Hierarchical - Latch | `--` |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-12-25T15:11:05+00:00 |
| HPX Commit | d27ac2e | c7b41d9 |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:19:53.062988-05:00 | 2024-12-31T09:21:27.027599-06:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.8/bin/clang++ 18.1.8 |
| Envfile | | |

Comparison

| Benchmark | Results: FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR / PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR / SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
|---|---|
| Stream Benchmark - Add | `------` |
| Stream Benchmark - Scale | `-------` |
| Stream Benchmark - Triad | `------` |
| Stream Benchmark - Copy | `-------` |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-12-25T15:11:05+00:00 |
| HPX Commit | d27ac2e | c7b41d9 |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:20:13.002391-05:00 | 2024-12-31T09:21:48.847830-06:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.8/bin/clang++ 18.1.8 |
| Envfile | | |

Explanation of Symbols

| Symbol | Meaning |
|---|---|
| `=` | No performance change (confidence interval within ±1%) |
| `(=)` | Probably no performance change (confidence interval within ±2%) |
| `(+)`/`(-)` | Very small performance improvement/degradation (≤1%) |
| `+`/`-` | Small performance improvement/degradation (≤5%) |
| `++`/`--` | Large performance improvement/degradation (≤10%) |
| `+++`/`---` | Very large performance improvement/degradation (>10%) |
| `?` | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
| `??` | Unclear result, very large uncertainty (±10%) |
| `???` | Something unexpected… |


codacy-production bot commented Dec 31, 2024

Coverage summary from Codacy


| Coverage variation | Diff coverage |
|---|---|
| Report missing for 6ff0c9d [1] | 75.51% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (6ff0c9d) | Report Missing | Report Missing | Report Missing |
| Head commit (c205a13) | 191888 | 163223 | 85.06% |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#6595) | 535 | 404 | 75.51% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
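As a quick check of the formulas above, the reported percentages follow directly from the line counts in the tables (the coverage variation cannot be computed, since the common ancestor's report is missing):

```latex
\text{diff coverage} = \frac{404}{535} \times 100\% \approx 75.51\%
\qquad
\text{head coverage} = \frac{163223}{191888} \times 100\% \approx 85.06\%
```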


Codacy stopped sending the deprecated coverage status on June 5th, 2024.

Footnotes

  1. Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

@StellarBot

Performance test report

HPX Performance

Comparison

| Benchmark | Results: FORK_JOIN_EXECUTOR / PARALLEL_EXECUTOR / SCHEDULER_EXECUTOR |
|---|---|
| For Each | `------` |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2025-01-01T16:07:10+00:00 |
| HPX Commit | d27ac2e | f892089 |
| Datetime | 2024-03-18T09:18:04.949759-05:00 | 2025-01-01T10:14:57.272910-06:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.8/bin/clang++ 18.1.8 |
| Clustername | rostam | rostam |
| Envfile | | |

Comparison

| Benchmark | NO-EXECUTOR |
|---|---|
| Future Overhead - Create Thread Hierarchical - Latch | `--` |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2025-01-01T16:07:10+00:00 |
| HPX Commit | d27ac2e | f892089 |
| Datetime | 2024-03-18T09:19:53.062988-05:00 | 2025-01-01T10:16:47.427756-06:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.8/bin/clang++ 18.1.8 |
| Clustername | rostam | rostam |
| Envfile | | |

Comparison

| Benchmark | Results: FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR / PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR / SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR |
|---|---|
| Stream Benchmark - Add | `------` |
| Stream Benchmark - Scale | `-----` |
| Stream Benchmark - Triad | `------` |
| Stream Benchmark - Copy | `------` |

Info

| Property | Before | After |
|---|---|---|
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2025-01-01T16:07:10+00:00 |
| HPX Commit | d27ac2e | f892089 |
| Datetime | 2024-03-18T09:20:13.002391-05:00 | 2025-01-01T10:17:07.868060-06:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.8/bin/clang++ 18.1.8 |
| Clustername | rostam | rostam |
| Envfile | | |


Labels
None yet
Projects
None yet
2 participants