draft changes for multilocale suffix array construction #38

Open
wants to merge 106 commits into
base: main
106 commits
2f36dbb
Add timing printouts when compiled with --set TIMING=true
mppf Nov 12, 2024
3aa74d6
Use block distributed, and use integral SA elements
mppf Nov 12, 2024
28586e3
Use Block distribution in partitioning, more distributed ssort
mppf Nov 13, 2024
50dea99
Hide warning, add bucket statistics, enhance trace
mppf Nov 13, 2024
05c28b9
Replicate splitters
mppf Nov 13, 2024
5f9e0f6
Use Block arrays only when CHPL_COMM != none
mppf Nov 13, 2024
6d5da0e
Replicate splitters only with CHPL_COMM!=none
mppf Nov 13, 2024
236326e
Adjust comments
mppf Nov 13, 2024
cc2d748
Add timing inside sort buckets & avoid unneeded Block there
mppf Nov 13, 2024
69689fe
Add nextCoverIndex
mppf Nov 14, 2024
8f219a3
Allow partition to have different output type
mppf Nov 14, 2024
e2a5a23
Experimental: separating lookup phase in final sort
mppf Nov 14, 2024
9e98e9a
Don't separate lookup phase
mppf Nov 14, 2024
f942712
Replicate sample ranks
mppf Nov 14, 2024
8bf935c
Also replicate the text
mppf Nov 14, 2024
8d8f8bc
Improve replicating text and sample ranks
mppf Nov 15, 2024
bfa17ad
Prototype work for no-random-access version
mppf Nov 16, 2024
39eeb70
Add more testing, fix a bug
mppf Nov 16, 2024
a66c275
Hide some communication
mppf Nov 16, 2024
6187e04
Avoid comms for accessing difference cover
mppf Nov 16, 2024
f18eebe
Don't make a local copy
mppf Nov 16, 2024
40e3048
Tidy up partition()
mppf Dec 12, 2024
2174e2b
Add mechanism to pack input
mppf Dec 13, 2024
7c420fc
Get divide by buckets working multilocale & testing
mppf Dec 15, 2024
533ea8b
Fix a comment
mppf Dec 15, 2024
5659077
Add filter mechanism to partition & test stability
mppf Dec 15, 2024
0dfb366
Add a simpler test of the divideByBuckets iterator
mppf Dec 16, 2024
4905a2a
Add getBit/setBit
mppf Dec 16, 2024
ae71f56
Add implementation of insertion sort
mppf Dec 16, 2024
c25e903
add and test shellSort and markBoundaries
mppf Dec 16, 2024
73b70ef
Add lsbRadixSort
mppf Dec 16, 2024
aef2ad1
Implement and test some more sorting routines
mppf Dec 17, 2024
035ae91
Comment out sort code not expecting to use
mppf Dec 17, 2024
64db6f8
It compiles but there are bugs
mppf Dec 17, 2024
c6c56ef
Fix bugs
mppf Dec 18, 2024
ece4ac1
Switch packInput to separately compute bitsPerChar
mppf Dec 18, 2024
89c5c7a
TestSuffixSort compiles
mppf Dec 18, 2024
5bb2e4d
Fix a bug
mppf Dec 18, 2024
fc70654
Test sortByPrefixAndMark
mppf Dec 18, 2024
850b0bb
Fix bugs
mppf Dec 18, 2024
e933be9
Enable testOthers
mppf Dec 18, 2024
fe20f2b
fix more bugs
mppf Dec 19, 2024
4d01557
TestSuffixSort is passing!
mppf Dec 19, 2024
034e491
Add stats facility, use msbRadixSort
mppf Dec 19, 2024
240da1e
Fix computeSuffixArrayDirectly
mppf Dec 19, 2024
38d9819
Fix a bug
mppf Dec 20, 2024
846d98e
Adding a stable sorter
mppf Dec 21, 2024
e8267b0
Adjusted reReplicate is working
mppf Dec 26, 2024
09c52dc
Fix bugs
mppf Dec 26, 2024
10e8a55
Tidy up sample computation to stay within limit
mppf Dec 28, 2024
fc7e4e1
Improved stable sorter
mppf Jan 2, 2025
3308c86
Avoid creating sort state for small problems
mppf Jan 4, 2025
1ef69c0
Switch to saving bucket boundaries only on boundaries
mppf Jan 6, 2025
5b6d837
Improve bucketHasEqualityBound for radixSplitters
mppf Jan 7, 2025
05fc0ff
Remove duplicate bucket boundary search in internal sorting
mppf Jan 7, 2025
9c4236a
Add optimization to improve local access to dist arrays
mppf Jan 7, 2025
7a5245b
Adjust partitioning timing test
mppf Jan 7, 2025
b7224c2
sort timing test has configurable record size
mppf Jan 7, 2025
518b397
fix header print for --timing
mppf Jan 7, 2025
42a00bd
Add timing for psort
mppf Jan 9, 2025
5cb35c8
Update partitioners
mppf Jan 10, 2025
c510198
Stable sorter is testing again
mppf Jan 10, 2025
d43e67c
Fix up & test serialUnstablePartition
mppf Jan 12, 2025
552f7d9
partition helper methods accept arrays that would be allocated
mppf Jan 12, 2025
8e66044
Bucket boundaries contain bucket sizes
mppf Jan 12, 2025
739040a
Switch to different sort strategy
mppf Jan 13, 2025
1cc8648
Test markAllEquals
mppf Jan 14, 2025
49193f4
Closer to compiling
mppf Jan 15, 2025
58095b6
Improve partitioning
mppf Jan 17, 2025
3993728
Lots of bug fixes
mppf Jan 17, 2025
fe2dad3
Fix a bug in divideByBuckets
mppf Jan 18, 2025
6597b65
Fix bugs
mppf Jan 18, 2025
66d4fc6
Comment out debug printouts
mppf Jan 18, 2025
1826d74
Make markBoundaries no longer a method
mppf Jan 19, 2025
8358f04
Add some TODO comments
mppf Jan 19, 2025
59a5cd0
Reduce suffix sort compile time
mppf Jan 19, 2025
378610d
Fix a bug & fix multilocale compilation
mppf Jan 19, 2025
9316410
Use radix sort for initial naming process
mppf Jan 19, 2025
e7fb031
Small changes
mppf Jan 20, 2025
910a4d5
Fix a bug & time copy-to-local-and-sort
mppf Jan 21, 2025
dfb6cc4
Fix a bug and include serial bucket stats in trace
mppf Jan 21, 2025
1846ecb
Fix problem with serial splitters
mppf Jan 21, 2025
b4651fa
Improve timing, time more parts
mppf Jan 22, 2025
9618a9f
'cached' stores two words instead of one
mppf Jan 27, 2025
39c0568
Tidy up some of sortAndNameSampleOffsets
mppf Jan 27, 2025
7ff9b02
Switch to final phase using parallel partitions & local copies
mppf Jan 27, 2025
d686081
Re-enable code for different character bits
mppf Jan 27, 2025
60c4c77
Use default period of 57 based on experiments
mppf Jan 27, 2025
62d76bd
Reduce memory usage of naming portion
mppf Jan 28, 2025
a5e2228
Improve sequence reading process
mppf Jan 29, 2025
521144c
Parallelize reverseComplement
mppf Jan 29, 2025
7293a02
Add ability to truncate input for SuffixSort
mppf Jan 29, 2025
d0ce2b8
Time reading input
mppf Jan 30, 2025
7fbf725
Time gatherSplitters, make it more parallel
mppf Jan 30, 2025
c72311f
Use one task in places within a parallel region
mppf Jan 31, 2025
a4fc74e
Small comms opts
mppf Jan 31, 2025
38756e3
Add bulkCopy helper
mppf Jan 31, 2025
d101c3d
Enable bulkCopy to work with two distributed arrays
mppf Feb 1, 2025
8b6afb5
Adjust TestPartitioning for a previous change
mppf Feb 1, 2025
164ad20
Fix a bug in bulkCopy
mppf Feb 1, 2025
6e29e2d
Use bulkCopy in Partitioning
mppf Feb 1, 2025
87e6f2f
use bulkCopy in SuffixSortImpl
mppf Feb 1, 2025
de5d105
Add helper iterators
mppf Feb 4, 2025
4acf368
Make bulkCopy parallel
mppf Feb 4, 2025
e100b88
divideIntoPages does not yield empty ranges
mppf Feb 4, 2025
d1084ab
Fix bug in bulkCopy
mppf Feb 4, 2025
41 changes: 40 additions & 1 deletion src/ssort_chpl/DifferenceCovers.chpl
@@ -97,6 +97,29 @@ private proc makeSampleTable(param period): period*int {
   return sampleTable;
 }

+private proc makeNextTable(param period): period*int {
+  const cover = coverTuple(period);
+  const sampleSize = cover.size;
+  const sampleTable = makeSampleTable(period);
+  var nextTable: period*int;
+
+  for i in 0..<period {
+    nextTable[i] = -1;
+  }
+
+  for i in 0..<period {
+    for j in 0..<period {
+      if sampleTable[(i+j)%period] != -1 && nextTable[i] == -1 {
+        nextTable[i] = j;
+        break;
+      }
+    }
+  }
+
+  return nextTable;
+}
+
+
 record differenceCover {
   /** the period of the difference cover
       aka v in Karkkainen Sanders Burkhardt */
@@ -110,6 +133,10 @@ record differenceCover {
   /** sample[i mod v]=index s.t. cover[index]=i, else -1 */
   /*private*/ const sampleTable: period*int;

+  /** nextTable[i mod v] = smallest j such that i + j is in the difference
+      cover */
+  const nextTable: period*int;
+
   /** returns the size of the difference cover, that is, cover.size */
   proc sampleSize param : int { return coverTuple(period).size; }
   /** returns period - sampleSize */
@@ -121,6 +148,7 @@ record differenceCover {
     this.period = period;
     this.ellTable = makeEllTable(period);
     this.sampleTable = makeSampleTable(period);
+    this.nextTable = makeNextTable(period);
   }

   /**
@@ -149,7 +177,7 @@ record differenceCover {
       assert(0 <= ell && ell < period);
     }

-    return ell;
+    return ell: i.type;
   }

   /**
@@ -174,6 +202,17 @@ record differenceCover {
     }
     return sampleTable[i] : i.type;
   }
+
+  /**
+    Given offset i with 0 <= i < period, returns the smallest number j
+    so that i + j is in the difference cover.
+   */
+  inline proc nextCoverIndex(i: integral) : i.type {
+    if EXTRA_CHECKS {
+      assert(0 <= i && i < period);
+    }
+    return nextTable[i] : i.type;
+  }
}
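
To make the new table concrete, here is a minimal standalone sketch of the same lookup idea. It is not the code from this PR: it uses runtime arrays instead of `param`-sized tuples, and the names `makeNextTableSketch`, `nextSketch`, and `nextCoverIndexSketch` are hypothetical. The cover `{0, 1, 3}` with period 7 is assumed purely for illustration and may differ from what `coverTuple(7)` returns in this repository.

```chapel
// Sketch only: runtime-array stand-in for the param-tuple tables in the diff.
// ASSUMPTION: {0, 1, 3} mod 7 is used as an example difference cover.
const period = 7;
const cover = [0, 1, 3];

// Returns a table t where t[i] is the smallest j >= 0 such that
// (i + j) % v is a member of the cover.
proc makeNextTableSketch(v: int, cov: [] int) {
  var inCover: [0..<v] bool;          // membership flags, default false
  for c in cov do
    inCover[c] = true;

  var table: [0..<v] int = -1;        // -1 means "not yet found"
  for i in 0..<v {
    for j in 0..<v {
      if inCover[(i + j) % v] {
        table[i] = j;
        break;
      }
    }
  }
  return table;
}

const nextSketch = makeNextTableSketch(period, cover);

// Same shape as nextCoverIndex in the diff: the result is cast back to the
// caller's integer type, so narrow index types stay narrow.
proc nextCoverIndexSketch(i: integral) : i.type {
  return nextSketch[i] : i.type;
}

proc main() {
  writeln(nextSketch);                // 0 0 1 0 3 2 1

  // For an arbitrary text offset, reduce mod period before the lookup.
  const offset = 11;                  // 11 % 7 == 4
  const step = nextCoverIndexSketch((offset % period): int(32));
  writeln(step.type:string, " ", offset + step);   // int(32) 14
}
```

In this sketch, offset 11 is advanced by 3 to offset 14, and 14 mod 7 = 0 is in the assumed cover. The lookup takes a value already reduced mod the period, which matches the `0 <= i < period` precondition asserted under `EXTRA_CHECKS` in the diff.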


1 change: 1 addition & 0 deletions src/ssort_chpl/FindUnique.chpl
@@ -476,6 +476,7 @@ proc main(args: [] string) throws {
   const fileStarts; //: [] int;
   const totalSize: int;
   readAllFiles(inputFilesList,
+               Locales,
                allData=allData,
                allPaths=allPaths,
                concisePaths=concisePaths,