-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathideas.txt
67 lines (44 loc) · 3.66 KB
/
ideas.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
- limit amount of errors output by hash key (errm, alert): further errors supressed.
- automate operator descriptions in README.md
- dictionary, quant warn at compile time if dictionary does not exist ..
- way to set not-found sort order for ,dord operator. (currently INT_MAX).
? use '*' for "no query sequence" rather than empty string? Only externally, use empty string internally?
? make patchiness work with --sam-aln-context=posnum.
- patchiness/patchiness2:
patchiness2 improves by not increasing the current window size for insertions. test a bit more.
- can operator selection for suv lookup intercept be done by introspection of the subs without incurring B::Deparse overhead.
- luspuv could be done by direct addressing rather than hash lookup.
? @@ and @ selections allowed among command line arguments, to aid purpose self-explanation.
/ split & select? ^foo,bar,zut^,^2,splitget -> unpack, ^0:1:3-5,up_get
\ elaborate pack/unpack support. ^3,^%01,nspack ^%01,spackall val^%01,unspack
o implement pick in rust, working names: { pluck pak }
? inverse of /ep/ and /om/ /ep[ad]/ /om[adi]/ (apart,distance,invert). /sgneq/ /sgnne/ /sgn1eq/ /sgn1ne/
/epd/ /omd/
- use of int() in source; it truncates towards zero. 1.9999-like cases might be unwelcome.
For /any/ etc we need abs(int()), so depending on perl needs deep dive.
? val,_setta to set extra argument for /ep/ and /om/ with ,test
^3^ep,setp
X functionality to accumulate over columns, e.g. to count different clauses. (datamash sum 1-12)
--sum='list|regex' but again, datamash
? protect against log(-1) etc
- challenge: combine ed and map. replace substring with its map. First ,get it and ,map, then ,ed it.
currently: echo -e "_a_\t3\n_b_\t4\naba\t5" | pick -AiK --cdict-foo=a:Alpha,b:Beta x::1^_'(.)'_,get^foo,map 1::1^'_\K(.)(?=_)':x,ed
where \K is available since Perl 5.10.0, 2007. Requires some perl, doesn't look great, but works.
? edmap => [ 3, sub { $::STACK[-3] =~ s/$::STACK[-2]/my $x = $::dict{$::STACK[-1]}{$1}; defined($x) ? $x : $1/e } ],
echo -e "_a_\t3\n_b_\t4\n_d_\t8" | pick -AviK --fdict-foo=dict --cdict-foo=a:Alpha,b:Beta,c:Gamma 1::1^'%5E_\K(.)(?=_$)'^foo,edmap
-> $1 not the right control? You might want to replace more than just $1, using the map of $1.
the two-step solution will allow this.
? get: option to return field if no match? -> covered by ed; get allows filtering on empty string with uie
? predefine constants (e.g. log10, pi). new syntax, e.g. ^^PI ^^LOG10 ^^E ^^PHI - no real need.
? implement in C or Rust (use pcre2). string/float/int the main point of pain.
# -T pushes column; slightly inelegant but inevitable. -N -L fixed as rowno lineno operators.
# _ in alignment string (dangling sequence) can refer to both reference and query. Let's make ` for reference.
x similar to --sam also --fastq (columns 1 2 3 4), --fasta (1 2) ? unlikely, use bioawk for that.
note --fasta-dict-NAME= and --fastq-dict-NAME= to make sequences available.
x Alternative implementation to pstore: push header with $_ . '_ref', push @F with with @F_ref.
Then -A requires @header_orig as one kind of fiddliness. Also $::N, $B_printall
Better to write a separate utility for this kind of self-join.
x numerical comparisons check field is numeric; however duplicates perl's work/warnings
x --group-all -> do not skip first row (re-populate %::pstore_cache from %::pstore_init)
problematic that repopulation only happens after compute finished. This is fine if first
row is skipped, but pretty impossible for the first row. Current status: overreach.