Commit 75d41b5

Merge pull request #2 from decuser/gh-pages

bringing master to gh-pages

2 parents: 6c19afd + a817402

File tree

137 files changed: +61582 -38 lines


20221210.0854

Whitespace-only changes.

Gemfile

Lines changed: 4 additions & 1 deletion
@@ -34,6 +34,9 @@ gem "wdm", "~> 0.1.1", :platforms => [:mingw, :x64_mingw, :mswin]
 # do not have a Java counterpart.
 gem "http_parser.rb", "~> 0.6.0", :platforms => [:jruby]
 
-gem "webrick", "~> 1.7"
+#gem "webrick", "~> 1.7"
+#gem "webrick", ">=2.2.8"
 
 gem "jekyll", "~> 3.9"
+
+gem "webrick", "~> 1.8"

Gemfile.lock

Lines changed: 65 additions & 32 deletions
@@ -1,35 +1,46 @@
 GEM
   remote: https://rubygems.org/
   specs:
-    activesupport (6.0.6)
+    activesupport (6.0.6.1)
       concurrent-ruby (~> 1.0, >= 1.0.2)
       i18n (>= 0.7, < 2)
       minitest (~> 5.1)
       tzinfo (~> 1.1)
       zeitwerk (~> 2.2, >= 2.2.2)
-    addressable (2.8.1)
-      public_suffix (>= 2.0.2, < 6.0)
+    addressable (2.8.7)
+      public_suffix (>= 2.0.2, < 7.0)
     coffee-script (2.4.1)
       coffee-script-source
       execjs
     coffee-script-source (1.11.1)
     colorator (1.1.0)
-    commonmarker (0.23.6)
-    concurrent-ruby (1.1.10)
-    dnsruby (1.61.9)
-      simpleidn (~> 0.1)
+    commonmarker (0.23.10)
+    concurrent-ruby (1.3.4)
+    dnsruby (1.72.2)
+      simpleidn (~> 0.2.1)
     em-websocket (0.5.3)
       eventmachine (>= 0.12.9)
       http_parser.rb (~> 0)
     ethon (0.16.0)
       ffi (>= 1.15.0)
     eventmachine (1.2.7)
-    execjs (2.8.1)
-    faraday (2.7.1)
-      faraday-net_http (>= 2.0, < 3.1)
-      ruby2_keywords (>= 0.0.4)
-    faraday-net_http (3.0.2)
-    ffi (1.15.5)
+    execjs (2.9.1)
+    faraday (2.12.0)
+      faraday-net_http (>= 2.0, < 3.4)
+      json
+      logger
+    faraday-net_http (3.3.0)
+      net-http
+    ffi (1.17.0-aarch64-linux-gnu)
+    ffi (1.17.0-aarch64-linux-musl)
+    ffi (1.17.0-arm-linux-gnu)
+    ffi (1.17.0-arm-linux-musl)
+    ffi (1.17.0-arm64-darwin)
+    ffi (1.17.0-x86-linux-gnu)
+    ffi (1.17.0-x86-linux-musl)
+    ffi (1.17.0-x86_64-darwin)
+    ffi (1.17.0-x86_64-linux-gnu)
+    ffi (1.17.0-x86_64-linux-musl)
     forwardable-extended (2.6.0)
     gemoji (3.0.1)
     github-pages (227)
@@ -197,35 +208,48 @@ GEM
       gemoji (~> 3.0)
       html-pipeline (~> 2.2)
       jekyll (>= 3.0, < 5.0)
+    json (2.7.2)
     kramdown (2.3.2)
       rexml
     kramdown-parser-gfm (1.1.0)
       kramdown (~> 2.0)
     liquid (4.0.3)
-    listen (3.7.1)
+    listen (3.9.0)
       rb-fsevent (~> 0.10, >= 0.10.3)
       rb-inotify (~> 0.9, >= 0.9.10)
+    logger (1.6.1)
     mercenary (0.3.6)
     minima (2.5.1)
       jekyll (>= 3.5, < 5.0)
       jekyll-feed (~> 0.9)
       jekyll-seo-tag (~> 2.1)
-    minitest (5.16.3)
-    nokogiri (1.13.9-x86_64-darwin)
+    minitest (5.25.1)
+    net-http (0.4.1)
+      uri
+    nokogiri (1.16.7-aarch64-linux)
+      racc (~> 1.4)
+    nokogiri (1.16.7-arm-linux)
+      racc (~> 1.4)
+    nokogiri (1.16.7-arm64-darwin)
+      racc (~> 1.4)
+    nokogiri (1.16.7-x86-linux)
+      racc (~> 1.4)
+    nokogiri (1.16.7-x86_64-darwin)
+      racc (~> 1.4)
+    nokogiri (1.16.7-x86_64-linux)
       racc (~> 1.4)
     octokit (4.25.1)
       faraday (>= 1, < 3)
       sawyer (~> 0.9)
     pathutil (0.16.2)
       forwardable-extended (~> 2.6)
     public_suffix (4.0.7)
-    racc (1.6.0)
+    racc (1.8.1)
     rb-fsevent (0.11.2)
-    rb-inotify (0.10.1)
+    rb-inotify (0.11.1)
       ffi (~> 1.0)
-    rexml (3.2.5)
+    rexml (3.3.7)
     rouge (3.26.0)
-    ruby2_keywords (0.0.5)
     rubyzip (2.3.2)
     safe_yaml (1.0.5)
     sass (3.7.4)
@@ -236,24 +260,33 @@ GEM
     sawyer (0.9.2)
       addressable (>= 2.3.5)
       faraday (>= 0.17.3, < 3)
-    simpleidn (0.2.1)
-      unf (~> 0.1.4)
+    simpleidn (0.2.3)
     terminal-table (1.8.0)
       unicode-display_width (~> 1.1, >= 1.1.1)
     thread_safe (0.3.6)
-    typhoeus (1.4.0)
+    typhoeus (1.4.1)
       ethon (>= 0.9.0)
-    tzinfo (1.2.10)
+    tzinfo (1.2.11)
       thread_safe (~> 0.1)
-    unf (0.1.4)
-      unf_ext
-    unf_ext (0.0.8.2)
     unicode-display_width (1.8.0)
-    webrick (1.7.0)
-    zeitwerk (2.6.6)
+    uri (0.13.1)
+    webrick (1.8.1)
+    zeitwerk (2.6.18)
 
 PLATFORMS
-  x86_64-darwin-18
+  aarch64-linux
+  aarch64-linux-gnu
+  aarch64-linux-musl
+  arm-linux
+  arm-linux-gnu
+  arm-linux-musl
+  arm64-darwin
+  x86-linux
+  x86-linux-gnu
+  x86-linux-musl
+  x86_64-darwin
+  x86_64-linux-gnu
+  x86_64-linux-musl
 
 DEPENDENCIES
   github-pages (~> 227)
@@ -265,7 +298,7 @@ DEPENDENCIES
   tzinfo (>= 1, < 3)
   tzinfo-data
   wdm (~> 0.1.1)
-  webrick (~> 1.7)
+  webrick (~> 1.8)
 
 BUNDLED WITH
-   2.3.26
+   2.5.16

_layouts/post.html

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 ---
 layout: default
 ---
+<script src="{{ "/assets/mermaid-9.3.0/mermaid.js" | relative_url }}"></script>
 <article class="post h-entry" itemscope itemtype="http://schema.org/BlogPosting">
 
   <header class="post-header">

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
---
layout: post
title: "dircmp.py - a plan to improve and extend"
categories: unix python
---

## dircmp.py

### A plan to improve and extend

This note pertains to [dircmp.py](https://github.com/decuser/decuser_python_playground/blob/master/dircmp/dircmp.py), a program that I wrote to give me information about two directories for the purpose of deciding what to keep and what to remove, and to learn about python. The note is a draft and, as such, it's not very refined and may be lacking in many ways, but I thought it might be interesting to put it out there and let anyone see it. Email me if you have comments or suggestions.

<!--more-->

#### Caveats

The comparisons here are between simple folders and files (including hidden files). By simple, I don't mean small - I use the program to compare very large directories. However, the tool was developed for comparing directories containing user files and not as a system maintenance tool. I haven't done much investigating of hard / soft symlinks or exotic setups.

#### Random Observation

When originally creating the program, I had the thought that git's organization would provide an ideal filesystem for keeping changes and being able to see those changes easily. Just keep file contents in blobs, and their digests, names, and locations elsewhere. Files with identical contents would share the blob and digest, but the names and locations could differ. When saving a new file, compute a digest, check the registry, link to the blob, etc. Sure, it'd be slow, but integral. A little above my paygrade, but percolating in the back of my mind.

#### Features under consideration

* Synchronization Planning
* Synchronization Execution with Undo and potentially segregation

#### Other potential enhancements

* Historical comparisons (save results and use for future compares)

#### Future research

* symlinks and exotic setups :)

#### Background

Currently, the program does a great job of identifying differences and matches between two directory trees. However, it does not do a good job of providing the user with plans to synchronize those trees or the ability to synchronize directories.

When I started this project, my goal was to identify, not address, so the program met its objectives. Now, I want to have the program generate plans to synchronize and perform the synchronization.

#### Current functionality

Looking at the program as a whole and as a black box, it takes a directory or pair of directories and compiles a set of results...

##### Results

Pair Results Report

* duplicate files found in either directory
* exact matches found in both directories
* files that only exist in one of the directories
* files that have the same names but different digests
* files that have different names but the same digest

Single Results Report

* duplicate files found in directory

##### Method of Operation

The comparisons are done using a calculated digest for every file that exists within the scope of the comparison, either a single level or recursive.

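Roughly, the idea looks like this (a simplified sketch only, not the actual dircmp.py code; the helper names `digest_tree` and `compare` and the choice of SHA-256 are just for illustration):

```python
# Sketch of the digest-then-compare idea: hash every file under each
# directory, then bucket the results into the pair-report categories.
import hashlib
from pathlib import Path

def digest_tree(root: str, recurse: bool = True) -> dict[str, str]:
    """Map each file path (relative to root) to a SHA-256 hex digest."""
    root_path = Path(root)
    pattern = "**/*" if recurse else "*"
    digests = {}
    for path in root_path.glob(pattern):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[str(path.relative_to(root_path))] = h.hexdigest()
    return digests

def compare(left: dict[str, str], right: dict[str, str]) -> dict[str, set]:
    """Bucket two digest maps into the categories the report describes."""
    left_names, right_names = set(left), set(right)
    common = left_names & right_names
    right_by_digest = {}
    for name, dig in right.items():
        right_by_digest.setdefault(dig, set()).add(name)
    return {
        "exact_matches": {n for n in common if left[n] == right[n]},
        "same_name_different_digest": {n for n in common if left[n] != right[n]},
        "only_in_left": left_names - right_names,
        "only_in_right": right_names - left_names,
        "different_name_same_digest": {
            (n, m)
            for n, dig in left.items()
            for m in right_by_digest.get(dig, set())
            if m != n
        },
    }
```
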
##### Options

The program supports the following options for controlling its behavior:

* -h, --help - show a help message and exit
* -b, --brief - Brief mode - suppress file lists
* -a, --all - Include hidden files in comparisons
* -r, --recurse - Recurse subdirectories
* -f, --fast - Perform shallow digests (super fast, but necessarily less accurate)
* -d, --debug - Debug mode
* -c, --compact - Compact mode
* -s, --single - Single directory mode
* -v, --version - show program's version number and exit

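For illustration, flags like these can be declared with argparse along the following lines (a sketch only; the version string and help text are placeholders, not dircmp.py's actual source):

```python
# Illustrative argparse declaration of the flags listed above.
import argparse

parser = argparse.ArgumentParser(
    prog="dircmp.py",
    description="Compare two directories by file digest.")
parser.add_argument("-b", "--brief", action="store_true", help="suppress file lists")
parser.add_argument("-a", "--all", action="store_true", help="include hidden files in comparisons")
parser.add_argument("-r", "--recurse", action="store_true", help="recurse subdirectories")
parser.add_argument("-f", "--fast", action="store_true", help="perform shallow digests (fast, less accurate)")
parser.add_argument("-d", "--debug", action="store_true", help="debug mode")
parser.add_argument("-c", "--compact", action="store_true", help="compact mode")
parser.add_argument("-s", "--single", action="store_true", help="single directory mode")
parser.add_argument("-v", "--version", action="version", version="%(prog)s 0.0")
args = parser.parse_args()
```
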
#### Discussion

All in all, it works great. I've used it a lot. It's fast and accurate. However, in using it, it has become apparent that what I really want it to do is tell me how to get two directories to synchronize.

I have used rsync (prolly best of breed) and tried many, many other programs to quickly and easily sync directories, and I haven't liked any of them in the end. Usually, I wind up losing files that I don't want to lose, sometimes through unintentional misuse of the tool, especially with rsync's arcane syntax, but usually through an inability to figure out what the tool actually does (not how it does what it does, but what the results are) - thinking two directories are synced after running the tool, only to find out later that they weren't... not exactly.

At least with dircmp.py as it exists today, I know exactly what it is doing. The results are detailed and precise. It lives to tell me, in detail, what differences exist between two directories. With this information in hand, it is possible to determine a finite plan to synchronize them.

Interestingly, synchronization can be accomplished with several distinct outcomes, as follows.

#### Synchronize how?

In the following discussions, I will use left and right to differentiate the two directories being discussed and will only discuss synchronizing two directories.

##### One Way Synchronizations

* Left to Right - *right directory is made to exactly match the left directory*
* Right to Left - *left directory is made to exactly match the right directory*

Conflicts arising in one way synchronizations are resolved by definition of the direction - files and directories from the source side are chosen whenever there is a mismatch.

##### Two Way Synchronizations

Whenever two way synchronizations are performed, there is a likelihood of conflicts, and it is important to consider strategies to resolve those conflicts. This is where synchronization gets tricky.

Here are the possible strategies:

* Preserve None - *remove conflicts from left and right (neither wins)*
* Preserve Left - *merge left into right (left wins)*
* Preserve Right - *merge right into left (right wins)*
* Preserve Both (versioning) - *merge both ways (both win) and create versions when there is a conflict*

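As a sketch of how these strategies might be represented once planning is implemented (hypothetical names and semantics; nothing like this exists in the code yet):

```python
# Sketch: two-way conflict strategies as data, plus the actions each one
# would plan for a single conflicting file.
from enum import Enum, auto

class Preserve(Enum):
    NONE = auto()   # remove the conflicting file from both sides
    LEFT = auto()   # left wins; the right copy is replaced
    RIGHT = auto()  # right wins; the left copy is replaced
    BOTH = auto()   # keep both, writing the loser out as a new version

def plan_conflict(name: str, strategy: Preserve) -> list[str]:
    """Return a human-readable action list for one conflicting file."""
    if strategy is Preserve.NONE:
        return [f"delete left/{name}", f"delete right/{name}"]
    if strategy is Preserve.LEFT:
        return [f"copy left/{name} -> right/{name}"]
    if strategy is Preserve.RIGHT:
        return [f"copy right/{name} -> left/{name}"]
    # Preserve.BOTH: version the right copy, then merge the left copy across.
    return [f"rename right/{name} -> right/{name}.v2",
            f"copy left/{name} -> right/{name}"]
```
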
##### Thoughts before diving into the details

I think that, in line with providing undo, it may be useful to preserve conflicts for the user... as in, when there's a conflict, move the loser (one side, or both) into a separate area (preserving prior location information) for the user to decide what to do with. Given robust enough functionality, this may be moot, but I remember doing this sorta thing before and it being useful.

#### Rough sketch strategy

* Analyze directories to determine what needs to change
* Report status
* Stage changes
* Make changes (as economically and safely as possible)
  * Stage a modification
  * Save recovery information
  * Make the modification
* Report changes

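One possible shape for a staged change that carries enough information to undo it (again, just a sketch with made-up names, not existing code):

```python
# Sketch: a staged action records where it reads from, where it writes, and
# where the overwritten or removed file was parked, so it can be undone.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Action:
    kind: str                     # "copy", "move", "delete", "version"
    source: Optional[str] = None  # path read from, if any
    target: Optional[str] = None  # path written or removed
    backup: Optional[str] = None  # where the displaced file was parked

@dataclass
class Plan:
    actions: list[Action] = field(default_factory=list)
    applied: list[Action] = field(default_factory=list)  # completed, in order, for undo

    def report(self) -> str:
        return "\n".join(
            f"{a.kind}: {a.source or ''} -> {a.target or ''}" for a in self.actions)
```
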
#### Thoughts related to the economics

The expenses in this program are the costs of computing digests, comparing those digests, and copying files. Deletions are cheap, as are moves. So, the program should only compute digests as requested. When fast mode is active, the algorithm only reads a portion of the file rather than the entirety, so this must be taken into account. Comparisons of the digests are mandatory. Copying files is expensive and should be minimized.

Interestingly, when I started looking at this part of the code, I figured out that my fast digest approach probably needs to be improved. My premise, when I wrote it, was that big files tend not to change in small ways over time - movies, images, and such. So, files over 10MB were considered candidates for an optimization of the digest process... This has proved to be a good intuition, but there are certainly some exceptions that could cause problems with the simple approach currently in the code (read the file size in bytes, read the first 1MB and the last 1MB, and use these for the digest). I remember coming up with a strategy more along the lines of 100MB being the threshold and taking 100MB of random samples from the file. I don't remember why I changed it, but it was prolly just a matter of it taking too long and/or being somewhat more complicated to implement in the time I set aside to do the work... either way, the current code is quite basic, but fast... and it's worked fine because the only large files that I've used it on have been normal user files that simply don't change much in the middle. Still, this is definitely an area to optimize. The easiest example of a problematic file that I can think of is a VM drive file... When in doubt, don't use fast mode :).

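The shallow digest described above boils down to something like this (a simplified sketch; the 10MB threshold and 1MB chunks follow the description above, while the function name and the use of SHA-256 are just for illustration):

```python
# Sketch of the "shallow digest" idea: for files over a size threshold,
# hash only the size plus the first and last 1MB instead of the whole file.
import hashlib
import os

CHUNK = 1 << 20          # 1 MB
THRESHOLD = 10 * CHUNK   # files larger than this get the shallow treatment

def shallow_digest(path: str) -> str:
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        if size <= THRESHOLD:
            # Small file: hash the whole thing in 1MB chunks.
            for chunk in iter(lambda: f.read(CHUNK), b""):
                h.update(chunk)
        else:
            h.update(str(size).encode())   # the size in bytes is part of the digest
            h.update(f.read(CHUNK))        # first 1MB
            f.seek(-CHUNK, os.SEEK_END)
            h.update(f.read(CHUNK))        # last 1MB
    return h.hexdigest()
```

A file that changes only in the middle (a VM drive image, for example) would keep the same shallow digest, which is exactly the failure mode discussed above.
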
#### Stuff to think about

* Fast digest - what's a better approach that's still fast (sampling is slow, but is it necessary)?
* I seem to remember counting being challenging - does the program count correctly or does it need a fix?
* Symlinks are weird, but are they a problem?
* How to develop a changeset - linear, in a single pass, what?
* How to handle the undo functionality
* What to do about destructive changes - save the to-be-destroyed file somewhere?
* How to handle duplicate versioning - naming, save somewhere?
* Given past experience, what to do about voluminous reporting - definitely need better delineation of sections (very hard to differentiate in a terminal)
* Tkinter? I dunno, I prefer something like Avalonia, but that's .NET. Still, should this have a UI?
* Order to do the coding - left to right and right to left first, then merges, but which preservation strategies in what order?

The playground has the latest code and branches: [https://github.com/decuser/decuser_python_playground](https://github.com/decuser/decuser_python_playground)

*post last updated 2022-12-15 17:53:00 -0600*

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
---
layout: post
title: "Warren Toomey Awarded 2022 USENIX Flame"
categories: unix
---

Warren Toomey, the founder and maintainer of all things related to The Unix Heritage Society (TUHS) [https://www.tuhs.org/](https://www.tuhs.org/), has been awarded the 2022 USENIX Lifetime Achievement Award ("The Flame").

Without TUHS, it would hardly even be possible to enjoy retro unix explorations. This is a well-earned accolade for an unassuming and very hard-working individual.

Congratulations, Warren!

![flame](https://www.tuhs.org/Images/flame.jpg){: width="480" }


<!--more-->

### About Warren Toomey

Warren retired in 2021 after a three-decade career of teaching Computer Science and IT in tertiary institutions including the University of New South Wales, Bond University and TAFE Queensland (a polytechnic/community college). His teaching was always systems focussed: computer architecture, operating systems, systems programming, networking and cybersecurity. Warren was first introduced to Unix at the end of 1982 at a Summer School at the University of Wollongong, but lost access to it when he started his undergraduate degree in 1984. This loss was the driving force behind his fascination with Unix and its history. Fortunately, Minix, 386BSD and FreeBSD came along at just the right time to slake Warren's thirst for Unix. He founded the Unix Heritage Society in 1994 (originally named the PDP-11 Unix Preservation Society) and has nurtured it since. Now retired, Warren's interests include dressage riding, and he has become a competent groom for his wife Kaz, a professional 'Big Tour' rider.

### Past recipients of the Flame

* Chet Ramey (2020)
* Margo Seltzer (2019)
* Eddie Kohler (2018)
* Tom Anderson (2014)
* John Mashey (2012)
* Dan Geer (2011)
* Ward Cunningham (2010)
* In Honor of Gerald J. Popek (2009)
* Andrew S. Tanenbaum (2008)
* Peter Honeyman (2007)
* Radia Perlman (2006)
* Michael Stonebraker (2005)
* M. Douglas McIlroy (2004)
* Rick Adams (2003)
* James Gosling (2002)
* The GNU Project and all its contributors (2001)
* Richard Stevens (2000)
* The X Window System Community at Large (1999)
* Tim Berners-Lee (1998)
* Brian W. Kernighan (1997)
* The Software Tools Project (1996)
* The Creation of USENET (1995)
* Networking Technologies (1994)
* Berkeley UNIX (1993)

Read more about the award and its recipients at [https://www.usenix.org/about/awards/flame](https://www.usenix.org/about/awards/flame)

Read more about why Warren started TUHS at [https://minnie.tuhs.org/Blog/2015_12_14_why_start_tuhs.html](https://minnie.tuhs.org/Blog/2015_12_14_why_start_tuhs.html)

\- will

*post added 2022-12-15 17:52:00 -0600*
