Even better, count the number of files of each extension:
for dir in */; do for ext in jpg json; do echo "${dir}${ext}: $(ls -l ${dir}*.${ext} 2>/dev/null | wc -l)"; done; done
Count files in a dir: ls path/to/dir | wc -l
...but for all subdirs:
for dir in */; do echo "${dir}: $(ls -l ${dir} | grep "^-" | wc -l)"; done
Move output_* detection directories to trash/ (i.e. delete them) if they don't contain any models (.pth) inside.
find . -type d -name 'output_*' '!' -exec sh -c 'ls -1 "{}"/*.pth >/dev/null 2>&1' ';' -exec mv -t trash/ {} +
Backup training code with training results (so that I always know what code produced them)
import os
import shutil
current_file = os.path.basename(__file__)
output_file = os.path.join(OUTPUT_DIR, current_file) # OUTPUT_DIR = wherever the training results go
shutil.copyfile(__file__, output_file)
Increase bash history size:
echo $HISTSIZE; echo $HISTFILESIZE #check current limit - for me it was 1000 and 2000 by default
nano ~/.bashrc # the HISTSIZE and HISTFILESIZE lines were in the middle of the file - I've changed them to 5k and 10k respectively
source ~/.bashrc
I love fzf. I've been installing it on every laptop, vm, cluster, raspberry pi etc I touch.
It does fuzzy autocompletion - the ctrl+R combo in the terminal just works better than it normally does.
git clone --depth 1 https://github.com/junegunn/fzf.git ~/.fzf
~/.fzf/install
Then 'y' for everything.
Recently I've learned (from https://news.ycombinator.com/item?id=35248098 ) that you can also use alt+c to get fuzzy cd, and ** + tab to get fuzzy path completion.
Example usage of ** + tab: type source ** then hit tab and pick activate, to activate a venv quicker (works with vscode!).
But there is also one more gamechanging trick, that makes history work much better (and, by extension, fzf):
shopt -s histappend
PROMPT_COMMAND="history -a;$PROMPT_COMMAND"
Put those lines in your ~/.bashrc to prevent losing history with multiple parallel ssh sessions. You still need to restart another session so that it can see the history; but at least now it doesn't have to wait for the first session to be properly closed. Src: https://web.archive.org/web/20090815205011/http://www.cuberick.com/2008/11/update-bash-history-in-realtime.html
Google Cloud Services: my IT blocks the External IP; this workaround allows me to use ssh/rsync/sftp/the vscode integration:
~/.ssh/config
Host compute.id
HostName compute.id
IdentityFile /home/username/.ssh/google_compute_engine
CheckHostIP no
HostKeyAlias compute.id
IdentitiesOnly yes
StrictHostKeyChecking yes
UserKnownHostsFile /home/username/.ssh/google_compute_known_hosts
ProxyCommand /usr/bin/python3 -S /home/username/google-cloud-sdk/lib/gcloud.py beta compute start-iap-tunnel instancename %p --listen-on-stdin --project=projectname --zone=zone --verbosity=warning
ProxyUseFdpass no
User username
Hostname matters (!). Also, add your .pub key to the instance.
The compute.id can be copied from google_compute_known_hosts
(I couldn't find it in the web GUI).
The Python path may need to be changed (this still causes issues with numpy, I guess?).
Transferring a big file, continuing if interrupted. I've read that in some cases --inplace causes issues and a timeout should be used instead (counterintuitively), but for now it works fine.
rsync -av --partial --append --inplace --info=progress /home/user/path/to/verybigfile.zip user@ip:/home/user/path/to/verybigfile.zip
How to create an img from a raspberry pi sd card (for backup or quick setup for new users/new rpis), but also make sure it's the actual content size, not the full sd card size:
sudo fdisk -l
to find the partitions; boot and rootfs will both be included if you copy one slash up, i.e. the more general path /dev/mmcblk0 instead of /dev/mmcblk0p1 or /dev/mmcblk0p2:
dd if=/path/to/input of=/path/to/output status=progress
- but also add conv=sparse
- resize the sd card partitions
- use count=1500 and bs=1MB to only get 1.5G of data
- pad the free space with zeros with dd if=/dev/zero of=largerfile.txt and then rm it
- use balenaEtcher instead of rpi imager to write the image to a new sd card
Not sure what exactly did the trick, but I'd first go with resize & count before trying to pad.
Nvidia drivers/cuda/cudnn crashes are a never ending horror for machine learning devs. Countless days wasted on debugging those trivial problems (e.g. after AWS issues an upgrade command on EC2 without user interaction and the upgrade fails)... As a rule of thumb, issue the following commands (order may vary) before doing any manual, complicated operations (such as purge or downloading drivers from the nvidia website as .deb or .run):
sudo apt autoremove
sudo apt --fix-broken install
sudo apt update
sudo apt upgrade
sudo reboot
Also,
cat /var/log/apt/history.log
dpkg -l | grep nvidia
cat /proc/driver/nvidia/version
ls -l /usr/lib/x86_64-linux-gnu/libcuda*
to understand your situation better
Clean a heavy dir from .git history. May require a manual rm from the actual file tree. Use du -sh
to verify if it worked. This will also remove origin so you won't accidentally push something super nasty; it's good e.g. for splitting a big repo into two - and rather bad to run on an existing repo that is used by a lot of people. The approach from bitbucket with filter-branch https://support.atlassian.com/bitbucket-cloud/docs/split-a-repository-in-two/ doesn't work for me.
git filter-repo --force --path single_path_without_slash --invert-paths
To change history (obviously, this one is also potentially dangerous) when somebody committed from AWS as Ubuntu:
echo "NewName NewSurname <new@email> <ubuntu@mumbo-jumbo-from-ec2>" >> .mailmap
git-filter-repo --use-mailmap
rm .mailmap
A bunch of minor tricks for development with docker and git:
In a separate terminal window, kill all docker containers if docker ignores SIGINT and SIGTSTP (ctrl+c and ctrl+z):
docker kill $(docker ps -q)
Enter a running docker container with bash, so you can look at filesystem, debug, execute different scripts etc.
docker exec -it $(docker ps -q) /bin/bash
Find out which layers of your docker image take up the most disk space:
docker history image_id
# for more details:
docker history --no-trunc image_id
Also, https://github.com/wagoodman/dive is an option (an open source tool for inspecting docker images with 29k+ stars on github).
Git. Let's say you forgot to add a file before a commit, or maybe you see a typo, or a really minor change.
git add my_changed_file.ext
git commit --amend --no-edit
Beware though: if you already pushed your changes, you will now have to use git push --force,
and this will erase the work of other people if it was committed in the meantime. Only do this if you are absolutely sure nobody else works on your branch.
I'm deploying a mask-rcnn based app, and my docker images were 20GB+. So here are some tips that allowed me to get them down to ~10GB:
- Use runtime base images instead of devel ones whenever possible. This might save 5-6GB easily. Sometimes this will force you to apt-get install python3-dev or build-essential or gcc manually, but it's still worth it. For example:
FROM nvidia/cuda:11.3.1-cudnn8-runtime
#used to be: FROM nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04
- Create a .dockerignore with .git as its content. For some ungodly reason people decided to commit model weights, videos and other oversized trash into the repository, so until we fix this, this tip saves us ~1GB.
- Combine commands using && and \ to break lines for readability. This reduces the number of layers (each RUN creates a new layer). If a file gets deleted in layer n+1, layer n keeps taking disk space (and is shipped e.g. to ECR and your destination with the seemingly deleted file). Example:
RUN wget https://github.com/opencv/opencv/archive/4.5.3.zip \
&& unzip 4.5.3.zip && rm 4.5.3.zip \
&& wget https://github.com/opencv/opencv_contrib/archive/4.5.3.zip \
&& unzip 4.5.3.zip \
&& rm 4.5.3.zip \
&& mkdir opencv-4.5.3/build
- Don't install recommends: RUN apt-get install -y --no-install-recommends. Occasionally this breaks some code, so test it first.
- Don't cache wheel files: pip3 install --no-cache-dir. Not sure how this affects performance though, wasn't able to find any info.
Adapted a function which helps me a lot when I need to quickly profile some other function:
from functools import wraps
import time
def time_func(func_to_time):
@wraps(func_to_time)
#wraps() doesn't do anything fancy - it's only here so that docstring and function __name__ are preserved
#you can omit it if you want
def timed(*args, **kw):
start = time.time()
result = func_to_time(*args, **kw)
stop = time.time()
print(f'ELAPSED TIME: {func_to_time.__name__}() {stop - start}')
return result
return timed
Can be used as a decorator:
@time_func
def dummy(seconds):
time.sleep(seconds)
dummy(3) # ELAPSED TIME: dummy() 3.00302
Or as an alias when we don't have easy access to the function definition:
timed_sleep = time_func(time.sleep)
timed_sleep(3) #ELAPSED TIME: sleep() 3.00302
Here come some more examples of unpacking (now with the star * operator) that seem to explain the concept really well. Star on the left hand side of an expression means "assign all elements not assigned to other variables":
a,b,*c,d = [1,2,3,4,5,6,7]
print(a,b,c,d) # 1 2 [3,4,5,6] 7
On the right hand side it can be used to merge iterables (by unpacking them and using them as arguments for a new list):
l1 = [1,2,3]; l2 = [3,4,5]
print([*l1,*l2]) # [1,2,3,3,4,5]
We can unpack dictionaries (we get their keys; they won't come out in any guaranteed order, so it's better to unpack into a set {} to avoid confusion):
d1 = {"a":1,"b":"lorem ipsum"}; d2 = {1:"kappa","c":"gg"}; d3 = {99:"hf", "c": "g"}
print({*d1,*d2,*d3}) # {1, 99, 'b', 'c', 'a'} - or different order
With double star **
(this one can be used only on right hand side of the expression!) we can unpack dictionaries:
print({**d1,**d2,**d3}) # {'a': 1, 'b': 'lorem ipsum', 1: 'kappa', 'c': 'g', 99: 'hf'}
Note that we lost the "gg" value - the "c" key got overwritten by the value from the last dictionary we unpacked.
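The same double star can be used to merge dicts straight into the keyword arguments of a call - a minimal sketch with made-up names:
def describe(a, b, c):
    print(a, b, c)
defaults = {"a": 1, "b": 2}
overrides = {"c": 3}
describe(**{**defaults, **overrides}) # 1 2 3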
Observations on unpacking (lists, tuples, dicts, sets etc) and tuples in python.
Parentheses () don't make a tuple. The comma , does.
type((1)) #<class 'int'>
type((1,)) #<class 'tuple'>
x = 1,; type(x) # <class 'tuple'> (careful: type(1,) itself returns <class 'int'> - in a call the comma is just an argument separator)
When we do unpacking...
a,b = b,a #swapping values without tmp variable
a,b,c = [1,2,3]
a,b,c = (1,2,3)
a,b,c = 1,2,3
...we are actually creating tuples which then get unpacked. You can verify that with:
t = x,y,z = 1,2,3
type(t) #<class 'tuple'>
type(y) #<class 'int'>
If you've ever wondered how python imports modules, here comes a simplified reimplementation of the import function, based on https://github.com/fbaptiste/python-deepdive
main.py:
import os.path; import types; import sys
def simple_import(module_name, module_file_path, module_path):
if module_name in sys.modules:
return sys.modules[module_name]
with open(module_file_path, 'r') as code_file: # read source code from file
source_code = code_file.read()
mod = types.ModuleType(module_name) # next we create a module object
mod.__file__ = module_file_path
sys.modules[module_name] = mod # insert a reference to the module in sys.modules
code = compile(source_code, filename=module_file_path, mode='exec')
exec(code, mod.__dict__) # execute the module, store global variables in mod.__dict__
return sys.modules[module_name]
print('Trying to get some_module from sys.modules:', sys.modules.get('some_module'))
module1 = simple_import('some_module', 'some_module.py', '.')
print('Trying to get some_module from sys.modules:', sys.modules.get('some_module'))
import other_module
other_module.hello()
some_module.py:
print('some_module.py')
def hello():
print('some_module says Hello!')
other_module.py:
print('Running other_module.py')
import some_module
def hello():
print('other_module says Hello!\nand...')
some_module.hello()
This doesn't cover searching for the code in .py and .zip files from print(sys.path).
Importer = loader + finder; you can look for available importers by printing sys.meta_path. Also, for already imported modules you can print their __spec__:
import sys; import math; import fractions
print(sys.meta_path)
print(math.__spec__)
print(fractions.__spec__)
For an actual implementation of import used by Python, google "importlib".
A template for simple python scripts:
"""
Docstring: I hate it when ex-java developers spam unnecessary boilerplate abstractions that do nothing 'just-in-case' so that the code is 'future-proof' and 'universal'.
Nevertheless, in each ML project I find myself surrounded by dozens of throwaway scripts that get really messy when shared over github/box/gdrive/devboxes/ec2 instances.
Please use it as your default starting python file.
"""
__author__ = "Filip Drapejkowski"
__version__ = "1.0.0"
import argparse
def main(args):
if args.verbose > 0:
print("Reading from ", args.input_path)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="This script does nothing.")
parser.add_argument("--input_path", help="Input txt file path with data about nothing.", default="input_file.txt")
parser.add_argument("--output_path", help="Output file that can be loaded into oblivion.", default="output_file.json")
parser.add_argument('--verbose', '-v', action='count', default=0)
parser.add_argument("--version", action="version", version="%(prog)s " + __version__)
args = parser.parse_args()
main(args)
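Side note (a minimal sketch with a trimmed-down copy of the parser above): parse_args also accepts an explicit list, which is handy for exercising the parser in a notebook or a test without touching sys.argv:
import argparse
parser = argparse.ArgumentParser(description="This script does nothing.")
parser.add_argument("--input_path", default="input_file.txt")
parser.add_argument("--verbose", "-v", action="count", default=0)
args = parser.parse_args(["--input_path", "data.txt", "-vv"]) # pass argv explicitly
print(args.input_path, args.verbose) # data.txt 2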
A few rarely seen functions and python features that come in handy while debugging.
globals() and locals() are super useful when weird things happen in your code and it's due to variable scopes:
print(locals() is globals()) #True
def foo():
a=12; b=13; c=[1,"2",["three"]]
print(locals() is globals(), locals())
foo() # False, a dictionary with local variables and their values
print(globals())
Run these to understand Python's modules better or to debug conflicting dependencies:
import sys
import math
# print(sys.modules) # lots of output
sys.path.append("/directory/with/a/module") # a dirty hack for imports
print(sys.path) # shows the paths that are used for importing modules!
print(math.__dict__)
import fractions
print(sys.modules['math'])
print(sys.modules['fractions'])
print(dir(fractions)) # dir(module) may just return module.__dict__ in some situations
print(fractions.__dict__)
print(fractions.__file__)
sys.modules['testing_imports'] = lambda: "Import looked in sys.modules and found me even though I'm not a module."
import testing_imports
testing_imports()
Btw, if you import a module twice, it only gets executed once; subsequent imports are skipped since the module is already cached in sys.modules.
If you delete the name from globals with del globals()["module_name"]
the next import statement will simply re-bind it from that cache (without re-running the module's code).
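A minimal sketch of that caching behaviour (importlib.reload is the standard way to actually re-run a module's code):
import sys, importlib
import fractions              # executed and cached in sys.modules on the first import
import fractions              # no-op: already in sys.modules
del globals()["fractions"]    # only the name disappears...
import fractions              # ...so this just re-binds it from the cache, the code doesn't run again
importlib.reload(fractions)   # this actually re-executes the module's code
print("fractions" in sys.modules) # True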
I've been using python's 'is' keyword without reflecting on what I am actually doing, so here comes a snippet that explains it really well imho:
# The 'is' keyword is used to test if two variables refer to the same object.
x = 3; y = 3; z = x
print(x is y, x is z, id(x), id(z), id(y)) # True, True, 3x the same id; nothing surprising for numeric literals
x = [1,2,3]; y = [1,2,3] ; z = x
print(x is y, x is z, id(x), id(z), id(y)) # False, True, 2x the same id; different objects despite the same content
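To contrast identity with equality (a tiny sketch):
x = [1,2,3]; y = [1,2,3]
print(x == y) # True - same content
print(x is y) # False - different objects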
Some short snippets that come in handy when I need to explain some more obscure features of Python:
Lambdas examples:
print((lambda x, y, third_arg, blablabla: x * y - third_arg + blablabla)(5,100,501,2))
list_1 = [1,2,3,4,5,6,7,8,9]
print(list(filter(lambda x: x%2==0, list_1)))
print(list(map(lambda x: pow(x,3), list_1)))
Decorators (with arguments):
def my_decorator(a_function_that_will_be_wrapped):
def a_wrapper_function(*args, **kwargs):
print("="*20)
a_function_that_will_be_wrapped(*args, **kwargs)
print("="*20)
return a_wrapper_function
@my_decorator
@my_decorator
@my_decorator
def print_something_twice(the_text):
print(the_text)
print(the_text)
print_something_twice("Hello World!")
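The snippet above only passes the wrapped function's arguments through; if the decorator itself should take arguments, you need one more level of nesting (a decorator factory) - a minimal sketch with made-up names:
def repeat(times): # the factory takes its own arguments...
    def my_decorator(func): # ...and returns an ordinary decorator
        def wrapper(*args, **kwargs):
            for _ in range(times):
                func(*args, **kwargs)
        return wrapper
    return my_decorator
@repeat(times=3)
def greet(name):
    print(f"Hello {name}!")
greet("World") # prints the greeting three times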
While installing ROS2 and Carla simulator on Ubuntu I got the following error: W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list.d/ros2-latest.list:1 and /etc/apt/sources.list.d/ros2.list:1
The reason: from one guide I had run:
sudo sh -c 'echo "deb [arch=$(dpkg --print-architecture)] http://packages.ros.org/ros2/ubuntu $(lsb_release -cs) main" > /etc/apt/sources.list.d/ros2-latest.list'
And from another guide I had written some lines to /etc/apt/sources.list.d/ros2.list .
All the stackoverflow/ros boards suggestions regarding editing /etc/apt/sources.list didn't help, but the solution I figured out was simple: /etc/apt/sources.list.d is a directory, so I just deleted all files from there and ran "sudo apt update".
How to deal with remote ssh in vscode (visual studio code) malfunctioning, hanging etc:
ctrl + shift + p and enter Remote-SSH: kill VS Code Server on Host...
Then wait patiently for a few minutes (maybe close vscode if it hangs; maybe kill it from htop if needed / if there is no response to ctrl + shift + p).
Changing the subject: the easiest way I know to check image size (and other info from bash): file /path/to/image.png
Btw, sorry for the lack of updates, but I've recently been spending a lot of free time on developing an Android app. The app is supposed to help in practicing math/deep learning facts/computer vision tricks and is going to be called 'brainramen'.
Cheatsheet for running multiple programs in one line in bash:
A; B # Run A and then B, regardless of success of A
A && B # Run B if and only if A succeeded
A || B # Run B if and only if A failed
A & # Run A in background.
which explains why my docker build [...] ; docker run combo was running the old image (not the rebuilt one) - with ; the run happens even if the build fails, so && is the right operator there
I've made up a nice quiz question about gradient backpropagation, give it a try:
Automatic differentiation is...
a) ...also called numerical differentiation. It uses 2 arguments in the small vicinity of the argument for which we want to compute the derivative. It computes the function values, divides the difference in values by the difference in arguments, and the resulting slope is an estimate of the derivative we were looking for.
b) ...also called symbolic differentiation. It exploits the chain rule and the fact that complex functions consist of small subproblems which are easy to differentiate. The resulting formula is simplified to a short and elegant form.
c) ...also called computational differentiation. It exploits the chain rule and the fact that complex functions consist of small subproblems which are easy to differentiate. Lack of simplification of the final formula allows optimizations i.e. to quickly compute and reuse partial derivatives with respect to many inputs.
d) All of above.
Ok, the correct answer is, AFAIK (I cannot just write the letter, cause you would read it before thinking for yourself): not the first letter of the alphabet, not the second, and not the fourth. Why would you care? Well, PyTorch uses automatic differentiation, and if I understand correctly caffe does something in between symbolic and computational: you have to derive the derivatives of the backward pass for each layer symbolically, but the framework will take care of the chain rule across layers. Still need to check what other frameworks do exactly.
Update: TensorFlow also does automatic differentiation with a "gradient tape", ref: https://jonathan-hui.medium.com/tensorflow-automatic-differentiation-autodiff-1a70763285cb https://www.tensorflow.org/guide/autodiff
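A minimal PyTorch sketch of what automatic differentiation gives you (assuming torch is installed):
import torch
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x # forward pass records the operations
y.backward()       # backward pass applies the chain rule over the recorded graph
print(x.grad)      # tensor(14.) == 3*x**2 + 2 evaluated at x=2.0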
diff from bash is actually nice if you ignore whitespace (prefix with ! for jupyter): !diff -w some_predictions.txt some_other_predictions_that_should_be_the_same_but_who_knows.txt
Btw the problem from 10 Apr 2021 is kinda solved. My code based on pytorch-YOLOv4 was giving wrong activations (confidence scores). https://github.com/Tianxiaomo/pytorch-YOLOv4 actually gives almost perfect activations. https://github.com/hunglc007/tensorflow-yolov4-tflite will give me wrong activations. Something might be off with NMS, but that I am still investigating. The problem is that pytorch-YOLOv4 uses IoU threshold while Darknet has hidden hier_thresh and nms_thresh. Will try pytorch-YOLOv4 with 0.0 IoU threshold and see what happens. Update: IoU threshold 0.45-0.5 in pytorch-YOLOv4 seems to be the best choice for fidelity. The scores won't be identical, but close enough (in fact mAP slightly better with PyTorch than in original Darknet model).
Just noticed the "File->Download as->PDF via LaTeX" functionality in jupyter. It requires !sudo apt-get install texlive-xetex texlive-fonts-recommended texlive-generic-recommended ...and maybe cleaning up too-long outputs, because pdfs lack the embedded scrolling windows.
But it's often quite nice for a) comparing detailed metrics of trained ML models b) debugging differences in pre- and postprocessing c) Jira and documentation duties.
I've recently encountered a fascinating problem: yolov4 models perform worse after conversion from darknet to TFLite, ONNX or PyTorch. It's fascinating that the repos are so popular, yet nobody has noticed / solved it. I used the following repos:
https://github.com/Tianxiaomo/pytorch-YOLOv4 https://github.com/hunglc007/tensorflow-yolov4-tflite
They are not fully blind, but activation values ('probabilities') differ, and some detections are missing. First, I thought Non-Maximum Suppression was the reason (it has multiple implementations), then I figured maybe something was different with preprocessing. So I made sure normalization is the same, and I trained yolov4 on square 416x416 images (instead of horizontal rectangles) to reduce the potential for error. But it didn't help.
I think I've solved a major bug (dangling skip-links): Tianxiaomo/pytorch-YOLOv4#370. Submitted a PR: Tianxiaomo/pytorch-YOLOv4#412, but the discrepancy in scores remains.
Btw, here are some examples of why I am hesitant to call activation function values (from softmax or sigmoid) probabilities or confidence scores: https://stats.stackexchange.com/questions/309642/why-is-softmax-output-not-a-good-uncertainty-measure-for-deep-learning-models
I used to "print to pdf" using system dialog to get rid of additional pages of documents easily. Apparently the pdf becomes a picture and you can't copy text from it anymore. Switched to: pdftk ~/Desktop/too-much-pages.pdf cat 3-7 output ~/Desktop/less_pages.pdf
Running darknet binary for yolo predictions from jupyter notebook (or colab, or python in general) without mess in output:
import subprocess
command = " ".join(['./darknet detector test',test_data,cfg_path,best_weights,im_path,'-thresh 0.15'])
process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
print(output.decode())
# Works nicely with image.c modification (1 Apr 2021 post)
Resize images in dir in bash:
sudo apt-get install imagemagick
convert *.png -verbose -resize 640x480 my_dir/*.png
Or even better, so that we keep the same filenames:
cp -rv my_dir my_dir_backup
mogrify *.png -verbose -resize 640x480 my_dir/*.png
Verbose, because sometimes it looks like it failed, while everything was ok.
I use bash (not zsh) and I am not a big fan of fancy plugins or complicated, customized setups. With that being said, I still use guake and fzf. The latter provides fuzzy search in history and better autocompletion, and is surprisingly easy to set up. It uses Levenshtein (edit) distance to find the closest command to the thing you've typed, in your history or file structure. I put it on every AWS instance / devbox / laptop that I use.
Currently I set it up with:
git clone --depth 1 https://github.com/junegunn/fzf.git ~/.fzf
~/.fzf/install
Agree to everything.
source ~/.bashrc
So I don't have to reboot/relogin. Then ctrl + r is enough to start fuzzy searching the history.
If you run detection with darknet (i.e yolov3 or yolov4), it's nice to know activation function values (aka. confidence level, although it's not very precise). For each detection darknet may print it, but it's often hard to match the values with the actual bounding boxes.
You can modify src/image.c file and rerun make all in darknet directory to draw these values on top of the bounding boxes. Just add:
const int best_class = selected_detections[i].best_class;
char prop_str[256];
strcat(labelstr, names[selected_detections[i].best_class]);
sprintf(prop_str, " \n%f", selected_detections[i].det.prob[best_class]);
strcat(labelstr, prop_str);
After line 444, although that might change over time. It should be in draw_detections_v3() even if you use yolov4; it has to appear before the get_label_v3() call but after labelstr is initialized.
Piping in shell/bash sometimes works with a simple |, sometimes not (depending on how the given command/program parses its arguments - a plain pipe passes data on stdin, while xargs turns that stdin into command-line arguments). So simply adding 'xargs' seems to solve most of the problems:
To check if paths in list of files look ok (if first file exists):
!head -1 images_list.txt | xargs du
Sed in pipe to check if annotation file for darknet (named as image, only with .txt extension) exists / what it contains:
!head -1 devel_images_list.txt | sed 's/.png/.txt/' | xargs cat
(sed doesn't need 'xargs' because it reads stdin directly; for the rest I am just pasting 'xargs' all over the place and seeing whether it works or not).
The -u option in cp often helps when you are too lazy to use rsync, but copying takes too long and also got interrupted. Here is what it does: 'copy only when the SOURCE file is newer than the destination file or when the destination file is missing'. I often end up using:
cp -vru old_dir/ new_dir/
Checking the size and other info about an image in the shell: identify color.jpg - it requires some version of imagemagick; I just apt-get installed the first proposed one.
PIL->OpenCV conversion:
from PIL import Image
import numpy
img = Image.open('img.jpg').convert('RGB')
img = numpy.array(img)
img = img[:, :, ::-1].copy() #RGB to BGR
Width height order:
height, width = image.shape[:2] #cv2
width, height = image.size[:2] #PIL
Reading width and height without loading the whole image to memory - just open with PIL and don't display/modify:
image = Image.open(path)
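Then just read the size attribute - PIL is lazy, so only the header gets parsed (a tiny sketch, path is a placeholder):
from PIL import Image
path = "img.jpg"
with Image.open(path) as image:
    width, height = image.size # no pixel data has been decoded at this point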
Replace the first char in each line, in place, in all .txt files in a directory:
sed -i 's/^./new_string/g' *.txt
Use carefully! I've damaged some datasets by mistake with it.
Delete all empty txt files:
find . -name '*.txt' -size 0 -delete
Less with line numbering (-N) and starting from line (+n):
less +29300 -N my_very_long_file_with_stupid_newline_missing_in_the_middle_of_it.txt
Quick partial ls of very big folder (-U suppresses sorting):
ls -U | head -40
Shuffle lines in a text file:
sort -R file.txt > out.txt
In fact it sorts by hashes, so it's semi-random and stable.
Solution to "Argument list too long" error while trying to mv or cp a lot of files in bash: instead of mv * target_dir/
ls | xargs mv -t target_dir/
Replace in file syntax:
sed -i 's/old_dir/new_dir\/subdir_after_escaping_slash/' test.txt
Delete all lines containing string 'my_text' inline:
sed -i '/my_text/d' filename.txt
Prepend dir name to each line (replace ^ which is beginning of a line):
sed -i.bak 's/^/dir_name\//' infile.txt
-i.bak flag makes sed work inplace, but creates backup file with .bak extension.
Quitting a hanging ssh session: ctrl + d; if it doesn't work: press Enter, then ~, then .
Jupyter: stopping execution programmatically:
class StopExecution(Exception):
def _render_traceback_(self):
pass
# ....
raise StopExecution
Darknet/YOLO_v4 - tips: It's surprisingly hard to find the actual architecture of YOLO models; and there is no Tensorboard to browse through the computation graph. It appears that https://netron.app/ and https://github.com/lutzroeder/netron support Darknet's .cfg files and return beautiful, interactive graphs. These graphs help to understand how the 3 [yolo] layers (2 for tiny-yolo) are attached (deep supervision of the model).
Problems:
- The relation between batch/minibatch/subdivision/epoch/iteration/darknet binary outputs/steps/burn_in values/XXXX.weights is counterintuitive to me. Asked questions on stack overflow, ran some experiments; waiting for results.
- I would greatly appreciate it if darknet returned the current learning rate alongside the other metrics - perhaps I will fork and rebuild it.
- I still don't understand the anchors/mask parameters per [yolo] layer instance. Why is only a small portion of the anchors used at all?
My struggle to find a short but accurate description of the YOLO object detection network:
First, we had sliding windows + classification. Then R-CNN used an unsupervised method (selective search) to propose bounding boxes. Fast R-CNN proposed bounding boxes after feature extraction with the backbone network. Faster R-CNN learned a separate neural network to propose regions. The classification network also predicted offsets for the BBs. YOLO introduced a grid, and for each grid cell m bounding boxes (aka anchors) are generated. One network predicts class, offset and width/height adjustment.
Setting DNS records is always painful - tiny details matter, tutorials are often wrong, changes take an undefined time to propagate. Github pages + GoDaddy: first I had to disable redirection (which was hidden deep). Then set:
a @ 185.199.108.153
a @ 185.199.109.153
a @ 185.199.110.153
a @ 185.199.111.153
cname www @
cname www.drapej.com @
A records MUST have '@' as 'name'. _domainconnect can be deleted or kept - it doesn't matter. TTL - I kept the default values.
The repo has to be named huxwell.github.io, (huxwell = my username), Settings->Custom Domain name MUST be www.drapej.com not drapej.com. CNAME file in repo root: www.drapej.com (NOT drapej.com)
Git deleting branches:
git push origin --delete branch_name
git branch -D branch_name
Jupyter: logging model training with Darknet (%%capture will suppress live output, so it's not an option):
import datetime
file_prefix = datetime.datetime.now().strftime("%Y_%m_%d-%H_%M_%S")
!./darknet detector params 2>&1 | tee ../train_logs/{file_prefix}.txt
Neural Networks/ML in general: How to label border cases? Is 15% of a cat's tail a cat? Is a very blurred image of a cat a cat? Should we use soft targets for neural networks, i.e. (0.8, 0.2) for (cat, not cat)? Is such a regression easier or harder to learn?
Asked a question on ai.stackexchange: https://ai.stackexchange.com/questions/26175/how-to-treat-label-and-process-edge-case-inputs-in-machine-learning Received some long replies, but didn't learn anything new from them :( I need to find some papers that directly compare different approaches in a quantitative way.
AWS EC2: If you stop VMs when you don't use them and then restart them, the address changes each time, which is super annoying with ssh. Go to console.aws.amazon.com -> type Elastic IPs in search -> choose the ec2 feature (not the VPC feature) -> Allocate Elastic IP address (use all default options). Checkbox the newly created IP -> Actions -> Associate Elastic IP -> choose your instance from the list. Now you can always scp/sftp/ssh to the newly created ip instead of the ever-changing ec2-12-345-678-901.compute-2.amazonaws.com . According to aws.amazon.com/ec2/pricing/on-demand/ it costs around 12 cents / day if the instance is stopped, AFAIU.
AWS EC2: Extend disk space (EBS): Instances: checkbox the target VM, Storage, click the volume id, Actions, Modify volume. SSH to the machine, df -hT, lsblk to get more info. sudo growpart /dev/xvda 1 (counterintuitive, because lsblk and aws docs suggest 'xvda1 1'). That should be enough, but df and lsblk show different values, so I also used: sudo resize2fs /dev/xvda1
No reboot required!
Jupyter shortcuts: Esc to leave editing mode, arrows to jump between cells
DD - delete cell
L - show line numbers
Enter - back to editing
H - show shortcuts help
Run persistently (more reliable than nohup to me): jupyter notebook --no-browser --port 8080 &> /dev/null & disown
Connect with forwarding:
ssh -L 8080:localhost:8080 some_short_alias
Github pages: often old versions of the files are being served. https://github.com/user/repo/deployments ( https://github.com/huxwell/huxwell.github.io/deployments for me ) is a way to check if your commits have already been deployed.
With ssh (especially for Amazon EC2) it's very useful to make an alias in ~/.ssh/config, i.e:
Host some_short_alias
HostName ec2-12-345-678-901.compute-2.amazonaws.com
IdentityFile ~/.ssh/some_key_path.pem
User my_username
And then always do ssh some_short_alias instead of ssh -i "~/.ssh/some_key_path.pem" ec2-12-345-678-901.compute-2.amazonaws.com . Also, Visual Studio Code adds a record there automatically when you use Remote Explorer -> SSH Targets -> +. Open the file explorer and type sftp://some_short_alias to use a GUI sftp file explorer. Rsync commands sometimes get very messy, so this alias helps a bit: rsync -av --dry-run /home/local_user/dir/ some_short_alias:remote_dir_name
Hello World! (Or rather Hello Self!)