- Automate basic tasks
- Underlies many other open-source languages and applications and can be used to glue them together
- Essential for system administration, remote computing, and high-performance computing
- Many concise special-purpose tools that can make your life easier
- Complements more fully-featured application programming languages
PowerPoint slides: Unix family tree
- Unix-like operating systems share a common architecture and layout
- Roughly compatible, with similar (or identical) shells and tools
- The environment in which most open-source software was written
- Broadly speaking, there is a tension between making computer systems fast and making them easy to use.
- A common solution is to create a two-layer architecture: a fast, somewhat opaque core surrounded by a friendlier scriptable interface (also referred to as “hooks” or an “API”). Examples of this include video games, Emacs and other highly customizable code editors, and high-level special-purpose languages like Stata and Mathematica.
- The Unix shell is the scriptable layer around the operating system. It provides a simple interface for making the operating system do work, without having to know exactly how it accomplishes that work.
The design and terminology of modern computers are based on metaphors from an earlier age.
- Files and folders
- Teletype input and output
- Modern touch devices don’t expose the file system, so you may be less comfortable with navigating directory trees than people whose primary computing devices were desktop computers
PowerPoint slides: “Navigating files and directories”
whoami
- Current working directory
pwd # Print Working Directory
- By default, this is probably your home directory (discuss how to view this in Finder or File Explorer)
- Linux
/home/nelle
- Mac OS
/Users/nelle
- Windows
C:\Users\nelle
- List the contents of the directory
ls # List directory contents
- Command flags modify what a command does
ls -F # show category markers
ls --help # In-line help info; should work in Windows
man ls # Manual for "ls"
- You can navigate through the man page using the space bar and arrow keys
- Quit man with “q”
- Online references are available for Windows users who don’t have man pages: https://linux.die.net/
- When a command is followed by an argument, it acts on that argument.
ls -F Desktop # get contents of folder
ls -F Desktop/shell-lesson-data # get contents of subfolder
- Move down the directory tree
cd Desktop
cd shell-lesson-data
cd exercise-data
- Now that you’re “in” a new location, the context for your commands is different
pwd
ls -F
# This produces an error because the folder is in a different location
# relative to the working directory
cd shell-lesson-data
- Move up the directory tree
`.` is shorthand for “current directory”; `..` is shorthand for “parent directory”
# Show hidden files, including current and parent directories
ls -a
# You can combine flags
ls -Fa
# Move to parent directory
cd ..
- Shortcuts
cd ~ # go to home directory
cd - # go back to previous directory
- An absolute path specifies a location from the root of the file system.
- A relative path specifies a location starting from the current location.
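To make the distinction concrete, here is a minimal sketch that reaches the same directory both ways. The sandbox directory and the `project/data` names are made up for illustration; they are not part of the lesson data.

```shell
# Build a throwaway directory tree (names are hypothetical)
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"

# Absolute path: starts at the root (/), so it works from any location
cd "$sandbox/project/data"
via_absolute=$(pwd -P)

# Relative path: interpreted starting from the current working directory
cd "$sandbox"
cd project/data
via_relative=$(pwd -P)

# Both routes end in the same place
echo "$via_absolute"
echo "$via_relative"
```

The relative form is shorter, but it only works when you start from the right directory.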
- See where we are and what we have
pwd
cd exercise-data/writing # traverse several layers at once
ls -F
- Create a directory
# Make a subdirectory
mkdir thesis
ls -F
# Make multiple directories; create intermediate dirs as required
mkdir -p ../project/data ../project/results
# Show all directory contents recursively
ls -FR ../project
- Create a text file. Note that everything is available through the file browser and the terminal.
cd thesis
nano draft.txt
This is my first draft boop beep boop
- Edit with Notepad / TextEdit, then re-edit with nano.
- Move our file to a new location
cd ~/Desktop/shell-lesson-data/exercise-data/writing
# Rename the file by moving it
mv thesis/draft.txt thesis/quotes.txt
# Verify the new file name
ls thesis
# You can also specify the exact file name
ls thesis/quotes.txt
- Move our file to the current working directory
mv thesis/quotes.txt .
ls thesis/quotes.txt # Not here anymore
ls # now here
- Copy a single file
cp quotes.txt thesis/quotations.txt
ls thesis
ls
# Alternatively
ls quotes.txt thesis/quotations.txt
- Copy a directory recursively
cp -r thesis thesis_backup
ls thesis thesis_backup
- Remove a file
rm quotes.txt
ls quotes.txt
- Remove a file interactively
Deletion is forever!
rm -i thesis_backup/quotations.txt
- Remove a directory and its contents
rm thesis # This gives us an error
rm -ri thesis # Remove recursively
Deletion is forever. Consider making a backup archive as part of your workflow.
- Create an archive with `tar` (“tape archive”).
cd ~/Desktop/shell-lesson-data/exercise-data/
# [c]reate a new archive with the given [f]ilename
tar -cf writing.tar writing/
- Create a compressed (zipped) archive.
# [a]uto-compress the archive based on its file extension
tar -acf writing.zip writing/
# FYI, you may also see
tar -a -cf writing.zip writing/
# FYI, Linux servers frequently use g[z]ip
tar -z -cf writing.tgz writing/
`tar` is an old utility and can be finicky about the order of flags.
- Extract your archive
mv writing writing_backup
# e[x]tract the archive to get the original files back
tar -xf writing.zip
# Compare the old and restored directories
ls writing
ls writing_backup
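The archive-and-restore round trip can be checked mechanically rather than by eye. The following is a minimal sketch in a throwaway directory with a made-up file; it uses a plain uncompressed `.tar` archive and `diff -r` to compare the restored directory with the original.

```shell
# Work in a throwaway directory with a hypothetical file
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir writing
echo "draft text" > writing/draft.txt

# [c]reate the archive, then list its contents with [t]
tar -cf writing.tar writing/
listing=$(tar -tf writing.tar)
echo "$listing"

# Move the original aside and e[x]tract a fresh copy
mv writing writing_backup
tar -xf writing.tar

# diff -r prints nothing and succeeds when the two trees are identical
diff -r writing writing_backup && roundtrip=ok
echo "round trip: $roundtrip"
```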
- There are many useful utilities: https://www.gnu.org/software/coreutils/manual/coreutils.html
- Copy with multiple file names
cd ~/Desktop/shell-lesson-data/exercise-data/
# The destination directory must already exist
mkdir creatures_backup
cp creatures/minotaur.dat creatures/unicorn.dat creatures_backup/
- Copy using globs (“globals”)
You can match a single character with ? or unlimited characters with *. This is an example of shell expansion.
mkdir proteins_backup
# The shell expands *.pdb into the list of all matching files, then does `cp`
cp proteins/*.pdb proteins_backup/
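A small sketch of the difference between `?` and `*`, using made-up file names in a throwaway directory. `echo` is used so you can see what the shell hands to a command after expansion.

```shell
# Throwaway directory with hypothetical files
sandbox=$(mktemp -d)
cd "$sandbox"
touch a1.pdb a2.pdb a10.pdb notes.txt

# * matches any number of characters, so all three .pdb files match
all_pdb=$(echo *.pdb)

# ? matches exactly one character, so a10.pdb is excluded
one_char=$(echo a?.pdb)

echo "$all_pdb"
echo "$one_char"
```

Running `echo` with a glob before running `cp` or `rm` with the same glob is a cheap way to check what will be affected.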
The “Unix Philosophy” is to combine many small tools that do one job into a processing pipeline.
FYI, `.pdb` is the Protein Data Bank format
- Count words in a file using `wc`
cd ~/Desktop/shell-lesson-data/exercise-data/proteins/
ls
# Inspect cubane.pdb
cat cubane.pdb
# [w]ord [c]ount for cubane.pdb
wc cubane.pdb
- Run `wc` for all files
# Run the command with default options
wc *.pdb
wc -l *.pdb # lines
wc -c *.pdb # bytes (same as characters for plain ASCII files)
wc -w *.pdb # words
# Redirect output to file
wc -l *.pdb > lengths.txt
ls lengths.txt
cat lengths.txt # Inspect contents
head -n 1 lengths.txt # Inspect 1st line
less lengths.txt # Inspect with pager
- The `sort` command acts as a filter: it reads the file, sorts its lines, and prints the result without changing the file itself.
sort lengths.txt # alphanumeric sort (i.e. text)
sort -n lengths.txt # numeric sort
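The difference between the two sort modes is easiest to see on a tiny input. This sketch uses three made-up numbers in a temporary file; `paste` is only there to print each result on one line.

```shell
# Three numbers, one per line, in a throwaway file
tmpfile=$(mktemp)
printf '10\n9\n100\n' > "$tmpfile"

# Text sort compares character by character, so "10" and "100" come before "9"
text_sorted=$(sort "$tmpfile" | paste -s -d ' ' -)

# Numeric sort compares the values themselves
numeric_sorted=$(sort -n "$tmpfile" | paste -s -d ' ' -)

echo "$text_sorted"     # 10 100 9
echo "$numeric_sorted"  # 9 10 100
```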
- Send filtered output to new file
sort -n lengths.txt > sorted_lengths.txt
cat sorted_lengths.txt
- (Optional) Append to the end of a file using `>>`
cd ~/Desktop/shell-lesson-data/exercise-data/animal-counts/
# Create new file
head -n 3 animals.csv > animals-subset.csv
# Append to that file
tail -n 2 animals.csv >> animals-subset.csv
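It is worth seeing what happens when `>` is used on a file that already has content. A minimal sketch with a temporary file (the name and contents are made up):

```shell
logfile=$(mktemp)

echo "first"  > "$logfile"    # > creates the file or overwrites it
echo "second" >> "$logfile"   # >> appends to the end
lines_after_append=$(wc -l < "$logfile")

echo "third"  > "$logfile"    # > again: the earlier contents are gone
lines_after_overwrite=$(wc -l < "$logfile")
final_contents=$(cat "$logfile")

echo "after append: $lines_after_append line(s)"
echo "after overwrite: $lines_after_overwrite line(s)"
```

Because `>` overwrites without asking, a mistyped redirect can destroy data just as surely as `rm`.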
Pipe output from one command directly into a second command without creating an intermediate file. This is the cornerstone of Unix workflows.
sort -n lengths.txt | head -n 1
Daisy-chain your commands together. As long as the output of command X is a legitimate input for command Y, it will work.
# Return to the beginning
wc -l *.pdb | sort -n
# Add additional commands
wc -l *.pdb | sort -n | head -n 1
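As a sanity check, the pipeline gives the same answer as the two-step version with an intermediate file. This sketch uses three made-up `.dat` files of different lengths in a throwaway directory.

```shell
# Throwaway directory with hypothetical files of 3, 1, and 2 lines
sandbox=$(mktemp -d)
cd "$sandbox"
printf '1\n2\n3\n' > long.dat
printf '1\n'       > short.dat
printf '1\n2\n'    > medium.dat

# Two steps, with an intermediate file
wc -l *.dat > lengths.txt
shortest_via_file=$(sort -n lengths.txt | head -n 1)

# One pipeline, no intermediate file
shortest_via_pipe=$(wc -l *.dat | sort -n | head -n 1)

echo "$shortest_via_file"
echo "$shortest_via_pipe"
```

Both lines name `short.dat`; the pipeline just skips the bookkeeping.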
- The terminal saves your command history (typically 500 or 1000 commands)
- You can see previous commands using the up/down arrows
- You can edit the command that’s currently visible and run it
- Once your command history gets big, you might want to search it:
history # or `history -1000` in zsh on Mac
history | grep ls # pipe the output of history into search
We should save this stuff and reuse it.
- Create a new script
cd proteins
nano middle.sh
- Edit the script file and save
# Get lines 11-15
head -n 15 octane.pdb | tail -n 5
- Execute the script
bash middle.sh
- Use a special variable to run the script on any file (`$1` expands to the first argument passed to the script; wrapping it in `""` ensures that it works if the filename contains spaces.)
nano middle.sh
# Use the 1st argument as your input.
head -n 15 "$1" | tail -n 5
bash middle.sh octane.pdb
bash middle.sh pentane.pdb
- Use additional ordered arguments
nano middle.sh
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
bash middle.sh pentane.pdb 15 5
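To check that the three-argument script really selects the expected lines, you can run it on a file whose line numbers are its contents. This is a sketch in a throwaway directory; `numbers.txt` is a made-up stand-in for a `.pdb` file, and the script is written with a here-document instead of `nano` so the example is self-contained.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Each line holds its own line number: 1, 2, ..., 20
seq 1 20 > numbers.txt

# Write the script (quoted 'EOF' keeps $1, $2, $3 from expanding here)
cat > middle.sh << 'EOF'
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
EOF

# Take the first 15 lines, then the last 5 of those
result=$(bash middle.sh numbers.txt 15 5 | paste -s -d ' ' -)
echo "$result"   # 11 12 13 14 15
```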
- Use unlimited arguments
nano sorted.sh
# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
bash sorted.sh *.pdb ../creatures/*.dat
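The quotes around `"$@"` matter for the same reason as the quotes around `"$1"`. The sketch below assumes bash and uses a function, which receives arguments the same way a script does; the file names are made up and no files are actually read.

```shell
# A function sees its arguments as $1, $2, ... and their count as $#
count_args() {
    echo "$#"
}

# Pretend this script was called with two filenames, one containing a space
set -- "my notes.txt" "data.csv"

quoted=$(count_args "$@")   # each argument is passed along intact
unquoted=$(count_args $@)   # "my notes.txt" is split at the space

echo "quoted: $quoted, unquoted: $unquoted"   # quoted: 2, unquoted: 3
```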
cd ~/Desktop/shell-lesson-data/exercise-data/animal-counts/
# Get the second column of the CSV
cut -d , -f 2 animals.csv
# Sort the values
cut -d , -f 2 animals.csv | sort
# Get unique values (`uniq` requires values to be adjacent to one another)
cut -d , -f 2 animals.csv | sort | uniq
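The reason for sorting before `uniq` shows up as soon as duplicate values are not next to each other. A minimal sketch with three made-up CSV rows (a stand-in for `animals.csv`, not its real contents):

```shell
# Hypothetical data: date,animal,count
csv=$(mktemp)
printf '%s\n' '2012-11-05,deer,5' '2012-11-05,rabbit,22' '2012-11-06,deer,2' > "$csv"

# Without sort, the two "deer" lines are not adjacent, so both survive
without_sort=$(cut -d , -f 2 "$csv" | uniq | paste -s -d ' ' -)

# With sort, duplicates become adjacent and uniq collapses them
with_sort=$(cut -d , -f 2 "$csv" | sort | uniq | paste -s -d ' ' -)

echo "$without_sort"   # deer rabbit deer
echo "$with_sort"      # deer rabbit
```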
# 1. Run a python script that produces a .csv as output
# 2. Extract the 2nd column of that .csv and get the unique values
python script.py | cut -d , -f 2 | sort | uniq
Don’t repeat yourself.
cd ~/Desktop/shell-lesson-data/exercise-data/creatures/
nano latin.sh
for filename in basilisk.dat minotaur.dat unicorn.dat
do
    # Extract second line of file
    head -n 2 "$filename" | tail -n 1
done
bash latin.sh
nano latin.sh
for filename in *.dat
do
    # Extract second line of file
    head -n 2 "$filename" | tail -n 1
done
bash latin.sh
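A loop's combined output can be redirected like the output of any single command. The sketch below runs the same loop on two made-up `.dat` files (the contents are invented, and one file name deliberately contains a space) and collects the results in one file.

```shell
# Throwaway directory with hypothetical data files
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'name: basilisk\nlatin: basiliscus vulgaris\n' > basilisk.dat
printf 'name: red dragon\nlatin: draco rufus\n'      > 'red dragon.dat'

for filename in *.dat
do
    # Quoting "$filename" keeps 'red dragon.dat' in one piece
    head -n 2 "$filename" | tail -n 1
done > second-lines.txt

line_count=$(wc -l < second-lines.txt)
cat second-lines.txt
```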
- Create a separate directory for your scripts so that you can find them
cd ~/Desktop/shell-lesson-data/exercise-data/
mkdir scripts
cd scripts
nano aggregate.sh
- Write a script that takes arbitrary arguments
for filename in "$@"
do
    echo "$filename"
done
- Run the script against the contents of a different directory
bash aggregate.sh ../proteins/*.pdb
- Do work in the script
nano aggregate.sh
for filename in "$@"
do
    echo "$filename"
    cat "$filename" >> alkanes.pdb
done
bash aggregate.sh ../proteins/*.pdb
# List file in long format to show current permissions
ls -l aggregate.sh
# Change file mode (i.e. permissions)
# User can read/write/execute, Group and Other can read
chmod u=rwx,go=r aggregate.sh
# Show changed permissions
ls -l aggregate.sh
# Invoke script
./aggregate.sh ../proteins/*.pdb
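The first ten characters of `ls -l` output encode the file type and the three permission groups. This sketch, using a made-up script in a throwaway directory, shows how the string changes when the user's execute bit is added.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'echo "hello"\n' > hello.sh

# Start from read/write for the user, read-only for group and other
chmod u=rw,go=r hello.sh
before=$(ls -l hello.sh | cut -c 1-10)

# Add execute permission for the user only
chmod u=rwx,go=r hello.sh
after=$(ls -l hello.sh | cut -c 1-10)

# Layout: file type, then rwx for user, group, other
echo "$before"   # -rw-r--r--
echo "$after"    # -rwxr--r--
```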
cd ~/Desktop/shell-lesson-data/exercise-data/
find .
# List all directories
find . -type d
# List all files
find . -type f
# Do shell expansion, then run command
find . -name *.txt
# Prevent shell expansion and match wildcard
find . -name "*.txt"
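The effect of the quotes is visible when a matching file sits in the current directory. A minimal sketch with two made-up files, one at the top level and one in a subdirectory:

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir sub
touch top.txt sub/nested.txt

# Unquoted: the shell expands *.txt to top.txt first,
# so find only searches for that exact name
unquoted=$(find . -name *.txt | sort | paste -s -d ' ' -)

# Quoted: find receives the pattern itself and matches at every depth
quoted=$(find . -name "*.txt" | sort | paste -s -d ' ' -)

echo "$unquoted"   # ./top.txt
echo "$quoted"     # ./sub/nested.txt ./top.txt
```

If several files in the current directory matched, the unquoted form would pass extra arguments to `find` and typically fail with an error instead.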
Grep is a powerful tool for matching text patterns by using regular expressions. You can find introductory documentation for regular expressions in the References section.
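As a first taste, here is a sketch of two common regular expression anchors, `^` (start of line) and `$` (end of line). The input lines are made up and only loosely resemble a `.pdb` file; `grep -c` prints the number of matching lines.

```shell
sample=$(mktemp)
printf '%s\n' 'COMPND methane' 'ATOM 1 C' 'ATOM 2 H' 'TER 3' > "$sample"

# ^ anchors the pattern to the start of the line
starts_with_atom=$(grep -c '^ATOM' "$sample")

# [0-9] matches any single digit; $ anchors it to the end of the line
ends_with_digit=$(grep -c '[0-9]$' "$sample")

echo "lines starting with ATOM: $starts_with_atom"   # 2
echo "lines ending in a digit: $ends_with_digit"     # 1
```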
Consult the Wooledge Bash Guide (see references below) for more on these topics:
- SSH
- Permissions
- Job control
- Aliases and bash customization
- Shell variables
- Mini-languages (grep, sed, AWK)
- Shell expansion
- Conditional tests
- The Unix Shell: https://swcarpentry.github.io/shell-novice/
- A list of command line utilities: https://ss64.com/bash/
- GNU core utilities: https://www.gnu.org/software/coreutils/manual/coreutils.html
- Bash guide: https://mywiki.wooledge.org/BashGuide
- Shell redirection operators (1): https://www.redhat.com/sysadmin/linux-shell-redirection-pipelining
- Shell redirection operators (2): https://www.gnu.org/software/bash/manual/html_node/Redirections.html
- Grep regular expressions: https://www.gnu.org/software/grep/manual/html_node/Regular-Expressions.html
- Using zsh on MacOS: https://scriptingosx.com/2019/06/moving-to-zsh/