Version Control in Git

1. Introduction

A version is the contents of a file at a given point in time. It also includes metadata, or information associated with the file, such as the author, where it is located, the file type, and when it was last saved
Version control is a group of systems and processes to manage changes made to documents, programs, and directories
Version control isn't just for software. Anything that changes over time or needs to be shared can benefit from using version control
Version control allows us to track files in different states and lets multiple people work on the same files simultaneously, a concept known as continuous development
It also allows us to combine different versions, identify a particular version of a file, and revert changes
One popular program for version control is called Git
Git is open source and scales from small solo projects to complex collaborative efforts with large teams
Note that Git is not the same as GitHub, which is a cloud-based Git repository hosting platform. However, it's common to use Git with GitHub
A key benefit of Git is that it stores every committed version, so work that has been committed is very hard to lose
Also, Git automatically notifies us when our work conflicts with someone else's, so it's harder to accidentally overwrite content
Additionally, Git can synchronize work done by different people on different machines.

Saving Files and Committing

A Git repository consists of 2 parts:
- Files and directories we create/edit
- A directory called .git, which stores all extra information that Git records about the project's history
.git is located in the main directory of the repo
Git expects this information to be laid out in a particular way, so we should not edit or delete .git
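Git workflow: edit and save files, add them to the staging area, then commit
To save a modified file to the staging area: git add filename
To commit the drafts: git commit -m "log message"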
If we are making lots of changes then it's useful to know the status of our repo
We can use the git status command, which tells us which files are in the staging area, and which files have changes that aren't in the staging area yet.
For example, if git status shows that report.md has been modified and is in the staging area, we can make a commit: git commit -m "log message"
Comparing an unstaged file with the last saved version: git diff filename (this shows changes that are not yet in the staging area)
Comparing a staged file with the last commit: git diff --staged filename
HEAD is a shortcut for the most recent commit. git diff -r HEAD filename compares the current version of the report file, staged or not, with the version in the last commit
To compare all staged files with the last commit, we omit the filename in the command: git diff --staged
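The staging and committing steps above can be sketched as a short session. This is only an illustration in a throwaway repo: the file name report.md comes from the notes, while the identity, file contents, and commit messages are made up. It uses git diff --staged for the staged comparison.

```shell
# Create a scratch repo so nothing real is touched
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "# Report" > report.md
git add report.md                          # put the file in the staging area
git commit -q -m "Add report"              # record the staged snapshot

echo "New finding" >> report.md
unstaged=$(git diff report.md)             # working copy vs staging area
git add report.md
staged=$(git diff --staged report.md)      # staging area vs last commit
status=$(git status --short)               # short form of git status
git commit -q -m "Add finding"
```

In the short status format, an M in the first column means the modification is already staged.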

2. Making Changes

Storing data with Git

Git stores data through commits, which have three parts
The first is the commit itself, which contains metadata such as the author, commit message, and time of the commit
The second part is a tree, which tracks the names and locations of the files in the repo when that commit happened
For each file listed in the tree, there is a blob (the third part), which is short for binary large object. A blob may contain data of any kind. Blobs contain a compressed snapshot of the contents of the file when the commit happened
To view commit info: git log
To find a particular commit which could have brought errors: git show followed by the first few characters of the commit hash reported by git log
The output of git show is the log entry for that commit followed by a diff of the changes it introduced
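As a rough sketch of how git log and git show fit together, the session below makes two commits in a scratch repo and then inspects the latest one by its abbreviated hash. The identity, contents, and messages are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "draft" > report.md
git add report.md && git commit -q -m "Add report"
echo "fix" >> report.md
git commit -q -am "Update report"

git log --oneline                          # one line per commit, newest first
short_hash=$(git rev-parse --short HEAD)   # abbreviated hash of the latest commit
shown=$(git show "$short_hash")            # log entry plus the diff for that commit
```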

Viewing changes

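Comparing changes between commits: git diff commit1 commit2, where each commit is a hash or a HEAD shortcut such as HEAD~1 (the second most recent commit), e.g. git diff HEAD~1 HEAD
Using HEAD with git show: git show HEAD~1 displays the second most recent commit
Note that git show is useful for viewing changes made in a particular commit, while git diff compares changes between two commits
To show line-by-line changes and associated metadata: git annotate filename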
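A small sketch of these viewing commands, assuming a scratch repo with two commits to report.md. The HEAD~1 shortcut refers to the commit just before the latest one; the identity and contents are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "draft" > report.md
git add report.md && git commit -q -m "Add report"
echo "fix" >> report.md
git commit -q -am "Update report"

between=$(git diff HEAD~1 HEAD)              # changes between the last two commits
previous=$(git show -s --format=%s HEAD~1)   # subject line of the earlier commit
lines=$(git annotate report.md)              # hash, author, and time for every line
```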

Undoing Changes before Committing

If we accidentally added a file to the staging area (for example, one we are still working on), we can unstage it: git reset HEAD filename
To undo changes in an unstaged file: git checkout -- filename
To undo changes to all unstaged files in the repo: git checkout . (run from the main directory of the repo)
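The undo steps can be tried safely in a scratch repo. The sketch below stages an unfinished edit, unstages it, and then throws the edit away; the file contents are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "draft" > report.md
git add report.md && git commit -q -m "Add report"

echo "half-finished" >> report.md
git add report.md                          # staged by accident
git reset -q HEAD report.md                # unstage; the edit stays in the file
staged_after_reset=$(git diff --staged --name-only)
git checkout -- report.md                  # discard the unstaged edit

echo "scratch" >> report.md
git checkout .                             # discard edits to all tracked files here
content=$(cat report.md)
```

Note that both forms of git checkout shown here overwrite the working copy, so the discarded edits cannot be brought back.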

Restoring and Reverting

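To customize log output (especially to confine output to a few commits): git log -3 shows the three most recent commits, and git log -3 filename limits this to commits that changed one file
We can also customize git log by date as follows: git log --since='2 weeks ago' (an --until option works the same way)
Restoring an old version of a file (Part 1): find the hash of the commit that contains the version we want, e.g. with git log -3 filename
Restoring an old version of a file (Part 2): git checkout commit_hash filename, or use a HEAD shortcut such as git checkout HEAD~1 filename
Restoring a repo to a previous state (restore old versions of all files): git checkout commit_hash (this leaves the repo in a detached HEAD state)
Cleaning a repo: git clean -n lists untracked files, and git clean -f deletes them. Deleted files cannot be recovered, so use it with care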
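A minimal sketch of restoring a file and cleaning a repo, run in a throwaway directory. The file names and contents are made up, and HEAD~1 stands in for whatever commit hash we would normally look up with git log.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "v1" > report.md
git add report.md && git commit -q -m "Add report"
echo "v2" > report.md
git commit -q -am "Rewrite report"

git log -1 --oneline                       # confine the log to the latest commit
git checkout HEAD~1 -- report.md           # restore the file from the previous commit
restored=$(cat report.md)
git commit -q -m "Restore first version of report"

touch scratch.tmp                          # an untracked file
preview=$(git clean -n)                    # list what would be deleted
git clean -f -q                            # delete untracked files for real
```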

3. Git Workflows

Configuring Git

To configure global email: git config --global user.email address
To ignore specific files, we list their names or patterns (such as *.log) in a file called .gitignore in the repo
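The configuration steps can be sketched as follows. To avoid touching real settings, the example points HOME at a temporary directory first; that redirection, the identity, and the *.log pattern are assumptions made purely for the demonstration.

```shell
# Use a throwaway HOME so the real global settings stay untouched (demo only)
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME
git config --global user.name "Example User"
git config --global user.email "user@example.com"
email=$(git config --global user.email)    # read the setting back

repo=$(mktemp -d) && cd "$repo"
git init -q
printf '*.log\n' > .gitignore              # ignore every file ending in .log
touch debug.log notes.md
untracked=$(git status --short)            # debug.log should not be listed
```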

Branches

If we are working locally and not using version control, it's common to create subdirectories to store different versions of files
We'll likely end up with extra files and sub-directories
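Git uses branches to eliminate this problem: each branch is a separate line of development, so changes made on one branch do not affect the others until we merge
To bring the branches back together, we merge them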
Benefits of branches:
- Avoid endless subdirectories
- Allow multiple users to work simultaneously
- Everything is tracked
- Minimizes the risk of conflicting versions
To identify branches in our project: git branch. The branch prefixed with a * is the branch we are currently in
To create a new branch and switch to it: git checkout -b new_branch
To compare branches: git diff branch1 branch2
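A short sketch of creating, listing, and comparing branches. The branch name testing and the file contents are placeholders, and git branch -M main is only there to make sure the first branch is called main regardless of local defaults.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "draft" > report.md
git add report.md && git commit -q -m "Add report"
git branch -M main                         # make sure the first branch is named main

git checkout -q -b testing                 # create a branch and switch to it
echo "idea" >> report.md
git commit -q -am "Try an idea"
branches=$(git branch)                     # the current branch is marked with *
branch_diff=$(git diff main testing)       # what testing has that main lacks
```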

Git Merge Branches

When working on projects, developing across different components is common
This is a key reason why we should switch between branches, as it allows us to keep making progress concurrently
For example, imagine we have some code in use to track the performance of our surveys
We want to test some new ideas, but we don't want to change our existing code until we have confirmed it works. We create a new branch of our repo called testing, and test our new ideas. We can also create a new branch for debugging
To switch branches: git checkout branch_name
After we finish the task handled in the branch, we merge the branch into the main branch (the ground truth of the project, hence it should always be up to date)
We can merge branches by switching to the destination branch and merging the source into it: git checkout destination followed by git merge source
The output of the merge command is:
- The last commit hashes from each of the two branches
- Type of merge, e.g. Fast-forward, meaning additional commits were made on the source branch (summary-statistics in this example) but not on main, so Git simply brings the main branch up to date
- Number of lines added or deleted per file
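The merge described above can be sketched like this. Git merges into whichever branch is currently checked out, so the example switches to main before merging. The branch name summary-statistics follows the notes; everything else is invented.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "Survey results" > report.md
git add report.md && git commit -q -m "Add report"
git branch -M main

git checkout -q -b summary-statistics
echo "mean: 4.2" >> report.md
git commit -q -am "Add summary statistics"

git checkout -q main                            # switch to the destination branch
merge_output=$(git merge summary-statistics)    # merge the source branch into it
echo "$merge_output"
```

Because main had no commits of its own since the branch was created, Git only needs to move main forward, which it reports as a fast-forward.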

Handling Git Conflicts

A conflict occurs when a file in different branches has different contents that prevent them from automatically merging into a single version
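Opening the conflicting file in a text editor such as nano shows conflict markers: lines between <<<<<<< HEAD and ======= come from the current branch, and lines between ======= and >>>>>>> branch_name come from the branch being merged
To resolve the conflict:
- Delete the conflict markers and the lines we do not want, keep the relevant lines, and save the file
- Add the modified file to staging area: git add modified_file
- Commit it to main branch, which concludes the merge: git commit -m "Resolving file conflict"
- Confirm the merge is complete: running git merge updated_branch again while on main should report that main is already up to date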
So, in the case of conflicts, prevention is definitely better than cure
While it's important that we know how to deal with conflicts, the best approach is to lower the chances of conflicts occurring
The ideal approach is to use each branch for a specific task. We should avoid editing the same file in multiple branches.
While it doesn't guarantee we'll avoid creating a conflict, it does reduce the risk
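To see a conflict end to end, the sketch below deliberately edits the same line on two branches, lets the merge stop, and then resolves it. The branch name updated_branch follows the notes; the file contents are made up, and the resolution simply overwrites the file instead of editing it by hand.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "Response rate: unknown" > report.md
git add report.md && git commit -q -m "Add report"
git branch -M main

git checkout -q -b updated_branch
echo "Response rate: 40%" > report.md
git commit -q -am "Update rate on branch"
git checkout -q main
echo "Response rate: 35%" > report.md
git commit -q -am "Update rate on main"

git merge updated_branch || echo "Merge stopped on a conflict"
conflicted=$(git diff --name-only --diff-filter=U)   # files still in conflict

echo "Response rate: 40%" > report.md      # keep the relevant line, drop the markers
git add report.md
git commit -q -m "Resolving file conflict" # this commit concludes the merge
```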

4. Collaborating with Git

Creating repos

To create a new repo:
- Create repo: git init repo_name
- Cd to repo directory: cd repo_name
- Confirm git repo has initialized correctly: git status
To convert an existing project into a repo:
- Convert the directory to a Git repo: git init
- Add files and commit
We should avoid creating a Git repo inside another Git repo, also known as nested repos, because this creates two .git directories
As we then try to make commits, it becomes unclear which of the two repos is recording a given change
Generally, nested repos are not necessary except when working on extremely large and complex data projects
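Both ways of creating a repo can be sketched in a scratch directory. The names repo_name and existing_project are placeholders, and the two repos are created side by side so that neither is nested inside the other.

```shell
workdir=$(mktemp -d) && cd "$workdir"

# A brand-new repo
git init -q repo_name
cd repo_name
git status                                 # confirms the repo was initialized

# An existing project converted into a repo
mkdir ../existing_project && cd ../existing_project
echo "notes" > notes.md                    # stands in for existing project files
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git add . && git commit -q -m "Initial commit"
inside=$(git rev-parse --is-inside-work-tree)
```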

Working with remote repos (remotes)

A remote repo is a repo stored in the cloud through an online repo hosting service such as GitHub
Key benefits to using remotes:
- If our computer breaks down or we lose it, we can use a different computer to access our project from the remote repo as it is backed up there
- Colleagues can collaborate with us regardless of their location
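Cloning a local repo: git clone path_to_repo new_repo_name
Cloning a remote: git clone URL
To identify a remote repo: git remote (git remote -v also shows the URLs)
Specifying name for remote when cloning: git clone -o remote_name URL (without -o, Git names the remote origin)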
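Cloning can be tried without a hosting service by cloning a local repo by its path. In the sketch below, the directory names and the remote name upstream are hypothetical; cloning a hosted repo works the same way with a URL in place of the path.

```shell
workdir=$(mktemp -d) && cd "$workdir"
git init -q project && cd project
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "draft" > report.md
git add report.md && git commit -q -m "Add report"
cd "$workdir"

git clone -q project project_copy                 # clone a local repo by path
git clone -q -o upstream project project_named    # choose the remote's name
default_remote=$(git -C project_copy remote)      # Git's default name
named_remote=$(git -C project_named remote)       # the name we chose
```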

Collaborating on Git Projects

Gathering from a remote

If several people are collaborating on a project then, in practice, they will access the remote, work on files locally, save them, and synchronize their changes between the remote and local repos
This means that the remote repo should be the source of truth for the project, where the latest versions of files that are not drafts can be located
To compare the files in a remote against the contents of a local repo we first need to fetch versions from the remote: git fetch origin main
You can also fetch from a different branch by: git fetch origin branch_name
To synchronize content between the 2 repos: git merge origin/main
Git has simplified the process of fetch and merge into one command: git pull origin main
If we have been working locally and have uncommitted changes to files that also changed on the remote, then Git won't allow us to pull. Let's say we've added a new line to the report but not staged the file or made a commit
If we try to pull from origin then Git tells us that local changes would be overwritten. We are instructed to commit our changes and told that the pull command was aborted
Therefore, it's important to save our work locally before we pull from a remote
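The fetch, merge, and pull steps can be sketched with two local repos, where a directory called shared plays the role of the remote. That stand-in, the identities, and the file contents are assumptions for the demonstration only.

```shell
workdir=$(mktemp -d) && cd "$workdir"
git init -q shared && cd shared            # stands in for the remote repo
git config user.name "Colleague"           # hypothetical identity
git config user.email "colleague@example.com"
echo "v1" > report.md
git add report.md && git commit -q -m "Add report"
git branch -M main
cd "$workdir" && git clone -q shared local

(cd shared && echo "v2" >> report.md && git commit -q -am "Colleague update")
cd local
git config user.name "Example User"
git config user.email "user@example.com"
git fetch -q origin main                   # download commits, leave files alone
incoming=$(git diff main origin/main)      # what the remote has that we lack
git merge -q origin/main                   # bring local main up to date

(cd ../shared && echo "v3" >> report.md && git commit -q -am "Another update")
git pull -q origin main                    # fetch and merge in one step
latest=$(tail -n 1 report.md)
```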

Pushing to a remote

Pushing is the process of bringing our local changes into a remote repo
Git push syntax: git push remote local_branch, e.g. git push origin main
Typical git push/pull workflow:
- We start by pulling the remote into our local repo
- We then work on our project locally, committing changes as we go
- Lastly, we push our updated local repo to the remote
- This workflow is repeated throughout our time working on the project.
Note that remote/local conflicts can arise when we don't start the workflow by pulling from the remote
This can typically occur because while we've been working locally, our colleagues have been pushing their changes to the remote
So, if we don't pull from the remote at the start of the workflow then our local repo won't be synchronized with the remote
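The pull, work, push cycle can be sketched with a bare repo on disk standing in for a hosted remote. The directory names, identity, and contents are hypothetical; with a real hosting service, the path given to git remote add would be a URL.

```shell
workdir=$(mktemp -d) && cd "$workdir"
git init -q --bare remote.git              # stands in for a hosted remote
git init -q local && cd local
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$workdir/remote.git"

echo "v1" > report.md
git add report.md && git commit -q -m "Add report"
git branch -M main
git push -q origin main                    # first push creates main on the remote

git pull -q origin main                    # start each cycle by pulling
echo "v2" >> report.md
git commit -q -am "Update report"          # work locally, committing as we go
git push -q origin main                    # finish by pushing

remote_head=$(git -C "$workdir/remote.git" rev-parse main)
local_head=$(git rev-parse main)
```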
