Version Control in Git

1. Introduction

  • A version is the contents of a file at a given point in time. It also includes metadata, or information associated with the file, such as the author, where it is located, the file type, and when it was last saved
  • Version control is a group of systems and processes to manage changes made to documents, programs, and directories
  • Version control isn't just for software. Anything that changes over time or needs to be shared can benefit from using version control
  • Version control allows us to track files in different states and lets multiple people work on the same files simultaneously, a concept known as continuous development
  • It also allows us to combine different versions, identify a particular version of a file, and revert changes
  • One popular program for version control is called Git
  • Git is open source and scalable to easily track everything from small solo projects to complex collaborative efforts with large teams!
  • Note that Git is not the same as GitHub, which is a cloud-based Git repository hosting platform. However, it's common to use Git with GitHub
  • A key benefit of Git is that it stores every version we commit, so committed work is not lost
  • Also, Git automatically notifies us when our work conflicts with someone else's, so it's harder to accidentally overwrite content
  • Additionally, Git can synchronize work done by different people on different machines.

Saving Files and Committing

  • A Git repository consists of 2 parts:
    • Files and directories we create/edit
    • A directory called .git, which stores all extra information that Git records about the project's history
  • .git is located in the main directory of the repo
  • Git expects this information to be laid out in a particular way, so we should not edit or delete .git
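  • Git workflow:
    • Modify a file
    • Save the draft by adding it to the staging area
    • Commit the staged changes, then repeat
  • To save a modified file, add it to the staging area: git add filename
  • To commit the staged drafts with a log message: git commit -m "message"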
  • If we are making lots of changes then it's useful to know the status of our repo
  • We can use the git status command, which tells us which files are in the staging area, and which files have changes that aren't in the staging area yet.
  • For example, if git status shows that report.md has been modified and is in the staging area, we can make a commit: git commit -m "message"
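  • Comparing an unstaged file with the last commit: git diff filename (strictly, this compares the working copy with the staging area, which matches the last commit when nothing is staged for that file)
  • Comparing a staged file with the last commit: git diff -r HEAD filename
  • Adding HEAD, which is a shortcut for the most recent commit, allows us to see the difference between the report file in the staging area and the version in the last commit
  • git diff -r HEAD also includes any unstaged edits; to see only what is staged, use git diff --staged filename
  • To compare all staged files with the last commit, we omit the filename: git diff -r HEAD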
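
The stage, inspect, and commit cycle described above can be tried end to end in a scratch repository. This is a minimal sketch, not the original screenshots: the directory name demo, the file contents, the commit messages, and the user identity are placeholders chosen for illustration.

```shell
set -eu
# Work in a throwaway directory so no real project is touched.
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"

echo "# Report" > report.md
git add report.md                     # save the draft in the staging area
git commit -q -m "Add report"         # commit the staged draft

echo "New finding" >> report.md       # modify the file
git --no-pager diff report.md         # unstaged change vs. the last commit
git add report.md
git status --short                    # "M  report.md": modified and staged
staged=$(git diff --staged report.md) # staged change vs. the last commit
git --no-pager diff -r HEAD report.md # same change, using the HEAD shortcut
git commit -q -m "Add new finding"
commits=$(git rev-list --count HEAD)
echo "commits: $commits"
```

Running git status again after the final commit should report a clean working tree.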

2. Making Changes

Storing data with Git

  • Git stores data through commits, which have three parts
  • The first is the commit itself, which contains metadata such as the author, commit message, and time of the commit
  • The second part is a tree, which tracks the names and locations of the files and directories in the repo when that commit happened
  • The third part is a blob (short for binary large object), one for each file listed in the tree. A blob may contain data of any kind; it holds a compressed snapshot of the file's contents when the commit happened
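  • To view commit info: git log, which lists each commit's hash, author, date, and message, starting with the most recent
  • To inspect a particular commit that may have introduced an error, pass the first few characters of its hash to git show: git show hash
  • The output of git show contains the commit's log entry followed by a diff of the changes made in that commit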
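
To make the commit, tree, and blob structure concrete, the sketch below creates one commit and asks Git what kind of object sits behind each name. It uses git cat-file, a standard inspection command that the notes above do not mention; the repository name, file contents, and identity are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"

echo "draft" > report.md
git add report.md
git commit -q -m "Add report"

git --no-pager log --oneline          # short hash and message per commit
hash=$(git rev-parse --short HEAD)
git --no-pager show "$hash"           # log entry plus the diff it introduced

# Ask Git for the type of each of the three parts.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:report.md)
content=$(git cat-file -p HEAD:report.md)   # the stored file snapshot
echo "$commit_type / $tree_type / $blob_type"
```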

Viewing changes

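  • git diff -r HEAD compares a staged file with the last commit; to compare with an earlier commit, we add a tilde and a number, e.g. git diff -r HEAD~1 for the second most recent commit
  • Using HEAD with git show: git show HEAD~1
  • Note that git show is useful for viewing changes made in a particular commit, while git diff compares changes between two commits: git diff hash1 hash2, or git diff HEAD~1 HEAD
  • To show line-by-line changes and associated metadata: git annotate filename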
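
The difference between git show, git diff, and git annotate is easiest to see on a file with two commits. The following is a small sketch under assumed names (demo, report.md, and the commit messages are placeholders).

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"

echo "line one" > report.md
git add report.md
git commit -q -m "First draft"
echo "line two" >> report.md
git add report.md
git commit -q -m "Second draft"

git --no-pager show HEAD~1            # what the second most recent commit changed
changes=$(git diff HEAD~1 HEAD -- report.md)   # difference between two commits
printf '%s\n' "$changes"
git --no-pager annotate report.md     # hash, author, time, and content per line
annotated=$(git annotate report.md | wc -l | tr -d ' ')
```

Each line of the annotate output starts with the hash of the commit that last changed that line, which helps trace a suspicious line back to a commit.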

Undoing Changes before Committing

  • If we accidentally stage a file that we are still working on, we can remove it from the staging area (unstage it) without losing our edits: git reset HEAD filename
  • To undo changes in an unstaged file: git checkout -- filename
  • To undo changes to all unstaged files in the repo: git checkout .
  • Changes discarded with git checkout cannot be recovered, because they were never committed
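
The three undo operations can be rehearsed safely in a scratch repository. This sketch assumes placeholder file names (report.md, notes.md) and a placeholder identity; newer Git versions also offer git restore for the same tasks.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > report.md
echo "v1" > notes.md
git add .
git commit -q -m "Add report and notes"

echo "unfinished" >> report.md
git add report.md                     # staged by accident
git reset -q HEAD report.md           # unstage; the edit stays in the file
staged=$(git diff --staged --name-only)
git checkout -- report.md             # discard the unstaged edit in one file

echo "scribble" >> report.md
echo "scribble" >> notes.md
git checkout .                        # discard unstaged edits in all files
remaining=$(git status --porcelain)
echo "report.md now contains: $(cat report.md)"
```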

Restoring and Reverting

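  • To customize log output (especially to confine output to a few commits), we pass the number of commits to show, e.g. git log -3 for the three most recent; adding a filename restricts the log to commits that touched that file
  • We can also customize git log by date: git log --since='Month Day Year' (and --until for an end date)
  • Restoring an old version of a file (Part 1): find the hash of the commit that holds the version we want, for example with git log -3 filename
  • Restoring an old version of a file (Part 2): check that version out with git checkout hash filename (or git checkout HEAD~1 filename), then commit the restored file
  • Restoring a repo to a previous state (restore old versions of all files): git checkout hash
  • Cleaning a repo: git clean -n lists untracked files, and git clean -f deletes them. Deleted files cannot be recovered, so it is worth running -n first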
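
A short rehearsal of limiting the log, restoring a file, revisiting an old state, and cleaning untracked files is sketched below. The file names, contents, and commit messages are placeholders, and the date filter "1 week ago" is just one of the formats git log accepts.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > report.md
git add report.md
git commit -q -m "Version 1"
echo "v2" > report.md
git add report.md
git commit -q -m "Version 2"

git --no-pager log -1 --oneline                    # only the latest commit
git --no-pager log --since="1 week ago" --oneline  # filter by date
old=$(git rev-parse HEAD~1)

git checkout "$old" -- report.md      # restore one file from an older commit
restored=$(cat report.md)
git commit -q -m "Restore version 1 of report"

git checkout -q "$old"                # whole repo as it was at that commit
state=$(cat report.md)
git checkout -q main                  # return to the latest state

touch scratch.tmp                     # an untracked file
git clean -n                          # dry run: shows what would be removed
git clean -f                          # actually delete untracked files
```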

3. Git Workflows

Configuring Git

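  • To configure global email: git config --global user.email address (git config --list shows the current settings)
  • To ignore specific files, we list the files or patterns in a .gitignore file, one per line, e.g. *.log to ignore all log files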
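
The sketch below sets a global email and ignores log files. To avoid touching real settings it points HOME at a temporary directory first, which is a precaution for this demo only; the email address, name, and file names are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
export HOME="$workdir"                # sandbox: "global" config stays in the temp dir
unset XDG_CONFIG_HOME                 # make sure Git uses $HOME/.gitconfig
cd "$workdir"

git config --global user.email "user@example.com"
git config --global user.name "Example User"
email=$(git config --global user.email)
git config --global --list            # show all global settings

git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
printf '%s\n' '*.log' > .gitignore    # ignore every file ending in .log
touch debug.log notes.md
visible=$(git status --porcelain)     # debug.log should not be listed
ignored=$(git check-ignore debug.log) # prints the path if it is ignored
printf '%s\n' "$visible"
```

Outside a demo, the export HOME line would be dropped so that the setting lands in the real global configuration.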

Branches

  • If we are working locally and not using version control, it's common to create subdirectories to store different versions of files
  • We'll likely end up with extra files and sub-directories
  • Git uses branches to eliminate this problem
  • To bring the branches back together, we merge them
  • Benefits of branches:
    • Avoid endless subdirectories
    • Allow multiple users to work simultaneously
    • Everything is tracked
    • Minimizes the risk of conflicting versions
  • To identify branches in our project: git branch. The branch prefixed with a * is the branch we are currently in
  • To create a new branch and switch to it: git checkout -b new_branch
  • To compare branches: git diff branch1 branch2
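
Creating, listing, and comparing branches can be tried as follows. This is a sketch with assumed names: the branch testing, the file report.md, and the commit messages are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > report.md
git add report.md
git commit -q -m "Add report"

git checkout -q -b testing            # create a branch and switch to it
echo "experiment" >> report.md
git commit -q -am "Try an experiment"

git --no-pager branch                 # lists main and testing; * marks testing
current=$(git rev-parse --abbrev-ref HEAD)
branch_diff=$(git diff main testing)  # what testing has that main lacks
printf '%s\n' "$branch_diff"
```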

Git Merge Branches

  • When working on projects, developing across different components is common
  • This is a key reason why we should switch between branches, as it allows us to keep making progress concurrently
  • For example, imagine we have some code in use to track the performance of our surveys
  • We want to test some new ideas, but we don't want to change our existing code until we have confirmed it works. We create a new branch of our repo called testing, and test our new ideas. We can also create a new branch for debugging
  • To switch branches: git checkout branch_name
  • After we finish the task handled in the branch, we merge the branch into the main branch, which is the ground truth of the project and should therefore always be up to date
  • git merge merges the named branch into the branch we are currently on, so we first switch to the destination and then name the source: git checkout destination, followed by git merge source
  • The output of the merge command is:
    • Last commit hashes from each branch (2)
    • Type of merge, e.g. Fast-forward, meaning additional commits were made only on the source branch (here, summary-statistics), so Git brings the main branch up to date
    • Number of lines added or deleted per file
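
A fast-forward merge can be reproduced in a few lines. The sketch below borrows the summary-statistics branch name from the notes; the file name, its contents, and the commit messages are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"

echo "survey results" > report.md
git add report.md
git commit -q -m "Add report"

git checkout -q -b summary-statistics # do the task on its own branch
echo "summary section" >> report.md
git commit -q -am "Add summary section"

git checkout -q main                  # switch to the destination branch
git merge summary-statistics          # merge the source branch into it
main_head=$(git rev-parse main)
source_head=$(git rev-parse summary-statistics)
```

Because main had no new commits of its own, Git should simply move main forward to the tip of summary-statistics, and the two hashes end up identical.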

Handling Git Conflicts

  • A conflict occurs when a file in different branches has different contents that prevent them from automatically merging into a single version
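  • Opening the conflicting file in a text editor such as nano shows conflict markers added by Git: the lines between <<<<<<< and ======= come from the branch we are on, and the lines between ======= and >>>>>>> come from the branch being merged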
  • To resolve the conflict:
    • Delete the conflict markers and the lines we don't want, keeping only the relevant content, and save the file
    • Add the modified file to the staging area: git add modified_file
    • Commit it to the main branch: git commit -m "Resolving file conflict"
    • This commit concludes the merge; running git merge updated_branch again from main should report that everything is already up to date
  • So, in the case of conflicts, prevention is definitely better than cure
  • While it's important that we know how to deal with conflicts, the best approach is to lower the chances of conflicts occurring
  • The ideal approach is to use each branch for a specific task. We should avoid editing the same file in multiple branches.
  • While it doesn't guarantee we'll avoid creating a conflict, it does reduce the risk
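
To see a conflict and its resolution without risking real work, the sketch below edits the same line on two branches on purpose. The branch name updated_branch comes from the notes; the file contents and the resolved text are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main demo              # -b needs Git 2.28 or newer
cd demo
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"

echo "Title: draft" > report.md
git add report.md
git commit -q -m "Add report"

git checkout -q -b updated_branch
echo "Title: from branch" > report.md
git commit -q -am "Retitle on branch"

git checkout -q main
echo "Title: from main" > report.md
git commit -q -am "Retitle on main"

git merge updated_branch || true      # fails: both branches changed line 1
cat report.md                         # shows the conflict markers
markers=$(grep -c '^<<<<<<<' report.md)

echo "Title: agreed wording" > report.md   # keep only the content we want
git add report.md
git commit -q -m "Resolving file conflict" # this commit concludes the merge
merged_parent=$(git rev-parse 'HEAD^2')    # second parent of the merge commit
```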

4. Collaborating with Git

Creating repos

  • To create a new repo:
    • Create repo: git init repo_name
    • Cd to repo directory: cd repo_name
    • Confirm git repo has initialized correctly: git status
  • To convert an existing project into a repo:
    • Convert the directory to a Git repo: git init
    • Add files and commit
  • We should avoid creating a Git repo inside another Git repo, also known as nested repos --> this creates 2 .git directories
  • Unfortunately, as we try to make commits, Git will get confused about which directory it needs to update
  • Generally, nested repos are not necessary except when working on extremely large and complex data projects
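
Both ways of creating a repo are sketched below. The directory names repo_name and existing_project, the file notes.md, and the identity are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# New project: create the directory and the repo in one step.
git init -q repo_name
cd repo_name
git status                            # confirms the repo exists; no commits yet
cd "$workdir"

# Existing project: turn the directory into a repo, then add and commit.
mkdir existing_project
echo "notes" > existing_project/notes.md
cd existing_project
git init -q
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"
git add .
git commit -q -m "Initial commit"
inside=$(git rev-parse --is-inside-work-tree)
commits=$(git rev-list --count HEAD)
```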

Working with remote repos (remotes)

  • A remote repo is a repo stored in the cloud through an online repo hosting service such as GitHub
  • Key benefits to using remotes
    • If our computer breaks down or we lose it, we can use a different computer to access our project from the remote repo as it is backed up there
    • Colleagues can collaborate with us regardless of their location
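  • Cloning a local repo: git clone path_to_repo, optionally followed by a name for the copy
  • Cloning a remote: git clone URL
  • To identify a remote repo: git remote (add -v to also see its URLs)
  • When cloning, Git names the remote origin by default. To specify a different name, pass -o name to git clone, or add a named remote afterwards with git remote add name URL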
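
Cloning can be practised locally, since git clone accepts a path as well as a URL. In this sketch the repo names (project, project_copy, named_copy) and the remote name upstream are placeholders; cloning from a hosting service works the same way with the URL in place of the path.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
git init -q -b main project           # -b needs Git 2.28 or newer
cd project
git config user.name "Example User"   # placeholder identity for the demo
git config user.email "user@example.com"
echo "v1" > report.md
git add report.md
git commit -q -m "Add report"
cd "$workdir"

git clone -q project project_copy     # clone a local repo by path
cd project_copy
git remote -v                         # remote name and the location it points to
default_name=$(git remote)
cd "$workdir"

git clone -q -o upstream project named_copy   # choose the remote's name
custom_name=$(git -C named_copy remote)
echo "$default_name / $custom_name"
```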

Collaborating on Git Projects

Gathering from a remote

  • If several people are collaborating on a project then, in practice, they will access the remote, work on files locally, save them, and synchronize their changes between the remote and local repos
  • This means that the remote repo should be the source of truth for the project, where the latest versions of files that are not drafts can be located
  • To compare the files in a remote against the contents of a local repo we first need to fetch versions from the remote: git fetch origin main. Fetching downloads the remote's commits but does not change our local files
  • You can also fetch from a different branch by: git fetch origin branch_name
  • To synchronize content between the 2 repos, we merge the fetched remote branch into the local branch: git merge origin/main
  • Git has simplified the process of fetch and merge into one command: git pull origin main
  • If we have been working locally and not yet committed our changes, then Git won't allow us to pull from a remote. Let's say we've added a new line to the report but not staged the file or made a commit
  • If we try to pull from origin then Git tells us that local changes would be overwritten. We are instructed to commit our changes and told that the pull command was aborted
  • Therefore, it's important to save our work locally before we pull from a remote

Pushing to a remote

  • Pushing is the process of bringing our local changes into a remote repo
  • Git push syntax: git push remote local_branch, e.g. git push origin main
  • Typical git push/pull workflow:
    • We start by pulling the remote into our local repo
    • We then work on our project locally, committing changes as we go
    • Lastly, we push our updated local repo to the remote
    • This workflow is repeated throughout our time working on the project.
  • Note that remote/local conflicts can arise when we don't start the workflow by pulling from the remote
  • This can typically occur because while we've been working locally, our colleagues have been pushing their changes to the remote
  • So, if we don't pull from the remote at the start of the workflow then our local repo won't be synchronized with the remote
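
The pull, commit, push cycle, and what happens when the pull is skipped, can be simulated with a bare repository standing in for the remote. All names are assumptions for the demo: the helper identify, the directories seed, remote.git, ours, and colleague, and the file contents. The flags --no-rebase and --no-edit ask git pull to merge and to accept the default merge message.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical helper: give a repo a placeholder identity so it can commit.
identify() {
  git -C "$1" config user.name "Example User"
  git -C "$1" config user.email "user@example.com"
}

git init -q -b main seed              # -b needs Git 2.28 or newer
identify seed
echo "v1" > seed/report.md
git -C seed add report.md
git -C seed commit -q -m "Add report"
git clone -q --bare seed remote.git   # stands in for the hosted remote
git clone -q remote.git ours
git clone -q remote.git colleague
identify ours
identify colleague

cd ours
git pull -q origin main               # 1. start by pulling
echo "our line" >> report.md
git commit -q -am "Add our line"      # 2. work locally, committing as we go
git push -q origin main               # 3. push to the remote

# Meanwhile a colleague follows the same workflow and pushes a new file.
git -C ../colleague pull -q origin main
echo "notes" > ../colleague/notes.md
git -C ../colleague add notes.md
git -C ../colleague commit -q -m "Add notes"
git -C ../colleague push -q origin main

# We keep working without pulling first, so our push is rejected.
echo "more of our work" >> report.md
git commit -q -am "Add more work"
if git push -q origin main 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected                 # the remote has commits we lack
fi

git pull -q --no-rebase --no-edit origin main   # bring in the colleague's work
git push -q origin main                         # now the push goes through
remote_head=$(git -C ../remote.git rev-parse main)
echo "first push was $first_push"
```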