Version Control with Git

by
Jon Loeliger


1. Introduction



2. Installing Git



3. Getting Started


The Git Command Line


Initially, Git was provided as a suite of many simple commands, such as git-commit.
Now, there is a single git executable, to which you append a subcommand.
Both forms, git commit and git-commit, name the same command.
  • List the most common subcommands
    $ git help

  • List all subcommands
    $ git help --all

Quick Introduction to Using Git


Creating an Initial Repository



$ mkdir public_html
$ cd public_html
$ echo 'My website is alive!' > index.html
$ git init
Initialized empty Git repository in /home/jerry/test/public_html/.git/

The git init command creates a hidden directory, called .git, at the top level of your project. Git places all its revision information in this one top-level .git directory.
Initially, each Git repository is empty.

Adding a File to Your Repository


$ git add index.html
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   index.html

At this point Git has merely staged the file; it is not yet a permanent part of the repository. Staging lets you batch several changes into a single commit: the next commit will include everything that has been staged.

$ git commit -m "Initial contents of public_html"
You can also compose the commit message in a text editor. To set the editor to be used:
  • bash
  • $ export GIT_EDITOR=vi
  • tcsh
  • $ setenv GIT_EDITOR emacs

Configuring the Commit Author


$ git config user.name "Jon Loeliger"
$ git config user.email "jdl@example.com"

Viewing Your Commits

The command git log yields a sequential history of the individual commits within the repository:

$ git log
commit 51cdbfac97d1037ef8926693f1b09a6b85191273 (HEAD -> master)
Author: XXX<xxx@yyy>
Date:   Sat Jun 27 09:59:29 2020 +0800

    Initial contents of public_html

To see more detail about a particular commit, use git show with a commit number:

$ git show 51cdbfac97d1037ef8926693f1b09a6b85191273
commit 51cdbfac97d1037ef8926693f1b09a6b85191273 (HEAD -> master)
Author: XXX<xxx@yyy>
Date:   Sat Jun 27 09:59:29 2020 +0800

    Initial contents of public_html

diff --git a/index.html b/index.html
new file mode 100644
index 0000000..3e23ae4
--- /dev/null
+++ b/index.html
@@ -0,0 +1,2 @@
+
+hello

If you run git show without an explicit commit number, it simply shows the details of the most recent commit. The command git show-branch --more=10 provides concise one-line summaries for the current development branch.

Viewing Commit Differences


$ git log
commit 1cfc8de547a1d4fb5eb411ec8c43dac372df183c (HEAD -> master)
...
commit 51cdbfac97d1037ef8926693f1b09a6b85191273

$ git diff 1cfc8de547a1d4fb5eb411ec8c43dac372df183c 51cdbfac97d1037ef8926693f1b09a6b85191273
diff --git a/index.html b/index.html
index f61eb65..3e23ae4 100644
--- a/index.html
+++ b/index.html
@@ -1,2 +1,2 @@
 
-hello world!
+hello

Removing and Renaming Files in Your Repository

As with an addition, a deletion requires two steps: git rm expresses your intent to remove the file and stages the change, and then git commit realizes the change in the repository. Renaming a file works the same way: git mv, then git commit.
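The two-step pattern for removals and renames can be sketched as follows. This is a minimal illustration in a scratch repository created with mktemp; the file names and the identity passed to git config are placeholders, not part of the original notes.

```shell
set -eu
repo=$(mktemp -d)                          # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "draft" > notes.txt
echo "old" > obsolete.txt
git add notes.txt obsolete.txt
git commit -q -m "Add two files"

# Step 1: stage the rename and the removal in the index.
git mv notes.txt README.txt
git rm -q obsolete.txt
git status --short

# Step 2: commit, so that the repository records both changes.
git commit -q -m "Rename notes.txt and remove obsolete.txt"
tracked=$(git ls-files)
echo "$tracked"
```

After the second commit, only README.txt is tracked, while the earlier versions of both files remain reachable through history.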

Making a Copy of Your Repository

You can create a complete copy, or clone, of a repository using the git clone command.
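As a rough sketch, cloning can be tried locally by pointing git clone at a directory instead of a URL. The directory names original and copy, and the identity values, are hypothetical.

```shell
set -eu
work=$(mktemp -d)                          # scratch directory (assumption)
cd "$work"

git init -q original
cd original
git config user.name "Example User"        # stored only in original/.git/config
git config user.email "user@example.com"
echo 'My website is alive!' > index.html
git add index.html
git commit -q -m "Initial contents"
cd ..

# Clone the repository into a sibling directory.
git clone -q original copy

orig_head=$(git -C original rev-parse HEAD)
copy_head=$(git -C copy rev-parse HEAD)
echo "$orig_head"
echo "$copy_head"

# Repository-specific settings are not copied into the clone's own config file.
copy_name=$(git -C copy config --local --get user.name || true)
echo "user.name in clone: '$copy_name'"
```

Both repositories report the same commit ID for HEAD, because the clone carries the full history of the original.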

Configuration Files

Git supports a hierarchy of configuration files:
  • .git/config
  • Repository-specific configuration settings manipulated with the --file option or by default.
    
    [core]
    	repositoryformatversion = 0
    	filemode = true
    	bare = false
    	logallrefupdates = true
    
  • ~/.gitconfig
  • User-specific configuration settings manipulated with the --global option.
    
    [user]
    	email = xxx@yyy.com
    	name = Bruce Lee
    [core]
    	editor = vi
    [color]
    	ui = auto
    [http]
    	postBuffer = 2428000
    
git config -l can be used to list the settings of all the variables in configuration files:

$ git config -l
user.email=xxx@yyy.com
user.name=Bruce Lee
core.editor=vi
color.ui=auto
http.postbuffer=2428000
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
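The hierarchy can be observed by setting the same variable at two levels. The sketch below redirects HOME to a sandbox so that --global does not touch a real ~/.gitconfig; the sandbox layout and the two names are assumptions made for the illustration.

```shell
set -eu
sandbox=$(mktemp -d)                        # scratch directory (assumption)
export HOME="$sandbox/home"                 # keep --global away from the real ~/.gitconfig
export XDG_CONFIG_HOME="$HOME/.config"
mkdir -p "$HOME"

git init -q "$sandbox/repo"
cd "$sandbox/repo"

git config --global user.name "Global Name" # written to $HOME/.gitconfig
git config user.name "Repo Name"            # written to .git/config

global_value=$(git config --global --get user.name)
effective_value=$(git config --get user.name)
echo "global:    $global_value"
echo "effective: $effective_value"
```

The repository-specific value wins when both files define the same variable.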

4. Basic Git Concepts


Basic Concepts

Repositories

A Git repository is simply a database containing all the information needed to retain and manage the revisions and history of a project. Configuration settings are not propagated from one repository to another during a clone, or duplicating, operation. Git maintains two primary data structures, the object store and the index. All of this repository data is stored at the root of your working directory in a hidden subdirectory named .git.

object store

This contains your original data files and all the log messages, author information, dates, and other information required to rebuild any version or branch of the project. There are only four types of objects in the object store:
  • Blobs
  • Each version of a file is represented as a blob. A blob holds a file’s data but does not contain any metadata about the file or even its name.
  • Trees
  • A tree object represents one level of directory information. It can also recursively reference other (sub)tree objects and thus build a complete hierarchy of files and subdirectories.
  • Commits
  • A commit object holds metadata for each change introduced into the repository. Each commit points to a tree object that captures, in one complete snapshot, the state of the repository at the time the commit was performed.
  • Tags
  • A tag object assigns a human-readable name to a specific object, usually a commit.

Index

The index captures a version of the project’s overall structure at some moment in time.

Content-Addressable Names

Each object in the object store has a unique name produced by an SHA1 hash value of the content. Git users speak of SHA1, hash code, and sometimes object ID interchangeably.

Git Tracks Content

Git’s object store is based on the hashed computation of the contents of its objects. If two separate files located in two different directories have exactly the same content, Git stores a single copy of that content as a blob within the object store.
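A small experiment makes this concrete. In the sketch below, two files with identical content are staged from two different directories (the directory and file names are made up), and the object store ends up with a single blob.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q

mkdir docs backup
echo "hello world" > docs/a.txt
echo "hello world" > backup/b.txt
git add docs/a.txt backup/b.txt

# Column 2 of `git ls-files -s` is the blob ID recorded for each path.
blob_ids=$(git ls-files -s | awk '{print $2}' | sort -u)
echo "$blob_ids"

# Count loose object files, ignoring the pack and info directories.
object_count=$(find .git/objects -type f ! -path '*/pack/*' ! -path '*/info/*' | wc -l | tr -d ' ')
echo "$object_count"
```

Both paths map to the same blob ID, so only one object file is written.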

Pathname Versus Content

Git treats filenames as secondary to content: names are recorded in tree objects, while the content itself is stored and looked up by hash value. This is enough for Git to reproduce the files and directories accurately.

Object Store Pictures

  • The blob object is at the “bottom” of the data structure; it is referenced only by tree objects.
  • Tree objects point to blobs, and possibly to other trees as well. Any given tree object might be pointed at by many different commit objects.
  • A commit points to one particular tree.
  • Each tag can point to at most one commit.
Consider a repository drawn as a graph in which each tree is represented by a triangle and each commit by a circle.
  • After a single, initial commit that added two files
  • Both the master branch and a tag named V1.0 point to the commit with ID 8675309.
  • After adding a new subdirectory with one file in it
  • The new commit has one associated tree object, with ID cafed00d, that represents the total state of the directory and file structure.
Each commit therefore refers to a complete snapshot, but blobs and trees that did not change are shared with the previous commit, so only the new objects are added to the object store.

Git Concepts at Work

Inside the .git directory

Initialize an empty repository

$ mkdir hello
$ cd hello
$ git init
Initialized empty Git repository in /home/jerry/test/git/hello/.git/

$ find .git/objects
.git/objects
.git/objects/pack
.git/objects/info

Create a simple object and stage it:

$ echo "hello world" > hello.txt
$ git add hello.txt

Your objects directory should now contain two additional entries, a subdirectory and the object file inside it:

.git/objects/3b
.git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad

Objects, Hashes, and Blobs

At the core of Git is a simple key-value data store.
You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
The hash in the above case is 3b18e512dba79e4c8300dd08aeb37f8e728b8dad .
Git inserts a / after the first two digits to improve filesystem efficiency: this is an easy way to create a fixed, 256-way partitioning of the namespace for all possible objects with an even distribution. The git cat-file command can provide content or type and size information for repository objects.

$ git cat-file -p 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
hello world
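The key-value behaviour can be tried directly with the plumbing commands git hash-object and git cat-file. This is a minimal sketch in a scratch repository; the variable names are illustrative only.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q

# Insert content from stdin; -w writes the blob into the object store
# and the command prints the key.
key=$(echo "hello world" | git hash-object -w --stdin)
echo "$key"

# The first two hex digits name the directory, the remaining 38 the file.
dir=$(echo "$key" | cut -c1-2)
file=$(echo "$key" | cut -c3-)
ls ".git/objects/$dir/$file"

obj_type=$(git cat-file -t "$key")          # object type
obj_size=$(git cat-file -s "$key")          # size in bytes, including the newline
content=$(git cat-file -p "$key")           # the stored content
echo "$obj_type $obj_size $content"
```

The key is the same hash shown above, since it depends only on the content being stored.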

Files and Trees

Git stores content in a manner similar to a UNIX filesystem.
All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents.
A single tree object contains one or more tree entries, each of which contains a SHA-1 pointer to a blob or subtree with its associated mode, type, and filename.
For example, the most recent tree in a project may look something like this:

$ git cat-file -p master^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859      README
100644 blob 8f94139338f9404f26296befa88755fc2598c289      Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0      lib

$ git cat-file -p 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
100644 blob 47c6340d6459e05787f644c2447d2595f5d3a54b      simplegit.rb

The git “index” is where you place files you want committed to the git repository.
Before you “commit” (checkin) files to the git repository, you need to first place the files in the git “index”.
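git add adds a file to the index. The index is a binary file, usually stored at .git/index, that contains a sorted list of pathnames together with each pathname's permissions and the SHA-1 of its blob object. The git ls-files command displays the contents of the index.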

$ git ls-files -s
100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad  0  hello.txt

-s, --stage: Show staged contents' mode bits (the octal representation of the file permissions), object name, stage number, and filename.
Each time you run commands such as git add, git rm, or git mv, Git updates the index with the new pathname and blob information.
Whenever you want, you can use the git write-tree command to write the staging area out to a tree object.
In real life, you can (and should!) skip the low-level git write-tree and git commit-tree steps and just use the git commit command.
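For the curious, the low-level step looks roughly like this. The sketch stages one file in a scratch repository and writes the index out as a tree object; it is meant as an exploration aid rather than a recommended workflow.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q

echo "hello world" > hello.txt
git add hello.txt

# Write the current index out as a tree object and print its ID.
tree=$(git write-tree)
echo "$tree"

tree_type=$(git cat-file -t "$tree")
listing=$(git cat-file -p "$tree")          # one entry: mode, type, blob ID, name
echo "$tree_type"
echo "$listing"
```

The listing contains one entry that ties the name hello.txt to the blob created by git add.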

Commits

The format for a commit object is simple:
  • the top-level tree for the snapshot of the project at that point
  • the author/committer information (which uses your user.name and user.email configuration settings and a timestamp)
  • a blank line, and then the commit message
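This layout can be inspected by building a commit by hand and printing it. The sketch below uses the plumbing commands git write-tree and git commit-tree in a scratch repository; the identity and the message are placeholders. Because it creates a root commit, no parent line appears in the output.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "hello world" > hello.txt
git add hello.txt
tree=$(git write-tree)

# Create a commit object that points at the tree; the message comes from stdin.
commit=$(echo "Initial commit" | git commit-tree "$tree")

commit_text=$(git cat-file -p "$commit")
echo "$commit_text"

first_line=$(echo "$commit_text" | head -n 1)
last_line=$(echo "$commit_text" | tail -n 1)
```

The printed object starts with the tree line, followed by the author and committer lines, a blank line, and the message.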

Tags

There are two basic tag types:
  • lightweight
  • Lightweight tags are simply references to a commit object. These tags do not create a permanent object in the object store.
  • annotated
  • An annotated tag creates an object.
Git treats both lightweight and annotated tag names equivalently for the purposes of naming a commit.
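The difference between the two tag types shows up when asking Git for the type of the object each name refers to. In this sketch the tag names v1.0-light and v1.0 are hypothetical.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "hello world" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"

git tag v1.0-light                          # lightweight: only a ref to the commit
git tag -a v1.0 -m "Release 1.0"            # annotated: creates a tag object

light_type=$(git cat-file -t v1.0-light)    # commit
annot_type=$(git cat-file -t v1.0)          # tag

# For naming a commit, both tags resolve to the same commit.
light_commit=$(git rev-parse "v1.0-light^{commit}")
annot_commit=$(git rev-parse "v1.0^{commit}")
echo "$light_type $annot_type"
```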

5. File Management and the Index


A commit is a two-step process: stage your changes and commit the changes.
The index is the layer between the working directory and the repository, used to stage, or collect, changes.
When you run git commit, Git checks the index rather than your working directory to discover what to commit.
You can query the state of the index at any time with git status.
git diff displays the changes that remain in your working directory and are not staged;
git diff --cached shows changes that are staged and will therefore contribute to your next commit.
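The two forms of git diff can be compared before and after staging a change. A minimal sketch, assuming a scratch repository and a hypothetical file named file.txt:

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Initial commit"

echo "two" >> file.txt                      # modify the file without staging it

unstaged_before=$(git diff --name-only)            # working directory vs. index
staged_before=$(git diff --cached --name-only)     # index vs. last commit

git add file.txt                            # stage the change

unstaged_after=$(git diff --name-only)
staged_after=$(git diff --cached --name-only)

echo "before add: unstaged='$unstaged_before' staged='$staged_before'"
echo "after add:  unstaged='$unstaged_after' staged='$staged_after'"
```

The change moves from the git diff view to the git diff --cached view once it is staged.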

File Classifications in Git

Git classifies your files into three groups:
  • Tracked
  • A tracked file is any file already in the repository or any file that is staged in the index.
  • Ignored
  • Git maintains a default list of files to ignore, and you can configure your repository to recognize others.
  • Untracked
  • An untracked file is any file that is neither tracked nor ignored.

Using git add

The command git add stages a file.
In terms of Git’s file classifications, if a file is untracked, git add converts that file’s status to tracked. When git add is used on a directory name, all of the files and subdirectories beneath it are staged recursively.
The entirety of each file, at the moment you issue git add, is copied into the object store and indexed by its resulting SHA1 name.
Staging a file is also called “caching a file” or “putting a file in the index.”

Some Notes on Using git commit

Using git commit --all

The -a or --all option to git commit causes it to automatically stage all unstaged, tracked file changes before it performs the commit.
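The effect is easiest to see with one tracked and one untracked file. In this sketch (file names are made up) the modified tracked file is committed without an explicit git add, while the untracked file is left alone.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo "v2" >> tracked.txt                    # change to a tracked file, not staged
echo "new" > untracked.txt                  # brand-new file, never added

git commit -q -a -m "Update tracked file"

# Files touched by the new commit.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
# What is still outstanding in the working directory.
remaining=$(git status --short)
echo "$committed"
echo "$remaining"
```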

Using git rm

Any versions of the file that are part of history already committed in the repository remain in the object store and retain that history.
Git will remove a file only from the index or from the index and working directory simultaneously.
Git will not remove a file from just the working directory; the regular rm command may be used for that purpose.

Using git mv

Suppose you need to move or rename a file.

The .gitignore File



6. Commits


When a commit occurs, Git records a snapshot of the index and places that snapshot in the object store. This snapshot does not contain a copy of every file and directory in the index. Instead, Git creates new blobs for any file that has changed and new trees for any directory that has changed, and it reuses any blob or tree object that has not changed.
A commit is the only method of introducing changes to a repository, and any change in the repository must be introduced by a commit.
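Object reuse can be checked by comparing blob IDs across two commits. The sketch below assumes a scratch repository with two hypothetical files, only one of which changes in the second commit.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "stable" > a.txt
echo "v1" > b.txt
git add a.txt b.txt
git commit -q -m "First commit"

echo "v2" > b.txt                           # only b.txt changes
git commit -q -a -m "Second commit"

# <commit>:<path> names the blob recorded for that path in that commit.
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)

echo "a.txt: $a_old -> $a_new"
echo "b.txt: $b_old -> $b_new"
```

The unchanged file keeps the same blob in both snapshots, while the edited file gets a new one.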

Atomic Changesets

Every Git commit represents a single, atomic changeset with respect to the previous state.

Identifying Commits

The unique, 40-hex-digit SHA1 commit ID is an explicit reference, while HEAD, which always points to the most recent commit, is an implied reference.
Git provides many different mechanisms for naming a commit.

Absolute Commit Names

The hash ID is an absolute name. Each commit ID is globally unique.
Git allows you to shorten this hash ID to a unique prefix within a repository’s object database.

$ git log --oneline
1cfc8de (HEAD -> master) change it twice
51cdbfa Initial contents of public_html

The --oneline option omits the author and date information and abbreviates each hash ID to a short prefix (7 characters here).
To get the log for a commit:

$ git log 51cdbfa
commit 51cdbfac97d1037ef8926693f1b09a6b85191273
Author: XXX <yyy@gmail.com>
Date:   Sat Jun 27 09:59:29 2020 +0800

    Initial contents of public_html
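Abbreviated and full IDs can be converted back and forth with git rev-parse. A minimal sketch in a scratch repository; the variable names are illustrative.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo 'My website is alive!' > index.html
git add index.html
git commit -q -m "Initial contents of public_html"

full=$(git rev-parse HEAD)                  # full commit ID
short=$(git rev-parse --short HEAD)         # unique abbreviated prefix
resolved=$(git rev-parse "$short")          # the prefix expands back to the full ID

echo "$full"
echo "$short"
echo "$resolved"
```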
    

refs and symrefs

A ref is a SHA1 hash ID that refers to an object within the Git object store.
Local topic branch names, remote tracking branch names, and tag names are all refs.

A symbolic reference, or symref, is a name that indirectly points to a Git object. It is still just a ref.
Each symbolic ref has an explicit, full name that begins with refs/ and each is stored hierarchically within the repository in the .git/refs/ directory. There are basically three different namespaces represented in .git/refs/ :


.git/refs
├── heads
│   └── zeus
├── remotes
│   └── origin
│       └── HEAD
└── tags

  • heads
  • for your local branches. For example, a local topic branch named dev is really a short form of refs/heads/dev.
  • remotes
  • for your remote tracking branches. For example, origin/master really names refs/remotes/origin/master.
  • tags
  • for your tags. For example, v2.6.23 is short for refs/tags/v2.6.23.
You can use either a full ref name or its abbreviation.

Git maintains several special symrefs automatically:

  • HEAD
  • HEAD always refers to the most recent commit on the current branch. When you change branches, HEAD is updated to refer to the new branch’s latest commit.
  • ORIG_HEAD
  • Certain operations, such as merge and reset, record the previous value of HEAD in ORIG_HEAD before they change it. You can use ORIG_HEAD to recover or revert to the previous state or to make a comparison.
  • FETCH_HEAD
  • When remote repositories are used, git fetch records the heads of all branches fetched in the file .git/FETCH_HEAD. FETCH_HEAD is a shorthand for the head of the last branch fetched and is only valid immediately after a fetch operation.
  • MERGE_HEAD
  • When a merge is in progress, MERGE_HEAD is the commit ID that is being merged into HEAD.
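Two of these symrefs, HEAD and ORIG_HEAD, can be observed in a small experiment. The sketch below assumes a scratch repository with two commits and uses git reset to trigger the ORIG_HEAD update.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" > file.txt
git commit -q -a -m "second"

# HEAD is a symref: it names the current branch rather than a commit ID.
head_target=$(git symbolic-ref HEAD)
branch=$(git symbolic-ref --short HEAD)
echo "$head_target"

before=$(git rev-parse HEAD)
git reset -q --hard HEAD~1                  # move the branch back one commit
saved=$(git rev-parse ORIG_HEAD)            # the value HEAD had before the reset

git reset -q --hard ORIG_HEAD               # recover the previous state
after=$(git rev-parse HEAD)
```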

Relative Commit Names

Except for the first root commit, each commit is derived from at least one earlier commit.
The direct ancestor commits are called parent commits.
For a commit to have multiple parent commits, it must be the result of a merge operation. As a result, there is a parent commit for each branch contributing to a merge commit.
The ~(tilde) and ^(caret) symbols are used to point to a position relative to a specific commit:
  • The tilde symbol (~) is used to select an earlier generation of ancestor.
  • ~n refers to the n-th generation ancestor, following only first parents.
    Given the commit C, C~1 is the first parent, C~2 is the first grandparent, and C~3 is the first great-grandparent.
  • The caret symbol (^) is used to select a different parent.
  • ^n refers to the n-th parent.
    Given a commit C, C^1 is the first parent, C^2 is the second parent, C^3 is the third parent, and so on.

 HEAD~3 ---> HEAD~2 ---> HEAD~1 ---> HEAD
             HEAD^1~1    HEAD^1        |
                           |           |
                           |           |
             HEAD~1^2 -----+           |
                                       |
                       ---HEAD^2-------+
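The relative names can be tried on a tiny history with one merge. In this sketch the branch name topic, the file names, and the commit messages are all hypothetical; the name of the initial branch is read from the repository rather than assumed.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b topic
echo "topic" > topic.txt
git add topic.txt
git commit -q -m "topic work"

git checkout -q "$main_branch"
echo "main" > main.txt
git add main.txt
git commit -q -m "main work"

# Merge the topic branch; HEAD becomes a commit with two parents.
git merge -q --no-ff -m "Merge topic" topic

first_parent=$(git log -1 --format=%s HEAD^1)   # the branch merged into
second_parent=$(git log -1 --format=%s HEAD^2)  # the branch merged in
grandparent=$(git log -1 --format=%s HEAD~2)    # two first-parent steps back
tilde1=$(git rev-parse HEAD~1)
caret1=$(git rev-parse HEAD^1)

echo "HEAD^1: $first_parent"
echo "HEAD^2: $second_parent"
echo "HEAD~2: $grandparent"
```

HEAD~1 and HEAD^1 name the same commit, and the two notations only diverge at merge commits.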

Using the command git show-branch, you can inspect the graph history and examine a complex branch merge structure:

  
$ git show-branch --more=35 | tail -10
-- [master~15] Merge branch 'maint'
-- [master~3^2^] Merge branch 'maint-1.5.4' into maint
+* [master~3^2^2^] wt-status.h: declare global variables as extern
-- [master~3^2~2] Merge branch 'maint-1.5.4' into maint
-- [master~16] Merge branch 'lt/core-optim'
+* [master~16^2] Optimize symlink/directory detection
+* [master~17] rev-parse --verify: do not output anything on error
+* [master~18] rev-parse: fix using "--default" with "--verify"
+* [master~19] rev-parse: add test script for "--verify"
+* [master~20] Add svn-compatible "blame" output format to git-svn

The output is limited to the final 10 lines.
In this example, a merge took place between master~15 and master~16 that introduced a couple of other merges as well as a simple commit named master~3^2^2^.
A common use of git rev-parse is to print the commit ID for a given revision specifier:
$ git rev-parse master~3^2^2^
32efcd91c6505ae28f87c0e9a3e2b3c0115017d8

Commit History

Viewing Old Commits

git log acts like git log HEAD, printing the log message associated with every commit in your history that is reachable from HEAD.
If you supply a commit for git log, the log starts at the named commit and works backward.
Typically, a limited history is more informative. One technique to constrain history is to specify a commit range:
$ git log master~12..master~10
Here, git log shows the commits after master~12 up to and including master~10, that is, the 11th and 10th prior commits on the master branch.
To print the patch, or changes, introduced by the commit:
$ git log -1 -p 4fe86488
Notice the option -1 as well: it restricts the output to a single commit.

Commit Graphs

Commit Ranges

A range is denoted with a double-period (..), as in start..end, where start and end may be any form of commit name.
For example, the range master~12..master~10 specifies the 11th and 10th prior commits on the master branch; the start commit itself is excluded.
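The exclusion of the start commit is easy to check on a short linear history. A minimal sketch with four hypothetical commits in a scratch repository:

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

# Build a linear history: commit 1, commit 2, commit 3, commit 4 (HEAD).
for n in 1 2 3 4; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done

# HEAD~3 is "commit 1" and HEAD~1 is "commit 3".
# The range lists what is reachable from HEAD~1 but not from HEAD~3.
range=$(git log --format=%s HEAD~3..HEAD~1)
echo "$range"
```

The output lists commit 3 and commit 2, newest first; commit 1, the start of the range, is left out.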

Finding Commits

Using git bisect

git-bisect uses binary search to find the commit that introduced a bug.
To start, you first need to identify a good commit and a bad commit.
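A bisect session can be sketched on a synthetic history in which the fifth of eight commits introduces a "bug". The file names, the bug marker, and the test command are invented for this illustration; git bisect run automates the good/bad answers by using the exit status of the given command.

```shell
set -eu
repo=$(mktemp -d)                           # scratch directory (assumption)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

# Eight commits; status.txt says "ok" until commit 5 turns it into "bug".
for n in 1 2 3 4 5 6 7 8; do
  if [ "$n" -ge 5 ]; then echo "bug" > status.txt; else echo "ok" > status.txt; fi
  echo "$n" > counter.txt
  git add status.txt counter.txt
  git commit -q -m "commit $n"
done

# Bad commit first (HEAD), then a known good commit (HEAD~7, i.e. commit 1).
git bisect start HEAD HEAD~7 > /dev/null

# Exit status 0 marks a commit good, 1 marks it bad.
git bisect run sh -c 'grep -q ok status.txt' > /dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "$first_bad"

git bisect reset > /dev/null 2>&1           # return to the original HEAD
```

With eight commits, the binary search needs only a few test runs to land on commit 5.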

Using git blame

Using Pickaxe

7. Branches


Reasons for Using Branches


8. Diffs



9. Merges



10. Altering Commits




11. Remote Repositories




12. Repository Management




13. Patches




14. Hooks



15. Combining Projects



16. Using Git with Subversion Repositories





