Version Control with Git
by Jon Loeliger
1. Introduction
2. Installing Git
3. Getting Started
The Git Command Line
Initially, Git was provided as a suite of many simple commands, such as git-commit.
Now there is a single git executable, to which you affix a subcommand.
Both forms, git commit and git-commit, name the same command, although recent Git installations typically no longer put the dashed forms on the PATH.
- List the most commonly used subcommands:
$ git help
- List all available subcommands:
$ git help --all
Quick Introduction to Using Git
Creating an Initial Repository
$ mkdir public_html
$ cd public_html
$ echo 'My website is alive!' > index.html
$ git init
Initialized empty Git repository in /home/jerry/test/public_html/.git/
The git init command creates a hidden directory, called .git, at the top level of your project. Git places all its revision information in this one top-level .git directory.
Initially, each Git repository is empty.
Adding a File to Your Repository
$ git add index.html
$ git status
On branch master
No commits yet
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file: index.html
Git has merely staged the file; the change is not yet permanent in the repository. Staging lets changes be “batched”: the next commit will include everything staged so far.
$ git commit -m "Initial contents of public_html"
You can also compose the commit message in a text editor. To set the editor to be used:
- bash $ export GIT_EDITOR=vi
- tcsh $ setenv GIT_EDITOR emacs
Configuring the Commit Author
$ git config user.name "Jon Loeliger"
$ git config user.email "jdl@example.com"
Viewing Your Commits
The command git log yields a sequential history of the individual commits within the repository:
$ git log
commit 51cdbfac97d1037ef8926693f1b09a6b85191273 (HEAD -> master)
Author: XXX<xxx@yyy>
Date: Sat Jun 27 09:59:29 2020 +0800

    Initial contents of public_html
To see more detail about a particular commit, use git show with a commit number:
$ git show 51cdbfac97d1037ef8926693f1b09a6b85191273
commit 51cdbfac97d1037ef8926693f1b09a6b85191273 (HEAD -> master)
Author: XXX<xxx@yyy>
Date: Sat Jun 27 09:59:29 2020 +0800

    Initial contents of public_html

diff --git a/index.html b/index.html
new file mode 100644
index 0000000..3e23ae4
--- /dev/null
+++ b/index.html
@@ -0,0 +1,2 @@
+
+hello
If you run git show without an explicit commit number, it simply shows the details of the most recent commit. The command git show-branch --more=10 provides concise one-line summaries for the current development branch.
Viewing Commit Differences
$ git log
commit 1cfc8de547a1d4fb5eb411ec8c43dac372df183c (HEAD -> master)
...
commit 51cdbfac97d1037ef8926693f1b09a6b85191273
$ git diff 1cfc8de547a1d4fb5eb411ec8c43dac372df183c 51cdbfac97d1037ef8926693f1b09a6b85191273
diff --git a/index.html b/index.html
index f61eb65..3e23ae4 100644
--- a/index.html
+++ b/index.html
@@ -1,2 +1,2 @@
-hello world!
+hello
Removing and Renaming Files in Your Repository
As with an addition, a deletion requires two steps: git rm expresses your intent to remove the file and stages the change, and then git commit realizes the change in the repository. Renaming a file works the same way: git mv, then git commit.
Making a Copy of Your Repository
You can create a complete copy, or clone, of a repository using the git clone command.
Configuration Files
Git supports a hierarchy of configuration files:
- .git/config Repository-specific configuration settings manipulated with the --file option or by default.
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
- ~/.gitconfig User-specific configuration settings manipulated with the --global option.
[user]
    email = xxx@yyy.com
    name = Bruce Lee
[core]
    editor = vi
[color]
    ui = auto
[http]
    postBuffer = 2428000
git config -l can be used to list the settings of all the variables in the configuration files:
$ git config -l
user.email=xxx@yyy.com
user.name=Bruce Lee
core.editor=vi
color.ui=auto
http.postbuffer=2428000
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
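The repository-level file can be exercised directly. The following is a minimal sketch, assuming a throwaway repository created with mktemp; the name and email values are placeholders, not settings from the original notes.

```shell
# Scratch repository; mktemp keeps the sketch away from real projects.
repo=$(mktemp -d)
cd "$repo"
git init -q

# With no --global or --system flag, git config writes to .git/config.
git config user.name "Example User"        # placeholder value
git config user.email "user@example.com"   # placeholder value

# --local restricts the lookup to the repository-level file.
local_name=$(git config --local --get user.name)
local_email=$(git config --local --get user.email)
echo "$local_name <$local_email>"

# The same settings appear as plain text in the file itself.
grep -A2 '\[user\]' .git/config
```

Values set this way take precedence over the same keys in the user-level file, which is why git config -l lists the repository settings last.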
4. Basic Git Concepts
Basic Concepts
Repositories
A Git repository is simply a database containing all the information needed to retain and manage the revisions and history of a project. Configuration settings are not propagated from one repository to another during a clone, or duplicating, operation. Git maintains two primary data structures, the object store and the index. All of this repository data is stored at the root of your working directory in a hidden subdirectory named .git.
Object Store
This contains your original data files and all the log messages, author information, dates, and other information required to rebuild any version or branch of the project. There are only four types of objects in the object store:
- Blobs Each version of a file is represented as a blob. A blob holds a file’s data but does not contain any metadata about the file or even its name.
- Trees A tree object represents one level of directory information. It can also recursively reference other (sub)tree objects and thus build a complete hierarchy of files and subdirectories.
- Commits A commit object holds metadata for each change introduced into the repository. Each commit points to a tree object that captures, in one complete snapshot, the state of the repository at the time the commit was performed.
- Tags A tag object assigns a human-readable name to a specific object, usually a commit.
Index
The index captures a version of the project’s overall structure at some moment in time.
Content-Addressable Names
Each object in the object store has a unique name produced by applying SHA1 to the contents of the object. Git users speak of SHA1, hash code, and sometimes object ID interchangeably.
Git Tracks Content
Git’s object store is based on the hashed computation of the contents of its objects. If two separate files located in two different directories have exactly the same content, Git stores a single copy of that content as a blob within the object store.
Pathname Versus Content
Git treats a filename as secondary to content: what it guarantees is that it can accurately reproduce the content of files and directories, which is indexed by hash value.
Object Store Pictures
- The blob object is at the “bottom” of the data structure; it is referenced only by tree objects.
- Tree objects point to blobs, and possibly to other trees as well. Any given tree object might be pointed at by many different commit objects.
- A commit points to one particular tree.
- Each tag can point to at most one commit.
- After a single, initial commit that added two files: both the master branch and a tag named V1.0 point to the commit with ID 8675309.
- After adding a new subdirectory with one file in it: the new commit has added one associated tree object with ID cafed00d to represent the total state of directory and file structure.
Git Concepts at Work
Inside the .git directory
Initialize an empty repository:
$ mkdir hello
$ cd hello
$ git init
Initialized empty Git repository in /home/jerry/test/git/hello/.git/
$ find .git/objects
.git/objects
.git/objects/pack
.git/objects/info
Create a simple object and stage it:
$ echo "hello world" > hello.txt
$ git add hello.txt
Your objects directory should then contain two additional entries:
.git/objects/3b
.git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
Objects, Hashes, and Blobs
At the core of Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
The hash in the above case is 3b18e512dba79e4c8300dd08aeb37f8e728b8dad.
Git inserts a / after the first two hex digits to improve filesystem efficiency: this is an easy way to create a fixed, 256-way partitioning of the namespace for all possible objects with an even distribution. The git cat-file command can provide the content, type, or size of repository objects.
$ git cat-file -p 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
hello world
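The key for a blob can be reproduced by hand. This is a minimal sketch, assuming sha1sum (GNU coreutils) is installed, with shasum tried as a fallback; it relies on Git hashing a short header, "blob <size>" followed by a NUL byte, ahead of the content.

```shell
# "hello world" plus its trailing newline is 12 bytes, so the header is "blob 12\0".
# sha1sum is an assumption about the environment; shasum is the fallback.
if command -v sha1sum >/dev/null 2>&1; then
  manual_id=$(printf 'blob 12\0hello world\n' | sha1sum | cut -d' ' -f1)
else
  manual_id=$(printf 'blob 12\0hello world\n' | shasum -a 1 | cut -d' ' -f1)
fi

# git hash-object computes the same ID without writing an object.
git_id=$(echo "hello world" | git hash-object --stdin)

echo "manual: $manual_id"
echo "git:    $git_id"
```

Both values match the object name seen under .git/objects above, which is why any file with the same bytes, whatever its name or directory, maps to the same blob.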
Files and Trees
Git stores content in a manner similar to a UNIX filesystem. All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents.
A single tree object contains one or more tree entries, each of which contains a SHA-1 pointer to a blob or subtree with its associated mode, type, and filename.
For example, the most recent tree in a project may look something like this:
$ git cat-file -p master^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859 README
100644 blob 8f94139338f9404f26296befa88755fc2598c289 Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0 lib
$ git cat-file -p 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
100644 blob 47c6340d6459e05787f644c2447d2595f5d3a54b simplegit.rb
The Git “index” is where you place files you want committed to the Git repository.
Before you “commit” (check in) files to the Git repository, you need to first place the files in the Git “index”.
git add adds files to the index. The index is a binary file, normally stored at .git/index, that contains a sorted list of pathnames, each with its permissions and the SHA-1 of a blob object. The git ls-files command displays the contents of the index.
$ git ls-files -s
100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 0 hello.txt
-s, --stage: Show the staged contents' mode bits (the octal representation of the file permissions), object name, stage number, and filename.
Each time you run commands such as git add , git rm , or git mv , Git updates the index with the new pathname and blob information.
Whenever you want, you can use the git write-tree command to write the staging area out to a tree object.
In real life, you can (and should!) skip the low-level git write-tree and git commit-tree steps and just use the git commit command.
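For illustration only, the low-level steps can be walked through once by hand. This is a sketch in a scratch repository; the identity and the commit message are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "hello world" > hello.txt
git add hello.txt

# Write the current index out as a tree object.
tree_id=$(git write-tree)

# Wrap that tree in a commit object; the message is read from stdin.
commit_id=$(echo "Commit a file that says hello" | git commit-tree "$tree_id")

tree_type=$(git cat-file -t "$tree_id")
commit_type=$(git cat-file -t "$commit_id")
echo "$tree_type   $tree_id"
echo "$commit_type $commit_id"

# The first line of the commit object names the tree it snapshots.
commit_first_line=$(git cat-file -p "$commit_id" | head -n 1)
echo "$commit_first_line"
```

Note that git commit-tree only creates the object; unlike git commit, it does not move any branch to point at the new commit.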
Commits
The format for a commit object is simple:
- the top-level tree for the snapshot of the project at that point
- the parent commits, if any
- the author/committer information (which uses your user.name and user.email configuration settings and a timestamp)
- a blank line, and then the commit message
Tags
There are two basic tag types:
- lightweight Lightweight tags are simply references to a commit object. These tags do not create a permanent object in the object store.
- annotated An annotated tag creates an object.
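The difference between the two tag types shows up in the type of object each name resolves to. A small sketch, assuming a scratch repository and hypothetical tag names v1.0-light and v1.0:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello world" > hello.txt
git add hello.txt
git commit -q -m "Initial commit"

git tag v1.0-light                    # lightweight: only a ref to the commit
git tag -a v1.0 -m "Release 1.0"      # annotated: creates a tag object

light_type=$(git cat-file -t v1.0-light)
annotated_type=$(git cat-file -t v1.0)
echo "v1.0-light resolves to a $light_type"
echo "v1.0       resolves to a $annotated_type"

# Peeling the annotated tag with ^{commit} reaches the tagged commit.
peeled=$(git rev-parse 'v1.0^{commit}')
head_id=$(git rev-parse HEAD)
```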
5. File Management and the Index
A commit is a two-step process: stage your changes and commit the changes.
The index is a layer between the working directory and the repository in which you stage, or collect, changes.
When you run git commit , Git checks the index rather than your working directory to discover what to commit.
You can query the state of the index at any time with git status .
git diff displays the changes that remain in your working directory and are not staged;
git diff --cached shows changes that are staged and will therefore contribute to your next commit.
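The two views can be compared side by side. A minimal sketch, assuming a scratch repository and two hypothetical files, one of which is staged after editing and one of which is not:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "one" > staged.txt
echo "one" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Initial commit"

# Edit both files, but stage only one of them.
echo "two" >> staged.txt
echo "two" >> unstaged.txt
git add staged.txt

git status --short

# Working directory versus index: what is not yet staged.
working_changes=$(git diff --name-only)
# Index versus the last commit: what the next commit will contain.
staged_changes=$(git diff --cached --name-only)
echo "not staged: $working_changes"
echo "staged:     $staged_changes"
```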
File Classifications in Git
Git classifies your files into three groups:
- Tracked A tracked file is any file already in the repository or any file that is staged in the index.
- Ignored Git maintains a default list of files to ignore, and you can configure your repository to recognize others.
- Untracked An untracked file is any file not found in either of the previous two categories.
Using git add
The command git add stages a file. In terms of Git’s file classifications, if a file is untracked, git add converts that file’s status to tracked. When git add is used on a directory name, all of the files and subdirectories beneath it are staged recursively.
The entirety of each file, at the moment you issued git add , was copied into the object store and indexed by its resulting SHA1 name.
Staging a file is also called “caching a file” or “putting a file in the index.”
Some Notes on Using git commit
Using git commit --all
The -a or --all option to git commit causes it to automatically stage all unstaged, tracked file changes before it performs the commit.
Using git rm
Any versions of the file that are part of history already committed in the repository remain in the object store and retain that history. Git will remove a file only from the index or from the index and working directory simultaneously.
Git will not remove a file from just the working directory; the regular rm command may be used for that purpose.
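Both removal modes can be tried in a scratch repository. In this sketch the filenames notes.txt and keep.txt are hypothetical:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "scratch" > notes.txt
echo "data" > keep.txt
git add notes.txt keep.txt
git commit -q -m "Add two files"

# --cached removes the file from the index only; the working copy stays.
git rm -q --cached notes.txt

# Without --cached, the file leaves the index and the working directory.
git rm -q keep.txt

tracked=$(git ls-files)
git status --short
```

Afterward notes.txt is still on disk but untracked, keep.txt is gone from disk, and both deletions are staged for the next commit.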
Using git mv
Suppose you need to move or rename a file. As with removal, this takes two steps: git mv stages the rename, and git commit records it.
The .gitignore File
6. Commits
When a commit occurs, Git records a snapshot of the index and places that snapshot in the object store. This snapshot does not contain a copy of every file and directory in the index. Instead, Git creates new blobs for any file that has changed and new trees for any directory that has changed, and it reuses any blob or tree object that has not changed.
A commit is the only method of introducing changes to a repository, and any change in the repository must be introduced by a commit.
Atomic Changesets
Every Git commit represents a single, atomic changeset with respect to the previous state.
Identifying Commits
The unique, 40-hex-digit SHA1 commit ID is an explicit reference, while HEAD, which always points to the most recent commit, is an implied reference. Git provides many different mechanisms for naming a commit.
Absolute Commit Names
The hash ID is an absolute name. Each commit ID is globally unique. Git allows you to shorten this hash ID to a unique prefix within a repository’s object database.
$ git log --oneline
1cfc8de (HEAD -> master) change it twice
51cdbfa Initial contents of public_html
With --oneline, Git omits the author and date information and abbreviates each hash ID, here to its first 7 characters.
To get the log for a commit:
$ git log 51cdbfa
commit 51cdbfac97d1037ef8926693f1b09a6b85191273
Author: XXX <yyy@gmail.com>
Date: Sat Jun 27 09:59:29 2020 +0800

    Initial contents of public_html
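To check that an abbreviated ID and the full ID name the same commit, git rev-parse can translate between them. A small sketch in a scratch repository, with a placeholder identity:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > index.html
git add index.html
git commit -q -m "Initial contents"

full_id=$(git rev-parse HEAD)              # full 40-hex-digit ID
short_id=$(git rev-parse --short HEAD)     # unique abbreviated prefix
resolved=$(git rev-parse "$short_id")      # prefix expanded back to the full ID

echo "full:     $full_id"
echo "short:    $short_id"
echo "resolved: $resolved"
```

The length of the abbreviation is not fixed: Git lengthens it as needed to keep the prefix unique within the repository.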
refs and symrefs
A ref is a SHA1 hash ID that refers to an object within the Git object store. Local topic branch names, remote tracking branch names, and tag names are all refs.
A symbolic reference, or symref, is a name that indirectly points to a Git object. It is still just a ref.
Each symbolic ref has an explicit, full name that begins with refs/ and each is stored hierarchically within the repository in the .git/refs/ directory.
There are basically three different namespaces represented in .git/refs/ :
.git/refs
├── heads
│   └── zeus
├── remotes
│   └── origin
│       └── HEAD
└── tags
- heads for your local branches. For example, a local topic branch named dev is really a short form of refs/heads/dev.
- remotes for your remote tracking branches. For example, origin/master really names refs/remotes/origin/master.
- tags for your tags. For example, v2.6.23 is short for refs/tags/v2.6.23.
Git maintains several special symrefs automatically:
- HEAD HEAD always refers to the most recent commit on the current branch. When you change branches, HEAD is updated to refer to the new branch’s latest commit.
- ORIG_HEAD Before certain operations, such as merge and reset, Git records the previous version of HEAD in ORIG_HEAD. You can use ORIG_HEAD to recover or revert to the previous state or to make a comparison.
- FETCH_HEAD When remote repositories are used, git fetch records the heads of all branches fetched in the file .git/FETCH_HEAD. FETCH_HEAD is a shorthand for the head of the last branch fetched and is only valid immediately after a fetch operation.
- MERGE_HEAD When a merge is in progress, MERGE_HEAD is the commit ID that is being merged into HEAD .
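Two of these names are easy to observe in a scratch repository. The sketch below, with a placeholder identity, prints what HEAD points at and then checks that a reset leaves the old tip in ORIG_HEAD:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "second" > file.txt
git commit -q -a -m "Second commit"

# HEAD is a symref: it names the current branch rather than a commit ID.
head_ref=$(git symbolic-ref HEAD)
echo "HEAD -> $head_ref"

before=$(git rev-parse HEAD)

# reset moves the branch back one commit and saves the old tip in ORIG_HEAD.
git reset -q --hard HEAD~1

orig=$(git rev-parse ORIG_HEAD)
after=$(git rev-parse HEAD)
echo "ORIG_HEAD: $orig"
echo "HEAD now:  $after"
```

Running git reset --hard ORIG_HEAD at this point would restore the second commit as the branch tip.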
Relative Commit Names
Except for the first root commit, each commit is derived from at least one earlier commit. The direct ancestor commits are called parent commits.
For a commit to have multiple parent commits, it must be the result of a merge operation. As a result, there is a parent commit for each branch contributing to a merge commit.
The ~(tilde) and ^(caret) symbols are used to point to a position relative to a specific commit:
- The tilde symbol (~) is used to go back through earlier generations: ~n refers to the n-th generation ancestor, following first parents.
- The caret symbol (^) is used to select a different parent: ^n refers to the n-th parent.
Given the commit C , C~1 is the first parent, C~2 is the first grandparent, and C~3 is the first great-grandparent.
Given a commit, C , C^1 is the first parent, C^2 is the second parent, C^3 is the third parent, and so on.
HEAD~3 ---> HEAD~2 ---> HEAD~1 ---> HEAD
            HEAD^1~1    HEAD^1
                           |          |
            HEAD~1^2 ------+          |
                                      |
            HEAD^2 -------------------+
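The notation can be verified on a small merge. This sketch builds a hypothetical history in a scratch repository: a root commit A, a commit B on a branch named side, a commit C on the original branch, and a merge M of the two.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo a > a.txt; git add a.txt; git commit -q -m "A: root commit"
root_id=$(git rev-parse HEAD)
main_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on setup

git checkout -q -b side
echo b > b.txt; git add b.txt; git commit -q -m "B: work on side"
side_tip=$(git rev-parse HEAD)

git checkout -q "$main_branch"
echo c > c.txt; git add c.txt; git commit -q -m "C: work on the main line"
main_tip=$(git rev-parse HEAD)

# --no-ff and -m create a merge commit without opening an editor.
git merge -q --no-ff -m "M: merge side" side

first_parent=$(git rev-parse HEAD^1)    # C, the branch that was merged into
second_parent=$(git rev-parse HEAD^2)   # B, the branch that was merged in
tilde_one=$(git rev-parse HEAD~1)       # same commit as HEAD^1
grandparent=$(git rev-parse HEAD~2)     # A, reached through first parents

git log --oneline --graph
```

HEAD~1 and HEAD^1 always agree; the two notations differ only once the number is greater than one.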
Using the command git show-branch, you can inspect the graph history and examine a complex branch merge structure:
$ git show-branch --more=35 | tail -10
-- [master~15] Merge branch 'maint'
-- [master~3^2^] Merge branch 'maint-1.5.4' into maint
+* [master~3^2^2^] wt-status.h: declare global variables as extern
-- [master~3^2~2] Merge branch 'maint-1.5.4' into maint
-- [master~16] Merge branch 'lt/core-optim'
+* [master~16^2] Optimize symlink/directory detection
+* [master~17] rev-parse --verify: do not output anything on error
+* [master~18] rev-parse: fix using "--default" with "--verify"
+* [master~19] rev-parse: add test script for "--verify"
+* [master~20] Add svn-compatible "blame" output format to git-svn
The output is limited to the final 10 lines.
In this example, a merge took place between master~15 and master~16 that introduced a couple of other merges as well as a simple commit named master~3^2^2^ .
One common use of git rev-parse is to print the commit ID for a given revision specifier:
$ git rev-parse master~3^2^2^
32efcd91c6505ae28f87c0e9a3e2b3c0115017d8
Commit History
Viewing Old Commits
git log acts like git log HEAD, printing the log message associated with every commit in your history that is reachable from HEAD. If you supply a commit to git log, the log starts at the named commit and works backward.
Typically, a limited history is more informative. One technique to constrain history is to specify a commit range:
$ git log master~12..master~10
Here, git log shows the commits between master~12 and master~10, or the 10th and 11th prior commits on the master branch.
To print the patch, or changes, introduced by the commit:
$ git log -1 -p 4fe86488
Notice the option -1 as well: it restricts the output to a single commit.
Commit Graphs
Commit Ranges
A range is denoted with a double-period (..), as in start..end, where start and end may each be any form of commit name. The range covers the commits reachable from end but not from start, so start itself is excluded. For example, the range master~12..master~10 specifies the 11th and 10th prior commits on the master branch.
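The endpoints of a range are easy to get wrong, so it can help to count them on a small linear history. A sketch with five hypothetical commits in a scratch repository:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Build a linear history: Commit 1 (oldest) through Commit 5 (HEAD).
for i in 1 2 3 4 5; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done

# HEAD~3 is Commit 2 and HEAD~1 is Commit 4.
# The range excludes its start, so it covers Commit 3 and Commit 4.
range_count=$(git rev-list --count HEAD~3..HEAD~1)
subjects=$(git log --format=%s HEAD~3..HEAD~1)

echo "commits in range: $range_count"
echo "$subjects"
```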
Finding Commits
Using git bisect
git-bisect uses binary search to find the commit that introduced a bug. To start, you first need to identify a good commit and a bad commit.
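The search can be automated with git bisect run. The following is a minimal sketch under stated assumptions: a scratch repository with eight commits, where the "bug" is simulated by a hypothetical marker file bug.txt that first appears in the fifth commit, and a one-line test command that fails whenever that file exists.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

for i in 1 2 3 4 5 6 7 8; do
  echo "$i" > version.txt
  # Simulated bug: the marker file exists from commit 5 onward.
  if [ "$i" -ge 5 ]; then echo "broken" > bug.txt; fi
  git add .
  git commit -q -m "Commit $i"
  if [ "$i" -eq 1 ]; then good_id=$(git rev-parse HEAD); fi
  if [ "$i" -eq 5 ]; then expected_bad=$(git rev-parse HEAD); fi
done

# Mark HEAD as bad and the first commit as good.
git bisect start HEAD "$good_id" >/dev/null

# The test command exits 0 for a good commit and non-zero for a bad one.
git bisect run sh -c 'test ! -f bug.txt' >/dev/null

# While the bisect session is active, refs/bisect/bad names the culprit.
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad"
git log -1 --format=%s "$first_bad"
```

With eight commits the search needs only three test runs; in a real project the test command would be a build or test script rather than a file check.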