Is it possible to do a sparse checkout without checking out the whole repository first?


Git Problem Overview


I'm working with a repository with a very large number of files that takes hours to check out. I'm looking into whether Git would work well with this kind of repository now that it supports sparse checkouts, but every example that I can find does the following:

git clone <path>
git config core.sparsecheckout true
echo <dir> > .git/info/sparse-checkout
git read-tree -m -u HEAD

The problem with this sequence of commands is the original clone also does a checkout. If you add -n to the original clone command, then the read-tree command results in the following error:

error: Sparse checkout leaves no entry on working directory

How can I do the sparse checkout without checking out all the files first?

Git Solutions


Solution 1 - Git

Please note that this answer does download a complete copy of the repository's data. The git remote add -f command fetches the whole repository. From the git-remote man page:

> With -f option, git fetch <name> is run immediately after the remote information is set up.


Try this:

mkdir myrepo
cd myrepo
git init
git config core.sparseCheckout true
git remote add -f origin git://...
echo "path/within_repo/to/desired_subdir/*" > .git/info/sparse-checkout
git checkout [branchname] # ex: master

Now you will find that you have a "pruned" checkout with only files from path/within_repo/to/desired_subdir present (and in that path).

Note that on the Windows command line you must not quote the path, i.e. you must replace the sixth command with this one:

echo path/within_repo/to/desired_subdir/* > .git/info/sparse-checkout

If you don't, the quotes end up in the sparse-checkout file and the pattern will not work.
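
To try the sequence without touching a real remote, here is a minimal sketch that builds a throwaway local repository and runs the same commands against it. The directory names (upstream, wanted, other) and the demo identity are hypothetical, and the exact behaviour may vary with the installed Git version.

```shell
#!/bin/sh
# Sketch only: a throwaway upstream plus the classic sparse-checkout sequence.
set -eu
work=$(mktemp -d)

# Hypothetical upstream with one wanted and one unwanted directory.
git init -q "$work/upstream"
cd "$work/upstream"
mkdir wanted other
echo a > wanted/a.txt
echo b > other/b.txt
echo r > README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
branch=$(git symbolic-ref --short HEAD)

# The sequence from the answer, pointed at the local upstream.
mkdir "$work/myrepo"
cd "$work/myrepo"
git init -q
git config core.sparseCheckout true
git remote add -f origin "$work/upstream"   # -f still fetches every object
mkdir -p .git/info
echo "wanted/*" > .git/info/sparse-checkout
git checkout -q "$branch"

ls -R .   # only the wanted/ directory should be listed
```

The fetch still transfers the whole history; only the working tree is pruned.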

Solution 2 - Git

In 2020 there is a simpler way to deal with sparse-checkout without having to worry about .git files. Here is how I did it:

git clone <URL> --no-checkout <directory>
cd <directory>
git sparse-checkout init --cone # to fetch only root files
git sparse-checkout set apps/my_app libs/my_lib # etc, to list sub-folders to checkout
git checkout # or: git switch <branch>

Note that this requires Git 2.25 or later. Read more about it here: https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/

UPDATE:

The above git clone command will still clone the repo with its full history, though without checking the files out. If you don't need the full history, you can add --depth parameter to the command, like this:

# create a shallow clone,
# with only 1 (since depth equals 1) latest commit in history
git clone <URL> --no-checkout <directory> --depth 1
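
The cone-mode flow can be tried end to end with a local throwaway repository. This is a sketch under assumptions: the upstream layout (apps/my_app, libs/my_lib, docs) and the demo identity are made up for illustration, and it assumes a Git new enough to have git sparse-checkout.

```shell
#!/bin/sh
# Sketch only: cone-mode sparse checkout of a clone made with --no-checkout.
set -eu
work=$(mktemp -d)

# Hypothetical upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
mkdir -p apps/my_app libs/my_lib docs
echo app > apps/my_app/main.txt
echo lib > libs/my_lib/lib.txt
echo doc > docs/readme.txt
echo root > README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Clone without populating the working tree, then select directories.
git clone -q --no-checkout "$work/upstream" "$work/sparse"
cd "$work/sparse"
git sparse-checkout init --cone
git sparse-checkout set apps/my_app libs/my_lib
git checkout -q

ls -R .   # README, apps/my_app and libs/my_lib; docs/ is left out
```

In cone mode the files at the repository root are always included, which is why README appears alongside the two selected directories.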

Solution 3 - Git

Git clone has an option (--no-checkout or -n) that does what you want.

In your list of commands, just change:

git clone <path>

To this:

git clone --no-checkout <path>

You can then use the sparse checkout as stated in the question.

Solution 4 - Git

I had a similar use case, except I wanted to check out only the commit for a tag and prune the directories. Using --depth 1 keeps the fetch shallow and can really speed things up.

mkdir myrepo
cd myrepo
git init
git config core.sparseCheckout true
git remote add origin <url>  # Note: no -f option
echo "path/within_repo/to/subdir/" > .git/info/sparse-checkout
git fetch --depth 1 origin tag <tagname>
git checkout <tagname>
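
As a rough check of what the shallow tag fetch leaves behind, the sketch below repeats the steps against a local throwaway repository. The tag name v1.0, the directory layout and the demo identity are assumptions for illustration; a file:// URL is used because a plain local path would ignore --depth.

```shell
#!/bin/sh
# Sketch only: shallow fetch of a single tag combined with a sparse checkout.
set -eu
work=$(mktemp -d)

# Hypothetical upstream with two commits; the tag points at the second one.
git init -q "$work/upstream"
cd "$work/upstream"
mkdir -p path/to/subdir other
echo one > path/to/subdir/file.txt
echo x > other/x.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
echo two >> path/to/subdir/file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -am "second"
git tag v1.0

# Same steps as in the answer, against the local upstream.
mkdir "$work/myrepo"
cd "$work/myrepo"
git init -q
git config core.sparseCheckout true
git remote add origin "file://$work/upstream"   # note: no -f
mkdir -p .git/info
echo "path/to/subdir/" > .git/info/sparse-checkout
git fetch -q --depth 1 origin tag v1.0
git checkout -q v1.0

# Number of commits reachable locally; the upstream has two.
git rev-list --count HEAD
```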

Solution 5 - Git

Works in git 2.28

git clone --filter=blob:none --no-checkout --depth 1 --sparse <project-url>
cd <project>
git sparse-checkout init --cone

Specify the files and folders you want to clone

git sparse-checkout add <folder>/<innerfolder> <folder2>/<innerfolder2>
git checkout

Solution 6 - Git

I found the answer I was looking for from the one-liner posted earlier by pavek (thanks!) so I wanted to provide a complete answer in a single reply that works on Linux (GIT 1.7.1):

1--> mkdir myrepo
2--> cd myrepo
3--> git init
4--> git config core.sparseCheckout true
5--> echo 'path/to/subdir/' > .git/info/sparse-checkout
6--> git remote add -f origin ssh://...
7--> git pull origin master

I changed the order of the commands a bit but that does not seem to have any impact. The key is the presence of the trailing slash "/" at the end of the path in step 5.

Solution 7 - Git

Sadly, none of the above worked for me, so I spent a very long time trying different combinations in the sparse-checkout file.

In my case I wanted to skip folders with IntelliJ IDEA configs.

Here is what I did:


Run git clone https://github.com/myaccount/myrepo.git --no-checkout

Run git config core.sparsecheckout true

Create .git\info\sparse-checkout with the following content:

!.idea/*
!.idea_modules/*
/*

Run 'git checkout --' to get all files.


The critical thing to make it work was to add /* after the folder's name.

I have git 1.9
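
A minimal sketch of the exclusion approach, run against a local throwaway repository rather than the GitHub URL above. The file names (.idea/workspace.xml, src/Main.java) and the demo identity are hypothetical, and it assumes the non-cone pattern syntax that the .git/info/sparse-checkout file uses by default.

```shell
#!/bin/sh
# Sketch only: check out everything except one directory.
set -eu
work=$(mktemp -d)

# Hypothetical upstream containing an IDE config directory.
git init -q "$work/upstream"
cd "$work/upstream"
mkdir .idea src
echo cfg > .idea/workspace.xml
echo code > src/Main.java
echo root > README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

git clone -q --no-checkout "$work/upstream" "$work/skip-idea"
cd "$work/skip-idea"
git config core.sparseCheckout true
# One pattern per line: drop the contents of .idea, keep everything else.
printf '%s\n' '!.idea/*' '/*' > .git/info/sparse-checkout
git checkout -q

ls -A .   # .git, README and src; no .idea
```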

Solution 8 - Git

Updated answer 2020:

There is now a git sparse-checkout command, introduced in Git 2.25 (Q1 2020).

nicono's answer illustrates its usage:

git sparse-checkout init --cone # to fetch only root files
git sparse-checkout add apps/my_app
git sparse-checkout add libs/my_lib

It has evolved with Git 2.27 and knows how to "reapply" a sparse checkout.
Note that with Git 2.28, git status will mention that you are in a sparse-checked-out repository.


Note/Warning: Certain sparse-checkout patterns that are valid in non-cone mode led to segfault in cone mode, which has been corrected with Git 2.35 (Q1 2022).

See commit a3eca58, commit 391c3a1, commit a481d43 (16 Dec 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 09481fe, 10 Jan 2022)

> ## sparse-checkout: refuse to add to bad patterns
> Reviewed-by: Elijah Newren
> Signed-off-by: Derrick Stolee

> When in cone mode sparse-checkout, it is unclear how 'git sparse-checkout add ...' should behave if the existing sparse-checkout file does not match the cone mode patterns.
> Change the behavior to fail with an error message about the existing patterns.
>
> Also, all cone mode patterns start with a '/' character, so add that restriction.
> This is necessary for our example test 'cone mode: warn on bad pattern', but also requires modifying the example sparse-checkout file we use to test the warnings related to recognizing cone mode patterns.
>
> This error checking would cause a failure further down the test script because of a test that adds non-cone mode patterns without cleaning them up.
> Perform that cleanup as part of the test now.


With Git 2.36 (Q2 2022), "git sparse-checkout" wants to work with per-worktree configuration, but did not work well in a worktree attached to a bare repository.

See commit 3ce1138, commit 5325591, commit 7316dc5, commit fe18733, commit 615a84a, commit 5c11c0d (07 Feb 2022) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 6249ce2, 25 Feb 2022)

> ## worktree: copy sparse-checkout patterns and config on add
> Signed-off-by: Derrick Stolee
> Reviewed-by: Elijah Newren

> When adding a new worktree, it is reasonable to expect that we want to use the current set of sparse-checkout settings for that new worktree.
> This is particularly important for repositories where the worktree would become too large to be useful.
> This is even more important when using partial clone as well, since we want to avoid downloading the missing blobs for files that should not be written to the new worktree.
>
> The only way to create such a worktree without this intermediate step of expanding the full worktree is to copy the sparse-checkout patterns and config settings during 'git worktree add'.
> Each worktree has its own sparse-checkout patterns, and the default behavior when the sparse-checkout file is missing is to include all paths at HEAD.
> Thus, we need to have patterns from somewhere, they might as well be the current worktree's patterns.
> These are then modified independently in the future.
>
> In addition to the sparse-checkout file, copy the worktree config file if worktree config is enabled and the file exists.
> This will copy over any important settings to ensure the new worktree behaves the same as the current one.
> The only exception we must continue to make is that core.bare and core.worktree should become unset in the worktree's config file.


Original answer: 2016

Git 2.9 (June 2016) generalizes the --no-checkout option to git worktree add (the command that lets you work with multiple working trees for one repo).

See commit ef2a0ac (29 Mar 2016) by Ray Zhang (OneRaynyDay).
Helped-by: Eric Sunshine (sunshineco), and Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 0d8683c, 13 Apr 2016)

The git worktree man page now includes:

--[no-]checkout:

> By default, add checks out <branch>, however, --no-checkout can be used to suppress checkout in order to make customizations, such as configuring sparse-checkout.
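
One way to use this is sketched below: add a worktree without populating it, configure the sparse patterns there, and only then check out. The repository layout, the branch name docs-only and the demo identity are hypothetical, and the sketch assumes a Git recent enough to have git sparse-checkout.

```shell
#!/bin/sh
# Sketch only: a sparse secondary worktree created with --no-checkout.
set -eu
work=$(mktemp -d)

# Hypothetical main repository.
git init -q "$work/main"
cd "$work/main"
mkdir docs src
echo d > docs/guide.txt
echo s > src/code.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Create the worktree empty, restrict it to docs/, then populate it.
git worktree add --no-checkout -b docs-only "$work/docs-wt"
cd "$work/docs-wt"
git sparse-checkout set docs
git checkout -q

ls .   # docs is present, src is not
```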

Solution 9 - Git

Yes, it is possible to download a single folder instead of the whole repository, and even just the latest commit.

A nice way to do this:

D:\Lab>git svn clone https://github.com/Qamar4P/LolAdapter.git/trunk/lol-adapter -r HEAD
  1. -r HEAD downloads only the last revision and ignores all history.

  2. Note the trunk and /specific-folder parts of the URL.

Copy the URL and change the parts before and after /trunk/. I hope this will help someone. Enjoy :)

Updated on 26 Sep 2019

Solution 10 - Git

Based on this answer by apenwarr and this comment by Miral I came up with the following solution which saved me nearly 94% of disk space when cloning the linux git repository locally while only wanting one Documentation subdirectory:

$ cd linux
$ du -sh .git .
2.1G    .git
894M    .
$ du -sh 
2.9G    .
$ mkdir ../linux-sparse-test
$ cd ../linux-sparse-test
$ git init
Initialized empty Git repository in /…/linux-sparse-test/.git/
$ git config core.sparseCheckout true
$ git remote add origin ../linux
# Parameter "origin master" saves a tiny bit if there are other branches
$ git fetch --depth=1 origin master
remote: Enumerating objects: 65839, done.
remote: Counting objects: 100% (65839/65839), done.
remote: Compressing objects: 100% (61140/61140), done.
remote: Total 65839 (delta 6202), reused 22590 (delta 3703)
Receiving objects: 100% (65839/65839), 173.09 MiB | 10.05 MiB/s, done.
Resolving deltas: 100% (6202/6202), done.
From ../linux
 * branch              master     -> FETCH_HEAD
 * [new branch]        master     -> origin/master
$ echo "Documentation/hid/*" > .git/info/sparse-checkout
$ git checkout master
Branch 'master' set up to track remote branch 'master' from 'origin'.
Already on 'master'
$ ls -l
total 4
drwxr-xr-x 3 abe abe 4096 May  3 14:12 Documentation/
$  du -sh .git .
181M    .git
100K    .
$  du -sh
182M    .

So I got down from 2.9GB to 182MB, which is already quite nice.

However, I didn't get this to work with git clone --depth 1 --no-checkout --filter=blob:none file:///…/linux linux-sparse-test (hinted here), because then the missing files were all added to the index as removed files. So if anyone knows the equivalent of git clone --filter=blob:none for git fetch, we can probably save some more megabytes. (Reading the man page of git-rev-list also hints that there is something like --filter=sparse:path=…, but I didn't get that to work either.)

(All tried with git 2.20.1 from Debian Buster.)

Solution 11 - Git

Steps to sparse checkout only a specific folder:

1) git clone --no-checkout <project clone url>
2) cd <project folder>
3) git config core.sparsecheckout true   [You must do this]
4) echo "<path you want to sparse>/*" > .git/info/sparse-checkout
    [You must enter /* at the end of the path so that it takes all contents of that folder]
5) git checkout <branch name> [Ex: master]

Solution 12 - Git

I'm new to git, but it seems that if I do git checkout for each directory then it works. Also, the sparse-checkout file needs to have a trailing slash after every directory, as indicated. Could someone more experienced please confirm that this will work?

Interestingly, if you checkout a directory not in the sparse-checkout file it seems to make no difference. They don't show up in git status and git read-tree -m -u HEAD doesn't cause it to be removed. git reset --hard doesn't cause the directory to be removed either. Anyone more experienced care to comment on what git thinks of directories that are checked out but which are not in the sparse checkout file?

Solution 13 - Git

In git 2.27, it looks like git sparse-checkout has evolved. The solution in this answer does not work exactly the same way as it did in git 2.25:

> git clone <URL> --no-checkout <directory>
> cd <directory>
> git sparse-checkout init --cone # to fetch only root files
> git sparse-checkout set apps/my_app libs/my_lib # etc, to list sub-folders to checkout
> # they are checked out immediately after this command, no need to run git pull

These commands worked better:

git clone --sparse <URL> <directory>
cd <directory>
git sparse-checkout init --cone # to fetch only root files
git sparse-checkout add apps/my_app
git sparse-checkout add libs/my_lib

See also : git-clone --sparse and git-sparse-checkout add
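
The "checked out immediately" behaviour can be observed with a local throwaway repository. This is only a sketch: the upstream layout and the demo identity are invented for illustration, and output may differ between Git versions.

```shell
#!/bin/sh
# Sketch only: clone --sparse, then add a directory and watch it appear.
set -eu
work=$(mktemp -d)

# Hypothetical upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
mkdir -p apps/my_app libs/my_lib
echo app > apps/my_app/main.txt
echo lib > libs/my_lib/lib.txt
echo root > README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

git clone -q --sparse "$work/upstream" "$work/sparse"
cd "$work/sparse"
git sparse-checkout init --cone
before=$(ls)                      # only root-level files at this point
git sparse-checkout add apps/my_app
ls apps/my_app                    # present without a separate checkout or pull
```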

Solution 14 - Git

In my case, I want to skip the Pods folder when cloning the project. I did step by step like below and it works for me. Hope it helps.

mkdir my_folder
cd my_folder
git init
git remote add origin -f <URL>
git config core.sparseCheckout true 
printf '!Pods/*\n/*\n' > .git/info/sparse-checkout
git pull origin master

Note: if you want to skip more folders, just add more lines to the sparse-checkout file.

Solution 15 - Git

I took this from TypeScript definitions library @types:

Let's say the repo has this structure:

types/
|_ identity/
|_ etc...

Your goal: Checkout identity/ folder ONLY. With all its contents including subfolders.

⚠️ This requires minimum git version 2.27.0, which is likely newer than the default on most machines. More complicated procedures are available in older versions, but not covered by this guide.

git clone --sparse --filter=blob:none --depth=1 <source-repo-url>
git sparse-checkout add types/identity types/identity ...

This will check out the types/identity folder to your local machine.

--sparse initializes the sparse-checkout file so the working directory starts with only the files in the root of the repository.

--filter=blob:none will exclude files, fetching them only as needed.

--depth=1 will further improve clone speed by truncating commit history, but it may cause issues as summarized here.
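
To see what the three flags do together, the sketch below clones a local throwaway repository over file:// and then counts the blobs that were never downloaded. The layout (types/identity, types/other) mirrors the example above but is otherwise hypothetical, as is the demo identity; the two uploadpack settings are an assumption needed so that a local "server" accepts filtered and on-demand requests.

```shell
#!/bin/sh
# Sketch only: sparse + partial + shallow clone of a local repository.
set -eu
work=$(mktemp -d)

# Hypothetical upstream repository.
git init -q "$work/upstream"
cd "$work/upstream"
mkdir -p types/identity types/other
echo i > types/identity/index.d.ts
echo o > types/other/index.d.ts
echo r > README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
# Allow --filter and on-demand object requests when serving this repository.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

git clone -q --sparse --filter=blob:none --depth=1 "file://$work/upstream" "$work/checkout"
cd "$work/checkout"
git sparse-checkout add types/identity

# Blobs outside the sparse set are reported as missing rather than fetched.
missing=$(git rev-list --objects --missing=print HEAD | grep -c '^?' || true)
echo "blobs not downloaded: $missing"
```

Files under types/other stay out of both the working tree and the local object store until something asks for them.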

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | dromodel | View Question on Stackoverflow
Solution 1 - Git | apenwarr | View Answer on Stackoverflow
Solution 2 - Git | Alexey Grinko | View Answer on Stackoverflow
Solution 3 - Git | onionjake | View Answer on Stackoverflow
Solution 4 - Git | sourcedelica | View Answer on Stackoverflow
Solution 5 - Git | Fawaz Ahmed | View Answer on Stackoverflow
Solution 6 - Git | J-F Bergeron | View Answer on Stackoverflow
Solution 7 - Git | expert | View Answer on Stackoverflow
Solution 8 - Git | VonC | View Answer on Stackoverflow
Solution 9 - Git | Qamar | View Answer on Stackoverflow
Solution 10 - Git | Axel Beckert | View Answer on Stackoverflow
Solution 11 - Git | SANDEEP MACHIRAJU | View Answer on Stackoverflow
Solution 12 - Git | dromodel | View Answer on Stackoverflow
Solution 13 - Git | nicono | View Answer on Stackoverflow
Solution 14 - Git | eric long | View Answer on Stackoverflow
Solution 15 - Git | hoohoo-b | View Answer on Stackoverflow