How to clone all repos at once from GitHub?
Tags: git, github, git-clone

Problem Overview
I have a company GitHub account and I want to back up all of the repositories in it, automatically accounting for any new repositories that might be created. I was hoping something like this:
git clone git@github.com:company/*.git
or similar would work, but it doesn't seem to like the wildcard there.
Is there a way in Git to clone and then pull everything assuming one has the appropriate permissions?
Git Solutions
Solution 1 - Git
On Windows and all Unix/Linux systems, using Git Bash or any other terminal, replace the placeholders below with your own values and run:
CNTX={users|orgs}; NAME={username|orgname}; PAGE=1
curl "https://api.github.com/$CNTX/$NAME/repos?page=$PAGE&per_page=100" |
grep -e 'clone_url*' |
cut -d \" -f 4 |
xargs -L1 git clone
- Set `CNTX=users` and `NAME=yourusername` to download all your repositories.
- Set `CNTX=orgs` and `NAME=yourorgname` to download all repositories of your organization.
The maximum page size is 100, so you have to call this several times with the right page number to get all your repositories (set `PAGE` to the page number you want to download).
Here is a shell script that does the above: https://gist.github.com/erdincay/4f1d2e092c50e78ae1ffa39d13fa404e
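The page loop described above can also be sketched directly in shell. This is a hedged sketch: the `extract_clone_urls` helper name is my own, and its pattern assumes the API's pretty-printed JSON output; the loop keeps requesting pages until one comes back empty.

```shell
# Pull the clone_url values out of a GitHub API JSON response.
extract_clone_urls() {
  grep -o '"clone_url": *"[^"]*"' | cut -d '"' -f 4
}

# Page loop (network calls, so shown commented out):
# CNTX=users; NAME=yourusername; PAGE=1
# while URLS=$(curl -s "https://api.github.com/$CNTX/$NAME/repos?page=$PAGE&per_page=100" \
#                | extract_clone_urls); [ -n "$URLS" ]; do
#   printf '%s\n' "$URLS" | xargs -L1 git clone
#   PAGE=$((PAGE + 1))
# done
```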
Solution 2 - Git
I don't think it's possible to do it that way. Your best bet is to find and loop through a list of an Organization's repositories using the API.
Try this:
- Create an API token by going to Account Settings -> Applications.
- Make a call to: `http://${GITHUB_BASE_URL}/api/v3/orgs/${ORG_NAME}/repos?access_token=${ACCESS_TOKEN}`
- The response will be a JSON array of objects. Each object includes information about one of the repositories under that Organization. In your case, you'll be looking specifically for the `ssh_url` property.
- Then `git clone` each of those `ssh_url`s.
It's a little bit of extra work, but it's necessary for GitHub to have proper authentication.
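The steps above can be sketched as a single pipeline. This is a hedged sketch (the `extract_ssh_urls` helper name is my own; the URL placeholders are those from the steps above):

```shell
# Pull each "ssh_url" value out of the JSON response (steps 3 and 4).
extract_ssh_urls() {
  sed -n 's/.*"ssh_url": *"\([^"]*\)".*/\1/p'
}

# Steps 2-4 combined (network call, so shown commented out):
# curl -s "http://${GITHUB_BASE_URL}/api/v3/orgs/${ORG_NAME}/repos?access_token=${ACCESS_TOKEN}" \
#   | extract_ssh_urls | xargs -L1 git clone
```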
Solution 3 - Git
Organisation repositories
To clone all repos from your organisation, try the following shell one-liner:
GHORG=company; curl "https://api.github.com/orgs/$GHORG/repos?per_page=1000" | grep -o 'git@[^"]*' | xargs -L1 git clone
User repositories
Cloning all using Git repository URLs:
GHUSER=CHANGEME; curl "https://api.github.com/users/$GHUSER/repos?per_page=1000" | grep -o 'git@[^"]*' | xargs -L1 git clone
Cloning all using Clone URL:
GHUSER=CHANGEME; curl "https://api.github.com/users/$GHUSER/repos?per_page=1000" | grep -w clone_url | grep -o '[^"]\+://.\+.git' | xargs -L1 git clone
Here is a useful shell function that can be added to the user's startup files (using `curl` + `jq`):
# Usage: gh-clone-user (user)
gh-clone-user() {
curl -sL "https://api.github.com/users/$1/repos?per_page=1000" | jq -r '.[]|.clone_url' | xargs -L1 git clone
}
Private repositories
If you need to clone private repos, you can add an Authorization token either in your header, like:

-H 'Authorization: token <token>'

or pass it as a query parameter (`?access_token=TOKEN`), for example (note that GitHub has since removed support for sending tokens as query parameters):
curl -s "https://api.github.com/users/$GHUSER/repos?access_token=$GITHUB_API_TOKEN&per_page=1000" | grep -w clone_url | grep -o '[^"]\+://.\+.git' | xargs -L1 git clone
Notes:
- To fetch only private repositories, add `type=private` to your query string.
- Another way is to use `hub` after configuring your API key.
See also:
- GitHub REST API v3 - List your repositories
- https://stackoverflow.com/q/20396329/55075
- https://stackoverflow.com/q/8713596/55075
Hints:
- To increase speed, set the number of parallel processes by specifying the `-P` parameter for `xargs` (`-P4` = 4 processes).
- If you need to raise the GitHub limits, try authenticating by specifying your API key.
- Add `--recursive` to recurse into the registered submodules and update any nested submodules within.
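To illustrate the `-P` hint, here is a toy pipeline that runs without network access; swap `echo cloning` for `git clone` and the placeholder names for real repo URLs:

```shell
# xargs -P4 runs up to four processes at a time; -L1 passes one line to each.
printf '%s\n' repo1 repo2 repo3 | xargs -L1 -P4 echo cloning
```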
Solution 4 - Git
Simple script using GitHub CLI (no API keys)
Here's a simple solution using the official GitHub CLI tool, gh
- no need for API keys and can handle any number of private repos:
First time only: log in with `gh` (needed for private repos) and follow the prompts:
gh auth login
Now, clone up to 1000 repos under a new `./myorgname` folder - replace `myorgname` with your org name:
gh repo list myorgname --limit 1000 | while read -r repo _; do
gh repo clone "$repo" "$repo"
done
Setup
To get the GitHub CLI tool:
- Mac -
brew install gh
- Linux or Windows - see GitHub install guide
The GitHub CLI tool is officially supported and will be updated as and when the GitHub API changes.
Optional: update existing checkouts
To update repo folders already on disk, as well as cloning new repos, the script needs to check for failure of `gh repo clone`, like this:
gh repo list myorgname --limit 1000 | while read -r repo _; do
gh repo clone "$repo" "$repo" -- -q 2>/dev/null || (
cd "$repo"
# Handle case where local checkout is on a non-main/master branch
# - ignore checkout errors because some repos may have zero commits,
# so no main or master
git checkout -q main 2>/dev/null || true
git checkout -q master 2>/dev/null || true
git pull -q
)
done
Background
- GitHub CLI login doc
- Script commands above were derived from an issue comment and gist by davegallant
Solution 5 - Git
This gist accomplishes the task in one line on the command line:
curl -s https://api.github.com/orgs/[your_org]/repos?per_page=200 | ruby -rubygems -e 'require "json"; JSON.load(STDIN.read).each { |repo| %x[git clone #{repo["ssh_url"]} ]}'
Replace `[your_org]` with your organization's name, and set `per_page` if necessary.
UPDATE:
As ATutorMe mentioned, the maximum page size is 100, according to the GitHub docs.
If you have more than 100 repos, you'll have to add a `page` parameter to your URL and run the command once for each page.
curl -s "https://api.github.com/orgs/[your_org]/repos?page=2&per_page=100" | ruby -rubygems -e 'require "json"; JSON.load(STDIN.read).each { |repo| %x[git clone #{repo["ssh_url"]} ]}'
Note: The default `per_page` value is `30`.
Solution 6 - Git
So, I will add my answer too. :) (I found it simple.)
Fetch list (I've used "magento" company):
curl -si https://api.github.com/users/magento/repos | grep ssh_url | cut -d '"' -f4
Use `clone_url` instead of `ssh_url` for HTTP access.
So, let's clone them all! :)
curl -si https://api.github.com/users/magento/repos | \
grep ssh_url | cut -d '"' -f4 | xargs -i git clone {}
If you are going to fetch private repos, just add the GET parameter `?access_token=YOURTOKEN`.
Solution 7 - Git
Go to Account Settings -> Applications and create an API key. Then insert the API key, GitHub instance URL, and organization name in the script below.
#!/bin/bash
# Substitute variables here
ORG_NAME="<ORG NAME>"
ACCESS_TOKEN="<API KEY>"
GITHUB_INSTANCE="<GITHUB INSTANCE>"
URL="https://${GITHUB_INSTANCE}/api/v3/orgs/${ORG_NAME}/repos?access_token=${ACCESS_TOKEN}"
curl ${URL} | ruby -rjson -e 'JSON.load(STDIN.read).each {|repo| %x[git clone #{repo["ssh_url"]} ]}'
Save that in a file, `chmod u+x` the file, then run it.
Thanks to Arnaud for the ruby code.
Solution 8 - Git
Use the Github CLI with some scripting to clone all (public or private) repos under a namespace
gh repo list OWNER --limit 1000 | awk '{print $1; }' | xargs -L1 gh repo clone
Where `OWNER` can be your user name or an org name.
Solution 9 - Git
I found a comment in the gist @seancdavis provided to be very helpful, especially because like the original poster, I wanted to sync all the repos for quick access, however the vast majority of which were private.
curl -u [[USERNAME]] -s https://api.github.com/orgs/[[ORGANIZATION]]/repos?per_page=200 |
ruby -rubygems -e 'require "json"; JSON.load(STDIN.read).each { |repo| %x[git clone #{repo["ssh_url"]} ]}'
Replace [[USERNAME]] with your github username and [[ORGANIZATION]] with your Github organization. The output (JSON repo metadata) will be passed to a simple ruby script:
# bring in the Ruby json library
require "json"
# read from STDIN, parse into ruby Hash and iterate over each repo
JSON.load(STDIN.read).each do |repo|
# run a system command (re: "%x") of the style "git clone <ssh_url>"
%x[git clone #{repo["ssh_url"]} ]
end
Solution 10 - Git
This Python 2 one-liner will do what you need. It:

- checks GitHub for your available repos
- for each, makes a system call to `git clone`
python -c "import json, urllib, os; [os.system('git clone ' + r['ssh_url']) for r in json.load(urllib.urlopen('https://api.github.com/orgs/<<ORG_NAME>>/repos?per_page=200'))]"
Solution 11 - Git
Here is a Python solution:
curl -s https://api.github.com/users/org_name/repos?per_page=200 | python -c $'import json, sys, os\nfor repo in json.load(sys.stdin): os.system("git clone " + repo["clone_url"])'
Substitute `org_name` with the name of the organization or user whose repos you wish to download. On Windows you can run this in Git Bash. In case it cannot find `python` (not in your PATH, etc.), the easiest solution I have found is to replace `python` with the path to the actual Python executable, for example `/c/ProgramData/Anaconda3/python` for an Anaconda installation in Windows 10.
Solution 13 - Git
curl -s https://api.github.com/orgs/[GITHUBORG_NAME]/repos | grep clone_url | awk -F '":' '{ print $2 }' | sed 's/\"//g' | sed 's/,//' | while read line; do git clone "$line"; done
Solution 14 - Git
> I tried a few of the commands and tools above, but decided they were too much of a hassle, so I wrote another command-line tool to do this, called `github-dl`.
To use it (assuming you have nodejs installed)
npx github-dl -d /tmp/test wires
This would get a list of all the repos from `wires` and write the info into the `test` directory, using the authorisation details (user/pass) you provide on the CLI.
In detail, it
- Asks for auth (supports 2FA)
- Gets list of repos for user/org through Github API
- Does pagination for this, so more than 100 repos are supported

It does not actually clone the repos, but instead writes a `.txt` file that you can pass into `xargs` to do the cloning, for example:
cd /tmp/test
cat wires-repo-urls.txt | xargs -n2 git clone
# or to pull
cat /tmp/test/wires-repo-urls.txt | xargs -n2 git pull
Maybe this is useful for you; it's just a few lines of JS, so it should be easy to adjust to your needs.
Solution 15 - Git
Simple solution:
GITHUB_USER=yourusername   # set this to your GitHub username
NUM_REPOS=1000
DW_FOLDER="Github_${NUM_REPOS}_repos"
mkdir ${DW_FOLDER}
cd ${DW_FOLDER}
for REPO in $(curl https://api.github.com/users/${GITHUB_USER}/repos?per_page=${NUM_REPOS} | awk '/ssh_url/{print $2}' | sed 's/^"//g' | sed 's/",$//g') ; do git clone ${REPO} ; done
Solution 16 - Git
Clone all your public and private repos that are not forks:
First create a Personal Access Token for authentication, and make sure it has all the `repo` permissions.
curl -u username:token https://api.github.com/user/repos\?page\=1\&per_page\=100 |
jq -r 'map(select(.fork == false)) | .[] | .ssh_url' |
xargs -L1 git clone
Clone your gists:
curl https://api.github.com/users/{username}/gists\?page\=1\&per_page\=100 |
jq -r ".[] | .git_pull_url +\" '\" + (.files|keys|join(\"__\") + \"'\")" |
xargs -L1 git clone
This `jq` command is complex because gist repo names are hashes, so the command concatenates all the filenames to form the repo name.
jq

You can filter the JSON arbitrarily using `jq`. To install it: `sudo apt-get install jq`

In the example above, I filtered out forks using `jq -r 'map(select(.fork == false))'` -- useful for not cloning repos where you've made casual pull requests.

`jq` supports some very advanced features. `man jq` is your friend.
Github's API urls
- Your repos (needs authentication): `https://api.github.com/user/repos?page=1&per_page=100`
- Any user: `https://api.github.com/users/{other_username}/repos?page=1&per_page=100`
- Orgs: `https://api.github.com/orgs/orgname/repos?page=1&per_page=100`
Solution 17 - Git
So, in practice, if you want to clone all repos from the organization `FOO` which match `BAR`, you could use the one-liner below, which requires `jq` and common CLI utilities:
curl 'https://api.github.com/orgs/FOO/repos?access_token=SECRET' |
jq '.[] |
.ssh_url' |
awk '/BAR/ {print "git clone " $0 " & "}' |
sh
Solution 18 - Git
There is also a very useful npm module to do this. It can not only clone, but pull as well (to update data you already have).
You just create config like this:
[{ "username": "BoyCook", "dir": "/Users/boycook/code/boycook", "protocol": "ssh"}]
and run `gitall clone`, or `gitall pull`, for example.
Solution 19 - Git
In case anyone looks for a Windows solution, here's a little PowerShell function to do the trick (it could be a one-liner/alias were it not for the fact that I need it to work both with and without a proxy).
function Unj-GitCloneAllBy($User, $Proxy = $null) {
(curl -Proxy $Proxy "https://api.github.com/users/$User/repos?page=1&per_page=100").Content
| ConvertFrom-Json
| %{ $_.clone_url }
# workaround git printing to stderr by @wekempf aka William Kempf
# https://github.com/dahlbyk/posh-git/issues/109#issuecomment-21638678
| %{ & git clone $_ 2>&1 }
| % { $_.ToString() }
}
Solution 20 - Git
Another shell script with comments that clones all repositories (public and private) from a user:
#!/bin/bash
USERNAME=INSERT_USERNAME_HERE
PASSWORD=INSERT_PASSWORD_HERE
# Generate auth header
AUTH=$(echo -n $USERNAME:$PASSWORD | base64)
# Get repository URLs
curl -iH "Authorization: Basic "$AUTH https://api.github.com/user/repos | grep -w clone_url > repos.txt
# Clean URLs (remove " and ,) and print only the second column
cat repos.txt | tr -d \"\, | awk '{print $2}' > repos_clean.txt
# Insert username:password after protocol:// to generate clone URLs
cat repos_clean.txt | sed "s/:\/\/git/:\/\/$USERNAME\:$PASSWORD\@git/g" > repos_clone.txt
while read FILE; do
git clone $FILE
done <repos_clone.txt
rm repos.txt repos_clean.txt repos_clone.txt
Solution 21 - Git
Create a bash alias/function in your ~/.bashrc file

I solved this for my team by creating an alias/bash function in my `~/.bashrc` file.

Steps

Open a terminal or Linux shell and open your `~/.bashrc` file:
sudo nano ~/.bashrc
add this function:
CloneAll() {
# Make the url to the input github organization's repository page.
ORG_URL="https://api.github.com/orgs/${1}/repos?per_page=200";
# List of all repositories of that organization (separated by newlines).
ALL_REPOS=$(curl -s ${ORG_URL} | grep html_url | awk 'NR%2 == 0' \
| cut -d ':' -f 2-3 | tr -d '",');
# Clone all the repositories.
for ORG_REPO in ${ALL_REPOS}; do
git clone ${ORG_REPO}.git;
done
}
Save and close your `~/.bashrc` file and then close the terminal -- you need to do this or the new function won't initialize.
Open a new terminal and try it out:
CloneAll <your_github_org_name>
Example: if your personal GitHub org URL is https://github.com/awesome-async, the command would be
CloneAll awesome-async
Important
The `per_page=200` at the end of the first variable `ORG_URL` sets the number of repos that will be cloned, so pay special attention to that:
ORG_URL="https://api.github.com/orgs/${1}/repos?per_page=200"; <---- make sure this is what you want
Hope this helps! :)
Solution 22 - Git
You can get a list of the repositories using `curl`, and then iterate over that list with a bash loop:
GIT_REPOS=`curl -s curl https://${GITHUB_BASE_URL}/api/v3/orgs/${ORG_NAME}/repos?access_token=${ACCESS_TOKEN} | grep ssh_url | awk -F': ' '{print $2}' | sed -e 's/",//g' | sed -e 's/"//g'`
for REPO in $GIT_REPOS; do
git clone $REPO
done
Solution 23 - Git
You can use an open-source tool to clone a bunch of GitHub repositories: https://github.com/artiomn/git_cloner
Example:
git_cloner --type github --owner octocat --login user --password user https://my_bitbucket
It uses the JSON API from `api.github.com`.
You can see the code example in the github documentation:
https://developer.github.com/v3/
Or there:
https://github.com/artiomn/git_cloner/blob/master/src/git_cloner/github.py
Solution 24 - Git
To clone only private repos, given an access key, with Python 3 and the requests module installed:
ORG=company; ACCESS_KEY=0000000000000000000000000000000000000000; for i in $(python -c "import requests; print(' '.join([x['ssh_url'] for x in list(filter(lambda x: x['private'] ,requests.get('https://api.github.com/orgs/$ORG/repos?per_page=1000&access_token=$ACCESS_KEY').json()))]))"); do git clone $i; done;
Solution 25 - Git
A Python 3 solution that includes exhaustive pagination via the `Link` header.

Pre-requisites:

- GitHub API "Personal Access Token"
- `pip3 install links-from-link-header`
- `hub`
import json
import requests
from requests.auth import HTTPBasicAuth
import links_from_header
respget = lambda url: requests.get(url, auth=HTTPBasicAuth('githubusername', 'githubtoken'))
myorgname = 'abc'
nexturl = f"https://api.github.com/orgs/{myorgname}/repos?per_page=100"
while nexturl:
    print(nexturl)
    resp = respget(nexturl)
    linkheads = resp.headers.get('Link', None)
    if linkheads:
        linkheads_parsed = links_from_header.extract(linkheads)
        nexturl = linkheads_parsed.get('next', None)
    else:
        nexturl = None
    respcon = json.loads(resp.content)
    with open('repolist', 'a') as fh:
        fh.writelines([f'{respconi["full_name"]}\n' for respconi in respcon])
Then, you can use `xargs` or `parallel`:

cat repolist | parallel -I% hub clone %
Solution 26 - Git
If you have the repository names in a list like this, then this shell script works:

user="https://github.com/user/"
declare -a arr=("repo1" "repo2")
for i in "${arr[@]}"
do
echo $user"$i"
git clone $user"$i"
done
Solution 27 - Git
I created a sample batch script. You can download all private/public repositories from github.com. After a repository is downloaded, it is automatically converted to a zip file.
@echo off
setlocal EnableDelayedExpansion
SET "username=olyanren"
SET "password=G....."
set "mypath=%cd%\"
SET "url=https://%username%:%password%@github.com/%username%/"
FOR /F "tokens=* delims=" %%i in (files.txt) do (
SET repo=%%i
rmdir /s /q !repo!
git clone "!url!!repo!.git"
cd !repo!
echo !mypath!
git archive --format=zip -o "!mypath!!repo!.zip" HEAD
cd ..
)
Note: the files.txt file should contain only repository names, like:
repository1
repository2
Solution 28 - Git
The prevailing answers here don't take into account that the GitHub API will only return a maximum of 100 repositories, despite what you may specify in `per_page`. If you are cloning a GitHub org with more than 100 repositories, you will have to follow the paging links in the API response.
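Following those paging links can also be done by hand in shell. This is a hedged sketch: the `next_link` helper name is mine; it parses the standard `Link: <url>; rel="next"` response header that GitHub returns alongside paginated results.

```shell
# Extract the rel="next" URL from a GitHub Link response header, if any.
next_link() {
  sed -n 's/.*<\([^>]*\)>; rel="next".*/\1/p'
}

# Page-following loop (network calls, so shown commented out):
# URL="https://api.github.com/orgs/myorg/repos?per_page=100"
# while [ -n "$URL" ]; do
#   curl -sD headers.txt "$URL" | grep -o '"ssh_url": *"[^"]*' | cut -d '"' -f 4 | xargs -L1 git clone
#   URL=$(grep -i '^link:' headers.txt | next_link)
# done
```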
I wrote a CLI tool to do just that:
clone-github-org -o myorg
This will clone all repositories in the `myorg` organization to the current working directory.
Solution 29 - Git
For orgs you have access to with private repos:
curl -u <YOUR_GITHUB_USERNAME> -s https://api.github.com/orgs/<ORG_NAME>/repos?per_page=200 | ruby -rubygems -e 'require "json"; JSON.load(STDIN.read).each { |repo| %x[git clone #{repo["html_url"]} ]}'
It uses the `html_url`, so you don't need an `access_token`; just enter your GitHub password when prompted.
Solution 30 - Git
"""
Clone all public Github Repos
https://developer.github.com/v3/repos/#list-repositories-for-a-user
"""
import urllib.request, base64
import json
import os
def get_urls(username):
    url = f"https://api.github.com/users/{username}/repos?per_page=200"
    request = urllib.request.Request(url)
    result = urllib.request.urlopen(request)
    return json.load(result)

if __name__ == "__main__":
    for r in get_urls("MartinThoma"):
        if not os.path.isdir(r["name"]):
            print(f"Clone {r['name']}...")
            os.system("git clone " + r["ssh_url"])
        else:
            print(f"SKIP {r['name']}...")
Solution 31 - Git
To clone all your own private and public repos, simply generate a new access token with repos access and use this (replace with your own access token and username):

for line in $(curl https://api.github.com/user/repos?access_token=ACCESS_TOKEN_HERE | grep -o "git@github.com:YOUR_USER_NAME/[^ ,\"]\+");do git clone $line;done
This will clone all repos into the current folder.

This is a little bash program; you can just paste it in the terminal and hit enter.
Solution 32 - Git
You could use a tool like GitHub Archive which allows you to clone/pull public and private personal repos, organization repos, and gists all with one simple tool.
As for automation, you could then set up GitHub Archive to run once a day or once a week for example and it will skip those that are cloned and pull in new changes since the last time it was run of all others.
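For example, a crontab entry along these lines (the path and schedule are hypothetical placeholders; substitute whatever backup command you use) would run the sync nightly at 02:00:

```shell
# m h dom mon dow  command  (hypothetical backup command, nightly at 02:00)
0 2 * * * /usr/local/bin/your-backup-command >> /var/log/repo-backup.log 2>&1
```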
Solution 33 - Git
When I want to clone all my repos fast I do:
for i in `echo https://github.com/user/{repox,repoy,repoz,repob}`; do git clone $i; done
Solution 34 - Git
Here's a way to get all of a user's gists that takes into account GitHub's new API and pagination rules...
usage:
python3 gist.py bgoonz
Also... every clone is going to be its own repo, which can get pretty space-intensive on your drive... you can remove the git metadata recursively using:
find . \( -name ".git" -o -name ".gitignore" -o -name ".gitmodules" -o -name ".gitattributes" \) -exec rm -rf -- {} +
If you want to clone them all into an existing repository of yours... make sure you aren't in the outermost folder of your repo when you run this command or it will delete your .git folder just as indiscriminately as it will delete the ones that belong to the gists.
Language:Python
#!/usr/bin/env python3
import os
import sys
import json
import hashlib
import requests
from subprocess import call
from concurrent.futures import ThreadPoolExecutor as PoolExecutor
def download_all_from_user(user: str):
    next_page = True
    page = 1
    while next_page:
        url = f"https://api.github.com/users/{user}/gists?page={page}"
        response = requests.get(url)
        if not len(response.json()):
            next_page = False
        else:
            page += 1
            download_all(response.json())

def download_all(gists: list):
    with PoolExecutor(max_workers=10) as executor:
        for _ in executor.map(download, gists):
            pass

def download(gist):
    target = gist["id"] + hashlib.md5(gist["updated_at"].encode('utf-8')).hexdigest()
    call(["git", "clone", gist["git_pull_url"], target])
    description_file = os.path.join(target, "description.txt")
    with open(description_file, "w") as f:
        f.write(f"{gist['description']}\n")
# Run
user = sys.argv[1]
download_all_from_user(user)
Solution 35 - Git
I wanted to suggest another option, which may be easier than some of the scripts posted here. `mergestat` is a command-line tool that can be used to clone all org repositories from GitHub, as described on this page.

mergestat "SELECT clone('https://github.com/mergestat/'|| name) AS path FROM github_org_repos('mergestat')" -v --clone-dir my-dir

This is not the primary purpose of the tool, but is a useful side effect of what it does (it's a way to query git repositories with SQL). Full disclosure: I am the maintainer/creator, but wanted to share here as this is a fairly frequent use case/question we get from users, and I hope `mergestat` can provide a simple solution to it.
Solution 36 - Git
Update from May 19
Use this bash command for an organization (private repos included):
curl -u "{username}" "https://api.github.com/orgs/{org}/repos?page=1&per_page=100" | grep -o 'git@[^"]*' | xargs -L1 git clone
Solution 37 - Git
Here is the Windows version, using PowerShell:
$name="facebook" #either username or org_name
$api_url="https://api.github.com/users/$($name)/repos?per_page=200"
$repos=Invoke-WebRequest -UseBasicParsing -Uri $api_url |ConvertFrom-Json
foreach ($repo in $repos)
{
Write-Host "Cloning via SSH URL $($repo.ssh_url)"
git clone $repo.ssh_url
}
Solution 38 - Git
An easy way to grab the list of repository names is to browse to the page listing the repos of the org in your web browser and extract it via JavaScript:
Array.from(document.querySelectorAll('[itemprop="name codeRepository"]')).map(function(item){ return item.text.replace(/\s/g,'') })
You can easily adapt this code to generate the actual git clone
commands if you like.
The advantage of this is that you don't need to bother with API keys or command line extractions.
A possible disadvantage is that you might need to change this code if Github changes their site design or code.