Methods for using Git with Google Colab
Problem Overview
Are there any recommended methods to integrate Git with Colab?
For example, is it possible to work off code from Google Source Repositories or the like?
Neither Google Drive nor Cloud Storage can be used for Git functionality.
So I was wondering if there is a way to still do it?
Git Solutions
Solution 1 - Git
If you want to clone a private repository, the quickest way is to create a personal access token and select only the privileges that your application needs. The clone command for GitHub would then look like:
!git clone https://personal_access_token@github.com/username/repository.git
Solution 2 - Git
git is installed on the machine, and you can use ! to invoke shell commands.
For example, to clone a git repository:
!git clone https://github.com/fastai/courses.git
Here's a complete example that clones a repository and loads an Excel file stored therein. https://colab.research.google.com/notebook#fileId=1v-yZk-W4YXOxLTLi7bekDw2ZWZXWW216
Solution 3 - Git
A simple way to clone your private GitHub repo in Google Colab is shown below.
- Your password won't be exposed.
- It works even if your password contains special characters.
- Just run the snippet below in a Colab cell; it prompts for the values interactively.
import os
from getpass import getpass
import urllib.parse
user = input('User name: ')
password = getpass('Password: ')
password = urllib.parse.quote(password) # your password is converted into url format
repo_name = input('Repo name: ')
cmd_string = 'git clone https://{0}:{1}@github.com/{0}/{2}.git'.format(user, password, repo_name)
os.system(cmd_string)
cmd_string, password = "", "" # removing the password from the variable
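To illustrate what the percent-encoding step does, here is a small sketch with made-up passwords. One detail worth noting: `urllib.parse.quote` leaves `/` unencoded by default, so passing `safe=""` is a reasonable precaution for credentials. That argument is an addition here, not part of the snippet above.

```python
from urllib.parse import quote, unquote

# Made-up example passwords containing characters that are special in URLs.
samples = ["@123", "p:ss/w@rd"]

for raw in samples:
    default = quote(raw)           # '/' is left as-is by default
    strict = quote(raw, safe="")   # every reserved character is encoded
    assert unquote(strict) == raw  # the encoding is reversible
    print(f"{raw!r}: default={default!r} strict={strict!r}")
```

With `safe=""`, a password containing `/` can no longer be mistaken for part of the URL path.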
Solution 4 - Git
You can use the SSH protocol to connect your private repository with Colab.
- Generate an SSH key pair on your local machine; don't forget to keep the passphrase empty.
- Upload it to Colab (the commands below expect an archive named ssh.tar.gz):
from google.colab import files
uploaded = files.upload()
- Move the SSH key pair to /root/.ssh and connect to Git:
- remove any previous SSH files
! rm -rf /root/.ssh/*
! mkdir -p /root/.ssh
- uncompress your ssh files
! tar -xvzf ssh.tar.gz
- copy it to root
! cp ssh/* /root/.ssh && rm -rf ssh && rm -rf ssh.tar.gz
! chmod 700 /root/.ssh
- add your Git server (e.g. GitLab) as an SSH known host
! ssh-keyscan gitlab.com >> /root/.ssh/known_hosts
! chmod 644 /root/.ssh/known_hosts
- set your git account
! git config --global user.email "email"
! git config --global user.name "username"
- finally connect to your git server
! ssh git@gitlab.com
- Authenticate your private repository; see the documentation on per-repository deploy keys.
- Clone the repository with
! git clone git@gitlab.com:{account}/{projectName}.git
Note: to use push, you have to give write access to the public SSH key that you authenticate to the Git server with.
Solution 5 - Git
In order to protect your account username and password, you can use getpass and concatenate them in the shell command:
from getpass import getpass
import os
user = getpass('BitBucket user')
password = getpass('BitBucket password')
os.environ['BITBUCKET_AUTH'] = user + ':' + password
!git clone https://$BITBUCKET_AUTH@bitbucket.org/{user}/repository.git
Solution 6 - Git
This link covers most of it: https://qiita.com/Rowing0914/items/51a770925653c7c528f9
In summary, the steps are:
1- Connect your Google Colab runtime to your Google Drive using these commands:
from google.colab import drive
drive.mount('/content/drive')
This starts an authentication flow; follow the prompts.
2- Set the current directory to the path where you want to clone the Git project. In my example:
path_clone = "drive/My Drive/projects"
%cd {path_clone}
Use the %cd magic here rather than !cd: a ! command runs in a subshell, so the directory change would not persist.
3- Clone the Git project:
!git clone <Git project URL address>
Now you have the cloned Git project in the projects folder in your Google Drive (which is also connected to your Google Colab runtime machine).
4- Go to your Google Drive (using a browser, etc.), go to the "projects" folder, and open the .ipynb file that you want to use in Google Colab.
5- Now you have a Google Colab runtime with the .ipynb that you wanted to use, which is also connected to your Google Drive, and all cloned Git files are in the Colab runtime's storage.
Note:
1- Check that your Colab runtime is connected to Google Drive. If it's not connected, just repeat step 1 above.
2- Double-check with the "pwd" and "cd" commands that the current directory is the cloned Git project in Google Drive (step 2 above).
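Why the directory change in step 2 needs `%cd` rather than a shell `cd` can be sketched in plain Python, assuming a POSIX shell is available (as on Colab). Here `subprocess.run` stands in for `!cd` and `os.chdir` for `%cd`; the temporary directory is only for illustration.

```python
import os
import subprocess
import tempfile

start = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    # Like `!cd`: the change happens in a child shell and is lost when it exits.
    subprocess.run(["sh", "-c", 'cd "$1"', "sh", tmp], check=True)
    after_subshell = os.getcwd()

    # Like `%cd`: the notebook process itself changes directory.
    os.chdir(tmp)
    after_chdir = os.getcwd()

    os.chdir(start)  # restore before the temporary directory is removed

print("unchanged after subshell cd:", after_subshell == start)
```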
Solution 7 - Git
Three steps to use git to sync colab with github or gitlab.
- Generate a private-public key pair. Copy the private key to the system clipboard for use in step 2. Paste the public key to GitHub or GitLab as appropriate.
In Linux, ssh-keygen can be used to generate the key pair in ~/.ssh. The resulting private key is in the file id_rsa, the public key in the file id_rsa.pub.
- In Colab, execute
# paste the private key (your id_rsa or id_ecdsa file in the .ssh directory) between the quotes, e.g.
key = \
'''
-----BEGIN EC PRIVATE KEY-----
M..............................................................9
...............................................................J
..................................==
-----END EC PRIVATE KEY-----
'''
! mkdir -p /root/.ssh
with open(r'/root/.ssh/id_rsa', 'w', encoding='utf8') as fh:
    fh.write(key)
! chmod 600 /root/.ssh/id_rsa
! ssh-keyscan github.com >> /root/.ssh/known_hosts
# test setup
! ssh -T git@github.com
# if you see something like "Hi ffreemt! You've successfully
# authenticated, but GitHub does not provide shell access."
# you are all set. You can tweak .ssh/config for multiple github accounts
- Use git to pull/push as usual.
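The key-installation step can also be written without shell escapes. This is a sketch under assumptions: `install_private_key` is a hypothetical helper, and the demo writes a dummy string into a temporary directory instead of /root/.ssh.

```python
import os
import stat
import tempfile
from pathlib import Path


def install_private_key(key_text: str, ssh_dir: str, name: str = "id_rsa") -> Path:
    """Write key_text to ssh_dir/name with owner-only permissions (hypothetical helper)."""
    directory = Path(ssh_dir)
    directory.mkdir(parents=True, exist_ok=True)
    os.chmod(directory, 0o700)   # directory: owner only
    key_path = directory / name
    # Normalise surrounding whitespace left over from pasting.
    key_path.write_text(key_text.strip() + "\n", encoding="utf8")
    os.chmod(key_path, 0o600)    # private key: owner read/write only
    return key_path


with tempfile.TemporaryDirectory() as tmp:
    path = install_private_key("  dummy key material  ", os.path.join(tmp, ".ssh"))
    mode = stat.S_IMODE(path.stat().st_mode)
    content = path.read_text(encoding="utf8")

print(oct(mode))
```

In a Colab cell the same function could be called with the pasted key and "/root/.ssh" as the target directory.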
The same idea can be used for rsync (or ssh) between Colab and HostA with minor changes:
- Generate a private-public key pair. Copy the private key to the system clipboard for use in step 2. Paste the public key to authorized_keys in .ssh on HostA.
- In Colab, execute
key = \
'''
paste the private key here
'''
! mkdir -p /root/.ssh
with open(r'/root/.ssh/id_rsa', 'w', encoding='utf8') as fh:
    fh.write(key)
! chmod 600 /root/.ssh/id_rsa
! ssh -oStrictHostKeyChecking=no root@HostA hostname
# ssh-keyscan HostA >> /root/.ssh/known_hosts does not seem to work with IP.
- Use rsync to sync files between Colab and HostA as usual.
Solution 8 - Git
> Update September 2021: For security reasons, passwords are now deprecated for GitHub usage. Please use a Personal Access Token instead: go to github.com -> Settings -> Developer Settings -> Personal Access Token and generate a token for the required purpose. Use this in place of your password for all tasks mentioned in this tutorial!
For more details you can also see my article on Medium: https://medium.com/geekculture/using-git-github-on-google-colaboratory-7ef3b76fe61b
None of the answers provides a straight and direct answer like this one.
Probably this is the answer you are looking for.
Works on Colab for both public and private repositories; don't change or skip any step. (Replace all {vars}.)
TL;DR Complete Process:
!git clone https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git
%cd /content/{destination_repo_projectname}
!git config --global user.name "{your_username}"
!git config --global user.email "{your_email_id}"
!git config --global user.password "{your_password}"
Make Your Changes and then run :
!git add .
!git commit -m "{Message}"
!git push
Cloning a Repository :
!git clone https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git
Change the directory to {destination_repo_projectname} (the folder created by git clone) using the line magic command %cd for Jupyter notebooks.
%cd /content/{destination_repo_projectname}
Verify!
Pull
Sanity Check to see if everything works perfectly!
!git pull
If no changes were made to the remote git repo after cloning, the following should be the displayed output :
Already up to date.
Status
Similarly check the status of the staged/unstaged changes.
!git status
It should display this, with the default branch selected :
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
Check Older Logs
Check the previous commits you have made on the repo :
!git log -n 4
Outputs Git Commit IDs with Logs :
commit 18ccf27c8b2d92b560e6eeab2629ba0c6ea422a5 (HEAD -> main, origin/main, origin/HEAD)
Author: Farhan Hai Khan <njrfarhandasilva10@gmail.com>
Date: Mon May 31 00:12:14 2021 +0530
Create README.md
commit bd6ee6d4347eca0e3676e88824c8e1118cfbff6b
Author: khanfarhan10 <njrfarhandasilva10@gmail.com>
Date: Sun May 30 18:40:16 2021 +0000
Add Zip COVID
commit 8a3a12863a866c9d388cbc041a26d49aedfa4245
Author: khanfarhan10 <njrfarhandasilva10@gmail.com>
Date: Sun May 30 18:03:46 2021 +0000
Add COVID Data
commit 6a16dc7584ba0d800eede70a217d534a24614cad
Author: khanfarhan10 <njrfarhandasilva10@gmail.com>
Date: Sun May 30 16:04:20 2021 +0000
Removed sample_data using colab (testing)
Make changes in the local repo
Make changes from the local repo directory.
These might include additions, deletions, and edits.
Pro Tip: If you want, you can copy things from Drive into a Git repo as follows:
Mount Google Drive:
from google.colab import drive
drive.mount('/content/gdrive')
Copy contents using shutil :
import shutil
# For a folder:
shutil.copytree(src_folder,des_folder)
# For a file:
shutil.copy(src_file,des_file)
# Create a ZipFile
shutil.make_archive(archive_name, 'zip', directory_to_zip)
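One caveat when copying into an existing repository folder: `shutil.copytree` raises `FileExistsError` if the destination already exists, unless `dirs_exist_ok=True` is passed (Python 3.8+). A sketch with temporary directories standing in for the Drive folder and the cloned repo:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "drive_folder"   # stand-in for a folder on Drive
    dst = Path(tmp) / "repo"           # stand-in for the cloned repository
    src.mkdir()
    dst.mkdir()
    (src / "data.csv").write_text("a,b\n1,2\n")

    try:
        shutil.copytree(src, dst)      # destination exists, so this fails
        failed = False
    except FileExistsError:
        failed = True

    # Merge the source tree into the existing destination instead.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    copied = (dst / "data.csv").read_text()

print(failed, repr(copied))
```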
Set Git Credentials
Tell Git who you are.
!git config --global user.name "{your_username}"
!git config --global user.email "{your_email_id}"
!git config --global user.password "{your_password}"
Check Remote Again
Check if the remote url is set and configured correctly :
!git remote -v
If configured properly it should output the following :
origin https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git (fetch)
origin https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git (push)
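Because this output contains the password in clear text, it may be worth masking it before sharing a notebook. A sketch using only the standard library; `mask_credentials` is a hypothetical helper and the URL is a made-up example.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_credentials(url: str) -> str:
    """Replace any user:password part of a URL with '***' (hypothetical helper)."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url  # nothing to hide
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit(parts._replace(netloc=f"***@{host}"))


masked = mask_credentials("https://alice:s3cret@github.com/alice/project.git")
print(masked)
```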
Add, Commit, Push
You know what to do.
!git add .
!git commit -m "{Message}"
!git push
Enjoy!
Solution 9 - Git
Cloning a private repo to google colab :
Generate a token:
Settings -> Developer settings -> Personal access tokens -> Generate new token
Copy the token and clone the repo (replace username and token accordingly):
!git clone https://username:token@github.com/username/repo_name.git
Solution 10 - Git
The solution https://stackoverflow.com/a/53094151/3924118 did not work for me because the expression {user} was not being converted to the actual username (I was getting a 400 Bad Request), so I slightly changed that solution to the following one.
from getpass import getpass
import os
os.environ['USER'] = input('Enter the username of your Github account: ')
os.environ['PASSWORD'] = getpass('Enter the password of your Github account: ')
os.environ['REPOSITORY'] = input('Enter the name of the Github repository: ')
os.environ['GITHUB_AUTH'] = os.environ['USER'] + ':' + os.environ['PASSWORD']
!rm -rf $REPOSITORY # To remove the previous clone of the Github repository
!git clone https://$GITHUB_AUTH@github.com/$USER/$REPOSITORY.git
os.environ['USER'] = os.environ['PASSWORD'] = os.environ['REPOSITORY'] = os.environ['GITHUB_AUTH'] = ""
If you are able to clone your-repo, you should not see any password in the output of this command. If you get an error, the password could be displayed in the output, so make sure you do not share your notebook whenever this command fails.
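One way to reduce that risk is to capture the command's output and redact the secret before anything is printed. A sketch, assuming a hypothetical helper `run_redacted`; the demo runs a harmless Python subprocess that echoes a fake secret rather than a real `git clone`.

```python
import subprocess
import sys


def run_redacted(cmd: list, secret: str) -> str:
    """Run cmd and return its combined stdout/stderr with secret masked."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    output = result.stdout
    if secret:  # replacing an empty string would mangle the output
        output = output.replace(secret, "***")
    return output


fake_secret = "hunter2"
# Stand-in for a failing git command that echoes the URL, secret included.
demo_cmd = [
    sys.executable,
    "-c",
    f"import sys; sys.stderr.write('fatal: https://user:{fake_secret}@example.com')",
]
shown = run_redacted(demo_cmd, fake_secret)
print(shown)
```

If the password was percent-encoded before being placed in the URL, the encoded form is the string that would need to be passed as `secret`.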
Solution 11 - Git
I tried some of the methods here and they all worked well, but I found it difficult to handle all the Git commands and other related commands (for example, version control with DVC) within notebook cells. So I turned to this nice solution, Kora. It is a terminal emulator that can be run within Colab. This makes usage very similar to a terminal on a local machine. The notebook stays alive and we can edit files and cells as usual. Since this console is temporary, no information is exposed. GitHub login and other commands can be run as usual.
Kora: https://pypi.org/project/kora/
Usage:
!pip install kora
from kora import console
console.start()
Solution 12 - Git
I finally pulled myself together and wrote a python package for this.
pip install clmutils # colab-misc-utils
Create a dotenv or .env in /content/drive/MyDrive (if google drive is mounted to drive) or /content/drive/.env with
# for git
user_email = "your-email"
user_name = "your-github-name"
gh_key = "-----BEGIN EC PRIVATE KEY-----
...............................................................9
your github private key........................................J
..................................==
-----END EC PRIVATE KEY-----
"
In a Colab cell
from clmutils import setup_git, Settings
config = Settings()
setup_git(
user_name=config.user_name,
user_email=config.user_email,
priv_key=config.gh_key
)
You are then all set to do all the git clone, amend code, git push stuff as if you were on your own lovely computer at home or at work.
clmutils also has a function called setup_ssh_tunnel to set up a reverse SSH tunnel to Colab. It also reads various keys, the username, and the hostname from the .env file. It's a bit involved. But if you know how to manually set up a reverse SSH tunnel to Colab, you would have no problem figuring out what they are used for. Details are available on the GitHub repo (google clmutils pypi).
Solution 13 - Git
Mount the drive using:
from google.colab import drive
drive.mount('/content/drive/')
Then:
%cd /content/drive/
To clone the repo in your drive
!git clone <github repo url>
Access other files from the repo (example: helper.py is another file in the repo). The imp module was removed in Python 3.12, so types.ModuleType is used here instead:
import types
helper = types.ModuleType('helper')
exec(open("drive/path/to/helper.py").read(), helper.__dict__)
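An alternative to calling `exec` by hand is to load the file as a regular module with `importlib`. A sketch, with a temporary `helper.py` standing in for the file in the repo; `load_module_from_path` is a hypothetical helper name.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(name: str, path: str):
    """Import the Python file at path as a module called name."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for drive/path/to/helper.py from the text above.
    helper_path = Path(tmp) / "helper.py"
    helper_path.write_text("def double(x):\n    return 2 * x\n")
    helper = load_module_from_path("helper", str(helper_path))
    result = helper.double(21)

print(result)
```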
Solution 14 - Git
This works if you want to share your repo and colab. Also works if you have multiple repos. Just throw it in a cell.
import ipywidgets as widgets
from IPython.display import display
import subprocess
class credentials_input():
    def __init__(self, repo_name):
        self.repo_name = repo_name
        self.username = widgets.Text(description='Username', value='')
        self.pwd = widgets.Password(description = 'Password', placeholder='password here')
        self.username.on_submit(self.handle_submit_username)
        self.pwd.on_submit(self.handle_submit_pwd)
        display(self.username)

    def handle_submit_username(self, text):
        display(self.pwd)
        return

    def handle_submit_pwd(self, text):
        cmd = f'git clone https://{self.username.value}:{self.pwd.value}@{self.repo_name}'
        process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
        output, error = process.communicate()
        print(output, error)
        self.username.value, self.pwd.value = '', ''

get_creds = credentials_input('github.com/username/reponame.git')
get_creds
Solution 15 - Git
Another solution based on the answer from @Marafon Thiago:
ATTENTION: If the password contains special characters, use their percent-encoded form.
E.g. for passwd = '@123'
you should type: passwd = '%40123'
from getpass import getpass
user = getpass('BitBucket user')
password = getpass('BitBucket password')
!git init
!git clone https://{user}:{password}@bitbucket.org/aqtechengenharia/aqtlibpy.git