Methods for using Git with Google Colab

Tags: Git, Google Colaboratory

Git Problem Overview


Are there any recommended methods for integrating Git with Colab?

For example, is it possible to work off code from Google source repositories or the like?

Neither Google Drive nor Cloud Storage can be used for Git functionality.

So I was wondering whether there is still a way to do it.

Git Solutions


Solution 1 - Git

If you want to clone a private repository, the quickest way is to create a personal access token and grant it only the privileges your application needs. The clone command for GitHub then looks like this:

!git clone https://<personal_access_token>@github.com/username/repository.git
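
As a minimal sketch of the same idea, the URL can be assembled in Python so that the token is never typed into a cell. The helper name token_clone_url and the example values are hypothetical, not part of the original answer; the token is percent-encoded in case it contains URL delimiters.

```python
from urllib.parse import quote


def token_clone_url(token, owner, repo, host="github.com"):
    """Build an HTTPS clone URL that carries a personal access token."""
    # Percent-encode the token so characters such as '@' or '/' cannot be
    # mistaken for URL delimiters.
    return f"https://{quote(token, safe='')}@{host}/{owner}/{repo}.git"


# In a Colab cell one might then run (placeholders are hypothetical):
#   from getpass import getpass
#   url = token_clone_url(getpass("Token: "), "username", "repository")
#   !git clone {url}
print(token_clone_url("example-token", "username", "repository"))
```

Reading the token with getpass keeps it out of the saved notebook, although it can still show up in git's error output if the clone fails.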

Solution 2 - Git

git is installed on the machine, and you can use ! to invoke shell commands.

For example, to clone a git repository:

!git clone https://github.com/fastai/courses.git

Here's a complete example that clones a repository and loads an Excel file stored therein. https://colab.research.google.com/notebook#fileId=1v-yZk-W4YXOxLTLi7bekDw2ZWZXWW216

Solution 3 - Git

A simple way to clone your private GitHub repo in Google Colab is shown below.

  1. Your password won't be exposed.
  2. It works even if your password contains special characters.
  3. Just run the snippet below in a Colab cell; it prompts for each value interactively.

import os
from getpass import getpass
import urllib.parse

user = input('User name: ')
password = getpass('Password: ')
password = urllib.parse.quote(password) # your password is converted into url format
repo_name = input('Repo name: ')

cmd_string = 'git clone https://{0}:{1}@github.com/{0}/{2}.git'.format(user, password, repo_name)

os.system(cmd_string)
cmd_string, password = "", "" # removing the password from the variable
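
One detail of the quoting step is worth illustrating: by default urllib.parse.quote leaves '/' unescaped, so a password containing a slash would still break the URL. The short sketch below, using a made-up password, compares the default with safe="".

```python
from urllib.parse import quote

password = "p@ss:w/rd"  # made-up example value

# The default keeps '/' as-is, which would corrupt the credentials part of the URL.
default_encoded = quote(password)
# safe="" escapes every reserved character, including '/'.
fully_encoded = quote(password, safe="")

print(default_encoded)  # p%40ss%3Aw/rd
print(fully_encoded)    # p%40ss%3Aw%2Frd
```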

Solution 4 - Git

You can use the SSH protocol to connect your private repository with Colab.

  1. Generate an SSH key pair on your local machine; don't forget to keep
    the passphrase empty.

  2. Upload it to Colab:

    from google.colab import files
    uploaded = files.upload()

  3. Move the SSH key pair to /root and connect to git

    • remove any previous ssh files
      ! rm -rf /root/.ssh/*
      ! mkdir -p /root/.ssh
    • uncompress your ssh files
      ! tar -xvzf ssh.tar.gz
    • copy them to root
      ! cp ssh/* /root/.ssh && rm -rf ssh && rm -rf ssh.tar.gz
      ! chmod 700 /root/.ssh
    • add your git server, e.g. GitLab, as an ssh known host
      ! ssh-keyscan gitlab.com >> /root/.ssh/known_hosts
      ! chmod 644 /root/.ssh/known_hosts
    • set your git account
      ! git config --global user.email "email"
      ! git config --global user.name "username"
    • finally connect to your git server
      ! ssh git@gitlab.com
  4. Authenticate your private repository; see the documentation on per-repository deploy keys.

  5. Use ! git clone git@gitlab.com:{account}/{projectName}.git
    note: to push, you have to give write access to
    the public ssh key that you authenticate to the git server with.
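
The copy-and-chmod part of step 3 can also be sketched in Python. The function name install_ssh_keys is hypothetical, and the permission choices (700 for the directory, 600 for private files, 644 for .pub files) are the usual SSH conventions rather than something the original answer spells out.

```python
import os
import shutil
from pathlib import Path


def install_ssh_keys(src_dir, ssh_dir):
    """Copy key files from src_dir into ssh_dir with conventional SSH permissions."""
    src, dst = Path(src_dir), Path(ssh_dir)
    dst.mkdir(parents=True, exist_ok=True)
    os.chmod(dst, 0o700)  # directory: owner only
    installed = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue
        target = dst / item.name
        shutil.copyfile(item, target)
        # Public keys may be world-readable; everything else stays private.
        os.chmod(target, 0o644 if item.suffix == ".pub" else 0o600)
        installed.append(target)
    return installed
```

In Colab this might be called as install_ssh_keys("ssh", "/root/.ssh") after unpacking the uploaded archive.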

Solution 5 - Git

In order to protect your account username and password, you can use getpass and concatenate them in the shell command:

from getpass import getpass
import os

user = getpass('BitBucket user')
password = getpass('BitBucket password')
os.environ['BITBUCKET_AUTH'] = user + ':' + password

!git clone https://$BITBUCKET_AUTH@bitbucket.org/{user}/repository.git
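
As far as I understand, the two placeholders in the clone line are resolved differently: {user} is filled in by IPython from the Python variable, while $BITBUCKET_AUTH has no Python counterpart and is expanded by the shell from the environment. The sketch below reproduces only the shell half outside a notebook; the variable DEMO_AUTH, the credentials, and the team name are made up, and a POSIX sh is assumed.

```python
import os
import subprocess

# Made-up credentials placed in the child process environment only.
demo_env = {**os.environ, "DEMO_AUTH": "alice:s3cret"}

# The shell, not Python, substitutes $DEMO_AUTH inside the double-quoted string.
result = subprocess.run(
    ["sh", "-c", 'printf "%s" "https://$DEMO_AUTH@bitbucket.org/team/repository.git"'],
    env=demo_env,
    capture_output=True,
    text=True,
    check=True,
)
expanded_url = result.stdout
print(expanded_url)
```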

Solution 6 - Git

You can mostly follow this guide: https://qiita.com/Rowing0914/items/51a770925653c7c528f9

In summary, follow these steps:

1- Connect your Google Colab runtime to your Google Drive using these commands:

from google.colab import drive
drive.mount('/content/drive')

This requires an authentication step; follow the prompts.

2- Change the current directory to the path where you want to clone the Git project.

In my example:

path_clone = "drive/My Drive/projects"
%cd {path_clone}

Use the %cd magic here rather than !cd: a !cd runs in a temporary subshell and does not change the notebook's working directory.

3- Clone the Git project:

!git clone <Git project URL address>

Now you have the cloned Git project in the projects folder in your Google Drive (which is also connected to your Google Colab runtime machine).

4- Go to your Google Drive (using a browser, for example), open the "projects" folder, and open the .ipynb file that you want to use in Google Colab.

5- Now you have a Google Colab runtime with the .ipynb that you wanted to use, which is also connected to your Google Drive, and all cloned git files are in the Colab runtime's storage.

Note:

1- Check that your Colab runtime is connected to Google Drive. If it's not connected, just repeat step #1 above.

2- Double check, using the "pwd" and "cd" commands, that the current directory is inside the cloned git project in Google Drive (step #2 above).
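
The second check can also be done programmatically. This is a small sketch, with the hypothetical helper name find_repo_root, that walks upward from a directory until it finds one containing .git:

```python
from pathlib import Path


def find_repo_root(start="."):
    """Return the nearest directory at or above `start` that contains `.git`, else None."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    return None
```

Calling find_repo_root() from the notebook returns the clone's top-level folder when the working directory is inside the project, and None when it is not.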

Solution 7 - Git

Three steps to use git to sync Colab with GitHub or GitLab.

  1. Generate a private-public key pair. Copy the private key to the system clipboard for use in step 2. Paste the public key to GitHub or GitLab as appropriate.

    In Linux, ssh-keygen can be used to generate the key-pair in ~/.ssh. The resultant private key is in the file id_rsa, the public key is in the file id_rsa.pub.

  2. In Colab, execute

    key = \
    '''
    paste the private key here 
    (your id_rsa or id_ecdsa file in the .ssh directory, e.g.
    -----BEGIN EC PRIVATE KEY-----
    M..............................................................9
    ...............................................................J
    ..................................==
    -----END EC PRIVATE KEY-----
    '''
    ! mkdir -p /root/.ssh
    with open(r'/root/.ssh/id_rsa', 'w', encoding='utf8') as fh:
        fh.write(key)
    ! chmod 600 /root/.ssh/id_rsa
    ! ssh-keyscan github.com >> /root/.ssh/known_hosts 
    # test setup
    ! ssh -T git@github.com
    # if you see something like "Hi ffreemt! You've successfully 
    # authenticated, but GitHub does not provide shell access."
    # you are all set. You can tweak .ssh/config for multiple github accounts
    
  3. Use git to pull/push as usual.

The same idea can be used for rsync (or ssh) between Colab and HostA with minor changes:

  1. Generate a private-public key pair. Copy the private key to the system clipboard for use in step 2. Paste the public key into authorized_keys in .ssh on HostA.

  2. In Colab, execute

    key = \
    '''
    paste the private key here
    '''
    ! mkdir -p /root/.ssh
    with open(r'/root/.ssh/id_rsa', 'w', encoding='utf8') as fh:
        fh.write(key)
    ! chmod 600 /root/.ssh/id_rsa
    ! ssh -oStrictHostKeyChecking=no root@HostA hostname
    # ssh-keyscan HostA >> /root/.ssh/known_hosts does not seem to work with IP.

  3. Use rsync to sync files between Colab and HostA as usual.
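
A key pasted inside a triple-quoted string usually picks up a leading blank line and some indentation. The sketch below normalises the text before writing it and creates the file with owner-only permissions; the function name write_private_key is hypothetical, and whether the SSH client would accept the unnormalised text is not something this sketch tests.

```python
import os
import textwrap
from pathlib import Path


def write_private_key(key_text, path):
    """Write a pasted private key with clean whitespace and owner-only permissions."""
    # Drop the common indentation and the surrounding blank lines left by the paste.
    cleaned = textwrap.dedent(key_text).strip() + "\n"
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 from the start so it is never world-readable.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf8") as fh:
        fh.write(cleaned)
    os.chmod(target, 0o600)  # also tightens a pre-existing file
    return target
```

In Colab it could replace the open/write/chmod lines, for example write_private_key(key, "/root/.ssh/id_rsa").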

Solution 8 - Git

> Update September 2021: For security reasons, password authentication is now deprecated on GitHub. Please use a Personal Access Token instead: go to github.com -> Settings -> Developer Settings -> Personal Access Tokens and generate a token for the required purpose. Use this in place of your password for all tasks mentioned in this tutorial!

For more details you can also see my article on Medium : https://medium.com/geekculture/using-git-github-on-google-colaboratory-7ef3b76fe61b

None of the answers provide a straight and direct answer like this one :

This is probably the answer you are looking for.

Works on Colab for both public and private repositories. Don't change or skip any step. (Replace all {vars}.)

TL;DR Complete Process:

!git clone https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git
%cd /content/{destination_repo_projectname}

!git config --global user.name "{your_username}"
!git config --global user.email "{your_email_id}"
!git config --global user.password "{your_password}"

Make Your Changes and then run :

!git add .
!git commit -m "{Message}"
!git push

Cloning a Repository :

!git clone https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git

Change the directory

Change the directory to {destination_repo_projectname}, the folder that git clone creates, using the %cd line magic for Jupyter notebooks.

%cd /content/{destination_repo_projectname}
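
By default git clone creates a folder named after the repository, with any trailing .git removed. As a small sketch, that folder name can be derived from the clone URL; the helper name repo_dir_from_url and the example URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def repo_dir_from_url(clone_url):
    """Return the directory name `git clone` uses by default for an HTTPS URL."""
    name = PurePosixPath(urlsplit(clone_url).path).name
    return name[:-4] if name.endswith(".git") else name


print(repo_dir_from_url("https://user:token@github.com/owner/project.git"))  # project
```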

Verify!

Pull

Sanity Check to see if everything works perfectly!

!git pull

If no changes were made to the remote git repo after cloning, the following should be the displayed output :

Already up to date.

Status

Similarly check the status of the staged/unstaged changes.

!git status

It should display this, with the default branch selected :

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

Check Older Logs

Check the previous commits you have made on the repo :

!git log -n 4

Outputs Git Commit IDs with Logs :

commit 18ccf27c8b2d92b560e6eeab2629ba0c6ea422a5 (HEAD -> main, origin/main, origin/HEAD)
Author: Farhan Hai Khan <njrfarhandasilva10@gmail.com>
Date:   Mon May 31 00:12:14 2021 +0530

    Create README.md

commit bd6ee6d4347eca0e3676e88824c8e1118cfbff6b
Author: khanfarhan10 <njrfarhandasilva10@gmail.com>
Date:   Sun May 30 18:40:16 2021 +0000

    Add Zip COVID

commit 8a3a12863a866c9d388cbc041a26d49aedfa4245
Author: khanfarhan10 <njrfarhandasilva10@gmail.com>
Date:   Sun May 30 18:03:46 2021 +0000

    Add COVID Data

commit 6a16dc7584ba0d800eede70a217d534a24614cad
Author: khanfarhan10 <njrfarhandasilva10@gmail.com>
Date:   Sun May 30 16:04:20 2021 +0000

    Removed sample_data using colab (testing)

Make changes in the local repo

Make changes from the local repo directory.

These might include additions, deletions, and edits.

Pro Tip: If you want, you can copy things from Drive into a git repo as follows.

Mount Google Drive:

from google.colab import drive
drive.mount('/content/gdrive')

Copy contents using shutil:

import shutil

# For a folder:
shutil.copytree(src_folder,des_folder)

# For a file:
shutil.copy(src_file,des_file)

# Create a ZipFile
shutil.make_archive(archive_name, 'zip', directory_to_zip)

Set Git Credentials

Tell Git Who You Are?

!git config --global user.name "{your_username}"
!git config --global user.email "{your_email_id}"
!git config --global user.password "{your_password}"

Check Remote Again

Check if the remote url is set and configured correctly :

!git remote -v

If configured properly it should output the following :

origin	https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git (fetch)
origin	https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git (push)

Add, Commit, Push

You know what to do.

!git add .
!git commit -m "{Message}"
!git push

Enjoy!

Solution 9 - Git

Cloning a private repo to google colab :

Generate a token:

Settings -> Developer settings -> Personal access tokens -> Generate new token

Copy the token and clone the repo (replace username and token accordingly)

!git clone https://username:token@github.com/username/repo_name.git

Solution 10 - Git

The solution https://stackoverflow.com/a/53094151/3924118 did not work for me because the expression {user} was not being converted to the actual username (I was getting a 400 bad request), so I slightly changed that solution to the following one.

from getpass import getpass
import os

os.environ['USER'] = input('Enter the username of your Github account: ')
os.environ['PASSWORD'] = getpass('Enter the password of your Github account: ')
os.environ['REPOSITORY'] = input('Enter the name of the Github repository: ')
os.environ['GITHUB_AUTH'] = os.environ['USER'] + ':' + os.environ['PASSWORD']

!rm -rf $REPOSITORY # To remove the previous clone of the Github repository
!git clone https://$GITHUB_AUTH@github.com/$USER/$REPOSITORY.git

os.environ['USER'] = os.environ['PASSWORD'] = os.environ['REPOSITORY'] = os.environ['GITHUB_AUTH'] = ""

If you are able to clone your-repo, you should not see any password in the output of this command. If you get an error, the password could be displayed to the output, so make sure you do not share your notebook whenever this command fails.
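
To reduce the risk described above, command output can be scrubbed before it is displayed. This is a minimal sketch using a hypothetical helper, redact_credentials, which masks the credentials part of any http(s) URL in a piece of text:

```python
import re

# Matches the "user:password@" or "token@" part of an http(s) URL.
_CREDENTIALS = re.compile(r"(https?://)[^/\s@]+@")


def redact_credentials(text):
    """Replace embedded URL credentials with '***'."""
    return _CREDENTIALS.sub(r"\1***@", text)


message = "fatal: unable to access 'https://alice:s3cret@github.com/alice/repo.git/'"
print(redact_credentials(message))
```

It would typically be applied to output captured with subprocess.run(..., capture_output=True, text=True) rather than to a !git cell, whose output is printed directly.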

Solution 11 - Git

I tried some of the methods here and they all worked well, but it became difficult to handle all the git commands and related tools (for example, version control with DVC) within notebook cells. So I turned to this nice solution, Kora. It is a terminal emulator that can be run within Colab, which makes it about as easy to use as a terminal on a local machine. The notebook stays alive and we can edit files and cells as usual. Since this console is temporary, no information is exposed. GitHub login and other commands can be run as usual.

Kora: https://pypi.org/project/kora/

Usage:

!pip install kora
from kora import console
console.start()

Solution 12 - Git

I finally pulled myself together and wrote a python package for this.

pip install clmutils  # colab-misc-utils

Create a dotenv or .env in /content/drive/MyDrive (if google drive is mounted to drive) or /content/drive/.env with

# for git 
user_email = "your-email"
user_name = "your-github-name"
gh_key = "-----BEGIN EC PRIVATE KEY-----
...............................................................9
your github private key........................................J
..................................==
-----END EC PRIVATE KEY-----
"

In a Colab cell

from clmutils import setup_git, Settings

config = Settings()
setup_git(
    user_name=config.user_name,
    user_email=config.user_email,
    priv_key=config.gh_key
)

You are then all set to do all the git clone, amend code, git push stuff as if it were on your own lovely computer at home or at work.

clmutils also has a function called setup_ssh_tunnel to set up a reverse ssh tunnel to Colab. It also reads various keys, the username, and the hostname from the .env file. It's a bit involved, but if you know how to manually set up a reverse ssh tunnel to Colab, you will have no problem figuring out what they are used for. Details are available on the GitHub repo (google clmutils pypi).

Solution 13 - Git

Mount the drive using:

from google.colab import drive
drive.mount('/content/drive/')

Then:

%cd /content/drive/

To clone the repo in your drive

!git clone <github repo url> 

Access other files from the repo (example: helper.py is another file in the repo):

import types  # the imp module was removed in Python 3.12
helper = types.ModuleType('helper')
exec(open("drive/path/to/helper.py").read(), helper.__dict__)
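
An alternative to running exec on the file contents is the standard importlib machinery, which loads a module from an explicit file path. The sketch below uses a hypothetical helper name, load_module_from_path:

```python
import importlib.util


def load_module_from_path(module_name, file_path):
    """Import a Python source file from an explicit path and return the module."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {module_name!r} from {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module
```

For the example above this would be helper = load_module_from_path("helper", "drive/path/to/helper.py").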

Solution 14 - Git

This works if you want to share your repo and colab. Also works if you have multiple repos. Just throw it in a cell.

import ipywidgets as widgets
from IPython.display import display
import subprocess

class credentials_input():
    def __init__(self, repo_name):
        self.repo_name = repo_name
        self.username = widgets.Text(description='Username', value='')
        self.pwd = widgets.Password(description = 'Password', placeholder='password here')
        
        self.username.on_submit(self.handle_submit_username)
        self.pwd.on_submit(self.handle_submit_pwd)        
        display(self.username)

    def handle_submit_username(self, text):
        display(self.pwd)
        return

    def handle_submit_pwd(self, text):
        cmd = f'git clone https://{self.username.value}:{self.pwd.value}@{self.repo_name}'
        process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
        output, error = process.communicate()
        print(output, error)
        self.username.value, self.pwd.value = '', ''

get_creds = credentials_input('github.com/username/reponame.git')
get_creds

Solution 15 - Git

Another solution, based on the answer from @Marafon Thiago:

ATTENTION: If the password contains special characters, use the percent-encoding of each character.

E.g. for passwd = '@123' you should type passwd = '%40123'

from getpass import getpass
user = getpass('BitBucket user')
password = getpass('BitBucket password')

!git init
!git clone https://{user}:{password}@bitbucket.org/aqtechengenharia/aqtlibpy.git 

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Prashanth | View Question on Stackoverflow
Solution 1 - Git | Paulius Venclovas | View Answer on Stackoverflow
Solution 2 - Git | Bob Smith | View Answer on Stackoverflow
Solution 3 - Git | Vinoj John Hosan | View Answer on Stackoverflow
Solution 4 - Git | Fadi Bakoura | View Answer on Stackoverflow
Solution 5 - Git | Marafon Thiago | View Answer on Stackoverflow
Solution 6 - Git | Sharifirad | View Answer on Stackoverflow
Solution 7 - Git | mikey | View Answer on Stackoverflow
Solution 8 - Git | Farhan Hai Khan | View Answer on Stackoverflow
Solution 9 - Git | mounirboulwafa | View Answer on Stackoverflow
Solution 10 - Git | nbro | View Answer on Stackoverflow
Solution 11 - Git | sreagm | View Answer on Stackoverflow
Solution 12 - Git | mikey | View Answer on Stackoverflow
Solution 13 - Git | Ish | View Answer on Stackoverflow
Solution 14 - Git | Jaden Travnik | View Answer on Stackoverflow
Solution 15 - Git | vpz | View Answer on Stackoverflow