Can you get the number of lines of code from a GitHub repository?


Github Problem Overview


In a GitHub repository you can see “language statistics”, which displays the percentage of the project that’s written in a language. It doesn’t, however, display how many lines of code the project consists of. Often, I want to quickly get an impression of the scale and complexity of a project, and the count of lines of code can give a good first impression. 500 lines of code implies a relatively simple project, 100,000 lines of code implies a very large/complicated project.

So, is it possible to get the lines of code written in the various languages from a GitHub repository, preferably without cloning it?


The question at https://stackoverflow.com/q/4822471/388916 asks how to count the lines of code in a local Git repository, but:

  1. You have to clone the project, which could be massive. Cloning a project like Wine, for example, takes ages.
  2. You would count lines in files that aren’t necessarily code, like i18n files.
  3. If you count just (for example) Ruby files, you’d potentially miss massive amounts of code in other languages, like JavaScript. You’d have to know beforehand which languages the project uses. You’d also have to repeat the count for every language the project uses.

All in all, this is potentially far too time-intensive for “quickly checking the scale of a project”.

Github Solutions


Solution 1 - Github

A shell script, cloc-git

You can use this shell script to count the number of lines in a remote Git repository with one command:

#!/usr/bin/env bash
git clone --depth 1 "$1" temp-linecount-repo &&
  printf "('temp-linecount-repo' will be deleted automatically)\n\n\n" &&
  cloc temp-linecount-repo &&
  rm -rf temp-linecount-repo

Installation

This script requires CLOC (“Count Lines of Code”) to be installed. cloc can probably be installed with your package manager – for example, brew install cloc with Homebrew. There is also a docker image published under mribeiro/cloc.

You can install the script by saving its code to a file cloc-git, running chmod +x cloc-git, and then moving the file to a folder in your $PATH such as /usr/local/bin.

Usage

The script takes one argument, which is any URL that git clone will accept. Examples are https://github.com/evalEmpire/perl5i.git (HTTPS) or git@github.com:evalEmpire/perl5i.git (SSH). You can get this URL from any GitHub project page by clicking “Clone or download”.

Example output:

$ cloc-git https://github.com/evalEmpire/perl5i.git
Cloning into 'temp-linecount-repo'...
remote: Counting objects: 200, done.
remote: Compressing objects: 100% (182/182), done.
remote: Total 200 (delta 13), reused 158 (delta 9), pack-reused 0
Receiving objects: 100% (200/200), 296.52 KiB | 110.00 KiB/s, done.
Resolving deltas: 100% (13/13), done.
Checking connectivity... done.
('temp-linecount-repo' will be deleted automatically)


     171 text files.
     166 unique files.                                          
      17 files ignored.

http://cloc.sourceforge.net v 1.62  T=1.13 s (134.1 files/s, 9764.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Perl                           149           2795           1425           6382
JSON                             1              0              0            270
YAML                             2              0              0            198
-------------------------------------------------------------------------------
SUM:                           152           2795           1425           6850
-------------------------------------------------------------------------------
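If you would rather process the counts programmatically than read the table, recent versions of cloc can emit a JSON report with the --json option. The sketch below is only an illustration and not part of the original script: codeLinesByLanguage is a hypothetical helper, and the sample object is hand-written to mirror the table above, assuming the report contains a header entry, one entry per language, and a SUM entry.

```javascript
// Hypothetical helper: reduce a parsed `cloc --json` report to { language: codeLines }.
// Assumes the report has a "header" entry, a "SUM" entry, and one entry per
// language with a numeric "code" field.
function codeLinesByLanguage(report) {
    const result = {};
    for (const [name, stats] of Object.entries(report)) {
        if (name === 'header' || name === 'SUM') continue; // metadata and totals, not languages
        result[name] = stats.code;
    }
    return result;
}

// Hand-written sample mirroring the perl5i table above (not real cloc output).
const sampleReport = {
    header: { cloc_version: '1.62' },
    Perl: { nFiles: 149, blank: 2795, comment: 1425, code: 6382 },
    JSON: { nFiles: 1, blank: 0, comment: 0, code: 270 },
    YAML: { nFiles: 2, blank: 0, comment: 0, code: 198 },
    SUM: { nFiles: 152, blank: 2795, comment: 1425, code: 6850 },
};

const perLanguage = codeLinesByLanguage(sampleReport);
const totalCodeLines = Object.values(perLanguage).reduce((sum, n) => sum + n, 0);
console.log(perLanguage, totalCodeLines);
```

In practice you would feed it the parsed output of something like cloc --json temp-linecount-repo instead of the sample object.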

Alternatives

Run the commands manually

If you don’t want to bother saving and installing the shell script, you can run the commands manually. An example:

$ git clone --depth 1 https://github.com/evalEmpire/perl5i.git
$ cloc perl5i
$ rm -rf perl5i

Linguist

If you want the results to match GitHub’s language percentages exactly, you can try installing Linguist instead of CLOC. According to its README, you need to gem install github-linguist and then run github-linguist. I couldn’t get it to work (issue #2223).

Solution 2 - Github

Inside a local clone, you can run something like

git ls-files | xargs wc -l

which prints the line count of every tracked file, followed by the total.

Or use this tool → http://line-count.herokuapp.com/

Solution 3 - Github

I created an extension for the Google Chrome browser, GLOC, which works for public and private repos.

Counts the number of lines of code of a project from:

  • project detail page
  • user's repositories
  • organization page
  • search results page
  • trending page
  • explore page



Solution 4 - Github

If you go to the graphs/contributors page, you can see a list of all the contributors to the repo and how many lines they've added and removed.

Unless I'm missing something, subtracting the aggregate number of lines deleted from the aggregate number of lines added among all contributors should yield the total number of lines of code in the repo. (EDIT: it turns out I was missing something after all. Take a look at orbitbot's comment for details.)

UPDATE:

This data is also available in GitHub's API. So I wrote a quick script to fetch the data and do the calculation:

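'use strict';

async function countGithub(repo) {
    const response = await fetch(`https://api.github.com/repos/${repo}/stats/contributors`);
    // GitHub answers 202 while it is still computing the statistics
    if (response.status === 202) {
        throw new Error('Statistics are still being computed; try again shortly.');
    }
    const contributors = await response.json();
    // Per contributor: additions minus deletions, summed over all weeks
    const lineCounts = contributors.map(contributor => (
        contributor.weeks.reduce((lineCount, week) => lineCount + week.a - week.d, 0)
    ));
    // The initial value 0 keeps reduce from throwing on an empty list
    const lines = lineCounts.reduce((lineTotal, lineCount) => lineTotal + lineCount, 0);
    window.alert(lines);
}

countGithub('jquery/jquery'); // or count anything you like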

Just paste it in a Chrome DevTools snippet, change the repo and click run.

Disclaimer (thanks to lovasoa):

Take the results of this method with a grain of salt, because for some repos (sorich87/bootstrap-tour) it results in negative values, which might indicate there's something wrong with the data returned from GitHub's API.

UPDATE:

Looks like this method to calculate total line numbers isn't entirely reliable. Take a look at orbitbot's comment for details.

Solution 5 - Github

You can clone just the latest commit using git clone --depth 1 <url> and then perform your own analysis using Linguist, the same software Github uses. That's the only way I know you're going to get lines of code.

Another option is to use the API to list the languages the project uses. It doesn't give them in lines but in bytes. For example...

$ curl https://api.github.com/repos/evalEmpire/perl5i/languages
{
  "Perl": 274835
}

Take that with a grain of salt, though: that project includes YAML and JSON, which the web site acknowledges but the API does not.
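Bytes are not lines, but the byte counts can at least be turned into relative shares per language. The following is a minimal sketch under that assumption; languageShares is a hypothetical helper, and the byte counts in the usage line are made up for illustration.

```javascript
// Hypothetical helper: convert the byte counts returned by the
// /repos/{owner}/{repo}/languages endpoint into percentage shares.
function languageShares(bytesByLanguage) {
    const totalBytes = Object.values(bytesByLanguage).reduce((sum, n) => sum + n, 0);
    const shares = {};
    for (const [language, bytes] of Object.entries(bytesByLanguage)) {
        // Guard against an empty repository (total of 0 bytes).
        shares[language] = totalBytes === 0 ? 0 : (bytes / totalBytes) * 100;
    }
    return shares;
}

// Made-up byte counts, for illustration only.
console.log(languageShares({ Perl: 300, Shell: 100 }));
// { Perl: 75, Shell: 25 }
```

This only tells you how the code is distributed across languages, not how many lines there are.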

Finally, you can use code search to ask which files match a given language. This example asks which files in perl5i are Perl. https://api.github.com/search/code?q=language:perl+repo:evalEmpire/perl5i. It will not give you lines, and you have to ask for the file size separately using the returned url for each file.

Solution 6 - Github

Not currently possible on GitHub.com or its APIs

I have talked to customer support and confirmed that this cannot be done on github.com. They have passed the suggestion along to the GitHub team though, so hopefully it will be possible in the future. If so, I'll be sure to edit this answer.

Meanwhile, Rory O'Kane's answer is a brilliant alternative based on cloc and a shallow repo clone.

Solution 7 - Github

From @Tgr's comment: there is an online tool at https://codetabs.com/count-loc/count-loc-online.html

LOC counting example for strimzi/strimzi-kafka-operator repository

Solution 8 - Github

You can use the GitHub API to get the SLOC, as in the following function:

function getSloc(repo, tries = 3) {

    // repo is the repo's path with a leading slash (e.g. "/jquery/jquery"),
    // because it is appended directly to ".../repos" below
    if (!repo) {
        return Promise.reject(new Error("No repo provided"));
    }

    // GitHub's API may return an empty object the first time it is accessed
    // We can try several times then stop
    if (tries <= 0) {
        return Promise.reject(new Error("Too many tries"));
    }

    let url = "https://api.github.com/repos" + repo + "/stats/code_frequency";

    // Each entry is [week, additions, deletions]; deletions are negative,
    // so adding both columns yields the net line count
    return fetch(url)
        .then(x => x.json())
        .then(x => x.reduce((total, changes) => total + changes[1] + changes[2], 0))
        .catch(err => getSloc(repo, tries - 1));
}
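To see why adding both columns works, here is a small offline sketch of the same summation. It assumes the documented shape of the code_frequency response, a list of [week timestamp, additions, deletions] entries with deletions reported as negative numbers; netLinesFromCodeFrequency is a hypothetical name and the sample weeks are made up.

```javascript
// Each entry of /stats/code_frequency is [weekTimestamp, additions, deletions],
// where deletions is reported as a negative number.
function netLinesFromCodeFrequency(weeks) {
    if (!Array.isArray(weeks)) {
        // e.g. the empty object returned while GitHub is still computing the statistics
        throw new TypeError('Expected an array of [timestamp, additions, deletions] entries');
    }
    return weeks.reduce((total, [, additions, deletions]) => total + additions + deletions, 0);
}

// Made-up sample data, for illustration only.
const sampleWeeks = [
    [1302998400, 1124, -435],
    [1303603200, 0, 0],
    [1304208000, 300, -89],
];
console.log(netLinesFromCodeFrequency(sampleWeeks)); // 900
```

Keeping the arithmetic separate from the network call also makes it easy to check the result against a repository whose size you already know.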

Personally, I made a Chrome extension which shows the number of SLOC on both the GitHub project list and the project detail page. You can also set your personal access token to access private repositories and bypass the API rate limit.

You can download from here https://chrome.google.com/webstore/detail/github-sloc/fkjjjamhihnjmihibcmdnianbcbccpnn

Source code is available here https://github.com/martianyi/github-sloc

Solution 9 - Github

You can use tokei:

cargo install tokei
git clone --depth 1 https://github.com/XAMPPRocky/tokei
tokei tokei/

Output:

===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 BASH                    4           48           30           10            8
 JSON                    1         1430         1430            0            0
 Shell                   1           49           38            1           10
 TOML                    2           78           65            4            9
-------------------------------------------------------------------------------
 Markdown                4         1410            0         1121          289
 |- JSON                 1           41           41            0            0
 |- Rust                 1           47           38            5            4
 |- Shell                1           19           16            0            3
 (Total)                           1517           95         1126          296
-------------------------------------------------------------------------------
 Rust                   19         3750         3123          119          508
 |- Markdown            12          358            5          302           51
 (Total)                           4108         3128          421          559
===============================================================================
 Total                  31         6765         4686         1255          824
===============================================================================

Tokei has support for badges:

Count Lines
[![](https://tokei.rs/b1/github/XAMPPRocky/tokei)](https://github.com/XAMPPRocky/tokei)

> By default the badge will show the repo's LoC(Lines of Code), you can also specify for it to show a different category, by using the ?category= query string. It can be either code, blanks, files, lines, comments.

Count Files
[![](https://tokei.rs/b1/github/XAMPPRocky/tokei?category=files)](https://github.com/XAMPPRocky/tokei)
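If you generate badges for several repositories, the URL pattern above can be wrapped in a small function. This is only a sketch: tokeiBadgeUrl and BADGE_CATEGORIES are hypothetical names, and the category list is simply the one given in the quoted text.

```javascript
// Hypothetical helper that builds a tokei.rs badge URL for a GitHub repository.
// The category names are the ones listed in the quoted text above.
const BADGE_CATEGORIES = ['code', 'blanks', 'files', 'lines', 'comments'];

function tokeiBadgeUrl(owner, repo, category) {
    const base = `https://tokei.rs/b1/github/${owner}/${repo}`;
    if (category === undefined) return base; // default badge
    if (!BADGE_CATEGORIES.includes(category)) {
        throw new RangeError(`Unknown badge category: ${category}`);
    }
    return `${base}?category=${category}`;
}

console.log(tokeiBadgeUrl('XAMPPRocky', 'tokei'));
console.log(tokeiBadgeUrl('XAMPPRocky', 'tokei', 'files'));
```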

Solution 10 - Github

Firefox add-on Github SLOC

I wrote a small Firefox add-on that prints the number of lines of code on GitHub project pages: Github SLOC

Solution 11 - Github

Hey all this is ridiculously easy...

  1. Create a new branch from your first commit
  2. When you want to find out your stats, create a new PR from main
  3. The PR will show you the number of changed lines - as you're doing a PR from the first commit all your code will be counted as new lines

And the added benefit is that if you don't approve the PR and just leave it in place, the stats (No of commits, files changed and total lines of code) will simply keep up-to-date as you merge changes into main. :) Enjoy.


Solution 12 - Github

npm install sloc -g
git clone --depth 1 https://github.com/vuejs/vue/
sloc ".\vue\src" --format cli-table
rm -r -force ".\vue\"

Instructions and Explanation

  1. Install sloc from npm, a command line tool (Node.js needs to be installed).

npm install sloc -g

  2. Clone a shallow repository (faster download than a full clone).

git clone --depth 1 https://github.com/facebook/react/

  3. Run sloc and specify the path that should be analyzed.

sloc ".\react\src" --format cli-table

sloc supports formatting the output as a cli-table, as json or csv. Regular expressions can be used to exclude files and folders (Further information on npm).

  4. Delete the repository folder (optional).

PowerShell: rm -r -force ".\react\" or on Mac/Unix: rm -rf react/

[Screenshot: sloc output as a cli-table]

[Screenshot: sloc output with no arguments]

It is also possible to get details for every file with the --details option:

sloc ".\react\src" --format cli-table --details     

Solution 13 - Github

Open terminal and run the following:

curl -L "https://api.codetabs.com/v1/loc?github=username/reponame"

Solution 14 - Github

If the question is "can you quickly get NUMBER OF LINES of a github repo", the answer is no as stated by the other answers.

However, if the question is "can you quickly check the SCALE of a project", I usually gauge a project by looking at its size. Of course the size will include deltas from all active commits, but it is a good metric as the order of magnitude is quite close.

E.g.

How big is the "docker" project?

In your browser, enter api.github.com/repos/ORG_NAME/PROJECT_NAME i.e. api.github.com/repos/docker/docker

In the response hash, you can find the size attribute:

{
    ...
    size: 161432,
    ...
}

This should give you an idea of the relative scale of the project. The number seems to be in KB, but the clone I checked on my computer was actually smaller, even though the order of magnitude is consistent (161432 KB ≈ 161 MB, whereas du -s -h docker reports 65 MB).
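For a quick glance, the size value can be turned into a rough human-readable figure. The sketch below assumes, as this answer does, that the field is in kilobytes and uses decimal units; describeRepoSize is a hypothetical helper.

```javascript
// Hypothetical helper: turn the "size" field (assumed here to be in kilobytes,
// as the answer suggests) into a rough human-readable figure.
function describeRepoSize(sizeInKb) {
    const units = ['KB', 'MB', 'GB', 'TB'];
    let value = sizeInKb;
    let unitIndex = 0;
    // Step up one unit per factor of 1000 (decimal units, matching 161432 KB = 161 MB).
    while (value >= 1000 && unitIndex < units.length - 1) {
        value /= 1000;
        unitIndex += 1;
    }
    return `${Math.round(value)} ${units[unitIndex]}`;
}

console.log(describeRepoSize(161432)); // "161 MB"
```

Since the figure covers the repository's history and not just the current source files, treat it as an order-of-magnitude hint only.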

Solution 15 - Github

Pipe the per-file line counts to sort to order the files by line count:

git ls-files | xargs wc -l | sort -n

Solution 16 - Github

This is easy if you are using VS Code and you clone the project first. Just install the Lines of Code (LOC) VS Code extension and then run LineCount: Count Workspace Files from the Command Palette.

The extension shows summary statistics by file type and it also outputs result files with detailed information by each folder.

Solution 17 - Github

There is another online tool that counts lines of code for public and private repos without having to clone/download them - https://klock.herokuapp.com/


Solution 18 - Github

shields.io has a badge that can count up all the lines for you here. Here is an example of what it looks like counting the Raycast extensions repo:

https://img.shields.io/tokei/lines/github/raycast/extensions

Solution 19 - Github

None of the answers here satisfied my requirements. I only wanted to use existing utilities. The following script will use basic utilities:

  • Git
  • GNU or BSD awk
  • GNU or BSD sed
  • Bash

Get total lines added to a repository (subtracts lines deleted from lines added).

#!/bin/bash
git diff --shortstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD | \
sed 's/[^0-9,]*//g' | \
awk -F, '!($2 > 0) {$2="0"};!($3 > 0) {$3="0"}; {print $2-$3}'

Get lines of code filtered by file types of known source code (e.g. *.py files; add more extensions as needed). The patterns are quoted so that Git, not the shell, expands them.

#!/bin/bash
git diff --shortstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- '*.py' '*.java' '*.js' | \
sed 's/[^0-9,]*//g' | \
awk -F, '!($2 > 0) {$2="0"};!($3 > 0) {$3="0"}; {print $2-$3}'

4b825dc642cb6eb9a060e54bf8d69288fbee4904 is the id of the "empty tree" in Git and it's always available in every repository.
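The sed/awk pipeline relies on the position of the numbers in the --shortstat line. As an alternative sketch (not the author's method), the same line can be parsed by its labels, which also copes with lines where the insertions or deletions part is missing. netLinesFromShortstat is a hypothetical helper and the sample lines are made up in the format Git prints.

```javascript
// Hypothetical helper: parse one line of `git diff --shortstat` output and
// return insertions minus deletions. Either part may be missing from the line.
function netLinesFromShortstat(line) {
    const insertions = /(\d+) insertions?\(\+\)/.exec(line);
    const deletions = /(\d+) deletions?\(-\)/.exec(line);
    const added = insertions ? Number(insertions[1]) : 0;
    const removed = deletions ? Number(deletions[1]) : 0;
    return added - removed;
}

// Made-up sample lines in the format Git prints.
console.log(netLinesFromShortstat(' 152 files changed, 10845 insertions(+)'));            // 10845
console.log(netLinesFromShortstat(' 3 files changed, 10 insertions(+), 2 deletions(-)')); // 8
console.log(netLinesFromShortstat(' 1 file changed, 5 deletions(-)'));                    // -5
```

When diffing against the empty tree there are no deletions, so only the first form occurs in that use case.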


Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author  Original Content on Stackoverflow
Question              Hubro            View Question on Stackoverflow
Solution 1 - Github   Rory O'Kane      View Answer on Stackoverflow
Solution 2 - Github   Ahmad Awais      View Answer on Stackoverflow
Solution 3 - Github   Kas Elvirov      View Answer on Stackoverflow
Solution 4 - Github   Lewis            View Answer on Stackoverflow
Solution 5 - Github   Schwern          View Answer on Stackoverflow
Solution 6 - Github   Hubro            View Answer on Stackoverflow
Solution 7 - Github   Karbos 538       View Answer on Stackoverflow
Solution 8 - Github   Yi Kai           View Answer on Stackoverflow
Solution 9 - Github   Gorka            View Answer on Stackoverflow
Solution 10 - Github  lovasoa          View Answer on Stackoverflow
Solution 11 - Github  Paul M Sorauer   View Answer on Stackoverflow
Solution 12 - Github  Tobi Obeck       View Answer on Stackoverflow
Solution 13 - Github  ishandutta2007   View Answer on Stackoverflow
Solution 14 - Github  Jimmy Da         View Answer on Stackoverflow
Solution 15 - Github  CambodianCoder   View Answer on Stackoverflow
Solution 16 - Github  Mike Bendorf     View Answer on Stackoverflow
Solution 17 - Github  sicvolo          View Answer on Stackoverflow
Solution 18 - Github  sandypockets     View Answer on Stackoverflow
Solution 19 - Github  Sam Gleske       View Answer on Stackoverflow