Using PowerShell to find the oldest commit of a public GitHub repo without cloning
Posted: (EET/GMT+2)
GitHub is a great place for open-source projects. Going through a project's main page with your web browser is informative, but something very simple is lacking: how old is this repo, anyway?
Yes, the main page usually gives you hints about this with commits and dates, but on popular and highly active repos, you can't really tell. Also, while you can list all commits, the web UI only provides the Next/Previous buttons. Good luck with that on any active repository with 100K+ commits.
So, what would work? You could clone the whole repository locally with git clone and then look at the oldest entry in git log. That works, but it's a lot of downloading if all you need is a single date.
PowerShell to the rescue. The GitHub REST API at api.github.com has a commits endpoint that lists commits newest-first, one page at a time. Ask for just one commit per page (per_page=1), and the Link response header very helpfully gives you the URL of the last page, which holds the oldest commit, normally the repository's very first one. Fetch that page, and there you have it.
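To make the pagination trick concrete, here is a small sketch that splits a Link header into its rel="..." parts and pulls out the last page number. The header value is made up (OWNER, REPO, and the page numbers are placeholders), ConvertFrom-LinkHeader is a hypothetical helper rather than part of PowerShell or the GitHub API, and the parsing assumes the URLs contain no commas.

```powershell
# Hypothetical helper: turns a Link header such as
#   <url1>; rel="next", <url2>; rel="last"
# into a hashtable of rel name -> URL.
function ConvertFrom-LinkHeader {
    param([string]$LinkHeader)
    $relations = @{}
    # Assumes the URLs themselves contain no commas.
    foreach ($part in $LinkHeader -split ',') {
        if ($part -match '<([^>]+)>;\s*rel="([^"]+)"') {
            $relations[$Matches[2]] = $Matches[1]
        }
    }
    return $relations
}

# Made-up header value; OWNER, REPO and the page numbers are placeholders.
$sampleLink = '<https://api.github.com/repos/OWNER/REPO/commits?per_page=1&page=2>; rel="next", ' +
    '<https://api.github.com/repos/OWNER/REPO/commits?per_page=1&page=500>; rel="last"'

$links = ConvertFrom-LinkHeader -LinkHeader $sampleLink
# With per_page=1, the "last" page holds exactly one commit: the oldest one.
$lastPage = [int]([regex]::Match($links['last'], '[?&]page=(\d+)').Groups[1].Value)
$links['last']   # https://api.github.com/repos/OWNER/REPO/commits?per_page=1&page=500
$lastPage        # 500
```

Note that the regex looks for "?page=" or "&page=", so the per_page parameter is not mistaken for the page number.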
Here's an example PowerShell script. Save it as Get-GitHubFirstCommit.ps1 and run it:
param(
    [Parameter(Mandatory = $true)]
    [string]$Owner,
    [Parameter(Mandatory = $true)]
    [string]$Repo,
    # Optional: branch, tag, or commit SHA. If omitted, GitHub uses the default branch.
    [string]$Ref
)
$ErrorActionPreference = "Stop"
$headers = @{
    "Accept"     = "application/vnd.github+json"
    "User-Agent" = "PowerShell-GitHub-First-Commit"
}
$baseUrl = "https://api.github.com/repos/$Owner/$Repo/commits"
# Ask for one commit per page, so the last page number points straight at the oldest commit.
$query = @{
    per_page = 1
}
if ($Ref) {
    $query.sha = $Ref
}
function ConvertTo-QueryString {
    param([hashtable]$Params)
    ($Params.GetEnumerator() | ForEach-Object {
        "$($_.Key)=$([uri]::EscapeDataString([string]$_.Value))"
    }) -join "&"
}
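function Get-LastPageFromLinkHeader {
    param([string]$LinkHeader)
    if (-not $LinkHeader) {
        # No Link header means all results fit on a single page.
        return 1
    }
    # Example of the part we are after:
    # <https://api.github.com/...?per_page=1&page=N>; rel="last"
    if ($LinkHeader -match '<[^>]*[?&]page=(\d+)[^>]*>;\s*rel="last"') {
        return [int]$Matches[1]
    }
    return 1
}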
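$firstUrl = $baseUrl + "?" + (ConvertTo-QueryString $query)
# Use Invoke-WebRequest so we can inspect the response headers.
# -UseBasicParsing avoids the Internet Explorer engine in Windows PowerShell 5.1; PowerShell 7 accepts and ignores it.
$response = Invoke-WebRequest -Uri $firstUrl -Headers $headers -Method Get -UseBasicParsing
# PowerShell 7 returns header values as string arrays; -join yields a single string in both versions.
$linkHeader = $response.Headers["Link"] -join ","
$lastPage = Get-LastPageFromLinkHeader -LinkHeader $linkHeader
$query.page = $lastPage
$lastUrl = $baseUrl + "?" + (ConvertTo-QueryString $query)
# Uncomment to see the URL of the page that holds the oldest commit:
# Write-Host "Oldest commit page: $lastUrl"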
$commitResponse = Invoke-RestMethod -Uri $lastUrl -Headers $headers -Method Get
if (-not $commitResponse -or $commitResponse.Count -eq 0) {
    # With $ErrorActionPreference = "Stop", throw ends the script with this message.
    throw "No commits found for $Owner/$Repo."
}
$commit = $commitResponse[0]
[PSCustomObject]@{
    Repository    = "$Owner/$Repo"
    Ref           = if ($Ref) { $Ref } else { "(default branch)" }
    Sha           = $commit.sha
    AuthorDate    = $commit.commit.author.date
    CommitterDate = $commit.commit.committer.date
    AuthorName    = $commit.commit.author.name
    AuthorEmail   = $commit.commit.author.email
    Message       = $commit.commit.message
    HtmlUrl       = $commit.html_url
}
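Unauthenticated calls to api.github.com are limited to 60 requests per hour, and each run of the script uses two of them. If you run it a lot, a personal access token raises the limit considerably (5,000 requests per hour for a typical user token). The sketch below assumes the token is stored in a GITHUB_TOKEN environment variable; that is an assumption, not something the script above requires, and New-GitHubRequestHeader is a hypothetical helper that could stand in for the literal $headers hashtable.

```powershell
# Hypothetical helper: builds the request headers, adding a token when one is available.
function New-GitHubRequestHeader {
    # Assumption: the token lives in the GITHUB_TOKEN environment variable.
    param([string]$Token = $env:GITHUB_TOKEN)
    $headers = @{
        "Accept"     = "application/vnd.github+json"
        "User-Agent" = "PowerShell-GitHub-First-Commit"
    }
    # Without a token the request stays anonymous and uses the lower rate limit.
    if ($Token) {
        $headers["Authorization"] = "Bearer $Token"
    }
    return $headers
}

# In the script, this would replace the literal $headers hashtable:
$headers = New-GitHubRequestHeader
```

Falling back to anonymous requests keeps the script usable on machines where no token is configured.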
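When you run the script, it needs two parameters: the owner (a GitHub user or organization name) and the repository name. The optional -Ref parameter picks a branch, tag, or commit SHA instead of the default branch. For example, with the Roslyn compiler's repository: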
.\Get-GitHubFirstCommit.ps1 -Owner dotnet -Repo roslyn
Then, the tool will report:
Repository    : dotnet/roslyn
Ref           : (default branch)
Sha           : 3611ed35610793e814c8aa25715aa582ec08a8b6
AuthorDate    : 18.3.2014 22.47.06
CommitterDate : 18.3.2014 22.47.06
AuthorName    : Pilchie
AuthorEmail   : kevinpi@microsoft.com
Message       : Roslyn.Says("Hello World");
HtmlUrl       : https://github.com/dotnet/roslyn/commit/3611ed35610793e814c8aa25715aa582ec08a8b6
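Back to the original question of how old the repository is: once you have the first commit's date, the rest is simple date arithmetic. This is a minimal sketch; Get-RepositoryAge is a hypothetical helper, and the year count uses an average year of 365.25 days, so treat it as an approximation.

```powershell
# Hypothetical helper: age of a repository given its first commit's date.
function Get-RepositoryAge {
    param(
        [Parameter(Mandatory = $true)]
        [datetime]$FirstCommitDate,
        # Point in time to measure against; defaults to now.
        [datetime]$AsOf = (Get-Date)
    )
    # Compare in UTC so a mix of local and UTC DateTime values doesn't skew the result.
    $span = $AsOf.ToUniversalTime() - $FirstCommitDate.ToUniversalTime()
    [PSCustomObject]@{
        Days  = [int][math]::Floor($span.TotalDays)
        # Approximate years, using an average year of 365.25 days.
        Years = [math]::Round($span.TotalDays / 365.25, 1)
    }
}

# Usage with the script's output:
# $first = .\Get-GitHubFirstCommit.ps1 -Owner dotnet -Repo roslyn
# Get-RepositoryAge -FirstCommitDate $first.AuthorDate
```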
Happy GitHub digging!