I’m writing an overview of code for a project and wanted to see whether you can query GitHub at the metadata level. Turns out you sure can 🙂 Check out the GitHub Archive at http://www.githubarchive.org/
Row | repository_language | repos_by_lang
1 | JavaScript | 267193
2 | Java | 200889
3 | Ruby | 188521
4 | PHP | 126322
5 | CSS | 124412
6 | C | 111537
7 | Python | 103123
8 | C++ | 56977
9 | Objective-C | 44526
10 | C# | 42716
11 | Shell | 33517
12 | R | 19724 |
(Generated with Google BigQuery using Adam Bard’s query from his blog post http://adambard.com/blog/top-github-languages-for-2013-so-far/, with the date range changed to 2014-01-01 through 2014-06-10.)
SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2014-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2014-06-10 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100
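
If it helps to see what the query is doing, here is a rough Python sketch of the same filter, group and count logic run over a handful of in-memory rows. It does not talk to BigQuery at all: the sample rows, the repos_by_language helper and the timestamp format are made up for illustration, and the real timeline table may format repository_created_at differently.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format, matching the literals used in the query.
FMT = "%Y-%m-%d %H:%M:%S"


def repos_by_language(events, start, end, limit=100):
    """Count non-fork CreateEvent rows per language in [start, end)."""
    start_dt = datetime.strptime(start, FMT)
    end_dt = datetime.strptime(end, FMT)
    counts = Counter()
    for event in events:
        # WHERE repository_fork == "false" AND type == "CreateEvent"
        if event["repository_fork"] != "false" or event["type"] != "CreateEvent":
            continue
        # Date window: inclusive lower bound, exclusive upper bound.
        created = datetime.strptime(event["repository_created_at"], FMT)
        if not (start_dt <= created < end_dt):
            continue
        # Rows with no language are simply skipped in this sketch.
        language = event["repository_language"]
        if language is None:
            continue
        counts[language] += 1
    # ORDER BY repos_by_lang DESC LIMIT <limit>
    return counts.most_common(limit)


# Hypothetical rows, not real GitHub Archive data.
sample_events = [
    {"repository_language": "JavaScript", "repository_fork": "false",
     "type": "CreateEvent", "repository_created_at": "2014-02-01 10:00:00"},
    {"repository_language": "JavaScript", "repository_fork": "false",
     "type": "CreateEvent", "repository_created_at": "2014-03-01 09:30:00"},
    {"repository_language": "Ruby", "repository_fork": "false",
     "type": "CreateEvent", "repository_created_at": "2014-04-15 12:00:00"},
    {"repository_language": "Ruby", "repository_fork": "true",      # fork
     "type": "CreateEvent", "repository_created_at": "2014-04-16 12:00:00"},
    {"repository_language": "Python", "repository_fork": "false",   # wrong type
     "type": "PushEvent", "repository_created_at": "2014-05-01 08:00:00"},
    {"repository_language": "Java", "repository_fork": "false",     # too early
     "type": "CreateEvent", "repository_created_at": "2013-12-31 23:59:59"},
]

result = repos_by_language(
    sample_events, "2014-01-01 00:00:00", "2014-06-10 00:00:00"
)
print(result)
```

Only the two JavaScript rows and the first Ruby row pass all three filters; the fork, the PushEvent and the 2013 row are dropped, which is the same trimming the WHERE clause does before the GROUP BY.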