Skip to content

defgsus/github-nodes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Github nodes

Track connections between groups, people and repos on github with python and javascript

demo

Currently this is just a small python lib that interacts with the github rest v3 api. All results are cached in an mongoDB.

It's not using the GraphQL api, mainly because the results are harder to cache. And the GraphQL api uses the REST api under the hood, anyways, so i rather store all the rest results in cache and build the graph from that.

Throttling is a problem. You can create a github_credentials.py in the githubapi directory, containing:

USERNAME = "github-user-login"
TOKEN = "api-token"

or give these credentials to the GithubClient instantiation, which enables 5000 requests per hour. The GithubClient class will throttle your requests to comply with githubs rate limits.

visualization

The create_html.py is an example prog that creates a html/js website that displays the nodes you are interested in using vis.js.

usage

The class Github does all the abstraction and caching of the github api. It returns the raw json objects (including the _id from mongodb)

gh = Github()
gh.get_user("username")
gh.get_repo_list("user-or-orgname")
gh.get_repo("username", "reponame")
gh.get_repo("username/reponame")
gh.get_events("user-or-orgname")
gh.get_organisation("orgname")
gh.get_organisation_members("orgname")
gh.get_repo_contributors("username/reponame")
# or for direct uncached access
gh.get_url("repos/user/reponame")

Users and organisations are quite the same in terms of returned data-structure. Except that organisations can have members and that the rest urls are different.

The GithubNodes class will automatically follow the connections between all the known github objects. It does so by inspecting the particular data attached to each encountered objects. Orgs have members, repos have owners, ... Also the recent events are inferred for contributions. Currently, if a fork is encountered, the source repository is used instead.

gn = GithubNodes()
gn.add_organization("google", follow_depth=3)

Note that these things can really take forever for active users or organisations. (depth=1 already takes about 20 minutes in this example)

Also note that deactivating the cache for these spidery tasks is not implemeneted well. The GithubNodes class will reinspect objects quite often. If you want to have a fresh look at things, use Github.clear_cache() beforehand to remove a document or whole collection from cache.

Please also note that this is all pretty beta--if at all..

About

Visualizing connections between github users and repos (python+js)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors