GitHub - maptracker/maptracker: Massive triple-store graph database allowing X-to-Y identifier conversion

MapTracker Graph Database

MapTracker is a massive graph database: over a terabyte on disk, with 1.2B nodes, 2.0B edges, and 3.5B metadata assignments. It is used at BMS to "resolve X to Y": given an object of "type X", find, in a qualitative way, all "related" objects of "type Y". This is done using an aggressively normalized triple store and a large set of rules that dictate which kinds of edges are reasonable to traverse when going from X to Y.
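As a rough illustration of the idea (not MapTracker's actual schema or Perl API), the sketch below keeps a tiny in-memory triple store in which node names are normalized and interned to integer IDs, and a single-hop "resolve X to Y" only follows edge types that a rule table allows. The class name, the edge names (`transcribed_to`, `mentioned_with`), the identifiers, and the upper-casing normalization are all assumptions made for the example.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory triple store with interned (normalized) node names."""

    def __init__(self):
        self._ids = {}                 # (node_type, name) -> integer node id
        self._nodes = []               # node id -> (node_type, name)
        self._out = defaultdict(set)   # node id -> {(edge_type, node id)}

    def node(self, node_type, name):
        """Return the id for a node, creating it on first use."""
        key = (node_type, name.strip().upper())  # crude, assumed normalization
        if key not in self._ids:
            self._ids[key] = len(self._nodes)
            self._nodes.append(key)
        return self._ids[key]

    def add(self, subj, edge_type, obj):
        """Store one subject-edge-object triple; subj and obj are (type, name)."""
        self._out[self.node(*subj)].add((edge_type, self.node(*obj)))

    def resolve(self, start, target_type, allowed_edges):
        """Names of target_type nodes one hop from start via allowed edge types."""
        hits = set()
        for edge_type, obj_id in self._out[self.node(*start)]:
            obj_type, obj_name = self._nodes[obj_id]
            if edge_type in allowed_edges and obj_type == target_type:
                hits.add(obj_name)
        return sorted(hits)


# Hypothetical rule table: which edge types may be followed from X to Y.
RULES = {("gene", "rna"): {"transcribed_to"}}

store = TripleStore()
store.add(("gene", "GENE1"), "transcribed_to", ("rna", "RNA1"))
store.add(("gene", " gene1 "), "transcribed_to", ("rna", "RNA2"))  # same node once normalized
store.add(("gene", "GENE1"), "mentioned_with", ("rna", "RNA9"))    # edge type not allowed

result = store.resolve(("gene", "GENE1"), "rna", RULES[("gene", "rna")])
print(result)
```

Here the messy spelling ` gene1 ` collapses onto the same node as `GENE1`, and the `mentioned_with` edge is ignored because the rule for gene-to-RNA does not list it.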

MapTracker is generally not used "on its own", but rather as a component in other tools. Examples available here are:

Chem-Bio Hopper - "Hop" from biology to chemistry, or vice versa, using published chemical activities
Hypergeometric Affy - Given a set of "interesting" (generally overexpressed) Affymetrix probesets, run Fisher's Exact Test to identify ontologies that appear overrepresented in the set
Standardize Gene - Given a set of gene identifiers (e.g., symbols), attempt to determine what they "really are" (i.e., given messy gene symbols, convert to rigorous gene accessions)
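The over-representation test mentioned for Hypergeometric Affy can be sketched as a one-sided Fisher's Exact Test, which is the upper tail of the hypergeometric distribution. This is a minimal standalone illustration, not the code in hypergeometric_affy.pl; the function name, parameter names, and the small counts are assumptions chosen for the example.

```python
from math import comb


def enrichment_p_value(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= hits_in_set) when set_size items are drawn without
    replacement from background_size items, of which hits_in_background
    carry the ontology term of interest.
    """
    total = comb(background_size, set_size)
    upper = min(set_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total


# Assumed toy counts: 10 probesets in the background, 5 annotated with a term,
# 5 probesets in the "interesting" set, all 5 of them annotated.
p = enrichment_p_value(hits_in_set=5, set_size=5,
                       hits_in_background=5, background_size=10)
print(p)
```

A small p-value suggests the term appears in the set more often than chance would predict; when many ontology terms are tested, some correction for multiple testing would normally be applied on top of this.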

The schema (tables) is relatively simple. What has made MapTracker particularly powerful is:

Careful normalization of loaded data
Segregation of nodes into namespaces, which ameliorates collisions, particularly with identifiers like gene symbols
Exhaustive logic defining valid connections from X to Y (for example, RNA to probeset)
Generic transitive logic that lets X-to-Y be automatically merged with Y-to-Q and Q-to-W in order to find X-to-W. Such "chains" allow only fundamental connections to be defined, yet allow the network to be (safely, rationally) explored far beyond its expected "neighbors"
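The namespace and chaining points above can be sketched together. In this hedged example, every node is a (namespace, identifier) pair, only direct connections are declared, and a breadth-first search over the declared rules composes them into a longer chain. The namespaces, rule set, identifiers, and function names are hypothetical and are not taken from MapTracker's rule files.

```python
from collections import defaultdict, deque

# Hypothetical "fundamental" connections: allowed namespace-to-namespace hops.
DIRECT_RULES = {
    ("symbol", "gene"),
    ("chemical", "gene"),
    ("gene", "rna"),
    ("rna", "probeset"),
}

EDGES = defaultdict(set)  # (namespace, id) -> set of (namespace, id)


def link(a, b):
    """Record a direct connection, refusing hops that no rule allows."""
    if (a[0], b[0]) not in DIRECT_RULES:
        raise ValueError(f"no rule allows {a[0]} -> {b[0]}")
    EDGES[a].add(b)


def namespace_chain(src_ns, dst_ns):
    """Breadth-first search over DIRECT_RULES for the shortest namespace chain."""
    queue = deque([[src_ns]])
    seen = {src_ns}
    while queue:
        path = queue.popleft()
        if path[-1] == dst_ns:
            return path
        for a, b in sorted(DIRECT_RULES):
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None


def convert(start, dst_ns):
    """Follow the composed chain from start to dst_ns, one namespace per hop."""
    chain = namespace_chain(start[0], dst_ns)
    if chain is None:
        return []
    frontier = {start}
    for next_ns in chain[1:]:
        frontier = {n for node in frontier
                    for n in EDGES.get(node, ()) if n[0] == next_ns}
    return sorted(ident for _, ident in frontier)


link(("symbol", "ABC1"), ("gene", "G1"))
link(("gene", "G1"), ("rna", "R1"))
link(("gene", "G1"), ("rna", "R2"))
link(("rna", "R1"), ("probeset", "P1"))
link(("rna", "R2"), ("probeset", "P2"))
link(("chemical", "ABC1"), ("gene", "G9"))  # same text, different namespace

probesets = convert(("symbol", "ABC1"), "probeset")
print(probesets)
```

No symbol-to-probeset rule is declared, yet the conversion succeeds by chaining symbol to gene to RNA to probeset. The chemical named `ABC1` never collides with the symbol `ABC1`, because the namespace is part of the node key.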

The image below is an auto-generated network, built by exploreSelf.pl from a sample of 20,000 random edges in the database. It represents, at a high level, the common node-edge-node triples held by the database.
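One plausible way to produce such a summary, shown here only as a sketch and not as a description of what exploreSelf.pl actually does, is to draw a random sample of edges and count how often each (subject type, edge type, object type) combination occurs. The function name, the tuple layout, and the sample data are assumptions.

```python
import random
from collections import Counter


def summarize_triples(edges, sample_size, seed=None):
    """Count type-level triples in a random sample of edges.

    Each edge is ((subject_type, subject_id), edge_type, (object_type, object_id)).
    """
    rng = random.Random(seed)
    sample = rng.sample(edges, min(sample_size, len(edges)))
    return Counter(
        (s_type, edge_type, o_type)
        for (s_type, _), edge_type, (o_type, _) in sample
    )


# Hypothetical edges; a real run would sample from the database instead.
edges = [
    (("gene", "G1"), "transcribed_to", ("rna", "R1")),
    (("gene", "G1"), "transcribed_to", ("rna", "R2")),
    (("rna", "R1"), "measured_by", ("probeset", "P1")),
]

summary = summarize_triples(edges, sample_size=20000, seed=0)
for triple, count in summary.most_common():
    print(triple, count)
```

The resulting counts could then be drawn as a graph whose nodes are types and whose edge weights are the counts.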

All edges are part of a controlled vocabulary. Most (though not all) are directional. The edges in the sampled network include:

BMS Public Disclosure approval
