Feature Frequency Profile (FFP)

An alignment-free sequence comparison method based on natural language analysis in information theory (e.g., k-mers). It was primarily developed to compare whole-genomic sequences such as genomes, proteomes and transcriptomes, but it can also compare text written in the English alphabet (books or scripts) and sequences over custom alphabet sets other than nucleotides (four letters) or amino acids (20 letters). It constructs representative profiles and determines the distances between them for further visualization and interpretation.
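
The two steps described above (building a profile, then measuring a distance) can be illustrated with a minimal Python sketch. It is not the code shipped in this repository: the function names are hypothetical, and the choice of Jensen-Shannon divergence as the distance measure is an assumption made for illustration.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(sequence, l):
    """Count every overlapping feature (l-mer) and normalize to frequencies."""
    counts = Counter(sequence[i:i + l] for i in range(len(sequence) - l + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than the feature length
        return {}
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles."""
    divergence = 0.0
    for feature in set(p) | set(q):
        pf = p.get(feature, 0.0)
        qf = q.get(feature, 0.0)
        mf = (pf + qf) / 2  # frequency in the averaged (mixture) profile
        if pf > 0:
            divergence += 0.5 * pf * log2(pf / mf)
        if qf > 0:
            divergence += 0.5 * qf * log2(qf / mf)
    return divergence


# Works for any alphabet: nucleotides, amino acids, or plain text.
profile_a = feature_frequency_profile("ACGTACGTAC", 3)
profile_b = feature_frequency_profile("ACGTTTGTAC", 3)
distance = jensen_shannon_divergence(profile_a, profile_b)
```

With base-2 logarithms the divergence is 0 for identical profiles and 1 for profiles that share no feature, so pairwise values can be collected into a distance matrix for tree building or other visualization.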

An early FFP application was explored in 2009 (Sims GE, Jun S-R, Wu GA, Kim S-H; https://doi.org/10.1073/pnas.0909377106). That work addressed the conceptual background and demonstrated the method's potential using nucleotide and amino acid sequences.

I re-developed FFP from scratch to improve overall performance, to focus on genericity (e.g., the ability to compare non-genomic sequences) and to harness multicore or parallel compute environments. It has thus evolved independently from Sims GE's version of the FFP applications.

Please cite one of the publications below if you use the programs provided here in your publication.

JJ. Choi, "Whole-genomic sequence comparison for evolutionary studies based on a natural-language analysis of information theory," UC Berkeley (2024).
JaeJin Choi and Sung-Hou Kim, "Whole-proteome tree of life suggests a deep burst of organism diversity," PNAS (2019).
JaeJin Choi and Sung-Hou Kim, "A genome Tree of Life for the Fungi kingdom," PNAS (2017).

Requirements

Mainly developed and tested in a Linux environment (Ubuntu 20.04+); the command blocks provided below assume that environment.
Please contact JaeJin Choi ([email protected]) if you have questions or comments regarding the program code, bugs, or usage.

GCC (g++) version 4.7.1+
Any recent g++ version that supports C++11
Google sparse hash library
https://github.com/sparsehash/sparsehash

sudo apt update
sudo apt install libsparsehash-dev

zlib version 1.2.8+
https://github.com/madler/zlib or http://www.zlib.net/

sudo apt update
sudo apt install zlib1g-dev

Tutorial / Supplement

A walkthrough tutorial is available here:
Additional fungi study supplement files (e.g., tree Newick files and divergence matrices) are here:

Version compatibility

The first and second numberings of a version indicate compatibility between the FF Profile and FFP distance calculation programs. For instance, all 2v.3.x versions are compatible with each other, but not with any 2v.4.x version.
Incompatibility arises from output file format changes made while improving the programs or adding functions that were difficult to unify. Thus, be cautious when mixing different versions.
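
The rule above can be checked mechanically before mixing outputs from different builds. The following Python sketch is a hypothetical helper, not part of the FFP programs; it assumes version labels follow the `2v.3.x` pattern used in this section and simply compares the first two numberings.

```python
import re


def compatibility_key(version):
    """Return the first two numberings of a version label such as '2v.3.1'."""
    numbers = re.findall(r"\d+", version)
    if len(numbers) < 2:
        raise ValueError(f"cannot read two numberings from {version!r}")
    return int(numbers[0]), int(numbers[1])


def are_compatible(profile_version, distance_version):
    """True when both programs share the first and second numberings."""
    return compatibility_key(profile_version) == compatibility_key(distance_version)
```

For example, `are_compatible("2v.3.1", "2v.3.5")` is true, while `are_compatible("2v.3.1", "2v.4.0")` is false.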

Versions

The latest
For usage and compiling options, check the individual version folder.
Old text based
, list all versions

Note

Typically, longer feature lengths (l) consume more memory. In the fungi whole-proteome study, the largest proteome has 35,274 proteins containing 10,866,611 amino acids, and the version used worked for feature lengths of up to 24 amino acids (l=24).
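
A rough way to see why memory grows with l is to compare the size of the feature space with the number of features a data set can actually contain. The Python sketch below uses hypothetical helper names and toy sequence lengths; it only bounds the number of distinct features and does not model the memory layout of the actual programs.

```python
def feature_occurrences(sequence_lengths, l):
    """Number of overlapping windows of length l across all sequences."""
    return sum(max(0, n - l + 1) for n in sequence_lengths)


def distinct_feature_bound(sequence_lengths, l, alphabet_size):
    """Upper bound on distinct features: capped by the feature space and by the data."""
    return min(alphabet_size ** l, feature_occurrences(sequence_lengths, l))


# Toy example: three protein-like sequences of 10, 5 and 2 residues.
toy_lengths = [10, 5, 2]
short_bound = distinct_feature_bound(toy_lengths, 1, 20)
long_bound = distinct_feature_bound(toy_lengths, 3, 20)
```

At short feature lengths the alphabet caps the profile size (at most 20 distinct features for l=1 with amino acids). At long feature lengths the data caps it instead: a proteome of 10,866,611 amino acids cannot contain more than that many windows, far fewer than the 20^24 possible features at l=24, but nearly every window may then be a distinct, longer key that has to be stored.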
