DIRT

What is DIRT?

DIRT (Dynamic Identification of Reused Text) aims to help users, primarily academics, find passages shared by pairs of documents within a corpus. Users will be able to view a pair of documents alongside their common passages, or choose one document, known as the focus document, and see which other documents in the same corpus share passages with it.

DIRT also aims to be extensible to other languages, although ancient Chinese is the focus for the prototype. DIRT should be able to find matches in a UTF-8 encoded corpus in any language, with a language-specific module making the matching more permissive.
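The idea of shared passages and a focus document can be illustrated with a minimal sketch. This is not DIRT's own matching algorithm: it swaps in `difflib.SequenceMatcher` from the Python standard library, and the names `common_passages`, `documents_sharing_with`, `min_length`, and `normalize` are hypothetical, chosen only for this example. The `normalize` hook stands in for the kind of language-specific step that could make matching more permissive.

```python
from difflib import SequenceMatcher


def common_passages(text_a, text_b, min_length=4, normalize=None):
    """Return passages shared by two texts, longest first.

    Illustrative only: uses difflib, not DIRT's matcher.
    `normalize` is a hypothetical hook for language-specific
    preprocessing (for example, folding character variants).
    """
    if normalize is not None:
        text_a, text_b = normalize(text_a), normalize(text_b)
    # autojunk=False stops difflib from ignoring frequent characters.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    passages = [
        text_a[block.a:block.a + block.size]
        for block in matcher.get_matching_blocks()
        if block.size >= min_length
    ]
    return sorted(passages, key=len, reverse=True)


def documents_sharing_with(focus_name, corpus, **kwargs):
    """Map each other document in `corpus` to the passages it shares
    with the focus document, skipping documents with none."""
    focus_text = corpus[focus_name]
    results = {}
    for name, text in corpus.items():
        if name == focus_name:
            continue
        shared = common_passages(focus_text, text, **kwargs)
        if shared:
            results[name] = shared
    return results


corpus = {
    "a": "子曰學而時習之不亦說乎",
    "b": "古人云學而時習之乃為學之本",
    "c": "完全無關的文字",
}
shared_with_a = documents_sharing_with("a", corpus)
```

Here `shared_with_a` maps only document `"b"` to the passage it has in common with the focus document `"a"`. Passing something like `normalize=str.lower` for English text shows how a preprocessing step lets passages match that would otherwise differ only in surface form.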

Install Dependencies

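Dependencies can be installed with pip. The --allow-external and --allow-unverified flags are only accepted by older pip releases (they were removed in pip 10):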

pip install --allow-external jianfan --allow-unverified jianfan -r requirements.txt

Running Tests

Tests can be run from the root directory with

nosetests

Coverage can be checked using

./check_test_coverage.sh

Contributing

Python code should follow PEP 8 and have tests before a pull request is opened or the code is merged into develop.
