buzz is a Python library for parsing and analysing natural language.
It relies heavily on pandas, numpy, and occasionally NLTK. Dependency parsing is done by spaCy, and dependency searching is handled by a purpose-built library called depgrep. Almost all major data structures are based on pandas DataFrames, so you can fall back on ordinary pandas functionality for anything that buzz does not already provide.
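To illustrate what being based on DataFrames means in practice, here is a minimal pandas-only sketch. It does not call buzz at all: the `tokens` table is a hand-built, hypothetical stand-in for a parsed corpus, and its column names (`file`, `word`, `lemma`, `pos`) are assumptions made for this example rather than buzz's actual schema. The point is only that ordinary pandas filtering and grouping apply to token-per-row data of this kind.

```python
import pandas as pd

# Hypothetical stand-in for a parsed corpus: one row per token.
# Column names are illustrative assumptions, not buzz's real schema.
tokens = pd.DataFrame(
    {
        "file": ["a.txt", "a.txt", "a.txt", "b.txt", "b.txt", "b.txt"],
        "word": ["The", "cats", "sleep", "A", "cat", "slept"],
        "lemma": ["the", "cat", "sleep", "a", "cat", "sleep"],
        "pos": ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB"],
    }
)

# Plain pandas boolean indexing: keep only the noun tokens.
nouns = tokens[tokens["pos"] == "NOUN"]

# Plain pandas aggregation: count each lemma per file,
# with files as rows and lemmas as columns.
lemma_counts = tokens.groupby(["file", "lemma"]).size().unstack(fill_value=0)

print(nouns[["file", "word", "lemma"]])
print(lemma_counts)
```

Anything else pandas offers (sorting, pivoting, merging, plotting) can be applied to such a table in the same way.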
Note that a shorter, general introduction to buzz is available via GitHub. This site provides more comprehensive documentation.
- Installation
- Modelling and parsing corpora
- Exploring parsed datasets
- Processing raw strings
- Generating tables
- Concordancing
- Measuring prototypicality and similarity
- Working with pandas
- Interactive visualisation in the browser
- Case study: lexical density
There is also a web app based on buzz, called buzzword. If you are not a confident programmer but want to use the core features of buzz, it is likely the project for you. The code is open-source, and I can help you get it running on your university server with the datasets you want to explore.
Pull requests are always welcome for both buzz and buzzword. I believe these tools can address many shortcomings in the software currently available for research into natural language, and I welcome any collaboration you might want to offer.