A Java version of Hazm (Python library for digesting Persian text)
- Text cleaning
- Sentence and word tokenizer
- Word lemmatizer
- POS tagger
- Dependency parser
- Corpus readers for Hamshahri and Bijankhan

You can download pre-trained tagger and parser models for Persian and put them in the core/src/main/resources folder of your project.
Install this module with Maven by adding this dependency to your pom.xml:

<dependency>
  <groupId>ir.ac.iust.nlp</groupId>
  <artifactId>jhazm</artifactId>
  <version>1.0.0</version>
</dependency>

Note: If the artifact is not available in Maven Central, you can use JitPack:

- Add the JitPack repository to your pom.xml:

<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

- Add the dependency:

<dependency>
  <groupId>com.github.majidasgari</groupId>
  <artifactId>JHazm</artifactId>
  <version>master-SNAPSHOT</version>
</dependency>

To build the project and install it into your local Maven repository, so that it can be used as a library, run:

mvn clean install

To make a single jar file, run:

mvn clean compile assembly:single

To run it and see the help:

java -jar target/jhazm-jar-with-dependencies.jar

For example, to run POS tagging on the bundled sample file:

java -jar target/jhazm-jar-with-dependencies.jar -a partOfSpeechTagging -o test.txt

To run it on any other file:

java -jar target/jhazm-jar-with-dependencies.jar -a partOfSpeechTagging -o test.txt -i input.txt

Or on a piece of text:

java -jar target/jhazm-jar-with-dependencies.jar -a partOfSpeechTagging -o test.txt -t "سلام من خوب هستم!"

Requirements:

- Java 21 or higher
- Maven 3.6+
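
The command-line calls shown earlier can also be launched from Java code. The following is only a minimal sketch, not part of JHazm: the class `JhazmCommand` and its `build` method are hypothetical helpers that assemble the same arguments (`-a`, `-o`, `-i`, `-t`) used in the commands above. It assumes the jar has been built at `target/jhazm-jar-with-dependencies.jar` and that `java` is on the PATH; the process is started only if the jar exists.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JHazm): builds the argument list
// for the command-line jar using only the flags shown in this README.
class JhazmCommand {

    // Jar location as used in the commands above.
    static final String JAR = "target/jhazm-jar-with-dependencies.jar";

    // inputFile and text are optional; pass null to leave the flag out.
    static List<String> build(String action, String outputFile,
                              String inputFile, String text) {
        List<String> cmd = new ArrayList<>(
                List.of("java", "-jar", JAR, "-a", action, "-o", outputFile));
        if (inputFile != null) {
            cmd.add("-i");
            cmd.add(inputFile);
        }
        if (text != null) {
            cmd.add("-t");
            cmd.add(text);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build("partOfSpeechTagging", "test.txt", "input.txt", null);
        System.out.println(String.join(" ", cmd));

        // Run the tagger only when the jar has actually been built.
        if (Files.exists(Path.of(JAR))) {
            int exitCode = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            System.out.println("Exit code: " + exitCode);
        }
    }
}
```

Passing the arguments as a list avoids shell quoting problems, which matters when the `-t` value contains spaces or Persian text.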
Good Luck!