This repository contains the source files of the Talend Data Quality libraries.

| Project | Description |
|---|---|
| dataquality-common | Abstractions of data analysis, and low-level utilities such as East Asian text pattern recognition |
| dataquality-libraries | Parent pom aggregating other library projects, devops tools |
| dataquality-record-linkage | Record matching algorithms, blocking key calculation, and T-Swoosh |
| dataquality-sampling | Reservoir sampling, data masking, data duplication |
| dataquality-semantic | API for semantic category analysis |
| dataquality-standardization | Standardization library based on Apache Lucene |
| dataquality-statistics | API for data analysis and statistics (requires JDK 1.8) |
| dataquality-wordnet | Content validation API based on WordNet dictionary |

Talend Open Studio for Data Quality can be downloaded from the Talend website.
- All projects are Maven-based.
- The parent pom builds all the libraries.
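
A full build can be sketched as follows. This assumes Maven is installed and that the parent pom sits in the `dataquality-libraries` directory at the repository root; the directory layout and the `clean install` goals are assumptions for illustration, not taken from this README.

```shell
# Assumption: the parent pom is in dataquality-libraries/ at the repository root.
cd dataquality-libraries
# Build every module aggregated by the parent pom and install the
# resulting artifacts into the local Maven repository.
mvn clean install
```
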

Copyright (c) 2006-2016 Talend

Licensed under the Apache License, Version 2.0
