ETROG-3117 Fragment Boundaries Exposed in ADM
Adds a LayoutRegion field to the ADM to break the text into structured
and unstructured segments.
ETROG-3125 Add a POS tag set field to MorphoAnalysis
Adds a TagSet enumeration and a field to MorphoAnalysis to store
which TagSet the analysis's part of speech comes from.
ETROG-3126 Analyze foreign tokens within Russian as English
Ignore null attribute values in the constructor of AnnotatedText.
Previously, it would throw a NullPointerException.
COMN-254 Fix lossy copy constructor
Fixed ArabicMorphoAnalysis.Builder's copy constructor so it copies all data.
RD-2427 Implement ADM Updates for Nearest Neighbors
relatedTerms renamed to similarTerms to match its endpoint.
RD-2428 Implement ADM Updates for Nearest Neighbors
Bugfix: relatedTerms was missing from KnownAttributes.
NOTE: relatedTerms has been renamed to similarTerms in version 2.5.2, and this version should not be used.
RD-2428 Implement ADM Updates for Nearest Neighbors
Adds the relatedTerms slot.
NOTE: relatedTerms has been renamed to similarTerms in version 2.5.2, and this version should not be used.
COMN-244 Consume new parent with updated Guava
Fixed dependencies to use new version of common-api.
COMN-244 Consume new parent with updated Guava
ROS-312 Consume new parent with updated Guava
SUPPO-779 Update Jackson
ROS-307 Support Java 9 JRE
ROS-305 Mention copy method doesn't include linkingConfidence
Fixed copy constructor of Mention builder.
TEJ-974 Expose FLINX confidence score
Added linkingConfidence in Mention.
TEJ-975 Double value serialization limit to 8 digits below decimal
All Double-typed fields in com.basistech.dm.AnnotatedText are serialized with at most 8 digits after the decimal point.
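As an illustration of the 8-digit limit, here is a minimal sketch using java.math.BigDecimal. The class EightDigitFormat is hypothetical, and HALF_UP rounding is an assumption: the entry above does not say whether values are rounded or truncated.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper; not part of the ADM API. */
final class EightDigitFormat {
    private EightDigitFormat() {}

    /**
     * Renders a finite double with at most 8 digits after the decimal point.
     * HALF_UP rounding is an assumption. NaN and infinities make BigDecimal
     * throw NumberFormatException.
     */
    static String format(double value) {
        // BigDecimal.valueOf goes through Double.toString, so 0.1 stays 0.1
        // instead of expanding to its exact binary value.
        BigDecimal limited = BigDecimal.valueOf(value).setScale(8, RoundingMode.HALF_UP);
        // Drop trailing zeros so 0.5 prints as "0.5" rather than "0.50000000".
        return limited.stripTrailingZeros().toPlainString();
    }
}
```

With this sketch, 0.123456789012 is written as 0.12345679 and 2.0 as 2.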
ROS-268 Add Serializable
All the model classes now implement java.io.Serializable.
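Since the model classes implement java.io.Serializable, they can be round-tripped through standard Java object streams. The sketch below uses a hypothetical stand-in record, Span, instead of a real ADM class so that it compiles without the ADM on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializableRoundTrip {
    private SerializableRoundTrip() {}

    /** Hypothetical stand-in for an ADM model class such as a token or mention. */
    record Span(int startOffset, int endOffset) implements Serializable {}

    /** Writes a Serializable object to an in-memory byte stream and reads it back. */
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```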
RD-1713 Removed internal classes
The following internal classes were removed:
com.basistech.rosette.dm.internal.Mention
com.basistech.rosette.dm.internal.TextWrapper
ROS-259 Fix some recent bugs.
ROS-257 Add relationship modality.
ROS-258 Add representation for relationship salience.
ROS-255 Add representation for embeddings.
ROS-256 Add representation for entity salience.
ROS-254 Add representation for dependencies.
ROS-251 New slot for "topics"
A document can have multiple topics. Topics are similar to "labels" or "keywords", but a topic does not need to be mentioned directly in the document. For example, an article about Michael Jordan might have the topics "basketball", "sports", "Michael Jordan", "bulls", and "chicago", whereas a categorizer might simply emit "SPORTS". There is usually a single best category (although we still produce a ranked list), but we expect a document to have several good topics (which are also presented as a ranked list).
This change adds AnnotatedText#getTopicResults and the associated
builders and tests. A single topic is held in a CategorizerResult;
getTopicResults returns a list of them.
Incremental fix for the compatibility case of using the old api to add information to a 'new' adm.
ROS-229 Add 'Tnn' entity ids for entities created from V1.0 adms with no links.
ROS-230 Fix bug: head mention indices not copied by copy constructor.
RELAX-360 Fix mistake in mention sorting.
ROS-227 Document order for upgraded Entity/Mention
Contrary to the explanation below (in ROS-43), when the code constructs Entity/Mention structures from old EntityMention objects, it uses document order, never head mention order. Entities are ordered by the document position of their first mention, and the mentions within an entity are in document order.
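The ordering rule above can be sketched as follows. Mention, Entity, and inDocumentOrder are hypothetical, minimal stand-ins (only offsets are modeled), not the ADM's own classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class EntityOrdering {
    private EntityOrdering() {}

    // Hypothetical minimal stand-ins; the real Mention and Entity carry much more data.
    record Mention(int startOffset, int endOffset) {}
    record Entity(String id, List<Mention> mentions) {}

    private static final Comparator<Mention> BY_POSITION =
            Comparator.comparingInt(Mention::startOffset).thenComparingInt(Mention::endOffset);

    /** Puts each entity's mentions in document order, then orders entities by their first mention. */
    static List<Entity> inDocumentOrder(List<Entity> entities) {
        List<Entity> result = new ArrayList<>();
        for (Entity entity : entities) {
            List<Mention> sorted = new ArrayList<>(entity.mentions());
            sorted.sort(BY_POSITION);
            result.add(new Entity(entity.id(), sorted));
        }
        // Assumes every entity has at least one mention, as ROS-43 describes.
        result.sort(Comparator.comparing((Entity e) -> e.mentions().get(0), BY_POSITION));
        return result;
    }
}
```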
ROS-226 NPE in Entity.Builder
Prevent NullPointerException when adding sentiment to an Entity.Builder created via the copy constructor.
ROS-228 Coreference chain ID compatibility
The EntityMention objects created in response to AnnotatedText.getEntityMentions() contain correct coreferenceChainId fields derived from the head mentions in the Entity objects.
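The chain ID derivation can be sketched as below, assuming (as ROS-43 describes) that a legacy chain ID is the index of the head mention in the document-ordered EntityMention list. All names here are hypothetical; a null headMentionIndex is treated as index 0, since an unchained entity has a single mention.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class LegacyChainIds {
    private LegacyChainIds() {}

    // Hypothetical minimal stand-ins for the new-style classes.
    record Mention(int startOffset, int endOffset) {}
    record Entity(List<Mention> mentions, Integer headMentionIndex) {}
    /** An old-style mention: the mention plus the chain ID derived for it. */
    record LegacyMention(Mention mention, int coreferenceChainId) {}

    static List<LegacyMention> toLegacy(List<Entity> entities) {
        record Owned(Mention mention, Entity owner) {}
        // Flatten all mentions, remembering which entity each one came from.
        List<Owned> all = new ArrayList<>();
        for (Entity entity : entities) {
            for (Mention mention : entity.mentions()) {
                all.add(new Owned(mention, entity));
            }
        }
        // Assumption: the legacy EntityMention list is in document order.
        all.sort(Comparator.comparingInt((Owned o) -> o.mention().startOffset()));
        // Identity map: two records with equal offsets are still distinct mentions.
        Map<Mention, Integer> position = new IdentityHashMap<>();
        for (int i = 0; i < all.size(); i++) {
            position.put(all.get(i).mention(), i);
        }
        List<LegacyMention> result = new ArrayList<>();
        for (Owned owned : all) {
            Entity entity = owned.owner();
            // A null headMentionIndex means no chaining: the entity has a single mention.
            int head = entity.headMentionIndex() == null ? 0 : entity.headMentionIndex();
            result.add(new LegacyMention(owned.mention(), position.get(entity.mentions().get(head))));
        }
        return result;
    }
}
```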
ROS-218 Entity-level sentiment is a list
Entity-level sentiment used to return a single CategorizerResult.
After this change it returns a List<CategorizerResult> in confidence
ranked order. Currently there will be three results, one for each of
"neg", "neu", "pos". ResolvedEntity#getSentiment is already
deprecated because of ROS-43. It still returns the top result only.
Entity#getSentiment() is the new method that returns the list.
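A minimal sketch of the ranked list and of the deprecated top-result behavior. CategorizerResult here is a hypothetical record with only a label and a confidence; the real class also carries an explanation set (see ROS-198 below).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class SentimentRanking {
    private SentimentRanking() {}

    /** Hypothetical stand-in: only a label ("neg", "neu", "pos") and a confidence. */
    record CategorizerResult(String label, double confidence) {}

    /** Returns a new list in descending confidence order. */
    static List<CategorizerResult> ranked(Collection<CategorizerResult> results) {
        List<CategorizerResult> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble(CategorizerResult::confidence).reversed());
        return sorted;
    }

    /** Mimics the deprecated single-result accessor: the top-ranked result, or null if there is none. */
    static CategorizerResult top(List<CategorizerResult> ranked) {
        return ranked.isEmpty() ? null : ranked.get(0);
    }
}
```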
ROS-43 Combine EntityMentions and ResolvedEntity
Old behavior:
EntityMention and ResolvedEntity are produced in separate lists,
related by the chainId field, where the chainId of a ResolvedEntity
is the index of the head mention in the list of EntityMention. All
EntityMentions with the same chainId are mentions of the same
entity. It is awkward to iterate through the entities, which seems to
be the most popular use case.
New behavior:
EntityMention and ResolvedEntity are deprecated, replaced by
Mention and Entity. An Entity contains a list of one or more
Mentions. The Entity list is ordered by the document order of the
head mentions. (But see the change in ROS-227 above.) The Mention list
in each Entity is in document order, but note that mentions across
entities cannot be ordered in this way. Each Entity contains a
headMentionIndex which points to its head mention.
The entity type field now lives at the Entity level, not the
Mention level.
If no chaining is done, each Entity will have a single Mention
without a headMentionIndex (it will be null).
If chaining is done, headMentionIndex will always be non-null. Note
that it is possible to do chaining but still end up with all singleton
entities. In this case, headMentionIndex for each Entity will be
0.
If entity resolution is also done, the Entity will have an
entityId in addition to the headMentionIndex.
It's now easy to iterate through the entities, and the mentions within an entity. However, it is awkward to get a list of mentions in document order. A helper method may be added for this in the future.
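Until such a helper exists, the document-order list can be built by flattening and sorting. This sketch uses hypothetical stand-in records and is not part of the ADM API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class DocumentOrderMentions {
    private DocumentOrderMentions() {}

    // Hypothetical minimal stand-ins for Mention and Entity.
    record Mention(int startOffset, int endOffset) {}
    record Entity(List<Mention> mentions) {}

    /** Flattens the mentions of all entities into one list sorted by document position. */
    static List<Mention> mentionsInDocumentOrder(List<Entity> entities) {
        return entities.stream()
                .flatMap(entity -> entity.mentions().stream())
                .sorted(Comparator.comparingInt(Mention::startOffset).thenComparingInt(Mention::endOffset))
                .collect(Collectors.toList());
    }
}
```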
Older json containing EntityMention and ResolvedEntity will be
deserialized into the new data structure, compatibly, so you can code
to the new api, or still use the deprecated classes and methods. It
is recommended to switch to the new api, and to re-serialize any files
with the old json format, since the compatibility layer does add some
overhead.
ROS-50 Add version attribute
This change adds a new version attribute to the serialized json form
of AnnotatedText, currently set to version 1.1.0. An older
json without a version attribute will be treated as having version
1.0.0, and it will be converted compatibly. The changes between
versions 1.0.0 and 1.1.0 are detailed above, in ROS-43. If an
incompatible version is found at runtime, an exception will be thrown.
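The version handling described above might look roughly like the sketch below. The two constants come from this entry; treating a different major version as incompatible, and throwing IllegalArgumentException, are assumptions, as is the class name AdmVersionCheck.

```java
final class AdmVersionCheck {
    private AdmVersionCheck() {}

    // From this entry: the current version is 1.1.0, and json without a version attribute is 1.0.0.
    static final String CURRENT_VERSION = "1.1.0";
    static final String VERSION_WHEN_MISSING = "1.0.0";

    /**
     * Returns the version to use for a document, given its version attribute (null if absent).
     * Assumption: "incompatible" means a different major version.
     */
    static String effectiveVersion(String versionAttribute) {
        String version = versionAttribute == null ? VERSION_WHEN_MISSING : versionAttribute;
        if (major(version) != major(CURRENT_VERSION)) {
            throw new IllegalArgumentException(
                    "Incompatible ADM version " + version + "; this code reads " + CURRENT_VERSION);
        }
        return version;
    }

    /** True for 1.0.x input, which needs the compatibility conversion described in ROS-43. */
    static boolean needsUpgrade(String version) {
        return major(version) == 1 && minor(version) == 0;
    }

    private static int major(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    private static int minor(String version) {
        return Integer.parseInt(version.split("\\.")[1]);
    }
}
```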
ROS-194 ADM model and json on github.com
This is the first release of the annotated data model built from the source code on github.com. It depends on new major versions of common-api and common-lib, so even though its own API is not significantly different from the prior version, we moved up to 2.0.0.
ROS-205: make builders of polymorphic classes use generics to return the right type
Before this change, the following would not compile:
new HanMorphoAnalysis.Builder().lemma("foo").addReading("bar");
                                             ^^^^^^^^^^^^^^^^^^
because lemma() returns the base class builder, but addReading() exists only on the subclass builder. The reverse order does compile:
new HanMorphoAnalysis.Builder().addReading("bar").lemma("foo");
as does a cast:
HanMorphoAnalysis.Builder hmaBuilder = new HanMorphoAnalysis.Builder();
((HanMorphoAnalysis.Builder)hmaBuilder.lemma("foo")).addReading("bar");
After the change, both orders work.
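One common way to get this behavior is the self-typed (recursive generic) builder pattern, sketched below with hypothetical, stripped-down classes. It is not necessarily how the ADM's builders are declared: here the base builder is parameterized by the concrete builder type B, so lemma() returns B and addReading() can follow it.

```java
import java.util.ArrayList;
import java.util.List;

final class SelfTypedBuilders {
    private SelfTypedBuilders() {}

    // Hypothetical, stripped-down analysis classes; not the ADM's real fields.
    static class MorphoAnalysis {
        final String lemma;

        MorphoAnalysis(String lemma) {
            this.lemma = lemma;
        }
    }

    static final class HanMorphoAnalysis extends MorphoAnalysis {
        final List<String> readings;

        HanMorphoAnalysis(String lemma, List<String> readings) {
            super(lemma);
            this.readings = readings;
        }
    }

    /** B is the concrete builder type, so base-class setters can return it instead of the base builder. */
    abstract static class Builder<T extends MorphoAnalysis, B extends Builder<T, B>> {
        protected String lemma;

        public B lemma(String lemma) {
            this.lemma = lemma;
            return self();
        }

        protected abstract B self();

        public abstract T build();
    }

    static final class HanBuilder extends Builder<HanMorphoAnalysis, HanBuilder> {
        private final List<String> readings = new ArrayList<>();

        public HanBuilder addReading(String reading) {
            readings.add(reading);
            return this;
        }

        @Override
        protected HanBuilder self() {
            return this;
        }

        @Override
        public HanMorphoAnalysis build() {
            return new HanMorphoAnalysis(lemma, List.copyOf(readings));
        }
    }
}
```

With this arrangement, new HanBuilder().lemma("foo").addReading("bar") compiles without a cast, because lemma() is declared to return HanBuilder.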
Repair to the ROS-201 change so that the builder part also works.
ROS-201 AnnotatedText#getData() should be null by default
AnnotatedText#getData now returns null instead of the empty string
by default.
ROS-204 new common libs
ROS-205: make builders of polymorphic classes use generics to return the right type
The intent here was to allow something like:
new HanMorphoAnalysis.Builder().lemma("x").addReading("y");
But it does not work, since the return value of lemma is still not
HanMorphoAnalysis.Builder. Research continues to decide whether there is a
further tweak that does the desired thing, or whether the commits in
question should be reverted with git revert.
ROS-198 entity-level sentiment refactored
ResolvedEntity sentimentCategory and sentimentConfidence have been moved into a CategorizerResult, which also has an explanationSet. This release also pulls in the new textanalytics parent, version 57.2.14.
ROS-186 enum module moved out
The EnumModule is now part of common-api; this release also picks up
the fact that common-api is now versioned independently. Code changes
are required since the package has changed from:
com.basistech.rosette.dm.jackson.EnumModule
to
com.basistech.util.jackson.EnumModule
ROS-183 new parent and some checkstyle repairs
ROS-181 add entity-level sentiment
ResolvedEntity now has slots for sentimentCategory (e.g. "positive", "negative", "neutral") and a sentimentConfidence.
- ROS-98 Apply workarounds to bugs in Jackson 2.6.2.
- [ROS-166] (https://basistech.atlassian.net/browse/ROS-166) Use new parent that uses maven-bundle-plugin 3.0.0.
hasSyntheticPredicate should check for null extents.
Note: This release changes the inventory of components that are built here, and makes 'interesting' API changes. There are no more '-osgi' jar files; the necessary OSGi metadata is in the jars that need it. When moving to this version, please be sure to take advantage of AbstractAnnotator and note the switch to Jackson 2.6.2.
Throw an appropriate Jackson exception (InvalidFormatException) when we encounter a bad ISO-639 code.
String to AnnotatedText.
Annotator.annotate no longer declares throws RosetteException.
AnnotatedText no longer implements CharSequence. Callers will need
to use getData() to get to the textual data.
ROS-159 stop serializing null values.
ROS-97 More OSGi/shade improvements
We got rid of the adm-model-osgi vs. adm-model distinction. There's just adm-model with OSGi metadata. The MANIFEST is maintained by configuration in the pom.xml.
The shading removed in ROS-87 is back here; the one adm-model jar does shade Guava.
ROS-87 OSGi/shade improvements
adm-json-array was integrated into adm-json. adm-json was made into an OSGi bundle. It still functions outside of OSGi, but you must provide the Guava dependency.
adm-json-osgi was removed. The package com.basistech.rosette.dm.jackson was removed and replaced with com.basistech.rosette.dm.json.plain and com.basistech.rosette.dm.json.array.
adm-model-osgi no longer contains a shaded copy of Guava; it just imports it via OSGi metadata.
[RELAX-143] (https://basistech.atlassian.net/browse/RELAX-143) Add relationship fields.
[RELAX-139] (https://basistech.atlassian.net/browse/RELAX-139) Fixing a crash in the copy builder of RelationshipMention
Missing initializations
ROS-88 Parent 57.1.1
ROS-76 Fix serialization for CharSequence
If an AnnotatedText was constructed with a special CharSequence (rather than just a String), the serialized form could contain extraneous fields from the non-String object, which couldn't be deserialized. This change forces serialization to use only the toString representation of the CharSequence.
Rework of the relationship classes. This is not compatible with the previous version, but nothing outside of RELAX uses them yet.
Rerelease with 57.0.1 as parent, so that it looks for common 35.0.0 instead of 34.x.x. Let this be a lesson to us in picking version numbers.
COMN-199 Remove adm-shaded
adm-shaded was a mistake.
COMN-198 Add Relationship Support
This also includes a move to basis parent 57.0.0, and so a requirement for Java 1.7.
COMN-189 Add ComposingAnnotator
com.basistech.rosette.dm.util.ComposingAnnotator provides a function
needed in RBL-JVM, which is to group a sequence of annotators into a
single annotator.
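A sketch of the composition idea, using a hypothetical Annotator interface and Doc record rather than the real ADM types: each annotator receives the output of the previous one.

```java
import java.util.ArrayList;
import java.util.List;

final class Composing {
    private Composing() {}

    /** Hypothetical document: text plus the names of the layers added so far. */
    record Doc(String text, List<String> layers) {
        Doc withLayer(String layer) {
            List<String> next = new ArrayList<>(layers);
            next.add(layer);
            return new Doc(text, List.copyOf(next));
        }
    }

    /** Hypothetical stand-in for the ADM Annotator interface. */
    interface Annotator {
        Doc annotate(Doc doc);
    }

    /** Runs annotators in sequence, feeding each one the previous one's output. */
    static final class ComposingAnnotator implements Annotator {
        private final List<Annotator> annotators;

        ComposingAnnotator(List<Annotator> annotators) {
            this.annotators = List.copyOf(annotators);
        }

        @Override
        public Doc annotate(Doc doc) {
            Doc current = doc;
            for (Annotator annotator : annotators) {
                current = annotator.annotate(current);
            }
            return current;
        }
    }
}
```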
COMN-186 Optimize copying attributes
There were too many copies of the map behind extendedProperties, which exists on every attribute, even when empty. This showed up in an ADM application that made many Token copies.
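One way to avoid repeated copies, sketched with a hypothetical Attribute class (not necessarily what COMN-186 did): copy the caller's map once into an unmodifiable map, share a single empty map for attributes with no properties, and let copies share the unmodifiable map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class SharedExtendedProperties {
    private SharedExtendedProperties() {}

    /** Hypothetical attribute holding an unmodifiable extendedProperties map. */
    static final class Attribute {
        private final Map<String, Object> extendedProperties;

        private Attribute(Map<String, Object> unmodifiableProperties) {
            this.extendedProperties = unmodifiableProperties;
        }

        /** Copies the caller's map once; null or empty input shares one immutable empty map. */
        static Attribute of(Map<String, Object> properties) {
            if (properties == null || properties.isEmpty()) {
                return new Attribute(Collections.emptyMap());
            }
            return new Attribute(Collections.unmodifiableMap(new LinkedHashMap<>(properties)));
        }

        /** The map cannot be modified through the attribute, so a copy can share it. */
        Attribute copy() {
            return new Attribute(extendedProperties);
        }

        Map<String, Object> extendedProperties() {
            return extendedProperties;
        }
    }
}
```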
COMN-183 Rename setEndOffset
Attribute.Builder.setEndOffset() was deprecated. Use the new method
Attribute.Builder.endOffset() instead.
COMN-190 ADM builders can reset lists
Attribute builders that hold lists now have a setter for the entire
list, not just the ability to add new elements. For example, in
Token.Builder:
public Builder addAnalysis(MorphoAnalysis analysis);
public Builder analyses(List<MorphoAnalysis> analyses); // new
Calling the list setter with an empty list or null results in an empty list in the Builder.
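A sketch of the builder semantics described above, using hypothetical stand-in records for Token and MorphoAnalysis: the list setter replaces the accumulated list, and null behaves like an empty list.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenBuilderSketch {
    private TokenBuilderSketch() {}

    // Hypothetical stand-ins; the real Token and MorphoAnalysis have more fields.
    record MorphoAnalysis(String lemma) {}
    record Token(String text, List<MorphoAnalysis> analyses) {}

    static final class Builder {
        private final String text;
        private final List<MorphoAnalysis> analyses = new ArrayList<>();

        Builder(String text) {
            this.text = text;
        }

        /** Appends one analysis. */
        Builder addAnalysis(MorphoAnalysis analysis) {
            analyses.add(analysis);
            return this;
        }

        /** Replaces the whole list; null or an empty list leaves the builder with an empty list. */
        Builder analyses(List<MorphoAnalysis> newAnalyses) {
            analyses.clear();
            if (newAnalyses != null) {
                analyses.addAll(newAnalyses);
            }
            return this;
        }

        Token build() {
            return new Token(text, List.copyOf(analyses));
        }
    }
}
```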
COMN-180 Remove guava dependency
adm-model-osgi no longer depends externally on Guava; adm-shaded no longer contains extra copies of Guava classes.
In support of INDOC-11, added com.basistech.dm.internal to the exported package list so that indoc can get to TextWrapper. Not wonderfully pretty, and not the end of the world.
COMN-167 Restructure OSGi more
The Jackson customizations move out of the com.basistech.dm package, into com.basistech.dm.jackson and com.basistech.dm.jackson.array. The com.basistech:adm-osgi artifact was split into com.basistech:adm-model-osgi and com.basistech:adm-json-osgi.
Moving the Jackson classes to their own OSGi bundle allows an OSGi application to use the Jackson code inside the OSGi environment and still use the main data model to communicate between the inside and the outside.
Note that because the packages moved, this change has to fan out to projects that use the json support today.
COMN-163 Restructure OSGi
All the OSGi metadata moves to a new artifact, adm-osgi. This avoids 'split package' issues with multiple OSGi components with code in the same package.
COMN-158 Support array "shape" json
The new adm-json-array artifact supports serialization of ADM as array-shaped json. There are some space savings and possibly some runtime savings; however, components should never store this format. It is strictly for use "over the wire", intended for RaaS.
To use normal ADM "object" shape serialization, do this:
ObjectMapper mapper = AnnotatedDataModelModule.setupObjectMapper(new ObjectMapper());
To use the new "array" shape serialization, do this:
ObjectMapper mapper = AnnotatedDataModelArrayModule.setupObjectMapper(new ObjectMapper());
COMN-142 Boxed Integer/Double consistency
ResolvedEntity coreferenceChainId was changed from int to Integer. ResolvedEntity confidence was changed from double to Double.
Clients may have to adjust code on ADM boundaries if they are dealing with -1 chainIds, and they may need to add checks for null.
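For callers migrating from primitive chain IDs, a small adapter can keep null handling in one place. This is a hypothetical helper, and treating -1 as "no chain" is an assumption based on the note above.

```java
final class ChainIdMigration {
    private ChainIdMigration() {}

    /** Hypothetical sentinel that some callers may have used for "no chain" before COMN-142. */
    static final int NO_CHAIN = -1;

    /** Converts the boxed value to the old int convention without risking an unboxing NullPointerException. */
    static int toLegacy(Integer coreferenceChainId) {
        return coreferenceChainId == null ? NO_CHAIN : coreferenceChainId;
    }

    /** Converts the old int convention back to the boxed form, using null for "no chain". */
    static Integer fromLegacy(int chainId) {
        return chainId == NO_CHAIN ? null : chainId;
    }
}
```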
COMN-140 AraDmConverter - separate api/cli
AraDmConverter#main has been moved to a separate class AraDmConverterCommand. This avoids the need for users of the AraDmConverter api to depend on args4j.
COMN-137 Name translation api
Added the Name class, which is a stripped down version of the RNI Name. The intent is to allow RNT functionality via the ADM.
COMN-61 Add map api for metadata
Added a documentMetadata convenience method that builds metadata from a map.
COMN-73 Lang code serialization
COMN-76 Enum map key serialization
These allow ISO codes, rather than enum names, to be used in the serialized form, e.g. "eng" instead of "ENGLISH".
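A sketch of the enum-to-ISO-code mapping for both plain values and map keys. The Language enum and its three constants are hypothetical stand-ins for the real language-code enum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IsoCodeKeys {
    private IsoCodeKeys() {}

    /** Hypothetical stand-in for the real language-code enum. */
    enum Language {
        ENGLISH("eng"), FRENCH("fra"), RUSSIAN("rus");

        private final String iso639_3;

        Language(String iso639_3) {
            this.iso639_3 = iso639_3;
        }

        String isoCode() {
            return iso639_3;
        }

        static Language fromIsoCode(String code) {
            for (Language language : values()) {
                if (language.iso639_3.equals(code)) {
                    return language;
                }
            }
            throw new IllegalArgumentException("Unknown ISO 639-3 code: " + code);
        }
    }

    /** Converts enum map keys to ISO codes (ENGLISH becomes "eng"), preserving iteration order. */
    static Map<String, Integer> withIsoKeys(Map<Language, Integer> byLanguage) {
        Map<String, Integer> out = new LinkedHashMap<>();
        byLanguage.forEach((language, value) -> out.put(language.isoCode(), value));
        return out;
    }
}
```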
COMN-147 Dispatching Annotator
com.basistech.rosette.dm.util.WholeDocumentLanguageDispatchAnnotatorBuilder
creates Annotator objects that delegate based on language. This
permits an application to build a family of annotators, one per
language, and then aggregate them into a single annotator. It is a DRY
feature; it's not complex, it's just that we don't want to end up with
multiple copies of this in multiple places.
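A minimal sketch of language-based dispatch. DispatchBuilder, Doc, and Annotator are hypothetical; the real class is WholeDocumentLanguageDispatchAnnotatorBuilder, whose API is not reproduced here. Rejecting an unsupported language is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class LanguageDispatch {
    private LanguageDispatch() {}

    /** Hypothetical document whose language is already known for the whole text. */
    record Doc(String language, String text) {}

    /** Hypothetical stand-in for the ADM Annotator interface. */
    interface Annotator {
        String annotate(Doc doc);
    }

    /** Collects one annotator per language and builds a single annotator that delegates by language. */
    static final class DispatchBuilder {
        private final Map<String, Annotator> byLanguage = new HashMap<>();

        DispatchBuilder add(String language, Annotator annotator) {
            byLanguage.put(language, annotator);
            return this;
        }

        Annotator build() {
            Map<String, Annotator> table = Map.copyOf(byLanguage);
            return doc -> {
                Annotator delegate = table.get(doc.language());
                if (delegate == null) {
                    // Assumption: an unsupported language is an error rather than a no-op.
                    throw new IllegalArgumentException("No annotator for language " + doc.language());
                }
                return delegate.annotate(doc);
            };
        }
    }
}
```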
COMN-151 New Parent/New Common
Pick up rosette-common-java 34.0.0, with new jar structure, via textanalytics 56.3.1.
COMN-133 Support unordered json attributes
json attributes are unordered per the spec, so tools are allowed to shuffle map keys. This change allows deserialization from shuffled keys.
COMN-149 Remove Guava dependency
Guava is now shaded inside adm-model.
COMN-152 OSGi metadata
The adm-model jar file now has OSGi metadata. We intend to use this for a classpath isolation solution for RaaS.
COMN-92 Empty analysis fix
Fixes serialization of an empty analysis.