OMTD-SHARE Java Annotations 3.0.2.7

This document explains how to embed OpenMinTeD-SHARE descriptors in software components and resource packages and how to use the related tooling.

Introduction

OpenMinTeD SHARE is a set of metadata elements to describe text and data-mining (TDM) and natural language processing (NLP) resources.

Get sources for this project from GitHub:

Package structure

We expect that the resource you distribute is packaged as a ZIP or JAR file. Within this file, you can place your OpenMinTeD-SHARE descriptors at any location you like. If you package descriptors for TDM software components, you may wish to place them directly next to the classes that implement these components. If you package multiple resources, e.g. annotation schema descriptions, in a single ZIP file, you may wish to place the respective OpenMinTeD-SHARE descriptors directly next to them.

However, in order to allow for the automatic detection of your OpenMinTeD-SHARE descriptors, you will have to set up a file in a well-known location which points to your descriptors. This is described in the next section.

Discovery

Making descriptors discoverable

To make your descriptors discoverable, create a folder called META-INF/eu.openminted.share in your ZIP or JAR file. In this folder, place a file called descriptors.txt that contains pointers to the actual descriptor files, one per line.

Example descriptors.txt file
../../descriptors/Component1.xml
../../descriptors/Component2.xml
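
The entries in the example above look like paths relative to the location of descriptors.txt itself. The text does not state this explicitly, so the following sketch treats it as an assumption and shows how such an entry could be resolved to a location inside the archive. The class and method names are hypothetical and not part of the OpenMinTeD-SHARE API.

```java
import java.net.URI;

/** Hypothetical helper, not part of the OpenMinTeD-SHARE API. */
final class DescriptorPathResolver {
    /** Location of the index file inside the ZIP or JAR file. */
    static final URI INDEX = URI.create("/META-INF/eu.openminted.share/descriptors.txt");

    /** Resolves one line of descriptors.txt, assuming it is relative to the index file. */
    static String resolve(String entry) {
        // URI.resolve drops the last segment of the base path and normalizes ".." segments.
        return INDEX.resolve(entry.trim()).getPath();
    }

    public static void main(String[] args) {
        // Prints "/descriptors/Component1.xml"
        System.out.println(resolve("../../descriptors/Component1.xml"));
    }
}
```

Under this assumption, both example entries point to a descriptors folder at the root of the archive.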

Discovering descriptors

We currently provide a convenience API for Java to discover descriptors. For the moment, we assume that any descriptors you wish to discover are on your Java classpath, i.e. they are packaged as Maven artifacts.

In the near future, we will also provide convenience methods to discover descriptors in explicitly named JAR and ZIP files that do not need to be on the Java classpath.

To locate the descriptors, you can use the DescriptorFactory class.

Example of locating descriptors using the DescriptorFactory class
URL[] descriptorPaths = DescriptorFactory.scanDescriptors();
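
To illustrate the idea behind classpath-based discovery, the following sketch lists every descriptors.txt index file visible to a class loader using only the standard Java API. It is not the implementation of DescriptorFactory, which may work differently; the class IndexFileScanner and its method are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; DescriptorFactory may work differently internally. */
final class IndexFileScanner {
    /** Well-known location of the index file described above. */
    static final String INDEX = "META-INF/eu.openminted.share/descriptors.txt";

    /** Returns the URL of every index file visible to the given class loader. */
    static List<URL> findIndexFiles(ClassLoader loader) {
        try {
            // getResources yields one URL per classpath entry that contains the resource.
            return Collections.list(loader.getResources(INDEX));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (URL url : findIndexFiles(ClassLoader.getSystemClassLoader())) {
            System.out.println(url);
        }
    }
}
```

Each URL found this way identifies one index file whose lines would then have to be read and resolved to the actual descriptor files.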

Annotating components

Java classes can be directly annotated with OpenMinTeD-SHARE metadata so you do not have to manually maintain a separate XML file. The OpenMinTeD-SHARE Maven Plugin can then be used to generate the OpenMinTeD-SHARE descriptor automatically as part of a build.

Example OpenMinTeD-SHARE annotations on a Java class
import eu.openminted.share.annotations.api.Component;
import eu.openminted.share.annotations.api.constants.OperationType;

@Component(classes=OperationType.READER)
class TextCorpusReader
{
    // Reader implementation goes here.
}

Documentation

In addition to the descriptions obtained from the native framework descriptors, it is possible to reference external documentation or publications. URLs referring to such external resources may contain placeholders such as version or command. In addition, properties defined in the configuration section of the OMTD-SHARE Maven plugin in the Maven POM are interpolated. This is useful e.g. to centrally configure a documentation base URL.

Table 1. Built-in properties

Property        Description
version         Component version
command         Command registered in the first distribution info section of the OMTD-SHARE descriptor
shortClassName  If the command contains dots, this property addresses the substring starting after the last dot

External documentation URL example (Java)
import eu.openminted.share.annotations.api.Component;
import eu.openminted.share.annotations.api.DocumentationResource;
import eu.openminted.share.annotations.api.constants.OperationType;

@Component(classes=OperationType.READER)
@DocumentationResource("${docbase}/${version}/${command}.html")
class TextCorpusReader
{
    // Reader implementation goes here.
}

External documentation URL example (Maven POM)
<plugin>
  <groupId>eu.openminted.share.annotations</groupId>
  <artifactId>omtd-share-annotations-maven-plugin</artifactId>
  <configuration>
    <properties>
      <docbase>http://mywebsite.com/docs</docbase>
    </properties>
  </configuration>
</plugin>
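
The following sketch shows how placeholders of the form ${name} in a documentation URL could be interpolated from the built-in properties and the properties configured in the POM. It is not the plugin's actual code. The class UrlInterpolator, the example version and command values, and the choice to leave unknown placeholders untouched are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of placeholder interpolation; not the plugin's actual code. */
final class UrlInterpolator {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Derives shortClassName: the part of the command after the last dot. */
    static String shortClassName(String command) {
        // Without a dot, lastIndexOf returns -1 and the whole command is returned.
        return command.substring(command.lastIndexOf('.') + 1);
    }

    /** Replaces every ${name} with its value; unknown placeholders stay untouched. */
    static String interpolate(String template, Map<String, String> properties) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = properties.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("docbase", "http://mywebsite.com/docs");    // from the POM configuration
        props.put("version", "1.0.0");                        // made-up example value
        props.put("command", "org.example.TextCorpusReader"); // made-up example value
        props.put("shortClassName", shortClassName(props.get("command")));
        System.out.println(interpolate("${docbase}/${version}/${command}.html", props));
    }
}
```

With these example values, the URL from the Java example above becomes http://mywebsite.com/docs/1.0.0/org.example.TextCorpusReader.html.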

Parameters

The parameters of a component are picked up from the native component annotations, e.g. @ConfigurationParameter (UIMA/uimaFIT) or @Parameter (GATE).

Hiding parameters

Some parameters offered by the components are not suitable for OpenMinTeD, e.g. because they require the specification of a file system path which is not reasonably possible on the OMTD platform. Thus, it is sensible to hide such parameters from users of the OMTD platform.

Hidden parameters must either be optional or they must provide a default value.

Example of hiding a parameter (UIMA/uimaFIT)
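import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.jcas.JCas;

import eu.openminted.share.annotations.api.Component;
import eu.openminted.share.annotations.api.Parameters;
import eu.openminted.share.annotations.api.constants.OperationType;

@Component(classes=OperationType.READER)
@Parameters(exclude = TextCorpusReader.PARAM_HIDDEN)
class TextCorpusReader
    extends JCasAnnotator_ImplBase
{
    /**
     * Hidden parameter. It is mandatory, so it must provide a default value.
     */
    public static final String PARAM_HIDDEN = "hidden";
    @ConfigurationParameter(name = PARAM_HIDDEN, mandatory = true, defaultValue = "val")
    private String hidden;

    @Override
    public void process(JCas aJCas)
        throws AnalysisEngineProcessException
    {
        // Component logic omitted.
    }
}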
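
The rule that a hidden parameter must be optional or provide a default value can be written as a simple check. The sketch below only restates that rule in code; the class HiddenParameterCheck is hypothetical, and whether and how the plugin validates this is not described here.

```java
/** Hypothetical check; the plugin's actual validation, if any, may differ. */
final class HiddenParameterCheck {
    /** A parameter may be hidden if it is optional or has a default value. */
    static boolean canHide(boolean mandatory, String defaultValue) {
        return !mandatory || defaultValue != null;
    }

    public static void main(String[] args) {
        System.out.println(canHide(true, "val")); // mandatory with a default value
        System.out.println(canHide(true, null));  // mandatory without a default value
        System.out.println(canHide(false, null)); // optional
    }
}
```

The parameter in the example above is mandatory but declares the default value "val", so hiding it does not leave it unset.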

Using the Maven Plugin

A set of properly annotated classes within a Maven project can be processed automatically as part of the build to produce the relevant descriptor files using the Maven plugin we provide. To use this plugin, add the following to your existing pom.xml:

<dependencies>
  <dependency>
    <groupId>eu.openminted.share.annotations</groupId>
    <artifactId>omtd-share-annotations-api</artifactId>
    <version>3.0.2.7</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>eu.openminted.share.annotations</groupId>
      <artifactId>omtd-share-annotations-maven-plugin</artifactId>
      <version>3.0.2.7</version>
      <executions>
        <execution>
          <phase>process-classes</phase>
          <goals>
            <goal>generate</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Note that if your pom.xml already contains a dependencies or build section, you only need to add the relevant dependency or plugin element to it.

UIMA type mappings

UIMA type capabilities can be automatically converted to OMTD-SHARE annotation type information. This requires adding an additional configuration to the OMTD-SHARE Maven Plugin:

<plugin>
  <groupId>eu.openminted.share.annotations</groupId>
  <artifactId>omtd-share-annotations-maven-plugin</artifactId>
  <version>3.0.2.7</version>
  <executions>
    ...
  </executions>
  <configuration>
    <uimaTypeMappings>
      <uimaTypeMapping>META-INF/eu.openminted.share/uimaTypeMapping.map</uimaTypeMapping>
    </uimaTypeMappings>
  </configuration>
</plugin>

The plugin looks for the mappings in the source paths of the current module as well as in its dependencies. The idea is that mapping files are maintained in the same place as the UIMA type systems they describe. For example, the DKPro Core Named Entity API module provides a named entity type and also includes a UIMA-to-OMTD type mapping file that the Maven plugin can use.

The mapping file is a simple Java properties file assigning a UIMA type name to an OMTD-SHARE annotation type:

de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token=http://w3id.org/meta-share/omtd-share/Token
de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence=http://w3id.org/meta-share/omtd-share/Sentence
de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS=http://w3id.org/meta-share/omtd-share/PartOfSpeech
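
Because the mapping file uses the Java properties format, it can be read with java.util.Properties. The following sketch parses mapping content from a string and looks up one UIMA type. The class TypeMappingExample is a hypothetical name for illustration and says nothing about how the plugin itself loads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical example; not part of the OMTD-SHARE Maven plugin. */
final class TypeMappingExample {
    /** Parses mapping content in Java properties format. */
    static Properties parse(String content) {
        Properties mapping = new Properties();
        try {
            mapping.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mapping;
    }

    public static void main(String[] args) {
        Properties mapping = parse(
            "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token="
            + "http://w3id.org/meta-share/omtd-share/Token\n");
        // The key is the UIMA type name, the value the OMTD-SHARE annotation type.
        System.out.println(mapping.getProperty(
            "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"));
    }
}
```

Note that the first = on a line separates key and value, so the colon in the URL on the right-hand side does not need to be escaped.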

MIME type mappings

MIME types can be automatically converted to OMTD-SHARE data format information. Note, however, that OMTD-SHARE data formats are usually more specific than MIME types, so in many cases such a mapping is of limited use. Enabling the mapping requires adding an additional configuration to the OMTD-SHARE Maven Plugin:

<plugin>
  <groupId>eu.openminted.share.annotations</groupId>
  <artifactId>omtd-share-annotations-maven-plugin</artifactId>
  <version>3.0.2.7</version>
  <executions>
    ...
  </executions>
  <configuration>
    <mimeTypeMappings>
      <mimeTypeMapping>META-INF/eu.openminted.share/mimeTypeMapping.map</mimeTypeMapping>
    </mimeTypeMappings>
  </configuration>
</plugin>

The mapping lookup mechanism is the same as for the UIMA type mappings described above.

The mapping file is a simple Java properties file assigning a MIME type name to an OMTD-SHARE data format:

text/tab-separated-values=http://w3id.org/meta-share/omtd-share/TabularFormat

Metadata mappings

The OMTD-SHARE Maven plugin looks for metadata in the following order:

  1. UIMA

  2. GATE

  3. Maven

All metadata found in the process is usually aggregated, not overwritten. In some cases, that might lead to duplicate items.
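
The following sketch illustrates why aggregation can produce duplicates: values collected from each source are appended in lookup order, so the same value contributed by two sources appears twice. The class MetadataAggregation, the sample values, and the deduplication step are hypothetical and only meant as an illustration, not as a description of the plugin's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical illustration of aggregating metadata from several sources. */
final class MetadataAggregation {
    /** Concatenates values from all sources in lookup order; duplicates are kept. */
    @SafeVarargs
    static List<String> aggregate(List<String>... sources) {
        List<String> all = new ArrayList<>();
        for (List<String> source : sources) {
            all.addAll(source);
        }
        return all;
    }

    /** Optional post-processing step that drops duplicates but keeps the order. */
    static List<String> deduplicate(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        List<String> uima = Arrays.asList("Jane Doe");              // made-up values
        List<String> gate = new ArrayList<>();
        List<String> maven = Arrays.asList("Jane Doe", "John Roe"); // made-up values
        List<String> merged = aggregate(uima, gate, maven);
        System.out.println(merged);              // [Jane Doe, Jane Doe, John Roe]
        System.out.println(deduplicate(merged)); // [Jane Doe, John Roe]
    }
}
```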

Maven

Table 2. Maven mappings (may be incomplete)

Maven                                          OMTD-SHARE
/project/version                               /componentMetadataRecord/componentInfo/versionInfo/version
/project/version                               /componentMetadataRecord/componentInfo/identificationInfo/resourceIdentifiers/resourceIdentifier
/project/groupId                               /componentMetadataRecord/componentInfo/identificationInfo/resourceIdentifiers/resourceIdentifier
/project/artifactId                            /componentMetadataRecord/componentInfo/identificationInfo/resourceIdentifiers/resourceIdentifier
/project/version                               /componentMetadataRecord/componentInfo/distributionInfos/componentDistributionInfo/distributionLocation
/project/groupId                               /componentMetadataRecord/componentInfo/distributionInfos/componentDistributionInfo/distributionLocation
/project/artifactId                            /componentMetadataRecord/componentInfo/distributionInfos/componentDistributionInfo/distributionLocation
/project/url                                   /componentMetadataRecord/componentInfo/contactInfo/contactPoint
/project/developers/name                       /componentMetadataRecord/componentInfo/resourceCreationInfo/resourceCreators/resourceCreator/surname
/project/developers/email                      /componentMetadataRecord/componentInfo/resourceCreationInfo/resourceCreators/resourceCreator/communicationInfo/emails/email
/project/developers/organization               /componentMetadataRecord/componentInfo/resourceCreationInfo/resourceCreators/resourceCreator/affiliation/organizationNames/organizationName
/project/developers/roles/role                 /componentMetadataRecord/componentInfo/resourceCreationInfo/resourceCreators/resourceCreator/affiliation/position
/project/licenses/license                      /componentMetadataRecord/componentInfo/rightsInfo/licenseInfos/licenseInfo
/project/mailingLists/mailingList/name         /componentMetadataRecord/componentInfo/contactInfo/mailingLists/mailingListInfo/mailingListName
/project/mailingLists/mailingList/archive      /componentMetadataRecord/componentInfo/contactInfo/mailingLists/mailingListInfo/archive
/project/mailingLists/mailingList/post         /componentMetadataRecord/componentInfo/contactInfo/mailingLists/mailingListInfo/post
/project/mailingLists/mailingList/subscribe    /componentMetadataRecord/componentInfo/contactInfo/mailingLists/mailingListInfo/subscribe
/project/mailingLists/mailingList/unsubscribe  /componentMetadataRecord/componentInfo/contactInfo/mailingLists/mailingListInfo/unsubscribe

UIMA

The UIMA mappings shown below are for analysis engines. They apply analogously to collection readers, except that the root element and the metadata element have different names.

Table 3. UIMA mappings (may be incomplete)

UIMA                                                                                                          OMTD-SHARE
/analysisEngineDescription/analysisEngineMetaData/name                                                        /componentMetadataRecord/componentInfo/identificationInfo/resourceNames/resourceName
/analysisEngineDescription/annotatorImplementationName                                                        /componentMetadataRecord/componentInfo/identificationInfo/resourceIdentifiers/resourceIdentifier
/analysisEngineDescription/analysisEngineMetaData/vendor                                                      /componentMetadataRecord/componentInfo/contactInfo/contactGroups/contactGroup/groupNames/groupName
/analysisEngineDescription/analysisEngineMetaData/copyright                                                   /componentMetadataRecord/componentInfo/rightsInfo/copyrightStatement
/analysisEngineDescription/analysisEngineMetaData/configurationParameters/configurationParameter/name         /componentMetadataRecord/componentInfo/parameterInfos/parameterInfo/parameterName
/analysisEngineDescription/analysisEngineMetaData/configurationParameters/configurationParameter/name         /componentMetadataRecord/componentInfo/parameterInfos/parameterInfo/parameterLabel
/analysisEngineDescription/analysisEngineMetaData/configurationParameters/configurationParameter/description  /componentMetadataRecord/componentInfo/parameterInfos/parameterInfo/parameterDescription
/analysisEngineDescription/analysisEngineMetaData/configurationParameters/configurationParameter/type         /componentMetadataRecord/componentInfo/parameterInfos/parameterInfo/parameterType
/analysisEngineDescription/analysisEngineMetaData/configurationParameters/configurationParameter/multiValued  /componentMetadataRecord/componentInfo/parameterInfos/parameterInfo/multiValue
/analysisEngineDescription/analysisEngineMetaData/configurationParameters/configurationParameter/mandatory    /componentMetadataRecord/componentInfo/parameterInfos/parameterInfo/optional
/analysisEngineDescription/analysisEngineMetaData/configurationParameterSettings/nameValuePair/value          /componentMetadataRecord/componentInfo/parameterInfos/parameterInfo/defaultValue

GATE

TBD