OMTD-SHARE Java Annotations 3.0.2.5

This document explains how to embed OpenMinTeD-SHARE descriptors in software components and resource packages, and how to use the related tooling.

Introduction

OpenMinTeD SHARE is a set of metadata elements to describe text and data-mining (TDM) and natural language processing (NLP) resources.

Get sources for this project from GitHub:

Package structure

We expect that the resource you distribute is packaged as a ZIP or JAR file. Within this file, you can place your OpenMinTeD-SHARE descriptors at any location you like. If you package descriptors for TDM software components, you may wish to place them directly next to the classes that implement these components. If you package multiple resources, e.g. annotation schema descriptions, in a single ZIP file, you might wish to place the respective OpenMinTeD-SHARE descriptors directly next to them.

However, in order to allow for the automatic detection of your OpenMinTeD-SHARE descriptors, you will have to set up a file in a well-known location which points to your descriptors. This is described in the next section.

Discovery

Making descriptors discoverable

In order to make your descriptors discoverable, you have to create a folder called META-INF/eu.openminted.share in your ZIP or JAR file. Into this folder, you have to place a file called descriptors.txt with pointers to the actual descriptor files.

Example descriptors.txt file
../../descriptors/Component1.xml
../../descriptors/Component2.xml
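
The layout described above can be sketched in Java with the standard java.util.zip classes. This is only an illustration: the class name DescriptorArchiveSketch, the placeholder descriptor content and the in-memory archive are assumptions made for the example and are not part of the OMTD-SHARE tooling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the OMTD-SHARE API.
final class DescriptorArchiveSketch {
    // Well-known location of the index file, as described in this section.
    static final String INDEX_ENTRY = "META-INF/eu.openminted.share/descriptors.txt";

    /** Builds an in-memory ZIP with two placeholder descriptors and the index file. */
    static byte[] buildArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            put(zip, "descriptors/Component1.xml", "<!-- placeholder descriptor -->");
            put(zip, "descriptors/Component2.xml", "<!-- placeholder descriptor -->");
            // One pointer per line, matching the example descriptors.txt above.
            put(zip, INDEX_ENTRY,
                    "../../descriptors/Component1.xml\n../../descriptors/Component2.xml\n");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    private static void put(ZipOutputStream zip, String name, String content)
            throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    /** Lists the entry names of an archive so the layout can be inspected. */
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        entryNames(buildArchive()).forEach(System.out::println);
    }
}
```

In a real project the build tool would normally assemble the archive; the sketch only makes the expected entry names explicit.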

Discovering descriptors

We presently provide a convenience API for Java to discover descriptors. For the moment, we also assume that any descriptors you wish to discover exist on your Java classpath, i.e. they are packaged as Maven artifacts.

In the near future, we will also provide convenience methods to discover descriptors in explicitly named JAR and ZIP files that do not need to be on the Java classpath.

To locate the descriptors, you can use the DescriptorFactory class.

Example of locating descriptors using the DescriptorFactory class
URL[] descriptorPaths = DescriptorFactory.scanDescriptors();
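
To make the discovery mechanism more concrete, the following sketch shows how a classpath scan of this kind could work using only the JDK. It is not the implementation of DescriptorFactory. The class name ClasspathScanSketch is hypothetical, and the sketch assumes that each pointer in descriptors.txt is resolved relative to the location of that file, as the ../../ prefixes in the example suggest.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical illustration, not the DescriptorFactory implementation.
final class ClasspathScanSketch {
    static final String INDEX = "META-INF/eu.openminted.share/descriptors.txt";

    /** Collects descriptor URLs from every index file visible to the class loader. */
    static List<URL> scan(ClassLoader loader) {
        List<URL> found = new ArrayList<>();
        try {
            // A classpath may contain several archives, each with its own index file.
            Enumeration<URL> indexes = loader.getResources(INDEX);
            while (indexes.hasMoreElements()) {
                URL index = indexes.nextElement();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(index.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        String pointer = line.trim();
                        if (!pointer.isEmpty()) {
                            // Assumption: pointers are relative to the index file itself.
                            found.add(new URL(index, pointer));
                        }
                    }
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return found;
    }

    public static void main(String[] args) {
        scan(Thread.currentThread().getContextClassLoader()).forEach(System.out::println);
    }
}
```

The URL(URL, String) constructor is deprecated in recent JDK releases, so compiling the sketch there produces a warning. It is used here because it takes the index file's URL as the context for a relative pointer.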

Annotating components

Java classes can be directly annotated with OpenMinTeD-SHARE metadata so you do not have to manually maintain a separate XML file. The OpenMinTeD-SHARE Maven Plugin can then be used to generate the OpenMinTeD-SHARE descriptor automatically as part of a build.

Example OpenMinTeD-SHARE annotations on a Java class
import eu.openminted.share.annotations.api.Component;
import eu.openminted.share.annotations.api.constants.OperationType;

@Component(OperationType.READER)
class TextCorpusReader {
    // reader implementation goes here
}

Using the Maven Plugin

A set of properly annotated class files within a Maven project can be processed automatically as part of the build to produce the relevant descriptor files using the Maven plugin we provide. To use this plugin, simply add the following to your existing pom.xml:

<dependencies>
  <dependency>
    <groupId>eu.openminted.share.annotations</groupId>
    <artifactId>omtd-share-annotations-api</artifactId>
    <version>3.0.2.5</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>eu.openminted.share.annotations</groupId>
      <artifactId>omtd-share-annotations-maven-plugin</artifactId>
      <version>3.0.2.5</version>
      <executions>
        <execution>
          <phase>process-classes</phase>
          <goals>
            <goal>generate</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Note that if you already have a dependencies or build section within your pom.xml, you only need to add the relevant dependency or plugin element to it.

UIMA type mappings

UIMA type capabilities can be automatically converted to OMTD-SHARE annotation type information. This requires adding an additional configuration to the OMTD-SHARE Maven Plugin:

<plugin>
  <groupId>eu.openminted.share.annotations</groupId>
  <artifactId>omtd-share-annotations-maven-plugin</artifactId>
  <version>3.0.2.5</version>
  <executions>
    ...
  </executions>
  <configuration>
    <uimaTypeMappings>
      <uimaTypeMapping>META-INF/eu.openminted.share/uimaTypeMapping.map</uimaTypeMapping>
    </uimaTypeMappings>
  </configuration>
</plugin>

The plugin looks for the mappings in the source paths of the current module as well as in its dependencies. The idea is that the mapping files are maintained in the same place as the UIMA type systems they describe. For example, the DKPro Core Named Entity API module provides a named entity type and also includes a UIMA-to-OMTD type mapping file which can be used by the Maven plugin.

The mapping file is a simple Java properties file assigning a UIMA type name to an OMTD-SHARE annotation type:

de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token=http://w3id.org/meta-share/omtd-share/Token
de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence=http://w3id.org/meta-share/omtd-share/Sentence
de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS=http://w3id.org/meta-share/omtd-share/PartOfSpeech
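
Because the mapping file uses the standard Java properties syntax, it can be read with java.util.Properties. The following minimal sketch illustrates such a lookup; the class TypeMappingSketch and its methods are hypothetical names chosen for this example and do not reflect how the Maven plugin reads the file internally.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the OMTD-SHARE tooling.
final class TypeMappingSketch {
    /** Parses mapping file content given in Java properties syntax. */
    static Properties parse(String content) {
        Properties mappings = new Properties();
        try {
            mappings.load(new StringReader(content));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return mappings;
    }

    /** Returns the mapped OMTD-SHARE type URI, or null if the UIMA type is unmapped. */
    static String omtdTypeFor(Properties mappings, String uimaTypeName) {
        return mappings.getProperty(uimaTypeName);
    }

    public static void main(String[] args) {
        // Two entries copied from the example mapping file above.
        Properties mappings = parse(
                "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"
                        + "=http://w3id.org/meta-share/omtd-share/Token\n"
                        + "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence"
                        + "=http://w3id.org/meta-share/omtd-share/Sentence\n");
        System.out.println(omtdTypeFor(mappings,
                "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"));
    }
}
```

Only the first unescaped = or : on a line ends the key, so the colon inside the http:// value needs no escaping.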

MIME type mappings

MIME types can be automatically converted to OMTD-SHARE data format information. However, note that OMTD-SHARE data formats are usually more specific than MIME types, so there will be many cases in which such a mapping is not very useful. Enabling the mapping requires adding an additional configuration to the OMTD-SHARE Maven Plugin:

<plugin>
  <groupId>eu.openminted.share.annotations</groupId>
  <artifactId>omtd-share-annotations-maven-plugin</artifactId>
  <version>3.0.2.5</version>
  <executions>
    ...
  </executions>
  <configuration>
    <mimeTypeMappings>
      <mimeTypeMapping>META-INF/eu.openminted.share/mimeTypeMapping.map</mimeTypeMapping>
    </mimeTypeMappings>
  </configuration>
</plugin>

The mapping lookup mechanism is the same as for the UIMA type mappings described above.

The mapping file is a simple Java properties file assigning a MIME type name to an OMTD-SHARE data format:

text/tab-separated-values=http://w3id.org/meta-share/omtd-share/TabularFormat