- Analytics by category
- Uncategorized (132)
- Chunker (7)
- Classifier (8)
- Coreference (3)
- CrowdSourcing (1)
- Developers/Debugging (9)
- Evaluation (2)
- Filtering (6)
- Flow (8)
- Gazetteer (16)
- Irrelevant (1)
- Keywords/Terms (3)
- Language Identifier (7)
- Lemmatizer (7)
- Machine Learning (2)
- MorphTagger (3)
- Named Entity Recognizer (11)
- Normalizer (19)
- Parser (24)
- Pre-built Workflows (12)
- Readability (1)
- SRL (2)
- Scripted analytics (6)
- Segmenter (55)
- Semantics (2)
- Sentiment (1)
- Spelling/Grammar (5)
- Stemmer (4)
- Tagger (52)
- Topics (3)
- Validation (1)
- Viewer/Editor (18)
- Analytics by product
- (original) AlvisNLP (52)
- (original) DKPro Core (UIMA) (52)
- (original) GATE (135)
- (original) ILSP (UIMA) (5)
- (original) NaCTeM (UIMA) (18)
- (service) AlchemyAPI (2)
- (service) CrowdFlower (3)
- (service) Lupedia (1)
- (service) TextRazor (1)
- (service) Textalytics (6)
- (service) UAIC (6)
- ABNER (2)
- Arktweet (2)
- BANNER (5)
- BioCreative (2)
- BioLG (1)
- BulStem (1)
- CCG (2)
- CRF++ (2)
- Cjf (1)
- ClearNLP (5)
- EnjuParser (3)
- FreeLing (5)
- GATE Hepple (5)
- GENIA (5)
- HunPos (1)
- IULA (2)
- JTok (1)
- Java BreakIterator (2)
- Jazzy (1)
- KEA (1)
- LBJ (1)
- Langdetect (1)
- LanguageTool (3)
- LingPipe (6)
- Lucene/Solr (1)
- MLRS (3)
- Mallet (1)
- MaltParser (2)
- Mate Tools (4)
- MeCab (1)
- Minipar (1)
- Morpha (1)
- MutationFinder (1)
- NormaGene (1)
- Ogmios (1)
- OpenCalais (1)
- OpenNLP (15)
- Penn Bio-Tools (5)
- Porter Stemmer (1)
- RASP (5)
- RfTagger (1)
- SPECIES (2)
- STePP (1)
- SVMLight (2)
- Sfst (1)
- Snowball (2)
- Stanford (17)
- TermRaider (6)
- TextCat (3)
- TreeTagger (3)
- Web1T (1)
- WordNet (4)
- Yatea (2)
- Zemanta (1)
- I/O components by format
- Uncategorized (47)
- AclAnthology (1)
- Alvis Enriched Document (1)
- BNC (1)
- BioNLP Shared Task (2)
- BioNLP-ST 2013 a1/a2 (1)
- Brat (2)
- CLARIN TCF (2)
- CadixeJSON (1)
- CoNLL 2000 (2)
- CoNLL 2002 (2)
- CoNLL 2006 (2)
- CoNLL 2007 (1)
- CoNLL 2009 (2)
- CoNLL 2012 (2)
- Cochrane (1)
- DataSift JSON (1)
- Factored Tag Lem (1)
- Fast Infoset (2)
- GATE JSON (1)
- GATE XML (2)
- Genia JSON (1)
- GrAF (1)
- HTML (1)
- HTML5 Microdata (1)
- I2B2 (1)
- ImsCwb (2)
- JDBC (1)
- KEA Corpus (1)
- LLL (1)
- MediaWiki markup (1)
- NEGRA Export (1)
- OBO (1)
- PDF (1)
- Penn Treebank Chunked (1)
- Penn Treebank Combined (2)
- Prague Markup Language (1)
- PubMed (2)
- RDF (3)
- RTF (1)
- Relp (1)
- Reuters-21578 (2)
- Solr (1)
- TEI-XML (4)
- TIGER-XML (2)
- Text (14)
- TüPP-D/Z (1)
- Twitter JSON (1)
- UIMA Binary CAS (4)
- UIMA CAS Dump (1)
- UIMA JSON (1)
- Web1T (1)
- XCES (2)
- XMI (7)
- XML (12)
- Component details
- Uncategorized (132)
- ANNIE NE Transducer
- ANNIE OrthoMatcher
- ANNIE+Measurements
- Ab3P
- Action
- AggregateValues
- Agreement Evaluator
- AlchemyAPI: Entity Extraction
- AlchemyAPI: Keyword Extraction
- AlvisREPrepareCrossValidation
- AnchorTuples
- Annotation Remover
- AnnotationTermbank
- AntecedentChoice
- Arabic Gazetteer Collector
- Arabic Main Grammar
- Arabic OrthoMatcher
- Assert
- AttestedTermsProjector
- BDM Computation PR
- Banner Sentence Breaker
- BioLG
- CSV Corpus Populater
- CartesianProductTuples
- Cebuano Transducer
- Cebuano Transducer Postprocessor
- Chemical Entity Recogniser
- ColognePhoneticTranscriptor
- Compound Document
- Compound Document From Xml
- ConnectSesameOntology
- Control Script
- Copy Anns to Another Doc PR
- Corpus Indexing Support
- Crawler PR
- CreateSesameOntology
- DisambiguateAlternatives
- DocumentFrequencyBank
- DoubleMetaphonePhoneticTranscriptor
- ElementMapper
- ElementProjector
- ElementProjector2
- EngLemmatiser
- Feature Generator
- FileMapper
- FileMapper2
- FreelingMorpho
- GATE Composite document
- Gazetteer List Collector
- GermanSeparatedParticleAnnotator
- Groovy support for GATE
- Hindi Main Grammar
- Hindi OrthoMatcher
- Hindi Tokeniser Postprocessor
- HyponymyTermbank
- InsertContents
- Kleio Search
- LBJ Named Entity Recognizer
- LayerComparator
- Linguistic Simplifier
- Linguistic Simplifier
- Lucene IR Engine
- Lupedia Service PR
- MergeLayers
- MergeSections
- MetaMap Annotator
- MetaphonePhoneticTranscriptor
- MutationFinder
- NGramAnnotator
- NGrams
- NeMine
- NewCount
- OBOMapper
- OBOProjector
- OWLIM Ontology
- OWLIM Ontology DEPRECATED
- OntoReif
- OpenNLPNEDetector
- OpenNLPSentenceDetector
- OrthoRef
- OscarMER
- PMI Bank
- PatternMatcher
- ProminentConceptReporter
- Quality Assurance PR
- QuickHTML
- RO_FDGBank
- Reference Evaluator
- RegExp
- Regex Annotator
- RemoveContents
- RemoveEquivalent
- RemoveOverlaps
- Romanian Transducer
- SFTP BioNLP Shared Task Data Provider
- SQLImport
- SeSMig
- Search Results
- SearchPR
- Sequence_Impl
- SimpleProjector
- SimpleProjector2
- SoundexPhoneticTranscriptor
- Species
- SplitOverlaps
- TermRaider English Term Extraction
- Termbank Score Copier
- TextRazor Service PR
- TfIdfTermbank
- TfidfAnnotator
- TomapProjector
- TomapTrain
- TyDIProjector
- Type Mapper
- UAICDiacriticsDescriptor
- UAICLemmav1
- UAICLemmav2
- UAICSegV1
- UMLS Full Dictionary Feature Extractor
- WapitiLabel
- WapitiTrain
- WoSMig
- WordNet
- WordNet 1.6
- YateaProjector
- Zemanta Service PR
- Chunker (7)
- Classifier (8)
- Coreference (3)
- CrowdSourcing (1)
- Developers/Debugging (9)
- Evaluation (2)
- Filtering (6)
- Flow (8)
- Gazetteer (16)
- ANNIE Gazetteer
- Arabic Gazetteer
- Arabic Infered Gazetteer
- Cebuano Gazetteer
- DictionaryAnnotator
- Flexible Gazetteer
- Hash Gazetteer
- Hindi Gazetteer
- Hindi Tokeniser Gazetteer
- Inflectional gazetteer
- Large KB Gazetteer
- Onto Root Gazetteer
- OntoGazetteer
- Romanian Gazetteer
- Russian Gazetteer
- Sharable Gazetteer
- Irrelevant (1)
- Keywords/Terms (3)
- Language Identifier (7)
- Lemmatizer (7)
- Machine Learning (2)
- MorphTagger (3)
- Named Entity Recognizer (11)
- Normalizer (19)
- ApplyChangesAnnotator
- Backmapper
- CapitalizationNormalizer
- CjfNormalizer
- Date Annotation Normalizer
- Date Normalizer
- DictionaryBasedTokenTransformer
- Document normalizer
- ExpressiveLengtheningNormalizer
- FileBasedTokenTransformer
- HyphenationRemover
- RegexBasedTokenTransformer
- ReplacementFileNormalizer
- SharpSNormalizer
- SpellingNormalizer
- StanfordPtbTransformer
- TokenCaseTransformer
- Tweet Normaliser
- UmlautNormalizer
- Parser (24)
- BerkeleyParser
- CCGParser
- ClearNlpParser
- English Dependency Parser
- English POS Tagger and Dependency Parser
- Enju Parser
- EnjuParser
- EnjuParser2
- FreelingShallowParser
- GENIA Dependency Parser
- ILSP Dependency Parser
- MaltParser
- MateParser
- Minipar Wrapper
- MstParser
- OpenNLP Parser
- OpenNLPParser
- OpenNlpParser
- RASP2 Parser
- Stanford Dependency Parser
- StanfordDependencyConverter
- StanfordParser
- StanfordParser
- Textalytics Lemmatization, PoS and Parsing
- Pre-built Workflows (12)
- Readability (1)
- Reader (91)
- ACE Corpus Reader
- AclAnthologyReader
- Aimed Collection Reader
- AlvisAEReader
- AlvisAEReader2
- AnimalReader
- BIO Format Collection Reader
- BinaryCasReader
- BioC Reader
- BioCreative CHEMDNER Reader
- BioNLP ST Data Reader
- BioNLPSTReader
- BlikiWikipediaReader
- BncReader
- BratReader
- CombinationReader
- Conll2000Reader
- Conll2002Reader
- Conll2006Reader
- Conll2009Reader
- Conll2012Reader
- Entity Annotation Results Importer
- EuropePMC Open Access Reader
- FSOVFileReader
- Fast Infoset Document Format
- GATE .cochrane.txt document format
- GATE .pubMed.txt document format
- GATE DataSift JSON Document Format
- GATE JSON Tweet Document Format
- GateXMLReaderDescriptor
- GeniaJSONReader
- GeniaReader
- HtmlReader
- I2B2Reader
- ILSP File System Collection Reader
- ImsCwbReader
- Input Text Reader
- JdbcReader
- KEA Corpus Importer
- LIBSVMReader
- LLLReader
- MediaWiki Corpus Populater
- MediaWiki Document Format
- MediaWiki XML Document Format
- Merge GENIA-coref with -term Collection Reader
- NegraExportReader
- OBOReader
- PdfReader
- PennTreebankChunkedReader
- PennTreebankCombinedReader
- PubMed Abstract Reader
- PubTatorReader
- RDF Reader
- RTFReader
- Reuters21578SgmlReader
- Reuters21578TxtReader
- SFTP Document Reader
- SFTP XMI Reader
- SerializedCasReader
- Shared Task 2004 Reader
- StringReader
- TSV Reader
- TabularReader
- TcfReader
- TeiReader
- TextFileReader
- TextReader
- TigerXmlReader
- TreeTaggerReader
- TueppReader
- Twitter Collection Reader
- Twitter Corpus Populator
- WebOfKnowledgeReader
- WikipediaArticleInfoReader
- WikipediaArticleReader
- WikipediaDiscussionReader
- WikipediaLinkReader
- WikipediaPageReader
- WikipediaQueryReader
- WikipediaRevisionPairReader
- WikipediaRevisionReader
- WikipediaTemplateFilteredArticleReader
- XMI Reader
- XMLReader
- XMLReader2
- XcesReaderDescriptor
- XmiReader
- XmlReader
- XmlTextReader
- XmlXPathReader
- SRL (2)
- Scripted analytics (6)
- Segmenter (55)
- ANNIE English Tokeniser
- ANNIE Sentence Splitter
- Arabic Tokeniser
- ArktweetTokenizer
- Banner Base Tokenizer
- Banner Simple Tokenizer
- Banner Whitespace Tokenizer
- BreakIteratorSegmenter
- Cafetiere Sentence Splitter
- CamelCaseTokenSegmenter
- Cebuano Gazetteer Tokeniser
- Cebuano Tokeniser
- Chinese Segmenter PR
- ClearNlpSegmenter
- CompoundAnnotator
- Freeling Sentence Splitter
- FreelingTokenizer
- GATE Unicode Tokeniser
- GENIA Sentence Splitter
- GENIA Sentence Splitter
- Hashtag Tokenizer
- Hindi Splitter
- Hindi Tokeniser
- ILSP Paragraph, Sentence and Token Segmentor
- IULATokenizer
- JTokSegmenter
- LanguageToolSegmenter
- LineBasedSentenceSegmenter
- LingPipe Sentence Splitter
- LingPipe Sentence Splitter PR
- LingPipe Tokenizer PR
- MLRS Maltese Tokeniser
- MLRS Paragraph Splitter
- MLRS Sentence Splitter
- OSCAR 4 Tokeniser
- OgmiosTokenizer
- OpenNLP Sentence Splitter
- OpenNLP Tokenizer
- OpenNLPTokenizer
- OpenNlpSegmenter
- ParagraphSplitter
- PatternBasedTokenSegmenter
- Penn BioTokenizer
- RASP2 Tokenizer
- RegEx Sentence Splitter
- RegexTokenizer
- Romanian Tokeniser
- Stanford PTB Tokenizer
- StanfordSegmenter
- TokenMerger
- TokenTrimmer
- TrailingCharacterRemover
- UAICTokenizerDescriptor
- WhitespaceTokenizer
- Semantics (2)
- Sentiment (1)
- Spelling/Grammar (5)
- Stemmer (4)
- Tagger (52)
- ABNER Tagger
- ANNIE POS Tagger
- Anatomical Entity Tagger
- ArktweetPosTagger
- BANNER CRF Tagger
- BioCreative Gene Mention Tagger
- CCGPosTagger
- CRF++ Tagger
- Cebuano POS Tagger
- Chemistry Tagger
- ClearNlpPosTagger
- FreelingTagger
- GENIA Tagger
- GenericTagger
- GeniaTagger
- Hepple POS Tagger
- HepplePosTagger
- Hindi POS Tagger
- HunPosTagger
- ILSP FBT Tagger
- IULATagger
- LingPipe POS Tagger PR
- MateMorphTagger
- MatePosTagger
- MeCabTagger
- Measurement Tagger
- Medical Condition Tagger
- NormaGene Tagger
- Numbers Tagger
- OpenCalais Tagger
- OpenNLP POS Tagger
- OpenNlpPosTagger
- POS Mapper
- Penn BioTagger
- Penn BioTagger: Genes
- Penn BioTagger: Malignancy
- Penn BioTagger: Variation
- PosMapper
- RASP POS Converter
- RASP2 POS Tagger
- RfTagger
- Roman Numerals Tagger
- Russian POS Tagger
- SVMLight Tagger
- Species Tagger
- Stanford POS Tagger
- StanfordPosTagger
- Stepp Tagger
- TreeTagger
- TreeTaggerPosTagger
- UaicPosTagger
- Topics (3)
- Validation (1)
- Viewer/Editor (18)
- Writer (64)
- ADBWriter
- AlvisDBIndexer
- AlvisIRIndexer
- BIO Format Writer Cas Consumer
- BinaryCasWriter
- BioC Writer
- BioNLP ST Data Writer
- BratWriter
- CasDumpWriter
- CoNLL2007 Cas Consumer
- Configurable Exporter
- Conll2000Writer
- Conll2002Writer
- Conll2006Writer
- Conll2009Writer
- Conll2012Writer
- EnrichedDocumentWriter
- ExportAlignmentPR
- ExportCadixeJSON
- ExpressionExtract
- Factored Tag Lem Consumer
- Fast Infoset Exporter
- FillDB
- Flexible Exporter
- GATE JSON Exporter
- GATE XML Writer CAS Consumer
- GeniaWriter
- HTML5 Microdata Exporter
- ILSP GrAF Consumer
- ILSP PML Cas Consumer
- ILSP XCES Consumer
- ILSP Xmi Writer CAS Consumer
- ImsCwbWriter
- InlineXmlWriter
- JsonWriter
- Legacy Coref Data Writer
- MalletTopicProportionsWriter
- MalletTopicsProportionsSortedWriter
- PennTreebankCombinedWriter
- RDF Writer
- RDFExport
- RelpWriter
- SFTP XMI Writer
- SerializedCasWriter
- Simplified Text Exporter
- Simplified Text Exporter
- SolrWriter
- TGrepWriter
- TSV Writer
- TabularExport
- TcfWriter
- TeiWriter
- TextWriter
- TfidfConsumer
- TigerXmlWriter
- TokenizedTextWriter
- TwitterDatabaseConsumer
- Web1TWriter
- WhatsWrongExport
- XMI Writer
- XMLWriter
- XMLWriter2
- XMLWriter2ForINIST
- XmiWriter
- Uncategorized (132)
This document provides an overview of the analytics components and data formats supported by the component collections of the OpenMinTeD partners.
Analytics by category
Uncategorized (132)
Components listed here are presently uncategorized.
Component | Description | Framework |
---|---|---|
ANNIE named entity grammar. |
GATE |
|
ANNIE orthographical coreference component. |
GATE |
|
Ready-made application for ANNIE plus the measurement tagger |
GATE |
|
synopsis |
AlvisNLP |
|
Applies action expressions on selected elements. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Reports agreement on annotations coming from different views (sofas). |
NaCTeM (UIMA) |
|
Runs the AlchemyAPI Entity Extraction service on a GATE document |
GATE |
|
Runs the AlchemyAPI Keyword Extraction service on a GATE document |
GATE |
|
synopsis |
AlvisNLP |
|
Creates tuples with a common argument. |
AlvisNLP |
|
Removes span-of-text annotations. |
NaCTeM (UIMA) |
|
TermRaider Termbank derived from document annotations |
GATE |
|
Biotopes-specific module: chooses an antecedent. |
AlvisNLP |
|
No description |
GATE |
|
A module for executing Jape grammars. |
GATE |
|
ANNIE orthographical coreference component. |
GATE |
|
Tests an assertion on specified elements. |
AlvisNLP |
|
Descriptor automatically generated by uimaFIT |
DKPro Core (UIMA) |
|
Projects a list of terms given in tree-tagger format. |
AlvisNLP |
|
Compute BDM score for each pair of concepts in the given ontology. |
GATE |
|
Sentence breaker using the Sun Java API "BreakIterator". |
NaCTeM (UIMA) |
|
Applies BioLG and lp2lp to sentences. |
AlvisNLP |
|
Populate a corpus from CSV files |
GATE |
|
Creates tuples for each element of a Cartesian product. |
AlvisNLP |
|
A module for executing Jape grammars. |
GATE |
|
A module for executing Jape grammars. |
GATE |
|
A named entity recogniser capable of annotating names of chemicals, drugs and metabolites. |
NaCTeM (UIMA) |
|
Cologne phonetic (Kölner Phonetik) transcription based on Apache Commons Codec. |
DKPro Core (UIMA) |
|
GATE Compound Document. |
GATE |
|
GATE Compound Document. |
GATE |
|
Connect to a repository containing an ontology |
GATE |
|
Editor for the Groovy script controlling a scriptable controller |
GATE |
|
Copy the annotations from one document to another document. |
GATE |
|
No description |
GATE |
|
GATE implementation of the Websphinx crawling API |
GATE |
|
Create an ontology from a Sesame configuration file for a repository |
GATE |
|
Tests whether input tokens belong to an entry in the specified dictionary using SecondString Soft TF/IDF. |
NaCTeM (UIMA) |
|
Disambiguate features that have multiple values. |
AlvisNLP |
|
Document frequency counter derived from corpora and other DFBs |
GATE |
|
Double-Metaphone phonetic transcription based on Apache Commons Codec. |
DKPro Core (UIMA) |
|
Maps elements according to a collection of mapping elements. |
AlvisNLP |
|
Searches for entries in a dictionary generated by an expression. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
English lemmatiser which is adapted from WordNet. |
NaCTeM (UIMA) |
|
Generates a list of user-defined observations for each token. |
NaCTeM (UIMA) |
|
Maps the value of an annotation feature according to a mapping file. |
AlvisNLP |
|
Maps elements according to a tab-separated mapping file. |
AlvisNLP |
|
Performs tokenisation, and determines possible lemmas and POS tags for each token, with confidence scores. |
NaCTeM (UIMA) |
|
GATE Composite document. |
GATE |
|
Gazetteer lists collector. |
GATE |
|
Annotator to be used for post-processing of German corpora that have been lemmatized and POS-tagged with the TreeTagger, based on the STTS tagset. |
DKPro Core (UIMA) |
|
No description |
GATE |
|
A module for executing Jape grammars |
GATE |
|
Hindi Orthomatcher |
GATE |
|
A module for executing Jape grammars |
GATE |
|
TermRaider Termbank derived from head/string hyponymy |
GATE |
|
Descriptor automatically generated by uimaFIT |
DKPro Core (UIMA) |
|
synopsis |
AlvisNLP |
|
Uses the Kleio service to fetch MEDLINE abstracts matching a specified query. |
NaCTeM (UIMA) |
|
A wrapper for the Illinois Named Entity Tagger |
NaCTeM (UIMA) |
|
Compares annotations in two different layers. |
AlvisNLP |
|
A processing resource that takes document and corpus parameters |
GATE |
|
Example application for the linguistic simplifier |
GATE |
|
No description |
GATE |
|
Runs a lupedia annotation service on a GATE document |
GATE |
|
Process results of a crowd annotation task to find where annotators agree and disagree. |
GATE |
|
Creates a new layer in each section containing all annotations in source layers. |
AlvisNLP |
|
Merge several sections into a single one. |
AlvisNLP |
|
This plugin uses the MetaMap Java API to send GATE document content to MetaMap skrmedpostctl server and PrologBeans mmserver instances running on the given machine/port |
GATE |
|
Metaphone phonetic transcription based on Apache Commons Codec. |
DKPro Core (UIMA) |
|
GATE MutationFinder Wrapper |
GATE |
|
N-gram annotator. |
DKPro Core (UIMA) |
|
Computes annotation n-grams. |
AlvisNLP |
|
No description |
NaCTeM (UIMA) |
|
Counts element occurrences and writes the results in a file, including tfidf. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Projects OBO terms and synonyms on sections. |
AlvisNLP |
|
Ontology created as a temporary OWLIM3 in-memory repository |
GATE |
|
Ontology created as a temporary OWLIM3 in-memory repository, for backwards compatibility only |
GATE |
|
synopsis |
AlvisNLP |
|
Detects named entities in text and creates corresponding entity annotations that span the found entities. |
NaCTeM (UIMA) |
|
Detect sentence boundaries and create sentence annotations that span these boundaries. |
NaCTeM (UIMA) |
|
An orthographic coreferencer |
GATE |
|
Runs Oscar 3 with maximum entropy based recogniser with syntactic tokens as input |
NaCTeM (UIMA) |
|
Pointwise Mutual Information from corpora |
GATE |
|
Example application for the PMI (pointwise mutual information) tool |
GATE |
|
Matches a regular expression-like pattern on the sequence of annotations in a given layer. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
The Quality Assurance PR provides a functionality of the Corpus QA Tool in GATE Developer |
GATE |
|
synopsis |
AlvisNLP |
|
This reader performs the transformation of the CONLL tab separated text format to the CAS ConllDependency format. |
NaCTeM (UIMA) |
|
Reports annotation performance comparing views (sofas) to one selected reference view. |
NaCTeM (UIMA) |
|
Matches a regular expression on section contents and creates an annotation for each match. |
AlvisNLP |
|
Annotates spans of text based on a custom regular expression. |
NaCTeM (UIMA) |
|
synopsis |
AlvisNLP |
|
Removes duplicate elements. |
AlvisNLP |
|
Removes overlapping annotations from a given layer. |
AlvisNLP |
|
A module for executing Jape grammars |
GATE |
|
Reads a corpus in BioNLP Shared Task format from a remote directory on a user-specified server via SFTP. |
NaCTeM (UIMA) |
|
synopsis |
AlvisNLP |
|
Detects sentence boundaries and creates one annotation for each sentence. This module assumes WoSMig processed the same sections. |
AlvisNLP |
|
Viewer for IR search results |
GATE |
|
Provides IR functionality. |
GATE |
|
Sequence of modules. |
AlvisNLP |
|
Show resources that would otherwise be hidden, e.g. resources created for internal use by other resources |
GATE |
|
Projects a simple dictionary on sections. |
AlvisNLP |
|
Projects a simple dictionary on sections. |
AlvisNLP |
|
Soundex phonetic transcription based on Apache Commons Codec. |
DKPro Core (UIMA) |
|
Calls the Species taxon tagger. |
AlvisNLP |
|
Splits overlapping annotations. |
AlvisNLP |
|
Example application showing typical set-up for the TermRaider tools |
GATE |
|
Copy scores from Termbanks back to their source annotations |
GATE |
|
Runs the TextRazor annotation service (http://textrazor.com) on a GATE document |
GATE |
|
TermRaider Termbank derived from vectors in document features |
GATE |
|
This component adds Tfidf annotations consisting of a term and a tfidf weight. |
DKPro Core (UIMA) |
|
synopsis |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Projects terms from a TyDI export. |
AlvisNLP |
|
No description |
NaCTeM (UIMA) |
|
No description |
NaCTeM (UIMA) |
|
Assigns base forms to tokenised text. |
NaCTeM (UIMA) |
|
Assigns base forms in Romanian text, given POS-tagged text. |
NaCTeM (UIMA) |
|
Splits texts into fragments |
NaCTeM (UIMA) |
|
Extracts Dictionary features from a UMLS-sourced dictionary |
NaCTeM (UIMA) |
|
synopsis |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Performs word segmentation on section contents. |
AlvisNLP |
|
WordNet |
GATE |
|
Princeton WordNet 1.6. |
GATE |
|
synopsis |
AlvisNLP |
|
Runs a zemanta annotation service on a GATE document |
GATE |
Chunker (7)
Component | Description | Framework |
---|---|---|
ANNIE VP Chunker component. |
GATE |
|
No description |
ILSP (UIMA) |
|
Ready-made NP chunking application |
GATE |
|
Implementation of the Ramshaw and Marcus base noun phrase chunker |
GATE |
|
Chunker using an OpenNLP maxent model |
GATE |
|
Chunk annotator using OpenNLP. |
DKPro Core (UIMA) |
|
Chunk annotator using TreeTagger. |
DKPro Core (UIMA) |
Classifier (8)
Component | Description | Framework |
---|---|---|
Build a CrowdFlower job asking users to select the right label for entities |
GATE |
|
Import judgments from a CrowdFlower job created by the Entity Classification Job Builder as GATE annotations. |
GATE |
|
Process results of a crowd annotation task to find where annotators agree and disagree. |
GATE |
|
Searches for discriminating attributes with Weka. |
AlvisNLP |
|
Classifies elements with a Weka classifier. |
AlvisNLP |
|
Classify text based on a semantic space |
GATE |
|
Textalytics Text Classification |
GATE |
|
Trains a Weka classifier where examples are elements. |
AlvisNLP |
Coreference (3)
Component | Description | Framework |
---|---|---|
Nominal Coreference resolution component |
GATE |
|
Pronominal Coreference resolution component. |
GATE |
|
No description |
DKPro Core (UIMA) |
CrowdSourcing (1)
Component | Description | Framework |
---|---|---|
Build a CrowdFlower job asking users to annotate entities within a snippet of text |
GATE |
Developers/Debugging (9)
Component | Description | Framework |
---|---|---|
Dump dependencies to screen. |
DKPro Core (UIMA) |
|
Removes fields from the document meta data which may be different depending on the machine a test is run on. |
DKPro Core (UIMA) |
|
Warns whenever an AWT component is updated from anywhere other than the event dispatch thread |
GATE |
|
Utility analysis engine for use with CAS multipliers in uimaFIT pipelines. |
DKPro Core (UIMA) |
|
Dumps the Java heap to the specified file |
GATE |
|
Allows the Log4J log level to be set to ALL from within the GUI |
GATE |
|
Can be used to measure how long the processing between two points in a pipeline takes. |
DKPro Core (UIMA) |
|
Copyright 2012 Ubiquitous Knowledge Processing (UKP) Lab Technische Universität Darmstadt Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. |
DKPro Core (UIMA) |
|
Unloads all plugins for which we cannot find any loaded instances |
GATE |
Evaluation (2)
Component | Description | Framework |
---|---|---|
Compares two sets of elements. |
AlvisNLP |
|
Compute inter-annotator agreement (IAA). |
GATE |
Filtering (6)
Component | Description | Framework |
---|---|---|
Removes annotations that do not conform to minimum or maximum length constraints. |
DKPro Core (UIMA) |
|
Reads a list of words from a text file (one token per line) and retains only tokens or other annotations that match any of these words. |
DKPro Core (UIMA) |
|
Uses boilerpipe to determine which sections of a document are interesting content and which are just boilerplate |
GATE |
|
Removes all tokens/lemmas/stems/POS tags (depending on the "Mode" setting) that do not match the given parts of speech. |
DKPro Core (UIMA) |
|
Remove every token that does or does not match a given regular expression. |
DKPro Core (UIMA) |
|
Remove all of the specified types from the CAS if their covered text is in the stop word dictionary. |
DKPro Core (UIMA) |
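
The regular-expression filter described above ("remove every token that does or does not match a given regular expression") can be illustrated with a minimal Python sketch. This is not the DKPro Core implementation: the function name `filter_tokens`, the `must_match` flag, and the choice of whole-token matching are assumptions made purely for illustration.

```python
import re


def filter_tokens(tokens, pattern, must_match=True):
    """Keep only the tokens whose match status equals `must_match`.

    Hypothetical helper: with must_match=True, tokens that do NOT match
    the pattern are removed; with must_match=False, tokens that DO match
    are removed. Whole-token matching (fullmatch) is an assumption.
    """
    regex = re.compile(pattern)
    return [t for t in tokens if (regex.fullmatch(t) is not None) == must_match]


tokens = ["abc", "123", "a1"]
digits_only = filter_tokens(tokens, r"\d+", must_match=True)
without_digits = filter_tokens(tokens, r"\d+", must_match=False)
```

The single flag covers both directions of the description ("does or does not match"); a real component would typically expose this as a configuration parameter.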
Flow (8)
Component | Description | Framework |
---|---|---|
Merge Annotations from different annotators. |
GATE |
|
Annotation set transfer component. |
GATE |
|
Combines documents in a composite document. |
GATE |
|
Deletes one member document from a compound doc. |
GATE |
|
Remove named annotation sets or reset the default annotation set |
GATE |
|
A controller whose execution strategy is controlled by a Groovy script |
GATE |
|
Processes individual segments as separate documents |
GATE |
|
Sets the focus of a compound document to a specified member document. |
GATE |
Gazetteer (16)
Component | Description | Framework |
---|---|---|
ANNIE Gazetteer | A list lookup component. | GATE |
Arabic Gazetteer | A list lookup component. | GATE |
Arabic Infered Gazetteer | A list lookup component. | GATE |
Cebuano Gazetteer | A list lookup component. | GATE |
DictionaryAnnotator | Takes a plain text file with phrases as input and annotates the phrases in the CAS file. | DKPro Core (UIMA) |
Flexible Gazetteer | A more flexible list lookup component. | GATE |
Hash Gazetteer | A list lookup component implemented by OntoText Lab. | GATE |
Hindi Gazetteer | A list lookup component. | GATE |
Hindi Tokeniser Gazetteer | A list lookup component. | GATE |
Inflectional gazetteer | Gazetteer with support for inflectional morphology | GATE |
Large KB Gazetteer | KIM KB based alias-lookup component | GATE |
Onto Root Gazetteer | An ontology lookup component | GATE |
OntoGazetteer | A list lookup component based on mapping between ontology classes and gazetteer lists. | GATE |
Romanian Gazetteer | A list lookup component. | GATE |
Russian Gazetteer | Customised version of the hash gazetteer | GATE |
Sharable Gazetteer | A list lookup component. | GATE |
Irrelevant (1)
Component | Description | Framework |
---|---|---|
Duplicate any resource with a right click menu option |
GATE |
Keywords/Terms (3)
Component | Description | Framework |
---|---|---|
A Keyphrase Extractor by Eibe Frank. |
GATE |
|
Selects most relevant keywords in documents. |
AlvisNLP |
|
Extract terms from the corpus using the YaTeA term extractor. |
AlvisNLP |
Language Identifier (7)
Component | Description | Framework |
---|---|---|
Langdetect language identifier based on character n-grams. |
DKPro Core (UIMA) |
|
Language detector based on n-gram frequency counts, e.g. as provided by Web1T |
DKPro Core (UIMA) |
|
Detection based on character n-grams. |
DKPro Core (UIMA) |
|
GATE PR for language identification using LingPipe |
GATE |
|
Generate language fingerprints for use with the TextCat Language Identification PR |
GATE |
|
Recognizes the document language using TextCat |
GATE |
|
Textalytics Language Identification |
GATE |
Lemmatizer (7)
Component | Description | Framework |
---|---|---|
Lemmatizer using Clear NLP. |
DKPro Core (UIMA) |
|
Wrapper for the GATE rule based lemmatizer. |
DKPro Core (UIMA) |
|
ILSP Lemmatizer assigns lemmas to tokens from Greek texts. |
ILSP (UIMA) |
|
Naive lexicon-based lemmatizer. |
DKPro Core (UIMA) |
|
DKPro Annotator for the MateToolsLemmatizer. |
DKPro Core (UIMA) |
|
Lemmatize based on a finite-state machine. |
DKPro Core (UIMA) |
|
Stanford Lemmatizer component. |
DKPro Core (UIMA) |
Machine Learning (2)
Component | Description | Framework |
---|---|---|
Supports training, application and evaluation of machine learning models for NLP tasks |
GATE |
|
Trains a machine learning algorithm from a corpus. |
GATE |
MorphTagger (3)
Component | Description | Framework |
---|---|---|
Morphological Analyzer for the English Language. |
GATE |
|
RASP morphological analyser, which adds lemma and suffix to the WordForm annotations produced by the RASP POS tagger (or the ANNIE POS tagger plus the RASP converter) |
GATE |
|
Sfst morphological analyzer. |
DKPro Core (UIMA) |
Named Entity Recognizer (11)
Component | Description | Framework |
---|---|---|
Wraps the ABNER entity identification system into the UIMA framework. |
NaCTeM (UIMA) |
|
Produces a Conditional Random Fields model. |
NaCTeM (UIMA) |
|
This module uses a Maximum Entropy NER engine focusing on Greek (EL) or English (EN) news text. |
ILSP (UIMA) |
|
LingPipe Named Entity Recognizer |
GATE |
|
NER PR using a set of OpenNLP maxent models |
GATE |
|
OpenNLP name finder wrapper. |
DKPro Core (UIMA) |
|
Produces an SVMLight model based on user-specified learning parameters. |
NaCTeM (UIMA) |
|
Stanford Named Entity Recogniser |
GATE |
|
synopsis |
AlvisNLP |
|
Stanford Named Entity Recognizer component. |
DKPro Core (UIMA) |
|
Annotates yeast metabolites with a supervised NER system using CRF. |
NaCTeM (UIMA) |
Normalizer (19)
Component | Description | Framework |
---|---|---|
ApplyChangesAnnotator | Applies changes annotated using a SofaChangeAnnotation. | DKPro Core (UIMA) |
Backmapper | After processing a file with the ApplyChangesAnnotator this annotator can be used to map the annotations created in the cleaned view back to the original view. | DKPro Core (UIMA) |
CapitalizationNormalizer | Takes a text and replaces wrong capitalization | DKPro Core (UIMA) |
CjfNormalizer | Converts traditional Chinese to simplified Chinese or vice-versa. | DKPro Core (UIMA) |
Date Annotation Normalizer | Provides normalized values for all existing date annotations | GATE |
Date Normalizer | Provides normalized values for all known dates | GATE |
DictionaryBasedTokenTransformer | Reads a tab-separated file containing mappings from one token to another. | DKPro Core (UIMA) |
Document normalizer | Normalize document content to remove "smart quotes" etc. | GATE |
ExpressiveLengtheningNormalizer | Takes a text and shortens extra long words | DKPro Core (UIMA) |
FileBasedTokenTransformer | Replaces all tokens that are listed in the file in #PARAM_MODEL_LOCATION by the string specified in #PARAM_REPLACEMENT. | DKPro Core (UIMA) |
HyphenationRemover | Simple dictionary-based hyphenation remover. | DKPro Core (UIMA) |
RegexBasedTokenTransformer | A JCasTransformerChangeBased_ImplBase implementation that replaces tokens based on regular expressions. | DKPro Core (UIMA) |
ReplacementFileNormalizer | Takes a text and replaces desired expressions. This class should not work on tokens as some expressions might span several tokens. | DKPro Core (UIMA) |
SharpSNormalizer | Takes a text and replaces sharp s | DKPro Core (UIMA) |
SpellingNormalizer | Converts annotations of the type SpellingAnomaly into a SofaChangeAnnotation. | DKPro Core (UIMA) |
StanfordPtbTransformer | Uses the normalizing tokenizer of the Stanford CoreNLP tools to escape the text PTB-style. | DKPro Core (UIMA) |
TokenCaseTransformer | Change tokens to follow a specific casing: all upper case, all lower case, or 'normal case': lowercase everything but the first character of a token and the characters immediately following a hyphen. | DKPro Core (UIMA) |
Tweet Normaliser | Normalise text in tweets (converts spelling mistakes, colloquialisms, typing variations and so on into standard English) | GATE |
UmlautNormalizer | Takes a text and checks for umlauts written as "ae", "oe", or "ue" and normalizes them if they really are umlauts depending on a frequency model. | DKPro Core (UIMA) |
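
The 'normal case' rule described above for the token case transformer (lowercase everything but the first character of a token and the characters immediately following a hyphen) can be sketched in a few lines of Python. This is only an illustration of the rule as worded in the description, not the DKPro Core code; the names `normal_case` and `transform_case` and the mode strings `"upper"`, `"lower"` and `"normal"` are hypothetical.

```python
def normal_case(token: str) -> str:
    """Lowercase every character except the first one and any character
    that immediately follows a hyphen; those keep their original case."""
    out = []
    for i, ch in enumerate(token):
        keep_as_is = i == 0 or token[i - 1] == "-"
        out.append(ch if keep_as_is else ch.lower())
    return "".join(out)


def transform_case(tokens, mode="normal"):
    """Apply one of three casings to a list of tokens.

    The mode names are assumptions chosen for this sketch.
    """
    if mode == "upper":
        return [t.upper() for t in tokens]
    if mode == "lower":
        return [t.lower() for t in tokens]
    if mode == "normal":
        return [normal_case(t) for t in tokens]
    raise ValueError(f"unknown mode: {mode}")


example = transform_case(["NEW-YORK", "TIMES"], mode="normal")
```

Note that under this reading the first character is left untouched rather than forced to upper case, so a token that starts with a lowercase letter stays that way.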
Parser (24)
Component | Description | Framework |
---|---|---|
Berkeley Parser annotator . |
DKPro Core (UIMA) |
|
Syntax parsing with CCG Parser. |
AlvisNLP |
|
Clear parser annotator. |
DKPro Core (UIMA) |
|
Ready-made application for Stanford English parser |
GATE |
|
Ready-made application for Stanford English POS tagger and parser |
GATE |
|
A syntactic parser for English. |
NaCTeM (UIMA) |
|
Parses sentences with the ENJU dependency parser. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Performs tokenisation, lemmatisation, POS tagging and shallow parsing (chunking). |
NaCTeM (UIMA) |
|
A dependency parser for biomedical text. |
NaCTeM (UIMA) |
|
ILSP Dependency Parser is a tool trained on the Greek Dependency Treebank (Prokopidis et al., 2005), a resource which comprises data annotated at several linguistic levels. |
ILSP (UIMA) |
|
Dependency parsing using MaltParser. |
DKPro Core (UIMA) |
|
DKPro Annotator for the MateToolsParser. |
DKPro Core (UIMA) |
|
MiniPar is a shallow parser. |
GATE |
|
Dependency parsing using MSTParser. |
DKPro Core (UIMA) |
|
Syntactic parser from Apache OpenNLP |
GATE |
|
Parse the document and create phrasal and clausal annotations over the text. |
NaCTeM (UIMA) |
|
OpenNLP parser. |
DKPro Core (UIMA) |
|
RASP dependency parser |
GATE |
|
Generates Stanford-style dependencies together with POS tokens for English. |
NaCTeM (UIMA) |
|
Converts a constituency structure into a dependency structure. |
DKPro Core (UIMA) |
|
Stanford parser wrapper |
GATE |
|
Stanford Parser component. |
DKPro Core (UIMA) |
|
Textalytics Lemmatization, PoS and Parsing |
GATE |
Pre-built Workflows (12)
Component | Description | Framework |
---|---|---|
Ready-made Arabic IE application |
GATE |
|
Ready-made Cebuano IE application |
GATE |
|
Ready-made Chinese IE application |
GATE |
|
Ready-made French IE application |
GATE |
|
Ready-made German IE application |
GATE |
|
Ready-made application for measurement annotator |
GATE |
|
Ready-made Romanian IE application |
GATE |
|
Basic version of the RussIE application |
GATE |
|
RussIE application with orthomatcher and inflexional gazetteer |
GATE |
|
RussIE application with inflexional gazetteer |
GATE |
|
RussIE application with orthomatcher |
GATE |
|
English TwitIE application |
GATE |
Readability (1)
Component | Description | Framework |
---|---|---|
Assign a set of popular readability scores to the text. |
DKPro Core (UIMA) |
SRL (2)
Component | Description | Framework |
---|---|---|
ClearNLP semantic role labeller. |
DKPro Core (UIMA) |
|
DKPro Annotator for the MateTools Semantic Role Labeler. |
DKPro Core (UIMA) |
Scripted analytics (6)
Component | Description | Framework |
---|---|---|
Runs a Groovy script as a processing resource |
GATE |
|
A module for executing Jape grammars. |
GATE |
|
An optimised, JAPE-compatible transducer. |
GATE |
|
Runs a Prolog program with the corpus data structure encoded as facts. |
AlvisNLP |
|
Runs a script. |
AlvisNLP |
|
Wrapper for a Text Analysis Engine from UIMA. |
GATE |
Segmenter (55)
Component | Description | Framework |
---|---|---|
A customisable English tokeniser. |
GATE |
|
ANNIE sentence splitter. |
GATE |
|
A customisable English tokeniser. |
GATE |
|
ArkTweet tokenizer. |
DKPro Core (UIMA) |
|
Tokens returned by this class consist primarily of contiguous alphanumeric characters or single punctuation marks; however, certain constructs such as real numbers and percentages are recognized and returned as a single token. |
NaCTeM (UIMA) |
|
Tokens output by this tokenizer consist of a contiguous block of alphanumeric characters or a single punctuation mark. |
NaCTeM (UIMA) |
|
Instances of this class tokenize Sentences only at whitespace characters. |
NaCTeM (UIMA) |
|
BreakIterator segmenter. |
DKPro Core (UIMA) |
|
Uses a set of heuristics and patterns to find sentence boundaries. |
NaCTeM (UIMA) |
|
Split up existing tokens again if they are camel-case text. |
DKPro Core (UIMA) |
|
A list lookup component. |
GATE |
|
A customisable English tokeniser. |
GATE |
|
Segment the Chinese text into words, based on the PAUM learning algorithm. |
GATE |
|
Tokenizer using Clear NLP. |
DKPro Core (UIMA) |
|
Annotates compound parts and linking morphemes. |
DKPro Core (UIMA) |
|
Performs tokenisation. |
NaCTeM (UIMA) |
|
Performs tokenisation. |
NaCTeM (UIMA) |
|
A customisable Unicode tokeniser. |
GATE |
|
A processing resource that takes document and corpus parameters |
GATE |
|
Machine learning-based sentence splitter optimized for biomedical texts. |
NaCTeM (UIMA) |
|
Tokenizes Multi-Word Hashtags |
GATE |
|
A Sentence Splitter. |
GATE |
|
A customisable Hindi tokeniser. |
GATE |
|
ILSP Paragraph, Sentence and Token Segmentor: a regex- and abbreviation-based segmentor targeting texts written in Greek. |
ILSP (UIMA) |
|
Performs paragraph splitting, sentence splitting, and tokenisation. |
NaCTeM (UIMA) |
|
JTok segmenter. |
DKPro Core (UIMA) |
|
Segmenter using LanguageTool to do the heavy lifting. |
DKPro Core (UIMA) |
|
Annotates each line in the source text as a sentence. |
DKPro Core (UIMA) |
|
Sentence splitter based on LingPipe models. |
NaCTeM (UIMA) |
|
Provides an interface to LingPipe sentence splitter API. |
GATE |
|
Provides a LingPipe tokenizer. |
GATE |
|
Tokenises Maltese text |
NaCTeM (UIMA) |
|
Identifies the paragraphs in the text, creating a Paragraph annotation for each one |
NaCTeM (UIMA) |
|
Identifies the sentences in the text, creating a Sentence annotation for each |
NaCTeM (UIMA) |
|
Segments text into tokens. |
NaCTeM (UIMA) |
|
Tokenizes section contents according to the Ogmios tokenizer specifications. |
AlvisNLP |
|
Sentence splitter using an OpenNLP maxent model |
GATE |
|
Tokenizer using an OpenNLP maxent model |
GATE |
|
Tokenize the text and create token annotations that span the tokens. |
NaCTeM (UIMA) |
|
Tokenizer and sentence splitter using OpenNLP. |
DKPro Core (UIMA) |
|
This class creates paragraph annotations for the given input document. |
DKPro Core (UIMA) |
|
Split up existing tokens again at particular split-chars. |
DKPro Core (UIMA) |
|
Tokenizer for biomedical text |
GATE |
|
RASP2 Tokenizer. |
GATE |
|
A sentence splitter based on regular expressions. |
GATE |
|
This segmenter splits sentences and tokens based on regular expressions that define the sentence and token boundaries. |
DKPro Core (UIMA) |
|
A customisable Romanian tokeniser. |
GATE |
|
Stanford Penn Treebank v3 Tokenizer, for English |
GATE |
|
No description |
DKPro Core (UIMA) |
|
Merges any Tokens that are covered by a given annotation type. |
DKPro Core (UIMA) |
|
Remove prefixes and suffixes from tokens. |
DKPro Core (UIMA) |
|
Removes trailing characters or character sequences from tokens, e.g. punctuation. |
DKPro Core (UIMA) |
|
Tokenizer tuned for Tweets |
GATE |
|
No description |
NaCTeM (UIMA) |
|
A strict whitespace tokenizer, i.e. tokenizes according to whitespaces and linebreaks only. |
DKPro Core (UIMA) |
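Two of the segmenter behaviours listed above, strict whitespace tokenisation and re-splitting of camel-case tokens, can be sketched as follows. This is an illustrative sketch only: the `Span` tuple and both function names are assumptions and do not correspond to any catalogued component's API, and the camel-case boundary rule (a lowercase letter followed by an uppercase letter) is one plausible reading of the description.

```python
import re
from typing import List, Tuple

# (begin offset, end offset, covered text); a hypothetical stand-in for a token annotation
Span = Tuple[int, int, str]


def whitespace_tokenize(text: str) -> List[Span]:
    """Split only at whitespace and line breaks, keeping character offsets."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


def split_camel_case(token: Span) -> List[Span]:
    """Split an existing token wherever a lowercase letter is followed by an uppercase one."""
    begin, _, covered = token
    cuts = [m.start() for m in re.finditer(r"(?<=[a-z])(?=[A-Z])", covered)]
    bounds = [0] + cuts + [len(covered)]
    return [
        (begin + start, begin + end, covered[start:end])
        for start, end in zip(bounds, bounds[1:])
    ]


if __name__ == "__main__":
    for tok in whitespace_tokenize("call getFooBar now"):
        print(split_camel_case(tok))
```

Keeping offsets relative to the original text means the sub-tokens can replace the original token without shifting any other annotation.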
Semantics (2)
Component | Description | Framework |
---|---|---|
The Semantic Enrichment PR allows adding new data to semantic annotations by querying external RDF (Linked Data) repositories. |
GATE |
|
This Analysis Engine annotates English single words with semantic field information retrieved from an ExternalResource. |
DKPro Core (UIMA) |
Spelling/Grammar (5)
Component | Description | Framework |
---|---|---|
This component assumes that some spell checker has already been applied upstream (e.g. |
DKPro Core (UIMA) |
|
This annotator uses Jazzy for the decision whether a word is spelled correctly or not. |
DKPro Core (UIMA) |
|
Detects grammatical errors in text using LanguageTool, a rule-based grammar checker. |
DKPro Core (UIMA) |
|
Creates SofaChangeAnnotations containing corrections for previously identified spelling errors. |
DKPro Core (UIMA) |
|
Textalytics Spell, Grammar and Style Proofreading |
GATE |
Stemmer (4)
Component | Description | Framework |
---|---|---|
This plugin is an implementation of the BulStem stemmer algorithm for Bulgarian developed by Preslav Nakov. |
GATE |
|
synopsis |
AlvisNLP |
|
UIMA wrapper for the Snowball stemmer. |
DKPro Core (UIMA) |
|
Wrapper for the Snowball stemmer. |
GATE |
Tagger (52)
Component | Description | Framework |
---|---|---|
GATE wrapper over ABNER |
GATE |
|
Mark Hepple's Brill-style POS tagger |
GATE |
|
Tags anatomical entities using Brown, UMLS and OBO Anatomy dictionary features |
NaCTeM (UIMA) |
|
Wrapper for Twitter Tokenizer and POS Tagger. |
DKPro Core (UIMA) |
|
A UIMA wrapper for BANNER entity tagger. |
NaCTeM (UIMA) |
|
Tags Gene mentions using a model trained on BioCreative GM task data, with Entrez Gene and UMLS dictionary features. |
NaCTeM (UIMA) |
|
Applies the CCG POS tagger on annotations. |
AlvisNLP |
|
Uses Conditional Random Fields model for labeling. |
NaCTeM (UIMA) |
|
Mark Hepple's Brill-style POS tagger, adapted for languages where entries are multiword |
GATE |
|
A tagger for chemical names. |
GATE |
|
Part-of-Speech annotator using Clear NLP. |
DKPro Core (UIMA) |
|
Performs tokenisation, lemmatisation and POS tagging. |
NaCTeM (UIMA) |
|
Tags biological named entities: proteins, cell lines, cell types, DNAs, and RNAs. |
NaCTeM (UIMA) |
|
The Generic Tagger is Generic! |
GATE |
|
Runs Genia Tagger on annotations. |
AlvisNLP |
|
Mark Hepple's POS tagger, from dragontools/Banner toolkit. |
NaCTeM (UIMA) |
|
GATE Hepple part-of-speech tagger. |
DKPro Core (UIMA) |
|
Mark Hepple's Brill-style POS tagger, adapted for languages where entries are multiword |
GATE |
|
Part-of-Speech annotator using HunPos. |
DKPro Core (UIMA) |
|
ILSP FBT Tagger is an adaptation of the Brill tagger trained on Greek text. |
ILSP (UIMA) |
|
Performs paragraph splitting, sentence splitting, tokenisation and POS tagging. |
NaCTeM (UIMA) |
|
Provides a LingPipe part of speech tagger. |
GATE |
|
DKPro Annotator for the MateToolsMorphTagger. |
DKPro Core (UIMA) |
|
DKPro Annotator for the MateToolsPosTagger |
DKPro Core (UIMA) |
|
Annotator for the MeCab Japanese POS Tagger. |
DKPro Core (UIMA) |
|
A measurement tagger based upon GNU Units |
GATE |
|
A tagger that recognises mentions of medical conditions. |
NaCTeM (UIMA) |
|
A processing resource that takes document and corpus parameters |
GATE |
|
Finds numbers (in both words and digits) and annotates them with their numeric value |
GATE |
|
An OpenCalais based semantic annotator |
GATE |
|
POS Tagger using an OpenNLP maxent model |
GATE |
|
Part-of-Speech annotator using OpenNLP. |
DKPro Core (UIMA) |
|
Map complex Russian morphology tags into simpler POS categories |
GATE |
|
Ready-made application for the Penn BioTagger |
GATE |
|
Penn BioTagger for Genes |
GATE |
|
Penn BioTagger for malignancy types |
GATE |
|
Penn BioTagger for variations |
GATE |
|
Maps existing POS tags from one tagset to another using a user provided properties file. |
DKPro Core (UIMA) |
|
Converts from PennTreebank POS tags to the C2 tagset used by RASP. |
GATE |
|
RASP part-of-speech tagger, creating WordForm annotations |
GATE |
|
RfTagger morphological analyzer. |
DKPro Core (UIMA) |
|
Finds and annotates Roman numerals |
GATE |
|
Part-of-speech tagger for Russian |
GATE |
|
Applies an SVMLight-trained model on instances. |
NaCTeM (UIMA) |
|
Tags species |
NaCTeM (UIMA) |
|
Stanford Part-of-Speech Tagger |
GATE |
|
Stanford Part-of-Speech tagger component. |
DKPro Core (UIMA) |
|
No description |
NaCTeM (UIMA) |
|
Runs tree-tagger. |
AlvisNLP |
|
Part-of-Speech and lemmatizer annotator using TreeTagger. |
DKPro Core (UIMA) |
|
Stanford POS tagger trained on Tweets |
GATE |
|
Carries out sentence splitting, tokenisation, POS tagging and lemmatisation on plain text. |
NaCTeM (UIMA) |
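The tagset-mapping entry above (mapping POS tags from one tagset to another through a user-provided properties file) can be illustrated with a small sketch. The function names and the example tags are hypothetical, and the parser handles only plain `key=value` lines with `#` or `!` comments, which is a simplification of the full Java properties format.

```python
from typing import Dict, Iterable, List


def load_tag_mapping(properties_text: str) -> Dict[str, str]:
    """Parse simple key=value lines; blank lines and comment lines are skipped."""
    mapping: Dict[str, str] = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            mapping[key.strip()] = value.strip()
    return mapping


def map_tags(tags: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each tag by its mapped value; unmapped tags are passed through unchanged."""
    return [mapping.get(tag, tag) for tag in tags]


if __name__ == "__main__":
    # Illustrative mapping only; not taken from any shipped tagset file.
    example = "# source tag = target tag\nNN=NOUN\nVBZ = VERB\n"
    print(map_tags(["NN", "VBZ", "XX"], load_tag_mapping(example)))
```

Passing unmapped tags through unchanged is a design choice of this sketch; a real component might instead fall back to a default tag or report an error.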
Topics (3)
Component | Description | Framework |
---|---|---|
Estimate an LDA topic model using Mallet and write it to a file. |
DKPro Core (UIMA) |
|
Infers the topic distribution over documents using a Mallet ParallelTopicModel. |
DKPro Core (UIMA) |
|
Textalytics Topics Extraction |
GATE |
Validation (1)
Component | Description | Framework |
---|---|---|
Produces an annotation set whose content is restricted by the specified set of schemas |
GATE |
Viewer/Editor (18)
Component | Description | Framework |
---|---|---|
Editor for compound documents. |
GATE |
|
Ontology editing tool. |
GATE |
|
Gazetteer viewer and editor |
GATE |
|
Gazetteer viewer and editor. |
GATE |
|
A JAPE grammar file viewer |
GATE |
|
A JAPE grammar file viewer |
GATE |
|
Ontology Annotation Tool. |
GATE |
|
Viewer for the TermRaider Pairbank |
GATE |
|
Relation Annotation Tool Class view. |
GATE |
|
Relation Annotation Tool Instance view. |
GATE |
|
An annotation editor restricted by schemas. |
GATE |
|
Editor for the Groovy script behind this PR |
GATE |
|
Starts an interactive shell for querying the corpus data structure. |
AlvisNLP |
|
Starts an interactive shell for querying the corpus data structure. |
AlvisNLP |
|
A Simple Annotation Schema Viewer |
GATE |
|
Viewer for syntax trees generated by a parser. |
GATE |
|
Viewer for the TermRaider Termbank |
GATE |
|
WordNet viewer |
GATE |
Analytics by product
(original) AlvisNLP (52)
The components listed here could not be associated with a known third-party tool collection and are assumed to be original components.
Component | Description | Framework |
---|---|---|
synopsis |
AlvisNLP |
|
Applies action expressions on selected elements. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Creates tuples with a common argument. |
AlvisNLP |
|
Biotopes-specific module: chooses an antecedent. |
AlvisNLP |
|
Tests an assertion on specified elements. |
AlvisNLP |
|
Projects a list of terms given in tree-tagger format. |
AlvisNLP |
|
Creates tuples for each element of a Cartesian product. |
AlvisNLP |
|
Compares two sets of elements. |
AlvisNLP |
|
Disambiguate features that have multiple values. |
AlvisNLP |
|
Maps elements according to a collection of mapping elements. |
AlvisNLP |
|
Searches for entries in a dictionary generated by an expression. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Maps the value of an annotation feature according to a mapping file. |
AlvisNLP |
|
Maps elements according to a tab-separated mapping file. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Selects most relevant keywords in documents. |
AlvisNLP |
|
Compares annotations in two different layers. |
AlvisNLP |
|
Creates a new layer in each section containing all annotations in source layers. |
AlvisNLP |
|
Merge several sections into a single one. |
AlvisNLP |
|
Computes annotation n-grams. |
AlvisNLP |
|
Counts element occurrences and writes the results in a file, including tfidf. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Projects OBO terms and synonyms on sections. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Matches a regular expression-like pattern on the sequence of annotations in a given layer. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Matches a regular expression on section contents and creates an annotation for each match. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Removes duplicate elements. |
AlvisNLP |
|
Removes overlapping annotations from a given layer. |
AlvisNLP |
|
Runs a Prolog program with the corpus data structure encoded as facts. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Runs a script. |
AlvisNLP |
|
Detects sentence boundaries and creates one annotation for each sentence. This module assumes WoSMig processed the same sections. |
AlvisNLP |
|
Searches for discriminating attributes with Weka. |
AlvisNLP |
|
Sequence of modules. |
AlvisNLP |
|
Starts an interactive shell for querying the corpus data structure. |
AlvisNLP |
|
Starts an interactive shell for querying the corpus data structure. |
AlvisNLP |
|
Projects a simple dictionary on sections. |
AlvisNLP |
|
Projects a simple dictionary on sections. |
AlvisNLP |
|
Splits overlapping annotations. |
AlvisNLP |
|
Classifies elements with a Weka classifier. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Trains a Weka classifier where examples are elements. |
AlvisNLP |
|
Projects terms from a TiDI export. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Performs word segmentation on section contents. |
AlvisNLP |
(original) DKPro Core (UIMA) (52)
The components listed here could not be associated with a known third-party tool collection and are assumed to be original components.
Component | Description | Framework |
---|---|---|
Removes annotations that do not conform to minimum or maximum length constraints. |
DKPro Core (UIMA) |
|
Reads a list of words from a text file (one token per line) and retains only tokens or other annotations that match any of these words. |
DKPro Core (UIMA) |
|
Applies changes annotated using a SofaChangeAnnotation. |
DKPro Core (UIMA) |
|
Descriptor automatically generated by uimaFIT |
DKPro Core (UIMA) |
|
After processing a file with the ApplyChangesAnnotator this annotator can be used to map the annotations created in the cleaned view back to the original view. |
DKPro Core (UIMA) |
|
Berkeley Parser annotator. |
DKPro Core (UIMA) |
|
Split up existing tokens again if they are camel-case text. |
DKPro Core (UIMA) |
|
Takes a text and replaces wrong capitalization |
DKPro Core (UIMA) |
|
Cologne phonetic (Kölner Phonetik) transcription based on Apache Commons Codec. |
DKPro Core (UIMA) |
|
Annotates compound parts and linking morphemes. |
DKPro Core (UIMA) |
|
This component assumes that some spell checker has already been applied upstream (e.g. |
DKPro Core (UIMA) |
|
Dump dependencies to screen. |
DKPro Core (UIMA) |
|
Takes a plain text file with phrases as input and annotates the phrases in the CAS file. |
DKPro Core (UIMA) |
|
Reads a tab-separated file containing mappings from one token to another. |
DKPro Core (UIMA) |
|
Removes fields from the document meta data which may be different depending on the machine a test is run on. |
DKPro Core (UIMA) |
|
Double-Metaphone phonetic transcription based on Apache Commons Codec. |
DKPro Core (UIMA) |
|
Takes a text and shortens extra long words |
DKPro Core (UIMA) |
|
Replaces all tokens that are listed in the file in #PARAM_MODEL_LOCATION by the string specified in #PARAM_REPLACEMENT. |
DKPro Core (UIMA) |
|
Wrapper for the GATE rule based lemmatizer. |
DKPro Core (UIMA) |
|
Annotator to be used for post-processing of German corpora that have been lemmatized and POS-tagged with the TreeTagger, based on the STTS tagset. |
DKPro Core (UIMA) |
|
Simple dictionary-based hyphenation remover. |
DKPro Core (UIMA) |
|
Descriptor automatically generated by uimaFIT |
DKPro Core (UIMA) |
|
Utility analysis engine for use with CAS multipliers in uimaFIT pipelines. |
DKPro Core (UIMA) |
|
Annotates each line in the source text as a sentence. |
DKPro Core (UIMA) |
|
Estimate an LDA topic model using Mallet and write it to a file. |
DKPro Core (UIMA) |
|
DKPro Annotator for the MateToolsParser. |
DKPro Core (UIMA) |
|
Metaphone phonetic transcription based on Apache Commons Codec. |
DKPro Core (UIMA) |
|
Dependency parsing using MSTParser. |
DKPro Core (UIMA) |
|
N-gram annotator. |
DKPro Core (UIMA) |
|
Creates SofaChangeAnnotations containing corrections for previously identified spelling errors. |
DKPro Core (UIMA) |
|
This class creates paragraph annotations for the given input document. |
DKPro Core (UIMA) |
|
Split up existing tokens again at particular split-chars. |
DKPro Core (UIMA) |
|
Removes all tokens/lemmas/stems/POS tags (depending on the "Mode" setting) that do not match the given parts of speech. |
DKPro Core (UIMA) |
|
Maps existing POS tags from one tagset to another using a user provided properties file. |
DKPro Core (UIMA) |
|
Assign a set of popular readability scores to the text. |
DKPro Core (UIMA) |
|
A JCasTransformerChangeBased_ImplBase implementation that replaces tokens based on regular expressions. |
DKPro Core (UIMA) |
|
Remove every token that does or does not match a given regular expression. |
DKPro Core (UIMA) |
|
This segmenter splits sentences and tokens based on regular expressions that define the sentence and token boundaries. |
DKPro Core (UIMA) |
|
Takes a text and replaces desired expressions. This class should not work on tokens, as some expressions might span several tokens. |
DKPro Core (UIMA) |
|
Takes a text and replaces sharp s |
DKPro Core (UIMA) |
|
Soundex phonetic transcription based on Apache Commons Codec. |
DKPro Core (UIMA) |
|
Converts annotations of the type SpellingAnomaly into a SofaChangeAnnotation. |
DKPro Core (UIMA) |
|
Remove all of the specified types from the CAS if their covered text is in the stop word dictionary. |
DKPro Core (UIMA) |
|
Can be used to measure how long the processing between two points in a pipeline takes. |
DKPro Core (UIMA) |
|
No description |
DKPro Core (UIMA) |
|
This component adds Tfidf annotations consisting of a term and a tfidf weight. |
DKPro Core (UIMA) |
|
Change tokens to follow a specific casing: all upper case, all lower case, or 'normal case': lowercase everything but the first character of a token and the characters immediately following a hyphen. |
DKPro Core (UIMA) |
|
Merges any Tokens that are covered by a given annotation type. |
DKPro Core (UIMA) |
|
Remove prefixes and suffixes from tokens. |
DKPro Core (UIMA) |
|
Removes trailing characters or character sequences from tokens, e.g. punctuation. |
DKPro Core (UIMA) |
|
Takes a text and checks for umlauts written as "ae", "oe", or "ue" and normalizes them if they really are umlauts depending on a frequency model. |
DKPro Core (UIMA) |
|
A strict whitespace tokenizer, i.e. tokenizes according to whitespaces and linebreaks only. |
DKPro Core (UIMA) |
(original) GATE (135)
The components listed here could not be associated with a known third-party tool collection and are assumed to be original components.
Component | Description | Framework |
---|---|---|
A customisable English tokeniser. |
GATE |
|
A list lookup component. |
GATE |
|
ANNIE named entity grammar. |
GATE |
|
Nominal Coreference resolution component |
GATE |
|
ANNIE orthographical coreference component. |
GATE |
|
Pronominal Coreference resolution component. |
GATE |
|
ANNIE sentence splitter. |
GATE |
|
ANNIE VP Chunker component. |
GATE |
|
Ready-made application for ANNIE plus the measurement tagger |
GATE |
|
Merge Annotations from different annotators. |
GATE |
|
Annotation set transfer component. |
GATE |
|
A list lookup component. |
GATE |
|
No description |
GATE |
|
Ready-made Arabic IE application |
GATE |
|
A list lookup component. |
GATE |
|
A module for executing Jape grammars. |
GATE |
|
ANNIE orthographical coreference component. |
GATE |
|
A customisable English tokeniser. |
GATE |
|
Compute BDM score for each pair of concepts in the given ontology. |
GATE |
|
Supports training, application and evaluation of machine learning models for NLP tasks |
GATE |
|
Uses boilerpipe to determine which sections of a document are interesting content and which are just boilerplate |
GATE |
|
Populate a corpus from CSV files |
GATE |
|
A list lookup component. |
GATE |
|
A list lookup component. |
GATE |
|
Ready-made Cebuano IE application |
GATE |
|
A customisable English tokeniser. |
GATE |
|
A module for executing Jape grammars. |
GATE |
|
A module for executing Jape grammars. |
GATE |
|
A tagger for chemical names. |
GATE |
|
Ready-made Chinese IE application |
GATE |
|
Segment the Chinese text into words, based on the PAUM learning algorithm. |
GATE |
|
Combines documents in a composite document. |
GATE |
|
GATE Compound Document. |
GATE |
|
Editor for compound documents. |
GATE |
|
GATE Compound Document. |
GATE |
|
Connect to a repository containing an ontology |
GATE |
|
Editor for the Groovy script controlling a scriptable controller |
GATE |
|
Copy the annotations from one document to another document. |
GATE |
|
No description |
GATE |
|
GATE implementation of the Websphinx crawling API |
GATE |
|
Create an ontology from a Sesame configuration file for a repository |
GATE |
|
Provides normalized values for all existing date annotations |
GATE |
|
Provides normalized values for all known dates |
GATE |
|
Deletes one member document from a compound doc. |
GATE |
|
Remove named annotation sets or reset the default annotation set |
GATE |
|
Normalize document content to remove "smart quotes" etc. |
GATE |
|
Document frequency counter derived from corpora and other DFBs |
GATE |
|
Warns whenever an AWT component is updated from anywhere other than the event dispatch thread |
GATE |
|
A more flexible list lookup component. |
GATE |
|
Ready-made French IE application |
GATE |
|
GATE Composite document. |
GATE |
|
Morphological Analyzer for the English Language. |
GATE |
|
Ontology editing tool. |
GATE |
|
A customisable Unicode tokeniser. |
GATE |
|
Gazetteer viewer and editor |
GATE |
|
Gazetteer viewer and editor. |
GATE |
|
Gazetteer lists collector. |
GATE |
|
The Generic Tagger is Generic! |
GATE |
|
Ready-made German IE application |
GATE |
|
Runs a Groovy script as a processing resource |
GATE |
|
No description |
GATE |
|
A list lookup component implemented by OntoText Lab. |
GATE |
|
Tokenizes Multi-Word Hashtags |
GATE |
|
A list lookup component. |
GATE |
|
A module for executing Jape grammars |
GATE |
|
Hindi Orthomatcher |
GATE |
|
A Sentence Splitter. |
GATE |
|
A customisable Hindi tokeniser. |
GATE |
|
A list lookup component. |
GATE |
|
A module for executing Jape grammars |
GATE |
|
Compute inter-annotator agreement (IAA). |
GATE |
|
Gazetteer with support for inflectional morphology |
GATE |
|
A module for executing Jape grammars. |
GATE |
|
An optimised, JAPE-compatible transducer. |
GATE |
|
A JAPE grammar file viewer |
GATE |
|
A JAPE grammar file viewer |
GATE |
|
Dumps the Java heap to the specified file |
GATE |
|
KIM KB-based alias-lookup component |
GATE |
|
A processing resource that takes document and corpus parameters |
GATE |
|
Example application for the linguistic simplifier |
GATE |
|
Allows the Log4J log level to be set to ALL from within the GUI |
GATE |
|
Trains a machine learning algorithm from a corpus. |
GATE |
|
Process results of a crowd annotation task to find where annotators agree and disagree. |
GATE |
|
Process results of a crowd annotation task to find where annotators agree and disagree. |
GATE |
|
A measurement tagger based upon GNU Units |
GATE |
|
Ready-made application for measurement annotator |
GATE |
|
This plugin uses the MetaMap Java API to send GATE document content to MetaMap skrmedpostctl server and PrologBeans mmserver instances running on the given machine/port |
GATE |
|
Ready-made NP chunking application |
GATE |
|
Implementation of the Ramshaw and Marcus base noun phrase chunker |
GATE |
|
Finds numbers (in both words and digits) and annotates them with their numeric value |
GATE |
|
Ontology Annotation Tool. |
GATE |
|
Ontology created as a temporary OWLIM3 in-memory repository |
GATE |
|
Ontology created as a temporary OWLIM3 in-memory repository, for backwards compatibility only |
GATE |
|
An ontology lookup component |
GATE |
|
A list lookup component based on mapping between ontology classes and gazetteer lists. |
GATE |
|
An orthographic coreferencer |
GATE |
|
Pointwise Mutual Information from corpora |
GATE |
|
Example application for the PMI (pointwise mutual information) tool |
GATE |
|
Map complex Russian morphology tags into simpler POS categories |
GATE |
|
The Quality Assurance PR provides the functionality of the Corpus QA Tool in GATE Developer |
GATE |
|
Relation Annotation Tool Class view. |
GATE |
|
Relation Annotation Tool Instance view. |
GATE |
|
A sentence splitter based on regular expressions. |
GATE |
|
Finds and annotates Roman numerals |
GATE |
|
A list lookup component. |
GATE |
|
Ready-made Romanian IE application |
GATE |
|
A customisable Romanian tokeniser. |
GATE |
|
A module for executing Jape grammars |
GATE |
|
Basic version of the RussIE application |
GATE |
|
RussIE application with orthomatcher and inflexional gazetteer |
GATE |
|
RussIE application with inflexional gazetteer |
GATE |
|
RussIE application with orthomatcher |
GATE |
|
Customised version of the hash gazetteer |
GATE |
|
Part-of-speech tagger for Russian |
GATE |
|
An annotation editor restricted by schemas. |
GATE |
|
Produces an annotation set whose content is restricted by the specified set of schemas |
GATE |
|
Editor for the Groovy script behind this PR |
GATE |
|
A controller whose execution strategy is controlled by a Groovy script |
GATE |
|
Viewer for IR search results |
GATE |
|
Provides IR functionality. |
GATE |
|
Processes individual segments as separate documents |
GATE |
|
The Semantic Enrichment PR allows adding new data to semantic annotations by querying external RDF (Linked Data) repositories. |
GATE |
|
A list lookup component. |
GATE |
|
Show resources that would otherwise be hidden, e.g. resources created for internal use by other resources |
GATE |
|
A Simple Annotation Schema Viewer |
GATE |
|
Sets the focus of a compound document to a specified member document. |
GATE |
|
Viewer for syntax trees generated by a parser. |
GATE |
|
Copy scores from Termbanks back to their source annotations |
GATE |
|
Classify text based on a semantic space |
GATE |
|
Duplicate any resource with a right click menu option |
GATE |
|
Normalises text in tweets (converts spelling mistakes, colloquialisms, typing variations and so on into standard English). |
GATE |
|
English TwitIE application |
GATE |
|
Tokenizer tuned for Tweets |
GATE |
|
Wrapper for a Text Analysis Engine from UIMA. |
GATE |
|
Unloads all plugins for which we cannot find any loaded instances |
GATE |
(original) ILSP (UIMA) (5)
The components listed here could not be associated with a known third-party tool collection and are assumed to be original components.
Component | Description | Framework |
---|---|---|
No description |
ILSP (UIMA) |
|
ILSP FBT Tagger is an adaptation of the Brill tagger trained on Greek text. |
ILSP (UIMA) |
|
ILSP Lemmatizer assigns lemmas to tokens from Greek texts. |
ILSP (UIMA) |
|
This module uses a Maximum Entropy NER engine focusing on news text in Greek (EL) or English (EN). |
ILSP (UIMA) |
|
ILSP Paragraph, Sentence and Token Segmentor: a regex- and abbreviation-based segmentor targeting texts written in Greek. |
ILSP (UIMA) |
(original) NaCTeM (UIMA) (18)
The components listed here could not be associated with a known third-party tool collection and are assumed to be original components.
Component | Description | Framework |
---|---|---|
Reports agreement on annotations coming from different views (sofas). |
NaCTeM (UIMA) |
|
Tags anatomical entities using Brown, UMLS and OBO Anatomy dictionary features |
NaCTeM (UIMA) |
|
Removes span-of-text annotations. |
NaCTeM (UIMA) |
|
Uses a set of heuristics and patterns to find sentence boundaries. |
NaCTeM (UIMA) |
|
Tests whether input tokens belong to an entry in the specified dictionary, using SecondString Soft TF/IDF. |
NaCTeM (UIMA) |
|
Generates a list of user-defined observations for each token. |
NaCTeM (UIMA) |
|
Uses the Keio service to fetch MEDLINE abstracts matching a specified query. |
NaCTeM (UIMA) |
|
A tagger that recognises mentions of medical conditions. |
NaCTeM (UIMA) |
|
No description |
NaCTeM (UIMA) |
|
Segments text into tokens. |
NaCTeM (UIMA) |
|
Runs Oscar 3 with maximum entropy based recogniser with syntactic tokens as input |
NaCTeM (UIMA) |
|
This reader performs the transformation of the CONLL tab separated text format to the CAS ConllDependency format. |
NaCTeM (UIMA) |
|
Reports annotation performance comparing views (sofas) to one selected reference view. |
NaCTeM (UIMA) |
|
Annotates spans of text based on a custom regular expression. |
NaCTeM (UIMA) |
|
Reads a corpus in BioNLP Shared Task format from a remote directory on a user-specified server via SFTP. |
NaCTeM (UIMA) |
|
No description |
NaCTeM (UIMA) |
|
Extracts Dictionary features from a UMLS-sourced dictionary |
NaCTeM (UIMA) |
|
This service annotates yeast metabolites with a supervised NER system using CRF. |
NaCTeM (UIMA) |
(service) AlchemyAPI (2)
Component | Description | Framework |
---|---|---|
Runs the AlchemyAPI Entity Extraction service on a GATE document |
GATE |
|
Runs the AlchemyAPI Keyword Extraction service on a GATE document |
GATE |
(service) CrowdFlower (3)
Component | Description | Framework |
---|---|---|
Build a CrowdFlower job asking users to annotate entities within a snippet of text |
GATE |
|
Build a CrowdFlower job asking users to select the right label for entities |
GATE |
|
Import judgments from a CrowdFlower job created by the Entity Classification Job Builder as GATE annotations. |
GATE |
(service) Lupedia (1)
Component | Description | Framework |
---|---|---|
Runs the Lupedia annotation service on a GATE document |
GATE |
(service) TextRazor (1)
Component | Description | Framework |
---|---|---|
Runs the TextRazor annotation service (http://textrazor.com) on a GATE document |
GATE |
(service) Textalytics (6)
Component | Description | Framework |
---|---|---|
Textalytics Language Identification |
GATE |
|
Textalytics Lemmatization, PoS and Parsing |
GATE |
|
Textalytics Sentiment Analysis |
GATE |
|
Textalytics Spell, Grammar and Style Proofreading |
GATE |
|
Textalytics Text Classification |
GATE |
|
Textalytics Topics Extraction |
GATE |
(service) UAIC (6)
Component | Description | Framework |
---|---|---|
No description |
NaCTeM (UIMA) |
|
Assigns base forms to tokenised text. |
NaCTeM (UIMA) |
|
Assigns base forms in Romanian text, given POS-tagged text. |
NaCTeM (UIMA) |
|
Splits texts into fragments |
NaCTeM (UIMA) |
|
No description |
NaCTeM (UIMA) |
|
Carries out sentence splitting, tokenisation, POS tagging and lemmatisation on plain text. |
NaCTeM (UIMA) |
ABNER (2)
Component | Description | Framework |
---|---|---|
Wraps the ABNER entity identification system into the UIMA framework. |
NaCTeM (UIMA) |
|
GATE wrapper over ABNER |
GATE |
Arktweet (2)
Component | Description | Framework |
---|---|---|
Wrapper for Twitter Tokenizer and POS Tagger. |
DKPro Core (UIMA) |
|
ArkTweet tokenizer. |
DKPro Core (UIMA) |
BANNER (5)
Component | Description | Framework |
---|---|---|
A UIMA wrapper for BANNER entity tagger. |
NaCTeM (UIMA) |
|
Tokens returned by this class consist primarily of contiguous alphanumeric characters or single punctuation marks; however, certain constructs such as real numbers and percentages are recognized and returned as a single token. |
NaCTeM (UIMA) |
|
Tokens output by this tokenizer consist of a contiguous block of alphanumeric characters or a single punctuation mark. |
NaCTeM (UIMA) |
|
Instances of this class tokenize Sentence objects only at whitespace characters. |
NaCTeM (UIMA) |
|
English lemmatiser which is adapted from WordNet. |
NaCTeM (UIMA) |
BioCreative (2)
Component | Description | Framework |
---|---|---|
Tags Gene mentions using a model trained on BioCreative GM task data, with Entrez Gene and UMLS dictionary features. |
NaCTeM (UIMA) |
|
A named entity recogniser capable of annotating names of chemicals, drugs and metabolites. |
NaCTeM (UIMA) |
BulStem (1)
Component | Description | Framework |
---|---|---|
This plugin is an implementation of the BulStem stemmer algorithm for Bulgarian developed by Preslav Nakov. |
GATE |
CCG (2)
Component | Description | Framework |
---|---|---|
Syntax parsing with CCG Parser. |
AlvisNLP |
|
Applies the CCG POS tagger on annotations. |
AlvisNLP |
CRF++ (2)
Component | Description | Framework |
---|---|---|
Uses Conditional Random Fields model for labeling. |
NaCTeM (UIMA) |
|
Produces a Conditional Random Fields model. |
NaCTeM (UIMA) |
Cjf (1)
Component | Description | Framework |
---|---|---|
Converts traditional Chinese to simplified Chinese or vice-versa. |
DKPro Core (UIMA) |
ClearNLP (5)
Component | Description | Framework |
---|---|---|
Lemmatizer using Clear NLP. |
DKPro Core (UIMA) |
|
Clear parser annotator. |
DKPro Core (UIMA) |
|
Part-of-Speech annotator using Clear NLP. |
DKPro Core (UIMA) |
|
Tokenizer using Clear NLP. |
DKPro Core (UIMA) |
|
ClearNLP semantic role labeller. |
DKPro Core (UIMA) |
EnjuParser (3)
Component | Description | Framework |
---|---|---|
A syntactic parser for English. |
NaCTeM (UIMA) |
|
Parses sentences with the ENJU dependency parser. |
AlvisNLP |
|
synopsis |
AlvisNLP |
FreeLing (5)
Component | Description | Framework |
---|---|---|
Performs tokenisation. |
NaCTeM (UIMA) |
|
Performs tokenisation, and determines possible lemmas and POS tags for each token, with confidence scores. |
NaCTeM (UIMA) |
|
Performs tokenisation, lemmatisation, POS tagging and shallow parsing (chunking). |
NaCTeM (UIMA) |
|
Performs tokenisation, lemmatisation and POS tagging. |
NaCTeM (UIMA) |
|
Performs tokenisation. |
NaCTeM (UIMA) |
GATE Hepple (5)
Component | Description | Framework |
---|---|---|
Mark Hepple's Brill-style POS tagger |
GATE |
|
Mark Hepple's Brill-style POS tagger, adapted for languages where entries are multiword |
GATE |
|
Mark Hepple's POS tagger, from dragontools/Banner toolkit. |
NaCTeM (UIMA) |
|
GATE Hepple part-of-speech tagger. |
DKPro Core (UIMA) |
|
Mark Hepple's Brill-style POS tagger, adapted for languages where entries are multiword |
GATE |
GENIA (5)
Component | Description | Framework |
---|---|---|
A dependency parser for biomedical text. |
NaCTeM (UIMA) |
|
A processing resource that takes document and corpus parameters |
GATE |
|
Machine learning-based sentence splitter optimized for biomedical texts. |
NaCTeM (UIMA) |
|
Tags biological named entities: proteins, cell lines, cell types, DNAs, and RNAs. |
NaCTeM (UIMA) |
|
Runs Genia Tagger on annotations. |
AlvisNLP |
IULA (2)
Component | Description | Framework |
---|---|---|
Performs paragraph splitting, sentence splitting, tokenisation and POS tagging. |
NaCTeM (UIMA) |
|
Performs paragraph splitting, sentence splitting, and tokenisation. |
NaCTeM (UIMA) |
Java BreakIterator (2)
Component | Description | Framework |
---|---|---|
Sentence breaker using the Sun Java API "BreakIterator". |
NaCTeM (UIMA) |
|
BreakIterator segmenter. |
DKPro Core (UIMA) |
Jazzy (1)
Component | Description | Framework |
---|---|---|
This annotator uses Jazzy for the decision whether a word is spelled correctly or not. |
DKPro Core (UIMA) |
LBJ (1)
Component | Description | Framework |
---|---|---|
A wrapper for the Illinois Named Entity Tagger |
NaCTeM (UIMA) |
Langdetect (1)
Component | Description | Framework |
---|---|---|
Langdetect language identifier based on character n-grams. |
DKPro Core (UIMA) |
LanguageTool (3)
Component | Description | Framework |
---|---|---|
Detects grammatical errors in text using LanguageTool, a rule-based grammar checker. |
DKPro Core (UIMA) |
|
Naive lexicon-based lemmatizer. |
DKPro Core (UIMA) |
|
Segmenter using LanguageTool to do the heavy lifting. |
DKPro Core (UIMA) |
LingPipe (6)
Component | Description | Framework |
---|---|---|
GATE PR for language identification using LingPipe |
GATE |
|
LingPipe Named Entity Recognizer |
GATE |
|
Provides a LingPipe part of speech tagger. |
GATE |
|
Sentence splitter based on LingPipe models. |
NaCTeM (UIMA) |
|
Provides an interface to LingPipe sentence splitter API. |
GATE |
|
Provides a LingPipe tokenizer. |
GATE |
MLRS (3)
Component | Description | Framework |
---|---|---|
Tokenises Maltese text |
NaCTeM (UIMA) |
|
Identifies the paragraphs in the text, creating a Paragraph annotation for each one |
NaCTeM (UIMA) |
|
Identifies the sentences in the text, creating a Sentence annotation for each |
NaCTeM (UIMA) |
Mallet (1)
Component | Description | Framework |
---|---|---|
Infers the topic distribution over documents using a Mallet ParallelTopicModel. |
DKPro Core (UIMA) |
MaltParser (2)
Component | Description | Framework |
---|---|---|
ILSP Dependency Parser is a tool trained on the Greek Dependency Treebank (Prokopidis et al., 2005), a resource which comprises data annotated at several linguistic levels. |
ILSP (UIMA) |
|
Dependency parsing using MaltParser. |
DKPro Core (UIMA) |
Mate Tools (4)
Component | Description | Framework |
---|---|---|
DKPro Annotator for the MateToolsLemmatizer. |
DKPro Core (UIMA) |
|
DKPro Annotator for the MateToolsMorphTagger. |
DKPro Core (UIMA) |
|
DKPro Annotator for the MateToolsPosTagger |
DKPro Core (UIMA) |
|
DKPro Annotator for the MateTools Semantic Role Labeler. |
DKPro Core (UIMA) |
MeCab (1)
Component | Description | Framework |
---|---|---|
Annotator for the MeCab Japanese POS Tagger. |
DKPro Core (UIMA) |
Morpha (1)
Component | Description | Framework |
---|---|---|
Lemmatize based on a finite-state machine. |
DKPro Core (UIMA) |
NormaGene (1)
Component | Description | Framework |
---|---|---|
A processing resource that takes document and corpus parameters |
GATE |
Ogmios (1)
Component | Description | Framework |
---|---|---|
Tokenizes section contents according to the Ogmios tokenizer specifications. |
AlvisNLP |
OpenNLP (15)
Component | Description | Framework |
---|---|---|
Chunker using an OpenNLP maxent model |
GATE |
|
NER PR using a set of OpenNLP maxent models |
GATE |
|
POS Tagger using an OpenNLP maxent model |
GATE |
|
Syntactic parser from Apache OpenNLP |
GATE |
|
Sentence splitter using an OpenNLP maxent model |
GATE |
|
Tokenizer using an OpenNLP maxent model |
GATE |
|
Detects named entities in text and creates corresponding entity annotations that span the found entities. |
NaCTeM (UIMA) |
|
Parse the document and create phrasal and clausal annotations over the text. |
NaCTeM (UIMA) |
|
Detect sentence boundaries and create sentence annotations that span these boundaries. |
NaCTeM (UIMA) |
|
Tokenize the text and create token annotations that span the tokens. |
NaCTeM (UIMA) |
|
Chunk annotator using OpenNLP. |
DKPro Core (UIMA) |
|
OpenNLP name finder wrapper. |
DKPro Core (UIMA) |
|
OpenNLP parser. |
DKPro Core (UIMA) |
|
Part-of-Speech annotator using OpenNLP. |
DKPro Core (UIMA) |
|
Tokenizer and sentence splitter using OpenNLP. |
DKPro Core (UIMA) |
Penn Bio-Tools (5)
Component | Description | Framework |
---|---|---|
Ready-made application for the Penn BioTagger |
GATE |
|
Penn BioTagger for Genes |
GATE |
|
Penn BioTagger for malignancy types |
GATE |
|
Penn BioTagger for variations |
GATE |
|
Tokenizer for biomedical text |
GATE |
RASP (5)
Component | Description | Framework |
---|---|---|
Converts from PennTreebank POS tags to the C2 tagset used by RASP. |
GATE |
|
RASP morphological analyser, which adds lemma and suffix to the WordForm annotations produced by the RASP POS tagger (or the ANNIE POS tagger plus the RASP converter) |
GATE |
|
RASP part-of-speech tagger, creating WordForm annotations |
GATE |
|
RASP dependency parser |
GATE |
|
RASP2 Tokenizer. |
GATE |
SPECIES (2)
Component | Description | Framework |
---|---|---|
Calls the Species taxon tagger. |
AlvisNLP |
|
Tags species |
NaCTeM (UIMA) |
SVMLight (2)
Component | Description | Framework |
---|---|---|
Applies an SVMLight-trained model on instances. |
NaCTeM (UIMA) |
|
Produces an SVMLight model based on user-specified learning parameters. |
NaCTeM (UIMA) |
Snowball (2)
Component | Description | Framework |
---|---|---|
UIMA wrapper for the Snowball stemmer. |
DKPro Core (UIMA) |
|
Wrapper for the Snowball stemmer. |
GATE |
Stanford (17)
Component | Description | Framework |
---|---|---|
Ready-made application for Stanford English parser |
GATE |
|
Ready-made application for Stanford English POS tagger and parser |
GATE |
|
Generates Stanford-style dependencies together with POS tokens for English. |
NaCTeM (UIMA) |
|
Stanford Named Entity Recogniser |
GATE |
|
Stanford Part-of-Speech Tagger |
GATE |
|
Stanford Penn Treebank v3 Tokenizer, for English |
GATE |
|
No description |
DKPro Core (UIMA) |
|
Converts a constituency structure into a dependency structure. |
DKPro Core (UIMA) |
|
Stanford Lemmatizer component. |
DKPro Core (UIMA) |
|
synopsis |
AlvisNLP |
|
Stanford Named Entity Recognizer component. |
DKPro Core (UIMA) |
|
Stanford parser wrapper |
GATE |
|
Stanford Parser component. |
DKPro Core (UIMA) |
|
Stanford Part-of-Speech tagger component. |
DKPro Core (UIMA) |
|
Uses the normalizing tokenizer of the Stanford CoreNLP tools to escape the text PTB-style. |
DKPro Core (UIMA) |
|
No description |
DKPro Core (UIMA) |
|
Stanford POS tagger trained on Tweets |
GATE |
TermRaider (6)
Component | Description | Framework |
---|---|---|
TermRaider Termbank derived from document annotations |
GATE |
|
TermRaider Termbank derived from head/string hyponymy |
GATE |
|
Viewer for the TermRaider Pairbank |
GATE |
|
Example application showing typical set-up for the TermRaider tools |
GATE |
|
Viewer for the TermRaider Termbank |
GATE |
|
TermRaider Termbank derived from vectors in document features |
GATE |
TextCat (3)
Component | Description | Framework |
---|---|---|
Detection based on character n-grams. |
DKPro Core (UIMA) |
|
Generate language fingerprints for use with the TextCat Language Identification PR |
GATE |
|
Recognizes the document language using TextCat |
GATE |
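The TextCat entries above rely on character n-gram fingerprints for language identification. The Python sketch below illustrates one common way such fingerprints can be compared, a rank-order "out-of-place" distance between n-gram profiles. It is a toy under stated assumptions, not the code of any component listed here: the function names, the `top_k` cut-off and the two tiny training sentences are hypothetical, and real fingerprints would be built from far more text.

```python
from collections import Counter
from typing import Dict, List


def ngram_profile(text: str, n_max: int = 3, top_k: int = 300) -> List[str]:
    """Return the top_k most frequent character n-grams (n = 1..n_max),
    most frequent first, ties broken alphabetically."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile: List[str], lang_profile: List[str]) -> int:
    """Sum of rank differences; n-grams missing from the language
    profile receive a fixed maximum penalty."""
    rank = {gram: position for position, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(position - rank[gram]) if gram in rank else max_penalty
        for position, gram in enumerate(doc_profile)
    )


def identify(text: str, profiles: Dict[str, List[str]]) -> str:
    """Pick the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))


# Hypothetical, deliberately tiny training data.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "de": "der hund und die katze spielen im garten und dann schlafen sie in der sonne",
}
profiles = {lang: ngram_profile(text) for lang, text in TRAINING.items()}
print(identify("the dog sleeps in the sun", profiles))
```

With training samples this small the ranking is noisy, so the sketch is only meant to show the shape of the computation: build a ranked profile per language once, then score each incoming document against every profile.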
TreeTagger (3)
Component | Description | Framework |
---|---|---|
Runs tree-tagger. |
AlvisNLP |
|
Chunk annotator using TreeTagger. |
DKPro Core (UIMA) |
|
Part-of-Speech and lemmatizer annotator using TreeTagger. |
DKPro Core (UIMA) |
Web1T (1)
Component | Description | Framework |
---|---|---|
Language detector based on n-gram frequency counts, e.g. as provided by Web1T |
DKPro Core (UIMA) |
WordNet (4)
Component | Description | Framework |
---|---|---|
This Analysis Engine annotates English single words with semantic field information retrieved from an ExternalResource. |
DKPro Core (UIMA) |
|
WordNet |
GATE |
|
Princeton WordNet 1.6. |
GATE |
|
WordNet viewer |
GATE |
Yatea (2)
Component | Description | Framework |
---|---|---|
Extract terms from the corpus using the YaTeA term extractor. |
AlvisNLP |
|
synopsis |
AlvisNLP |
I/O components by format
Uncategorized (47)
Component | Description | Framework |
---|---|---|
Reads ... |
NaCTeM (UIMA) |
|
synopsis |
AlvisNLP |
|
Reads documents and annotations from an AlvisAE campaign. |
AlvisNLP |
|
Reads documents and annotations from an AlvisAE campaign. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Project-specific file reader. |
AlvisNLP |
|
Reads BIO format files from specified directory. |
NaCTeM (UIMA) |
|
Writes specified types of annotations to the specified directory in the BIO format. |
NaCTeM (UIMA) |
|
Reads a file in BioC format. |
NaCTeM (UIMA) |
|
Writes BioC annotations to files. |
NaCTeM (UIMA) |
|
Reads data prepared specifically for the BioCreative IV's CHEMDNER track. |
NaCTeM (UIMA) |
|
Reads files formatted for the BioNLP Shared Task series and outputs documents with named entity, relation and event annotations. |
NaCTeM (UIMA) |
|
Writes BioNLP entity and event annotations to files. |
NaCTeM (UIMA) |
|
Bliki-based Wikipedia reader. |
DKPro Core (UIMA) |
|
Combines multiple readers into a single reader. |
DKPro Core (UIMA) |
|
Allows annotations to be exported according to a specified format. |
GATE |
|
Import judgments from a CrowdFlower job created by the Entity Annotation Job Builder as GATE annotations. |
GATE |
|
Write elements in a tab separated file. |
AlvisNLP |
|
Stores the corpus into a SQL database. |
AlvisNLP |
|
Exports a document with GATE annotations to its original format. |
GATE |
|
Reads the contents of a given URL and strips the HTML. |
DKPro Core (UIMA) |
|
Reads files from the filesystem. |
ILSP (UIMA) |
|
Reads a dataset in LIBSVM format |
NaCTeM (UIMA) |
|
A simple PR that converts co-reference data from the Relations-based model to the legacy format (based on 'matches' annotation and document features). |
GATE |
|
Writes topic proportions to a file. Depends on the TopicDistribution annotation, which should have been created by MalletTopicModelInferencer beforehand. |
DKPro Core (UIMA) |
|
Write the topic proportions according to an LDA topic model to an output file. |
DKPro Core (UIMA) |
|
synopsis |
AlvisNLP |
|
Reads training or evaluation data from the BioNLP/NLPBA 2004 Bio-Entity Recognition Task |
NaCTeM (UIMA) |
|
TGrep2 corpus file writer. |
DKPro Core (UIMA) |
|
No description |
NaCTeM (UIMA) |
|
Saves annotations of a selected type to a file in tab-separated-value format. |
NaCTeM (UIMA) |
|
Writes the corpus data structure in files in tabular format. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
This consumer builds a DfModel. |
DKPro Core (UIMA) |
|
Reads files in tree-tagger output format and creates a document for each file read. |
AlvisNLP |
|
No description |
NaCTeM (UIMA) |
|
Populate a corpus from Twitter JSON containing multiple Tweets |
GATE |
|
No description |
NaCTeM (UIMA) |
|
Reads Web of Knowledge search result import files. |
AlvisNLP |
|
Writes files in What's Wrong with my NLP format. |
AlvisNLP |
|
Reads general information about all articles without retrieving the full Page objects |
DKPro Core (UIMA) |
|
Reads all discussion pages. |
DKPro Core (UIMA) |
|
Read links from Wikipedia. |
DKPro Core (UIMA) |
|
Reads all article pages that match a query created by the numerous parameters of this class. |
DKPro Core (UIMA) |
|
Reads pairs of adjacent revisions of all articles. |
DKPro Core (UIMA) |
|
Reads Wikipedia page revisions. |
DKPro Core (UIMA) |
AclAnthology (1)
Component | Description | Framework |
---|---|---|
Reads the ACL Anthology corpus and outputs CASes with plain text documents. |
DKPro Core (UIMA) |
Alvis Enriched Document (1)
Component | Description | Framework |
---|---|---|
Writes the corpus in the infamous Alvis Enriched Document Format suitable for indexation with Zebra-Alvis. |
AlvisNLP |
BNC (1)
Component | Description | Framework |
---|---|---|
Reader for the British National Corpus (XML version). |
DKPro Core (UIMA) |
BioNLP Shared Task (2)
Component | Description | Framework |
---|---|---|
Reads text files and their associated annotation files in BioNLP Shared Task format. |
AlvisNLP |
|
Writes each section in three files in the BioNLP challenge format. |
AlvisNLP |
BioNLP-ST 2013 a1/a2 (1)
Component | Description | Framework |
---|---|---|
Reads documents and annotations in the BioNLP-ST 2013 a1/a2 format. |
AlvisNLP |
Brat (2)
Component | Description | Framework |
---|---|---|
Reader for the brat format. |
DKPro Core (UIMA) |
|
Writer for the brat annotation format. |
DKPro Core (UIMA) |
CLARIN TCF (2)
Component | Description | Framework |
---|---|---|
Reader for the WebLicht TCF format. |
DKPro Core (UIMA) |
|
Writer for the WebLicht TCF format. |
DKPro Core (UIMA) |
CadixeJSON (1)
Component | Description | Framework |
---|---|---|
Writes each document in a file in the AlvisAE protocol format. |
AlvisNLP |
CoNLL 2000 (2)
Component | Description | Framework |
---|---|---|
Reads the CoNLL 2000 chunking format. |
DKPro Core (UIMA) |
|
Writes the CoNLL 2000 chunking format. |
DKPro Core (UIMA) |
CoNLL 2002 (2)
Component | Description | Framework |
---|---|---|
Reads the CoNLL 2002 named entity format. |
DKPro Core (UIMA) |
|
Writes the CoNLL 2002 named entity format. |
DKPro Core (UIMA) |
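The CoNLL 2002 named entity format listed above stores one token per line with its entity tag in the last column, using `B-`/`I-`/`O` prefixes. The sketch below shows how such tags can be grouped back into entity mentions. It is an illustration rather than the DKPro Core reader: it assumes whitespace-separated columns, treats the input as a single sentence, and the function names and sample lines are hypothetical.

```python
from typing import List, Optional, Tuple


def read_token_tag_lines(text: str) -> Tuple[List[str], List[str]]:
    """Take the first column as the token and the last column as the tag."""
    tokens, tags = [], []
    for line in text.splitlines():
        columns = line.split()
        if not columns:
            continue  # skip blank lines (sentence breaks)
        tokens.append(columns[0])
        tags.append(columns[-1])
    return tokens, tags


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (entity_type, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def close() -> None:
        # Emit the entity collected so far, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            close()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O": outside any entity
            close()
            current_type, current_tokens = None, []
    close()
    return entities


# Hypothetical sample in token/tag layout.
SAMPLE = "Maria B-PER\nGarcia I-PER\nvisited O\nBuenos B-LOC\nAires I-LOC\n"
tokens, tags = read_token_tag_lines(SAMPLE)
entities = decode_bio(tokens, tags)
print(entities)
```

An `I-` tag whose type differs from the open entity is treated as the start of a new entity, which keeps the decoder tolerant of slightly inconsistent tagging.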
CoNLL 2006 (2)
Component | Description | Framework |
---|---|---|
Reads a file in the CoNLL-2006 format (aka CoNLL-X). |
DKPro Core (UIMA) |
|
Writes a file in the CoNLL-2006 format (aka CoNLL-X). |
DKPro Core (UIMA) |
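For orientation, the CoNLL-2006 (CoNLL-X) format handled by the components above is a tab-separated layout with ten columns per token (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line between sentences. The following Python sketch reads such text into per-sentence token dictionaries. It is a minimal illustration, not the DKPro Core implementation; the function name `read_conllx` and the three-token sample are hypothetical.

```python
from typing import Dict, Iterator, List

# Column names of the CoNLL-X (CoNLL 2006) shared task format.
CONLLX_FIELDS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conllx(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLX_FIELDS):
            raise ValueError(
                f"expected {len(CONLLX_FIELDS)} columns, got {len(columns)}: {line!r}"
            )
        sentence.append(dict(zip(CONLLX_FIELDS, columns)))
    if sentence:
        yield sentence


# Hypothetical sample; "_" marks an unspecified value.
SAMPLE = (
    "1\tThe\tthe\tDT\tDT\t_\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tNN\tNN\t_\t3\tnsubj\t_\t_\n"
    "3\tsleeps\tsleep\tVB\tVBZ\t_\t0\troot\t_\t_\n"
    "\n"
)

sentences = list(read_conllx(SAMPLE))
for token in sentences[0]:
    print(token["id"], token["form"], "->", token["head"], token["deprel"])
```

A `head` value of `0` marks the root of the dependency tree; all other values refer to the `id` of the governing token in the same sentence.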
CoNLL 2007 (1)
Component | Description | Framework |
---|---|---|
Writes sentences from the CAS in the CoNLL 2007 format. |
ILSP (UIMA) |
CoNLL 2009 (2)
Component | Description | Framework |
---|---|---|
Reads a file in the CoNLL-2009 format. |
DKPro Core (UIMA) |
|
Writes a file in the CoNLL-2009 format. |
DKPro Core (UIMA) |
CoNLL 2012 (2)
Component | Description | Framework |
---|---|---|
Reads a file in the CoNLL-2012 format. |
DKPro Core (UIMA) |
|
Writer for the CoNLL-2012 format. |
DKPro Core (UIMA) |
Cochrane (1)
Component | Description | Framework |
---|---|---|
Load this to allow the opening of Cochrane text documents, and choose the mime type "text/x-cochrane", or use the correct file extension. |
GATE |
Factored Tag Lem (1)
Component | Description | Framework |
---|---|---|
Writes sentences from the CAS in the Factored Tag Lem format |
ILSP (UIMA) |
Fast Infoset (2)
Component | Description | Framework |
---|---|---|
Format parser for GATE XML stored in the binary Fast Infoset format |
GATE |
|
Export GATE documents to GATE XML stored in the binary Fast Infoset format |
GATE |
GATE XML (2)
Component | Description | Framework |
---|---|---|
Writes the CAS to GATE XML format |
ILSP (UIMA) |
|
Reads GATE documents created with ILSP tools |
ILSP (UIMA) |
GrAF (1)
Component | Description | Framework |
---|---|---|
Writes sentences from the CAS to GrAF standoff format. |
ILSP (UIMA) |
ImsCwb (2)
Component | Description | Framework |
---|---|---|
Reads a tab-separated format including pseudo-XML tags. |
DKPro Core (UIMA) |
|
This Consumer outputs the content of all CASes into the IMS workbench format. |
DKPro Core (UIMA) |
JDBC (1)
Component | Description | Framework |
---|---|---|
Collection reader for a JDBC database. The obtained data will be written into the CAS DocumentText as well as fields of the DocumentMetaData annotation. |
DKPro Core (UIMA) |
MediaWiki markup (1)
Component | Description | Framework |
---|---|---|
Document format for parsing MediaWiki markup |
GATE |
NEGRA Export (1)
Component | Description | Framework |
---|---|---|
This CollectionReader reads a file which is formatted in the NEGRA export format. |
DKPro Core (UIMA) |
Penn Treebank Chunked (1)
Component | Description | Framework |
---|---|---|
Penn Treebank chunked format reader. |
DKPro Core (UIMA) |
Penn Treebank Combined (2)
Component | Description | Framework |
---|---|---|
Penn Treebank combined format reader. |
DKPro Core (UIMA) |
|
Penn Treebank combined format writer. |
DKPro Core (UIMA) |
Prague Markup Language (1)
Component | Description | Framework |
---|---|---|
Writes sentences from the CAS in the Prague Markup Language format for editing dependency structures in TrEd |
ILSP (UIMA) |
PubMed (2)
Component | Description | Framework |
---|---|---|
Load this to allow the opening of PubMed text documents, and choose the mime type "text/x-pubmed", or use the correct file extension. |
GATE |
|
Fetches PubMed abstracts from NaCTeM's Kleio service. |
NaCTeM (UIMA) |
RDF (3)
Component | Description | Framework |
---|---|---|
Reads Common Annotation Structures (CASes) from RDF-encoded files. |
NaCTeM (UIMA) |
|
Saves Common Annotation Structures into RDF files. |
NaCTeM (UIMA) |
|
synopsis |
AlvisNLP |
Reuters-21578 (2)
Component | Description | Framework |
---|---|---|
Read a Reuters-21578 corpus in SGML format. |
DKPro Core (UIMA) |
|
Read a Reuters-21578 corpus that has been transformed into text format using ExtractReuters in the lucene-benchmarks project. |
DKPro Core (UIMA) |
Solr (1)
Component | Description | Framework |
---|---|---|
A simple implementation of SolrWriter_ImplBase |
DKPro Core (UIMA) |
TEI-XML (4)
Component | Description | Framework |
---|---|---|
Reads the AIMed corpus (225 abstracts from MEDLINE) with gold standard sentence, protein and protein-protein interaction annotations. |
NaCTeM (UIMA) |
|
Reader for the TEI XML. |
DKPro Core (UIMA) |
|
UIMA CAS consumer writing the CAS document text in TEI format. |
DKPro Core (UIMA) |
|
Reads all pages that contain or do not contain the templates specified in the template whitelist and template blacklist. |
DKPro Core (UIMA) |
TIGER-XML (2)
Component | Description | Framework |
---|---|---|
UIMA collection reader for TIGER-XML files. |
DKPro Core (UIMA) |
|
UIMA CAS consumer writing the CAS document text in the TIGER-XML format. |
DKPro Core (UIMA) |
Text (14)
Component | Description | Framework |
---|---|---|
Descriptor automatically generated by uimaFIT |
DKPro Core (UIMA) |
|
Reads open-access full-text articles from the Europe PMC web service |
NaCTeM (UIMA) |
|
Project-specific text file reader. |
AlvisNLP |
|
Reads text supplied in a parameter. |
NaCTeM (UIMA) |
|
Reads GENIA-coref files and GENIA-event/-term files and merges each pair into one CAS. |
NaCTeM (UIMA) |
|
Reads plain-text documents from a remote directory on a user-specified server via SFTP. |
NaCTeM (UIMA) |
|
Simplified text exporter (plain text output) |
GATE |
|
Simple reader that generates a CAS from a String. |
DKPro Core (UIMA) |
|
Reads files and adds a document in the corpus for each file. |
AlvisNLP |
|
UIMA collection reader for plain text files. |
DKPro Core (UIMA) |
|
UIMA CAS consumer writing the CAS document text as plain text file. |
DKPro Core (UIMA) |
|
Writes a set of pre-processed documents into a single large text file containing one sentence per line, with tokens separated by whitespace. |
DKPro Core (UIMA) |
|
Reads all article pages. |
DKPro Core (UIMA) |
|
Reads all Wikipedia pages in the database (articles, discussions, etc). |
DKPro Core (UIMA) |
TüPP-D/Z (1)
Component | Description | Framework |
---|---|---|
UIMA collection reader for Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) XML files. |
DKPro Core (UIMA) |
UIMA Binary CAS (4)
Component | Description | Framework |
---|---|---|
UIMA Binary CAS formats reader. |
DKPro Core (UIMA) |
|
Write CAS in one of the UIMA binary formats. |
DKPro Core (UIMA) |
|
No description |
DKPro Core (UIMA) |
|
No description |
DKPro Core (UIMA) |
UIMA CAS Dump (1)
Component | Description | Framework |
---|---|---|
Dumps CAS content to a text file. |
DKPro Core (UIMA) |
XCES (2)
Component | Description | Framework |
---|---|---|
Writes sentences from the CAS to the XCES format |
ILSP (UIMA) |
|
Reads XCES XML files. |
ILSP (UIMA) |
XMI (7)
Component | Description | Framework |
---|---|---|
Serializes the CAS to XMI. |
ILSP (UIMA) |
|
Reads an XMI-formatted corpus from an SFTP-enabled server. |
NaCTeM (UIMA) |
|
Saves Common Annotation Structures to an SFTP server |
NaCTeM (UIMA) |
|
Reads common annotation structures (CAS) from files in XMI format. |
NaCTeM (UIMA) |
|
Serialises entire common annotation structures (CAS) to XMI format. |
NaCTeM (UIMA) |
|
Reader for UIMA XMI files. |
DKPro Core (UIMA) |
|
UIMA XMI format writer. |
DKPro Core (UIMA) |
XML (12)
Component | Description | Framework |
---|---|---|
A PR to export alignment information in an xml file. |
GATE |
|
Writes an approximation of the content of a textual CAS as an inline XML file. |
DKPro Core (UIMA) |
|
Populate a corpus from a MediaWiki XML dump |
GATE |
|
Deprecated MediaWiki importer |
GATE |
|
Reads a corpus in XML files. |
AlvisNLP |
|
Reads XML files and creates elements. |
AlvisNLP |
|
Writes an XML serialization of the corpus into a file. |
AlvisNLP |
|
Writes the corpus data structure into a file via an XSLT stylesheet. |
AlvisNLP |
|
synopsis |
AlvisNLP |
|
Reader for XML files. |
DKPro Core (UIMA) |
|
No description |
DKPro Core (UIMA) |
|
A component reader for XML files implemented with XPath. |
DKPro Core (UIMA) |
Component details
Uncategorized (132)
ANNIE NE Transducer
Category: Uncategorized
Framework: GATE
Version: unknown
ANNIE named entity grammar.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationAccessors |
— |
java.util.List |
— |
— |
— |
— |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
enableDebugging |
— |
java.lang.Boolean |
— |
false |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
grammarURL |
— |
java.net.URL |
— |
resources/NE/main.jape |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
operators |
— |
java.util.List |
— |
— |
— |
— |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
ANNIE OrthoMatcher
Category: Uncategorized
Framework: GATE
Version: unknown
ANNIE orthographical coreference component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
annotationTypes |
— |
java.util.List |
— |
Organization;Person;Location;Date |
— |
true |
caseSensitive |
— |
java.lang.Boolean |
— |
false |
— |
— |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
definitionFileURL |
— |
java.net.URL |
— |
resources/othomatcher/listsNM.def |
— |
— |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
extLists |
— |
java.lang.Boolean |
— |
true |
— |
— |
highPrecisionOrgs |
— |
java.lang.Boolean |
— |
false |
— |
— |
minimumNicknameLikelihood |
— |
java.lang.Double |
— |
0.50 |
— |
— |
organizationType |
— |
java.lang.String |
— |
Organization |
— |
— |
personType |
— |
java.lang.String |
— |
Person |
— |
— |
processUnknown |
— |
java.lang.Boolean |
— |
true |
— |
— |
ANNIE+Measurements
Category: Uncategorized
Framework: GATE
Version: unknown
Ready-made application for ANNIE plus the measurement tagger
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
Ab3P
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
documentFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
installDir |
— |
org.bibliome.util.files.InputDirectory |
True |
— |
— |
— |
longFormFeature |
— |
java.lang.String |
True |
— |
— |
— |
longFormRole |
— |
java.lang.String |
True |
— |
— |
— |
longFormsLayerName |
— |
java.lang.String |
True |
— |
— |
— |
relationName |
— |
java.lang.String |
True |
— |
— |
— |
sectionFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
shortFormRole |
— |
java.lang.String |
True |
— |
— |
— |
shortFormsLayerName |
— |
java.lang.String |
True |
— |
— |
— |
Action
Category: Uncategorized
Framework: AlvisNLP
Version: 2012-04-30
Applies action expressions on selected elements.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
action |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
addToLayer |
— |
java.lang.Boolean |
False |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
createAnnotations |
— |
java.lang.Boolean |
False |
— |
— |
— |
createDocuments |
— |
java.lang.Boolean |
False |
— |
— |
— |
createRelations |
— |
java.lang.Boolean |
False |
— |
— |
— |
createSections |
— |
java.lang.Boolean |
False |
— |
— |
— |
createTuples |
— |
java.lang.Boolean |
False |
— |
— |
— |
deleteElements |
— |
java.lang.Boolean |
False |
— |
— |
— |
removeFromLayer |
— |
java.lang.Boolean |
False |
— |
— |
— |
setArguments |
— |
java.lang.Boolean |
False |
— |
— |
— |
setFeatures |
— |
java.lang.Boolean |
False |
— |
— |
— |
target |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
AggregateValues
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
aggregators | — | org.bibliome.alvisnlp.modules.aggregate.Aggregator[] | True | — | — | — |
entries | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
key | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
outFile | — | org.bibliome.util.streams.TargetStream | True | — | — | — |
separator | — | java.lang.Character | True | — | — | — |
Agreement Evaluator
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Reports agreement on annotations coming from different views (sofas).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
OutputFile | — | String | True | — | false | — |
AlchemyAPI: Entity Extraction
Category: Uncategorized
Framework: GATE
Version: unknown
Runs the AlchemyAPI Entity Extraction service on a GATE document
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationType | — | java.lang.String | — | Mention | — | true |
apiKey | — | java.lang.String | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
numberOfSentencesInBatch | — | java.lang.Integer | — | — | — | true |
numberOfSentencesInContext | — | java.lang.Integer | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
AlchemyAPI: Keyword Extraction
Category: Uncategorized
Framework: GATE
Version: unknown
Runs the AlchemyAPI Keyword Extraction service on a GATE document
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationType | — | java.lang.String | — | Keyword | — | true |
apiKey | — | java.lang.String | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
numberOfSentencesInBatch | — | java.lang.Integer | — | — | — | true |
numberOfSentencesInContext | — | java.lang.Integer | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
AlvisREPrepareCrossValidation
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
cParameter | — | java.lang.Double | True | — | — | — |
dependencies | — | org.bibliome.alvisnlp.modules.alvisre.AlvisRERelations | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
folds | — | java.lang.Integer | True | — | — | — |
outDir | — | org.bibliome.util.files.OutputDirectory | True | — | — | — |
relations | — | org.bibliome.alvisnlp.modules.alvisre.AlvisRERelations[] | True | — | — | — |
schema | — | org.w3c.dom.DocumentFragment | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionSeparator | — | java.lang.String | True | — | — | — |
sentences | — | org.bibliome.alvisnlp.modules.alvisre.AlvisRETokens | True | — | — | — |
similarityFunction | — | org.w3c.dom.DocumentFragment | True | — | — | — |
terms | — | org.bibliome.alvisnlp.modules.alvisre.AlvisRETokens[] | True | — | — | — |
words | — | org.bibliome.alvisnlp.modules.alvisre.AlvisRETokens | True | — | — | — |
AnchorTuples
Category: Uncategorized
Framework: AlvisNLP
Version: 2012-04-30
Creates tuples with a common argument.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
anchor | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
anchorRole | — | java.lang.String | True | — | — | — |
arguments | — | alvisnlp.module.types.ExpressionMapping | True | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
relationName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
Annotation Remover
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Removes span-of-text annotations.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
mode | Set to 'remove' if you wish to remove annotations of the types given in 'types'. Set to 'retain' if you wish to retain only the annotations of the types given in 'types'. | String | True | — | false | — |
types | List of annotation types. | String | True | — | true | — |
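The two `mode` values can be illustrated with a minimal sketch. This is not the component's own code: annotations are modelled as hypothetical `(type, begin, end)` tuples, and `filter_annotations` is an illustrative name.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for a span-of-text annotation: (type, begin, end).
Annotation = Tuple[str, int, int]


def filter_annotations(annotations: Iterable[Annotation],
                       types: Set[str],
                       mode: str) -> List[Annotation]:
    """Apply the 'remove' / 'retain' semantics described for `mode`."""
    if mode == "remove":
        # Drop annotations whose type is listed in `types`.
        return [a for a in annotations if a[0] not in types]
    if mode == "retain":
        # Keep only annotations whose type is listed in `types`.
        return [a for a in annotations if a[0] in types]
    raise ValueError(f"mode must be 'remove' or 'retain', got {mode!r}")


annotations = [("Token", 0, 5), ("Sentence", 0, 11), ("Token", 6, 11)]
removed = filter_annotations(annotations, {"Token"}, "remove")
retained = filter_annotations(annotations, {"Token"}, "retain")
```

With `types` set to `Token`, 'remove' leaves only the `Sentence` annotation, while 'retain' leaves only the two `Token` annotations.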
AnnotationTermbank
Category: Uncategorized
Framework: GATE
Version: unknown
TermRaider Termbank derived from document annotations
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpora | — | java.util.Set | — | — | — | — |
debugMode | — | java.lang.Boolean | — | false | — | — |
idDocumentFeature | — | java.lang.String | — | — | — | — |
inputASName | — | java.lang.String | — | — | — | — |
inputAnnotationFeature | — | java.lang.String | — | canonical | — | — |
inputAnnotationTypes | — | java.util.Set | — | SingleWord;MultiWord | — | — |
inputScoreFeature | — | java.lang.String | — | localAugTfIdf | — | — |
languageFeature | — | java.lang.String | — | lang | — | — |
mergingMode | — | gate.termraider.modes.MergingMode | — | MAXIMUM | — | — |
normalization | — | gate.termraider.modes.Normalization | — | Sigmoid | — | — |
scoreProperty | — | java.lang.String | — | tfIdfAug | — | — |
AntecedentChoice
Category: Uncategorized
Framework: AlvisNLP
Version: 2012-04-30
Biotopes-specific module: chooses an antecedent.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
Arabic Gazetteer Collector
Category: Uncategorized
Framework: GATE
Version: unknown
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu | — | java.util.List | — | Arabic | — | — |
pipelineURL | — | java.net.URL | — | resources/arabic_lists_collector.gapp | — | — |
Arabic Main Grammar
Category: Uncategorized
Framework: GATE
Version: unknown
A module for executing Jape grammars.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationAccessors | — | java.util.List | — | — | — | — |
binaryGrammarURL | — | java.net.URL | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
enableDebugging | — | java.lang.Boolean | — | false | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
grammarURL | — | java.net.URL | — | resources/grammar/main.jape | — | — |
inputASName | — | java.lang.String | — | — | — | true |
ontology | — | gate.creole.ontology.Ontology | — | — | — | true |
operators | — | java.util.List | — | — | — | — |
outputASName | — | java.lang.String | — | — | — | true |
Arabic OrthoMatcher
Category: Uncategorized
Framework: GATE
Version: unknown
ANNIE orthographical coreference component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
annotationTypes | — | java.util.List | — | Organization;Person;Location;Date | — | true |
caseSensitive | — | java.lang.Boolean | — | false | — | — |
corpus | — | gate.Corpus | — | — | — | true |
definitionFileURL | — | java.net.URL | — | resources/orthomatcher/listsNM.def | — | — |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
extLists | — | java.lang.Boolean | — | true | — | — |
highPrecisionOrgs | — | java.lang.Boolean | — | false | — | — |
minimumNicknameLikelihood | — | java.lang.Double | — | 0.50 | — | — |
organizationType | — | java.lang.String | — | Organization | — | — |
personType | — | java.lang.String | — | Person | — | — |
processUnknown | — | java.lang.Boolean | — | true | — | — |
Assert
Category: Uncategorized
Framework: AlvisNLP
Version:
Tests an assertion on specified elements.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
assertion | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
severe | — | java.lang.Boolean | True | — | — | — |
stopAt | — | java.lang.Integer | False | — | — | — |
target | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
AssertAnnotations$InternalJCasHolder
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
Descriptor automatically generated by uimaFIT
AttestedTermsProjector
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Projects a list of terms given in tree-tagger format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
errorDuplicateValues | — | java.lang.Boolean | False | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
ignoreWhitespace | — | java.lang.Boolean | False | — | — | — |
lemmaFeatureName | — | java.lang.String | True | — | — | — |
lemmaKeys | — | java.lang.Boolean | True | — | — | — |
multipleValueAction | — | org.bibliome.alvisnlp.modules.projectors.MultipleValueAction | True | — | — | — |
normalizeSpace | — | java.lang.Boolean | False | — | — | — |
posFeatureName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.projectors.Subject | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
termFeatureName | — | java.lang.String | False | — | — | — |
termsFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
BDM Computation PR
Category: Uncategorized
Framework: GATE
Version: unknown
Compute BDM score for each pair of concepts in the given ontology.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ontology | — | gate.creole.ontology.Ontology | — | — | — | true |
outputBDMFile | — | java.net.URL | — | — | — | true |
Banner Sentence Breaker
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Sentence breaker using the Sun Java API "BreakIterator".
BioLG
Category: Uncategorized
Framework: AlvisNLP
Version: 2012-04-30
Applies BioLG and lp2lp to sentences.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
dependencyLabelFeature | — | java.lang.String | True | — | — | — |
dependencyRelation | — | java.lang.String | True | — | — | — |
dependentRole | — | java.lang.String | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
headRole | — | java.lang.String | True | — | — | — |
linkageNumberFeature | — | java.lang.String | True | — | — | — |
lp2lpConf | — | org.bibliome.util.files.InputFile | True | — | — | — |
lp2lpExecutable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
maxLinkages | — | java.lang.Integer | False | — | — | — |
parserPath | — | org.bibliome.util.files.WorkingDirectory | True | — | — | — |
posFeature | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayer | — | java.lang.String | True | — | — | — |
sentenceRole | — | java.lang.String | True | — | — | — |
timeout | — | java.lang.Integer | True | — | — | — |
union | — | java.lang.Boolean | True | — | — | — |
wordLayer | — | java.lang.String | True | — | — | — |
wordNumberLimit | — | java.lang.Integer | True | — | — | — |
CSV Corpus Populater
Category: Uncategorized
Framework: GATE
Version: unknown
Populate a corpus from CSV files
CartesianProductTuples
Category: Uncategorized
Framework: AlvisNLP
Version: 2012-04-30
Creates tuples for each element of a Cartesian product.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
anchor | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
arguments | — | alvisnlp.module.types.ExpressionMapping | True | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
relationName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
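The idea of one tuple per element of a Cartesian product can be sketched as follows. This is only an illustration, not AlvisNLP code: it assumes that `arguments` maps each role name to the candidate elements selected for that role, and the role names and element identifiers are invented for the example.

```python
from itertools import product
from typing import Dict, List


def cartesian_tuples(arguments: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Build one tuple (role -> element) per combination of candidates.

    Assumption: each role contributes a list of candidate elements, and
    every combination across roles yields exactly one tuple.
    """
    roles = list(arguments)
    candidate_lists = [arguments[role] for role in roles]
    return [dict(zip(roles, combo)) for combo in product(*candidate_lists)]


# Hypothetical roles and elements, for illustration only.
tuples = cartesian_tuples({
    "agent": ["geneA", "geneB"],
    "target": ["protX", "protY"],
})
```

Two candidates for each of two roles give 2 x 2 = 4 tuples; an empty candidate list for any role gives no tuples at all.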
Cebuano Transducer
Category: Uncategorized
Framework: GATE
Version: unknown
A module for executing Jape grammars.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationAccessors | — | java.util.List | — | — | — | — |
binaryGrammarURL | — | java.net.URL | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
enableDebugging | — | java.lang.Boolean | — | false | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
grammarURL | — | java.net.URL | — | resources/grammar/main.jape | — | — |
inputASName | — | java.lang.String | — | — | — | true |
ontology | — | gate.creole.ontology.Ontology | — | — | — | true |
operators | — | java.util.List | — | — | — | — |
outputASName | — | java.lang.String | — | — | — | true |
Cebuano Transducer Postprocessor
Category: Uncategorized
Framework: GATE
Version: unknown
A module for executing Jape grammars.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationAccessors | — | java.util.List | — | — | — | — |
binaryGrammarURL | — | java.net.URL | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
enableDebugging | — | java.lang.Boolean | — | false | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
grammarURL | — | java.net.URL | — | resources/tokeniser/join.jape | — | — |
inputASName | — | java.lang.String | — | — | — | true |
ontology | — | gate.creole.ontology.Ontology | — | — | — | true |
operators | — | java.util.List | — | — | — | — |
outputASName | — | java.lang.String | — | — | — | true |
Chemical Entity Recogniser
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 0.1
A named entity recogniser capable of annotating names of chemicals, drugs and metabolites. Built on top of the NERsuite package [1].
Available models:
- Chemical: trained on the BioCreative IV CHEMDNER Track training and development corpora [2]
- Drug: trained on the DDI training corpus [3]
- Metabolite: trained on NaCTeM's Metabolite corpus [4]
Dictionaries used:
- Chemical: ChEBI [5], DrugBank [6], CTD Chemicals [7], PubChem Compound [8], Jochem [9]
- Drug: DrugBank [6]
- Metabolite: ChEBI [5], Human Metabolome Database [10]
Links:
- [1] http://nersuite.nlplab.org
- [2] http://www.biocreative.org/resources/corpora/bc-iv-chemdner-corpus
- [3] http://labda.inf.uc3m.es/doku.php?id=en:labda_ddicorpus
- [4] http://www.nactem.ac.uk/metabolite-corpus
- [5] http://www.ebi.ac.uk/chebi
- [6] http://www.drugbank.ca
- [7] http://ctdbase.org
- [8] http://pubchem.ncbi.nlm.nih.gov
- [9] http://www.biosemantics.org/new/index.php?page=Jochem
- [10] http://www.hmdb.ca
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
model | The model to use | String | True | — | false | — |
performAbbreviationRecognition | Additionally perform abbreviation recognition | Boolean | False | — | false | — |
performTokenRelabelling | Additionally perform relabelling based on token chemical composition | Boolean | False | — | false | — |
ColognePhoneticTranscriptor
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
Cologne phonetic (Kölner Phonetik) transcription based on Apache Commons Codec. Works for German.
Compound Document
Category: Uncategorized
Framework: GATE
Version: unknown
GATE Compound Document.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
collectRepositioningInfo | — | java.lang.Boolean | — | false | — | — |
documentIDs | — | java.util.ArrayList | — | — | — | — |
encoding | — | java.lang.String | — | UTF-8 | — | — |
markupAware | — | java.lang.Boolean | — | true | — | — |
preserveOriginalContent | — | java.lang.Boolean | — | false | — | — |
sourceUrl | — | java.net.URL | — | — | — | — |
Compound Document From Xml
Category: Uncategorized
Framework: GATE
Version: unknown
GATE Compound Document.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compoundDocumentUrl | — | java.net.URL | — | — | — | — |
encoding | — | java.lang.String | — | UTF-8 | — | — |
ConnectSesameOntology
Category: Uncategorized
Framework: GATE
Version: unknown
Connect to a repository containing an ontology
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
repositoryID | — | java.lang.String | — | — | — | — |
repositoryLocation | — | java.net.URL | — | — | — | — |
Control Script
Category: Uncategorized
Framework: GATE
Version: unknown
Editor for the Groovy script controlling a scriptable controller
Copy Anns to Another Doc PR
Category: Uncategorized
Framework: GATE
Version: unknown
Copy the annotations from one document to another document.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationTypes | — | java.util.List | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
sourceFilesURL | — | java.net.URL | — | — | — | true |
Crawler PR
Category: Uncategorized
Framework: GATE
Version: unknown
GATE implementation of the Websphinx crawling API
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
convertXmlTypes | — | java.lang.Boolean | — | true | — | true |
depth | — | java.lang.Integer | — | 3 | — | true |
dfs | — | java.lang.Boolean | — | true | — | true |
domain | — | crawl.DomainMode | — | SUBTREE | — | true |
keywords | — | java.util.List | — | — | — | true |
keywordsCaseSensitive | — | java.lang.Boolean | — | true | — | true |
max | — | java.lang.Integer | — | -1 | — | true |
maxPageSize | — | java.lang.Integer | — | 100 | — | true |
outputCorpus | — | gate.Corpus | — | — | — | true |
root | — | java.lang.String | — | — | — | true |
source | — | gate.Corpus | — | — | — | true |
stopAfter | — | java.lang.Integer | — | -1 | — | true |
userAgent | — | java.lang.String | — | — | — | true |
CreateSesameOntology
Category: Uncategorized
Framework: GATE
Version: unknown
Create an ontology from a Sesame configuration file for a repository
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
configFile | — | java.net.URL | — | — | — | — |
repositoryID | — | java.lang.String | — | — | — | — |
repositoryLocation | — | java.net.URL | — | — | — | — |
Dictionary Pluggable Soft TF/IDF Matcher
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Tests whether input tokens belong to an entry in the specified dictionary, using SecondString Soft TF/IDF. The dictionary file name should have the suffix .list, and the file should have the format: key1 TAB alias11 TAB alias12 ... NEWLINE key2 TAB alias21 TAB alias22 ...
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
DictionaryFile | File which contains the dictionary (Format: key1 TAB alias11 TAB alias12 … NEWLINE key2 TAB alias21 TAB alias22 …) | String | True | — | false | — |
MaxTokenCombination | — | Integer | False | — | false | — |
MinMatchingScore | — | Float | False | — | false | — |
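The dictionary layout described above (one entry per line, key first, aliases after it, all TAB-separated) can be read with a few lines of code. The sketch below only illustrates the file format; it is not the component's loader and performs no Soft TF/IDF matching. The function name and sample entries are invented for the example.

```python
from typing import Dict, List


def parse_dictionary(text: str) -> Dict[str, List[str]]:
    """Parse 'key TAB alias1 TAB alias2 ...' lines into key -> aliases."""
    entries: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, *aliases = line.split("\t")
        entries[key] = aliases
    return entries


# Hypothetical content of a file such as "drugs.list".
sample = "aspirin\tacetylsalicylic acid\tASA\nparacetamol\tacetaminophen\n"
entries = parse_dictionary(sample)
```

Each key maps to the list of its aliases, in file order.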
DisambiguateAlternatives
Category: Uncategorized
Framework: AlvisNLP
Version: 2012-04-30
Disambiguate features that have multiple values.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ambiguousFeature | — | java.lang.String | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
target | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
warnIfAmbiguous | — | java.lang.Boolean | False | — | — | — |
DocumentFrequencyBank
Category: Uncategorized
Framework: GATE
Version: unknown
Document frequency counter derived from corpora and other DFBs
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpora | — | java.util.Set | — | — | — | — |
debugMode | — | java.lang.Boolean | — | false | — | — |
idDocumentFeature | — | java.lang.String | — | — | — | — |
inputASName | — | java.lang.String | — | — | — | — |
inputAnnotationFeature | — | java.lang.String | — | canonical | — | — |
inputAnnotationTypes | — | java.util.Set | — | SingleWord;MultiWord | — | — |
inputBanks | — | java.util.Set | — | — | — | — |
languageFeature | — | java.lang.String | — | lang | — | — |
scoreProperty | — | java.lang.String | — | documentFrequency | — | — |
segmentAnnotationType | — | java.lang.String | — | — | — | — |
DoubleMetaphonePhoneticTranscriptor
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
Double-Metaphone phonetic transcription based on Apache Commons Codec. Works for English.
ElementMapper
Category: Uncategorized
Framework: AlvisNLP
Version:
Maps elements according to a collection of mapping elements.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
entries | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
form | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
key | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
operator | — | org.bibliome.alvisnlp.modules.mapper.MappingOperator | True | — | — | — |
target | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
targetFeatures | — | java.lang.String[] | True | — | — | — |
values | — | alvisnlp.corpus.expressions.Expression[] | True | — | — | — |
ElementProjector
Category: Uncategorized
Framework: AlvisNLP
Version:
Searches for entries in a dictionary generated by an expression.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
errorDuplicateValues | — | java.lang.Boolean | False | — | — | — |
features | — | alvisnlp.module.types.ExpressionMapping | True | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
ignoreWhitespace | — | java.lang.Boolean | False | — | — | — |
key | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
multipleValueAction | — | org.bibliome.alvisnlp.modules.projectors.MultipleValueAction | True | — | — | — |
normalizeSpace | — | java.lang.Boolean | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.projectors.Subject | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
values | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ElementProjector2
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
action | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
addToLayer | — | java.lang.Boolean | False | — | — | — |
allUpperCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
allowJoined | — | java.lang.Boolean | False | — | — | — |
caseInsensitive | — | java.lang.Boolean | False | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
createAnnotations | — | java.lang.Boolean | False | — | — | — |
createDocuments | — | java.lang.Boolean | False | — | — | — |
createRelations | — | java.lang.Boolean | False | — | — | — |
createSections | — | java.lang.Boolean | False | — | — | — |
createTuples | — | java.lang.Boolean | False | — | — | — |
deleteElements | — | java.lang.Boolean | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
entries | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
joinDash | — | java.lang.Boolean | False | — | — | — |
key | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
matchStartCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
multipleEntryBehaviour | — | org.bibliome.alvisnlp.modules.trie.MultipleEntryBehaviour | True | — | — | — |
removeFromLayer | — | java.lang.Boolean | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
setArguments | — | java.lang.Boolean | False | — | — | — |
setFeatures | — | java.lang.Boolean | False | — | — | — |
skipConsecutiveWhitespaces | — | java.lang.Boolean | False | — | — | — |
skipWhitespace | — | java.lang.Boolean | False | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.trie.Subject | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
trieSink | — | org.bibliome.util.files.OutputFile | False | — | — | — |
trieSource | — | org.bibliome.util.files.InputFile | False | — | — | — |
wordStartCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
EngLemmatiser
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
English lemmatiser which is adapted from WordNet. From dragontools/Banner toolkit.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
DisableVerbAdjective | — | Boolean | True | — | false | — |
IndexLookup | — | Boolean | True | — | false | — |
Feature Generator
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Generates a list of user-defined observations for each token. Token and sequence boundaries are also parametrised. The output of this component is useful for machine learning components.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
FeatureDefinitions | — | String | True | — | true | — |
SequenceAnnotationType | — | String | True | — | false | — |
TokenAnnotationType | — | String | True | — | false | — |
FileMapper
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Maps the value of an annotation feature according to a mapping file.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
mappedLayerName | — | java.lang.String | True | — | — | — |
mappingFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
separator | — | java.lang.Character | True | — | — | — |
sourceFeature | — | java.lang.String | True | — | — | — |
targetFeatures | — | java.lang.String[] | True | — | — | — |
FileMapper2
Category: Uncategorized
Framework: AlvisNLP
Version:
Maps elements according to a tab-separated mapping file.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
form | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
keyColumn | — | java.lang.Integer | True | — | — | — |
mappingFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
operator | — | org.bibliome.alvisnlp.modules.mapper.MappingOperator | True | — | — | — |
separator | — | java.lang.Character | True | — | — | — |
target | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
targetFeatures | — | java.lang.String[] | True | — | — | — |
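Lookup against a separator-delimited mapping file can be sketched as below. This is a rough illustration under stated assumptions, not the AlvisNLP implementation: the Python arguments `key_column`, `separator` and `ignore_case` are assumed to behave like the `keyColumn`, `separator` and `ignoreCase` parameters, the column index is assumed to be zero-based, and the sample rows are invented.

```python
from typing import Dict, Iterable, List, Optional


def load_mapping(lines: Iterable[str], key_column: int = 0,
                 separator: str = "\t",
                 ignore_case: bool = False) -> Dict[str, List[str]]:
    """Index each row of the mapping file by the value in `key_column`."""
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        columns = line.split(separator)
        key = columns[key_column]
        if ignore_case:
            key = key.lower()  # normalise keys for case-insensitive lookup
        mapping[key] = columns
    return mapping


def lookup(mapping: Dict[str, List[str]], form: str,
           ignore_case: bool = False) -> Optional[List[str]]:
    """Return the row mapped to `form`, or None if there is no entry."""
    return mapping.get(form.lower() if ignore_case else form)


# Hypothetical mapping rows: surface form TAB category.
rows = ["Cat\tanimal", "Oak\tplant"]
mapping = load_mapping(rows, key_column=0, ignore_case=True)
hit = lookup(mapping, "CAT", ignore_case=True)
miss = lookup(mapping, "dog", ignore_case=True)
```

In a real pipeline the columns of the matched row would then be copied into the features named by `targetFeatures`; that step is omitted here because the source does not describe how columns are assigned.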
FreelingMorpho
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Performs tokenisation and determines possible lemmas and POS tags for each token, with confidence scores. Operates on English (en), Spanish (es), Catalan (ca), Welsh (cy), Galician (gl), Italian (it) and Portuguese (pt); the language is selected by setting the "language" parameter (default is English).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | — | String | True | — | false | — |
GATE Composite document
Category: Uncategorized
Framework: GATE
Version: unknown
GATE Composite document.
Gazetteer List Collector
Category: Uncategorized
Framework: GATE
Version: unknown
Gazetteer lists collector.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationTypes | — | java.util.ArrayList | — | Organization;Person;Location;Date | — | true |
document | — | gate.Document | — | — | — | true |
gazetteer | — | gate.creole.gazetteer.Gazetteer | — | — | — | true |
markupASName | — | java.lang.String | — | Key | — | true |
theLanguage | — | java.lang.String | — | — | — | true |
GermanSeparatedParticleAnnotator
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
Annotator to be used for post-processing of German corpora that have been lemmatized and POS-tagged with the TreeTagger, based on the STTS tagset. This annotator deals with German particle verbs. Particle verbs consist of a particle and a stem, e.g. anfangen = an + fangen. In many usages of German particle verbs the stem and the particle are separated, e.g. "Wir fangen gleich an." The TreeTagger lemmatizes the verb stem as "fangen" and the separated particle as "an", so the proper verb lemma "anfangen" is not available as an annotation. The GermanSeparatedParticleAnnotator replaces the lemma of the stem of a particle verb (e.g. fangen) with the proper verb lemma (e.g. anfangen) and leaves the lemma of the separated particle unchanged.
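The lemma repair described above can be sketched in a few lines. This is a simplified illustration, not the DKPro Core implementation: it assumes that the separated particle carries the STTS tag PTKVZ, that the verb stem is the closest preceding token tagged VVFIN or VVIMP, and that the `Token` class and function name are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    pos: str    # STTS part-of-speech tag
    lemma: str


def merge_separated_particles(sentence: List[Token]) -> None:
    """Prefix each separated particle to the lemma of its verb stem, in place.

    Assumption: the stem is the closest preceding finite or imperative
    full verb; the particle's own lemma is left unchanged.
    """
    last_verb = None
    for token in sentence:
        if token.pos in ("VVFIN", "VVIMP"):
            last_verb = token
        elif token.pos == "PTKVZ" and last_verb is not None:
            last_verb.lemma = token.lemma + last_verb.lemma
            last_verb = None  # each verb takes at most one particle


sentence = [
    Token("Wir", "PPER", "wir"),
    Token("fangen", "VVFIN", "fangen"),
    Token("gleich", "ADV", "gleich"),
    Token("an", "PTKVZ", "an"),
    Token(".", "$.", "."),
]
merge_separated_particles(sentence)
lemmas = [t.lemma for t in sentence]
```

After the call, the lemma of "fangen" is "anfangen" and the lemma of "an" is still "an", matching the behaviour the description states.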
Hindi Main Grammar
Category: Uncategorized
Framework: GATE
Version: unknown
A module for executing Jape grammars
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
grammarURL | — | java.net.URL | — | resources/grammar/main.jape | — | — |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
Hindi OrthoMatcher
Category: Uncategorized
Framework: GATE
Version: unknown
Hindi Orthomatcher
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
annotationTypes | — | java.util.ArrayList | — | Organization;Person;Location;Date | — | true |
caseSensitive | — | java.lang.Boolean | — | false | — | — |
definitionFileURL | — | java.net.URL | — | resources/orthomatcher/listsNM.def | — | — |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
extLists | — | java.lang.Boolean | — | true | — | — |
highPrecisionOrgs | — | java.lang.Boolean | — | false | — | — |
minimumNicknameLikelihood | — | java.lang.Double | — | 0.50 | — | — |
organizationType | — | java.lang.String | — | Organization | — | — |
personType | — | java.lang.String | — | Person | — | — |
processUnknown | — | java.lang.Boolean | — | true | — | — |
Hindi Tokeniser Postprocessor
Category: Uncategorized
Framework: GATE
Version: unknown
A module for executing Jape grammars
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
grammarURL | — | java.net.URL | — | resources/tokeniser/join.jape | — | — |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
HyponymyTermbank
Category: Uncategorized
Framework: GATE
Version: unknown
TermRaider Termbank derived from head/string hyponymy
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpora | — | java.util.Set | — | — | — | — |
debugMode | — | java.lang.Boolean | — | false | — | — |
idDocumentFeature | — | java.lang.String | — | — | — | — |
inputASName | — | java.lang.String | — | — | — | — |
inputAnnotationFeature | — | java.lang.String | — | canonical | — | — |
inputAnnotationTypes | — | java.util.Set | — | SingleWord;MultiWord | — | — |
inputHeadFeatures | — | java.util.List | — | — | — | — |
languageFeature | — | java.lang.String | — | lang | — | — |
normalization | — | gate.termraider.modes.Normalization | — | Sigmoid | — | — |
scoreProperty | — | java.lang.String | — | kyotoDomainRelevance | — | — |
IOTestRunner$Validator
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
Descriptor automatically generated by uimaFIT
InsertContents
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
insert | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
offset | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
points | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
userFunctions | — | org.bibliome.alvisnlp.library.UserFunction[] | True | — | — | — |
Kleio Search
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 0.3
Uses the Kleio service to fetch MEDLINE abstracts matching a specified query. Kleio is available at http://www.nactem.ac.uk/Kleio/
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
query | Kleio query | String | True | — | false | — |
recentFirst | If true, results will be sorted by the date of publication in decreasing order. Otherwise, they will be sorted by relevance. | Boolean | False | — | false | — |
LBJ Named Entity Recognizer
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
A wrapper for the Illinois Named Entity Tagger
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
BeamSize | — | Integer | False | — | false | — |
BrownClusterFiles | Set of resource files | String | True | — | true | — |
BrownClusterThresholds | Settings per cluster resource file | Integer | True | — | true | — |
BrownIsLowercase | Setting per cluster resource | String | True | — | true | — |
ChunkScheme | Whether BIO, BILOU, IOB2, etc. | String | True | — | false | — |
EmbeddingDimensionalities | — | Integer | False | — | false | — |
Features | Which features to use | String | True | — | true | — |
ForceNewSentenceOnLineBreaks | — | Boolean | False | — | false | — |
InferenceMethod | — | String | False | — | false | — |
IsLowercaseWordEmbeddings | — | Boolean | False | — | false | — |
KeepOriginalFileTokenizationAndSentenceSplitting | — | Boolean | False | — | false | — |
Labels | Which labels to output | String | True | — | true | — |
LinkScoreThreshold | — | Float | False | — | false | — |
MinWordAppThresholdsForEmbeddings | — | Integer | False | — | false | — |
NormalizationConstantsForEmbeddings | — | Float | False | — | false | — |
NormalizationMethodsForEmbeddings | — | String | False | — | false | — |
NormalizeTitleText | — | Boolean | True | — | false | — |
PredictionConfidenceThreshold | — | Integer | False | — | false | — |
ThresholdPrediction | — | Boolean | False | — | false | — |
TokenizationScheme | — | String | True | — | false | — |
LayerComparator
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Compares annotations in two different layers.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
outFile | — | org.bibliome.util.streams.TargetStream | True | — | — | — |
predictedLayerName | — | java.lang.String | True | — | — | — |
referenceLayerName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
Linguistic Simplifier
Category: Uncategorized
Framework: GATE
Version: unknown
A processing resource that takes document and corpus parameters
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
gazetteerURL | — | java.net.URL | — | resources/gazetteer/lists.def | — | — |
japeURL | — | java.net.URL | — | resources/jape/main.jape | — | — |
nounVerbMapURL | — | java.net.URL | — | resources/noun_verb.csv | — | — |
wordNet | — | gate.wordnet.WordNet | — | — | — | true |
Linguistic Simplifier
Category: Uncategorized
Framework: GATE
Version: unknown
Example application for the linguistic simplifier
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu | — | java.util.List | — | — | — | — |
pipelineURL | — | java.net.URL | — | — | — | — |
Lupedia Service PR
Category: Uncategorized
Framework: GATE
Version: unknown
Runs the Lupedia annotation service on a GATE document
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
caseSensitive | — | java.lang.Boolean | — | true | — | true |
corpus | — | gate.Corpus | — | — | — | true |
datasets | — | java.util.List | — | Person;Event;Place;Organisation;Work | — | true |
document | — | gate.Document | — | — | — | true |
keepFirstAndLongestMatch | — | java.lang.Boolean | — | true | — | true |
keepHighest | — | java.lang.Boolean | — | true | — | true |
keepSpecific | — | java.lang.Boolean | — | true | — | true |
lang | — | gate.lupedia.Language | — | en | — | true |
outputASName | — | java.lang.String | — | — | — | true |
singleGreedyMatch | — | java.lang.Boolean | — | false | — | true |
skipShortWords | — | java.lang.Boolean | — | true | — | true |
skipStopWords | — | java.lang.Boolean | — | true | — | true |
threshold | — | java.lang.Double | — | 0.70 | — | true |
Majority-vote consensus builder (annotation)
Category: Uncategorized
Framework: GATE
Version: unknown
Process results of a crowd annotation task to find where annotators agree and disagree.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
consensusASName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
disputeASName | — | java.lang.String | — | crowdDisputed | — | true |
document | — | gate.Document | — | — | — | true |
minimumAgreement | — | java.lang.Integer | — | — | — | true |
resultASName | — | java.lang.String | — | crowdResults | — | true |
resultAnnotationType | — | java.lang.String | — | — | — | true |
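The catalog does not describe how agreement is computed, so the following Python sketch shows only one plausible reading of majority voting with a minimum-agreement threshold. The function name, the tie handling, and the example data are assumptions, not the GATE plugin's actual logic.

```python
from collections import Counter


def majority_vote(judgments, minimum_agreement):
    """Return the winning label if exactly one label has the most votes and
    it reaches minimum_agreement votes; otherwise return None (disputed)."""
    ranked = Counter(judgments).most_common()
    if not ranked:
        return None
    top_label, top_votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if top_votes >= minimum_agreement and not tied:
        return top_label
    return None


# Hypothetical crowd judgments per unit (unit id -> labels from annotators).
units = {
    "u1": ["Person", "Person", "Location"],
    "u2": ["Person", "Location"],
}

consensus = {}
disputed = []
for unit_id, labels in units.items():
    winner = majority_vote(labels, minimum_agreement=2)
    if winner is None:
        disputed.append(unit_id)
    else:
        consensus[unit_id] = winner

print(consensus, disputed)
```

In this sketch, agreed units would correspond to the consensus annotation set and the rest to the dispute set.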
MergeLayers
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Creates a new layer in each section containing all annotations in source layers.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sourceLayerNames | — | java.lang.String[] | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
MergeSections
Category: Uncategorized
Framework: AlvisNLP
Version:
Merge several sections into a single one.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
fragmentLayerName | — | java.lang.String | False | — | — | — |
fragmentSelection | — | org.bibliome.alvisnlp.modules.clone.FragmentSelection | True | — | — | — |
fragmentSeparator | — | java.lang.String | True | — | — | — |
removeSections | — | java.lang.Boolean | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionSeparator | — | java.lang.String | True | — | — | — |
sectionsLayerName | — | java.lang.String | False | — | — | — |
targetSectionName | — | java.lang.String | True | — | — | — |
MetaMap Annotator
Category: Uncategorized
Framework: GATE
Version: unknown
This plugin uses the MetaMap Java API to send GATE document content to MetaMap skrmedpostctl server and PrologBeans mmserver instances running on the given machine/port
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotNormalize | — | gate.metamap.AnnotNormalizeMode | — | None | — | true |
annotateNegEx | — | java.lang.Boolean | — | false | — | true |
annotatePhrases | — | java.lang.Boolean | — | false | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
excludeIfContains | — | java.util.ArrayList | — | — | — | true |
excludeIfWithin | — | java.util.ArrayList | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
inputASTypeFeature | — | java.lang.String | — | — | — | true |
inputASTypes | — | java.util.ArrayList | — | — | — | true |
metaMapOptions | — | java.lang.String | — | -Xy | — | true |
outputASName | — | java.lang.String | — | — | — | true |
outputASType | — | java.lang.String | — | MetaMap | — | true |
outputMode | — | gate.metamap.OutputMode | — | HighestMappingOnly | — | true |
taggerMode | — | gate.metamap.TaggerMode | — | CoReference | — | true |
MetaphonePhoneticTranscriptor
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
Metaphone phonetic transcription based on Apache Commons Codec. Works for English.
MutationFinder
Category: Uncategorized
Framework: GATE
Version: unknown
GATE MutationFinder Wrapper
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
regexURL | — | java.net.URL | — | resources/regex.txt | — | — |
NGramAnnotator
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
N-gram annotator.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
N | The length of the n-grams to generate (the "n" in n-gram). | Integer | True | — | false | — |
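As a minimal illustration of what the N parameter controls, the following Python sketch generates all contiguous token n-grams of a fixed length. It is not the DKPro Core code; the function name and token list are assumptions, and the entry does not say whether the component also emits shorter n-grams.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of length n as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["the", "quick", "brown", "fox"]
bigrams = ngrams(tokens, 2)
print(bigrams)
```

A sequence shorter than n yields an empty list.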
NGrams
Category: Uncategorized
Framework: AlvisNLP
Version: 2012-04-30
Computes annotation n-grams.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
keepAnnotations | — | java.lang.String[] | True | — | — | — |
maxNGramSize | — | java.lang.Integer | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | False | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
tokenLayerName | — | java.lang.String | True | — | — | — |
NeMine
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 0.0.1-SNAPSHOT
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
threshold | — | Float | True | — | false | — |
NewCount
Category: Uncategorized
Framework: AlvisNLP
Version: 2012-04-30
Counts element occurrences and writes the results in a file, including tfidf.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
countFile | — | org.bibliome.util.streams.TargetStream | False | — | — | — |
documents | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
featureKey | — | java.lang.String | True | — | — | — |
headers | — | java.lang.Boolean | False | — | — | — |
target | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
tfidfFile | — | org.bibliome.util.streams.TargetStream | False | — | — | — |
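The entry does not state which tf-idf weighting the module uses. As a hedged illustration only, the Python sketch below computes one common variant, raw term count multiplied by log(N / document frequency). The function name and the sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute tf-idf per document: count(term, doc) * ln(N / df(term))."""
    n_docs = len(documents)
    document_frequency = Counter()
    for doc in documents:
        document_frequency.update(set(doc))  # count each term once per doc
    weights = []
    for doc in documents:
        term_counts = Counter(doc)
        weights.append({
            term: count * math.log(n_docs / document_frequency[term])
            for term, count in term_counts.items()
        })
    return weights


docs = [["gene", "gene", "protein"], ["protein", "cell"]]
weights = tf_idf(docs)
print(weights[0])
```

A term that occurs in every document, such as "protein" here, gets weight zero under this variant.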
OBOMapper
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ancestorsFeature | — | java.lang.String | False | — | — | — |
childrenFeature | — | java.lang.String | False | — | — | — |
form | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
idFeature | — | java.lang.String | False | — | — | — |
idKeys | — | java.lang.Boolean | False | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
keepDBXref | — | java.lang.Boolean | False | — | — | — |
nameFeature | — | java.lang.String | False | — | — | — |
oboFiles | — | java.lang.String[] | True | — | — | — |
operator | — | org.bibliome.alvisnlp.modules.mapper.MappingOperator | True | — | — | — |
parentsFeature | — | java.lang.String | False | — | — | — |
pathFeature | — | java.lang.String | False | — | — | — |
target | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
versionFeature | — | java.lang.String | False | — | — | — |
OBOProjector
Category: Uncategorized
Framework: AlvisNLP
Version:
Projects OBO terms and synonyms on sections.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
allUpperCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
allowJoined | — | java.lang.Boolean | False | — | — | — |
ancestorsFeature | — | java.lang.String | False | — | — | — |
caseInsensitive | — | java.lang.Boolean | False | — | — | — |
childrenFeature | — | java.lang.String | False | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
idFeature | — | java.lang.String | False | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
joinDash | — | java.lang.Boolean | False | — | — | — |
keepDBXref | — | java.lang.Boolean | False | — | — | — |
matchStartCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
multipleEntryBehaviour | — | org.bibliome.alvisnlp.modules.trie.MultipleEntryBehaviour | True | — | — | — |
nameFeature | — | java.lang.String | False | — | — | — |
oboFiles | — | java.lang.String[] | True | — | — | — |
parentsFeature | — | java.lang.String | False | — | — | — |
pathFeature | — | java.lang.String | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
skipConsecutiveWhitespaces | — | java.lang.Boolean | False | — | — | — |
skipWhitespace | — | java.lang.Boolean | False | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.trie.Subject | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
trieSink | — | org.bibliome.util.files.OutputFile | False | — | — | — |
trieSource | — | org.bibliome.util.files.InputFile | False | — | — | — |
versionFeature | — | java.lang.String | False | — | — | — |
wordStartCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
OWLIM Ontology
Category: Uncategorized
Framework: GATE
Version: unknown
Ontology created as a temporary OWLIM3 in-memory repository
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
baseURI | — | java.lang.String | — | — | — | — |
dataDirectoryURL | — | java.net.URL | — | — | — | — |
loadImports | — | java.lang.Boolean | — | true | — | — |
mappingsURL | — | java.net.URL | — | — | — | — |
n3URL | — | java.net.URL | — | — | — | — |
ntriplesURL | — | java.net.URL | — | — | — | — |
persistent | — | java.lang.Boolean | — | false | — | — |
rdfXmlURL | — | java.net.URL | — | — | — | — |
turtleURL | — | java.net.URL | — | — | — | — |
OWLIM Ontology DEPRECATED
Category: Uncategorized
Framework: GATE
Version: unknown
Ontology created as a temporary OWLIM3 in-memory repository, for backwards compatibility only
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
baseURI | — | java.lang.String | — | — | — | — |
dataDirectoryURL | — | java.net.URL | — | — | — | — |
defaultNameSpace | — | java.lang.String | — | — | — | — |
loadImports | — | java.lang.Boolean | — | true | — | — |
mappingsURL | — | java.net.URL | — | — | — | — |
n3URL | — | java.net.URL | — | — | — | — |
ntriplesURL | — | java.net.URL | — | — | — | — |
persistent | — | java.lang.Boolean | — | false | — | — |
rdfXmlURL | — | java.net.URL | — | — | — | — |
turtleURL | — | java.net.URL | — | — | — | — |
OntoReif
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
OpenNLPNEDetector
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Detects named entities in text and creates corresponding entity annotations that span the found entities. Uses the OpenNLP MaxEnt named entity detector. Each entity class has a separate MaxEnt model file. All model files must be stored in a single model file directory and use the following naming convention: "class.bin.gz", where "class" is the entity class name and ".bin.gz" must appear as shown, e.g., "person.bin.gz". This analysis engine takes a parameter called "EntityTypeMappings", which maps each entity class name to an entity annotation type. The entity class name must match a model file in the model file directory, and the entity annotation type must be defined in the type system and have a corresponding JCas Java class.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
EntityTypeMappings | Mapping from entity names (obtained from the model filename) to the JCas class for the corresponding annotation. Each mapping string is of the form "name,class", i.e., the entity type name followed by a comma followed by the annotation class. | String | False | — | true | — |
ModelDirectory | — | String | True | — | false | — |
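The "name,class" mapping strings and the "class.bin.gz" naming convention can be illustrated with a short Python sketch. This is not the component's code: the helper names, the `models` directory and the `org.example.types` class names are hypothetical.

```python
from pathlib import Path

MODEL_SUFFIX = ".bin.gz"  # suffix required by the naming convention


def parse_entity_type_mappings(mappings):
    """Parse strings of the form "name,class" into a dict name -> class."""
    result = {}
    for entry in mappings:
        name, separator, annotation_class = entry.partition(",")
        name, annotation_class = name.strip(), annotation_class.strip()
        if not separator or not name or not annotation_class:
            raise ValueError(f"expected 'name,class', got {entry!r}")
        result[name] = annotation_class
    return result


def model_path(model_directory, entity_class):
    """Build the expected model file path for one entity class."""
    return Path(model_directory) / f"{entity_class}{MODEL_SUFFIX}"


# Hypothetical annotation classes and model directory.
mappings = parse_entity_type_mappings([
    "person,org.example.types.Person",
    "location,org.example.types.Location",
])
paths = {name: model_path("models", name) for name in mappings}
print(paths["person"].name)
```

Each key of the parsed mapping must correspond to a model file in the model directory, here "person.bin.gz" and "location.bin.gz".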
OpenNLPSentenceDetector
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Detects sentence boundaries and creates sentence annotations that span these boundaries. Uses the OpenNLP MaxEnt Sentence Detector.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ModelFile | Filename of the model file. | String | True | — | false | — |
OrthoRef
Category: Uncategorized
Framework: GATE
Version: unknown
An orthographic coreferencer
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
configFileUrl | — | java.net.URL | — | resources/default-config.coref.xml | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
maxLookBehind | — | java.lang.Integer | — | 10 | — | true |
OscarMER
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Runs Oscar 3 with its maximum-entropy-based recogniser, taking syntactic tokens as input
PMI Bank
Category: Uncategorized
Framework: GATE
Version: unknown
Pointwise Mutual Information from corpora
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
allowOverlapCollocations | — | java.lang.Boolean | — | false | — | — |
corpora | — | java.util.Set | — | — | — | — |
debugMode | — | java.lang.Boolean | — | false | — | — |
innerAnnotationTypes | — | java.util.Set | — | Entity | — | — |
inputASName | — | java.lang.String | — | — | — | — |
inputAnnotationFeature | — | java.lang.String | — | canonical | — | — |
languageFeature | — | java.lang.String | — | lang | — | — |
outerAnnotationType | — | java.lang.String | — | Sentence | — | — |
outerAnnotationWindow | — | java.lang.Integer | — | 2 | — | — |
requireTypeDifference | — | java.lang.Boolean | — | false | — | — |
scoreProperty | — | java.lang.String | — | pmiScore | — | — |
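For orientation, pointwise mutual information of two terms x and y is log(p(x, y) / (p(x) p(y))). The Python sketch below estimates it from co-occurrence within context windows (for example, sentences). It is an illustration under assumptions, not the TermRaider implementation: the function name, the base-2 logarithm and the probability estimates (counts divided by the number of windows) are choices made here.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(windows):
    """Estimate PMI for every term pair that co-occurs in some window.

    windows: iterable of term collections, one per context window."""
    windows = [set(window) for window in windows]
    n = len(windows)
    term_counts = Counter()
    pair_counts = Counter()
    for window in windows:
        terms = sorted(window)
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))  # pairs in sorted order
    scores = {}
    for (x, y), count_xy in pair_counts.items():
        p_xy = count_xy / n
        p_x = term_counts[x] / n
        p_y = term_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


windows = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"d"}]
scores = pmi_scores(windows)
print(scores)
```

Pairs that never co-occur receive no score in this sketch, since their PMI would be negative infinity.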
PMI Example (English)
Category: Uncategorized
Framework: GATE
Version: unknown
Example application for the PMI (pointwise mutual information) tool
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu | — | java.util.List | — | — | — | — |
pipelineURL | — | java.net.URL | — | — | — | — |
PatternMatcher
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Matches a regular expression-like pattern on the sequence of annotations in a given layer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
actions | — | org.bibliome.alvisnlp.modules.pattern.action.MatchAction[] | True | — | — | — |
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
annotationComparator | — | alvisnlp.corpus.AnnotationComparator | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
layerName | — | java.lang.String | True | — | — | — |
overlappingBehaviour | — | org.bibliome.alvisnlp.modules.pattern.OverlappingBehaviour | True | — | — | — |
pattern | — | org.bibliome.alvisnlp.modules.pattern.ElementPattern | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ProminentConceptReporter
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
conceptAnnotations | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
conceptId | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
documents | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionName | — | java.lang.String | True | — | — | — |
Quality Assurance PR
Category: Uncategorized
Framework: GATE
Version: unknown
The Quality Assurance PR provides the functionality of the Corpus QA Tool in GATE Developer
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationTypes | — | java.util.List | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
featureNames | — | java.util.List | — | — | — | true |
keyASName | — | java.lang.String | — | Key | — | true |
measure | — | gate.qa.Measure | — | — | — | true |
outputFolderUrl | — | java.net.URL | — | — | — | true |
responseASName | — | java.lang.String | — | — | — | true |
QuickHTML
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
classFeature | — | java.lang.String | True | — | — | — |
colors | — | java.lang.String[] | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
features | — | java.lang.String[] | False | — | — | — |
layers | — | java.lang.String[] | False | — | — | — |
outDir | — | org.bibliome.util.files.OutputDirectory | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
tagFeature | — | java.lang.String | False | — | — | — |
RO_FDGBank
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.1
This reader transforms the CoNLL tab-separated text format into the CAS ConllDependency format.
Reference Evaluator
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Reports annotation performance comparing views (sofas) to one selected reference view.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
OutputFile | — | String | True | — | false | — |
RegExp
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-09-27
Matches a regular expression on section contents and creates an annotation for each match.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
pattern | — | java.util.regex.Pattern | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
Regex Annotator
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.1
Annotates spans of text based on a custom regular expression.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationType | Fully qualified type of annotations to be produced. The type must extend uima.tcas.Annotation or be uima.tcas.Annotation. | String | True | — | false | — |
caseSensitive | — | Boolean | False | — | false | — |
findFirstOnly | If true, matching will stop after encountering the first match. | Boolean | False | — | false | — |
multilineMatching | If true, the "^" and "$" symbols match the beginnings and ends of lines. Otherwise, they match the beginning and end of the entire text. | Boolean | False | — | false | — |
regularExpression | A valid regular expression. | String | True | — | false | — |
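The effect of the matching options can be sketched in Python. This is an analogy using Python's `re` module rather than the component's own regex engine; the function name, the keyword defaults and the sample text are assumptions (the table lists no default values).

```python
import re


def find_spans(text, regular_expression, case_sensitive=True,
               multiline_matching=False, find_first_only=False):
    """Return (begin, end) character offsets for each match."""
    flags = 0
    if not case_sensitive:
        flags |= re.IGNORECASE
    if multiline_matching:
        flags |= re.MULTILINE  # "^" and "$" then also match at line breaks
    spans = []
    for match in re.finditer(regular_expression, text, flags):
        spans.append((match.start(), match.end()))
        if find_first_only:
            break  # stop after the first match
    return spans


text = "gene A\ngene B"
whole_text = find_spans(text, r"^gene")
per_line = find_spans(text, r"^gene", multiline_matching=True)
first_only = find_spans(text, r"gene", find_first_only=True)
ignore_case = find_spans(text, r"GENE", case_sensitive=False)
print(whole_text, per_line)
```

Without multiline matching only the occurrence at the start of the text is found; with it, the occurrence at the start of the second line is found as well.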
RemoveContents
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
stripLayerName | — | java.lang.String | True | — | — | — |
userFunctions | — | org.bibliome.alvisnlp.library.UserFunction[] | True | — | — | — |
RemoveEquivalent
Category: Uncategorized
Framework: AlvisNLP
Version:
Removes duplicate elements.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
equivalency | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
priority | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
target | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
RemoveOverlaps
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Removes overlapping annotations from a given layer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
annotationComparator | — | alvisnlp.corpus.AnnotationComparator | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
layerName | — | java.lang.String | True | — | — | — |
removeEqual | — | java.lang.Boolean | True | — | — | — |
removeIncluded | — | java.lang.Boolean | True | — | — | — |
removeOverlapping | — | java.lang.Boolean | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
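The parameter names removeEqual, removeIncluded and removeOverlapping suggest three kinds of conflict between two annotations in the same layer. The Python sketch below illustrates one plausible reading of that behaviour; it is not the AlvisNLP implementation. The function name, the `(start, end)` span representation and the preference order (longest first, then leftmost, standing in for annotationComparator) are all assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def remove_overlaps(
    spans: List[Span],
    remove_equal: bool = True,
    remove_included: bool = True,
    remove_overlapping: bool = True,
) -> List[Span]:
    """Greedily keep preferred spans and drop conflicting ones.

    Assumption: longer spans are preferred, ties go to the leftmost span.
    """
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    for s in ordered:
        conflict = False
        for k in kept:
            if s == k:
                conflict = remove_equal            # same boundaries
            elif k[0] <= s[0] and s[1] <= k[1]:
                conflict = remove_included         # s lies inside k
            elif s[0] < k[1] and k[0] < s[1]:
                conflict = remove_overlapping      # partial overlap
            if conflict:
                break
        if not conflict:
            kept.append(s)
    return sorted(kept)


layer = [(0, 5), (0, 5), (1, 3), (4, 8), (10, 12)]
print(remove_overlaps(layer))
```

Switching a flag off keeps the corresponding kind of conflict in the layer, for example `remove_overlaps(layer, remove_overlapping=False)` retains the partially overlapping span `(4, 8)`.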
Romanian Transducer
Category: Uncategorized
Framework: GATE
Version: unknown
A module for executing Jape grammars
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
grammarURL | — | java.net.URL | — | resources/Grammar/main.jape | — | — |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
SFTP BioNLP Shared Task Data Provider
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Reads a corpus in BioNLP Shared Task format from a remote directory on a user-specified server via SFTP.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Password | — | String | True | — | false | — |
RemoteDirectory | — | String | True | — | false | — |
ServerURL | — | String | True | — | false | — |
Username | — | String | True | — | false | — |
SQLImport
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
action | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
addToLayer | — | java.lang.Boolean | False | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
createAnnotations | — | java.lang.Boolean | False | — | — | — |
createDocuments | — | java.lang.Boolean | False | — | — | — |
createRelations | — | java.lang.Boolean | False | — | — | — |
createSections | — | java.lang.Boolean | False | — | — | — |
createTuples | — | java.lang.Boolean | False | — | — | — |
deleteElements | — | java.lang.Boolean | False | — | — | — |
parameters | — | org.bibliome.alvisnlp.modules.sql.SQLParameter[] | True | — | — | — |
password | — | java.lang.String | True | — | — | — |
query | — | java.lang.String | True | — | — | — |
removeFromLayer | — | java.lang.Boolean | False | — | — | — |
setArguments | — | java.lang.Boolean | False | — | — | — |
setFeatures | — | java.lang.Boolean | False | — | — | — |
target | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
url | — | java.lang.String | True | — | — | — |
username | — | java.lang.String | True | — | — | — |
SeSMig
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Detects sentence boundaries and creates one annotation for each sentence. This module assumes WoSMig processed the same sections.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
eosStatusFeature | — | java.lang.String | True | — | — | — |
formFeature | — | java.lang.String | True | — | — | — |
noBreakLayerName | — | java.lang.String | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
strongPunctuations | — | java.lang.String | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
typeFeature | — | java.lang.String | True | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
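As a rough illustration of sentence splitting over already-segmented words, the Python sketch below closes a sentence whenever it meets a token that is a strong punctuation mark. It is a simplified stand-in, not the SeSMig algorithm: the function name, the `(form, start, end)` token tuples and the default punctuation set are assumptions, and parameters such as eosStatusFeature and noBreakLayerName are not modelled.

```python
from typing import List, Tuple

Word = Tuple[str, int, int]  # (surface form, start offset, end offset)


def sentence_spans(words: List[Word], strong_punctuations: str = "?.!") -> List[Tuple[int, int]]:
    """Return one (start, end) span per sentence."""
    spans = []
    first = None      # start offset of the sentence being built
    last_end = None   # end offset of the last word seen
    for form, start, end in words:
        if first is None:
            first = start
        last_end = end
        # A single-character token from the strong set ends the sentence.
        if len(form) == 1 and form in strong_punctuations:
            spans.append((first, end))
            first = None
    if first is not None:
        # Trailing words without final punctuation still form a sentence.
        spans.append((first, last_end))
    return spans


# Words for the text "Hi there. Ok"
words = [("Hi", 0, 2), ("there", 3, 8), (".", 8, 9), ("Ok", 10, 12)]
print(sentence_spans(words))
```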
Search Results
Category: Uncategorized
Framework: GATE
Version: unknown
Viewer for IR search results
SearchPR
Category: Uncategorized
Framework: GATE
Version: unknown
Provides IR functionality.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.creole.ir.IndexedCorpus | — | — | — | true |
fieldNames | — | java.util.ArrayList | — | * | — | true |
limit | — | java.lang.Integer | — | 20 | — | true |
query | — | java.lang.String | — | — | — | true |
searcherClassName | — | java.lang.String | — | gate.creole.ir.lucene.LuceneSearch | — | true |
Sequence_Impl
Category: Uncategorized
Framework: AlvisNLP
Version:
Sequence of modules.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
Show/Hide Resources
Category: Uncategorized
Framework: GATE
Version: unknown
Show resources that would otherwise be hidden, e.g. resources created for internal use by other resources
SimpleProjector
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Projects a simple dictionary on sections.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
dictFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
entryFeatureNames | — | java.lang.String[] | True | — | — | — |
errorDuplicateValues | — | java.lang.Boolean | False | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
ignoreWhitespace | — | java.lang.Boolean | False | — | — | — |
multipleValueAction | — | org.bibliome.alvisnlp.modules.projectors.MultipleValueAction | True | — | — | — |
normalizeSpace | — | java.lang.Boolean | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
separator | — | java.lang.Character | True | — | — | — |
skipBlankLines | — | java.lang.Boolean | True | — | — | — |
strictColumnNumber | — | java.lang.Boolean | False | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.projectors.Subject | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
trimColumns | — | java.lang.Boolean | True | — | — | — |
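To make the idea of projecting a dictionary onto text concrete, here is a small Python sketch. It is not the AlvisNLP code: the function names, the tab separator default and the assumption that the first column holds the entry and the remaining columns its features are illustrative guesses based only on the parameter names above (separator, skipBlankLines, trimColumns, ignoreCase).

```python
from typing import Dict, Iterable, List, Tuple


def parse_dictionary(
    lines: Iterable[str],
    separator: str = "\t",
    skip_blank_lines: bool = True,
    trim_columns: bool = True,
) -> Dict[str, List[str]]:
    """Read separator-delimited lines into {entry: feature values}."""
    entries: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if skip_blank_lines and not line.strip():
            continue
        columns = line.split(separator)
        if trim_columns:
            columns = [c.strip() for c in columns]
        entries[columns[0]] = columns[1:]
    return entries


def project(text: str, entries: Dict[str, List[str]], ignore_case: bool = False) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, features) for every occurrence of every entry."""
    # Note: lower() is assumed not to change string length (true for ASCII).
    haystack = text.lower() if ignore_case else text
    matches = []
    for key, features in entries.items():
        needle = key.lower() if ignore_case else key
        if not needle:
            continue
        pos = haystack.find(needle)
        while pos != -1:
            matches.append((pos, pos + len(needle), features))
            pos = haystack.find(needle, pos + 1)
    return sorted(matches, key=lambda m: (m[0], m[1]))


entries = parse_dictionary(["E. coli\tbacteria\n", "\n", " yeast \tfungus\n"])
print(project("Yeast and e. coli", entries, ignore_case=True))
```

A plain substring scan like this is quadratic in the worst case; a real projector would more likely use a trie or similar index, which this sketch deliberately leaves out.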
SimpleProjector2
Category: Uncategorized
Framework: AlvisNLP
Version:
Projects a simple dictionary on sections.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
allUpperCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
allowJoined | — | java.lang.Boolean | False | — | — | — |
caseInsensitive | — | java.lang.Boolean | False | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
dictFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
joinDash | — | java.lang.Boolean | False | — | — | — |
keyIndex | — | java.lang.Integer[] | True | — | — | — |
matchStartCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
multipleEntryBehaviour | — | org.bibliome.alvisnlp.modules.trie.MultipleEntryBehaviour | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
separator | — | java.lang.Character | True | — | — | — |
skipBlank | — | java.lang.Boolean | False | — | — | — |
skipConsecutiveWhitespaces | — | java.lang.Boolean | False | — | — | — |
skipEmpty | — | java.lang.Boolean | False | — | — | — |
skipWhitespace | — | java.lang.Boolean | False | — | — | — |
strictColumnNumber | — | java.lang.Boolean | True | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.trie.Subject | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
trieSink | — | org.bibliome.util.files.OutputFile | False | — | — | — |
trieSource | — | org.bibliome.util.files.InputFile | False | — | — | — |
trimColumns | — | java.lang.Boolean | False | — | — | — |
valueFeatures | — | java.lang.String[] | True | — | — | — |
wordStartCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
SoundexPhoneticTranscriptor
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
Soundex phonetic transcription based on Apache Commons Codec. Works for English.
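For readers unfamiliar with Soundex, the following Python sketch implements the commonly described American Soundex rules (keep the first letter, map consonants to digits, collapse adjacent equal codes, treat H and W as transparent, pad to four characters). It is an independent illustration, not the Apache Commons Codec code used by this component, and edge-case behaviour may differ from that library.

```python
# Consonant groups and their Soundex digits.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the four-character Soundex code, or "" if there are no letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # vowels reset prev, so a repeated code is kept after them
        if len(code) == 4:
            break
    return code.ljust(4, "0")


for name in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
    print(name, soundex(name))
```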
Species
Category: Uncategorized
Framework: AlvisNLP
Version:
Calls the Species taxon tagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
speciesDir | — | org.bibliome.util.files.InputDirectory | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
taxidFeature | — | java.lang.String | False | — | — | — |
SplitOverlaps
Category: Uncategorized
Framework: AlvisNLP
Version:
Splits overlapping annotations.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
checkedlayerNames | — | java.lang.String[] | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
indexFeatureName | — | java.lang.String | False | — | — | — |
modifiedlayerName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
TermRaider English Term Extraction
Category: Uncategorized
Framework: GATE
Version: unknown
Example application showing typical set-up for the TermRaider tools
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu | — | java.util.List | — | — | — | — |
pipelineURL | — | java.net.URL | — | — | — | — |
Termbank Score Copier
Category: Uncategorized
Framework: GATE
Version: unknown
Copy scores from Termbanks back to their source annotations
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
docFrequencyFeature | — | java.lang.String | — | docFrequency | — | true |
document | — | gate.Document | — | — | — | true |
frequencyFeature | — | java.lang.String | — | frequency | — | true |
termbank | — | gate.termraider.bank.AbstractTermbank | — | — | — | true |
TextRazor Service PR
Category: Uncategorized
Framework: GATE
Version: unknown
Runs the TextRazor annotation service (http://textrazor.com) on a GATE document
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiKey | — | java.lang.String | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
TfIdfTermbank
Category: Uncategorized
Framework: GATE
Version: unknown
TermRaider Termbank derived from vectors in document features
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpora | — | java.util.Set | — | — | — | — |
debugMode | — | java.lang.Boolean | — | false | — | — |
docFreqSource | — | gate.termraider.bank.DocumentFrequencyBank | — | — | — | — |
idDocumentFeature | — | java.lang.String | — | — | — | — |
idfCalculation | — | gate.termraider.modes.IdfCalculation | — | LogarithmicScaled | — | — |
inputASName | — | java.lang.String | — | — | — | — |
inputAnnotationFeature | — | java.lang.String | — | canonical | — | — |
inputAnnotationTypes | — | java.util.Set | — | SingleWord;MultiWord | — | — |
languageFeature | — | java.lang.String | — | lang | — | — |
normalization | — | gate.termraider.modes.Normalization | — | Sigmoid | — | — |
scoreProperty | — | java.lang.String | — | tfIdf | — | — |
tfCalculation | — | gate.termraider.modes.TfCalculation | — | Logarithmic | — | — |
TfidfAnnotator
Category: Uncategorized
Framework: DKPro Core (UIMA)
Version: 1.8.0
This component adds Tfidf annotations consisting of a term and a tfidf weight.
The annotator is type agnostic concerning the input annotation, so you have to specify the
annotation type and string representation. It uses a pre-serialized DfStore, which can be
created using the TfidfConsumer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
featurePath | This annotator is type agnostic, so it is mandatory to specify the type of the working annotation and how to obtain the string representation with the feature path. | String | True | — | false | — |
lowercase | If set to true, the whole text is handled in lower case. | Boolean | False | — | false | — |
tfdfPath | Provide the path to the Df-Model. When a shared SharedDfModel is bound to this annotator, this is ignored. | String | False | — | false | — |
weightingModeIdf | The model for inverse document frequency weighting. Invoke toString() on an enum of WeightingModeIdf for setup. Default value is "NORMAL" yielding an unweighted idf. | String | False | — | false | — |
weightingModeTf | The model for term frequency weighting. Invoke toString() on an enum of WeightingModeTf for setup. Default value is "NORMAL" yielding an unweighted tf. | String | False | — | false | — |
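As background for the weighting parameters, the Python sketch below computes a textbook tf-idf weight: the raw term count in a document multiplied by the logarithm of the inverse document frequency. This is a generic formulation chosen for illustration only. It is not claimed to match any DKPro WeightingModeTf or WeightingModeIdf setting, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def document_frequencies(docs: List[List[str]]) -> Counter:
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def tfidf(doc: List[str], df: Counter, n_docs: int) -> Dict[str, float]:
    """Weight each term of doc as raw tf times log(N / df)."""
    tf = Counter(doc)
    return {
        term: count * math.log(n_docs / df[term])
        for term, count in tf.items()
        if df.get(term)  # skip terms unseen when df was collected
    }


docs = [["a", "b", "a"], ["b", "c"]]
df = document_frequencies(docs)
print(tfidf(docs[0], df, len(docs)))
```

In this toy corpus the term `b` occurs in every document, so its weight is zero, while `a` is weighted by its count times `log(2)`.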
TomapProjector
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
allUpperCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
allowJoined | — | java.lang.Boolean | False | — | — | — |
caseInsensitive | — | java.lang.Boolean | False | — | — | — |
conceptFeature | — | java.lang.String | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
explanationFeaturePrefix | — | java.lang.String | False | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
joinDash | — | java.lang.Boolean | False | — | — | — |
lemmaKeys | — | java.lang.Boolean | False | — | — | — |
matchStartCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
multipleEntryBehaviour | — | org.bibliome.alvisnlp.modules.trie.MultipleEntryBehaviour | True | — | — | — |
onlyMNP | — | java.lang.Boolean | False | — | — | — |
scoreFeature | — | java.lang.String | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
skipConsecutiveWhitespaces | — | java.lang.Boolean | False | — | — | — |
skipWhitespace | — | java.lang.Boolean | False | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.trie.Subject | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
tomapClassifier | — | org.bibliome.alvisnlp.modules.tomap.TomapClassifier | True | — | — | — |
trieSink | — | org.bibliome.util.files.OutputFile | False | — | — | — |
trieSource | — | org.bibliome.util.files.InputFile | False | — | — | — |
wordStartCaseInsensitive | — | java.lang.Boolean | False | — | — | — |
yateaFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
TomapTrain
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
bioYatea | — | java.lang.Boolean | False | — | — | — |
conceptIdentifier | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
configDir | — | org.bibliome.util.files.InputDirectory | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
formFeature | — | java.lang.String | True | — | — | — |
language | — | java.lang.String | False | — | — | — |
lemmaFeature | — | java.lang.String | True | — | — | — |
localeDir | — | org.bibliome.util.files.InputDirectory | False | — | — | — |
outFile | — | org.bibliome.util.streams.TargetStream | True | — | — | — |
outputDir | — | org.bibliome.util.files.OutputDirectory | False | — | — | — |
perlLib | — | java.lang.String | False | — | — | — |
posFeature | — | java.lang.String | True | — | — | — |
postProcessingConfig | — | org.bibliome.util.files.InputFile | False | — | — | — |
postProcessingOutput | — | org.bibliome.util.files.OutputFile | False | — | — | — |
rcFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | True | — | — | — |
suffix | — | java.lang.String | False | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
workingDir | — | org.bibliome.util.files.WorkingDirectory | True | — | — | — |
yateaDefaultConfig | — | alvisnlp.module.types.Mapping | True | — | — | — |
yateaExecutable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
yateaOptions | — | alvisnlp.module.types.Mapping | True | — | — | — |
TyDIProjector
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Projects terms from a TyDI export.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
acronymsFile | — | org.bibliome.util.streams.SourceStream | False | — | — | — |
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
canonicalFormFeature | — | java.lang.String | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
errorDuplicateValues | — | java.lang.Boolean | False | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
ignoreWhitespace | — | java.lang.Boolean | False | — | — | — |
lemmaFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
mergeFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
multipleValueAction | — | org.bibliome.alvisnlp.modules.projectors.MultipleValueAction | True | — | — | — |
normalizeSpace | — | java.lang.Boolean | False | — | — | — |
quasiSynonymsFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
saveDictFile | — | org.bibliome.util.streams.TargetStream | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.projectors.Subject | True | — | — | — |
synonymsFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
typographicVariationsFile | — | org.bibliome.util.streams.SourceStream | False | — | — | — |
Type Mapper
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 0.1
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ignoreMissingSourceType | — | Boolean | False | — | false | — |
ignoreMissingTargetType | — | Boolean | False | — | false | — |
mappingDefinition | Definition of mappings from source types to target types. | String | False | — | false | — |
UAICLemmav1
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Assigns base forms to tokenised text. Also assigns certain parts of speech.
UAICLemmav2
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 1.0
Assigns base forms in Romanian text, given POS-tagged text.
UMLS Full Dictionary Feature Extractor
Category: Uncategorized
Framework: NaCTeM (UIMA)
Version: 0.0.1-SNAPSHOT
Extracts Dictionary features from a UMLS-sourced dictionary
WapitiLabel
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
commandLineOptions | — | java.lang.String[] | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
features | — | alvisnlp.corpus.expressions.Expression[] | True | — | — | — |
labelFeature | — | java.lang.String | True | — | — | — |
modelFile | — | org.bibliome.util.files.InputFile | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | False | — | — | — |
tokenLayerName | — | java.lang.String | True | — | — | — |
wapitiExecutable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
WapitiTrain
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
commandLineOptions | — | java.lang.String[] | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
features | — | alvisnlp.corpus.expressions.Expression[] | True | — | — | — |
modelFile | — | org.bibliome.util.files.OutputFile | True | — | — | — |
modelType | — | java.lang.String | False | — | — | — |
patternFile | — | org.bibliome.util.files.InputFile | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | False | — | — | — |
tokenLayerName | — | java.lang.String | True | — | — | — |
trainAlgorithm | — | java.lang.String | False | — | — | — |
wapitiExecutable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
WoSMig
Category: Uncategorized
Framework: AlvisNLP
Version: 2010-10-28
Performs word segmentation on section contents.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
annotationComparator | — | alvisnlp.corpus.AnnotationComparator | True | — | — | — |
annotationTypeFeature | — | java.lang.String | True | — | — | — |
balancedPunctuations | — | java.lang.String | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
fixedFormLayerName | — | java.lang.String | False | — | — | — |
fixedType | — | java.lang.String | True | — | — | — |
punctuationType | — | java.lang.String | True | — | — | — |
punctuations | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
wordType | — | java.lang.String | True | — | — | — |
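The Python sketch below shows one simple way to segment text into word and punctuation tokens with character offsets and a type label, in the spirit of the punctuations, balancedPunctuations, wordType and punctuationType parameters. It is an illustrative approximation, not the WoSMig algorithm: the function name, the default character sets and the type labels are assumptions, and fixed forms (fixedFormLayerName) are not handled.

```python
from typing import List, Tuple

Token = Tuple[int, int, str]  # (start, end, type)


def segment(
    text: str,
    punctuations: str = "?.!;,:",
    balanced_punctuations: str = "()[]{}\"",
) -> List[Token]:
    """Split on whitespace and emit each punctuation mark as its own token."""
    tokens: List[Token] = []
    start = None  # start offset of the word being read, if any
    for i, ch in enumerate(text):
        is_punct = ch in punctuations or ch in balanced_punctuations
        if ch.isspace() or is_punct:
            if start is not None:
                tokens.append((start, i, "word"))
                start = None
            if is_punct:
                tokens.append((i, i + 1, "punctuation"))
        elif start is None:
            start = i
    if start is not None:
        tokens.append((start, len(text), "word"))
    return tokens


print(segment("Hi, (you)."))
```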
WordNet
Category: Uncategorized
Framework: GATE
Version: unknown
WordNet
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
propertyUrl | — | java.net.URL | — | — | — | — |
WordNet 1.6
Category: Uncategorized
Framework: GATE
Version: unknown
Princeton WordNet 1.6.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
propertyUrl | — | java.net.URL | — | — | — | — |
YateaProjector
Category: Uncategorized
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
errorDuplicateValues | — | java.lang.Boolean | False | — | — | — |
head | — | java.lang.String | True | — | — | — |
ignoreCase | — | java.lang.Boolean | False | — | — | — |
ignoreDiacritics | — | java.lang.Boolean | False | — | — | — |
ignoreWhitespace | — | java.lang.Boolean | False | — | — | — |
mnpOnly | — | java.lang.Boolean | False | — | — | — |
modifier | — | java.lang.String | True | — | — | — |
monoHeadId | — | java.lang.String | True | — | — | — |
multipleValueAction | — | org.bibliome.alvisnlp.modules.projectors.MultipleValueAction | True | — | — | — |
normalizeSpace | — | java.lang.Boolean | False | — | — | — |
projectLemmas | — | java.lang.Boolean | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
subject | — | org.bibliome.alvisnlp.modules.projectors.Subject | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
termId | — | java.lang.String | True | — | — | — |
termLemma | — | java.lang.String | False | — | — | — |
termPOS | — | java.lang.String | False | — | — | — |
yateaFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
Zemanta Service PR
Category: Uncategorized
Framework: GATE
Version: unknown
Runs a Zemanta annotation service on a GATE document
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiKey | — | java.lang.String | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
numberOfSentencesInBatch | — | java.lang.Integer | — | — | — | true |
numberOfSentencesInContext | — | java.lang.Integer | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
Chunker (7)
ANNIE VP Chunker
Category: Chunker
Framework: GATE
Version: unknown
ANNIE VP Chunker component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
enableDebugging | — | java.lang.Boolean | — | false | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
grammarURL | — | java.net.URL | — | ../ANNIE/resources/VP/VerbGroups.jape | — | — |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
Noun Phrase Chunker
Category: Chunker
Framework: GATE
Version: unknown
Ready-made NP chunking application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu | — | java.util.List | — | — | — | — |
pipelineURL | — | java.net.URL | — | — | — | — |
Noun Phrase Chunker
Category: Chunker
Framework: GATE
Version: unknown
Implementation of the Ramshaw and Marcus base noun phrase chunker
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationName | — | java.lang.String | — | NounChunk | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
posFeature | — | java.lang.String | — | category | — | true |
posTagURL | — | java.net.URL | — | pos_tag_dict | — | — |
rulesURL | — | java.net.URL | — | rules | — | — |
unknownTag | — | java.lang.String | — | I | — | true |
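Ramshaw and Marcus describe base noun phrases with per-token tags: `I` (inside a chunk), `O` (outside), and `B` (first word of a chunk that immediately follows another chunk). The Python sketch below turns such a tag sequence into chunk spans over token indices. It illustrates the tagging scheme only; whether this GATE component represents its intermediate result in exactly this way is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple


def chunks_from_iob(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert I/O/B tags to (start, end) token spans, end exclusive."""
    spans = []
    start = None  # index where the open chunk began, if any
    for i, tag in enumerate(tags):
        if tag == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        elif tag == "B":
            # B closes the previous chunk and opens a new one at once.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:
                start = i
        else:
            raise ValueError(f"unexpected tag: {tag!r}")
    if start is not None:
        spans.append((start, len(tags)))
    return spans


print(chunks_from_iob(["I", "I", "O", "I", "B", "I", "O"]))
```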
OpenNLP Chunker
Category: Chunker
Framework: GATE
Version: unknown
Chunker using an OpenNLP maxent model
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
chunkFeature | — | java.lang.String | — | chunk | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
model | — | java.net.URL | — | models/english/en-chunker.bin | — | — |
outputASName | — | java.lang.String | — | — | — | true |
OpenNlpChunker
Category: Chunker
Framework: DKPro Core (UIMA)
Version: 1.8.0
Chunk annotator using OpenNLP.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ChunkMappingLocation | Load the chunk tag to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
internTags | Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true | Boolean | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
TreeTaggerChunker
Category: Chunker
Framework: DKPro Core (UIMA)
Version: 1.8.0
Chunk annotator using TreeTagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ChunkMappingLocation | Location of the mapping file for chunk tags to UIMA types. | String | False | — | false | — |
executablePath | Use this TreeTagger executable instead of trying to locate the executable automatically. | String | False | — | false | — |
flushSequence | A sequence to flush the internal TreeTagger buffer and to force it to output the rest of the completed analysis. This is typically just a sequence of 5-10 full stops (".") separated by newline characters. However, some models may require a different flush sequence, e.g. a short sentence in the respective language. For chunker models, mind that the sentence must also be POS tagged, e.g. Nous-PRO:PER\n…. | String | False | — | false | — |
internTags | Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true | Boolean | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
performanceMode | TT4J setting: Disable some sanity checks, e.g. whether tokens contain line breaks (which is not allowed). Turning this on will increase your performance, but the wrapper may throw exceptions if illegal data is provided. | Boolean | True | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
Classifier (8)
Entity Classification Job Builder
Category: Classifier
Framework: GATE
Version: unknown
Build a CrowdFlower job asking users to select the right label for entities
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiKey | — | java.lang.String | — | — | — | — |
contextASName | — | java.lang.String | — | — | — | true |
contextAnnotationType | — | java.lang.String | — | Sentence | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
entityASName | — | java.lang.String | — | — | — | true |
entityAnnotationType | — | java.lang.String | — | Mention | — | true |
jobId | — | java.lang.Long | — | — | — | true |
skipExisting | — | java.lang.Boolean | — | true | — | true |
Entity Classification Results Importer
Category: Classifier
Framework: GATE
Version: unknown
Import judgments from a CrowdFlower job created by the Entity Classification Job Builder as GATE annotations.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
answerFeatureName | — | java.lang.String | — | answer | — | true |
apiKey | — | java.lang.String | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
entityASName | — | java.lang.String | — | — | — | true |
entityAnnotationType | — | java.lang.String | — | Mention | — | true |
jobId | — | java.lang.Long | — | — | — | true |
resultASName | — | java.lang.String | — | crowdResults | — | true |
resultAnnotationType | — | java.lang.String | — | Mention | — | true |
Majority-vote consensus builder (classification)
Category: Classifier
Framework: GATE
Version: unknown
Process results of a crowd annotation task to find where annotators agree and disagree.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
answerFeatureName | — | java.lang.String | — | answer | — | true |
consensusASName | — | java.lang.String | — | crowdConsensus | — | true |
corpus | — | gate.Corpus | — | — | — | true |
disputeASName | — | java.lang.String | — | crowdDisputed | — | true |
document | — | gate.Document | — | — | — | true |
entityAnnotationType | — | java.lang.String | — | Mention | — | true |
minimumAgreement | — | java.lang.Integer | — | — | — | true |
noAgreementAction | — | gate.crowdsource.classification.MajorityVoteClassificationConsensus$Action | — | resolveLocally | — | true |
originalEntityASName | — | java.lang.String | — | — | — | true |
resultASName | — | java.lang.String | — | crowdResults | — | true |
resultAnnotationType | — | java.lang.String | — | Mention | — | true |
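
As a rough illustration of what a majority-vote consensus step does, the sketch below splits crowd judgments into an agreed set and a disputed set. It is not the GATE plugin's implementation: the function name, the tie rule, and the reading of minimumAgreement as a minimum number of identical answers are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(answers, minimum_agreement):
    """Return (label, True) when one answer wins clearly, else (None, False).

    Assumption: a label wins if it has at least `minimum_agreement` votes
    and no other label has the same number of votes.
    """
    counts = Counter(answers).most_common()
    if not counts:
        return None, False
    top_label, top_votes = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_votes
    if top_votes >= minimum_agreement and not tied:
        return top_label, True
    return None, False


# Hypothetical judgments: mention id -> answers given by crowd workers.
judgments = {
    "mention-1": ["Person", "Person", "Location"],
    "mention-2": ["Person", "Location"],
}

consensus, disputed = {}, []
for mention_id, answers in judgments.items():
    label, agreed = majority_vote(answers, minimum_agreement=2)
    if agreed:
        consensus[mention_id] = label   # would go to the consensus set
    else:
        disputed.append(mention_id)     # would go to the disputed set
```

In this sketch the two result containers play the role of the annotation sets named by consensusASName and disputeASName.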
SelectingElementClassifier
Category: Classifier
Framework: AlvisNLP
Version: 2012-04-30
Searches for discriminating attributes with Weka.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
evaluationFile | — | org.bibliome.util.streams.TargetStream | True | — | — | — |
evaluator | — | java.lang.String | True | — | — | — |
evaluatorOptions | — | java.lang.String[] | False | — | — | — |
examples | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
relationDefinition | — | org.bibliome.alvisnlp.modules.classifiers.RelationDefinition | True | — | — | — |
search | — | java.lang.String | False | — | — | — |
searchOptions | — | java.lang.String[] | False | — | — | — |
TaggingElementClassifier
Category: Classifier
Framework: AlvisNLP
Version: 2012-04-30
Classifies elements with a Weka classifier.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
classifierFile | — | java.io.File | True | — | — | — |
evaluationFile | — | org.bibliome.util.streams.TargetStream | False | — | — | — |
examples | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
predictedClassFeatureKey | — | java.lang.String | True | — | — | — |
relationDefinition | — | org.bibliome.alvisnlp.modules.classifiers.RelationDefinition | True | — | — | — |
Text Categorization PR
Category: Classifier
Framework: GATE
Version: unknown
Classify text based on a semantic space
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
categoriesURL | — | java.net.URL | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
inputAnnotationType | — | java.lang.String | — | Sentence | — | true |
inputFeatureName | — | java.lang.String | — | root | — | true |
modelURL | — | java.net.URL | — | — | — | — |
outputASName | — | java.lang.String | — | — | — | true |
outputAnnotationType | — | java.lang.String | — | Sentence | — | true |
outputFeatureName | — | java.lang.String | — | category | — | true |
sematicSpaceURL | — | java.net.URL | — | — | — | — |
stopWordsURL | — | java.net.URL | — | — | — | — |
tokenAnnotationType | — | java.lang.String | — | Token | — | true |
Textalytics Text Classification
Category: Classifier
Framework: GATE
Version: unknown
Textalytics Text Classification
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiURL | — | java.lang.String | — | — | true | |
categories | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
debug | — | java.lang.Boolean | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASTypes | — | java.util.List | — | — | — | true |
inputASname | — | java.lang.String | — | — | — | true |
key | — | java.lang.String | — | — | — | true |
model | — | java.lang.String | — | — | — | true |
outputASname | — | java.lang.String | — | Textalytics | — | true |
title | — | java.lang.String | — | — | — | true |
verbose | — | java.lang.Boolean | — | — | — | true |
TrainingElementClassifier
Category: Classifier
Framework: AlvisNLP
Version: 2012-04-30
Trains a Weka classifier where examples are elements.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
algorithm | — | java.lang.String | True | — | — | — |
arffFile | — | org.bibliome.util.streams.TargetStream | False | — | — | — |
classifierFile | — | java.io.File | True | — | — | — |
classifierInfoFile | — | org.bibliome.util.streams.TargetStream | False | — | — | — |
classifierOptions | — | java.lang.String[] | False | — | — | — |
crossFolds | — | java.lang.Integer | False | — | — | — |
evaluationFile | — | org.bibliome.util.streams.TargetStream | False | — | — | — |
examples | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
foldFeatureKey | — | java.lang.String | False | — | — | — |
predictedClassFeatureKey | — | java.lang.String | False | — | — | — |
randomSeed | — | java.lang.Long | True | — | — | — |
relationDefinition | — | org.bibliome.alvisnlp.modules.classifiers.RelationDefinition | True | — | — | — |
Coreference (3)
ANNIE Nominal Coreferencer
Category: Coreference
Framework: GATE
Version: unknown
Nominal Coreference resolution component
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
ANNIE Pronominal Coreferencer
Category: Coreference
Framework: GATE
Version: unknown
Pronominal Coreference resolution component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inanimatedEntityTypes | — | java.lang.String | — | Organization;Location | — | true |
resolveIt | — | java.lang.Boolean | — | false | — | true |
StanfordCoreferenceResolver
Category: Coreference
Framework: DKPro Core (UIMA)
Version: 1.8.0
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
maxDist | DCoRef parameter: Maximum sentence distance between two mentions for resolution (-1: no constraint on the distance) | Integer | True | — | false | — |
postprocessing | DCoRef parameter: Do post processing | Boolean | True | — | false | — |
score | DCoRef parameter: Scoring the output of the system | Boolean | True | — | false | — |
sieves | DCoRef parameter: Sieve passes - each class is defined in dcoref/sievepasses/. | String | True | — | false | — |
singleton | DCoRef parameter: setting singleton predictor | Boolean | True | — | false | — |
CrowdSourcing (1)
Entity Annotation Job Builder
Category: CrowdSourcing
Framework: GATE
Version: unknown
Build a CrowdFlower job asking users to annotate entities within a snippet of text
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiKey | — | java.lang.String | — | — | — | — |
corpus | — | gate.Corpus | — | — | — | true |
detailFeatureName | — | java.lang.String | — | detail | — | true |
document | — | gate.Document | — | — | — | true |
entityASName | — | java.lang.String | — | — | — | true |
entityAnnotationType | — | java.lang.String | — | — | — | true |
goldFeatureName | — | java.lang.String | — | gold | — | true |
goldFeatureValue | — | java.lang.String | — | yes | — | true |
goldReasonFeatureName | — | java.lang.String | — | reason | — | true |
jobId | — | java.lang.Long | — | — | — | true |
skipExisting | — | java.lang.Boolean | — | true | — | true |
snippetASName | — | java.lang.String | — | — | — | true |
snippetAnnotationType | — | java.lang.String | — | Sentence | — | true |
tokenASName | — | java.lang.String | — | — | — | true |
tokenAnnotationType | — | java.lang.String | — | Token | — | true |
Developers/Debugging (9)
DependencyDumper
Category: Developers/Debugging
Framework: DKPro Core (UIMA)
Version: 1.8.0
Dump dependencies to screen.
DocumentMetaDataStripper
Category: Developers/Debugging
Framework: DKPro Core (UIMA)
Version: 1.8.0
Removes fields from the document meta data which may be different depending on the machine a test is run on.
EDT Monitor
Category: Developers/Debugging
Framework: GATE
Version: unknown
Warns whenever an AWT component is updated from anywhere other than the event dispatch thread
JCasHolder
Category: Developers/Debugging
Framework: DKPro Core (UIMA)
Version: 1.8.0
Utility analysis engine for use with CAS multipliers in uimaFIT pipelines.
Java Heap Dumper
Category: Developers/Debugging
Framework: GATE
Version: unknown
Dumps the Java heap to the specified file
Log4J Level: ALL
Category: Developers/Debugging
Framework: GATE
Version: unknown
Allows the Log4J log level to be set to ALL from within the GUI
Stopwatch
Category: Developers/Debugging
Framework: DKPro Core (UIMA)
Version: 1.8.0
Can be used to measure how long the processing between two points in a pipeline takes. For that purpose, the AE needs to be added two times, before and after the part of the pipeline that should be measured.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
timerName | Name of the timer pair. Upstream and downstream timer need to use the same name. | String | True | — | false | — |
timerOutputFile | Name of the timer pair. Upstream and downstream timer need to use the same name. | String | False | — | false | — |
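
The paired-timer idea described for Stopwatch can be sketched independently of UIMA. The `Stopwatch` class below is a hypothetical stand-in, not the DKPro Core component: the first call with a given timer name starts the clock, and the second call with the same name stops it and records the elapsed time.

```python
import time


class Stopwatch:
    """Hypothetical stand-in for a pair of timer components sharing a name."""

    def __init__(self):
        self._started = {}   # timer name -> start time
        self.elapsed = {}    # timer name -> seconds between the two marks

    def mark(self, timer_name):
        now = time.perf_counter()
        if timer_name in self._started:
            # Second (downstream) mark: stop the timer and store the duration.
            self.elapsed[timer_name] = now - self._started.pop(timer_name)
        else:
            # First (upstream) mark: start the timer.
            self._started[timer_name] = now


watch = Stopwatch()
watch.mark("tagging")        # placed before the measured part of the pipeline
sum(range(100_000))          # stands in for the measured processing
watch.mark("tagging")        # placed after it, using the same timer name
```

The shared name is what links the two marks, which mirrors the requirement that the upstream and downstream timer use the same timerName.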
TagsetDescriptionStripper
Category: Developers/Debugging
Framework: DKPro Core (UIMA)
Version: 1.8.0
Evaluation (2)
CompareElements
Category: Evaluation
Framework: AlvisNLP
Version: 2012-04-30
Compares two sets of elements.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
face | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
outFile | — | org.bibliome.util.streams.TargetStream | True | — | — | — |
predicted | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
reference | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sections | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
showFullMatches | — | java.lang.Boolean | True | — | — | — |
showPrecision | — | java.lang.Boolean | True | — | — | — |
showRecall | — | java.lang.Boolean | True | — | — | — |
similarity | — | org.bibliome.alvisnlp.modules.compare.ElementSimilarity | True | — | — | — |
IAA Computation PR
Category: Evaluation
Framework: GATE
Version: unknown
Compute inter-annotator agreement (IAA).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annSetsForIaa | — | java.lang.String | — | — | — | true |
annTypesAndFeats | — | java.lang.String | — | — | — | true |
bdmScoreFile | — | java.net.URL | — | — | — | true |
measureType | — | gate.iaaplugin.MeasureType | — | FMEASURE | — | true |
verbosity | — | java.lang.String | — | 1 | — | true |
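
The default measureType is FMEASURE. As a hedged sketch of what an F-measure between two annotators looks like, the example below treats one annotator as the key and the other as the response and counts only exact matches of (start, end, type). This is a simplification for illustration; the function name and the strict matching rule are assumptions and do not reproduce the plugin's own scoring options.

```python
def f_measure(key, response):
    """F1 between two annotation sets, counting exact matches only."""
    key, response = set(key), set(response)
    matches = len(key & response)
    precision = matches / len(response) if response else 0.0
    recall = matches / len(key) if key else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations as (start offset, end offset, type).
annotator_a = {(0, 5, "Person"), (10, 16, "Location"), (20, 27, "Person")}
annotator_b = {(0, 5, "Person"), (10, 16, "Organization"), (20, 27, "Person")}

score = f_measure(annotator_a, annotator_b)  # two of three annotations agree
```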
Filtering (6)
AnnotationByLengthFilter
Category: Filtering
Framework: DKPro Core (UIMA)
Version: 1.8.0
Removes annotations that do not conform to minimum or maximum length constraints. (This was previously called TokenFilter).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
FilterTypes | A set of annotation types that should be filtered. | String | True | — | true | — |
MaxLengthFilter | Any annotation in filterTypes longer than this value will be removed. | Integer | True | — | false | — |
MinLengthFilter | Any annotation in filterTypes shorter than this value will be removed. | Integer | True | — | false | — |
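
A minimal sketch of length-based filtering, assuming the length is measured in characters of the covered text and that both bounds are inclusive (neither detail is stated above). The function name is hypothetical and the sketch works on plain strings rather than UIMA annotations.

```python
def filter_by_length(texts, min_length, max_length):
    """Keep only texts whose length lies within [min_length, max_length]."""
    return [t for t in texts if min_length <= len(t) <= max_length]


tokens = ["a", "of", "tagger", "internationalization"]
kept = filter_by_length(tokens, min_length=2, max_length=10)
```

Here "a" is dropped for being too short and "internationalization" for being too long.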
AnnotationByTextFilter
Category: Filtering
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads a list of words from a text file (one token per line) and retains only tokens or other annotations that match any of these words.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ignoreCase | If true, annotation texts are filtered case-independently. Default: true, i.e. words that occur in the list with different casing are not filtered out. | Boolean | True | — | false | — |
modelEncoding | — | String | True | — | false | — |
modelLocation | — | String | True | — | false | — |
typeName | Annotation type to filter. Default: de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token. | String | True | — | false | — |
Boilerpipe Content Detection
Category: Filtering
Framework: GATE
Version: unknown
Uses boilerpipe to determine which sections of a document are interesting content and which are just boilerplate
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
allContent | — | gate.creole.boilerpipe.Behaviour | — | NOT_LISTED | — | true |
annotateBoilerplate | — | java.lang.Boolean | — | false | — | true |
annotateContent | — | java.lang.Boolean | — | true | — | true |
boilerplateAnnotationName | — | java.lang.String | — | Boilerplate | — | true |
contentAnnotationName | — | java.lang.String | — | Content | — | true |
corpus | — | gate.Corpus | — | — | — | true |
debug | — | java.lang.Boolean | — | false | — | true |
document | — | gate.Document | — | — | — | true |
extractor | — | gate.creole.boilerpipe.Extractor | — | DEFAULT | — | true |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
inputASName | — | java.lang.String | — | — | — | true |
mimeTypes | — | java.util.Set | — | text/html | — | true |
outputASName | — | java.lang.String | — | — | — | true |
useHintsFromOriginalMarkups | — | java.lang.Boolean | — | true | — | true |
PosFilter
Category: Filtering
Framework: DKPro Core (UIMA)
Version: 1.8.0
Removes all tokens/lemmas/stems/POS tags (depending on the "Mode" setting) that do not match the given parts of speech.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Verbs | Keep/remove verbs (true: keep, false: remove) | Boolean | True | — | false | — |
adj | Keep/remove adjectives (true: keep, false: remove) | Boolean | True | — | false | — |
adv | Keep/remove adverbs (true: keep, false: remove) | Boolean | True | — | false | — |
art | Keep/remove articles (true: keep, false: remove) | Boolean | True | — | false | — |
card | Keep/remove cardinal numbers (true: keep, false: remove) | Boolean | True | — | false | — |
conj | Keep/remove conjunctions (true: keep, false: remove) | Boolean | True | — | false | — |
n | Keep/remove nouns (true: keep, false: remove) | Boolean | True | — | false | — |
o | Keep/remove "others" (true: keep, false: remove) | Boolean | True | — | false | — |
pp | Keep/remove prepositions (true: keep, false: remove) | Boolean | True | — | false | — |
pr | Keep/remove pronouns (true: keep, false: remove) | Boolean | True | — | false | — |
punc | Keep/remove punctuation (true: keep, false: remove) | Boolean | True | — | false | — |
typeToRemove | The fully qualified name of the type that should be filtered. | String | True | — | false | — |
RegexTokenFilter
Category: Filtering
Framework: DKPro Core (UIMA)
Version: 1.8.0
Remove every token that does or does not match a given regular expression.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
mustMatch | If this parameter is set to true (default), retain only tokens that match the regex given in the regex parameter. If set to false, all tokens that match the given regex are removed. | Boolean | True | — | false | — |
regex | Every token that does or does not match this regular expression will be removed. | String | True | — | false | — |
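
The interaction between mustMatch and regex can be sketched on a plain list of strings. The function below is a hypothetical illustration, not the DKPro Core code; in particular, matching the whole token (rather than any substring) is an assumption of this example.

```python
import re


def regex_token_filter(tokens, regex, must_match=True):
    """Keep matching tokens if must_match is true, otherwise drop them."""
    pattern = re.compile(regex)
    # Assumption: the expression has to match the entire token text.
    return [t for t in tokens if bool(pattern.fullmatch(t)) == must_match]


tokens = ["NLP", "2014", "tools", "42"]
only_numbers = regex_token_filter(tokens, r"\d+", must_match=True)
no_numbers = regex_token_filter(tokens, r"\d+", must_match=False)
```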
StopWordRemover
Category: Filtering
Framework: DKPro Core (UIMA)
Version: 1.8.0
Remove all of the specified types from the CAS if their covered text is in the stop word dictionary. Also remove any other of the specified types that is covered by a matching instance.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Paths | Feature paths for annotations that should be matched/removed. The default is StopWord.class.getName(), Token.class.getName() and Lemma.class.getName()+"/value". | String | False | — | true | — |
StopWordType | Anything annotated with this type will be removed even if it does not match any word in the lists. | String | False | — | false | — |
modelEncoding | The character encoding used by the model. | String | True | — | false | — |
modelLocation | A list of URLs from which to load the stop word lists. If a URL is prefixed with a language code in square brackets, the stop word list is only used for documents in that language. Using no prefix or the prefix "[*]" causes the list to be used for every document. Example: "[de]classpath:/stopwords/en_articles.txt" | String | True | — | true | — |
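
The language-prefix convention described for modelLocation can be sketched as a small parser. This is an illustrative reading of the description above, not the component's code; the function names and the example locations are hypothetical.

```python
import re

# Optional "[lang]" prefix followed by the actual location.
_PREFIX = re.compile(r"^\[([^\]]+)\](.*)$")


def parse_location(location):
    """Split a location into (language, url); no prefix means every language."""
    match = _PREFIX.match(location)
    if match:
        return match.group(1), match.group(2)
    return "*", location


def lists_for(language, locations):
    """Return the URLs that apply to a document in the given language."""
    selected = []
    for location in locations:
        lang, url = parse_location(location)
        if lang in ("*", language):
            selected.append(url)
    return selected


locations = [
    "[de]classpath:/stopwords/de_articles.txt",   # hypothetical paths
    "[*]classpath:/stopwords/punctuation.txt",
    "classpath:/stopwords/common.txt",
]
german_lists = lists_for("de", locations)
english_lists = lists_for("en", locations)
```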
Flow (8)
Annotation Merging PR
Category: Flow
Framework: GATE
Version: unknown
Merge Annotations from different annotators.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annSetOutput | — | java.lang.String | — | — | — | true |
annSetsForMerging | — | java.lang.String | — | — | — | true |
annTypesAndFeats | — | java.lang.String | — | — | — | true |
document | — | gate.Document | — | — | — | true |
keepSourceForMergedAnnotations | — | java.lang.Boolean | — | true | — | true |
mergingMethod | — | gate.merger.MergingMethodsEnum | — | MajorityVoting | — | true |
minimalAnnNum | — | java.lang.String | — | 1 | — | true |
Annotation Set Transfer
Category: Flow
Framework: GATE
Version: unknown
Annotation set transfer component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationTypes | — | java.util.ArrayList | — | — | — | true |
copyAnnotations | — | java.lang.Boolean | — | false | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
tagASName | — | java.lang.String | — | — | — | true |
textTagName | — | java.lang.String | — | — | — | true |
transferAllUnlessFound | — | java.lang.Boolean | — | true | — | true |
Combine Members PR
Category: Flow
Framework: GATE
Version: unknown
Combines documents in a composite document.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
combiningMethod | — | java.lang.String | — | gate.composite.impl.DefaultCombiningMethod | — | false |
document | — | gate.Document | — | — | — | true |
parameters | — | java.lang.String | — | unitAnnotationType=Sentence;inputASName=;copyUnderlyingAnnotations=true; | — | true |
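
The default value of the parameters entry looks like a semicolon-separated list of key=value pairs. Assuming that reading is right (the format is inferred from the default shown above, not from the plugin documentation), such a string could be taken apart as follows; the function name is hypothetical.

```python
def parse_parameters(spec):
    """Parse 'key=value;key=value;' into a dict; empty values are kept."""
    params = {}
    for part in spec.split(";"):
        if not part:
            continue  # skip the empty piece after the trailing semicolon
        key, _, value = part.partition("=")
        params[key.strip()] = value.strip()
    return params


default_spec = "unitAnnotationType=Sentence;inputASName=;copyUnderlyingAnnotations=true;"
params = parse_parameters(default_spec)
```

Note that inputASName maps to an empty string here, which presumably stands for the default annotation set.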
Delete Member PR
Category: Flow
Framework: GATE
Version: unknown
Deletes one member document from a compound doc.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document | — | gate.Document | — | — | — | true |
documentID | — | java.lang.String | — | — | — | true |
Document Reset PR
Category: Flow
Framework: GATE
Version: unknown
Remove named annotation sets or reset the default annotation set
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationTypes | — | java.util.List | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
keepOriginalMarkupsAS | — | java.lang.Boolean | — | true | — | true |
setsToKeep | — | java.util.List | — | Key | — | true |
setsToRemove | — | java.util.List | — | — | — | true |
Scriptable Controller
Category: Flow
Framework: GATE
Version: unknown
A controller whose execution strategy is controlled by a Groovy script
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
Segment Processing PR
Category: Flow
Framework: GATE
Version: unknown
Processes individual segments as separate documents
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
analyser | — | gate.LanguageAnalyser | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
segmentAnnotationFeatureName | — | java.lang.String | — | — | — | true |
segmentAnnotationFeatureValue | — | java.lang.String | — | — | — | true |
segmentAnnotationType | — | java.lang.String | — | Section | — | true |
Gazetteer (16)
ANNIE Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
gazetteerFeatureSeparator | — | java.lang.String | — | : | — | — |
listsURL | — | java.net.URL | — | resources/gazetteer/lists.def | — | — |
longestMatchOnly | — | java.lang.Boolean | — | true | — | true |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | true |
Arabic Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
gazetteerFeatureSeparator | — | java.lang.String | — | : | — | — |
listsURL | — | java.net.URL | — | resources/gazetteer/lists.def | — | — |
longestMatchOnly | — | java.lang.Boolean | — | true | — | true |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | true |
Arabic Infered Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
gazetteerFeatureSeparator | — | java.lang.String | — | : | — | — |
listsURL | — | java.net.URL | — | resources/inferred-gazetteer/lists.def | — | — |
longestMatchOnly | — | java.lang.Boolean | — | true | — | true |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | true |
Cebuano Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
gazetteerFeatureSeparator | — | java.lang.String | — | : | — | — |
listsURL | — | java.net.URL | — | resources/gazetteer/cebuano/lists.def | — | — |
longestMatchOnly | — | java.lang.Boolean | — | true | — | true |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | true |
DictionaryAnnotator
Category: Gazetteer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Takes a plain text file with phrases as input and annotates the phrases in the CAS file. The annotation type defaults to NGram, but can be changed. The component requires that Tokens and Sentences are annotated in the CAS. The format of the phrase file is one phrase per line, with tokens separated by spaces, for example "this is a phrase" on one line and "another phrase" on the next.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationType | The annotation to create on matching phrases. If nothing is specified, this defaults to NGram. | String | False | — | false | — |
modelEncoding | The character encoding used by the model. | String | True | — | false | — |
modelLocation | The file must contain one phrase per line - phrases will be split at " " | String | True | — | false | — |
value | The value to set the feature configured in valueFeature to. | String | False | — | false | — |
valueFeature | Set this feature on the created annotations. | String | False | — | false | — |
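
To make the phrase-file format concrete, the sketch below loads phrases (one per line, tokens separated by whitespace) and finds where they occur in a token sequence. It is a naive illustration on plain strings, not the DKPro Core implementation; the function names are hypothetical and the returned spans are token index ranges rather than CAS annotations.

```python
def load_phrases(text):
    """One phrase per line; each phrase becomes a tuple of tokens."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


def annotate(tokens, phrases):
    """Return sorted (start, end) token index spans where a phrase occurs."""
    spans = []
    for phrase in phrases:
        n = len(phrase)
        for start in range(len(tokens) - n + 1):
            if tuple(tokens[start:start + n]) == phrase:
                spans.append((start, start + n))
    return sorted(spans)


phrase_file = "this is a phrase\nanother phrase\n"
phrases = load_phrases(phrase_file)
tokens = "this is a phrase and another phrase".split()
spans = annotate(tokens, phrases)
```

A real implementation would typically index phrases by their first token instead of scanning every position for every phrase.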
Flexible Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A more flexible list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document | — | gate.Document | — | — | — | true |
gazetteerInst | — | gate.creole.gazetteer.Gazetteer | — | — | — | — |
inputASName | — | java.lang.String | — | — | — | true |
inputFeatureNames | — | java.util.List | — | — | — | — |
outputASName | — | java.lang.String | — | — | — | true |
Hash Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component implemented by OntoText Lab. The licence information is also available in licence.ontotext.html in the lib folder of GATE
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
listsURL | — | java.net.URL | — | resources/gazetteer/lists.def | — | — |
Hindi Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
document | — | gate.corpora.DocumentImpl | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
listsURL | — | java.net.URL | — | resources/gazetteer/lists.def | — | — |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | — |
Hindi Tokeniser Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
document | — | gate.corpora.DocumentImpl | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
listsURL | — | java.net.URL | — | resources/tokeniser/lists.def | — | — |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | — |
Inflectional gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
Gazetteer with support for inflectional morphology
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
config | — | java.net.URL | — | resources/inflection_gaz/main.conf | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
importOnlyTheseTypes | — | java.util.List | — | person_first;person_full;surname | — | — |
Large KB Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
KIM KB-based alias-lookup component
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationLimit | — | java.lang.Integer | — | — | — | true |
annotationSetName | — | java.lang.String | — | — | — | true |
dictionaryPath | — | java.net.URL | — | dictionary | — | false |
document | — | gate.Document | — | — | — | true |
forceCaseSensitive | — | java.lang.Boolean | — | — | — | false |
Onto Root Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
An ontology lookup component
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
considerHeuristicRules | — | java.lang.Boolean | — | false | — | — |
considerProperties | — | java.lang.Boolean | — | true | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
longestMatchOnly | — | java.lang.Boolean | — | true | — | true |
ontology | — | gate.creole.ontology.Ontology | — | — | — | — |
propertiesToExclude | — | java.lang.String | — | — | — | — |
propertiesToInclude | — | java.lang.String | — | — | — | — |
rootFinderApplication | — | gate.CorpusController | — | — | — | — |
separateCamelCasedWords | — | java.lang.Boolean | — | true | — | — |
typesToConsider | — | java.util.Set | — | class;instance;property | — | true |
useResourceUri | — | java.lang.Boolean | — | true | — | — |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | true |
OntoGazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component based on mapping between ontology classes and gazetteer lists.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
gazetteerName | — | java.lang.String | — | com.ontotext.gate.gazetteer.HashGazetteer | — | — |
listsURL | — | java.net.URL | — | ../ANNIE/resources/gazetteer/lists.def | — | — |
mappingURL | — | java.net.URL | — | ../ANNIE/resources/gazetteer/mapping.def | — | — |
Romanian Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
document | — | gate.corpora.DocumentImpl | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
listsURL | — | java.net.URL | — | resources/Gazeteer/list.lst | — | — |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | — |
Russian Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
Customised version of the hash gazetteer
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
listsURL | — | java.net.URL | — | resources/gazetteer/lists.def | — | — |
Sharable Gazetteer
Category: Gazetteer
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
bootstrapGazetteer | — | gate.creole.gazetteer.DefaultGazetteer | — | — | — | — |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
gazetteerFeatureSeparator | — | java.lang.String | — | : | — | — |
listsURL | — | java.net.URL | — | resources/gazetteer/lists.def | — | — |
longestMatchOnly | — | java.lang.Boolean | — | true | — | true |
wholeWordsOnly | — | java.lang.Boolean | — | true | — | true |
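Several gazetteers in this category share the `caseSensitive`, `wholeWordsOnly` and `longestMatchOnly` parameters, none of which is described in the listing. The following Python sketch illustrates one plausible reading of those three switches for a plain list lookup. It is not the GATE implementation: the function name `lookup`, the tuple output, and the "drop matches contained in a longer match" rule are assumptions made for illustration.

```python
import re

def lookup(text, entries, case_sensitive=True,
           whole_words_only=True, longest_match_only=True):
    """Return sorted (start, end, entry) matches of gazetteer entries in text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    matches = []
    for entry in entries:
        pattern = re.escape(entry)
        if whole_words_only:
            # Reject matches that touch a word character on either side.
            pattern = r"(?<!\w)" + pattern + r"(?!\w)"
        for m in re.finditer(pattern, text, flags):
            matches.append((m.start(), m.end(), entry))
    if longest_match_only:
        # Simplification (assumption): drop any match that lies inside
        # a strictly longer match.
        matches = [
            a for a in matches
            if not any(b[0] <= a[0] and a[1] <= b[1]
                       and b[1] - b[0] > a[1] - a[0] for b in matches)
        ]
    return sorted(matches)
```

With the entries `Dell` and `Dell Europe`, the defaults keep only the longer match at a shared start offset, and an entry such as `York` is not matched inside `Yorkshire` while whole-word matching is on.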
Irrelevant (1)
Keywords/Terms (3)
KEA Keyphrase Extractor
Category: Keywords/Terms
Framework: GATE
Version: unknown
A Keyphrase Extractor by Eibe Frank.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
disallowInternalPeriods | — | java.lang.Boolean | — | true | — | true |
document | — | gate.Document | — | — | — | true |
inputAS | — | java.lang.String | — | — | — | true |
keyphraseAnnotationType | — | java.lang.String | — | Keyphrase | — | true |
maxPhraseLength | — | java.lang.Integer | — | 3 | — | true |
minNumOccur | — | java.lang.Integer | — | 2 | — | true |
minPhraseLength | — | java.lang.Integer | — | 1 | — | true |
outputAS | — | java.lang.String | — | — | — | true |
phrasesToExtract | — | java.lang.Integer | — | 5 | — | true |
trainingMode | — | java.lang.Boolean | — | true | — | true |
useKFrequency | — | java.lang.Boolean | — | true | — | true |
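The `minPhraseLength`, `maxPhraseLength` and `minNumOccur` defaults above (1, 3 and 2) suggest how candidate phrases are bounded before any ranking takes place. The Python sketch below shows only that candidate-generation step under simple assumptions (whitespace tokenisation, lowercasing, raw counts). It does not reproduce KEA's learned ranking model, and the function name is hypothetical.

```python
from collections import Counter

def candidate_phrases(text, min_len=1, max_len=3, min_occur=2):
    """Count word n-grams of min_len..max_len tokens and keep the frequent ones."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # Keep only phrases seen at least min_occur times.
    return {phrase: c for phrase, c in counts.items() if c >= min_occur}
```

A real extractor would then score the surviving candidates and return the top `phrasesToExtract` of them.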
KeywordsSelector
Category: Keywords/Terms
Framework: AlvisNLP
Version:
Selects most relevant keywords in documents.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
charset | — | java.lang.String | True | — | — | — |
documentId | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
documents | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
keywordCount | — | java.lang.Integer | True | — | — | — |
keywordFeature | — | java.lang.String | False | — | — | — |
keywordForm | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
keywords | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
outFile | — | org.bibliome.util.streams.TargetStream | False | — | — | — |
scoreFeature | — | java.lang.String | False | — | — | — |
scoreFunction | — | org.bibliome.alvisnlp.modules.keyword.KeywordScoreFunction | True | — | — | — |
scoreThreshold | — | java.lang.Double | True | — | — | — |
separator | — | java.lang.Character | True | — | — | — |
YateaExtractor
Category: Keywords/Terms
Framework: AlvisNLP
Version: 2010-10-28
Extract terms from the corpus using the YaTeA term extractor.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
bioYatea | — | java.lang.Boolean | False | — | — | — |
configDir | — | org.bibliome.util.files.InputDirectory | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
documentTokens | — | java.lang.Boolean | True | — | — | — |
formFeature | — | java.lang.String | True | — | — | — |
language | — | java.lang.String | False | — | — | — |
lemmaFeature | — | java.lang.String | True | — | — | — |
localeDir | — | org.bibliome.util.files.InputDirectory | False | — | — | — |
outputDir | — | org.bibliome.util.files.OutputDirectory | False | — | — | — |
perlLib | — | java.lang.String | False | — | — | — |
posFeature | — | java.lang.String | True | — | — | — |
postProcessingConfig | — | org.bibliome.util.files.InputFile | False | — | — | — |
postProcessingOutput | — | org.bibliome.util.files.OutputFile | False | — | — | — |
rcFile | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | True | — | — | — |
suffix | — | java.lang.String | False | — | — | — |
testifiedTerminology | — | org.bibliome.alvisnlp.modules.yatea.TestifiedTerminology | False | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
workingDir | — | org.bibliome.util.files.WorkingDirectory | True | — | — | — |
yateaDefaultConfig | — | alvisnlp.module.types.Mapping | True | — | — | — |
yateaExecutable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
yateaOptions | — | alvisnlp.module.types.Mapping | True | — | — | — |
Language Identifier (7)
LangDetectLanguageIdentifier
Category: Language Identifier
Framework: DKPro Core (UIMA)
Version: 1.8.0
Langdetect language identifier based on character n-grams.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
modelLocation | Location from which the model is read. | String | False | — | false | — |
modelVariant | Variant of the model. Used to address a specific model if there are multiple models for one language. | String | False | — | false | — |
LanguageDetectorWeb1T
Category: Language Identifier
Framework: DKPro Core (UIMA)
Version: 1.8.0
Language detector based on n-gram frequency counts, e.g. as provided by Web1T
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
maxNGramSize | The maximum n-gram size that should be considered. Default is 3. | Integer | True | — | false | — |
minNGramSize | The minimum n-gram size that should be considered. Default is 1. | Integer | True | — | false | — |
LanguageIdentifier
Category: Language Identifier
Framework: DKPro Core (UIMA)
Version: 1.8.0
Detection based on character n-grams. Uses the Java Text Categorizing Library based on a technique by Cavnar and Trenkle.
References:
- Cavnar, W. B. and J. M. Trenkle (1994). N-Gram-Based Text Categorization. In Proceedings of Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, UNLV Publications/Reprographics, pp. 161-175, 11-13 April 1994.
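The character n-gram technique cited above ranks the most frequent n-grams of a text and compares that ranking with per-language profiles using an "out-of-place" distance. The Python sketch below illustrates the idea only. It is not the Java Text Categorizing Library: the function names, the `_` padding character, the n-gram range (1 to 5) and the profile size (300) are assumptions chosen for illustration.

```python
from collections import Counter

def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    for token in text.lower().split():
        padded = f"_{token}_"  # mark word boundaries (assumed padding symbol)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(abs(i - rank[g]) if g in rank else max_penalty
               for i, g in enumerate(doc_profile))

def identify(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```

In practice the language profiles would be built from sizeable training corpora; the tiny samples one might use in a quick test only show that the mechanics work.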
LingPipe Language Identifier PR
Category: Language Identifier
Framework: GATE
Version: unknown
GATE PR for language identification using LingPipe
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
annotationType | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
languageIdFeatureName | — | java.lang.String | — | lang | — | true |
modelFileUrl | — | java.net.URL | — | resources/models/langid-leipzig.classifier | — | — |
TextCat Fingerprint Generator
Category: Language Identifier
Framework: GATE
Version: unknown
Generate language fingerprints for use with the TextCat Language Identification PR
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
annotationType | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
fingerprintURL | — | java.net.URL | — | — | — | true |
TextCat Language Identification
Category: Language Identifier
Framework: GATE
Version: unknown
Recognizes the document language using TextCat
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
annotationType | — | java.lang.String | — | — | — | true |
configURL | — | java.net.URL | — | resources/default-names.conf | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
languageFeatureName | — | java.lang.String | — | lang | — | true |
Textalytics Language Identification
Category: Language Identifier
Framework: GATE
Version: unknown
Textalytics Language Identification
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiURL | — | java.lang.String | — | — | true | |
corpus | — | gate.Corpus | — | — | — | true |
debug | — | java.lang.Boolean | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
inputASTypes | — | java.util.List | — | — | — | true |
key | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | Textalytics | — | true |
Lemmatizer (7)
ClearNlpLemmatizer
Category: Lemmatizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Lemmatizer using Clear NLP.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
GateLemmatizer
Category: Lemmatizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Wrapper for the GATE rule based lemmatizer. Based on code by Asher Stern from the BIUTEE textual entailment tool.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
ILSP Lemmatizer
Category: Lemmatizer
Framework: ILSP (UIMA)
Version: 1.1
ILSP Lemmatizer assigns lemmas to tokens from Greek texts. It consults the ILSP Morphological Lexicon (ILSP ML) to assign the lemmas. The AE uses POS tags (if they exist in the input) to select between lemmas when the ILSP ML returns more than one result for a token.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
LexicaDir | The directory containing the Berkeley DB lexical resources. Default is /opt/ilsp-nlp/lexica/fbt. | String | False | — | false | — |
LanguageToolLemmatizer
Category: Lemmatizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Naive lexicon-based lemmatizer. The words are looked up using the wordform lexicons of LanguageTool. Multiple readings are produced. The annotator simply takes the most frequent lemma from those readings. If no readings could be found, the original text is assigned as lemma.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
sanitize | — | Boolean | True | — | false | — |
sanitizeChars | — | String | True | — | true | — |
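The selection rule described for LanguageToolLemmatizer (take the most frequent lemma among the readings, otherwise fall back to the original text) can be sketched in a few lines of Python. The function name and the representation of readings as a plain list of lemma strings are assumptions for illustration; the real annotator obtains its readings from LanguageTool's wordform lexicons.

```python
from collections import Counter

def pick_lemma(token, readings):
    """Choose a lemma for token from the lemmas proposed by a lexicon lookup."""
    if not readings:
        # No reading found: the original text is used as the lemma.
        return token
    # most_common(1) returns the lemma with the highest count;
    # among ties, the one that was seen first is kept.
    return Counter(readings).most_common(1)[0][0]
```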
MateLemmatizer
Category: Lemmatizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
DKPro Annotator for the MateToolsLemmatizer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
uppercase | Try reconstructing proper casing for lemmata. This is useful for German, but e.g. for English creates odd results. | Boolean | True | — | false | — |
variant | Override the default variant used to locate the model. | String | False | — | false | — |
MorphaLemmatizer
Category: Lemmatizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Lemmatize based on a finite-state machine. Uses the Java port of Morpha.
References:
- Minnen, G., J. Carroll and D. Pearce (2001). Applied morphological processing of English, Natural Language Engineering, 7(3). 207-223.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
readPOS | Pass part-of-speech information on to Morpha. Since we currently do not know in which format the part-of-speech tags are expected by Morpha, we just pass on the actual pos tag value we get from the token. This may produce worse results than not passing on pos tags at all, so this is disabled by default. | Boolean | True | — | false | — |
StanfordLemmatizer
Category: Lemmatizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Stanford Lemmatizer component. The Stanford Morphology-class computes the base form of English words, by removing just inflections (not derivational morphology). That is, it only does noun plurals, pronoun case, and verb endings, and not things like comparative adjectives or derived nominals. It is based on a finite-state transducer implemented by John Carroll et al., written in flex and publicly available. See: http://www.informatics.susx.ac.uk/research/nlp/carroll/morph.html
This only works for ENGLISH.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ptb3Escaping | Enable all traditional PTB3 token transforms (like -LRB-, -RRB-). | Boolean | True | — | false | — |
quoteBegin | List of extra token texts (usually single character strings) that should be treated like opening quotes and escaped accordingly before being sent to the parser. | String | False | — | true | — |
quoteEnd | List of extra token texts (usually single character strings) that should be treated like closing quotes and escaped accordingly before being sent to the parser. | String | False | — | true | — |
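As an illustration of what `ptb3Escaping` refers to, the Python sketch below applies the Penn Treebank bracket escapes (`-LRB-`, `-RRB-` and their square and curly counterparts) to a token list. It covers only the bracket transforms; the Stanford option enables further token transforms that are not shown, and the function name is hypothetical.

```python
# Penn Treebank bracket escapes; other PTB3 transforms are intentionally omitted.
PTB3_BRACKETS = {
    "(": "-LRB-", ")": "-RRB-",
    "[": "-LSB-", "]": "-RSB-",
    "{": "-LCB-", "}": "-RCB-",
}

def ptb3_escape(tokens):
    """Replace bracket tokens with their PTB escape; leave other tokens unchanged."""
    return [PTB3_BRACKETS.get(tok, tok) for tok in tokens]
```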
Machine Learning (2)
Batch Learning PR
Category: Machine Learning
Framework: GATE
Version: unknown
Supports training, application and evaluation of machine learning models for NLP tasks
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
configFileURL | — | java.net.URL | — | — | — | false |
corpus | — | gate.Corpus | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
learningMode | — | gate.learning.RunMode | — | TRAINING | — | true |
outputASName | — | java.lang.String | — | — | — | true |
runProtocolDir | — | java.net.URL | — | — | — | true |
Machine Learning PR
Category: Machine Learning
Framework: GATE
Version: unknown
Trains a machine learning algorithm from a corpus. For new code, consider using the "learning" plugin instead.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
configFileURL | — | java.net.URL | — | — | — | — |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
training | — | java.lang.Boolean | — | true | — | true |
MorphTagger (3)
GATE Morphological analyser
Category: MorphTagger
Framework: GATE
Version: unknown
Morphological Analyzer for the English Language.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
affixFeatureName | — | java.lang.String | — | affix | — | true |
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | false | — | false |
considerPOSTag | — | java.lang.Boolean | — | true | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
rootFeatureName | — | java.lang.String | — | root | — | true |
rulesFile | — | java.net.URL | — | resources/morph/default.rul | — | false |
RASP2 Morphological Analyser
Category: MorphTagger
Framework: GATE
Version: unknown
RASP morphological analyser, which adds lemma and suffix to the WordForm annotations produced by the RASP POS tagger (or the ANNIE POS tagger plus the RASP converter)
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
charset | — | java.lang.String | — | ISO-8859-1 | — | true |
debug | — | java.lang.Boolean | — | false | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
raspHome | — | java.net.URL | — | file:/usr/local/bin/RASP | — | false |
SfstAnnotator
Category: MorphTagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Sfst morphological analyzer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
MorphMappingLocation | — | String | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
mode | — | String | True | — | false | — |
modelEncoding | Specifies the model encoding. | String | True | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Write the tag set(s) to the log when a model is loaded. | Boolean | True | — | false | — |
writeLemma | Write lemma information. Default: true | Boolean | True | — | false | — |
writePOS | Write part-of-speech information. Default: true | Boolean | True | — | false | — |
Named Entity Recognizer (11)
ABNER
Category: Named Entity Recognizer
Framework: NaCTeM (UIMA)
Version: 1.0
Wraps the ABNER entity identification system into the UIMA framework. ABNER was developed by Burr Settles and is available here: http://pages.cs.wisc.edu/~bsettles/abner/
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
mode | 0=NLPBA Corpus, 1=BioCreative Corpus, 2=Custom | String | False | — | false | — |
model | Custom model file (if mode == 2) | String | False | — | false | — |
types | Custom type mapping; each string is <entity>=<class> | String | False | — | true | — |
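The `types` parameter takes a list of strings of the form `<entity>=<class>`. A minimal Python sketch of how such a multi-value parameter could be parsed into a lookup table follows; the function name is hypothetical, and the class names used in any call are placeholders rather than real NaCTeM type names.

```python
def parse_type_mapping(entries):
    """Turn ['ENTITY=some.Class', ...] into a dict, rejecting malformed entries."""
    mapping = {}
    for entry in entries:
        # Split on the first '=' only, so class names may contain '='.
        entity, sep, cls = entry.partition("=")
        entity, cls = entity.strip(), cls.strip()
        if not sep or not entity or not cls:
            raise ValueError(f"expected <entity>=<class>, got {entry!r}")
        mapping[entity] = cls
    return mapping
```

For example, `parse_type_mapping(["PROTEIN=org.example.Protein"])` yields a one-entry dictionary keyed by the entity label.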
CRF++ Trainer
Category: Named Entity Recognizer
Framework: NaCTeM (UIMA)
Version: 1.0
Produces a Conditional Random Fields model. Based on CRF++, an implementation of CRF for labeling sequential data (http://crfpp.googlecode.com/svn/trunk/doc/index.html).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
FeatureFrequencyThreshold | CRF++ uses only features that occur at least this many times. Default value is 1. | Integer | False | — | false | — |
LabelAnnotationTypes | Fully qualified names of annotation types which will serve as labels during the training of the CRF. | String | True | — | true | — |
ModelFileName | Specifies the filename to store the model in. | String | True | — | false | — |
OverfittingBalance | Default value is 1 | Float | False | — | false | — |
RegularizationAlgorithm | Default value is CRF-L2. You can use CRF-L1. | String | False | — | false | — |
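The effect of `FeatureFrequencyThreshold` can be pictured as a frequency cut-off applied to the feature inventory before training. The Python sketch below is an illustration only: CRF++ performs this filtering internally, and the function name and the string encoding of features are assumptions.

```python
from collections import Counter

def frequent_features(instances, threshold=1):
    """Return the features that occur at least `threshold` times over all instances."""
    counts = Counter(feature for features in instances for feature in features)
    return {feature for feature, count in counts.items() if count >= threshold}
```

With the default threshold of 1 every observed feature is kept; raising it discards rare features and shrinks the model.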
ILSP NERC
Category: Named Entity Recognizer
Framework: ILSP (UIMA)
Version: 1.2
This module uses a Maximum Entropy NER engine focused on Greek (EL) or English (EN) news text.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
DatabaseDriverClass | The JDBC driver to use to connect to this DB. Default is org.postgresql.Driver | String | False | — | false | — |
DatabaseHost | The host where the server resides. Default is localhost. | String | False | — | false | — |
DatabaseName | The name of the database. | String | False | — | false | — |
DatabasePass | Use this password for read-only access to the database | String | False | — | false | — |
DatabasePort | The port the server listens to. Default is 5432. | Integer | False | — | false | — |
DatabaseServer | The type of server the AE connects to. Default is postgresql. | String | False | — | false | — |
DatabaseUser | Use this user name for read-only access to the database | String | False | — | false | — |
Language | ISO language code for text language | String | False | — | false | — |
ModelDir | — | String | False | — | false | — |
NercEngine | The NercEngine to be used. The default value is "mener". | String | False | — | false | — |
LingPipe NER PR
Category: Named Entity Recognizer
Framework: GATE
Version: unknown
LingPipe Named Entity Recognizer
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document | — | gate.Document | — | — | — | true |
modelFileUrl | — | java.net.URL | — | — | — | false |
outputASName | — | java.lang.String | — | — | — | true |
OpenNLP NER
Category: Named Entity Recognizer
Framework: GATE
Version: unknown
NER PR using a set of OpenNLP maxent models
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
config | — | java.net.URL | — | models/english/en-ner.conf | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
OpenNlpNamedEntityRecognizer
Category: Named Entity Recognizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
OpenNLP name finder wrapper.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
NamedEntityMappingLocation | Location of the mapping file for named entity tags to UIMA types. | String | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Location from which the model is read. | String | False | — | false | — |
modelVariant | Variant of the model. Used to address a specific model if there are multiple models for one language. | String | True | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. | Boolean | True | — | false | — |
SVMLight Trainer
Category: Named Entity Recognizer
Framework: NaCTeM (UIMA)
Version: 1.0
Produces an SVMLight model based on user-specified learning parameters.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ModelFile | A file where the model will be written to. | String | True | — | false | — |
NormFile | A file where the value of the norm for normalising continuous-valued features will be written to. | String | True | — | false | — |
ParameterString | A string with the desired learning parameters. | String | False | — | false | — |
Stanford NER
Category: Named Entity Recognizer
Framework: GATE
Version: unknown
Stanford Named Entity Recogniser
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
baseSentenceAnnotationType | — | java.lang.String | — | Sentence | — | true |
baseTokenAnnotationType | — | java.lang.String | — | Token | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
inputASName | — | java.lang.String | — | — | — | true |
modelFile | — | java.net.URL | — | resources/english.all.3class.distsim.crf.ser.gz | — | — |
outputASName | — | java.lang.String | — | — | — | true |
outsideLabel | — | java.lang.String | — | O | — | true |
StanfordNER
Category: Named Entity Recognizer
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
classifierFile | — | org.bibliome.util.files.InputFile | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
formFeatureName | — | java.lang.String | True | — | — | — |
labelFeatureName | — | java.lang.String | True | — | — | — |
searchInContents | — | java.lang.Boolean | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | True | — | — | — |
targetLayerName | — | java.lang.String | True | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
StanfordNamedEntityRecognizer
Category: Named Entity Recognizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Stanford Named Entity Recognizer component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
NamedEntityMappingLocation | Location of the mapping file for named entity tags to UIMA types. | String | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Location from which the model is read. | String | False | — | false | — |
modelVariant | Variant of the model. Used to address a specific model if there are multiple models for one language. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. | Boolean | True | — | false | — |
ptb3Escaping | Enable all traditional PTB3 token transforms (like -LRB-, -RRB-). | Boolean | True | — | false | — |
quoteBegin | List of extra token texts (usually single character strings) that should be treated like opening quotes and escaped accordingly before being sent to the parser. | String | False | — | true | — |
quoteEnd | List of extra token texts (usually single character strings) that should be treated like closing quotes and escaped accordingly before being sent to the parser. | String | False | — | true | — |
Yeast Metabliner
Category: Named Entity Recognizer
Framework: NaCTeM (UIMA)
Version: 1.0
This service annotates yeast metabolites using a supervised, CRF-based NER system. It receives an input string and a user id and returns a list of recognised yeast metabolites with offset information and CRF scores. The dictionary used in this system is based on a consensus reconstruction of yeast metabolism (http://www.comp-sys-bio.org/yeastnet/).
Normalizer (19)
ApplyChangesAnnotator
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Applies changes annotated using a SofaChangeAnnotation.
Backmapper
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
After processing a file with the ApplyChangesAnnotator this annotator can be used to map the annotations created in the cleaned view back to the original view.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Chain | Chain of views for backmapping. This should be the reverse of the chain of views that the ApplyChangesAnnotator has used. For example, if view A has been mapped to B using ApplyChangesAnnotator, then this parameter should be set using an array containing [B, A]. | String | False | — | true | — |
CapitalizationNormalizer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Takes a text and replaces wrong capitalization
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
CjfNormalizer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Converts traditional Chinese to simplified Chinese or vice-versa.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
direction | — | String | True | — | false | — |
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
Date Annotation Normalizer
Category: Normalizer
Framework: GATE
Version: unknown
provides normalized values for all existing date annotations
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationFeature | — | java.lang.String | — | string | — | true |
annotationName | — | java.lang.String | — | Date | — | true |
corpus | — | gate.Corpus | — | — | — | true |
dateFormat | — | java.lang.String | — | dd/MM/yyyy | — | true |
document | — | gate.Document | — | — | — | true |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
inputASName | — | java.lang.String | — | — | — | true |
locale | — | java.lang.String | — | — | — | — |
normalizedDocumentFeature | — | java.lang.String | — | normalized-date | — | true |
numericOutput | — | java.lang.Boolean | — | false | — | true |
outputASName | — | java.lang.String | — | — | — | true |
sourceOfDocumentDate | — | java.util.List | — | — | — | true |
wholeMatchOnly | — | java.lang.Boolean | — | true | — | true |
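The default `dateFormat` value, `dd/MM/yyyy`, looks like a Java date pattern (day, month, four-digit year). As a rough illustration of what a normalized value in that format looks like, the Python sketch below translates just those three pattern letters to `strftime` directives. The helper names are hypothetical, only `dd`, `MM` and `yyyy` are handled, and the reading of the pattern as Java-style is an assumption based on the component being a Java plugin.

```python
from datetime import datetime

# Only the pattern letters that appear in the default value are covered.
JAVA_TO_STRFTIME = {"dd": "%d", "MM": "%m", "yyyy": "%Y"}

def to_strftime(java_pattern):
    """Translate a small subset of Java date pattern letters to strftime."""
    result = java_pattern
    for java_token, directive in JAVA_TO_STRFTIME.items():
        result = result.replace(java_token, directive)
    return result

def format_normalized(date, java_pattern="dd/MM/yyyy"):
    """Render a date the way the configured pattern prescribes."""
    return date.strftime(to_strftime(java_pattern))
```

Under these assumptions, 7 March 2014 would be rendered as `07/03/2014`.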
Date Normalizer
Category: Normalizer
Framework: GATE
Version: unknown
provides normalized values for all known dates
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationName | — | java.lang.String | — | Date | — | true |
corpus | — | gate.Corpus | — | — | — | true |
dateFormat | — | java.lang.String | — | dd/MM/yyyy | — | true |
document | — | gate.Document | — | — | — | true |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
inputASName | — | java.lang.String | — | — | — | true |
locale | — | java.lang.String | — | — | — | — |
normalizedDocumentFeature | — | java.lang.String | — | normalized-date | — | true |
numericOutput | — | java.lang.Boolean | — | false | — | true |
outputASName | — | java.lang.String | — | — | — | true |
sourceOfDocumentDate | — | java.util.List | — | — | — | true |
DictionaryBasedTokenTransformer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads a tab-separated file containing mappings from one token to another. All tokens that match an entry in the first column are changed to the corresponding token in the second column.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
commentMarker | Lines starting with this character (or String) are ignored. Default: '#' | String | True | — | false | — |
modelEncoding | — | String | True | — | false | — |
modelLocation | — | String | True | — | false | — |
separator | Separator for mappings file. Default: "\t" (TAB). | String | True | — | false | — |
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
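The mapping-file behavior of DictionaryBasedTokenTransformer can be illustrated with a minimal Python sketch. This is not the DKPro Core implementation: the function names `load_mappings` and `transform_tokens` are hypothetical, and the treatment of blank or malformed lines is an assumption. Only the tab separator and the '#' comment marker defaults come from the parameter descriptions.

```python
def load_mappings(lines, separator="\t", comment_marker="#"):
    """Build a token -> replacement dict from the lines of a mapping file."""
    mappings = {}
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines are skipped along with comment lines.
        if not line or line.startswith(comment_marker):
            continue
        parts = line.split(separator)
        # Assumption: lines that do not have exactly two columns are ignored.
        if len(parts) != 2:
            continue
        source, target = parts
        mappings[source] = target
    return mappings


def transform_tokens(tokens, mappings):
    """Replace every token found in the first column by its second-column value."""
    return [mappings.get(token, token) for token in tokens]


sample_file = [
    "# British to American spelling\n",
    "colour\tcolor\n",
    "centre\tcenter\n",
]
mappings = load_mappings(sample_file)
print(transform_tokens(["The", "colour", "of", "the", "centre"], mappings))
# prints ['The', 'color', 'of', 'the', 'center']
```

Tokens that have no entry in the first column pass through unchanged, which matches the description above.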
Document normalizer
Category: Normalizer
Framework: GATE
Version: unknown
Normalize document content to remove "smart quotes" etc.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
replacementsURL | — | java.net.URL | — | resources/replacements.lst | — | — |
ExpressiveLengtheningNormalizer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Takes a text and shortens extra-long words (expressive lengthening).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
FileBasedTokenTransformer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Replaces all tokens that are listed in the file given by modelLocation (#PARAM_MODEL_LOCATION) with the string specified in replacement (#PARAM_REPLACEMENT).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ignoreCase | — | Boolean | True | — | false | — |
modelLocation | — | String | True | — | false | — |
replacement | — | String | True | — | false | — |
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
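As a rough illustration of what FileBasedTokenTransformer does, the sketch below replaces every token that appears in a word list with a fixed replacement string. It is written in Python and is not the DKPro Core code: the function name `replace_listed_tokens` is hypothetical, and the assumption that `ignoreCase` means a case-insensitive comparison against the listed tokens is inferred from the parameter name only.

```python
def replace_listed_tokens(tokens, listed, replacement, ignore_case=False):
    """Replace each token that occurs in `listed` by `replacement`."""
    if ignore_case:
        # Assumption: case-insensitive matching compares lowercased forms.
        listed_set = {word.lower() for word in listed}
        return [replacement if t.lower() in listed_set else t for t in tokens]
    listed_set = set(listed)
    return [replacement if t in listed_set else t for t in tokens]


tokens = ["The", "cat", "and", "THE", "dog"]
print(replace_listed_tokens(tokens, ["the", "and"], "<STOP>"))
# prints ['The', 'cat', '<STOP>', 'THE', 'dog']
print(replace_listed_tokens(tokens, ["the", "and"], "<STOP>", ignore_case=True))
# prints ['<STOP>', 'cat', '<STOP>', '<STOP>', 'dog']
```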
HyphenationRemover
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Simple dictionary-based hyphenation remover.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
modelEncoding | — | String | True | — | false | — |
modelLocation | — | String | True | — | false | — |
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
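The listing does not say how HyphenationRemover decides which hyphens to drop. One plausible reading of "dictionary-based", sketched here in Python, is to join a word fragment ending in a hyphen with the fragment after the following whitespace whenever the joined form is in the dictionary. The function name `remove_hyphenation`, the regular expression, and the lowercase dictionary lookup are all assumptions made for illustration.

```python
import re


def remove_hyphenation(text, dictionary):
    """Join 'frag-<whitespace>ment' pairs when the joined word is a known word."""

    def join(match):
        joined = match.group(1) + match.group(2)
        # Assumption: the dictionary holds lowercase word forms.
        if joined.lower() in dictionary:
            return joined
        # Unknown joined form: leave the original text untouched.
        return match.group(0)

    return re.sub(r"(\w+)-\s+(\w+)", join, text)


text = "a hyphen-\nated word and a well-\nknown one"
print(repr(remove_hyphenation(text, {"hyphenated"})))
# prints 'a hyphenated word and a well-\nknown one'
```

The second pair stays as it is because "wellknown" is not in the sample dictionary, which is the point of consulting a dictionary rather than removing every line-end hyphen.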
RegexBasedTokenTransformer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
A JCasTransformerChangeBased_ImplBase implementation that replaces tokens based on a regular expression.
The parameter #PARAM_REGEX defines the regular expression to be searched for; #PARAM_REPLACEMENT defines the string with which matching patterns are replaced.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
regex | Define the regular expression to be replaced | String | True | — | false | — |
replacement | Define the string to replace matching tokens with | String | True | — | false | — |
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
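The regex and replacement parameters can be pictured with a short Python sketch. It assumes the pattern is applied to each token separately and that every match inside a token is substituted; the listing does not say whether the component rewrites only the matched part or the whole token, so treat that as an assumption. The function name `regex_transform` is hypothetical, and Python's `re` syntax differs in places from the Java regular expressions the component itself would use.

```python
import re


def regex_transform(tokens, regex, replacement):
    """Apply a regex substitution to each token independently."""
    pattern = re.compile(regex)
    return [pattern.sub(replacement, token) for token in tokens]


print(regex_transform(["2014", "was", "a-ok"], r"\d", "0"))
# prints ['0000', 'was', 'a-ok']
```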
ReplacementFileNormalizer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Takes a text and replaces desired expressions. This class should not work on tokens, as some expressions might span several tokens.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
modelLocation | Location of a file which contains all replacing characters | String | True | — | false | — |
srcExpressionSurroundings | — | String | True | — | false | — |
targetExpressionSurroundings | — | String | True | — | false | — |
SharpSNormalizer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Takes a text and replaces sharp s (ß).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
MinFrequencyThreshold | — | Integer | True | — | false | — |
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
SpellingNormalizer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Converts annotations of the type SpellingAnomaly into a SofaChangeAnnotation.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
StanfordPtbTransformer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Uses the normalizing tokenizer of the Stanford CoreNLP tools to escape the text PTB-style. This component operates directly on the text and does not require prior segmentation.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
TokenCaseTransformer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Change tokens to follow a specific casing: all upper case, all lower case, or 'normal case': lowercase everything but the first character of a token and the characters immediately following a hyphen.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
tokenCase | The case to convert tokens to. UPPERCASE: uppercase everything. LOWERCASE: lowercase everything. NORMALCASE: retain first letter in word and after hyphens, lowercase everything else. | String | True | — | false | — |
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
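The three casing modes of TokenCaseTransformer can be sketched in Python as follows. The mode names come from the tokenCase description; the function names `normal_case` and `transform_case` are hypothetical, and reading "retain" as "leave the character exactly as it was" (rather than uppercasing it) is an interpretation of that description, not confirmed behavior.

```python
def normal_case(token):
    """Keep the first character and any character right after a hyphen; lowercase the rest."""
    out = []
    keep_next = True  # the first character is always retained
    for ch in token:
        out.append(ch if keep_next else ch.lower())
        keep_next = ch == "-"
    return "".join(out)


def transform_case(token, token_case):
    if token_case == "UPPERCASE":
        return token.upper()
    if token_case == "LOWERCASE":
        return token.lower()
    if token_case == "NORMALCASE":
        return normal_case(token)
    raise ValueError("unknown tokenCase: " + token_case)


print(transform_case("JEAN-PAUL", "NORMALCASE"))  # prints Jean-Paul
print(transform_case("Jean-Paul", "UPPERCASE"))   # prints JEAN-PAUL
print(transform_case("iPHONE", "NORMALCASE"))     # prints iphone
```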
Tweet Normaliser
Category: Normalizer
Framework: GATE
Version: unknown
Normalises text in tweets (converts spelling mistakes, colloquialisms, typing variations and so on into standard English).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
dictURL | — | java.net.URL | — | resources/normaliser/english.jaspell | — | — |
document | — | gate.Document | — | — | — | true |
initialTextFeature | — | java.lang.String | — | string | — | true |
inputASName | — | java.lang.String | — | — | — | true |
maxDistance | — | java.lang.String | — | 2.0 | — | true |
normTextFeature | — | java.lang.String | — | string | — | true |
origTextFeature | — | java.lang.String | — | origString | — | true |
orthURL | — | java.net.URL | — | resources/normaliser/orth.en.csv | — | — |
outputASName | — | java.lang.String | — | — | — | true |
UmlautNormalizer
Category: Normalizer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Takes a text and checks for umlauts written as "ae", "oe", or "ue" and normalizes them if they really are umlauts depending on a frequency model.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
MinFrequencyThreshold | — | Integer | True | — | false | — |
typesToCopy | A list of fully qualified type names that should be copied to the transformed CAS where available. By default, no types are copied apart from DocumentMetaData, i.e. all other annotations are omitted. | String | True | — | true | — |
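The listing does not spell out how the frequency model is consulted. The Python sketch below shows one way such a check could work: build the umlaut candidate, then accept it only if it is frequent enough and more frequent than the original spelling. The decision rule, the function name `normalize_umlauts`, the default threshold of 100, and the sample counts are all assumptions made for illustration; only the "ae"/"oe"/"ue" digraphs and the existence of a minimum-frequency threshold come from the description above.

```python
# Digraph -> umlaut replacements (capitalized variants included).
REPLACEMENTS = {"ae": "ä", "oe": "ö", "ue": "ü", "Ae": "Ä", "Oe": "Ö", "Ue": "Ü"}


def normalize_umlauts(token, frequencies, min_frequency_threshold=100):
    """Return the umlaut form of `token` if the frequency model supports it."""
    candidate = token
    for digraph, umlaut in REPLACEMENTS.items():
        # Simplification: all digraphs in the token are replaced at once.
        candidate = candidate.replace(digraph, umlaut)
    if candidate == token:
        return token
    candidate_freq = frequencies.get(candidate, 0)
    # Assumed rule: candidate must reach the threshold and beat the original form.
    if candidate_freq >= min_frequency_threshold and candidate_freq > frequencies.get(token, 0):
        return candidate
    return token


# Hypothetical frequency counts, for illustration only.
freqs = {"Müller": 5000, "Mueller": 300, "Israel": 9000, "neue": 8000}
print(normalize_umlauts("Mueller", freqs))  # prints Müller
print(normalize_umlauts("Israel", freqs))   # prints Israel
print(normalize_umlauts("neue", freqs))     # prints neue
```

Words such as "Israel" or "neue" contain the letter pairs but are not umlaut spellings, which is why a frequency check is needed at all.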
Parser (24)
BerkeleyParser
Category: Parser
Framework: DKPro Core (UIMA)
Version: 1.8.0
Berkeley Parser annotator. Requires sentences to be annotated beforehand.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ConstituentMappingLocation | Location of the mapping file for constituent tags to UIMA types. | String | False | — | false | — |
POSMappingLocation | Location of the mapping file for part-of-speech tags to UIMA types. | String | False | — | false | — |
accurate | Set thresholds for accuracy. Default: false (set thresholds for efficiency) | Boolean | True | — | false | — |
binarize | Output binarized trees. Default: false | Boolean | True | — | false | — |
internTags | Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true | Boolean | False | — | false | — |
keepFunctionLabels | Retain predicted function labels. Model must have been trained with function labels. Default: false | Boolean | True | — | false | — |
language | Use this language instead of the language set in the CAS to locate the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
readPOS | Sets whether to use or not to use already existing POS tags from another annotator for the parsing process. Default: false | Boolean | True | — | false | — |
scores | Output inside scores (only for binarized Viterbi trees). Default: false | Boolean | True | — | false | — |
substates | Output sub-categories (only for binarized Viterbi trees). Default: false | Boolean | True | — | false | — |
variational | Use variational rule score approximation instead of max-rule. Default: false | Boolean | True | — | false | — |
viterbi | Compute Viterbi derivation instead of max-rule tree. Default: false (max-rule) | Boolean | True | — | false | — |
writePOS | Sets whether to create or not to create POS tags. The creation of constituent tags must be turned on for this to work. Default: true | Boolean | True | — | false | — |
writePennTree | If this parameter is set to true, each sentence is annotated with a PennTree-Annotation, containing the whole parse tree in Penn Treebank style format. Default: false | Boolean | True | — | false | — |
CCGParser
Category: Parser
Framework: AlvisNLP
Version: 2012-04-30
Syntax parsing with CCG Parser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
dependentRole | — | java.lang.String | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
executable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
formFeatureName | — | java.lang.String | True | — | — | — |
headRole | — | java.lang.String | True | — | — | — |
internalEncoding | — | java.lang.String | True | — | — | — |
labelFeatureName | — | java.lang.String | True | — | — | — |
lpTransformation | — | java.lang.Boolean | False | — | — | — |
maxRuns | — | java.lang.Integer | True | — | — | — |
maxSuperCats | — | java.lang.Integer | True | — | — | — |
parserModel | — | org.bibliome.util.files.InputDirectory | True | — | — | — |
posFeatureName | — | java.lang.String | True | — | — | — |
relationName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | True | — | — | — |
sentenceRole | — | java.lang.String | True | — | — | — |
stanfordMarkedUpScript | — | org.bibliome.util.files.InputFile | False | — | — | — |
stanfordScript | — | org.bibliome.util.files.ExecutableFile | False | — | — | — |
superModel | — | org.bibliome.util.files.InputDirectory | True | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
ClearNlpParser
Category: Parser
Framework: DKPro Core (UIMA)
Version: 1.8.0
Clear parser annotator.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Location from which the model is read. | String | False | — | false | — |
modelVariant | Variant of the model. Used to address a specific model if there are multiple models for one language. | String | False | — | false | — |
printTagSet | Write the tag set(s) to the log when a model is loaded. | Boolean | True | — | false | — |
English Dependency Parser
Category: Parser
Framework: GATE
Version: unknown
Ready-made application for Stanford English parser
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu | — | java.util.List | — | Stanford Parser | — | — |
pipelineURL | — | java.net.URL | — | sample_parser_en.gapp | — | — |
English POS Tagger and Dependency Parser
Category: Parser
Framework: GATE
Version: unknown
Ready-made application for Stanford English POS tagger and parser
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu | — | java.util.List | — | Stanford Parser | — | — |
pipelineURL | — | java.net.URL | — | sample_pos+parser_en.gapp | — | — |
Enju Parser
Category: Parser
Framework: NaCTeM (UIMA)
Version: 1.1
A syntactic parser for English. With a wide-coverage probabilistic HPSG grammar and an efficient parsing algorithm, this parser can effectively analyze syntactic/semantic structures of English sentences and provide a user with phrase structures and predicate-argument structures.
Main features:
- Accurate deep analysis: the parser can output both phrase structures and predicate-argument structures. The accuracy of predicate-argument relations is around 90% for newswire articles and biomedical papers.
- High speed: parsing speed is less than 500 msec. per sentence by default (faster than most Penn Treebank parsers), and less than 50 msec. when using the high-speed setting ("mogura").
Enju website: http://www.nactem.ac.uk/enju
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
DisablePOSTagging | Take tokens and their corresponding POS tags as input from a preceding component. | Boolean | True | — | false | — |
DisableTokenisation | Take tokens as input from a preceding component. | Boolean | True | — | false | — |
UseBiomedicalModel | Use the biomedical model trained on the GENIA corpus. | Boolean | True | — | false | — |
UseHighSpeedParser | Use the high speed parser "mogura". | Boolean | True | — | false | — |
EnjuParser
Category: Parser
Framework: AlvisNLP
Version:
Parses sentences with the ENJU dependency parser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
biology | — | java.lang.Boolean | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
dependenciesRelationName | — | java.lang.String | True | — | — | — |
dependencyHeadRole | — | java.lang.String | True | — | — | — |
dependencyLabelFeatureName | — | java.lang.String | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
enjuEncoding | — | java.lang.String | True | — | — | — |
enjuExecutable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
nBest | — | java.lang.Integer | True | — | — | — |
parseNumberFeatureName | — | java.lang.String | True | — | — | — |
parseStatusFeature | — | java.lang.String | True | — | — | — |
posFeatureName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | True | — | — | — |
sentenceRole | — | java.lang.String | True | — | — | — |
wordFormFeatureName | — | java.lang.String | True | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
EnjuParser2
Category: Parser
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
biology | — | java.lang.Boolean | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
dependenciesRelationName | — | java.lang.String | True | — | — | — |
dependencyDependentRole | — | java.lang.String | True | — | — | — |
dependencyHeadRole | — | java.lang.String | True | — | — | — |
dependencyLabelFeatureName | — | java.lang.String | True | — | — | — |
dependentTypeFeatureName | — | java.lang.String | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
enjuEncoding | — | java.lang.String | True | — | — | — |
enjuExecutable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
nBest | — | java.lang.Integer | True | — | — | — |
parseNumberFeatureName | — | java.lang.String | True | — | — | — |
parseStatusFeatureName | — | java.lang.String | True | — | — | — |
posFeatureName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | True | — | — | — |
sentenceRole | — | java.lang.String | True | — | — | — |
wordFormFeatureName | — | java.lang.String | True | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
FreelingShallowParser
Category: Parser
Framework: NaCTeM (UIMA)
Version: 1.0
Performs tokenisation, lemmatisation, POS tagging and shallow parsing (chunking). Operates on different languages by setting the "language" parameter. Default language is English (en). Also operates on Spanish (es), Catalan (ca), Galician (gl), and Asturian (ast).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | — | String | True | — | false | — |
GENIA Dependency Parser
Category: Parser
Framework: NaCTeM (UIMA)
Version: 1.0
A dependency parser for biomedical text. The model was trained on the GENIA Treebank. Original software developed by Tsujii Lab (University of Tokyo) and the Institute for Creative Technologies (University of Southern California). Website: http://people.ict.usc.edu/~sagae/parser/gdep/
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
DisableTokenisation | Take tokens as input from a preceding component. | Boolean | True | — | false | — |
ILSP Dependency Parser
Category: Parser
Framework: ILSP (UIMA)
Version: 1.15
ILSP Dependency Parser is a tool trained on the Greek Dependency Treebank (Prokopidis et al., 2005), a resource which comprises data annotated at several linguistic levels. Training data at the level of syntax consisted of ~150+ KWords annotated using a dependency-based syntactic scheme that includes 25 main relations. Different types of parsers (transition-based, graph-based, MaltParser, MateParser) are used during training and application of learned models. ILSP Dependency Parser is used for parsing Greek (EL) POS-tagged and lemmatized sentences.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
useDepParser | The dependency parser to use | String | True | — | false | — |
MaltParser
Category: Parser
Framework: DKPro Core (UIMA)
Version: 1.8.0
Dependency parsing using MaltParser.
Required annotations:
- Token
- Sentence
- POS
- Dependency (annotated over sentence-span)
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ignoreMissingFeatures | Process anyway, even if the model relies on features that are not supported by this component. Default: false | Boolean | True | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
MateParser
Category: Parser
Framework: DKPro Core (UIMA)
Version: 1.8.0
DKPro Annotator for the MateToolsParser.
Please cite the following paper, if you use the parser: Bernd Bohnet. 2010. Top Accuracy and Fast Dependency Parsing is not a Contradiction. The 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
DependencyMappingLocation | Load the dependency to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
Minipar Wrapper
Category: Parser
Framework: GATE
Version: unknown
MiniPar is a shallow parser. It determines the dependency relationships between the words of a sentence.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationInputSetName | — | java.lang.String | — | — | — | true |
annotationOutputSetName | — | java.lang.String | — | — | — | true |
annotationTypeName | — | java.lang.String | — | DepTreeNode | — | false |
document | — | gate.Document | — | — | — | true |
miniparBinary | — | java.net.URL | — | — | — | true |
miniparDataDir | — | java.net.URL | — | — | — | true |
MstParser
Category: Parser
Framework: DKPro Core (UIMA)
Version: 1.8.0
Dependency parsing using MSTParser.
Wrapper for the MSTParser (high memory requirements). More information about the parser can be found here.
The MSTParser models tend to be very large, e.g. the Eisner model is about 600 MB uncompressed. With this model, parsing a simple sentence with MSTParser requires about 3 GB heap memory.
This component feeds MSTParser only with the FORM (token) and POS (part-of-speech) fields. LEMMA, CPOS, and other columns from the CoNLL 2006 format are not generated (cf. mstparser.DependencyInstance).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
DependencyMappingLocation | Load the dependency to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
order | Specifies the order/scope of features. 1 only has features over single edges and 2 has features over pairs of adjacent edges in the tree. The model must have been trained with the respective order set here. | Integer | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
OpenNLP Parser
Category: Parser
Framework: GATE
Version: unknown
Syntactic parser from Apache OpenNLP
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
model | — | java.net.URL | — | models/english/en-parser-chunking.bin | — | — |
OpenNLPParser
Category: Parser
Framework: NaCTeM (UIMA)
Version: 1.0
Parses the document and creates phrasal and clausal annotations over the text. Uses the OpenNLP MaxEnt parser. This analysis engine takes a parameter called "ParseTagMappings" which maps each parse tag to a syntax annotation type. The parse tags come from the standard Penn Treebank phrase and clause tags (produced by the OpenNLP parser), and each syntax annotation type must be defined in the type system and have a corresponding JCas Java class.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AdvancePercentage | — | Float | False | — | false | — |
BeamSize | — | Integer | False | — | false | — |
CaseSensitiveTagDictionary | — | Boolean | False | — | false | — |
ModelDirectory | — | String | True | — | false | — |
ParseTagMappings | — | String | True | — | true | — |
UseTagDictionary | — | Boolean | False | — | false | — |
OpenNlpParser
Category: Parser
Framework: DKPro Core (UIMA)
Version: 1.8.0
OpenNLP parser. The parser ignores existing POS tags and internally creates new ones. However, these tags are only added as annotation if explicitly requested via #PARAM_WRITE_POS.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ConstituentMappingLocation | Location of the mapping file for constituent tags to UIMA types. | String | False | — | false | — |
POSMappingLocation | Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
internTags | Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true | Boolean | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
writePOS | Sets whether to create or not to create POS tags. The creation of constituent tags must be turned on for this to work. Default: true | Boolean | True | — | false | — |
writePennTree | If this parameter is set to true, each sentence is annotated with a PennTree-Annotation, containing the whole parse tree in Penn Treebank style format. Default: false | Boolean | True | — | false | — |
RASP2 Parser
Category: Parser
Framework: GATE
Version: unknown
RASP dependency parser
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
charset | — | java.lang.String | — | ISO-8859-1 | — | true |
debug | — | java.lang.Boolean | — | false | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
outputFormat | — | java.lang.String | — | -og | — | true |
phrasalVerbs | — | java.lang.Boolean | — | true | — | true |
raspHome | — | java.net.URL | — | file:/usr/local/bin/RASP | — | false |
subcategorisation | — | java.lang.Boolean | — | true | — | true |
Stanford Dependency Parser
Category: Parser
Framework: NaCTeM (UIMA)
Version: 1.6.1
Generates Stanford-style dependencies together with POS tokens for English. It wraps parts of the Stanford Parser version 1.6.1. The project's website: http://www-nlp.stanford.edu/downloads/lex-parser.shtml.
StanfordDependencyConverter
Category: Parser
Framework: DKPro Core (UIMA)
Version: 1.8.0
Converts a constituency structure into a dependency structure.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model and tag set mapping. | String | False | — | false | — |
mode | Sets the kind of dependencies being created. Default: DependenciesMode#COLLAPSED TREE | String | False | — | false | — |
originalDependencies | Create original dependencies. If this is disabled, universal dependencies are created. The default is to create the original dependencies. | Boolean | True | — | false | — |
StanfordParser
Category: Parser
Framework: GATE
Version: unknown
Stanford parser wrapper
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
addConstituentAnnotations | — | java.lang.Boolean | — | true | — | true |
addDependencyAnnotations | — | java.lang.Boolean | — | true | — | true |
addDependencyFeatures | — | java.lang.Boolean | — | true | — | true |
addPosTags | — | java.lang.Boolean | — | false | — | true |
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
debug | — | java.lang.Boolean | — | false | — | true |
dependencyMode | — | gate.stanford.DependencyMode | — | Typed | — | true |
document | — | gate.Document | — | — | — | true |
includeExtraDependencies | — | java.lang.Boolean | — | false | — | true |
inputSentenceType | — | java.lang.String | — | Sentence | — | true |
inputTokenType | — | java.lang.String | — | Token | — | true |
mappingFile | — | java.net.URL | — | — | — | — |
parserFile | — | java.net.URL | — | resources/englishRNN.ser.gz | — | — |
reusePosTags | — | java.lang.Boolean | — | false | — | true |
tlppClass | — | java.lang.String | — | edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams | — | — |
useMapping | — | java.lang.Boolean | — | false | — | true |
StanfordParser
Category: Parser
Framework: DKPro Core (UIMA)
Version: 1.8.0
Stanford Parser component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ConstituentMappingLocation |
Location of the mapping file for constituent tags to UIMA types. |
String |
False |
— |
false |
— |
POSMappingLocation |
Location of the mapping file for part-of-speech tags to UIMA types. |
String |
False |
— |
false |
— |
annotationTypeToParse |
This parameter can be used to override the standard behavior, which uses the Sentence annotation as the basic unit for parsing. If the parameter is set to the name of an annotation type x, the parser no longer parses Sentence annotations but x annotations. Default: null |
String |
False |
— |
false |
— |
language |
Use this language instead of the document language to resolve the model and tag set mapping. |
String |
False |
— |
false |
— |
maxItems |
Controls when the factored parser considers a sentence to be too complex and falls back to the PCFG parser. Default: 200000 |
Integer |
True |
— |
false |
— |
maxSentenceLength |
Maximum number of tokens in a sentence. Longer sentences are not parsed, to avoid out-of-memory exceptions. Default: 130 |
Integer |
True |
— |
false |
— |
mode |
Sets the kind of dependencies being created. Default: DependenciesMode#COLLAPSED TREE |
String |
False |
— |
false |
— |
modelLocation |
Location from which the model is read. |
String |
False |
— |
false |
— |
modelVariant |
Variant of the model. Used to address a specific model if there are multiple models for one language. |
String |
False |
— |
false |
— |
printTagSet |
Write the tag set(s) to the log when a model is loaded. |
Boolean |
True |
— |
false |
— |
ptb3Escaping |
Enable all traditional PTB3 token transforms (like -LRB-, -RRB-). |
Boolean |
True |
— |
false |
— |
quoteBegin |
List of extra token texts (usually single character strings) that should be treated like opening quotes and escaped accordingly before being sent to the parser. |
String |
False |
— |
true |
— |
quoteEnd |
List of extra token texts (usually single character strings) that should be treated like closing quotes and escaped accordingly before being sent to the parser. |
String |
False |
— |
true |
— |
readPOS |
Sets whether to use existing POS tags from another annotator for the parsing process. Default: true |
Boolean |
True |
— |
false |
— |
writeConstituent |
Sets whether to create constituent tags. This is required for POS-tagging and lemmatization. Default: true |
Boolean |
True |
— |
false |
— |
writeDependency |
Sets whether to create dependency annotations. Default: true |
Boolean |
True |
— |
false |
— |
writePOS |
Sets whether to create POS tags. The creation of constituent tags must be turned on for this to work. Default: false |
Boolean |
True |
— |
false |
— |
writePennTree |
If this parameter is set to true, each sentence is annotated with a PennTree annotation containing the whole parse tree in Penn Treebank format. Default: false |
Boolean |
True |
— |
false |
— |
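The ptb3Escaping, quoteBegin and quoteEnd parameters above all concern escaping tokens before they reach the parser. The following is a minimal Python sketch of what such escaping might look like, assuming the conventional Penn Treebank bracket codes and the `` / '' quote tokens. The helper name escape_tokens and its arguments are hypothetical and are not part of DKPro Core or the Stanford Parser; the exact set of transforms the component applies is not documented here.

```python
# Conventional Penn Treebank escape codes for bracket tokens.
PTB3_BRACKETS = {
    "(": "-LRB-", ")": "-RRB-",
    "[": "-LSB-", "]": "-RSB-",
    "{": "-LCB-", "}": "-RCB-",
}


def escape_tokens(tokens, quote_begin=(), quote_end=()):
    """Return a new token list with brackets and extra quote tokens escaped.

    quote_begin / quote_end play the role of the quoteBegin / quoteEnd
    parameters: extra token texts to treat as opening / closing quotes.
    (Hypothetical helper, for illustration only.)
    """
    escaped = []
    for tok in tokens:
        if tok in PTB3_BRACKETS:
            escaped.append(PTB3_BRACKETS[tok])
        elif tok in quote_begin:
            escaped.append("``")   # PTB opening quote
        elif tok in quote_end:
            escaped.append("''")   # PTB closing quote
        else:
            escaped.append(tok)
    return escaped
```

For example, calling escape_tokens with quote_begin=("«",) and quote_end=("»",) would make guillemets behave like ordinary quotation marks from the parser's point of view.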
Textalytics Lemmatization, PoS and Parsing
Category: Parser
Framework: GATE
Version: unknown
Textalytics Lemmatization, PoS and Parsing
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiURL |
— |
java.lang.String |
— |
— |
— |
true
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
debug |
— |
java.lang.Boolean |
— |
— |
— |
true |
dictionary |
— |
java.lang.String |
— |
— |
— |
true |
disambiguationLevel |
— |
daedalus.textalytics.gate.param.DisambiguationLevel |
— |
strong_disambiguation |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
inputASTypes |
— |
java.util.List |
— |
— |
— |
true |
inputASname |
— |
java.lang.String |
— |
— |
— |
true |
key |
— |
java.lang.String |
— |
— |
— |
true |
lang |
— |
java.lang.String |
— |
— |
— |
true |
outputASname |
— |
java.lang.String |
— |
Textalytics |
— |
true |
relaxedTypography |
— |
java.lang.Boolean |
— |
— |
— |
true |
ud |
— |
java.lang.String |
— |
— |
— |
true |
unknownWords |
— |
java.lang.Boolean |
— |
— |
— |
true |
Pre-built Workflows (12)
Arabic IE System
Category: Pre-built Workflows
Framework: GATE
Version: unknown
Ready-made Arabic IE application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
Cebuano IE System
Category: Pre-built Workflows
Framework: GATE
Version: unknown
Ready-made Cebuano IE application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
Chinese IE System
Category: Pre-built Workflows
Framework: GATE
Version: unknown
Ready-made Chinese IE application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
French IE System
Category: Pre-built Workflows
Framework: GATE
Version: unknown
Ready-made French IE application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
German IE System
Category: Pre-built Workflows
Framework: GATE
Version: unknown
Ready-made German IE application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
Measurements
Category: Pre-built Workflows
Framework: GATE
Version: unknown
Ready-made application for measurement annotator
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
Romanian IE System
Category: Pre-built Workflows
Framework: GATE
Version: unknown
Ready-made Romanian IE application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
RussIE
Category: Pre-built Workflows
Framework: GATE
Version: unknown
Basic version of the RussIE application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
RussIE + Inflectional Gazetteer & OrthoMatcher
Category: Pre-built Workflows
Framework: GATE
Version: unknown
RussIE application with orthomatcher and inflexional gazetteer
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
RussIE + Inflectional Gazetteer
Category: Pre-built Workflows
Framework: GATE
Version: unknown
RussIE application with inflexional gazetteer
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
RussIE + OrthoMatcher
Category: Pre-built Workflows
Framework: GATE
Version: unknown
RussIE application with orthomatcher
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
TwitIE (EN)
Category: Pre-built Workflows
Framework: GATE
Version: unknown
English TwitIE application
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu |
— |
java.util.List |
— |
— |
— |
— |
pipelineURL |
— |
java.net.URL |
— |
— |
— |
— |
Readability (1)
Reader (91)
ACE Corpus Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads ...
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
folders |
A list of folders containing ACE 2005 corpus files. The folders must contain pairs of *.sgm and *.apf.xml files. |
String |
True |
— |
true |
— |
AclAnthologyReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads the ACL Anthology corpus and outputs CASes with plain text documents.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Encoding |
Name of configuration parameter that contains the character encoding used by the input files. If not specified, the default system encoding will be used. |
String |
True |
— |
false |
— |
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with #INCLUDE_PREFIX [+] if it is an include pattern and with #EXCLUDE_PREFIX [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
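The patterns parameter, which recurs across the DKPro Core readers, combines a [+] or [-] prefix with Ant-like wildcards. The Python sketch below illustrates one plausible reading of that description: a path is selected if at least one include pattern matches and no exclude pattern does. The function names pattern_to_regex and is_selected are hypothetical, and the handling of patterns without a prefix (ignored here) is an assumption rather than documented behavior.

```python
import re


def pattern_to_regex(pattern):
    """Translate an Ant-like glob (without its +/- prefix) into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # any number of sub-directories, including none
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # everything below this point
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # part of a single file or directory name
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def is_selected(path, patterns):
    """Apply [+] include and [-] exclude patterns to a '/'-separated path."""
    includes = [pattern_to_regex(p[1:]) for p in patterns if p.startswith("+")]
    excludes = [pattern_to_regex(p[1:]) for p in patterns if p.startswith("-")]
    included = any(r.match(path) for r in includes)
    excluded = any(r.match(path) for r in excludes)
    return included and not excluded
```

Under these assumptions, the pattern list ["+**/*.txt", "-**/tmp/**"] would select every .txt file except those below a directory named tmp.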
Aimed Collection Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads the AIMed corpus (225 abstracts from MEDLINE) with gold-standard sentence, protein, and protein-protein interaction annotations.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
GeneratePpiAnnotations |
— |
Boolean |
True |
— |
false |
— |
GenerateProteinAnnotations |
— |
Boolean |
True |
— |
false |
— |
GenerateSentenceAnnotations |
— |
Boolean |
True |
— |
false |
— |
NumberOfArticles |
— |
Integer |
False |
— |
false |
— |
PubmedIds |
Specifies pubmedIDs to pick articles. This parameter has the highest priority. |
Integer |
False |
— |
true |
— |
StartingFromArticle |
— |
Integer |
False |
— |
false |
— |
AlvisAEReader
Category: Reader
Framework: AlvisNLP
Version:
Reads documents and annotations from an AlvisAE campaign.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
campaignId |
— |
java.lang.Integer |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
groupItemRolePrefix |
— |
java.lang.String |
True |
— |
— |
— |
htmlLayerName |
— |
java.lang.String |
True |
— |
— |
— |
linkToAnnotation |
— |
java.lang.Boolean |
True |
— |
— |
— |
maxDate |
— |
java.lang.String |
False |
— |
— |
— |
password |
— |
java.lang.String |
True |
— |
— |
— |
schema |
— |
java.lang.String |
True |
— |
— |
— |
sectionName |
— |
java.lang.String |
True |
— |
— |
— |
taskId |
— |
java.lang.Integer |
False |
— |
— |
— |
textBoundFragmentRolePrefix |
— |
java.lang.String |
True |
— |
— |
— |
textBoundRelationName |
— |
java.lang.String |
True |
— |
— |
— |
typeFeature |
— |
java.lang.String |
True |
— |
— |
— |
url |
— |
java.lang.String |
True |
— |
— |
— |
userId |
— |
java.lang.Integer |
False |
— |
— |
— |
userLayerName |
— |
java.lang.String |
True |
— |
— |
— |
username |
— |
java.lang.String |
True |
— |
— |
— |
AlvisAEReader2
Category: Reader
Framework: AlvisNLP
Version:
Reads documents and annotations from an AlvisAE campaign.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
adjudicate |
— |
java.lang.Boolean |
False |
— |
— |
— |
annotationIdFeature |
— |
java.lang.String |
True |
— |
— |
— |
annotationSetIdFeature |
— |
java.lang.String |
True |
— |
— |
— |
campaignId |
— |
java.lang.Integer |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
createdFeature |
— |
java.lang.String |
True |
— |
— |
— |
descriptionFeature |
— |
java.lang.String |
True |
— |
— |
— |
docDescriptions |
— |
java.lang.String[] |
False |
— |
— |
— |
docExternalIds |
— |
java.lang.String[] |
False |
— |
— |
— |
docIds |
— |
java.lang.Integer[] |
False |
— |
— |
— |
externalIdFeature |
— |
java.lang.String |
True |
— |
— |
— |
fragmentRolePrefix |
— |
java.lang.String |
True |
— |
— |
— |
fragmentTypeFeature |
— |
java.lang.String |
True |
— |
— |
— |
fragmentsLayerName |
— |
java.lang.String |
True |
— |
— |
— |
head |
— |
java.lang.Boolean |
True |
— |
— |
— |
itemRolePrefix |
— |
java.lang.String |
True |
— |
— |
— |
kindFeature |
— |
java.lang.String |
True |
— |
— |
— |
loadDependencies |
— |
java.lang.Boolean |
False |
— |
— |
— |
loadGroups |
— |
java.lang.Boolean |
True |
— |
— |
— |
loadRelations |
— |
java.lang.Boolean |
True |
— |
— |
— |
loadTextBound |
— |
java.lang.Boolean |
True |
— |
— |
— |
oldModel |
— |
java.lang.Boolean |
False |
— |
— |
— |
password |
— |
java.lang.String |
True |
— |
— |
— |
referentFeature |
— |
java.lang.String |
True |
— |
— |
— |
schema |
— |
java.lang.String |
True |
— |
— |
— |
sectionName |
— |
java.lang.String |
True |
— |
— |
— |
sourceRolePrefix |
— |
java.lang.String |
True |
— |
— |
— |
taskFeature |
— |
java.lang.String |
False |
— |
— |
— |
taskId |
— |
java.lang.Integer |
False |
— |
— |
— |
taskName |
— |
java.lang.String |
False |
— |
— |
— |
typeFeature |
— |
java.lang.String |
True |
— |
— |
— |
url |
— |
java.lang.String |
True |
— |
— |
— |
userFeature |
— |
java.lang.String |
False |
— |
— |
— |
userIds |
— |
java.lang.Integer[] |
False |
— |
— |
— |
userNames |
— |
java.lang.String[] |
False |
— |
— |
— |
username |
— |
java.lang.String |
True |
— |
— |
— |
AnimalReader
Category: Reader
Framework: AlvisNLP
Version: 2012-04-30
Project-specific file reader.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
bodySectionName |
— |
java.lang.String |
True |
— |
— |
— |
charset |
— |
java.lang.String |
True |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
linesLimit |
— |
java.lang.Integer |
False |
— |
— |
— |
sizeLimit |
— |
java.lang.Integer |
False |
— |
— |
— |
sourcePath |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
titleSectionName |
— |
java.lang.String |
True |
— |
— |
— |
xmlDir |
— |
org.bibliome.util.files.InputDirectory |
True |
— |
— |
— |
AssertAnnotations$InternalStringReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Descriptor automatically generated by uimaFIT
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
documentText |
— |
String |
True |
— |
false |
— |
language |
— |
String |
True |
— |
false |
— |
BIO Format Collection Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads BIO format files from specified directory.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Directory |
Directory where .bio, .iob, .BIO, or .IOB files are stored. |
String |
True |
— |
false |
— |
TypeToBioSuffixMap |
Fully qualified type name, comma, suffix string |
String |
True |
BinaryCasReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA Binary CAS formats reader.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with #INCLUDE_PREFIX [+] if it is an include pattern and with #EXCLUDE_PREFIX [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
typeSystemLocation |
The location from which to obtain the type system when the CAS is stored in form 0. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
BioC Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads a file in BioC format. A BioC file contains a collection of documents with annotations. BioC website: http://bioc.sourceforge.net/
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
inputFile |
A path to a BioC file. |
String |
True |
— |
false |
— |
BioCreative CHEMDNER Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 0.1
Reads data prepared specifically for the BioCreative IV's CHEMDNER track. This component transcribes annotations into the BioC type system.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
abstractsFile |
A file with a set of abstracts |
String |
True |
— |
false |
— |
annotationsFile |
A file with standoff annotations |
String |
True |
— |
false |
— |
BioNLP ST Data Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.1
Reads files formatted for the BioNLP Shared Task series and outputs documents with named entity, relation, and event annotations. The file syntax is described at http://2013.bionlp-st.org/.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
folders |
A list of folders containing BioNLP Shared Task-format files. The folders must contain at least ".txt" files and optionally ".a1" and ".a2" files. |
String |
True |
— |
true |
— |
BioNLPSTReader
Category: Reader
Framework: AlvisNLP
Version:
Reads documents and annotations in the BioNLP-ST 2013 a1/a2 format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
a1Dir |
— |
org.bibliome.util.files.InputDirectory |
False |
— |
— |
— |
a2Dir |
— |
org.bibliome.util.files.InputDirectory |
False |
— |
— |
— |
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
charset |
— |
java.lang.String |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
equivalenceItemPrefix |
— |
java.lang.String |
True |
— |
— |
— |
equivalenceRelationName |
— |
java.lang.String |
True |
— |
— |
— |
eventKind |
— |
java.lang.String |
True |
— |
— |
— |
fragmentCountFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
idFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
kindFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
relationKind |
— |
java.lang.String |
True |
— |
— |
— |
schema |
— |
org.bibliome.util.bionlpst.schema.DocumentSchema |
False |
— |
— |
— |
sectionName |
— |
java.lang.String |
True |
— |
— |
— |
textBoundAsAnnotations |
— |
java.lang.Boolean |
False |
— |
— |
— |
textBoundFragmentRolePrefix |
— |
java.lang.String |
True |
— |
— |
— |
textDir |
— |
org.bibliome.util.files.InputDirectory |
True |
— |
— |
— |
textKind |
— |
java.lang.String |
True |
— |
— |
— |
triggerRole |
— |
java.lang.String |
True |
— |
— |
— |
typeFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
BlikiWikipediaReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Bliki-based Wikipedia reader.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language |
The language of the wiki installation. |
String |
True |
— |
false |
— |
outputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
pageTitles |
Which page titles should be retrieved. |
String |
True |
— |
true |
— |
sourceLocation |
Wiki API URL, e.g. for the English Wikipedia: http://en.wikipedia.org/w/api.php |
String |
True |
— |
false |
— |
BncReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reader for the British National Corpus (XML version).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Location of the mapping file for part-of-speech tags to UIMA types. |
String |
False |
— |
false |
— |
POSTagSet |
Use this part-of-speech tag set to resolve the tag set mapping instead of the tag set defined as part of the model metadata. |
String |
False |
— |
false |
— |
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with #INCLUDE_PREFIX [+] if it is an include pattern and with #EXCLUDE_PREFIX [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
BratReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reader for the brat format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with #INCLUDE_PREFIX [+] if it is an include pattern and with #EXCLUDE_PREFIX [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
relationTypes |
Types that are relations. It is mandatory to provide the type name followed by two feature names that represent Arg1 and Arg2, separated by colons, e.g. de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency:Governor:Dependent{A}. Additionally, a subcategorization feature may be specified. |
String |
True |
— |
true |
— |
sourceEncoding |
Name of configuration parameter that contains the character encoding used by the input files. |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
textAnnotationTypes |
Types that are text annotations. It is mandatory to provide the type name which can optionally be followed by a subcategorization feature. Using this parameter is only necessary to specify a subcategorization feature. Otherwise, text annotation types are automatically detected. |
String |
True |
— |
true |
— |
typeMappings |
— |
String |
False |
— |
true |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
CombinationReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Combines multiple readers into a single reader.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
readers |
— |
String |
True |
— |
true |
— |
Conll2000Reader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads the CoNLL 2000 chunking format.
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
- FORM - token
- POSTAG - part-of-speech tag
- CHUNK - chunk (BIO encoded)
Sentences are separated by a blank new line.
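The three-column layout described above (FORM, POSTAG, CHUNK, with blank lines between sentences) can be read with very little code. The Python sketch below is only an illustration of the format, not the DKPro Core implementation; the function names read_conll2000 and chunks are hypothetical, and whitespace-separated columns are assumed.

```python
def read_conll2000(text):
    """Return a list of sentences, each a list of (form, postag, chunk) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        form, postag, chunk = line.split()
        current.append((form, postag, chunk))
    if current:
        sentences.append(current)
    return sentences


def chunks(sentence):
    """Group BIO-encoded tokens into (chunk_type, [forms]) pairs."""
    result = []
    current = None
    for form, _postag, tag in sentence:
        prefix, _, kind = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == kind:
            current[1].append(form)          # continue the open chunk
            continue
        # "B-" starts a chunk; a stray "I-" is leniently treated the same way.
        current = (kind, [form]) if prefix in ("B", "I") else None
        if current is not None:
            result.append(current)
    return result
```

Applied to the example sentence above, chunks would group "the current account deficit" into a single NP chunk and leave the final period outside any chunk.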
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ChunkMappingLocation |
Load the chunk tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
ChunkTagSet |
Use this chunk tag set to resolve the tag set mapping instead of the tag set defined as part of the model metadata. This can be useful if a custom model is specified which does not have such metadata, or it can be used in readers. |
String |
False |
— |
false |
— |
POSMappingLocation |
Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
POSTagSet |
Use this part-of-speech tag set to resolve the tag set mapping instead of the tag set defined as part of the model metadata. This can be useful if a custom model is specified which does not have such metadata, or it can be used in readers. |
String |
False |
— |
false |
— |
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
internTags |
Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true |
Boolean |
False |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with #INCLUDE_PREFIX [+] if it is an include pattern and with #EXCLUDE_PREFIX [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
readChunk |
Write chunk information. Default: true |
Boolean |
True |
— |
false |
— |
readPOS |
Write part-of-speech information. Default: true |
Boolean |
True |
— |
false |
— |
sourceEncoding |
Character encoding of the input data. |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
Conll2002Reader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads the CoNLL 2002 named entity format. The columns are separated by a single space, as illustrated below.
Wolff B-PER
, O
currently O
a O
journalist O
in O
Argentina B-LOC
, O
played O
with O
Del B-PER
Bosque I-PER
in O
the O
final O
years O
of O
the O
seventies O
in O
Real B-ORG
Madrid I-ORG
. O
- FORM - token
- NER - named entity (BIO encoded)
Sentences are separated by a blank new line.
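To make the BIO encoding of the NER column concrete, the following Python sketch reads the two-column format described above and converts the tags into entity spans over token indices. It illustrates the format only and is not the DKPro Core implementation; read_conll2002 and entity_spans are hypothetical names.

```python
def read_conll2002(text):
    """Return a list of sentences, each a list of (form, ner_tag) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        form, tag = line.split()
        current.append((form, tag))
    if current:
        sentences.append(current)
    return sentences


def entity_spans(sentence):
    """Return (entity_type, start, end) triples; end is exclusive."""
    spans = []
    start, kind = None, None
    # The trailing ("", "O") sentinel closes an entity that ends the sentence.
    for i, (_form, tag) in enumerate(sentence + [("", "O")]):
        prefix, _, tag_kind = tag.partition("-")
        continues = prefix == "I" and kind == tag_kind
        if start is not None and not continues:
            spans.append((kind, start, i))
            start, kind = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, kind = i, tag_kind
    return spans
```

In the example above, "Del Bosque" would come out as one PER span covering two tokens and "Real Madrid" as one ORG span.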
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
internTags |
Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true |
Boolean |
False |
— |
false |
— |
language |
The language. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with #INCLUDE_PREFIX [+] if it is an include pattern and with #EXCLUDE_PREFIX [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
readNamedEntity |
Write named entity information. Default: true |
Boolean |
True |
— |
false |
— |
sourceEncoding |
Character encoding of the input data. |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
Conll2006Reader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads a file in the CoNLL-2006 format (aka CoNLL-X).
Heutzutage heutzutage ADV _ _ ADV _ _
- ID - (ignored) Token counter, starting at 1 for each new sentence.
- FORM - (Token) Word form or punctuation symbol.
- LEMMA - (Lemma) Lemma or stem (depending on the particular data set) of the word form, or an underscore if not available.
- CPOSTAG - (unused)
- POSTAG - (POS) Fine-grained part-of-speech tag, where the tagset depends on the language, or identical to the coarse-grained part-of-speech tag if not available.
- FEATS - (MorphologicalFeatures) Unordered set of syntactic and/or morphological features (depending on the particular language), separated by a vertical bar (|), or an underscore if not available.
- HEAD - (Dependency) Head of the current token, which is either a value of ID or zero ('0'). Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero.
- DEPREL - (Dependency) Dependency relation to the HEAD. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'.
- PHEAD - (ignored) Projective head of current token, which is either a value of ID or zero ('0'), or an underscore if not available. Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero. The dependency structure resulting from the PHEAD column is guaranteed to be projective (but is not available for all languages), whereas the structures resulting from the HEAD column will be non-projective for some sentences of some languages (but is always available).
- PDEPREL - (ignored) Dependency relation to the PHEAD, or an underscore if not available. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'.
Sentences are separated by a blank line.
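The column layout described above can be sketched in a few lines of Python. This is only an illustration of the file format, not the DKPro Core reader itself; the function name `parse_conll2006`, the tab separator, and the sample rows are assumptions made for the example.

```python
from typing import Dict, List

# Column names as listed above for the CoNLL-2006 (CoNLL-X) format.
CONLL2006_COLUMNS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
                     "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def parse_conll2006(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-2006 text into sentences; each token is a column-name -> value dict."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")  # assumption: columns are tab-separated
        if len(fields) != len(CONLL2006_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        current.append(dict(zip(CONLL2006_COLUMNS, fields)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical two-sentence sample; the annotation values are made up.
SAMPLE = (
    "1\tHeutzutage\theutzutage\tADV\tADV\t_\t2\tMO\t_\t_\n"
    "2\tregnet\tregnen\tV\tVVFIN\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tJa\tja\tPTK\tPTKANT\t_\t0\tROOT\t_\t_\n"
)
sentences = parse_conll2006(SAMPLE)
```

Each token keeps all ten columns as strings, so a caller can decide which ones to ignore, mirroring the "(ignored)" and "(unused)" notes in the list above.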
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
POSTagSet |
Use this part-of-speech tag set to resolve the tag set mapping instead of the tag set defined in the model metadata. This can be useful if a custom model is specified that lacks such metadata, or it can be used in readers. |
String |
False |
— |
false |
— |
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
readDependency |
— |
Boolean |
True |
— |
false |
— |
readLemma |
— |
Boolean |
True |
— |
false |
— |
readMorph |
— |
Boolean |
True |
— |
false |
— |
readPOS |
— |
Boolean |
True |
— |
false |
— |
sourceEncoding |
— |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
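The prefix convention used by the patterns parameter above can be sketched as follows. This is a minimal illustration, not the reader's actual code: the helper name `split_patterns` and the sample patterns are hypothetical, and treating an unprefixed pattern as an include is an assumption.

```python
from typing import Iterable, List, Tuple

INCLUDE_PREFIX = "[+]"
EXCLUDE_PREFIX = "[-]"


def split_patterns(patterns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate prefixed patterns into (includes, excludes) with the prefix removed."""
    includes: List[str] = []
    excludes: List[str] = []
    for pattern in patterns:
        if pattern.startswith(INCLUDE_PREFIX):
            includes.append(pattern[len(INCLUDE_PREFIX):])
        elif pattern.startswith(EXCLUDE_PREFIX):
            excludes.append(pattern[len(EXCLUDE_PREFIX):])
        else:
            # Assumption: an unprefixed pattern counts as an include.
            includes.append(pattern)
    return includes, excludes


# Hypothetical patterns, for illustration only.
includes, excludes = split_patterns(
    ["[+]**/*.conll", "[-]**/draft/**", "extra/*.conll"]
)
```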
Conll2009Reader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads a file in the CoNLL-2009 format.
- ID - (ignored) Token counter, starting at 1 for each new sentence.
- FORM - (Token) Word form or punctuation symbol.
- LEMMA - (Lemma) Lemma of FORM.
- PLEMMA - (ignored) Automatically predicted lemma of FORM
- POS - (POS) Fine-grained part-of-speech tag, where the tagset depends on the language, or identical to the coarse-grained part-of-speech tag if not available.
- PPOS - (ignored) Automatically predicted major POS by a language-specific tagger
- FEAT - (MorphologicalFeatures) Unordered set of syntactic and/or morphological features (depending on the particular language), separated by a vertical bar (|), or an underscore if not available.
- PFEAT - (ignored) Automatically predicted morphological features (if applicable)
- HEAD - (Dependency) Head of the current token, which is either a value of ID or zero ('0'). Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero.
- PHEAD - (ignored) Automatically predicted syntactic head
- DEPREL - (Dependency) Dependency relation to the HEAD. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'.
- PDEPREL - (ignored) Automatically predicted dependency relation to PHEAD
- FILLPRED - (ignored) Contains 'Y' for argument-bearing tokens
- PRED - (SemanticPredicate) (sense) identifier of a semantic 'predicate' coming from a current token
- APREDs - (SemanticArgument) Columns with argument labels for each semantic predicate (in the ID order)
Sentences are separated by a blank line.
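Unlike the fixed-width CoNLL-2006 layout, the column list above ends with a variable number of APRED columns, one per predicate in the sentence. A rough Python sketch of reading one row under that layout follows; the function name, the tab separator, and the sample rows are assumptions made for illustration.

```python
from typing import Dict, List, Union

# The 14 fixed columns listed above; any further columns are APREDs.
CONLL2009_FIXED = ["ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT",
                   "PFEAT", "HEAD", "PHEAD", "DEPREL", "PDEPREL",
                   "FILLPRED", "PRED"]

Row = Dict[str, Union[str, List[str]]]


def parse_conll2009_row(line: str) -> Row:
    """Map the fixed columns by name and collect the trailing APRED columns."""
    fields = line.rstrip("\n").split("\t")  # assumption: tab-separated
    if len(fields) < len(CONLL2009_FIXED):
        raise ValueError(f"expected at least 14 columns, got {len(fields)}")
    row: Row = dict(zip(CONLL2009_FIXED, fields))
    row["APREDS"] = fields[len(CONLL2009_FIXED):]
    return row


# Hypothetical sentence with a single predicate, hence one APRED column.
SAMPLE_ROWS = [
    "1\tEconomy\teconomy\teconomy\tNN\tNN\t_\t_\t2\t2\tSBJ\tSBJ\t_\t_\tA1",
    "2\tgrows\tgrow\tgrow\tVBZ\tVBZ\t_\t_\t0\t0\tROOT\tROOT\tY\tgrow.01\t_",
]
rows = [parse_conll2009_row(r) for r in SAMPLE_ROWS]
```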
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
POSTagSet |
Use this part-of-speech tag set to resolve the tag set mapping instead of the tag set defined in the model metadata. This can be useful if a custom model is specified that lacks such metadata, or it can be used in readers. |
String |
False |
— |
false |
— |
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
readDependency |
— |
Boolean |
True |
— |
false |
— |
readLemma |
— |
Boolean |
True |
— |
false |
— |
readMorph |
— |
Boolean |
True |
— |
false |
— |
readPOS |
— |
Boolean |
True |
— |
false |
— |
readSemanticPredicate |
— |
Boolean |
True |
— |
false |
— |
sourceEncoding |
— |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
Conll2012Reader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads a file in the CoNLL-2012 format.
- Document ID - (ignored) This is a variation on the document filename.
- Part number - (ignored) Some files are divided into multiple parts numbered as 000, 001, 002, ... etc.
- Word number - (ignored)
- Word itself - (document text) This is the token as segmented/tokenized in the Treebank. Initially the *_skel files contain the placeholder [WORD], which gets replaced by the actual token from the Treebank that is part of the OntoNotes release.
- Part-of-Speech - (POS)
- Parse bit - (Constituent) This is the bracketed structure broken before the first open parenthesis in the parse, with the word/part-of-speech leaf replaced by a *. The full parse can be created by substituting the asterisk with the "([pos] [word])" string (or leaf) and concatenating the items in the rows of that column.
- Predicate lemma - (Lemma) The predicate lemma is mentioned for the rows for which we have semantic role information. All other rows are marked with a "-"
- Predicate Frameset ID - (SemanticPredicate) This is the PropBank frameset ID of the predicate in Column 7.
- Word sense - (ignored) This is the word sense of the word in Column 3.
- Speaker/Author - (ignored) This is the speaker or author name where available. Mostly in Broadcast Conversation and Web Log data.
- Named Entities - (NamedEntity) This column identifies the spans representing various named entities.
- Predicate Arguments - (SemanticPredicate) There is one column each of predicate argument structure information for the predicate mentioned in Column 7.
- Coreference - (CoreferenceChain) Coreference chain information encoded in a parenthesis structure.
Sentences are separated by a blank line.
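The "Parse bit" reconstruction described in the list above (substitute the asterisk with the leaf, then concatenate the rows) can be sketched as below. The function name `rebuild_parse` and the four-token sample sentence are hypothetical.

```python
from typing import Iterable, Tuple


def rebuild_parse(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Rebuild a bracketed parse from (word, pos, parse_bit) rows.

    Each parse bit holds exactly one '*', standing for the token's leaf.
    """
    parts = []
    for word, pos, parse_bit in rows:
        leaf = f"({pos} {word})"
        # Replace only the placeholder in the parse bit, then collect the piece.
        parts.append(parse_bit.replace("*", leaf, 1))
    return "".join(parts)


# Hypothetical sentence: word, part-of-speech, parse bit.
SAMPLE_ROWS = [
    ("The", "DT", "(TOP(S(NP*"),
    ("cat", "NN", "*)"),
    ("sleeps", "VBZ", "(VP*)"),
    (".", ".", "*))"),
]
tree = rebuild_parse(SAMPLE_ROWS)
```

For the sample rows this yields `(TOP(S(NP(DT The)(NN cat))(VP(VBZ sleeps))(. .)))`.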
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ConstituentMappingLocation |
Load the constituent tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
ConstituentTagSet |
Use this constituent tag set to resolve the tag set mapping instead of the tag set defined in the model metadata. This can be useful if a custom model is specified that lacks such metadata, or it can be used in readers. |
String |
False |
— |
false |
— |
POSMappingLocation |
Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
POSTagSet |
Use this part-of-speech tag set to resolve the tag set mapping instead of the tag set defined in the model metadata. This can be useful if a custom model is specified that lacks such metadata, or it can be used in readers. |
String |
False |
— |
false |
— |
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
internTags |
Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true |
Boolean |
False |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
readConstituent |
— |
Boolean |
True |
— |
false |
— |
readCoreference |
— |
Boolean |
True |
— |
false |
— |
readLemma |
Disabled by default because CoNLL 2012 format does not include lemmata for all words, only for predicates. |
Boolean |
True |
— |
false |
— |
readNamedEntity |
— |
Boolean |
True |
— |
false |
— |
readPOS |
— |
Boolean |
True |
— |
false |
— |
readSemanticPredicate |
— |
Boolean |
True |
— |
false |
— |
readWordSense |
— |
Boolean |
True |
— |
false |
— |
sourceEncoding |
— |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
useHeaderMetadata |
Use the document ID declared in the file header instead of using the filename. |
Boolean |
True |
— |
false |
— |
writeTracesToText |
— |
Boolean |
False |
— |
false |
— |
Entity Annotation Results Importer
Category: Reader
Framework: GATE
Version: unknown
Import judgments from a CrowdFlower job created by the Entity Annotation Job Builder as GATE annotations.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotateSpans |
— |
java.lang.Boolean |
— |
true |
— |
true |
apiKey |
— |
java.lang.String |
— |
— |
— |
— |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
jobId |
— |
java.lang.Long |
— |
— |
— |
true |
resultASName |
— |
java.lang.String |
— |
crowdResults |
— |
true |
resultAnnotationType |
— |
java.lang.String |
— |
— |
— |
true |
snippetASName |
— |
java.lang.String |
— |
— |
— |
true |
snippetAnnotationType |
— |
java.lang.String |
— |
Sentence |
— |
true |
tokenASName |
— |
java.lang.String |
— |
— |
— |
true |
tokenAnnotationType |
— |
java.lang.String |
— |
Token |
— |
true |
EuropePMC Open Access Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads open-access full-text articles from the Europe PMC web service
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
cacheSize |
Size of the queue to store articles loaded preemptively. |
Integer |
False |
— |
false |
— |
ids |
List of article ids (e.g. PMC4489390) from which to retrieve the full text. NOTE: This or 'query' must be set, but not both. |
String |
False |
— |
true |
— |
includeAbstract |
Include the article abstract. |
Boolean |
True |
— |
false |
— |
includeSubArticles |
Include sub-articles. |
Boolean |
True |
— |
false |
— |
includeTitle |
Include the article title. |
Boolean |
True |
— |
false |
— |
limit |
Maximum number of full text articles to retrieve. NOTE: Only applies when 'query' is set. |
Integer |
False |
— |
false |
— |
numRetries |
— |
Integer |
False |
— |
false |
— |
query |
Query term used to retrieve full text articles. NOTE: This or 'ids' must be set, but not both. |
String |
False |
— |
false |
— |
recorderEnabled |
— |
Boolean |
True |
— |
false |
— |
recorderJdbcUrl |
— |
String |
False |
— |
false |
— |
recorderPassword |
— |
String |
False |
— |
false |
— |
recorderUsername |
— |
String |
False |
— |
false |
— |
retryOnError |
— |
Boolean |
True |
— |
false |
— |
retrySeconds |
— |
Integer |
False |
— |
false |
— |
sortByPublicationDate |
Retrieve the most recently published articles first. NOTE: Only applies when 'query' is set. |
Boolean |
False |
— |
false |
— |
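The notes in the table above state that exactly one of 'ids' and 'query' must be set, and that 'limit' and 'sortByPublicationDate' only apply in query mode. A small sketch of that rule, written as a standalone check rather than as the reader's own validation (the function names are hypothetical):

```python
from typing import Dict, Optional, Sequence


def selection_mode(ids: Optional[Sequence[str]] = None,
                   query: Optional[str] = None) -> str:
    """Return 'ids' or 'query', enforcing that exactly one of them is set."""
    has_ids = bool(ids)
    has_query = bool(query)
    if has_ids == has_query:
        raise ValueError("exactly one of 'ids' and 'query' must be set")
    return "ids" if has_ids else "query"


def effective_settings(ids: Optional[Sequence[str]] = None,
                       query: Optional[str] = None,
                       limit: Optional[int] = None,
                       sort_by_publication_date: bool = False) -> Dict[str, object]:
    """Keep only the settings that matter for the chosen mode."""
    mode = selection_mode(ids, query)
    settings: Dict[str, object] = {"mode": mode}
    if mode == "query":
        # 'limit' and 'sortByPublicationDate' are only meaningful with a query.
        settings["limit"] = limit
        settings["sortByPublicationDate"] = sort_by_publication_date
    return settings


by_ids = effective_settings(ids=["PMC4489390"], limit=5)
by_query = effective_settings(query="malaria", limit=5)
```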
FSOVFileReader
Category: Reader
Framework: AlvisNLP
Version: 2012-04-30
Project-specific text file reader.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
bodySectionName |
— |
java.lang.String |
True |
— |
— |
— |
charset |
— |
java.lang.String |
True |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
linesLimit |
— |
java.lang.Integer |
False |
— |
— |
— |
sizeLimit |
— |
java.lang.Integer |
False |
— |
— |
— |
sourcePath |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
titleSectionName |
— |
java.lang.String |
True |
— |
— |
— |
xmlDir |
— |
org.bibliome.util.files.InputDirectory |
True |
— |
— |
— |
Fast Infoset Document Format
Category: Reader
Framework: GATE
Version: unknown
Format parser for GATE XML stored in the binary Fast Infoset format
GATE .cochrane.txt document format
Category: Reader
Framework: GATE
Version: unknown
Load this to allow the opening of Cochrane text documents, and choose the mime type "text/x-cochrane", or use the correct file extension.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
excludeFromFeatures |
— |
java.util.List |
— |
TI;AB |
— |
— |
fieldPattern |
— |
java.lang.String |
— |
(?<CODE>[A-Z]+): (?<VALUE>.*) |
— |
— |
fieldsForText |
— |
java.util.List |
— |
TI=title;ID=id;AU=authors;AB=abstract |
— |
— |
ignorePattern |
— |
java.lang.String |
— |
— |
— |
— |
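To make the default values in the table above more concrete, the sketch below applies the default fieldPattern to a few lines and splits the default fieldsForText value into code/name pairs. It only illustrates the syntax of the two values; how GATE combines them internally is not shown here, and the sample lines and function names are hypothetical. Note that Python writes named groups as `(?P<name>...)`, whereas the Java pattern above uses `(?<name>...)`.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Python spelling of the default fieldPattern shown in the table.
FIELD_PATTERN = re.compile(r"(?P<CODE>[A-Z]+): (?P<VALUE>.*)")


def parse_fields_for_text(spec: str) -> Dict[str, str]:
    """Turn 'TI=title;ID=id' into {'TI': 'title', 'ID': 'id'}."""
    return dict(item.split("=", 1) for item in spec.split(";") if item)


def parse_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (CODE, VALUE) pairs for the lines that match the field pattern."""
    fields = []
    for line in lines:
        match = FIELD_PATTERN.match(line)
        if match:
            fields.append((match.group("CODE"), match.group("VALUE")))
    return fields


mapping = parse_fields_for_text("TI=title;ID=id;AU=authors;AB=abstract")
# Hypothetical input lines, for illustration only.
sample = ["TI: A hypothetical review title", "AU: Doe J", "not a field line"]
fields = parse_lines(sample)
```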
GATE .pubMed.txt document format
Category: Reader
Framework: GATE
Version: unknown
Load this to allow the opening of PubMed text documents, and choose the mime type "text/x-pubmed", or use the correct file extension.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
excludeFromFeatures |
— |
java.util.List |
— |
TI;AB |
— |
— |
fieldPattern |
— |
java.lang.String |
— |
(?<CODE>....)- (?<VALUE>.*) |
— |
— |
fieldsForText |
— |
java.util.List |
— |
TI=title;PMID=id;AU=authors;AB=abstract |
— |
— |
ignorePattern |
— |
java.lang.String |
— |
— |
— |
— |
GATE DataSift JSON Document Format
Category: Reader
Framework: GATE
Version: unknown
Format parser for DataSift JSON files
GATE JSON Tweet Document Format
Category: Reader
Framework: GATE
Version: unknown
Format parser for Twitter JSON files
GateXMLReaderDescriptor
Category: Reader
Framework: ILSP (UIMA)
Version: 0.9
Reads GATE documents created with ILSP tools
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
InputDirectory |
Directory of xml files to read in |
String |
False |
— |
false |
— |
InputEncoding |
Character encoding for the documents. If not specified, the default system encoding will be used. Note that this parameter only applies if there is no CAS Initializer provided; otherwise, it is the CAS Initializer’s responsibility to deal with character encoding issues. |
String |
False |
— |
false |
— |
InputFile |
Single file to be processed |
String |
False |
— |
false |
— |
StripExt |
The file extension to strip from the original filenames. Only files with this extension will be processed by the reader. |
String |
False |
— |
false |
— |
GeniaJSONReader
Category: Reader
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
annotationsLayerName |
— |
java.lang.String |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
instanceIdFeature |
— |
java.lang.String |
True |
— |
— |
— |
source |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
GeniaReader
Category: Reader
Framework: AlvisNLP
Version: 2012-04-30
Reads text files and their associated annotation files in BioNLP Shared Task format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
aDir |
— |
org.bibliome.util.files.InputDirectory |
False |
— |
— |
— |
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
dependencyLabelFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
dependencyRelationName |
— |
java.lang.String |
True |
— |
— |
— |
dependentRoleName |
— |
java.lang.String |
True |
— |
— |
— |
entitiesLayerName |
— |
java.lang.String |
False |
— |
— |
— |
equivalenceRelationName |
— |
java.lang.String |
True |
— |
— |
— |
equivalenceRolePrefix |
— |
java.lang.String |
True |
— |
— |
— |
headRoleName |
— |
java.lang.String |
True |
— |
— |
— |
idFeatureKey |
— |
java.lang.String |
False |
— |
— |
— |
layerNames |
— |
alvisnlp.module.types.Mapping |
True |
— |
— |
— |
readA1 |
— |
java.lang.Boolean |
True |
— |
— |
— |
readA2 |
— |
java.lang.Boolean |
False |
— |
— |
— |
sectionName |
— |
java.lang.String |
True |
— |
— |
— |
sourcePath |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
typeFeatureKey |
— |
java.lang.String |
False |
— |
— |
— |
wordLayerName |
— |
java.lang.String |
True |
— |
— |
— |
HtmlReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads the contents of a given URL and strips the HTML. Returns only the textual contents.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language |
Set this as the language of the produced documents. |
String |
False |
— |
false |
— |
sourceEncoding |
Name of configuration parameter that contains the character encoding used by the input files. |
String |
True |
— |
false |
— |
sourceLocation |
URL from which the input is read. |
String |
True |
— |
false |
— |
I2B2Reader
Category: Reader
Framework: AlvisNLP
Version:
Reads files in the format of the I2B2 challenge.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
assertionFeature |
— |
java.lang.String |
True |
— |
— |
— |
assertionsDir |
— |
org.bibliome.util.files.InputDirectory |
False |
— |
— |
— |
conceptTypeFeature |
— |
java.lang.String |
True |
— |
— |
— |
conceptsDir |
— |
org.bibliome.util.files.InputDirectory |
False |
— |
— |
— |
conceptsLayerName |
— |
java.lang.String |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
leftRole |
— |
java.lang.String |
True |
— |
— |
— |
linenoFeature |
— |
java.lang.String |
True |
— |
— |
— |
linesLayerName |
— |
java.lang.String |
True |
— |
— |
— |
relationsDir |
— |
org.bibliome.util.files.InputDirectory |
False |
— |
— |
— |
rightRole |
— |
java.lang.String |
True |
— |
— |
— |
sectionName |
— |
java.lang.String |
True |
— |
— |
— |
textDir |
— |
org.bibliome.util.files.InputDirectory |
True |
— |
— |
— |
tokenNumberFeature |
— |
java.lang.String |
True |
— |
— |
— |
tokensLayerName |
— |
java.lang.String |
True |
— |
— |
— |
ILSP File System Collection Reader
Category: Reader
Framework: ILSP (UIMA)
Version: 1.0
Reads files from the filesystem. This CollectionReader may be used with or without a CAS Initializer. If a CAS Initializer is supplied, it will be passed an InputStream to the file and must populate the CAS from that InputStream. If no CAS Initializer is supplied, this CollectionReader will read the file itself and treat the entire contents of the file as the document to be inserted into the CAS. Uses code from the Apache UIMA framework licensed under the ASF License.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
InputDirectory |
Directory containing input files |
String |
True |
— |
false |
— |
InputEncoding |
Character encoding for the documents. If not specified, the default system encoding will be used. Note that this parameter only applies if there is no CAS Initializer provided; otherwise, it is the CAS Initializer’s responsibility to deal with character encoding issues. |
String |
False |
— |
false |
— |
InputFile |
Single file to be processed |
String |
False |
— |
false |
— |
InputLanguage |
ISO language code for the documents |
String |
False |
— |
false |
— |
MaxSize |
Input file allowed max size in KB. |
Integer |
False |
— |
false |
— |
ProcessParameters |
Process parameters to be passed to an AE. |
String |
False |
— |
true |
— |
StripExt |
The file extension to strip from the original filenames. Only files with this extension will be processed by the reader. |
String |
False |
— |
false |
— |
ImsCwbReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads a tab-separated format including pseudo-XML tags.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Location of the mapping file for part-of-speech tags to UIMA types. |
String |
False |
— |
false |
— |
POSTagSet |
Specify which tag set should be used to locate the mapping file. |
String |
False |
— |
false |
— |
generateNewIds |
If true, the unit IDs are used only to detect if a new document (CAS) needs to be created, but for the purpose of setting the document ID, a new ID is generated. (Default: false) |
Boolean |
True |
— |
false |
— |
idIsUrl |
If true, the unit text ID encoded in the corpus file is stored as the URI in the document metadata. This setting is not affected by the generateNewIds parameter. (Default: false) |
Boolean |
True |
— |
false |
— |
includeHidden |
Include hidden files and directories. |
Boolean |
True |
— |
false |
— |
language |
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard /**/ can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
False |
— |
true |
— |
readLemma |
Read lemmas. Default: true |
Boolean |
True |
— |
false |
— |
readPOS |
Read part-of-speech tags and generate POS annotations, or subclasses of them if a POSTagSet tag set or a POSMappingLocation mapping file is used. Default: true |
Boolean |
True |
— |
false |
— |
readSentence |
Read sentences. Default: true |
Boolean |
True |
— |
false |
— |
readToken |
Read tokens and generate Token annotations. Default: true |
Boolean |
True |
— |
false |
— |
replaceNonXml |
Replace non-XML characters with spaces. (Default: true) |
Boolean |
True |
— |
false |
— |
sourceEncoding |
— |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
Input Text Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads text supplied in a parameter. This component is useful if you want to quickly process a single document by simply copy-pasting its content.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
inputText |
The text to be processed. |
String |
True |
— |
false |
— |
JdbcReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Collection reader for JDBC databases. The obtained data will be written into the CAS DocumentText as well as into fields of the DocumentMetaData annotation.
The field names are available as constants and begin with CAS_. Please specify the mapping of the columns to the field names in the query. For example,
SELECT text AS cas_text, title AS cas_metadata_title FROM test_table
will create a CAS for each record, write the content of the "text" column into the CAS document text and that of the "title" column into the document title field of the DocumentMetaData annotation.
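The column aliasing in the example query can be tried out independently of UIMA. The sketch below runs the same SELECT against an in-memory SQLite database using Python's standard sqlite3 module and shows that each record comes back keyed by the cas_ aliases; SQLite and the inserted row are stand-ins chosen for illustration, not what the reader itself uses.

```python
import sqlite3

# In-memory stand-in database with the table from the example query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (text TEXT, title TEXT)")
conn.execute(
    "INSERT INTO test_table VALUES (?, ?)",
    ("Some document body.", "A title"),  # hypothetical record
)

cursor = conn.execute(
    "SELECT text AS cas_text, title AS cas_metadata_title FROM test_table"
)
# The aliases, not the original column names, are what the result set reports.
columns = [description[0] for description in cursor.description]
records = [dict(zip(columns, row)) for row in cursor]
conn.close()
```

Each dictionary in `records` corresponds to one record, and therefore to one CAS in the description above.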
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
connection |
Specifies the URL to the database. If used with uimaFIT and the value is not given, jdbc:mysql://127.0.0.1/ will be used. |
String |
True |
— |
false |
— |
database |
Specifies name of the database to be accessed. |
String |
True |
— |
false |
— |
driver |
Specifies the class name of the JDBC driver. If used with uimaFIT and the value is not given, com.mysql.jdbc.Driver will be used. |
String |
True |
— |
false |
— |
language |
Specifies the language. |
String |
False |
— |
false |
— |
password |
Specifies the password for database access. |
String |
True |
— |
false |
— |
query |
Specifies the query. |
String |
True |
— |
false |
— |
user |
Specifies the user name for database access. |
String |
True |
— |
false |
— |
KEA Corpus Importer
Category: Reader
Framework: GATE
Version: unknown
Imports a KEA-style corpus into GATE
LIBSVMReader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads a dataset in LIBSVM format
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
InputLIBSVMDataset |
Folder containing SVM datasets (this can also be a single file). |
String |
True |
— |
false |
— |
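For readers unfamiliar with the LIBSVM format, each line holds a label followed by sparse index:value pairs. The sketch below parses one such line; it illustrates the file format only and is not the NaCTeM component, and the function name and sample line are hypothetical.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse '<label> <index>:<value> ...' into (label, {index: value})."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Hypothetical data line: positive class, two non-zero features.
label, features = parse_libsvm_line("+1 1:0.5 3:1.25")
```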
LLLReader
Category: Reader
Framework: AlvisNLP
Version:
Read files and annotations in LLL format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
agentFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
dependenciesRelationName |
— |
java.lang.String |
True |
— |
— |
— |
dependencyLabelFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
dependentRole |
— |
java.lang.String |
True |
— |
— |
— |
genicAgentRole |
— |
java.lang.String |
True |
— |
— |
— |
genicInteractionRelationName |
— |
java.lang.String |
True |
— |
— |
— |
genicTargetRole |
— |
java.lang.String |
True |
— |
— |
— |
headRole |
— |
java.lang.String |
True |
— |
— |
— |
idFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
lemmaFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
sectionName |
— |
java.lang.String |
True |
— |
— |
— |
sentenceLayerName |
— |
java.lang.String |
True |
— |
— |
— |
source |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
targetFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
wordLayerName |
— |
java.lang.String |
True |
— |
— |
— |
MediaWiki Corpus Populater
Category: Reader
Framework: GATE
Version: unknown
Populate a corpus from a MediaWiki XML dump
MediaWiki Document Format
Category: Reader
Framework: GATE
Version: unknown
Document format for parsing MediaWiki markup
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ignorableTags |
— |
java.util.Set |
— |
script;style |
— |
— |
MediaWiki XML Document Format
Category: Reader
Framework: GATE
Version: unknown
Deprecated MediaWiki importer
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ignorableTags |
— |
java.util.Set |
— |
script;style |
— |
— |
Merge GENIA-coref with -term Collection Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads GENIA-coref files and GENIA-event/-term files and merges each pair into one CAS. Pre-conditions: (1) the two input directories must contain the same number of files, with identical file names; (2) the texts in each pair of corresponding files must be identical.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
InputDirectory1 |
Directory containing input files |
String |
True |
— |
false |
— |
InputDirectory2 |
— |
String |
True |
— |
false |
— |
OutputLogFile |
— |
String |
True |
— |
false |
— |
NegraExportReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
This CollectionReader reads a file in the NEGRA export format. The texts and additional information, such as constituent structure, are reproduced in CASes, one CAS per text (article).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Location of the mapping file for part-of-speech tags to UIMA types. |
String |
False |
— |
false |
— |
POSTagSet |
Use this part-of-speech tag set to resolve the tag set mapping instead of the tag set defined in the model metadata. This can be useful if a custom model is specified that lacks such metadata, or it can be used in readers. |
String |
False |
— |
false |
— |
collectionId |
The collection ID to be written to the document metadata. (Default: none) |
String |
False |
— |
false |
— |
documentUnit |
Determines when a new CAS should be started. For example, if set to ORIGIN_NAME, a new CAS is generated whenever the origin name of the current sentence differs from the origin name of the previous sentence. (Default: ORIGIN_NAME) |
String |
True |
— |
false |
— |
generateNewIds |
If true, the unit IDs are used only to detect if a new document (CAS) needs to be created, but for the purpose of setting the document ID, a new ID is generated. (Default: false) |
Boolean |
True |
— |
false |
— |
language |
The language. |
String |
False |
— |
false |
— |
readLemma |
Write lemma information. Default: true |
Boolean |
True |
— |
false |
— |
readPOS |
Write part-of-speech information. Default: true |
Boolean |
True |
— |
false |
— |
readPennTree |
Write Penn Treebank bracketed structure information. Mind this may not work with all tagsets, in particular not with such that contain "(" or ")" in their tags. The tree is generated using the original tag set in the corpus, not using the mapped tagset! Default: false |
Boolean |
True |
— |
false |
— |
sourceEncoding |
Character encoding of the input data. |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
True |
— |
false |
— |
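
The `documentUnit` behaviour described above (a new CAS whenever the origin name changes between consecutive sentences) can be illustrated with a short sketch. This is only an illustration: the function name `split_into_documents` and the `(origin_name, text)` pair representation are assumptions, not part of the reader.

```python
from itertools import groupby


def split_into_documents(sentences):
    """Group consecutive sentences that share an origin name.

    `sentences` is an iterable of (origin_name, text) pairs in corpus order.
    A new group (standing in for a new CAS) starts whenever the origin name
    differs from that of the previous sentence.
    """
    return [
        (origin, [text for _, text in group])
        for origin, group in groupby(sentences, key=lambda pair: pair[0])
    ]
```

Because only consecutive sentences are compared, an origin name that reappears later in the file starts another group rather than being merged into the earlier one.
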
OBOReader
Category: Reader
Framework: AlvisNLP
Version:
Reads terms in OBO files as documents.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
ancestorsFeature | — | java.lang.String | False | — | — | — |
childrenFeature | — | java.lang.String | False | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
excludeOBOBuiltins | — | java.lang.Boolean | True | — | — | — |
idPrefix | — | java.lang.String | True | — | — | — |
nameSectionName | — | java.lang.String | True | — | — | — |
oboFiles | — | java.lang.String[] | True | — | — | — |
parentFeature | — | java.lang.String | True | — | — | — |
pathFeature | — | java.lang.String | True | — | — | — |
synonymSectionName | — | java.lang.String | True | — | — | — |
PdfReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Collection reader for PDF files. Uses simple heuristics to detect headings and paragraphs.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
endPage | The last page to be extracted from the PDF. | Integer | False | — | false | — |
headingType | The type used to annotate headings. | String | False | — | false | — |
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
paragraphType | The type used to annotate paragraphs. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
startPage | The first page to be extracted from the PDF. | Integer | False | — | false | — |
substitutionTableLocation | The location of the substitution table used to post-process the text extracted from the PDF, e.g. to convert ligatures to separate characters. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
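
The `patterns` parameter, shared by many of the file-based readers in this list, combines prefixed include and exclude patterns. The sketch below shows one way such patterns could be interpreted; it is not the readers' implementation. All function names are hypothetical, only the `**/` and `*` wildcards are handled, and patterns without a prefix are simply ignored here.

```python
import re


def split_patterns(patterns):
    """Separate '[+]' include patterns from '[-]' exclude patterns."""
    includes = [p[3:] for p in patterns if p.startswith("[+]")]
    excludes = [p[3:] for p in patterns if p.startswith("[-]")]
    return includes, excludes


def to_regex(pattern):
    """Translate a simplified Ant-like pattern into a compiled regex.

    '**/' matches any number of directories (including none);
    '*' matches any run of characters inside a single path segment.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def is_selected(path, patterns):
    """A path is selected if some include matches and no exclude matches."""
    includes, excludes = split_patterns(patterns)
    included = any(to_regex(p).match(path) for p in includes)
    excluded = any(to_regex(p).match(path) for p in excludes)
    return included and not excluded
```

For example, with the patterns `[+]**/*.pdf` and `[-]**/draft_*`, the relative path `a/b/report.pdf` is selected while `a/draft_v1.pdf` is not.
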
PennTreebankChunkedReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Penn Treebank chunked format reader.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation | Location of the mapping file for part-of-speech tags to UIMA types. | String | False | — | false | — |
POSTagSet | Use this part-of-speech tag set to resolve the tag set mapping instead of using the tag set defined as part of the model meta data. This can be useful if a custom model is specified which does not have such meta data, or it can be used in readers. | String | False | — | false | — |
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
readChunk | Write chunk annotations to the CAS. | Boolean | True | — | false | — |
readPOS | Write part-of-speech annotations to the CAS. | Boolean | True | — | false | — |
readSentence | Write sentence annotations to the CAS. | Boolean | True | — | false | — |
readToken | Write token annotations to the CAS. | Boolean | True | — | false | — |
sourceEncoding | Character encoding of the input data. | String | True | — | false | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
PennTreebankCombinedReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Penn Treebank combined format reader.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ConstituentMappingLocation | Load the constituent tag to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
ConstituentTagSet | Use this constituent tag set to resolve the tag set mapping instead of using the tag set defined as part of the model meta data. This can be useful if a custom model is specified which does not have such meta data, or it can be used in readers. | String | False | — | false | — |
POSMappingLocation | Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
POSTagSet | Use this part-of-speech tag set to resolve the tag set mapping instead of using the tag set defined as part of the model meta data. This can be useful if a custom model is specified which does not have such meta data, or it can be used in readers. | String | False | — | false | — |
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
internTags | Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true | Boolean | False | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
readPOS | Sets whether to create POS tags. The creation of constituent tags must be turned on for this to work. Default: true | Boolean | True | — | false | — |
removeTraces | — | Boolean | False | — | false | — |
sourceEncoding | The character encoding used by the input files. | String | True | — | false | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
writeTracesToText | — | Boolean | False | — | false | — |
PubMed Abstract Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Fetches PubMed abstracts from NaCTeM's Kleio service.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
PubMedIDs | A list of PubMed IDs. Any format is accepted as long as IDs are separated by non-numerical characters. | String | True | — | false | — |
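
Since `PubMedIDs` accepts any separator that is not a digit, the value can be split with a single regular expression. A minimal sketch of that rule (the function name `parse_pubmed_ids` is hypothetical and is not part of the component):

```python
import re


def parse_pubmed_ids(raw: str) -> list:
    """Extract every maximal run of digits; anything else acts as a separator."""
    return re.findall(r"\d+", raw)
```

Commas, semicolons, whitespace, and line breaks can therefore be mixed freely in one value.
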
PubTatorReader
Category: Reader
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
classFeature | — | java.lang.String | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
offsetFeature | — | java.lang.String | True | — | — | — |
sourcePath | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
typeFeature | — | java.lang.String | True | — | — | — |
RDF Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads Common Annotation Structures (CASes) from RDF-encoded files. The files have presumably been written with RDF Writer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
inputFileOrFolder | A file or folder where RDF-encoded Common Annotation Structures will be read from. | String | True | — | false | — |
RTFReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads RTF (Rich Text Format) files. Uses RTFEditorKit for parsing RTF.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
Reuters21578SgmlReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Read a Reuters-21578 corpus in SGML format.
Set the directory that contains the SGML files with the sourceLocation parameter.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
sourceLocation | The directory that contains the Reuters-21578 SGML files. | String | True | — | false | — |
Reuters21578TxtReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Read a Reuters-21578 corpus that has been transformed into text format using ExtractReuters in the lucene-benchmarks project.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
sourceLocation | The directory that contains the Reuters-21578 text files, named according to the pattern #FILE_PATTERN. | String | True | — | false | — |
SFTP Document Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads plain-text documents from a remote directory on a user-specified server via SFTP.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Password | — | String | True | — | false | — |
RemoteDirectory | — | String | True | — | false | — |
ServerURL | — | String | True | — | false | — |
Username | — | String | True | — | false | — |
SFTP XMI Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Reads an XMI-formatted corpus from an SFTP-enabled server.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Password | — | String | True | — | false | — |
RemoteDirectory | — | String | True | — | false | — |
ServerURL | — | String | True | — | false | — |
Username | — | String | True | — | false | — |
SerializedCasReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
typeSystemLocation | The file from which to obtain the type system if it is not embedded in the serialized CAS. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
Shared Task 2004 Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 0.0.1-SNAPSHOT
Reads training or evaluation data from the BioNLP/NLPBA 2004 Bio-Entity Recognition Task
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
limit | — | Integer | False | — | false | — |
readTrainingData | True if training data is to be read, otherwise evaluation data will be read | Boolean | True | — | false | — |
StringReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Simple reader that generates a CAS from a String. This can be useful in situations where a reader is preferred over manually crafting a CAS using JCasFactory#createJCas().
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
collectionId | The collection ID to set in the DocumentMetaData. | String | True | — | false | — |
documentBaseUri | The document base URI to set in the DocumentMetaData. | String | False | — | false | — |
documentId | The document ID to set in the DocumentMetaData. | String | True | — | false | — |
documentText | The document text. | String | True | — | false | — |
documentUri | The document URI to set in the DocumentMetaData. | String | True | — | false | — |
language | Set this as the language of the produced documents. | String | True | — | false | — |
TSV Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.0
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
InputFile | A tab-separated-value file containing the columns "#URI", "#type", and feature names appropriate for the types. | String | True | — | false | — |
TabularReader
Category: Reader
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
addToLayer | — | java.lang.Boolean | False | — | — | — |
checkNumColumns | — | java.lang.Integer | False | — | — | — |
commitLines | — | java.lang.Boolean | False | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
createAnnotations | — | java.lang.Boolean | False | — | — | — |
createDocuments | — | java.lang.Boolean | False | — | — | — |
createRelations | — | java.lang.Boolean | False | — | — | — |
createSections | — | java.lang.Boolean | False | — | — | — |
createTuples | — | java.lang.Boolean | False | — | — | — |
deleteElements | — | java.lang.Boolean | False | — | — | — |
lineActions | — | alvisnlp.corpus.expressions.Expression[] | True | — | — | — |
removeFromLayer | — | java.lang.Boolean | False | — | — | — |
setArguments | — | java.lang.Boolean | False | — | — | — |
setFeatures | — | java.lang.Boolean | False | — | — | — |
skipBlank | — | java.lang.Boolean | False | — | — | — |
source | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
sourceElement | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
trimColumns | — | java.lang.Boolean | True | — | — | — |
TcfReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reader for the WebLicht TCF format. It reads all available annotation layers from the TCF file and converts them to CAS annotations. TCF data does not provide begin/end offsets for all of its annotations, which CAS annotations require. Hence, offsets are calculated per token and stored in a map (token_id, token (CAS object)), from which the offset of any annotation can later be obtained via its tokens.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
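
The TcfReader description mentions that character offsets are computed per token and kept in a map keyed by token ID. The sketch below illustrates that idea only; it is not the reader's code. It assumes that tokens appear in the text in order and are located by a forward scan, and the names `token_offsets` and `span_of` are hypothetical.

```python
def token_offsets(text, tokens):
    """Map each token ID to its (begin, end) character offsets in `text`.

    `tokens` is a list of (token_id, surface_form) pairs in document order.
    Each token is searched for starting at the end of the previous token.
    """
    offsets = {}
    cursor = 0
    for token_id, surface in tokens:
        begin = text.find(surface, cursor)
        if begin < 0:
            raise ValueError(f"token {token_id!r} not found after offset {cursor}")
        end = begin + len(surface)
        offsets[token_id] = (begin, end)
        cursor = end
    return offsets


def span_of(token_ids, offsets):
    """Offsets of an annotation that refers to a sequence of token IDs."""
    return offsets[token_ids[0]][0], offsets[token_ids[-1]][1]
```

An annotation layer that only lists token IDs (for example a named entity covering two tokens) can then be anchored by looking up its first and last token in the map.
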
TeiReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reader for the TEI XML.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation | Location of the mapping file for part-of-speech tags to UIMA types. | String | False | — | false | — |
POSTagSet | Use this part-of-speech tag set to resolve the tag set mapping instead of using the tag set defined as part of the model meta data. This can be useful if a custom model is specified which does not have such meta data, or it can be used in readers. | String | False | — | false | — |
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
omitIgnorableWhitespace | Do not write ignorable whitespace from the XML file to the CAS. | Boolean | True | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
readConstituent | Write constituent annotations to the CAS. | Boolean | True | — | false | — |
readLemma | Write lemma annotations to the CAS. | Boolean | True | — | false | — |
readNamedEntity | Write named entity annotations to the CAS. | Boolean | True | — | false | — |
readPOS | Write part-of-speech annotations to the CAS. | Boolean | True | — | false | — |
readParagraph | Write paragraph annotations to the CAS. | Boolean | True | — | false | — |
readSentence | Write sentence annotations to the CAS. | Boolean | True | — | false | — |
readToken | Write token annotations to the CAS. | Boolean | True | — | false | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
useFilenameId | When not using the XML ID, use only the filename instead of the whole URL as ID. Mind that the filenames should be unique in this case. | Boolean | True | — | false | — |
useXmlId | Use the xml:id attribute on the TEI elements as document ID. Mind that many TEI files may not have this attribute on all TEI elements and you may end up with no document ID at all. Also mind that the IDs should be unique. | Boolean | True | — | false | — |
utterancesAsSentences | Interpret utterances "u" as sentences "s". (EXPERIMENTAL) | Boolean | True | — | false | — |
TextFileReader
Category: Reader
Framework: AlvisNLP
Version: 2010-10-28
Reads files and adds a document in the corpus for each file.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
baseNameId | — | java.lang.Boolean | False | — | — | — |
charset | — | java.lang.String | True | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
linesLimit | — | java.lang.Integer | False | — | — | — |
sectionName | — | java.lang.String | True | — | — | — |
sizeLimit | — | java.lang.Integer | False | — | — | — |
sourcePath | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
TextReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA collection reader for plain text files.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
sourceEncoding | The character encoding used by the input files. | String | True | — | false | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
TigerXmlReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA collection reader for TIGER-XML files. Also supports the augmented format used in the Semeval 2010 task which includes semantic role data.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation | Location of the mapping file for part-of-speech tags to UIMA types. | String | False | — | false | — |
POSTagSet | Use this part-of-speech tag set to resolve the tag set mapping instead of using the tag set defined as part of the model meta data. This can be useful if a custom model is specified which does not have such meta data, or it can be used in readers. | String | False | — | false | — |
ignoreIllegalSentences | If a sentence has an illegal structure (e.g. TIGER 2.0 has non-terminal nodes that do not have child nodes), then just ignore these sentences. Default: false | Boolean | True | — | false | — |
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
readPennTree | Write Penn Treebank bracketed structure information. Note that this may not work with all tag sets, in particular not with those that contain "(" or ")" in their tags. The tree is generated using the original tag set in the corpus, not the mapped tag set. Default: false | Boolean | True | — | false | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
TreeTaggerReader
Category: Reader
Framework: AlvisNLP
Version: 2010-10-28
Reads files in TreeTagger output format and creates a document for each file read.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
charset | — | java.lang.String | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
lemmaFeatureKey | — | java.lang.String | False | — | — | — |
posFeatureKey | — | java.lang.String | False | — | — | — |
sectionName | — | java.lang.String | True | — | — | — |
sentenceLayerName | — | java.lang.String | True | — | — | — |
sourcePath | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
TueppReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
- Only the part-of-speech with the best rank (rank 1) is read. If there is a tie between multiple tags, the first one from the XML file is read.
- Only the first lemma (baseform) from the XML file is read.
- Tokens are read, but not the specific kind of token (e.g. TEL, AREA, etc.).
- Article boundaries are not read.
- Paragraph boundaries are not read.
- Lemma information is read, but morphological information is not read.
- Chunk, field, and clause information is not read.
- Meta data headers are not read.
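
The selection rules in the first two items of the list above can be sketched as follows. This is an illustration under assumptions: the candidate tags are represented as `(tag, rank)` pairs in XML order, and the function names `best_tag` and `first_lemma` are hypothetical.

```python
def best_tag(candidates):
    """Pick the tag with the best (lowest) rank.

    `candidates` is a list of (tag, rank) pairs in the order they appear in
    the XML file. Python's min() returns the first minimal element, so a tie
    is resolved in favour of the tag that comes first in the file.
    """
    return min(candidates, key=lambda candidate: candidate[1])[0]


def first_lemma(lemmas):
    """Only the first lemma (baseform) listed for a token is kept."""
    return lemmas[0] if lemmas else None
```
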
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation | Location of the mapping file for part-of-speech tags to UIMA types. | String | False | — | false | — |
POSTagSet | Use this part-of-speech tag set to resolve the tag set mapping instead of using the tag set defined as part of the model meta data. This can be useful if a custom model is specified which does not have such meta data, or it can be used in readers. | String | False | — | false | — |
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Optional. The language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] if it is an include pattern and with [-] if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard `*` can be used to address a part of a name. | String | False | — | true | — |
sourceEncoding | Character encoding of the input data. | String | True | — | false | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
Twitter Collection Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version:
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
count | The number of tweets to return per page, up to a maximum of 100. Defaults to 15. Example Values: 100 | Integer | False | — | false | — |
debugEnabled | — | Boolean | False | — | false | — |
geoCode | Returns tweets by users located within a given radius of the given latitude/longitude. The location is preferentially taken from the Geotagging API, but will fall back to their Twitter profile. The parameter value is specified by 'latitude,longitude,radius', where radius units must be specified as either 'mi' (miles) or 'km' (kilometers). Note that you cannot use the near operator via the API to geocode arbitrary locations; however you can use this geocode parameter to search near geocodes directly. A maximum of 1,000 distinct 'sub-regions' will be considered when using the radius modifier. Example Values: 37.781157,-122.398720,1mi | Float | False | — | false | — |
lang | Restricts tweets to the given language, given by an ISO 639-1 code. Language detection is best-effort. Example Values: eu | String | False | — | false | — |
locale | Specify the language of the query you are sending (only ja is currently effective). This is intended for language-specific consumers and the default should work in the majority of cases. Example Values: ja | String | False | — | false | — |
oAuthAccessToken | — | String | False | — | false | — |
oAuthAccessTokenSecret | — | String | False | — | false | — |
oAuthConsumerKey | — | String | False | — | false | — |
oAuthConsumerSecret | — | String | False | — | false | — |
query | A UTF-8, URL-encoded search query of 1,000 characters maximum, including operators. Queries may additionally be limited by complexity. Example Values: @noradio | String | True | — | false | — |
resultType | Specifies what type of search results you would prefer to receive. The current default is 'mixed'. Valid values include: mixed (include both popular and real-time results in the response), recent (return only the most recent results in the response), popular (return only the most popular results in the response). Example Values: mixed, recent, popular | String | False | — | false | — |
sinceId | Returns results with an ID greater than (that is, more recent than) the specified ID. There are limits to the number of Tweets which can be accessed through the API. If the limit of Tweets has occurred since the since_id, the since_id will be forced to the oldest ID available. Example Values: 12345 | String | False | — | false | — |
totalCount | The total number of tweets to return. Defaults to 1000. Example Values: 500 | Integer | False | — | false | — |
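
The `geoCode` value packs three fields into one string ('latitude,longitude,radius', with the radius unit given as 'mi' or 'km'). A minimal sketch of validating such a value before passing it to the reader; the `GeoCode` type and `parse_geocode` function are hypothetical helpers, not part of the component.

```python
from typing import NamedTuple


class GeoCode(NamedTuple):
    latitude: float
    longitude: float
    radius: float
    unit: str  # "mi" or "km"


def parse_geocode(value: str) -> GeoCode:
    """Parse 'latitude,longitude,radius' where radius ends in 'mi' or 'km'."""
    # Unpacking raises ValueError if there are not exactly three fields.
    lat, lon, radius = (part.strip() for part in value.split(","))
    unit = radius[-2:]
    if unit not in ("mi", "km"):
        raise ValueError(f"radius must end in 'mi' or 'km': {radius!r}")
    return GeoCode(float(lat), float(lon), float(radius[:-2]), unit)
```

Applied to the example value from the table, `parse_geocode("37.781157,-122.398720,1mi")` yields a latitude of 37.781157, a longitude of -122.39872, and a radius of 1.0 mile.
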
Twitter Corpus Populator
Category: Reader
Framework: GATE
Version: unknown
Populate a corpus from Twitter JSON containing multiple Tweets
WebOfKnowledgeReader
Category: Reader
Framework: AlvisNLP
Version:
Reads Web of Knowledge search result import files.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
source | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
tabularFormat | — | java.lang.Boolean | False | — | — | — |
WikipediaArticleInfoReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads general information about all articles without retrieving the full Page objects.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
WikipediaArticleReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads all article pages. A parameter controls whether the full article or only the first paragraph is set as the document text. Redirects, disambiguation pages, and discussion pages are skipped.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
OnlyFirstParagraph |
If set to true, only the first paragraph instead of the whole article is used. |
Boolean |
True |
— |
false |
— |
OutputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
PageBuffer |
The page buffer size (#pages) of the page iterator. |
Integer |
True |
— |
false |
— |
PageIdFromArray |
Defines an array of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
PageIdsFromFile |
Defines the path to a file containing a line-separated list of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitleFromFile |
Defines the path to a file containing a line-separated list of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitlesFromArray |
Defines an array of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
WikipediaDiscussionReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads all discussion pages.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
OutputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
PageBuffer |
The page buffer size (#pages) of the page iterator. |
Integer |
True |
— |
false |
— |
PageIdFromArray |
Defines an array of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
PageIdsFromFile |
Defines the path to a file containing a line-separated list of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitleFromFile |
Defines the path to a file containing a line-separated list of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitlesFromArray |
Defines an array of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
WikipediaLinkReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Read links from Wikipedia.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AllowedLinkTypes |
Which types of links are allowed? |
String |
True |
— |
true |
— |
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
OutputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
PageBuffer |
The page buffer size (#pages) of the page iterator. |
Integer |
True |
— |
false |
— |
PageIdFromArray |
Defines an array of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
PageIdsFromFile |
Defines the path to a file containing a line-separated list of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitleFromFile |
Defines the path to a file containing a line-separated list of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitlesFromArray |
Defines an array of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
WikipediaPageReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads all Wikipedia pages in the database (articles, discussions, etc.). A parameter controls whether the full article or only the first paragraph is set as the document text. Redirects and disambiguation pages are skipped.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
OnlyFirstParagraph |
If set to true, only the first paragraph instead of the whole article is used. |
Boolean |
True |
— |
false |
— |
OutputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
PageBuffer |
The page buffer size (#pages) of the page iterator. |
Integer |
True |
— |
false |
— |
PageIdFromArray |
Defines an array of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
PageIdsFromFile |
Defines the path to a file containing a line-separated list of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitleFromFile |
Defines the path to a file containing a line-separated list of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitlesFromArray |
Defines an array of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
WikipediaQueryReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads all article pages that match a query created by the numerous parameters of this class.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
MaxCategories |
Maximum number of categories. Articles with a higher number of categories will not be returned by the query. |
Integer |
False |
— |
false |
— |
MaxInlinks |
Maximum number of incoming links. Articles with a higher number of incoming links will not be returned by the query. |
Integer |
False |
— |
false |
— |
MaxOutlinks |
Maximum number of outgoing links. Articles with a higher number of outgoing links will not be returned by the query. |
Integer |
False |
— |
false |
— |
MaxRedirects |
Maximum number of redirects. Articles with a higher number of redirects will not be returned by the query. |
Integer |
False |
— |
false |
— |
MaxTokens |
Maximum number of tokens. Articles with a higher number of tokens will not be returned by the query. |
Integer |
False |
— |
false |
— |
MinCategories |
Minimum number of categories. Articles with a lower number of categories will not be returned by the query. |
Integer |
False |
— |
false |
— |
MinInlinks |
Minimum number of incoming links. Articles with a lower number of incoming links will not be returned by the query. |
Integer |
False |
— |
false |
— |
MinOutlinks |
Minimum number of outgoing links. Articles with a lower number of outgoing links will not be returned by the query. |
Integer |
False |
— |
false |
— |
MinRedirects |
Minimum number of redirects. Articles with a lower number of redirects will not be returned by the query. |
Integer |
False |
— |
false |
— |
MinTokens |
Minimum number of tokens. Articles with a lower number of tokens will not be returned by the query. |
Integer |
False |
— |
false |
— |
OnlyFirstParagraph |
If set to true, only the first paragraph instead of the whole article is used. |
Boolean |
True |
— |
false |
— |
OutputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
PageBuffer |
The page buffer size (#pages) of the page iterator. |
Integer |
True |
— |
false |
— |
PageIdFromArray |
Defines an array of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
PageIdsFromFile |
Defines the path to a file containing a line-separated list of page ids of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitleFromFile |
Defines the path to a file containing a line-separated list of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
PageTitlesFromArray |
Defines an array of page titles of the pages that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
TitlePattern |
SQL-style title pattern. Only articles that match the pattern will be returned by the query. |
String |
False |
— |
false |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
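The Min*/Max* parameters and TitlePattern above combine into a single filter: an article is returned only if every configured bound holds and its title matches the pattern. The sketch below illustrates that logic on plain dictionaries. It is not the reader's implementation (which queries a database), the function names are hypothetical, and reading "SQL-style" as `LIKE` wildcards (`%` and `_`) is an assumption.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regular expression.

    Assumption: '%' matches any run of characters and '_' exactly one.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def in_range(value, minimum=None, maximum=None) -> bool:
    """Bounds are inclusive: only higher/lower values are rejected."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def matches(article, *, min_tokens=None, max_tokens=None,
            min_inlinks=None, max_inlinks=None, title_pattern=None) -> bool:
    """Return True if the article passes every configured criterion."""
    if not in_range(article["tokens"], min_tokens, max_tokens):
        return False
    if not in_range(article["inlinks"], min_inlinks, max_inlinks):
        return False
    if title_pattern is not None:
        if not like_to_regex(title_pattern).fullmatch(article["title"]):
            return False
    return True


if __name__ == "__main__":
    article = {"title": "Semantic role labeling", "tokens": 500, "inlinks": 12}
    print(matches(article, min_tokens=100, title_pattern="Semantic%"))
```

Only two of the five numeric criteria are shown; categories, outlinks, and redirects would follow the same `in_range` pattern.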
WikipediaRevisionPairReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads pairs of adjacent revisions of all articles.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
MaxChange |
Restrict revision pairs to cases where the lengths of the revisions do not differ by more than this value (counted in characters). |
Integer |
True |
— |
false |
— |
MinChange |
Restrict revision pairs to cases where the lengths of the revisions differ by more than this value (counted in characters). |
Integer |
True |
— |
false |
— |
OutputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
PageBuffer |
The page buffer size (#pages) of the page iterator. |
Integer |
True |
— |
false |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
RevisionIdFromArray |
Defines an array of revision ids of the revisions that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
RevisionIdsFromFile |
Defines the path to a file containing a line-separated list of revision ids of the revisions that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
SkipFirstNPairs |
The number of revision pairs that should be skipped in the beginning. |
Integer |
False |
— |
false |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
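MinChange, MaxChange, and SkipFirstNPairs together describe a filter over adjacent revisions. The sketch below shows one plausible reading on a plain list of revision texts. The function name and default values are hypothetical, and it is an assumption that pairs are skipped before the length filter is applied; the table does not say which comes first.

```python
from itertools import islice


def revision_pairs(revisions, min_change=0, max_change=10_000,
                   skip_first_n_pairs=0):
    """Yield adjacent (older, newer) revision texts within the change bounds.

    A pair is kept when the length difference in characters is greater
    than min_change and not greater than max_change.
    """
    adjacent = zip(revisions, revisions[1:])
    for older, newer in islice(adjacent, skip_first_n_pairs, None):
        change = abs(len(newer) - len(older))
        if min_change < change <= max_change:
            yield older, newer


if __name__ == "__main__":
    history = ["abc", "abcd", "abcd" + "x" * 20]
    for older, newer in revision_pairs(history, max_change=5):
        print(len(older), "->", len(newer))
```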
WikipediaRevisionReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads Wikipedia page revisions.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
OutputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
PageBuffer |
The page buffer size (#pages) of the page iterator. |
Integer |
True |
— |
false |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
RevisionIdFromArray |
Defines an array of revision ids of the revisions that should be retrieved. (Optional) |
String |
False |
— |
true |
— |
RevisionIdsFromFile |
Defines the path to a file containing a line-separated list of revision ids of the revisions that should be retrieved. (Optional) |
String |
False |
— |
false |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
WikipediaTemplateFilteredArticleReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reads all pages that contain or do not contain the templates specified in the template whitelist and template blacklist.
It is possible to define just a whitelist OR just a blacklist. If both a whitelist and a blacklist are provided, the reader selects the articles that DO contain the templates from the whitelist and at the same time DO NOT contain the templates from the blacklist (that is, the intersection of the pages that match the whitelist and the pages that do not match the blacklist).
This reader only works if template tables have been generated for the JWPL database using the WikipediaTemplateInfoGenerator.
NOTE: This reader directly extends the WikipediaReaderBase and not the WikipediaStandardReaderBase
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
CreateDBAnno |
Sets whether the database configuration should be stored in the CAS, so that annotators down the pipeline can access additional data. |
Boolean |
True |
— |
false |
— |
Database |
The name of the database. |
String |
True |
— |
false |
— |
DoubleCheckAssociatedPages |
If this option is set, discussion pages that are associated with a blacklisted article are rejected. Analogously, articles that are associated with a blacklisted discussion page are rejected. This check is rather expensive and could take a long time. This option is not active if only a whitelist is used. Default Value: false |
Boolean |
True |
— |
false |
— |
ExactTemplateMatching |
Defines whether to match the templates exactly or whether to match all templates that start with the String given in the respective parameter list. Default Value: true |
Boolean |
True |
— |
false |
— |
Host |
The host server. |
String |
True |
— |
false |
— |
IncludeDiscussions |
Whether the reader should also include talk pages. |
Boolean |
True |
— |
false |
— |
Language |
The language of the Wikipedia that should be connected to. |
String |
True |
— |
false |
— |
LimitNUmberOfArticlesToRead |
Optional parameter that defines the maximum number of articles that the reader should deliver. This avoids unnecessary filtering if only a small number of articles is needed. |
Integer |
False |
— |
false |
— |
OnlyFirstParagraph |
If set to true, only the first paragraph instead of the whole article is used. |
Boolean |
True |
— |
false |
— |
OutputPlainText |
Whether the reader outputs plain text or wiki markup. |
Boolean |
True |
— |
false |
— |
PageBuffer |
The page buffer size (#pages) of the page iterator. |
Integer |
True |
— |
false |
— |
Password |
The password of the database account. |
String |
True |
— |
false |
— |
TemplateBlacklist |
Defines templates that the articles MUST NOT contain. If you also define a whitelist, the intersection of both sets is used (that is, pages that DO contain templates from the whitelist but DO NOT contain templates from the blacklist). |
String |
False |
— |
true |
— |
TemplateWhitelist |
Defines templates that the articles MUST contain. If you also define a blacklist, the intersection of both sets is used (that is, pages that DO contain templates from the whitelist but DO NOT contain templates from the blacklist). |
String |
False |
— |
true |
— |
User |
The username of the database account. |
String |
True |
— |
false |
— |
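The whitelist and blacklist semantics of WikipediaTemplateFilteredArticleReader can be illustrated on an in-memory mapping from page titles to template names. This is a sketch only: the real reader works against JWPL template tables, the function names are hypothetical, and it is an assumption that a page matches a list when it contains at least one of the listed templates.

```python
def has_template(page_templates, wanted, exact=True):
    """Check whether a page contains at least one of the wanted templates.

    With exact=False, a template matches if it starts with a wanted string,
    mirroring the ExactTemplateMatching description.
    """
    if exact:
        return any(name in wanted for name in page_templates)
    return any(name.startswith(prefix)
               for name in page_templates
               for prefix in wanted)


def select_pages(pages, whitelist=(), blacklist=(), exact=True, limit=None):
    """Return titles that match the whitelist and do not match the blacklist.

    An empty whitelist or blacklist means that list is not applied.
    """
    selected = []
    for title, templates in pages.items():
        if whitelist and not has_template(templates, whitelist, exact):
            continue
        if blacklist and has_template(templates, blacklist, exact):
            continue
        selected.append(title)
        # Stop early, as LimitNUmberOfArticlesToRead suggests.
        if limit is not None and len(selected) >= limit:
            break
    return selected


if __name__ == "__main__":
    pages = {
        "A": ["Infobox person", "Stub"],
        "B": ["Infobox person"],
        "C": ["Stub"],
    }
    print(select_pages(pages, whitelist=["Infobox person"], blacklist=["Stub"]))
```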
XMI Reader
Category: Reader
Framework: NaCTeM (UIMA)
Version: 1.1
Reads common annotation structures (CAS) from files in XMI format. Files must have .xmi extension.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
files | The files to read | String | True | — | true | — |
ignoreUnknownTypes | If true, unknown types are ignored. If false, unknown types cause an exception. Default is true. | Boolean | False | — | false | — |
XMLReader
Category: Reader
Framework: AlvisNLP
Version: 2010-10-28
Reads a corpus in XML files.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
sourcePath |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
stringParams |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
xslTransform |
— |
org.bibliome.util.streams.SourceStream |
False |
— |
— |
— |
XMLReader2
Category: Reader
Framework: AlvisNLP
Version: 2012-04-30
Reads XML files and creates elements.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
html |
— |
java.lang.Boolean |
False |
— |
— |
— |
rawTagNames |
— |
java.lang.Boolean |
False |
— |
— |
— |
sourcePath |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
stringParams |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
xslTransform |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
XcesReaderDescriptor
Category: Reader
Framework: ILSP (UIMA)
Version: 1.7
Reads XCES XML files.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
InputDirectory | Directory of xml files to read in | String | False | — | false | — |
InputEncoding | Character encoding for the documents. If not specified, the default system encoding will be used. Note that this parameter only applies if there is no CAS Initializer provided; otherwise, it is the CAS Initializer’s responsibility to deal with character encoding issues. | String | False | — | false | — |
InputFile | Single file to be processed | String | False | — | false | — |
ProcessBoilerplate | — | Boolean | False | — | false | — |
StripExt | The file extension to strip from the original filenames. Only files with this extension will be processed by the reader. | String | False | — | false | — |
XcesType | The type of XCES files: basic (with paragraph segmentation only) or annot (with sentence boundaries and token annotations up to lemma). | String | False | — | false | — |
XmiReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reader for UIMA XMI files.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
lenient | In lenient mode, unknown types are ignored and do not cause an exception to be thrown. | Boolean | True | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] (INCLUDE_PREFIX) if it is an include pattern and with [-] (EXCLUDE_PREFIX) if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. | String | False | — | true | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
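The `patterns` parameter above uses Ant-like patterns with a [+] or [-] prefix. The sketch below approximates how such patterns could select files from a list of relative paths. It is a simplified stand-in, not the matcher the reader actually uses: the function names are hypothetical, and unprefixed patterns are simply ignored here, which is an assumption.

```python
import re

INCLUDE_PREFIX = "[+]"
EXCLUDE_PREFIX = "[-]"


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simplified Ant-style glob into a regular expression.

    '**/' spans any number of directories (including none), '**' spans
    anything, '*' spans part of a single name, '?' a single character.
    """
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out))


def select_paths(paths, patterns):
    """Keep paths matching any include pattern and no exclude pattern."""
    includes = [glob_to_regex(p[len(INCLUDE_PREFIX):])
                for p in patterns if p.startswith(INCLUDE_PREFIX)]
    excludes = [glob_to_regex(p[len(EXCLUDE_PREFIX):])
                for p in patterns if p.startswith(EXCLUDE_PREFIX)]
    return [path for path in paths
            if any(rx.fullmatch(path) for rx in includes)
            and not any(rx.fullmatch(path) for rx in excludes)]


if __name__ == "__main__":
    files = ["a.xmi", "sub/b.xmi", "sub/draft/c.xmi", "notes.txt"]
    print(select_paths(files, ["[+]**/*.xmi", "[-]**/draft/**"]))
```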
XmlReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Reader for XML files.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
DocIdTag | Tag which contains the docId. | String | False | — | false | — |
ExcludeTag | Optional. Tags that should not be processed: no text is extracted from them and no annotations are produced for them. | String | True | — | true | — |
IncludeTag | Optional. Tags that should be processed. If empty, all tags except those listed in ExcludeTag are processed. | String | True | — | true | — |
collectionId | The collection ID to set in the DocumentMetaData. | String | False | — | false | — |
language | Set this as the language of the produced documents. | String | False | — | false | — |
sourceLocation | Location from which the input is read. | String | True | — | false | — |
XmlTextReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
includeHidden | Include hidden files and directories. | Boolean | True | — | false | — |
language | Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS. | String | False | — | false | — |
patterns | A set of Ant-like include/exclude patterns. A pattern starts with [+] (INCLUDE_PREFIX) if it is an include pattern and with [-] (EXCLUDE_PREFIX) if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. | String | False | — | true | — |
sourceLocation | Location from which the input is read. | String | False | — | false | — |
useDefaultExcludes | Use the default excludes. | Boolean | True | — | false | — |
XmlXPathReader
Category: Reader
Framework: DKPro Core (UIMA)
Version: 1.8.0
A component reader for XML files implemented with XPath.
This is currently optimized for the TREC format, that is, the style in which TREC topics are presented. Provide the XPath expression of the parent nodes in the rootXPath parameter; the child nodes of each parent node are then stored in that parent's own CAS.
If your expression evaluates to leaf nodes, empty CASes will be created.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
caseSensitive |
States whether the matching is case-sensitive. (default: true) |
Boolean |
False |
— |
false |
— |
docIdTag |
Tag which contains the docId. If it is given, it will be ensured that within the same document there is only one id tag and it is not empty |
String |
False |
— |
false |
— |
excludeTags |
Tags which should be ignored. If empty then all tags will be processed. If this and PARAM_INCLUDE_TAGS are both provided, tags in set PARAM_INCLUDE_TAGS - PARAM_EXCLUDE_TAGS will be processed. |
String |
True |
— |
true |
— |
includeTags |
Tags which should be worked on. If empty then all tags will be processed. If this and PARAM_EXCLUDE_TAGS are both provided, tags in set PARAM_INCLUDE_TAGS - PARAM_EXCLUDE_TAGS will be processed. |
String |
True |
— |
true |
— |
language |
Language of the documents. If given, it will be set in each CAS. |
String |
False |
— |
false |
— |
patterns |
A set of Ant-like include/exclude patterns. A pattern starts with [+] (INCLUDE_PREFIX) if it is an include pattern and with [-] (EXCLUDE_PREFIX) if it is an exclude pattern. The wildcard `/**/` can be used to address any number of sub-directories. The wildcard * can be used to address a part of a name. |
String |
True |
— |
true |
— |
rootXPath |
Specifies the XPath expression to all nodes to be processed. Different segments will be separated via PARAM_ID_TAG, and each segment will be stored in a separate CAS. |
String |
True |
— |
false |
— |
sourceLocation |
Location from which the input is read. |
String |
False |
— |
false |
— |
useDefaultExcludes |
Use the default excludes. |
Boolean |
True |
— |
false |
— |
workingDir |
Specify to substitute tag names in CAS. Please give the substitutions each in before - after order. For example to substitute "foo" with "bar", and "hey" with "ho", you can provide { "foo", "bar", "hey", "ho" }. |
String |
False |
— |
true |
— |
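The XmlXPathReader description above (one CAS per node selected by rootXPath, built from that node's children) can be mimicked with the Python standard library. This is a hedged sketch, not the reader's code: the function name and the sample XML are hypothetical, `xml.etree.ElementTree` supports only a subset of XPath, and it is an assumption that the docIdTag child supplies the id and is left out of the document text.

```python
import xml.etree.ElementTree as ET

# Hypothetical TREC-like topic file used only for illustration.
TREC_SAMPLE = """
<topics>
  <top><num>401</num><title>foreign minorities</title><desc>Integration</desc></top>
  <top><num>402</num><title>behavioral genetics</title></top>
</topics>
"""


def read_segments(xml_text, root_xpath, doc_id_tag=None,
                  include_tags=(), exclude_tags=()):
    """Return one document (stand-in for a CAS) per node matched by root_xpath."""
    root = ET.fromstring(xml_text.strip())
    documents = []
    for parent in root.findall(root_xpath):
        doc_id = None
        texts = []
        for child in parent:
            if doc_id_tag is not None and child.tag == doc_id_tag:
                doc_id = (child.text or "").strip()
                continue
            # Process the set include_tags minus exclude_tags.
            if include_tags and child.tag not in include_tags:
                continue
            if child.tag in exclude_tags:
                continue
            texts.append((child.text or "").strip())
        documents.append({"id": doc_id, "text": "\n".join(texts)})
    return documents


if __name__ == "__main__":
    for doc in read_segments(TREC_SAMPLE, "./top", doc_id_tag="num"):
        print(doc)
```

Passing an expression that selects leaf nodes, such as `./top/title`, yields documents with empty text in this sketch, which mirrors the note about empty CASes.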
SRL (2)
ClearNlpSemanticRoleLabeler
Category: SRL
Framework: DKPro Core (UIMA)
Version: 1.8.0
ClearNLP semantic role labeller.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
expandArguments | Normally the arguments point only to the head words of arguments in the dependency tree. With this option enabled, they are expanded to the text covered by the minimal and maximal token offsets of all descendants (or self) of the head word. Warning: this parameter should be used with caution! For one, if the descendants of a head word cover a non-continuous region of the text, this information is lost. The arguments will appear to be spanning a continuous region. For another, the arguments may overlap with each other. E.g. if a sentence contains a relative clause with a verb, the subject of the main clause may be recognized as a dependent of the verb and may cause the whole main clause to be recorded in the argument. | Boolean | True | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelVariant | Variant of the model. Used to address a specific model if there are multiple models for one language. | String | False | — | false | — |
predModelLocation | Location from which the predicate identifier model is read. | String | False | — | false | — |
printTagSet | Write the tag set(s) to the log when a model is loaded. | Boolean | True | — | false | — |
roleModelLocation | Location from which the roleset classification model is read. | String | False | — | false | — |
srlModelLocation | Location from which the semantic role labeling model is read. | String | False | — | false | — |
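The expandArguments behaviour described above amounts to taking the smallest begin offset and the largest end offset over a head word and all of its descendants. The sketch below shows that computation on a toy dependency tree. The data layout (`tokens` as offset pairs, `children` as an adjacency mapping) and the function name are hypothetical, and the dependency structure is assumed to be a tree without cycles.

```python
def expand_argument(head, tokens, children):
    """Return the (begin, end) span covering the head and all its descendants.

    tokens:   mapping token index -> (begin, end) character offsets
    children: mapping token index -> list of dependent token indices
    """
    begin, end = tokens[head]
    stack = [head]
    while stack:
        node = stack.pop()
        node_begin, node_end = tokens[node]
        begin = min(begin, node_begin)
        end = max(end, node_end)
        stack.extend(children.get(node, []))
    return begin, end


if __name__ == "__main__":
    # "The old man sleeps": "man" heads "The" and "old"; "sleeps" heads "man".
    tokens = {0: (0, 3), 1: (4, 7), 2: (8, 11), 3: (12, 18)}
    children = {3: [2], 2: [0, 1]}
    print(expand_argument(2, tokens, children))
```

The warning in the description shows up directly in this computation: if a head's descendants are tokens 0 and 3 only, the returned span still covers tokens 1 and 2, so the gap is no longer visible.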
MateSemanticRoleLabeler
Category: SRL
Framework: DKPro Core (UIMA)
Version: 1.8.0
DKPro Annotator for the MateTools Semantic Role Labeler.
Please cite the following paper if you use the semantic role labeler: Anders Björkelund, Love Hafdell, and Pierre Nugues. Multilingual semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), pages 43-48, Boulder, June 4-5, 2009.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
Scripted analytics (6)
Groovy scripting PR
Category: Scripted analytics
Framework: GATE
Version: unknown
Runs a Groovy script as a processing resource
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
scriptParams |
— |
gate.FeatureMap |
— |
— |
— |
true |
scriptURL |
— |
java.net.URL |
— |
— |
— |
— |
JAPE Transducer
Category: Scripted analytics
Framework: GATE
Version: unknown
A module for executing Jape grammars.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationAccessors |
— |
java.util.List |
— |
— |
— |
— |
binaryGrammarURL |
— |
java.net.URL |
— |
— |
— |
— |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
enableDebugging |
— |
java.lang.Boolean |
— |
false |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
grammarURL |
— |
java.net.URL |
— |
— |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
ontology |
— |
gate.creole.ontology.Ontology |
— |
— |
— |
true |
operators |
— |
java.util.List |
— |
— |
— |
— |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
JAPE-Plus Transducer
Category: Scripted analytics
Framework: GATE
Version: unknown
An optimised, JAPE-compatible transducer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationAccessors |
— |
java.util.List |
— |
— |
— |
— |
binaryGrammarURL |
— |
java.net.URL |
— |
— |
— |
— |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
enableDebugging |
— |
java.lang.Boolean |
— |
false |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
grammarURL |
— |
java.net.URL |
— |
— |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
ontology |
— |
gate.creole.ontology.Ontology |
— |
— |
— |
true |
operators |
— |
java.util.List |
— |
— |
— |
— |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
RunProlog
Category: Scripted analytics
Framework: AlvisNLP
Version:
Runs a Prolog program with the corpus data structure encoded as facts.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
addToLayer |
— |
java.lang.Boolean |
False |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
createAnnotations |
— |
java.lang.Boolean |
False |
— |
— |
— |
createDocuments |
— |
java.lang.Boolean |
False |
— |
— |
— |
createRelations |
— |
java.lang.Boolean |
False |
— |
— |
— |
createSections |
— |
java.lang.Boolean |
False |
— |
— |
— |
createTuples |
— |
java.lang.Boolean |
False |
— |
— |
— |
deleteElements |
— |
java.lang.Boolean |
False |
— |
— |
— |
facts |
— |
org.bibliome.alvisnlp.modules.prolog.FactDefinition[] |
True |
— |
— |
— |
goals |
— |
org.bibliome.alvisnlp.modules.prolog.GoalDefinition[] |
True |
— |
— |
— |
removeFromLayer |
— |
java.lang.Boolean |
False |
— |
— |
— |
setArguments |
— |
java.lang.Boolean |
False |
— |
— |
— |
setFeatures |
— |
java.lang.Boolean |
False |
— |
— |
— |
target |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
theory |
— |
org.bibliome.util.streams.SourceStream |
True |
— |
— |
— |
Script
Category: Scripted analytics
Framework: AlvisNLP
Version: 2010-10-28
Runs a script.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantDocumentFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantRelationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantSectionFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
constantTupleFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
language |
— |
java.lang.String |
True |
— |
— |
— |
script |
— |
java.lang.String |
True |
— |
— |
— |
UIMA Analysis Engine
Category: Scripted analytics
Framework: GATE
Version: unknown
Wrapper for a Text Analysis Engine from UIMA.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
analysisEngineDescriptor |
— |
java.net.URL |
— |
— |
— |
— |
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
mappingDescriptor |
— |
java.net.URL |
— |
— |
— |
— |
Segmenter (55)
ANNIE English Tokeniser
Category: Segmenter
Framework: GATE
Version: unknown
A customisable English tokeniser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
tokeniserRulesURL |
— |
java.net.URL |
— |
resources/tokeniser/DefaultTokeniser.rules |
— |
— |
transducerGrammarURL |
— |
java.net.URL |
— |
resources/tokeniser/postprocess.jape |
— |
— |
ANNIE Sentence Splitter
Category: Segmenter
Framework: GATE
Version: unknown
ANNIE sentence splitter.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
gazetteerListsURL |
— |
java.net.URL |
— |
resources/sentenceSplitter/gazetteer/lists.def |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
transducerURL |
— |
java.net.URL |
— |
resources/sentenceSplitter/grammar/main-single-nl.jape |
— |
— |
Arabic Tokeniser
Category: Segmenter
Framework: GATE
Version: unknown
A customisable English tokeniser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
tokeniserRulesURL |
— |
java.net.URL |
— |
resources/tokeniser/arabicTokeniser.rules |
— |
— |
transducerGrammarURL |
— |
java.net.URL |
— |
resources/tokeniser/postprocess.jape |
— |
— |
ArktweetTokenizer
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
ArkTweet tokenizer.
Banner Base Tokenizer
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Tokens returned by this class consist primarily of contiguous alphanumeric characters or single punctuation marks; however, certain constructs, such as real numbers and percentages, are recognized and returned as a single token.
Banner Simple Tokenizer
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Tokens output by this tokenizer consist of a contiguous block of alphanumeric characters or a single punctuation mark. Note, therefore, that any construction which contains a punctuation mark (such as a contraction or a real number) will necessarily span at least three tokens.
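The rule in the description can be illustrated with a short Python sketch. It is not the BANNER implementation: the regular expression and the function name `simple_tokenize` are assumptions, and "alphanumeric" is approximated as ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
# A token is either a run of alphanumerics or one non-space, non-alphanumeric character.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")

def simple_tokenize(text):
    """Return alphanumeric runs and single punctuation marks, in order."""
    return TOKEN_RE.findall(text)

# A contraction and a real number each span three tokens.
print(simple_tokenize("can't"))
print(simple_tokenize("3.14"))
```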
Banner Whitespace Tokenizer
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Instances of this class tokenize Sentences only at whitespace characters. All other boundaries (such as between alphabetic characters and punctuation) are ignored.
BreakIteratorSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
BreakIterator segmenter.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | The language. | String | False | — | false | — |
splitAtApostrophe | By default, the Java BreakIterator does not split contractions like John’s into two tokens. When this parameter is enabled, a non-default token split is generated when an apostrophe (') is encountered. | Boolean | True | — | false | — |
strictZoning | Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. | Boolean | True | — | false | — |
writeSentence | Create Sentence annotations. | Boolean | True | — | false | — |
writeToken | Create Token annotations. | Boolean | True | — | false | — |
zoneTypes | A list of type names used for zoning. | String | False | — | true | — |
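The two zoning modes described for strictZoning can be sketched in Python as a function that returns the character ranges to segment. This is an illustration under stated assumptions, not DKPro Core code: the function name, the `(start, end)` tuple representation of zones, and the choice to include the document start and end among the boundaries in non-strict mode are all hypothetical.

```python
def segmentation_ranges(doc_length, zones, strict):
    """Return the (start, end) ranges in which segmentation would run.

    zones: list of (start, end) offsets of zone annotations (hypothetical format).
    """
    if not zones:
        # No zone type specified: the whole document is taken as one zone.
        return [(0, doc_length)]
    if strict:
        # Strict zoning: segment only inside each (non-overlapping) zone.
        return sorted(zones)
    # Non-strict: collect every zone boundary and segment between neighbours.
    # Assumption: the document start and end also count as boundaries.
    bounds = sorted({0, doc_length} | {b for zone in zones for b in zone})
    return list(zip(bounds, bounds[1:]))

print(segmentation_ranges(10, [(2, 4), (6, 8)], strict=True))
print(segmentation_ranges(10, [(2, 4), (6, 8)], strict=False))
```

In strict mode the text outside the zones is skipped entirely, whereas in non-strict mode it is still segmented, only never across a zone boundary.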
Cafetiere Sentence Splitter
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Uses a set of heuristics and patterns to find sentence boundaries. Works with English.
CamelCaseTokenSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Split up existing tokens again if they are camel-case text.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
deleteCover | Whether to remove the original token. Default: true | Boolean | True | — | false | — |
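As a rough illustration of camel-case splitting, the following Python sketch breaks a token into its parts. The splitting rule (the regular expression) and the name `split_camel_case` are assumptions; the actual component may apply different rules, and only ASCII letters are handled here.

```python
import re

# Assumption: a part is an optional capital followed by lowercase letters or
# digits, or a run of capitals (an acronym) not followed by a lowercase letter.
CAMEL_PART = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")

def split_camel_case(token):
    """Split a camel-case token into parts; return it unchanged if nothing matches."""
    parts = CAMEL_PART.findall(token)
    return parts if parts else [token]

print(split_camel_case("getFileName"))
print(split_camel_case("parseXMLFile"))
```

With deleteCover enabled, the parts would replace the original token; with it disabled, the original token would remain alongside them.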
Cebuano Gazetteer Tokeniser
Category: Segmenter
Framework: GATE
Version: unknown
A list lookup component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
caseSensitive |
— |
java.lang.Boolean |
— |
true |
— |
— |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
gazetteerFeatureSeparator |
— |
java.lang.String |
— |
: |
— |
— |
listsURL |
— |
java.net.URL |
— |
resources/tokeniser/lists.def |
— |
— |
longestMatchOnly |
— |
java.lang.Boolean |
— |
true |
— |
true |
wholeWordsOnly |
— |
java.lang.Boolean |
— |
true |
— |
true |
Cebuano Tokeniser
Category: Segmenter
Framework: GATE
Version: unknown
A customisable English tokeniser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
tokeniserRulesURL |
— |
java.net.URL |
— |
resources/tokeniser/DefaultTokeniser.rules |
— |
— |
transducerGrammarURL |
— |
java.net.URL |
— |
resources/tokeniser/postprocess.jape |
— |
— |
Chinese Segmenter PR
Category: Segmenter
Framework: GATE
Version: unknown
Segment the Chinese text into words, based on the PAUM learning algorithm.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
learningAlg |
— |
java.lang.String |
— |
PAUM |
— |
true |
learningMode |
— |
gate.chineseSeg.RunMode |
— |
SEGMENTING |
— |
true |
modelURL |
— |
java.net.URL |
— |
— |
— |
true |
textCode |
— |
java.lang.String |
— |
UTF-8 |
— |
true |
textFilesURL |
— |
java.net.URL |
— |
— |
— |
true |
ClearNlpSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Tokenizer using Clear NLP.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
strictZoning | Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. | Boolean | True | — | false | — |
writeSentence | Create Sentence annotations. | Boolean | True | — | false | — |
writeToken | Create Token annotations. | Boolean | True | — | false | — |
zoneTypes | A list of type names used for zoning. | String | False | — | true | — |
CompoundAnnotator
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Annotates compound parts and linking morphemes.
Freeling Sentence Splitter
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Performs tokenisation. Operates on English (en), Spanish (es), Catalan (ca), Asturian (ast), Welsh (cy), Galician (gl), Italian (it) and Portuguese (pt); the language is selected with the "language" parameter (default is English).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | — | String | True | — | false | — |
FreelingTokenizer
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Performs tokenisation. Operates on English (en), Spanish (es), Catalan (ca), Asturian (ast), Welsh (cy), Galician (gl), Italian (it) and Portuguese (pt); the language is selected with the "language" parameter (default is English).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | — | String | True | — | false | — |
GATE Unicode Tokeniser
Category: Segmenter
Framework: GATE
Version: unknown
A customisable Unicode tokeniser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
rulesURL |
— |
java.net.URL |
— |
resources/tokeniser/DefaultTokeniser.rules |
— |
— |
GENIA Sentence Splitter
Category: Segmenter
Framework: GATE
Version: unknown
A processing resource that takes document and corpus parameters
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
debug |
— |
java.lang.Boolean |
— |
false |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
splitterBinary |
— |
java.net.URL |
— |
— |
— |
true |
GENIA Sentence Splitter
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Machine learning-based sentence splitter optimized for biomedical texts. The classification model uses a supervised learning method based on maximum entropy modeling (using a simple MaxEnt library) and was trained on the GENIA corpus. The classifier achieved an F-score of 99.7 on 200 unseen GENIA abstracts. Website: http://www.nactem.ac.uk/y-matsu/geniass/
Hashtag Tokenizer
Category: Segmenter
Framework: GATE
Version: unknown
Tokenizes Multi-Word Hashtags
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
debug |
— |
java.lang.Boolean |
— |
false |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
gazetteerURL |
— |
java.net.URL |
— |
resources/hashtag/gazetteer/lists.def |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
Hindi Splitter
Category: Segmenter
Framework: GATE
Version: unknown
A Sentence Splitter.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
gazetteerListsURL |
— |
java.net.URL |
— |
resources/splitter/gazetteer/lists.def |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
transducerURL |
— |
java.net.URL |
— |
resources/splitter/grammar/main.jape |
— |
— |
Hindi Tokeniser
Category: Segmenter
Framework: GATE
Version: unknown
A customisable Hindi tokeniser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
tokeniserRulesURL |
— |
java.net.URL |
— |
resources/tokeniser/multiTokeniser.rules |
— |
— |
transducerGrammarURL |
— |
java.net.URL |
— |
resources/tokeniser/postprocess.jape |
— |
— |
ILSP Paragraph, Sentence and Token Segmentor
Category: Segmenter
Framework: ILSP (UIMA)
Version: 1.15
This module is a regex- and abbreviation-based segmentor targeting texts written in Greek.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Mode | Segmentation mode. default: let ilsp-sst decide on sentence splits; nla: force ilsp-sst to always use newlines as sentence splits; nlo: force ilsp-sst to use only newlines as sentence splits | String | False | — | false | — |
IULATokenizer
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Performs paragraph splitting, sentence splitting, and tokenisation. Also detects proper names. Operates on Spanish (es) and Catalan (ca); the language is selected with the "language" parameter (default is Spanish).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | — | String | True | — | false | — |
JTokSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
JTok segmenter.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | The language. | String | False | — | false | — |
strictZoning | Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. | Boolean | True | — | false | — |
writeParagraph | Create Paragraph annotations. | Boolean | True | — | false | — |
writeSentence | Create Sentence annotations. | Boolean | True | — | false | — |
writeToken | Create Token annotations. | Boolean | True | — | false | — |
zoneTypes | A list of type names used for zoning. | String | False | — | true | — |
LanguageToolSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Segmenter using LanguageTool to do the heavy lifting. LanguageTool internally uses different strategies for tokenization.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | The language. | String | False | — | false | — |
strictZoning | Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. | Boolean | True | — | false | — |
writeSentence | Create Sentence annotations. | Boolean | True | — | false | — |
writeToken | Create Token annotations. | Boolean | True | — | false | — |
zoneTypes | A list of type names used for zoning. | String | False | — | true | — |
LineBasedSentenceSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Annotates each line in the source text as a sentence. This segmenter cannot create tokens, so the token-related parameters have no effect.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | The language. | String | False | — | false | — |
strictZoning | Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. | Boolean | True | — | false | — |
writeSentence | Create Sentence annotations. | Boolean | True | — | false | — |
writeToken | Create Token annotations. | Boolean | True | — | false | — |
zoneTypes | A list of type names used for zoning. | String | False | — | true | — |
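Line-based sentence annotation amounts to recording the character offsets of each line. The Python sketch below shows one way to do this; the function name and the decision to skip blank lines are assumptions rather than documented behaviour of the component.

```python
def line_sentences(text):
    """Return (begin, end) offsets of each non-blank line, excluding line breaks."""
    spans, offset = [], 0
    for line in text.splitlines(keepends=True):
        content = line.rstrip("\r\n")
        # Assumption: blank lines do not produce a sentence.
        if content.strip():
            spans.append((offset, offset + len(content)))
        offset += len(line)
    return spans

print(line_sentences("First line\nSecond line\n"))
```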
LingPipe Sentence Splitter
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Sentence splitter based on LingPipe models. Website: http://alias-i.com/lingpipe/
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
useBiomedicalModel | true if the LingPipe MEDLINE sentence model should be used | Boolean | False | — | false | — |
LingPipe Sentence Splitter PR
Category: Segmenter
Framework: GATE
Version: unknown
Provides an interface to LingPipe sentence splitter API.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document |
— |
gate.Document |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
LingPipe Tokenizer PR
Category: Segmenter
Framework: GATE
Version: unknown
Provides a LingPipe tokenizer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document |
— |
gate.Document |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
MLRS Maltese Tokeniser
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Tokenises Maltese text
MLRS Paragraph Splitter
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Identifies the paragraphs in the text, creating a Paragraph annotation for each one
MLRS Sentence Splitter
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Identifies the sentences in the text, creating a Sentence annotation for each
OSCAR 4 Tokeniser
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Segments text into tokens. Derived from the OSCAR 4 chemical NER tool, this tokeniser is specifically tuned for processing chemical text.
OgmiosTokenizer
Category: Segmenter
Framework: AlvisNLP
Version: 2010-10-28
Tokenizes the sections contents according to the Ogmios tokenizer specifications.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
constantAnnotationFeatures |
— |
alvisnlp.module.types.Mapping |
False |
— |
— |
— |
documentFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
sectionFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
separatorTokens |
— |
java.lang.Boolean |
True |
— |
— |
— |
targetLayerName |
— |
java.lang.String |
True |
— |
— |
— |
tokenTypeFeature |
— |
java.lang.String |
True |
— |
— |
— |
OpenNLP Sentence Splitter
Category: Segmenter
Framework: GATE
Version: unknown
Sentence splitter using an OpenNLP maxent model
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
model |
— |
java.net.URL |
— |
models/english/en-sent.bin |
— |
— |
OpenNLP Tokenizer
Category: Segmenter
Framework: GATE
Version: unknown
Tokenizer using an OpenNLP maxent model
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
model |
— |
java.net.URL |
— |
models/english/en-token.bin |
— |
— |
OpenNLPTokenizer
Category: Segmenter
Framework: NaCTeM (UIMA)
Version: 1.0
Tokenizes the text and creates token annotations that span the tokens. Tokenization is performed using the OpenNLP MaxEnt tokenizer, which follows the Penn Treebank tokenization standard. In general, tokens are separated by white space, but punctuation marks (e.g., ".", ",", "!", "?") and apostrophe endings (e.g., "'s", "n't") are separate tokens.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ModelFile | OpenNLP MaxEnt model file for the tokenizer. | String | True | — | false | — |
OpenNlpSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Tokenizer and sentence splitter using OpenNLP.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
segmentationModelLocation | Load the segmentation model from this location instead of locating the model automatically. | String | False | — | false | — |
strictZoning | Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. | Boolean | True | — | false | — |
tokenizationModelLocation | Load the tokenization model from this location instead of locating the model automatically. | String | False | — | false | — |
writeSentence | Create Sentence annotations. | Boolean | True | — | false | — |
writeToken | Create Token annotations. | Boolean | True | — | false | — |
zoneTypes | A list of type names used for zoning. | String | False | — | true | — |
ParagraphSplitter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
This class creates paragraph annotations for the given input document. It searches for the occurrence of two or more line-breaks (Unix and Windows) and regards this as the boundary between paragraphs.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
splitPattern | A regular expression used to detect paragraph splits. Default: #DOUBLE_LINE_BREAKS_PATTERN (split on two consecutive line breaks) | String | True | — | false | — |
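The paragraph rule described for this component (two or more Unix or Windows line breaks mark a boundary) can be sketched in Python. The regular expression below is a stand-in chosen for illustration; the actual value of #DOUBLE_LINE_BREAKS_PATTERN is not given in this listing, and the function name is hypothetical.

```python
import re

# Assumption: two or more consecutive "\n" or "\r\n" line breaks end a paragraph.
DOUBLE_LINE_BREAKS = re.compile(r"(?:\r?\n){2,}")

def paragraph_spans(text, split_pattern=DOUBLE_LINE_BREAKS):
    """Return (begin, end) offsets of the paragraphs between split matches."""
    spans, pos = [], 0
    for match in split_pattern.finditer(text):
        if match.start() > pos:
            spans.append((pos, match.start()))
        pos = match.end()
    if pos < len(text):
        spans.append((pos, len(text)))
    return spans

text = "First paragraph.\n\nSecond one.\r\n\r\nThird."
print([text[begin:end] for begin, end in paragraph_spans(text)])
```

Passing a different compiled pattern as `split_pattern` plays the role of the splitPattern parameter.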
PatternBasedTokenSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Split up existing tokens again at particular split-chars. The prefix states whether the split chars should be added as separate Tokens. If the #INCLUDE_PREFIX precedes the split pattern, the pattern is included. Consequently, patterns following the #EXCLUDE_PREFIX will not be added as a Token.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
deleteCover | Whether to remove the original token. Default: true | Boolean | True | — | false | — |
patterns | A list of regular expressions, prefixed with #INCLUDE_PREFIX or #EXCLUDE_PREFIX. If neither of the prefixes is used, #EXCLUDE_PREFIX is assumed. | String | True | — | true | — |
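The include/exclude behaviour can be illustrated with a small Python sketch. Because the literal values of #INCLUDE_PREFIX and #EXCLUDE_PREFIX are not given here, the sketch represents the prefix as a boolean `include` argument; that representation and the function name are assumptions.

```python
import re

def split_token(token, pattern, include=False):
    """Split a token at matches of pattern.

    include=True keeps each match as its own token (include-prefix behaviour);
    include=False drops the match (exclude-prefix behaviour, the default).
    """
    parts, pos = [], 0
    for match in re.finditer(pattern, token):
        if match.start() > pos:
            parts.append(token[pos:match.start()])
        if include and match.group():
            parts.append(match.group())
        pos = match.end()
    if pos < len(token):
        parts.append(token[pos:])
    return parts

print(split_token("high-quality", "-"))
print(split_token("high-quality", "-", include=True))
```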
Penn BioTokenizer
Category: Segmenter
Framework: GATE
Version: unknown
Tokenizer for biomedical text
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
tokenizerURL |
— |
java.net.URL |
— |
resources/BioTok.bin.gz |
— |
— |
RASP2 Tokenizer
Category: Segmenter
Framework: GATE
Version: unknown
RASP2 Tokenizer. Faster than the original GATE component but generates Tokens which have only a 'string' feature. Requires annotations of type Sentence. See RASP package for platform restrictions.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
charset |
— |
java.lang.String |
— |
ISO-8859-1 |
— |
true |
debug |
— |
java.lang.Boolean |
— |
false |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
RegEx Sentence Splitter
Category: Segmenter
Framework: GATE
Version: unknown
A sentence splitter based on regular expressions.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
externalSplitListURL |
— |
java.net.URL |
— |
resources/regex-splitter/external-split-patterns.txt |
— |
— |
internalSplitListURL |
— |
java.net.URL |
— |
resources/regex-splitter/internal-split-patterns.txt |
— |
— |
nonSplitListURL |
— |
java.net.URL |
— |
resources/regex-splitter/non-split-patterns.txt |
— |
— |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
RegexTokenizer
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
This segmenter splits sentences and tokens based on regular expressions that define the sentence and token boundaries.
The default behaviour is to split sentences by a line break and tokens by whitespace.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | The language. | String | False | — | false | — |
sentenceBoundaryRegex | Define the sentence boundary. Default: \n (assume one sentence per line). | String | True | — | false | — |
strictZoning | Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. | Boolean | True | — | false | — |
tokenBoundaryRegex | Defines the pattern that is used as token end boundary. Default: [\s\n]+ (matching whitespace and linebreaks). When setting custom patterns, take into account that the final token is often terminated by a linebreak rather than the boundary character. Therefore, the newline typically has to be added to the group of matching characters, e.g. "tokenized-text" is correctly tokenized with the pattern [-\n]. | String | True | — | false | — |
writeSentence | Create Sentence annotations. | Boolean | True | — | false | — |
writeToken | Create Token annotations. | Boolean | True | — | false | — |
zoneTypes | A list of type names used for zoning. | String | False | — | true | — |
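The default behaviour (one sentence per line, tokens split at whitespace) and the custom-pattern example from the tokenBoundaryRegex description can be sketched in Python. This is a simplified illustration, not the DKPro Core implementation: the function name is hypothetical, and sentences are split before tokens, which differs from a component that scans the whole text for token boundaries.

```python
import re

def segment(text, sentence_boundary=r"\n", token_boundary=r"[\s\n]+"):
    """Split text into sentences, then each sentence into tokens.

    Empty tokens and empty sentences are dropped.
    """
    sentences = []
    for sentence in re.split(sentence_boundary, text):
        tokens = [tok for tok in re.split(token_boundary, sentence) if tok]
        if tokens:
            sentences.append(tokens)
    return sentences

# Defaults: one sentence per line, whitespace-separated tokens.
print(segment("a b\nc  d\n"))
# Custom token boundary, as in the "tokenized-text" example.
print(segment("tokenized-text", token_boundary=r"[-\n]"))
```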
Romanian Tokeniser
Category: Segmenter
Framework: GATE
Version: unknown
A customisable Romanian tokeniser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
tokeniserRulesURL |
— |
java.net.URL |
— |
resources/Tokeniser/OBtokeniser.rules |
— |
— |
transducerGrammarURL |
— |
java.net.URL |
— |
resources/Tokeniser/postprocess.jape |
— |
— |
Stanford PTB Tokenizer
Category: Segmenter
Framework: GATE
Version: unknown
Stanford Penn Treebank v3 Tokenizer, for English
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
failOnMissingInputAnnotations |
— |
java.lang.Boolean |
— |
false |
— |
true |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
spaceLabel |
— |
java.lang.String |
— |
SpaceToken |
— |
true |
tokenLabel |
— |
java.lang.String |
— |
Token |
— |
true |
StanfordSegmenter
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
allowEmptySentences |
Whether to generate empty sentences. |
Boolean |
True |
— |
false |
— |
boundaryFollowers |
A set of strings, matched with .equals(), that are allowed to be tacked onto the end of a sentence after a sentence boundary token, for example ")". |
String |
False |
— |
true |
— |
boundaryToDiscard |
The set of regex for sentence boundary tokens that should be discarded. |
String |
False |
— |
true |
— |
boundaryTokenRegex |
The set of boundary tokens. If null, use default. |
String |
False |
— |
false |
— |
isOneSentence |
Whether to treat all input as one sentence. |
Boolean |
True |
— |
false |
— |
language |
The language. |
String |
False |
— |
false |
— |
languageFallback |
— |
String |
False |
— |
false |
— |
newlineIsSentenceBreak |
Strategy for treating newlines as paragraph breaks. |
String |
False |
— |
false |
— |
regionElementRegex |
A regular expression for element names containing a sentence region. Only tokens in such elements will be included in sentences. The start and end tags themselves are not included in the sentence. |
String |
False |
— |
false |
— |
strictZoning |
Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. |
Boolean |
True |
— |
false |
— |
tokenRegexesToDiscard |
The set of regex for sentence boundary tokens that should be discarded. |
String |
False |
— |
true |
— |
writeSentence |
Create Sentence annotations. |
Boolean |
True |
— |
false |
— |
writeToken |
Create Token annotations. |
Boolean |
True |
— |
false |
— |
xmlBreakElementsToDiscard |
These are elements like "p" or "sent", which will be wrapped into regex for approximate XML matching. They will be deleted in the output, and will always trigger a sentence boundary. |
String |
False |
— |
true |
— |
zoneTypes |
A list of type names used for zoning. |
String |
False |
— |
true |
— |
TokenMerger
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Merges any Tokens that are covered by a given annotation type. E.g. this component can be used to create a single token from all tokens that constitute a multi-token named entity.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Override the tagset mapping. |
String |
False |
— |
false |
— |
annotationType |
Annotation type for which tokens should be merged. |
String |
True |
— |
false |
— |
constraint |
A constraint on the annotations that should be considered, in the form of a JXPath statement. Example: set PARAM_ANNOTATION_TYPE to a NamedEntity type and set PARAM_CONSTRAINT to ".[value = 'LOCATION']" to merge only tokens that are part of a location named entity. |
String |
False |
— |
false |
— |
language |
Use this language instead of the document language to resolve the model and tag set mapping. |
String |
False |
— |
false |
— |
lemmaMode |
Configure what should happen to the lemma of the merged tokens. It is possible to JOIN the lemmata to a single lemma (space separated), to REMOVE the lemma or LEAVE the lemma of the first token as-is. |
String |
True |
— |
false |
— |
posType |
Set a new POS tag for the new merged token. This is the mapped type. If this is specified, tag set mapping will not be performed. This parameter has no effect unless PARAM_POS_VALUE is also set. |
String |
False |
— |
false |
— |
posValue |
Set a new POS value for the new merged token. This is the actual tag set value and is subject to tagset mapping. For example when merging tokens for named entities, the new POS value may be set to "NNP" (English/Penn Treebank Tagset). |
String |
False |
— |
false |
— |
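
As an illustration of the three lemmaMode values described above (JOIN, REMOVE, LEAVE), here is a minimal sketch in Python. It is not the component's implementation: the `Token` class, the `merge_tokens` function and the character-offset span are hypothetical stand-ins for UIMA annotations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    begin: int
    end: int
    lemma: Optional[str] = None


def merge_tokens(tokens, span_begin, span_end, lemma_mode="JOIN"):
    """Merge all tokens fully covered by [span_begin, span_end) into one token."""
    covered = sorted(
        (t for t in tokens if t.begin >= span_begin and t.end <= span_end),
        key=lambda t: t.begin,
    )
    if len(covered) < 2:
        return list(tokens)  # nothing to merge
    if lemma_mode == "JOIN":
        # Space-separated lemmata of all merged tokens.
        lemma = " ".join(t.lemma for t in covered if t.lemma is not None)
    elif lemma_mode == "REMOVE":
        lemma = None
    elif lemma_mode == "LEAVE":
        # Keep the lemma of the first token as-is.
        lemma = covered[0].lemma
    else:
        raise ValueError("unknown lemma mode: " + lemma_mode)
    merged = Token(covered[0].begin, covered[-1].end, lemma)
    remaining = [t for t in tokens if t not in covered]
    return sorted(remaining + [merged], key=lambda t: t.begin)


# "New York is": the span 0..8 stands for a hypothetical named entity.
tokens = [Token(0, 3, "new"), Token(4, 8, "york"), Token(9, 11, "be")]
joined = merge_tokens(tokens, 0, 8, "JOIN")
```

With JOIN the merged token carries the lemma "new york"; LEAVE keeps "new" and REMOVE leaves the merged token without a lemma.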
TokenTrimmer
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Remove prefixes and suffixes from tokens.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
prefixes |
List of prefixes to remove. |
String |
True |
— |
true |
— |
suffixes |
List of suffixes to remove. |
String |
True |
— |
true |
— |
TrailingCharacterRemover
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
Removes trailing characters (or character sequences) from tokens, e.g. punctuation.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
minTokenLength |
All tokens that are shorter than the minimum token length after removing trailing chars are completely removed. By default (1), empty tokens are removed. Set to 0 or a negative value if no tokens should be removed. Shorter tokens that do not have trailing chars removed are always retained, regardless of their length. |
Integer |
True |
— |
false |
— |
pattern |
A regex to be trimmed from the end of tokens. Default: "[\\Q,-“^»*’()&/\"'©§'—«·=\\E0-9A-Z]+" (removes punctuation, special characters, digits and capital letters). |
String |
True |
— |
false |
— |
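
The interplay of `pattern` and `minTokenLength` described above can be sketched as follows. This is an illustrative Python version, not the component's code: `remove_trailing` is a hypothetical helper, and the example uses a simplified pattern because Python's `re` module does not support the `\Q...\E` quoting used in the default.

```python
import re


def remove_trailing(tokens, pattern, min_token_length=1):
    """Trim a trailing match of `pattern` from each token.

    Tokens that become shorter than min_token_length after trimming are
    dropped; tokens that had nothing trimmed are always kept.
    """
    trailing = re.compile("(?:" + pattern + r")\Z")
    kept = []
    for token in tokens:
        match = trailing.search(token)
        if match is None:
            kept.append(token)  # untouched, retained regardless of length
            continue
        trimmed = token[:match.start()]
        if len(trimmed) >= min_token_length:
            kept.append(trimmed)
    return kept


# Simplified stand-in pattern: trailing commas and full stops only.
result = remove_trailing(["Hello,", "world", "...", "x"], r"[,.]+")
print(result)  # ['Hello', 'world', 'x']
```

With `min_token_length=2`, the token "a," would be dropped after trimming, while an untouched one-character token such as "x" would still be retained.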
Twitter Tokenizer (EN)
Category: Segmenter
Framework: GATE
Version: unknown
Tokenizer tuned for Tweets
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
tokeniserRulesURL |
— |
java.net.URL |
— |
resources/tokeniser/DefaultTokeniser.rules |
— |
— |
transducerGrammarURL |
— |
java.net.URL |
— |
resources/tokeniser/twitter+English.jape |
— |
— |
WhitespaceTokenizer
Category: Segmenter
Framework: DKPro Core (UIMA)
Version: 1.8.0
A strict whitespace tokenizer, i.e. tokenizes according to whitespaces and linebreaks only.
If PARAM_WRITE_SENTENCES is set to true, one sentence per line is assumed. Otherwise, no sentences are created.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language |
The language. |
String |
False |
— |
false |
— |
strictZoning |
Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. |
Boolean |
True |
— |
false |
— |
writeSentence |
Create Sentence annotations. |
Boolean |
True |
— |
false |
— |
writeToken |
Create Token annotations. |
Boolean |
True |
— |
false |
— |
zoneTypes |
A list of type names used for zoning. |
String |
False |
— |
true |
— |
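
The behaviour described above (tokens split on whitespace and linebreaks only, one sentence per line when sentence writing is enabled) can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the component's implementation: `whitespace_segment` and its flags are hypothetical names, annotations are represented as plain (begin, end) character offsets, and zoning is ignored.

```python
import re


def whitespace_segment(text, write_sentence=True, write_token=True):
    """Return (sentences, tokens) as lists of (begin, end) character offsets."""
    sentences, tokens = [], []
    offset = 0
    for line in text.splitlines(keepends=True):
        content = line.rstrip("\r\n")
        if write_sentence and content.strip():
            # One sentence per non-empty line, without surrounding whitespace.
            begin = offset + len(content) - len(content.lstrip())
            end = offset + len(content.rstrip())
            sentences.append((begin, end))
        if write_token:
            # A token is any maximal run of non-whitespace characters.
            for match in re.finditer(r"\S+", content):
                tokens.append((offset + match.start(), offset + match.end()))
        offset += len(line)
    return sentences, tokens


sentences, tokens = whitespace_segment("Hello world\nSecond line here\n")
print(sentences)  # [(0, 11), (12, 28)]
print(tokens)     # [(0, 5), (6, 11), (12, 18), (19, 23), (24, 28)]
```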
Semantics (2)
Semantic Enrichment PR
Category: Semantics
Framework: GATE
Version: unknown
The Semantic Enrichment PR allows adding new data to semantic annotations by querying external RDF (Linked Data) repositories.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationTypes |
— |
java.util.List |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
deleteOnNoRelations |
— |
java.lang.Boolean |
— |
false |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
query |
— |
java.lang.String |
— |
— |
— |
true |
repositoryUrl |
— |
java.lang.String |
— |
— |
— |
|
version |
— |
java.lang.String |
— |
to be loaded from jar manifest |
— |
— |
SemanticFieldAnnotator
Category: Semantics
Framework: DKPro Core (UIMA)
Version: 1.8.0
This Analysis Engine annotates English single words with semantic field information retrieved from an ExternalResource. This could be a lexical resource such as WordNet or a simple key-value map. The annotation is stored in the SemanticField annotation type.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationType |
Annotation types which should be annotated with semantic fields |
String |
True |
— |
false |
— |
constraint |
A constraint on the annotations that should be considered, in the form of a JXPath statement. Example: set PARAM_ANNOTATION_TYPE to a NamedEntity type and set PARAM_CONSTRAINT to ".[value = 'LOCATION']" to annotate only tokens with semantic fields that are part of a location named entity. |
String |
False |
— |
false |
— |
Sentiment (1)
Textalytics Sentiment Analysis
Category: Sentiment
Framework: GATE
Version: unknown
Textalytics Sentiment Analysis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiURL |
— |
java.lang.String |
— |
— |
true |
|
concepts |
— |
java.lang.String |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
debug |
— |
java.lang.Boolean |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
entities |
— |
java.lang.String |
— |
— |
— |
true |
inputASTypes |
— |
java.util.List |
— |
— |
— |
true |
inputASname |
— |
java.lang.String |
— |
— |
— |
true |
key |
— |
java.lang.String |
— |
— |
— |
true |
model |
— |
java.lang.String |
— |
— |
— |
true |
outputASname |
— |
java.lang.String |
— |
Textalytics |
— |
true |
Spelling/Grammar (5)
CorrectionsContextualizer
Category: Spelling/Grammar
Framework: DKPro Core (UIMA)
Version: 1.8.0
This component assumes that some spell checker has already been applied upstream (e.g. Jazzy). It then uses ngram frequencies from a frequency provider in order to rank the provided corrections.
JazzyChecker
Category: Spelling/Grammar
Framework: DKPro Core (UIMA)
Version: 1.8.0
This annotator uses Jazzy for the decision whether a word is spelled correctly or not.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ScoreThreshold |
Determines the maximum edit distance (as an int value) that a suggestion for a spelling error may have. E.g. if set to one, suggestions are limited to words within edit distance 1 of the original word. |
Integer |
True |
— |
false |
— |
modelEncoding |
The character encoding used by the model. |
String |
True |
— |
false |
— |
modelLocation |
Location from which the model is read. The model file is a simple word-list with one word per line. |
String |
True |
— |
false |
— |
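
To make the effect of `ScoreThreshold` concrete, the sketch below filters candidates from a word list by edit distance. It uses plain Levenshtein distance as a stand-in; Jazzy's own scoring may differ, and the function names and the in-memory word list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(word, word_list, score_threshold):
    """Return candidates within the threshold; nothing if the word is known."""
    if word in word_list:
        return []
    return [c for c in word_list if edit_distance(word, c) <= score_threshold]


# Stand-in for a model file with one word per line.
word_list = ["spell", "spill", "shell", "smell", "spelling"]
print(suggest("spel", word_list, 1))  # ['spell']
print(suggest("spel", word_list, 2))  # ['spell', 'spill', 'shell', 'smell']
```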
LanguageToolChecker
Category: Spelling/Grammar
Framework: DKPro Core (UIMA)
Version: 1.8.0
Detects grammatical errors in text using LanguageTool, a rule-based grammar checker.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language |
Use this language instead of the document language to resolve the model. |
String |
False |
— |
false |
— |
NorvigSpellingCorrector
Category: Spelling/Grammar
Framework: DKPro Core (UIMA)
Version: 1.8.0
Creates SofaChangeAnnotations containing corrections for previously identified spelling errors.
Textalytics Spell, Grammar and Style Proofreading
Category: Spelling/Grammar
Framework: GATE
Version: unknown
Textalytics Spell, Grammar and Style Proofreading
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiURL |
— |
java.lang.String |
— |
— |
true |
|
confusion |
— |
java.lang.Boolean |
— |
— |
— |
true |
consonantRed |
— |
java.lang.Boolean |
— |
— |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
debug |
— |
java.lang.Boolean |
— |
— |
— |
true |
dictionary |
— |
java.lang.String |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
foreign |
— |
java.lang.Boolean |
— |
— |
— |
true |
inputASTypes |
— |
java.util.List |
— |
— |
— |
true |
inputASname |
— |
java.lang.String |
— |
— |
— |
true |
key |
— |
java.lang.String |
— |
— |
— |
true |
lang |
— |
java.lang.String |
— |
— |
— |
true |
manyErrors |
— |
java.lang.String |
— |
— |
— |
true |
openingClosing |
— |
java.lang.Boolean |
— |
— |
— |
true |
outputASname |
— |
java.lang.String |
— |
Textalytics |
— |
true |
percentage |
— |
java.lang.Boolean |
— |
— |
— |
true |
prefixed |
— |
java.lang.Boolean |
— |
— |
— |
true |
properNouns |
— |
java.lang.Boolean |
— |
— |
— |
true |
punctuation |
— |
java.lang.Boolean |
— |
— |
— |
true |
quotesOrItalics |
— |
java.lang.Boolean |
— |
— |
— |
true |
spacing |
— |
java.lang.Boolean |
— |
— |
— |
true |
tautologyAndLanMisuse |
— |
java.lang.Boolean |
— |
— |
— |
true |
too_longSent |
— |
java.lang.Boolean |
— |
— |
— |
true |
Stemmer (4)
BulStem
Category: Stemmer
Framework: GATE
Version: unknown
This plugin is an implementation of the BulStem stemmer algorithm for Bulgarian developed by Preslav Nakov.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
annotationType |
— |
java.lang.String |
— |
Token |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
failOnMissingInputAnnotations |
— |
java.lang.Boolean |
— |
true |
— |
true |
pathToRules |
— |
java.net.URL |
— |
resources/stem_rules_context_2_UTF-8.txt |
— |
— |
PorterStemmer
Category: Stemmer
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
documentFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
formFeature |
— |
java.lang.String |
True |
— |
— |
— |
language |
— |
java.lang.String |
True |
— |
— |
— |
layerName |
— |
java.lang.String |
True |
— |
— |
— |
sectionFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
stemFeature |
— |
java.lang.String |
True |
— |
— |
— |
SnowballStemmer
Category: Stemmer
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA wrapper for the Snowball stemmer. Annotation types to be stemmed can be configured by a FeaturePath.
If you use this component in a pipeline which uses stop word removal, make sure that it runs after the stop word removal step, so that only words that are not stop words are stemmed.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
filterConditionOperator |
Specifies the operator for a filtering condition. It is only used if PARAM_FILTER_FEATUREPATH is set. |
String |
False |
— |
false |
— |
filterConditionValue |
Specifies the value for a filtering condition. It is only used if PARAM_FILTER_FEATUREPATH is set. |
String |
False |
— |
false |
— |
filterFeaturePath |
Specifies a feature path that is used in the filter. If this is set, you also have to specify PARAM_FILTER_CONDITION_OPERATOR and PARAM_FILTER_CONDITION_VALUE. |
String |
False |
— |
false |
— |
language |
Use this language instead of the document language to resolve the model. |
String |
False |
— |
false |
— |
lowerCase |
By default the stemmer runs in case-sensitive mode. If this parameter is enabled, tokens are lower-cased before being passed to the stemmer. Examples (input: result with false (default) / result with true): EDUCATIONAL: EDUCATIONAL / educ; Educational: Educat / educ; educational: educ / educ. |
Boolean |
False |
— |
false |
— |
paths |
Specify a path that is used for annotation. Format is de.type.name/feature/path. All type objects will be annotated with an IndexTermAnnotation. The value of the IndexTerm is specified by the feature path. |
String |
False |
— |
true |
— |
Stemmer PR
Category: Stemmer
Framework: GATE
Version: unknown
Wrapper for the Snowball stemmer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationFeature |
— |
java.lang.String |
— |
string |
— |
true |
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
annotationType |
— |
java.lang.String |
— |
Token |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
language |
— |
java.lang.String |
— |
english |
— |
— |
Tagger (52)
ABNER Tagger
Category: Tagger
Framework: GATE
Version: unknown
GATE wrapper over ABNER
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
abnerMode |
— |
gate.abner.AbnerRunMode |
— |
BIOCREATIVE |
— |
true |
annotationName |
— |
java.lang.String |
— |
Tagger |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
ANNIE POS Tagger
Category: Tagger
Framework: GATE
Version: unknown
Mark Hepple's Brill-style POS tagger
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
baseSentenceAnnotationType |
— |
java.lang.String |
— |
Sentence |
— |
true |
baseTokenAnnotationType |
— |
java.lang.String |
— |
Token |
— |
true |
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
— |
— |
— |
failOnMissingInputAnnotations |
— |
java.lang.Boolean |
— |
true |
— |
true |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
lexiconURL |
— |
java.net.URL |
— |
resources/heptag/lexicon |
— |
— |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputAnnotationType |
— |
java.lang.String |
— |
Token |
— |
true |
posTagAllTokens |
— |
java.lang.Boolean |
— |
true |
— |
true |
rulesURL |
— |
java.net.URL |
— |
resources/heptag/ruleset |
— |
— |
Anatomical Entity Tagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
Tags anatomical entities using Brown, UMLS and OBO Anatomy dictionary features
ArktweetPosTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Wrapper for Twitter Tokenizer and POS Tagger. As described in: Olutobi Owoputi, Brendan O’Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider and Noah A. Smith. Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters. In Proceedings of NAACL 2013.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Location of the mapping file for part-of-speech tags to UIMA types. |
String |
False |
— |
false |
— |
language |
Use this language instead of the document language to resolve the model and tag set mapping. |
String |
False |
— |
false |
— |
modelLocation |
Location from which the model is read. |
String |
False |
— |
false |
— |
modelVariant |
Variant of the model. Used to address a specific model if there are multiple models for one language. |
String |
False |
— |
false |
— |
BANNER CRF Tagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
A UIMA wrapper for BANNER entity tagger. BANNER uses CRF.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ModelFile |
File location of the trained CRF model generated by BANNER; an absolute path is recommended. If not specified, BANNER’s default model is used. |
String |
False |
— |
false |
— |
TypeToBioSuffixMap |
Mappings from BIO suffix to the UIMA type names. |
String |
True |
— |
true |
— |
UseNumericNormalization |
— |
Boolean |
True |
— |
false |
— |
UseParenthesisPostProcessing |
— |
Boolean |
True |
— |
false |
— |
BioCreative Gene Mention Tagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 0.0.1-SNAPSHOT
Tags Gene mentions using a model trained on BioCreative GM task data, with Entrez Gene and UMLS dictionary features.
CCGPosTagger
Category: Tagger
Framework: AlvisNLP
Version: 2012-04-30
Applies the CCG POS tagger on annotations.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
documentFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
executable |
— |
org.bibliome.util.files.ExecutableFile |
True |
— |
— |
— |
formFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
internalEncoding |
— |
java.lang.String |
True |
— |
— |
— |
keepPreviousPos |
— |
java.lang.Boolean |
False |
— |
— |
— |
maxRuns |
— |
java.lang.Integer |
True |
— |
— |
— |
model |
— |
org.bibliome.util.files.InputDirectory |
True |
— |
— |
— |
posFeatureName |
— |
java.lang.String |
True |
— |
— |
— |
sectionFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
sentenceFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
sentenceLayerName |
— |
java.lang.String |
True |
— |
— |
— |
silent |
— |
java.lang.Boolean |
False |
— |
— |
— |
wordLayerName |
— |
java.lang.String |
True |
— |
— |
— |
CRF++ Tagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
Uses Conditional Random Fields model for labeling. Based on CRF++, an implementation of CRF for labeling sequential data (http://crfpp.googlecode.com/svn/trunk/doc/index.html).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
IgnoreMalformedSequences |
Whether malformed sequences such as {O, I-X, O} or {B-X, I-Y} should be ignored. If false, the algorithm will attempt to create annotations. |
Boolean |
True |
— |
false |
— |
IgnoreUnknownTypes |
— |
Boolean |
True |
— |
false |
— |
ModelFileName |
Specifies the filename to store the model in. |
String |
True |
— |
false |
— |
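
The malformed sequences mentioned for `IgnoreMalformedSequences` can be illustrated with a small BIO decoder. The source only states that, when the flag is false, the algorithm "will attempt to create annotations"; how it does so is not documented here, so the best-effort branch below (a stray I-X opens a new span) is just one plausible reading. `decode_bio` is a hypothetical function, and spans are (type, begin, end) over token indices with an exclusive end.

```python
def decode_bio(labels, ignore_malformed=True):
    """Turn a BIO label sequence into (type, begin, end) spans."""
    spans = []
    current = None  # [type, begin] of the span that is currently open
    for index, label in enumerate(labels):
        prefix, _, kind = label.partition("-")
        if prefix == "I" and current is not None and current[0] == kind:
            continue  # well-formed continuation of the open span
        if current is not None:
            spans.append((current[0], current[1], index))
            current = None
        if prefix == "B":
            current = [kind, index]
        elif prefix == "I" and not ignore_malformed:
            # Stray I-X without a matching B-X: best effort, open a new span.
            current = [kind, index]
    if current is not None:
        spans.append((current[0], current[1], len(labels)))
    return spans


print(decode_bio(["O", "B-X", "I-X", "O"]))               # [('X', 1, 3)]
print(decode_bio(["O", "I-X", "O"]))                      # []
print(decode_bio(["O", "I-X", "O"], False))               # [('X', 1, 2)]
print(decode_bio(["B-X", "I-Y"], ignore_malformed=False)) # [('X', 0, 1), ('Y', 1, 2)]
```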
Cebuano POS Tagger
Category: Tagger
Framework: GATE
Version: unknown
Mark Hepple's Brill-style POS tagger, adapted for languages where entries are multiword
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
lexiconURL |
— |
java.net.URL |
— |
resources/postag/lexicon |
— |
— |
rulesURL |
— |
java.net.URL |
— |
resources/postag/ruleset |
— |
— |
Chemistry Tagger
Category: Tagger
Framework: GATE
Version: unknown
A tagger for chemical names.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName |
— |
java.lang.String |
— |
— |
— |
true |
compoundListsURL |
— |
java.net.URL |
— |
resources/compound.def |
— |
— |
document |
— |
gate.corpora.DocumentImpl |
— |
— |
— |
true |
elementListsURL |
— |
java.net.URL |
— |
resources/element.def |
— |
— |
elementMapURL |
— |
java.net.URL |
— |
resources/element_map.txt |
— |
— |
removeElements |
— |
java.lang.Boolean |
— |
true |
— |
true |
transducerGrammarURL |
— |
java.net.URL |
— |
resources/main.jape |
— |
— |
ClearNlpPosTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Part-of-Speech annotator using Clear NLP. Requires Sentences to be annotated before.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
dictLocation |
Load the dictionary from this location instead of locating the dictionary automatically. |
String |
False |
— |
false |
— |
dictVariant |
Override the default variant used to locate the dictionary. |
String |
False |
— |
false |
— |
internTags |
Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. |
Boolean |
False |
— |
false |
— |
language |
Use this language instead of the document language to resolve the model. |
String |
False |
— |
false |
— |
modelLocation |
Load the model from this location instead of locating the pos-tagging model automatically. |
String |
False |
— |
false |
— |
modelVariant |
Override the default variant used to locate the pos-tagging model. |
String |
False |
— |
false |
— |
printTagSet |
Log the tag set(s) when a model is loaded. |
Boolean |
True |
— |
false |
— |
FreelingTagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
Performs tokenisation, lemmatisation and POS tagging. Operates on English (en), Spanish (es), Catalan (ca), Welsh (cy), Galician (gl), Italian (it) and Portuguese (pt), selected by setting the "language" parameter (default is English).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language |
— |
String |
True |
— |
false |
— |
GENIA Tagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
Tags biological named entities: proteins, cell lines, cell types, DNAs, and RNAs. It has its own tokeniser, part-of-speech tagger, and shallow parser. The models were trained on the GENIA corpus. Project website: http://www.nactem.ac.uk/GENIA/tagger/
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
chunkTag |
If true, chunk tags will be found (default is true) |
Boolean |
False |
— |
false |
— |
neTag |
If true, NE tags will be found (default is true) |
Boolean |
False |
— |
false |
— |
tokenize |
True if the Sentences found should be tokenized, false if the tagger should use pre-set Tokens |
Boolean |
False |
— |
false |
— |
GenericTagger
Category: Tagger
Framework: GATE
Version: unknown
The Generic Tagger is Generic!
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
debug |
— |
java.lang.Boolean |
— |
false |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
ISO-8859-1 |
— |
true |
failOnMissingInputAnnotations |
— |
java.lang.Boolean |
— |
true |
— |
true |
failOnUnmappableCharacter |
— |
java.lang.Boolean |
— |
true |
— |
true |
featureMapping |
— |
gate.FeatureMap |
— |
string=1;category=2;lemma=3 |
— |
true |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
inputAnnotationType |
— |
java.lang.String |
— |
Token |
— |
true |
inputTemplate |
— |
java.lang.String |
— |
${string} |
— |
true |
outputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputAnnotationType |
— |
java.lang.String |
— |
Token |
— |
true |
postProcessURL |
— |
java.net.URL |
— |
— |
— |
— |
preProcessURL |
— |
java.net.URL |
— |
— |
— |
— |
regex |
— |
java.lang.String |
— |
(.) (.) (.+) |
— |
true |
taggerBinary |
— |
java.net.URL |
— |
— |
— |
true |
taggerDir |
— |
java.net.URL |
— |
— |
— |
true |
taggerFlags |
— |
java.util.List |
— |
— |
— |
true |
updateAnnotations |
— |
java.lang.Boolean |
— |
true |
— |
true |
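
The `regex` and `featureMapping` parameters above appear to work together: the regex is matched against each line of tagger output, and the mapping names which capture group supplies each feature. The source does not spell this out, so the Python sketch below is an assumption. The helper names are hypothetical, and the tab-separated regex is chosen only for illustration; it is not the component's default.

```python
import re


def parse_feature_mapping(spec):
    """Parse 'name=group;name=group' into a dict of name -> group number."""
    mapping = {}
    for entry in spec.split(";"):
        name, _, group = entry.partition("=")
        mapping[name.strip()] = int(group)
    return mapping


def parse_tagger_line(line, regex, feature_mapping):
    """Match one line of tagger output and return its features, or None."""
    match = re.fullmatch(regex, line)
    if match is None:
        return None
    return {name: match.group(group) for name, group in feature_mapping.items()}


mapping = parse_feature_mapping("string=1;category=2;lemma=3")

# Hypothetical output format: token, tag and lemma separated by tabs.
features = parse_tagger_line("dogs\tNNS\tdog", r"(\S+)\t(\S+)\t(\S+)", mapping)
print(features)  # {'string': 'dogs', 'category': 'NNS', 'lemma': 'dog'}
```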
GeniaTagger
Category: Tagger
Framework: AlvisNLP
Version: 2012-04-30
Runs Genia Tagger on annotations.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
chunk |
— |
java.lang.String |
False |
— |
— |
— |
documentFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
entity |
— |
java.lang.String |
False |
— |
— |
— |
geniaCharset |
— |
java.lang.String |
True |
— |
— |
— |
geniaDir |
— |
java.io.File |
True |
— |
— |
— |
geniaTaggerExecutable |
— |
java.io.File |
True |
— |
— |
— |
lemma |
— |
java.lang.String |
True |
— |
— |
— |
pos |
— |
java.lang.String |
True |
— |
— |
— |
sectionFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
sentenceFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
sentences |
— |
java.lang.String |
True |
— |
— |
— |
wordForm |
— |
java.lang.String |
True |
— |
— |
— |
words |
— |
java.lang.String |
True |
— |
— |
— |
Hepple POS Tagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
Mark Hepple's POS tagger, from dragontools/Banner toolkit.
HepplePosTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
GATE Hepple part-of-speech tagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
internTags |
Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true |
Boolean |
False |
— |
false |
— |
language |
Use this language instead of the document language to resolve the model. |
String |
False |
— |
false |
— |
lexiconLocation |
Load the lexicon from this location instead of locating it automatically. |
String |
False |
— |
false |
— |
modelVariant |
Override the default variant used to locate the model. |
String |
False |
— |
false |
— |
printTagSet |
Log the tag set(s) when a model is loaded. Default: false |
Boolean |
True |
— |
false |
— |
rulesetLocation |
Load the ruleset from this location instead of locating it automatically. |
String |
False |
— |
false |
— |
Hindi POS Tagger
Category: Tagger
Framework: GATE
Version: unknown
Mark Hepple's Brill-style POS tagger, adapted for languages where entries are multiword
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document |
— |
gate.Document |
— |
— |
— |
true |
encoding |
— |
java.lang.String |
— |
UTF-8 |
— |
— |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
lexiconURL |
— |
java.net.URL |
— |
resources/tagger/hindi_lexicon |
— |
— |
rulesURL |
— |
java.net.URL |
— |
resources/tagger/ruleset |
— |
— |
HunPosTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Part-of-Speech annotator using HunPos. Requires Sentences to be annotated before.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation |
Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. |
String |
False |
— |
false |
— |
internTags |
Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true |
Boolean |
False |
— |
false |
— |
language |
Use this language instead of the document language to resolve the model. |
String |
False |
— |
false |
— |
modelLocation |
Load the model from this location instead of locating the model automatically. |
String |
False |
— |
false |
— |
modelVariant |
Override the default variant used to locate the model. |
String |
False |
— |
false |
— |
printTagSet |
Log the tag set(s) when a model is loaded. Default: false |
Boolean |
True |
— |
false |
— |
ILSP FBT Tagger
Category: Tagger
Framework: ILSP (UIMA)
Version: 1.14
ILSP FBT Tagger is an adaptation of the Brill tagger trained on Greek text. It uses a PAROLE compatible tagset of 584 different tags which capture the morphosyntactic particularities of the Greek language. Working on the output of a sentence detection and tokenisation tool, the tagger assigns initial tags by looking the words up in a lexicon created from a manually annotated corpus during training. A suffix lexicon is used for the initial tagging of unknown words. 799 contextual rules are then applied to improve the initial phase output.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
LexicaDir | The directory containing the Berkeley DB lexical resources. Default is /opt/ilsp-nlp/lexica/fbt. | String | False | — | false | — |
IULATagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
Performs paragraph splitting, sentence splitting, tokenisation and POS tagging. Also detects proper names. Operates on Spanish (es) or Catalan (ca), selected with the "language" parameter (default: Spanish).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | — | String | True | — | false | — |
LingPipe POS Tagger PR
Category: Tagger
Framework: GATE
Version: unknown
Provides a LingPipe part of speech tagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
applicationMode | — | gate.lingpipe.POSApplicationMode | — | FIRSTBEST | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
modelFileUrl | — | java.net.URL | — | — | — | false |
MateMorphTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
DKPro Annotator for the MateToolsMorphTagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
MatePosTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
DKPro Annotator for the MateToolsPosTagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation | Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
MeCabTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Annotator for the MeCab Japanese POS Tagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
language | The language. | String | False | — | false | — |
strictZoning | Strict zoning causes the segmentation to be applied only within the boundaries of a zone annotation. This works only if a single zone type is specified (the zone annotations should NOT overlap) or if no zone type is specified - in which case the whole document is taken as a zone. If strict zoning is turned off, multiple zone types can be specified. A list of all zone boundaries (start and end) is created and segmentation happens between them. | Boolean | True | — | false | — |
writeSentence | Create Sentence annotations. | Boolean | True | — | false | — |
writeToken | Create Token annotations. | Boolean | True | — | false | — |
zoneTypes | A list of type names used for zoning. | String | False | — | true | — |
Measurement Tagger
Category: Tagger
Framework: GATE
Version: unknown
A measurement tagger based upon GNU Units
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
commonURL | — | java.net.URL | — | resources/common_words.txt | — | — |
consumeNumberAnnotations | — | java.lang.Boolean | — | true | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
ignoredAnnotations | — | java.util.Set | — | Date;Money | — | true |
inputASName | — | java.lang.String | — | — | — | true |
japeURL | — | java.net.URL | — | resources/jape/main.jape | — | — |
locale | — | java.lang.String | — | en_GB | — | — |
outputASName | — | java.lang.String | — | — | — | true |
unitsURL | — | java.net.URL | — | resources/units.dat | — | — |
Medical Condition Tagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
A tagger that recognises mentions of medical conditions. Implemented using string matching against entries in the Index of Diseases (http://resource.nlm.nih.gov/63540040R) and the Nomenclature of Diseases (http://resource.nlm.nih.gov/31910070R).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
useApproximateStringMatching | true if approximate string matching should be used | Boolean | False | — | false | — |
NormaGene Tagger
Category: Tagger
Framework: GATE
Version: unknown
A processing resource that takes document and corpus parameters
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
threshold | — | java.lang.Double | — | 0.6 | — | true |
Numbers Tagger
Category: Tagger
Framework: GATE
Version: unknown
Finds numbers (both words and digits) and annotates them with their numeric value
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
allowWithinWords | — | java.lang.Boolean | — | false | — | true |
annotationSetName | — | java.lang.String | — | — | — | true |
configURL | — | java.net.URL | — | resources/languages/all.xml | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
postProcessURL | — | java.net.URL | — | resources/jape/post-process.jape | — | — |
useHintsFromOriginalMarkups | — | java.lang.Boolean | — | true | — | true |
OpenCalais Tagger
Category: Tagger
Framework: GATE
Version: unknown
An OpenCalais based semantic annotator
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
allowDistribution | — | java.lang.Boolean | — | false | — | true |
allowSearch | — | java.lang.Boolean | — | false | — | true |
calculateRelevanceScore | — | java.lang.Boolean | — | false | — | true |
docRDFaccessible | — | java.lang.Boolean | — | false | — | true |
document | — | gate.corpora.DocumentImpl | — | — | — | true |
enableMetadataType | — | gate.opencalais.MetadataType | — | — | — | true |
externalID | — | java.lang.String | — | — | — | true |
licenseID | — | java.lang.String | — | — | — | — |
openCalaisURL | — | java.net.URL | — | — | — |  |
outputASName | — | java.lang.String | — | — | — | true |
submitter | — | java.lang.String | — | — | — | true |
OpenNLP POS Tagger
Category: Tagger
Framework: GATE
Version: unknown
POS Tagger using an OpenNLP maxent model
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
model | — | java.net.URL | — | models/english/en-pos-maxent.bin | — | — |
outputASName | — | java.lang.String | — | — | — | true |
OpenNlpPosTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Part-of-Speech annotator using OpenNLP. Requires Sentences to be annotated before.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation | Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
internTags | Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true | Boolean | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
POS Mapper
Category: Tagger
Framework: GATE
Version: unknown
Map complex Russian morphology tags into simpler POS categories
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
Penn BioTagger
Category: Tagger
Framework: GATE
Version: unknown
Ready-made application for the Penn BioTagger
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
menu | — | java.util.List | — | — | — | — |
pipelineURL | — | java.net.URL | — | — | — | — |
Penn BioTagger: Genes
Category: Tagger
Framework: GATE
Version: unknown
Penn BioTagger for Genes
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
modelURL | — | java.net.URL | — | resources/geneModel.crf.gz | — | — |
outputASName | — | java.lang.String | — | — | — | true |
Penn BioTagger: Malignancy
Category: Tagger
Framework: GATE
Version: unknown
Penn BioTagger for malignancy types
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
modelURL | — | java.net.URL | — | resources/malignancyModel.crf.gz | — | — |
outputASName | — | java.lang.String | — | — | — | true |
Penn BioTagger: Variation
Category: Tagger
Framework: GATE
Version: unknown
Penn BioTagger for variations
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
modelURL | — | java.net.URL | — | resources/variationModel.crf.gz | — | — |
outputASName | — | java.lang.String | — | — | — | true |
PosMapper
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Maps existing POS tags from one tagset to another using a user provided properties file.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
dkproMappingLocation | A properties file containing mappings from the new tagset to (fully qualified) DKPro POS classes. If such a file is not supplied, the DKPro POS classes stay the same regardless of the new POS tag value, and only the value is changed. | String | False | — | false | — |
mappingFile | A properties file containing POS tagset mappings. | String | True | — | false | — |
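A tagset mapping stored as a properties file can be read with java.util.Properties. The sketch below is not PosMapper's implementation: the key/value layout (source tag as key, target tag as value), the example tags, and the choice to leave unmapped tags unchanged are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PosMappingSketch {
    /** Loads tagset mappings from properties-formatted text. */
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Returns the mapped tag, or the original tag if no mapping exists (assumed behaviour). */
    static String map(Properties mapping, String tag) {
        return mapping.getProperty(tag, tag);
    }

    public static void main(String[] args) {
        // Hypothetical mapping content; a real file would be given via mappingFile.
        Properties m = load("NN=NOUN\nNNS=NOUN\nVBD=VERB\n");
        System.out.println(map(m, "NNS")); // NOUN
        System.out.println(map(m, "XYZ")); // XYZ (no mapping, left unchanged)
    }
}
```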
RASP POS Converter
Category: Tagger
Framework: GATE
Version: unknown
Converts from PennTreebank POS tags to the C2 tagset used by RASP. Generates annotations of type MorphObj which hold the tag and lemma
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
grammarURL | — | java.net.URL | — | resources/main.jape | — | — |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
RASP2 POS Tagger
Category: Tagger
Framework: GATE
Version: unknown
RASP part-of-speech tagger, creating WordForm annotations
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
charset | — | java.lang.String | — | ISO-8859-1 | — | true |
debug | — | java.lang.Boolean | — | false | — | true |
document | — | gate.Document | — | — | — | true |
generateMultipleTags | — | java.lang.Boolean | — | true | — | true |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
raspHome | — | java.net.URL | — | file:/usr/local/bin/RASP | — | false |
RfTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
RFTagger morphological analyzer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
MorphMappingLocation | — | String | False | — | false | — |
POSMappingLocation | Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelEncoding | The character encoding used by the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
printTagSet | Write the tag set(s) to the log when a model is loaded. | Boolean | True | — | false | — |
Roman Numerals Tagger
Category: Tagger
Framework: GATE
Version: unknown
Finds and annotates Roman numerals
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
allowLowerCase | — | java.lang.Boolean | — | false | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
maxTailLength | — | java.lang.Integer | — | 0 | — | true |
outputASName | — | java.lang.String | — | — | — | true |
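Recognising a Roman numeral and computing its value can be sketched with a regular expression. This is not the GATE plugin's code: the class, the pattern (canonical numerals from 1 to 3999), and the reading of allowLowerCase as "also accept lower-case letters" are assumptions, and maxTailLength is not modelled.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class RomanNumeralSketch {
    // Canonical Roman numerals 1..3999: thousands, hundreds, tens, units.
    private static final Pattern ROMAN = Pattern.compile(
            "M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})");

    /** True if the token is a well-formed Roman numeral. */
    static boolean isRoman(String token, boolean allowLowerCase) {
        if (token.isEmpty()) {
            return false; // the pattern alone would accept the empty string
        }
        String t = allowLowerCase ? token.toUpperCase(Locale.ROOT) : token;
        return ROMAN.matcher(t).matches();
    }

    /** Numeric value of a well-formed Roman numeral. */
    static int value(String roman) {
        String t = roman.toUpperCase(Locale.ROOT);
        int total = 0;
        for (int i = 0; i < t.length(); i++) {
            int v = digit(t.charAt(i));
            int next = i + 1 < t.length() ? digit(t.charAt(i + 1)) : 0;
            total += v < next ? -v : v; // subtractive notation, e.g. IV or CM
        }
        return total;
    }

    private static int digit(char c) {
        switch (c) {
            case 'I': return 1;
            case 'V': return 5;
            case 'X': return 10;
            case 'L': return 50;
            case 'C': return 100;
            case 'D': return 500;
            case 'M': return 1000;
            default: throw new IllegalArgumentException("not a Roman digit: " + c);
        }
    }

    public static void main(String[] args) {
        System.out.println(isRoman("xiv", false)); // false
        System.out.println(isRoman("xiv", true));  // true
        System.out.println(value("MCMXCIV"));      // 1994
    }
}
```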
Russian POS Tagger
Category: Tagger
Framework: GATE
Version: unknown
Part-of-speech tagger for Russian
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
caseSensitive | — | java.lang.Boolean | — | true | — | — |
config | — | java.net.URL | — | resources/morphology/main.conf | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
encoding | — | java.lang.String | — | UTF-8 | — | — |
SVMLight Tagger
Category: Tagger
Framework: NaCTeM (UIMA)
Version: 1.0
Applies an SVMLight-trained model on instances.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
ModelFile | The SVMLight model | String | True | — | false | — |
NormFile | The file containing the value of the norm, generated during model training | String | True | — | false | — |
Stanford POS Tagger
Category: Tagger
Framework: GATE
Version: unknown
Stanford Part-of-Speech Tagger
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
baseSentenceAnnotationType | — | java.lang.String | — | Sentence | — | true |
baseTokenAnnotationType | — | java.lang.String | — | Token | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
inputASName | — | java.lang.String | — | — | — | true |
modelFile | — | java.net.URL | — | resources/english-left3words-distsim.tagger | — | — |
outputASName | — | java.lang.String | — | — | — | true |
outputAnnotationType | — | java.lang.String | — | Token | — | true |
posTagAllTokens | — | java.lang.Boolean | — | true | — | true |
useExistingTags | — | java.lang.Boolean | — | true | — | true |
StanfordPosTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Stanford Part-of-Speech tagger component.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation | Location of the mapping file for part-of-speech tags to UIMA types. | String | False | — | false | — |
internTags | Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: false | Boolean | False | — | false | — |
language | Use this language instead of the document language to resolve the model and tag set mapping. | String | False | — | false | — |
maxSentenceLength | Sentences with more tokens than the specified max amount will be ignored if this parameter is set to a value larger than zero. The default value zero will allow all sentences to be POS tagged. | Integer | False | — | false | — |
modelLocation | Location from which the model is read. | String | False | — | false | — |
modelVariant | Variant of the model. Used to address a specific model if there are multiple models for one language. | String | False | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
ptb3Escaping | Enable all traditional PTB3 token transforms (like -LRB-, -RRB-). | Boolean | True | — | false | — |
quoteBegin | List of extra token texts (usually single character strings) that should be treated like opening quotes and escaped accordingly before being sent to the parser. | String | False | — | true | — |
quoteEnd | List of extra token texts (usually single character strings) that should be treated like closing quotes and escaped accordingly before being sent to the parser. | String | False | — | true | — |
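The maxSentenceLength and ptb3Escaping descriptions can be made concrete with a small sketch. It is not the DKPro or Stanford implementation: the class and method names are hypothetical, negative limits are assumed to behave like zero, and only the bracket part of the PTB3 transforms is shown.

```java
final class StanfordParamSketch {
    /** maxSentenceLength as described: a value of zero (or less, by assumption) disables the limit. */
    static boolean shouldTag(int tokenCount, int maxSentenceLength) {
        return maxSentenceLength <= 0 || tokenCount <= maxSentenceLength;
    }

    /** Bracket escaping in the Penn Treebank style; other PTB3 transforms are not covered. */
    static String escapeBracket(String token) {
        switch (token) {
            case "(": return "-LRB-";
            case ")": return "-RRB-";
            case "[": return "-LSB-";
            case "]": return "-RSB-";
            case "{": return "-LCB-";
            case "}": return "-RCB-";
            default:  return token;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldTag(120, 0));    // true: no limit set
        System.out.println(shouldTag(120, 100));  // false: sentence is too long and is skipped
        System.out.println(escapeBracket("("));   // -LRB-
    }
}
```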
TreeTagger
Category: Tagger
Framework: AlvisNLP
Version: 2010-10-28
Runs tree-tagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
formFeature | — | java.lang.String | True | — | — | — |
inputCharset | — | java.lang.String | True | — | — | — |
lemmaFeature | — | java.lang.String | True | — | — | — |
lexiconFile | — | org.bibliome.util.streams.SourceStream | False | — | — | — |
noUnknownLemma | — | java.lang.Boolean | False | — | — | — |
outputCharset | — | java.lang.String | True | — | — | — |
parFile | — | org.bibliome.util.files.InputFile | True | — | — | — |
posFeature | — | java.lang.String | True | — | — | — |
recordCharset | — | java.lang.String | True | — | — | — |
recordDir | — | org.bibliome.util.files.OutputDirectory | False | — | — | — |
recordFeatures | — | java.lang.String[] | False | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayerName | — | java.lang.String | True | — | — | — |
treeTaggerExecutable | — | org.bibliome.util.files.ExecutableFile | True | — | — | — |
wordLayerName | — | java.lang.String | True | — | — | — |
TreeTaggerPosTagger
Category: Tagger
Framework: DKPro Core (UIMA)
Version: 1.8.0
Part-of-Speech and lemmatizer annotator using TreeTagger.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
POSMappingLocation | Load the part-of-speech tag to UIMA type mapping from this location instead of locating the mapping automatically. | String | False | — | false | — |
executablePath | Use this TreeTagger executable instead of trying to locate the executable automatically. | String | False | — | false | — |
internTags | Use the String#intern() method on tags. This is usually a good idea to avoid spamming the heap with thousands of strings representing only a few different tags. Default: true | Boolean | False | — | false | — |
language | Use this language instead of the document language to resolve the model. | String | False | — | false | — |
modelEncoding | The character encoding used by the model. | String | False | — | false | — |
modelLocation | Load the model from this location instead of locating the model automatically. | String | False | — | false | — |
modelVariant | Override the default variant used to locate the model. | String | False | — | false | — |
performanceMode | TT4J setting: Disable some sanity checks, e.g. whether tokens contain line breaks (which is not allowed). Turning this on will increase your performance, but the wrapper may throw exceptions if illegal data is provided. | Boolean | True | — | false | — |
printTagSet | Log the tag set(s) when a model is loaded. Default: false | Boolean | True | — | false | — |
writeLemma | Write lemma information. Default: true | Boolean | True | — | false | — |
writePOS | Write part-of-speech information. Default: true | Boolean | True | — | false | — |
Twitter POS Tagger (EN)
Category: Tagger
Framework: GATE
Version: unknown
Stanford POS tagger trained on Tweets
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
baseSentenceAnnotationType | — | java.lang.String | — | Sentence | — | true |
baseTokenAnnotationType | — | java.lang.String | — | Token | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
failOnMissingInputAnnotations | — | java.lang.Boolean | — | true | — | true |
inputASName | — | java.lang.String | — | — | — | true |
modelFile | — | java.net.URL | — | resources/pos/gate-EN-twitter.model | — | — |
outputASName | — | java.lang.String | — | — | — | true |
outputAnnotationType | — | java.lang.String | — | Token | — | true |
posTagAllTokens | — | java.lang.Boolean | — | true | — | true |
useExistingTags | — | java.lang.Boolean | — | true | — | true |
Topics (3)
MalletTopicModelEstimator
Category: Topics
Framework: DKPro Core (UIMA)
Version: 1.8.0
Estimate an LDA topic model using Mallet and write it to a file. It stores all incoming CASes as Mallet instances before estimating the model, using a ParallelTopicModel.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
alphaSum | The sum of alphas over all topics. Default: 1.0. Another recommended value is 50 / T (number of topics). | Float | True | — | false | — |
beta | Beta for a single dimension of the Dirichlet prior. Default: 0.01. | Float | True | — | false | — |
burninPeriod | The number of iterations before hyperparameter optimization begins. Default: 100 | Integer | True | — | false | — |
displayInterval | The interval in which to display the estimated topics. Default: 50. | Integer | True | — | false | — |
displayNTopicWords | The number of top words to display during estimation. Default: 7. | Integer | True | — | false | — |
minTokenLength | Ignore tokens (or lemmas, respectively) that are shorter than the given value. Default: 3. | Integer | True | — | false | — |
modelEntityType | If specified, the text contained in each annotation of the given segmentation type is fed as a separate unit to the topic model estimator, e.g. de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.sentence. Text that is not within such annotations is ignored. By default, the full document text is used as a document. | String | False | — | false | — |
nIterations | The number of iterations during model estimation. Default: 1000. | Integer | True | — | false | — |
nThreads | The number of threads to use during model estimation. Default: 1. | Integer | True | — | false | — |
nTopics | The number of topics to estimate for the topic model. | Integer | True | — | false | — |
optimizeInterval | Interval for optimizing Dirichlet hyperparameters. Default: 50 | Integer | True | — | false | — |
randomSeed | Set random seed. If set to -1 (default), uses random generator. | Integer | True | — | false | — |
saveInterval | Define how often to save a serialized model during estimation. Default: 0 (only save when estimation is done). | Integer | True | — | false | — |
targetLocation | The target model file location. | String | True | — | false | — |
typeName | The annotation type to use for the topic model. Default: de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token. | String | True | — | false | — |
useLemma | If set, uses lemmas instead of original text as features. | Boolean | True | — | false | — |
useSymmetricAlph | Use a symmetric alpha value during model estimation? Default: false. | Boolean | True | — | false | — |
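The relation between alphaSum and the number of topics is simple arithmetic. The sketch below assumes a prior that starts out symmetric, so each topic receives alphaSum / nTopics; the class and method names are illustrative and no Mallet classes are used.

```java
final class DirichletPriorSketch {
    /** Per-topic alpha of a symmetric prior whose alphas add up to alphaSum. */
    static double perTopicAlpha(double alphaSum, int nTopics) {
        if (nTopics <= 0) {
            throw new IllegalArgumentException("nTopics must be positive");
        }
        return alphaSum / nTopics;
    }

    public static void main(String[] args) {
        int nTopics = 20;
        // Documented default: alphaSum = 1.0
        System.out.println(perTopicAlpha(1.0, nTopics));            // 0.05
        // Documented alternative: alphaSum = 50 / T
        System.out.println(perTopicAlpha(50.0 / nTopics, nTopics)); // 0.125
    }
}
```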
MalletTopicModelInferencer
Category: Topics
Framework: DKPro Core (UIMA)
Version: 1.8.0
Infers the topic distribution over documents using a Mallet ParallelTopicModel.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
burnIn | The number of iterations before hyperparameter optimization begins. Default: 1 | Integer | True | — | false | — |
maxTopicAssignments | Maximum number of topics to assign. If not set (or <= 0), the number of topics in the model divided by 10 is set. | Integer | True | — | false | — |
minTokenLength | Ignore tokens (or lemmas, respectively) that are shorter than the given value. Default: 3. | Integer | True | — | false | — |
minTopicProb | Minimum topic proportion for the document-topic assignment. | Float | True | — | false | — |
modelLocation | — | String | True | — | false | — |
nIterations | The number of iterations during inference. Default: 10. | Integer | True | — | false | — |
thinning | — | Integer | True | — | false | — |
typeName | The annotation type to use as tokens. Default: Token | String | True | — | false | — |
useLemma | If set, uses lemmas instead of original text as features. | Boolean | True | — | false | — |
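How maxTopicAssignments and minTopicProb could interact is sketched below. This is an illustration, not the component's code: the class name, the inclusive comparison against minTopicProb, and the ordering by descending proportion are assumptions. Only the fallback to the number of topics divided by 10 comes from the parameter description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TopicAssignmentSketch {
    /** Returns the indices of the topics to assign, highest proportion first. */
    static List<Integer> assign(double[] dist, double minTopicProb, int maxTopicAssignments) {
        // Fallback from the description: number of topics divided by 10 when unset or <= 0.
        int limit = maxTopicAssignments > 0 ? maxTopicAssignments : dist.length / 10;

        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < dist.length; i++) {
            if (dist[i] >= minTopicProb) { // inclusive threshold is an assumption
                candidates.add(i);
            }
        }
        candidates.sort(Comparator.comparingDouble((Integer i) -> dist[i]).reversed());

        return candidates.size() > limit
                ? new ArrayList<>(candidates.subList(0, limit))
                : candidates;
    }

    public static void main(String[] args) {
        double[] dist = {0.05, 0.5, 0.1, 0.3, 0.05};
        System.out.println(assign(dist, 0.1, 2)); // [1, 3]
        System.out.println(assign(dist, 0.1, 3)); // [1, 3, 2]
    }
}
```

With a model of 5 topics and no explicit maximum, the fallback limit in this sketch is 5 / 10 = 0 under integer division, so nothing is assigned; small models therefore need an explicit maxTopicAssignments under this reading.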
Textalytics Topics Extraction
Category: Topics
Framework: GATE
Version: unknown
Textalytics Topics Extraction
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
apiURL | — | java.lang.String | — | — | true |  |
caseSensitive | — | java.lang.Boolean | — | — | — | true |
context | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
debug | — | java.lang.Boolean | — | — | — | true |
dictionary | — | java.lang.String | — | — | — | true |
disambiguationLevel | — | daedalus.textalytics.gate.param.DisambiguationLevel | — | strong_disambiguation | — | true |
document | — | gate.Document | — | — | — | true |
inputASTypes | — | java.util.List | — | — | — | true |
inputASname | — | java.lang.String | — | — | — | true |
key | — | java.lang.String | — | — | — | true |
lang | — | java.lang.String | — | — | — | true |
outputASname | — | java.lang.String | — | Textalytics | — | true |
relaxedTypography | — | java.lang.Boolean | — | — | — | true |
subTopics | — | java.lang.Boolean | — | — | — | true |
timeref | — | java.lang.String | — | — | — | true |
topicTypes | — | java.lang.String | — | — | — | true |
udDictionaries | — | java.util.List | — | — | — | true |
unknownWords | — | java.lang.Boolean | — | — | — | true |
Validation (1)
Schema Enforcer
Category: Validation
Framework: GATE
Version: unknown
Produces an annotation set whose content is restricted by the specified set of schemas
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
outputASName | — | java.lang.String | — | — | — | true |
schemas | — | java.util.List | — | — | — | true |
useDefaults | — | java.lang.Boolean | — | false | — | true |
Viewer/Editor (18)
Compound Document Editor
Category: Viewer/Editor
Framework: GATE
Version: unknown
Editor for compound documents.
GATE Ontology Editor
Category: Viewer/Editor
Framework: GATE
Version: unknown
Ontology editing tool.
Gazetteer Editor
Category: Viewer/Editor
Framework: GATE
Version: unknown
Gazetteer viewer and editor.
JAPE-Plus Viewer
Category: Viewer/Editor
Framework: GATE
Version: unknown
A JAPE grammar file viewer
Pairbank Viewer
Category: Viewer/Editor
Framework: GATE
Version: unknown
Viewer for the TermRaider Pairbank.
RAT-I
Category: Viewer/Editor
Framework: GATE
Version: unknown
Relation Annotation Tool Instance view.
Schema Annotations Editor
Category: Viewer/Editor
Framework: GATE
Version: unknown
An annotation editor restricted by schemas.
Script Editor
Category: Viewer/Editor
Framework: GATE
Version: unknown
Editor for the Groovy script behind this PR
Shell
Category: Viewer/Editor
Framework: AlvisNLP
Version: 2012-04-30
Starts an interactive shell for querying the corpus data structure.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
historyFile | — | org.bibliome.util.files.OutputFile | False | — | — | — |
prompt | — | java.lang.String | True | — | — | — |
Shell2
Category: Viewer/Editor
Framework: AlvisNLP
Version:
Starts an interactive shell for querying the corpus data structure.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
constantAnnotationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantDocumentFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantRelationFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantSectionFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
constantTupleFeatures | — | alvisnlp.module.types.Mapping | False | — | — | — |
Simple Schema Viewer
Category: Viewer/Editor
Framework: GATE
Version: unknown
A Simple Annotation Schema Viewer
Syntax tree viewer
Category: Viewer/Editor
Framework: GATE
Version: unknown
Viewer for syntax trees generated by a parser.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
tokenType | — | java.lang.String | — | Token | — | false |
treeNodeAnnotationType | — | java.lang.String | — | SyntaxTreeNode | — | false |
Termbank Viewer
Category: Viewer/Editor
Framework: GATE
Version: unknown
Viewer for the TermRaider Termbank.
Writer (64)
ADBWriter
Category: Writer
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
annotationType | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
annotations | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
aspectId | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
docScopeAnnType | — | java.lang.String[] | False | — | — | — |
documents | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
fragments | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
groups | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
password | — | java.lang.String | True | — | — | — |
relations | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
schema | — | java.lang.String | False | — | — | — |
sections | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
toDocScopeAnnotation | — | alvisnlp.corpus.expressions.Expression[] | False | — | — | — |
url | — | java.lang.String | True | — | — | — |
username | — | java.lang.String | True | — | — | — |
AlvisDBIndexer
Category: Writer
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
append | — | java.lang.Boolean | False | — | — | — |
elements | — | org.bibliome.alvisnlp.modules.alvisdb.ADBElements[] | True | — | — | — |
indexDir | — | org.bibliome.util.files.OutputDirectory | True | — | — | — |
AlvisIRIndexer
Category: Writer
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
clearIndex |
— |
java.lang.Boolean |
True |
— |
— |
— |
documents |
— |
org.bibliome.alvisnlp.modules.alvisir2.IndexedDocuments |
True |
— |
— |
— |
fieldNames |
— |
java.lang.String[] |
True |
— |
— |
— |
indexDir |
— |
org.bibliome.util.files.OutputDirectory |
True |
— |
— |
— |
propertyKeys |
— |
java.lang.String[] |
True |
— |
— |
— |
recordGlobalIndexAttributes |
— |
java.lang.Boolean |
True |
— |
— |
— |
relations |
— |
alvisnlp.module.types.MultiMapping |
True |
— |
— |
— |
tokenPositionGap |
— |
java.lang.Integer |
True |
— |
— |
— |
BIO Format Writer Cas Consumer
Category: Writer
Framework: NaCTeM (UIMA)
Version: 1.0
Writes annotations of the specified types to the specified directory in BIO format. By default the output has one line per token (token [tab] label), with an empty line at the end of each sentence. If SentencePerLine is true, the output has one line per sentence, with space-separated tokens each followed by its label, as in "token|label". Each label is one of O, B-suffix, or I-suffix. Suffixes are specified as a list of strings, each mapping a fully qualified type name to its suffix with a comma, e.g. "org.u_compare.syntactic.Sentence,Sent". Sentence and Token annotations are required.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
SentencePerLine |
If true, merges one sentence into one line with "|" as delimiter. |
Boolean |
True |
— |
false |
— |
TypeToBioSuffixMap |
Fully qualified type name, comma, suffix string |
String |
True |
— |
true |
— |
outputDir |
output directory |
String |
True |
— |
false |
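The two output layouts described for this consumer can be illustrated with a minimal Python sketch. It is not the component's implementation: the function names `to_bio` and `parse_suffix_map` are hypothetical, and the input is assumed to be already tokenized and labelled.

```python
def to_bio(sentences, sentence_per_line=False):
    """Render sentences as BIO text.

    `sentences` is a list of sentences; each sentence is a list of
    (token, label) pairs where label is "O", "B-<suffix>" or "I-<suffix>".
    """
    lines = []
    for sentence in sentences:
        if sentence_per_line:
            # One line per sentence: space-separated "token|label" items.
            lines.append(" ".join(f"{tok}|{lab}" for tok, lab in sentence))
        else:
            # One line per token: token, tab, label; a blank line ends the sentence.
            lines.extend(f"{tok}\t{lab}" for tok, lab in sentence)
            lines.append("")
    return "\n".join(lines) + "\n"


def parse_suffix_map(entries):
    """Parse "fully.qualified.Type,Suffix" strings into a dict (hypothetical helper)."""
    mapping = {}
    for entry in entries:
        type_name, suffix = entry.rsplit(",", 1)
        mapping[type_name.strip()] = suffix.strip()
    return mapping
```

For example, `parse_suffix_map(["org.u_compare.syntactic.Sentence,Sent"])` maps the Sentence type to the suffix `Sent`, which would then appear in labels as `B-Sent` and `I-Sent`.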
BinaryCasWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Format | Description | Type system on load | CAS Addresses preserved |
---|---|---|---|
S | CAS structures are dumped to disc as they are using Java serialization (CASSerializer ). Because these structures are pre-allocated in memory at larger sizes than what is actually required, files in this format may be larger than necessary. However, the CAS addresses of feature structures are preserved in this format. When the data is loaded back into a CAS, it must have been initialized with the same type system as the original CAS. | must be the same | yes |
S+ | CAS structures are dumped to disc as they are using Java serialization as in format 0, but now using the CASCompleteSerializer which includes CAS metadata like type system and index repositories. | is reinitialized | yes |
0 | CAS structures are dumped to disc as they are using Java serialization (CASSerializer ). This is basically the same as format S but includes a UIMA header and can be read using org.apache.uima.cas.impl.Serialization#deserializeCAS. | must be the same | yes |
4 | UIMA binary serialization saving all feature structures (reachable or not). This format internally uses gzip compression and a binary representation of the CAS, making it much more efficient than format 0. | must be the same | yes |
6 | UIMA binary serialization as format 4, but saving only reachable feature structures. | must be the same | no |
6+ | UIMA binary serialization as format 6, but also contains the type system definition. This allows the BinaryCasReader to load data leniently into a CAS that has been initialized with a different type system. | lenient loading | no |
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression |
Choose a compression method. (default: CompressionMethod#NONE) |
String |
False |
— |
false |
— |
escapeDocumentId |
URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) |
Boolean |
True |
— |
false |
— |
filenameExtension |
— |
String |
True |
— |
false |
— |
format |
— |
String |
True |
— |
false |
— |
overwrite |
Allow overwriting target files (ignored when writing to ZIP archives). |
Boolean |
True |
— |
false |
— |
singularTarget |
Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for Conll-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. |
Boolean |
True |
— |
false |
— |
stripExtension |
Remove the original extension. |
Boolean |
True |
— |
false |
— |
targetLocation |
Target location. If this parameter is not set, data is written to stdout. |
String |
False |
— |
false |
— |
typeSystemLocation |
Location to write the type system to. The type system is saved using Java serialization; it is not saved as an XML type system description. We recommend using the name typesystem.ser. The #PARAM_COMPRESSION parameter has no effect on the type system. Instead, whether the type system file is compressed is detected from the file name extension (e.g. ".gz"). If this parameter is set, the type system and index repository are no longer serialized into the same file as the rest of the CAS. The SerializedCasReader currently cannot read such files. Use this only if you really know what you are doing. This parameter has no effect if formats S+ or 6+ are used, as the type system information is embedded in each individual file. Otherwise, it is recommended that this parameter be set unless some other mechanism is used to initialize the CAS with the same type system and index repository during reading that was used during writing. |
String |
False |
— |
false |
— |
useDocumentId |
Use the document ID as file name even if relative path information is present. |
Boolean |
True |
— |
false |
— |
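The trade-offs in the format table for this writer can be restated as a small lookup. The Python sketch below only transcribes the two right-hand columns of that table; the names `FORMATS`, `candidate_formats` and `self_describing` are hypothetical and not part of DKPro Core.

```python
# Properties of each format, transcribed from the format table for BinaryCasWriter.
# type_system: "same"    = the reading CAS must use the same type system,
#              "reinit"  = the type system is reinitialized on load,
#              "lenient" = data can be loaded into a different type system.
FORMATS = {
    "S":  {"type_system": "same",    "addresses_preserved": True},
    "S+": {"type_system": "reinit",  "addresses_preserved": True},
    "0":  {"type_system": "same",    "addresses_preserved": True},
    "4":  {"type_system": "same",    "addresses_preserved": True},
    "6":  {"type_system": "same",    "addresses_preserved": False},
    "6+": {"type_system": "lenient", "addresses_preserved": False},
}


def candidate_formats(need_addresses=False, self_describing=False):
    """Return the format codes that satisfy the requirements, in table order.

    need_addresses:  keep only formats that preserve CAS addresses.
    self_describing: keep only formats that carry type system information
                     in the file itself (S+ and 6+).
    """
    result = []
    for code, props in FORMATS.items():
        if need_addresses and not props["addresses_preserved"]:
            continue
        if self_describing and props["type_system"] == "same":
            continue
        result.append(code)
    return result
```

Asking for both address preservation and an embedded type system leaves only `S+`; dropping the address requirement adds `6+`.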
BioC Writer
Category: Writer
Framework: NaCTeM (UIMA)
Version: 1.0
Writes BioC annotations to files. Each output file will consist of a single document only. BioC website: http://bioc.sourceforge.net/
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
outputFile | A path to a file where an entire collection will be written. | String | True | — | false | — |
BioNLP ST Data Writer
Category: Writer
Framework: NaCTeM (UIMA)
Version: 1.0
Writes BioNLP entity and event annotations to files.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
OutputFolder | A folder where BioNLP ST files will be written. | String | True | — | false | — |
BratWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writer for the brat annotation format.
Known issues:
- Brat is unable to read relation attributes created by this writer.
- PARAM_TYPE_MAPPINGS not implemented yet
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression |
Choose a compression method. (default: CompressionMethod#NONE) |
String |
False |
— |
false |
— |
enableTypeMappings |
Enable type mappings. |
Boolean |
True |
— |
false |
— |
escapeDocumentId |
URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) |
Boolean |
True |
— |
false |
— |
excludeTypes |
Types that will not be written to the exported file. |
String |
True |
— |
true |
— |
filenameSuffix |
Specify the suffix of output files. Default value: .ann. If the suffix is not needed, provide an empty string as value. |
String |
True |
— |
false |
— |
overwrite |
Allow overwriting target files (ignored when writing to ZIP archives). |
Boolean |
True |
— |
false |
— |
palette |
Colors to be used for the visual configuration that is generated for brat. |
String |
False |
— |
true |
— |
relationTypes |
Types that are relations. It is mandatory to provide the type name followed by two feature names that represent Arg1 and Arg2, separated by colons, e.g. de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency:Governor:Dependent. |
String |
True |
— |
true |
— |
shortAttributeNames |
Whether to render attributes by their short name or by their qualified name. |
Boolean |
True |
— |
false |
— |
singularTarget |
Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for Conll-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. |
Boolean |
True |
— |
false |
— |
spanTypes |
Types that are text annotations (aka entities or spans). |
String |
True |
— |
true |
— |
stripExtension |
Remove the original extension. |
Boolean |
True |
— |
false |
— |
targetLocation |
Target location. If this parameter is not set, data is written to stdout. |
String |
False |
— |
false |
— |
textFilenameSuffix |
Specify the suffix of text output files. Default value: .txt. If the suffix is not needed, provide an empty string as value. |
String |
True |
— |
false |
— |
typeMappings |
FIXME |
String |
False |
— |
true |
— |
useDocumentId |
Use the document ID as file name even if relative path information is present. |
Boolean |
True |
— |
false |
— |
writeNullAttributes |
Enable writing of features with null values. |
Boolean |
True |
— |
false |
— |
writeRelationAttributes |
The brat web application cannot currently handle attributes on relations, so they are disabled by default. They can be enabled again here. |
Boolean |
True |
— |
false |
— |
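The relationTypes parameter expects each value in the form type name, Arg1 feature, Arg2 feature, separated by colons. As a rough illustration of that convention, the following Python sketch splits and validates such a value. `RelationSpec` and `parse_relation_type` are hypothetical names, and the validation rules are an assumption rather than BratWriter's actual behaviour.

```python
from typing import NamedTuple


class RelationSpec(NamedTuple):
    type_name: str      # fully qualified annotation type
    arg1_feature: str   # feature that holds Arg1
    arg2_feature: str   # feature that holds Arg2


def parse_relation_type(spec: str) -> RelationSpec:
    """Split a 'Type:Arg1Feature:Arg2Feature' string into its three parts."""
    parts = spec.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'Type:Arg1Feature:Arg2Feature', got {spec!r}")
    return RelationSpec(*parts)
```

Applied to the example from the parameter description, the result has `arg1_feature == "Governor"` and `arg2_feature == "Dependent"`.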
CasDumpWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Dumps CAS content to a text file. This is useful when setting up test cases which contain a reference output to which an actually produced CAS is compared. The format produced by this component is more easily comparable than an XCAS or XMI format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
featurePatterns |
Include/exclude features according to the following patterns. Mind that the patterns do not actually match feature names but lines produced by FeatureStructure.toString(). |
String |
True |
— |
true |
— |
sort |
Sort increasing by begin, decreasing by end, increasing by name instead of relying on index order. |
Boolean |
True |
— |
false |
— |
targetLocation |
Output file. If multiple CASes are processed, their contents are concatenated into this file. Mind that a test case using this consumer with multiple CASes requires a reader which always produces the CASes in the same order. When this file is set to "-", the dump goes to System#out (default). |
String |
True |
— |
false |
— |
typePatterns |
Include/exclude specified UIMA types in the output. |
String |
True |
— |
true |
— |
writeDocumentMetaData |
Whether to dump the content of the CAS#getDocumentAnnotation(). |
Boolean |
True |
— |
false |
— |
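The ordering described for the sort parameter (increasing by begin, decreasing by end, increasing by name) can be expressed as a single sort key. The Python sketch below assumes annotations are plain (begin, end, type name) tuples; `dump_order` is a hypothetical name and this is not the CasDumpWriter code.

```python
def dump_order(annotations):
    """Sort (begin, end, type_name) triples.

    Order: begin ascending, then end descending (longer spans first),
    then type name ascending.
    """
    return sorted(annotations, key=lambda a: (a[0], -a[1], a[2]))
```

With this key, a Sentence spanning 0 to 20 is listed before a Token spanning 0 to 5, and two annotations covering the same span are ordered by name.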
CoNLL2007 Cas Consumer
Category: Writer
Framework: ILSP (UIMA)
Version: 1.12
Writes sentences from the CAS in the CoNLL 2007 format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AppendExt |
Extension to be appended to the output files. |
String |
False |
— |
false |
— |
OutputDirectory |
Directory where the output files will be written |
String |
True |
— |
false |
— |
PrintDepRels |
If true, prints dependency relations |
Boolean |
False |
— |
false |
— |
StripExt |
Extension to be stripped from the input files. |
String |
False |
— |
false |
— |
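One plausible reading of StripExt and AppendExt is that the output file name is derived from the input file name by removing one extension and adding another. The Python sketch below shows that reading only; `output_name` is a hypothetical helper, and the consumer's real naming logic may differ.

```python
def output_name(input_name, strip_ext="", append_ext=""):
    """Derive an output file name from an input file name (assumed behaviour)."""
    # Remove the extension only if the name actually ends with it.
    if strip_ext and input_name.endswith(strip_ext):
        input_name = input_name[: -len(strip_ext)]
    return input_name + append_ext
```

Under this assumption, `output_name("doc1.xmi", ".xmi", ".conll")` yields `doc1.conll`.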
Configurable Exporter
Category: Writer
Framework: GATE
Version: unknown
Allows annotations to be exported according to a specified format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
configFileURL | — | java.net.URL | — | resources/configurableexporter/example.conf | — | — |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
inputASName | — | java.lang.String | — | — | — | true |
instanceName | — | java.lang.String | — | — | — | true |
outputURL | — | java.net.URL | — | — | — | true |
Conll2000Writer
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writes the CoNLL 2000 chunking format. The columns are separated by spaces.
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
- FORM - token
- POSTAG - part-of-speech tag
- CHUNK - chunk (BIO encoded)
Sentences are separated by a blank new line.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression |
Choose a compression method. (default: CompressionMethod#NONE) |
String |
False |
— |
false |
— |
escapeDocumentId |
URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) |
Boolean |
True |
— |
false |
— |
filenameSuffix |
— |
String |
True |
— |
false |
— |
overwrite |
Allow overwriting target files (ignored when writing to ZIP archives). |
Boolean |
True |
— |
false |
— |
singularTarget |
Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for Conll-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. |
Boolean |
True |
— |
false |
— |
sourceEncoding |
Name of configuration parameter that contains the character encoding used by the input files. |
String |
True |
— |
false |
— |
stripExtension |
Remove the original extension. |
Boolean |
True |
— |
false |
— |
targetLocation |
Target location. If this parameter is not set, data is written to stdout. |
String |
False |
— |
false |
— |
useDocumentId |
Use the document ID as file name even if relative path information is present. |
Boolean |
True |
— |
false |
— |
writeChunk |
— |
Boolean |
True |
— |
false |
— |
writePOS |
— |
Boolean |
True |
— |
false |
— |
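The layout described for this writer (FORM, POSTAG and CHUNK columns separated by spaces, sentences separated by a blank line) can be reproduced in a few lines. The Python sketch below is an illustration only: `write_conll2000` is a hypothetical name and the handling of the writePOS and writeChunk switches is deliberately left out.

```python
def write_conll2000(sentences):
    """Render sentences in the CoNLL 2000 chunking layout.

    `sentences` is a list of sentences; each sentence is a list of
    (form, postag, chunk) triples, with chunk already BIO encoded.
    """
    out = []
    for sentence in sentences:
        for form, postag, chunk in sentence:
            out.append(f"{form} {postag} {chunk}")
        out.append("")  # blank line after each sentence
    return "\n".join(out) + "\n"
```

Passing the first two tokens of the example sentence, `("He", "PRP", "B-NP")` and `("reckons", "VBZ", "B-VP")`, produces the first two lines of the sample output followed by a blank line.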
Conll2002Writer
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writes the CoNLL 2002 named entity format. The columns are separated by a single space, regardless of how the example below is rendered.
Wolff B-PER
, O
currently O
a O
journalist O
in O
Argentina B-LOC
, O
played O
with O
Del B-PER
Bosque I-PER
in O
the O
final O
years O
of O
the O
seventies O
in O
Real B-ORG
Madrid I-ORG
. O
- FORM - token
- NER - named entity (BIO encoded)
Sentences are separated by a blank new line.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression |
Choose a compression method. (default: CompressionMethod#NONE) |
String |
False |
— |
false |
— |
escapeDocumentId |
URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) |
Boolean |
True |
— |
false |
— |
filenameSuffix |
— |
String |
True |
— |
false |
— |
overwrite |
Allow overwriting target files (ignored when writing to ZIP archives). |
Boolean |
True |
— |
false |
— |
singularTarget |
Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for Conll-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. |
Boolean |
True |
— |
false |
— |
sourceEncoding |
Name of configuration parameter that contains the character encoding used by the input files. |
String |
True |
— |
false |
— |
stripExtension |
Remove the original extension. |
Boolean |
True |
— |
false |
— |
targetLocation |
Target location. If this parameter is not set, data is written to stdout. |
String |
False |
— |
false |
— |
useDocumentId |
Use the document ID as file name even if relative path information is present. |
Boolean |
True |
— |
false |
— |
writeNamedEntity |
— |
Boolean |
True |
— |
false |
— |
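To show how the BIO-encoded NER column is meant to be read, the following Python sketch groups B- and I- tagged tokens from CoNLL 2002 text back into entity mentions. `read_entities` is a hypothetical helper, not part of DKPro Core, and it assumes two whitespace-separated columns per token line.

```python
def read_entities(text):
    """Collect (entity_text, label) pairs from CoNLL 2002 formatted text."""
    entities = []
    tokens, label = [], None

    def flush():
        # Close the entity that is currently being built, if any.
        nonlocal tokens, label
        if tokens:
            entities.append((" ".join(tokens), label))
        tokens, label = [], None

    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:       # blank line: sentence boundary
            flush()
            continue
        form, tag = parts
        if tag.startswith("B-"):  # a new entity begins
            flush()
            tokens, label = [form], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            tokens.append(form)   # the current entity continues
        else:                     # "O" or an inconsistent I- tag
            flush()
    flush()
    return entities
```

Run on the example sentence shown for this writer, it would return Wolff (PER), Argentina (LOC), Del Bosque (PER) and Real Madrid (ORG).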
Conll2006Writer
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writes a file in the CoNLL-2006 format (aka CoNLL-X).
Heutzutage heutzutage ADV _ _ ADV _ _
- ID - token number in sentence
- FORM - token
- LEMMA - lemma
- CPOSTAG - part-of-speech tag (coarse grained)
- POSTAG - part-of-speech tag
- FEATS - unused
- HEAD - target token for a dependency parsing
- DEPREL - function of the dependency parsing
- PHEAD - unused
- PDEPREL - unused
Sentences are separated by a blank new line
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression |
Choose a compression method. (default: CompressionMethod#NONE) |
String |
False |
— |
false |
— |
escapeDocumentId |
URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) |
Boolean |
True |
— |
false |
— |
filenameSuffix |
— |
String |
True |
— |
false |
— |
overwrite |
Allow overwriting target files (ignored when writing to ZIP archives). |
Boolean |
True |
— |
false |
— |
singularTarget |
Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for Conll-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. |
Boolean |
True |
— |
false |
— |
sourceEncoding |
Name of configuration parameter that contains the character encoding used by the input files. |
String |
True |
— |
false |
— |
stripExtension |
Remove the original extension. |
Boolean |
True |
— |
false |
— |
targetLocation |
Target location. If this parameter is not set, data is written to stdout. |
String |
False |
— |
false |
— |
useDocumentId |
Use the document ID as file name even if relative path information is present. |
Boolean |
True |
— |
false |
— |
writeDependency |
— |
Boolean |
True |
— |
false |
— |
writeLemma |
— |
Boolean |
True |
— |
false |
— |
writeMorph |
— |
Boolean |
True |
— |
false |
— |
writePOS |
— |
Boolean |
True |
— |
false |
— |
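The ten columns listed for this format can be assembled into a token line as sketched below in Python. The sketch assumes tab-separated columns, as in the CoNLL-X shared task data, and an underscore for any column without a value; `COLUMNS` and `format_token` are hypothetical names and this is not the writer's implementation.

```python
# Column order as listed in the format description for Conll2006Writer.
COLUMNS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
           "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def format_token(token_id, form, **fields):
    """Build one token line; columns without a value are written as "_"."""
    values = {"ID": str(token_id), "FORM": form}
    # Keyword arguments are matched to column names case-insensitively.
    values.update({key.upper(): str(value) for key, value in fields.items()})
    return "\t".join(values.get(column, "_") for column in COLUMNS)
```

For instance, `format_token(1, "Heutzutage", lemma="heutzutage", cpostag="ADV", postag="ADV")` fills the first five columns and leaves the remaining five as underscores.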
Conll2009Writer
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writes a file in the CoNLL-2009 format.
- ID - (ignored) Token counter, starting at 1 for each new sentence.
- FORM - (Token) Word form or punctuation symbol.
- LEMMA - (Lemma) Lemma of FORM.
- PLEMMA - (ignored) Automatically predicted lemma of FORM
- POS - (POS) Fine-grained part-of-speech tag, where the tagset depends on the language, or identical to the coarse-grained part-of-speech tag if not available.
- PPOS - (ignored) Automatically predicted major POS by a language-specific tagger
- FEAT - (MorphologicalFeatures) Unordered set of syntactic and/or morphological features (depending on the particular language), separated by a vertical bar (|), or an underscore if not available.
- PFEAT - (ignored) Automatically predicted morphological features (if applicable)
- HEAD - (Dependency) Head of the current token, which is either a value of ID or zero ('0'). Note that depending on the original treebank annotation, there may be multiple tokens with an ID of zero.
- PHEAD - (ignored) Automatically predicted syntactic head
- DEPREL - (Dependency) Dependency relation to the HEAD. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'.
- PDEPREL - (ignored) Automatically predicted dependency relation to PHEAD
- FILLPRED - (auto-generated) Contains 'Y' for argument-bearing tokens
- PRED - (SemanticPredicate) (sense) identifier of a semantic 'predicate' coming from a current token
- APREDs - (SemanticArgument) Columns with argument labels for each semantic predicate (in the ID order)
Sentences are separated by a blank new line
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression |
Choose a compression method. (default: CompressionMethod#NONE) |
String |
False |
— |
false |
— |
escapeDocumentId |
URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) |
Boolean |
True |
— |
false |
— |
filenameSuffix |
— |
String |
True |
— |
false |
— |
overwrite |
Allow overwriting target files (ignored when writing to ZIP archives). |
Boolean |
True |
— |
false |
— |
singularTarget |
Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for Conll-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. |
Boolean |
True |
— |
false |
— |
sourceEncoding |
Name of configuration parameter that contains the character encoding used by the input files. |
String |
True |
— |
false |
— |
stripExtension |
Remove the original extension. |
Boolean |
True |
— |
false |
— |
targetLocation |
Target location. If this parameter is not set, data is written to stdout. |
String |
False |
— |
false |
— |
useDocumentId |
Use the document ID as file name even if relative path information is present. |
Boolean |
True |
— |
false |
— |
writeDependency |
— |
Boolean |
True |
— |
false |
— |
writeLemma |
— |
Boolean |
True |
— |
false |
— |
writeMorph |
— |
Boolean |
True |
— |
false |
— |
writePOS |
— |
Boolean |
True |
— |
false |
— |
writeSemanticPredicate |
— |
Boolean |
True |
— |
false |
— |
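The column list for this format has fourteen fixed columns followed by a variable number of APRED columns, one per predicate in the sentence. The Python sketch below splits a token line accordingly. It assumes tab-separated columns, as in the CoNLL-2009 shared task data; `FIXED_COLUMNS` and `parse_token_line` are hypothetical names.

```python
# The fourteen fixed columns, in the order given in the format description.
FIXED_COLUMNS = ["ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT",
                 "PFEAT", "HEAD", "PHEAD", "DEPREL", "PDEPREL",
                 "FILLPRED", "PRED"]


def parse_token_line(line):
    """Map one token line to a dict; extra columns are collected as APREDS."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) < len(FIXED_COLUMNS):
        raise ValueError(f"expected at least {len(FIXED_COLUMNS)} columns, got {len(cells)}")
    token = dict(zip(FIXED_COLUMNS, cells))
    # Everything after PRED is one argument label per semantic predicate.
    token["APREDS"] = cells[len(FIXED_COLUMNS):]
    return token
```

A line with fifteen cells therefore yields a single entry in `token["APREDS"]`, which corresponds to the first predicate of the sentence.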
Conll2012Writer
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writer for the CoNLL-2012 format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression |
Choose a compression method. (default: CompressionMethod#NONE) |
String |
False |
— |
false |
— |
escapeDocumentId |
URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) |
Boolean |
True |
— |
false |
— |
filenameSuffix |
— |
String |
True |
— |
false |
— |
overwrite |
Allow overwriting target files (ignored when writing to ZIP archives). |
Boolean |
True |
— |
false |
— |
singularTarget |
Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for Conll-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. |
Boolean |
True |
— |
false |
— |
sourceEncoding |
Name of configuration parameter that contains the character encoding used by the input files. |
String |
True |
— |
false |
— |
stripExtension |
Remove the original extension. |
Boolean |
True |
— |
false |
— |
targetLocation |
Target location. If this parameter is not set, data is written to stdout. |
String |
False |
— |
false |
— |
useDocumentId |
Use the document ID as file name even if relative path information is present. |
Boolean |
True |
— |
false |
— |
writeLemma |
— |
Boolean |
True |
— |
false |
— |
writePOS |
— |
Boolean |
True |
— |
false |
— |
writeSemanticPredicate |
— |
Boolean |
True |
— |
false |
— |
EnrichedDocumentWriter
Category: Writer
Framework: AlvisNLP
Version: 2010-10-28
Writes the corpus in the infamous Alvis Enriched Document Format suitable for indexation with Zebra-Alvis.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
blockSize |
— |
java.lang.Integer |
True |
— |
— |
— |
blockStart |
— |
java.lang.Integer |
True |
— |
— |
— |
documentFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
idMetaFeature |
— |
java.lang.String |
True |
— |
— |
— |
lemmaFeature |
— |
java.lang.String |
True |
— |
— |
— |
metaTrans |
— |
alvisnlp.module.types.Mapping |
True |
— |
— |
— |
neCanonicalFormFeature |
— |
java.lang.String |
True |
— |
— |
— |
neLayerName |
— |
java.lang.String |
True |
— |
— |
— |
neTypeFeature |
— |
java.lang.String |
True |
— |
— |
— |
outDir |
— |
org.bibliome.util.files.OutputDirectory |
True |
— |
— |
— |
outFilePrefix |
— |
java.lang.String |
True |
— |
— |
— |
outFileSuffix |
— |
java.lang.String |
True |
— |
— |
— |
posFeature |
— |
java.lang.String |
True |
— |
— |
— |
sectionFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
semanticFeature |
— |
java.lang.String |
False |
— |
— |
— |
sentenceLayerName |
— |
java.lang.String |
True |
— |
— |
— |
termCanonicalFormFeature |
— |
java.lang.String |
True |
— |
— |
— |
termLayerName |
— |
java.lang.String |
True |
— |
— |
— |
tokenLayerName |
— |
java.lang.String |
True |
— |
— |
— |
tokenTypeFeature |
— |
java.lang.String |
True |
— |
— |
— |
urlPrefix |
— |
java.lang.String |
True |
— |
— |
— |
urlSuffixFeature |
— |
java.lang.String |
True |
— |
— |
— |
wordLayerName |
— |
java.lang.String |
True |
— |
— |
— |
ExportAlignmentPR
Category: Writer
Framework: GATE
Version: unknown
A PR to export alignment information to an XML file.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
corpus |
— |
gate.Corpus |
— |
— |
— |
true |
document |
— |
gate.Document |
— |
— |
— |
true |
inputASName |
— |
java.lang.String |
— |
— |
— |
true |
outputDirectory |
— |
java.net.URL |
— |
— |
— |
true |
parentOfUnitOfAlignment |
— |
java.lang.String |
— |
Sentence |
— |
true |
parentOfUnitOfAlignmentFeatureName |
— |
java.lang.String |
— |
sentence-alignment |
— |
true |
sourceDocumentID |
— |
java.lang.String |
— |
— |
— |
true |
targetDocumentID |
— |
java.lang.String |
— |
— |
— |
true |
unitAlignmentFeatureName |
— |
java.lang.String |
— |
word-alignment |
— |
true |
unitOfAlignment |
— |
java.lang.String |
— |
Token |
— |
true |
ExportCadixeJSON
Category: Writer
Framework: AlvisNLP
Version: 2012-04-30
Writes each document in a file in the AlvisAE protocol format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
annotationSets |
— |
org.bibliome.alvisnlp.modules.cadixe.AnnotationSet[] |
True |
— |
— |
— |
documentDescription |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
documentFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
documentProperties |
— |
alvisnlp.module.types.ExpressionMapping |
False |
— |
— |
— |
fileName |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
outDir |
— |
org.bibliome.util.files.OutputDirectory |
True |
— |
— |
— |
owner |
— |
java.lang.Integer |
True |
— |
— |
— |
schemaFile |
— |
org.bibliome.util.files.InputFile |
False |
— |
— |
— |
sectionFilter |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
ExpressionExtract
Category: Writer
Framework: AlvisNLP
Version: 2012-04-30
Writes elements to a tab-separated file.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
fields |
— |
alvisnlp.corpus.expressions.Expression[] |
True |
— |
— |
— |
headers |
— |
java.lang.String[] |
False |
— |
— |
— |
outFile |
— |
org.bibliome.util.streams.TargetStream |
True |
— |
— |
— |
target |
— |
alvisnlp.corpus.expressions.Expression |
True |
— |
— |
— |
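As a rough picture of the output, the Python sketch below writes rows to tab-separated text with an optional header line. It assumes that each selected element contributes one row and that `fields` supplies the column values; the parameter table does not spell this out. `write_tsv` is a hypothetical helper built on the standard `csv` module, not AlvisNLP code.

```python
import csv
import io


def write_tsv(rows, headers=None):
    """Return tab-separated text with one line per row and an optional header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    if headers:
        writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `write_tsv([["IL-2", "protein"]], headers=["form", "type"])` produces a header line followed by one data line.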
Factored Tag Lem Consumer
Category: Writer
Framework: ILSP (UIMA)
Version: 1.2
Writes sentences from the CAS in the Factored Tag Lem format
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AppendExt | Extension to be appended to the output files. | String | False | — | false | — |
OutputDirectory | Directory where the output files will be written | String | True | — | false | — |
StripExt | Extension to be stripped from the input files. | String | False | — | false | — |
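
The StripExt and AppendExt parameters suggest a simple renaming scheme for output files. The sketch below is one plausible reading of the parameter descriptions, not the ILSP implementation: the helper name `output_name` is hypothetical, and the rule that the extension is stripped only when the input name actually ends with it is an assumption.

```python
def output_name(input_name: str, strip_ext: str = "", append_ext: str = "") -> str:
    """Derive an output file name from an input file name (hypothetical helper)."""
    name = input_name
    # Assumption: StripExt is removed only if the input name ends with it.
    if strip_ext and name.endswith(strip_ext):
        name = name[: -len(strip_ext)]
    # AppendExt is added unconditionally; an empty string leaves the name as is.
    return name + append_ext


if __name__ == "__main__":
    print(output_name("report.txt", strip_ext=".txt", append_ext=".ftl"))
```

Leaving both parameters empty returns the input name unchanged, which matches their non-mandatory status in the table.
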
Fast Infoset Exporter
Category: Writer
Framework: GATE
Version: unknown
Export GATE documents to GATE XML stored in the binary Fast Infoset format
FillDB
Category: Writer
Framework: AlvisNLP
Version: 2012-04-30
Stores the corpus into a SQL database.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
jdbcDriver | — | java.lang.String | True | — | — | — |
password | — | java.lang.String | True | — | — | — |
schema | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
url | — | java.lang.String | True | — | — | — |
username | — | java.lang.String | True | — | — | — |
Flexible Exporter
Category: Writer
Framework: GATE
Version: unknown
Exports a document with GATE annotations to its original format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
annotationTypes | — | java.util.ArrayList | — | Person;Location;Date | — | true |
document | — | gate.Document | — | — | — | true |
dumpTypes | — | java.util.ArrayList | — | Person;Location;Date | — | true |
includeFeatures | — | java.lang.Boolean | — | false | — | — |
outputDirectoryUrl | — | java.net.URL | — | — | — | true |
suffixForDumpFiles | — | java.lang.String | — | .gate | — | — |
useStandOffXML | — | java.lang.Boolean | — | false | — | — |
useSuffixForDumpFiles | — | java.lang.Boolean | — | true | — | — |
GATE JSON Exporter
Category: Writer
Framework: GATE
Version: unknown
Export documents and corpora in JSON format
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationTypes | — | java.util.Set | — | — | — | true |
documentAnnotationASName | — | java.lang.String | — | Original markups | — | true |
documentAnnotationType | — | java.lang.String | — | Tweet | — | true |
entitiesAnnotationSetName | — | java.lang.String | — | — | — | true |
exportAsArray | — | java.lang.Boolean | — | false | — | true |
GATE XML Writer CAS Consumer
Category: Writer
Framework: ILSP (UIMA)
Version: 1.0
Writes the CAS to GATE XML format
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AppendExt | Extension to be appended to the output files. | String | False | — | false | — |
OutputDirectory | Directory where the XML files will be written | String | True | — | false | — |
StripExt | Extension to be stripped from the input files. | String | False | — | false | — |
GeniaWriter
Category: Writer
Framework: AlvisNLP
Version: 2012-04-30
Writes each section in three files in the BioNLP challenge format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
dependencies | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
entities | — | alvisnlp.module.types.ExpressionMapping | True | — | — | — |
entityForm | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
eventExtra | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
events | — | alvisnlp.module.types.ExpressionMapping | True | — | — | — |
fileName | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
labelFeature | — | java.lang.String | False | — | — | — |
outputDir | — | org.bibliome.util.files.OutputDirectory | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceForm | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentences | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
wordForm | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
words | — | alvisnlp.corpus.expressions.Expression | False | — | — | — |
HTML5 Microdata Exporter
Category: Writer
Framework: GATE
Version: unknown
Exports Annotations as HTML5 Microdata
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
configURL | — | java.net.URL | — | resources/schema.org/ANNIE.xml | — | true |
ILSP GrAF Consumer
Category: Writer
Framework: ILSP (UIMA)
Version: 0.9
Writes sentences from the CAS to GrAF standoff format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AppendExt | Extension to be appended to the output files. | String | False | — | false | — |
OutputChunkFile | — | String | False | — | false | — |
OutputDepFile | — | String | False | — | false | — |
OutputDirectory | Directory where the XML files will be written | String | False | — | false | — |
OutputDotFile | — | String | False | — | false | — |
OutputHeaderFile | — | String | False | — | false | — |
OutputNerFile | — | String | False | — | false | — |
OutputPosFile | — | String | False | — | false | — |
OutputRegFile | — | String | False | — | false | — |
OutputSegFile | — | String | False | — | false | — |
OutputSentFile | — | String | False | — | false | — |
OutputTxtFile | — | String | False | — | false | — |
StripExt | Extension to be stripped from the input files. | String | False | — | false | — |
ILSP PML Cas Consumer
Category: Writer
Framework: ILSP (UIMA)
Version: 0.9
Writes sentences from the CAS in the Prague Markup Language format for editing dependency structures in TrEd
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AppendExt | Extension to be appended to the output files. | String | False | — | false | — |
OutputDirectory | Directory where the output files will be written | String | True | — | false | — |
StripExt | Extension to be stripped from the input files. | String | False | — | false | — |
ILSP XCES Consumer
Category: Writer
Framework: ILSP (UIMA)
Version: 0.9
Writes sentences from the CAS to the XCES format
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AppendExt | Extension to be appended to the output files. | String | False | — | false | — |
OutputDirectory | Directory where the XML files will be written | String | True | — | false | — |
StripExt | Extension to be stripped from the input files. | String | False | — | false | — |
ILSP Xmi Writer CAS Consumer
Category: Writer
Framework: ILSP (UIMA)
Version: 0.9
Serializes the CAS to XMI.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
AppendExt | Extension to be appended to the output files. | String | False | — | false | — |
OutputDirectory | Directory where the XMI files will be written | String | True | — | false | — |
StripExt | Extension to be stripped from the input files. | String | False | — | false | — |
ImsCwbWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
This consumer writes the content of all CASes in the IMS Corpus Workbench (CWB) format. It produces a text file that must be converted to the binary IMS CWB index files using the command-line tools that come with the CWB. Alternatively, set the cqpHome parameter to create output directly in the native binary CQP format via the original CWB command-line tools.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
additionalFeatures | Write additional token-level annotation features. These have to be given as an array of fully qualified feature paths (fully.qualified.classname/featureName). The names for these annotations in CQP are their lowercase shortnames. | String | False | — | true | — |
corpusName | The name of the generated corpus. | String | True | — | false | — |
cqpCompress | Set this parameter to compress the token streams and the indexes using cwb-huffcode and cwb-compress-rdx. With modern hardware, this may actually slow down queries, so it is turned off by default. With large data sets, test which setting works best for you. (default: false) | Boolean | True | — | false | — |
cqpHome | Set this parameter to the directory containing the cwb-encode and cwb-makeall commands if you want the writer to encode directly into the CQP binary format. | String | False | — | false | — |
cqpwebCompatibility | Make document IDs compatible with CQPweb. CQPweb demands an ID consisting only of letters, numbers and underscores. | Boolean | True | — | false | — |
sentenceTag | — | String | True | — | false | — |
targetEncoding | Character encoding of the output data. | String | True | — | false | — |
targetLocation | Location to which the output is written. | String | True | — | false | — |
writeCPOS | Write coarse-grained part-of-speech tags. These are the simple names of the UIMA types used to represent the part-of-speech tag. | Boolean | True | — | false | — |
writeDocId | Write the document ID for each token. It is usually a better idea to generate a document tag (writeDocumentTag) or a text tag (writeTextTag), which also contain the document ID and can be queried in CQP. | Boolean | True | — | false | — |
writeDocumentTag | Write a pseudo-XML tag with the name document to mark the start and end of a document. | Boolean | True | — | false | — |
writeLemma | Write lemmata. | Boolean | True | — | false | — |
writeOffsets | Write the start and end position of each token. | Boolean | True | — | false | — |
writePOS | Write part-of-speech tags. | Boolean | True | — | false | — |
writeTextTag | Write a pseudo-XML tag with the name text to mark the start and end of a document. This is used by CQPweb. | Boolean | True | — | false | — |
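
The cqpwebCompatibility description states the constraint (only letters, numbers and underscore) but not how offending characters are handled. The sketch below shows one way to meet that constraint; the function name `cqpweb_safe_id`, the restriction to ASCII letters, and the choice to replace each disallowed character with an underscore are assumptions, not the behaviour of ImsCwbWriter itself.

```python
import re

# Anything other than ASCII letters, digits and underscore is treated as illegal here.
_ILLEGAL = re.compile(r"[^A-Za-z0-9_]")


def cqpweb_safe_id(document_id: str) -> str:
    """Return a document ID made only of letters, digits and underscores.

    Assumption: each disallowed character is replaced by a single underscore.
    """
    return _ILLEGAL.sub("_", document_id)


if __name__ == "__main__":
    print(cqpweb_safe_id("news/2014-03.txt"))
```

Note that a replacement scheme like this can map two different IDs to the same string, so uniqueness would need to be checked separately.
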
InlineXmlWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writes an approximation of the content of a textual CAS as an inline XML file. Optionally applies an XSLT stylesheet.
Note this component inherits the restrictions from CasToInlineXml:
- Features whose values are FeatureStructures are not represented.
- Feature values which are strings longer than 64 characters are truncated.
- Feature values which are arrays of primitives are represented by strings that look like [ xxx, xxx ]
- The Subject of analysis is presumed to be a text string.
- Some characters in the document's Subject-of-analysis are replaced by blanks, because the characters aren't valid in XML documents.
- It doesn't work for overlapping annotations, because these cannot be represented as properly nested XML.
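
Two of the restrictions above (string truncation and the array notation) can be pictured with a small sketch. It only mirrors the rules as stated in the list: the helper name `render_feature_value` is hypothetical, and whether the real CasToInlineXml appends any marker after a truncated string is not covered here.

```python
MAX_STRING_LENGTH = 64  # limit stated in the restriction list


def render_feature_value(value) -> str:
    """Render a feature value roughly as the restriction list describes."""
    if isinstance(value, (list, tuple)):
        # Arrays of primitives become a string that looks like "[ xxx, xxx ]".
        return "[ " + ", ".join(str(item) for item in value) + " ]"
    # Strings longer than 64 characters are cut off (assumption: no suffix added).
    return str(value)[:MAX_STRING_LENGTH]


if __name__ == "__main__":
    print(render_feature_value([1, 2, 3]))
    print(len(render_feature_value("x" * 100)))
```
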
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
Xslt | XSLT stylesheet to apply. | String | False | — | false | — |
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat the target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a predefined output file instead of deriving the file name from the document URI or document ID. It can also be useful to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The compression parameter is respected, but an extension is not added automatically. The stripExtension parameter has no effect, as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
JsonWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA JSON format writer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
jsonContextFormat | — | String | True | — | false | — |
omitDefaultValues | — | Boolean | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
prettyPrint | — | Boolean | True | — | false | — |
singularTarget | Treat the target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a predefined output file instead of deriving the file name from the document URI or document ID. It can also be useful to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The compression parameter is respected, but an extension is not added automatically. The stripExtension parameter has no effect, as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
typeSystemFile | Location to write the type system to. If this is not set, a file called typesystem.xml will be written to the XMI output path. If this is set, it is expected to be a file relative to the current working directory or an absolute file. If this parameter is set, the compression parameter has no effect on the type system. Instead, if the file name ends in ".gz", the file will be compressed, otherwise not. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
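
The typeSystemFile description says that compression of the type system is decided by the file name (a ".gz" ending) rather than by the compression parameter. A minimal sketch of that kind of name-based switch follows; the function name `open_type_system_file` is hypothetical and the code illustrates the rule only, not the DKPro Core implementation.

```python
import gzip


def open_type_system_file(path: str):
    """Open `path` for text writing; gzip-compress only if the name ends in ".gz"."""
    if path.endswith(".gz"):
        # Compression is chosen from the file name alone.
        return gzip.open(path, "wt", encoding="utf-8")
    return open(path, "w", encoding="utf-8")


if __name__ == "__main__":
    import os
    import tempfile

    target = os.path.join(tempfile.mkdtemp(), "typesystem.xml.gz")
    with open_type_system_file(target) as handle:
        handle.write("<typeSystemDescription/>")
    print(os.path.getsize(target) > 0)
```
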
Legacy Coref Data Writer
Category: Writer
Framework: GATE
Version: unknown
A simple PR that converts co-reference data from the Relations-based model to the legacy format (based on 'matches' annotation and document features).
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
corpus | — | gate.Corpus | — | — | — | true |
document | — | gate.Document | — | — | — | true |
MalletTopicProportionsWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writes topic proportions to a file. The output depends on the TopicDistribution annotation, which should have been created by MalletTopicModelInferencer beforehand.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat the target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a predefined output file instead of deriving the file name from the document URI or document ID. It can also be useful to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The compression parameter is respected, but an extension is not added automatically. The stripExtension parameter has no effect, as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | — | String | True | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
MalletTopicsProportionsSortedWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Write the topic proportions according to an LDA topic model to an output file. The proportions need to be inferred in a previous step using MalletTopicModelInferencer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
nTopics | — | Integer | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat the target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a predefined output file instead of deriving the file name from the document URI or document ID. It can also be useful to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The compression parameter is respected, but an extension is not added automatically. The stripExtension parameter has no effect, as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | — | String | True | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
PennTreebankCombinedWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Penn Treebank combined format writer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
emptyRootLabel | — | Boolean | True | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
filenameSuffix | Specify the suffix of output files. Default value: .penn. If the suffix is not needed, provide an empty string as value. | String | True | — | false | — |
noRootLabel | — | Boolean | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat the target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a predefined output file instead of deriving the file name from the document URI or document ID. It can also be useful to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The compression parameter is respected, but an extension is not added automatically. The stripExtension parameter has no effect, as the original extension is not preserved. | Boolean | True | — | false | — |
sourceEncoding | Name of configuration parameter that contains the character encoding used by the input files. | String | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
RDF Writer
Category: Writer
Framework: NaCTeM (UIMA)
Version: 1.0
Saves Common Annotation Structures into RDF files.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
outputFilePrefix | A name that will be attached to the beginning of an output filename. Filenames will have the form of "<outputFilePrefix><count>.rdf". | String | True | — | false | — |
outputFolder | A folder where RDF files will be written to. | String | True | — | false | — |
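
The file-naming pattern given for outputFilePrefix can be sketched as follows. The helper name `rdf_output_path` is hypothetical, and the catalog does not say at which number the counter starts, so the count is passed in explicitly.

```python
import os


def rdf_output_path(output_folder: str, output_file_prefix: str, count: int) -> str:
    """Build "<outputFolder>/<outputFilePrefix><count>.rdf" (hypothetical helper)."""
    return os.path.join(output_folder, f"{output_file_prefix}{count}.rdf")


if __name__ == "__main__":
    # Assumption for illustration only: the counter starts at 0.
    for index in range(3):
        print(rdf_output_path("out", "cas", index))
```
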
RDFExport
Category: Writer
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
charset | — | java.lang.String | True | — | — | — |
fileName | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
files | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
format | — | org.apache.jena.riot.RDFFormat | True | — | — | — |
outDir | — | org.bibliome.util.files.OutputDirectory | True | — | — | — |
prefixes | — | alvisnlp.module.types.Mapping | True | — | — | — |
statements | — | alvisnlp.corpus.expressions.Expression[] | True | — | — | — |
RelpWriter
Category: Writer
Framework: AlvisNLP
Version: 2012-04-30
Writes the corpus in relp format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
dependencyLabelFeature | — | java.lang.String | True | — | — | — |
dependencyRelation | — | java.lang.String | True | — | — | — |
dependentForm | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
dependentRole | — | java.lang.String | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
headForm | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
headRole | — | java.lang.String | True | — | — | — |
lemmaForm | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
linkageNumberFeature | — | java.lang.String | False | — | — | — |
outFile | — | org.bibliome.util.streams.TargetStream | True | — | — | — |
pmid | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentenceLayer | — | java.lang.String | True | — | — | — |
sentenceRole | — | java.lang.String | True | — | — | — |
wordForm | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
wordLayer | — | java.lang.String | True | — | — | — |
SFTP XMI Writer
Category: Writer
Framework: NaCTeM (UIMA)
Version: 1.0
Saves Common Annotation Structures to an SFTP server
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
password | — | String | True | — | false | — |
port | — | Integer | False | — | false | — |
recorderEnabled | — | Boolean | True | — | false | — |
recorderJdbcUrl | — | String | False | — | false | — |
recorderPassword | — | String | False | — | false | — |
recorderUsername | — | String | False | — | false | — |
remoteDirectory | — | String | True | — | false | — |
server | — | String | True | — | false | — |
username | — | String | True | — | false | — |
SerializedCasWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
filenameExtension | — | String | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat the target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a predefined output file instead of deriving the file name from the document URI or document ID. It can also be useful to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The compression parameter is respected, but an extension is not added automatically. The stripExtension parameter has no effect, as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
typeSystemLocation | Location to write the type system to. The type system is saved using Java serialization; it is not saved as an XML type system description. We recommend using the name typesystem.ser. The compression parameter has no effect on the type system. Instead, whether the type system file is compressed is detected from the file name extension (e.g. ".gz"). If this parameter is set, the type system and index repository are no longer serialized into the same file as the rest of the CAS. The SerializedCasReader currently cannot read such files. Use this only if you really know what you are doing. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
Simplified Text Exporter
Category: Writer
Framework: GATE
Version: unknown
Simplified text exporter (HTML output)
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
Simplified Text Exporter
Category: Writer
Framework: GATE
Version: unknown
Simplified text exporter (plain text output)
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
annotationSetName | — | java.lang.String | — | — | — | true |
SolrWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
A simple implementation of SolrWriter_ImplBase
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
optimizeIndex | If set to true, the index is optimized once all documents are uploaded. Default is false. | Boolean | True | — | false | — |
queueSize | The buffer size before the documents are sent to the server (default: 10000). | Integer | True | — | false | — |
solrIdField | The name of the id field in the Solr schema (default: "id"). | String | True | — | false | — |
targetLocation | Solr server URL string in the form <prot>://<host>:<port>/<path>, e.g. http://localhost:8983/solr/collection1. | String | True | — | false | — |
textField | The name of the text field in the Solr schema (default: "text"). | String | True | — | false | — |
threads | The number of background threads used to empty the queue. Default: 1. | Integer | True | — | false | — |
update | Define whether existing documents with the same ID are updated (true) or overwritten (false). Default: true (update). | Boolean | True | — | false | — |
waitFlush | When committing to the index, i.e. when all documents are processed, whether to block until index changes are flushed to disk. Default: true. | Boolean | True | — | false | — |
waitSearcher | When committing to the index, i.e. when all documents are processed, whether to block until a new searcher is opened and registered as the main query searcher, making the changes visible. Default: true. | Boolean | True | — | false | — |
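
The targetLocation value is expected in the form <prot>://<host>:<port>/<path>. A quick way to check a value against that shape before configuring the writer is sketched below using only the Python standard library; the function name `parse_solr_location` is hypothetical and nothing here contacts a Solr server.

```python
from urllib.parse import urlsplit


def parse_solr_location(target_location: str):
    """Split a Solr URL into (protocol, host, port, path); raise if malformed."""
    parts = urlsplit(target_location)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a <prot>://<host>:<port>/<path> URL: {target_location!r}")
    # The port is None when the URL does not state one explicitly.
    return parts.scheme, parts.hostname, parts.port, parts.path


if __name__ == "__main__":
    print(parse_solr_location("http://localhost:8983/solr/collection1"))
```
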
TGrepWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
TGrep2 corpus file writer. Requires PennTrees to be annotated before.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Method to compress the tgrep file (only used if writeT2c is true). Only NONE, GZIP and BZIP2 are supported. Default: CompressionMethod#NONE | String | True | — | false | — |
dropMalformedTrees | If true, silently drops malformed Penn Trees instead of throwing an exception. Default: false | Boolean | True | — | false | — |
targetLocation | Path to which the output is written. | String | True | — | false | — |
writeComments | Set this parameter to true if you want to add a comment to each PennTree which is written to the output files. The comment is of the form documentId,beginOffset,endOffset. Default: true | Boolean | True | — | false | — |
writeT2c | Set this parameter to true if you want to encode directly into the tgrep2 binary format. Default: true | Boolean | True | — | false | — |
TSV Writer
Category: Writer
Framework: NaCTeM (UIMA)
Version: 0.1
Saves annotations of a selected type to a file in tab-separated-value format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
FeaturePathSizeLimit | The maximum size of feature paths. Features of complex types will be traversed if this value is greater than 0. | Integer | True | — | false | — |
OutputFile | — | String | True | — | false | — |
OutputTypeShortNames | If true, short names of types will be used in the resulting file, e.g., "Annotation" instead of "uima.tcas.Annotation". | Boolean | False | — | false | — |
TargetType | A UIMA type whose instances will be saved. For example, uima.tcas.Annotation. | String | True | — | false | — |
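
The effect of OutputTypeShortNames can be illustrated with a one-line transformation: keep only the part of the type name after the last dot. This is a sketch of the naming rule given in the description, with a hypothetical helper name; it is not taken from the NaCTeM component.

```python
def short_type_name(type_name: str) -> str:
    """Return the part of a fully qualified UIMA type name after the last dot."""
    return type_name.rsplit(".", 1)[-1]


if __name__ == "__main__":
    print(short_type_name("uima.tcas.Annotation"))
```
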
TabularExport
Category: Writer
Framework: AlvisNLP
Version: 2012-04-30
Writes the corpus data structure in files in tabular format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
append | — | java.lang.Boolean | False | — | — | — |
charset | — | java.lang.String | True | — | — | — |
columns | — | alvisnlp.corpus.expressions.Expression[] | True | — | — | — |
fileName | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
files | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
footers | — | alvisnlp.corpus.expressions.Expression[] | False | — | — | — |
headers | — | alvisnlp.corpus.expressions.Expression[] | False | — | — | — |
lines | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
outDir | — | org.bibliome.util.files.OutputDirectory | True | — | — | — |
separator | — | java.lang.String | True | — | — | — |
trim | — | java.lang.Boolean | False | — | — | — |
TcfWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Writer for the WebLicht TCF format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
filenameSuffix | Specify the suffix of output files. Default value: .tcf. If the suffix is not needed, provide an empty string as value. | String | True | — | false | — |
merge | Merge with source TCF file if one is available. Default: true | Boolean | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
preserveIfEmpty | If there are no annotations for a particular layer in the CAS, preserve any potentially existing annotations in the original TCF. Default: false | Boolean | True | — | false | — |
singularTarget | Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
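
As a rough illustration of how escapeDocumentId and filenameSuffix interact, the Python sketch below derives a file name from a document ID. The function name is hypothetical, and Python's `urllib.parse.quote` only approximates the URL encoding a Java component would apply (for example, the two differ in how spaces are encoded), so treat it as a sketch of the idea rather than the writer's actual behaviour.

```python
from urllib.parse import quote


def target_file_name(document_id: str, suffix: str = ".tcf", escape: bool = True) -> str:
    """Derive an output file name from a document ID.

    With escape=True every reserved character, including "\\" and ":",
    is percent-encoded so the result is safe to use as a file name.
    An empty suffix yields a name without extension.
    """
    name = quote(document_id, safe="") if escape else document_id
    return name + suffix


print(target_file_name("corpus\\doc:1"))
print(target_file_name("doc1", suffix=""))
```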
TeiWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA CAS consumer writing the CAS document text in TEI format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
cTextPattern | A token matching this pattern is rendered as a TEI "c" element instead of a "w" element. | String | True | — | false | — |
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
filenameSuffix | Specify the suffix of output files. Default value: .xml. If the suffix is not needed, provide an empty string as value. | String | True | — | false | — |
indent | Indent the XML. | Boolean | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
writeConstituent | Write constituent annotations to the CAS. Disabled by default because it requires type priorities to be set up (Constituents must have a higher priority than Tokens). | Boolean | True | — | false | — |
writeNamedEntity | Write named entity annotations to the CAS. Overlapping named entities are not supported. | Boolean | True | — | false | — |
TextWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA CAS consumer writing the CAS document text as plain text file.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
filenameSuffix | Specify the suffix of output files. Default value: .txt. If the suffix is not needed, provide an empty string as value. | String | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
TfidfConsumer
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
This consumer builds a DfModel. It collects the df (document frequency) counts for the processed collection. The counts are serialized as a DfModel-object.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
featurePath | This annotator is type agnostic, so it is mandatory to specify the type of the working annotation and how to obtain the string representation with the feature path. | String | True | — | false | — |
lowercase | If set to true, the whole text is handled in lower case. | Boolean | True | — | false | — |
targetLocation | Specifies the path and filename where the model file is written. | String | True | — | false | — |
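
For readers unfamiliar with document frequency, the following Python sketch shows the kind of count this consumer collects: for each term, the number of documents in which it occurs at least once. It is a minimal illustration under the assumption that each document is already available as a list of term strings; the function name is hypothetical and the serialization into a DfModel object is not shown.

```python
from collections import Counter


def document_frequencies(documents, lowercase=False):
    """Count, for each term, the number of documents containing it.

    `documents` is an iterable of token lists. A term that occurs
    several times in one document is still counted once for it.
    """
    df = Counter()
    for tokens in documents:
        if lowercase:
            tokens = [t.lower() for t in tokens]
        df.update(set(tokens))  # set() collapses repeats within a document
    return df


docs = [["The", "cat", "the"], ["the", "dog"]]
print(document_frequencies(docs, lowercase=True))
```

With `lowercase=True`, "The" and "the" are treated as the same term, which mirrors the lowercase parameter described in the table.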
TigerXmlWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA CAS consumer writing the CAS document text in the TIGER-XML format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
filenameSuffix | Specify the suffix of output files. Default value: .xml. If the suffix is not needed, provide an empty string as value. | String | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
TokenizedTextWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
This class writes a set of pre-processed documents into a large text file containing one sentence per line, with tokens separated by whitespace. Optionally, annotations other than tokens (e.g. lemmas) are written as specified by #PARAM_FEATURE_PATH.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
featurePath | The feature path, e.g. de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma/value for lemmas. Default: de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token (i.e. token texts). To specify a different annotation, use the annotation class's type name (e.g. Token.class.getTypeName()) and optionally append a field, e.g. /value, to specify the feature path. If you do not specify a field, the covered text is used. | String | True | — | false | — |
numberRegex | All tokens that match this regex are replaced by NUM. Examples: ^$; ^[0-9,\.]$; ^[0-9]+(\.[0-9]*)?$. Make sure that these regular expressions fit the segmentation, e.g. if you work on tokens, your tokenizer might split prefixes such as + and - from the rest of the number. | String | False | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. | Boolean | True | — | false | — |
stopwordsFile | All the tokens listed in this file (one token per line) are replaced by STOP. Empty lines and lines starting with # are ignored. Casing is ignored. | String | False | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetEncoding | Encoding for the target file. Default is UTF-8. | String | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
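
The substitutions described for numberRegex and stopwordsFile can be sketched in a few lines of Python. This is an illustration only, not the writer's implementation: the function names are hypothetical, the regex is assumed to be applied as a full match, and checking stopwords before numbers is an assumption, since the description does not state the order.

```python
import re


def load_stopwords(lines):
    """Parse stopword lines: skip blanks and '#' comments, ignore case."""
    words = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            words.add(entry.lower())
    return words


def render_sentence(tokens, number_regex=None, stopwords=frozenset()):
    """Join the tokens of one sentence with single spaces,
    replacing stopwords by STOP and number-like tokens by NUM."""
    pattern = re.compile(number_regex) if number_regex else None
    out = []
    for token in tokens:
        if token.lower() in stopwords:  # assumed to take precedence
            out.append("STOP")
        elif pattern is not None and pattern.fullmatch(token):
            out.append("NUM")
        else:
            out.append(token)
    return " ".join(out)


stop = load_stopwords(["# function words", "", "The", "of"])
print(render_sentence(["The", "price", "of", "3.14", "apples"],
                      number_regex=r"^[0-9]+(\.[0-9]*)?$", stopwords=stop))
```

Each call produces one output line, matching the one-sentence-per-line layout described for this writer.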
Web1TWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
Web1T n-gram index format writer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
contextType | The type being used for segments | String | True | — | false | — |
createIndexes | Create the indexes that jWeb1T needs to operate. (default: true) | Boolean | False | — | false | — |
inputTypes | Types to generate n-grams from. Example: Token.class.getName() + "/pos/PosValue" for part-of-speech n-grams | String | True | — | true | — |
lowercase | Create a lower case index. | Boolean | False | — | false | — |
maxNgramLength | Maximum n-gram length. Default: 3 | Integer | False | — | false | — |
minFreq | Specifies the minimum frequency an n-gram must have to be written to the final index. The value is inclusive and defaults to 1, so all n-grams with a frequency of at least 1 are written. | Integer | False | — | false | — |
minNgramLength | Minimum n-gram length. Default: 1 | Integer | False | — | false | — |
splitFileTreshold | The input files are split into smaller files for quick access. A separate file is created if the first two letters of a word (or the first letter, for one-character words) account for at least x% of all starting letters in the input files. The default threshold is 1.0%. Words whose starting characters do not meet the threshold are written together into a separate file for miscellaneous words. A high threshold leads to only a few large files and, most likely, a very large miscellaneous file. A low threshold results in many small files. Use zero or a negative value to write everything to one file. | Float | False | — | false | — |
targetEncoding | Character encoding of the output data. | String | False | — | false | — |
targetLocation | Location to which the output is written. | String | True | — | false | — |
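
To make the interplay of minNgramLength, maxNgramLength and minFreq concrete, here is a small Python sketch of n-gram counting with an inclusive frequency cut-off. It is not the Web1T index format or the writer's code; the function name is hypothetical, and it assumes that n-grams are built within a segment (one token list per segment) and never across segment boundaries.

```python
from collections import Counter


def count_ngrams(segments, min_n=1, max_n=3, min_freq=1):
    """Count n-grams of length min_n..max_n inside each segment and
    keep those whose frequency is at least min_freq (inclusive)."""
    counts = Counter()
    for tokens in segments:
        for n in range(min_n, max_n + 1):
            for start in range(len(tokens) - n + 1):
                counts[" ".join(tokens[start:start + n])] += 1
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


segments = [["a", "b", "a", "b"]]
print(count_ngrams(segments, min_n=1, max_n=2, min_freq=2))
```

In this example the bigram "b a" occurs only once and is dropped, while "a", "b" and "a b" each reach the threshold of 2.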
WhatsWrongExport
Category: Writer
Framework: AlvisNLP
Version: 2012-04-30
Writes files in What's Wrong with my NLP format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
dependent | — | java.lang.String | True | — | — | — |
documentFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
entities | — | java.lang.String[] | False | — | — | — |
entityType | — | java.lang.String | False | — | — | — |
head | — | java.lang.String | True | — | — | — |
label | — | java.lang.String | True | — | — | — |
outFile | — | org.bibliome.util.streams.TargetStream | True | — | — | — |
relationName | — | java.lang.String | True | — | — | — |
sectionFilter | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
sentence | — | java.lang.String | True | — | — | — |
sentences | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
wordForm | — | java.lang.String | True | — | — | — |
words | — | java.lang.String | True | — | — | — |
XMI Writer
Category: Writer
Framework: NaCTeM (UIMA)
Version: 1.1
Serialises entire common analysis structures (CAS) to XMI format.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
outputFolder | The folder to write XMI files to. | String | True | — | false | — |
overwrite | — | Boolean | True | — | false | — |
XMLWriter
Category: Writer
Framework: AlvisNLP
Version: 2010-10-28
Writes an XML serialization of the corpus into a file.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
outFile | — | org.bibliome.util.streams.TargetStream | True | — | — | — |
XMLWriter2
Category: Writer
Framework: AlvisNLP
Version: 2012-04-30
Writes the corpus data structure into a file via an XSLT stylesheet.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
fileName | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
indent | — | java.lang.Boolean | True | — | — | — |
outDir | — | org.bibliome.util.files.OutputDirectory | True | — | — | — |
roots | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
xslTransform | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
XMLWriter2ForINIST
Category: Writer
Framework: AlvisNLP
Version:
synopsis
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
active | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
fileName | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
outDir | — | org.bibliome.util.files.OutputDirectory | True | — | — | — |
roots | — | alvisnlp.corpus.expressions.Expression | True | — | — | — |
xslTransform | — | org.bibliome.util.streams.SourceStream | True | — | — | — |
XmiWriter
Category: Writer
Framework: DKPro Core (UIMA)
Version: 1.8.0
UIMA XMI format writer.
Parameter | Description | Type | Mandatory | Default Value | Multi-value | Runtime |
---|---|---|---|---|---|---|
compression | Choose a compression method. (default: CompressionMethod#NONE) | String | False | — | false | — |
escapeDocumentId | URL-encode the document ID in the file name to avoid illegal characters (e.g. \, :, etc.) | Boolean | True | — | false | — |
overwrite | Allow overwriting target files (ignored when writing to ZIP archives). | Boolean | True | — | false | — |
singularTarget | Treat target location as a single file name. This is particularly useful if only a single input file is processed and the result should be written to a pre-defined output file instead of deriving the file name from the document URI or document ID. It can also be useful if the user wishes to force multiple input files to be written to a single target file. The latter case does not work for all formats (e.g. binary, XMI, etc.), but can be useful, e.g. for CoNLL-based formats. This option has no effect if the target location points to an archive location (ZIP/JAR). The #PARAM_COMPRESSION is respected, but does not automatically add an extension. The #PARAM_STRIP_EXTENSION has no effect as the original extension is not preserved. | Boolean | True | — | false | — |
stripExtension | Remove the original extension. | Boolean | True | — | false | — |
targetLocation | Target location. If this parameter is not set, data is written to stdout. | String | False | — | false | — |
typeSystemFile | Location to write the type system to. If this is not set, a file called typesystem.xml will be written to the XMI output path. If this is set, it is expected to be a file relative to the current working directory or an absolute file. If this parameter is set, the #PARAM_COMPRESSION parameter has no effect on the type system. Instead, if the file name ends in ".gz", the file will be compressed, otherwise not. | String | False | — | false | — |
useDocumentId | Use the document ID as file name even if relative path information is present. | Boolean | True | — | false | — |
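
The rule stated for typeSystemFile (compression is decided by the ".gz" file-name ending, not by the compression parameter) can be mimicked with a short Python sketch. The function name and the placeholder XML content are assumptions made for illustration; this is not how the writer itself is implemented.

```python
import gzip
import os
import tempfile


def write_type_system(path: str, xml_text: str) -> None:
    """Write text to `path`, gzip-compressed only if the name ends in '.gz'.

    No separate compression setting is consulted, mirroring the
    behaviour described for typeSystemFile.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt", encoding="utf-8") as out:
        out.write(xml_text)


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "typesystem.xml")
    packed = os.path.join(tmp, "typesystem.xml.gz")
    # Placeholder content, not a real type system description.
    write_type_system(plain, "<typeSystemDescription/>")
    write_type_system(packed, "<typeSystemDescription/>")
    with open(packed, "rb") as handle:
        # A gzip stream starts with the magic bytes 0x1f 0x8b.
        print(handle.read(2) == b"\x1f\x8b")
```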