com.norconex.importer.parser.impl
Class AbstractTikaParser

java.lang.Object
  extended by com.norconex.importer.parser.impl.AbstractTikaParser
All Implemented Interfaces:
IDocumentParser, Serializable
Direct Known Subclasses:
FallbackParser, HTMLParser, PDFParser

public class AbstractTikaParser
extends Object
implements IDocumentParser

Base class wrapping an Apache Tika parser for use by the importer.

Author:
Pascal Essiembre
See Also:
Serialized Form
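The wrapping design described above can be sketched in plain Java. This is a minimal, stdlib-only illustration under stated assumptions, not the Norconex source. `TextExtractor` is a hypothetical stand-in for `org.apache.tika.parser.Parser`. The real Tika interface streams content through a SAX `ContentHandler`, which is simplified here to a `Writer`. `WrappingParserSketch` stands in for this class, `ParseFailedException` for `DocumentParserException`, and a `Map<String, List<String>>` for the importer's `Properties`. The sketch assumes two things: subclasses supply the wrapped parser and format through the constructor, and the `final` entry point fixes the overall flow.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Tika parser: extracts text and metadata. */
interface TextExtractor {
    void extract(InputStream in, Writer text, Map<String, List<String>> meta)
            throws IOException;
}

/** Hypothetical stand-in for DocumentParserException. */
class ParseFailedException extends Exception {
    ParseFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical base class: wraps one extractor for one format. */
class WrappingParserSketch {
    private final TextExtractor extractor;
    private final String format;

    WrappingParserSketch(TextExtractor extractor, String format) {
        this.extractor = extractor;
        this.format = format;
    }

    String getFormat() {
        return format;
    }

    /** Final, so subclasses choose the wrapped parser but not the flow. */
    public final void parseDocument(InputStream in, Writer output,
            Map<String, List<String>> metadata) throws ParseFailedException {
        try {
            extractor.extract(in, output, metadata);
            output.flush();
        } catch (IOException e) {
            // Assumption: low-level I/O errors surface as one checked type.
            throw new ParseFailedException(
                    "Could not parse " + format + " document", e);
        }
    }
}

/** Hypothetical subclass: a "parser" that copies UTF-8 text through. */
class PlainTextParserSketch extends WrappingParserSketch {
    PlainTextParserSketch() {
        super((in, text, meta) -> {
            // The caller owns the stream, so the reader is not closed here.
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.write(buf, 0, n);
            }
            meta.put("Content-Type", List.of("text/plain"));
        }, "text");
    }
}
```

Under this assumed design, subclasses such as HTMLParser or PDFParser would differ mainly in which Tika parser and format they pass to the constructor.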

Nested Class Summary
protected  class AbstractTikaParser.RecursiveMetadataParser
           
 
Field Summary
 
Fields inherited from interface com.norconex.importer.parser.IDocumentParser
RDF_BASE_URI, RDF_SUBJECT_CONTENT
 
Constructor Summary
AbstractTikaParser(org.apache.tika.parser.Parser parser, String format)
          Creates a new Tika-based parser.
 
Method Summary
protected  void addTikaMetadata(org.apache.tika.metadata.Metadata tikaMeta, Properties metadata)
           
 void parseDocument(InputStream inputStream, ContentType contentType, Writer output, Properties metadata)
          Parses a document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractTikaParser

public AbstractTikaParser(org.apache.tika.parser.Parser parser,
                          String format)
Creates a new Tika-based parser.

Parameters:
parser - Tika parser
format - one of the formats supported by the Tika parser

Method Detail

parseDocument

public final void parseDocument(InputStream inputStream,
                                ContentType contentType,
                                Writer output,
                                Properties metadata)
                         throws DocumentParserException
Description copied from interface: IDocumentParser
Parses a document.

Specified by:
parseDocument in interface IDocumentParser
Parameters:
inputStream - the document to parse
contentType - the content type of the document
output - where to save the extracted text
metadata - where to store the metadata
Throws:
DocumentParserException
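From the caller's side, this contract amounts to a few steps. The caller passes an open stream and its content type, supplies a `Writer` for the extracted text and a container for the metadata, and handles a single checked exception. The sketch below is a hypothetical, stdlib-only illustration. `DocumentParserLike` mirrors the shape of `parseDocument`, with the content type simplified to a `String`. `ParserFailure` stands in for `DocumentParserException`. `extractText` is a helper name invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for DocumentParserException. */
class ParserFailure extends Exception {
    ParserFailure(String message) {
        super(message);
    }
}

/** Hypothetical interface with the same shape as parseDocument. */
interface DocumentParserLike {
    void parseDocument(InputStream in, String contentType, Writer output,
            Map<String, List<String>> metadata) throws ParserFailure;
}

class ParseCaller {
    /** Hypothetical helper: parses raw bytes and returns the extracted text. */
    static String extractText(DocumentParserLike parser, byte[] content,
            String contentType, Map<String, List<String>> metadata)
            throws ParserFailure, IOException {
        StringWriter output = new StringWriter();
        // The caller opens the input stream, so the caller also closes it.
        try (InputStream in = new ByteArrayInputStream(content)) {
            parser.parseDocument(in, contentType, output, metadata);
        }
        return output.toString();
    }
}
```

Keeping stream ownership with the caller, as assumed here, lets the same parser instance read from files, network streams, or in-memory content.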

addTikaMetadata

protected void addTikaMetadata(org.apache.tika.metadata.Metadata tikaMeta,
                               Properties metadata)
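This page gives no description for this method. Its signature suggests that it copies entries from Tika's `Metadata` into the importer's `Properties`. The exact behavior, such as key naming and whether values are appended or replaced, is not documented here. The sketch below is a hypothetical, stdlib-only illustration of such a copy. A `Map<String, String[]>` stands in for Tika's `Metadata`, whose `names()` and `getValues(String)` methods return `String[]`. A `Map<String, List<String>>` stands in for the multi-valued `Properties`. The `MetadataCopySketch` name is invented for this example. Appending rather than overwriting, and skipping `null` values, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MetadataCopySketch {
    /**
     * Hypothetical equivalent of addTikaMetadata: appends every value of
     * every name in the Tika-like source to the multi-valued target.
     */
    static void addExtractedMetadata(Map<String, String[]> tikaLike,
            Map<String, List<String>> metadata) {
        for (Map.Entry<String, String[]> entry : tikaLike.entrySet()) {
            List<String> values = metadata.computeIfAbsent(
                    entry.getKey(), k -> new ArrayList<>());
            for (String value : entry.getValue()) {
                // Assumption: null values are skipped rather than stored.
                if (value != null) {
                    values.add(value);
                }
            }
        }
    }
}
```

Appending keeps values already present in the target. This matters when the same key may be set both before parsing and by Tika.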


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.