com.norconex.importer.transformer
Class AbstractCharStreamTransformer

java.lang.Object
  extended by com.norconex.importer.transformer.AbstractRestrictiveTransformer
      extended by com.norconex.importer.transformer.AbstractCharStreamTransformer
All Implemented Interfaces:
IImportHandler, IDocumentTransformer, Serializable
Direct Known Subclasses:
AbstractStringTransformer

public abstract class AbstractCharStreamTransformer
extends AbstractRestrictiveTransformer

Base class for transformers dealing with text documents only. Subclasses can safely be used as either pre-parse or post-parse handlers.

For pre-parsing, non-text documents are simply ignored and no transformation occurs. Whether a document is text is determined from the Importer.DOC_CONTENT_TYPE metadata value. By default, any content type starting with "text/" is considered text. This default behavior can be changed with the setContentTypeRegex(String) method. Make sure the regular expression matches only text documents, to avoid parsing exceptions.
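
The content type check described above can be pictured with the following minimal sketch. It is not the class's actual implementation: the ContentTypeCheck class, the isTextDocument method and the DEFAULT_REGEX value are hypothetical names chosen for illustration, and the default pattern is only an assumption derived from the 'starting with "text/"' rule.

```java
import java.util.regex.Pattern;

class ContentTypeCheck {

    // Assumed default: any content type starting with "text/".
    static final String DEFAULT_REGEX = "^text/.*$";

    /**
     * Decides whether a document would be treated as text.
     * Hypothetical helper: parameter names are illustrative only.
     */
    static boolean isTextDocument(
            String contentType, String contentTypeRegex, boolean parsed) {
        if (parsed) {
            // Post-parse: all documents are assumed to be text.
            return true;
        }
        if (contentType == null) {
            // Pre-parse with no content type: treat as non-text.
            return false;
        }
        String regex =
                contentTypeRegex != null ? contentTypeRegex : DEFAULT_REGEX;
        return Pattern.matches(regex, contentType.trim());
    }

    public static void main(String[] args) {
        System.out.println(isTextDocument("text/html", null, false));       // true
        System.out.println(isTextDocument("application/pdf", null, false)); // false
        System.out.println(isTextDocument("application/pdf", null, true));  // true
        System.out.println(isTextDocument(
                "application/xml", "^(text/.*|application/xml)$", false));  // true
    }
}
```

A custom expression such as the one in the last call widens the set of documents treated as text, so it should only include content types whose bytes can actually be read as characters.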

For post-parsing, all documents are assumed to be text.

Subclasses can restrict which documents this transformation applies to, based on document metadata (see AbstractRestrictiveTransformer).

Subclasses implementing IXMLConfigurable should allow this inner configuration:

  <contentTypeRegex>
      (regex to identify text content-types, overriding default)
  </contentTypeRegex>
  <restrictTo
          caseSensitive="[false|true]"
          property="(name of header/metadata field to match)">
      (regular expression of value to match)
  </restrictTo>
 

Author:
Pascal Essiembre
See Also:
Serialized Form

Constructor Summary
AbstractCharStreamTransformer()
           
 
Method Summary
 boolean equals(Object obj)
           
 String getContentTypeRegex()
           
 int hashCode()
           
protected  void loadFromXML(XMLConfiguration xml)
          Convenience method for subclasses to load content type regex.
protected  void saveToXML(XMLStreamWriter writer)
          Convenience method for subclasses to save content type regex.
 void setContentTypeRegex(String contentTypeRegex)
           
 String toString()
           
protected  void transformRestrictedDocument(String reference, InputStream input, OutputStream output, Properties metadata, boolean parsed)
           
protected abstract  void transformTextDocument(String reference, Reader input, Writer output, Properties metadata, boolean parsed)
           
 
Methods inherited from class com.norconex.importer.transformer.AbstractRestrictiveTransformer
setRestriction, transformDocument
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

AbstractCharStreamTransformer

public AbstractCharStreamTransformer()
Method Detail

getContentTypeRegex

public String getContentTypeRegex()

setContentTypeRegex

public void setContentTypeRegex(String contentTypeRegex)

transformRestrictedDocument

protected final void transformRestrictedDocument(String reference,
                                                 InputStream input,
                                                 OutputStream output,
                                                 Properties metadata,
                                                 boolean parsed)
                                          throws IOException
Specified by:
transformRestrictedDocument in class AbstractRestrictiveTransformer
Throws:
IOException
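
The signatures suggest that this method bridges the byte streams it receives to the character streams passed to transformTextDocument. The sketch below shows one way such a bridge could be written. It is an assumption, not the documented behavior: the CharStreamBridge class and TextTransform interface are hypothetical, and UTF-8 is an assumed encoding that the source does not specify.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CharStreamBridge {

    /** Hypothetical stand-in for the transformTextDocument callback. */
    interface TextTransform {
        void apply(Reader input, Writer output) throws IOException;
    }

    static void transform(
            InputStream input, OutputStream output, TextTransform transform)
            throws IOException {
        // Assumed encoding: UTF-8 on both sides.
        Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
        Writer writer = new OutputStreamWriter(output, StandardCharsets.UTF_8);
        transform.apply(reader, writer);
        // Push buffered characters to the byte stream; the caller still
        // owns the underlying streams, so they are not closed here.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "some text".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Identity transform: copy characters unchanged.
        transform(in, out, (r, w) -> {
            int c;
            while ((c = r.read()) != -1) {
                w.write(c);
            }
        });
        System.out.println(
                new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```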

transformTextDocument

protected abstract void transformTextDocument(String reference,
                                              Reader input,
                                              Writer output,
                                              Properties metadata,
                                              boolean parsed)
                                       throws IOException
Throws:
IOException
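
As an illustration of the kind of logic a subclass might place in this method, here is a minimal, self-contained sketch that lower-cases text while streaming it from a Reader to a Writer. The LowerCaseText class and its transformText method are hypothetical. The reference, metadata and parsed arguments are left out so that the example compiles without the Importer library on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class LowerCaseText {

    /**
     * Streams characters from input to output, lower-casing each one.
     * Works in fixed-size chunks so large documents are never held
     * in memory all at once.
     */
    static void transformText(Reader input, Writer output)
            throws IOException {
        char[] buffer = new char[4096];
        int read;
        while ((read = input.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                buffer[i] = Character.toLowerCase(buffer[i]);
            }
            output.write(buffer, 0, read);
        }
        output.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        transformText(new StringReader("Hello WORLD"), out);
        System.out.println(out); // hello world
    }
}
```

Working on the streams directly, rather than first reading the whole document into a String, keeps memory use flat. When a transformation needs the full text at once, the AbstractStringTransformer subclass listed above may be a better starting point.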

loadFromXML

protected void loadFromXML(XMLConfiguration xml)
Convenience method for subclasses to load the content type regex (attribute "contentTypeRegex").

Overrides:
loadFromXML in class AbstractRestrictiveTransformer
Parameters:
xml - xml configuration

saveToXML

protected void saveToXML(XMLStreamWriter writer)
                  throws XMLStreamException
Convenience method for subclasses to save content type regex.

Overrides:
saveToXML in class AbstractRestrictiveTransformer
Parameters:
writer - XML writer
Throws:
XMLStreamException - problem saving

toString

public String toString()
Overrides:
toString in class AbstractRestrictiveTransformer

hashCode

public int hashCode()
Overrides:
hashCode in class AbstractRestrictiveTransformer

equals

public boolean equals(Object obj)
Overrides:
equals in class AbstractRestrictiveTransformer


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.