com.norconex.importer.transformer
Class AbstractStringTransformer

java.lang.Object
  extended by com.norconex.importer.transformer.AbstractRestrictiveTransformer
      extended by com.norconex.importer.transformer.AbstractCharStreamTransformer
          extended by com.norconex.importer.transformer.AbstractStringTransformer
All Implemented Interfaces:
IImportHandler, IDocumentTransformer, Serializable
Direct Known Subclasses:
StripAfterTransformer, StripBeforeTransformer, StripBetweenTransformer

public abstract class AbstractStringTransformer
extends AbstractCharStreamTransformer

Base class that simplifies writing transformers for text content. It loads the text into a StringBuilder for in-memory processing, which allows more options than stream processing (such as complex regular expressions). This class checks for free memory after every 10KB of text read. If there is enough memory, it reads another 10KB, and continues until all the content is read or the buffer size reaches half the available memory. In either case, it passes the buffer content read so far for transformation (all of it at once for small enough content, in several chunks for large content).

Implementors should be mindful of memory use when working with the string builder.
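
The buffering behaviour described above can be sketched in plain Java as follows. This is an illustration only, not the source of this class: the names ChunkedReadSketch, ChunkHandler, process, chunkSizes and halfAvailableChars are hypothetical, the memory estimate is a rough assumption, and the way the partial flag is set on the last chunk is a guess. Writing the transformed buffer to the output Writer is omitted.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ChunkedReadSketch {

    /** Hypothetical callback standing in for transformStringDocument. */
    public interface ChunkHandler {
        void handle(StringBuilder content, boolean partialContent);
    }

    /** Amount of text read between two buffer-size checks (10KB). */
    static final int READ_SIZE = 10 * 1024;

    /** Rough estimate: half the memory the JVM can still use, in chars. */
    public static long halfAvailableChars() {
        Runtime rt = Runtime.getRuntime();
        long availableBytes =
                rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
        return availableBytes / 2 / 2; // half the memory, 2 bytes per char
    }

    /**
     * Reads the input block by block. When the buffer reaches the limit,
     * hands it to the handler as a partial chunk and empties it.
     * Whatever is left at end of input is handed over last.
     */
    public static void process(Reader input, long maxBufferChars,
            ChunkHandler handler) throws IOException {
        StringBuilder buffer = new StringBuilder();
        char[] block = new char[READ_SIZE];
        boolean split = false;
        int read;
        while ((read = input.read(block)) != -1) {
            buffer.append(block, 0, read);
            if (buffer.length() >= maxBufferChars) {
                handler.handle(buffer, true);
                buffer.setLength(0); // a real transformer would write it out first
                split = true;
            }
        }
        if (buffer.length() > 0) {
            // Assumption: every chunk of a split document is flagged partial.
            handler.handle(buffer, split);
        }
    }

    /** Convenience helper: returns the size of each chunk handed over. */
    public static List<Integer> chunkSizes(String text, long maxBufferChars) {
        List<Integer> sizes = new ArrayList<>();
        try {
            process(new StringReader(text), maxBufferChars,
                    (content, partial) -> sizes.add(content.length()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Small content fits in one chunk; large content would be split.
        System.out.println(chunkSizes("short text", halfAvailableChars()));
    }
}
```

The point of the sketch is that a handler may be called several times for one document, each time seeing only part of the text, so a pattern that spans a chunk boundary may not match.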

Subclasses implementing IXMLConfigurable should allow this inner configuration:

  <contentTypeRegex>
      (regex to identify text content-types, overriding default)
  </contentTypeRegex>
  <restrictTo
          caseSensitive="[false|true]"
          property="(name of header/metadata field to match)">
      (regular expression of value to match)
  </restrictTo>
 

Author:
Pascal Essiembre
See Also:
Serialized Form

Constructor Summary
AbstractStringTransformer()
           
 
Method Summary
 boolean equals(Object obj)
           
 int hashCode()
           
 String toString()
           
protected abstract  void transformStringDocument(String reference, StringBuilder content, Properties metadata, boolean parsed, boolean partialContent)
           
protected  void transformTextDocument(String reference, Reader input, Writer output, Properties metadata, boolean parsed)
           
 
Methods inherited from class com.norconex.importer.transformer.AbstractCharStreamTransformer
getContentTypeRegex, loadFromXML, saveToXML, setContentTypeRegex, transformRestrictedDocument
 
Methods inherited from class com.norconex.importer.transformer.AbstractRestrictiveTransformer
setRestriction, transformDocument
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

AbstractStringTransformer

public AbstractStringTransformer()
Method Detail

transformTextDocument

protected final void transformTextDocument(String reference,
                                           Reader input,
                                           Writer output,
                                           Properties metadata,
                                           boolean parsed)
                                    throws IOException
Specified by:
transformTextDocument in class AbstractCharStreamTransformer
Throws:
IOException

transformStringDocument

protected abstract void transformStringDocument(String reference,
                                                StringBuilder content,
                                                Properties metadata,
                                                boolean parsed,
                                                boolean partialContent)
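
An implementation of this method receives the text in a StringBuilder and can modify it in place. The following is a minimal sketch of the kind of in-place editing such a method body might perform without copying the whole buffer into a String. The class InPlaceEditSketch and its methods truncateAt and replaceAll are hypothetical helpers written for this example; they are not part of the library and do not extend this class.

```java
public class InPlaceEditSketch {

    /**
     * Removes everything from the first occurrence of marker onward.
     * Returns true if the marker was found.
     */
    public static boolean truncateAt(StringBuilder content, String marker) {
        int idx = content.indexOf(marker);
        if (idx < 0) {
            return false;
        }
        content.setLength(idx); // drops the tail without allocating a copy
        return true;
    }

    /**
     * Replaces every occurrence of target with replacement, editing the
     * builder directly. Returns the number of replacements made.
     */
    public static int replaceAll(StringBuilder content, String target,
            String replacement) {
        if (target.isEmpty()) {
            return 0; // an empty target would never advance
        }
        int count = 0;
        int from = content.indexOf(target);
        while (from >= 0) {
            content.replace(from, from + target.length(), replacement);
            count++;
            // Resume after the inserted text so it is not matched again.
            from = content.indexOf(target, from + replacement.length());
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder content = new StringBuilder("keep this<!--cut-->drop this");
        truncateAt(content, "<!--cut-->");
        System.out.println(content);
    }
}
```

Working on the builder directly avoids holding a second full copy of the text, which matters because the buffer may already occupy a large share of the available memory.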

equals

public boolean equals(Object obj)
Overrides:
equals in class AbstractCharStreamTransformer

hashCode

public int hashCode()
Overrides:
hashCode in class AbstractCharStreamTransformer

toString

public String toString()
Overrides:
toString in class AbstractCharStreamTransformer


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.