Class RawText


  • public class RawText
    extends Sequence
    A Sequence supporting UNIX formatted text in byte[] format.

    Elements of the sequence are the lines of the file, as delimited by the UNIX newline character ('\n'). The file content is treated as 8 bit binary text, with no assumptions or requirements on character encoding.

    Note that the first line of the file is element 0, as defined by the Sequence interface API. Traditionally, in a text editor or a patch file, the first line is line number 1. Callers converting from a "line number" to an "element index" must subtract 1 before invoking methods.
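The conversion described above can be sketched as a small helper. This is only an illustration: `LineNumbers` and `toElementIndex` are hypothetical names, not part of this class, and the bounds check is an assumption about what a caller would want.

```java
final class LineNumbers {
    /**
     * Converts a 1-based line number (as shown by editors and patch files)
     * to a 0-based element index. "size" is the number of lines available.
     */
    static int toElementIndex(int lineNumber, int size) {
        if (lineNumber < 1 || lineNumber > size) {
            throw new IndexOutOfBoundsException(
                    "line " + lineNumber + " is not in 1.." + size);
        }
        return lineNumber - 1;
    }

    public static void main(String[] args) {
        int size = 3; // hypothetical sequence of three lines
        // Line number 1 maps to element index 0.
        System.out.println(toElementIndex(1, size));
    }
}
```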

    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected byte[] content
      The file content for this sequence.
      static RawText EMPTY_TEXT
      A RawText of length 0
      protected IntList lines
      Map of line number to starting position within content.
    • Constructor Summary

      Constructors 
      Constructor Description
      RawText​(byte[] input)
      Create a new sequence from an existing content byte array.
      RawText​(java.io.File file)
      Create a new sequence from a file.
    • Method Summary

      Modifier and Type Method Description
      protected java.lang.String decode​(int start, int end)
      Decode a region of the text into a String.
      java.lang.String getLineDelimiter()
      Get the line delimiter for the first line.
      java.lang.String getString​(int i)
      Get the text for a single line.
      java.lang.String getString​(int begin, int end, boolean dropLF)
      Get the text for a region of lines.
      static boolean isBinary​(byte[] raw)
      Determine heuristically whether a byte array represents binary (as opposed to text) content.
      static boolean isBinary​(byte[] raw, int length)
      Determine heuristically whether a byte array represents binary (as opposed to text) content.
      static boolean isBinary​(java.io.InputStream raw)
      Determine heuristically whether the bytes contained in a stream represent binary (as opposed to text) content.
      boolean isMissingNewlineAtEnd()
      Determine if the file is missing a trailing LF ('\n').
      int size()  
      void writeLine​(java.io.OutputStream out, int i)
      Write a specific line to the output stream, without its trailing LF.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • EMPTY_TEXT

        public static final RawText EMPTY_TEXT
        A RawText of length 0
      • content

        protected final byte[] content
        The file content for this sequence.
      • lines

        protected final IntList lines
        Map of line number to starting position within content.
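To make the idea of a line map concrete, here is a minimal standalone sketch that records the starting offset of each LF-delimited line in a byte array. It is not the layout used by the `lines` field: the class name `LineMapSketch`, the plain `int[]` result, and the trailing sentinel entry (equal to the content length) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LineMapSketch {
    /**
     * Returns the start offset of every line in content, followed by one
     * sentinel entry equal to content.length. Line i therefore occupies
     * bytes [starts[i], starts[i + 1]), including its LF if it has one.
     */
    static int[] lineStarts(byte[] content) {
        List<Integer> starts = new ArrayList<>();
        int pos = 0;
        while (pos < content.length) {
            starts.add(pos);
            // Advance to the next LF, or to the end of the content.
            while (pos < content.length && content[pos] != '\n') {
                pos++;
            }
            if (pos < content.length) {
                pos++; // step over the LF itself
            }
        }
        starts.add(content.length); // sentinel: end of the last line
        int[] out = new int[starts.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = starts.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] content = {'a', '\n', 'b', 'c', '\n', 'd'};
        int[] starts = lineStarts(content);
        // The number of lines is one less than the number of entries.
        System.out.println(starts.length - 1);
    }
}
```

With a map of this shape, looking up the start of line `i` is a single array read, which is why the conversion from line number to element index matters to callers.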
    • Constructor Detail

      • RawText

        public RawText​(byte[] input)
        Create a new sequence from an existing content byte array.

        The entire array (indexes 0 through length-1) is used as the content.

        Parameters:
        input - the content array. The array is never modified, so passing through cached arrays is safe.
      • RawText

        public RawText​(java.io.File file)
                throws java.io.IOException
        Create a new sequence from a file.

        The entire file contents are used.

        Parameters:
        file - the text file.
        Throws:
        java.io.IOException - if an error occurs while reading the file
    • Method Detail

      • size

        public int size()
        Specified by:
        size in class Sequence
        Returns:
        total number of items in the sequence.
      • writeLine

        public void writeLine​(java.io.OutputStream out,
                              int i)
                       throws java.io.IOException
        Write a specific line to the output stream, without its trailing LF.

        The specified line is copied as-is, with no character encoding translation performed.

        If the specified line ends with an LF ('\n'), the LF is not copied. It is up to the caller to write the LF, if desired, between output lines.

        Parameters:
        out - stream to copy the line data onto.
        i - index of the line to extract. Note this is 0-based, so line number 1 is actually index 0.
        Throws:
        java.io.IOException - the stream write operation failed.
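The division of labour described above (the line is copied without its LF, and the caller writes separators) can be sketched without this class. Everything here is hypothetical: `WriteLineSketch`, its `writeLine` helper, and the `starts` array (line start offsets plus a final entry equal to the content length) are stand-ins chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class WriteLineSketch {
    /** Copies bytes [start, end) to out as-is, omitting one trailing LF if present. */
    static void writeLine(OutputStream out, byte[] content, int start, int end)
            throws IOException {
        if (end > start && content[end - 1] == '\n') {
            end--;
        }
        out.write(content, start, end - start);
    }

    /** Copies every line, with the caller writing the LF between lines. */
    static byte[] copyLines(byte[] content, int[] starts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int i = 0; i + 1 < starts.length; i++) {
                if (i > 0) {
                    out.write('\n'); // separator supplied by the caller
                }
                writeLine(out, content, starts[i], starts[i + 1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = {'a', '\n', 'b', 'c', '\n'};
        byte[] copy = copyLines(content, new int[] {0, 2, 5});
        System.out.println(copy.length);
    }
}
```

Because the separator is written between lines rather than after each one, the copy does not end with an LF even when the input did.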
      • isMissingNewlineAtEnd

        public boolean isMissingNewlineAtEnd()
        Determine if the file is missing a trailing LF ('\n').
        Returns:
        true if the last line does not end with an LF; false otherwise.
      • getString

        public java.lang.String getString​(int i)
        Get the text for a single line.
        Parameters:
        i - index of the line to extract. Note this is 0-based, so line number 1 is actually index 0.
        Returns:
        the text for the line, without a trailing LF.
      • getString

        public java.lang.String getString​(int begin,
                                          int end,
                                          boolean dropLF)
        Get the text for a region of lines.
        Parameters:
        begin - index of the first line to extract. Note this is 0-based, so line number 1 is actually index 0.
        end - index of one past the last line to extract.
        dropLF - if true the trailing LF ('\n') of the last returned line is dropped, if present.
        Returns:
        the text for lines [begin, end).
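The half-open range [begin, end) and the effect of dropLF can be illustrated with a standalone sketch. It is not this class's implementation: `RegionSketch`, the `starts` array (line start offsets plus a final entry equal to the content length), and the fixed ISO-8859-1 decoding are assumptions made to keep the example short.

```java
import java.nio.charset.StandardCharsets;

final class RegionSketch {
    /**
     * Returns the text of lines [begin, end). If dropLF is true, the LF of
     * the last returned line is removed when present; LFs of earlier lines
     * in the region are always kept.
     */
    static String region(byte[] content, int[] starts, int begin, int end,
            boolean dropLF) {
        if (begin == end) {
            return ""; // empty region
        }
        int s = starts[begin];
        int e = starts[end];
        if (dropLF && e > s && content[e - 1] == '\n') {
            e--;
        }
        return new String(content, s, e - s, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] content = "a\nbc\nd".getBytes(StandardCharsets.ISO_8859_1);
        int[] starts = {0, 2, 5, 6};
        // Lines 0 and 1, keeping the final LF, then dropping it.
        System.out.println(region(content, starts, 0, 2, false).length());
        System.out.println(region(content, starts, 0, 2, true).length());
    }
}
```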
      • decode

        protected java.lang.String decode​(int start,
                                          int end)
        Decode a region of the text into a String. The default implementation of this method tries to guess the character set by considering UTF-8, the platform default, and falling back on ISO-8859-1 if neither of those can correctly decode the region given.
        Parameters:
        start - first byte of the content to decode.
        end - one past the last byte of the content to decode.
        Returns:
        the region [start, end) decoded as a String.
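One way to implement the fallback order described above is to try each candidate charset with a strict decoder and move on when decoding fails. The sketch below assumes that approach; `DecodeSketch` is a hypothetical standalone class, and the real method may detect failures differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeSketch {
    /** Decodes bytes [start, end) of content, guessing the character set. */
    static String decode(byte[] content, int start, int end) {
        Charset[] candidates = {StandardCharsets.UTF_8, Charset.defaultCharset()};
        for (Charset cs : candidates) {
            try {
                // REPORT makes the decoder throw instead of substituting
                // replacement characters, so a bad guess is detectable.
                return cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(content, start, end - start))
                        .toString();
            } catch (CharacterCodingException e) {
                // This charset cannot represent the region; try the next one.
            }
        }
        // ISO-8859-1 maps every byte to a character, so it cannot fail.
        return new String(content, start, end - start, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] content = "hello world".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(content, 6, 11));
    }
}
```

Ending the chain with ISO-8859-1 guarantees that a String is always returned, at the cost of possibly showing the wrong characters for text in some other encoding.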
      • isBinary

        public static boolean isBinary​(byte[] raw)
        Determine heuristically whether a byte array represents binary (as opposed to text) content.
        Parameters:
        raw - the raw file content.
        Returns:
        true if raw is likely to be a binary file, false otherwise
      • isBinary

        public static boolean isBinary​(java.io.InputStream raw)
                                throws java.io.IOException
        Determine heuristically whether the bytes contained in a stream represent binary (as opposed to text) content. Note: do not use the stream further after calling this method. The stream may not be fully read, and it is left at an unknown position after an unknown number of bytes have been consumed. The caller is responsible for closing the stream.
        Parameters:
        raw - input stream containing the raw file content.
        Returns:
        true if raw is likely to be a binary file, false otherwise
        Throws:
        java.io.IOException - if input stream could not be read
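Since the stream is unusable after the check, a caller would typically open a dedicated stream for the probe, close it, and open a fresh one to read the content. The sketch below shows that pattern with a stand-in heuristic. The heuristic itself is an assumption (a NUL byte within the first 8000 bytes, similar to what Git does), and `StreamBinarySketch`, `SAMPLE_SIZE`, and both method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class StreamBinarySketch {
    static final int SAMPLE_SIZE = 8000; // assumed sample size

    /** Reads up to SAMPLE_SIZE bytes and reports whether any of them is NUL. */
    static boolean looksBinary(InputStream in) throws IOException {
        byte[] buf = new byte[SAMPLE_SIZE];
        int filled = 0;
        while (filled < buf.length) {
            int n = in.read(buf, filled, buf.length - filled);
            if (n < 0) {
                break; // end of stream
            }
            filled += n;
        }
        for (int i = 0; i < filled; i++) {
            if (buf[i] == 0) {
                return true;
            }
        }
        return false;
    }

    /** Probes data through a throwaway stream that is closed afterwards. */
    static boolean looksBinaryBytes(byte[] data) {
        try (InputStream probe = new ByteArrayInputStream(data)) {
            return looksBinary(probe);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(looksBinaryBytes(new byte[] {'P', 'K', 0, 3}));
    }
}
```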
      • isBinary

        public static boolean isBinary​(byte[] raw,
                                       int length)
        Determine heuristically whether a byte array represents binary (as opposed to text) content.
        Parameters:
        raw - the raw file content.
        length - number of bytes in raw to evaluate. This should be raw.length unless raw was over-allocated by the caller.
        Returns:
        true if raw is likely to be a binary file, false otherwise
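The length parameter matters when a buffer is larger than the data it holds, because the unused tail could otherwise influence the result. The sketch below demonstrates this with a stand-in heuristic, which is an assumption rather than this method's documented behavior: it reports binary if a NUL byte appears within the first 8000 evaluated bytes. `BinaryHeuristicSketch` and `FIRST_FEW_BYTES` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class BinaryHeuristicSketch {
    static final int FIRST_FEW_BYTES = 8000; // assumed sample size

    /** Checks only the first "length" bytes of raw, capped at FIRST_FEW_BYTES. */
    static boolean looksBinary(byte[] raw, int length) {
        int limit = Math.min(Math.min(length, raw.length), FIRST_FEW_BYTES);
        for (int i = 0; i < limit; i++) {
            if (raw[i] == 0) {
                return true;
            }
        }
        return false;
    }

    /** Convenience overload for arrays that are exactly as long as their data. */
    static boolean looksBinary(byte[] raw) {
        return looksBinary(raw, raw.length);
    }

    public static void main(String[] args) {
        // A 16-byte buffer holding only 5 bytes of text; the rest is zero.
        byte[] buffer = new byte[16];
        byte[] text = "hello".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(text, 0, buffer, 0, text.length);

        System.out.println(looksBinary(buffer, text.length)); // evaluates the text only
        System.out.println(looksBinary(buffer));              // also evaluates the zero padding
    }
}
```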
      • getLineDelimiter

        public java.lang.String getLineDelimiter()
        Get the line delimiter for the first line.
        Returns:
        the line delimiter of the first line, or null if none is present
        Since:
        2.0