Reader Class Reference

This reference page is linked to from the following overview topics: New Classes and Methods, Unicode.


#include <maxtextfile.h>

Inheritance diagram for Reader:
BaseTextReader MaxHeapOperators

Class Description

Reads and interprets text files.

This class was designed to perform file and stream I/O in a code page neutral way.

It was designed to resolve the following problems:
* Read and correctly interpret the BOM (an invisible character at the beginning of Unicode files).
* Correctly detect UTF-8 and UTF-16 files, even if they are not signed.
* Detect encoding cookies. XML files usually begin with "<?xml encoding='????'>". The detection algorithm interprets this directive correctly.
* Avoid splitting a character. In UTF-16, UTF-8, and some ANSI code pages, a character can be stored on 1 to 6 bytes. All operations of this object are designed to avoid returning a partial character.
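
The last point can be illustrated with a minimal sketch in standard C++. It is not the SDK's implementation: the function name CompleteUtf8PrefixLength is hypothetical, and it assumes UTF-8 input with sequences of at most 4 bytes. It shows one way a reader could trim a freshly filled byte buffer so that it never ends in the middle of a character.

```cpp
#include <cstddef>

// Returns the length of the longest prefix of data[0..len) that does not end
// in the middle of a UTF-8 sequence. Illustrative only; not the SDK's code.
std::size_t CompleteUtf8PrefixLength(const unsigned char* data, std::size_t len)
{
    // Walk back over at most three continuation bytes (10xxxxxx).
    std::size_t i = len;
    std::size_t back = 0;
    while (i > 0 && back < 3 && (data[i - 1] & 0xC0) == 0x80) {
        --i;
        ++back;
    }
    if (i == 0) {
        return len; // no lead byte in range; leave the buffer untouched
    }
    const unsigned char lead = data[i - 1];
    std::size_t expected = 1;                       // ASCII or invalid byte
    if ((lead & 0xE0) == 0xC0)      expected = 2;   // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) expected = 3;   // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) expected = 4;   // 11110xxx
    const std::size_t have = back + 1;              // lead byte plus continuation bytes seen
    // If the last sequence is incomplete, cut just before its lead byte.
    return (have < expected) ? (i - 1) : len;
}
```

A reader built this way would keep the trailing bytes and prepend them to the next chunk read from the stream.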

Plugin developers should consider using this class for file I/O to ensure that the files they generate remain compatible with previous versions of Max.

See also:
Use the ReaderWriter class when the file I/O requirements are for both reading and writing.

Public Types

enum   TextFileReaderEncoding { FAVOR_UTF8 = 0x10000000, FOUND_BOM = 0x20000000, FOUND_COOKIE = 0x40000000, FLIPPED = 0x80000000 }
  Text file reading encoding. More...
enum   EOFCharacterHandling { DEFAULT_EOF_HANDLING, STOP_READING_AT_EOF, FILTEROUT_EOF_CHARACTER, IGNORE_EOF_CHARACTER }
  EOF character handling. More...

Public Member Functions

  Reader ()
  Default Constructor.
virtual  ~Reader ()
  Destructor.
bool  Open (FILE *file, unsigned int encoding=0, LineEndMode mode=Text)
  Wraps an ANSI C FILE pointer. This service lets a developer access a file opened with fopen while still using the SDK API.
bool  Open (HANDLE fileHandle, unsigned int encoding=0, LineEndMode mode=Text)
  Wrap a Win32 file handle.
bool  Open (const MCHAR *fileName, unsigned int encoding=0, LineEndMode mode=Text)
  Open a file using a file name.
bool  Open (const MaxSDK::Util::MaxString &fileName, unsigned int encoding=0, LineEndMode mode=Text)
  Open a file using a file name.
void  Close ()
  Close the underlying stream and free any intermediate data.
MaxString  LastError () const
  Returns the last error returned by the BinaryStream.
unsigned int  Encoding () const
  Returns the current encoding of this file.
LineEndMode  Mode () const
  Determine how this reader handles line ending.
void  SetReadBufferSize (size_t readSize)
  Set the read buffer size.
size_t  GetReadBufferSize () const
  Returns the read buffer size.
void  SetDetectSize (size_t detectSize)
  Set the default size of the buffer used to validate encoding.
size_t  GetDetectSize () const
  Returns the detect buffer size.
size_t  NumberOfChars () const
  Return the number of characters in the file.
size_t  NumberOfLines () const
  Calculate the total number of lines in the file.
void  SetEOFCharacterHandling (EOFCharacterHandling)
  Set the EOF character handling method and refilter the buffer.
EOFCharacterHandling  GetEOFCharacterHandling () const
  Get the EOF character handling method.
virtual Char  ReadChar (bool peek=false) const
  Reads a single char.
virtual unsigned int  ReadCharUTF32 (bool peek=false) const
  Reads a single char and returns its UTF32 representation.
virtual MaxString  ReadChars (size_t nchars) const
  Reads characters from the file.
virtual MaxString  ReadLine (size_t nchars=(size_t)-1, bool dontReturnEOL=false) const
  Reads a line from the file, or nchars characters, whichever comes first.
virtual MaxString  ReadChunk (size_t len, bool dontReturnLastEOL=false) const
  Reads up to "len" bytes from the file and converts them to a unicode-compliant string.
virtual MaxString  ReadFull () const
  Reads the file in a single operation and returns it in a single string object.
virtual size_t  PositionBytes () const
  Get the number of bytes read so far.
size_t  Position () const
  Get the number of characters read so far.
size_t  LineNumber () const
  Get the current line number.
virtual size_t  Seek (long offset, int origin)
  Seek inside the stream.
virtual bool  IsEndOfFile () const
  Returns true if the file is at the end.
virtual bool  IsFileOpen () const
  Returns true if file is open.
virtual size_t  UnreadChar (const Char &c)
  Unread a character.

Static Public Member Functions

static bool  Detect (void *data, size_t len, unsigned int &encoding, size_t *ignoreBytes)
  Detect the encoding of the passed buffer.
static size_t  NumberOfChars (const void *data, size_t len, unsigned int encoding=CP_ACP)
  Determine the number of chars inside a buffer.
static size_t  NumberOfLines (const void *data, size_t length, unsigned int encoding=CP_ACP)
  Determine the number of line feeds inside a buffer.
template<typename ChType , ChType ch>
static size_t  RemoveCharacter (ChType *data, size_t len)
  Remove ch character from the passed buffer.
template<typename ChType , ChType ch>
static size_t  TruncateAtCharacter (ChType *data, size_t len)
  Truncate buffer at ch character.

Protected Types

enum   TextFileReaderError {
  ALL_OK, STREAM_INVALID_ARGUMENT, STREAM_ALREADY_OPEN, STREAM_NOT_OPEN,
  ERR_INVALID_FORMAT, STREAM_ERROR
}
  Internal processing error code. More...

Protected Member Functions

void  Detect (size_t len) const
  Detect the opened file encoding by analyzing the first "len" bytes of the file.
size_t  FillBuffer (size_t len, bool force=false, bool binary=false, bool detecting_encoding=false) const
  Read and cache len bytes from stream.
size_t  Filter (size_t pos) const
  Apply the selected open mode on the internal cache buffer up to position.
size_t  Filter (size_t pos, bool processBufferBoundaryCRLF) const
  Apply the selected open mode on the internal cache buffer up to position.
size_t  EnsureBufferContains (size_t len) const
  Ensure the buffer size can contain the passed length.
size_t  NumberOfChars (const void *, size_t) const
  Return the number of characters in the passed buffer depending on the current encoding.
size_t  ConvertNumUTF8CharsToNumBytes (const char *data, size_t num)
  Return the number of bytes corresponding to the num of UTF8 chars in passed buffer.
size_t  ConvertNumUTF16CharsToNumBytes (const MCHAR *data, size_t num)
  Return the number of bytes corresponding to the num of UTF16 chars in passed buffer.
size_t  ConvertNumCharsToNumBytes (const char *data, size_t num, unsigned int encoding)
  Return the number of bytes corresponding to the num of chars in passed buffer depending on the encoding.
size_t  NumberOfLines (const void *, size_t) const
  Calculate the total number of lines in the passed buffer.
template<typename ChType , typename CharLengthFunctor , int maxCharLength>
ChType *  ReadChar (size_t &charLengthT, bool peek, const CharLengthFunctor &CharLengthFunction) const
  INTERNAL FUNCTION.
template<typename ChType >
MaxString  MakeString (const ChType *data, size_t length, bool dontReturnEndingCRLF) const
size_t  Unread (const MaxString &string)
  Unread String.
size_t  SeekToEnd (long offset=0)
  INTERNAL FUNCTION.
size_t  SeekToAbsolute (long offset)
  Seek to an absolute point inside the text stream.
size_t  SeekFromCurrent (long offset)
  Advance "offset" characters.
bool  Open (BinaryStream *stream, unsigned int encoding=0, LineEndMode mode=Text, bool closeOnDelete=false)
  Open an abstract BinaryStream.

Protected Attributes

BinaryStream _stream
bool  _streamDelete
bool  _readCR
bool  _readLF
LineEndMode  _endOfLineMode
size_t  _detectSize
size_t  _readSize
EOFCharacterHandling  _eofCharacterHandling
TextFileReaderError  _error
unsigned int  _encoding
bool  _encodingDetected
BinaryStreamMemory _backbuffer
size_t  _ignoreBytes
size_t  _positionBytes
size_t  _positionChars
size_t  _line

Friends

class  ReaderWriter
class  CharBinaryStream
class  BinaryStreamMemory

Member Enumeration Documentation

enum TextFileReaderEncoding

Text file reading encoding.

Enumerator:
FAVOR_UTF8 
FOUND_BOM 
FOUND_COOKIE 
FLIPPED 
                                    {
                /* If the file's encoding cannot be detected, favor UTF-8.
                   By default, we favor ACP encoding. */
                FAVOR_UTF8 = 0x10000000,

                /* Found a BOM at the beginning of the file. */
                FOUND_BOM = 0x20000000,

                /* Found a cookie at the beginning of the file. */
                FOUND_COOKIE = 0x40000000,

                /* Found flipped UTF-16 data. */
                FLIPPED = 0x80000000
        };

enum EOFCharacterHandling

EOF character handling.

Enumerator:
DEFAULT_EOF_HANDLING 

Used by ReaderWriter to override default value.

STOP_READING_AT_EOF 

Reading of the file terminates at the EOF character. This is Reader in text mode.

FILTEROUT_EOF_CHARACTER 

EOF characters are filtered out. This is the ReaderWriter preferred mode.

IGNORE_EOF_CHARACTER 

EOF characters are read as regular characters. This is Reader in binary mode.
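
A minimal sketch of how the three non-default modes could differ, assuming the EOF character is Ctrl-Z (0x1A); this page does not state which character is meant. The enum and function names are local stand-ins, not SDK identifiers.

```cpp
#include <string>

// Local stand-in for the documented enumeration; only the meanings are mirrored.
enum EofHandling { StopReadingAtEof, FilterOutEofCharacter, IgnoreEofCharacter };

// Assumption: the EOF character is Ctrl-Z (0x1A), the classic text-mode marker.
const char kEofChar = '\x1A';

std::string ApplyEofHandling(const std::string& in, EofHandling mode)
{
    switch (mode) {
    case StopReadingAtEof:
        // Text-mode behavior: nothing from the first EOF character onward is read.
        // find() returns npos when there is none, so the whole string is kept.
        return in.substr(0, in.find(kEofChar));
    case FilterOutEofCharacter: {
        // Drop every EOF character but keep the text around it.
        std::string out;
        for (char c : in) {
            if (c != kEofChar) out += c;
        }
        return out;
    }
    case IgnoreEofCharacter:
    default:
        // Binary-mode behavior: the EOF character is ordinary data.
        return in;
    }
}
```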

enum TextFileReaderError [protected]

Internal processing error code.

Used by LastError to determine the proper message to generate.

Enumerator:
ALL_OK 

No errors.

STREAM_INVALID_ARGUMENT 

Invalid argument error.

STREAM_ALREADY_OPEN 

Stream already open.

STREAM_NOT_OPEN 

Stream not open.

ERR_INVALID_FORMAT 

Invalid format.

STREAM_ERROR 

Stream error.


Constructor & Destructor Documentation

Reader ( )

Default Constructor.

virtual ~Reader ( ) [virtual]

Destructor.


Member Function Documentation

void Detect ( size_t  len ) const [protected]

Detect the opened file encoding by analyzing the first "len" bytes of the file.

Parameters:
len Size of the buffer to use to detect the encoding
size_t FillBuffer ( size_t  len,
bool  force = false,
bool  binary = false,
bool  detecting_encoding = false 
) const [protected]

Read and cache len bytes from stream.

Parameters:
len
force
binary
size_t Filter ( size_t  pos ) const [protected]

Apply the selected open mode on the internal cache buffer up to position.

Parameters:
pos Position to stop filtering
Returns:
Size of the buffer filtered
size_t Filter ( size_t  pos,
bool  processBufferBoundaryCRLF 
) const [protected]

Apply the selected open mode on the internal cache buffer up to position.

Parameters:
pos Position to stop filtering
processBufferBoundaryCRLF If true, take care of the CR or LF that was read in the last FillBuffer call
Returns:
Size of the buffer filtered
size_t EnsureBufferContains ( size_t  len ) const [protected]

Ensure the buffer size can contain the passed length.

Parameters:
len The minimum size of the cache buffer
Returns:
The remaining buffer size
size_t NumberOfChars ( const void *  ,
size_t   
) const [protected]

Return the number of characters in the passed buffer depending on the current encoding.

Parameters:
Buffer to evaluate
Size of the passed buffer
Returns:
The total number of chars.
size_t ConvertNumUTF8CharsToNumBytes ( const char *  data,
size_t  num 
) [protected]

Return the number of bytes corresponding to the num of UTF8 chars in passed buffer.

Parameters:
data Buffer to evaluate
num Number of chars in the passed buffer
Returns:
The total number of bytes.
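
As an illustration of the conversion this function describes, the sketch below walks a UTF-8 buffer and counts the bytes taken by the first num characters. Utf8CharsToBytes is a hypothetical stand-alone function that assumes well-formed, NUL-terminated UTF-8; the SDK method may behave differently.

```cpp
#include <cstddef>

// Returns how many bytes the first `num` UTF-8 characters of a NUL-terminated
// buffer occupy. Assumes well-formed UTF-8; stops early at the terminator.
std::size_t Utf8CharsToBytes(const char* data, std::size_t num)
{
    std::size_t bytes = 0;
    while (num > 0 && data[bytes] != '\0') {
        ++bytes; // consume the lead byte
        // Consume the continuation bytes (10xxxxxx) that belong to it.
        while ((static_cast<unsigned char>(data[bytes]) & 0xC0) == 0x80) {
            ++bytes;
        }
        --num;
    }
    return bytes;
}
```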
size_t ConvertNumUTF16CharsToNumBytes ( const MCHAR *  data,
size_t  num 
) [protected]

Return the number of bytes corresponding to the num of UTF16 chars in passed buffer.

Parameters:
data Buffer to evaluate
num Number of chars in the passed buffer
Returns:
The total number of bytes.
size_t ConvertNumCharsToNumBytes ( const char *  data,
size_t  num,
unsigned int  encoding 
) [protected]

Return the number of bytes corresponding to the num of chars in passed buffer depending on the encoding.

Parameters:
data Buffer to evaluate
num Number of chars in the passed buffer
Returns:
The total number of bytes.
size_t NumberOfLines ( const void *  ,
size_t   
) const [protected]

Calculate the total number of lines in the passed buffer.

Parameters:
The buffer to evaluate the number of line
The size of the buffer
Returns:
The total number of lines.
ChType* ReadChar ( size_t &  charLengthT,
bool  peek,
const CharLengthFunctor &  CharLengthFunction 
) const [protected]

INTERNAL FUNCTION.

Used in the implementation of ReadChar() and ReadCharUTF32().

MaxString MakeString ( const ChType *  data,
size_t  length,
bool  dontReturnEndingCRLF 
) const [protected]
size_t Unread ( const MaxString &  string ) [protected]

Unread String.

Puts back a sequence of characters inside the buffer. The data will be re-read the next time you call read. This is used internally when parsing Max scripts.

Parameters:
string String to put back in the buffer
Returns:
Unread size in number of characters.
size_t SeekToEnd ( long  offset = 0 ) [protected]

INTERNAL FUNCTION.

Used by Seek. Seeks inside this text stream with the end as the reference point.

Parameters:
offset Offset characters from end of file to seek to
Returns:
Returns the absolute position of the text file. (in chars)
size_t SeekToAbsolute ( long  offset ) [protected]

Seek to an absolute point inside the text stream.

Parameters:
offset Offset, in characters, from the beginning of the file to seek to
Returns:
Returns the absolute position of the text file. (in chars)
size_t SeekFromCurrent ( long  offset ) [protected]

Advance "offset" characters.

Parameters:
offset Number of characters to advance from the current position
Returns:
Returns the absolute position of the text file. (in chars)
bool Open ( BinaryStream *  stream,
unsigned int  encoding = 0,
LineEndMode  mode = Text,
bool  closeOnDelete = false 
) [protected]

Open an abstract BinaryStream.

Parameters:
stream Opened stream the Reader uses
encoding A hint for the detection algorithm. Acceptable values are all code page numbers recognized by Windows. You can also specify FAVOR_UTF8 to cascade the code page detection. For example, if you specify "CP_ACP | FAVOR_UTF8", the detection algorithm will treat any non-UTF8 data as ACP.
mode Line-ending mode. The default is Text.
closeOnDelete If true, "stream" is deleted at the same time as this object.
See also:
TextFileReaderEncoding, LineEndMode
bool Open ( FILE *  file,
unsigned int  encoding = 0,
LineEndMode  mode = Text 
)

Wraps an ANSI C FILE pointer. This service lets a developer access a file opened with fopen while still using the SDK API.

Using this service means the developer does not have to worry about character encoding. The developer is responsible for closing the file when done.

Parameters:
file ANSI C FILE pointer
encoding A hint for the detection algorithm. Acceptable values are all code page numbers recognized by Windows. You can also specify FAVOR_UTF8 to cascade the code page detection. For example, if you specify "CP_ACP | FAVOR_UTF8", the detection algorithm will treat any non-UTF8 data as ACP. The flag is only used when no BOM is present or if the file is a new file. If the file has been opened with ccs=<encoding>, a BOM is present and this parameter is ignored.
mode Line-ending mode. The default is Text.
See also:
TextFileReaderEncoding, LineEndMode
Returns:
true if successful, false otherwise
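
The FAVOR_UTF8 cascade described above could look roughly like this: check whether the bytes form well-formed UTF-8 and, if they do not, fall back to the code page given in the hint. LooksLikeUtf8 and ChooseCodePage are hypothetical names, and the real detection algorithm is likely more involved (for example, it also considers BOMs and encoding cookies).

```cpp
#include <cstddef>

// Returns true if data[0..len) is structurally well-formed UTF-8. This is a
// structure check only: it does not reject overlong forms or surrogates.
bool LooksLikeUtf8(const unsigned char* data, std::size_t len)
{
    std::size_t i = 0;
    while (i < len) {
        const unsigned char b = data[i];
        std::size_t extra = 0;
        if (b < 0x80)                extra = 0;   // ASCII
        else if ((b & 0xE0) == 0xC0) extra = 1;   // 110xxxxx
        else if ((b & 0xF0) == 0xE0) extra = 2;   // 1110xxxx
        else if ((b & 0xF8) == 0xF0) extra = 3;   // 11110xxx
        else return false;                        // stray continuation or invalid lead
        if (extra > len - i - 1) return false;    // sequence runs past the buffer
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((data[i + k] & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

const unsigned int kCodePageAcp  = 0;      // same value as Windows CP_ACP
const unsigned int kCodePageUtf8 = 65001;  // same value as Windows CP_UTF8

// Cascade: prefer UTF-8 when the bytes validate, otherwise use the fallback.
unsigned int ChooseCodePage(const unsigned char* data, std::size_t len,
                            unsigned int fallbackCodePage)
{
    return LooksLikeUtf8(data, len) ? kCodePageUtf8 : fallbackCodePage;
}
```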
bool Open ( HANDLE  fileHandle,
unsigned int  encoding = 0,
LineEndMode  mode = Text 
)

Wrap a Win32 file handle.

Parameters:
fileHandle File Handle
encoding A hint for the detection algorithm. Acceptable values are all code page numbers recognized by Windows. You can also specify FAVOR_UTF8 to cascade the code page detection. For example, if you specify "CP_ACP | FAVOR_UTF8", the detection algorithm will treat any non-UTF8 data as ACP.
mode Line-ending mode. The default is Text.
See also:
TextFileReaderEncoding, LineEndMode
Returns:
true if successful, false otherwise
bool Open ( const MCHAR *  fileName,
unsigned int  encoding = 0,
LineEndMode  mode = Text 
)

Open a file using a file name.

Parameters:
fileName File name to open. If file does not exist, it will be created.
encoding A hint for the detection algorithm. Acceptable values are all code page numbers recognized by Windows. You can also specify FAVOR_UTF8 to cascade the code page detection. For example, if you specify "CP_ACP | FAVOR_UTF8", the detection algorithm will treat any non-UTF8 data as ACP.
mode Line-ending mode. The default is Text.
See also:
TextFileReaderEncoding, LineEndMode
Returns:
true if successful, false otherwise
bool Open ( const MaxSDK::Util::MaxString &  fileName,
unsigned int  encoding = 0,
LineEndMode  mode = Text 
)

Open a file using a file name.

Parameters:
fileName File name to open. If file does not exist, it will be created.
encoding A hint for the detection algorithm. Acceptable values are all code page numbers recognized by Windows. You can also specify FAVOR_UTF8 to cascade the code page detection. For example, if you specify "CP_ACP | FAVOR_UTF8", the detection algorithm will treat any non-UTF8 data as ACP.
mode Line-ending mode. The default is Text.
See also:
TextFileReaderEncoding, LineEndMode
Returns:
true if successful, false otherwise
void Close ( )

Close the underlying stream and free any intermediate data.

MaxString LastError ( ) const

Returns the last error returned by the BinaryStream.

Returns:
Error string
unsigned int Encoding ( ) const

Returns the current encoding of this file.

See also:
TextFileReaderEncoding. The actual code page can be retrieved with "Encoding() & MSDE_CP_MASK".
Returns:
Returns the actual encoding found
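
Because the TextFileReaderEncoding flags occupy the top four bits of the value, the code page and the flags can be separated with bit masks. The sketch below assumes a mask of 0x0FFFFFFF; the actual value of MSDE_CP_MASK is not given on this page, so treat kAssumedCodePageMask as a placeholder. CodePageOf and HasFlag are hypothetical helpers.

```cpp
// Flag values as documented for TextFileReaderEncoding.
const unsigned int FAVOR_UTF8   = 0x10000000;
const unsigned int FOUND_BOM    = 0x20000000;
const unsigned int FOUND_COOKIE = 0x40000000;
const unsigned int FLIPPED      = 0x80000000;

// Assumption: the code page lives in the bits below the four flag bits.
// The real MSDE_CP_MASK value is not shown in this reference page.
const unsigned int kAssumedCodePageMask = 0x0FFFFFFF;

// Extracts the code page number from a combined encoding value.
unsigned int CodePageOf(unsigned int encoding)
{
    return encoding & kAssumedCodePageMask;
}

// Tests whether a given detection flag is set in the encoding value.
bool HasFlag(unsigned int encoding, unsigned int flag)
{
    return (encoding & flag) != 0;
}
```

With the SDK, the same idea would be applied to the value returned by Encoding().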
LineEndMode Mode ( ) const

Determine how this reader handles line ending.

Returns:
LineEndMode
void SetReadBufferSize ( size_t  readSize )

Set the read buffer size.

A larger buffer generally gives better read performance.

Parameters:
readSize Size of the buffer to read. Default 4096
size_t GetReadBufferSize ( ) const

Returns the read buffer size.

void SetDetectSize ( size_t  detectSize )

Set the default size of the buffer used to validate encoding.

This parameter is used internally when calling Detect.

Parameters:
detectSize Size of the buffer used when detecting the current character type. Default 65536
size_t GetDetectSize ( ) const

Returns the detect buffer size.

size_t NumberOfChars ( ) const

Return the number of characters in the file.

Returns:
The total number of chars.
size_t NumberOfLines ( ) const

Calculate the total number of lines in the file.

Returns:
The total number of lines.
void SetEOFCharacterHandling ( EOFCharacterHandling  )

Set the EOF character handling method and refilter the buffer.

See also:
EOFCharacterHandling
EOFCharacterHandling GetEOFCharacterHandling ( ) const

Get the EOF character handling method.

See also:
EOFCharacterHandling
virtual Char ReadChar ( bool  peek = false ) const [virtual]

Reads a single char.

Parameters:
peek If true, reads a char without moving the internal pointer to the next char. The default is false, which moves to the next character.
Returns:
The character read.

Implements BaseTextReader.

virtual unsigned int ReadCharUTF32 ( bool  peek = false ) const [virtual]

Reads a single char and returns its UTF32 representation.

Parameters:
peek If true, reads a char without moving the internal pointer to the next char. The default is false, which moves to the next character.
Returns:
The UTF32 char representation

Implements BaseTextReader.
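
For UTF-16 input, producing a UTF-32 value means combining a surrogate pair into a single code point. A minimal sketch of that step, using a hypothetical DecodeUtf16 helper that is independent of the SDK:

```cpp
#include <cstddef>

// Decodes the UTF-16 character starting at data[pos] and reports how many
// 16-bit units it used. Unpaired surrogates are returned unchanged.
char32_t DecodeUtf16(const char16_t* data, std::size_t len, std::size_t pos,
                     std::size_t& unitsUsed)
{
    const char32_t first = data[pos];
    unitsUsed = 1;
    const bool isHigh = first >= 0xD800 && first <= 0xDBFF;
    if (isHigh && pos + 1 < len) {
        const char32_t second = data[pos + 1];
        if (second >= 0xDC00 && second <= 0xDFFF) {
            // Combine the two 10-bit halves and add the supplementary-plane offset.
            unitsUsed = 2;
            return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
        }
    }
    return first;
}
```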

virtual MaxString ReadChars ( size_t  nchars ) const [virtual]

Reads characters from the file.

Parameters:
nchars Stop reading after 'nchars' characters.
Returns:
The characters read.

Implements BaseTextReader.

virtual MaxString ReadLine ( size_t  nchars = (size_t)-1,
bool  dontReturnEOL = false 
) const [virtual]

Reads a line from the file, or nchars characters, whichever comes first.

Parameters:
nchars Stop reading after 'nchars' characters even if the EOL was not found.
dontReturnEOL By default, this function returns the line including its end-of-line character(s). Set "dontReturnEOL" to true to exclude them.

Implements BaseTextReader.
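
The interaction of nchars and dontReturnEOL can be sketched over an in-memory string. ReadLineFrom is a hypothetical helper; it treats one byte as one character and only recognizes "\n" and "\r\n", which is simpler than what the Reader does.

```cpp
#include <cstddef>
#include <string>

// Reads one line from `text` starting at `pos` and advances `pos` past it.
// Stops after `nchars` characters even if no end-of-line was found.
std::string ReadLineFrom(const std::string& text, std::size_t& pos,
                         std::size_t nchars = static_cast<std::size_t>(-1),
                         bool dontReturnEOL = false)
{
    std::string line;
    while (pos < text.size() && line.size() < nchars) {
        const char c = text[pos++];
        line += c;
        if (c == '\n') {
            if (dontReturnEOL) {
                line.pop_back();                       // drop '\n'
                if (!line.empty() && line.back() == '\r') {
                    line.pop_back();                   // drop the '\r' of "\r\n"
                }
            }
            break;
        }
    }
    return line;
}
```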

virtual MaxString ReadChunk ( size_t  len,
bool  dontReturnLastEOL = false 
) const [virtual]

Reads up to "len" bytes from the file and converts them to a unicode-compliant string.

Parameters:
len Number of bytes to take out of the underlying stream.
dontReturnLastEOL Determine if this function will trim the last EOL sequence.
virtual MaxString ReadFull ( ) const [virtual]

Reads the file in a single operation and returns it in a single string object.

Returns:
The full stream content
virtual size_t PositionBytes ( ) const [virtual]

Get the number of bytes read so far.

size_t Position ( ) const [virtual]

Get the number of characters read so far.

Implements BaseTextReader.

size_t LineNumber ( ) const [virtual]

Get the current line number.

Implements BaseTextReader.

virtual size_t Seek ( long  offset,
int  origin 
) [virtual]

Seek inside the stream.

Parameters:
offset Offset to seek to. Seek operations are done in number of characters (not bytes).
origin The reference point for the move. Origin can be one of the following:
* SEEK_CUR Current position of file pointer.
* SEEK_END End of file.
* SEEK_SET Beginning of file.
Returns:
Returns the absolute position of the text file. (in chars)

Implements BaseTextReader.
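
Below is a small sketch of how the three origins map to an absolute character position. ResolveSeek is a hypothetical helper, and clamping the result to the valid range is an assumption; this page does not say how out-of-range offsets are handled.

```cpp
#include <cstddef>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END

// Resolves (offset, origin) to an absolute character position, clamped to
// [0, totalChars]. Positions are counted in characters, not bytes.
std::size_t ResolveSeek(long offset, int origin,
                        std::size_t currentChar, std::size_t totalChars)
{
    long long base = 0;
    switch (origin) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = static_cast<long long>(currentChar); break;
    case SEEK_END: base = static_cast<long long>(totalChars); break;
    default:       return currentChar;   // unknown origin: stay put
    }
    long long target = base + offset;
    const long long last = static_cast<long long>(totalChars);
    if (target < 0) target = 0;          // assumed: clamp before the start
    if (target > last) target = last;    // assumed: clamp past the end
    return static_cast<std::size_t>(target);
}
```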

virtual bool IsEndOfFile ( ) const [virtual]

Returns true if the file is at the end.

Implements BaseTextReader.

virtual bool IsFileOpen ( ) const [virtual]

Returns true if file is open.

Implements BaseTextReader.

virtual size_t UnreadChar ( const Char &  c ) [virtual]

Unread a character.

Put back a character inside the buffer. The data will be re-read next time you call read.

Parameters:
c Char to put back in the buffer
Returns:
Number of characters written.

Implements BaseTextReader.
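
The peek flag of ReadChar and UnreadChar together give one-character lookahead, which is handy when writing a parser. The toy class below mimics those semantics over an in-memory string; TinyCharSource is hypothetical and does not use the SDK.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A toy character source with peek and unread over an in-memory UTF-32 string.
class TinyCharSource
{
public:
    explicit TinyCharSource(const std::u32string& text) : _text(text), _pos(0) {}

    // Returns the next character; with peek == true the position is unchanged.
    // Returns 0 at end of input.
    char32_t ReadChar(bool peek = false)
    {
        if (!_pushedBack.empty()) {
            const char32_t c = _pushedBack.back();
            if (!peek) _pushedBack.pop_back();
            return c;
        }
        if (_pos >= _text.size()) return 0;
        const char32_t c = _text[_pos];
        if (!peek) ++_pos;
        return c;
    }

    // Puts a character back so the next read returns it again.
    // Returns 1, the number of characters put back.
    std::size_t UnreadChar(char32_t c)
    {
        _pushedBack.push_back(c);
        return 1;
    }

private:
    std::u32string _text;
    std::size_t _pos;
    std::vector<char32_t> _pushedBack;  // most recently unread character is last
};
```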

static bool Detect ( void *  data,
size_t  len,
unsigned int &  encoding,
size_t *  ignoreBytes 
) [static]

Detect the encoding of the passed buffer.

Parameters:
data Buffer to detect the encoding
len Size of the passed buffer
encoding (in/out) On input, tells the detector what to expect. On output, contains what the detector found.
ignoreBytes (out) On output, tells the caller how many bytes it must ignore at the beginning of the file because of the BOM.
Returns:
Returns true if the encoding was formally detected, or false if it was guessed.
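
A sketch of the BOM part of such a detector, showing how the number of bytes to ignore falls out of the check. DetectBom and BomKind are hypothetical names; UTF-32 BOMs, encoding cookies, and content-based guessing are left out.

```cpp
#include <cstddef>

enum BomKind { NoBom, Utf8Bom, Utf16LittleEndianBom, Utf16BigEndianBom };

// Inspects the first bytes of a buffer for a byte-order mark. On return,
// ignoreBytes holds the number of leading bytes the caller should skip.
BomKind DetectBom(const unsigned char* data, std::size_t len, std::size_t& ignoreBytes)
{
    ignoreBytes = 0;
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        ignoreBytes = 3;
        return Utf8Bom;
    }
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        ignoreBytes = 2;
        return Utf16LittleEndianBom;
    }
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        ignoreBytes = 2;
        return Utf16BigEndianBom;   // byte-swapped relative to little-endian hosts
    }
    return NoBom;                   // the caller would have to guess the encoding
}
```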
static size_t NumberOfChars ( const void *  data,
size_t  len,
unsigned int  encoding = CP_ACP 
) [static]

Determine the number of chars inside a buffer.

It's more complex than just strlen or wcslen. Those two functions return the number of char or WCHAR entries. This function returns the number of characters (or symbols).

Parameters:
data Buffer containing a string to count the number of symbols
len Size of the buffer to check
encoding Encoding to use to count the number of symbols
Returns:
Number of symbols
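
To show why this differs from wcslen, the sketch below counts characters in a UTF-16 buffer, where one character outside the Basic Multilingual Plane takes two 16-bit units. CountUtf16Characters is a hypothetical helper, not the SDK implementation.

```cpp
#include <cstddef>

// Counts characters (code points) in a UTF-16 buffer of `len` 16-bit units.
// A valid surrogate pair counts as one character.
std::size_t CountUtf16Characters(const char16_t* data, std::size_t len)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const bool isHigh = data[i] >= 0xD800 && data[i] <= 0xDBFF;
        const bool nextIsLow = i + 1 < len &&
                               data[i + 1] >= 0xDC00 && data[i + 1] <= 0xDFFF;
        if (isHigh && nextIsLow) {
            ++i;   // skip the low half of the pair
        }
        ++count;
    }
    return count;
}
```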
static size_t NumberOfLines ( const void *  data,
size_t  length,
unsigned int  encoding = CP_ACP 
) [static]

Determine the number of line feeds inside a buffer.

Parameters:
data Buffer in which '\n' characters are counted.
length Length of data (in MCHAR)
encoding Encoding of "data". Can be any valid encoding, e.g. MSDE_CP_UTF16, CP_UTF8, CP_ACP, etc.
Returns:
Number of line feeds detected
static size_t RemoveCharacter ( ChType *  data,
size_t  len 
) [static]

Remove ch character from the passed buffer.

Parameters:
data Buffer from which ch characters are to be removed.
len Length of data (in MCHAR)
static size_t TruncateAtCharacter ( ChType *  data,
size_t  len 
) [static]

Truncate buffer at ch character.

Parameters:
data Buffer to validate.
len Length of data (in MCHAR)
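
RemoveCharacter and TruncateAtCharacter both take the character as a template argument. A sketch of how such templates could be written with the standard library follows; the Sketch suffix marks these as illustrations, and returning the resulting length is an assumption, since this page does not describe the return value.

```cpp
#include <algorithm>
#include <cstddef>

// Removes every occurrence of `ch` in place and returns the new length
// (assumed meaning of the return value).
template <typename ChType, ChType ch>
std::size_t RemoveCharacterSketch(ChType* data, std::size_t len)
{
    ChType* newEnd = std::remove(data, data + len, ch);
    return static_cast<std::size_t>(newEnd - data);
}

// Returns the length of the buffer up to, but not including, the first `ch`,
// or `len` if `ch` does not occur (assumed meaning of the return value).
template <typename ChType, ChType ch>
std::size_t TruncateAtCharacterSketch(const ChType* data, std::size_t len)
{
    const ChType* hit = std::find(data, data + len, ch);
    return static_cast<std::size_t>(hit - data);
}
```

For example, RemoveCharacterSketch<char, '\x1A'>(buf, len) would strip every Ctrl-Z byte from a char buffer.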


Member Data Documentation

bool _readCR [mutable, protected]
bool _readLF [mutable, protected]
unsigned int _encoding [mutable, protected]
size_t _ignoreBytes [mutable, protected]
size_t _positionBytes [mutable, protected]
size_t _positionChars [mutable, protected]
size_t _line [mutable, protected]