
Class Description

This class splits a given character array (e.g. a String or an AString object), which contains data separated by a delimiter character, into tokens of type Substring.

After an instance of this class has been constructed, three methods are available:

HasNext:
Indicates whether further tokens are available.
Next:
Sets the Substring in field Actual to reference the next token and returns it.
With each call to Next, a different delimiter can be provided, which then serves as the delimiter for this and all subsequent tokens.
By default, the returned token is trimmed using the white space characters currently defined in field Whitespaces.
GetRest:
Like Next, but returns the complete remaining region without searching for further delimiters (and tokens).
After this method has been invoked, HasNext() returns false.
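The contract of these three methods can be sketched in plain C# over a System.String. This is not ALib's implementation: the local functions HasNext, Next and GetRest below are hypothetical stand-ins, a "nulled" Substring is modeled as null, and trimming uses String.Trim() instead of field Whitespaces.

```csharp
using System;

// Sketch only (not ALib code): mimics HasNext/Next/GetRest on a System.String.
// Assumptions: a nulled Substring is modeled as null; tokens are trimmed with String.Trim().
string src = "test;  abc ; 1,2 , 3 ; xyz ; including;separator";
char delim = ';';
int pos = 0;                        // start of the not-yet-tokenized rest; -1 once consumed

bool HasNext() => pos >= 0;

string? Next()
{
    if (pos < 0) return null;       // no further token: return the "nulled" value
    int end = src.IndexOf(delim, pos);
    string token;
    if (end < 0) { token = src.Substring(pos); pos = -1; }               // last token
    else         { token = src.Substring(pos, end - pos); pos = end + 1; }
    return token.Trim();
}

string? GetRest()
{
    if (pos < 0) return null;
    string rest = src.Substring(pos).Trim();  // no search for further delimiters
    pos = -1;                                 // afterwards HasNext() returns false
    return rest;
}

Console.WriteLine(Next());    // test
Console.WriteLine(Next());    // abc
Console.WriteLine(Next());    // 1,2 , 3
Console.WriteLine(Next());    // xyz
Console.WriteLine(GetRest()); // including;separator
Console.WriteLine(HasNext()); // False
```

In this sketch a trailing delimiter yields a final empty token; how ALib's Tokenizer treats that case is not described here.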

After a token has been retrieved, it may be modified using the interface of class Substring. (In other words, the tokenizer does not rely on the bounds of the current token when retrieving the next one.) Consequently, a token may be tokenized recursively by creating a different instance of class Tokenizer and passing the returned token as its input.

If created or set using a reference of class AString, the buffer of the AString is not copied. This allows efficient operations on sub-strings of class AString. However, the source string must not be changed (or only in a controlled way) while the Tokenizer instance is in use.

Objects of this class can be reused by freshly initializing them using one of the overloaded Set methods.

Sample code:
The following code sample shows how to tokenize a string, including using one nested tokenizer:

    // data string to tokenize
    AString data= new AString( "test;  abc ; 1,2 , 3 ; xyz ; including;separator" );
 
    // create tokenizer on data with ';' as delimiter
    Tokenizer tknzr= new Tokenizer( data, ';' );
 
    // read tokens
 
    System.Console.WriteLine( tknzr.Next() ); // will print "test"
    System.Console.WriteLine( tknzr.Next() ); // will print "abc"
    System.Console.WriteLine( tknzr.Next() ); // will print "1,2 , 3"
 
    // tokenize actual (third) token (nested tokenizer)
    Tokenizer subTknzr= new Tokenizer( tknzr.Actual,  ',');
    System.Console.Write( subTknzr.Next().ToString() );
 
    while( subTknzr.HasNext() )
        System.Console.Write( "~" + subTknzr.Next().ToString() );
 
    System.Console.WriteLine();
 
    // continue with the main tokenizer
    System.Console.WriteLine( tknzr.Next().ToString() ); // will print "xyz"
 
    // grab the rest, as we know that the last token might include our separator character
    System.Console.WriteLine( tknzr.GetRest().ToString() ); // will print "including;separator"

The output will be:

    test
    abc
    1,2 , 3
    1~2~3
    xyz
    including;separator

Public Fields
Substring	Actual = new Substring()

Substring	Rest = new Substring()

char[]	Whitespaces = CString.DefaultWhitespaces

Public Methods
	Tokenizer ()

	Tokenizer (AString src, char delim, bool skipEmptyTokens=false)

	Tokenizer (String src, char delim, bool skipEmptyTokens=false)

	Tokenizer (Substring src, char delim, bool skipEmptyTokens=false)

Substring	GetRest (Whitespaces trimming=lang.Whitespaces.Trim)

bool	HasNext ()

Substring	Next (Whitespaces trimming=lang.Whitespaces.Trim, char newDelim='\0')

void	Set (AString src, char delim, bool skipEmptyTokens=false)

void	Set (String src, char delim, bool skipEmptyTokens=false)

void	Set (Substring src, char delim, bool skipEmptyTokens=false)

override String	ToString ()

Protected Fields
char	delim = '\0'

bool	skipEmptyTokens

Constructor & Destructor Documentation

◆ Tokenizer() [1/4]

Tokenizer ( )

inline

Constructs an empty tokenizer. Before use, it needs to be initialized by invoking one of the Set methods.

◆ Tokenizer() [2/4]

Tokenizer	(	String	src,
		char	delim,
		bool	skipEmptyTokens = `false`
	)

inline

Constructs a tokenizer to work on a given String.

Parameters

src	The string to be tokenized.
delim	The delimiter that separates the tokens. Can be changed with each call to Next.
skipEmptyTokens	If `true`, empty tokens are omitted. Optional and defaults to `false`.

◆ Tokenizer() [3/4]

Tokenizer	(	AString	src,
		char	delim,
		bool	skipEmptyTokens = `false`
	)

inline

Constructs a tokenizer to work on a given AString.

Parameters

src	The string to be tokenized.
delim	The delimiter that separates the tokens. Can be changed with each call to Next.
skipEmptyTokens	If `true`, empty tokens are omitted. Optional and defaults to `false`.

◆ Tokenizer() [4/4]

Tokenizer	(	Substring	src,
		char	delim,
		bool	skipEmptyTokens = `false`
	)

inline

Constructs a tokenizer to work on a given Substring.

Parameters

src	The string to be tokenized.
delim	The delimiter that separates the tokens. Can be changed with each call to Next.
skipEmptyTokens	If `true`, empty tokens are omitted. Optional and defaults to `false`.

Member Function Documentation

◆ GetRest()

Substring GetRest ( Whitespaces trimming = lang.Whitespaces.Trim )

inline

Returns the currently remaining string (without searching for further delimiter characters).
After this call HasNext will return false and Next will return a nulled Substring.

Parameters

trimming	Determines whether the token is trimmed with respect to the white space characters defined in field Whitespaces. Defaults to Whitespaces.Trim.

Returns: The rest of the original source string that has not yet been returned by Next().

◆ HasNext()

bool HasNext ( )

inline

If this returns true, a call to Next will succeed and return a Substring that is not nulled.

Returns: true if a next token is available.

◆ Next()

Substring Next	(	Whitespaces	trimming = `lang.Whitespaces.Trim`,
		char	newDelim = `'\0'`
	)

inline

Returns the next token, which is afterwards also available through field Actual. If no further token was available, the returned Substring will be 'nulled' (see Substring.IsNull). To prevent this, the availability of a next token should be checked using method HasNext().

For clarification, see the explanation and sample code in this class's documentation.

Parameters

trimming	Determines whether the token is trimmed with respect to the white space characters defined in field Whitespaces. Defaults to Whitespaces.Trim.
newDelim	The delimiter that separates the tokens. Defaults to '\0', which keeps the current delimiter. A new delimiter can be provided with each call.

Returns: The next token (also available through field Actual), or a nulled Substring if no further token was available.
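The effect of newDelim can be illustrated with a small plain-C# sketch. It is not ALib code: the local function Next and its state variables are hypothetical, and the sketch assumes, as described for this class, that a newDelim other than '\0' becomes the delimiter for the current and all following tokens.

```csharp
using System;

// Sketch only (not ALib code): models Next(newDelim) on a System.String.
// A newDelim other than '\0' replaces the current delimiter for this and later tokens.
string src = "name=Joe;age=42;city=Berlin";
char delim = ';';
int pos = 0;                                   // -1 once the source is consumed

string? Next(char newDelim = '\0')
{
    if (newDelim != '\0') delim = newDelim;    // '\0' keeps the current delimiter
    if (pos < 0) return null;                  // "nulled" result
    int end = src.IndexOf(delim, pos);
    string token = end < 0 ? src.Substring(pos) : src.Substring(pos, end - pos);
    pos = end < 0 ? -1 : end + 1;
    return token.Trim();
}

Console.WriteLine(Next('='));   // name
Console.WriteLine(Next(';'));   // Joe
Console.WriteLine(Next());      // age=42   (';' is still in effect)
```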

◆ Set() [1/3]

void Set	(	AString	src,
		char	delim,
		bool	skipEmptyTokens = `false`
	)

inline

Sets the tokenizer to work on the given AString.

Parameters

src	The string to be tokenized.
delim	The delimiter that separates the tokens. Can be changed with each call to Next.
skipEmptyTokens	If `true`, empty tokens are omitted. Optional and defaults to `false`.

◆ Set() [2/3]

void Set	(	String	src,
		char	delim,
		bool	skipEmptyTokens = `false`
	)

inline

Sets the tokenizer to work on the given String, using the given delimiter.

Parameters

src	The string to be tokenized.
delim	The delimiter that separates the tokens. Can be changed with each call to Next.
skipEmptyTokens	If `true`, empty tokens are omitted. Optional and defaults to `false`.

◆ Set() [3/3]

void Set	(	Substring	src,
		char	delim,
		bool	skipEmptyTokens = `false`
	)

inline

Sets the tokenizer to work on the given Substring.

Parameters

src	The string to be tokenized.
delim	The delimiter that separates the tokens. Can be changed with each call to Next.
skipEmptyTokens	If `true`, empty tokens are omitted. Optional and defaults to `false`.

◆ ToString()

override String ToString ( )

inline

This is for debugging purposes. For example, it enables the MonoDevelop IDE to display object descriptions in the debugger.

Returns: A human readable string representation of this object.

Member Data Documentation

◆ Actual

Substring Actual = new Substring()

The current token, as returned by the most recent invocation of Next() or GetRest(). This field may be manipulated at any time, for example by changing its whitespace characters definition.

◆ delim

char delim = '\0'

protected

The most recently set delimiter used by default for the next token extraction.

◆ Rest

Substring Rest = new Substring()

A Substring that represents the part of the underlying data that has not yet been tokenized.

◆ skipEmptyTokens

bool skipEmptyTokens

protected

If true, empty tokens are omitted.
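For comparison, .NET's String.Split offers a similar switch, StringSplitOptions.RemoveEmptyEntries. The hypothetical helper Tokenize below uses it to approximate the effect of skipEmptyTokens. It is not ALib code, and it checks for emptiness before trimming, which this documentation does not specify for Tokenizer.

```csharp
using System;
using System.Linq;

// Sketch only (not ALib code): approximates skipEmptyTokens with String.Split.
// Assumption: a token counts as empty if it is empty before trimming.
static string[] Tokenize(string src, char delim, bool skipEmptyTokens = false)
{
    var options = skipEmptyTokens ? StringSplitOptions.RemoveEmptyEntries
                                  : StringSplitOptions.None;
    return src.Split(new[] { delim }, options)
              .Select(t => t.Trim())   // trim each token, like the default Whitespaces.Trim
              .ToArray();
}

Console.WriteLine(string.Join("|", Tokenize("a;;b", ';')));        // a||b
Console.WriteLine(string.Join("|", Tokenize("a;;b", ';', true)));  // a|b
```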

◆ Whitespaces

char [] Whitespaces = CString.DefaultWhitespaces

The white space characters used to trim the tokens. Defaults to CString.DefaultWhitespaces.

The documentation for this class was generated from the following file:

Tokenizer.cs
