ALox  V. 2402 R. 0
Tokenizer Class Reference

Class Description


This class splits a given character array (e.g. a const char* or an AString object), which contains data separated by a delimiter character, into tokens of type Substring.

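After an instance of this class is constructed, three methods are available:

HasNext: Indicates whether a further token is available.
Next: Returns the next token, which afterwards is also available through field Actual. A different delimiter may be provided with each call.
GetRest: Returns the complete remaining string without searching for further delimiters. After this call, HasNext returns false.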

After a token has been retrieved, it may be modified using the interface of class Substring. (In other words, the tokenizer does not rely on the bounds of the current token when retrieving the next one.) Consequently, a token may itself be tokenized by creating a different instance of class Tokenizer and providing the returned token as input.

If created or set using a reference of class AString, the buffer of the AString is not copied. This allows efficient operations on sub-strings of class AString. However, the source string must not be changed (or only in a controlled way) while the Tokenizer instance is in use.

Objects of this class can be reused by freshly initializing them using one of the overloaded Set methods.
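The reuse described above might look like the following minimal sketch. It assumes that the ALox C# library is referenced and that class Tokenizer lives in namespace cs.aworx.lib.strings.util; the class name TokenizerReuseSketch and the input strings are made up for illustration.

```csharp
using System;
using cs.aworx.lib.strings.util; // assumed namespace of class Tokenizer

public static class TokenizerReuseSketch
{
    public static void Main()
    {
        // First use: tokenize a semicolon-separated string.
        Tokenizer tknzr = new Tokenizer( "a;b;c", ';' );
        while( tknzr.HasNext() )
            Console.WriteLine( tknzr.Next().ToString() );

        // Reuse the same instance: Set replaces the source and the delimiter.
        tknzr.Set( "x,y,z", ',' );
        while( tknzr.HasNext() )
            Console.WriteLine( tknzr.Next().ToString() );
    }
}
```

Reusing one instance avoids allocating a new Tokenizer for every string, which may matter in loops that parse many short inputs.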

Sample code:
The following code sample shows how to tokenize a string, including using one nested tokenizer:

// data string to tokenize
AString data= new AString( "test; abc ; 1,2 , 3 ; xyz ; including;separator" );
// create tokenizer on data with ';' as delimiter
Tokenizer tknzr= new Tokenizer( data, ';' );
// read tokens
System.Console.WriteLine( tknzr.Next() ); // will print "test"
System.Console.WriteLine( tknzr.Next() ); // will print "abc"
System.Console.WriteLine( tknzr.Next() ); // will print "1,2 , 3"
// tokenize actual (third) token (nested tokenizer)
Tokenizer subTknzr= new Tokenizer( tknzr.Actual, ',');
System.Console.Write( subTknzr.Next().ToString() );
while( subTknzr.HasNext() )
    System.Console.Write( "~" + subTknzr.Next().ToString() );
System.Console.WriteLine();
// continue with the main tokenizer
System.Console.WriteLine( tknzr.Next().ToString() ); // will print "xyz"
// grab the rest, as we know that the last token might include our separator character
System.Console.WriteLine( tknzr.GetRest().ToString() ); // will print "including;separator"

The output will be:

test
abc
1,2 , 3
1~2~3
xyz
including;separator

Public Fields

Substring Actual = new Substring()
 
Substring Rest = new Substring()
 
char[] Whitespaces = CString.DefaultWhitespaces
 

Public Methods

 Tokenizer ()
 
 Tokenizer (AString src, char delim, bool skipEmptyTokens=false)
 
 Tokenizer (String src, char delim, bool skipEmptyTokens=false)
 
 Tokenizer (Substring src, char delim, bool skipEmptyTokens=false)
 
Substring GetRest (Whitespaces trimming=lang.Whitespaces.Trim)
 
bool HasNext ()
 
Substring Next (Whitespaces trimming=lang.Whitespaces.Trim, char newDelim='\0')
 
void Set (AString src, char delim, bool skipEmptyTokens=false)
 
void Set (String src, char delim, bool skipEmptyTokens=false)
 
void Set (Substring src, char delim, bool skipEmptyTokens=false)
 
override String ToString ()
 

Protected Fields

char delim = '\0'
 
bool skipEmptyTokens
 

Constructor & Destructor Documentation

◆ Tokenizer() [1/4]

Tokenizer ( )
inline

Constructs an empty tokenizer. Before use, method Set needs to be invoked to initialize it.

◆ Tokenizer() [2/4]

Tokenizer ( String  src,
char  delim,
bool  skipEmptyTokens = false 
)
inline

Constructs a tokenizer to work on a given String.

Parameters
src: The string to be tokenized.
delim: The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens: If true, empty tokens are omitted. Optional and defaults to false.

◆ Tokenizer() [3/4]

Tokenizer ( AString  src,
char  delim,
bool  skipEmptyTokens = false 
)
inline

Constructs a tokenizer to work on a given AString.

Parameters
src: The string to be tokenized.
delim: The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens: If true, empty tokens are omitted. Optional and defaults to false.

◆ Tokenizer() [4/4]

Tokenizer ( Substring  src,
char  delim,
bool  skipEmptyTokens = false 
)
inline

Constructs a tokenizer to work on a given Substring.

Parameters
src: The string to be tokenized.
delim: The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens: If true, empty tokens are omitted. Optional and defaults to false.
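To illustrate the effect of parameter skipEmptyTokens, the following sketch tokenizes the same input twice and prints each token in brackets so that empty tokens become visible. It assumes that the ALox C# library is referenced and that class Tokenizer lives in namespace cs.aworx.lib.strings.util; the class SkipEmptyTokensSketch, the helper Dump and the input string are hypothetical.

```csharp
using System;
using cs.aworx.lib.strings.util; // assumed namespace of class Tokenizer

public static class SkipEmptyTokensSketch
{
    // Prints every token of 'src' in brackets on a single line.
    static void Dump( String src, bool skipEmptyTokens )
    {
        Tokenizer tknzr = new Tokenizer( src, ';', skipEmptyTokens );
        while( tknzr.HasNext() )
            Console.Write( "[" + tknzr.Next().ToString() + "]" );
        Console.WriteLine();
    }

    public static void Main()
    {
        String src = "a;;b";  // two adjacent delimiters enclose an empty token
        Dump( src, false );   // default: empty tokens are returned
        Dump( src, true  );   // empty tokens are omitted
    }
}
```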

Member Function Documentation

◆ GetRest()

Substring GetRest ( Whitespaces  trimming = lang.Whitespaces.Trim)
inline

Returns the currently remaining string (without searching for further delimiter characters).
After this call HasNext will return false and Next will return a nulled Substring.

Parameters
trimming: Determines whether the token is trimmed with respect to the white space characters defined in field Whitespaces. Defaults to Whitespaces.Trim.
Returns
The rest of the original source string that has not yet been returned by Next().

◆ HasNext()

bool HasNext ( )
inline

If this returns true, a call to Next will be successful and will return a Substring which is not nulled.

Returns
true if a next token is available.

◆ Next()

Substring Next ( Whitespaces  trimming = lang.Whitespaces.Trim,
char  newDelim = '\0' 
)
inline

Returns the next token, which is afterwards also available through field Actual. If no further token was available, the returned Substring will be 'nulled' (see Substring.IsNull). To prevent this, the availability of a next token should be checked using method HasNext().

For clarification, see the explanation and sample code in this class's documentation.

Parameters
trimming: Determines whether the token is trimmed with respect to the white space characters defined in field Whitespaces. Defaults to Whitespaces.Trim.
newDelim: The delimiter that separates the tokens. Defaults to 0, which keeps the current delimiter intact. A new delimiter can be provided for every next token.
Returns
The next token as a Substring, or a nulled Substring if no further token was available.
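Changing the delimiter between tokens can be sketched as follows: the first token is read with the delimiter given to the constructor, the remaining ones with a different delimiter passed as newDelim. The sketch assumes that the ALox C# library is referenced, that class Tokenizer lives in namespace cs.aworx.lib.strings.util, and that the enumeration Whitespaces is found in namespace cs.aworx.lib.lang (inferred from the default value lang.Whitespaces.Trim in the signature). The class name ChangeDelimiterSketch and the input string are made up.

```csharp
using System;
using cs.aworx.lib.lang;         // assumed namespace of enum Whitespaces
using cs.aworx.lib.strings.util; // assumed namespace of class Tokenizer

public static class ChangeDelimiterSketch
{
    public static void Main()
    {
        Tokenizer tknzr = new Tokenizer( "colors = red, green, blue", '=' );

        // First token: everything up to '='. It is copied into a String right
        // away, because the returned Substring is also exposed as field Actual
        // and may not stay valid across the next call.
        String key = tknzr.Next().ToString();
        Console.WriteLine( "key: " + key );

        // Remaining tokens: pass ',' as the new delimiter.
        while( tknzr.HasNext() )
            Console.WriteLine( "value: " + tknzr.Next( Whitespaces.Trim, ',' ).ToString() );
    }
}
```

Passing ',' on every call is redundant once it has been set, since a newDelim of 0 keeps the current delimiter, but it keeps the loop body self-explanatory.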

◆ Set() [1/3]

void Set ( AString  src,
char  delim,
bool  skipEmptyTokens = false 
)
inline

Sets the tokenizer to work on a given AString, using the given delimiter.

Parameters
src: The string to be tokenized.
delim: The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens: If true, empty tokens are omitted. Optional and defaults to false.

◆ Set() [2/3]

void Set ( String  src,
char  delim,
bool  skipEmptyTokens = false 
)
inline

Sets the tokenizer to the new source and delim.

Parameters
src: The string to be tokenized.
delim: The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens: If true, empty tokens are omitted. Optional and defaults to false.

◆ Set() [3/3]

void Set ( Substring  src,
char  delim,
bool  skipEmptyTokens = false 
)
inline

Sets the tokenizer to work on a given Substring, using the given delimiter.

Parameters
src: The string to be tokenized.
delim: The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens: If true, empty tokens are omitted. Optional and defaults to false.

◆ ToString()

override String ToString ( )
inline

This method exists for debugging purposes. For example, it enables the MonoDevelop IDE to display object descriptions in the debugger.

Returns
A human-readable string representation of this object.

Member Data Documentation

◆ Actual

Substring Actual = new Substring()

The actual token, which was returned by the most recent invocation of Next() or GetRest(). This field may be manipulated at any time, for example by changing its whitespace characters definition.

◆ delim

char delim = '\0'
protected

The most recently set delimiter used by default for the next token extraction.

◆ Rest

Substring Rest = new Substring()

A Substring that represents the part of the underlying data that has not yet been tokenized.
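As a small illustration of this field, the sketch below prints the untokenized remainder after each token. It assumes that the ALox C# library is referenced and that class Tokenizer lives in namespace cs.aworx.lib.strings.util; the class name RestFieldSketch and the input string are hypothetical. Field Rest is read only while HasNext() reports a further token, so the sketch makes no assumption about converting a nulled Substring to a String.

```csharp
using System;
using cs.aworx.lib.strings.util; // assumed namespace of class Tokenizer

public static class RestFieldSketch
{
    public static void Main()
    {
        Tokenizer tknzr = new Tokenizer( "one;two;three", ';' );
        while( tknzr.HasNext() )
        {
            // Copy the token before touching the tokenizer again.
            String token = tknzr.Next().ToString();
            Console.WriteLine( "token: " + token );

            // Show what is still left to be tokenized, if anything.
            if( tknzr.HasNext() )
                Console.WriteLine( "  not yet tokenized: " + tknzr.Rest.ToString() );
        }
    }
}
```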

◆ skipEmptyTokens

bool skipEmptyTokens
protected

If true, empty tokens are omitted.

◆ Whitespaces

char[] Whitespaces = CString.DefaultWhitespaces

The white space characters used to trim the tokens. Defaults to CString.DefaultWhitespaces.


The documentation for this class was generated from the following file:
Tokenizer.cs