ALox  V. 2402 R. 0
Tokenizer Class Reference

Class Description


This class splits a given character array (e.g. a String, an AString or a Substring object), which contains data separated by a delimiter character, into tokens of type Substring.

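After an instance of this class is constructed, there are three methods available:

hasNext   Indicates whether a further token is available.
next      Returns the next token, which afterwards is also available through field actual. A different delimiter can be provided with each call.
getRest   Returns the complete remaining string without searching for further delimiter characters.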
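The interplay of hasNext, next and getRest can be modeled in a few lines of plain Java. The class TokenizerModel below is a hypothetical stand-in written for this illustration and is not the ALox implementation: it works on String copies instead of Substring views, returns null where the real class returns a nulled Substring, and uses String.trim() in place of the whitespaces field.

```java
// Simplified stand-in for the documented protocol; NOT the ALox implementation.
class TokenizerModel {
    private String rest;          // null once the source is exhausted
    private final char delim;

    TokenizerModel(String src, char delim) {
        this.rest  = src;
        this.delim = delim;
    }

    // True while a call to next() would still yield a token.
    boolean hasNext() {
        return rest != null;
    }

    // Returns the next trimmed token, or null if none is left.
    String next() {
        if (rest == null) return null;
        int idx = rest.indexOf(delim);
        String token;
        if (idx >= 0) {
            token = rest.substring(0, idx);
            rest  = rest.substring(idx + 1);
        } else {
            token = rest;
            rest  = null;
        }
        return token.trim();
    }

    // Returns everything not yet consumed, without searching for delimiters.
    String getRest() {
        if (rest == null) return null;
        String remaining = rest.trim();
        rest = null;
        return remaining;
    }

    public static void main(String[] args) {
        TokenizerModel t = new TokenizerModel("a ; b ; c;d", ';');
        System.out.println(t.next());     // a
        System.out.println(t.next());     // b
        System.out.println(t.getRest());  // c;d
        System.out.println(t.hasNext());  // false
    }
}
```

Once getRest has been called, the model reports no further tokens, which mirrors the behavior documented for the real methods.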

After a token was retrieved, it might be modified using the interface of class Substring. (In other words, the tokenizer does not rely on the bounds of the current token when receiving the next.) Consequently, it is allowed to recursively tokenize a token by creating a different instance of class Tokenizer providing the returned token as input.

If created or set using a reference of class AString, the buffer of AString is not copied. This allows efficient operations on sub-strings of class AString. However, the source string must not be changed (or only in a controlled way) during the use of the Tokenizer instance.
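Why the source must stay unchanged can be seen with any non-copying view onto a mutable buffer. The sketch below uses a hypothetical class BufferView over a java.lang.StringBuilder; it only illustrates the general effect and makes no claim about how AString or Substring are implemented.

```java
// Hypothetical illustration of a non-copying view; not ALox code.
class BufferView {
    private final StringBuilder buf;  // shared with the owner, not copied
    private final int start, end;     // end is exclusive

    BufferView(StringBuilder buf, int start, int end) {
        this.buf = buf; this.start = start; this.end = end;
    }

    @Override
    public String toString() {
        return buf.substring(start, end);
    }

    public static void main(String[] args) {
        StringBuilder source = new StringBuilder("abc;def");
        BufferView token = new BufferView(source, 0, 3);
        System.out.println(token);   // abc

        source.insert(0, "X");       // uncontrolled change of the source
        System.out.println(token);   // Xab  (the view now shows shifted data)
    }
}
```

The view is cheap because nothing is copied, but every token handed out earlier silently changes its content when the underlying buffer is edited.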

Objects of this class can be reused by freshly initializing them using one of the overloaded set methods.

Sample code:
The following code sample shows how to tokenize a string, including using one nested tokenizer:

// data string to tokenize
AString data= new AString( "test; abc ; 1,2 , 3 ; xyz ; including;separator" );
// create tokenizer on data with ';' as delimiter
Tokenizer tknzr= new Tokenizer( data, ';' );
// read tokens
System.out.println( tknzr.next().toString() ); // will print "test"
System.out.println( tknzr.next().toString() ); // will print "abc"
System.out.println( tknzr.next().toString() ); // will print "1,2 , 3"
// tokenize actual (third) token (nested tokenizer)
Tokenizer subTknzr= new Tokenizer( tknzr.actual, ',');
System.out.print( subTknzr.next().toString() );
while( subTknzr.hasNext() )
    System.out.print( "~" + subTknzr.next().toString() );
System.out.println();
// continue with the main tokenizer
System.out.println( tknzr.next().toString() ); // will print "xyz"
// grab the rest, as we know that the last token might include our separator character
System.out.println( tknzr.getRest().toString() ); // will print "including;separator"

The output will be:

test
abc
1,2 , 3
1~2~3
xyz
including;separator

Public Fields

Substring actual = new Substring()
 
Substring rest = new Substring()
 
char[] whitespaces = CString.DEFAULT_WHITESPACES
 

Public Methods

 Tokenizer ()
 
 Tokenizer (AString src, char delim)
 
 Tokenizer (AString src, char delim, boolean skipEmptyTokens)
 
 Tokenizer (String src, char delim)
 
 Tokenizer (String src, char delim, boolean skipEmptyTokens)
 
 Tokenizer (Substring src, char delim)
 
 Tokenizer (Substring src, char delim, boolean skipEmptyTokens)
 
Substring getRest ()
 
Substring getRest (Whitespaces trimming)
 
boolean hasNext ()
 
Substring next ()
 
Substring next (Whitespaces trimming)
 
Substring next (Whitespaces trimming, char newDelim)
 
void set (AString astring, char delim)
 
void set (AString astring, char delim, boolean skipEmptyTokens)
 
void set (String str, char delim)
 
void set (String str, char delim, boolean skipEmptyTokens)
 
void set (Substring substring, char delim)
 
void set (Substring substring, char delim, boolean skipEmptyTokens)
 
String toString ()
 

Protected Fields

char delim
 
boolean skipEmptyTokens
 

Constructor & Destructor Documentation

◆ Tokenizer() [1/7]

Tokenizer ( )

Constructs an empty tokenizer. Before use, method set needs to be invoked to initialize it.

◆ Tokenizer() [2/7]

Tokenizer ( String  src,
char  delim,
boolean  skipEmptyTokens 
)

Constructs a tokenizer to work on a given String.

Parameters
src              The String to be tokenized.
delim            The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens  If true, empty tokens are omitted. Optional and defaults to false.
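The effect of skipEmptyTokens can be pictured with a small plain-Java model. The helper tokens below is hypothetical and collects all tokens into a list, which the real class does not do; it is only meant to show which tokens the flag removes.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the documented flag; NOT the ALox implementation.
class SkipEmptyDemo {
    static List<String> tokens(String src, char delim, boolean skipEmptyTokens) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (true) {
            int idx = src.indexOf(delim, start);
            String token = (idx >= 0 ? src.substring(start, idx)
                                     : src.substring(start)).trim();
            // Drop the token only if the flag is set and the token is empty.
            if (!(skipEmptyTokens && token.isEmpty()))
                result.add(token);
            if (idx < 0) break;
            start = idx + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a;;b;c", ';', false)); // [a, , b, c]
        System.out.println(tokens("a;;b;c", ';', true));  // [a, b, c]
    }
}
```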

◆ Tokenizer() [3/7]

Tokenizer ( String  src,
char  delim 
)

Constructs a tokenizer to work on a given String.

Parameters
src    The String to use as the source for the tokens.
delim  The delimiter that separates the tokens. Can be changed with every next token.

◆ Tokenizer() [4/7]

Tokenizer ( AString  src,
char  delim,
boolean  skipEmptyTokens 
)

Constructs a tokenizer to work on a given AString.

Parameters
src              The string to be tokenized.
delim            The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens  If true, empty tokens are omitted. Optional and defaults to false.

◆ Tokenizer() [5/7]

Tokenizer ( AString  src,
char  delim 
)

Constructs a tokenizer to work on a given AString.

Parameters
src    The AString to use as the source for the tokens.
delim  The delimiter that separates the tokens. Can be changed with every next token.

◆ Tokenizer() [6/7]

Tokenizer ( Substring  src,
char  delim,
boolean  skipEmptyTokens 
)

Constructs a tokenizer to work on a given Substring.

Parameters
src              The string to be tokenized.
delim            The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens  If true, empty tokens are omitted. Optional and defaults to false.

◆ Tokenizer() [7/7]

Tokenizer ( Substring  src,
char  delim 
)

Constructs a tokenizer to work on a given Substring.

Parameters
src    The substring to use as the source for the tokens.
delim  The delimiter that separates the tokens. Can be changed with every next token.

Member Function Documentation

◆ getRest() [1/2]

Substring getRest ( )

Returns the currently remaining string (without searching for further delimiter characters). After this call hasNext will return false and next will return a nulled Substring.

Returns
The rest of the original source string, which was not yet returned by next().

◆ getRest() [2/2]

Substring getRest ( Whitespaces  trimming)

Returns the currently remaining string (without searching for further delimiter characters).
After this call hasNext will return false and next will return a nulled Substring.

Parameters
trimming  Determines whether the token is trimmed with respect to the whitespace characters defined in field whitespaces. Defaults to Whitespaces.TRIM.
Returns
The rest of the original source string, which was not yet returned by next().
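Trimming against a configurable character set, as the trimming parameter and the whitespaces field describe, can be sketched as follows. The helper trim is hypothetical, and the character set used as the default here (space, tab, carriage return, line feed) is an assumption; the actual content of CString.DEFAULT_WHITESPACES is not stated on this page.

```java
// Hypothetical helper illustrating trimming against a character set; not ALox code.
class TrimDemo {
    static String trim(String s, char[] whitespaces) {
        int start = 0, end = s.length();
        // Advance past leading characters that are in the set.
        while (start < end && contains(whitespaces, s.charAt(start)))   start++;
        // Step back over trailing characters that are in the set.
        while (end > start && contains(whitespaces, s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    private static boolean contains(char[] set, char c) {
        for (char w : set) if (w == c) return true;
        return false;
    }

    public static void main(String[] args) {
        char[] defaults = { ' ', '\t', '\r', '\n' };   // assumed default set
        System.out.println("[" + trim("  1,2 , 3\t", defaults) + "]");       // [1,2 , 3]
        System.out.println("[" + trim("--abc--", new char[] { '-' }) + "]"); // [abc]
    }
}
```

Only the ends of the string are affected; whitespace inside the token, as in "1,2 , 3", is kept.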

◆ hasNext()

boolean hasNext ( )

If this returns true, a call to next will be successful and will return a Substring which is not nulled.

Returns
true if a next token is available.

◆ next() [1/3]

Substring next ( )

Returns the next token, which is afterwards also available through field actual. If no further token was available, the returned Substring will be 'nulled' (see Substring.isNull). To prevent this, the availability of a next token should be checked using method hasNext().

For clarification, see the explanation and sample code in this class's documentation.

Returns
The next token as a Substring, which is 'nulled' if no further token was available.

◆ next() [2/3]

Substring next ( Whitespaces  trimming)

Returns the next token, which is afterwards also available through field actual. If no further token was available, the returned Substring will be 'nulled' (see Substring.isNull). To prevent this, the availability of a next token should be checked using method hasNext.

For clarification, see the explanation and sample code in this class's documentation.

Parameters
trimming  Determines whether the token is trimmed with respect to the whitespace characters defined in field whitespaces. Defaults to Whitespaces.TRIM.
Returns
The next token as a Substring, which is 'nulled' if no further token was available.

◆ next() [3/3]

Substring next ( Whitespaces  trimming,
char  newDelim 
)

Returns the next token, which is afterwards also available through field actual. If no further token was available, the returned Substring will be 'nulled' (see Substring.isNull). To prevent this, the availability of a next token should be checked using method hasNext().

For clarification, see the explanation and sample code in this class's documentation.

Parameters
trimming  Determines whether the token is trimmed with respect to the whitespace characters defined in field whitespaces. Defaults to Whitespaces.TRIM.
newDelim  The delimiter that separates the tokens. Defaults to 0, which keeps the current delimiter intact. A new delimiter can be provided for every next token.
Returns
The next token as a Substring, which is 'nulled' if no further token was available.
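Changing the delimiter from one token to the next is handy for input such as key/value lists. The class DelimSwitchDemo below is a hypothetical plain-Java model of that behavior, not the ALox implementation: it returns String copies, uses null instead of a nulled Substring, and always trims with String.trim().

```java
// Simplified model of a delimiter that can change per call; NOT the ALox implementation.
class DelimSwitchDemo {
    private String rest;   // null once exhausted
    private char delim;

    DelimSwitchDemo(String src, char delim) { this.rest = src; this.delim = delim; }

    boolean hasNext() { return rest != null; }

    // newDelim == 0 keeps the current delimiter, any other value replaces it.
    String next(char newDelim) {
        if (rest == null) return null;
        if (newDelim != 0) delim = newDelim;
        int idx = rest.indexOf(delim);
        String token = idx >= 0 ? rest.substring(0, idx) : rest;
        rest = idx >= 0 ? rest.substring(idx + 1) : null;
        return token.trim();
    }

    public static void main(String[] args) {
        DelimSwitchDemo t = new DelimSwitchDemo("width = 80; height = 24", '=');
        while (t.hasNext()) {
            String key   = t.next('=');   // read up to the next '='
            String value = t.next(';');   // then up to the next ';'
            System.out.println(key + " -> " + value);
        }
        // prints "width -> 80" and "height -> 24"
    }
}
```

A changed delimiter stays in effect for subsequent calls that pass 0, which matches the description of newDelim above.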

◆ set() [1/6]

void set ( AString  astring,
char  delim 
)

Sets the tokenizer to the new source and delim.

Parameters
astring  The AString to use as the source for the tokens.
delim    The delimiter that separates the tokens. Can be changed with every next token.

◆ set() [2/6]

void set ( AString  astring,
char  delim,
boolean  skipEmptyTokens 
)

Sets the tokenizer to the new source and delim.

Parameters
astring          The AString to use as the source for the tokens.
delim            The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens  If true, empty tokens are omitted. Optional and defaults to false.

◆ set() [3/6]

void set ( String  str,
char  delim 
)

Sets the tokenizer to the new source and delim.

Parameters
str    The String to use as the source for the tokens.
delim  The delimiter that separates the tokens. Can be changed with every next token.

◆ set() [4/6]

void set ( String  str,
char  delim,
boolean  skipEmptyTokens 
)

Sets the tokenizer to the new source and delim.

Parameters
str              The String to use as the source for the tokens.
delim            The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens  If true, empty tokens are omitted. Optional and defaults to false.

◆ set() [5/6]

void set ( Substring  substring,
char  delim 
)

Sets the tokenizer to the new source and delim.

Parameters
substring  The substring to use as the source for the tokens.
delim      The delimiter that separates the tokens. Can be changed with every next token.

◆ set() [6/6]

void set ( Substring  substring,
char  delim,
boolean  skipEmptyTokens 
)

Sets the tokenizer to the new source and delim.

Parameters
substring        The substring to use as the source for the tokens.
delim            The delimiter that separates the tokens. Can be changed with every next token.
skipEmptyTokens  If true, empty tokens are omitted. Optional and defaults to false.

◆ toString()

String toString ( )

This is for debugging purposes. E.g. this enables the Eclipse IDE to display object descriptions in the debugger.

Returns
A human readable string representation of this object.

Member Data Documentation

◆ actual

Substring actual = new Substring()

The actual token, which was returned by the most recent invocation of next() or getRest(). It is allowed to manipulate this field at any time, for example by changing its whitespace characters definition.

◆ delim

char delim
protected

The most recently set delimiter used by default for the next token extraction.

◆ rest

Substring rest = new Substring()

A Substring that represents the part of the underlying data that has not been tokenized yet.

◆ skipEmptyTokens

boolean skipEmptyTokens
protected

If true, empty tokens are omitted.

◆ whitespaces

char[] whitespaces = CString.DEFAULT_WHITESPACES

The white space characters used to trim the tokens. Defaults to CString.DEFAULT_WHITESPACES.


The documentation for this class was generated from the following file:
Tokenizer.java