Tokenizing

at 2007-05-11 in Examples by friebe

To split a string into tokens, you may have used strtok() (although it has problems, as is documented here). The XP Framework offers a replacement, the text.StringTokenizer class, which you can use as follows:

<?php
$st= new StringTokenizer("Hello World!\nThis is an example", " \n");
while ($st->hasMoreTokens()) {
  printf("- %s\n", $st->nextToken());
}
?>

This is nothing new: the API has existed for over two years. Since yesterday, though, a couple of things have been added. Continue reading to find out what they are.

#1) StreamTokenizer
The StringTokenizer class is supplemented with a StreamTokenizer, which is the same except that it works on io.streams.InputStreams:

<?php 
$st= new StreamTokenizer(new FileInputStream(new File('test.txt')), "\n");
// ...
?>

Of course, you could also combine the string tokenizer with FileUtil::getContents() to accomplish this, but that means loading the entire file into memory first. The StreamTokenizer reads from the stream sequentially as tokens are requested, so usually only a small chunk of the file's contents is in memory at any time.
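
To get a feeling for what reading sequentially means, here is a minimal sketch in plain PHP. It is not the StreamTokenizer's actual implementation: tokensFrom() is a hypothetical helper, the chunk size is an arbitrary choice, and it relies on generators, which require PHP 5.5 or newer. Note that it returns an empty token between two adjacent delimiters, which may differ from how the framework classes behave.

<?php
// Hypothetical helper, not part of the XP Framework: yields tokens from a
// stream while keeping only one chunk plus an unfinished token in memory.
function tokensFrom($handle, $delimiters, $chunkSize= 8192) {
  $buffer= '';
  while (!feof($handle)) {
    $chunk= fread($handle, $chunkSize);
    if (false === $chunk) break;
    $buffer.= $chunk;

    // Emit every complete token currently in the buffer. strcspn() returns
    // the length of the leading part that contains no delimiter.
    while (strlen($buffer) > ($length= strcspn($buffer, $delimiters))) {
      yield substr($buffer, 0, $length);
      $buffer= substr($buffer, $length + 1);
    }
  }

  // Whatever remains after the last delimiter is the final token
  if ('' !== $buffer) yield $buffer;
}

// Usage: an in-memory stream stands in for a file here
$handle= fopen('php://memory', 'w+');
fwrite($handle, "Hello World!\nThis is an example");
rewind($handle);
foreach (tokensFrom($handle, " \n", 8) as $token) {
  printf("- %s\n", $token);
}
fclose($handle);
?>

The deliberately tiny chunk size of 8 bytes shows that tokens spanning a chunk boundary are still put together correctly.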

#2) foreach() support
Both tokenizers now support iteration via foreach(). For example, the following prints ABC:
<?php
foreach (new StringTokenizer('A B C', ' ') as $letter) {
  echo $letter;
}
?>


#3) Resetting a tokenizer
Tokenizers may be reset by using the new reset() method. This is necessary if you want to iterate over a tokenizer more than once (and is automatically called by foreach before iteration starts).
<?php
$st= new StreamTokenizer(new MemoryInputStream('A B'), ' ');
echo $st->nextToken(); // "A"
$st->reset();
echo $st->nextToken(); // "A" again
?>
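
How foreach support and reset() can play together is sketched below in plain PHP. This is an illustration under assumptions, not the framework's code: SimpleTokenizer is a hypothetical class, and it uses PHP's built-in IteratorAggregate interface together with a generator (PHP 7.0 or newer because of the return type). The point is that foreach asks for a fresh iterator before it starts, which is a natural place to call reset().

<?php
// Hypothetical class for illustration only, not the XP Framework's code
class SimpleTokenizer implements IteratorAggregate {
  private $source, $delimiters, $offset= 0;

  public function __construct($source, $delimiters) {
    $this->source= $source;
    $this->delimiters= $delimiters;
  }

  public function hasMoreTokens() {
    return $this->offset < strlen($this->source);
  }

  public function nextToken() {
    // Length of the run up to the next delimiter, starting at the offset
    $length= strcspn($this->source, $this->delimiters, $this->offset);
    $token= substr($this->source, $this->offset, $length);
    $this->offset+= $length + 1;
    return $token;
  }

  public function reset() {
    $this->offset= 0;
  }

  // Called by foreach; the reset happens when iteration begins
  public function getIterator(): Traversable {
    $this->reset();
    while ($this->hasMoreTokens()) {
      yield $this->nextToken();
    }
  }
}

$st= new SimpleTokenizer('A B C', ' ');
$st->nextToken();              // consumes the first token
foreach ($st as $letter) {     // starts over from the beginning
  echo $letter;
}
?>

Even though one token was consumed before the loop, the foreach still sees all three letters, because iteration begins with a reset.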


#4) Tests
The new features are tested here:


