at 2008-05-22
in Experiments
by friebe
Lately I have been experimenting with the PHP extension bcompiler, which offers an API to write the opcodes generated by zend_compile() to files. To understand this, you need to know that PHP is a compiled language: it compiles sourcecode into memory and then executes the result, instead of compiling to disk and then running that. The compilation step is not exposed to the user in any way, and there is no defined serialization format for compiled PHP. This is where bcompiler comes in.
Note: Compiling PHP sourcecode does not improve runtime performance; it simply saves the (small) overhead of the compile step. My main focus in this experiment was not performance, though, but to test the ability to generate PHP bytecode and to see whether it would work transparently alongside PHP sourcecode.
The idea of compiling PHP is not new: There are a number of so-called "bytecode caches" for PHP, like APC, that basically cache the bytecodes generated from files in shared memory. When a PHP file is invoked, the cache checks whether the file has been compiled previously, and if so, uses these bytecodes instead of compiling the sourcecode first. These products are targeted at running inside a webserver and boast speed improvements of 4 or 5 times over "regular" PHP. They are useless on the command line, where every invocation starts a new process with an empty cache.
Compilation in PHP

Given the following PHP sourcecode:

<?php
  $a= 'Hello';
  echo $a;
?>

...what PHP does when running this is as follows:
- Tokenize the sourcecode
This is basically what the token_get_all() function exposes. The above is tokenized into: T_VARIABLE[$a] '=' T_CONSTANT_ENCAPSED_STRING['Hello'] ';' T_ECHO T_VARIABLE[$a] ';'.
- Give these tokens to the parser
The (yacc-generated) parser will match these tokens to the grammar and generate bytecodes while doing so. The first statement is matched to the following rule:

variable '=' expr {
  zend_check_writable_variable(&$1);
  zend_do_end_variable_parse(BP_VAR_W, 0 TSRMLS_CC);
  zend_do_assign(&$$, &$1, &$3 TSRMLS_CC);
}
- Populate the opcode array
The above zend_*() function calls produce opcodes and add them to an opcode array.
#   line  name              operands
--  ----  ----------------  --------------------------------------
0@  1:    assign            CV[#1 $a] := C[string:"Hello"] (*V[#0])
1@  2:    echo              CV[#1 $a]
2@  3:    return            C[null]
3@  3:    handle_exception
- Execute this
For each opcode, the Zend Engine has an opcode handler that executes it. The first opcode is taken care of by the ZEND_ASSIGN handler, which basically does something like $locals['a']= 'Hello';, the second by ZEND_ECHO ($print_function($locals['a'])), and so on.
If we use opcode caches or bcompiler we can skip steps 1 through 3.
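Step 1 can be reproduced from userland. The following is a minimal sketch that prints the token stream of the example above using token_get_all() and token_name(). The helper name describeTokens() is made up for illustration, and open/close tags and whitespace are skipped to keep the output close to the listing in step 1.

```php
<?php
// Hypothetical helper (not part of PHP or the XP framework): renders the
// token stream of a piece of PHP source in a compact, readable form.
function describeTokens($source) {
  $out= array();
  foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
      // Array tokens are (id, text, line); skip tags and whitespace
      if (in_array($token[0], array(T_OPEN_TAG, T_CLOSE_TAG, T_WHITESPACE))) continue;
      $out[]= token_name($token[0]).'['.$token[1].']';
    } else {
      // Single-character tokens such as '=' and ';' are plain strings
      $out[]= "'".$token."'";
    }
  }
  return implode(' ', $out);
}

echo describeTokens("<?php \$a= 'Hello'; echo \$a; ?>"), "\n";
```

Note that this sketch prints the text of every named token, so T_ECHO shows up as T_ECHO[echo].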
Using bcompiler

The basic steps to compile PHP sourcecode with bcompiler are:

<?php
  $f= fopen('compiled.phb', 'wb');
  bcompiler_write_header($f);
  bcompiler_write_file($f, 'source.php');
  bcompiler_write_footer($f);
  fclose($f);
?>

Using compiled bytecodes is just like using sourcecode; bcompiler makes this transparent by overriding the include and require statements. The following are equivalent:

<?php
  include('compiled.phb');
  include('source.php');
?>

...except of course that the first one does not need to compile the sourcecode.
Compiling the XP framework

To compile the framework I used the io.collections API inside a command line class (the ones run by xpcli) to iterate over the source directory recursively, passing PHP source to the above functions and adding the resulting files to a XAR file.

<?php
  $archive= new Archive(new File('xp-compiled.xar'));
  $archive->open(ARCHIVE_CREATE);

  // Exclude collections (directories) and anything matching .svn
  $filter= new NegationOfFilter(new AnyOfFilter(array(
    new CollectionFilter(),
    new RegexFilter('/.svn/')
  )));
  foreach (new FilteredIOCollectionIterator($origin, $filter, TRUE) as $e) {
    if (xp::CLASS_FILE_EXT === substr($e->getUri(), -10)) {
      // Class file: compile it, then add the compiled file to the archive
      // ...
    } else {
      // Any other file
      // ...
    }
  }
  $archive->create();
?>

Then, instead of using xp-rt-VERSION.xar as the boot class path, I would use xp-compiled.xar and check whether it works by running the unittests.
Gotchas

When running the above for the first time, I soon noticed a Fatal Error: Cannot redeclare class Object. The problem is that bcompiler_write_file() compiles the class into the current execution context, where the Object class is already declared. Using bcompiler_write_included_file() in case the class is already loaded resolved this, but unfortunately only for other side effects to appear. To keep the scope clean I decided to fork a process, pass it the name of the file to be compiled on its standard input, have it compile to a temporary file and return that file's name on standard output.
<?php
  // Parent: start the compiler process and ask it to compile one file
  $compiler= new Process(
    Runtime::getInstance()->getExecutable()->getFilename(),
    array('compiler.php', $build)
  );
  $compiler->in->write($filename);
  sscanf($compiler->out->readLine(), "%c %[^\n]", $status, $data);
  if ('+' !== $status) {
    // ...
  }

  // Child (compiler.php): read filenames from standard input, one per line
  while ($l= fgets(STDIN, 1024)) {
    $name= rtrim($l);
    $temp= tempnam($argv[1], 'comp');
    // ...
    echo '+ ', $temp, "\n";
  }
?>

Now the compile step worked perfectly and I had an archive containing (b)compiled PHP sourcecode.
$ xar tvf rel/xp-compiled.xar
    33.869 lang/archive/Archive.class.php
    29.443 lang/archive/ArchiveClassLoader.class.php
    21.031 lang/archive/ArchiveReader.class.php
       525 lang/archive/package-info.xp
    10.011 lang/ChainedException.class.php
[...]
To verify that the XP Framework works I first ran the core unittests (unittest ports/unittest/core.ini) and could see I had all of them working. This was looking good!
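The reply line exchanged with the forked compiler process above can be handled in plain PHP as well. This is only a sketch of the parsing side: parseCompilerReply() is a made-up name, and treating every status character other than '+' as a failure is an assumption, since the article only shows the success case.

```php
<?php
// Hypothetical helper, not part of the XP framework: splits one reply line
// from the compiler process into a success flag and its payload.
function parseCompilerReply($line) {
  // Same format string as in the article: one status character, optional
  // whitespace, then everything up to the end of the line
  $n= sscanf($line, "%c %[^\n]", $status, $data);
  if (2 !== $n) {
    throw new InvalidArgumentException('Malformed reply: '.var_export($line, TRUE));
  }
  // Assumption: anything but '+' signals a failure
  return array('+' === $status, $data);
}

list($ok, $data)= parseCompilerReply("+ /tmp/compA1b2\n");
echo $ok ? 'Compiled to '.$data : 'Failed: '.$data, "\n";
```

Keeping the protocol to one line per file means the parent can reuse a single child process for many files instead of forking once per class.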
Framework patches necessary

One thing that wasn't working was the set of features for which the XP framework extracts information from a class' sourcecode at runtime: annotations, return types, parameter types and thrown exceptions. For these to work, I moved the runtime parsing part (basically what lang.XPClass::detailsForClass() does) to the compiler and added the parsed results to the file. An example:
<?php
  /**
   * Customer service handler
   *
   */
  class CustomerHandler extends Object {

    /**
     * Gets a customer by customer ID
     *
     * @param   int id
     * @return  com.example.vo.Customer
     * @throws  util.NoSuchElementException in case nothing is found
     */
    public function getCustomer($id) {
    }
  }
?>

The parsed meta data looks like this:
{
  fields => []
  methods => [
    getCustomer => [
      ARGUMENTS => [ "int" ],
      RETURNS => "com.example.vo.Customer"
      THROWS => [ "util.NoSuchElementException" ]
      COMMENT => "Gets a customer by customer ID"
      ANNOTATIONS => [ webmethod => null ]
      NAME => "getCustomer"
    ]
  ]
  class => [
    COMMENT => "Customer service handler"
    ANNOTATIONS => []
  ]
}

By embedding this meta data inside the class before passing it to the compiler and patching the XPClass class to check if precompiled meta-data exists, all tests related to this functionality succeed again.
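To give an idea of what such a parsing step involves, here is a rough approximation based on ReflectionMethod::getDocComment() and a few regular expressions. It is not the XP framework's detailsForClass() implementation: methodDetails() is a made-up helper, annotations are left out, and the sample class omits the Object base class so that the snippet runs on its own.

```php
<?php
// Sample class, adapted from the example above
class CustomerHandler {

  /**
   * Gets a customer by customer ID
   *
   * @param   int id
   * @return  com.example.vo.Customer
   * @throws  util.NoSuchElementException in case nothing is found
   */
  public function getCustomer($id) { }
}

// Hypothetical helper: extracts parameter types, return type, thrown
// exceptions and the first comment line from a method's doc comment.
function methodDetails($class, $method) {
  $reflect= new ReflectionMethod($class, $method);
  $details= array(
    'ARGUMENTS' => array(),
    'RETURNS'   => NULL,
    'THROWS'    => array(),
    'COMMENT'   => ''
  );
  foreach (explode("\n", (string)$reflect->getDocComment()) as $line) {
    // Strip the comment decoration: slashes, asterisks and whitespace
    $line= trim($line, "/* \t\r");
    if ('' === $line) continue;

    if (preg_match('/^@param\s+(\S+)/', $line, $m)) {
      $details['ARGUMENTS'][]= $m[1];
    } else if (preg_match('/^@return\s+(\S+)/', $line, $m)) {
      $details['RETURNS']= $m[1];
    } else if (preg_match('/^@throws\s+(\S+)/', $line, $m)) {
      $details['THROWS'][]= $m[1];
    } else if ('@' !== $line[0] && '' === $details['COMMENT']) {
      $details['COMMENT']= $line;
    }
  }
  return $details;
}

print_r(methodDetails('CustomerHandler', 'getCustomer'));
```

Running this kind of extraction once at compile time and storing the result is what makes the information available when the sourcecode, and with it the comments, is no longer around at runtime.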
BCompiler bugs

When running other test suites, I also noticed a couple of bugs in bcompiler:
Missing __toString()

The string conversion handler __toString() lets you override what happens when an object is converted to a string. These handlers are not serialized correctly (they are part of the zend_class_entry structure), leading to error messages like "Object of class Bytes could not be converted to string".
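For reference, this is the behaviour that gets lost. The class below is a made-up stand-in for the Bytes class named in the error message, not the framework's actual implementation:

```php
<?php
// Illustrative stand-in; the real Bytes class is more involved
class Bytes {
  private $buffer;

  public function __construct($buffer) {
    $this->buffer= $buffer;
  }

  // Called by the engine whenever the object is used in string context
  public function __toString() {
    return $this->buffer;
  }
}

echo 'Read: ', new Bytes('abc'), "\n";
```

When run from source, the echo statement invokes __toString(); according to the observation above, the same class loaded from its compiled form fails at this point.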
Array type hints

Array type hints - added in PHP 5.1 - are lost in the process of compilation. Thus, a method like the following:

<?php
  function implode($delim, array $input) {
  }
?>

...will not yield an E_RECOVERABLE_ERROR when its compiled version is passed - say - a string as its second argument. This also has an impact if you're using a compiled interface together with source versions of the classes implementing it, raising a "Fatal error: Declaration of MockDialect::makeJoinBy() must be compatible with that of SQLDialect::makeJoinBy()".
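The check that goes missing is easy to observe when running from source. The sketch below assumes PHP 7 or later, where a violated type hint surfaces as a TypeError rather than the E_RECOVERABLE_ERROR of the PHP 5 versions discussed here; joinAll() is a made-up function name.

```php
<?php
// Hypothetical function with an array type hint on its second parameter
function joinAll($delim, array $input) {
  return implode($delim, $input);
}

echo joinAll(', ', array('a', 'b')), "\n";

try {
  joinAll(', ', 'not an array');
} catch (TypeError $e) {
  // Reached when running from source: the engine enforces the hint
  echo 'Rejected: ', get_class($e), "\n";
}
```

With the type hint lost, the second call would go through to the function body instead of being rejected by the engine.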
Overly long keys

When the filename of the compiled file is too long, hash keys are generated that are too long to be serialized as a char. This leads to instructions being overwritten by overflowing data, with side effects such as "Fatal error: Argument 1 passed to ClassLoader::registerLoader() must implement interface IClassLoader" and "Fatal error: Exceptions must be valid objects derived from the Exception base class".
This can be worked around by using a short path name when compiling, e.g. /tmp/LocalClassName.class.php instead of ~/xp.forge/trunk/experiments/people/friebe/opcodes/LocalClassName.class.php.
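One way to apply this workaround automatically is to copy each file to the system's temporary directory before compiling it. This is a sketch under assumptions: both helper names are made up, only one file is compiled at a time (so equal base names from different packages cannot collide), and the bcompiler calls are the ones shown earlier.

```php
<?php
// Hypothetical helper: copies a source file to a short path in the system
// temp directory and returns that path
function shortPathFor($file) {
  $short= sys_get_temp_dir().DIRECTORY_SEPARATOR.basename($file);
  if (!copy($file, $short)) {
    throw new RuntimeException('Cannot copy '.$file.' to '.$short);
  }
  return $short;
}

// Hypothetical helper: compiles $file to $target via its short-path copy
function compileViaShortPath($file, $target) {
  if (!extension_loaded('bcompiler')) {
    throw new RuntimeException('The bcompiler extension is not loaded');
  }
  $short= shortPathFor($file);
  $f= fopen($target, 'wb');
  bcompiler_write_header($f);
  bcompiler_write_file($f, $short);
  bcompiler_write_footer($f);
  fclose($f);
  unlink($short);   // Clean up the temporary copy
}
```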
Fixes

Fixes for some of the above problems exist and have been posted to the PECL mailing list. The array type hint problem still needs to be fixed.
Conclusion

BCompiler would be worth investing in. It would be very nice if the runtime part (reading bytecodes) were built into PHP. This way, PHP would support running compiled scripts natively (at the moment, this only works if the bcompiler extension is activated, something that cannot be relied on).
The compiler part could be separate: for every million users of software that runs on PHP there are probably ten who develop with it (this is a non-educated guess) and would need such a thing. Also, separating the compiler from the bytecode format could spawn a competition for implementing the best compiler (maybe in the form of a Google SoC project).
Philosophical

If PHP supports compiled bytecode one day (and the effort to get that done is less than a week of programming and testing), it could also become a common runtime for multiple languages (like .NET or Parrot). Think about something that compiles JavaScript to PHP bytecode: Why do we need to program in two different languages, JavaScript on the client and PHP on the server? Shouldn't the programming language be the choice of the developer? Why do I need to stick to a grammar I don't like? Why can't $m= $class->getMethods()[0] or new Date()->toString() work (they can, actually, it's just parser changes)? Should I really have to care about client and server abstraction in the world after Web 2.0? Or should I try to focus on getting the job done?