Development History

The ZPP wrapper classes are a means of hiding low-level PHP structure manipulations, which in C source code are normally done through C language macros. These macros play part of the role of "C++ templates" in code generation. They are defined in great number in the PHP source code, and help both to standardize and to inline structure manipulation, as long as the author knows which of the many macros does what.

The ZPP methods are generally not declared inline, and so they provide an isolation wall around the PHP code contained inside. With the reference-counting handle wrappers, testing is required to ensure that reference-counting safety is maintained across class construction, destruction, and all assignment and transfer operators.

When the author searched online for existing PHP extension frameworks in C++, PHP-CPP was the only name that was mentioned.

PHP-CPP has been around in some form since at least PHP 5. The changes introduced with PHP 7.0 were too great to maintain compatibility, so a new version was created and the PHP 5 versions were archived.

C++ extensions were proposed over a decade ago, and PHP-CPP was created to promote this. I have looked into using PHP-CPP. This PHP C++ extension framework was first created for PHP 5.6, which was then the latest PHP version. Subsequent generations of PHP have been released, and PHP-CPP has maintained compile-time compatibility with each version since PHP 7.0. The current PHP-CPP source has many places where code variations are defined for different versions of PHP.

The author of this Wcc project found PHP-CPP somewhat cumbersome and lagging behind new PHP releases, and eventually decided to try a "start again from scratch" approach, targeting recent PHP releases from version 8.2 onwards. PHP-CPP does have extensive functionality, and was studied for hints on how to proceed afresh and what might be improved.

The web extension framework classes that use zpp are contained in the wcc and wcd namespaces and source code folders. These demonstrate some interacting class implementations of reasonable complexity. They are also meant to be used as a suite, as classes such as Wcc\Services, Wcc\Config, Wcc\Hmap and Wcc\ViewOuput are presumed to be instantiated. The C++ class versions are a subset of the whole framework, and make many function calls to each other internally, which bypasses the PHP interpreter in their method implementations. During development the APIs of the PHP script classes and of the C++ versions were harmonized with each other to ensure call signature compatibility.

A debug version of PHP was compiled and run under the memory checking tool valgrind to detect memory leaks and track down segmentation faults. Any significant change to zpp would require going through this process again. zpp itself has been rewritten at least twice over, to reflect the author's insights into what seemed to work better. The project initially started with the Wcc\Route and Wcc\RouteSet classes, written as a much smaller extension, after attempting this using both Zephir and PHP-CPP.

Parts of the design of these script classes were adapted from the existing Phalcon framework, which uses the intermediate Zephir language to be generated and compiled as C.

PHP script helper classes to work with.

The framework classes require a few small helper classes for tasks that would be hard to do, or would gain nothing from being implemented, directly in the zpp C++ code base.

Two examples that come to mind are a class loader function/class and, related to it, a class to load view templates. It is possible to have these in the C++ extension, but that duplicates what the PHP engine already does very well, and it helps development to have them available in a scripted Xdebug environment.

A Performance comparison to PHP-CPP

Does it compare?

XmlRead is an extension class made to read a customized but flexible XML file format. It returns a tree of PHP data as a PHP array, or as an object with dynamic properties. Individual XML elements stand for PHP types, and carry a "k" attribute when they are a value in a keyed array. A <tb> element represents a keyed array, and an <a> element represents a packed array.

Reading a file returns a PHP array. The format can include specified XML tags to instantiate a PHP class, as well as common PHP data types, arrays and keyed tables, using defined tags and attributes in typical XML style.

PHP already has optimized JSON functions, and the most versatile and fastest data file format is PHP code itself. XML is much spruiked as a portable format. Here the format is used to test the ZPP classes and their C++ extension helpers.

The code is implemented in three environments. All use the PHP extension class XMLReader to read each XML node and its attributes in sequence, and translate them into a PHP array tree of types and values. The PHP script version is re-implemented using PHP-CPP classes, and also using the ZPP classes.

Each is compared using the same input test file over many iterations to get an average time. The PHP timer function microtime measures the elapsed time for each iteration. Ten iterations are done as a "warm up", and then 200 iterations are averaged to get a value for microseconds per iteration.
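
The measurement procedure described above can be sketched in C++ as follows. This is only an illustration of the warm-up-then-average scheme, not the PHP test script that produced the results; the helper name mean_microseconds and its parameters are assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of zpp or PHP-CPP): runs `work` `warmup`
// times untimed, then `iterations` times timed, and returns the mean
// duration of one timed iteration in microseconds.
inline double mean_microseconds(const std::function<void()>& work,
                                std::size_t warmup = 10,
                                std::size_t iterations = 200)
{
    using clock = std::chrono::steady_clock;

    for (std::size_t i = 0; i < warmup; ++i) {
        work();                          // warm-up pass, not measured
    }

    std::chrono::duration<double, std::micro> total{0};
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        work();
        total += clock::now() - start;   // each iteration is timed separately
    }
    return iterations ? total.count() / static_cast<double>(iterations) : 0.0;
}
```

Dividing the value returned for one implementation by the value for the reference implementation gives the relative figure.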

The results show the average time in ms per iteration, and its ratio to the time of the PHP script version, which for itself is 1.0.

The PHP-CPP and zpp versions both make many repeated function calls on the XMLReader class object to move to each XML node and read its changing properties.

The zpp version uses cached instances of the fn_call class, which improves PHP function call time, and overall it is about 10% faster than the reference PHP script class implementation.

Version             ms / iteration   relative
PHP script class    1,992            1.0
PHP-CPP version     4,355            2.8
ZPP version         1,788            0.9

This was using a slightly modified, recent version of the PHP-CPP code. I also compiled it with C++20 features enabled, as this provides string_view, rather than the C++11 default in the PHP-CPP makefile. PHP-CPP works with its Value type, which holds a zval, as the single means of PHP type storage, interrogation and creation. From the XMLReader nodes and attributes we get strings. A vital part of how PHP-CPP isolates a string from its Value class is creating a std::string, which means another memory allocation and later destruction with every examination of a zval string.

This is an example of a temptation during PHP request processing, for example in a persistent php-fpm instance: to use C++ code conveniently without a custom allocator that draws on PHP request-allocated memory. This is probably OK for temporary stack variables, most of the time. PHP itself allocates from a pool of request-lifetime memory, such that the entire pool is disposed of when the request ends. This is most starkly evident when creating a zend_string, where a flag argument controls whether the string is allocated from permanent memory, as with C malloc, which is used for "permanent" and interned strings, or will only exist for the lifetime of the request. Care therefore has to be taken to guarantee clean-up of non-request memory allocations.

It is clear that the typical documented PHP-CPP calls for making a PHP function call are a little unoptimized when made frequently. Later on, a "souped up" version of PHP-CPP will be worked on that makes use of zpp classes. For this kind of comparison it will use a reworked class, built on the function call classes from zpp, that should be more on a par with the PHP script version.

Module initialization - zpp::state_init

C++ lookup data structures that are used by class methods and function calls during requests must use the "permanent" standard allocator. A good place to create them is a static instance of a class derived from zpp::state_init, populating the structures of the instance inside its virtual init() function.

It is even possible to statically allocate HashTable structures declared inside a static instance of zpp::state_init. In the init() function, zend_hash_init needs to be called. In the end() function, freehtmemory(&ht) works. This works if the string keys are already interned. This was done in the Route_init class in wcc/route.cpp.

The interned string class zpp::str_intern can also be used for std::map values, as done in the Response_init class in wcc/global_response.cpp, to look up status code strings. The map structure is initialized using a C++ initializer inside the Response_init::init() method.
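
As a rough illustration of that lookup table, here is a sketch that compiles without the PHP headers. It is not the code in wcc/global_response.cpp: the class name response_init_sketch and the method status_text are hypothetical, and std::string_view stands in for zpp::str_intern.

```cpp
#include <cassert>
#include <map>
#include <string_view>

// Hypothetical stand-in for a Response_init style class. In the real
// extension the mapped values would be interned PHP strings.
class response_init_sketch {
public:
    // Module start: fill the table once, using an initializer list.
    void init()
    {
        status_ = {
            {200, "OK"},
            {404, "Not Found"},
            {500, "Internal Server Error"},
        };
    }

    // Module end: release the table.
    void end() { status_.clear(); }

    // Returns an empty view for codes that are not in the table.
    std::string_view status_text(int code) const
    {
        const auto it = status_.find(code);
        return it == status_.end() ? std::string_view{} : it->second;
    }

private:
    std::map<int, std::string_view> status_;
};
```

Because the table is filled once at module start, each request only pays for the lookup.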

Request initialization

The base constructor of zpp::state_init links itself into a singly linked list, and the instances are iterated during module initialization, request initialization, request shutdown and module shutdown. The first example of init_req() was in wcc/plate.cpp, where some PHP output buffer functions were initialized. This was due to uncertainty as to whether the cached function call information would persist across requests. It was later moved to module init() without harm. Now init_req() is used to create global class object instances for the request, which are freed in end_req(). The examples are in wcc/services.cpp and wcc/reflect_cache.cpp.

// state_init.h
// iterate links for module start/end
        static void init_all();
        static void end_all();

// iterate links for request start/end
        static void init_request();
        static void end_request();

// override module init/end calls
        virtual void init();
        virtual void end();

// override request init/end calls
        virtual void init_req();
        virtual void end_req();
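
To make the linking scheme concrete, here is a minimal self-contained sketch of a self-registering base class with the same eight member functions. It is written from the description above and is not the zpp source: the names state_init_sketch, route_init_sketch, next_ and head_ are assumptions, and the real class may differ in iteration order and clean-up.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for zpp::state_init.
class state_init_sketch {
public:
    // The base constructor links each instance at the front of the list.
    state_init_sketch() : next_(head_) { head_ = this; }
    virtual ~state_init_sketch() = default;

    // Iterate links for module start/end.
    static void init_all()     { for (auto* p = head_; p; p = p->next_) p->init(); }
    static void end_all()      { for (auto* p = head_; p; p = p->next_) p->end(); }

    // Iterate links for request start/end.
    static void init_request() { for (auto* p = head_; p; p = p->next_) p->init_req(); }
    static void end_request()  { for (auto* p = head_; p; p = p->next_) p->end_req(); }

    virtual void init() {}
    virtual void end() {}
    virtual void init_req() {}
    virtual void end_req() {}

private:
    state_init_sketch* next_;
    // Constant-initialized, so it is ready before any static instance
    // is constructed.
    static inline state_init_sketch* head_ = nullptr;
};

// A derived class only overrides the hooks it needs. This one records
// the order in which it was called.
struct route_init_sketch : state_init_sketch {
    std::string log;
    void init() override     { log += "M+"; }
    void end() override      { log += "M-"; }
    void init_req() override { log += "R+"; }
    void end_req() override  { log += "R-"; }
};

// A static instance registers itself during static initialization.
static route_init_sketch route_state;
```

In an extension, the four static functions would be called from the module and request startup and shutdown handlers. The sketch never unlinks an instance, which is only acceptable because the instances are static.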

In terms of development history, the PHP script versions were created first, and the PHP-CPP version was created while trying to speed up this XmlRead class component as part of an extension. Why would I write such a crazy class, and not simply use PHP as a common configuration format? Let's call it a crazy impulse to create and play.

A vital part of the ZPP classes is creating a zpp::str_ptr or zpp::str_rc reference to receive an already existing zend_string*, and making use of the C++ class std::string_view for fast string comparisons. Although zpp::str_ptr has a method to create a std::string, and can create a zend_string from a std::string, this has not been used anywhere in the code of the classes created so far, and could be removed from the source. Of course the PHP script version never needs to do this, and also has optimal function calls.
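
The difference between viewing and copying a string can be shown with a small sketch that does not need the PHP headers. The struct zstr_standin is a hypothetical stand-in for zend_string (the real structure also carries a reference count and a hash, and stores its bytes inline), and view_of, copy_of and is_table_tag are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for an engine-owned string.
struct zstr_standin {
    std::size_t len;
    const char* val;
};

// Non-owning view over the existing bytes: no allocation, no copy.
inline std::string_view view_of(const zstr_standin& s) noexcept
{
    return std::string_view(s.val, s.len);
}

// Owning copy: duplicates the bytes, and destroys them again later.
inline std::string copy_of(const zstr_standin& s)
{
    return std::string(s.val, s.len);
}

// A tag test of the kind made while walking XML nodes, using only the view.
inline bool is_table_tag(const zstr_standin& name) noexcept
{
    return view_of(name) == "tb";
}
```

The view points at the original bytes, so it must not outlive the string it was taken from.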

Module Initialization

Interned strings

PHP script compilation turns all string contents into "interned" strings, which ensures that one unique zend_string* exists in a shared table across all of the interpreter's compiled units. Classes using zpp can make use of this by creating a class based on zpp::state_init and storing instances of zpp::str_intern during PHP module initialization. This minimizes run-time conversion of embedded C-style strings into things like array keys, by referring directly to the unchanged str_intern values on every iteration or request.
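
The idea of interning can be illustrated in miniature with standard C++ containers. This is an analogy only, not how the PHP engine or zpp::str_intern is implemented; the class name intern_table_sketch is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical miniature intern table: each distinct text is stored once,
// and every request for the same text returns the same address.
class intern_table_sketch {
public:
    const std::string* intern(std::string_view text)
    {
        // emplace() leaves the set unchanged if an equal string exists.
        // Element addresses in an unordered_set stay valid across rehashing.
        return &*pool_.emplace(text).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

Once two strings are interned, testing them for equality is a pointer comparison, which is what makes stored interned keys cheap to reuse on every request.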

I suspect that the PHP-CPP code has some impositions on efficiency in its design, due to using a single Value class wrapper for all PHP types. It uses an extra class "HashMember" to return values of Arrays, to implement a convenience operator[], and so has a few overheads in its C++ structure management which could be elided.

Debugging reference count leaks

It was difficult to track down memory leaks caused by mistakes in managing PHP reference counts in C++ code. Compiling a debug version of PHP adds extra memory tracking information and checking to its data structures. The debug version of PHP hides this tracking information around its objects, and prints out information about unfreed memory at the end of a run.

To give a flavor, here is the directly copied and pasted output from a recent example of such a leak, which was later fixed.

[Tue Apr 15 23:24:06 2025]  Script:  '/home/michael/www/wcp/test/xmlread.php'
/home/michael/dev/php-8.4.5/Zend/zend_string.h(176) :  Freeing 0x00007f4f91203b40 (32 bytes), script=/home/michael/www/wcp/test/xmlread.php
Last leak repeated 3 times.

Minimizing the use of *_rc objects reduces reference count management overhead, and the number of places where reference counting bugs might occur.

In fact, a lot of the time spent developing earlier versions of this zpp class suite was painful learning of how to make C++ operators do the right thing by their managed PHP pointer types. There are a couple of zend_string API functions that may or may not return a new string, such as zend_string_tolower, and that always return with a reference count bump even if the same string is returned. The C++ assignment operators got stumped by this, especially when the result was reassigned to the same class instance.

As a mitigation, the author resorted to placing the call as a self-mutation function, only allowed for zpp::str_rc class.


str_rc a_str = some_function();
// This was leaking a rc++
a_str = a_str.to_lower();
// This was still safe to do.
str_rc b_str = a_str.to_lower();

// This works better
a_str.lowercase();

// current implementations
void
str_rc::lowercase() 
{
	if (s)
	{
		// always added reference count
		zend_string* p = zend_string_tolower(s);
		if (p == s)
		{
			try_decref(p); // undo unwanted rc++
		}
		else {
			adopt(p);  // take up new pointer
		}
	}
}

str_rc 
str_rc::to_lower() 
{
	str_rc result(*this);
	if (result.ok())
	{
		result.lowercase();
	}
	return result;
}
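
The leak described above can be reproduced without the PHP headers. The following is a minimal stand-alone sketch, not the zpp implementation: rc_string, rc_new, rc_release, rc_tolower and handle_sketch are hypothetical stand-ins, and rc_tolower only imitates the "always hands back one reference" behaviour described for zend_string_tolower.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Stand-in for a reference-counted engine string.
struct rc_string {
    int rc;
    std::string val;
};

inline int live_strings = 0;   // number of stand-in strings not yet freed

inline rc_string* rc_new(std::string v)
{
    ++live_strings;
    return new rc_string{1, std::move(v)};
}

inline void rc_release(rc_string* s)
{
    if (s && --s->rc == 0) { --live_strings; delete s; }
}

// The caller always receives one reference, whether or not a new
// string had to be created.
inline rc_string* rc_tolower(rc_string* s)
{
    std::string lower = s->val;
    for (char& c : lower)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (lower == s->val) { ++s->rc; return s; }   // same string, rc bumped
    return rc_new(std::move(lower));              // new string, rc == 1
}

class handle_sketch {
public:
    explicit handle_sketch(std::string v) : s_(rc_new(std::move(v))) {}
    handle_sketch(const handle_sketch&) = delete;
    handle_sketch& operator=(const handle_sketch&) = delete;
    ~handle_sketch() { rc_release(s_); }

    // Same shape as str_rc::lowercase() above.
    void lowercase()
    {
        rc_string* p = rc_tolower(s_);
        if (p == s_) {
            rc_release(p);       // undo the unwanted extra reference
        } else {
            rc_release(s_);      // drop the old string
            s_ = p;              // take up the new pointer
        }
    }

    // Deliberately wrong variant: forgets the extra reference.
    void lowercase_leaky()
    {
        rc_string* p = rc_tolower(s_);
        if (p != s_) rc_release(s_);
        s_ = p;
    }

    int refcount() const { return s_->rc; }
    const std::string& value() const { return s_->val; }

private:
    rc_string* s_;
};
```

Calling lowercase_leaky() on a string that is already lower case leaves the count at 2, so the destructor never frees it. This is the same kind of leftover allocation that the debug build of PHP reports at the end of a run.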

Passing direct parameters to zend functions.

To minimize the use of casts when calling zend API functions, the string, object, array and value wrappers all have conversion operators that automatically pass the enclosed pointer type, like this.

// obj_ptr Inline cast to the enclosed pointer type.
operator zend_object* () const { return (zend_object*) obj_; }
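
A sketch of how such an operator is used at a call site follows. The types here are stand-ins so that the example compiles on its own: object_standin plays the part of zend_object, object_handle_of plays the part of a C API function, and obj_ptr_sketch is a hypothetical reduction of the wrapper, not the zpp class.

```cpp
#include <cassert>

// Stand-in for the engine's object structure.
struct object_standin {
    int handle;
};

// Stand-in for a C API function that takes the raw pointer.
inline int object_handle_of(const object_standin* obj)
{
    return obj ? obj->handle : -1;
}

// Hypothetical wrapper holding a raw pointer it does not own.
class obj_ptr_sketch {
public:
    explicit obj_ptr_sketch(object_standin* obj) : obj_(obj) {}

    // Implicit conversion to the enclosed pointer type.
    operator object_standin*() const { return obj_; }

private:
    object_standin* obj_;
};
```

With this in place a call such as object_handle_of(wrapped) compiles without an explicit cast. The trade-off is that implicit conversions also apply in places where they may not be wanted, such as boolean tests.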

Use PHP's extension class declare and build system, from _stub.php to _arginfo.h

ZPP uses the current PHP extension build process, starting with an extension skeleton, a folder generated from a PHP distribution. Extension class interface files named "class_arginfo.h" are generated from "class.stub.php" files. This generates all the class object function names and signatures. The generated class registration code can install class constants, and will add class properties declared in a "stub.php" file.

For C++ source, a C++ compiler needs to be specified in the "config.m4" file instead of a C compiler. Here are the config.m4 lines from the wcc extension project. All *arginfo.h files (and included PHP source files) get included inside an extern "C" {} wrap, and each ZEND_FUNCTION can then be coded in the normal way.

dnl Everything is compiled from the one file.
PHP_ARG_ENABLE([wcc],
  [whether to enable wcc support],
  [ --enable-wcc], [Enable wcc support.])

if test "$PHP_WCC" != "no"; then

  AC_DEFINE(COMPILE_DL_WCC, 1, [ Have wcc support ])
  
  FLAGS="-fPIC"
  CXXFLAGS="$CXXFLAGS -Wall -O2 --std=c++23 -I./include"
  
  PHP_REQUIRE_CXX()
  AC_LANG([C++])
  
  PHP_NEW_EXTENSION(wcc, wcc.cpp , $ext_shared, , $FLAGS)
fi

The build process

In preparation for code builds, an initialization step using the "phpize" command needs to be done first, followed by "./configure". A build using "make", and an install using "(sudo) make install", should then work with the settings of the currently configured PHP environment.

In this project the _stub.php and _arginfo.h files exist in the /stub folder, and a shell script build.sh calls the make command for each name individually.

ZPP is used and distributed as C++ source code only. If an extension has compiled it into its own binary (shared library), then a second extension that relies on that binary requires only the C++ headers, of the very same version (and with the same compiler name mangling), since two shared libraries carrying the same zpp binary code will behave rather badly.

Multiple extensions using ZPP should each have their own compiled version of the zpp namespace code. Linkage to other code libraries is not prohibited. The next chapter steps through the process of making such an extension.

Shared access to declared/exported variables

On Linux, variables with static storage that are declared extern are shared within the process memory space. They are found by lookups in a module's symbol table. Symbols that are used but defined in other modules also appear in the module's symbol table, marked "U" with no address entry. A way to avoid clashes between shared symbol names, and to resolve missing or unfilled symbol addresses, is required.

C++ has a partial (practical? reasonable?) solution, using namespaces. Here is an example from extension modules used later in this book.

/* wccz.so includes the source code zpp/fn_call.cpp, where FTAB is a static instance.
   It contains static declarations of interned strings and callable PHP functions.
   B means "The symbol is in the BSS data section." */

nm -D --demangle wccz.so | grep FTAB
000000000005d560 B zpp::FTAB

/* wccr includes and references the headers zpp/fn_call.h, where FTAB is declared,
   but its address is "Unknown". The contents and offsets of its members are not
   exported, but are embedded in the compiled code, which means the same version of
   the headers and accessing code is required by all referring symbol tables in
   different extensions. */

nm -D --demangle wccr.so | grep FTAB
                 U zpp::FTAB

/* On loading wccr.so, when any code that references FTAB is accessed, the address
   definition is filled in from the loaded wccz.so. If it is not found, for example
   because the owner module is not loaded, an error exception is thrown. */