Preface

This book is for anyone who wants to write a PHP extension in C++: to create functions or classes that manipulate the standard PHP type structures, and to work with the standard PHP source distribution. The ZPP suite of C++ classes presented here allows zend_string, zend_object and HashTable to be manipulated as independent entities, outside of the zval structure.

An integrated base class and a templated class-manager class are provided.

The individual PHP data type wrapper classes and their helpers are introduced.

A set of web site and database management classes is introduced in PHP script form. These are substitutable for the optional Wcc extension, which implements many of the crucial classes in C++.

A chapter on building a single-class extension goes through the design, build, test and fix processes, as they are carried out in almost real time for the Wcc project as part of writing this book. This part may take at least a week. It will be date stamped.

At the moment the ZPP classes are not exported as a library. They will take up a chunk of any compiled binary, though probably not nearly as much as PHP-CPP does. They are intended to be used as wrappers for all the inline PHP API code that extension authors might otherwise have to write.

C++ classes for PHP extensions

This book introduces a suite of C++ classes for building PHP extensions. These are contained in the namespace zpp.

The ZPP classes wrap frequently used internal Zend PHP structures through their pointers. There is also a zval wrapper class and a zval* wrapper class. Methods and operators automatically handle parameter passing, reference counting and assignment, for easier implementation of PHP functions and class methods in C++.

There are two namespaces containing a suite of integrated example classes, based on the classes of an existing custom content management system framework written in PHP. These examples may assist in finding ways to build or adapt your own classes using the zpp architecture.

For a small number of tasks, a few of the PHP classes require functions that are too awkward to use, or not available, from the C++ PHP extension environment. Fortunately PHP easily provides the means to create small helper classes to handle tricks only available to the PHP byte-code compiler within a scripted function, and these can easily be called back from a C++ extension. The PHP code for these is provided in this book.

For more examples, there is an integrated suite of custom classes that manage database operations and SQL construction. They are C++ implementations of the same API as their PHP script classes. These are in the namespace wcd.

ZPP - wrap Zend PHP Pointers

Writing a PHP extension in C or in C++ is a specialized programming task. The PHP API is extensive, and its source code is written in C with extensive use of C preprocessor macros. This means time must be spent examining the macros in depth, along with the C structures they operate on, to confirm what code they cause to be compiled.

The C++ API here wraps several of the C APIs for reference-counted PHP values. The most common value types are the scalar values (integer, double-precision floating point and boolean), which are stored entirely in a zval structure, and the reference-counted pointer values for the variable-length structures of strings and arrays, and for PHP objects, which may use all of these.

The reference-counted PHP entities managed by a simple "smart pointer" C++ wrapper are zend_object, HashTable and zend_string, as well as a pointer to a zval structure, whose contents may require reference count management.

Useful zpp classes

All of these are wrapped in a namespace "zpp". It's for the Zend engine, and many of its functions and structures start with "z". So why not? Everything here assumes 64-bit sized pointers.

Wrapper classes for PHP structures: *_ptr, *_rc.

Not RC     RC        Wrapped C API type   size (bytes)
obj_ptr    obj_rc    zend_object*         8
htab_ptr   htab_rc   HashTable*           8
str_ptr    str_rc    zend_string*         8
val_ptr              zval*                8
           val_rc    zval                 16

The first three rows pair a *_ptr class, which has a PHP API pointer as its only member, with a *_rc class in the second column that inherits from it and adds reference counting of the enclosed pointer. None of these classes use virtual functions, so there is no use of run-time polymorphism. Virtual methods would add a hidden virtual function table pointer to each instance, which would remove the advantage of being the same size as the wrapped pointer type. Standard C++ constructors, destructors and assignment operators take care of any required reference count changes.

The 3 specialized value classes can all be constructed or assigned from a zval* pointer, or its wrapper class val_ptr, and can be safely declared as members of a C++ class, or as stack variables in function code.

The reference-counted structures all begin with a "zend_refcounted_h" header, a combination of a 4-byte integer refcount and a 4-byte type_info.
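As a rough illustration of the *_ptr and *_rc pairing described above, here is a minimal sketch. It is not the zpp source: mock_refcounted and mock_string are stand-ins for the Zend structures, the member names are assumptions, and the sketch never frees memory when a count reaches zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Stand-in for the common header; the real one is zend_refcounted_h.
struct mock_refcounted {
    std::uint32_t refcount;
    std::uint32_t type_info;
};

// Stand-in for zend_string: the header comes first, then the payload.
struct mock_string {
    mock_refcounted gc;
    std::size_t len;
};

// Non-counting wrapper: carries the pointer and nothing else.
class str_ptr {
public:
    str_ptr(mock_string* s = nullptr) : s_(s) {}
    mock_string* get() const { return s_; }
    bool ok() const { return s_ != nullptr; }
protected:
    mock_string* s_;
};

// Counting wrapper: same size, adds reference count management.
// (A real wrapper would also free the string when the count reaches zero.)
class str_rc : public str_ptr {
public:
    str_rc() = default;
    explicit str_rc(mock_string* s) : str_ptr(s) { addref(); }
    str_rc(const str_rc& o) : str_ptr(o.s_) { addref(); }
    str_rc(str_rc&& o) noexcept : str_ptr(o.s_) { o.s_ = nullptr; }
    // By-value parameter plus swap covers copy, move and self-assignment.
    str_rc& operator=(str_rc o) noexcept { std::swap(s_, o.s_); return *this; }
    ~str_rc() { release(); }
private:
    void addref() { if (s_) ++s_->gc.refcount; }
    void release() {
        if (s_) { assert(s_->gc.refcount > 0); --s_->gc.refcount; s_ = nullptr; }
    }
};

// No virtual functions, so both wrappers stay pointer-sized.
static_assert(sizeof(str_ptr) == sizeof(mock_string*), "no hidden members");
static_assert(sizeof(str_rc) == sizeof(mock_string*), "rc adds behaviour, not data");
```

Copying a str_rc raises the count, moving it does not, and converting it to a plain str_ptr leaves the count alone, which is the division of labour the two columns of the table are meant to express.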

Also reference counted is the class val_rc, which encloses and initializes a zval, and takes the size of two pointers. This common Zend structure is the PHP variant record, with a type id, that can hold many things. The "zval" (struct _zval_struct) is a 16-byte structure, partitioned to hold data and type information. The first 8 bytes are the data, which can be a 64-bit integer, a double-precision floating point value, or a pointer to a bigger structure. The second 8 bytes are divided to identify the type of the data, plus various flags and space reserved for Zend engine execution management.
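The 16-byte partition can be pictured with a simplified stand-in. This is only a sketch: the real zval has more union members and named flag fields, and the names mock_value, mock_zval and the type ids below are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the data half of a zval.
union mock_value {
    std::int64_t lval;  // integer
    double       dval;  // floating point
    void*        ptr;   // string, array, object, ...
};

// Made-up type ids; the real engine defines its own constants.
enum mock_type : std::uint32_t { T_NULL = 1, T_LONG, T_DOUBLE };

struct mock_zval {
    mock_value    value;      // first 8 bytes: the data
    std::uint32_t type_info;  // type id plus flags
    std::uint32_t reserved;   // space for engine bookkeeping
};

static_assert(sizeof(mock_value) == 8, "data part is 8 bytes");
static_assert(sizeof(mock_zval) == 16, "whole record is 16 bytes");

inline void set_long(mock_zval* z, std::int64_t v) {
    z->value.lval = v;
    z->type_info = T_LONG;
}

inline void set_double(mock_zval* z, double v) {
    z->value.dval = v;
    z->type_info = T_DOUBLE;
}

// A read is only valid for the union member named by type_info.
inline std::int64_t get_long(const mock_zval* z) {
    assert(z->type_info == T_LONG);
    return z->value.lval;
}
```

The point of the sketch is that scalars live entirely inside the record, while anything larger is reached through the pointer member and carries its own reference count.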

Special Helper classes

Additional zpp classes to help inside the PHP execution environments.

class           Purpose
base_d          Parent class for C++ classes that are embedded inside a zend_object
base_obj_mgr    Parent class to manage PHP class instances with a base_d
datetime_obj    An obj_rc for a DateTime object
dt_interval     An obj_rc for a DateInterval object
fn_call         Base class for calling PHP functions
fn_call_args    Template class to call PHP functions with N parameters
for_key_value   Iterate keys, values and index of a HashTable*
htab_rw         Update, append or delete in a HashTable*, after doing copy on write
htab_walk       Step iteration of keys and values of a HashTable*
preg            Methods to manipulate strings using PCRE regular expressions
str_buf         Stream text to PHP API "zend_smart_str"
str_intern      zend_string made as "interned"
str_out         Stream text to PHP output
str_perm        zend_string allocated in permanent memory
str_temp        zend_string for throwaway use
state_init      Set up structures for interned strings and constants during module initialization
timezone_obj    An obj_rc for a DateTimeZone object
zarg_rd         Confirm and transfer PHP function arguments from zend_execute_data

Where to use *_ptr and *_rc classes

The reference counting (RC) class variants are for object member storage, and for creating or deleting instances of data. They are used when reference counting operations are required, for instance when returning a newly created value from a function. The non-reference-counting (NRC) wrappers are for passing pointers around where reference counting is unnecessary. This includes arguments passed to C++ functions and methods, as object data is considered to have secure ownership in the calling environment. All classes have a constructor that takes a zval*. Only *_rc classes will increment a reference count.

The *_ptr classes are for methods and functions that do not require changing the reference count of their passed argument. If the same functions were available on val_ptr or val_rc, frequent type retrieval and testing would be required to check for an appropriate PHP handle type. Instead they are used directly, through the methods appropriate to their type.

These classes have only been tested on PHP 8.2 and above. They certainly won't be useful with PHP versions before 7.

The non-reference-counting *_ptr classes are useful as function parameters, since a zval* can be passed directly to methods with *_ptr type parameters. They are also useful for returning access to members of C++ class objects, where no reference count change occurs. When a reference-counted handle is returned as the result of a registered Zend function, the return_value parameter is a pointer to a zval in which the returned value must be set, and a reference count increment is required, as expected by the Zend interpreter.

In cases where a new value is created and returned by a method or function, a specific *_rc type can be returned. This signals that the function's stack variables have gone, and that the returned object carries the reference count out with it. If multiple types can be returned, a val_rc structure is returned. All of the *_rc types declare C++ move constructors and move assignment operators (taking && parameters), and the compiler will likely perform Return Value Optimisation.

Setting the return_value

All of the wrapper classes have two methods available for returning a value to a PHP script function or object method call. One of them must be called to set the "return_value" zval pointer.

// What Zend passes to all declared PHP C functions and methods.
#define INTERNAL_FUNCTION_PARAMETERS zend_execute_data *execute_data, zval *return_value

A function or method call from the PHP Zend Engine is passed a pointer to a C structure "zend_execute_data". The number of parameters passed is accessed with the C macro ZEND_CALL_NUM_ARGS(execute_data). For a method call, the object is reached through the C macro ZEND_THIS, which resolves to &execute_data->This, a zval* from which Z_OBJ_P(ZEND_THIS) gives the zend_object*.

There is a large number of C API macros, and alternates to access and transfer parameters passed in the zend_execute_data structure to somewhere useful. A commonly used set of them start with ZEND_PARSE_PARAMETERS_START, or ZEND_PARSE_PARAMETERS_NONE. These test for parameter number and type compliance, and throw PHP defined exceptions when compliance fails.

In the more recently created classes of this code library, I used a C++ class with methods that replace the ZEND_PARSE_* macros.

These are the methods of the zpp::zarg_rd class. Its methods take a reference to one of the above *_ptr receiver classes, and a zval* returned by a call to need(size_t ix) or option(size_t ix), where ix is an index into the execute_data array of zval parameters. The array is treated as a zero-based "slice" starting at the address of the zend_execute_data zval arguments.

As this class is new and different, its error handling and error messages differ from those of ZEND_PARSE_* and the C-macro helpers.
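The need/option idea can be sketched over a mock argument array. Everything here is hypothetical: arg_reader is not the zpp::zarg_rd interface, mock_zval stands in for zval, and a C++ exception is used only to mark where the real class would report a PHP error.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

struct mock_zval { long lval; };  // stand-in for a zval holding an integer

// Hypothetical reader over a zero-based slice of call arguments.
class arg_reader {
public:
    arg_reader(mock_zval* args, std::size_t count) : args_(args), count_(count) {
        assert(args != nullptr || count == 0);
    }

    // Required argument: a missing index is an error.
    mock_zval* need(std::size_t ix) const {
        if (ix >= count_)
            throw std::out_of_range("missing required argument " + std::to_string(ix));
        return args_ + ix;
    }

    // Optional argument: nullptr when the caller did not pass it.
    mock_zval* option(std::size_t ix) const {
        return ix < count_ ? args_ + ix : nullptr;
    }

    // Transfer into a receiver; returns false when there is nothing to read.
    bool get(long& receiver, const mock_zval* zv) const {
        if (!zv) return false;
        receiver = zv->lval;
        return true;
    }

private:
    mock_zval*  args_;
    std::size_t count_;
};
```

A call site would then read something like rd.get(width, rd.need(0)) for a mandatory parameter and rd.get(height, rd.option(1)) for one with a default.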

Returning values back to PHP

The second parameter of INTERNAL_FUNCTION_PARAMETERS is return_value, a pointer to a zval. By convention, each *_ptr class returns its value with the following method, which tries to increment the reference count of the wrapped value.

void return_zv(zval* ret) const;
/* For example, this does a reference counted copy to the zval*, even though obj_ptr is otherwise not involved in reference counting. If it didn't do this, PHP would "disappear" the object passed to it. The receiving zval takes reference counting responsibility, not the obj_ptr.

This is most often used if the obj_ptr class is in the scope of the ZEND_FUNCTION body.
*/
void 
obj_ptr::return_zv(zval* ret) const
{
    if (obj_)
        ZVAL_OBJ_COPY(ret, obj_); // set and bump reference count
    else
        ZVAL_NULL(ret);
}

Instances of a *_rc class that are to be returned from a ZEND_FUNCTION body need to use their move_zv method, passing it the return_value pointer. This transfers the referenced value to return_value, and sets the wrapper's own pointer to nullptr, which prevents the destructor from trying to decrement the reference count. This is a move operation, so it makes no change to the reference count of the copied pointer. The return_value keeps the incremented reference count that belonged to the *_rc class.

void move_zv(zval* ret);
/* For example, this obj_rc already bumped the reference count on construction or assignment. It is therefore already counted. To ensure it does not decrement the reference count on object destruction,
after moving the pointer to the zval, its copy of the pointer is nulled. The zval is set with a macro that does not change the reference count. The ownership of the reference count is "moved".
Cannot call this with a const <_rc>&
*/

void 
obj_rc::move_zv(zval* ret)
{
  if (obj_)
  {
    ZVAL_OBJ(ret, obj_); 
    //Z_TYPE_FLAGS_P(ret) = 0; // Not allowed to dereference
    obj_ = nullptr;// give up ownership privilege
  }
  else {
    ZVAL_NULL(ret);
  } 
}

To be sure, *_rc classes reimplement return_zv (without the const qualifier) to do exactly the same as their move_zv method.
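The difference between the two methods shows up in the reference counts. The following sketch reproduces both paths with mock types, so it can be compiled without the PHP headers. mock_object and mock_zval are stand-ins, and the classes are cut down to the two methods under discussion.

```cpp
#include <cassert>
#include <cstdint>

struct mock_object { std::uint32_t refcount; };           // stand-in for zend_object
struct mock_zval   { mock_object* obj; bool is_null; };   // stand-in for zval

class obj_ptr {
public:
    explicit obj_ptr(mock_object* o = nullptr) : obj_(o) {}
    // Copy into the return slot and bump the count: the zval becomes a new owner.
    void return_zv(mock_zval* ret) const {
        if (obj_) { ++obj_->refcount; *ret = {obj_, false}; }
        else      { *ret = {nullptr, true}; }
    }
protected:
    mock_object* obj_;
};

class obj_rc : public obj_ptr {
public:
    explicit obj_rc(mock_object* o) : obj_ptr(o) { if (obj_) ++obj_->refcount; }
    obj_rc(const obj_rc&) = delete;             // kept minimal for the sketch
    obj_rc& operator=(const obj_rc&) = delete;
    ~obj_rc() {
        if (obj_) { assert(obj_->refcount > 0); --obj_->refcount; }
    }
    // Hand the already counted reference to the return slot: no count change.
    void move_zv(mock_zval* ret) {
        if (obj_) { *ret = {obj_, false}; obj_ = nullptr; }
        else      { *ret = {nullptr, true}; }
    }
};
```

After return_zv the count has gone up by one. After move_zv it is unchanged, and the destructor of the emptied obj_rc has nothing left to decrement.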

Reading values in HashTables, and function parameters.

The val_ptr is useful for checking the value returned from the array read methods of htab_ptr. All of the wrapped zend_hash read methods return a zval*, which is either the address of the stored zval or a nullptr value.

"Not Found" is represented by a nullptr zval* result. Otherwise the returned zval* is a pointer to the value's internal storage. All of the val_ptr class methods check for a nullptr.

As a PHP array can be indexed by strings, integers, and arbitrarily some of each, the most used signatures for "get" methods returning a zval* have argument types of zend_long, zend_string* and str_ptr.

For the convenience of having useful *_ptr access functions, obj_rc inherits from obj_ptr, htab_rc inherits from htab_ptr, and str_rc inherits from str_ptr. This inheritance does not imply that any other kind of polymorphism is catered for.
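The "pointer or nullptr" lookup convention can be sketched with standard containers. This is an illustration only: mock_array uses std::map, which does not keep PHP's insertion order, and val_view is a hypothetical cut-down relative of val_ptr. It needs C++17 for std::variant.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

struct mock_zval { long lval; };  // stand-in for a stored zval

// A key is either an integer or a string, as in a PHP array.
using array_key = std::variant<std::int64_t, std::string>;

class mock_array {
public:
    void set(std::int64_t k, long v)       { items_[array_key{k}] = mock_zval{v}; }
    void set(const std::string& k, long v) { items_[array_key{k}] = mock_zval{v}; }

    // Both getters return the address of the stored value, or nullptr.
    const mock_zval* get(std::int64_t k) const       { return find(array_key{k}); }
    const mock_zval* get(const std::string& k) const { return find(array_key{k}); }

private:
    const mock_zval* find(const array_key& k) const {
        auto it = items_.find(k);
        return it == items_.end() ? nullptr : &it->second;
    }
    std::map<array_key, mock_zval> items_;
};

// Hypothetical wrapper whose methods all tolerate "not found".
class val_view {
public:
    explicit val_view(const mock_zval* p) : p_(p) {}
    bool ok() const { return p_ != nullptr; }
    long to_long(long fallback = 0) const { return p_ ? p_->lval : fallback; }
private:
    const mock_zval* p_;
};
```

Because every val_view method checks for nullptr, a lookup result can be wrapped and used in one expression without a separate "was it found" test.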

Calling PHP functions.

The val_rc class wraps the zval structure. It has no inheritance. It is useful for setting up arguments for callbacks to PHP functions. It therefore has many constructors and assignment operators, which take all of the other classes and also raw PHP pointers.

Calling PHP functions requires setting up a zval arguments block, and passing a pointer to the first one, and the number of arguments. This is a part of the "zend_fcall_info" structure, which also needs a function name string, a zval to store the result, and optionally a zend_object* for a method call. There is also a way to pass named parameters with a HashTable*.

The basics of function calling are handled by the class "zpp::fn_call". This allows for a no-parameters function call. It holds a val_rc class to store the result, a "zend_fcall_info" and a "zend_fcall_info_cache" structure. Repeated calls for the same function/method presumably use the cached data to speed up the process.

A templated version of this class, fn_call_args, takes the number of parameters as a template argument. Users of it call the argsptr() method to wipe the argument array and get a pointer to its start.

The execution of the call returns a val_rc&&, and the other *_rc classes have constructor and assignment operators to take this.
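The shape of a fixed-size argument block with an argsptr() accessor might look like the sketch below. It is not the zpp implementation: a std::function stands in for the zend_fcall_info and cache pair, mock_zval stands in for zval, and the class names call_base and call_args are invented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

struct mock_zval { long lval = 0; };  // stand-in for zval

// Base: holds the call target and supports a no-parameter call.
class call_base {
public:
    using target = std::function<mock_zval(const mock_zval*, std::size_t)>;
    explicit call_base(target fn) : fn_(std::move(fn)) {}
    mock_zval call() { return invoke(nullptr, 0); }
protected:
    mock_zval invoke(const mock_zval* argv, std::size_t argc) { return fn_(argv, argc); }
private:
    target fn_;
};

// Templated on the number of parameters: owns a block of N argument slots.
template <std::size_t N>
class call_args : public call_base {
public:
    using call_base::call_base;
    // Wipe the block and hand back its start for the caller to fill in.
    mock_zval* argsptr() { args_.fill(mock_zval{}); return args_.data(); }
    // Pass a pointer to the first argument and the argument count.
    mock_zval call() { return invoke(args_.data(), N); }
private:
    std::array<mock_zval, N> args_{};
};
```

Keeping one such object alive between calls is what would let the real class reuse its cached function lookup.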

Tricky memory for function call objects in state_init

The function call objects stored in state_init structures did have a tricky issue that I felt needed to be handled.

The fn_call structure returned its result by a move from its internal val_rc result member. If the fn_call is embedded in static memory through a state_init instance, and the caller does not assign the result to a C++ object, the result's memory could be left hanging inside the function object.

Move assignment was chosen to have fewer reference counting ups and downs. It has now been changed to return a val_rc by value, and C++ return value optimisation may recover some of the desired efficiency.

Development History

The ZPP wrapper classes are a means of hiding low-level PHP structure manipulations, which in C source code are normally generated by C language macros. These macros play part of the role of "C++ templates" in code generation. They are numerously defined in the PHP source code, and help both to standardize and to inline structure manipulation, as long as the author knows which of the many macros does what.

The ZPP methods are generally not declared inline, and so they provide an isolation wall from the PHP code contained inside. With the reference-counting handle wrappers, testing is required to ensure that reference counting safety is maintained through class construction, destruction, and all assignment and transfer operators.

When the author searched online today for existing PHP extension frameworks in C++, PHP-CPP is the only name that was mentioned.

PHP-CPP has been around in some form since at least PHP 5. The changes encountered with PHP 7.0 were too much to maintain compatibility, so a new version was created, and the PHP 5 versions archived.

C++ extensions were proposed over a decade ago, and PHP-CPP was created to promote this. I have looked into using PHP-CPP. This PHP C++ extension framework was first created for PHP 5.6, which was then the latest PHP version. Subsequent generations of PHP have been released, and PHP-CPP has maintained compile compatibility since PHP 7.0. The current PHP-CPP source has many places where code variations are defined for different versions of PHP.

The author of this Wcc project found PHP-CPP somewhat cumbersome and lagging behind new PHP releases, and eventually decided to try a "start again from scratch" approach, using recent PHP releases since version 8.2. PHP-CPP does have extensive functionality, and was studied for hints on how to proceed afresh, and what might be improved.

The web extension framework classes that use zpp are contained in the wcc and wcd namespaces and source code folders. These demonstrate some interacting class implementations of reasonable complexity. They are also presumed to be used as a suite, as classes such as Wcc\Services, Wcc\Config, Wcc\Hmap and Wcc\ViewOuput are presumed to be instanced. The C++ class versions are a subset of the whole framework, and make many function calls internally to each other, which bypasses the PHP interpreter in their method implementations. During development the APIs of the PHP script classes and the C++ versions were harmonized with each other to ensure call signature compatibility.

A debug version of PHP was compiled, and run with the memory check tool valgrind to detect memory leaks and track down segmentation faults. Any significant changes to zpp would require going through this process again. zpp itself has been rewritten at least twice over, to reflect the author's insights into what seemed to work better. The project initially started with the Wcc\Route and Wcc\RouteSet classes, written as a much smaller extension, after attempting this using both Zephir and PHP-CPP.

Parts of the design of these script classes were adapted from the existing Phalcon framework, which is written in the intermediate Zephir language, from which C code is generated and compiled.

PHP script helper classes to work with.

The framework classes require a few small helper classes that do some tasks that would be hard to do, or have no benefit from implementation directly in the zpp C++ code base.

Two examples in mind are a class loader function or class and, related to this, a class to load view templates. It is possible to have these in the C++ extension, but that would duplicate what the PHP engine does very well, and it helps development to have them available in a scripted Xdebug environment.

A Performance comparison to PHP-CPP

Does it compare?

XmlRead is an extension class made to read a customized but flexible XML file format. It returns a tree of PHP data as a PHP array, or as an object with dynamic properties. Individual elements in the XML stand for PHP types, and carry a "k" attribute when they are a keyed array value. A <tb> represents a keyed array, and <a> represents a packed array.

Reading the file returns a PHP array. The format can include specified XML tags to instantiate a PHP class, as well as common PHP data types, arrays and keyed tables, using defined tags and attributes in typical XML style.

PHP already has optimized JSON functions, and the most versatile and fast data file format is PHP code. XML is much spruiked as a portable format. Here the format is used to test the ZPP classes and their C++ extension helpers.

The code is implemented in 3 environments. All use the PHP extension class XMLReader to read each XML node and its attributes in sequence, and translate them to a PHP array tree of types and values. The PHP script version is re-implemented using PHP-CPP classes, and also using the ZPP classes.

Each is compared using the same input test file, over many iterations, for an average time. The PHP timer function microtime measures the time elapsed for each iteration. Ten iterations are done as a "warm up", and then 200 iterations are averaged, to get a value for microseconds per iteration.
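The same warm-up-then-average scheme can be written as a small C++ helper. The measurements in this chapter were taken from PHP with microtime, so this is only a sketch of the method, with average_micros as an invented name and std::chrono::steady_clock as the assumed timer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Run `work` a few times untimed, then return the mean time per timed run
// in microseconds.
template <typename Fn>
double average_micros(Fn&& work, std::size_t warmup = 10, std::size_t runs = 200) {
    assert(runs > 0);
    for (std::size_t i = 0; i < warmup; ++i) {
        work();  // warm-up iterations are not measured
    }
    using clock = std::chrono::steady_clock;
    std::chrono::duration<double, std::micro> total{0.0};
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();
        total += clock::now() - start;  // accumulate only the timed part
    }
    return total.count() / static_cast<double>(runs);
}
```

Timing each iteration separately, rather than the whole loop, keeps the loop overhead out of the figure and mirrors taking a microtime reading before and after each run.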

The results show the average time in microseconds per iteration, and the ratio of this to the PHP script version, which is itself 1.0.

The code for the PHP-CPP and zpp versions makes many repeated function calls to the XMLReader class object, to move to each XML node and read its changing properties.

The zpp version uses cached instances of the fn_call class, which improves PHP function call time, and overall it is about 10% faster than the reference PHP script class implementation.

Version            µs / iteration   relative
PHP script class   1,992            1.0
PHP-CPP version    4,355            2.8
ZPP version        1,788            0.9

This was using a slightly modified, recent version of the PHP-CPP code. I also compiled it with C++20 features enabled, to have std::string_view (available since C++17), rather than the C++11 default in the PHP-CPP makefile. PHP-CPP works with its Value type, which holds a zval, as a singular means of PHP type storage, interrogation and creation. From the XMLReader nodes and attributes we get strings. A vital part of PHP-CPP isolating a string from its Value class is creating a std::string, which means another memory allocation and later destruction with every zval string examination.

This is an example of the temptation, during PHP request processing in, for example, a persistent php-fpm instance, to use C++ code conveniently without a custom allocator that uses PHP request-allocated memory. This is probably OK for temporary stack variables, most of the time. PHP allocates from a pool of request-lifetime memory, and the entire pool is disposed of when the request ends. This is most starkly evident in creating a zend_string, where a flag argument controls whether the string is allocated from permanent memory, as with C malloc, used for "permanent" and interned strings, or will only exist during the lifetime of the request. Care therefore has to be taken to guarantee clean-up of non-request memory allocations.

It's clear that the calls documented by PHP-CPP for making a PHP function call are a little unoptimized when made frequently. Later on, a "souped up" version of the PHP-CPP implementation will be worked on, reworked to use the function call classes from zpp, which should bring this kind of comparison more on a par with the PHP script version.

Module initialization - zpp::state_init

A good place to create C++ lookup data structures, for use by request-time class methods and function calls, is a derived static instance of zpp::state_init. The structures need to use the "permanent" standard allocator, and are populated inside the instance's virtual init() function.

It is even possible to statically allocate HashTable structures declared inside a static instance of zpp::state_init. In the init() function, zend_hash_init needs to be called. In the end() function, freehtmemory(&ht) works. This works if the string keys are already interned. This was done in the Route_init class in wcc/route.cpp.

Interned string class zpp::str_intern can also be used as std::map values, as done in the wcc/global_response.cpp in the Response_init class, to lookup status code strings. The map structure was initialized using a C++ initializer inside its Response_init::init() method.

Request initialization

The base constructor of zpp::state_init links itself into a singly linked list, and the instances are iterated during module initialization, request initialization, request shutdown and module shutdown. At first, an example of init_req() was in wcc/plate.cpp, where some PHP output buffer functions were initialized. This was due to uncertainty as to whether the cached function call information would persist across requests. It was moved to module init() without harm. Now init_req() is used to create global class object instances for the request, and end_req() frees them. The examples are in wcc/services.cpp and wcc/reflect_cache.cpp.

// state_init.h
// iterate links for module start/end
static void init_all();
static void end_all();

// iterate links for request start/end
static void init_request();
static void end_request();

// override module init/end calls
virtual void init();
virtual void end();

// override request init/end calls
virtual void init_req();
virtual void end_req();
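How a constructor can link every static instance into one list, so that a single call visits them all, is sketched below. This is a guess at the general mechanism, not the zpp::state_init source: state_base and status_init are invented names, only the module start and end hooks are shown, and the inline static member needs C++17.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical base class: each instance links itself into a singly linked list.
class state_base {
public:
    state_base() : next_(head_) { head_ = this; }  // link in on construction
    virtual ~state_base() = default;

    virtual void init() {}  // override for module start
    virtual void end() {}   // override for module shutdown

    static void init_all() { for (state_base* p = head_; p; p = p->next_) p->init(); }
    static void end_all()  { for (state_base* p = head_; p; p = p->next_) p->end(); }

private:
    state_base* next_;
    // Constant-initialized, so it is ready before any static instance is built.
    static inline state_base* head_ = nullptr;
};

// Example derived instance: a status code lookup filled at module start.
class status_init : public state_base {
public:
    std::map<int, std::string> text;

    void init() override { text = {{200, "OK"}, {404, "Not Found"}}; }
    void end() override { text.clear(); }

    const std::string& lookup(int code) const {
        auto it = text.find(code);
        assert(it != text.end());
        return it->second;
    }
};

static status_init g_status;  // registers itself when the library is loaded
```

The module start-up function then only needs to call state_base::init_all(), and does not have to know which source files declared an instance.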

In terms of development history, the PHP script version was created first, and the PHP-CPP version was created as part of trying to speed up this XmlRead class component as part of an extension. Why would I write such a crazy class, and not simply use PHP as a common configuration format? Let's call it a crazy impulse to create and play.

A vital part of the ZPP classes is creating a zpp::str_ptr or zpp::str_rc reference to receive an already existing zend_string*, and making use of the C++ class std::string_view for fast string comparisons. Although zpp::str_ptr has a method to create a std::string, and can create a zend_string from a std::string, this has not been used anywhere in the code of the classes created so far, and could be removed from the source. Of course the PHP script version never needs to do this, and also has optimal function calls.
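Comparing through std::string_view avoids the copy that building a std::string would cost. The sketch below uses a stand-in string record. mock_string keeps a pointer to its characters, whereas the real zend_string stores them inline, and the helper names are invented. It needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Stand-in for an existing engine string: a length and its characters.
struct mock_string {
    std::size_t len;
    const char* val;
};

// View the characters in place; no allocation, no copy.
inline std::string_view view_of(const mock_string* s) {
    return s ? std::string_view(s->val, s->len) : std::string_view();
}

inline bool equals(const mock_string* s, std::string_view text) {
    return view_of(s) == text;
}

inline bool starts_with(const mock_string* s, std::string_view prefix) {
    const std::string_view v = view_of(s);
    return v.size() >= prefix.size() && v.substr(0, prefix.size()) == prefix;
}
```

The view is only valid while the underlying string is alive, so it suits short-lived comparisons such as matching an XML tag name, not storage.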

Module Initialization

Interned strings

PHP script compilation turns all string contents into "interned" strings, which ensures that one unique zend_string* exists in a shared table across all its interpreter compiled units. Classes using zpp can make use of this, by creating a class based on zpp::state_init, and store instances of zpp::str_intern during PHP module initialization. This will minimize run-time conversion of embedded C-style strings into things like array keys, with direct reference to unchanged str_intern values for every iteration or request.
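The benefit of interning is that equal contents share one address, so later comparisons can be pointer comparisons. A minimal sketch of the idea with standard containers follows. It is an analogy, not the Zend interned string table, and intern_table is an invented name. It needs C++17.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// One stable std::string per distinct content.
class intern_table {
public:
    // Returns the same pointer every time the same text is interned.
    const std::string* intern(std::string_view text) {
        // emplace returns the existing element when the text is already present;
        // element addresses in an unordered_set survive rehashing.
        return &*pool_.emplace(text).first;
    }
    std::size_t size() const { return pool_.size(); }
private:
    std::unordered_set<std::string> pool_;
};
```

Doing the interning once at module start-up means request-time code compares or passes a ready-made pointer, instead of converting a C-style literal on every call.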

I suspect that the PHP-CPP code has some impositions on efficiency in its design, due to using a single Value class wrapper for all PHP types. It uses an extra class "HashMember" to return values of Arrays, to implement a convenience operator[], and so has a few overheads in its C++ structure management which could be elided.

Debugging reference count leaks

It was difficult to track down memory leaks caused by mistakes in managing PHP reference counts in C++ code. Compiling a debug version of PHP adds extra memory tracking information and checking to its data structures. The debug version of PHP keeps this tracking information around its allocations, and prints out information about unfreed memory at the end of a run.

As a flavor, here is a direct copy and pasted output of a recent example of such a leak, which was later fixed.

[Tue Apr 15 23:24:06 2025]  Script:  '/home/michael/www/wcp/test/xmlread.php'
/home/michael/dev/php-8.4.5/Zend/zend_string.h(176) :  Freeing 0x00007f4f91203b40 (32 bytes), script=/home/michael/www/wcp/test/xmlread.php
Last leak repeated 3 times.

Minimizing the use of *_rc objects reduces reference count management overhead, and the number of places where reference counting bugs might occur.

In fact a lot of the time spent developing earlier versions of this zpp class suite was painful learning of how to make C++ operators do the right thing by their managed PHP pointer types. There are a couple of zend_string API functions that may or may not return a new string, like zend_string_tolower, which always returns with a reference count bump even if the same string is returned. C++ assignment operators got stumped by this, especially when the result was reassigned to the same class instance.

As a mitigation, the author resorted to placing the call as a self-mutation function, only allowed for zpp::str_rc class.


str_rc a_str = some_function();
// This was leaking a rc++
a_str = a_str.to_lower();
// This was still safe to do.
str_rc b_str = a_str.to_lower();

// This works better
a_str.lowercase();

// current implementations
void
str_rc::lowercase() 
{
	if (s)
	{
		// always added reference count
		zend_string* p = zend_string_tolower(s);
		if (p == s)
		{
			try_decref(p); // undo unwanted rc++
		}
		else {
			adopt(p);  // take up new pointer
		}
	}
}

str_rc 
str_rc::to_lower() 
{
	str_rc result(*this);
	if (result.ok())
	{
		result.lowercase();
	}
	return result;
}
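The two outcomes of such a "maybe new, maybe the same" function can be checked with a mock. In this sketch mock_tolower imitates only the behaviour described above for zend_string_tolower, mock_string is a stand-in, and lowercase works on a raw owned pointer, not on a str_rc.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

struct mock_string { std::uint32_t refcount; std::string text; };  // stand-in

// Returns the same string with its count bumped when nothing changes,
// otherwise a new string with a count of 1.
inline mock_string* mock_tolower(mock_string* s) {
    std::string low = s->text;
    for (char& c : low) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (low == s->text) { ++s->refcount; return s; }
    return new mock_string{1, low};
}

inline void release(mock_string* s) {
    assert(s->refcount > 0);
    if (--s->refcount == 0) delete s;
}

// In-place lowercase of an owned pointer; the count stays balanced either way.
inline void lowercase(mock_string*& owned) {
    mock_string* p = mock_tolower(owned);
    if (p == owned) {
        release(p);       // same string came back: undo the extra bump
    } else {
        release(owned);   // new string: drop the old one and adopt the new
        owned = p;
    }
}
```

In both branches the owner ends up holding exactly one counted reference, which is the invariant the self-mutating method is there to protect.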

Passing direct parameters to zend functions.

To minimize use of casting when calling zend API functions, the string, object and array and value wrappers all have casting operators to automatically pass the enclosed pointer type, like this.

// obj_ptr Inline cast to the enclosed pointer type.
operator zend_object* () const { return (zend_object*) obj_; }

Use PHP's extension class declare and build system, from _stub.php to _arginfo.h

ZPP uses the current PHP extension build process, starting with an extension skeleton, a folder generated from a PHP distribution. Extension class interface files named "class_arginfo.h" are generated from "class.stub.php" files. This generates all the class method names and signatures. The generated class registration code can install class constants, and will add class properties declared in a "stub.php" file.

For C++ source, the "config.m4" file needs to specify a C++ compiler instead of a C compiler. Here are the config.m4 lines from the wcc extension project. All *arginfo.h files (and included PHP source headers) get included inside an extern "C" {} wrap, and each ZEND_FUNCTION can then be coded in the normal way.

dnl Everything is compiled from the one file.
PHP_ARG_ENABLE([wcc],
  [whether to enable wcc support],
  [ --enable-wcc], [Enable wcc support.])

if test "$PHP_WCC" != "no"; then

  AC_DEFINE(COMPILE_DL_WCC, 1, [ Have wcc support ])
  
  FLAGS="-fPIC"
  CXXFLAGS="$CXXFLAGS -Wall -O2 --std=c++23 -I./include"
  
  PHP_REQUIRE_CXX()
  AC_LANG([C++])
  
  PHP_NEW_EXTENSION(wcc, wcc.cpp , $ext_shared, , $FLAGS)
fi

The build process

In preparation for code builds, the "phpize" command needs to be run first, followed by "./configure". A build using "make", and an install using "(sudo) make install", should then work with the settings of the currently configured PHP environment.

In this project the _stub.php and _arginfo.h files exist in the /stub folder, and a shell script build.sh calls the make command for each name individually.

ZPP is used and distributed as C++ source code only. If an extension has compiled this into its own binary (shared library) then only the C++ headers of the very same version (and compiler name mangling) are required, since two shared libraries with the same zpp binary code will behave rather badly.

Multiple extensions using ZPP should each have their own compiled version of the zpp namespace code. Linkage to other code libraries is not prohibited. The next chapter steps through the process of making such an extension.

Shared access to declared/exported variables

In Linux, variables with static storage that are declared extern are shared in the process memory space. They are found by lookups in the module's symbol table. Symbols used by a module but defined in other modules are also in its symbol table, marked as "U" with no address entry. A way to avoid clashes of shared symbol names, and to resolve missing or unfilled symbol addresses, is required.

C++ has a partial (practical? reasonable?) solution, using namespaces. Here is an example from extension modules used later in this book.

/* wccz.so includes the source code zpp/fn_call.cpp, where FTAB is a static instance. It contains static declarations of interned strings and callable PHP functions. B means "The symbol is in the BSS data section." */

nm -D --demangle wccz.so | grep FTAB
000000000005d560 B zpp::FTAB

/* wccr includes and references the header zpp/fn_call.h, where FTAB is declared, but its address is "Unknown". The contents and offsets of its members are not exported, but are embedded in the compiled code, which means the same version of the headers and accessing code is required by all referring symbol tables in different extensions. */

nm -D --demangle wccr.so | grep FTAB
                 U zpp::FTAB

/* On loading wccr.so, when any code that references FTAB is accessed, the address is filled in from the loaded wccz.so. If the symbol is not found, for example because the owner module is not loaded, an error is raised. */

The general idea of PHP scripts and extensions.

PHP is an interpreted scripting language, which calls functions and class methods that extensions define through the C API. Many extensions supply functionality across many domains and are distributed along with the core PHP interpreter source.

There are currently about 80 extensions, counting the names of sub-folders in the "ext" folder. There are many other useful and competent extensions available as 3rd party libraries.

Extensions are organized shared libraries (*.so in linux), that share their public data and functions with each other and the process that loads them.

Many compiler technologies exist to produce shared libraries. PHP itself is compiled from C language source code. Extensions can be written directly in C.

All those PHP API C macros - give me Zephir

Zephir is a language for building PHP extensions that parallels PHP script features and compiles to C source code. It was created because directly coding PHP extensions in low-level C is tedious and slow when they are complex and have a large number of classes, such as the Phalcon PHP classes bundle.

Zephir source does have a degree of inflexibility, and the generated C code looks machine generated, which makes it even harder than normal to read as C source. It is almost as flexible and productive as real PHP source.

PHP-CPP

PHP-CPP is a C++ class framework to build a shared library and PHP API extension classes and functions. Therefore C++ can be used to hide PHP low level complexity. Can PHP-CPP framework be improved?

PHP binary API

PHP script functions and class methods are shared C functions coded in extensions, with their names and arguments registered by each extension module so that they are callable from the PHP interpreter engine.

The full source of every published PHP version is downloadable as the "php-src" git repository on GitHub. Types are in the zend_types.h file in the Zend folder.

Get the best code execution level

The aim of writing an extension is to provide efficient functions and object methods that improve on the best that the script byte code interpreter, or JIT compiler, can do, by hard-coding them as compiled native binary code. Extensions can link to other system libraries that support C APIs and provide access to new and extended functionality.

How PHP exchanges data

Data is exchanged between interpreted and compiled code through exchanges of data managed by zval structures. These are passed directly through function arguments, and return values, in the zend_execute_data structure.

PHP provides C macros that check function arguments and extract some of the reference-counted handle types. zend_string*, zend_object*, and HashTable* values can be extracted directly from zend_execute_data by using the ZEND_PARSE_PARAMETERS_START macro, found in zend_API.h.

Also found in this zend_API.h header, are C structures to assist calling other PHP functions and class methods, these are _zend_fcall_info_, and _zend_fcall_info_cache_. ZPP makes use of these in its zpp::fn_call and zpp::fun_call_args<T> structure declarations. Repeated use of the same call will speed the call using the cached reusable call data.

Usefulness of extensions

Apart from calls to and from the PHP API, the idea is that extension C/C++ code works with raw data structures of numbers, strings, and data storage, for which such compiled languages are designed. PHP strings and array storage is useful to use directly.

Array storage requires zval fetch and store, for the HashTable* API, and this is excellent for many purposes, as these are the all-purpose data store used in PHP scripts.

The object API and method calls are used extensively for coordinating services between objects. Having underlying C++ objects in an extension allows fast coordination between object classes that bypasses the script engine. When extension objects provide really useful services, exposing their classes and functions lets them join the community of useful extensions in PHP script space.

Extension source code

Many standard extension implementations are distributed with PHP. This makes PHP a rich, productive and efficient scripting environment. Inside, there are many zend API calls and many thousands of lines of source code. Deciphering it all can take a long time, even for a C-language expert.

Starting with C

There are beginner guides online, and all assume that PHP extensions are naturally coded in C. There is some scattered information about building a PHP extension using C++. As a beginner I made use of the Zend - Perforce online guide to build my first version of a PHP extension in C, so I have been a while on that long journey. The current online guide is titled "Writing PHP Extensions".

Moving to C++

C++ can directly use the PHP C header files, and C++ will directly use the PHP source of structures, functions and macro constructions imported with extern "C". It therefore should be easy. C++ is a superset of C, so no problems? Right?

What follows will be what I learned as I coded a C++ extension for PHP to make useful classes for an efficient web-site framework. This is the PHP C++ Classes framework. I want to make these classes known, so they can be used and improved.

Merely "hard-coding" a framework as a shared library, does not by itself ensure execution efficiency.

Web Classes C++ extension

The Wcc framework objects also exist as pure PHP script, which is a backup for the same functionality if the extension is not loaded. Wcc is an acronym for "Web Classes C++ extension".

The framework can be used as PHP script code, no C++ extension necessary. The classes in PHP-script form have the same name and methods, and work the same, although some overall execution time increase can be measured.

The Wcc classes structure and design are I hope, reasonably efficient in terms of using the PHP environment, and achieving performance and usability goals. The script versions of the classes will not be loaded by the PHP engine, when the classes are already loaded as the C++ extension. Wcc is the namespace, and the extension name.

Extensions can't be overridden by scripts

When an extension version of any PHP class is already loaded, obviously the PHP engine does not call registered loaders to look for a script version. A forced load will produce an error.

  • The documentation of the derived classes is easier using tools that read the PHP script versions. But the underlying C++ PHP access classes that make them possible, need a separate document, as done here.

  • The Wcc extension objects API replicates a subset of the classes of the script version. They have the same class names and method names. Extension library users do not need to change any script code to use either. When using a PHP extension, script versions of the same class are not loaded. This also means that there are some classes that are required from the script version of the framework, that are not in the C++ extension, either because they provide important functionality not easily available from inside C/C++ extension, as "complementary classes", or they have not yet been coded in C++, and that this may be tried in the future.

  • The difference between script-only and using the extension is performance, as measured by memory usage and execution speed, and also stability and systems code sharing. More built-in classes can reduce PHP execution load time, presuming they are always needed. The PHP interpreter's work includes dynamically reading script files, converting them into byte code, and then executing them. All this takes up extra memory and processing time. Various cache mechanisms can be used to reduce script reading and compile time, and execution memory overhead. The Wcc extension classes are for web site processing tasks, such as module, route, and template management, and get a speed-up of 60-70% of the time compared to the script versions.

  • The Wcc objects are coded and compiled in a typical PHP-generated C extension development environment, with small modifications. The C arginfo interfaces, used to register classes and functions into PHP, are generated from the same PHP stub files of function, class, and method declarations. Only the C++ compiler is used instead of a C compiler. This is specified by editing a few lines in the "config.m4" file, which is used to auto-generate the "configure" script and a Makefile.

  • The C++ extension, and the PHP script classes version of them were co-evolved, to ensure compatibility. Opportunities for improvement needed to be managed in both.

I had designed and tested these classes already in PHP, and I wanted to find out how much performance might improve when the PHP classes are implemented as "internal classes" using C++.

There are many popular PHP script web frameworks out there. All of them keep improving and hopefully display some convergence with current standards and best practices. However, there is a contrary tendency in software development towards building larger code bases with increasing complexity and obscure APIs.

There are many operations and computing resources not efficiently implementable at the level of PHP script code, because of, as well as despite its flexibility and forgiving nature.

C++ extensions improve on native PHP performance. Many of the "standard" extensions exist because of this, such as the regular expression functions and database interfaces. They take advantage of linking to task-optimized algorithms in compiled machine code in external libraries. For example, the regular expression string functions of PHP are provided by PCRE, the Perl Compatible Regular Expressions library, and scripts have benefitted from its upgrades. PHP source builds can choose (or not) to incorporate external library versions distributed along with the php-src archive.

Extensions with Scripted Helper classes

"Extract" is a PHP Engine linked function used to push an associative array of arbitrary keyed data into the symbol table of a currently executing scripted function.

Extract is very useful for Php-Html templates. Extension functions do not have access to this, which is no surprise, as their stack is their own native code binary, and breaking the stack function call discipline of the callee is unthinkable. Closures cannot yet be constructed by an internal class. Extension classes can implement an "Invoke method". These restrictions are circumvented by callbacks provided by the executing script.

When hard coded as an extension, classes are marked as "internal classes" by the Zend Engine. These are privileged classes regarding the Zend API, but there is some API functionality that is not available, or is more difficult, for an "internal" class. Examples of necessary but simple helper classes written as PHP script in the Wcc suite are the Wcc\Loader class and the Wcc\ViewOutput class.

Building a zend_object class

Memory layout of extension objects in PHP-CPP

It is worth discussing the PHP-CPP implementation. PHP-CPP has been a good source of implementation ideas and tested code. How can it be improved?

The author forked PHP-CPP and tried to hack it. This was a good strategy for learning its details. Some of its features seem clunky, such as its use of C++ multiple inheritance and exceptions.

PHP-CPP utilizes a Php::Base object as the recommended C++ parent of PHP extension class objects (include/base.h). Php::Base is a clean C++ object that has only one member, a private pointer to another C++ class, ObjectImpl*.

In the PHP-CPP source, ObjectImpl class is found in zend/objectimpl.h. It internally uses a pointer to a MixedObject structure, that contains a zend_object structure itself, and a pointer to its ObjectImpl, stored at the first negative pointer offset from the zend_object structure.

Storing custom data at negative offsets from the beginning of the zend_object is the recommended way to customize zend objects. Below I show the MixedObject declaration outside of ObjectImpl for clarity.

In this design, the C++ base can be independent of, but can find its zend_object, and the zend_object can find the Base C++ implementation, through the middleman ObjectImpl.

// PHP-CPP object creation
// zend/objectimpl.h
// Connect a zend_object to a C++ object.
using namespace Php;

class ObjectImpl;	// forward declarations for this excerpt
class Base;

struct MixedObject
{
	ObjectImpl *self;
	zend_object php;
};


class ObjectImpl
{
	MixedObject *_mixed;
	std::unique_ptr<Base> _object;
	// class functions
};

class Base {
	ObjectImpl *_impl = nullptr;
};


To create a fully fledged PHP-CPP zend_object instance requires three memory allocations. ObjectImpl is allocated first, and ObjectImpl constructor then creates the MixedObject with the zend_object, and stores the pointer to itself. Only after this can a newly allocated C++ class implementation be given a pointer to ObjectImpl.

This pattern also means that the Base object instance can be created without its zend_object existing, this being optional. I don't know why this may be useful, but it is possible to have classes derived from the Base object which do not become zend_object instances.

In zend/object.cpp, an Object Value constructor is given a PHP class name and a Base instance, and creates the ObjectImpl, giving it visible existence to the Zend engine as a callable object from PHP. This implies freedom to use different Base class implementations of the same zend_object interface. This is correct.

All the different C++ class instances derived from Base can create a new ObjectImpl instance to connect themselves to a zend_object instance. Each ObjectImpl, after creating a standardized zend_object instance for C++ Base-derived objects, provides a bi-directional pointer access for the two kinds of object.

The standard ObjectImpl - zend_object constructor, requires a zend_class_entry*, from the PHP Zend engine. The zend structure, zend_class_entry*, handles class-specific data, such as class name and function handlers, that provide zend_object identity and behaviour.

A zend_class_entry* requires that all the functions that the zend_object provides to PHP scripts, standard property tables and more, have been registered with the PHP zend engine. The zend_class_entry* can be fetched using the PHP class name.

In the Perforce-Zend guide, as in PHP internals books, also online, it is also recommended to place custom C-data below (negative offsets from) the zend_object structure, which then can be created using a single memory allocation. C-Data locations are at negative offsets from the memory address of the beginning of the zend_object. All the extensions distributed with the PHP source do this. PHP managed data is on the other side.
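
The single-allocation layout described above can be sketched without the PHP headers. This is a minimal sketch under stated assumptions: fake_zend_object, point_object, point_create, point_from_zobj and point_free are hypothetical names invented for illustration, and std::calloc stands in for the request allocator. In a real extension the engine structure would be zend_object and the allocation would go through zend_object_alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for zend_object (the real one is in Zend/zend_types.h).
struct fake_zend_object {
    unsigned refcount;
    void*    handlers;
};

// Custom C data comes first; the engine-visible object is the LAST member,
// so the custom fields sit at negative offsets from the object's address.
struct point_object {
    double x;
    double y;
    fake_zend_object std;
};

// Recover the container from the pointer the engine hands back.
inline point_object* point_from_zobj(fake_zend_object* zobj)
{
    return reinterpret_cast<point_object*>(
        reinterpret_cast<char*>(zobj) - offsetof(point_object, std));
}

// One allocation holds both the custom data and the engine object.
inline fake_zend_object* point_create(double x, double y)
{
    auto* p = static_cast<point_object*>(std::calloc(1, sizeof(point_object)));
    assert(p != nullptr);
    p->x = x;
    p->y = y;
    p->std.refcount = 1;
    return &p->std;
}

// Free the whole block, starting from the container address.
inline void point_free(fake_zend_object* zobj)
{
    std::free(point_from_zobj(zobj));
}
```

The engine only ever sees the address of the last member, and the extension subtracts a fixed offset to get back to its own data.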

PHP objects are created during the request handling state, and use variants of the ecalloc function. The request memory allocation functions all start with the letter e, e.g. emalloc, efree. All request handling memory is freed after the request has been handled and a response returned.

It isn't explicitly stated anywhere I have yet found in PHP-CPP documentation, but all its other memory allocations, the ObjectImpl, and Base, are using C++ standard library memory allocators. And hopefully freed as well.

The PHP-CPP design effectively separates the memory allocator worlds of C++ and the PHP zend engine, which makes it easier to use other C++ code, standard template library allocators.

Object Memory layout and object management for zpp

ZPP objects contain the entire C++ object in space available below the zend_object structure.

The object memory layout code is found in the files zpp/base.*

The zpp::base_d class uses its placement new operator to store a pointer plus the entire C++ object below the zend_object. In the allocated block, the zend_object structure sits on top of each C++ class instance, and a pointer to it. A single allocation is done for the combined storage with a call to zend_object_alloc.

Other design possibilities included storing a pointer to the base_obj_mgr class that creates the object, which has stored the sizeoft_. However, the most common usage is getting the C++ object pointer from the zend_object*:

template<typename T>
	T* zobj_toc(zend_object* zobj)
	{
		return *(((T**)(zobj))-1);
	}

Going the other way, getting the zend_object* from a base_d class should be as simple, except that inheriting classes will have various increases in size. The simplest approach is to store the zend_object* value as the first protected member of base_d.

//Wcc - include wc_base.h
class   base_d {
	protected:
		zend_object* self_;
	public:

		// for derived classes
};

Each base_d class expects to be part of a zend_object in usage.
base_d objects cannot be safely allocated on the stack. There should be no reason for copying or moving them around in memory, as can be done with simpler C++ classes and structures. To become part of a zend_object, a weirdly allocated base_d instance would need to be copied into the instance inside the zend_object it belongs to.

```C++
// part of base_obj_mgr<T>
// typedef base_obj_mgr<T> mydef;
// Create new zpp::obj_rc instance
static obj_rc new_zobj()
{
	obj_rc result;
	result.adopt(mydef::znew_ex(class_entry_));
	return result;
}
```

Each static instance of base_obj_mgr<T> must be configured with its classEntry(zend_class_entry*) method during the calls to individual PHP_MINIT_FUNCTION(my_class_reg).

template< typename T >
class base_obj_mgr : public mgr_link  {
	/**
	 * At module init time, one is instantiated for each class T.
	 */ 
public:
	typedef base_obj_mgr<T> mydef;

	static zend_class_entry* 	    class_entry_;
	static zend_object_handlers     handlers_;
	static base_obj_mgr<T>*         self_; 
	static size_t					self_count_;

	static size_t					obj_count_;
// more
};

For both the single-threaded and multi-threaded Zend models, extensions set up their static memory during module initialization (MINIT), after which it is treated as read-only, and they create dynamic class instances in writeable memory from request initialization (RINIT) onwards.

So the first 3 members of base_obj_mgr are maybe thread safe, as the base_obj_mgr class instance is created in the PHP_MINIT_FUNCTION, as called from the extension's main PHP_MINIT_FUNCTION. The zend_class_entry* is created by a register_class_<Namespaced_ClassName> function. This function and its registration information are generated by a script that reads a _stub.php file.

The static obj_count_ counter, which I use for debugging to ensure all objects get freed at request end, could be an issue for threaded requests. Currently this is excised for non-debug code.
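
The idea that each class T gets its own set of static manager members can be illustrated with a small sketch. Everything here is hypothetical and simplified: obj_mgr_sketch is not base_obj_mgr, a plain string stands in for the zend_class_entry*, and objects are created with ordinary new rather than inside a zend_object block.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified manager: one set of static members per class T.
template <typename T>
struct obj_mgr_sketch {
    static const char*  class_name_;  // stands in for zend_class_entry*
    static std::size_t  obj_count_;   // debug-only count of live objects

    static void classEntry(const char* name) { class_name_ = name; }
    static T*   make_new()                   { ++obj_count_; return new T(); }
    static void destroy(T* p)                { delete p; --obj_count_; }
};

// Each instantiation of the template gets its own copies of these.
template <typename T> const char* obj_mgr_sketch<T>::class_name_ = nullptr;
template <typename T> std::size_t obj_mgr_sketch<T>::obj_count_  = 0;

// Two unrelated example classes.
struct Services {};
struct Route {};
```

Because obj_mgr_sketch<Services> and obj_mgr_sketch<Route> are distinct types, their counters never interfere. The counter here is a plain integer, so it shares the threading concern mentioned above; an atomic counter would be one possible remedy.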

// In services.h
class Services : public base_d {
//...
public: 
	// define a template type manager for my class
	static base_obj_mgr<Services> omg;
};

// In services.cpp 
base_obj_mgr<Services> Services::omg;

// Typical PHP extension class initialize in Wcc.
// From wc_services.cpp : create zend_class_entry* from class registration file \_arginfo.h
PHP_MINIT_FUNCTION(wc_services_md)
{
	auto ce = register_class_Wcc_Services();

	Services::omg.classEntry(ce);

	return SUCCESS;
}

The PHP_MINIT_FUNCTION is a C macro, which generates a function name and arguments from its text argument. In the main module initialize function, is a matching call to this function, generated by a similar C-macro.

PHP_MINIT(wc_services_md)(INIT_FUNC_ARGS_PASSTHRU);
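
The way a macro can generate a function name from its text argument, with a matching macro for the call, can be shown with token pasting. The SKETCH_ macros below are hypothetical imitations of the naming idea only; the real PHP_MINIT_FUNCTION and PHP_MINIT are defined in the PHP headers and take the engine's own argument list.

```cpp
#include <cassert>

// Hypothetical macros: paste the module name onto a fixed prefix.
#define SKETCH_MINIT_NAME(module)     sketch_startup_##module
#define SKETCH_MINIT_FUNCTION(module) int SKETCH_MINIT_NAME(module)(int type, int module_number)
#define SKETCH_MINIT(module)          SKETCH_MINIT_NAME(module)

static int registered_classes = 0;

// Expands to: int sketch_startup_wc_services_md(int type, int module_number)
SKETCH_MINIT_FUNCTION(wc_services_md)
{
    (void)type;
    (void)module_number;
    ++registered_classes;   // a real function would register a class here
    return 0;               // stands in for SUCCESS
}

// The main module startup forwards its own arguments to each class startup.
SKETCH_MINIT_FUNCTION(wcc)
{
    return SKETCH_MINIT(wc_services_md)(type, module_number);
}
```

The declaration macro and the call macro agree on the generated name, so the main startup function can call each per-class startup without the name being spelled out by hand.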

Make an object instance

Object instance construction is done by a templated static function of base_obj_mgr. This function knows the class and thereby its required size, and calculates the size of a single memory block laid out as follows. The C++ object sits at the start, with its zend_object* stored as its first data member. A pointer back to the start of the C++ object is stored just after the end of the C++ object. The standard zend_object structure begins immediately after that pointer, and this is the address that the zend_object* points to.

The C++ object inherits from the base_d class, which has a virtual destructor and already comes with a few virtual functions.

In a table form

| Label | Item | What? |
|---|---|---|
| 0 | C++ object (base_d) | zend_object* (1a) |
| ? | vtabptr_ | C++ compilers insert a virtual function table pointer |
| | derived classes | add C++ members |
| 1 | Pointer to 0 | C++ object* |
| 1a | zend_object | PHP managed |
| | object handlers, property values | |

In each templated singleton, member functions access their own set of static data members, including a zend_class_entry* value which is allocated by the class registration function during module initialization.

The memory layout is exactly described by the placement new allocator. Deallocation is managed by the zend_object life cycle.
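
The block layout from the table can be imitated in standalone C++. This is a sketch under stated assumptions, not the zpp implementation: fake_zend_object, base_sketch, counter, make_block and free_block are hypothetical names, std::malloc stands in for the engine allocator, and zobj_toc is re-declared here against the stand-in type so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for zend_object (the real one is in Zend/zend_types.h).
struct fake_zend_object {
    unsigned refcount;
    void*    handlers;
};

// Simplified imitation of base_d: it remembers its own zend_object.
class base_sketch {
protected:
    fake_zend_object* self_ = nullptr;
public:
    virtual ~base_sketch() = default;
    void bind(fake_zend_object* z) { self_ = z; }
    fake_zend_object* zobj() const { return self_; }
};

class counter : public base_sketch {
public:
    int hits = 0;
};

// The C++ object pointer sits in the slot just below the zend_object.
template <typename T>
T* zobj_toc(fake_zend_object* zobj)
{
    return *(reinterpret_cast<T**>(zobj) - 1);
}

// Block layout: [ T object ][ T* back pointer ][ fake_zend_object ]
template <typename T>
fake_zend_object* make_block()
{
    constexpr std::size_t offset = sizeof(T) + sizeof(T*);
    static_assert(sizeof(T) % alignof(T*) == 0, "back pointer would be misaligned");
    static_assert(offset % alignof(fake_zend_object) == 0, "object would be misaligned");

    char* raw = static_cast<char*>(std::malloc(offset + sizeof(fake_zend_object)));
    assert(raw != nullptr);
    T* obj = new (raw) T();                      // C++ object at the block start
    new (raw + sizeof(T)) T*(obj);               // back pointer just after it
    auto* z = new (raw + offset) fake_zend_object{1, nullptr};
    obj->bind(z);
    return z;
}

// Run the C++ destructor, then release the single block.
template <typename T>
void free_block(fake_zend_object* z)
{
    zobj_toc<T>(z)->~T();
    std::free(reinterpret_cast<char*>(z) - (sizeof(T) + sizeof(T*)));
}
```

Both directions are then a single pointer read: zobj_toc reads the slot below the zend_object, and zobj() reads the member stored in the C++ object.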


/** 
  the easiest way to create a new instance is from the
  object manager class new_zobj()
  eg obj_rc my_object = Services::omg.new_zobj();

  class base_d has a custom operator new.
  void* operator new(std::size_t msize,  zend_class_entry *ce)
  which sets up the above allocation.

*/

static  obj_rc new_zobj()
{
	// setup object with handlers
	obj_rc result;
	result.adopt(mydef::make_new());
	//showobj("new_zobj()", result);
	return result;
}

Customize zend object handlers, using a derived base_obj_mgr

The behaviour of a zend_object is modified according to the function handlers table, pointed to by its zend_class_entry. This means that any custom version of base_obj_mgr can use its virtual function override of init_class_fn() to supply a static member function, to be set as the handler function, with the same zend-defined function parameters.

An example is found in hmap.* for the HMap class, which installs custom property handlers. This is done when the static class manager object is set up, that is, once only at class registration time. Each created zend_object of a kind gets the same pointer to this table.

Below we hijack the handlers' get_debug_info function, and divert it to call the base_d object's virtual function, debug_info, so that such objects can fill a HashTable with a list of property names and values.



virtual void init_class_fn()
{
	class_entry_->create_object = mydef::znew_ex;

	// std_object_handlers is somewhere in PHP
	memcpy(&handlers_, &std_object_handlers, sizeof(zend_object_handlers));

	handlers_.offset = sizeof(T) + sizeof(base_d*);
	handlers_.get_debug_info = mydef::base_debug_info; // can be set later?
	handlers_.clone_obj = nullptr; //cloning not supported
	handlers_.dtor_obj  = zend_objects_destroy_object;
	handlers_.free_obj  = mydef::z_free;
}
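
The technique of copying a default handler table and replacing one slot with a static member function that forwards to a virtual method can be sketched in isolation. All names below are hypothetical: fake_handlers has one slot where the real zend_object_handlers has many, std::string replaces the HashTable result, and the object holds a direct pointer to its C++ side instead of using the offset layout.

```cpp
#include <cassert>
#include <string>

struct fake_object;  // hypothetical stand-in for zend_object

// Hypothetical, cut-down handler table with a single slot.
struct fake_handlers {
    std::string (*get_debug_info)(fake_object*);
};

struct debug_base {
    virtual ~debug_base() = default;
    virtual std::string debug_info() const { return "base"; }
};

struct fake_object {
    const fake_handlers* handlers;
    debug_base*          impl;   // simplification: direct pointer to the C++ side
};

// Default behaviour, playing the role of std_object_handlers.
inline std::string std_debug_info(fake_object*) { return "std"; }
static const fake_handlers std_handlers_sketch = { std_debug_info };

template <typename T>
struct mgr_sketch {
    static fake_handlers handlers_;

    // Static member with the handler signature; forwards to the virtual method.
    static std::string base_debug_info(fake_object* obj)
    {
        return obj->impl->debug_info();
    }

    static void init_class_fn()
    {
        handlers_ = std_handlers_sketch;             // start from the defaults
        handlers_.get_debug_info = base_debug_info;  // then replace one slot
    }
};
template <typename T> fake_handlers mgr_sketch<T>::handlers_;

struct hmap_sketch : debug_base {
    std::string debug_info() const override { return "hmap properties"; }
};
```

The table is filled once per class, and every object of that class points at the same table, so the per-object cost of the customization is one pointer.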

Object methods registration

The PHP source release provides a means to set up a default development environment for an extension, and provides a file with the file name type of ".stub.php". All the standard extensions have them, and the environment provides a parser for files of type "stub.php". The parser converts interface and class method definitions, written in familiar PHP class and method declaration style, into generated C function declarations and macro declarations, in a file of type "_arginfo.h".

So in the MINIT_FUNCTION scrap above, the function "register_class_Wcc_Services" exists in an _arginfo.h file which was generated from a stub.php, containing a namespace Wcc and a class declaration of Services, with all its public methods. Almost all the standard PHP extensions use this feature. It takes away the tedious, error-prone work of hand-coding the registration methods, and is thoroughly recommended.

PHP-CPP does not seem to make use of the stub.php to _arginfo.h generation, and requires some manual coding of a registration function for each class. It has its own PHP-compatible data structures to hold object registration information, and uses some class handler functions to connect these to the PHP engine. This has some declaration limitations derived from the limits of older PHP versions.

Call base_d class methods from PHP C function declarations.

The PHP zend_object method calls are declared and registered from the _arginfo.h generated file(s). The following pattern is used to implement the C function which calls a method of a base_d derived class.

/**
 * Implement Wcc_Services method in wpp.sub.php
 *
 * public function setObject(object $obj, string|null $key = null) : object {}
 */
// ZEND_METHOD macro generates the function header
ZEND_METHOD(Wcc_Services, setObject)
{
	zval* 		 obj;
	zend_string* skey = nullptr;
	
	// minimum and maximum parameters
	// parameter type check macros
	ZEND_PARSE_PARAMETERS_START(1, 2)
		Z_PARAM_OBJECT(obj)
		Z_PARAM_OPTIONAL
		Z_PARAM_STR(skey)
	ZEND_PARSE_PARAMETERS_END();

	Wcc_Services* svc = zval_toc<Wcc_Services>(ZEND_THIS);

	zval_own result = svc->setObject(obj, skey);
	result.move_zv(return_value);
}

In this not very complex implementation, of the "setObject" method of Wcc_Services, the passed zend_object* is stored in an array (PHP HashTable) accessed by a zend_string key, usually its class name. The design aim, for good or ill, is to store and retrieve an object instance by class name, in the Wcc_Services object.

Why doesn't this C++ object use an STL collection for greater efficiency? Well it could be done that way, but might increase the overhead of information exchange between PHP and the C++ classes, and would increase the memory size of the code, and an efficiency gain here may be not significant. This extension framework aims for PHP inter-operability, and PHP internals reuse, by using some lightweight C++ wrapper classes around common zend data structures and pointers to them.

Briefly, the return value, zval_own, is a C++ wrapper class around a PHP zval structure. It is one of a small suite of light-weight C++ wrapper objects around the most used data structures in the PHP software. The PHP structures are the string (zend_string), HashTable, object (zend_object), and the zval structure which can contain any one of them. There are a lot of other PHP types that can be contained in a zval structure. Scalar values include long integer and floating point double types, and boolean true or false values.

The zval_ptr is a C++ wrapper around a pointer to a zval structure. Both zval_own and zval_ptr have useful functions for interoperability. The _own suffix of the class name indicates that ownership is maintained using the structure's PHP reference count. The _ptr suffix of the class name indicates that no ownership reference counting is done. The non-reference-counting versions are safe to use if the execution flow of the program ensures that ownership is already managed safely.
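
The difference between the two suffixes can be sketched with a generic reference-counted structure. This is an illustration under stated assumptions, not the zpp classes: fake_refcounted, val_ptr and val_own are hypothetical, and the adopt function here is only modelled on the adopt call seen in new_zobj, with semantics that are an assumption.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a reference-counted engine structure.
struct fake_refcounted {
    unsigned refcount = 1;
    int      value    = 0;
};

// Non-owning view: never touches the reference count.
class val_ptr {
    fake_refcounted* p_;
public:
    explicit val_ptr(fake_refcounted* p = nullptr) : p_(p) {}
    fake_refcounted* ptr() const { return p_; }
};

// Owning handle: adds a reference when it copies, drops one when it dies.
class val_own {
    fake_refcounted* p_ = nullptr;
    void release()
    {
        if (p_ && --p_->refcount == 0) delete p_;
        p_ = nullptr;
    }
public:
    val_own() = default;
    explicit val_own(val_ptr v) : p_(v.ptr()) { if (p_) ++p_->refcount; }
    val_own(const val_own& o) : p_(o.p_) { if (p_) ++p_->refcount; }
    val_own(val_own&& o) noexcept : p_(std::exchange(o.p_, nullptr)) {}
    val_own& operator=(val_own o) noexcept { std::swap(p_, o.p_); return *this; }
    ~val_own() { release(); }

    // Take over an existing reference without incrementing the count.
    void adopt(fake_refcounted* p) { release(); p_ = p; }
    fake_refcounted* ptr() const { return p_; }
};
```

Passing a val_ptr costs nothing but is only safe while some owner keeps the structure alive; constructing a val_own from it is the explicit point where a new reference is taken.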

A zend_string is a PHP string with built-in reference counting. The C++ wrapper classes are zstr_ptr and zstr_own. The class member instances_ is a htab_own class, a C++ wrapper around a PHP HashTable pointer. Its non-reference-counting version is a htab_ptr. C++ operator[] is only used for a read operation from the htab_ptr/htab_own wrapper. I find that setting up write mechanisms for operator[] is an unnecessary complication.
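
A read-only operator[] paired with a named write function might look like the following. This is a hypothetical sketch: table_sketch is not htab_own, std::unordered_map plays the role of the PHP HashTable, and int values replace zvals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in: std::unordered_map plays the role of a PHP HashTable.
class table_sketch {
    std::unordered_map<std::string, int> items_;
public:
    // Read-only lookup: returns nullptr for a missing key and never inserts.
    const int* operator[](const std::string& key) const
    {
        auto it = items_.find(key);
        return it == items_.end() ? nullptr : &it->second;
    }

    // Writes go through a named function, so every store is explicit.
    void set(const std::string& key, int value) { items_[key] = value; }

    std::size_t size() const { return items_.size(); }
};
```

Keeping operator[] const avoids the proxy objects that a writable subscript would need in order to tell a read from a write, and a lookup of a missing key cannot silently create an entry.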

zval_own
Wcc_Services::setObject(zval_ptr obj, zend_string* key)
{
	if (!key) {
		// get class name of object
		key = obj.className();
	}

	instances_.set(key, obj.ptr());
	return obj;
}

The PHP data wrappers will get some more discussion and documentation to help usage in the next chapter.

Work across multiple extensions

PHP scripts can now use this big extension full of classes. But how might other extensions use it?

The size of the binary extension Wcc now seems too big, and it contains classes across two namespaces. Hardly everyone is going to want or need the baggage of all of them together. A more useful strategy would be to split them into two or more collections that have minimal external dependencies.

The namespace of a class is part of its class name, and is a good way to organise classes and their files into a directory tree. There is no problem with sharing use of a namespace between extensions.

Very roughly the Wcc classes cover these functional spaces.

Associative key collections.

Wcc\Config, implements dynamic properties. Wcc\Hmap, implements internal array.

PHP provides two kinds of access, via object properties, (object declared properties with controlled access, dynamic properties) and array. Array is subject to copy on write, making shareable write access only possible by passing as reference.

PHP allows for variable or property names embedded in strings, but not array notation. Arrays of Objects of stdClass can optionally be returned in database query row results.

$s = "Hello $obj->property";
$s = "Hello {$obj->{$name}}";

PHP provides stdClass for inbuilt dynamic properties.

There is an inbetween, that of the class ArrayObject. This is usable with array notation, but is shareable as an object. A read access performance comparison of all of these will be presented later.

Classes for cache, cache package serialization.

Wcc\ICache, a cache front end for different cache backends. Wcc\ICacheData, a managed cache package handled by ICache.

Support and Cache implementation script classes.

  • Wcc\CacheAll, creates multiple instances of Wcc\ICache from configuration by name, and provides means to read and cache files through a named cache, with attention to expiry.

  • Wcc\Cache\SFile, a useful cache implementation for session files. Prefixes the expiry time at the beginning of the file.

  • Wcc\Cache\Noop, a cache implementation that has always expired.

  • Wcc\Cache\Libmemcached, uses the Memcached PHP extension.

  • Wcc\Cache\Apcu, uses the APCu PHP extension.

Route coding and matching

  • Wcc\Route, data for a route: URL, HTTP verb, object-method target.

  • Wcc\RouteSet, a serialized collection of Routes, both fixed and regular expression matched.

  • Wcc\RouteMatch, a route, along with its decoded match arguments.

  • Wcc\Target, route target, a simple class with object class name, method name, and module name.

  • Wcc\Pair, a simple class to hold two properties.

  • Wcc\RequestGlobals, interface access to the SuperGlobals for _GET, _POST, _REQUEST, _SERVER. Uses Wcc\Hmap.

  • Wcc\FileUpload, file details for uploads.
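
The combination of fixed and regular expression routes suggests a two-stage lookup, sketched below in standalone C++. This is a hypothetical illustration, not the Wcc\RouteSet API: route_match and route_set_sketch are invented names, targets are plain strings, and std::regex stands in for whatever matcher the extension uses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: target plus decoded match arguments.
struct route_match {
    bool found = false;
    std::string target;
    std::vector<std::string> args;
};

class route_set_sketch {
    std::map<std::string, std::string> fixed_;                   // exact URL -> target
    std::vector<std::pair<std::regex, std::string>> patterns_;   // pattern -> target
public:
    void add_fixed(const std::string& url, const std::string& target)
    {
        fixed_[url] = target;
    }
    void add_pattern(const std::string& re, const std::string& target)
    {
        patterns_.emplace_back(std::regex(re), target);
    }

    route_match match(const std::string& url) const
    {
        route_match m;
        auto it = fixed_.find(url);
        if (it != fixed_.end()) {            // cheap exact lookup first
            m.found = true;
            m.target = it->second;
            return m;
        }
        for (const auto& p : patterns_) {    // then try each pattern in order
            std::smatch sm;
            if (std::regex_match(url, sm, p.first)) {
                m.found = true;
                m.target = p.second;
                for (std::size_t i = 1; i < sm.size(); ++i)
                    m.args.push_back(sm[i].str());
                return m;
            }
        }
        return m;
    }
};
```

Fixed routes are resolved by a map lookup, so only URLs that miss the map pay for regular expression matching.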

Manage view hierarchy and buffering to create HTML output.

This makes use of a Plates-like class pattern, after some shrinkage and mutilation towards simplification.

  • Wcc\IfLoadHtml, interface with one function, getHtml(path,data):string

  • Wcc\ViewOutput, script implementation of IfLoadHtml, called by the Plate class to extract data into the symbol table, then load the file, optionally buffer the output, and return the output.

  • Wcc\Plate, manages the name and path to a template, exposed data, and named sections of html to be inserted. Calls the IfLoadHtml class obtained from Wcc\PlateEngine.

  • Wcc\PlateEngine, holds named Wcc\Plates, and shared data, plus other functions that have most likely atrophied. A shared resource for Plate instances.

  • Wcc\SimpleView, renders from a single template file.

  • Wcc\HtmlPlates, builds and organizes a view hierarchy, with data, and starts rendering.

  • Wcc\HtmlGem, a monster html output class, with many functions taking array arguments to output defined html segments.

  • Wcc\IfFindLeaf, interface to find the first name-extension match in an ordered list of directories.

  • Wcc\SearchList, implementation of IfFindLeaf.

Wcc\MoneyFmt , A few functions to format numbers as currency.

Services, Dependency Injection

Wcc\Services, manages two lists, one of active functions/objects/data, another of deferred data, usually callable functions to create something active. If active service isn't found, the deferred list is searched for activation. Services manages a list of singleton classes by class name, "instances". Wcc\ServiceAccess, inherit from this class to use magic method to access an active service, from a services instance, or the global services.
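
The two-list behaviour described for Services (active entries, plus deferred factories that are activated on first request) can be sketched in standalone C++. The names and types are hypothetical and are not the Wcc\Services API: services_sketch stores shared strings where the real class stores PHP values and callables.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of an active list plus a deferred list.
class services_sketch {
public:
    using service = std::shared_ptr<std::string>;   // any shared type would do

    void set(const std::string& name, service s) { active_[name] = std::move(s); }

    void defer(const std::string& name, std::function<service()> make)
    {
        deferred_[name] = std::move(make);
    }

    // Active list first; otherwise activate from the deferred list, once.
    service get(const std::string& name)
    {
        auto a = active_.find(name);
        if (a != active_.end()) return a->second;

        auto d = deferred_.find(name);
        if (d == deferred_.end()) return nullptr;

        service made = d->second();   // run the factory
        deferred_.erase(d);           // it is no longer deferred
        active_[name] = made;
        return made;
    }

private:
    std::unordered_map<std::string, service> active_;
    std::unordered_map<std::string, std::function<service()>> deferred_;
};
```

A deferred factory runs at most once, so expensive services are only built when something actually asks for them.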

Class finder

Wcc\Finder, breaks down a class name, including its namespace, to search for it among namespace-indexed directory paths.
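
As a rough sketch of that breakdown, assuming the simplest possible layout (one class per file, named after the class, with a ".php" extension), the lookup might go like this. The function name candidate_paths and the shape of the roots map are invented for the illustration and are not the Wcc\Finder interface.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: map a qualified class name to candidate file paths.
// roots maps a namespace ("Wcc") to the directories registered for it.
std::vector<std::string> candidate_paths(
    const std::string& qualified,
    const std::map<std::string, std::vector<std::string>>& roots)
{
    std::vector<std::string> out;
    // Split "Wcc\Route" at the last backslash into namespace and class name.
    auto cut = qualified.rfind('\\');
    std::string ns  = (cut == std::string::npos) ? "" : qualified.substr(0, cut);
    std::string cls = (cut == std::string::npos) ? qualified : qualified.substr(cut + 1);

    auto it = roots.find(ns);
    if (it == roots.end()) return out;      // unknown namespace: nothing to try
    for (const std::string& dir : it->second)
        out.push_back(dir + "/" + cls + ".php");   // keep the registered order
    return out;
}
```

A loader would then try each candidate in order and require the first file that exists.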

Class Loader

Wcc\Loader, uses Wcc\Finder, and tries to load the class if found.

ReflectCache

Wcc\ReflectCache Manages a cache of Reflection classes, and is a way of creating a class instance and calling its constructor, with or without arguments.

Assets

Wcc\Assets, Assets are javascript, css, and inline styles injected into HTML output. Reads a configuration file, and has functions for indexed asset selection, and various output generation functions to call from view templates.

Response generation

Wcc\Headers, Generate the headers part of the HTTP response.
Wcc\GlobalResponse, Interact with PHP response generation.

How to break into pieces.

Maybe the single piece big enough to stand as an additional extension is the set of HTML generation classes. Another big piece is the Route related classes, including RequestGlobals. All the rest could be a "core" included with the shared binary.

Make a fresh folder extension start.

Perforce has a decent guide for building a PHP extension. Right now I'm looking at its Section 2, "Generating a PHP Extension Skeleton". This requires an installed php-src distribution tarball. With the latest distribution, 8.4.12, I have built a debug PHP version and environment.

In my linux home subdirectory ~/www I type the command.

php php-8.4.12/ext/ext_skel.php --ext runsa --dir .

The skeleton is for C coding. I wanted C++. Minimal changes, make config.m4 work for C++. In config.m4, before the PHP_NEW_EXTENSION macro, add 4 lines, and rename the .c file as .cpp

  FLAGS="-fPIC"
  CXXFLAGS="$CXXFLAGS -Wall -O2 --std=c++23 -I./include"
  PHP_REQUIRE_CXX()
  AC_LANG([C++])

Run phpize, then ./configure, and make. Some warnings come up. Fix them, and wrap the include of *arginfo.h in an extern "C" block, with a define guard for good measure. php_runsa.h is the main include, with the extension version define, so move the other includes into it from runsa.cpp.

#ifndef PHP_RUNSA_H
# define PHP_RUNSA_H

extern "C" {
#include "php.h"
#include "ext/standard/info.h"
#include "runsa_arginfo.h"
}

#endif /* PHP_RUNSA_H */

Oh dear, the script has made a traditional "Hello World" function as test2. Copy the zpp folder to the new project, and recode it using zpp techniques. Include the zpp code. Delete the "For compatibility with older PHP and its define". Function test2 can be rewritten with ZPP classes and a zpp::state_init instance.

// Include all zpp
#include "zpp/base.cpp"
// For debugging with define DEBUG_EXTRA
#include "zpp/show_zpp.cpp"

using namespace zpp;

class Global_init : public state_init {
public:
	str_intern world;

	void init() override {
		world = "World";
	}
};

Global_init Global_;

PHP_FUNCTION(test2)
{
	zargs_rd args(execute_data);

	str_ptr name;
	str_rc  result;

	args.zstring(name, args.option(1));

	if (!args.throw_errors())
	{
		str_buf buf;

		if (!name.ok())
		{
			name = Global_.world;
		}
		buf << "Hello " << name;
		result = buf.zstr();
	}
	result.move_zv(return_value);
}

There's more! This skeleton needs a module init and shutdown functions, and request shutdown function for the state_init infrastructure.

    STANDARD_MODULE_HEADER,
   "Runsa",					    /* Extension name */
	ext_functions,		        /* zend_function_entry */
    PHP_MINIT(runsa),		    /* PHP_MINIT - Module initialization */
	PHP_MSHUTDOWN(runsa),	    /* PHP_MSHUTDOWN - Module shutdown */
	PHP_RINIT(runsa),			/* PHP_RINIT - Request initialization */
	PHP_RSHUTDOWN(runsa),		/* PHP_RSHUTDOWN - Request shutdown */

And all the zpp::state_init static instances need to be told what to do.

PHP_MINIT_FUNCTION(runsa)
{
#ifdef DEBUG_EXTRA
	dump_info::run_state_ = true;
#endif
	// Initialize static data instances
	zpp::state_init::init_all();
	return SUCCESS;
}

PHP_MSHUTDOWN_FUNCTION(runsa)
{
	zpp::state_init::end_all();

#ifdef DEBUG_EXTRA
	dump_info::run_state_ = false;
#endif
	return SUCCESS;
}

/* {{{ PHP_RINIT_FUNCTION */
PHP_RINIT_FUNCTION(runsa)
{
#if defined(ZTS) && defined(COMPILE_DL_RUNSA)
	ZEND_TSRMLS_CACHE_UPDATE();
#endif
	zpp::state_init::init_request();
	return SUCCESS;
}
/* }}} */

PHP_RSHUTDOWN_FUNCTION(runsa)
{
	zpp::state_init::end_request();

#ifdef BASE_DEBUG
	zpp::mgr_link::report();
#endif
	return SUCCESS;
}

After all this, run "make install", and edit the php.ini of the test PHP environment to load this extension. Run "php -m" to find the name of the new module in the module list.

Write and run a little test script. Oh wait, tests already exist in the tests folder. Run "php run-tests.php". The result is "Tests passed : 3 (100.0%) (100.0%)".

The next stage is to remove the baby test functions, and put in the intended class and methods.

Debug binaries are big

The simple Runsa module is reported as 1011632 bytes. This seems very big. The everything Wcc extension with umpteen classes as a debug compile is 7512928 bytes. Only seven times as big.

Optimized binaries are not too big?

Compiled for the system's PHP environment, which is non-debug, the runsa.so binary is 237904 bytes. Most of this will be from the zpp code, whether it is used here or not. The everything Wcc is 1823376 bytes, about the same ratio. One can expect debug binaries to be more than 4 times the size of optimized builds.
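
For the record, the ratios quoted above can be checked with a few lines of arithmetic. The byte counts are copied from the text; the constant names are only labels for this sketch.

```cpp
// File sizes in bytes, copied from the text above.
constexpr double runsa_debug = 1011632, wcc_debug = 7512928;
constexpr double runsa_opt   = 237904,  wcc_opt   = 1823376;

// Debug build compared with optimized build of the same extension.
constexpr double debug_to_opt_runsa = runsa_debug / runsa_opt;   // about 4.25
constexpr double debug_to_opt_wcc   = wcc_debug / wcc_opt;       // about 4.12

// Wcc compared with Runsa within the same kind of build.
constexpr double wcc_to_runsa_debug = wcc_debug / runsa_debug;   // about 7.43
constexpr double wcc_to_runsa_opt   = wcc_opt / runsa_opt;       // about 7.66
```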

The zpp class methods divide up a lot of callable reusable inlined PHP manipulation code between them.

Make the class stubs

This is "almost" cut and paste from the script class to be hard-coded: only the member and method declarations, without method code. The script that generates _arginfo.h is very good, but has some restrictions. It honours the namespace declaration, but does not yet process "use" statements to resolve the full namespace path of a "foreign" class name, so class names from other namespaces need to be written in full at each use. This alone suggests declaring the most common namespace prefix as the namespace of the stub class, and keeping related classes in the same namespace as much as possible. It is easy to make mistakes when making and altering complex class name trees.

Only public or protected methods need to be exposed.

In the stub file of a class, only public or protected methods or static functions need to be declared. Luckily this Run class has only two of these altogether, and one is the __construct method! The main concern of __construct is to initialize all the read-only properties. The execute function manages the initialization, processing and destruction cleanup cycle. As the top function it calls everything else, and is not itself called, except for providing values from named fixed properties. The arginfo.h generated from the stub contains C/C++ code which creates all of these declared properties in the class registration function. The __construct method is needed to get an instance variable to fetch its property parameters. A __destruct method is added for any needed run-time cleanup.

Inside these two declared functions all methods of implementation are hidden.

Write a build script

It gets boring to always type out make and arguments, so a build.sh script can dictate cleanups, rebuild and install command orders. To tidy a bit, create a stubs folder and move the stub.php and arginfo.h into it. Change the include path in runsa.cpp

#!/bin/bash
# buildme script
make clean
make stubs/runsa_arginfo.h
make 

PHP has only one table for global functions

Once the function declarations test1() and test2() are deleted and replaced with the Run class declaration, the "ext_functions" array declaration disappears from arginfo.h as well, and can be replaced with a nullptr in the zend_module_entry declaration. Please note that instruction comments such as "@generate-class-entries" have to be at the beginning, before the namespace declaration, not after, or else only a very tiny amount of declaration code will be generated, without class entries or a registration function.

<?php
/**
 * @generate-class-entries
 * @generate-legacy-arginfo 80400
 * @undocumentable
 */
namespace Wcc;

class Run {
    public readonly string $phproot;
    public readonly Finder $finder;
    //... more property declarations

    public function __construct(string $phproot) {}
    public function __destruct() {}

    public function execute(string $bootstrap) : void {}
};

Create run.h and run.cpp

This class is going to be in namespace Wcc, for both PHP and C++, so make a folder wcc. Define guard their contents, in old-fashioned style. The extension main file will include the run.cpp, and test if defined for its inclusion. Its declaration is in run.h. Runsa is also the name of the main extension file, so change it to php_runsa.cpp, and update the config.m4 with this. Now this better matches the php_runsa.h. Will have to run phpize and ./configure again.

Explain Runsa

This class is going to be responsible for loading some other PHP files, but so far there is no extension class for this. A simple solution is a small custom loader function, in PHP script, that calls require_once. If its name is agreed to be "simple_loader", for instance, there is no need to pass the name, so it can be hard-coded.

// index.php
// Required by run class, to bootstrap a more complex auto loader class
function simple_loader(string $file) : mixed 
{
    return require_once($file);
}

$run = new Wcc\Run("wc/php");

$run->execute("bootstrap.php");

C++ class member declarations line up along the PHP api.

//wcc/run.h
namespace wcc {
	using namespace zpp;

	class Run : public base_d {
	public:

		static base_obj_mgr<Run> omg;

		void construct(str_ptr php_root);
		void destruct();
		void execute(str_ptr bootstrap);
	};
};

When run_arginfo.h is generated, it has class method declarations and a class register function that can be copy-pasted and fleshed out in wcc/run.cpp. The destruct() is easy, there are no parameters, and the

//These are outside the namespace brackets
using namespace zpp;
using namespace wcc;

ZEND_METHOD(Wcc_Run, __construct)
{
	zargs_rd args(execute_data);
	str_ptr path;

	args.zstring(path, args.need(1));

	if (!args.throw_errors())
	{
		Run* cobj = zobj_toc<Run>(ZEND_THIS);
		cobj->construct(path);
	}
}

ZEND_METHOD(Wcc_Run, __destruct)
{
	ZEND_PARSE_PARAMETERS_NONE();
	Run* cobj = zobj_toc<Run>(ZEND_THIS);
	cobj->destruct();
}

ZEND_METHOD(Wcc_Run, execute)
{
	zargs_rd args(execute_data);
	str_ptr bootstrap;

	args.zstring(bootstrap, args.need(1));

	if (!args.throw_errors())
	{
		Run* cobj = zobj_toc<Run>(ZEND_THIS);
		cobj->execute(bootstrap);
	}
}

PHP_MINIT_FUNCTION(wcc_run_reg)
{
	Run::omg.classEntry(register_class_Wcc_Run());
	return SUCCESS;
}

Error exceptions

The zargs_rd method zstring(str_ptr& s, zval*) receives the indexed zval* from the parameters array in the execute_data structure. If it doesn't exist or isn't a string, an error flag is set and a message buffer is created, and these are posted back to the Zend engine with zend_throw_error. No C++ or C exception is actually thrown: the function checks the error flag and returns, and the real PHP Error exception is raised by the Zend engine after the function exits.

The module initialization stage may throw C++ std::logic_error, especially from the base_obj_mgr code, for things like nullptr, and on this the PHP instance will give up with an error.

So far the zpp only posts, not throws zend_error exceptions. This is something that may later need to be revised, as these have so far just used the zend_ce_error class, not the zend_ce_exception class. The author is against the idea of making lots of different error exceptions for every class, since PHP usually provides excellent location information, except for where it came from in the C++ code, which will be hinted by the message content.

To have PHP produce its own versions of argument error messages, use the ZEND_PARSE_PARAMETERS_START macro and friends.
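
The "record now, report once" pattern described in this section can be illustrated without any Zend headers. Everything in the following sketch is a stand-in: Value plays the part of a zval, ArgsSketch is a toy version of an argument reader, and the posted member stands where a real extension would hand the message to the engine. The method names mirror the zpp calls used earlier, but the bodies are assumptions, not the zpp implementation.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

using Value = std::variant<std::monostate, long, std::string>;  // toy zval

struct ArgsSketch {
    std::vector<Value> argv;   // the call's arguments
    std::string errors;        // messages recorded so far
    std::string posted;        // what "the engine" finally receives

    // Fetch argument n (1-based); a missing required argument is recorded.
    const Value* need(std::size_t n) {
        if (n <= argv.size()) return &argv[n - 1];
        errors += "argument " + std::to_string(n) + " is required; ";
        return nullptr;
    }
    // Optional argument: absence is not an error.
    const Value* option(std::size_t n) const {
        return n <= argv.size() ? &argv[n - 1] : nullptr;
    }
    // Copy out a string, recording a type error instead of throwing.
    void zstring(std::optional<std::string>& out, const Value* v) {
        if (!v) return;                       // absence was handled by need()
        if (auto s = std::get_if<std::string>(v)) out = *s;
        else errors += "expected a string; ";
    }
    // Post any recorded errors once; returns true if the call should stop.
    bool throw_errors() {
        if (errors.empty()) return false;
        posted = errors;
        return true;
    }
};
```

The calling function reads all its arguments first, then tests throw_errors() once, which keeps the method bodies free of try/catch blocks.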

Need the PHP_MINIT_FUNCTION to register class

This needs to be called from the main module code during module initialization, or the Run class will not be available. If there are multiple class includes, the #ifdef makes it easy to leave some of them out.

The Run::omg.classEntry function needs to be called with zend_class_entry pointer result. This is an opportunity for derived manager classes to tweak their structure of zend_object function handlers, or do other sophisticated things during module initialization.

Make sure that Run::omg has declared its instance of course, otherwise segfault.

// inside the PHP_MINIT_FUNCTION(runsa) function
#ifdef WCC_RUN_CPP
	PHP_MINIT(wcc_run_reg)(INIT_FUNC_ARGS_PASSTHRU);
#endif

Do something useful in the C++ class

Now this chapter has gone on for long enough. It's time to start another, with whatever surprises will be encountered while coding the C++ methods for the Run class.

After testing with Run::construct

The test environment now is that the Wcc extension is loaded prior to Runsa, and Runsa accesses the zpp functions only through its header files. This reduces the size of the Runsa binary by a lot (to just 48120 bytes), and seems to work. After all, the extension is a shared library. The test script for new Run($path) runs, but reports a double free error.

After putting a debug show statement in state_init::init(), it is obvious that there is a double assign as well. This is because both extensions call state_init::init_all(). This double-inits the FTAB, of which there is only one copy, in Wcc, as Runsa only references the (mangled) C++ function names in the Wcc shared library, which it is able to call successfully.

Interestingly, running Wcc also shows the Run_i static variable, which exists only in the Runsa extension. Maybe it is no surprise: both are in the same PHP process, both call the static function state_init::init_all() in MINIT, and they must be sharing the same memory pools when calling malloc and emalloc.

The obvious fix to try is to remove the MINIT event calls from Runsa, and let the primary big Wcc do it by itself.

After making this change, both extensions now run a simple test script without reporting error.
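
Another way to make shared initialisation safe, which is not what was done here, would be to let every extension call in and count the callers, so that only the first call initialises and only the last call cleans up. The class below is purely hypothetical and is not part of zpp; the counters stand in for the real init and end work.

```cpp
// Hypothetical guard, not part of zpp: count users so that shared state
// is initialised by the first caller and released by the last one.
class InitGuardSketch {
    int users_ = 0;
    int init_runs_ = 0;   // stands in for "walk the list and call init()"
    int end_runs_ = 0;    // stands in for "walk the list and call end()"
public:
    void init_all() {
        if (users_++ == 0) ++init_runs_;   // only the first module does the work
    }
    void end_all() {
        if (users_ > 0 && --users_ == 0) ++end_runs_;  // only the last one cleans up
    }
    int init_runs() const { return init_runs_; }
    int end_runs() const { return end_runs_; }
};
```

Removing the calls from the secondary extension is simpler, but it makes that extension depend on the primary one being loaded first; a counting guard would trade a little code for not caring about who calls first.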

Sharing the zpp repository

zpp folder is now a submodule of both projects, meaning updating zpp also requires each using repository to update its submodule status. It is almost worth writing a script to do the commit updates for three repositories when zpp gets updated, then recompile both extensions. It's important as sharing headers need to refer to the same functions.

Try a better division of labor

Having proved that a zpp shared extension is possible, maybe it is now time to consider a refactor to have Wcc and Wcd as separate extensions, both dependent on a core Zpp extension, and investigate how this affects performance.

nm --demangle modules/wcc.so | grep FTAB
00000000001650a0 B zpp::FTAB

PHP extensions share their memory and callable addresses with each other. So I split a large collection of objects that used zpp, arbitrarily, into 5 separate yet sharing extensions.

To keep a single source for both a combined single extension and multiple part extensions, skeleton extension folders were created as siblings of the main wcc folder. Each includes the wcc sources through a C++ include path specified in its config.m4 file.

CXXFLAGS="$CXXFLAGS -Wall -O2 --std=c++23 -I./include -I../wcc"

It is only necessary to include the header files for zpp, and the source files for the classes, in each extension's source; one extension contains the source files for zpp. The principle is that a binary version of a callable function only needs to exist in one extension, and can be called from the others.

The build files were put in separate repositories. The wccz extension, which contains the zpp binaries, must always be loaded first. Each extension registers its own classes.

wccz (wcc-zero.git)

Class | File | Description
zpp::* | zpp/base.cpp | All low level classes in zpp namespace
dump_info | zpp/show_zpp.cpp | Dump explicit insides of Zend structures
Replace | wcc/replace.cpp | Replace variable names in a string with properties in object
Config | wcc/config.cpp | Instances store dynamic property values
Finder | wcc/finder.cpp | Search namespace name indexed paths for class source files
Hmap | wcc/hmap.cpp | Instances store properties in an array.
ReflectCache | wcc/reflect_cache.cpp | Create objects from their reflection class
Services | wcc/services.cpp | Store and retrieve callable dynamic properties and objects
ServiceAccess | wcc/service_access.cpp | Cached service access by magic __get
Str | wcc/strfns | A few utility string static functions

wccr (wcc-route.git)

These wcc classes also use some of the wcc classes in wccz, as well as zpp, by including their header files.

Class | File | Description
ICache | wcc/icache.cpp | Base class for access to data/file cache
ICacheData | wcc/icachedata.cpp | A managed package of serialized cached data
CacheMgr | wcc/cachemgr.cpp | Managed named caches with different properties
GlobalResponse | wcc/global_response.cpp | Interface to setup HTML response and send
Pair | wcc/pair.cpp | Hold two properties as "key" and "value"
RequestGlobals | wcc/request_globals.cpp | Access to the "SuperGlobals" data for HTML
Route | wcc/route.cpp | A URL, verb, and target object - method details
RouteMatch | wcc/route_match.cpp | One of these to hold reorganised route details for dispatch
RouteSet | | A collection of fixed URL and regular expression routes to match
Target | | A simple 3-tuple of class name, method name, and module name for a route

wcch (wcc-html.git)

Generally classes for organising template html views, and generating html elements.

Class | File | Description
Assets | wcc/assets.cpp | Insert links, css, javascript, html blobs into views
HtmlGem | wcc/htmlgem.cpp | Generate various Html elements from array parameters
HtmlPlates | wcc/htmlplates.cpp | Organise views in layers, from inside to outer
MoneyFmt | wcc/money_fmt.cpp | Simple interface to PHP currency formats.
Plate | wcc/plate.cpp | Short for template. Hold location and data for a template
PlateEngine | wcc/plate_engine.cpp | Manage templates, their data and cache outputs
SearchList | wcc/search_list.cpp | Ordered list of folders and search for files

wccm (wcc-more.git)

Class | File | Description
XmlRead | wcc/xmlread.cpp | Read xml format specifying PHP data, objects and arrays
Toml | toml/toml_php.cpp | A fast toml format reader in C code by https://github.com/cktan/tomlcpp

The toml read implementation uses source from https://github.com/cktan/tomlc99 - Copyright (c) CK Tan. It is mostly C source code, and its C memory allocator function pointers are switched to emalloc and efree, so that its allocations are made in request memory.

wccd (wcc-database.git)

These may be specified later. These classes have some complexities. Sources are in wcd/*.{h, cpp}

zval ubiquity

The ubiquitous zval is represented by val_ptr and val_rc.

val_ptr

val_ptr holds a zval*, given by PHP from a function call, or from inside an array. val_ptr does not do automatic reference counting on its content in constructor or destructor. There is no declared destructor function. val_ptr merely copies a zval* to its only internal member value, zval* p_, for further inspection and/or work.

The main external purpose of val_ptr is to pass along a zval* without reference counting, to read its enclosed value type, and to extract the enclosed PHP types without altering the value within, as most functions are declared as () const;

The val_ptr does have some protected methods which will increment the reference count of its content. These are used by some methods of the val_rc class, htab_ptr and htab_rw class, which are declared as "friends".

There are also some static methods, inlined as declared in the val_ptr header. These are string_bind, object_bind, and array_bind. They do not increment their stored value's reference count. This is because, where they are used, the increment happens only on success of the operation, usually a write to a HashTable or the setting of an object property. As this is mostly successful, the code arguably ought to increment the reference count first, and undo it only on unexpected failure. But doing it this way allows the use of a "throwaway" temporary plain zval structure: no increment is done, and there is no need for a destructor with a check for a reference decrement.

The author decided to indulge further and return a boolean value from string_bind, object_bind, and array_bind to be true if value is reference counted.

If the contained zval* is a nullptr, isNull() returns true, and ok() returns false. Method is_nullptr() returns true for the specific case of containing a nullptr.

// How zval temporary is used
void obj_ptr::property(str_ptr key, str_ptr value)
{
    zval temp = {0};
    // No rc++, because zval is thrown away on exit.
    val_ptr::string_bind(&temp, value);
    property(key, val_ptr(&temp));
}

// How zval temporary is assigned a specific type
void htab_rw::push_back(zend_string* zs)
{
	zval tmp = {0};
	bool refct = val_ptr::string_bind(&tmp,zs);

	if (zend_hash_next_index_insert(ht_, &tmp))
	{
        // avoid double checks
		if (refct) GC_ADDREF(zs);	
	}
}

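// The static try_addref methods are type specific and also likely to be inlined.
// Declared static inside class str_rc; shown here with its qualified name.
void str_rc::try_addref(zend_string* zs)
{
	// Interned strings are never reference counted.
	if (GC_TYPE_INFO(zs) & IS_STR_INTERNED)
	{
		return;
	}
	GC_ADDREF(zs);
}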

These methods are part of the inner workings of some operators and classes of zpp, and should hardly ever be used externally by other C++ class code.
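
The "bind first, count only on success" order can be shown with toy types that need no Zend headers. RcStringSketch stands in for zend_string and ListSketch for a HashTable with a size limit, so that an insert can fail; both are invented for this sketch, and only the order of operations is meant to match the description above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct RcStringSketch {          // toy stand-in for zend_string
    std::string text;
    int refcount = 1;
    bool interned = false;       // interned strings are never counted
};

// Bind a string into a throwaway slot without touching its count.
// Returns true when the bound value is reference counted.
inline bool string_bind(const RcStringSketch*& slot, const RcStringSketch* s) {
    slot = s;
    return !s->interned;
}

struct ListSketch {              // toy stand-in for a HashTable
    std::vector<const RcStringSketch*> items;
    std::size_t capacity;
    explicit ListSketch(std::size_t cap) : capacity(cap) {}

    bool push_back(RcStringSketch* s) {
        const RcStringSketch* tmp = nullptr;         // throwaway slot
        bool counted = string_bind(tmp, s);
        if (items.size() >= capacity) return false;  // insert failed: nothing to undo
        items.push_back(tmp);
        if (counted) ++s->refcount;                  // count only the stored copy
        return true;
    }
};
```

Because the temporary never owned a count, a failed insert needs no cleanup at all, which is the point of doing things in this order.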

val_rc

val_rc does not inherit from val_ptr. Rather than holding a zval*, it holds a full zval structure, which can hold anything PHP can put in it. It uses val_ptr methods in its internals, because it can pass a pointer to its own zval to construct a val_ptr to do much of its work.

A major purpose of val_rc is to allow the assignment of raw PHP values, to handle reference counting for values returned from functions that are otherwise garbage collected, and sometimes to return a mixed value back to PHP.

val_rc, like val_ptr, has methods to query the contained PHP type, and to attempt to return the common PHP handle types asked for, namely zobject(), zarray(), zstr(). These return a nullptr if the type is not agreeable, and do not throw an exception.

Using strings. zpp::str_rc, zpp::str_intern

As with the htab_ptr and htab_rc relationship, zend_string* management labour is divided between two C++ wrapper classes. str_ptr is the base class, and does no manipulation of zend_string* reference counts. str_rc adds the mechanism of zend_string* reference counting.

In the PHP architecture, strings are special. An "interned" string is undeletable during a request, and has a unique value in PHP's internal interned string table. When a .php file is loaded, all its string literals become interned. The str_intern class gets the interned pointer when assigned.

String classes

class | parent | has
str_ptr | | Methods for using zend_string*
str_rc | str_ptr | reference counts contained string
str_empty | str_rc | A valid zend_string* that is empty
str_perm | str_rc | Not allocated in request memory
str_temp | str_rc | Allocated in request memory
str_intern | str_rc | Becomes pointer to "interned" string.
str_buffer | str_out | Stream buffer using << operator

zpp::state_init class

Strings are interned because they will be reused in multiple requests. This means their storage is persistent, created at module initialization, and deallocated at module shutdown.

As a result, most string literals in a PHP script are "interned" as part of their compilation. The string hash function and a big interned string hash table seem to ensure that there is only one instance of the same string value, however many times and places it is used.

class  state_init {
    protected:
        static state_init* first_;
        static state_init* last_;

        state_init* next_;
    public:
    	// module init/end calls
        virtual void init();
        virtual void end();
...
};

Static instances of state_init join themselves into a single-linked list. This is used by creating a child class of state_init, adding str_intern members, and assigning to them in an override of the init() method. Declare a static storage name for the class, and use that reference to its stored strings during requests.

Example from wcc/finder.*

class Finder_init : public state_init 
{
public:
	Finder_init() : state_init() {}

	str_intern folders_key;
	str_intern php_ext;
	str_intern dir_sep;

	void init() override 
	{
		folders_key = "folders";
		dir_sep = "/";
		php_ext = ".php";
	}

};

Finder_init  FDit;
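
How static instances might link themselves into such a list can be sketched in standalone C++. This is a re-creation of the idea under assumptions, not the zpp source: the names StateInitSketch and FinderInitSketch are invented, and std::string stands in for str_intern so that the sketch needs no Zend headers.

```cpp
#include <string>

// Hypothetical re-creation of a self-registering list of static instances.
// Instances are never unlinked, so they must have static storage duration.
class StateInitSketch {
    static StateInitSketch* first_;
    static StateInitSketch* last_;
    StateInitSketch* next_ = nullptr;
public:
    StateInitSketch() {
        // Append this instance to the tail of the singly linked list.
        if (last_) last_->next_ = this; else first_ = this;
        last_ = this;
    }
    virtual ~StateInitSketch() = default;
    virtual void init() {}
    virtual void end() {}

    static void init_all() { for (auto* p = first_; p; p = p->next_) p->init(); }
    static void end_all()  { for (auto* p = first_; p; p = p->next_) p->end(); }
};
// Plain pointers are zero-initialised before any instance constructor runs.
StateInitSketch* StateInitSketch::first_ = nullptr;
StateInitSketch* StateInitSketch::last_ = nullptr;

struct FinderInitSketch : StateInitSketch {
    std::string php_ext;                     // stands in for a str_intern member
    void init() override { php_ext = ".php"; }
    void end() override { php_ext.clear(); }
};

FinderInitSketch finder_init;   // registers itself during static construction
```

The constructor only links the instance; the real assignment work waits for init_all(), which would be called from module initialization when the engine is ready.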

This also suggests that it is good idea for Wcc classes to setup module initialised instances of str_intern. And creating as many reused values as possible is a good idea. Also it is not a major problem to have a few duplicate string values in multiple compilation units, which will end up referencing the same interned zend_string.

Strings as function names.

To use zpp classes to call any function in PHP, its name needs to be in a zend_string. For up to 4 arguments, the call can be made through an obj_ptr instance, using one of its matching "call" methods.

/* Methods of zpp::obj_ptr.
  If obj_ptr contains a nullptr, its a global function call.
  else its a call to a method of its zend_object
*/
val_rc call(str_ptr method);
val_rc call(str_ptr method, HashTable* args);
val_rc call(str_ptr method, zval* arg1);
//...
val_rc call(str_ptr method, 
	        zval* arg1, zval* arg2, zval* arg3, zval* arg4);

Callable objects

Sometimes an obj_ptr contains a closure or "isCallable", in which case the "callable" method should be used.

//Methods of zpp::obj_ptr.
val_rc callable();
val_rc callable(zval* arg1);
val_rc callable(zval* arg1, zval* arg2);

Custom function calls

The templated class zpp::fn_call_args<ARGCT> can be given a required number of arguments. The call methods of obj_ptr are using it. Declare the function call object, set its obj_ptr, method name, get a cleaned arguments array with argsptr() method , and call using call_fn, receive result in val_rc instance.

// Templated function call example using zpp::fn_call
	fn_call_args<1> caller;

    caller.set_fci(obj_, method);
    zval* args = caller.argsptr();
    ZVAL_COPY_VALUE(args, myarg);
    val_rc result = caller.call_fn();

The fn_call class retains cached call data

Repeated calls to the same function / method will be faster if the same call instance is reused. A fn_call instance can be embedded in an object, or the stack, or a state_init instance.
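
Why reuse helps can be seen in a toy model where resolving a name costs a table lookup and the call object remembers the result. The types below are invented for the illustration (RegistrySketch plays the function table, CachedCallSketch the call object); in the Zend API the remembered data would live in a zend_fcall_info_cache, which this sketch does not use.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

using Fn = std::function<long(long)>;

struct RegistrySketch {                 // toy stand-in for the function table
    std::map<std::string, Fn> table;
    int lookups = 0;                    // counts how often a name is resolved
    const Fn* find(const std::string& name) {
        ++lookups;
        auto it = table.find(name);
        return it == table.end() ? nullptr : &it->second;
    }
};

class CachedCallSketch {                // hypothetical, not zpp::fn_call
    RegistrySketch& reg_;
    std::string name_;
    const Fn* resolved_ = nullptr;      // filled on the first successful lookup
public:
    CachedCallSketch(RegistrySketch& reg, std::string name)
        : reg_(reg), name_(std::move(name)) {}

    // Calls the named function; returns false if the name is unknown.
    bool call(long arg, long& result) {
        if (!resolved_) resolved_ = reg_.find(name_);   // resolve once
        if (!resolved_) return false;
        result = (*resolved_)(arg);
        return true;
    }
};
```

The cached pointer stays valid only while the registry entry exists, which is also why a long-lived call object suits functions that are never removed.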

A number of arbitrary functions, and str_intern names of functions are declared in state_init instances in zpp/fn_call.h

Custom function objects can be derived from fn_call, and given their own call method, or functions written to use fn_call.

// from zpp/fn_call.*
class fn_fgetcsv : public fn_call_args<1> {
    public:
        fn_fgetcsv();
        val_rc call(val_ptr file_res);
    };

Discovering memory issues.

For use in repeated request processes, such as php-fpm workers, it's important to clean up the request memory each time. So some attention has been given to testing extensions made with Wcc for memory issues. This means running the extension with a version of PHP compiled in debug mode, and sometimes running a test script under valgrind. Such testing uncovered instances of memory leaks, and a few cases of circular pointer references that had to be broken to ensure all objects get freed.

All the *_rc classes "should", when used normally, keep memory errors down to zero.

Missing features, and likely changes

Things are likely to be missing, or need changing.

Script compatibility

The script-only objects and functions of the Wcc and Wcd classes have been kept in near parallel compatibility with their compiled wcc extension versions. This makes it easier to test changes in design before hard-coding them into C++ classes.

str_ptr

This is a basic wrapper around the zend_string*, but does not do reference counting. It has a number of methods that return a str_rc holding a newly created zend_string*. Methods that are likely to change the reference count of the contained zend_string*, such as the lowercase, uppercase and trim functions, are handled by str_rc.

str_perm, str_temp

These differ in the memory allocation flag they pass to create a zend_string*. str_perm requires a malloc, outside of request allocated memory. str_temp is allocated out of the request memory pool.

str_intern

Does "internment". The string is allocated in persistent memory, and then posted to the interned string table. If such a string is already present in the global interned string table, the previously allocated version replaces the newly allocated one, which is deallocated, so only one copy remains. Lots of str_intern members are declared in state_init instances, and are initialised at MODULE_INIT time from C strings in the source code, by assignment operator=. Such strings cannot be reference counted, and have their IMMUTABLE flag set.
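
The "only one copy remains" rule is the essence of any intern table, and can be sketched with the standard library alone. InternTableSketch is an invented name; PHP's real table stores zend_string* values and is managed by the engine, so this only shows the principle.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical intern table: equal values always yield the same pointer.
class InternTableSketch {
    std::unordered_set<std::string> pool_;
public:
    // Returns the single stored copy for this value, adding it if new.
    // Pointers to set elements stay valid until the element is erased.
    const std::string* intern(const std::string& value) {
        return &*pool_.insert(value).first;
    }
    std::size_t size() const { return pool_.size(); }
};
```

Once two strings are interned, comparing them for equality is a pointer comparison, which is part of why interned names are attractive as property and method keys.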

str_empty

An immutable zend_string*, a valid pointer to a terminating null as the empty string.

str_buffer

This is a wrapper around the zend smart_str C API. The append process is terminated by its zstr() function, which finalizes the smart_str and stores its null-terminated zend_string*, as a str_rc. The stored str_rc will be wiped if the str_buffer is appended to again. It should be assigned to another str_rc if str_buffer reuse is required.

zend_smart_str_public.h
typedef struct {
	/** See smart_str_extract() */
	zend_string *s;
	size_t a;
} smart_str;

zpp::str_buf encloses a smart_str structure, and uses its C functions to append.

using namespace zpp;
str_buffer buf;
buf << "Start appending " << 101 << " dalmations" << endl;
str_ptr a101 = buf.zstr(); // value exists until buf is changed or destructed
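
The lifetime rule in that last comment is worth a small model. In the sketch below, BufferSketch is an invented class that uses std::shared_ptr in place of str_rc: the buffer holds one reference to its finished string and drops it when appended to again, so a caller who wants to keep the value must hold a counted reference of their own. Whether the real str_buffer starts afresh after zstr() is an assumption of this sketch.

```cpp
#include <memory>
#include <string>
#include <utility>

class BufferSketch {                          // hypothetical, not zpp::str_buffer
    std::string pending_;                     // text appended so far
    std::shared_ptr<const std::string> done_; // last finalized value
public:
    BufferSketch& operator<<(const std::string& s) {
        done_.reset();                        // appending drops the buffer's hold
        pending_ += s;
        return *this;
    }
    BufferSketch& operator<<(long n) { return *this << std::to_string(n); }

    // Finalize: hand out a counted reference to the finished string.
    std::shared_ptr<const std::string> zstr() {
        if (!done_) {
            done_ = std::make_shared<const std::string>(std::move(pending_));
            pending_.clear();                 // start afresh for the next append
        }
        return done_;
    }
};
```

A plain pointer taken from zstr() would dangle after the next append, while a copied counted reference keeps the finished string alive on its own.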

HashTable* wrap, class zpp::htab_ptr

C++ classes of an extension will be transferring data to and from the zend_object and zend_array world.

Behind the PHP array type is the zend_array structure, which is also aliased by the name HashTable. This is a storage structure of considerable sophistication, and for most purposes is a set and fetch store of zval structures, indexed either by string hash or by integer index.

HashTable manipulation is managed by the C++ class htab_ptr, which wraps a HashTable* with common methods, and its derived class htab_own, which adds methods that can change its reference count. PHP does "Copy on Write" (COW), to ensure that only HashTables that have a reference count of 1 can be updated. To avoid warnings and exceptions from the zend API, C++ classes have to respect this.

htab_ptr

  1. Is constructed by giving it a valid HashTable* from elsewhere
  2. Never bothers to touch the HashTable reference count
  3. Has many functions to fetch or set values, creating a zval if necessary.

htab_own

  1. Inherits methods from htab_ptr
  2. Additional methods to get, duplicate for COW, transfer or abandon HashTable* ownership with reference count manipulation.

This is a self-reminder that the source currently has very little in the way of code comments to explain how,why, what of various methods.

Constructors

htab_ptr gets its only data member, a HashTable*, from any other wrapper class that might have one. The default constructor assigns a nullptr, so a bool isNull() method exists to check for that. Other sources are:

  1. A bare HashTable*
  2. zval_own class
  3. zval_ptr class
  4. zval* pointer, usually from PHP land
  5. Another htab_ptr class.

htab_ptr() : ht_(nullptr) {}

htab_ptr(HashTable* ht);
htab_ptr(const zval_own& zw);
htab_ptr(const zval_ptr& zptr);
htab_ptr(zval* p);
htab_ptr(const htab_ptr& c) : ht_(c.ht_) {}


Has an assignment operator= from a zval*.

Pointer access

htab_ptr has a cast operator to HashTable* of course, and a ptr() method that also returns it, for redundancy.

No isset

No isset, because that is a special PHP function. The methods has_index, and has_key are provided.

bool  has_index(zend_long key);
bool  has_key(zend_string* skey);
// It is assumed zval_ptr has a zend_string , or zend_long value.
bool  has_key(zval_ptr skey);

Append methods

// Append string value, or zval* with anything, to next free integer index.
void push_back(zend_string* zs);
void push_back(const char* s, std::size_t slen);
void push_back(zval* zv);

Set or Unset by string key methods

There is a combinatorial number of these, with different ways to pass a string key and a value source; almost N^2. The Zend API has far more C functions for setting an array value. Here they are all called set, and another version was added every time the need was felt. Unset at least needs only one argument.

void set(zend_string* key, zval* val);
void set(zend_long idx, zval_own& value); 
//...etc

bool unset(zend_long idx);
bool unset(zend_string* key);
bool unset(zval_own& key);
void unset(zval_ptr key);

Get or Array operator

The array operator calls the equivalent get function to do the job. The get methods were done first, and operator[] added for cuteness. They all return a zval*, since this is what the zend API functions return.

zval* get(zend_long idx) const;
zval* get(zval_ptr key) const;
zval* get(zend_string* zkey) const;
//...etc

try and fetch

All of these return a bool (true or false) for the success of the lookup, and assign to the store argument on success. There are a few more like this. Note that none of the string key arguments use any kind of reference to a zstr_ptr or zstr_own class, because C++ will automatically use their conversion operator to pass a zend_string*.

bool try_fetch(zval_own& key, zval_ptr& store);
bool try_fetch(zend_string* key, zval_own&  store);
bool try_fetch(zend_string* key, zval_ptr& store);
//...
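The remark about string key arguments relies on an ordinary C++ implicit conversion. As a minimal sketch, assuming a stand-in FakeString and a hypothetical wrapper fake_str_ptr (neither is the real zend_string or zstr_ptr), a function declared only for the raw pointer still accepts the wrapper:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for zend_string (hypothetical layout, illustration only).
struct FakeString {
    std::size_t len;
};

// Wrapper with a conversion operator, in the spirit of zstr_ptr.
class fake_str_ptr {
public:
    explicit fake_str_ptr(FakeString* s) : s_(s) {}
    operator FakeString*() const { return s_; }
private:
    FakeString* s_;
};

// A lookup-style function declared only for the raw pointer type.
inline bool key_is_usable(FakeString* key) {
    return key != nullptr && key->len > 0;
}

inline bool demo_conversion() {
    FakeString raw{3};
    fake_str_ptr wrapped(&raw);
    // The wrapper converts to FakeString* at the call site.
    return key_is_usable(wrapped);
}
```

One declaration per key type is then enough, at the cost that the conversion is invisible at the call site.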

Special assign to zval* return_value in PHP object method implementation.

Even though htab_ptr doesn't care about reference counts, this is the one exception. When returning an array as the PHP method return value, the return_value zval* is assigned the HashTable, and is then told to bump its reference count, which was at least 1 already.

This might make the API complain if another attempt is made to write to the HashTable. Because of copy-on-write practice for zend_array, warnings or exceptions will happen if its reference count is greater than 1. PHP normally ensures a count of 1 by making a copy on write, but the above functions do not do this, and successful updates depend on the HashTable* reference count being 1.

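void htab_ptr::return_zv(zval* return_value) const
{
    ZVAL_ARR(return_value, ht_);    // return_value now refers to the HashTable
    Z_TRY_ADDREF_P(return_value);   // bump the count on behalf of the new holder
    ht_ = nullptr;                  // in a const method this only compiles if ht_ is declared mutable
}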

Getting out of zval land.

Several important zval types, in particular zend_string, zend_object, zend_array, zend_reference, hold a pointer to a structure that starts with -

zend_refcounted_h gc; 

A zval structure, because its value is a union type, can only reference or hold one type at a time. That means an all-purpose zval wrapper, which is trying to be an interface for all common types, has to have a lot of methods, and type-specific methods have to perform type checks to see if they apply to the currently stored type. The PHP-CPP Php::Value type is its general purpose zval wrapper, and it has methods galore to do many things. Php::Value has derived Array and Object classes, which add a few special purpose functions. There are a number of methods to work with C++ STL datatypes.

In the PHP-CPP source there is an internal String class defined in zend/string.h. It is specialized for generating persistent zend_string values, and is used as an adjunct to Php::Value, which is the basis of all PHP value structure management. This is not a surprise, given that scripting variables all have the _struct_zend_value.

On the whole PHP-CPP expects programmers either to work with the raw Zend API for its reference counted types, or to use C++ data and STL types outside of the Value wrapper. Depending on what the extension is for, minimizing the time spent working with the Value abstraction, and minimizing callbacks to PHP script-land, is a goal. Working too much with the Value abstraction of a zval, and with PHP callbacks, is therefore labelled as inefficient for those expecting some performance gains from a compiled C++ extension. There is truth in this, given the ease of script programming versus the time spent coding C++ extensions.

Array operators []

Programmers expect to use lovely array brackets for array indexes. It is easy enough to define these in terms of an inline call to the get function. However a common return type needs to be settled on, as C++ does not allow multiple declared functions that differ only in return type. I chose the return type to be the zval_ptr class, which of course just contains a zval* pointer, as this is always returned by the PHP zend_array fetch API calls, and can be a nullptr value.

// inline operators
    zval_ptr operator[](zend_long idx) const { return get(idx); }
    zval_ptr operator[](zend_string* zkey) const { return get(zkey); }

The get functions return a nullptr into the zval_ptr class, if the key does not find a stored value.

In order to set a PHP value via C++, the plain set method is the most direct. For string keys and a direct zval* the following implementation is used.

void htab_ptr::set(zend_string* key, zval* val)
{
    if (zend_hash_update(ht_, key, val))
    {
        Z_TRY_ADDREF(*val);
    }
} 

The zend_hash_update function is, for this purpose, an add-or-update function. It returns a pointer to the actual zval structure where the data has been stored in the array. This is very unlikely to be the original zval* passed as an argument to ::set.

But the returned zval* should contain the same data. The contents have already been copied using the ZVAL_COPY_VALUE macro, which does nothing about reference counting. If zend_hash_update returns a non-null value, the add or update to the HashTable has been successful, and the returned zval* must be holding the same referenced structure that is held in the original zval* val, so either can be used to have its reference count incremented.

The zend API also provides functions that will only add, if a value does not already exist, and only update, providing a value already exists.

To use the zval* returned by zend_hash_update in some form of overloaded operator[] for set would require further work. For instance, imagine that the hash_update call just sets a null value initially, then a write operator[] uses the returned storage zval address to assign the final intended value. It should be possible to provide the C++ notational illusion of array[key] = value. It means extra coding on top of the more direct set methods, and I do not see this as worth the extra coding and computational effort.

The C++ brackets assignment operator works well where the storage already exists prior to the method call, as in fixed arrays and matrices. In PHP the storage in the HashTable does not exist until zend_hash_update returns the pointer to it, and has already updated it once. There is nothing I know of in the design of C++ that easily rewrites an r-value assigned to a brackets l-value as a simple call to our set function, which already does the job.
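For readers who want to see what the array[key] = value illusion costs, here is a minimal proxy sketch. It assumes a std::map standing in for the HashTable and an int standing in for the zval; FakeArray and Slot are hypothetical names, and this is not how zpp or PHP-CPP implement it. The bracket operator returns a small object that remembers the container and key, and its assignment operator forwards to set().

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Stand-in container: a std::map plays the HashTable, int plays the zval.
class FakeArray {
public:
    // Proxy returned by operator[]; it remembers the table and the key.
    class Slot {
    public:
        Slot(FakeArray& a, std::string k) : a_(a), key_(std::move(k)) {}
        // array[key] = value ends up as a plain call to set().
        Slot& operator=(int value) { a_.set(key_, value); return *this; }
        // Reading through the proxy calls get(); missing keys read as 0 here.
        operator int() const { const int* p = a_.get(key_); return p ? *p : 0; }
    private:
        FakeArray& a_;
        std::string key_;
    };

    void set(const std::string& key, int value) { data_[key] = value; ++writes_; }
    const int* get(const std::string& key) const {
        auto it = data_.find(key);
        return it == data_.end() ? nullptr : &it->second;
    }
    Slot operator[](const std::string& key) { return Slot(*this, key); }
    int writes() const { return writes_; }
private:
    std::map<std::string, int> data_;
    int writes_ = 0;
};
```

Every bracket expression builds and destroys a Slot, including a copy of the key, before the real set() or get() runs. That is the extra work the direct set methods avoid.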

PHP-CPP HashMember virtual function call chain

The design of the PHP-CPP array brackets operator, which returns the HashMember template, is too complex for me to understand just why it is so. In actual code, Value objects use operator[] to read and write keyed array values or object property values.

These operators work their magic by returning a templated HashMember structure, which has a base class with many virtual functions. The parent class is HashParent, and Value itself inherits from HashParent. So a call to set via operator[] eventually calls a virtual method of Value to call set, which will call setRaw. HashMember has many C++ style overloaded operators.

It is templated, so a compiler might only compile and link whatever was used. There is no use of zend_hash_update anywhere in the PHP-CPP code. Never mind, there are a lot of Zend array access functions, available in all flavours.

For the write operator[], Php::Value returns an entire HashMember structure, containing a pointer to itself and the value of the key for the intended write. This structure has an assignment operator for whatever value is being assigned, and the assignment operator calls back the Value class to do the actual array update operation.

So in PHP-CPP the final HashTable write calls are routed into Value::setRaw(), which is overloaded for either an integer or a string key. The string version calls add_assoc_zval_ex, which calls zend_symtable_str_update. This function checks if a string is numeric, and if so tries to convert it to a zend_long index.

Here the PHP-CPP code is educational: trying to figure out the C++ virtual virtuosity shows the lengths it has gone to in duplicating known PHP quirks and providing an easy-to-code-in set of classes. My low level attack on this issue expects the programmer to know when to use string keys or integer keys directly. It goes by data type, and I do not care about numeric string content as keys for conversion to zend_long. Why should the string hash function care?

// in php-src, the inline function from zend_hash.h called by setRaw(const char* key ...)
static zend_always_inline zval *zend_symtable_str_update(HashTable *ht, const char *str, size_t len, zval *pData)
{
    zend_ulong idx;

    if (ZEND_HANDLE_NUMERIC_STR(str, len, idx)) {
        return zend_hash_index_update(ht, idx, pData);
    } else { 
        return zend_hash_str_update(ht, str, len, pData);
    }
}
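To make the numeric key quirk concrete, the sketch below shows the kind of test a string key goes through before it is rerouted to an integer index. It is a simplified rule written for illustration, not the engine's ZEND_HANDLE_NUMERIC_STR macro: the function name looks_like_index is hypothetical, and the exact limits the engine applies may differ from the ones assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Simplified rule (an assumption, not the engine's macro): a key counts as an
// integer index when it is "0", or an optional '-' followed by digits with no
// leading zero, and its magnitude fits in a signed 64-bit integer.
inline bool looks_like_index(const std::string& key, std::int64_t& out) {
    std::size_t i = 0;
    bool neg = false;
    if (!key.empty() && key[0] == '-') { neg = true; i = 1; }
    if (i >= key.size()) return false;           // "" or "-"
    if (key[i] == '0') {
        if (key.size() != 1) return false;       // rejects "-0" and "007"
        out = 0;
        return true;
    }
    const std::uint64_t limit =
        static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
    std::uint64_t acc = 0;
    for (; i < key.size(); ++i) {
        if (key[i] < '0' || key[i] > '9') return false;
        const std::uint64_t d = static_cast<std::uint64_t>(key[i] - '0');
        if (acc > (limit - d) / 10) return false; // next digit would overflow
        acc = acc * 10 + d;
    }
    const std::int64_t v = static_cast<std::int64_t>(acc);
    out = neg ? -v : v;
    return true;
}
```

A wrapper that skips this test, as the zpp set methods do, stores the key "42" and the index 42 as two different entries, so the caller has to be consistent about which key type is used.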

This is only a vague description of the HashMember call chain for array[] notation and property access. It seems that there is some additional complexity and overhead, requiring additional class creation and destruction, in terms of function calls. The Php::Value class is a kind of Swiss army knife for PHP value access. It is versatile. In memory cost, each Value instance also holds a hidden virtual function table pointer, as well as a full zval structure. Value harbours a lingering identity crisis, and must often check its zval structure to figure out just what it really is, and what any operator function needs to do.

Performance gains for PHP-CPP and Wcc vs PHP script

I made an XML file format for use in configuration files, with similar aims to JSON and YAML. For this, a simple parser script called xmlread.php has an xmlreader class to pull-parse the format into its final hierarchy of array values, including objects. To get measurable values, a test script repeated the file read 1,000 times.

I made a PHP-CPP version, and a Wcc version, of the same class, with no major algorithm differences, however the source code classes are different. All three give the same data output values. The C++ extensions manage this timed task at least 5 times faster.

The results on this AMD Ryzen 5 laptop were:

  1. PHP script - 0.25 seconds
  2. PHP-CPP - 0.048 seconds
  3. Wcc - 0.036 seconds

The Wcc value was 0.037-0.038 seconds before the code was changed to use fcall_info_cache for some method calls.

The xmlread class works by callbacks to xmlreader object methods, such as to read the current node string value, fetch a node attribute value, and read current property values.

Both PHP-CPP and Wcc make use of the main callback facility of the Zend API, which is call_user_function. Just a little further into the API it calls this function:

zend_result zend_call_function(zend_fcall_info *fci, zend_fcall_info_cache *fci_cache)

By putting the zend_fcall_info and zend_fcall_info_cache as C++ data members into the xmlread class itself, for the 3 most repeated method calls (read, get_attribute, read_string), and calling zend_call_function directly instead of going via call_user_function, the overall time for the script read test diminished by 1-2%. The cached function calls themselves must have improved somewhat more than that, given that the time for all the other processing did not change, and there is enough work done in the C++ extension xmlread class to make it faster than the PHP script version.
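The gain from keeping call information as a data member comes from resolving the target once instead of on every call. The following is only an analogy in plain C++, with a hypothetical Registry standing in for method lookup by name and CachedCall standing in for an object that holds its resolved call data. It does not use the Zend structures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical registry standing in for method lookup by name.
struct Registry {
    std::map<std::string, std::function<int(int)>> methods;
    int lookups = 0;                       // counts name resolutions

    const std::function<int(int)>* resolve(const std::string& name) {
        ++lookups;
        auto it = methods.find(name);
        return it == methods.end() ? nullptr : &it->second;
    }
    // Resolve by name on every call.
    int call_by_name(const std::string& name, int arg) {
        const auto* fn = resolve(name);
        return fn ? (*fn)(arg) : 0;
    }
};

// Resolves once and keeps the result as a data member for later calls.
class CachedCall {
public:
    CachedCall(Registry& r, const std::string& name) : fn_(r.resolve(name)) {}
    int operator()(int arg) const { return fn_ ? (*fn_)(arg) : 0; }
private:
    const std::function<int(int)>* fn_;
};
```

The cached pointer stays valid only while the registry entry exists, which mirrors the need to keep cached call data in step with the object it was resolved against.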

HashTable* Reference Counted, class zpp::htab_rc

The class htab_rc takes over a HashTable* and makes it its own by incrementing its reference count, in the same way that a zval owner does. Just like htab_ptr, its HashTable* can be set to a nullptr. Like a zend_string*, the HashTable* can be marked as immutable, a fixed constant in the PHP system, in which case nothing should be done to it and the reference count is not touched. This is carried out in the method lose(). In lose, the class disowns its current pointer, and has the responsibility to call zend_array_destroy() if it is the last owner.

The default constructor of htab_own also creates its own minimum allocation HashTable*. This creates an initialized HashTable containing nothing, when used as a member variable in another class.

// Wcc htab_own. Disown the referenced pointer.
htab_own::~htab_own()
{
	lose();
}

void htab_own::lose()
{
	if (ht_) {
		auto za = ht_;
		ht_ = nullptr;
		if (za->gc.u.type_info & GC_IMMUTABLE) 
		{
			return;
		}
		if (GC_REFCOUNT(za) <= 1) {
			zend_array_destroy(za);
		}
		else {
			GC_TRY_DELREF(za);
		}
	}
}

Class htab_own has another way of making sure that it has a writable HashTable* with a reference count of 1. The "make_own" method does something similar to the SEPARATE_ARRAY macro for zval owned arrays. Creating a HashTable* from another htab_own also requires direct array duplication. Two instances of htab_own cannot have the same HashTable*. The standard move constructor just switches the pointer.

// Protected static function
HashTable* 
htab_own::make_own(HashTable *h)
{
	if (GC_REFCOUNT(h) > 1) {
		HashTable* dup = zend_array_dup(h);
		if (dup != h) {
			// assume h was properly owned already
			GC_TRY_DELREF(h);
		}
		return dup; 
	}
	return h;
}

htab_own::htab_own(const htab_own& c)
{
	ht_ = zend_array_dup(c.ht_);
}

// Moving HashTable* 
htab_own::htab_own(htab_own&& m)
{

	ht_ = m.ht_;
	m.ht_ = nullptr;
	dbg_dump(__FUNCTION__)
}

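const htab_own& 
htab_own::operator=(const htab_own& c){
	if (this == &c)
	{
		// self-assignment: lose() below would drop the only pointer we hold
		return *this;
	}
	if (ht_)
	{
		lose();
	}
	ht_ = c.ht_;
	own();                  // own() is expected to add a reference to the shared table
	ht_ = make_own(ht_);    // then duplicate it if anyone else still holds it
	return *this;
}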

Of course htab_own default constructor creates its own HashTable* with a default reference count of 1. HT_MIN_SIZE is currently 8, so the only true minimum array is actually a nullptr. The methods result() or clear() will result in an empty array.

// protected
void htab_own::init()
{
	ht_ = zend_new_array(HT_MIN_SIZE);
}

htab_own::htab_own()
{
	init();
} 

HashTable* Writing values, class zpp::htab_rw

The PHP scripting language requires that arrays with more than one reference, that is, arrays assigned to more than one variable, be copy on write. If a write occurs, a copy is made of the array being written to, so that its reference count is again 1. Otherwise the contents of arrays are shared, but only as they stood just before the last write operation.
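The same rule can be tried out in standard C++ with a shared pointer, where use_count() plays the role of the reference count. This is a minimal sketch under that assumption; CowArray and Table are hypothetical names and nothing here touches the Zend API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

using Table = std::map<std::string, int>;

// Hypothetical array variable: shares storage until someone writes.
class CowArray {
public:
    CowArray() : t_(std::make_shared<Table>()) {}
    // Copying a variable only shares the pointer, like assigning a PHP array.
    CowArray(const CowArray&) = default;
    CowArray& operator=(const CowArray&) = default;

    void set(const std::string& key, int value) {
        separate();
        (*t_)[key] = value;
    }
    int get(const std::string& key) const {
        auto it = t_->find(key);
        return it == t_->end() ? 0 : it->second;
    }
    long owners() const { return t_.use_count(); }
private:
    // Duplicate the table when more than one owner holds it.
    void separate() {
        if (t_.use_count() > 1) t_ = std::make_shared<Table>(*t_);
    }
    std::shared_ptr<Table> t_;
};
```

After a write through one variable, both variables are sole owners of their own table, and the value seen through the other variable is unchanged.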

Using zpp::htab_rw

htab_rw is also a htab_ptr, with read methods, and adds all the write and array modification methods. It is constructed using another source of a HashTable, and is not itself reference counted. The source class will have its HashTable checked, and if it has a reference count greater than one, the HashTable will be duplicated, and the new copy with a reference count of 1 is stationed in its place.

It is maybe worth looking at the current implementation of copy-on-write.

/**
 * static class functions of htab_rc.
 *
 * Copy on Write operation (cowop)
 * Ensures argument inout is a writeable HashTable with reference count of 1.
 * The owning object loses its access to the original, so try_decref must be called here.
 * 
 * Zend engine has a global static zend_array structure.
 * - zend_empty_array. Its reference count is 2, and marked as immutable.
 * Return true if the pointer reference was changed.
 */
bool htab_rc::cowop(HashTable*& inout, size_t init)
{
	HashTable* used = inout;
	if ( (used == nullptr)
	   ||(used == const_cast<HashTable*>(&zend_empty_array)))
	{
		HashTable* newht = zend_new_array(init);
		inout = newht;
		return true;
	}
	if (GC_REFCOUNT(used) > 1) 
	{
		inout = zend_array_dup(used);
		htab_rc::try_decref(used);
		return true;
	}
	return false;
}
/**
 * htab_rw will ensure the source class donating the HashTable* does
 * a copy-write array operation if the reference count is greater than
 * 1.  The size argument is used to make a new array if HashTable* is null 
 * or immutable. Fair Warning: This will replace value inside any zval.
 * htab_rw is then usable as a write agent acting on behalf of the donating source.
 */


htab_rw(htab_rc& mgr, size_t init);
htab_rw(val_rc& mgr, size_t init);
htab_rw(val_ptr mgr, size_t init);
htab_rw(zval* p, size_t init);

htab_rw(HashTable* h);

Obscure side-effects of setting zval value, especially as value in array.

The write methods of htab_rw all call a specialist zval bind method, to ensure that particular flags in the zval are unset for immutable values. Otherwise some PHP value deletion code may be run on them, leading to a crash.

These static bind methods are part of the val_ptr class.

For example, the common PHP macro ZVAL_ARR(zval*, HashTable*) sets the type flags in an optimized fashion, using the bit mask IS_ARRAY_EX, not just IS_ARRAY, marking the value as both reference counted and collectable. We don't want this for immutable values. The same idea applies to immutable objects or interned strings. A comment in the PHP source zend_types.h says "we should never set just Z_TYPE, we should set Z_TYPE_INFO". The u1 member of zval is a union with the size of a uint32_t (type_info); within it the type itself is a uint8_t, and type_flags is a uint8_t. So assigning through the macro Z_TYPE_INFO_P(zval*) conveniently sets all the bits of this union.

// should be done on a zeroed zval
/**
 * The ZVAL_STR macro now also does the equivalent of this.
 * 
 *     Z_STR_P(tmp) = s;
 *     if (GC_FLAGS(s) & IS_STR_INTERNED)
 *     {
 *         Z_TYPE_INFO_P(tmp) = IS_STRING;
 *         //Z_TYPE_FLAGS_P(tmp) = 0;
 *     }
 *     else {
 *         Z_TYPE_INFO_P(tmp) = IS_STRING_EX;
 *     }
 */
void val_ptr::array_bind(zval* tmp, HashTable* ht);
void val_ptr::string_bind(zval* tmp, zend_string* s);
void val_ptr::object_bind(zval* temp, zend_object* obj);
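The difference between the plain and the _EX type values is easier to see as bit arithmetic. The sketch below mirrors my reading of zend_types.h for PHP 8.x; the constant values are assumptions to be checked against the headers you build with, and the k-prefixed names and helper functions are hypothetical, chosen so as not to clash with the real macros.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants mirroring my reading of zend_types.h (PHP 8.x);
// treat the exact values as assumptions and check the headers you build against.
constexpr std::uint32_t kTypeString      = 6;        // IS_STRING
constexpr std::uint32_t kTypeArray       = 7;        // IS_ARRAY
constexpr std::uint32_t kFlagRefcounted  = 1u << 0;  // IS_TYPE_REFCOUNTED
constexpr std::uint32_t kFlagCollectable = 1u << 1;  // IS_TYPE_COLLECTABLE
constexpr std::uint32_t kFlagsShift      = 8;        // flags sit in the second byte

constexpr std::uint32_t kStringEx = kTypeString | (kFlagRefcounted << kFlagsShift);
constexpr std::uint32_t kArrayEx  =
    kTypeArray | ((kFlagRefcounted | kFlagCollectable) << kFlagsShift);

// Pick the full 32-bit type_info for a string, given whether it is interned.
constexpr std::uint32_t string_type_info(bool interned) {
    return interned ? kTypeString : kStringEx;
}

// Split a type_info value back into its type byte and its flags byte.
constexpr std::uint32_t type_of(std::uint32_t info)  { return info & 0xffu; }
constexpr std::uint32_t flags_of(std::uint32_t info) { return (info >> kFlagsShift) & 0xffu; }
```

Writing the whole 32-bit value in one assignment is what clears stale flags. Setting only the type byte would leave whatever flags were there before.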

Using objects - obj_ptr, obj_rc

The zpp::obj_ptr wrapper class for a zend_object* does the expected. It has methods to call the object's methods, and read or write its properties.

Such an object may be a "special" callable object, such as an anonymous function, or object with an invoke method. Wcc\Services checks for this possibility on any named service. The obj_ptr callable methods exist for zero, one, or two arguments. They use the callable_fn method declared in fn_call.h, which can use any number of arguments.

// Wcc\Services passes itself to callables as zend_object* first argument
val_rc Services::call_value(obj_ptr callme)
{
	val_rc self(this);
	return callme.callable(self);
}

// in Services::get
val_ptr value;
if (test.isCallable())
{
    obj_ptr callme = test.zobject();
    val_rc result = call_value(callme);
}

The obj_ptr call methods all take a str_ptr as first argument, the method name. Method overrides exist for up to four zval* arguments. Each uses a fn_call_args template internally. The fn_call class also supports passing named arguments as a HashTable*, using its set_named_args(HashTable*) method.

Property values

Property values can be returned by passing one argument to the property method, or set by passing two arguments. A return zval* argument can be used in property_get.

// zpp::obj_ptr get property methods
val_rc    property(str_ptr key);
zval* 	  property_get(str_ptr key, zval* ret);
zval*     property_ptr(str_ptr key);

// set value
void      property(str_ptr key, val_ptr value);
void      property(str_ptr key, str_ptr value);
void      unset_property(str_ptr name);

// exists?
bool      has_property(str_ptr name);

//! property-values list
htab_rc  properties();

make a self object

Like val_rc, obj_ptr has a constructor with a base_d* argument, which sets it to the zend_object* value held by the base_d* this->self_, which is the PHP $this. For convenience base_d class has a self() method.

obj_ptr self() const { return obj_ptr(self_); }

sharing efficient property access with C++

PHP byte code efficiency of objects depends on having known offsets for class properties. This is behind the move to deprecate use of dynamic properties in objects not specifically tagged as allowing them. Object property names are case sensitive. They are subject to typo errors. Error or warning exceptions occur when trying to set the value of a property name that doesn't exist in an object, when it is not configured for dynamic properties.

The PHP object class data creation C code can be generated from the .stub.php files, including all property names, types and qualifiers. Object instances should be able to get the zval pointer to these property values, and cache them as val_ptr as part of their C++ data, and so have the most efficient read and write to the property value. If the property is public or read-only, PHP also has efficient access to the same value.
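The idea of caching property locations can be sketched without the engine. Here a std::map stands in for the object's property storage, and the C++ side resolves its two slots once and keeps the pointers. FakeObject and PairView are hypothetical names, and real property slots have lifetime rules that this sketch ignores.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical object: declared properties live in named slots.
struct FakeObject {
    std::map<std::string, double> props;
    int name_lookups = 0;

    double* slot(const std::string& name) {
        ++name_lookups;
        auto it = props.find(name);
        return it == props.end() ? nullptr : &it->second;
    }
};

// C++ side of the object: resolves its two slots once, then uses the pointers.
class PairView {
public:
    explicit PairView(FakeObject& o) : a_(o.slot("a")), b_(o.slot("b")) {}
    bool valid() const { return a_ && b_; }
    // (a + b) / a as a sample calculation on the cached slots.
    double calc() const { return (*a_ + *b_) / *a_; }
    void set_a(double v) { *a_ = v; }
private:
    double* a_;
    double* b_;
};
```

Writes through the cached pointer are visible through the name-based storage as well, which is the property the paragraph above relies on for public and read-only properties.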

Comparing different methods of named property access

Wcc PHP objects in C++ have an example of a dynamic property oriented class - Wcc\Config , and also Wcc\Hmap class, which internally uses a PHP array to store properties.

Accessing properties and array values by string keys is the mainstay of PHP coding. To compare the efficiency of various kinds of PHP property/value access by key name, I created a script that repeated a numeric sum and division, ( a + b ) / a, and averaged the time taken over a large number of iterations. The following table shows the kind of entity, and the results as ratios to two chosen baselines: plain local variables, and object-declared properties. Also compared are a local array with string keys, dynamic property objects, object properties implemented as array storage, and a call to a function implemented as a method of Wcc\Pair just for this purpose.

The ratio of averaged times to complete are formatted to 2 decimal places here.

| Implementation | X / Local variable (e) | X / fixed property access (a) |
|---|---|---|
| Local variables (e) | 1 | 0.6 |
| Wcc\Pair (C++) declared properties (a) | 1.66 | 1 |
| EmptyTest (PHP) declared properties (i) | 1.66 | 1 |
| Wcc\Pair call test_calc() (c) | 1.77 | 1.07 |
| Wcc\Pair (C++) methods get (b) | 3.36 | 2.03 |
| Wcc\Config dynamic properties | 1.78 | 1.08 |
| Use local array (g) | 2.14 | 1.29 |
| Extend stdClass (h) | 3.81 | 2.3 |
| Hmap property handler (f) | 2.92 | 1.76 |
| Hmap array handler (m) | 3.78 | 2.28 |
| ArrayObject as [array] (k) | 4.97 | 3 |
| ArrayObject ->Property (j) | 5.14 | 3.11 |

The results suggest the unsurprising finding that using local variables is optimal, and that object references to declared properties take about 67% more time. A method call into compiled code (c) does well even for a trivial calculation, compared to calling a get method (b) 3 times and doing the calculation locally; for any non-trivial calculation it should do better still.

It is unknown whether PHP does any optimisation for a common sub-expression used twice in the looped statements. All of the short statement repeat processes are fast: each of the 12 test functions timed a loop of 1000 iterations, and the whole sequence was repeated 1000 times, which makes 12 million iterations, and on a modern x86_64 processor this took a few seconds.

Dynamic properties are a little bit slower. Property handlers seem faster than array handlers, except for the ArrayObject in its "ARRAY_AS_PROPS" setup, which lags the others for reasons I cannot explain.

Wcc\Config and Wcc\Hmap are built into the Wcc extension using the ZPP / base_d C++ classes. I can at least say that for simple uses they are reasonably effective at basic get/set compared to ArrayObject.

For Wcc\Pairs , also in the Wcc extension, 3 function calls takes longer than

The script that produced this table is "tests/pairs.php" from the zphp2.git repository.

PHP-ZPP : Transplanting zpp into PHP-CPP

Replacing Zval class

The PHP-CPP Value class is a C++ virtual class, inheriting from HashParent. HashParent declares a number of abstract virtual functions. This means that each Value object is bigger than the zval it wraps, by the storage for a virtual function table pointer.

The Zval implementation of PHP-CPP is itself not compilable using C++23, because at least one deprecated code feature is now disallowed. The PHP-CPP Zval contains a mysterious C++ template structure that has been deprecated, and it needs to be replaced.

// std::aligned_storage is deprecated as of C++23
using aligned_zval_struct = typename std::aligned_storage<16>::type;
class Zval 
{
private:
    aligned_zval_struct _buffer;

    /* Not only is this deprecated, but the Value object employs a few boolean
     * tests on the operator types returned to access the _buffer value. The
     * boolean tests are used to indicate whether the value is zeroed out, as in
     * the IS_UNDEF PHP type, or has a type. This was confusing when trying
     * to understand the meaning of such code.
     */
    // ...
};

The Zval class can be eliminated, which means something else needs to replace it inside the Value class, which is almost its only client. The replacement is zpp::val_rc. This is a non-virtual C++ class that can either do its own reference counting, or receive external adjustments via a wrap from zpp::val_ptr, or by just returning the address of its zval structure.

zpp::val_rc and its versatile zpp::val_ptr helper already have a large number of methods that can replace the work done by the Value wrapper class. This meant changing many Value methods to let existing val_rc methods do the work. Most important was ensuring that the Value destructor did not try to dereference contained Zend handles, as the val_rc destructor will be doing that.

The transplant went better and quicker than I expected, given that this is still at hacking operation status, after a number of code compiling adjustments. There may be some Value methods whose complexity and intention I have underestimated in this transplant, so new bugs may be lurking. Zval had surprisingly few uses elsewhere in the PHP-CPP framework, and these also required code adjustments. Interested people may have to look at the git change history.

The new PHP-CPP build

To distinguish the new code, this could use a new name. PHP-ZPP stands for PHP-"Zend Plus Plus", to indicate the greater, lower level, separate Zend handles management implied in the zpp namespace classes.

In the zpp source folder, the file zhm.h includes the headers of the zpp namespace now incorporated directly into the PHP-ZPP build. The binary is already shared by the "wccz" extension.

Multithread support, redesign and testing required

This isn't enabled or tested. One current drawback is that the state_init embedded functions cannot be used in their current form in a multi-thread environment, as they use static shared memory for their function arguments, and even for the zend_object* value passed to the fci structure. This memory would be over-written at random by multiple threads. These functions will need to be reworked to use C++ thread local storage, initialised by a request start handler. The interned strings, and read-only data created at module init time, should be okay.

A preliminary change is to take all the embedded function execution objects and place them in their own structure, and declare an instance of it as "thread_local". All that remains to do is have wrapper functions in the zpp namespace call the function member inside the new thread_local instance, which is done even by the test compiled version that runs as a single thread process.

No doubt this adds to complexity, but this is why all extensions are configured and specifically compiled for the PHP version with settings for the environment that they are intended for, thanks to the phpize tool.

In the single thread process, the function objects will be configured by the state_init.init() call, and can be used for multiple requests. In the multithread process, the place to do this seems to be the RINIT call for each thread, and so thread_local function object instances will need to be initialized from the state_init.init_req() call, that is, for each request. A possible trick to manage this is to always use the init_req call, but store and check a flag in the thread_local storage block to indicate that this thread's storage has been initialized.
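The flag trick can be sketched in a few lines of standard C++. This is a minimal illustration, assuming a hypothetical CallState block and a free function named init_req; the real state_init object and its function call members are not shown.

```cpp
#include <cassert>

// Hypothetical per-thread block holding what the call objects need.
struct CallState {
    bool initialized = false;
    int  init_count  = 0;      // how many times setup really ran on this thread
    int  requests    = 0;
};

// One instance per thread; a single-thread build just has the one.
inline CallState& call_state() {
    thread_local CallState state;
    return state;
}

// Called at every request start; does the setup only on first use per thread.
inline void init_req() {
    CallState& s = call_state();
    if (!s.initialized) {
        s.initialized = true;
        ++s.init_count;
    }
    ++s.requests;
}
```

Because the check is one branch on thread-local data, calling init_req for every request costs little, and the same code path serves single-thread and multi-thread builds.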

Test comparison, Wcp\XmlReader vs Wcc\XmlReader.

Time tests of the new version of Wcp\XmlReader are shown below. It was modified from the old existing PHP-CPP extension code, from the previously shown comparison, and now uses PHP-ZPP framework code with zpp classes and techniques. While re-coding the Wcc\XmlReader code, as in remarks below

| Version | ms / iteration | relative |
|---|---|---|
| PHP script class | 1,992 | 1.0 |
| PHP-CPP version | 4,355 | 2.8 |
| PHP-ZPP version | 2,024 | 1.02 |
| ZPP version | 1,819 | 0.91 |

Cached function call objects from zpp::fn_call have more than doubled the performance of this class, from the PHP-CPP to the PHP-ZPP version. These test objects embed the function call object in their class, so no problems should occur in a multithread process. Another place to be useful could be a stack allocated function call object to be used many times in a local loop.

The execution limits of this algorithm are set by the communication gulf to the XMLReader class, which acts as a pull parser for the underlying XML read library. To speed up, a more direct connection to XML parsing code is required.

Value class given push_back method

One shortcoming of Value that I wanted to overcome was the lack of an obvious push_back method, corresponding to the packed array append expression in PHP, "$value[] = $extra". The point of the Value wrapper class is to emulate the functions of the "dynamic type by assignment" variables of PHP. So that method now exists in the PHP-ZPP Value class.

Parameters class is a std::vector<Value>

This class in PHP-CPP lets C++ functions with a fixed function parameter signature be called with a variable number of arguments from PHP. It is used in the class object method dispatch, ClassImpl::callMethod(INTERNAL_FUNCTION_PARAMETERS).

The Parameters class is an extension of std::vector<Value>. The Value class inherits from HashParent, and so has a pointer to a C++ class virtual function table, as well as its embedded zval structure. This is now a val_rc in PHP-ZPP, instead of a Zval structure, even though they are really both a renamed zval.

In ClassImpl::callMethod, the derived ParametersImpl class, a std::vector<Value>, reserves space for the given number of arguments. It allocates on its call stack, via alloca(), a temporary std::vector<zval>, filled by a call to the zend_get_parameters_array_ex() API function to fetch each entry of the zend_execute_data arguments array. The arguments from the zend_execute_data structure are copied in a loop into their locations in the vector, emplaced then constructed as Value objects, each with its extra virtual function table pointer.

A slice is a simple way to give access to an array without copying all the array elements. Its briefest form is the address of the base of the array, followed by the number of valid elements. A C/C++ array slice is a slice, no matter what structure the address comes from.
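As a minimal sketch of that idea, the template below holds only a base pointer and a count, with bounds-checked read access. The name Slice and its methods are hypothetical and are not the zpp::zarg_rd interface.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Minimal read-only slice: base address plus element count, no copying.
template <class T>
class Slice {
public:
    Slice(const T* base, std::size_t count) : base_(base), count_(count) {}
    std::size_t size() const { return count_; }
    const T& at(std::size_t i) const {
        if (i >= count_) throw std::out_of_range("slice index");
        return base_[i];
    }
    // A sub-slice shares the same storage.
    Slice sub(std::size_t first, std::size_t n) const {
        if (first > count_ || n > count_ - first) throw std::out_of_range("slice range");
        return Slice(base_ + first, n);
    }
private:
    const T* base_;
    std::size_t count_;
};
```

A slice is only valid while the underlying array is, which for function arguments means until the function returns.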

Direct read access to the Parameters values can be given via const Value&, which gives access to only those Value methods that are declared as const.

The result of all this fuss is that two loops through copies of the same zval arguments are done. When the Value instances are created they will always try to increment reference counts on any passed zval handle structures, no matter how many times their buffer was copied.

This mechanism also bypasses the final parameter type checks that PHP functions normally get by using the recommended macros designed for function parameter parsing. By this standard practice, parameters get a final sanity check, and potentially throw errors, before further use and dispatch.

PHP-ZPP has tossed away the ParametersImpl class, and changed the Parameters class from a std::vector<Value> into a slice of the original zval array in zend_execute_data. This slice from the zend_execute_data buffer, will be present until the function returns.

There was maybe a small average improvement in the XmlRead comparison test after this. There is still about a 5% performance gap remaining to be squeezed out.

// How to get a Value out of a slice of zend_execute_data
class PHPCPP_EXPORT Parameters : public zpp::zarg_rd {
private:
    /**
     *  The base (C++) object
     *  @var Base
     */
    Base *_object = nullptr;
public:
    Parameters(zend_execute_data* zexd);

    Base *object() const
    {
        return _object;
    }

    Value operator[](size_t index)
    {
        if (index < nargs_)
        {
            // zptr0_ points at the zeroth argument in zend_execute_data
            return Value(zptr0_ + index);
        }
        throw Error("Parameters index error");
    }
};

A lot of existing code uses Parameters&, so a fully compatible replacement is not possible, other than having something with the same name. A replacement would behave something like zpp::zarg_rd (zend arguments read), which stores a pointer to the zeroth argument in the zend_execute_data and the number of parameters, and has a number of methods to fetch and check the various required and optional arguments, used according to whatever the function expects to receive. This aims to emulate the C parsing macros that PHP recommends for the same purpose, and to deliver values as directly as possible into declarations of the individual zpp::*_ptr handle type wrapper classes.
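
A rough sketch of such an argument reader follows. It is not the zpp::zarg_rd source: a std::variant stands in for the zval, and the names arg_t, arg_reader, need_long and opt_string are all hypothetical, chosen only to show the pattern of a pointer to the zeroth argument, a count, and checked fetch methods.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>

// Stand-in for a zval, only for this sketch.
using arg_t = std::variant<std::monostate, long, std::string>;

class arg_reader {
    const arg_t* zptr0_;  // pointer to the zeroth argument
    std::size_t  nargs_;  // number of arguments passed
public:
    arg_reader(const arg_t* first, std::size_t n)
        : zptr0_(first), nargs_(n) {}

    // Required argument: fails when missing or of the wrong type.
    bool need_long(std::size_t ix, long& out) const {
        if (ix >= nargs_) return false;
        const long* p = std::get_if<long>(zptr0_ + ix);
        if (!p) return false;
        out = *p;
        return true;
    }

    // Optional argument: a missing argument yields the default,
    // but a present argument of the wrong type still fails.
    bool opt_string(std::size_t ix, std::string& out,
                    const char* dflt) const {
        if (ix >= nargs_) { out = dflt; return true; }
        const std::string* p = std::get_if<std::string>(zptr0_ + ix);
        if (!p) return false;
        out = *p;
        return true;
    }
};
```

Each fetch both checks and extracts, so the type check is not bypassed and no intermediate vector of wrapped values is built.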

So, a new transplant, a new declaration for Parameters. Keep the name, but it constructs itself with a pointer to zend_execute_data, and the pointer to the return_value zval*, and derives its utility entirely from this.

Using an STL allocator for PHP request memory

STL classes in the C++ std:: namespace allocate with std::allocator by default, which does not use the PHP heap recommended for memory allocations during request handling. They can be made to use it, since such an allocator template exists in the zpp folder (alloc_phpreq.h). The Wcp\ReadXml class is using it. C++ classes used during the request processing stage can also define their own operator new and operator delete to use emalloc and efree.

class DStack {
    //... Use PHP request memory pool
    void* operator new(std::size_t msize){
        return emalloc(msize);
    }
    void operator delete(void* p){
        efree(p);
    }
};
// where to push and pop pointers to stack objects
std::vector<DStack*, alloc_phpreq<DStack*> > path_; 
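
The contents of alloc_phpreq.h are not shown here, so the following is only a minimal sketch of what a C++17 allocator of this kind might look like. The names req_allocator, req_alloc and req_free are hypothetical, and std::malloc and std::free stand in for emalloc and efree so that the sketch compiles without the PHP headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Stand-ins for emalloc/efree (assumption: swap these in a real extension).
inline void* req_alloc(std::size_t n) { return std::malloc(n ? n : 1); }
inline void  req_free(void* p)        { std::free(p); }

template <typename T>
struct req_allocator {
    using value_type = T;

    req_allocator() noexcept = default;

    // Converting constructor, needed so containers can rebind the allocator.
    template <typename U>
    req_allocator(const req_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = req_alloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();  // only reachable with the malloc stand-in
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { req_free(p); }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const req_allocator<T>&, const req_allocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const req_allocator<T>&, const req_allocator<U>&) noexcept {
    return false;
}

using int_vec = std::vector<int, req_allocator<int>>;
```

Because the allocator has no state, a container using it is the same size as one using std::allocator.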

Use std::string_view

std::string_view is a good example of a read-only slice. It has useful functions for string comparison, finding and making sub-strings, and can even be used for static memory constants. It never needs to allocate memory. This is one of the advantages of going beyond C++11. A std::string_view can be returned from a zpp::str_ptr, and therefore also indirectly from a zval.
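
A small sketch of the point, using only the standard library. The names kv_sep, kv_pair and split_kv are hypothetical and are not taken from Wcc or zpp. Both returned views point into the caller's buffer, so nothing is allocated or copied.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// A static constant view; it needs no allocation.
constexpr std::string_view kv_sep = "=";

struct kv_pair {
    std::string_view key;
    std::string_view value;
};

// Split "key=value" into two views of the original characters.
inline kv_pair split_kv(std::string_view text) {
    const std::size_t pos = text.find(kv_sep);
    if (pos == std::string_view::npos) {
        return {text, {}};  // no separator: whole text is the key
    }
    return {text.substr(0, pos), text.substr(pos + kv_sep.size())};
}
```

As with any slice, the views are only valid while the original buffer is alive.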

Wcc\XmlRead has a few additional tricks which may be giving it the 5% edge in performance. String comparisons for tag identity use std::string_view static constants.

Its DStack structure uses pointers to form a doubly linked list, and therefore doesn't allocate an array to make a stack. What change could be made next to Wcp\XmlRead? I think only an internal simplified XML parser would give a significant boost to both implementations.

constexpr std::string_view pdoc_tag = "pdoc";
constexpr std::string_view s_tag = "s";

//...
bool Wcc_XmlRead::tag_start(
      str_ptr tag, 
      str_ptr key)
{
    std::string_view s = tag.vstr();
    if (s == root_tag)
    {
        pushRoot(key);
    }
    else if (s == tb_tag)
    {
        pushTable(XC_TABLE, key);
    }
    else if (s == a_tag)
    {
        pushTable(XC_ARRAY, key);
    }
    else if (s == i_tag)
    {
        setInteger(key);
    }
//...

SQL and Table Model Classes - Wcd namespace

The Wcd namespace is for Database Management. It presents several classes that evolved for handling SQL dialects and database table models. The aim is to abstract away details of SQL databases. It has a mix of approaches seen in some other PHP framework classes. It presented challenges for personal learning about the design of objects that need to work usefully together, and resulted in some C++ additions to the PHP-ZPP framework basics.

Avoid use of C++ std::exception, how to return error strings along with values

PHP uses exception handling, with try - catch and exception class matching. Error conditions that arise in C++ code need to create a PHP exception object. There are a number of Zend API functions which will "post" an exception, but using them to create the exception information does not break the flow of code execution. They do not "throw" the way C++ exceptions do, by unwinding the execution stack back to a previously set place in the code. Instead, the mechanism relies on the Zend interpreter checking for a posted exception after returning from a called object method or function, and then doing the cleanup and interpreter stack unwinding work.

Throwing a C++ style exception that is not caught before returning from the Zend call will exit script execution. To have controlled C++ exceptions would mean wrapping every function suspected of throwing in a try - catch, and then using the captured information to post a Zend exception.

Initially the Wcd C++ classes had some awkward C++ exception throws, which were eventually rooted out and replaced with a return value structure that also contains an optional pointer to an error buffer.
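
The wrapping described above could look something like the following sketch. It does not use the Zend headers: post_exception is a hypothetical stand-in for a Zend call such as zend_throw_error(), and parse_positive and method_entry are invented names. The stand-in only records the message and returns normally, which mirrors the "post, do not throw" behaviour.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Stand-in for posting a Zend exception (assumption, not the Zend API).
static std::string posted_message;
inline void post_exception(const char* msg) { posted_message = msg; }

// Some internal C++ code that may throw.
inline int parse_positive(const std::string& text) {
    int n = std::stoi(text);  // may throw std::invalid_argument or std::out_of_range
    if (n <= 0) throw std::domain_error("value must be positive");
    return n;
}

// Boundary function, as called from the Zend side:
// no C++ exception is allowed to escape from here.
inline bool method_entry(const std::string& arg, int& result) {
    try {
        result = parse_positive(arg);
        return true;
    } catch (const std::exception& e) {
        post_exception(e.what());
    } catch (...) {
        post_exception("unknown C++ exception");
    }
    return false;  // caller returns to Zend, which finds the posted exception
}
```

Every entry point needs this boilerplate, which is part of the motivation for the return value approach that follows.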

The "bireturn" type

The root class is struct error_return, defined in zpp/bireturn.h.

namespace zpp {

struct error_return {
	str_buf*  errors_;

	error_return() : errors_(nullptr) {}
    bool has_errors() const { return (errors_); }
    bool     throw_errors(const char* fncstr = nullptr);
    str_buf& error();
    //...
};

template <typename T>
struct bireturn  : public error_return 
{
	T   	  value_;

	bireturn() : error_return()
	{
	}
};

//...
bool 
error_return::throw_errors(const char* fncstr)
{
	if (errors_)
	{
		// fncstr defaults to nullptr, so guard before streaming it
		*errors_ << "\n*** ERROR " << (fncstr ? fncstr : "");
		str_rc s = errors_->zstr();

		zend_throw_error(zend_ce_error,"%s", s.data());
		delete errors_;
		errors_ = nullptr;
		return true;
	}
	return false;
}

There is more, but the idea is that every function that calls a function returning some form of error_return must check whether it "has_errors()". It must either stop the propagation, or move the errors into its own form of error_return. The Zend function or method must be the final check on any bireturn type, and always check "has_errors()" or call throw_errors(), which will set up a final Zend exception posting with a message if errors_ is not a nullptr.
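
The propagation rule can be sketched with simplified stand-ins. This is not the code in zpp/bireturn.h: err_ret and biret use a std::unique_ptr<std::string> in place of the str_buf pointer, and take_errors, find_column and column_sql are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for zpp::error_return.
struct err_ret {
    std::unique_ptr<std::string> errors_;

    bool has_errors() const { return errors_ != nullptr; }

    // Create the error buffer on first use.
    std::string& error() {
        if (!errors_) errors_ = std::make_unique<std::string>();
        return *errors_;
    }

    // Hypothetical helper: move errors from a callee's return value.
    void take_errors(err_ret& from) { errors_ = std::move(from.errors_); }
};

// Simplified stand-in for zpp::bireturn<T>.
template <typename T>
struct biret : err_ret {
    T value_{};
};

inline biret<int> find_column(const std::string& name) {
    biret<int> r;
    if (name == "id") r.value_ = 0;
    else r.error() += "unknown column " + name;
    return r;
}

// A caller with a different value type moves the errors
// into its own return value instead of throwing.
inline biret<std::string> column_sql(const std::string& name) {
    biret<std::string> r;
    biret<int> col = find_column(name);
    if (col.has_errors()) {
        r.take_errors(col);
        r.error() += "\n  in column_sql";
        return r;
    }
    r.value_ = "SELECT c" + std::to_string(col.value_);
    return r;
}
```

The outermost caller, the Zend function or method, would then test has_errors() and post the accumulated message.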

This is of course more coding work than throwing and catching exceptions. The execution inefficiency of returning a more complex type can be offset by a simple coding style that allows "return value copy elision", where a single return value is declared inside the function body, and the C++ compiler passes a pointer to the receiving value, for return value optimization.

This mechanism could be made more intricate. For instance, a representation of the call stack history could be linked in. This seems fine as a way of removing the need for C++ exception handling. The errors are mainly used to indicate development and coding errors, for strategic correction.
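
The coding style for copy elision can be illustrated with a small counter. The result type and build function are hypothetical. With one named return object on every path, a compiler may construct it directly in the caller's storage; that named form of elision is optional, but when it is not applied the object is moved rather than copied, so the copy count stays at zero either way.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct result {
    static inline int copies = 0;  // counts copy constructions only
    std::string value_;

    result() = default;
    result(const result& o) : value_(o.value_) { ++copies; }
    result(result&& o) noexcept : value_(std::move(o.value_)) {}
};

inline result build(bool upper) {
    result r;                       // the single named return value
    r.value_ = upper ? "ABC" : "abc";
    return r;                       // every path returns the same object
}
```

Declaring two different local objects and returning one or the other from different branches generally prevents the compiler from eliding the named return value.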

WeakReference - zpp::weak_ref

PHP has a garbage collector to clean up cycles of objects that reference each other. This can be helped by using a "WeakReference" class, which enables other classes to hold a reference to an object without being counted in the object's reference count.

This was deemed useful for the Wcd\IDriver class. Most classes can hold a WeakReference to it. When the actual IDriver class object is required, a temporary object variable is obtained by a call to the WeakReference "get()" method.

A garbage collection problem in the class relations for Wcd is that the IDriver class manages an array of instances of database model classes, and each of these keeps a "Builder" class that keeps a reference to IDriver, which makes a full circle of object references. PHP garbage collection is said to be able to detect such cycles, but this comes at an execution cost. To avoid undoing the effort of creating the original design, it seemed easier to make most stored references to IDriver a WeakReference.

IDriver carries its own WeakReference instance, without making another cycle. A WeakReference is also a normal reference counted PHP class object, so one WeakReference object can be shared around for each IDriver instance.
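
The same ownership shape can be shown with the C++ standard library, as an analogy only. This is not PHP's WeakReference or zpp::weak_ref: std::weak_ptr plays the part of the weak reference, lock() plays the part of get(), and the Driver, Builder and add_builder names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Driver;

struct Builder {
    std::weak_ptr<Driver> driver_;  // does not keep the driver alive

    // Temporary strong reference, comparable to WeakReference get();
    // empty once the driver has been destroyed.
    std::shared_ptr<Driver> driver() const { return driver_.lock(); }
};

struct Driver {
    // The driver owns its builders with counted references.
    std::vector<std::shared_ptr<Builder>> builders_;
};

inline std::shared_ptr<Builder> add_builder(const std::shared_ptr<Driver>& d) {
    auto b = std::make_shared<Builder>();
    b->driver_ = d;            // weak back-reference: no cycle is formed
    d->builders_.push_back(b);
    return b;
}
```

Because the back-reference is not counted, releasing the last ordinary reference to the driver frees it immediately, without waiting for a cycle collector.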

WeakReference became useful, especially for enabling good memory cleanup when using a debug compiled version of PHP. A C++ wrapper "zpp::weak_ref" is derived from "zpp::obj_rc", and zpp::zarg_rd has a "weakref" method to initialize it from Zend function arguments held in "zend_execute_data".

This makes it worthwhile to use a "fn_call" object to cache the call information for "WeakReference::create", and also for the get() method of a WeakReference object. The final result is two access functions.

// In zpp/fn_call.h
    obj_rc weakref_create(obj_ptr obj);
    obj_rc weakref_get(obj_ptr wref);

Testing weakref_get led to the discovery that the set_obj method of fn_call needed to update the zend_object* object member of not only the zend_fcall_info, but also the zend_fcall_info_cache; otherwise the cached object from the first call would continue to be used.

// In zpp/fn_call.cpp
void 
fn_call::set_obj(zend_object* obj)
{
    //TODO: ?Why requires object to be set in both structures.
    //On first call the cache_ object value is set by PHP.
    
    fci_.object = obj;
    cache_.object = obj;
}
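
The effect can be reproduced with a toy model. This is not the Zend implementation: object, call_info, call_cache and dispatch are hypothetical stand-ins, written on the assumption (consistent with the behaviour observed above) that once the cache has been filled the dispatcher reads the object from the cache rather than from the call info.

```cpp
#include <cassert>

struct object { int id; };  // stand-in for zend_object

struct call_info {          // plays the part of zend_fcall_info
    object* object_ = nullptr;
};

struct call_cache {         // plays the part of zend_fcall_info_cache
    bool    ready_  = false;
    object* object_ = nullptr;
};

// Toy dispatcher: the first call fills the cache from the info,
// later calls trust the cache only.
inline int dispatch(call_info& fci, call_cache& cache) {
    if (!cache.ready_) {
        cache.object_ = fci.object_;
        cache.ready_  = true;
    }
    return cache.object_ ? cache.object_->id : -1;
}

struct toy_call {
    call_info  fci_;
    call_cache cache_;

    // Updates only the info: the stale cached object keeps being used.
    void set_obj_info_only(object* o) { fci_.object_ = o; }

    // Updates both structures, as fn_call::set_obj does.
    void set_obj(object* o) { fci_.object_ = o; cache_.object_ = o; }

    int call() { return dispatch(fci_, cache_); }
};
```

In this model, updating only the info structure after the first call has no effect on which object receives the call.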