C++ classes for PHP extensions

This book introduces a suite of C++ classes for building PHP extensions. These are contained in the namespace zpp.

The ZPP classes wrap pointers to frequently used internal Zend PHP structures. There are also wrapper classes for zval and zval*. Methods and operators automatically handle parameter passing, reference counting and assignment, which makes it easier to implement PHP classes and functions in C++.

Two namespaces contain a suite of integrated example classes, based on the classes of an existing custom content management system framework written in PHP. These examples may help you find ways to build or adapt your own classes using the zpp architecture.

For a small number of tasks, a few of the PHP classes needed functions that are too awkward to use, or simply not available, in the C++ PHP extension environment. Fortunately PHP easily provides the means to create small helper classes, which handle tricks only available to the PHP byte-code compiler within a scripted function. These can easily be called back from a C++ extension. The PHP code for them is provided in this book.

For more examples, there is an integrated suite of custom classes that manage database operations and SQL construction. They are C++ implementations of the same API as the PHP classes. These are in the namespace wcd.

ZPP - wrap Zend PHP Pointers

Writing a PHP extension in C or in C++ is a specialized programming task. The PHP API is extensive, and its source code is written in C with heavy use of C-language macros. This means spending time examining the macros "in depth", along with the C structures they operate on, to confirm what code they cause to be compiled.

The C++ API here wraps several of the C APIs for reference-counted PHP values. The most common value types are scalars (integer, double-precision floating point and boolean), which are stored entirely in a zval structure, and reference-counted pointer values for the variable-length structures of strings and arrays, and for PHP objects, which may use all of these.

The reference-counted PHP entities managed by a simple "smart pointer" C++ wrapper are the zend_object, HashTable and zend_string, as well as a pointer to a zval structure, whose contents may require reference count management.

Useful zpp classes

All of these are wrapped in the namespace "zpp". It's for the Zend engine, and many of the engine's functions and structures start with "z". So why not? Everything here assumes 64-bit pointers.

Wrapper classes for PHP structures: *_ptr, *_rc.

Not RC     RC        Wrapped C API type   Size (bytes)
obj_ptr    obj_rc    zend_object*         8
htab_ptr   htab_rc   HashTable*           8
str_ptr    str_rc    zend_string*         8
val_ptr              zval*                8
           val_rc    zval                 16

The first three rows each wrap a PHP API pointer as the only data member. Each *_rc class in the second column inherits from the corresponding non-reference-counted *_ptr class in the first column, and adds reference counting of the enclosed pointer. None of these classes use virtual functions, so there is no run-time polymorphism. Virtual methods would add a hidden virtual function table pointer to each instance, which would remove the advantage of an instance being the same size as the wrapped pointer. The standard C++ constructors, destructor and assignment operators take care of any required reference count changes.
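The size claim can be checked at compile time. The following is a minimal sketch using stand-in types: mock_object, mock_obj_ptr, mock_obj_rc and mock_virtual_ptr are hypothetical names invented for illustration, not the zpp classes or the Zend structures themselves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a Zend reference-counted structure.
// It only mimics the refcount/type_info header; it is NOT zend_object.
struct mock_object {
    std::uint32_t refcount;
    std::uint32_t type_info;
};

// Non-owning wrapper: one pointer member, never touches the count.
class mock_obj_ptr {
public:
    explicit mock_obj_ptr(mock_object* p = nullptr) : obj_(p) {}
    mock_object* get() const { return obj_; }
protected:
    mock_object* obj_;
};

// Owning wrapper: adds counting behaviour but no data members.
// A real wrapper would also release the object when the count reaches zero.
class mock_obj_rc : public mock_obj_ptr {
public:
    explicit mock_obj_rc(mock_object* p = nullptr) : mock_obj_ptr(p) { acquire(); }
    mock_obj_rc(const mock_obj_rc& o) : mock_obj_ptr(o.obj_) { acquire(); }
    mock_obj_rc& operator=(const mock_obj_rc&) = delete;  // left out of this sketch
    ~mock_obj_rc() { if (obj_) --obj_->refcount; }
private:
    void acquire() { if (obj_) ++obj_->refcount; }
};

// For comparison only: a virtual destructor adds a hidden vtable pointer.
class mock_virtual_ptr {
public:
    virtual ~mock_virtual_ptr() {}
    mock_object* obj_ = nullptr;
};

static_assert(sizeof(mock_obj_ptr) == sizeof(mock_object*), "ptr wrapper is pointer sized");
static_assert(sizeof(mock_obj_rc) == sizeof(mock_object*), "rc wrapper is pointer sized");
static_assert(sizeof(mock_virtual_ptr) > sizeof(mock_object*), "vtable pointer adds size");
```

Because the owning class adds behaviour but no members, an array of such wrappers occupies the same memory as an array of raw pointers.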

The 3 specialized value classes can all be constructed or assigned from a zval* pointer, or its wrapper class val_ptr, and can be safely declared as members of a C++ class, or as stack variables in function code.

The reference-counted structures all begin with a "zend_refcounted_h" header, a combination of a 4-byte integer refcount and 4 bytes of type_info.

Also reference counted is the class val_rc, which encloses and initializes a zval, and takes the size of two pointers. This common Zend structure is the PHP variant record, with a type id, that can hold many things. The "zval" (_zval_struct) is a 16-byte structure, partitioned to hold data and type information. The first 8 bytes hold the data, which can be a 64-bit integer or floating point number, or a pointer to a bigger structure. The second 8 bytes identify the type of the data, and hold various flags and space reserved for Zend engine execution management.
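The 16-byte partition can be pictured with a simplified model. This is only a sketch: mock_zval, the mock_type tags and the helper functions are hypothetical names, and the real engine structure has more detailed unions and its own type constants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type tags for this sketch; the real engine defines its own.
enum mock_type : std::uint32_t { MOCK_NULL = 0, MOCK_LONG = 1, MOCK_DOUBLE = 2, MOCK_PTR = 3 };

// Simplified model of a variant record: 8 bytes of data, 8 bytes of type and flags.
struct mock_zval {
    union {
        std::int64_t lval;   // 64-bit integer
        double       dval;   // floating point
        void*        ptr;    // pointer to a bigger, reference-counted structure
    } value;                    // first 8 bytes: the data
    std::uint32_t type_info;    // identifies which union member is active
    std::uint32_t extra;        // space reserved for engine bookkeeping
};

static_assert(sizeof(mock_zval) == 16, "two 8-byte halves");

inline void set_long(mock_zval& z, std::int64_t v) {
    z.value.lval = v;
    z.type_info = MOCK_LONG;
    z.extra = 0;
}

inline void set_double(mock_zval& z, double v) {
    z.value.dval = v;
    z.type_info = MOCK_DOUBLE;
    z.extra = 0;
}

// Reads the integer only when the type tag says an integer is stored.
inline bool get_long(const mock_zval& z, std::int64_t& out) {
    if (z.type_info != MOCK_LONG) return false;
    out = z.value.lval;
    return true;
}
```

The type tag must always be tested before the data is read, which is why a generic value wrapper needs more checks than a wrapper for a known pointer type.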

Special Helper classes

Additional zpp classes that help inside the PHP execution environment.

Class           Purpose
base_d          Parent class for C++ classes that are embedded inside a zend_object
base_obj_mgr    Parent class to manage PHP class instances with a base_d
datetime_obj    An obj_rc for a DateTime object
dt_interval     An obj_rc for a DateInterval object
fn_call         Base class for calling PHP functions
fn_call_args    Template class to call PHP functions with N parameters
for_key_value   Iterate keys, values and index of a HashTable*
htab_rw         Update, append or delete in a HashTable*, after doing copy on write
htab_walk       Step iteration of keys and values of a HashTable*
preg            Methods to manipulate strings using PCRE regular expressions
str_buf         Stream text to the PHP API "zend_smart_str"
str_intern      zend_string made as "interned"
str_out         Stream text to PHP output
str_perm        zend_string allocated in permanent memory
str_temp        zend_string for throwaway use
state_init      Set up structures for interned strings and constants during module initialization
timezone_obj    An obj_rc for a TimeZone object
zarg_rd         Confirm and transfer PHP function arguments from zend_execute_data

Where to use *_ptr and *_rc classes

The reference counting (RC) class variants are for object member storage, and for creating or deleting instances of data. They are used when reference counting operations are required, for instance when returning a newly created value from a function. The non-reference-counting (NRC) wrappers are for passing pointers around where reference counting is unnecessary. This includes arguments passed to C++ functions and methods, because the calling environment is considered to hold secure ownership of the object data. All classes have a constructor that takes a zval*. Only the *_rc classes increment a reference count.

The *_ptr classes are for methods and functions that do not need to change the reference count of the argument passed to them. If the same functions were available on val_ptr or val_rc, frequent type retrieval and testing would be required to check for an appropriate PHP handle type. Instead, each typed wrapper directly offers the methods appropriate to its type.

These classes have only been tested on PHP 8.2 and above. They certainly won't be useful with PHP versions before 7.

The non-reference-counting *_ptr classes are useful as function parameters, since a zval* argument can be passed directly to a method with a *_ptr parameter. They are also useful for returning access to the members of C++ class objects, where no reference count change occurs. When a reference-counted handle is returned as the result of a registered Zend function, the return_value parameter is a pointer to a zval in which the returned value must be set, and a reference count increment is required, as the Zend interpreter expects.

When a method or function creates and returns a new value, it can return a specific *_rc type. This indicates that the function's stack variables have given up the value, and that the returned object keeps the reference count. If multiple types can be returned, a val_rc is returned. All of the *_rc types declare C++ move constructors and move assignment operators (taking a && parameter), and the compiler will likely perform return value optimisation.

Setting the return_value

All of the wrapper classes have two functions available for returning a value to a PHP script function or object method call, through the "return_value" zval pointer.

// What Zend passes to all declared PHP C functions and methods.
#define INTERNAL_FUNCTION_PARAMETERS zend_execute_data *execute_data, zval *return_value

A function or method called from the PHP Zend Engine is passed a pointer to a C structure, "zend_execute_data". The number of parameters passed is obtained with the C macro ZEND_CALL_NUM_ARGS(execute_data). For a method call, the C macro ZEND_THIS resolves to &execute_data->This, the zval that holds the object, and Z_OBJ_P(ZEND_THIS) gives the zend_object*.

There is a large number of C API macros, and alternatives, to access and transfer the parameters passed in the zend_execute_data structure to somewhere useful. A commonly used set starts with ZEND_PARSE_PARAMETERS_START or ZEND_PARSE_PARAMETERS_NONE. These test for compliance of parameter number and type, and throw PHP-defined exceptions when compliance fails.

In the more recently created classes of this code library, I used a C++ class whose methods replace the ZEND_PARSE_* macros.

These are the methods of the zpp::zarg_rd class. Each takes a reference to one of the above *_ptr receiver classes, and a zval* returned by a call to need(size_t ix) or option(size_t ix), where ix is an index into the execute_data array of zval parameters. The array is a zero-based "slice" starting at the address of the zend_execute_data zval arguments.
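The need/option idea can be sketched over a plain array of stand-in values. Everything in this sketch is hypothetical: mock_zval, mock_arg_reader, read_long and ok are invented for illustration, and the real zarg_rd interface and error handling may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; not the Zend zval or the zpp::zarg_rd class.
enum mock_type : std::uint32_t { MOCK_NULL = 0, MOCK_LONG = 1 };

struct mock_zval {
    std::int64_t lval;
    std::uint32_t type;
};

class mock_arg_reader {
public:
    mock_arg_reader(mock_zval* args, std::size_t count)
        : args_(args), count_(count), ok_(true) {}

    // Required argument: a missing index marks the whole read as failed.
    mock_zval* need(std::size_t ix) {
        if (ix < count_) return args_ + ix;
        ok_ = false;
        return nullptr;
    }

    // Optional argument: a missing index is not an error.
    mock_zval* option(std::size_t ix) const {
        return ix < count_ ? args_ + ix : nullptr;
    }

    // Transfer into a C++ receiver; a present value of the wrong type is an error.
    bool read_long(std::int64_t& out, mock_zval* zv) {
        if (!zv) return false;
        if (zv->type != MOCK_LONG) { ok_ = false; return false; }
        out = zv->lval;
        return true;
    }

    bool ok() const { return ok_; }

private:
    mock_zval* args_;     // zero-based slice of the passed arguments
    std::size_t count_;
    bool ok_;
};
```

A function body would read each argument in turn and test ok() once at the end, rather than branching after every read.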

As this class is new and different, its error handling and error messages differ from those of ZEND_PARSE_* and the C-macro helpers.

Returning values back to PHP

The second parameter of INTERNAL_FUNCTION_PARAMETERS is return_value, a pointer to a zval. By convention, each *_ptr class should use the following method, which will try to increment the reference count.

void return_zv(zval* ret) const;
/* For example, this does a reference counted copy to the zval*, even though obj_ptr is otherwise not involved in reference counting. If it didn't do this, PHP would make the object passed to it "disappear". The receiving zval takes reference counting responsibility, not the obj_ptr.

This is most often used if the obj_ptr class is in the scope of the ZEND_FUNCTION body.
*/
void 
obj_ptr::return_zv(zval* ret) const
{
    if (obj_)
        ZVAL_OBJ_COPY(ret, obj_); // set and bump reference count
    else
        ZVAL_NULL(ret);
}

Instances of a returned *_rc class, in a ZEND_FUNCTION body, need to use their move_zv method, passing it the return_value pointer. This transfers the referenced value to the return_value and sets the wrapper's own pointer to nullptr, which prevents the destructor from trying to decrement the reference count. This is a move operation: it makes no change to the reference count of the copied pointer. The return_value keeps the incremented reference count that belonged to the *_rc class.

void move_zv(zval* ret);
/* For example, this obj_rc already bumped the reference count on construction or assignment, so it is already counted. To ensure it does not decrement the reference count on object destruction,
after moving the pointer to the zval, its copy of the pointer is nulled. The zval is set with a macro that does not change the reference count. The ownership of the reference count is "moved".
Cannot call this with a const <_rc>& 
*/

void 
obj_rc::move_zv(zval* ret)
{
  if (obj_)
  {
    ZVAL_OBJ(ret, obj_); 
    //Z_TYPE_FLAGS_P(ret) = 0; // Not allowed to dereference
    obj_ = nullptr;// give up ownership privilege
  }
  else {
    ZVAL_NULL(ret);
  } 
}

To be sure, *_rc classes reimplement return_zv (without the const qualifier) to do exactly the same as their move_zv method.
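The difference between the copying return and the moving return shows up in the reference count. This is a minimal sketch with stand-in types: mock_object, mock_zval and the mock_* wrappers are hypothetical, and a plain pointer assignment stands in for the Zend macros.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; not the Zend structures.
struct mock_object { std::uint32_t refcount; };
struct mock_zval   { mock_object* obj; };   // nullptr plays the role of PHP null

class mock_obj_ptr {
public:
    explicit mock_obj_ptr(mock_object* p = nullptr) : obj_(p) {}
    // Copying return: the receiving zval gets its own counted reference.
    void return_zv(mock_zval* ret) const {
        ret->obj = obj_;
        if (obj_) ++obj_->refcount;
    }
protected:
    mock_object* obj_;
};

class mock_obj_rc : public mock_obj_ptr {
public:
    explicit mock_obj_rc(mock_object* p) : mock_obj_ptr(p) { if (obj_) ++obj_->refcount; }
    mock_obj_rc(const mock_obj_rc&) = delete;             // copying left out of this sketch
    mock_obj_rc& operator=(const mock_obj_rc&) = delete;
    ~mock_obj_rc() { if (obj_) --obj_->refcount; }

    // Moving return: hand over the reference already held, then forget it,
    // so the destructor has nothing left to decrement.
    void move_zv(mock_zval* ret) {
        ret->obj = obj_;
        obj_ = nullptr;
    }
    // Non-const overload hides the base version and behaves as a move.
    void return_zv(mock_zval* ret) { move_zv(ret); }
};
```

In both cases the receiving zval ends up owning exactly one counted reference; the only difference is whether the count had to be touched to get there.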

Reading values in HashTables, and function parameters.

The val_ptr is useful in checking the value returned from array read methods of htab_ptr. All of the wrapped zend_hash read methods return a zval* which is either the address of its stored zval, or a nullptr value.

"Not Found" is represented by a nullptr zval* result. Otherwise the returned zval* is a pointer to the value's internal storage. All of the val_ptr class methods check for a nullptr.
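The lookup convention can be sketched with a standard container in place of a HashTable. The names mock_zval, mock_htab, mock_val_ptr and long_or are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins; a std::map plays the role of the PHP array.
struct mock_zval {
    bool is_long;
    std::int64_t lval;
};

class mock_htab {
public:
    void set(const std::string& key, std::int64_t v) { items_[key] = mock_zval{true, v}; }
    // Returns the address of the stored value, or nullptr for "not found".
    const mock_zval* get(const std::string& key) const {
        auto it = items_.find(key);
        return it == items_.end() ? nullptr : &it->second;
    }
private:
    std::map<std::string, mock_zval> items_;
};

// Non-owning value wrapper: every accessor tolerates a nullptr.
class mock_val_ptr {
public:
    mock_val_ptr(const mock_zval* zv) : zv_(zv) {}
    bool is_null() const { return zv_ == nullptr; }
    bool is_long() const { return zv_ && zv_->is_long; }
    std::int64_t long_or(std::int64_t dflt) const { return is_long() ? zv_->lval : dflt; }
private:
    const mock_zval* zv_;
};
```

Because the wrapper checks for nullptr itself, the result of a lookup can be wrapped and queried in one expression, without a separate "found" test.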

As a PHP array can be indexed by strings, by integers, or by an arbitrary mix of both, the most used signatures for "get" methods returning a zval* have argument types of zend_long, zend_string* and str_ptr.

For the convenience of having useful *_ptr access functions, obj_rc inherits from obj_ptr, htab_rc inherits from htab_ptr, and str_rc inherits from str_ptr. This inheritance does not imply that any other kind of polymorphism is catered for.

Calling PHP functions.

The val_rc class wraps the zval structure. It has no inheritance. It is useful for setting up arguments for callbacks to PHP functions. It therefore has many constructors and assignment operators that take all of the other classes, and also raw PHP pointers.

Calling PHP functions requires setting up a zval arguments block, and passing a pointer to the first one, and the number of arguments. This is a part of the "zend_fcall_info" structure, which also needs a function name string, a zval to store the result, and optionally a zend_object* for a method call. There is also a way to pass named parameters with a HashTable*.

The basics of function calling are handled by the class "zpp::fn_call". This allows for a no-parameters function call. It holds a val_rc class to store the result, a "zend_fcall_info" and a "zend_fcall_info_cache" structure. Repeated calls for the same function/method presumably use the cached data to speed up the process.

A templated class built on this, fn_call_args, takes the number of parameters as a template argument. Users call its argsptr() method to wipe the argument array and get a pointer to its start.

The execution of the call returns a val_rc&&, and the other *_rc classes have constructors and assignment operators that take this.
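The fixed-size argument block can be sketched as a template over the argument count. This is an illustration under assumptions: mock_zval, mock_callee, mock_fn_call, mock_fn_call_args and sum_args are hypothetical names, a plain function pointer stands in for the PHP callable, and the result is simply returned by value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; not the Zend call structures.
struct mock_zval { std::int64_t lval; };
using mock_callee = mock_zval (*)(const mock_zval* args, std::size_t count);

// Base class: enough for a call with no parameters.
class mock_fn_call {
public:
    explicit mock_fn_call(mock_callee fn) : fn_(fn) {}
    mock_zval call() { return invoke(nullptr, 0); }
protected:
    // Passes a pointer to the first argument plus the argument count.
    mock_zval invoke(const mock_zval* args, std::size_t count) { return fn_(args, count); }
private:
    mock_callee fn_;
};

// Template variant: owns a block of exactly N argument slots.
template <std::size_t N>
class mock_fn_call_args : public mock_fn_call {
public:
    using mock_fn_call::mock_fn_call;
    // Wipes the slots and returns a pointer to the first one for filling in.
    mock_zval* argsptr() {
        args_.fill(mock_zval{});
        return args_.data();
    }
    mock_zval call() { return invoke(args_.data(), N); }
private:
    std::array<mock_zval, N> args_{};
};

// Example callee for the sketch: adds up whatever it is given.
inline mock_zval sum_args(const mock_zval* args, std::size_t count) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) total += args[i].lval;
    return mock_zval{total};
}
```

Making N a template argument keeps the argument block inside the call object, so no allocation is needed for each call.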

Tricky memory for function call objects in state_init

The function call objects stored in state_init structures did have a tricky issue that I felt needed to be handled.

The fn_call structure returned the result by a move from its internal val_rc result member. If the fn_call is embedded in static memory through a state_init instance, and the result was not assigned to a C++ object through a move constructor or move assignment, the value could be left hanging inside the function object.

The move assignment was chosen to have fewer reference counting ups and downs. It has now been changed to return a val_rc by value, and C++ return value optimisation may recover some of the desired efficiency.
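The hazard and the fix can be reproduced in a small model. This sketch assumes a simplified owning handle: mock_value, mock_val_rc, mock_fn_call and its two call methods are hypothetical names, not the zpp implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for a counted value and its owning wrapper.
struct mock_value { std::uint32_t refcount; };

class mock_val_rc {
public:
    mock_val_rc() : v_(nullptr) {}
    explicit mock_val_rc(mock_value* v) : v_(v) { if (v_) ++v_->refcount; }
    mock_val_rc(mock_val_rc&& o) noexcept : v_(o.v_) { o.v_ = nullptr; }
    mock_val_rc& operator=(mock_val_rc&& o) noexcept {
        if (this != &o) { release(); v_ = o.v_; o.v_ = nullptr; }
        return *this;
    }
    mock_val_rc(const mock_val_rc&) = delete;
    mock_val_rc& operator=(const mock_val_rc&) = delete;
    ~mock_val_rc() { release(); }
    bool holds() const { return v_ != nullptr; }
private:
    void release() { if (v_) { --v_->refcount; v_ = nullptr; } }
    mock_value* v_;
};

class mock_fn_call {
public:
    // Old style: hands out an rvalue reference to the internal result.
    // If the caller ignores it, nothing is moved and the result stays inside.
    mock_val_rc&& call_by_rvalue_ref(mock_value* produced) {
        result_ = mock_val_rc(produced);
        return std::move(result_);
    }
    // New style: the returned object is move-constructed from the result,
    // so the internal member is emptied whether or not the caller keeps it.
    mock_val_rc call_by_value(mock_value* produced) {
        result_ = mock_val_rc(produced);
        return std::move(result_);
    }
    bool holds_result() const { return result_.holds(); }
private:
    mock_val_rc result_;
};
```

With the by-value form, a long-lived call object in static memory never keeps a stale reference after the call expression has finished.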