Building a zend_object class
Memory layout of extension objects in PHP-CPP
It is worth discussing the PHP-CPP implementation. PHP-CPP has been a good source of implementation ideas and tested code. How can it be improved?
The author forked PHP-CPP and tried to hack on it. This was a good strategy for learning its details. Some of its features seem clunky, such as its use of C++ multiple inheritance and exceptions.
PHP-CPP uses a Php::Base object as the recommended C++ parent of PHP extension class objects (include/base.h). Php::Base is a clean C++ object with only one member, a private pointer to another C++ class, ObjectImpl*.
In the PHP-CPP source, the ObjectImpl class is found in zend/objectimpl.h. Internally it uses a pointer to a MixedObject structure, which contains the zend_object structure itself and a pointer to its ObjectImpl, stored at the first negative pointer offset from the zend_object structure.
Storing custom data at negative offsets from the beginning of the zend_object is the recommended way to customize zend objects. Below I show the MixedObject declaration outside of ObjectImpl for clarity.
In this design, the C++ Base can be independent of its zend_object but can still find it, and the zend_object can find the Base C++ implementation, through the middleman ObjectImpl.
// PHP-CPP object creation
// zend/objectimpl.h
// Connect a zend_object to a C++ object.
// Simplified excerpt: zend_object comes from the PHP headers.
using namespace Php;
struct MixedObject
{
    ObjectImpl *self;   // back pointer, sits just below the zend_object
    zend_object php;
};
class ObjectImpl
{
    MixedObject *_mixed;
    std::unique_ptr<Base> _object;
    //class functions
};
class Base {
    ObjectImpl *_impl = nullptr;
};
Creating a fully fledged PHP-CPP zend_object instance requires three memory allocations. ObjectImpl is allocated first; the ObjectImpl constructor then creates the MixedObject containing the zend_object, and stores a pointer to itself in it. Only after this can a newly allocated C++ class implementation be given a pointer to ObjectImpl.
This pattern also means that a Base object instance can be created without its zend_object existing; the zend_object is optional. I don't know why this may be useful, but it is possible to have classes derived from Base which do not become zend_object instances.
In zend/object.cpp, an Object Value constructor is given a PHP class name and a Base instance, and creates the ObjectImpl, giving it a visible existence to the Zend engine as an object callable from PHP. This implies the freedom to use different Base class implementations of the same zend_object interface. This is correct.
All the different C++ class instances derived from Base can create a new ObjectImpl instance to connect themselves to a zend_object instance. Each ObjectImpl, after creating a standardized zend_object instance for C++ Base-derived objects, provides a bi-directional pointer access for the two kinds of object.
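The three-allocation linkage described above can be sketched without the PHP headers. This is only an illustration under stated assumptions: `fake_zend_object` is a stand-in for the real `zend_object`, plain `new` stands in for the engine's request allocator, and the `sketch` namespace, `Counter`, `php()`, `object()` and `find()` are hypothetical names, not PHP-CPP's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

namespace sketch {

// Stand-in for zend_object (assumption: the real one comes from PHP headers).
struct fake_zend_object { unsigned refcount; };

class ObjectImpl;

// Allocation 3: the engine-visible part, with a back pointer stored below it.
struct MixedObject {
    ObjectImpl*      self;
    fake_zend_object php;
};

// Allocation 1: the user's C++ object, derived from Base.
class Base {
    ObjectImpl* _impl = nullptr;
    friend class ObjectImpl;
public:
    virtual ~Base() = default;
    ObjectImpl* impl() const { return _impl; }
};

class Counter : public Base {
public:
    int n = 0;
};

// Allocation 2: the middleman that owns both sides.
class ObjectImpl {
    MixedObject*          _mixed;
    std::unique_ptr<Base> _object;
public:
    explicit ObjectImpl(std::unique_ptr<Base> object)
        : _mixed(new MixedObject{this, fake_zend_object{1}}),
          _object(std::move(object))
    {
        _object->_impl = this;   // only now can Base find its ObjectImpl
    }
    ~ObjectImpl() { delete _mixed; }
    ObjectImpl(const ObjectImpl&) = delete;
    ObjectImpl& operator=(const ObjectImpl&) = delete;

    fake_zend_object* php() const { return &_mixed->php; }
    Base* object() const { return _object.get(); }

    // Engine side: step back from the zend_object to the stored back pointer.
    static ObjectImpl* find(fake_zend_object* zobj)
    {
        auto* mixed = reinterpret_cast<MixedObject*>(
            reinterpret_cast<char*>(zobj) - offsetof(MixedObject, php));
        return mixed->self;
    }
};

} // namespace sketch
```

Going from the engine object to the C++ object takes two hops here (`find(zobj)->object()`), and going the other way also takes two (`base->impl()->php()`).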
The standard ObjectImpl zend_object constructor requires a zend_class_entry* from the PHP Zend engine. The zend_class_entry structure holds class-specific data, such as the class name and function handlers, that give the zend_object its identity and behaviour.
A zend_class_entry* requires that all the functions the zend_object provides to PHP scripts, the standard property tables and more, have been registered with the PHP Zend engine. The zend_class_entry* can be fetched using the PHP class name.
The Perforce-Zend guide, like the PHP internals books and online material, also recommends placing custom C data below (at negative offsets from) the zend_object structure, so that both can be created with a single memory allocation. All the extensions distributed with the PHP source do this. PHP-managed data is on the other side.
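A minimal sketch of that single-allocation pattern follows. It assumes a stand-in `fake_zend_object` so it compiles without the PHP headers; `counter_object` and its helper functions are hypothetical names for illustration. In a real extension the allocation would normally go through `zend_object_alloc` and the offset through `XtOffsetOf`, rather than the `calloc` and `offsetof` used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for the engine's zend_object (assumption for this sketch).
struct fake_zend_object {
    unsigned refcount;
    void*    handlers;
};

// Extension data first, engine-managed object last, so the custom data
// sits at negative offsets from the zend_object address.
struct counter_object {
    long             count;   // extension-owned C data
    fake_zend_object zo;      // what the engine is handed
};

// Recover the wrapper from the pointer the engine knows about.
inline counter_object* counter_from_obj(fake_zend_object* obj)
{
    return reinterpret_cast<counter_object*>(
        reinterpret_cast<char*>(obj) - offsetof(counter_object, zo));
}

// One allocation covers both the custom data and the engine object.
inline fake_zend_object* counter_create(long start)
{
    auto* wrapper = static_cast<counter_object*>(
        std::calloc(1, sizeof(counter_object)));
    assert(wrapper != nullptr);
    wrapper->count = start;
    wrapper->zo.refcount = 1;
    return &wrapper->zo;
}

// Free the whole block through the wrapper's start address.
inline void counter_free(fake_zend_object* obj)
{
    std::free(counter_from_obj(obj));
}
```

The engine only ever sees the `fake_zend_object*`; every method implementation steps back by a fixed offset to reach its own data.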
PHP objects are created during request handling, and are allocated with variants of the ecalloc function. The request memory allocation functions all start with the letter e, e.g. emalloc, efree. All request memory is freed after the request has been handled and a response returned.
It isn't explicitly stated anywhere I have yet found in the PHP-CPP documentation, but all its other memory allocations, for the ObjectImpl and the Base, use the C++ standard library memory allocators, and are hopefully freed as well.
The PHP-CPP design effectively separates the memory allocator worlds of C++ and the PHP Zend engine, which makes it easier to use other C++ code and standard template library allocators.
Object Memory layout and object management for zpp
ZPP objects contain the entire C++ object in space available below the zend_object structure.
The object memory layout code is found in the files zpp/base.*
The zpp::base_d class uses its placement new operator to store a pointer plus the entire C++ object below the zend_object. In the allocated block, the zend_object structure sits on top of each C++ class instance, and a pointer to it. A single allocation is done for the combined storage with a call to zend_object_alloc.
Other design possibilities were storing a pointer to the base_obj_mgr
template<typename T>
T* zobj_toc(zend_object* zobj)
{
    return *(((T**)(zobj))-1);
}
Going the other way, getting the zend_object* from a base_d class should be as simple, except that inheriting classes will have various increases in size. The simplest approach is to store the zend_object* value as the first protected member of base_d.
//Wcc - include wc_base.h
class base_d {
protected:
    zend_object* self_;
public:
    // for derived classes
};
Each base_d class expects to be part of a zend_object in usage.
base_d objects cannot safely be allocated on the stack. There should be no reason for copying or moving them around in memory, as can be done with simpler C++ classes and structures. To become part of a zend_object, a weirdly allocated base_d object would need to be copied into the instance inside the zend_object it belongs to.
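The combined layout can be sketched as follows, again with a stand-in `fake_zend_object` in place of the real structure and `calloc` in place of `zend_object_alloc`. The names `base_sketch`, `services_sketch`, `bind`, `make_combined` and `free_combined` are hypothetical; the actual zpp code does this through the custom operator new of base_d, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for zend_object (assumption for this sketch).
struct fake_zend_object { unsigned refcount; };

class base_sketch {
protected:
    fake_zend_object* self_ = nullptr;   // C++ side finds its engine object
public:
    virtual ~base_sketch() = default;
    fake_zend_object* zobj() const { return self_; }
    void bind(fake_zend_object* z) { self_ = z; }
};

class services_sketch : public base_sketch {
public:
    int calls = 0;
};

// Engine side finds the C++ object: read the pointer stored just below.
template <typename T>
T* zobj_toc(fake_zend_object* zobj)
{
    return *(reinterpret_cast<T**>(zobj) - 1);
}

// Block layout: [ T object ][ T* back pointer ][ fake_zend_object ]
template <typename T>
fake_zend_object* make_combined()
{
    using slot_t = T*;
    const std::size_t offset = sizeof(T) + sizeof(slot_t);
    char* bytes = static_cast<char*>(
        std::calloc(1, offset + sizeof(fake_zend_object)));
    assert(bytes != nullptr);
    T* cpp = new (bytes) T();                       // C++ object at the start
    new (bytes + sizeof(T)) slot_t(cpp);            // back pointer after it
    auto* zobj = new (bytes + offset) fake_zend_object{1};
    cpp->bind(zobj);
    return zobj;
}

// Run the C++ destructor, then release the single block.
template <typename T>
void free_combined(fake_zend_object* zobj)
{
    T* cpp = zobj_toc<T>(zobj);
    cpp->~T();
    std::free(cpp);   // the C++ object starts at the beginning of the block
}
```

Both directions are now a single pointer read, and the object cannot exist outside such a block, which is why stack allocation and copying are ruled out.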
```C++
// part of base_obj_mgr<T>
// typedef base_obj_mgr<T> mydef;
// Create new zpp::obj_rc instance
static obj_rc new_zobj()
{
    obj_rc result;
    result.adopt(mydef::znew_ex(class_entry_));
    return result;
}
```
Each static instance of base_obj_mgr<T> must be configured with its classEntry(zend_class_entry*) method during the calls to individual PHP_MINIT_FUNCTION(my_class_reg).
template< typename T >
class base_obj_mgr : public mgr_link {
    /**
     * At module init time, one is instantiated for each class T.
     */
public:
    typedef base_obj_mgr<T> mydef;
    static zend_class_entry* class_entry_;
    static zend_object_handlers handlers_;
    static base_obj_mgr<T>* self_;
    static size_t self_count_;
    static size_t obj_count_;
    // more
};
In both the single-threaded and multi-threaded Zend models, these managers set up their static memory during module initialization (MINIT), after which it becomes read-only, and they create dynamic classes with writeable memory during request initialization (RINIT).
So the first 3 members of base_obj_mgr may be thread safe, as the base_obj_mgr class instance is created in its PHP_MINIT_FUNCTION, which is called from the extension's main PHP_MINIT_FUNCTION. The zend_class_entry* is created by a register_class_<Namespaced_ClassName> function. This function and its registration information are generated by a script that reads a _stub.php file.
The static obj_count_ counter, which I use for debugging to ensure all objects get freed at request end, could be an issue for threaded requests. Currently it is excised from non-debug code.
// In services.h
class Services : public base_d {
    //...
public:
    // define a template type manager for my class
    static base_obj_mgr<Services> omg;
};
// In services.cpp
base_obj_mgr<Services> Services::omg;
// Typical PHP extension class initialize in Wcc.
// From wc_services.cpp : create zend_class_entry* from class registration file _arginfo.h
PHP_MINIT_FUNCTION(wc_services_md)
{
    auto ce = register_class_Wcc_Services();
    Services::omg.classEntry(ce);
    return SUCCESS;
}
PHP_MINIT_FUNCTION is a C macro which generates a function name and arguments from its text argument. The main module initialize function contains a matching call to this function, generated by a similar C macro.
PHP_MINIT(wc_services_md)(INIT_FUNC_ARGS_PASSTHRU);
Make an object instance
Object instance construction is done by a templated static function of base_obj_mgr. This function knows the class and thereby its required size, and calculates the size of a single memory block to allocate. The C++ object sits at the start of the block, with its zend_object* stored at its first address. A pointer to the start of the C++ object is stored just after the end of the C++ object. After that begins the standard zend_object structure, which is what the zend_object* points to.
The C++ object inherits from the base_d class, which has a virtual destructor and already comes with a few virtual functions.
In table form:
| Label | Item | What? |
|---|---|---|
| 0 | C++ object (base_d) | zend_object* (1a) |
| ? | vtabptr_ | The C++ compiler inserts a virtual function table pointer |
| | derived classes add C++ members | |
| 1 | Pointer to 0 | C++ object* |
| 1a | zend_object | PHP managed: object handlers, property values |
In each templated singleton, member functions access their own set of static data members, including a zend_class_entry* value which is allocated by the class registration function during module initialization.
The memory layout is exactly described by the placement new allocator. Deallocation is managed by the zend_object life cycle.
/**
 * The easiest way to create a new instance is from the
 * object manager class new_zobj(),
 * e.g. obj_rc my_object = Services::omg.new_zobj();
 * class base_d has a custom operator new,
 *   void* operator new(std::size_t msize, zend_class_entry *ce)
 * which sets up the above allocation.
 */
static obj_rc new_zobj()
{
    // setup object with handlers
    obj_rc result;
    result.adopt(mydef::make_new());
    //showobj("new_zobj()", result);
    return result;
}
Customize zend object handlers, using a derived base_obj_mgr
The behaviour of a zend_object is modified according to the function handlers table, pointed to by its zend_class_entry. This means that any custom version of base_obj_mgr
An example is found in the hmap.* files for the HMap class, which installs custom property handlers. This is done when the static class manager object is set up, that is, once only at class registration time. Each created zend_object of a given kind gets the same pointer to this table.
Below we hijack the handlers' get_debug_info function and divert it to call the base_d object's virtual function, debug_info, so that such objects can fill a HashTable with a list of property names and values.
virtual void init_class_fn()
{
    class_entry_->create_object = mydef::znew_ex;
    // std_object_handlers is somewhere in PHP
    memcpy(&handlers_, &std_object_handlers, sizeof(zend_object_handlers));
    handlers_.offset = sizeof(T) + sizeof(base_d*);
    handlers_.get_debug_info = mydef::base_debug_info; // can be set later?
    handlers_.clone_obj = nullptr; //cloning not supported
    handlers_.dtor_obj = zend_objects_destroy_object;
    handlers_.free_obj = mydef::z_free;
}
Object methods registration
The PHP source release provides a means to set up a default development environment for an extension, and provides a file with the file name type of "
So in the MINIT_FUNCTION fragment above, the function "register_class_Wcc_Services" exists in an _arginfo.h file which was generated from a stub.php containing a namespace Wcc and a class declaration of Services with all its public methods. Almost all the standard PHP extensions use this feature. It takes away the tedious, error-prone work of hand-coding the registration methods, and is thoroughly recommended.
PHP-CPP does not seem to make use of the stub.php to _arginfo.h generation, and requires some manual coding of a registration function for each class. It has its own PHP-compatible data structures to hold object registration information, and uses some class handler functions to connect these to the PHP engine. This has some declaration limitations derived from the limits of older PHP versions.
Call base_d class methods from PHP C function declarations.
The PHP zend_object method calls are declared and registered from the _arginfo.h generated file(s). The following pattern is used to implement the C function which calls a method of a base_d derived class.
/**
 * Implement Wcc_Services method in wpp.sub.php
 *
 * public function setObject(object $obj, string|null $key = null) : object {}
 */
// ZEND_METHOD macro generates the function header
ZEND_METHOD(Wcc_Services, setObject)
{
    zval* obj;
    zend_string* skey = nullptr;
    // minimum and maximum parameters
    // parameter type check macros
    ZEND_PARSE_PARAMETERS_START(1, 2)
        Z_PARAM_OBJECT(obj)
        Z_PARAM_OPTIONAL
        // the stub declares string|null, so accept an explicit null as well
        Z_PARAM_STR_OR_NULL(skey)
    ZEND_PARSE_PARAMETERS_END();
    Wcc_Services* svc = zval_toc<Wcc_Services>(ZEND_THIS);
    zval_own result = svc->setObject(obj, skey);
    result.move_zv(return_value);
}
In this not very complex implementation of the "setObject" method of Wcc_Services, the passed zend_object* is stored in an array (PHP HashTable) accessed by a zend_string key, usually its class name. The design aim, for good or ill, is to store and retrieve an object instance by class name in the Wcc_Services object.
Why doesn't this C++ object use an STL collection for greater efficiency? It could be done that way, but that might increase the overhead of information exchange between PHP and the C++ classes, and would increase the memory size of the code, and the efficiency gain here may not be significant. This extension framework aims for PHP interoperability and reuse of PHP internals, by using some lightweight C++ wrapper classes around common zend data structures and pointers to them.
Briefly, the return value, zval_own, is a C++ wrapper class around a PHP zval structure. It is one of a small suite of lightweight C++ wrapper objects around the most used data structures in the PHP software. The PHP structures are the string (zend_string), the HashTable, the object (zend_object), and the zval structure, which can contain any one of them. There are a lot of other PHP types that can be contained in a zval structure. Scalar values include the long integer and floating point double types, and boolean true or false values.
The zval_ptr is a C++ wrapper around a pointer to a zval structure. Both zval_own and zval_ptr have useful functions for interoperability. The _own suffix of the class name indicates that ownership is managed using the structure's PHP reference counting. The _ptr suffix indicates that no ownership reference counting is done. The non-reference-counting versions are safe to use if the execution flow of the program ensures that ownership is already managed safely.
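The difference between the two suffixes can be sketched with a stand-in reference-counted structure. This is an illustration only: `fake_refcounted`, `thing_ptr` and `thing_own` are hypothetical names, and the real wrappers work on zval, zend_string and HashTable through the engine's own reference counting rather than the hand-rolled counter used here.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

// Stand-in for an engine structure carrying its own reference count.
struct fake_refcounted { unsigned refcount; int payload; };

inline fake_refcounted* fake_alloc(int payload)
{
    auto* p = static_cast<fake_refcounted*>(std::malloc(sizeof(fake_refcounted)));
    assert(p != nullptr);
    p->refcount = 1;
    p->payload = payload;
    return p;
}

inline void fake_release(fake_refcounted* p)
{
    if (p && --p->refcount == 0) std::free(p);
}

// "_ptr" flavour: a plain view, never touches the reference count.
class thing_ptr {
    fake_refcounted* p_;
public:
    explicit thing_ptr(fake_refcounted* p = nullptr) : p_(p) {}
    fake_refcounted* ptr() const { return p_; }
    unsigned refs() const { return p_ ? p_->refcount : 0; }
};

// "_own" flavour: holds one reference for as long as it lives.
class thing_own {
    fake_refcounted* p_ = nullptr;
public:
    thing_own() = default;
    explicit thing_own(thing_ptr borrowed) : p_(borrowed.ptr())
    {
        if (p_) ++p_->refcount;              // share: add our own reference
    }
    thing_own(const thing_own& o) : p_(o.p_) { if (p_) ++p_->refcount; }
    thing_own(thing_own&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    thing_own& operator=(thing_own o) noexcept { std::swap(p_, o.p_); return *this; }
    ~thing_own() { fake_release(p_); }

    // Take over a reference the caller already holds, without incrementing.
    void adopt(fake_refcounted* p) { fake_release(p_); p_ = p; }
    fake_refcounted* ptr() const { return p_; }
};
```

A `thing_ptr` is cheap to pass into a function whose caller already keeps the value alive; a `thing_own` is what gets stored in a member or returned, where the value must outlive the call.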
A zend_string is a PHP string with built-in reference counting. Its C++ wrapper classes are zstr_ptr and zstr_own. The class member instances_ is a htab_own, a C++ wrapper around a PHP HashTable pointer. Its non-reference-counting version is htab_ptr. The C++ operator[] is only used for read operations on the htab_ptr/htab_own wrappers. I find that setting up write mechanisms for operator[] is too much of an unnecessary complication.
zval_own
Wcc_Services::setObject(zval_ptr obj, zend_string* key)
{
    if (!key) {
        // get class name of object
        key = obj.className();
    }
    instances_.set(key, obj.ptr());
    return obj;
}
The PHP data wrappers will get some more discussion and documentation to help usage in the next chapter.