HashTable* wrap, class zpp::htab_ptr
C++ classes of an extension will be transferring data to and from the zend_object and zend_array world.
Behind the PHP array type is the zend_array structure, which is also aliased by the name HashTable. This is a storage structure of considerable sophistication, and for most purposes it is a set-and-fetch store of zval structures, indexed either by hashed string key or by integer index.
HashTable manipulation is managed by the C++ class htab_ptr, which wraps a HashTable* with common methods, and its derived class htab_own, which adds methods that can change its reference count. PHP does "Copy on Write" (COW), to ensure that only HashTables that have a reference count of 1 are updated. To avoid warnings and exceptions from the zend API, C++ classes have to respect this.
htab_ptr
- Is constructed by giving it a valid HashTable* from elsewhere
- Never bothers to touch the HashTable reference count
- Has many functions to fetch or set values, creating a zval if necessary.
htab_own
- Inherits methods from htab_ptr
- Adds methods to get, duplicate for COW, transfer or abandon HashTable* ownership, with reference count manipulation.
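The ownership rule can be pictured with a small stand-in model. This is only a sketch and does not use the Zend API: `Table`, `Owner` and `separate` are hypothetical names, and a `std::map` stands in for the HashTable. It shows the idea of writing only when the reference count is 1, and duplicating first otherwise.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a reference-counted HashTable (hypothetical, not zend_array).
struct Table {
    int refcount = 1;
    std::map<std::string, int> data;
};

// Owning handle: a sketch of the role described for htab_own.
class Owner {
public:
    Owner() : t_(new Table) {}                             // fresh table, refcount 1
    explicit Owner(Table* t) : t_(t) { ++t_->refcount; }   // share an existing table
    Owner(const Owner&) = delete;
    Owner& operator=(const Owner&) = delete;
    ~Owner() { if (--t_->refcount == 0) delete t_; }

    // Copy-on-write: if anyone else holds the table, switch to a private copy.
    void separate() {
        if (t_->refcount > 1) {
            Table* copy = new Table;   // starts with refcount 1
            copy->data = t_->data;
            --t_->refcount;            // give up our share of the old table
            t_ = copy;
        }
    }

    // Writes always go to a table that nobody else can see.
    void set(const std::string& key, int value) {
        separate();
        assert(t_->refcount == 1);
        t_->data[key] = value;
    }

    Table* ptr() const { return t_; }

private:
    Table* t_;
};
```

A non-owning wrapper in the manner of htab_ptr would hold the same `Table*` but leave `refcount` alone, which is why writes through it are only safe when the count is already 1.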
This is a self-reminder that the source currently has very little in the way of code comments to explain the how, why, and what of various methods.
Constructors
htab_ptr gets its only data member, a HashTable*, from any other wrapper class that might have one. The default constructor assigns a nullptr, so a bool isNull() method exists to check for that. Other sources are
- A bare HashTable*
- zval_own class
- zval_ptr class
- zval* pointer, usually from PHP land
- Another htab_ptr class.
htab_ptr() : ht_(nullptr) {}
htab_ptr(HashTable* ht);
htab_ptr(const zval_own& zw);
htab_ptr(const zval_ptr& zptr);
htab_ptr(zval* p);
htab_ptr(const htab_ptr& c) : ht_(c.ht_) {}
There is also an assignment operator= from a zval*.
Pointer access
htab_ptr has a conversion operator to HashTable*, of course, and a ptr() method that returns the same pointer, for redundancy.
No isset
There is no isset method, because isset is a special PHP language construct. The methods has_index and has_key are provided instead.
bool has_index(zend_long key);
bool has_key(zend_string* skey);
// It is assumed that the zval_ptr holds a zend_string or a zend_long value.
bool has_key(zval_ptr skey);
Append methods
// Append string value, or zval* with anything, to next free integer index.
void push_back(zend_string* zs);
void push_back(const char* s, std::size_t slen);
void push_back(zval* zv);
Set or Unset by string key methods
There are a combinatorial number of these, with different ways to pass a string key, and a value source. Almost N^2. There are far more zend API C functions for setting an array value. The htab_ptr methods are all called set. Another version was added every time the need was felt. Unset at least only needs one argument.
void set(zend_string* key, zval* val);
void set(zend_long idx, zval_own& value);
//...etc
bool unset(zend_long idx);
bool unset(zend_string* key);
bool unset(zval_own& key);
void unset(zval_ptr key);
Get or Array operator
The array operator calls the equivalent get function to do the job. The get methods were done first, and operator[] added for cuteness. The get methods all return a zval*, since this is what the zend API functions return.
zval* get(zend_long idx) const;
zval* get(zval_ptr key) const;
zval* get(zend_string* zkey) const;
//...etc
try and fetch
All of these return a bool (true or false) for the success of the lookup, and assign to the store argument only on success. There are a few more like this. Note that none of these string key arguments use any kind of reference to a zstr_ptr or zstr_own class, because C++ will automatically use their conversion operator to pass a zend_string*.
bool try_fetch(zval_own& key, zval_ptr& store);
bool try_fetch(zend_string* key, zval_own& store);
bool try_fetch(zend_string* key, zval_ptr& store);
//...
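The try_fetch pattern, and the way a wrapper converts itself into a raw key pointer, can be sketched without the Zend headers. Everything here is a stand-in: `Str` plays the part of zend_string, `StrWrap` the part of a zstr_ptr-like wrapper, and `Store` the part of the table. None of these names exist in the library.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for zend_string (hypothetical).
struct Str { std::string text; };

// Wrapper with a conversion operator, in the manner described for zstr_ptr.
class StrWrap {
public:
    explicit StrWrap(Str* s) : s_(s) {}
    operator Str*() const { return s_; }   // lets the wrapper be passed where Str* is expected
private:
    Str* s_;
};

class Store {
public:
    void set(Str* key, int value) {
        assert(key != nullptr);
        data_[key->text] = value;
    }

    // Returns true and assigns `out` only when the key is found.
    bool try_fetch(Str* key, int& out) const {
        auto it = data_.find(key->text);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::unordered_map<std::string, int> data_;
};
```

Because `try_fetch` takes a plain `Str*`, a `StrWrap` argument is accepted through its conversion operator, so no extra overload per wrapper class is needed.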
Special assign to zval* return_value in PHP object method implementation.
Even though htab_ptr otherwise doesn't care about reference counts, this is the one exception. When returning an array as the PHP method return value, the zval* is assigned the HashTable, and is then told to bump its reference count, which was at least 1 already.
This might make the API complain if another attempt to write to the HashTable is made. Due to copy-on-write practices for zend_array, warnings or exceptions will happen if an update is attempted while its reference count is greater than 1. PHP normally avoids this by making a copy before writing, but the update methods above do not do this, and successful updates depend on the HashTable* reference count being 1.
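void htab_ptr::return_zv(zval* return_value) const
{
    ZVAL_ARR(return_value, ht_);     // return_value now refers to the HashTable
    Z_TRY_ADDREF_P(return_value);    // bump the reference count for the new holder
    ht_ = nullptr;                   // wrapper lets go; in a const method this needs ht_ to be mutable
}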
Getting out of zval land.
Several important zval types, in particular zend_string, zend_object, zend_array, zend_reference, hold a pointer to a structure that starts with -
zend_refcounted_h gc;
A zval structure, because its value is a union type, can only reference or hold one type at a time. That means an all-purpose zval wrapper, which is trying to be an interface for all common types, has to have a lot of methods, and type-specific methods have to perform type checks to see if they apply to the currently stored type. The PHP-CPP Php::Value type is its general purpose zval wrapper, and it has methods galore to do many things. Php::Value has derived Array and Object classes, which add a few special purpose functions. There are a number of methods to work with C++ STL datatypes.
In the PHP-CPP source is an internal String class defined in zend/string.h. This is specialized for generating persistent zend_string values. It is used as an adjunct to Php::Value, which is the basis of all PHP value structure management. This is not a surprise, given that all scripting variables are held in the zval structure (_zval_struct).
On the whole, PHP-CPP expects programmers either to work with the raw zend API for its reference counted types, or to use C++ data and STL types outside of the Value wrapper. Depending on what the extension is for, minimizing the time spent working with the Value abstraction, and minimizing callbacks to PHP script-land, is a goal. Working too much with the Value abstraction of a zval, and with PHP callbacks, is therefore labelled as inefficient for those expecting some performance gains from a compiled C++ extension. There is truth in this, given the ease of script programming versus the time spent coding C++ extensions.
Array operators []
Programmers expect to use lovely array brackets for array indexes. It is easy enough to define these in terms of an inline call to the get function. However, a common return type needs to be settled on, as C++ does not allow multiple declared functions that differ only in return type. I chose the return type to be the zval_ptr class, which of course just contains a zval* pointer, as this is always returned by the PHP zend_array fetch API calls, and can be a nullptr value.
// inline operators
zval_ptr operator[](zend_long idx) const { return get(idx); }
zval_ptr operator[](zend_string* zkey) const { return get(zkey); }
The get functions return a nullptr into the zval_ptr class, if the key does not find a stored value.
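As a rough picture of the read side, here is a self-contained sketch with stand-in types. `Val`, `ValPtr` and `Tab` are hypothetical substitutes for zval, zval_ptr and htab_ptr, and two `std::map` members stand in for the HashTable. The point is that both operator[] overloads share one return type, and that a missing key yields a null wrapper instead of an error.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Val { int n; };            // stand-in for zval

class ValPtr {                    // stand-in for zval_ptr: just a nullable pointer
public:
    ValPtr(Val* p = nullptr) : p_(p) {}
    bool isNull() const { return p_ == nullptr; }
    Val* ptr() const { return p_; }
    Val& value() const { assert(p_ != nullptr); return *p_; }
private:
    Val* p_;
};

class Tab {
public:
    void set(long idx, Val v) { by_index_[idx] = v; }
    void set(const std::string& key, Val v) { by_key_[key] = v; }

    // The get functions return a raw pointer, nullptr when nothing is stored.
    Val* get(long idx) {
        auto it = by_index_.find(idx);
        return it == by_index_.end() ? nullptr : &it->second;
    }
    Val* get(const std::string& key) {
        auto it = by_key_.find(key);
        return it == by_key_.end() ? nullptr : &it->second;
    }

    // Both bracket operators return the same wrapper type.
    ValPtr operator[](long idx) { return get(idx); }
    ValPtr operator[](const std::string& key) { return get(key); }

private:
    std::map<long, Val> by_index_;
    std::map<std::string, Val> by_key_;
};
```

A caller is expected to test `isNull()` before using the result, in the same way that a nullptr zval* has to be checked after a zend fetch call.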
In order to set a PHP value via C++, the plain set method is the most direct. For string keys and a direct zval* the following implementation is used.
void htab_ptr::set(zend_string* key, zval* val)
{
    if (zend_hash_update(ht_, key, val))
    {
        Z_TRY_ADDREF(*val);
    }
}
The zend_hash_update function is, for this purpose, an add-or-update function. It returns a pointer to the actual zval structure where the data has been stored in the array. This is very unlikely to be the original zval* passed as an argument to ::set.
But the returned zval* should contain the same data. The contents have already been copied using the ZVAL_COPY_VALUE macro, which does nothing about reference counting. If zend_hash_update returns a non-null value, the add or update to the HashTable has been successful, and the returned zval* must be holding the same referenced structure that is held in the original zval* val, so either can be used to have its reference count incremented.
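The copy-then-addref sequence can be modelled in a few lines. This is a stand-in sketch, not Zend code: `Counted` plays a refcounted structure such as zend_string, `Slot` plays a zval holding a pointer to it, and `Tab::update` imitates only the behaviour described above (a value copy into the table's own slot, with the slot's address returned). It ignores what a real update does with a previously stored value.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Counted {                 // stand-in for a refcounted structure
    int refcount = 1;
    std::string payload;
};

struct Slot {                    // stand-in for a zval that points at a Counted
    Counted* ref = nullptr;
};

class Tab {
public:
    // Copies the slot by value (no refcount change) and returns the table's own slot.
    Slot* update(const std::string& key, const Slot& val) {
        Slot& stored = slots_[key];
        stored = val;
        return &stored;
    }

    // Mirrors the shape of set(): add the reference only after the store succeeded.
    void set(const std::string& key, Slot* val) {
        if (Slot* stored = update(key, *val)) {
            assert(stored != val);             // different slot ...
            assert(stored->ref == val->ref);   // ... same referenced structure
            ++val->ref->refcount;              // so either slot can be used to addref
        }
    }

    Slot* get(const std::string& key) {
        auto it = slots_.find(key);
        return it == slots_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Slot> slots_;
};
```

After one `set`, the structure is referenced from two slots and its count has gone up by exactly one, which is the invariant the real method has to keep.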
The zend API also provides functions that will only add, if a value does not already exist, and only update, providing a value already exists.
To use the zval* returned by zend_hash_update in some form of overloaded operator[] for set would require further work. For instance, imagine that the hash update call just sets a null value initially, then a write operator[] uses the returned storage zval address to assign the final intended value. It should be possible to provide the C++ notational illusion of array[key] = value. It means extra coding on top of the more direct set methods, and I do not see this as worth the extra coding and computational effort.
The C++ brackets assignment operator works well where the storage already exists prior to the method call, as in fixed arrays and matrices. In PHP the storage in the HashTable does not exist until zend_hash_update returns the pointer to it, and has already updated it once. There is nothing I know of in the design of C++ that easily rewrites an r-value assigned to a brackets l-value as a simple call to our set function, which already does the job.
PHP-CPP HashMember virtual function call chain
The design of the PHP-CPP array brackets operator, which returns the template HashMember, is too complex for me to understand just why it is so. In actual code, Value objects use operator[] for both read and write of keyed array or object property values.
These operators work their magic by returning a templated HashMember structure, which has a base class with many virtual functions. The parent class is HashParent, and Value itself inherits from HashParent. So a call to set via operator[] eventually calls a virtual method of Value to call set, which will call setRaw. HashMember has many C++ style overloaded operators.
It's templated, so a compiler might only compile and link whatever is used. There is no use of zend_hash_update anywhere in the PHP-CPP code. Never mind, there are a lot of zend array access functions, available in all flavours.
For the write operator[], Php::Value returns an entire HashMember structure, containing a pointer to itself, and the value of the key for the intended write. This structure has assignment operator access for whatever value is being assigned, and the assignment operator calls back the Value class to do the actual array update operation.
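The general technique, a proxy object returned from operator[], can be sketched in a much reduced form. This is not the PHP-CPP implementation: `Tab` and `Member` are hypothetical names, there are no templates or virtual functions, and values are plain `int`. It only shows how a returned object that remembers the parent and the key can turn `tab[key] = value` into a call to `set`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

class Tab;

// Proxy returned by operator[]: remembers which table and which key.
class Member {
public:
    Member(Tab& tab, std::string key) : tab_(tab), key_(std::move(key)) {}
    Member& operator=(int value);   // write: routed back to Tab::set
    operator int() const;           // read: routed back to Tab::get
private:
    Tab& tab_;
    std::string key_;
};

class Tab {
public:
    void set(const std::string& key, int value) {
        assert(!key.empty());       // keys are assumed non-empty in this sketch
        data_[key] = value;
        ++writes;
    }
    int get(const std::string& key) const {
        auto it = data_.find(key);
        return it == data_.end() ? 0 : it->second;
    }
    Member operator[](const std::string& key) { return Member(*this, key); }

    int writes = 0;                 // counts calls to set, for illustration only

private:
    std::map<std::string, int> data_;
};

inline Member& Member::operator=(int value) { tab_.set(key_, value); return *this; }
inline Member::operator int() const { return tab_.get(key_); }
```

Every bracket expression constructs and destroys a `Member`, and every write makes one extra hop back into the table, which is the kind of overhead the direct set methods avoid.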
So in PHP-CPP the final HashTable write calls are routed into Value::setRaw(), which is overloaded with either an integer or a string key. The string version calls add_assoc_zval_ex, which calls zend_symtable_str_update. This function checks if a string is numeric, and if so, tries to convert it to a zend_long index.
Here the PHP-CPP code is educational, both for trying to figure out the C++ virtual virtuosity, and for the lengths it has gone to in duplicating known PHP quirks and providing an easy-to-code-in set of classes. My low-level attack on this issue expects the programmer to know when to use string keys or integer keys directly. It's by data type, and I do not care about numeric string content as keys for conversion to zend_long. Why should the string hash function care?
// in php-src, the inline function from zend_hash.h reached by setRaw(const char* key ...)
static zend_always_inline zval *zend_symtable_str_update(HashTable *ht, const char *str, size_t len, zval *pData)
{
    zend_ulong idx;
    if (ZEND_HANDLE_NUMERIC_STR(str, len, idx)) {
        return zend_hash_index_update(ht, idx, pData);
    } else {
        return zend_hash_str_update(ht, str, len, pData);
    }
}
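To make the numeric-string check less abstract, here is a simplified approximation of the kind of test involved. It is not the Zend macro, and `canonical_integer_key` is a hypothetical name. The assumption is that only canonical decimal integers qualify: an optional minus sign, no leading zeros, no "-0", and a value that fits the integer type.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Returns true and sets `out` when the string is a canonical decimal integer.
// An approximation for illustration, not a copy of ZEND_HANDLE_NUMERIC_STR.
bool canonical_integer_key(const char* s, std::size_t len, long long& out)
{
    assert(s != nullptr);
    if (len == 0) return false;

    std::size_t i = 0;
    bool negative = false;
    if (s[0] == '-') { negative = true; i = 1; }
    if (i == len) return false;                       // "-" on its own
    if (s[i] == '0' && len - i > 1) return false;     // leading zero, e.g. "0123"
    if (negative && s[i] == '0') return false;        // "-0" stays a string key

    // Largest magnitude allowed: LLONG_MAX, or LLONG_MAX + 1 for negatives.
    const unsigned long long limit =
        negative ? static_cast<unsigned long long>(LLONG_MAX) + 1ULL
                 : static_cast<unsigned long long>(LLONG_MAX);

    unsigned long long acc = 0;
    for (; i < len; ++i) {
        if (s[i] < '0' || s[i] > '9') return false;   // any non-digit disqualifies
        const unsigned long long digit = static_cast<unsigned long long>(s[i] - '0');
        if (acc > (limit - digit) / 10) return false; // would exceed the limit
        acc = acc * 10 + digit;
    }

    out = negative ? -static_cast<long long>(acc - 1) - 1
                   : static_cast<long long>(acc);
    return true;
}
```

With a check like this in the write path, "123" ends up under integer index 123 while "0123" and "1.5" stay string keys. The htab_ptr set methods skip the check altogether and go by the C++ type of the key.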
This is only a vague description of the HashMember call chain for array[] notation and property access. It seems that there is some additional complexity and overhead, in terms of function calls, from the extra class creation and destruction required. The Php::Value class is a kind of Swiss army knife of PHP value access. It is versatile. In memory cost, each Value instance also holds a hidden virtual function table pointer, as well as a full zval structure. Value harbours a lingering identity crisis, and must often check its zval structure to figure out just what it really is, and what any operator function needs to do.
Performance gains for PHP-CPP and Wcc vs PHP script
I made an XML file format for use as configuration files, with similar aims to JSON and YAML. For this, a simple parser script called xmlread.php uses the xmlreader class to pull-parse the format into its final hierarchy of array values, including objects. To get measurable values, a test script repeated the file read 1,000 times.
I made a PHP-CPP version, and a Wcc version, of the same class, with no major algorithm differences, although the source code classes are different. All three give the same data output values. The C++ extensions manage this timed task at least 5 times faster.
The results on this AMD Ryzen 5 laptop were:
- PHP script: 0.25 seconds
- PHP-CPP: 0.048 seconds
- Wcc: 0.036 seconds
The Wcc value was 0.037-0.038 seconds before the code was changed to use zend_fcall_info_cache for some method calls.
The xmlread class works by callbacks to xmlreader object methods, such as to read the current node string value, fetch a node attribute value, and read current property values.
Both PHP-CPP and Wcc make use of the main callback facility of the Zend API, which is call_user_function. Just a little further into the API, it calls this function:
zend_result zend_call_function(zend_fcall_info *fci, zend_fcall_info_cache *fci_cache)
By putting the zend_fcall_info and zend_fcall_info_cache as C++ data members into the xmlread class itself, for the 3 most repeated method calls (read, get_attribute, read_string), and calling zend_call_function directly instead of going via call_user_function, the overall time for the script read test diminished by 1-2%. The cached function calls themselves must have improved somewhat more than that, given that the time for all the other processing did not change, and there is enough work done in the C++ extension xmlread class to make it faster than the PHP script version.