Project: STRM Privacy

STRM Privacy is a Dutch startup providing "privacy first" data streams. They were my first client as an independent freelancer, and I built a PHP implementation of their driver for them.

STRM Privacy

STRM Privacy is an event data processing platform that offers "privacy first" event data streams. It ensures that all incoming data conforms to a predefined schema, and delivers only data that meets all privacy requirements. It does so by defining all privacy rules up front, at the very beginning, instead of at the end, when it is in fact already too late.
Through a former colleague I came into contact with this company, which was looking for a PHP developer. They already had implementations of their driver for languages such as TypeScript, Python and Java, but they also needed one for PHP, and I built it for them. It came down to a client that validates data against a JSON schema, serializes it, and sends it correctly and efficiently to the STRM Privacy gateway.
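The validate, serialize, send flow can be sketched roughly as follows. This is an illustration in Python rather than the actual PHP driver: `EventClient`, `validate` and the injected `sender` are hypothetical names, and the validator only checks a tiny subset of JSON Schema (required fields and top-level property types), not the full specification.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: receives the serialized body and its content type,
# and returns an HTTP status code. A real driver would POST to the gateway.
Sender = Callable[[bytes, str], int]

# Only a tiny subset of JSON Schema is checked: "required" and the
# top-level "type" of each listed property.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(event: Dict[str, Any], schema: Dict[str, Any]) -> None:
    for key in schema.get("required", []):
        if key not in event:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in event or "type" not in spec:
            continue
        value = event[key]
        # bool is a subclass of int in Python, so reject it for numeric types
        bool_as_number = isinstance(value, bool) and spec["type"] in ("integer", "number")
        if bool_as_number or not isinstance(value, _TYPES[spec["type"]]):
            raise ValueError(f"field {key!r} should be of type {spec['type']}")


class EventClient:
    def __init__(self, schema: Dict[str, Any], sender: Sender) -> None:
        self.schema = schema
        self.sender = sender

    def send(self, event: Dict[str, Any]) -> int:
        validate(event, self.schema)                                     # 1. validate
        body = json.dumps(event, separators=(",", ":")).encode("utf-8")  # 2. serialize
        return self.sender(body, "application/json")                     # 3. send


# Usage with a fake sender that only records what it receives
schema = {
    "type": "object",
    "required": ["user_id", "action"],
    "properties": {"user_id": {"type": "integer"}, "action": {"type": "string"}},
}
sent: List[Tuple[bytes, str]] = []


def fake_sender(body: bytes, content_type: str) -> int:
    sent.append((body, content_type))
    return 204


client = EventClient(schema, fake_sender)
status = client.send({"user_id": 42, "action": "login"})
```

Injecting the sender keeps the client independent of any particular HTTP library, which also makes it straightforward to test.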

Requirements

  • Authenticate correctly with the backend
  • The driver should conform to existing implementations, regarding architecture and object model
  • Efficient use and reuse of resources; use of HTTP/2
  • Clear and helpful user feedback and logging
  • The serializer should be able to handle JSON and AVRO_BINARY
  • Provided with tests and relevant CI pipelines

Design choices

The minimum PHP version that had to be supported was not known, so to err on the side of caution I chose to support PHP 7.2 and up. This version is already end of life, but I know from practice that it is still used in quite a few places.

The driver would be released as open source, so I published it as a Composer package on packagist.org, which makes it easy to install with Composer.

Apache Avro

The driver is essentially an API client and as such not very complex. But to be frank, I had never heard of Apache Avro before. It turns out to be simply a data serialization system, but a very efficient one: much more compact than JSON, and schema based. This means all data serialized with Avro is guaranteed to conform to the defined schema, and especially when the schema itself is stripped from the serialized data, the overhead is very low.
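To show where that efficiency comes from, here is a small hand-rolled sketch of Avro's binary encoding in Python (not the Wikimedia PHP library's API). It follows the Avro specification: `long` values use zig-zag variable-length encoding, strings are a length followed by UTF-8 bytes, booleans are one byte, and a record is just its fields in schema order, with no field names. The `PageView` schema is a hypothetical example.

```python
import json
from typing import Any, Callable, Dict


def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3 (valid for the 64-bit "long" range)
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:               # 7 bits per byte, high bit means "more follows"
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_boolean(b: bool) -> bytes:
    return b"\x01" if b else b"\x00"


ENCODERS: Dict[str, Callable[[Any], bytes]] = {
    "int": encode_long,
    "long": encode_long,
    "string": encode_string,
    "boolean": encode_boolean,
}

# Hypothetical record schema; only primitive field types are handled here
SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "url", "type": "string"},
        {"name": "consented", "type": "boolean"},
    ],
}


def encode_record(schema: Dict[str, Any], record: Dict[str, Any]) -> bytes:
    # Fields are written in schema order; the names never appear in the output
    return b"".join(ENCODERS[f["type"]](record[f["name"]]) for f in schema["fields"])


event = {"user_id": 1234, "url": "/home", "consented": True}
avro_bytes = encode_record(SCHEMA, event)
json_bytes = json.dumps(event).encode("utf-8")
print(len(avro_bytes), len(json_bytes))
```

Because the reader must already know the schema to decode these bytes, the payload carries only values, which is exactly why stripping the schema keeps the overhead so low.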

Integrating Avro turned out to be almost trivial. I spent most of the time looking for the best suited PHP library, because there are quite a few different implementations. I finally settled on the Wikimedia version, which was the most complete and up to date. It is based on the official Apache implementation but adds Composer support. Unfortunately the library is not very PSR-4 compliant. I did find a PSR-4 compliant port, but that one was not yet ready for production.

Tests and CI

Projects like this are well suited to Test Driven Development. I was used to Laravel, whose service container (built on the "inversion of control" principle) makes it very easy to inject mock versions of, for example, HTTP clients into the application. Of course there are other ways to do this: the Guzzle HTTP library has built-in support for mocking requests, so I used that.
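The idea behind Guzzle's request mocking (a queue of canned responses handed to the client in place of a real network handler, plus a record of what was sent) can be sketched in Python as follows. `MockTransport`, `GatewayClient` and the gateway URL are hypothetical illustrations, not Guzzle's or the driver's actual classes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Response:
    status: int
    body: bytes = b""


class MockTransport:
    """Returns queued responses in order and records every request it receives."""

    def __init__(self, responses: Iterable[Response]) -> None:
        self.queue = deque(responses)
        self.history: List[Tuple[str, str, bytes]] = []

    def __call__(self, method: str, url: str, body: bytes) -> Response:
        self.history.append((method, url, body))
        if not self.queue:
            raise RuntimeError("mock response queue is empty")
        return self.queue.popleft()


class GatewayClient:
    """Hypothetical client; the transport is injected so tests can replace it."""

    def __init__(self, transport, url: str = "https://gateway.example.com/event") -> None:
        self.transport = transport
        self.url = url

    def send(self, body: bytes) -> bool:
        response = self.transport("POST", self.url, body)
        return 200 <= response.status < 300


# A test queues one success and one server error, then inspects the history
mock = MockTransport([Response(204), Response(500)])
client = GatewayClient(mock)
first_ok = client.send(b'{"a":1}')
second_ok = client.send(b'{"a":2}')
```

The test never touches the network, yet it can assert both on how the client reacts to each response and on exactly which requests it produced.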

After the driver itself was finished, tested and approved, I set up Continuous Integration pipelines with GitHub Actions that run the unit tests on every push and update the Packagist package on every release via a webhook.

All other STRM Privacy repositories used Husky in combination with commitizen. Together they enforce consistent commit messages and take care of automatic semantic versioning and changelog generation. I didn't know these tools before, but they turned out to be very useful, so I will probably use them more often.
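Automatic semantic versioning works because the commit messages follow a fixed format (Conventional Commits), so the next version can be derived from them. The Python sketch below shows a simplified version of that rule: `fix` bumps the patch version, `feat` the minor version, and a `!` after the type or a `BREAKING CHANGE:` footer the major version. It is an illustration of the convention, not commitizen's actual implementation, and it ignores details such as commitizen's configurable bump map and special handling of 0.x versions. The function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# "type(optional scope)!: description"
HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?(?P<bang>!)?: ")
RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_level(messages: Iterable[str]) -> Optional[str]:
    level: Optional[str] = None
    for msg in messages:
        header = msg.splitlines()[0] if msg else ""
        match = HEADER.match(header)
        if not match:
            continue  # not a conventional commit, so it does not affect the version
        if match.group("bang") or "BREAKING CHANGE:" in msg:
            new = "major"
        elif match.group("type") == "feat":
            new = "minor"
        elif match.group("type") == "fix":
            new = "patch"
        else:
            new = None  # docs, chore, test, ... do not trigger a release here
        if RANK[new] > RANK[level]:
            level = new
    return level


def next_version(version: str, messages: Iterable[str]) -> str:
    major, minor, patch = (int(part) for part in version.split("."))
    level = bump_level(messages)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version


print(next_version("1.4.2", ["fix: handle empty payload", "docs: update README"]))
print(next_version("1.4.2", ["feat(serializer): add AVRO_BINARY support"]))
```

The same parsed commit types can then be grouped per release to generate the changelog.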

Do you have a project or job where you could use my help?

Contact me, and we can discuss your requirements and wishes!