Comment by gib444
4 days ago
> A month ago, I went on a performance quest trying to optimize a PHP script that took 5 days to run. Together with the help of many talented developers, I eventually got it to run in under 30 seconds
That's a huge improvement! How much was low hanging fruit unrelated to the PHP interpreter itself, out of curiosity? (E.g. parallelism, faster SQL queries etc)
Almost all, actually. I wrote about it here: https://stitcher.io/blog/11-million-rows-in-seconds
A couple of things I did:
- Cursor based pagination - Combining insert statements - Using database transactions to prevent fsync calls - Moving calculations from the database to PHP - Avoiding serialization where possible
Aren’t these optimizations less about PHP, and more about optimizing how your using the database.
PHP is kind of like C. It can be very fast if you do things right, and it gives you more than enough rope to tie yourself in knots.
Making your application fast is less about tuning your runtime and more about carefully selecting what you do at runtime.
Runtime choice does still matter, an environment where you can reasonably separate sending database queries and receiving the result (async communication) or otherwise lets you pipeline requests will tend to have higher throughput, if used appropriately, batching queries can narrow the gap though. Languages with easy parallelism can make individual requests faster at least while you have available resources. Etc.
A lot of popular PHP programs and frameworks start by spending lots of time assembling a beautiful sculpture of objects that will be thrown away at the end of the request. Almost everything is going to be thrown away at the end of the request; making your garbage beautiful doesn't usually help performance.
4 replies →
It's still valid as as example to the language community of how to apply these optimizations.
in all my years doing database tuning/admin/reliability/etc, performance have overwhelmingly been in the bad query/bad data pattern categories. the data platform is rarely the issue
3 replies →
In general, it is bad practice to touch transaction datasets in php script space. Like all foot-guns it leads to Read-modify-write bugs eventually.
Depending on the SQL engine, there are many PHP Cursor optimizations that save moving around large chunks of data.
Clean cached PHP can be fast for REST transactional data parsing, but it is also often used as a bodge language by amateurs. PHP is not slow by default or meant to run persistently (low memory use is nice), but it still gets a lot of justified criticism.
Erlang and Elixir are much better for clients/host budgets, but less intuitive than PHP =3