Blogroll is back – on steroids

I finally got around to adding an excerpt of the list of blogs I regularly read to the navigation bar on the right.

The list is somewhat special because it updates itself: it refreshes every 30 minutes and lists the blogs in descending order of their last update.

Adding the blogroll was a multi-step process:

At first, I thought that adding the Serendipity blogroll plugin and pointing it to my Newsgator subscription list (I’m using Newsgator to always have an up-to-date read status in both Net News Wire and FeedDemon) would be enough, but unfortunately, that turned out not to be the case.

First, the expat module of the PHP installation on this server has a bug that makes it unable to parse files starting with the Unicode byte order mark (in UTF-8 these are the three bytes EF BB BF, which merely mark the file as UTF-8; byte order only matters for UTF-16 and UTF-32). So it was clear that I had to do some restructuring of the OPML feed (or patch around in the s9y plugin, or upgrade PHP).
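A minimal sketch of that restructuring step, assuming the only problem is the UTF-8 BOM at the very start of the OPML file: strip those three bytes before the document reaches the parser. The function name `strip_utf8_bom` is hypothetical and not part of the blogroll script or the s9y plugin.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 byte order mark (EF BB BF)
// so that a BOM-intolerant expat build can parse the document.
function strip_utf8_bom(string $xml): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($xml, $bom, 3) === 0 ? substr($xml, 3) : $xml;
}

// Usage sketch (the file name is a placeholder):
// $opml = strip_utf8_bom(file_get_contents('subscriptions.opml'));
```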

Additionally, I wanted the list sorted so that the blogs with the most recent postings are listed first.

My quickly hacked-together solution is this script, which uses an RSS/Atom parser I took from WordPress; this means the script is licensed under the GNU GPL (as the parser is).

I’m calling it from a cron job every 30 minutes (that’s why the built-in cache is disabled in this configuration) to generate the OPML file sorted by each feed’s update timestamp.
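A minimal sketch of the sorting step, assuming the feed timestamps have already been collected into an array keyed by each feed’s `xmlUrl` (the real script gets them by parsing every feed with the WordPress parser, which is left out here). It uses PHP’s DOM extension rather than the script’s own code and assumes a flat OPML body without nested folders; `sort_opml_by_update` is a hypothetical name.

```php
<?php
// Hypothetical sketch: reorder the <outline> entries of a flat OPML body so
// that the most recently updated feed comes first. $updated maps each
// feed's xmlUrl to a Unix timestamp (collecting those is not shown).
function sort_opml_by_update(string $opml, array $updated): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    $doc->loadXML($opml);
    $body = $doc->getElementsByTagName('body')->item(0);

    // Copy the live node list into an array before moving nodes around.
    $outlines = [];
    foreach ($body->getElementsByTagName('outline') as $node) {
        $outlines[] = $node;
    }

    // Newest first; feeds without a known timestamp sink to the bottom.
    usort($outlines, function (DOMElement $a, DOMElement $b) use ($updated): int {
        $ta = $updated[$a->getAttribute('xmlUrl')] ?? 0;
        $tb = $updated[$b->getAttribute('xmlUrl')] ?? 0;
        return $tb <=> $ta;
    });

    // appendChild() moves an existing node, so this re-inserts them in order.
    foreach ($outlines as $node) {
        $body->appendChild($node);
    }
    return $doc->saveXML();
}
```

From cron, a line such as `*/30 * * * * php /path/to/blogroll.php > /path/to/blogroll.opml` would regenerate the file every 30 minutes (paths hypothetical).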

That OPML file is then fed into the Serendipity plugin.

The only problem I have now is that the list unfairly favours the aggregated feeds, as these are updated much more often than individual people’s blogs. In the future I will thus either add a penalty for these feeds, remove them from the list or just show more feeds on the page.

Still, this was a fun hack and it fulfills its purpose. Think of it: whenever I add a feed in either Net News Wire or FeedDemon, it will automatically pop up on the blogroll on gnegg.ch. This is really nice.

On a side note: I could have used the Newsgator API to get the needed information faster and probably even without parsing the individual feeds. Still, I went the OPML way, as that’s an open format, which makes the script useful for other people, or for me should I ever change the service.

PHP Stream Filters

You know what I want? I want to append one of those nice and shiny PHP stream filters to the output stream.

I have this nice Windows application that receives a lot of XML data which compresses extremely well. And as the Windows application is for people with very limited bandwidth, compressing it seems to be the perfect thing to do.

You know, I CAN compress all my output already, by doing something like this:

<?php
ob_start();
echo "stuff";
$c = ob_get_clean();
echo bzcompress($c);
?>

The problem with this approach is that the data is only sent to the client once it has been assembled completely. bzip2, on the other hand, is a stream compressor that is perfectly able to compress a stream of data and send out each chunk as soon as it is ready.
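A minimal sketch of that idea, with one swap: instead of the bzip2 stream filter it uses zlib’s incremental deflate functions (`deflate_init()` and `deflate_add()`, available since PHP 7.0), because `ZLIB_SYNC_FLUSH` lets the script force out the compressed bytes of every chunk right away. The output is zlib/deflate data, not bzip2, so the client would have to decompress it accordingly; `stream_deflate` is a hypothetical name.

```php
<?php
// Hypothetical sketch: compress data chunk by chunk and hand each compressed
// piece to $emit as soon as it is ready. Uses zlib deflate, NOT bzip2.
function stream_deflate(iterable $chunks, callable $emit): void
{
    $ctx = deflate_init(ZLIB_ENCODING_DEFLATE, ['level' => 9]);
    foreach ($chunks as $chunk) {
        // ZLIB_SYNC_FLUSH pushes out all compressed bytes for this chunk,
        // so the receiver can decompress them immediately.
        $out = deflate_add($ctx, $chunk, ZLIB_SYNC_FLUSH);
        if ($out !== '') {
            $emit($out);
        }
    }
    // ZLIB_FINISH writes whatever is left plus the stream trailer.
    $emit(deflate_add($ctx, '', ZLIB_FINISH));
}
```

In a web script, the callback would be something like `function ($bytes) { echo $bytes; flush(); }`, so every piece leaves the server as soon as it is compressed.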

The Windows client on the receiving end is certainly capable of doing that: as soon as bytes come in, it decompresses them chunk-wise and feeds them to an Expat-based parser which handles the extracted data. Now I want this to happen on the sending side as well.
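For illustration, the client’s loop rendered as a PHP sketch (the real client is a Windows application): each incoming chunk is decompressed and the resulting XML fragment is passed straight to expat via `xml_parse()` with `$is_final = false`, so parsing proceeds before the whole document has arrived. It is an assumption, not the client’s code, and it uses zlib’s `inflate_init()`/`inflate_add()` (PHP 7.0+) in place of bzip2; `parse_compressed_chunks` is a hypothetical name.

```php
<?php
// Hypothetical sketch: decompress incoming chunks one at a time and feed the
// XML to an expat-based parser as soon as it arrives (zlib instead of bzip2).
function parse_compressed_chunks(iterable $chunks): array
{
    $elements = [];
    $parser = xml_parser_create('UTF-8');
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$elements): void {
            $elements[] = $name;   // a real client would process the data here
        },
        function ($p, string $name): void {}
    );

    $ctx = inflate_init(ZLIB_ENCODING_DEFLATE);
    foreach ($chunks as $chunk) {
        $xml = inflate_add($ctx, $chunk);   // may be '' until enough input arrived
        if (xml_parse($parser, $xml, false) === 0) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    // Tell expat that the document is complete.
    if (xml_parse($parser, '', true) === 0) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $elements;
}
```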

The following code does work sometimes:

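<?php
  $fh = fopen('php://stdout', 'w');
  // Compress everything written to $fh; 'blocks' (1-9) is the bzip2
  // block size in units of 100 KB.
  stream_filter_append($fh, 'bzip2.compress', STREAM_FILTER_WRITE, array('blocks' => 9));
  fwrite($fh, "Stuff");
  // fclose() is where the filter has to flush its last, partial block.
  fclose($fh);
?>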

But sometimes it doesn’t, and produces an incomplete bzip2 stream.

I have an idea of why this is happening (the data still buffered in the filter is not sent out on shutdown), but I can’t prevent it. Sometimes the data is not written out, which makes this method unusable.

I’m afraid to report this to bugs.php.net as I’m sure it’s something PHP was not designed for and it’ll get marked as BOGUS faster than I can spell ‘gnegg’.

So this means that the Windows client just has to wait until the data has been extracted, converted to XML and compressed.

*sigh*

(Thinking of it, there may be the option of writing the data to a temp file whose handle has the filter attached, and then reading it out to the browser immediately afterwards. But come on, this can’t be the solution, can it?)

Update: I’ve since tracked the problem down to a bug in PHP itself, for which I found a fix. My assumption that writing to a temporary file could help was wrong: PHP does not check the return value of a bzlib function correctly and never writes out a half-full buffer when the stream is closed, neither to the output stream nor to a file.

mp3act

When you have a home server, sooner or later your coworkers and friends (and if all is well, both in one person ;-) ) will want to have access to your library.

Cablecom, my ISP, has this nice 6000/600 service, so there’s plenty of upstream for others to use in principle. And you know: Here in Switzerland, the private copy among friends is still legal.

Well, last Sunday it was time again. Richard wanted access to my large collection of audiobooks, and if you know me (and you do as a reader of this blog), you’ll know that I can’t just give him those files on a DVD-R or something. No. A web-based mp3 library had to be found.

The last few times, I used Apache::MP3, but that grew kinda old on me. You know: it’s a Perl module and my home server does not have mod_perl installed. And I’m running Apache 2, to which Apache::MP3 has not been ported yet AFAIK. And finally, I’m far more comfortable with PHP, so I wanted something written in that language so I could make a patch or two on my own.

I found mp3act, which is written in PHP and provides a very, very nice AJAX-based interface. Granted, it breaks the back button, but everything else is very well done.

And it’s fast. Very fast.

Richard liked it, and Christoph is currently trying to install it on his Windows server, not as successfully as he would like. mp3act is currently quite Unix-only.

The project is at an early stage of development and certainly has rough edges here and there, but in the end it’s very well done, serves its purpose and is even easily modifiable (for me). Nice.

Once more: PHP and SOAP

I can’t resist: I made my third attempt at getting a SOAP server in PHP to work (I only documented my first try here on the blog).

My first try was a little more than two years ago. That one failed miserably.

The next try was last November. I got somewhat further than the first time, but Visual Studio was unable to import the WSDL correctly as soon as I was passing arrays of structs around.

And now I tried again, this time with PEAR SOAP 0.9.1.

This time everything looks so much better. First of all, I’m doing this because I really have to: for one of our PopScan customers, we are accessing their IBM DB2 database, currently using a Perl-based server that’s nearing the end of its maintainability, so I decided to redo it in PHP (PHP code is somewhat cleaner than Perl code and I’m more fluent in PHP than in Perl).

The DB2-client (especially the one needed for that old 7.1 database) is clumsy, a bit unstable and really not something I want to link into our Apache-Server that serves all our clients.

So the idea was to compile another Apache, run it on another port bound to localhost only and add PHP with the DB2 client, then access this combination via some form of RPC from the nice, DB2-free standard installation.

Well. And instead of once again designing a custom protocol (like I did for the Perl server), I thought: maybe give SOAP another shot.

Contrary to previous experience, this time it was the server that worked and the client that failed. Using PEAR SOAP 0.9.1, creating the server (which generates the dreaded WSDL) went flawlessly. This time I was even able to import the WSDL into VS 2003, which I tried just for fun.

Passing around arrays of structs of structs was no problem at all. After building the self::$__typedef and self::$__dispatch_map arrays, passing around those data types has become really intuitive: Just create arrays of arrays in PHP and return them. No problem.

Well done, PEAR team!

This time I had problems with the PEAR SOAP client. It insisted on passing ints around as strings, which the server (correctly) did not like.

Instead of spending lots and lots of time debugging that, I took the pragmatic route and used PHP5’s built-in SoapClient functionality. No problems there.

And then it suddenly broke

My test client was written for the CLI version of PHP, which was version 5.0.4. The Apache module on the live server was 5.0.3.

All I got with 5.0.3 was an HTTP client error (SoapFault exception: [HTTP] Client Error).

Whatever I did, it did not go away, but to my delight I noticed that PHP did not even connect to the server to fetch the WSDL. This was good, as it let me debug much more quickly.

In the end it was the URL of the WSDL. Every version of PHP5 (even the 5.1 betas) except 5.0.4 dislikes this:

http://be.sen.work:5436/?wsdl

it prefers this

http://be.sen.work:5436/index.php?wsdl

I ask now: why is that? The first version is a valid URL as well. The served WSDL is correct: it’s the same file that gets called and it returns exactly the same content. This is so strange.
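Until the cause is known, one pragmatic way to apply the workaround is to always spell out the script name in the WSDL URL before handing it to `SoapClient`. A minimal sketch; `explicit_wsdl_url` is a hypothetical helper and `index.php` is assumed to be the script serving the WSDL, as on this server.

```php
<?php
// Hypothetical helper: if the WSDL URL has no script name (its path ends
// in "/" or is missing), insert index.php explicitly.
function explicit_wsdl_url(string $url, string $script = 'index.php'): string
{
    $parts = parse_url($url);
    $path = ($parts['path'] ?? '') ?: '/';
    if (substr($path, -1) === '/') {
        $path .= $script;
    }
    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $path;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    return $result;
}

// Usage sketch:
// $client = new SoapClient(explicit_wsdl_url('http://be.sen.work:5436/?wsdl'));
```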

All in all, I have to say: SOAP with PHP, after two years, still is not ready for prime time. It’s still in the state of “sometimes working, sometimes not”. But as I now have an environment where it’s known to work, and as I’m in total control of said environment, I will go with SOAP nonetheless. It’s so much cleaner (and more secure: more people than just me are looking at the SOAP code) than designing yet another protocol and server.

Oh. And the bottom line is: Never trust protocols that call themselves “simple” or “lightweight” ;-)

What I hate about PHP

This is what I really hate about PHP:

pilif@galadriel ~ % cat test.php
<?php
if (10 == '10ABC')
    echo "Gnegg!\n";
?>
pilif@galadriel ~ % php test.php
Gnegg!

This is the reason for a pretty serious bug in my current i’m-loving-doing-that-as-it’s-the-greatest-ever-project

What happens is that PHP implicitly converts 10ABC to an integer (yielding 10) and then does an integer comparison.

In my opinion, this is wrong, as implicitly converting a string to an integer can lose information. Had PHP converted 10 to '10' instead, the comparison would have worked as one expects, because converting an integer to a string loses no information.

Then again, integer conversions are more accurate than string conversions, so I can understand PHP’s choice. What I cannot understand is that a non-integer string is converted to anything other than 0, instead of being rejected with a runtime error. The comparison in my example should never have evaluated to true (which it did, because intval('10abc') == 10)!

And converting to a string whenever one side of a comparison is a string is not the holy grail either: problems with locale-specific decimal points come to mind (is it . or ,?).

So Perl’s idea of using a dedicated string comparison operator may not have been a bad one after all…
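For the record, PHP 8.0 later changed exactly this case: a number compared with a non-numeric string such as '10ABC' is now compared as a string, so the example above prints nothing there. On older versions, the sketch below shows the ways around it in PHP: the strict `===` operator, or an explicit string comparison that behaves much like Perl’s `eq`. `str_eq` is a hypothetical helper name.

```php
<?php
// Loose comparison: true before PHP 8.0 ('10ABC' becomes the int 10),
// false since PHP 8.0 (the int is compared as the string '10').
$loose = (10 == '10ABC');

// Strict comparison: type and value must match, so int vs. string is false.
$strict = (10 === '10ABC');

// Explicit string comparison, close to Perl's "eq": cast both sides.
function str_eq($a, $b): bool
{
    return strcmp((string) $a, (string) $b) === 0;
}

var_dump($strict);               // bool(false)
var_dump(str_eq(10, '10ABC'));   // bool(false)
var_dump(str_eq(10, '10'));      // bool(true)
```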

Horde 3.0

Today Horde 3.0 has been released, along with some applications built on it, most notably IMP 4.0.

For me, Horde has a long history of being a pain in the ass to install and extend. While installing the first versions was quite easy (though not possible for me back then, as I did not have access to my own server and the environment of our shared hosting provider did not have all the extensions needed, let alone shell access), it grew quite complicated from 3.0 onwards.

My main problem has been, and still is, that Horde is not really a framework for application development, but a front-end container. It’s not possible to just install IMP. You’re always installing a kind of groupware application (Horde) and only then the webmail component.

If you don’t do it right, you actually force your users to log in twice when checking their email (once in Horde, once in IMP).

As always, I really had to take a look at those new releases.

As the Horde main server is quite busy at the moment, I downloaded from the mirror in the Netherlands; the others were either not reachable or not current.

After downloading the Horde framework, satisfying the very long list of dependencies took some time. Especially tricky was the fileinfo PECL extension, but that was because of a problem with my local PHP installation. Glad I found out now and could fix it.

Then came the configuration. What a nicely done web interface! Unfortunately, I just managed to lock myself out (I chose “IMAP Server” as authentication source not knowing that this only works after IMP is installed and IMP cannot be installed without a working horde installation…)

After those things were settled, I came to the installation of IMP. An easy procedure, once I had gotten used to the framework itself.

Then I configured Horde to use IMP as the authentication source, which did not work at first, but after copying over the backup configuration file and trying again, it finally worked (don’t ask me what I did differently the second time).

My next problem was the preset settings for my users: by default, Horde uses a 12-hour time format, Arabia as location and somewhere in Africa as the time zone.

As I cannot ask my (possible) users to change those preferences, I looked for a way to do that and while doing that I began to understand how the Horde configuration system works.

Now, I’m quite impressed by how they are doing this: it’s generic, it’s configurable and every single feature can be locked down for the end users. Very nice.

Just make your configuration changes in config/prefs.php. If you need a list of possible values, either read the source, or easier: Just look at the HTML source of the preferences-screens.
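As an illustration, a local override in `config/prefs.php` might look like the fragment below. The entry names (`twentyFour`, `timezone`) and the `value`/`locked` keys are assumptions modeled on the `$_prefs` array layout; check them against the `config/prefs.php` shipped with your release before relying on them.

```php
// Assumed layout: each preference is an entry of the $_prefs array with a
// 'value' (the default) and a 'locked' flag. Placed after the shipped
// definitions, these lines override only the defaults.
$_prefs['twentyFour']['value'] = true;           // 24-hour time format
$_prefs['timezone']['value'] = 'Europe/Zurich';  // default time zone
$_prefs['timezone']['locked'] = false;           // users may still change it
```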

If I had a wish for the next release: provide a way for the administrator to change those settings via the web front end.

While at first I just installed IMP, which worked flawlessly out of the box, I ventured further and installed Kronolith, Turba, Nag and finally even Chora. Additionally, I configured Horde to give access to Chora only to me. Comfortable. Even more impressive when I recall that the whole user management is done via the XAMS environment (by using IMP to authenticate the users).

All in all, I still wish I could hide away Horde and just install IMP (with a small, simple integrated address book), but as IMP really is the best PHP-based webmailer out there (I don’t know any others) and as the other applications work really nicely (even with PHP5, though it’s not officially supported), I can live with that limitation.

Now, I have two tasks ahead of me:

  1. Provide support for changing the XAMS account password from within the web interface. This will be a great opportunity to learn how the preferences system really works.
  2. Teach Ingo how to create Exim filters, as this is the filtering system that could most easily be integrated into XAMS. When I designed the initial draft of XAMS (then still called pmail), I took great pride in the fact that mail delivery does not cause a non-MTA process to be forked, and I want to keep it that way. It saves resources under high mail load.
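A sketch of what such a backend would have to emit, assuming simple rules of the form “header contains text, then file into folder”. The rule array keys, the `$home/Mail/` folder layout and the function names are hypothetical; the generated text uses Exim’s filter language (the mandatory `# Exim filter` first line, `if ... then ... endif`, `$h_<name>:` header variables and the `save` command).

```php
<?php
// Hypothetical sketch: turn simple rules into an Exim filter file.
// Rule keys ('header', 'contains', 'folder') and the folder layout are assumptions.
function exim_quote(string $s): string
{
    // Exim filter strings are double-quoted: escape \ " and $ (expansion).
    return '"' . addcslashes($s, "\\\"\$") . '"';
}

function exim_filter(array $rules): string
{
    $out = "# Exim filter\n";   // required first line of an Exim filter file
    foreach ($rules as $rule) {
        $out .= 'if $h_' . $rule['header'] . ': contains ' . exim_quote($rule['contains']) . "\n"
              . "then\n"
              . '  save $home/Mail/' . $rule['folder'] . "\n"   // folder name assumed space-free
              . "endif\n";
    }
    return $out;
}
```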

After the Christmas days, I will certainly know what the new Horde/IMP is made of. From an administrator’s and user’s perspective, it’s a great release.

Thank you guys!

Apache 2

There was this discussion recently about whether Apache 2.0 should be recommended by the PHP guys or not.

While I find their warning a bit too harsh, I for myself still cannot run Apache 2 – though I’d really like to. So maybe it’s time to add my two cents:

Last March, I was about to set up our production server from scratch. As the Apache developers keep saying that Apache 2.0 is production ready, I naturally went with the new version first. Here’s what did not work and finally forced me to go back to 1.3. It’s not about PHP at all: the two extensions I depend on (MySQL and PostgreSQL) are available in thread-safe editions, so even one of the threaded MPMs would have worked. What killed my plans was mod_perl.

Back then, when comment spam was not that big a problem for me, I was running gnegg.ch in a mod_perl environment, which at that time could not be set up with Apache 2: mod_perl itself had an even bigger warning about not working well than PHP still has. Additionally, they’ve changed their API, so even if I had been able to get it to work, there would have been no guarantee of getting MT to work with that new API.

Anyway: I was willing to try it out, but libapreq, required by MT when running under mod_perl, was only available as an early preview too (and still is nowhere near production ready). My attempts to install it anyway led to a flurry of SIGSEGVs in Apache when using MT. Judging from the Gentoo bug tracker, this has not gotten better yet.

One of the strongest selling points for Apache isn’t PHP. It’s mod_perl. And currently, it’s mod_perl that should have this big warning on its web page. mod_perl, not PHP (which works nicely under Apache 2 on an internal development system).

And even when mod_perl gets fixed: as they have changed the API, many existing (and no longer maintained) packages using mod_perl (like Apache::MP3, for example) may stop working after the switch to Apache 2.

As soon as the first guy comes here and posts that he/she’s gotten MT to work under mod_perl on Apache 2, I’m going to reconsider the switch. Not a second earlier.

Internet Explorer, File Downloads, PHP

Have you ever tried sending a file to Internet Explorer, for which an internal displaying plugin is installed? Take a .CSV-File for example (or a PDF for that matter).

If so, then maybe you have noticed that IE in some versions just displays an error-message about not being able to find the file just downloaded whenever you have a call to session_start() in your script.

The problem lies in the headers PHP’s session management sends to the browser: they disallow any caching and state that the document expired somewhere around my year of birth (1981). It seems that IE takes that literally and really does not cache the document, but then, naturally, is unable to hand it to the plugin (or ActiveX control or whatever).

Fortunately, you can override PHP’s default headers by simply emitting some additional header() calls:

    header('Content-Type: application/csv');
    header('Pragma: cache');
    header('Cache-Control: public, must-revalidate, max-age=0');
    header('Connection: close');
    // HTTP dates must be given in GMT; date('r') would use the local offset.
    header('Expires: '.gmdate('D, d M Y H:i:s', time()+60*60).' GMT');
    header('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT');

A short explanation of the headers sent:

  1. The content-type tells the browser that there’s a CSV file coming
  2. Pragma is an old HTTP/1.0-Header. This one allows caching of the resource
  3. Cache-Control is the HTTP/1.1 header that replaces Pragma. “public” means: public proxies may cache the document (private would also work and would mean: cache only in the browser’s cache). must-revalidate tells proxy servers (and browsers) to check whether the resource has been modified whenever the cached copy is older than max-age seconds.
  4. The Connection header tells server and browser what to do with the connection once the resource has been transmitted. close is the old HTTP/1.0 behaviour; keep-alive (a persistent connection) is the newer behaviour. I’m not sure whether this is really necessary here, but with this header, it definitely works.
  5. The Expires header tells the browser when the document is going to expire. PHP’s session handling sets this to a date in 1981, and I think this is what causes the problem for IE. I set it to one hour in the future. If it were possible to just turn off those default headers, I would simply send no Expires header at all.
  6. Last-Modified tells the browser when the resource was last modified. I could actually get a timestamp of the underlying data and output that, so the browser would not have to redownload the resource when the data has not changed, but the data changes so often that this optimization is not worth the trouble, so I’m telling it that it just changed.
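As it turns out, the session module’s default headers can be controlled: `session_cache_limiter()`, called before `session_start()`, selects which cache headers PHP sends, and `'private_no_expire'` sends a private Cache-Control header but no Expires header (an empty string suppresses the cache headers entirely). A minimal sketch under that approach; the function names are hypothetical, and the date helper uses `gmdate()` because HTTP dates must be given in GMT.

```php
<?php
// Format a Unix timestamp as an HTTP date; HTTP requires GMT.
function http_date(int $ts): string
{
    return gmdate('D, d M Y H:i:s', $ts) . ' GMT';
}

// Hypothetical helper: start a session without the no-cache defaults,
// then send the CSV headers. Must run before any output.
function start_csv_response(): void
{
    // 'private_no_expire': Cache-Control: private, but no Expires header.
    session_cache_limiter('private_no_expire');
    session_start();
    header('Content-Type: application/csv');
    header('Last-Modified: ' . http_date(time()));
}
```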

I have confirmation that this solves the problems some clients were experiencing before. Very nice.

PHP 5

As you surely know, PHP 5 has been released. Actually, it’s already 5.0.1.

What you also may know is that Gentoo’s dev-php/mod_php package was promoted from -x86 to ~x86. This means from broken to unstable in Gentoo-terms.

This means that I can now run some tests with PHP5, which I have already started doing: I’ve upgraded PHP on our development server to 5.0.1 and it’s working quite well so far. The only problem I’ve come across is this stupid code in an osCommerce installation:

class something{
  function something(){
    // do something
    $this = null; // PHP 5: Fatal error: Cannot re-assign $this
  }
}

New or old object model in PHP: This is just something you don’t do. Not in PHP, and certainly not in any other language. You should not assign anything to this, self or even Me (or whatever the implicit pointer to your own object is called in your language).
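If the intent of `$this = null` was to signal that construction failed, here is a sketch of two idiomatic alternatives: throw an exception from the constructor, or offer a static factory that returns null. The class and method names are hypothetical, and the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical class: report a failed construction instead of nulling $this.
class Something
{
    private $name;

    public function __construct(string $name)
    {
        if ($name === '') {
            // Construction fails loudly; no half-built object escapes.
            throw new InvalidArgumentException('name must not be empty');
        }
        $this->name = $name;
    }

    // Factory variant: callers get null instead of an exception.
    public static function tryCreate(string $name): ?self
    {
        try {
            return new self($name);
        } catch (InvalidArgumentException $e) {
            return null;
        }
    }

    public function name(): string
    {
        return $this->name;
    }
}
```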

PHP scales well

I think PHP scales well because Apache scales well because the Web scales well. PHP doesn’t try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.

Read on shiflett.org, after a small pointer in the right direction from Slashdot. This guy really knows what he is writing about, or at least it seems so to me, as I think exactly the same way as he does (which is a somewhat arrogant way of putting it, I suppose :-)).