Tracking comments with cocomment

I’m subscribed to quite a long list of feeds lately. Most of them are blogs and almost all of them allow users to comment on posts.

I often leave comments on these blogs. Many times, they are as rich as a posting here, as I’ve got lots to say once you make me open my mouth. Many times, I quietly hope for people to respond to my comments. And I’m certainly eager to read these responses and to participate in a real discussion.

Now this is a problem: Some of the feeds I read are aggregated feeds (like PlanetGnome or PlanetPHP or whatever) and it’s practically impossible to find the entry in question again.

Up until now, I had multiple workarounds: Some blogs (mainly those using the incredibly powerful Serendipity engine) provide the commenter with a way to subscribe to an entry, so you get notified by email when new comments are posted.

For all non-s9y blogs, I usually dragged the link to the site to my desktop and tried to remember to visit them again to check if replies to my comments were posted (or maybe another interesting comment).

While the email method was somewhat comfortable to use, the link-to-desktop one was not: My desktop is cluttered enough with icons without these additional links anyway. And I often forgot to check them nonetheless (making a bookmark would guarantee that I forget them. The desktop link at least provides me with a slim chance of not forgetting).

Now, by accident, I came across cocomment.

cocomment is interesting from multiple standpoints. For one, it just solves my problem as it allows you to track discussions on various blog entries – even if they share no affiliation at all with cocomment itself.

This means that I finally have a centralized place where I can store all the comments I post and I can even check if I got a response to a comment of mine.

No more links on the desktop, no more using the bandwidth of the blog owner’s mail server.

As a blog owner, you can add a JavaScript snippet to your template so cocomment is always enabled for every commenter. Or you just keep your blog unmodified. In that case, your visitors will use a bookmarklet provided by cocomment which does the job.

Cocomment will crawl the page in question to learn if more comments were posted (or it will be notified automatically if the blog owner added that JavaScript snippet). Now, crawling sounds like they waste the blog owner’s bandwidth. True. In a way. But on the other hand: It’s way better if one centralized service checks your blog once than if 100 different users each check your blog once. Isn’t it?

Anyways. The other thing that impresses me about cocomment is how much you can do with JavaScript these days.

You see, even if the blog owner does not add that snippet, you can still use the service by clicking on that bookmarklet. And once you do that, so many impressive things happen: In-Page popups, additional UI elements appear right below the comment field (how the hell do they do that? I’ll need to do some research on that), and so on.

The service itself currently seems a bit slow to me, but I guess that’s because they are getting a lot of hits currently. I just hope they can keep up, as the service they are providing is really, really useful. For me and, I imagine, for others as well.

Computers under my command (4): yuna

Yuna was the lead girl in Final Fantasy X, the first episode of the series released for the PlayStation 2.

Now, I know I’m alone with this opinion, but FFX was a big disappointment for me: Obvious character backgrounds, unimpressive story, stupid mini games, no world map, much too short. No. I didn’t like FFX.

But this doesn’t change the fact that I played through the game and that I was seriously impressed by how good the thing looked. Yes. The graphics were good – unfortunately that’s everything positive I can say about the game.

And this is why I’m getting straight to the computer behind the name:

I called my MacBook Pro “yuna”.

My MacBook Pro is the one machine I use at work that has impressed me the most so far: Fast, good looking, long battery life… and… running MacOS X.

Yuna did what was completely unthinkable for me not much more than 5 years ago: It converted me over to using MacOS X as my main OS. It’s no secondary OS. It’s no dual boot (especially since I stopped playing WoW). It’s no “MacOS is nice, but I’m still more productive in Windows”. It’s no “sometimes I miss Windows” and no “mmh… this would work better in Windows”.

No. It’s a full-blown remorseless conversion.

Granted: Some things DO work better in Windows (patched emulators for use in Timeattack videos come to mind), but my point is: I don’t miss them.

The slickness and polish of the OS X interface and especially the font rendering (I admit, I’m putting way too much emphasis on fonts when choosing my platform, but fonts after all are the most important interface between you and the machine) and the Unix backend make me wonder: How could I ever work without OS X?

It’s funny. For some time now I thought about converting.

But what really made me do it was knowing that there’s a safety net: You know: I still have that Windows partition on this Intel Mac. And I do have Parallels (which is much faster than Virtual PC) which I use for Delphi and lately Visual Studio.

Everyone who keeps saying that Apple switching to Intel will decrease their market share even more had better shut up. Now. Once you have that machine, once you see the slickness of the interface, once you notice how quickly you can be productive in the new environment, once that happens, you’ll see that there’s no need, no need at all, to keep using Windows.

So, a wonderful machine with the name of an (admittedly) good looking girl (with a crappy background story) from a crappy game. Too bad neither Marle nor Terra was free any more.

Developing with the help of trac

trac rules.

If you have a small team working on a project that’s getting bigger and bigger, if you need a system to track the progress of your project, a system to allow communications within your team in a way that keeps track of what you’ve talked about, if you need a kick-ass frontend to Subversion – if you need any of that, consider trac.

trac is a web based subversion frontend with the nicest addons: It provides a wiki, some project management features and a bug tracker. One that’s actually usable for non-scientists as well (in contrast to bugzilla).

But the tool’s real strength comes from its networking features: All components are interconnected. You are looking at the svn history and you see links to your bugtracker. You are looking at the bugtracker and you see links to the wiki where you find more information about the bug. And you look at the wiki and you’ll find links to individual changesets (SVN revisions). And so on.
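To illustrate the interlinking idea, here is a minimal Python sketch that turns ticket and changeset references in plain text into links. The shorthands `#7`, `r42` and `[42]` follow trac's link notation; the base URL, the function name and the regular expression are assumptions made up for this example, and this is not how trac itself implements the feature.

```python
import re

# Hypothetical base URL of a trac project; adjust to your installation.
BASE_URL = "/trac/myproject"

# One combined pattern, so text inside already generated links is never
# matched a second time.
LINK_RE = re.compile(r"#(?P<ticket>\d+)|\br(?P<rev>\d+)\b|\[(?P<rev_alt>\d+)\]")


def linkify(text, base=BASE_URL):
    """Replace #N with a ticket link and rN or [N] with a changeset link."""
    def to_link(match):
        label = match.group(0)
        if match.group("ticket"):
            target = f"{base}/ticket/{match.group('ticket')}"
        else:
            rev = match.group("rev") or match.group("rev_alt")
            target = f"{base}/changeset/{rev}"
        return f'<a href="{target}">{label}</a>'
    return LINK_RE.sub(to_link, text)


print(linkify("Fixed in r42, see #7 for the discussion."))
```

The same function could be run over commit messages, ticket comments and wiki pages alike, which is what makes every component point at every other one.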

All this is very nice in itself, but it’s not what really made me write this post. The ease of use is. And the good looks.

The software, once it’s running, looks very nice and is very, very easy to use. Some administration tasks require you to pay a visit to the command line, but all everyday tasks can be done from the web interface. In a completely hassle-free way.

No forms so complicated that a normal person can’t add a bug to the database. No complex customization needed to make these links between the modules work. And no ugly, bloated interface.

If you like the tool so far, be warned though: Installing the thing isn’t exactly a piece of cake – at least if you want to integrate it into an existing Apache installation. Still: The benefits far outweigh the hassle you have to go through to set the thing up.

Trac really is one nice piece of software.

Oh and in case you haven’t noticed. Yepp. We are using it internally to manage our projects. One of them at least.

Template engines complexity

The current edition of the German computer magazine iX has an article comparing different template engines for PHP.

When I read it, the old discussion about Smarty providing too many flow controlling options sprang to my mind again, even though that article itself doesn’t say anything about whether providing a rich template language is good or not.

Many purists out there keep telling us that no flow control whatsoever should be allowed in a template. The only thing a template should allow is to replace certain markers with some text. Nothing more.

Some other people insist that having blocks which are parsed in a loop is OK too. But all the options Smarty provides are out of the question as it begins intermixing logic and design again.

I somewhat agree with that argument. But the problem is that if you are limited to simple replacements and maybe blocks, you begin to create logic in PHP which serves no other purpose than filling that specially created block structure.

What happens is that you end up with a layer of PHP (or whatever other language) code that’s so closely tailored to the template (or even templates – the limitations of the block/replacement engines often require you to split a template into many partial files) that even the slightest changes in layout structure will require a rewrite in PHP.

Experience shows me that if you really intend to touch your templates to change the design, it won’t suffice to change the order of some replacements here and there. You will be moving parts around and more often than not the new layout will force changes in the different blocks / template files (imagine marker {mark} moving from block HEAD to block FOOT).
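A minimal sketch of that coupling, written in Python rather than PHP so it runs on its own. The `fill` and `render` functions, the block names and the markers are all made up for this example and do not mirror any particular engine; the point is only that the backend has to know which marker lives in which block.

```python
import re


def fill(block, values):
    """Replace every {marker} in a block; an unknown marker raises KeyError."""
    return re.sub(r"\{(\w+)\}", lambda m: str(values[m.group(1)]), block)


def render(blocks, data):
    # The backend hard-codes the marker-to-block assignment of layout v1.
    head = fill(blocks["HEAD"], {"title": data["title"], "mark": data["mark"]})
    foot = fill(blocks["FOOT"], {"copyright": data["copyright"]})
    return head + "\n" + foot


data = {"title": "News", "mark": "beta", "copyright": "2006"}

# Layout v1: {mark} sits in HEAD, which is what render() expects.
blocks_v1 = {"HEAD": "<h1>{title} ({mark})</h1>", "FOOT": "<p>{copyright}</p>"}
print(render(blocks_v1, data))

# Layout v2: the designer moved {mark} to FOOT. The data is unchanged,
# yet the backend code breaks until a programmer touches render().
blocks_v2 = {"HEAD": "<h1>{title}</h1>", "FOOT": "<p>{copyright} ({mark})</p>"}
try:
    render(blocks_v2, data)
except KeyError as missing:
    print("layout change broke the backend, missing marker:", missing)
```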

So if you want to work with the stripped-down template engines while still keeping the layout easily exchangeable, you’ll create layout classes in PHP which get called from the core. These in turn use tightly coupled code to fill the templates.

When you change the layout, you’ll dissect the page layouts again, recreate the wealth of template files / blocks and then update your layout classes. This means that changing the layout does in fact require your PHP backend coders to work with the designers yet again.

Take Smarty.

Basically you can feed a template a defined representation of view data (or even better: Your model data) in unlimited complexity and in raw form. You want to have floating point numbers on your template represented with four significant digits? Not your problem with Smarty. The template guys can do the formatting. You just feed a float to the template.

In other engines, formatting numbers for example is considered backend logic and thus must be done in PHP.

This means that when the design requirement in my example changes and numbers must be formatted with 6 significant digits, the designer is stuck. He must refer back to you, the programmer.

Not with Smarty. Remember: You got the whole data in a raw representation. A Smarty template guy knows how to format numbers from within Smarty. He just makes the change (which is a presentation change only) right in the template. No need to bother the backend programmer.
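The same idea as a small runnable sketch. Python's built-in `str.format` stands in for the template engine here, so this is not Smarty syntax; the model fields and the `render` helper are made up for the example. The backend hands over a raw float, and the number of significant digits lives in the template alone.

```python
# The backend passes raw model data and never formats numbers itself.
model = {"name": "Widget", "weight": 3.14159265}

# The template owns the presentation: ".4g" means four significant digits.
template_v1 = "{name}: {weight:.4g} kg"

# Redesign: six significant digits. Only the template text changes.
template_v2 = "{name}: {weight:.6g} kg"


def render(template, model):
    """Backend side: push the model into whatever template it is given."""
    return template.format(**model)


print(render(template_v1, model))  # Widget: 3.142 kg
print(render(template_v2, model))  # Widget: 3.14159 kg
```

Note that `render` and `model` are identical for both versions of the layout.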

Furthermore, look at complex structures. Let’s say a shopping cart. With Smarty, the backend can push the whole internal representation of that cart to the template (maybe after some cleaning up – I usually pass an associative array of data to the template to have a unified way of working with model data over all templates). Now it’s your Smarty guy’s responsibility (and opportunity) to do whatever job he has to do to format your model (the cart) in the way the current layout specification asks him to.

If the presentation of the cart changes (maybe some additional text info must be displayed that the template was not designed for in the first place), the model and the whole backend logic can stay the same. The template just uses the model object it’s provided with to display that additional data.

Smarty is the template engine that allows you to completely decouple the layout from the business logic.

And let’s face it: Layout DOES in fact contain logic: Alternating row colors, formatting numbers, displaying different texts if no entries could be found,…

When you remove logic from the layout, you will have to move it to the backend where it immediately means that you will need a backend worker whenever the layout logic changes (which it always does on redesigns).

Granted. Smarty isn’t exactly easy to get used to for an HTML-only guy.

But think of it: They managed to learn to replace <font> tags in their code with something more reasonable (CSS), that works completely differently and follows a completely different syntax.

What I want to say is that your layout guys are not stupid. They are well capable of learning the little bits and pieces of logic you’d want to have in your presentation layer. Letting them have that responsibility means that you yourself can go back to the business logic once and for all. Your responsibility ends after pushing model objects to the view. The rest is the Smarty guy’s job.

Being in the process of redesigning a fully Smarty-based application right now, I can tell you: It works. PHP does not need to get touched (mostly – design flaws exist everywhere). This is a BIG improvement over other stuff I’ve had to do before, which used the approach everyone calls clean: PHPLIB templates. I still remember fixing up tons and tons of PHP code that was tightly coupled to the limited structure of the templates.

In my world, you can have one backend, no layout code in PHP and an unlimited number of layout templates. Interchangeable without changing anything in the PHP code. Without adding any PHP code when creating a new template.

Smarty is the only PHP template engine I know of that makes that dream come true.

Oh and btw, Smarty won the performance contest in that article with a lot of distance to the second fastest entry. So bloat can’t be used as an argument against Smarty. Even if it IS bloated, it’s not slower than non-bloated engines. It’s faster.

PostgreSQL: Explain is your friend

Batch updates to a database are a tricky thing because of multiple aspects. For one, many databases are optimized for fast read access (though not as optimized as say LDAP). Then, when you are importing a lot of data, you are changing the structure of the data already in there which means that it’s very well possible that the query analyzer/optimizer has to change its plan in mid-batch. Also, even if a batch import is allowed to take a few minutes when running in the background, it must not take too long either.

PopScan often relies heavily on large bulk imports into its database: As the application’s feature set increased over time, it has become impossible to match all of the application’s features to a database which may already be running at the vendor’s side.

And sometimes, there is no database to work with. Sometimes, you’re getting quite rough exports from whatever legacy system may be working at the other end.

All this is what forces me to work with large bulk amounts of data coming in in one of many possible formats: Other databases, text files, XML files, you name it.

Because of a lot of bookkeeping and especially tracking of changes in the data to allow to synchronize only changed datasets to our external components (Windows Client, Windows CE Scanner), I can’t just use COPY to read in a complete dump. I have to work with UPDATE/INSERT which doesn’t exactly help at speeding up the process.

Now what’s interesting is how indexes come into play when working with bulk transfers: I have seen both now: Sometimes it’s faster if you drop them before starting the bulk process. Sometimes you must not drop them if you want the process to finish this century.

EXPLAIN (and top – if your postgres process is sitting there with constant 100% CPU usage, it’s full-table-scanning) is your friend in such situations. That and an open eye. Sometimes, like yesterday, it was obvious that something was going wrong: That particular import I was working with slowed down the more data it processed. We all know: If speed depends on the quantity of data, something is wrong with your indexes.

Funny thing was: There was one index too many in that table: The primary key.

The query optimizer in PostgreSQL thought that using the primary key for one condition and then filtering for the other conditions was faster. But it was dead wrong as the condition on which I checked the primary key yielded more data with every completed dataset.

That means that PostgreSQL had to sequentially scan more and more data with every completed dataset. Using the other index, one I specifically made for the other conditions to be checked, would always have yielded a constant number of datasets (one to four), so filtering by the PK condition after using that other index would have been much faster. And constant in speed even with increasing amounts of imported datasets.

This is one of the times when I wish PostgreSQL had a way to tell the optimizer what to do. To tell it: “Take index a for these conditions. Then filter by that condition.”.

The only way to accomplish that so far is to drop the index that was used by accident. It’s just that it feels bad, dropping primary keys. But here it was the only solution. In PostgreSQL’s defense, let me add though: My 8.1 installation took the right approach. It was the 7.3 installation that screwed up here.
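For readers who have never compared two plans: a small sketch of the workflow. To keep it runnable without a database server it uses SQLite's `EXPLAIN QUERY PLAN` from Python as a stand-in, so the output does not look like PostgreSQL's `EXPLAIN`, and it does not reproduce the planner misjudgment described above. Table, column and index names are made up, and the exact wording of the plan lines depends on the SQLite version.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of every row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, vendor_code TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO articles (vendor_code, name) VALUES (?, ?)",
    [(f"V{i}", f"item {i}") for i in range(1000)],
)

query = "SELECT * FROM articles WHERE vendor_code = ?"

# Without a suitable index the planner has to scan the whole table.
before = plan(conn, query, ("V500",))

conn.execute("CREATE INDEX idx_articles_vendor ON articles (vendor_code)")

# With the index in place the plan switches to an index search.
after = plan(conn, query, ("V500",))

print("before:", before)
print("after: ", after)
```

The habit carries over directly: run the slow statement through the plan command, change one index, and run it again before touching anything else.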

OK. So just drop the indexes when making a bulk import. Right? Wrong.

Sometimes, you get a full dump to import, but you want to update only changed datasets (to mark only the ones that actually changed as updated). Or you get data which is said to have a unique key, but which doesn’t. Or you get data which is said to have a foreign key, but which violates it.

In all these cases, you have to check your database for what’s already there before you can actually import your dataset. Otherwise you wrongly mark a set as updated, or your transaction dies because of a primary key uniqueness violation or because of a foreign key violation.

In such cases, you must not remove the index your database would use in your query to check if something is already there.

Believe me: The cost of updating the index on each insert is MUCH lower than the cost of doing a full table scan on every dataset you are trying to import ;-)

So in conclusion let me say this:
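The check-before-import pattern can be sketched like this. SQLite is used only so the example runs without a server; the `articles` table, its columns and the `changed` flag are assumptions for illustration and not PopScan's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    " vendor_code TEXT, name TEXT, price_cents INTEGER, changed INTEGER DEFAULT 0)"
)
# The lookup below depends on this index. Dropping it before the import
# would turn every single lookup into a full table scan.
conn.execute("CREATE UNIQUE INDEX idx_articles_code ON articles (vendor_code)")


def import_dataset(conn, code, name, price_cents):
    """Insert or update one dataset; flag it only if it really changed."""
    row = conn.execute(
        "SELECT name, price_cents FROM articles WHERE vendor_code = ?", (code,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO articles (vendor_code, name, price_cents, changed)"
            " VALUES (?, ?, ?, 1)",
            (code, name, price_cents),
        )
        return "inserted"
    if row == (name, price_cents):
        return "unchanged"  # nothing to synchronize to the clients
    conn.execute(
        "UPDATE articles SET name = ?, price_cents = ?, changed = 1"
        " WHERE vendor_code = ?",
        (name, price_cents, code),
    )
    return "updated"


first_dump = [("A1", "Bolt", 120), ("A2", "Nut", 80)]
first_run = [import_dataset(conn, *dataset) for dataset in first_dump]

conn.execute("UPDATE articles SET changed = 0")  # clients have synchronized

# A full dump arrives again, but only A2 changed and A3 is new.
second_dump = [("A1", "Bolt", 120), ("A2", "Nut", 95), ("A3", "Washer", 10)]
second_run = [import_dataset(conn, *dataset) for dataset in second_dump]

print(first_run, second_run)
```

Every dataset costs one lookup, so the whole import is only as fast as that lookup, which is why the index has to stay.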

  • Bulk imports are interesting. Probably even more interesting than complex data selection queries.
  • EXPLAIN is your best friend. Learn how to read it. Learn it now.
  • So-called “rules of thumb” don’t apply all the time.
  • There are few things in life that beat the feeling of satisfaction you get after staring at the output of EXPLAIN for sometimes hours and optimizing the queries/indexes in question countless times, when your previously crawling imports begin to fly.

Six years of Sensational AG

Six years and a day ago (I already made two posts yesterday, so this had to wait), we were at the Handelsregisteramt (the public office where you register companies here in Switzerland) where we officially founded the Sensational AG.

I even remember the weather (which is basically because I have a really hard time forgetting anything): It was one of the few days that summer where it didn’t rain (completely contrary to the current summer). There wasn’t much to see of the sun either, but it was hot and moist.

When we founded, one of us was still going to school and I was absorbed by something else, so we continued keeping our operation on a slow level.

On February 4th, 2001, we really took off.

By then school was over for all of us and my thing was over too. We moved into a real office (the team was working together even before we founded the real company – but then we all were still at school and working from home and from the school house). I set up the basics of our internal network (which still works today – even some hardware is the same, namely Thomas, a Thinkpad 390 or something like that which is a central gateway). The internet access was still over an ISDN line, but at least it was something. ADSL was not available back then.

Mid 2001, we developed a barcode scanning application on a specific customer’s request. This application is the foundation of PopScan – our current Big-Thing.

In the last five years of operations, we did a lot of interesting stuff. Sometimes risky, sometimes just interesting and sometimes really, really great. I myself migrated to Mac OS, we migrated our telephone system to VoIP, we were running quite a big internet portal, we developed applications from scratch for the web, for Windows and for PocketPCs. We moved office (inside the same building – even the same floor) and we finally hired two more people.

Looking back, we’ve come a long way while still being ourselves. And we managed to achieve incredibly much with just three (and now five) people.

Thanks Lukas, thanks Richard. It’s great to have this thing going with you!

Blogroll is back – on steroids

I finally got around to adding an excerpt of the list of blogs I’m regularly reading to the navigation bar to the right.

The list is somewhat special as it’s auto-updating: It refreshes every 30 minutes and displays a list of blogs in descending order of last-updated-time.

Adding the blogroll was a multi step process:

At first, I thought adding the Serendipity blogroll plugin and pointing it to my Newsgator subscription list (I’m using Newsgator to always have an up-to-date read-status in both Net News Wire and FeedDemon) was enough, but unfortunately, that did not turn out to be the case.

First, the expat module of the PHP installation on this server has a bug making it unable to parse files with the Unicode byte order mark at the beginning (in UTF-8, three bytes that mark the file as Unicode; in UTF-16, the mark tells your machine whether the document was encoded in little- or big-endian byte order). So it was clear that I had to do some restructuring of the OPML-feed (or patching around in the s9y plugin, or upgrading PHP).
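The workaround boils down to cutting the mark off before the parser sees the file. A minimal sketch in Python (my actual script is PHP; the `parse_opml` name and the sample document are made up for this example):

```python
import codecs
import xml.etree.ElementTree as ET


def parse_opml(raw):
    """Parse OPML given as bytes, dropping a leading UTF-8 byte order mark."""
    if raw.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        raw = raw[len(codecs.BOM_UTF8):]
    return ET.fromstring(raw)


sample = codecs.BOM_UTF8 + (
    b'<opml version="1.1"><body>'
    b'<outline text="Example" xmlUrl="http://example.org/feed"/>'
    b"</body></opml>"
)
root = parse_opml(sample)
print(root.tag, [o.get("text") for o in root.iter("outline")])
```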

Additionally, I wanted the list to be sorted in a way that the blogs with the most recent postings will be listed first.

My quickly hacked-together solution is this script which uses an RSS/Atom parser I took from WordPress, which means that the script is licensed under the GNU GPL (as the parser is).

I’m calling it from a cron job once every 30 minutes (that’s why the built-in cache is disabled on this configuration) to generate the OPML-file sorted by the individual feed’s update timestamp.

That OPML-file then is fed into the serendipity plugin.
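The sorting step, as a rough Python sketch rather than the PHP script itself. It assumes the update timestamps have already been read from the individual feeds; the `build_opml` name, the sample feeds and their dates are invented for illustration.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def build_opml(feeds):
    """Build an OPML document listing the most recently updated feed first.

    feeds is a list of (title, feed_url, last_updated) tuples.
    """
    opml = ET.Element("opml", {"version": "1.1"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "Blogroll"
    body = ET.SubElement(opml, "body")
    newest_first = sorted(feeds, key=lambda feed: feed[2], reverse=True)
    for title, url, _updated in newest_first:
        ET.SubElement(body, "outline", {"type": "rss", "text": title, "xmlUrl": url})
    return ET.tostring(opml, encoding="unicode")


# Invented sample data standing in for the parsed feeds.
feeds = [
    ("Quiet blog", "http://example.org/quiet.xml", datetime(2006, 7, 1, 9, 0)),
    ("Busy planet", "http://example.org/planet.xml", datetime(2006, 7, 20, 18, 30)),
    ("Weekly blog", "http://example.org/weekly.xml", datetime(2006, 7, 14, 12, 0)),
]
document = build_opml(feeds)
print(document)
```

Written to a file by the cron job, the result is what the blogroll plugin then reads.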

The only problem I now have is that the list is unfairly giving advantage to the aggregated feeds as these are updated much more often than individual people’s blogs. In the future I will thus either create a penalty for these feeds, remove them from the list or just plain show more feeds on the page.

Still, this was a fun hack to do and fulfills its purpose. Think of it: Whenever I add a feed in either Net News Wire or FeedDemon, it will automatically pop up on the blogroll on gnegg.ch – this is really nice.

On a side note: I could have used the Newsgator API to get the needed information faster and probably even without parsing the individual feeds. Still, I went the OPML-way as that’s an open format making the script useful for other people or for me should I ever change the service.