Syntax Highlighting II

The first time I looked for a way to include code snippets in my blog entries, I came up with a solution based on the service provided by tohtml.com. I blogged about it.

This approach had a few drawbacks:

  1. Highlighting is based on HTML tags. Hence editing the code is cumbersome, since the easiest way to do it is to take the code snippet back to tohtml.com, edit it, generate the HTML code again and replace it in the post.
  2. The generated HTML code doesn't use CSS. So customizing it is hard, and changing its style requires generating the code again.
  3. The code snippet is part of the article's text, so its width is limited. This affects readability and requires some custom line breaking.

So after I read the post How to embed snippets in your blog, I gave it a try and switched to GitHub's gists.

Using this approach overcomes the first and third drawbacks. And the second is not a serious problem, since code snippets now appear enclosed in a text box, so even if I changed the background to a dark color, snippets would still look good.

The only disadvantage is that I lose the possibility of highlighting parts of a code snippet (e.g. using a bold font or a yellow marker style for just one line), but I certainly like the way code snippets look in the blog now much more.


The rename command

I bought a new home NAS drive and started copying the bunch of pictures on my computer to it. That is when I ran into a problem: either Nautilus, SMB or the network drive's filesystem couldn't handle file names containing special shell characters.

I happened to have some old picture files that I initially stored on a Windows machine and later moved to my current Linux system. The pictures were taken during my semester abroad in Copenhagen, and I thought it would be nice to use the Danish name of Copenhagen for the picture file names. So the names of these files initially looked like

København-21042003 <three digits>.jpg

and after moving them to my Linux ext3 drive they looked like

K?benhavn-21042003 <three digits>.jpg

which has a question mark ('?') instead of the Danish letter 'ø'.

Besides the fact that this kind of file name becomes a shell issue, it's been a while since I decided to avoid file names containing spaces and/or letters that are not in the English alphabet, e.g. my beloved 'ñ'. It's a simple rule which is not a pain to follow and just makes life easier.

I could have renamed all the files manually, one at a time, using the Nautilus interface (right click menu). But considering that there were more than 50 files, I thought it might be easier to use the shell.

I remembered that a while ago, at work, I had to write a script that copied some files and renamed them by substituting a pattern in the file names, like

literalfoo.ext -> literalbar.ext

and I used a variable to store the file name, edited its value using sed, and then actually copied the files. So my first thought was to do the same, moving instead of copying. But I suspected there should be a tool for easily renaming multiple files, and I googled it.

That's how I found this article on the Debian Administration site and learned about the rename command. I just had to use this command line

$ rename 's/K.*-(.*\.jpg)/Koebenhavn-\1/' *.jpg

to turn all the file names into

Koebenhavn-21042003 <three digits>.jpg

And the best thing about it was the '-n' option that performs a dry run that just prints out what the command would do, so it lets you try and refine your regular expressions if you are not too familiar with them (as is my case).
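
The effect of that regular expression can also be checked outside the shell. Here is a minimal C++ sketch, assuming the broken character really shows up as a literal '?'; the function name preview_rename is made up for illustration, and nothing is renamed on disk:

```cpp
#include <regex>
#include <string>

// Mimics the substitution s/K.*-(.*\.jpg)/Koebenhavn-\1/ on one file name.
// Like `rename -n`, it only computes the new name; no file is touched.
std::string preview_rename(const std::string& old_name) {
    // "K.*-" matches from the 'K' up to the dash;
    // the group captures the rest, which must end in ".jpg".
    static const std::regex pattern(R"(K.*-(.*\.jpg))");
    // "$1" is how std::regex_replace spells Perl's \1.
    return std::regex_replace(old_name, pattern, "Koebenhavn-$1");
}
```

Calling preview_rename("K?benhavn-21042003 001.jpg") gives back the name with the Koebenhavn- prefix, and a name that doesn't match the pattern comes back unchanged.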


Selecting the right key value for the Muenchian method

I recently faced the problem of grouping elements using XSLT. So I had to learn and understand what the Muenchian method is about, since it seems to be the de facto solution for grouping in XSLT due to its good performance (or the bad performance of the alternative solutions).

I had to transform a Subversion XML verbose log to get a list of changed files for a given issue. Here is a sample document:

One issue may comprise several SVN revisions, which could (and usually do) affect the same files. The file above illustrates this situation. A simple template matching the issue's log entries and copying all the path nodes ends up with a list containing repeated items.

The transformation used:

Transformation's result using 'ISSUE-2' as value for 'ticket' parameter:

So the Muenchian method solved the repeated items problem.

Transformation using Muenchian method to create just one entry for each path:

Transformation's result using 'ISSUE-2' as value for 'ticket' parameter:

No repeated items in the list, right. But there are missing files! Adding some debugging code to the transformation showed that the missing files were those which also appeared in previous log entries for other issues in the input file. This is pretty clear in this example, but when working with real data and a larger input file, it took me some hours to realize what was actually going on.

The expression generate-id() = generate-id(key('path-key', text())[1]) evaluates to false for these nodes, since the first node with the same key is outside the node set to which the template is applied.

The Muenchian method does not work so simply for grouping child nodes of nodes that are filtered using a transformation parameter. So the first approach to solve the problem was to use two transformations and write a short shell script to call xsltproc twice in a piped chain.

First transformation takes a ticket parameter and copies just the interesting log entries:

Second applies the Muenchian method to get just unique entries for affected paths in log entries:

Transformation's result using 'ISSUE-2' as value for 'ticket' parameter for the first transformation:

This approach worked perfectly, but I kept thinking about a possible solution using a single transformation, and eventually found it using a different, slightly more complex key. The problem was that the key I was using to identify the unique entries had to consider the ticket as well as the path.
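
The difference between the two keys can be mimicked outside XSLT. The following is only a C++ sketch of the idea, not the transformation itself: the log data, the struct PathNode and the function names are all made up for illustration. In XSLT 1.0 a composite key like this is commonly built by concatenating both values with concat(), but that is an assumption about the general technique rather than a description of any particular stylesheet.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One path node of the log, flattened together with the ticket of its log entry.
struct PathNode {
    std::string ticket;
    std::string path;
};

// Made-up log: /trunk/a.cpp is touched first by ISSUE-1 and later by ISSUE-2.
std::vector<PathNode> sample_log() {
    return {
        {"ISSUE-1", "/trunk/a.cpp"},
        {"ISSUE-2", "/trunk/a.cpp"},
        {"ISSUE-2", "/trunk/b.cpp"},
        {"ISSUE-2", "/trunk/b.cpp"},
    };
}

// Key on the path alone: a node is kept only if it is the first node in the
// WHOLE log with that path. Paths first seen under another ticket get lost.
std::vector<std::string> unique_paths_path_key(const std::vector<PathNode>& nodes,
                                               const std::string& ticket) {
    std::map<std::string, std::size_t> first;  // path -> index of first node
    for (std::size_t i = 0; i < nodes.size(); ++i)
        first.emplace(nodes[i].path, i);       // emplace keeps the earliest index

    std::vector<std::string> result;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (nodes[i].ticket == ticket && first.at(nodes[i].path) == i)
            result.push_back(nodes[i].path);
    return result;
}

// Key on (ticket, path): "first" now means first within the same ticket.
std::vector<std::string> unique_paths_composite_key(const std::vector<PathNode>& nodes,
                                                    const std::string& ticket) {
    std::map<std::pair<std::string, std::string>, std::size_t> first;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        first.emplace(std::make_pair(nodes[i].ticket, nodes[i].path), i);

    std::vector<std::string> result;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (nodes[i].ticket == ticket &&
            first.at(std::make_pair(nodes[i].ticket, nodes[i].path)) == i)
            result.push_back(nodes[i].path);
    return result;
}
```

With this sample data and 'ISSUE-2' as the ticket, the path-only key reports just /trunk/b.cpp, because /trunk/a.cpp was first seen under ISSUE-1, while the composite key reports both files once each.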

Here is the transformation:


Requirements bloat fighting

Requirements proliferation must be fought, by both birth control and infanticide.
Frederick P. Brooks. The Design Of Design. 2010. 


Experience and promotion

I recently came across a couple of articles on how promotion and experience are managed in two different (maybe not so much) software development companies.

The first of them is Why I ran a flat company by Jason Fried. I like his position on keeping the organization as flat as possible and the idea of self-managing teams. But what I like most is what he calls horizontal ambition: "We always try to hire people who yearn to be master craftspeople, that is, designers who want to be great designers, not managers of designers; developers who want to master the art of programming, not management." I find this quote expresses the concept quite clearly.

The second article is Why I never let employees negotiate a raise by Joel Spolsky. He explains the compensation policy that they follow at his company and the rationale behind it. He defines a salary scale based on experience, scope of responsibility and skill set. So for each employee, three factors are measured and his/her final level defines the salary.

There is this quote that made me nod my head: "If you worked as a receptionist for six years, for example, you aren't credited with six years of experience; I give you credit for one year." It somehow backs up what I think about experience.


Paternity leave

So it's been a while since I wrote my last post here. The reason is that I've been too busy since February 20th, when my daughter was born. My wife and I have been taking good care of her and that, as any parent already knows, requires a lot of time and energy.

This blog is a personal project, and thus one of the first things I had to set aside. It's been almost two months since then.

I'm glad that I had the opportunity to invest all that time and energy in my family. It's been hard to adapt to the new situation. Now it's time to get back to it.



I recently discovered Mimi & Eunice, a comic strip by Nina Paley. It's really inspired most of the time. I really love the simplicity of the drawing and the sharp dialogs.

I felt curious about it and clicked the site's About link and read this:

Q. I have a great idea for a cartoon. Will you draw it for me?

A. No.

I enjoy the straightforwardness in the answer. You just can't say you don't understand or took it wrong. Easy. Simple. Perfect.

It also reminded me of a Rework essay: "Draw a line in the sand". I find Nina Paley's answer draws a pretty clear line in the sand.


Smoke Tests

I learned what smoke tests are. A software smoke test is the test, or set of tests, run first on a new build to make sure the program performs some basic actions and is therefore ready for more demanding testing.

I've been thinking about it. In some development projects I've been involved in, the testing team did not have any smoke testing. But the development team did perform some basic testing before handing a new build to the testing team: just run the program and check that it displays basic data on screen when fed test input.

In a different project, the development included unit tests that were compiled and run within the build process, making the build fail if any test failed. Skipping a smoke test suite on a new build is not a big deal when there is a good set of unit tests compiling and running together with the application build.

Unit testing lowers the probability of a fatal failure on a new build, but it doesn't eliminate it.

The origin of the name for these tests is funny:

The phrase smoke test comes from hardware testing. You plug in a new board and turn on the power. If you see smoke coming from the board, turn off the power. You don't have to do any more testing.

Kaner, Bach, and Pettichord. "Lessons Learned in Software Testing".
Wiley Computer Publishing, 2002, p. 95.


Syntax highlighting

In my last post, I included some C++ code snippets. I had always wondered how to provide syntax highlighting when posting some code, so a quick Google search for html syntax highlighting led me to tohtml.com.

It's a really useful and simple web app for the occasional code blogger. I had been afraid that I would have to manually edit the blog's template CSS to achieve it.


Loops on STL containers

C++ and the STL are my everyday working tools. I have got used to using the same construction to iterate through the elements in an STL container:

I've seen most people use a for instead of a while loop:

I find the following reasons to prefer the first over the second option:
  1. The statement for the for loop is too long and usually needs to be split into 2 (ugly) or 3 (same as while) lines.
  2. In the while option, operations are performed on the variable Object instead of the iterator itElement, yielding a cleaner and more readable coding style. This becomes a greater advantage when the container is a map instead of a vector.
  3. If some elements must be removed from the container while iterating, the usual for loop, which increments the iterator unconditionally in its header, is not an option, whereas the while option is the way to go.
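
A minimal sketch of the two constructions and of the removal case may help to compare them. The names Object and itElement are the ones used above; the vector of int and the function names are assumptions made just for illustration.

```cpp
#include <vector>

// The while construction: the iterator only drives the loop,
// the body works on the reference Object.
int sum_with_while(const std::vector<int>& values) {
    int total = 0;
    std::vector<int>::const_iterator itElement = values.begin();
    while (itElement != values.end()) {
        const int& Object = *itElement;
        total += Object;
        ++itElement;
    }
    return total;
}

// The for construction: same traversal, everything packed in the loop header.
int sum_with_for(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::const_iterator itElement = values.begin();
         itElement != values.end(); ++itElement) {
        total += *itElement;
    }
    return total;
}

// Removing while iterating: advance only when nothing was erased,
// and continue from the iterator returned by erase().
void remove_negatives(std::vector<int>& values) {
    std::vector<int>::iterator itElement = values.begin();
    while (itElement != values.end()) {
        const int Object = *itElement;
        if (Object < 0) {
            itElement = values.erase(itElement);
        } else {
            ++itElement;
        }
    }
}
```

The removal case is where the unconditional ++itElement of the for header gets in the way: after erase() the old iterator is invalid, so the loop has to continue from the iterator that erase() returns.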


C++ exceptions

I've been updating some library code to add new functionality lately. This involved some refactoring and moving classes through namespaces.

So I've been thinking about a clean way to use and manage exceptions. I like the approach of libxml++: it is clean, simple and functional. But I find it hard to emulate this approach when writing my own libraries.

These are my guidelines for code design regarding exception handling:
  1. Define a general exception class deriving from std::exception for the whole library.
  2. Derive a new exception class for each class whose methods may throw exceptions. These are declared and defined in the same header and definition files as the class throwing them.
  3. Derive a new class for each specific error type.
  4. Avoid throwing exceptions defined for member classes: members are implementation details, and so are the exceptions they throw.
  5. Avoid adding extra data such as stack info to exceptions; this should be handled by throwing a different type of exception.
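
The first three guidelines could look roughly like this. It is only a sketch: the namespace mylib, the Parser class and all the exception names are hypothetical, and deriving the specific error from the per-class exception is just one way of reading guideline 3.

```cpp
#include <exception>
#include <string>

namespace mylib {  // hypothetical library name

// Guideline 1: one general exception for the whole library.
class Exception : public std::exception {
public:
    explicit Exception(const std::string& message) : message_(message) {}
    const char* what() const noexcept override { return message_.c_str(); }

private:
    std::string message_;
};

// Guideline 2: one exception per class whose methods may throw,
// declared in the same header as that class.
class ParserException : public Exception {
public:
    explicit ParserException(const std::string& message) : Exception(message) {}
};

// Guideline 3: one class per specific error type.
class UnexpectedTokenException : public ParserException {
public:
    explicit UnexpectedTokenException(const std::string& message)
        : ParserException(message) {}
};

class Parser {
public:
    // Returns the value of a decimal digit; throws
    // UnexpectedTokenException for any other character.
    int parse_digit(char c) const {
        if (c < '0' || c > '9')
            throw UnexpectedTokenException(std::string("unexpected token: ") + c);
        return c - '0';
    }
};

}  // namespace mylib
```

With this layout the caller chooses the level of detail: catch mylib::UnexpectedTokenException for one specific error, mylib::ParserException for anything coming from Parser, or mylib::Exception for anything coming from the library.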