Computers Category

Relating to computers and technology

RSS feed for this category

01.21.10

Downsides to Dogfooding

Posted in Software Development at 3:54 pm

My pal Mars Saxman had some interesting things to say against the development paradigm of dogfooding, using this related article at jalf.dk as a springboard. I have to say, there are some excellent points.

10.26.08

Git vs Mercurial

Posted in Software Development at 3:45 pm

I just got out of a point-by-point comparison of Git and Mercurial, presented by one developer from each project (simultaneously). Both are DVCSs that are popular among Free and Open Source Software projects.

The presentation pretty much confirmed my existing convictions about the two projects: they are extremely comparable in terms of the features provided, though of course there are differences in the approach that can make a big difference for people’s preferences. But the major difference seems to be that Mercurial has significantly better documentation, is easier to use overall (at least for the most common tasks), and runs better on Windows.

The better performance on Windows was the main reason I chose to migrate GNU Wget to Mercurial rather than Git. The ease-of-use, and similarity to Subversion, is why I prefer to use it in my own personal projects, as well.

When I had to choose a DVCS for GNU Screen, on the other hand, I went with Git. This was mainly because I believed it to be essentially equal in power to Mercurial (which I feel was confirmed by this presentation), was of no use to Windows users (at least, other than ones running a Unix environment such as Cygwin), and would be the more familiar DVCS to the folks most likely to be interested in developing it (several other important GNU projects, such as coreutils, gnulib, autoconf and automake, are managed with Git).

10.10.08

Adventures in Haskell, Part 2: Kewlness

Posted in Software Development at 5:12 pm

This is a continuation on part one, which was a very basic introduction to functional programming. In this portion I’m pointing out some aspects of Haskell which I find to be fairly kick-ass. :)

My apologies if 80-90% of this post is gibberish to you. I try to be clear, but I’m also trying not to spend a copious amount of time writing this post. Think of it as a rather detailed “best things about Haskell” list, rather than anything remotely like a tutorial, or something that otherwise gives you an actual decent understanding of Haskell as a language. The idea is to try to whet your appetite to investigate further.

Death to Parentheses

Well, the first thing that impressed me is going to be small hat to folks whose primary experience with functional programming is with languages other than Lisp or Scheme, or to folks who have no FP experience at all; but Lisp is notorious for its excessive use of parentheses. Every function call gets its own pair, and since invoking functions is a frequent occurrence in functional programming, you get quite a lot of them. Even better, when you’re done with all your nested function calls, you’re left to figure out how to place the dozen or so closing parentheses. Here’s an example of the crappy powers function from part one of this miniseries, in Lisp:

(defun powers (i v)
    (if (= i 8) ()
                (cons v (powers (+ i 1) (* v 2)))
    )
)

For this example, it’s not too terrible; but it’s a simple function, and already we end up with five successive closing parentheses at the end. Imagine how more complex code might look! Still, you get used to it; and at least there’s never a question of what’s an argument of what.

The parens tend to bunch up when you nest function calls:

(reverse (mapcar '3* (filter 'oddp (number-sequence 1 10))))

The example above includes non-standard functions (3* and filter) whose definitions I have not provided (they’re trivial); it takes a list of numbers from 1 to 10, then filters the list so it’s just the odd numbers in that range (1 3 5 9), multiplies each element of the list by three, and then reverses the list (largest to smallest). You may note that it’s sometimes easier to read expressions like the above from right to left.

In Lisp there’s no way to avoid these parens in a series of nested function calls. In Haskell, you could also write something similar to the above…

reverse (map (3*) (filter odd [1..10]))

However, Haskell provides a “dot” operator, which applies the function on the left to the output of the function on the right, so you can also write this snippet like:

(reverse . map (3*) . filter odd) [1..10]

The outer set of parentheses are necessary in the above example, because the dot operator takes a function on either end; and the result of filter odd [1..10] (which binds more tightly than the dot operator; function applications always bind more tightly than operators) is not a function, it’s a list result. You might wonder how filter odd and map (3*) can be functions; they look more like the result of a function applied to an argument. And they are: one of Haskell’s great features is partial function application: you can pass one argument to a function that takes two, and the result is a function that takes the “second” argument. The (3*) is actually a partial application too: in the Lisp example, I would have had to define a function named 3* to make it work; but in the Haskell example (3*) is a partial application of the * operator.

I like to read the dot operator as “of”. So you can read (reverse . map (3*) . filter odd) as” reverse the order of mapping the multiply-by-three operation across the elements of the result of filtering for just the odd-numbered values of…”

We can even get rid of those outer parentheses, too, if we want:

reverse . map (3*) . filter odd $ [1..10]

The dollar operator takes a function on the left and applies it to everything on the right as a single argument. It’s right-associative, so we actually could also write this expression as:

reverse $ map (3*) $ filter odd $ [1..10]

The only set of parentheses we can’t get rid of is the ones in (3*), where they are used in the partial application of the * operator, called a section. Parentheses are part of the syntax required to designate a section. Of course, if we really wanted, to, we could just write a function that does the same thing, and name it mulThree or what have you.

Personally, I like this one best:

(reverse . map (3*) . filter odd) [1..10]

Function Definitions and Pattern Matching

In part one of this overview, we presented this version of our powers function in Haskell:

powers i v | i >= 0 && i < 8  = v : powers (i+1) (v*2)
           | otherwise        = [] 

The truth is, though, we don’t need both i and v. One value could be deduced from the other. I just kept both parameters because they’d been needed in C (they could each be deduced from the other in C as well, but at a performance penalty; there are other ways to write the example in C without using i; but the example was written for readability).

A common way to write function definitions in Haskell, is to first write a definition of the function for a concrete value, and then define the function for other values in terms of that first definition. For instance, we could’ve written powers like:

powers 0 = 1 : powers 1
powers 8 = []
powers n = 2 * head (powers (n-1)) : powers n+1

Note that this assumes no one ever calls powers 9; and also that powers now takes one argument instead of two (we dropped v).

This series of definitions means that

  • “powers 0” is a list whose first element has the value 1, and whose remaining elements are defined by “powers 1”
  • “powers 8” is the empty list.
  • “powers n” for any other value n is a list whose first element is twice the value of the first element of “powers (n-1)”, and whose remaining values are defined by “powers (n+1)”.

This winds up giving us the same list result for powers 0 as we previously got for powers 0 1 in our original definition of the function.

Note that there’s just one function powers, but there are three definitions for that function. These definitions combine to provide the complete description of what powers means.

Pattern Matching

We’ve just made a slight use of a very powerful Haskell feature, pattern matching. In this case, we’re matching arguments to powers against various possibilities of 0, 1, or n (where n can be any value). But pattern matching can actually be performed against any sort of value or partial value.

For instance, if we wanted to write a very simplistic function to convert a string into 1337-speak, transforming

The quick brown fox jumped over the lazy dog

to

Th3 quick br0wn f0x jump5 0v3r 7h3 14zy d0g.

we could write it like:

leetize ""       = ""
leetize ('l':xs) = '1' : leetize xs
leetize ('e':xs) = '3' : leetize xs
leetize ('a':xs) = '4' : leetize xs
leetize ('s':xs) = '5' : leetize xs
leetize ('t':xs) = '7' : leetize xs
leetize ('o':xs) = '0' : leetize xs
leetize (x:xs)   = x : leetize xs

The above could have been written somewhat more tersely, but it’s a good demonstration of using pattern-matched function definitions. The (x:xs) patterns describe a list (: is the list-construction operator), whose first element is x and whose remaining elements are the (possibly empty) list xs. The first definition in this example simply says that applying leetize to an empty string will yield an empty string as the result. The following definition means that if the leetize function is applied to a string whose first character is the letter ell, then the result is the one digit, followed by the application of leetize to the remaining characters of the string. The final definition matches (x:xs), for any string beginning with an arbitrary character x that doesn’t match any of the previous patterns; it just leaves that character as-is, and continues to transform transformations on the remaining characters.

Roll your own operators

Another thing that I think is really cool in Haskell is the ability to invent or redefine your own operators. Languages like C++ let you override the definition of existing operators for new types, but Haskell lets you invent entirely new operators. Want to define a new operator *!? It’s as easy as writing

x *! y = x * y / 2

You can also use normal functions as if they were operators. This is common with functions like elem, where elem x mylist evaluates to True if x may be found as an element in mylist. While elem x mylist is fine, the meaning of x `elem` mylist seems just a little clearer.

For both operators and functions-applied-as-operators, you can specify associativity and precedence. (You can only define binary operators; all operators in Haskell are binary operators, except for unary -, as used in expressions such as -3.)

Lazy Evaluation

One of the things that many people find really cool about Haskell (not just me!), is that it follows a “lazy” or “non-strict” evaluation scheme. The value of an expression is only calculated when it is required. This has some rather interesting consequences.

One consequence is that Haskell allows you to define lists that don’t end. Among other things, this means I’ve been making our running powers example way more difficult than it needs to be. Instead of defining a function that takes some starting value and keeps doing calculations until it reaches a maximum, I can have it just keep calculating that result forever:

powers = 1 : map (*2) powers

Note that this latest powers doesn’t need any arguments at all. It’s just bound to the set of all 2n, n≥0. That’s a list without a limit; but that doesn’t matter, so long as we never try to evalute the entire list. But if I ask Haskell to give me just the first eight elements of that list (with take 8 powers), it works just fine. Haskell performs the necessary calculations to bring me the first eight elements, and doesn’t continue trying to calculate any further elements.

Infinite-length lists are fun, but when you come down to it, they’re really just neat sugar. One area where Haskell’s lazy evaluation really comes in handy is in code like:

import Char (toUpper)
main = do
            slurp <- getContents
            let upperCased = map toUpper slurp
            putStr upperCased

This is a complete program to convert the standard input to all-uppercase, and print the results to the standard output. It binds the variable slurp to the full contents from standard input as a string, then binds the variable upperCased to a string which holds the result of mapping the toUpper function against each character in slurp. Finally, the string held in upperCased is printed to the standard output using putStr.

Similar code in Perl might look like:

undef $/;
my $slurp = <STDIN>;
my $upper_cased = uc $slurp;
print $upper_cased;

The difference is that the Perl example will happily try to take a 5GB file and shove it all into memory (or, much more likely, make the attempt and choke), whereas the Haskell version, which looks nearly identical, will process input a bit at a time—how much of a bit depends on the buffering scheme, which depends on the implementation—but the point is, it’s not going to try to slurp it all into memory. It’ll process data a bit at a time, converting it to uppercase, and spitting it out. It only grabs input when it’s ready to print out the next few characters. Then it looks at what it’s printing—upperCased, and sees that it’s supposed to hold the uppercasing of slurp. It then sees that slurp is supposed to be a string holding the content from standard input, so it goes ahead and reads the next bit from standard input, uppercases it like it’s supposed to… rinse, lather, repeat.

What’s neat about this is that it frees you from having to think about the details. Your code doesn’t need to “know” that it’s dealing with input piecemeal: as far as it’s concerned, it’s just transforming a string. Naturally, Perl is just as capable as Haskell at processing input a piece at a time; the difference is that the Perl code will be very aware that it’s doing things one bit at a time, while the Haskell code “thinks” it’s just working on the whole data.

Note, the Haskell example is for illustration. A much simpler program that does the same thing is:

import Char (toUpper)
main = interact (map toUpper)

Monads

You may have noticed that the original version of the uppercasing program looked somewhat procedural-ish. The do keyword introduces a block of code that is essentially procedural; but it never for a moment violates the rules of a pure functional language. The behavior of do is defined in terms of a series of function applications and nested name-bindings. The important thing is that Haskell preserves the principle of non-destructive, immutable data, and do doesn’t let you do anything you can’t do without it; it just looks much cleaner than its equivalent.

What’s cool about the do block, though, is that it actually provides a mechanism that many people refer to as overloading the semicolon. The do block is defined in terms of a couple of operators, >> and >>=, that the user can overload in the types phe creates. All types that can be used as the types of the expression-statements in a do block, and as a do block’s own type, belong to a class called Monad.

For instance, take the Maybe parameterized type (parameterized means it takes another type as a parameter; for instance, it can be a Maybe String or a Maybe Integer). This is a type provided by the Haskell standard library, that provides a way to represent a “null”-like value. For instance, an expression that returns a Maybe String can construct return values such as Just "hello" or Just "what is your name?"; it can also use the constructor Nothing, to represent no value (often used to indicate an error condition). For instance, Haskell provides a find function that takes two arguments: a predicate function, and a list. If it finds an element a that matches the predicate in the list, it will return Just a; if it doesn’t find anything, it returns Nothing. So, find (< 2) [1 5 3 9 8] will return Just 1; but find (> 10) [1 5 3 9 8] would return Nothing.

So, what happens if you need to use the result from several successive finds? Say, to find the customer ID corresponding to a name, and then use that to find a sales record? You could end up with successive, nested if expressions to check for Nothing before proceeding. Rough pseudo-Haskell:

getSalesRecord name =
    let custId = find (== name) customers
        record = find (== custId) salesRecords
    in
        if isJust custId
            then if isJust record
                then Just record
                else Nothing
            else Nothing

(There are severe glitches in the above, but it was written for human readability, and not actual compilability.) Note that it’s safe to bind record as above, even though we haven’t yet checked isJust record, because it’s not actually evaluated until we use it, and we don’t use it until after we check.

Imagine if there were even more dependent lookups we needed to perform! The nested if expressions would quickly become unweildly. However, since the Maybe type is a Monad, we can write it with the do block, like this:

getSalesRecord name =
    do
        custId <- find (== name) customers
        record <- find (== custId) salesRecords
        return record

The Maybe type overloads the meaning of the name bindings done with <- (actually, it overloads >>=, but <- is defined in terms of >>=). If the result of the right-hand side is a Just x (for any x), it binds custId to x (stripping off the Just) for the remaining statements. If either of the two <- statements result in a Nothing value, then the remaining statements are not processed, and the result of the entire do block is Nothing.

I was at first dismayed when I discovered that Haskell’s exception mechanisms are not as powerful as one might hope. However, as it turns out, it doesn’t actually pose a problem, because you can just use Maybe and do blocks, as in the example above. The first <- binding or statement that evalutes to Nothing will short-circuit the rest of the code, and eliminates the need to manually check for error-status results after each statement!

This “semicolon-overloading” mechanism is really very powerful. The do block provides a mechanism for compactly and abstractly representing a sequence-of-statements, where a statement is either a Monad-returning expression or a name-binding; but the monadic type gets to decide what “sequencing” these things actually means.

A great example of the power of Monads is that simple lists are also Monads. But with list types, the name-binding <- notation is overloaded so that x <- someList will result in x being set to each element of someList in turn, and then all the remaining statements that appear after the <- binding statement in the same do block will be evaluated in turn, for each x. The result of the do block will be the list of each of those results. As an example, here’s a simple permuation of two lists:

example = do
    word <- ["foo", "bar"]
    num  <- [1, 2, 4]
    return (word, num)

-- evaluates to: [("foo",1),("foo",2),("foo",4),
--                ("bar",1),("bar",2),("bar",4)]

As perhaps you’re beginning to see, Monads are an incredibly powerful feature of Haskell. In fact, Monads combined with Haskell’s lazy, evaluate-as-needed behavior is what enables Haskell to be capable of elegantly dealing with interactive with user input, without sacrificing its status as a pure functional programming language. Interactivity (such as might be found in a typical text-editor) is not possible in languages that are purely functional (immutable data), and use strict (non-lazy) evaluation, because such languages require that they have all the input at the beginning before processing can even begin.

String Literals

I haven’t seen this emphasized anywhere, and it’s a small thing, but one more aspect of Haskell that strikes me as particularly elegant is the ways in which Haskell deals with string literals.

One of the things I really like, is that Haskell’s string literals can include spans of whitespace that are ignored, when they are enclosed within backslashes; this whitespace can include newlines. So you can write something like:

main = putStr " _   _      _ _                             _     _ _ \n\
              \| | | | ___| | | ____      _ _  ______ _ __| | __| | |\n\
              \| |_| |/ _ | | |/ _  |    | | |/ / _  | '__| |/ _` | |\n\
              \|  _  |  __/ | | (_) |    |   V / (_) | |  | | (_| |_|\n\
              \|_| |_|____|_|_|____( )   |_/|_/|____/|_|  |_|___,_(_)\n\
              \                    |/                                \n"

The large “Hello, World!” will be printed flush-left, but we were able to align the literal rather nicely in the Haskell source. This sort of thing can be a little more difficult in some other languages.

Also, in addition to the usual C-style \r, \n, \b, etc, and hexadecimal and octal notations (which differ slightly from C’s), Haskell also provides a \^C notation (where '\^C' represents “Control-C”), and a \ESC notation (representing the escape character, using its ASCII acronym). Thus, the ASCII backspace character can be represented as any of '\b', '\BS', '\^H, '\o33', or '\x1B'. Every one of those representations can be useful in different circumstances, especially for a terminal enthusiast and GNU Screen co-maintainer such as myself.


Well, that pretty much sums it all up. This took a lot longer to write than I really wanted it to, so maybe I’m finished talking about Haskell for a while. In presenting all the things that really impress me about Haskell, I’m perhaps painting an unrealistically rosy picture of the language. I do believe that Haskell, and its current best implementations, have some significant shortcomings (but none that actually render Haskell impractical, as is sometimes suggested); if I get around to writing a “part 3″, I’ll probably dwell on those, to counter-balance the wondrous things I covered here. However, that one might be a while in coming, as I think I’ve spent quite enough time writing about Haskell for a limited audience, for now. ;)

09.26.08

Adventures in Haskell

Posted in Software Development at 7:47 pm

So, the last few weeks I’ve been learning the Haskell programming language, and thought I’d share my thoughts on it. Haskell is a pure functional programming language that supports lazy evaluation (I’ll explain all that in a moment), which has been gaining popularity in some circles (and particularly among academia and computer lingusts). It has (semi-)famously been used to write an implementation for the Perl 6 programming language, and also a distributed revision control system (DaRCS).

I’d heard of the language a couple years ago, and have been wanting to learn it since then, but the freely available resources I’d been able to find for learning about the language were frustratingly poor, and printed books were in the range of $80 and up, which is a bit steep for learning a language as a hobby. I finally found an upcoming O’Reilly book, Real World Haskell, whose full content is available for browsing online. Amazon says the printed version’s going to be $50, which is a marked improvement over other books I’d been considering buying. The book’s not perfect—there are several minor beefs I have with it—but the fact remains that it is the highest quality resource for learning Haskell that is freely available. It also has the major advantage that it focuses heavily on real-world applications (as the name implies), including coverage of network programming, and concurrency (multi-threaded or clustered computing), writing language parsers; it even guides the reader through writing software to retrieve ISBN numbers from cell-phone photos of book backs! This probably makes it the most practical resource for learning Haskell, too—pricey or free.

I’m still working through the book, but I’ve supplemented my understanding so far by reading through the official language specification, The Haskell 98 Report, so at this point I’ve gained a pretty solid understanding of the core language and minimum libraries.

Intro to Functional Programming

(If you’re fairly familiar with functional programming, might as well skip this article and wait for part 2.)

I’ve had previous experience with functional programming languages, mainly Lisp and XSLT, which helped in trying to learn Haskell.

Lisp is one of the more popular functional programming languages , and probably the oldest. My experience with it is primarily through the Emacs-Lisp dialect. Emacs is a very powerful text editor that is popular in the Unix world (and particularly on GNU/Linux). It is mostly implemented in Emacs-Lisp, and you can alter the program’s behavior while it’s running by editing the Lisp code. It’s the first Unix editor I learned to use, and though I primarily prefer Vim these days, I’m very comfortable with Emacs (these days I run it in a vi-emulating mode called viper), and have written code in Emacs-Lisp.  I’ve also dabbled in Scheme, a Lisp dialect.

XSLT is a language for performing transformations on XML documents, and like Haskell but unlike Lisp is a “pure” functional programming language (more on that in a moment). Unfortunately, it doesn’t really provide a complete set of useful tools (version 1.0, anyway: I’m not very familiar with the XSLT 2.0 spec), so my experience with it was that writing XSLT can be an exercise in frustration.

In imperative programming languages like C, C++, BASIC, Java, and many others, the focus is on performing a series of actions: do A, then do B, etc. In imperative programming, one often deals with data in memory that can be modified through a series of actions (mutable data), and the behavior of bits of code in the program may depend on the current value of some particular piece of data in memory, which may be changed by other bits of code.

Here’s a simple example of imperative programming in C code:

int i, a[8], v=1;

for (i=0; i<8; i = i + 1) {
    a[i] = v;
    v = v * 2
}

This snippet produces an eight-element array a whose elements hold successive powers of 2 (1,2,4,8,16,32,64,128). The code amounts to a series of instructions like:

  1. set the value of v and i to 0
  2. set the ith element of the array a to the current value of v
  3. set the value of v to twice its previous value.
  4. set the value of i to one more than its previous value.
  5. if the value of i is less than 8, repeat from step 2.

Notice the heavy focus on performing a series of steps, and a dependency on modifying data.

In contrast to imperative programming, functional programming focuses on transforming input data into output data, by defining the function that performs the transformation. In a “pure” functional programming language, it is not possible to change the value of some object of data from one thing to another; in fact, there are no objects—there are only values (and functions; immutable data). This tends to place a higher focus on what the desired result is, rather than how we want to arrive at a given result.

Here’s an example of some code in Haskell that produces a result similar to the imperative snippet above:

powers i v | i >= 0 && i < 8  = v : powers (i+1) (v*2)
           | otherwise        = [] 

If you invoke the function above as powers 0 1, its result will be a list of successive powers-of-two from 1 to 128, just like for our C code. However, you may notice that the method by which we’ve arrived at that list is somewhat different. Translated to English, this code says:

  • Define the function powers(i,v) (for any v, and where 0 ≤ i < 8) to be a list whose first element has the value v, and whose remaining elements are the result of powers(i+1, v*2), or, for any i not matching the qualification just given, the empty list.

…and that’s it. When you invoke powers 0 1, it constructs the element with value 1, then invokes itself again as powers 1 2, which constructs an element whose value is 2 and invokes powers 2 4 to construct the third element with a value of 4. When it finally reaches powers 8 256, it sees that i=8 doesn’t meet the criterion that 0 ≤ i < 8, and so it caps it off with an empty list result (which, since it doesn’t invoke itself recursively again, terminates the list, and the evaluation of powers).

Functional code like the above can look cryptic to imperative programmers who aren’t used to seeing it, but it’s learned fairly quickly. It happens to correspond fairly closely to how a mathematician would formulate a function definition, so folks who are comfortable with math will tend to feel right at home.

Note that it’s just as easy to do recursion like this in C:

int *powers(int *a, int i, int v)
{
    if (i != 8) {
        a[i] = v;
        powers(i+1, v*2);
    }
}

However, in real-world C implementations, each invocation of powers will consume additional memory. It works fine for our example, where we’re limited to a total of nine calls to powers, but for longer recursions you could run out of stack space, which is not a good thing.

Note that our recursive C example still isn’t pure functional programming, as we’re modifying the values of an existing array, and not producing new values.

Disclaimer: none of the C or Haskell examples are particularly idiomatic; they’re for rough example purposes only. I’d do them differently in real life; but I felt that these examples, as written, may be a little easier to discuss, and more accessible for folks that might not be terribly familiar with either language.

To be continued (very soon)… Part 2 will discuss some of the things in Haskell that I think are really cool. :)

09.09.08

Cool Little Maze

Posted in Computers at 6:57 pm

A friend from #atheists at irc.freenode.net created this neat little interactively-revealed maze. The neat thing about it is that it is created using only HTML & CSS. No JavaScript, no programming, nothing. Just static HTML & CSS. (Note: works on most mainstream browsers except Internet Explorer.)

The only real irritation is that if the pointer slips off the trail, you have to start over. Move slowly. :)

Find the maze start at the upper left.

(See also my own maze experiment.)

07.14.08

Wget, Screen, Eseq

Posted in Software Development at 2:09 am

So, things are going pretty well with Wget. We just had our mid-term evaluations for the Google Summer of Code project. Our two GSoC students are right on-schedule with where they’d promised to be at this stage. Both of them had exams during the first portion, so the level of work they were supposed to get done was somewhat scaled down from what it would otherwise have been; still, it’s nice that there haven’t been any real difficulties, and things are coming along alright.

Also, the copyright assignment paperwork came through for a batch of changes that adds CSS support to Wget, so I’m excited about that. I haven’t gotten much done for my part, though. Been pretty busy.

I recently joined the GNU Screen project, as a co-maintainer, to help get things moving towards a next release. If you haven’t heard of Screen, I can’t really explain it in detail here: it’s sort of a special-interest thing. But, if you spend a lot of time using command-line/terminal programs, screen is a huge asset. Especially if you use them remotely.

Basically, what Screen does is act as a sort of reattachable, “pretend” terminal. I say “pretend” because, while it is a fairly full-featured vt100 terminal emulator, it’s missing a crucial component that most terminal emulators have: it doesn’t actually draw text to a screen. It interfaces with whatever “real” terminal emulator you’re running, and tells it what it should draw.

The nice thing about screen is that you can detach from it, and later reattach to it. If you lose your ssh connection, say, then you can simply log in again and attach to the same screen session you were running; none of your programs have to get killed due to a terminal hangup. Also, you can be attached to the same screen session from multiple terminals simultaneously. I do everything at work in screen; then when I come home, I can just ssh in and keep working on the same session. I can leave a build running, and then come home and check on its progress. You can even do more exotic things, like allow multiple users to use the same terminal session (mostly good for demonstrating how to do something).

My priority’s still with Wget, though, and I’ve made it clear to folks that while I’m happy to help out with organization and patch-integration, etc, I need to dedicate most of my free-time coding to Wget. I expect to be able to handle small patches and the like; but I have too much to do already with Wget. However, my work on Wget has ended up taking a sort of hiatus for a couple weeks, while I’ve been organizing some things (bug lists, mainly) at Screen; also, I’ve been spending most of that time coding a program I’ve always meant to write, but which has now become incredibly useful as a tool for debugging issues with Screen.

It’s a program that analyzes terminal escape sequences. These are special commands that are sent as part of the text stream to the terminal, to tell it to do something special. For instance, in Unix-like systems you could print a red-and-green “Merry Christmas” to the screen by issuing the command:

 $ printf '\033[32mMerry \033[31mChristmas\033[m\n'

The “gibberish” in there are the special commands that tell the terminal how to color the output. Each bit of gibberish starts with \033, which is a numeric code representing the escape character. If you run the above command, you’ll get output like:

 Merry Christmas

It’s not too bad to look at and analyze a string like the one I gave to printf above; but when you have a whole bunch of funky gobbledy-gook like that to sift through, it helps to break it down. The tool I’ve been working on lately would take that same string, and give this output:

: Esc [ 32 m
& SGR: SELECT GRAPHIC RENDITION
" Set foreground color green.
|Merry |
: Esc [ 31 m
& SGR: SELECT GRAPHIC RENDITION
" Set foreground color red.
|Christmas|
: Esc [ m
& SGR: SELECT GRAPHIC RENDITION
" Clear graphic rendition to defaults.
. LF/^J

It gives a breakdown of the text it sees, the escape sequences, what control function they represent, and what the actual effect they produce is. This makes it a lot handier to see what’s being sent to the terminal when not everything’s working as it should.

05.15.08

Travel and Laptops

Posted in Computers at 11:17 am

Renowned computer security expert Bruce Schneier has an article up at the Guardian (thanks Slashdot) about the problems of taking your laptop with you through customs.

Last month a US court ruled that border agents can search your laptop, or any other electronic device, when you’re entering the country. They can take your computer and download its entire contents, or keep it for several days. Customs and Border Patrol has not published any rules regarding this practice, and I and others have written a letter to Congress urging it to investigate and regulate this practice.

But the US is not alone. British customs agents search laptops for pornography. And there are reports on the internet of this sort of thing happening at other borders, too. You might not like it, but it’s a fact. So how do you protect yourself?

I hadn’t heard about the pornography bit before, so I did a little Googling and it looks like this mainly means pedophilic materials. Though, since it’s much easier to automatically determine whether there’s pornography of any sort on a hard drive, than it is to distinguish between “regular” and “child-flavored” porn, I think it probably means that if they find substantial porn of any sort on your hard drive, you’ll be delayed and your laptop’s disk contents will be copied, or the laptop itself retained.

Anyway, the crux of the matter isn’t that I should be relieved that I will never have to worry about custom officials finding child pornography on a laptop as I travel abroad (since I don’t ever plan to possess any), but rather the fact that they do the scan at all, and even retain the “right” to keep my laptop or copy its contents.

The vast majority of my laptop contents are publicly available material. What’s not basic software packages downloadable from packages.ubuntu.com, is probably work-in-progress on things that I code on, like Wget. But I also have things like private encryption keys on there, some of which aren’t passphrase-protected. Someone with one of those would be able to get root access to my private servers on the Net. It’s not as if I host child porn there, either, but one common thread in government snooping is that they often use one pretext as an excuse for other purposes. If the government deemed me worth investigating (for whatever reasons), they wouldn’t hesitate to take advantage of the private keys on some old copy of my hard drive to do a lot more snooping, than they have a right to.

Schneier recommends destroying the browser cache and cookies, using secure deletion software to delete anything sensitive that you can, and using encrypted partitions or USB drives for the things you can’t do without (curiously, steganography wasn’t mentioned: I’d have thought this an ideal application).

It seems to me, though, far simpler to swap your normal laptop hard-drive with a “travel suitable” one, one that just has your necessities installed over a fresh new disk. Of course, this still doesn’t solve the problem of having sensitive-but-indespensible materials, for which you’d still want encrypted (and probably stealthed) partitions or USB drives.

While we’re on the subject of laptops and travel, note too that there are restrictions on packing lithium batteries and devices that contain them (in checked luggage: “there is generally no restriction on the number of spare batteries allowed in carry-on baggage“). (I saw this too on Slashdot first.)

04.23.08

Wget, GSoC, …

Posted in Software Development at 3:12 pm

It’s been quite a while since I’ve posted last. I’ve got quite a few posts I’ve been wanting to write, but I’ve been pretty insanely busy lately.

Aside from my new Wii (which I plan to post about again, later), my free time has been completely monopolized by my role as maintainer for GNU Wget. The main reason for my the upsurge in my activity related to this project, is that we’re participating in Google’s Summer of Code program this year.

For those who aren’t familiar with it, the Summer of Code (SoC or GSoC) program is an avenue whereby Google spends large chunks of money on fostering an interest in Free and Open Source Software among university student software developers. (See Google’s stated goals for GSoC.)

What happens is, a number of organizations and FOSS projects apply to join the program, and present lists of ideas for projects that students could take on during the summer. A number of students apply to join the program, and submit proposals for projects that they would like to do over the course of the summer. The organizations choose the project proposals they like best, and rank them in order of preference.

Google then decides how many students each organization will get, the organization communicates which students have the most interesting proposals, and—here’s the fun part—Google then pays each student a stipend of $4,500 to work full-time on the project over the summer. (The organization is also given $500 per project on which it has coached a student.)

I had been hearing of GSoC for some time, but had never really understood what it was. An interested student, however (Julien Buty), strongly encouraged me to participate in the program this year (as he wanted to apply for a project with Wget), and in fact got the ball rolling for me. I’m pleased to say that he wound up being accepted, and will now be paid by Google to work on Wget, to improve its handling of authentication over HTTP. One additional student was accepted, Saint Xavier, has also been accepted, and will be working on functionality related to internationalized domain names and web addresses.

This has brought in a real surge of developer interest in Wget, which is very welcome indeed. Up until now, the only active developer on Wget has been myself. Despite Wget being very ubiquitous in the Unix world, and used on millions of installations, it has recently had no real community to speak of. The mailing list has had only a handful of participants, and there are no active developers (sometimes to include myself—I have a day job, ya know!), only occasional patch submitters. But, even though we posted up our “ideas” page less than a week before the start of the student application process, we quickly began to see an influx of interested developers. In fact, along with GRUB, the GNU system bootloader, Wget proposals dominated the applications submitted for the GNU Project.

This ended up translating into a lot of work for me, though, because suddenly a lot of my time was being taken up responding to student questions, critiquing student proposals and giving advice on how to improve them.… Several projects needed to be specified in much greater detail before they could become a useful target for students to apply for, so I wound up spending a lot of time typing up rough specs, and discussing implementation approaches, as well.

While we were only able to choose two to participate through GSoC (which, in itself, was a happy surprise, as through most of the process we expected to get only one), several of the students whose proposals didn’t make the cut have continued on with the project anyway, because they’re interested in contributing and eager to learn and gain experience in the Free Software community.

An approximately equal number of contributors have also recently joined up outside of GSoC, thanks to Saint Xavier’s encouragement that I post a “help wanted” ad through GNU’s Savannah software development portal. I didn’t really think it’d grab much attention, especially as I knew that these ads were automatically closed after two weeks. Boy was I wrong! I got a new developer every day for the first four or five days after I posted.

Assuming it doesn’t fizzle out (it’s early to tell whether everyone will keep their enthusiasm for Wget over the long term), all this additional help means that I can actually realistically think about releasing version 1.12 before the end of the year, which otherwise would have been unlikely. I’m very excited about this, because there are a lot of features I’m going to be very happy to have. Julien and Saint Xavier are both working on pieces that are very high prorities for me for the 1.12 release, and I’m excited that updating Ted Mielczarek’s addition of CSS support to Wget was much easier than I’d hoped.

Perhaps soon, I’ll post an article that gives a better idea of what my pet project actually is, and why it’s so durn useful (as well as what its current shortcomings are, and what I hope to do in the future).

03.11.08

Donkey Kong and (Not) Me

Posted in Software Development, Video Games at 4:54 pm

Slashdot posts an enjoyable read, on one man’s experiences coding videogame carts for Atari in the early 80s. My favorite quote:

Code should both entertain and educate.

02.13.08

On Passwords

Posted in Computers at 6:36 pm

So, I thought maybe I’d spend a little time discussing password authentication. Skip to the end if you just want to see good and bad ways to come up with passwords.

An early bit of computer security reading that made an impact on me while I was learning the ropes as Ye Company Computer Fellow at The Adams Group, was Foiling the Cracker: A Survey of, and Improvements to, Password Security, by Daniel V. Klein. Based on research from 1989, the limits of computing power had already dramatically increased by the time I got my hands on it, and yet even now, nearly two decades later, the cautions and advice from this paper have already proved to age remarkably well.

In conducting his research for this paper, Mr Klein collected roughly 15,000 encrypted password hashes (from actual user accounts), and attempted to recover the original passwords via “brute force”.

An “encrypted password hash” is a unique, mathematical value that is generated from a user’s password, and stored for the purpose of later authenticating the user by verifying that phe knows per password. When the user enters the password, the very same mathematical transformation is performed, and the result is compared to the stored value. If they match, the password is the same (well, to be more precise, the password has only a one in millions-times-millions-times-millions-times… chance of being different).

The advantage of doing it this way instead of just saving the passwords themselves, is that if someone were to recover the file which contained all the passwords, they suddenly have access to every account represented in the file; whereas if only the encrypted hash is stored, all they have is a bunch of useless mathematical values, represented as strings of garbage text. There is no way to take the hash, and transform it back into the original password (for this reason, they are often called “one-way hashes”). The only thing you can do with a hash is to compare it to other hashes you can generate (from guessing what the password might be), to see if you’ve found the user’s password. (This tends to be faster and safer, though, than just trying the passwords directly on the system with which you’re trying to authenticate, as many systems have built-in time delays, or don’t let you try more than a few passwords in a given amount of time, and log every attempt for later forensic analysis.)

And that’s called a “brute force” password attack. When you take a few tens of thousands of your favorite password candidates, run it through the hash algorithm, and see if any of them match the hashes you have. If any do, you note down the passwords they came from and which accounts they belong to—you’ve just hacked them!

So Mr Klein got a large number of passwords, and ran a computer (or possibly more than one, I’m not sure) to just chug along, trying out passwords from a large dictionary he’d created of some couple-million passwords to try out (about 60,000 base passwords, the rest are various permutations and transformations of those). In a week’s time, he’d recovered more than 1 of every 5 passwords (3000 passwords). He recovered 368 passwords in just the first 15 minutes!

The very first thing that would be tried against a password, was 130 variations on the account name itself. A user named “Micah J. Cowan”, with a username of mrdude, would get password attempts like mrdude, mrdude0, mrdude1, mrdude123, mjc, mjcmjc, mcowan, MCowan, hacim, micahc, MjccjM, MICAH-COWAN, (mrdude), CowanM, etc. This is actually the technique that fetched him the 368 passwords in his first 15 minutes of processing. Ouch!

Other things that would be tried, were dictionary words. And not just Meriam-Webster. A relatively exhaustive dictionary of a large number of words: people names (real and fictional), place names, foreign-language words, words from the King James Bible, offensive words and phrases, etc, etc. Variations on all these words would also be checked, such as replacing letters with similar-looking digits (o -> 0, “ell” -> 1, z -> 2, etc); various capitalizations (“mIchael”, “miChael”, “MichAel”, etc); spelling them backwards, etc.

Thought you were clever with your password of “fylgjas” (guardian creatures from Norse mythology)? Or the Chinese word for “hen-pecked husband”? Think again—he caught ‘em.

In addition to the techniques Klein describes in his paper, modern, readily-available brute-force password-crackers will also support things like exhaustive searches of all combinations of letters and numbers up through around six characters. Exhaustive searches of all combinations of all possible characters are also possible, but take a lot more time.

On the other hand, what with the power of large computer clusters, and cracker “bot-nets”, given a little time, attackers can readily search exhaustively for passwords of several characters longer than was previously practical. In fact, computer security expert Bruce Schneier has a more up-to-date description of password cracking software designed to run on computer networks, and advice on what passwords are easily cracked, and how to choose safe ones. These days, good cracking software typically recover over half of the passwords given it, rather than just the ~25% that Klein managed after a year’s worth of CPU time.

So, to close up, passwords that everyone should be avoiding, for any system they care about, are:

  • Any password shorter than eight characters. Passwords of arbitrary strings of letters and numbers up to six or 7 characters can be exhaustively searched given enough time and resources (32 CPU years were adequate in the days of Klein’s article: that sounds like a lot until you run into someone with a 128-CPU cluster and a few months to spare). Throwing in some punctuation marks will help for shorter strings, but really you’re best-off going for at least eight. And, don’t forget, if 7 characters is just within- or without-reach, where will it be in a few years, given the exponential growth of computer power?
  • Single words or names, no matter what language they’re from, or how you modify them. Write ‘em backwards, add some numbers at the end, use funky capitalization: it doesn’t matter. If they can exist in a list somewhere, a password cracker can guess it.
  • My God, man, don’t ever pick a password based on your name, your account information, your girlfriend’s name, etc. You’re better avoiding your birthday or anniversary, too: these things can be exhaustively searched faster than you can blink.
  • Never use the same password for more than one site.

Practices that are recommended for choosing secure passwords include:

  • Building it from the initial letter of each word in a phrase: To be or not to be, that is the question becomes Tbontbtitq. This would be improved by using numbers in some spots, perhaps capitalizing an extra letter or two, and leaving in or adding in additional punctuation: 2Br!2b,tit?. (note the substitution of the letter r for or, ! for not, and ? for question). This technique can easily be used to produce random-looking passwords which are very hard to brute-force or guess. However, be careful not to choose easily-guessed phrases as the basis for your password; for example, the above phrase was intended only as an example. It is far too widely recognized to make a good basis for a password; I wouldn’t be at all surprised to discover there were password dictionaries out there that already have both Tbontbtitq and 2Br!2b,tit?. in them, along with other variations. John 3:16 makes another example of an attrociously poor choice for password derivation. The best would be to choose a phrase or sentence from a random spot in a relatively obscure book. For instance, flipping open my copy of Advanced Programming in the UNIX Environment, I find “Every process has six or more IDs associated with it.” That could be made into a decent password (though not any more, obviously, now that I’ve mentioned doing so).
  • Another good technique is to use two or three regular words together, especially if you use punctuation marks to separate the words; e.g., hooky$preheroic. This can make for easily-memorized, but hard-to-guess/bruteforce passwords. As already mentioned, single words, even with a large number of variations, make for easily-cracked passwords; but multiple-word passwords exponentially increase the difficulty of brute-forcing them. That’s assuming that you pick fairly random words, particularly, words that are random with respect to one another, and to yourself. For instance, tootie and frootie, or guitar and music, make horrible words to pair. And, if you know that I play piano and love Coca-Cola, even the three-word password coke-fiend-pianist may not be too much of a stretch for you. ;)