Category Archives: Computers

Relating to computers and technology

Wget, Screen, Eseq

So, things are going pretty well with Wget. We just had our mid-term evaluations for the Google Summer of Code project. Our two GSoC students are right on schedule with where they’d promised to be at this stage. Both of them had exams during the first portion, so the level of work they were supposed to get done was scaled down somewhat from what it would otherwise have been; still, it’s nice that there haven’t been any real difficulties, and things are coming along alright.

Also, the copyright assignment paperwork came through for a batch of changes that adds CSS support to Wget, so I’m excited about that. I haven’t gotten much done for my part, though. Been pretty busy.

I recently joined the GNU Screen project as a co-maintainer, to help get things moving towards the next release. If you haven’t heard of Screen, I can’t really explain it in detail here: it’s sort of a special-interest thing. But if you spend a lot of time using command-line/terminal programs, Screen is a huge asset, especially if you use them remotely.

Basically, what Screen does is act as a sort of reattachable, “pretend” terminal. I say “pretend” because, while it is a fairly full-featured vt100 terminal emulator, it’s missing a crucial component that most terminal emulators have: it doesn’t actually draw text to a screen. Instead, it interfaces with whatever “real” terminal emulator you’re running, and tells it what it should draw.

The nice thing about screen is that you can detach from it, and later reattach to it. If you lose your ssh connection, say, then you can simply log in again and attach to the same screen session you were running; none of your programs have to get killed due to a terminal hangup. Also, you can be attached to the same screen session from multiple terminals simultaneously. I do everything at work in screen; then when I come home, I can just ssh in and keep working on the same session. I can leave a build running, and then come home and check on its progress. You can even do more exotic things, like allow multiple users to use the same terminal session (mostly good for demonstrating how to do something).
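If you haven’t used it, the day-to-day workflow boils down to just a few commands (these are the defaults; all of Screen’s key bindings are configurable):

    screen -S work      # start a new session named "work"
    # ...run your programs, then press Ctrl-a d to detach...
    screen -ls          # list the sessions running on this machine
    screen -r work      # reattach to it later, say from an ssh login at home
    screen -x work      # attach a second terminal to the same session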

My priority’s still with Wget, though, and I’ve made it clear to folks that while I’m happy to help out with organization, patch integration, and the like, I need to dedicate most of my free coding time to Wget; I expect to be able to handle small patches, but I already have plenty on my plate. That said, my Wget work has ended up taking a sort of hiatus for a couple of weeks while I’ve been organizing some things (bug lists, mainly) over at Screen. I’ve also spent most of that time coding a program I’ve always meant to write, and which has now become incredibly useful as a tool for debugging issues with Screen.

It’s a program that analyzes terminal escape sequences. These are special commands, sent as part of the text stream to the terminal, that tell it to do something special. For instance, on Unix-like systems you could print a red-and-green “Merry Christmas” to the screen by issuing the command:
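    printf '\033[31mMerry \033[32mChristmas\033[0m\n'

(or something very much along those lines; the exact color codes you pick don’t much matter)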

The bits of “gibberish” in there are the special commands that tell the terminal how to color the output. Each bit of gibberish starts with \033, which is a numeric code representing the escape character. If you run the above command, you’ll see “Merry Christmas” printed out, with “Merry” in red and “Christmas” in green.

It’s not too bad to look at and analyze a string like the one I gave to printf above; but when you have a whole bunch of funky gobbledy-gook like that to sift through, it helps to break it down. That’s what the tool I’ve been working on lately does: you feed it a string like that, and it breaks it down piece by piece.

It gives a breakdown of the text it sees, the escape sequences, what control function they represent, and what effect they actually produce. This makes it a lot handier to see what’s being sent to the terminal when not everything’s working as it should.

Travel and Laptops

Renowned computer security expert Bruce Schneier has an article up at the Guardian (thanks Slashdot) about the problems of taking your laptop with you through customs.

Last month a US court ruled that border agents can search your laptop, or any other electronic device, when you’re entering the country. They can take your computer and download its entire contents, or keep it for several days. Customs and Border Patrol has not published any rules regarding this practice, and I and others have written a letter to Congress urging it to investigate and regulate this practice.

But the US is not alone. British customs agents search laptops for pornography. And there are reports on the internet of this sort of thing happening at other borders, too. You might not like it, but it’s a fact. So how do you protect yourself?

I hadn’t heard about the pornography bit before, so I did a little Googling, and it looks like this mainly means pedophilic materials. Though, since it’s much easier to automatically determine whether there’s pornography of any sort on a hard drive than it is to distinguish between “regular” and “child-flavored” porn, I suspect that in practice, if they find substantial porn of any sort on your hard drive, you’ll be delayed and your laptop’s disk contents will be copied, or the laptop itself retained.

Anyway, the crux of the matter isn’t that I should be relieved that I’ll never have to worry about customs officials finding child pornography on a laptop as I travel abroad (since I don’t ever plan to possess any), but rather the fact that they do the scan at all, and even retain the “right” to keep my laptop or copy its contents.

The vast majority of my laptop contents are publicly available material. Whatever isn’t basic software downloadable from packages.ubuntu.com is probably work-in-progress on things that I code on, like Wget. But I also have things like private encryption keys on there, some of which aren’t passphrase-protected. Someone with one of those would be able to get root access to my private servers on the Net. It’s not as if I host child porn there, either; but one common thread in government snooping is that one pretext often gets used as an excuse for other purposes. If the government deemed me worth investigating (for whatever reason), they wouldn’t hesitate to take advantage of the private keys on some old copy of my hard drive to do a lot more snooping than they have a right to.

Schneier recommends destroying the browser cache and cookies, using secure deletion software to delete anything sensitive that you can, and using encrypted partitions or USB drives for the things you can’t do without (curiously, steganography wasn’t mentioned: I’d have thought this an ideal application).

It seems to me, though, far simpler to swap your normal laptop hard drive for a “travel-suitable” one that just has your necessities installed on a fresh disk. Of course, this still doesn’t solve the problem of having sensitive-but-indispensable materials, for which you’d still want encrypted (and probably stealthed) partitions or USB drives.

While we’re on the subject of laptops and travel, note too that there are restrictions on packing lithium batteries, and devices that contain them, in checked luggage (“there is generally no restriction on the number of spare batteries allowed in carry-on baggage”). (I saw this on Slashdot first, too.)

Wget, GSoC, …

It’s been quite a while since I’ve posted last. I’ve got quite a few posts I’ve been wanting to write, but I’ve been pretty insanely busy lately.

Aside from my new Wii (which I plan to post about again, later), my free time has been completely monopolized by my role as maintainer for GNU Wget. The main reason for the upsurge in my activity related to this project is that we’re participating in Google’s Summer of Code program this year.

For those who aren’t familiar with it, the Summer of Code (SoC or GSoC) program is an avenue whereby Google spends large chunks of money on fostering an interest in Free and Open Source Software among university student software developers. (See Google’s stated goals for GSoC.)

What happens is, a number of organizations and FOSS projects apply to join the program, and present lists of ideas for projects that students could take on during the summer. A number of students apply to join the program, and submit proposals for projects that they would like to do over the course of the summer. The organizations choose the project proposals they like best, and rank them in order of preference.

Google then decides how many students each organization will get, the organization communicates which students have the most interesting proposals, and—here’s the fun part—Google then pays each student a stipend of $4,500 to work full-time on the project over the summer. (The organization is also given $500 per project on which it has coached a student.)

I had been hearing of GSoC for some time, but had never really understood what it was. An interested student, Julien Buty, strongly encouraged me to participate in the program this year (as he wanted to apply for a project with Wget), and in fact got the ball rolling for me. I’m pleased to say that he wound up being accepted, and will now be paid by Google to work on Wget, improving its handling of authentication over HTTP. An additional student, Saint Xavier, has also been accepted, and will be working on functionality related to internationalized domain names and web addresses.

This has brought in a real surge of developer interest in Wget, which is very welcome indeed. Up until now, the only active developer on Wget has been myself. Despite Wget being ubiquitous in the Unix world, and used on millions of installations, it has recently had no real community to speak of. The mailing list has had only a handful of participants, and at times there have been no active developers at all (sometimes including myself—I have a day job, ya know!), only occasional patch submitters. But, even though we posted our “ideas” page less than a week before the start of the student application process, we quickly began to see an influx of interested developers. In fact, along with GRUB, the GNU system bootloader, Wget proposals dominated the applications submitted for the GNU Project.

This ended up translating into a lot of work for me, though, because suddenly a lot of my time was being taken up responding to student questions, critiquing student proposals, and giving advice on how to improve them. Several projects needed to be specified in much greater detail before they could become a useful target for students to apply for, so I wound up spending a lot of time typing up rough specs, and discussing implementation approaches, as well.

While we were only able to choose two to participate through GSoC (which, in itself, was a happy surprise, as through most of the process we expected to get only one), several of the students whose proposals didn’t make the cut have continued on with the project anyway, because they’re interested in contributing and eager to learn and gain experience in the Free Software community.

An approximately equal number of contributors have also recently joined up outside of GSoC, thanks to Saint Xavier’s encouragement that I post a “help wanted” ad through GNU’s Savannah software development portal. I didn’t really think it’d grab much attention, especially as I knew that these ads were automatically closed after two weeks. Boy was I wrong! I got a new developer every day for the first four or five days after I posted.

Assuming it doesn’t fizzle out (it’s too early to tell whether everyone will keep their enthusiasm for Wget over the long term), all this additional help means I can realistically think about releasing version 1.12 before the end of the year, which otherwise would have been unlikely. I’m very excited about this, because there are a lot of features I’m going to be very happy to have. Julien and Saint Xavier are both working on pieces that are very high priorities for me for the 1.12 release, and I’m excited that updating Ted Mielczarek’s addition of CSS support to Wget was much easier than I’d hoped.

Perhaps soon, I’ll post an article that gives a better idea of what my pet project actually is, and why it’s so durn useful (as well as what its current shortcomings are, and what I hope to do in the future).

On Passwords

So, I thought maybe I’d spend a little time discussing password authentication. Skip to the end if you just want to see good and bad ways to come up with passwords.

An early bit of computer security reading that made an impact on me, while I was learning the ropes as Ye Company Computer Fellow at The Adams Group, was Foiling the Cracker: A Survey of, and Improvements to, Password Security, by Daniel V. Klein. The paper is based on research from 1989, and the limits of computing power had already increased dramatically by the time I got my hands on it; yet even now, nearly two decades later, its cautions and advice have aged remarkably well.

In conducting his research for this paper, Mr Klein collected roughly 15,000 encrypted password hashes (from actual user accounts), and attempted to recover the original passwords via “brute force”.

An “encrypted password hash” is a unique mathematical value that is generated from a user’s password, and stored for the purpose of later authenticating the user by verifying that they know their password. When the user enters the password, the very same mathematical transformation is performed, and the result is compared to the stored value. If they match, the password is the same (well, to be more precise, the password has only a one in millions-times-millions-times-millions-times… chance of being different).

The advantage of doing it this way, instead of just saving the passwords themselves, is that if someone were to recover a file containing all the passwords, they would suddenly have access to every account represented in it; whereas if only the hashes are stored, all they have is a bunch of useless mathematical values, represented as strings of garbage text. There is no way to take the hash and transform it back into the original password (for this reason, they are often called “one-way hashes”). The only thing you can do with a hash is compare it to other hashes you can generate (from guessing what the password might be), to see if you’ve found the user’s password. (This tends to be faster and safer, though, than just trying the passwords directly on the system you’re trying to authenticate with, as many systems have built-in time delays, or don’t let you try more than a few passwords in a given amount of time, and log every attempt for later forensic analysis.)
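To make that concrete, here’s roughly what the check looks like in shell terms (a bare-bones sketch using sha256sum; real password systems also mix in a “salt”, and use hash functions deliberately designed to be slow):

    # store only the hash, never the password itself
    stored=$(printf '%s' 'letmein' | sha256sum | cut -d' ' -f1)

    # later, at login: hash whatever the user typed, and compare the two hashes
    attempt='letmein'
    if [ "$(printf '%s' "$attempt" | sha256sum | cut -d' ' -f1)" = "$stored" ]; then
        echo "password accepted"
    else
        echo "password rejected"
    fi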

And that’s called a “brute force” password attack: you take a few tens of thousands of your favorite password candidates, run them through the hash algorithm, and see if any of them match the hashes you have. If any do, you note down the passwords they came from and which accounts they belong to—you’ve just hacked them!
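In its crudest form, that’s nothing more than a loop. Here’s a sketch along the same lines as above (assuming an unsalted hash and a wordlist file; real cracking software is enormously faster and cleverer about it):

    # the "stolen" hash we're attacking (here, generated from a known password)
    target=$(printf '%s' 'fylgjas' | sha256sum | cut -d' ' -f1)

    # hash every word in the dictionary until one of them matches
    while IFS= read -r guess; do
        if [ "$(printf '%s' "$guess" | sha256sum | cut -d' ' -f1)" = "$target" ]; then
            echo "cracked it: $guess"
            break
        fi
    done < wordlist.txt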

So Mr Klein got a large number of password hashes, and set a computer (or possibly more than one, I’m not sure) to just chug along, trying out passwords from a large dictionary he’d created of some couple-million candidates (about 60,000 base passwords; the rest are various permutations and transformations of those). In a week’s time, he’d recovered more than one of every five passwords (3,000 passwords). He recovered 368 passwords in just the first 15 minutes!

The very first thing that would be tried against a password was 130 variations on the account name itself. A user named “Micah J. Cowan”, with a username of mrdude, would get password attempts like mrdude, mrdude0, mrdude1, mrdude123, mjc, mjcmjc, mcowan, MCowan, hacim, micahc, MjccjM, MICAH-COWAN, (mrdude), CowanM, etc. This is actually the technique that fetched him the 368 passwords in his first 15 minutes of processing. Ouch!

Other things that would be tried were dictionary words. And not just Merriam-Webster: a relatively exhaustive dictionary of a large number of words, including people’s names (real and fictional), place names, foreign-language words, words from the King James Bible, offensive words and phrases, etc, etc. Variations on all these words would also be checked, such as replacing letters with similar-looking digits (o -> 0, “ell” -> 1, z -> 2, etc); various capitalizations (“mIchael”, “miChael”, “MichAel”, etc); spelling them backwards; and so on.
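To give a flavor of how mechanical these variations are, here’s the sort of thing a wordlist “mangler” does to every single word it’s fed (a toy sketch; real crackers apply far more rules than these, and the \U trick below is GNU sed):

    word=michael
    echo "$word"                        # the word itself
    echo "$word" | rev                  # spelled backwards
    echo "$word" | tr 'oiealz' '013412' # letters swapped for look-alike digits
    echo "$word" | sed 's/^./\U&/'      # first letter capitalized (GNU sed)
    for n in 0 1 123; do                # numbers tacked onto the end
        echo "$word$n"
    done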

Thought you were clever with your password of “fylgjas” (guardian creatures from Norse mythology)? Or the Chinese word for “hen-pecked husband”? Think again—he caught ’em.

In addition to the techniques Klein describes in his paper, modern, readily-available brute-force password-crackers will also support things like exhaustive searches of all combinations of letters and numbers up through around six characters. Exhaustive searches of all combinations of all possible characters are also possible, but take a lot more time.

On the other hand, with the power of large computer clusters and cracker “bot-nets”, attackers can now, given a little time, exhaustively search for passwords several characters longer than was previously practical. In fact, computer security expert Bruce Schneier has a more up-to-date description of password-cracking software designed to run on computer networks, along with advice on what passwords are easily cracked, and how to choose safe ones. These days, good cracking software typically recovers over half of the passwords given to it, rather than just the ~25% that Klein managed after a year’s worth of CPU time.

So, to close up, passwords that everyone should avoid, for any system they care about, are:

  • Any password shorter than eight characters. Passwords made of arbitrary strings of letters and numbers up to six or seven characters can be exhaustively searched, given enough time and resources (32 CPU years were adequate in the days of Klein’s article: that sounds like a lot until you run into someone with a 128-CPU cluster and a few months to spare). Throwing in some punctuation marks will help for shorter strings, but really you’re best off going for at least eight. And don’t forget: if seven characters is only just within (or just out of) reach today, where will it be in a few years, given the exponential growth of computing power? (There’s some quick arithmetic just after this list, to give a feel for the numbers.)
  • Single words or names, no matter what language they’re from, or how you modify them. Write ’em backwards, add some numbers at the end, use funky capitalization: it doesn’t matter. If they can exist in a list somewhere, a password cracker can guess it.
  • My God, man, don’t ever pick a password based on your name, your account information, your girlfriend’s name, etc. You’re better off avoiding your birthday or anniversary, too: these things can be exhaustively searched faster than you can blink.
  • Never use the same password for more than one site.
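To put some rough numbers behind that first point, here’s the back-of-the-envelope arithmetic, with bc doing the honors (the billion-guesses-per-second rate is just a round number for illustration):

    # six characters drawn from letters-plus-digits (36 symbols),
    # versus eight characters drawn from the 95 printable ASCII characters
    echo '36^6' | bc                     # 2176782336 (about 2.2 billion)
    echo '95^8' | bc                     # 6634204312890625 (about 6.6 quadrillion)

    # at a rate of a billion guesses per second:
    echo '36^6 / 10^9' | bc              # about 2 seconds
    echo '95^8 / (10^9 * 86400)' | bc    # about 76 days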

Practices that are recommended for choosing secure passwords include:

  • Building it from the initial letter of each word in a phrase: To be or not to be, that is the question becomes Tbontbtitq. This would be improved by using numbers in some spots, perhaps capitalizing an extra letter or two, and leaving in (or adding) some punctuation: 2Br!2b,tit?. (note the substitution of the letter r for or, ! for not, and ? for question). This technique can easily be used to produce random-looking passwords which are very hard to brute-force or guess. However, be careful not to choose an easily-guessed phrase as the basis for your password; the above phrase was intended only as an example, and is far too widely recognized to make a good basis for one. I wouldn’t be at all surprised to discover there were password dictionaries out there that already have both Tbontbtitq and 2Br!2b,tit?. in them, along with other variations. John 3:16 is another example of an atrociously poor choice for password derivation. The best approach would be to choose a phrase or sentence from a random spot in a relatively obscure book. For instance, flipping open my copy of Advanced Programming in the UNIX Environment, I find “Every process has six or more IDs associated with it.” That could be made into a decent password (though not any more, obviously, now that I’ve mentioned doing so).
  • Another good technique is to use two or three regular words together, especially if you use punctuation marks to separate the words; e.g., hooky$preheroic. This can make for easily-memorized, but hard-to-guess/bruteforce passwords. As already mentioned, single words, even with a large number of variations, make for easily-cracked passwords; but multiple-word passwords exponentially increase the difficulty of brute-forcing them. That’s assuming that you pick fairly random words: in particular, words that are random with respect to one another, and to yourself. For instance, tootie and frootie, or guitar and music, make horrible words to pair. And, if you know that I play piano and love Coca-Cola, even the three-word password coke-fiend-pianist may not be too much of a stretch for you. 😉 (There’s a quick sketch of this approach just after the list.)
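If you’d rather let the machine do the picking (which neatly sidesteps the words-related-to-you problem), something as simple as this works on most GNU/Linux systems, assuming the usual words file is installed:

    # pick three words at random from the system dictionary,
    # and join them with a punctuation character
    shuf -n 3 /usr/share/dict/words | paste -sd'-' -
    # prints something along the lines of:  hooky-quartz-lantern

Three words drawn at random from a 60,000-word dictionary give roughly 60,000 × 60,000 × 60,000, or a couple hundred trillion, equally-likely combinations, which is why the multi-word approach holds up so well against brute force.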

Source Code to MULTICS Released!

Seen on Slashdot. MIT has released the source code for the MULTICS operating system, ancestor to UNIX and its derivatives such as GNU, Linux, Minix, and the various BSDs.

MULTICS was born in the days when the concept of time-sharing was the big new thing. Before time-sharing operating systems came on the scene, computers and their operating software were designed to run one program at a time. At that time, computers tended to be big hulking machines that (together with the necessary air-conditioning equipment) filled entire rooms. Since different users couldn’t interact with the computer at the same time, users typically weren’t allowed to interact directly with the computer at all. Instead, users submitted “jobs” for the computers to perform, to system operators who would arrange for the jobs to be run in a “batch”, and then at some point the user would come to fetch the printed output from the job they had submitted. Naturally, this could be very frustrating for users who would submit a program to be run, only to discover hours later that a small bug in their code had prevented it from doing its job.

Time-sharing was born out of the observations that: (1) direct, interactive use of a computer was a far less-frustrating experience than the batch-processing style of non-interactive computer use; and (2) a very large portion of the time spent interacting with a computer is actually wasted on the computer sitting idly, waiting for further input from the user. System designers reasoned that the “idle time” from one user could be put to good use in servicing the requests of another user, and thus time-sharing was born. Of course, it was no time at all until companies began to sell access time on time-sharing machines to users who were willing to pay for a few hours’ access to a mainframe via a terminal.

The legacy of time-sharing is seen today, even in computer systems designed primarily for single-user-at-a-time use, in technologies such as multitasking and multiprocessing (fifteen years ago, these concepts were a really big deal, as most personal computer systems lacked them; now, they are an essential—and assumed—part of every major operating system). The fact that modern machines can host multiple network services (such as web, ftp, mail and the like) for potentially hundreds of users at a time is also owed to those early developments in time-sharing.

Wget Stuff

Yeah, well my blog posts have dropped off drastically lately. It’s probably mostly due to my recently having taken on maintainership of GNU Wget. I’ll give an update on what I’ve been doing with that.

Bug tracking

One of the first things I did was to go through the TODO list in the Wget repository, along with the issue reports on the mailing lists, and transfer it all to a bug tracker, so we can actually see what needs fixing and what we’ve fixed, and keep it all semi-organized somehow. In spite of reservations, I decided to move them all into GNU Savannah’s bug tracker, because Wget already has a presence on Savannah, so it would require a minimum of setup. On the other hand, as I was already well aware, Savannah’s interface positively sucks. It’s cumbersome to set up bug submission form fields, it’s cumbersome to arrange their order, it’s cumbersome to search for specific kinds of bugs… but at the end of the day, it does the minimum that I decided I needed, and required relatively little setup. Maybe someday we’ll move to Bugzilla… *shrug*

Moving the repository

Another thing I did pretty soon after taking ownership of Wget was to move hosting of the Wget source code repository from dotsrc.org (formerly known as sunsite.dk; they still host our primary mailing list) to my own VPS (the one this blog runs on), under the domain addictivecode.org. I didn’t do this just because I’m a control-freak and want absolute power over everything (though this may be the case 😉 ); but after several weeks of trying to get the attention of the dotsrc staff so I could get commit access to the repository (and actually freakin’ write code for the project I was supposedly maintaining), I decided enough was enough, and used svnsync to create an identical copy of the Subversion source code repository, so I could give myself access. 🙂
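For anyone wanting to do the same with a Subversion repository of their own, the procedure is pleasantly short. Roughly this (with the local path and source URL here being stand-ins):

    # create the empty mirror, and let svnsync set revision properties on it
    svnadmin create /srv/svn/wget-mirror
    printf '#!/bin/sh\nexit 0\n' > /srv/svn/wget-mirror/hooks/pre-revprop-change
    chmod +x /srv/svn/wget-mirror/hooks/pre-revprop-change

    # point the mirror at the source repository, then pull its full history over
    svnsync init file:///srv/svn/wget-mirror http://example.org/svn/wget
    svnsync sync file:///srv/svn/wget-mirror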

New mailing lists

Another motivation for moving the repository was that I wanted a mailing list for receiving commit notifications, so everyone who’s interested can see what development is going on. Mauro Tortonesi, the previous Wget maintainer, related that he’d tried to get the dotsrc staff to put such a thing in place, but to no avail. So, I created a list for this purpose, which also receives bug-report change notifications from Savannah; and another, very low-traffic one for communication between just the developers who have commit access.

The Wget Wgiki

Next was to complete the migration from a web presence at dotsrc.org to gnu.org. The original plan was to have the entire web presence hosted at the gnu.org site; however, at the same time, I was scheming about putting a wiki in place for collaboratively defining specifications for future major improvements to Wget. When I finally got around to slapping MoinMoin onto my server (which I chose primarily because of familiarity due to my involvement with Ubuntu), I began to realize just how much better it would be to host as much of our main informational content as possible on the wiki. So, the end result is that the dotsrc site no longer exists (or, more accurately, redirects to the GNU site); and the GNU site is a basic informational stub that points to the new wiki site (dubbed The Wget Wgiki), which holds all the real information.

Development schedule planning

Another thing I started doing early on was to draw up a project plan (Gantt chart) to try and target when we would release the next version of GNU Wget, 1.11. Since it was pretty much just me and Mauro—who both have day jobs—doing active development, I tried to be extremely generous with the amount of time it would take us to get things done. We wound up with a target of September 15. I’m confident we would’ve made it, too: we were on target in terms of development, but there ended up being some legal issues with the new version 3 of the GNU GPL, and the exemption Wget needs to make to allow linking with OpenSSL, an incompatibly-licensed library that handles encryption for things like HTTPS. We’re still waiting for the final consensus from the FSF legal team.

At the moment, we’re not code-ready anyway; but we would’ve been if we hadn’t been somewhat demotivated by the fact that our code-readiness or lack thereof isn’t going to impact when we can release. I chose to work more on the wiki instead of code at that point, and on evaluating decentralized SCMs as potential replacements for Subversion. Now that I’m doing most work on a laptop, a DSCM is convenient. So far, Mercurial seems like a good bet, but we’re still discussing it on the list. Several folks prefer Git, but Git seems to be heavily Unix-centric, with limited support for other platforms; given that Wget is also used on other platforms, there seems to be some merit in preferring a more multiplatform solution; but we’ll see.

GNU Maintainer Me

This morning I was officially appointed as the maintainer of GNU Wget (one of the tasks on my to-do list is to update that page to the current, more modern GNU look, but for now, I’ve at least got my name on it 🙂 ). Wget is a very versatile command-line application for fetching web files, and can be used to grab local copies of web sites, or sections of web sites. I’ve used it many times for a variety of reasons: a quick fetch of a web file to disk, grabbing a portion of a website so I can view it offline, web debugging…
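For the unfamiliar, here are a couple of typical invocations (standard options; the URLs are just stand-ins):

    # grab a single file to disk
    wget http://example.com/files/something.tar.gz

    # mirror a section of a site for offline viewing: recurse, don't climb to the
    # parent directory, fetch the page requisites (images, stylesheets), and
    # convert the links so they work locally
    wget -r -np -p -k http://example.com/docs/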

It’s a fairly high-profile tool in the GNU and Unix worlds, so I’m proud to be able to be a part of it. It will be a big investment of time, which I never have a whole lot of, but I am very, very motivated. I spent the weekend categorizing and prioritizing the things that need to be done on Wget, so I have a fairly solid idea of what needs to be accomplished.

Here’s the announcement on the wget mailing list.

Music Video on an Apple ][

A well-done ancient-tech video for a well-done song.

I’ve really been lusting for some old-school computers lately, like an Apple II or a Commodore 64. I nearly bought a Commodore 64 this past weekend; when I showed up to buy it, it turned out to have a graphical glitch, and showed randomly cycling red, green and blue colors instead of white (I think there was blue… it looked like black).

I think I’d rather have an Apple II, anyway. Specifically, a //c or a late-model //e. eBay has them for reasonable prices, though shipping gets expensive when a monitor’s involved. Hopefully I can find a seller in the Bay Area at some point.

Kiss My Optimus Maximus

(Sorry, couldn’t resist. ;-))

A while back I mentioned this keyboard that had been generating a lot of buzz for a while, and which I totally wanted: the Optimus Maximus keyboard. Every key is an array of OLEDs, whose image is customizable, programmable, animatable, etc. You could have the entire keyboard change its look depending on what use you’re putting it to: have function keys switch their appearance to indicate what they do; have the glyphs on the keys change depending on what character set you’re typing in (Cyrillic? Japanese Kana?), etc.

Anyway, it was recently made available for preorder (already sold out). But at around this time I discovered its going price, at over $1500. A bit out of my price range; looks like I’m going to have to go without one of these! :-/