Hoylen's Weblog

Fri, 28 May 2010

Newton's Y2K10 problem

The Apple iPad was released in Australia (as well as many other countries) today, but it is interesting to remember its ancestor--the Apple Newton. The Apple Newton was the original device that created the term Personal Digital Assistant (PDA), and many of the concepts in the iPad were already present in the Newton.

The Newton had its own date problem. The Apple Newton represented integers as signed 30-bit numbers. The processor does use 32-bit words, like nearly all modern microprocessors, but the Newton uses two bits of each word for its own housekeeping. The clock on the Newton represents time as an integer number of seconds past 1 January 1993. Since the maximum integer is 2^29 - 1, the maximum time the Newton could represent is 2010-01-05 18:48:31.
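The end date can be checked with a few lines of Python. This is only a sketch of the arithmetic, not Newton code: it assumes the epoch is midnight on 1 January 1993 and that the largest signed 30-bit value is 2^29 - 1. The names NEWTON_EPOCH and MAX_INT are made up for this example.

```python
from datetime import datetime, timedelta

# Assumption: the clock counts seconds from midnight, 1 January 1993.
NEWTON_EPOCH = datetime(1993, 1, 1)

# Largest value of a signed 30-bit integer (29 value bits plus a sign bit).
MAX_INT = 2**29 - 1

last_moment = NEWTON_EPOCH + timedelta(seconds=MAX_INT)
print(last_moment)  # 2010-01-05 18:48:31
```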

It is an interesting coincidence that the iPad was released in the same year that the Newton's clock ends. The iPad was announced on 27 January 2010, just 22 days after the Newton's clock reached its end. However, enterprising Newton programmers have created hacks to extend the Newton's clock.

The Apple Newton was a very innovative platform, with a unique and powerful object-oriented data model. Have a look at the Newton's technical documentation to see how it worked and why they deliberately made integers 30 bits long.

Wed, 19 May 2010

Colour TV myths

Tempted to buy a new fancy flat-screen TV? Don't be.

Read the Goodbye CRT article that was published in IEEE Spectrum. The article was written back in 2006, but its description of the pros and cons of plasma and LCD is still very good. As you would expect from IEEE, it gets the technical details right and without marketing bias. Read through the article and you will learn why the classical Cathode Ray Tube (CRT) is still the best quality picture for your money.

If you still want to get a new TV, then ignore the specifications the manufacturers publish. The Display Myths Shattered: How Monitor Companies Cook Their Specs article tells us they are all grossly exaggerated and really don't indicate how good the display is for actually watching videos.

As for me, I'm sticking with my CRT!

Update: the surface conduction electron emitter display (SED) technology, described in the 2006 IEEE Spectrum article, is dead. In August 2010, the commercial development of SED was shut down.

Thu, 13 May 2010

How good is facial recognition technology?

I recently renewed my passport and got one of the new ePassports which allows me to use the SmartGate passport control. This system uses facial recognition technology for self-processing at the border control.

How accurate is facial recognition technology? It is far from perfect, but it is very useful when used correctly in the right system.

Accuracy of biometric recognition systems is measured in two ways: False Acceptance Rate (FAR) where the system recognises a face when it should not have (e.g. letting an unknown person in), and False Rejection Rate (FRR) where the system does not recognise a face when it should have (e.g. keeping a legitimate person out).

These two values are related, because you can always improve one at the expense of the other. For example, tune the algorithm to accept more borderline cases and you improve the FRR but make the FAR worse. Letting everyone (including the bad guys) into the country is perfect FRR, but terrible FAR. Keeping everyone out (including legitimate people) is perfect FAR, but terrible FRR.
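The trade-off can be sketched in Python. This assumes a matcher that gives each face a similarity score and accepts anything at or above a threshold. The scores below are made up for illustration and the function error_rates is hypothetical; nothing here comes from a real system.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when scores at or above threshold are accepted."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))

# Made-up match scores (0 = no resemblance, 1 = identical).
genuine = [0.91, 0.85, 0.78, 0.66, 0.59]   # the right person
impostor = [0.72, 0.55, 0.41, 0.30, 0.12]  # someone else

# Raising the threshold lowers the FAR but raises the FRR.
for threshold in (0.0, 0.5, 0.7, 0.8, 1.01):
    far, frr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.2f}  FAR={far:.2f}  FRR={frr:.2f}")
```

A threshold of 0.0 lets everyone in (FAR 1.0, FRR 0.0) and a threshold above 1.0 keeps everyone out (FAR 0.0, FRR 1.0).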

To see some real numbers, I found the results from the Face Recognition Vendor Test (PDF) of 2006.

Their benchmark performance for 2006 technology, for a FAR of 0.001, was a FRR of 0.01. That is, incorrectly accepting 1 in 1000 faces means incorrectly rejecting 1 in 100 faces. That was the benchmark, but different algorithms achieved similar or worse results under different conditions (see the graphs on pages 14 and 16). A FRR of 0.01 sounds poor, but is significantly better than the FRR of 0.79 that was achieved using 1993 technology -- incorrectly rejecting nearly 8 out of 10 faces!

If the facial recognition algorithms are so poor why are they being used at border control? It is because it is not just the algorithm, but the entire system that counts.

In these systems, they probably tune the algorithms for a very low FAR, so that the computer is very unlikely to let the wrong person into the country. That means their FRR is poor, so more legitimate people will be rejected by the computer. But those people can then be processed by a border control officer, who can then recognise them. So it is not the algorithm that works, but the entire system involving both computers and people that works.

This system actually uses the respective strengths of people and computers. Page 20 of the report shows the error rates of people and the algorithms. It shows that when the FRR is high, the algorithms generally achieve better FAR than people; but when the FAR is high, people achieve better FRR than the algorithms. That is, an algorithm is better than a person at correctly rejecting an impersonator; but a person is better than a computer at correctly recognising someone (even though they might look different).

It is what we expect: computers are not very good at recognising faces. People are better at recognising faces, but computers are better at rejecting faces. Together the system works. Perhaps researchers should really be claiming success at facial rejection technology rather than facial recognition technology!

Sun, 04 Apr 2010

Telecommuting productivity

Telecommuting is often portrayed as a great innovation: if only unenlightened companies and managers would permit more of it, the world would be a better place. However, something is lost: tacit knowledge and productivity both decrease.

Andy "Sandy" Pentland gave an interesting talk, Reality Mining for Companies, or, How Social Networks Network Best, at the 2009 O'Reilly Media Where 2.0 Conference. Technology now allows us to measure social interactions much better than before -- where people are and what they are doing -- and the results are surprising. Instead of speculating whether reorganising a seating plan improves productivity, we can now measure it.

Although technology can give us the same access to explicit knowledge, the decrease in tacit knowledge (which is usually not measured) has a large negative impact on productivity. For example, face-to-face contact, such as having coffee and lunch with colleagues, significantly improves productivity.

I'll remember that when I have coffee and lunch with my colleagues tomorrow. I do great things at work, and I credit that to the great people who work with me.

The above link has a video of the talk, but an audio only version is also available from IT Conversations.

Fri, 02 Apr 2010

Reality becoming a game?

IEEE Spectrum has an article about a presentation that they called The most disturbing presentation of the year. Games designer Jesse Schell describes at DICE 2010 a future where everything we do is motivated by getting game points--we'll be living in one big game.

Although he describes the future, I think parts of that future are already here today. Think frequent flyer miles and loyalty points. As with any technology, this can be used for good or for evil.

View a short 10 minute video clip of the presentation--especially for the twist at the end. Or watch the entire 28 minute video presentation.

Then again, isn't life already motivated by potential rewards? It is just that in real life the rewards are not as obvious or measurable as simple game points. Some rewards come many years after our actions. Some rewards come from persistence and dedication. Some rewards don't make sense to other people. If you think of your current life as a game, what game points do you value?

Thu, 01 Apr 2010

DIY passport photos

I'm getting my passport renewed, but why should I pay for a simple instant photo when I can take my own with a better quality camera?

It is easy to take a digital photo and to crop and scale it to the right size. The application form says the photos must be 45mm high and 35mm wide, and the head between 32 and 36mm tall. Printing to a 4x6 inch photo at 300dpi requires an image that is 1200x1800 pixels in size. Taking 300dpi as roughly 12 pixels per millimetre, the passport photos will be 540 pixels high and 420 pixels wide--you can get eight photos onto one print. The head will have to be between 384 and 432 pixels tall. That was simple mathematics.
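The arithmetic can be sketched in Python. The function mm_to_pixels is a made-up helper for this example. Note that the exact conversion (25.4mm per inch) gives slightly smaller numbers than the figures above, which appear to correspond to rounding 300dpi to 12 pixels per millimetre.

```python
MM_PER_INCH = 25.4

def mm_to_pixels(mm, dpi=300):
    """Convert a printed length in millimetres to whole pixels at the given dpi."""
    return round(mm / MM_PER_INCH * dpi)

print(mm_to_pixels(45), mm_to_pixels(35))  # photo height and width: 531 413
print(mm_to_pixels(32), mm_to_pixels(36))  # head height range: 378 425

# How many photos fit on a 4x6 inch print at 300dpi, held in landscape,
# using the photo size quoted in the post.
print_w, print_h = 6 * 300, 4 * 300
photo_w, photo_h = 420, 540
photos_per_print = (print_w // photo_w) * (print_h // photo_h)
print(photos_per_print)  # 8
```

The difference between the two sets of numbers is under 2%, which is one more reason to include slightly larger and smaller variants on the same print.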

The difficult part was the things that they don't tell you:

  • The head height measures just the face and does not include hair.
  • They wanted it on gloss paper instead of matt.

So I had to do it three times before I got it right!

They are very particular about the size of the face. If you are doing it, I suggest including some 10% larger and 10% smaller pictures on the print.

Still, doing it this way I have a much nicer looking photo and paid a small fraction of the price for it... just don't count the time and effort it took!

Thu, 11 Mar 2010

TLS renegotiation security vulnerability

In November 2009, a security vulnerability in TLS was announced. This affects nearly all implementations of TLS, but the IETF is working quickly on revising the TLS specification to address the problem. A lot of the articles about the problem characterise it as a flaw in the TLS protocol, but actually the problem is not with TLS but with how it is (incorrectly) used.

I have been reading the original paper Authentication Gap in TLS Renegotiation and the vulnerability results from a number of things. If you know something about the technical details of TLS, I recommend reading the article for yourself.

The main problem comes from a connection consisting of an insecure session being renegotiated into a secure session. During the insecure session, a man-in-the-middle can inject some malicious data into the request sent to the server. This is fine according to the TLS protocol. TLS knows that the data sent over the first session is not to be trusted.

The problem comes about because the application incorrectly treats all the data as having the same security as the second session, after the renegotiation with the legitimate client. That is, it incorrectly treats the data received over the insecure session, before the renegotiation, as secure when it should not. So the vulnerability comes about because the application protocol was incorrectly using TLS. This is an example of where important information has been abstracted away--a common problem in system design: the presence of different sessions should not have been abstracted into a single connection with one level of security.

Einstein said, "everything should be made as simple as possible, but no simpler." Unfortunately, in this case they did make it simpler!

Before you panic: the vulnerability only allows the man-in-the-middle to inject its data into the beginning of the request. Although they could use that to inject their own requests, they can't see the real request or the response--those are still encrypted for the legitimate client.

So I would not be too hasty in blaming TLS itself for the vulnerability. Then again, SSL/TLS was originally designed to secure HTTP, yet it introduced sessions with different levels of security (a concept which HTTP does not support), so it could be argued that it didn't completely meet the requirements. Unfortunately, this is also a common problem in system design.

It is desirable to design components as separate pieces, but when they come together there can be unintended problems.

Sun, 21 Feb 2010

File Set Diff

I wrote a utility to compare files from two directories.

A friend had a large directory of photos on their computer and some of it was backed up to an external hard disk. We suspected that some photos were not backed up, but which ones? This was made more difficult because they had renamed some of the files.

So I wrote a script to find all the files in a directory and calculate a SHA-1 hash of their contents. The script does the same to a second directory and compares the hashes. It then prints out the files that are in one directory but are missing from the other. It can also detect duplicate files in a directory, since for practical purposes the SHA-1 hash uniquely identifies the contents of a file (even if it has been moved or renamed).
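The idea can be sketched in Python. This is not the script from the downloads page, just a minimal illustration of the same approach; the function names hash_files, missing_from and duplicates are made up for this example.

```python
import hashlib
import os

def hash_files(directory):
    """Map each SHA-1 digest to the list of paths under directory with that content."""
    by_hash = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            sha1 = hashlib.sha1()
            with open(path, "rb") as f:
                # Read in chunks so large photos do not have to fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    sha1.update(chunk)
            by_hash.setdefault(sha1.hexdigest(), []).append(path)
    return by_hash

def missing_from(first, second):
    """Paths in first whose content does not appear anywhere in second."""
    a, b = hash_files(first), hash_files(second)
    return sorted(p for digest, paths in a.items() if digest not in b for p in paths)

def duplicates(directory):
    """Groups of paths in directory that have identical content."""
    return [sorted(paths) for paths in hash_files(directory).values() if len(paths) > 1]
```

Calling missing_from("photos", "backup") would list the files that still need to be backed up, regardless of what they are called in either place.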

The script can be obtained from the downloads page on this Web site.

Sun, 07 Feb 2010

Khan Academy

Here's an example of someone making good use of this new media.

Salman Khan has created thousands of short videos to teach students everything from simple mathematics through to university science. Students around the world can watch these videos for free at the Khan Academy. Students can study the topics at their own pace, and repeat a topic until they understand it before moving on.

There is a podcast interview with Sal Khan on IT Conversations, where he discusses how the academy began and the successes it has achieved.

This is an example of where the Internet is used for good. To be a student in the Internet age poses many distractions and dangers, but it also brings many wonderful opportunities for learning.

Thu, 04 Feb 2010

QR codes

I've been experimenting with QR codes. These are two-dimensional bar codes that can contain a URL, phone number, email address, vCard contact information, location, SMS message, calendar event, or arbitrary text. They are popular in Japan and are being used in the Google Favorite Places business listings and Google Charts API.

ZXing QR code generator

Thu, 28 Jan 2010

Controlling URL line breaking with zero-width spaces

Line breaks for URLs often occur where you don't want them to. The solution is to use a zero-width space to suggest where it could have a line break.

In HTML, a zero-width space can be represented as "&#8203;". This character must only be used in the displayed URL and not in the URL in the href attribute.

Here are two examples. The first URL is unmodified. The second URL uses zero width spaces after the slashes and uses non-breaking hyphens. Resize the browser window to see how the line breaking behaves.

http://www.example.org/alphaBetaGamma/foo-bar/alphaBetaGamma/foo-bar/alphaBetaGamma/foo-bar/alphaBetaGamma/index.html

http://www.example.org/​alphaBetaGamma/​foo-bar/​alphaBetaGamma/​foo-bar/​alphaBetaGamma/​foo-bar/​alphaBetaGamma/​index.html

Looks good, but there is one big disadvantage: if someone copies-and-pastes the URL it will not work. This is less of a problem if the URL is a hyperlink which they will normally click, but it is something to keep in mind.
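If the pages are generated by a program, the zero-width spaces can be added automatically. The following Python sketch builds a hyperlink whose href is left untouched while the displayed text gets a zero-width space (written as the numeric character reference for U+200B) after each slash in the path. The function breakable_link is a made-up name for this example.

```python
import html
import re

def breakable_link(url):
    """Return an <a> element whose visible text may break after slashes."""
    # The href keeps the real URL, so clicking the link still works.
    href = html.escape(url, quote=True)
    # Add a zero-width space after each single slash in the displayed text,
    # skipping the double slash that follows the scheme.
    display = re.sub(r"(?<!/)/(?!/)", "/&#8203;", html.escape(url))
    return f'<a href="{href}">{display}</a>'

print(breakable_link("http://www.example.org/alphaBetaGamma/foo-bar/index.html"))
```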

The same trick can be used in Word documents. There are many ways to enter a zero-width space in Microsoft Word, but they are all very complicated. Instead, I think the simplest way is to copy it from another document. For example, save this Web page as HTML, open it in Microsoft Word, turn on hidden symbols, and copy the zero-width space character from it. With hidden symbols turned on, the zero-width space appears as a rectangle inside a rectangle. Or create a simple HTML document with &#8203; in it.

However, do not use non-breaking hyphens to further control the line breaks. If someone copies-and-pastes the URL, it will not work when there are non-breaking hyphens in it. They will be very confused, because the hyphen looks correct even though it is the wrong character.

Fri, 22 Jan 2010

Consumer password worst practices

How strong are your passwords? Despite lots of warnings, people still use weak passwords.

In December 2009, a cracker posted 32 million passwords onto the Internet. A security firm (Imperva) calculated some statistics on these passwords. In their report they say:

  • About 30% of passwords are 6 characters or shorter.
  • About 60% of passwords only contain alpha-numeric characters.
  • About 50% were easily guessed names or words.

The most common password was "123456", followed by "12345", "123456789", "password", "iloveyou" and "princess". Read the Consumer password worst practices report to see what the top 20 passwords were, and for tips on using strong passwords so you don't become (literally in this case) a statistic.

Thu, 21 Jan 2010

Changing file line endings and encodings in emacs

Text files on Unix systems use a single line feed character (LF, 0x0A) to indicate the end of a line. Text files on MS-DOS and Microsoft Windows use a carriage return plus line feed pair (CR-LF, 0x0D 0x0A). The classic Macintosh used a single carriage return character (CR, 0x0D). Thankfully, the LF-CR pair has never been used!

One way to change the line ending convention is to use emacs with the set-buffer-file-coding-system function (mapped to C-x RET f). When it prompts you for the coding system, enter either "unix", "dos" or "mac".

This is easier than trying to remember cryptic commands like:

tr -d '\r'
sed 's/$/^M/'

And having to worry about getting them to work because of different variations in sed and shell environments (e.g. when using bash the ^M is typed using Ctrl-v Ctrl-m).

If your system has the unix2dos and dos2unix commands installed (e.g. Cygwin and most Linux distributions do), use them. Otherwise, emacs lives up to its reputation as the kitchen sink tool.
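If neither emacs nor those commands are at hand, the conversion is also only a few lines in a scripting language. Here is a Python sketch; the function convert_line_endings is a made-up name, and the style names simply mirror the ones emacs accepts. It works on bytes so that the file's character encoding is left alone.

```python
def convert_line_endings(data: bytes, style: str) -> bytes:
    """Rewrite every line ending in data to the 'unix', 'dos' or 'mac' style."""
    endings = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}
    # Normalise to LF first: CR-LF pairs, then any remaining lone CR.
    normalised = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalised.replace(b"\n", endings[style])

mixed = b"one\r\ntwo\nthree\r"
print(convert_line_endings(mixed, "unix"))  # b'one\ntwo\nthree\n'
```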

Sun, 17 Jan 2010

Creating iTunes audiobooks with chapters

I downloaded the MP3 version of Free: The future of a radical price and wanted to create an iTunes audiobook of it. An audiobook is more convenient because it will appear as one item with multiple chapters (instead of separate songs), it will remember where you listened up to, and can play at different speeds on iPods and iPhones.

After much searching, I found that you can create iTunes audiobooks, complete with chapters and artwork, by using GarageBand. I've described the procedure in an article on how to create an iTunes audiobook using GarageBand.

Sun, 10 Jan 2010

Cygwin rxvt: a better terminal

After installing Cygwin (a very powerful Unix-like environment for Microsoft Windows) I usually set up my home directory and create a shortcut to rxvt.

I make the Windows "My Documents" directory my Cygwin home directory:

cd /home
mv username username.bak
ln -s "/cygdrive/c/Documents and Settings/username/My Documents" username

Set up rxvt as the shell window, since it is much better than the default Windows Command Prompt:

  1. In Windows Explorer, go to C:\cygwin\bin.
  2. Right click on rxvt.exe and create a shortcut for it.
  3. Rename the shortcut to "Cygwin rxvt".
  4. Right click the shortcut and select "Pin to Start menu".
  5. Right click on the shortcut and select "Properties".
  6. Change the Target property of the shortcut to:
C:\cygwin\bin\rxvt.exe -sl 1500 -fn "Consolas-16" -bg black -fg orange -e bash --login -i

The -sl argument sets the number of lines in the history buffer. The -fn argument sets the font. If you haven't got the Consolas font, use "Courier New-16" instead. The -bg and -fg arguments set the colours. The -e bash --login -i runs the bash shell.

The rxvt here is a Cygwin Windows program. It does not require X11 to operate. But it does use the X11 method of copying and pasting (i.e. selecting the text copies it, and the middle mouse button is paste).

Note: Cygwin version 1.7 (or later) now installs a shortcut to rxvt called "rxvt-native", so the above instructions are no longer necessary. However, I still customise its font and colours by modifying the command as described above. There is also now Mintty, a terminal emulator written especially for Cygwin.

Storing Cygwin on an ISO image to install in a Parallels VM

Store Cygwin on an ISO image for easy re-installation onto virtual machines.

I'm installing Cygwin onto a Parallels virtual machine. I wanted to download Cygwin and its packages only once, and to install it onto multiple virtual machines. I tried storing it as a directory on the Mac, and attaching it to the Parallels virtual machine as a shared folder. Unfortunately, shared folders appear as a network drive on ".psf" under Parallels, and Cygwin has problems installing from it. Of course, I could have copied the files onto the (virtual) C: drive and installed it from there, but would have needlessly used up space on the VM's drive.

The solution I found was to create an ISO disc image containing the Cygwin files and to mount that onto the virtual machine as a DVD disc. Cygwin installs fine from the virtual DVD-ROM and unnecessary file copying was avoided.

Creating the ISO still required the packages to be downloaded inside a VM running Windows, and then copied out of that VM into an ISO image. But after that, no more copying is required.

Tue, 05 Jan 2010

diff utilities

I've been reading the documentation for the diff command on Unix and have discovered lots of powerful options in it.

The diff command can show the changes side by side. You will need a very wide terminal, but you can still get a good indication of what has changed by setting its output to a narrower terminal width.

diff -y -W 80 file1 file2

Two directories can also be recursively compared:

diff -Naur dir1 dir2

There is also an interactive command called sdiff to merge two files together to create a third file. However, I think it is easier to use emerge mode in emacs.

If you are running Mac OS X, another option is to use the FileMerge application. If you install Xcode, it can be found in the /Developer/Applications/Utilities folder.

Security tips for the rest of us

Computer security is hard. Technical people have a hard time keeping up with all the issues, so what is the average computer user going to do?

The Security Now podcast #229 describes a few simple rules that anyone can follow:

  1. Don't click on links in emails.
  2. Don't accept files or email attachments from people you don't know.
  3. Do keep your computer up to date with Windows Update or Mac Software Update.
  4. Do use good strong passwords.

These are easy enough for anyone to remember and follow. It is much better to follow a few simple rules than to have more, better rules that don't get followed.

For further details, see the paper So Long, And No Thanks for the Externalities: The Rational Rejection of Security Advice by Users. It describes how some traditional security advice is not worth following, because the benefit (the reduction in risk) is outweighed by the cost of following it.

Sun, 03 Jan 2010

iPhoto slideshows to iDVD

I was creating a video DVD of some wedding photos I had taken. I was using iPhoto and iDVD, but found the video it generated was suboptimal because it generated video at the wrong resolution and frame rate. With the low quality of standard definition video probably no one would notice, but why settle for an inferior outcome when it is easy to do it right?

In the creating a video DVD from an iPhoto slideshow article, I describe how to use custom settings to generate iPhoto slideshow videos which are optimised for creating a normal video DVD.

Fri, 01 Jan 2010

Hello World!

The start of the new year. A good time to update my Web site and to start this blog.

This site is generated using XSLT and some Perl code. It uses valid XHTML and CSS.