Saturday, November 29, 2008

Communication is hard. The web makes it easier in some ways but it's still difficult. I subscribe to some technical blogs and in many cases, I have no idea what the author is talking about. This is not so much because the concepts are difficult to grasp. It's more because the blogger throws abbreviations about like confetti at New Year's, without going to the effort of providing links that might help an uninformed reader. Now there's the essence of the paradigm shift that the web is bringing about. We've been speaking to each other for a long time (millennia? I don't know when language was invented) but for the most part it's face-to-face or in small parties. Occasions where one person speaks to a large gathering were significant. Now they are insignificant. When you create a blog you have no idea how few or how many people will read it. You have no idea how much background they will have in the topic.

I think communication is evolving. The mere fact that I can provide links to help explain terms is only the start. This medium of language is actually very archaic and not well suited to the internet. Something along the lines of semantic trees would be more appropriate. But our brains are not wired that way presently. Future generations, however, will learn interactivity in entirely different ways.
I've found a very nice distributed storage solution: Wuala. This is exactly what I've been wanting in a number of respects. You trade storage on your local filesystem for storage on the cloud. Everything is encrypted. It's written in Java. It comes with a nice user interface but I've set it up to run as a daemon on Linux. I access the files through an NFS share. One drawback is that you can copy a file to a shared folder and, if the file is particularly large, you won't see it right away. The file gets uploaded to the cloud in the background and you don't see it in the shared folder until it's fully uploaded. That makes Wuala better suited for backups than for interactive applications unless I can build a caching app on top of it. They have a REST interface but say that it's still in early development.
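To get a feel for that background-upload behavior, here's a rough sketch of the kind of polling I'd have to do before trusting that a large file has actually landed in a shared folder. The mount point and file name are just placeholders for my setup, not anything Wuala prescribes.

import java.io.File;

public class WaitForWualaFile {
    public static void main(String[] args) throws InterruptedException {
        // Path into the NFS share; adjust for your own mount point.
        File shared = new File("/mnt/wuala/shared/backup.tar.gz");

        // The file won't show up until the background upload completes,
        // so poll until it appears (or give up after about ten minutes).
        for (int i = 0; i < 600 && !shared.exists(); i++) {
            Thread.sleep(1000);
        }
        System.out.println(shared.exists() ? "upload finished" : "still not visible");
    }
}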

I still have some questions but I haven't gone through all the FAQs and documentation yet.

They say it's based on open source, but it doesn't appear to be entirely open source itself. The storage scheme is like a file system, which is OK for some applications, but it would be difficult to build an interactive application on top of it. Amazon's S3 is better for that.

Now if S3 allowed trading storage and encryption, or if Wuala allowed a more object-oriented schema and synchronous transfers, all would be perfect.

Thursday, November 27, 2008

Ideas are the mechanism by which our puny brains model the world. Everything is an idea. The most honorable and the most horrible acts are committed because of an idea.

Ideas always have an emotion at their core.

Ideas are imperfect. They are not reality, but we treat them as such. Ideas are subjective. The concept of objective truth is an idea and thus is itself subjective.

Our perceptions solidify into ideas. What we know as the world around us is just a collection of ideas. It's an illusion really. This is itself an idea. One that I got from reading the book "The Disappearance of the Universe".

I write this in the third person, but it's really what I've learned about myself. I have a pretty good hunch it applies to other people as well.

Tuesday, November 25, 2008

I don't trust software. I don't trust what other people write and I don't trust what I write. That's because programmers make mistakes all the time, and there is no sure method of finding all of them. The more code there is, the more bugs there will be.

There are a number of things that can mitigate this, such as coverage testing and code reviews, but my favorite is a robust compiler, and right now my favorite is Java's. This is mostly because I'm used to it. I'm so used to it that I actually dream in Java.

Java does a lot to help prevent programmer mistakes, or at least to point mistakes out to the programmer. Recently I've been working on a C# project and in desperation I've resorted to a number of "C# for Java Programmers" documents on the web. One good one that I read yesterday had a section on exception handling that explained how C# doesn't provide any compile-time exception checking. That is, there's no "throws" clause. This is a huge deficiency in C#'s ability to help the programmer catch mistakes. Basically you have to rely on documentation to have any clue how a method you are calling might fail. I don't know about you, but I trust documentation even less than I do code.
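For contrast, this is roughly what the throws clause buys you in Java. The file reading here is just a convenient example; the point is that the compiler refuses to let a caller silently ignore a declared failure mode.

import java.io.FileReader;
import java.io.IOException;

public class ThrowsExample {
    // Every caller must either catch IOException
    // or declare it in its own throws clause.
    static String readFirstChar(String path) throws IOException {
        FileReader reader = new FileReader(path);
        try {
            return String.valueOf((char) reader.read());
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(readFirstChar("example.txt"));
        } catch (IOException e) {
            // Without this catch (or a throws clause on main), this won't compile.
            System.err.println("couldn't read file: " + e.getMessage());
        }
    }
}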

In fact, one of my gripes with Java is that it allows some classes of exceptions to pass through without requiring a throws clause. These include array index out of bounds, integer parsing errors, and the dreaded null pointer exception.
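A quick illustration: all three of these compile cleanly with no throws clause in sight, and you only find out at runtime (the first one to execute blows up).

public class UncheckedExamples {
    public static void main(String[] args) {
        // None of these require a throws clause, yet each one
        // can terminate the program at runtime.
        int[] numbers = new int[2];
        System.out.println(Integer.parseInt("abc"));   // NumberFormatException
        System.out.println(numbers[5]);                // ArrayIndexOutOfBoundsException
        String s = null;
        System.out.println(s.length());                // NullPointerException
    }
}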

Null pointer handling is the hardest aspect of Java programming for me. I keep wanting the compiler to help me with it, but it just refuses. I end up having to remember to always check, and that results in code bloat. There seems to be an unavoidable tradeoff between the risk of null pointer exceptions at runtime and code bloat. But it doesn't have to be that way. The compiler could be of enormous help here.
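This is the kind of boilerplate I mean; every parameter that might be null drags in another check. The method name and message here are just placeholders.

void printValue(Integer a) {
    // Boilerplate I have to remember at every entry point,
    // because the compiler won't track nullness for me.
    if (a == null) {
        throw new IllegalArgumentException("a must not be null");
    }
    System.out.println(a.intValue());
}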

First of all, the compiler knows exactly where pointers are referenced. It also knows, at each of those places, whether the pointer could be null or not. For example,


Integer a = new Integer(0);
System.out.println(a.intValue());


a null pointer exception is not possible at line 2, whereas in


void someMethod(Integer a) {
    System.out.println(a.intValue());
}


a null pointer exception IS possible at line 2. Now if we had a parameter modifier "notnull", then


void someMethod(notnull Integer a) {
    System.out.println(a.intValue());
}


a null pointer exception would not be possible at line 2. The compiler would then generate an error any time someMethod was called in a context where its argument could be null. I think that null pointer problems would almost disappear if the compiler supported this, and my trust in code I and other programmers write would go up a couple of notches.

I've searched for tools that can do this sort of thing with annotations but haven't found anything that really works. I've found I can turn on a warning in Eclipse that supposedly detects possible null pointer problems, but it only catches a few of them. At the moment, writing my own tool is a bit beyond my reach, so I'm waiting, continuing to search for a solution occasionally, and continuing to mistrust my own code.
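For what it's worth, the annotation-based version of the same idea would presumably look something like this. The NotNull annotation here is hypothetical, standing in for whatever marker such a tool would define; it's not something the Java compiler enforces today.

// Hypothetical marker annotation that a checking tool would define.
@interface NotNull {}

void someMethod(@NotNull Integer a) {
    // The tool would flag any call site where the argument could be null,
    // which is exactly what I want the compiler itself to do.
    System.out.println(a.intValue());
}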

Friday, November 21, 2008

I gave some serious thought to using Amazon EC2 as a server platform. Unfortunately the $0.10 per hour works out to about $72 per month, which exceeds the $50 per month I'm paying for my DSL with static IPs. It's still something I would consider for a commercial application. It's a virtual machine in the cloud. How cool is that?

This also drew my attention to Amazon S3, which is a way of storing data very similar to the way I store it in AMI. The advantage, presumably, is higher reliability. Unfortunately I don't know exactly how reliable S3 is. And calling it "cloud" data storage may be somewhat misleading. I suspect it's still hosted on one or more servers which, if they go down, leave you SOL. True cloud storage would be distributed. I did find such a service: OpenDHT, where you donate a portion of your local storage to the cloud and in return can store data of your own in the cloud. The disadvantage of this service is that the data has a limited lifetime. That kinda kills it for me.

But the actual technique of storing data in a flat space of records (aka nodes or documents), each of which can have any number of arbitrary attributes (aka fields or columns), holds much interest for me. Such a technique is much more flexible than the traditional relational approach that requires pre-defining tables. Combine that with OSGi and you have a truly dynamic system.
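Here's a rough sketch of what I mean by a flat space of records, in Java terms. The class and method names are just illustrative; the point is that a record is nothing more than an ID plus an open-ended bag of attributes, with no tables defined up front.

import java.util.HashMap;
import java.util.Map;

public class FlatStore {
    // recordId -> (attribute name -> attribute value)
    private final Map<String, Map<String, String>> records =
            new HashMap<String, Map<String, String>>();

    public void setAttribute(String recordId, String name, String value) {
        Map<String, String> attributes = records.get(recordId);
        if (attributes == null) {
            attributes = new HashMap<String, String>();
            records.put(recordId, attributes);
        }
        attributes.put(name, value);
    }

    public String getAttribute(String recordId, String name) {
        Map<String, String> attributes = records.get(recordId);
        return attributes == null ? null : attributes.get(name);
    }
}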

Wednesday, November 12, 2008

I removed AVG from my wife's computer. I'd removed it from my laptop months ago. I've come to the opinion that AVG is more dangerous than the viruses it's supposed to protect us against. That may be true of other AV programs as well. At any rate, in the years that we've been using AV, we've never encountered a virus. That's thanks to good habits. I think a weekly scan with ClamAV should be sufficient.

I have written my HTML sanitizer, but now I have to figure out how to apply it. At first I thought of running all incoming email through it, but realized I would be discarding information, which I hate to do. Anyway, if the method were buggy I'd have no way to recover. What I need to do is apply it as the HTML is on its way to the browser. The trick is, however, that I don't want to apply it to all HTML documents, only the ones I don't trust. And this comes around to something I've been thinking about for a long time: how to establish a trust level for documents.

There's really only one way to do this in a multi-user environment: digital signing. Now my HTML sanitizer is going to have to wait until I work out all the ramifications of this idea.

Tuesday, November 11, 2008

Jeff Atwood just wrote "Coding, after all, is just writing. How hard can it be?". I'm amazed. I really don't like writing and I love coding. Computers and people could not be more different. But Jeff is a people-person so I guess it's understandable that he would say that. I'm going to end up being one if I keep this up.

On a side note, I notice that AVG has finally made a major flub-up. This may be the end of them. I have felt this coming for some time now. In my perception they started out good and gradually became evil. This is a natural progression in a free enterprise system, so I'm not condemning it. The flub-up is also part of the process. Companies get fat, lazy, and corrupt, and young, lean, hungry companies take their place. And buyers should beware.

Anyway, back to AVG. I checked my wife's computer and user32.dll is still there. She has an older version of AVG because the latest version wouldn't install. So either the update hasn't taken place yet, the problem exists only in the current version, or the Slashdot article is just FUD.

On another side note, I've thought a lot about the cross-site scripting problem in AMI and come to the conclusion that I have no choice but to parse HTML and sanitize it. Actually, the parsing shouldn't be too hard. I just have to recognize tags and exclude the script tags.
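As a first cut, something along these lines would do it. It's a naive sketch that just drops script elements and leaves everything else alone, so it doesn't yet handle event-handler attributes or other sneakier vectors.

import java.util.regex.Pattern;

public class ScriptStripper {
    // Matches <script ...> ... </script> blocks, case-insensitively,
    // including ones that span multiple lines.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static String sanitize(String html) {
        return SCRIPT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<p>hello</p><script>alert('xss')</script><p>world</p>";
        System.out.println(sanitize(input)); // prints: <p>hello</p><p>world</p>
    }
}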

Saturday, November 8, 2008

I've been struggling with the problem of cross-site scripting in AMI. The way it's designed now, users can enter HTML in all its glory, including scripts. Some other user can view that document, and that's where the XSS vulnerability lies. Maybe one can try not to be evil, but one must assume everyone else is.

Thinking about how I really use AMI, I realized I don't really need to be able to enter HTML. It was just an easy way for me to provide visual formatting. Really, though, the kind of information I record is more appropriately stored in attributes than in the document text. I just need to make it easier to use attributes.

If everything is stored in attributes, AMI becomes much more of a semantic repository and not so much a document repository.

Friday, November 7, 2008

AMI

I've been writing software for more than 30 years. Since the late '70s I've had no other profession, and, unlike some people I've met, I really enjoy it. I find it curious that a person can get into the programming profession and really hate what they're doing. I'm sure those are people who made a rational decision about their career path early on, got the education, got the job, and stuck with it. Generally I think it's a bad idea to try to make rational decisions about your career.

Thursday, November 6, 2008

This is an experiment. A while ago I tried out a couple of different PHP blogging packages on my Linux box for my wife to use. It wasn't long before they were inundated with comment spam. It will be interesting to see what happens here.