"TagsLock" HTML Obfuscation and other user-abuse

I recently came across a firm called AtomPark, selling a product called "TagsLock." TagsLock is yet another HTML obfuscator, which attempts to protect page content through the usual techniques of JavaScript-interpreted code and URI encoding. Worse, it uses JavaScript's input event handlers to intercept mouse clicks on images, on the mistaken idea that this inhibits copying.

While Javascript obfuscators are bad enough (if you want to run code on my computer, I expect to have access to it), using Javascript as part of an HTML obfuscation scheme is lame in the extreme. HTML was designed to degrade gracefully. When you make part of your content unusable without Javascript interpretation, you inhibit access to the information you're presumably trying to publish. Your presentation ends up discriminating against people who choose not to enable JS, or choose to run a browser not capable of it.

More to the point, though, the techniques employed are described misleadingly. TagsLock is an HTML obfuscator and Javascript harassment package, not an actual means to protect content in any meaningful way. It engenders a false sense of security -- any halfway capable attacker can still access the content, while in the meantime a great many legitimate users will have been inconvenienced.

To date I've seen two basic approaches to HTML obfuscation -- interpreted JavaScript and URI encoding. URI encoding is the most visually opaque encoding in the JavaScript API. It simply replaces any given byte with a % sign followed by the two-digit hexadecimal representation of that byte's value. It's used by various HTTP agents to encode characters not otherwise legal in URLs. It shouldn't be confused with "encryption", which is another matter entirely.
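To make that concrete, here is a minimal Python sketch of the encoding as described above. The function name percent_encode_all is my own invention for illustration, and I'm assuming UTF-8 as the byte encoding; an obfuscator may well do it differently.

```python
from urllib.parse import unquote

def percent_encode_all(text: str) -> str:
    """Replace every byte of text with '%' plus two hex digits.

    Assumption: the text is encoded as UTF-8 before being escaped.
    """
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

encoded = percent_encode_all("<b>Hi</b>")
print(encoded)           # %3C%62%3E%48%69%3C%2F%62%3E
print(unquote(encoded))  # <b>Hi</b>
```

The output looks impenetrable, but a single standard-library call turns it back into the original markup.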

The "Any Browser" folks have had a few things to say about using browser-specific, non-degrading content. So has one of the original Netscape engineers.

Reversing URI encoding is trivial -- there are hundreds of implementations of the same algorithm. Interpreting Javascript is easy for a Javascript engine, of which quite a few exist. With the encoding schemes I've seen, though, only very simple JS expressions are used, which can be interpreted programmatically without much difficulty.
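As a rough illustration of how little work this takes, the sketch below handles one common shape of obfuscated page: a script block whose only job is to call document.write(unescape('...')) on a percent-encoded string. That pattern is an assumption on my part, not a description of TagsLock's exact output, and the names deobfuscate, WRITE_CALL and SCRIPT_BLOCK are hypothetical.

```python
import re
from urllib.parse import unquote

# Assumed pattern: document.write(unescape("...")) with one quoted literal.
WRITE_CALL = re.compile(
    r"""document\.write\(\s*unescape\(\s*(['"])(.*?)\1\s*\)\s*\)\s*;?""",
    re.DOTALL,
)
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL
)

def deobfuscate(html: str) -> str:
    """Replace each matching script block with the markup it would write."""
    def expand(match: re.Match) -> str:
        calls = list(WRITE_CALL.finditer(match.group(1)))
        if not calls:
            return match.group(0)  # leave unrelated scripts untouched
        # latin-1 mirrors how JS unescape() treats %XX; %uXXXX is not handled.
        # Any other code in the same script block is discarded.
        return "".join(unquote(c.group(2), encoding="latin-1") for c in calls)
    return SCRIPT_BLOCK.sub(expand, html)

page = (
    '<p>x</p><script type="text/javascript">'
    'document.write(unescape("%3Cimg%20src%3D%22a.png%22%3E"));'
    "</script>"
)
print(deobfuscate(page))  # <p>x</p><img src="a.png">
```

Anything fancier than a literal string argument would need a real JavaScript engine, but as noted, those aren't hard to come by either.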

So?

So here's some demonstration code that reverses TagsLock's obfuscation, yielding HTML usable in a browser without Javascript, images that can be clicked on without interference, etc.

Download it here. The source is released under terms of the GNU General Public License.

The other preposterous thing about AtomPark is that they're proposing to charge money for TagsLock -- some $500 for a site license, in fact. Hopefully no one has been foolish enough to take them up on it, but in a fit of pique I reimplemented the same algorithm, in two minutes. Here it is, yours to use under the GPL. If you feel some burning need to pay for software, send the money to EFF.

Wait a minute, so how do I protect my content from being stolen?

Don't put it on the web.

Electronic content control is a losing battle. If you display something on someone's computer screen, it can be copied. If you send something over the network, it can be intercepted and copied. If you try to use the web's own techniques against it, someone will get annoyed and work around it. Silly obfuscatory gyrations won't protect your content; they'll merely annoy legitimate users and amuse pirates.

The basic problem here is that obfuscators need ultimately to produce content in a form the user's browser can display expeditiously. Otherwise the content doesn't flow at all. Even "encrypted" content needs to include its own decrypt key, which defeats the purpose entirely.
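A toy example of the "encrypted" case, assuming a simple repeating-key XOR cipher (my choice for illustration, not anything a particular product is known to use). The point is only that whatever the page ships to the browser, the reader receives too.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key is a no-op."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# What the "protected" page has to ship so the browser can display it:
# the scrambled payload and the key needed to unscramble it (both hypothetical).
key = b"k3y"
shipped = {"key": key, "payload": xor_bytes(b"<p>secret</p>", key)}

# Anyone holding the page holds both halves, so recovery is one call.
recovered = xor_bytes(shipped["payload"], shipped["key"])
print(recovered)  # b'<p>secret</p>'
```

A stronger cipher changes nothing here; the key still has to travel with the content.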

I've noticed an inverse relationship between content protection and content value -- those who put the most energy into keeping people from copying the content tend to be protecting content that isn't worth copying in the first place.

So won't content pirates use this tool to steal my pictures?

Anyone who needed this tool to save copies of images being served up by your webserver is probably so incompetent you needn't worry about them. In TagsLock's case, all that's needed is to let the browser download the page and interpret the JS code, then disable JS, save the images, etc.

What about the spam features?

I'm grudgingly in favor of munging email addresses in HTML to interfere with email harvesters used by spammers. However, doing it with JavaScript isn't an acceptable solution, since it inhibits use by people running without Javascript; people have a reasonable expectation that content is accessible without having to drag in the large piles of unstable, insecure code that are your typical Javascript interpreter.

If you feel like keeping your email addresses from being harvested, I recommend URI encoding, HTML entity encoding, and Turing-test munging. Do not do this in a way that interferes with legitimate use of your web pages.
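For instance, HTML entity encoding can be done once, when the page is generated, so the browser needs no scripting to render it. This is a minimal sketch; entity_encode and the address are placeholders of mine, and a determined harvester can of course decode entities too.

```python
import html

def entity_encode(text: str) -> str:
    """Turn every character into a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)

address = "user@example.com"  # placeholder address
munged = entity_encode(address)

# Any browser decodes the references itself, with or without scripting.
link = f'<a href="mailto:{munged}">{munged}</a>'
print(link)

# Round trip: decoding the entities gives back the original address.
print(html.unescape(munged) == address)  # True
```

The resulting link works in any browser, including text-mode ones, which is the whole point.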

That is all.