Tuesday, August 17, 2010

Why we (probably) won't have a Semantic Web.

Librarians assume that because we are smart and know how to categorize and organize information, others will follow our recommendations.

But book stores don't, when they group their products under large, nebulous categories.

And Google doesn't, when it returns results based on algorithms optimized for displaying ads.

Librarians think that all metadata are equal. That "sex" is equal to "love" because they are just terms to classify content. But no two terms are ever equal. Metadata carry meanings that are timely, momentarily popular, or specific to a context.

So librarians ask questions to establish intent.


"The entertainment system was belting out the Beatles' 'We Can Work It Out' when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control."

[The Semantic Web by Tim Berners-Lee, James Hendler and Ora Lassila]

Why did Pete's phone send a message to turn down the volume? What if it were Pete's boss calling to tell him to work Saturday? Would the phone know to send a message to play the "Ferris Bueller" sneezing and coughing audio files instead?
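
To make the question concrete, here is a minimal sketch in Python of the kind of rule the quoted scenario seems to imply. Everything in it is my own invention for illustration (the `Device` class, the `on_call_answered` function, the device names); none of it comes from the article or from any real protocol. It only shows that a rule like this never looks at who is calling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """A hypothetical local device; the fields are illustrative assumptions."""
    name: str
    has_volume_control: bool
    volume: int = 10


def on_call_answered(caller: str, devices: List[Device]) -> List[str]:
    """Tell every device that has a volume control to turn itself down.

    The caller is passed in but never consulted: the rule has no idea
    whether Pete actually wants to hear this person.
    """
    lowered = []
    for device in devices:
        if device.has_volume_control:
            device.volume = 1
            lowered.append(device.name)
    return lowered


def make_devices() -> List[Device]:
    """Build a fresh set of made-up devices for each call."""
    return [
        Device("entertainment system", True),
        Device("television", True),
        Device("lamp", False),
    ]


for caller in ("a friend", "Pete's boss"):
    print(caller, "->", on_call_answered(caller, make_devices()))
```

Both calls produce the same list of devices. For the boss to get the sneezing and coughing files, something would have to tell the phone what Pete wants, and that is exactly the metadata nobody has written down.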

My point is that our understanding of the purpose of the Web is wrong. And our understanding of machines is wrong. Just as our understanding of other people is wrong.

We can't possibly know the purpose of the Internet. First, we didn't make it. Second, it was designed with only one purpose: to make access to data easier.

But now we want to control that data flow. We want to tell the internet that different data have different values to us through metadata. And the only way we can do that, from my understanding of the internet, would be to insert such massive amounts of metadata into the web that we would end up creating a second internet. One made up entirely of metadata.

This second internet would act like checksum data: something to compare against the first internet in order to calculate and verify the user's intent. To understand.

This second internet would contain all the metadata needed for machines to understand humans. Yes, I said it would be huge.

But I don't see that. Of course, I wouldn't see it if it existed. It would be invisible. So what I see is an internet evolving into something that looks like a semantic web, but is only a more finely tuned commercial web. The web is getting better at selling me things. It doesn't really know what I want, so it gives me what others seem to have wanted.

I can see that the internet is trying. I can see that we are trying. There are frameworks for attempting to have the internet understand us. And to have machines understand us.

In fact, there might even need to be a third internet made up purely of rules, policies and vocabularies which massage the metadata into accuracy of intent.

But to build this, one must understand what humans think. And since humans communicate with machines so differently from how they communicate with each other, what they think changes depending on whether they are interacting with another human or with a machine.

In the example at the top, Pete might think that he wants to talk to the person on the phone, so the phone communicates his intent to the stereo. But if Pete doesn't want to talk to the person on the phone, the phone communicates a different message to different machines. How is the phone to know? Without asking questions? Oh, it listens in on our conversations. Did we say it could?