A semantic platform

Recently, we at Gnowsis stumbled upon the question "What is a semantic platform?" This is an interesting question, because the term "semantics" is loaded with so many different interpretations and opinions that it is really hard to give a simple yet precise answer. Still, since we believe that with Refinder we are actually building a "semantic platform", we should be able to answer it. So let's start.

Context is the key to success

First, there is "semantics". In information technology, "semantics" is usually associated with the interpretation of signs (in most cases, words in texts) and their relationship to what they mean (i.e., what they are intended to denote). This relationship is not straightforward, and in most cases it depends entirely on the context in which the sign appears (think of the word "Tiger", which means something quite different in IT than in zoology, or "die", which has quite different semantics in German and in English). You cannot decide which meaning of a sign is correct unless you consider the context in which it appears. (Of course, this leaves the question of context boundaries open, but let's save that for another blog post.)
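To make the "Tiger" example concrete, here is a minimal sketch of context-based disambiguation using a simplified, Lesk-style word-overlap heuristic. This is an illustration only, not how Refinder works: the sense inventory `SENSES`, its cue words, and the function `disambiguate` are all hypothetical. Each candidate sense is scored by how many of its cue words appear in the surrounding text, and the best-scoring sense wins.

```python
from typing import Dict, Optional, Set

# Hypothetical sense inventory (assumed for illustration): each sense of an
# ambiguous word is described by a small set of typical context words.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "tiger": {
        "zoology": {"animal", "cat", "jungle", "stripes", "predator", "species"},
        "it": {"mac", "os", "apple", "release", "install", "version"},
    },
}


def disambiguate(word: str, context: str) -> Optional[str]:
    """Return the sense whose cue words overlap most with the context, or None."""
    senses = SENSES.get(word.lower())
    if not senses:
        return None  # unknown word: no senses to choose from
    # Naive tokenization: split on whitespace, strip punctuation, lowercase.
    tokens = {t.strip(".,;:!?\"'()").lower() for t in context.split()}
    scores = {sense: len(cues & tokens) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    # Zero overlap means the context tells us nothing, so do not guess.
    return best if scores[best] > 0 else None


print(disambiguate("Tiger", "I upgraded my Mac to the new OS release"))
print(disambiguate("Tiger", "The tiger is a large cat and a skilled predator"))
```

Real systems use far richer context (source, users, global knowledge bases), but the principle is the same: without context, the sign alone cannot be resolved.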

So, to get the meaning of data right, you have to get an understanding of its context. How can a semantic platform know the context of the data it processes? Well, there are a variety of options:

  • Source Context: information usually has a source, i.e., some system in which it originates (e.g., the website from which a snippet of text was copied, or the directory in which a file is stored). In many cases such context is 'hard fact', and its meaning is clear and obvious.
  • User and Social Context: as we usually collaborate with others, knowing who is working with a piece of information, and from whom we received it, is quite helpful, especially when people decide to share proactively (and people do want to share, as Twitter, Facebook, and Google+ demonstrate).
  • Global Context: some context can be considered globally valid, meaning that there are commonly accepted sources of information that give us an 'absolute' context for certain words. One example of such a global context source is an encyclopedic site like Wikipedia, from which we can obtain the commonly accepted meaning of certain words in certain languages. In some cases these knowledge sources are available in structured form (as in the Linked Data cloud), which makes it even easier for machines to process this information.

If a platform considers data context, it can offer the user many benefits. For instance, instead of just managing "documents" or "pages", a semantic platform can manage real-world entities like persons, organizations, or events. By "semantifying" raw data, the platform significantly extends the options and actions it can offer the user: the more you know about your data, the more you can do with it. (By the way, this is exactly the route that Google, Bing, and Yahoo have recently taken with schema.org: they use semantic markup embedded in Web pages to improve the precision of their search engines.)

The context of information need not be expressed as a categorization or hierarchy (although that may be helpful in many cases). Especially when the amount of data exceeds a certain level (as in big enterprises, highly connected social networks, or on the Web), hierarchies do not scale. What does scale, however, are bi-directional links between semantically related items. Things that are related via an intermediate step may still be relevant, but probably to a lesser degree; after a few hops, the relevance of connected information approaches zero.
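The idea of relevance fading with link distance can be sketched as a breadth-first walk over a graph of bi-directional links. This is a minimal sketch under stated assumptions, not Refinder's actual model: the exponential decay (relevance = `decay ** hops`), the parameters `decay` and `max_hops`, and the helper names `add_link` and `related` are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Dict, Set


def add_link(graph: Dict[str, Set[str]], a: str, b: str) -> None:
    """Store a bi-directional link, so it can be followed from either end."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def related(graph: Dict[str, Set[str]], start: str,
            decay: float = 0.5, max_hops: int = 3) -> Dict[str, float]:
    """Score items reachable from `start`; relevance = decay ** shortest hop count.

    Both the exponential decay and the hop cut-off are assumptions.
    """
    scores: Dict[str, float] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # beyond this point relevance is treated as zero
        for neighbour in graph.get(node, set()):
            if neighbour not in seen:
                # BFS reaches each item first via a shortest path.
                seen.add(neighbour)
                scores[neighbour] = decay ** (hops + 1)
                queue.append((neighbour, hops + 1))
    return scores


g: Dict[str, Set[str]] = {}
add_link(g, "report.pdf", "Alice")
add_link(g, "Alice", "ACME Corp")
add_link(g, "ACME Corp", "Trade Fair 2011")
add_link(g, "Trade Fair 2011", "Bob")
print(related(g, "report.pdf"))
```

Because every link is stored in both directions, starting the walk from "ACME Corp" finds the report just as easily as starting from the report finds the company; no hierarchy is needed.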

Machines do the math, but not the thinking

A big problem with contextualizing information is that machines still cannot think. They can only calculate, so everything we do to contextualize data in a software system must be "reduced" to statistics and mathematics. When a problem cannot be solved mathematically (and there are many such problems!), the user must step in. In turn, every step a user performs can be multiplied and optimized by offering clear, simple-to-use interaction metaphors and user interfaces. So a little help from the user makes information much more meaningful, both for the machine and for the user. Semantics emerges.

Refinder captures the semantics of information by considering the context of data on the three levels described above, and by giving the user an unobtrusive and simple way to enrich the context of information items in the system. Refinder makes it easy for users to externalize their knowledge and to explain what a certain piece of data (which, in the end, is only zeroes and ones to the machine) is all about. The machine can use this explanation to improve its mathematics, which in turn yields better search results, higher precision, and a better separation of signal from noise for the user and their collaborators. Semantics at its best, I would say.

The second part of the question is the "platform" aspect. I will try to answer that in an upcoming blog post.
