Schema.Org: On the Painful Coding of the Semantic Web

Our Erik Vlietinck says Schema.org improves the display of a site in the search results and brings us closer to a truly semantic web. But is it ever time-consuming to mark up …

The metadata markup initiative Schema.org hasn’t been successful with the webmaster community since it launched a year ago. But with the release of version 1.0 in June and Data.gov almost ready to add its listing of more than 450,000 datasets, the semantic web is at last taking off.

Launched back in June 2011, Schema.org is a joint initiative of Bing, Google and Yahoo. It provides a vocabulary (a collection of concepts and their properties) to help webmasters mark up their web content in ways that the major search robots recognize. Search providers rely on this markup to improve the display of search results, making it easier for people to find the pages they are searching for.

So far, Schema.org combines the IPTC’s dictionary (the rNews standard) with dictionaries for job postings, genealogy, e-commerce, learning and education, medicine and health, and technical publishing.

As it is backed by the three largest search providers and its founding members are already exploring other ways of sharing Schema.org structured data, Schema.org is one of the most important efforts on the path to a semantic web.

Even a year after its launch, though, Schema.org’s wiki reports only 198 web sites that incorporate Schema.org markup. The BBC, for example, is using it to identify athletes more efficiently (that is, more semantically). But that’s about it.

Schema.org metadata used by the BBC for a more semantic web.

The reason for the slow uptake is that, although Schema.org code looks easy to implement and is certainly not rocket science, the situation is complex. There is a seemingly endless array of item types and properties to choose from. Item properties do have expected content types such as “URL”, “text”, “duration”, “date” and so on. But these associations are not carved in stone.
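To make the idea of expected content types a little more concrete, here is a minimal Python sketch that checks a handful of property values against the kind of content they are expected to hold. The property names are real Schema.org properties, but the `EXPECTED` table, the helper functions and the sample values are assumptions made up for this illustration, and the checks are deliberately loose. Because the associations are not carved in stone, a mismatch is reported as something to look at rather than as an error.

```python
import re
from datetime import date
from urllib.parse import urlparse

def looks_like_url(value):
    """True for an absolute http(s) address."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def looks_like_date(value):
    """True for an ISO 8601 calendar date such as 2012-07-01."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

def looks_like_duration(value):
    """Simplified check for ISO 8601 durations with a time part only, e.g. PT1M33S."""
    return re.fullmatch(r"PT(?=\d)(\d+H)?(\d+M)?(\d+S)?", value) is not None

# Hypothetical lookup table: property name -> check for its expected content type.
EXPECTED = {
    "url": looks_like_url,
    "datePublished": looks_like_date,
    "duration": looks_like_duration,
}

def properties_to_review(properties):
    """Return the names of properties whose value does not match the expected type."""
    suspicious = []
    for name, value in properties.items():
        check = EXPECTED.get(name)  # properties without an entry are treated as free text
        if check and not check(value):
            suspicious.append(name)
    return suspicious

sample = {
    "name": "Example Camera review",
    "url": "http://example.com/reviews/camera",
    "datePublished": "1 July 2012",  # readable for people, but not a machine-readable date
    "duration": "PT1M33S",
}
print(properties_to_review(sample))
```

With the sample values above, only `datePublished` is flagged, because "1 July 2012" is not written in the machine-readable date format.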

Writing the code is in itself a pain any way you look at it. It’s too easy to make mistakes and mess up the HTML and CSS code that controls the look of a page.

The metadata dictionary for a general semantic web is huge.

Coding the semantic web

To help webmasters and developers write correct code nonetheless, Google offers a Rich Snippet page for testing markup. The results page shows how Google would list the page and how the search robot interpreted the code. Errors are shown in red, but with little explanation.

The Rich Snippet tool can be used to check how search robots understand the page semantics.

Schema.org items aren’t always clear-cut. The rating content block in the image below refers to the author of the review rating the product, not the visitors of the web page. Schema.org’s Review type, however, seems to refer to reviews generated by site visitors who experienced the product. It can take a good deal of searching through Schema.org’s type listings, as well as its wiki, to find out which types and properties are best for any given purpose.

When you have found the right item, you should declare it in a “div itemscope” element, with an “itemtype” attribute that links to the item type’s page on Schema.org. This is usually done in a separate “div”, but the attributes can also be added to an existing one.
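As a rough illustration of such a declaration, the Python sketch below holds a small piece of review markup and lists the item types it declares. The HTML fragment, the product and the `ItemScopeFinder` class are made up for this example; only the `itemscope` and `itemtype` attributes and the Review type's URL follow Schema.org's microdata conventions.

```python
from html.parser import HTMLParser

# Hypothetical review markup: the outer div declares a Schema.org item.
PAGE = """
<div itemscope itemtype="http://schema.org/Review">
  <h2>Review: Example Camera</h2>
  <p>A solid compact camera for travel.</p>
</div>
"""

class ItemScopeFinder(HTMLParser):
    """Collects the itemtype of every element that carries itemscope."""

    def __init__(self):
        super().__init__()
        self.item_types = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes:
            # itemscope has no value of its own; itemtype holds the link to the type.
            self.item_types.append(attributes.get("itemtype"))

finder = ItemScopeFinder()
finder.feed(PAGE)
print(finder.item_types)
```

The same two attributes could just as well sit on an existing “div” that already wraps the review.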

The Schema.org metadata is added to the HTML code.

After declaring the item scope and type, each piece of data within the declared element should sit in an element that carries the appropriate “itemprop” attribute. When it’s text you’re marking up, that attribute can go on an existing element or on a dedicated “span” element, because a span doesn’t affect the design unless you explicitly style it with CSS.
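The sketch below shows what this could look like for a short review, and how a program might read the properties back. The markup, the reviewer's name and the `PropertyReader` class are invented for the illustration. The reader is also a simplification: it only handles properties whose value is the plain text inside the element, and it assumes every element has a closing tag.

```python
from html.parser import HTMLParser

# Hypothetical markup: itemprop sits on existing elements (h2, p) and on a
# dedicated span that adds nothing to the page design.
PAGE = """
<div itemscope itemtype="http://schema.org/Review">
  <h2 itemprop="name">A solid travel camera</h2>
  <p>Reviewed by <span itemprop="author">Jane Doe</span></p>
  <p itemprop="reviewBody">Light, fast and easy to carry.</p>
</div>
"""

class PropertyReader(HTMLParser):
    """Maps each itemprop name to the text inside its element."""

    def __init__(self):
        super().__init__()
        self.open_props = []  # one entry per open element: its itemprop name or None
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        self.open_props.append(prop)
        if prop:
            self.properties[prop] = ""

    def handle_endtag(self, tag):
        if self.open_props:
            self.open_props.pop()

    def handle_data(self, data):
        # Text counts only when the innermost open element carries itemprop.
        if self.open_props and self.open_props[-1]:
            self.properties[self.open_props[-1]] += data

reader = PropertyReader()
reader.feed(PAGE)
for name, value in reader.properties.items():
    print(name, "=", value)
```

Note that "Reviewed by" is not picked up, because only the span around the name carries the property.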

A simple snippet of HTML code...

In some cases, you will need to nest item scopes and types. For example, if the review covers the Schema.org item type “Movie”, but you are adding information about the director of the movie, that information can be scoped and typed accordingly, i.e. with the “Person” item type. The nesting must follow the order shown in the next example.
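A minimal sketch of such nesting, with a made-up movie and director: the inner element carries “itemprop” (its role in the movie) together with its own “itemscope” and “itemtype”, and everything inside it then describes the person rather than the movie. The `NestedItemReader` class is a hypothetical helper written for this illustration, and it makes the same simplifying assumptions as any quick parser: text-only property values and properly closed elements.

```python
from html.parser import HTMLParser

# Hypothetical markup: a Person item nested inside a Movie item.
PAGE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Example Movie</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Directed by <span itemprop="name">Jane Doe</span>
  </div>
</div>
"""

class NestedItemReader(HTMLParser):
    """Builds nested dictionaries that mirror the nesting of item scopes."""

    def __init__(self):
        super().__init__()
        self.roots = []       # top-level items found on the page
        self.item_stack = []  # items whose scope is currently open
        self.elements = []    # one (kind, name) entry per open element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        if "itemscope" in attributes:
            item = {"type": attributes.get("itemtype"), "properties": {}}
            if prop and self.item_stack:
                # A nested item becomes the value of a property of its parent.
                self.item_stack[-1]["properties"][prop] = item
            else:
                self.roots.append(item)
            self.item_stack.append(item)
            self.elements.append(("item", None))
        elif prop and self.item_stack:
            self.item_stack[-1]["properties"][prop] = ""
            self.elements.append(("prop", prop))
        else:
            self.elements.append((None, None))

    def handle_endtag(self, tag):
        if self.elements:
            kind, _ = self.elements.pop()
            if kind == "item":
                self.item_stack.pop()  # leaving the scope of the innermost item

    def handle_data(self, data):
        if self.elements and self.elements[-1][0] == "prop":
            name = self.elements[-1][1]
            self.item_stack[-1]["properties"][name] += data

reader = NestedItemReader()
reader.feed(PAGE)
movie = reader.roots[0]
print(movie["properties"]["name"])
print(movie["properties"]["director"]["properties"]["name"])
```

Both items have a “name” property, and it is the nesting that decides which item each one belongs to.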

Schema.org metadata markup added to the HTML code.

If the element you’re marking up is an image or a multimedia object, Schema.org provides descriptive properties such as duration.
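For a video, for instance, the duration property expects a value in ISO 8601 duration format, such as PT1M33S for one minute and 33 seconds. The sketch below converts a length in seconds to that format and wraps it in markup for Schema.org's VideoObject type. The helper names `iso8601_duration` and `video_markup` and the video title are assumptions made up for this example, and the helper only covers hours, minutes and seconds.

```python
import html

def iso8601_duration(total_seconds):
    """Format a length in whole seconds as an ISO 8601 duration, e.g. PT1M33S."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if minutes:
        text += f"{minutes}M"
    if seconds or text == "PT":
        text += f"{seconds}S"
    return text

def video_markup(title, total_seconds):
    """Return a microdata fragment describing a video and its duration."""
    return (
        '<div itemscope itemtype="http://schema.org/VideoObject">\n'
        f'  <span itemprop="name">{html.escape(title)}</span>\n'
        # meta carries the machine-readable value without showing it on the page.
        f'  <meta itemprop="duration" content="{iso8601_duration(total_seconds)}">\n'
        '</div>'
    )

markup = video_markup("Unboxing the Example Camera", 93)
print(markup)
```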

For aNewDomain.net, I’m Erik Vlietinck.
