What is Linked Data?
By Glen Hart and John Goodwin, Ordnance Survey
Whether we like it or not, we are entirely dependent on the use of computers and the internet to do business. Increasingly this dependence is spreading to our social lives too and the notion of an “always connected” life is almost reality for some.
Underpinning this dependency, the raw opiate of our digital world, is data and it is the one thing we are not short of. But for the most part that data, if viewed from a distance, can be seen to be a chaotic assemblage.
Tom Heath of Talis (www.talis.com) has a very nice analogy that we can expand upon here. Imagine if datasets were cities and the railway system was used to connect those cities; what would this world look like? Perhaps, like the world we know, each of these data cities would be very different, each with a unique character, built using different architectures and building techniques. How they were connected, though, would be very different. In fact the most noticeable thing would be how unconnected most of these cities are.
Where connections exist, each would be different. In railway terms, no two lines would have the same gauge: a train running from City A to City B could not run on from City B to City C, because the lines would be a different size. But just as connectivity is important for real cities, so it is for data too. For instance, a company performing an environmental assessment will need to take data from a number of sources (mapping, land cover, habitat surveys, soils data and so on) and combine them to perform the required analysis. Given the current state of data, none of this process is easy.
Essentially we need a better solution. Linked Data is a rising technology that can help with some of these issues. To understand what Linked Data is, it is worth understanding where it has come from. The World Wide Web emerged in the early 1990s, the brainchild of Sir Tim Berners-Lee. What Berners-Lee wanted to do was to link documents over the internet to share with his colleagues, but he was unable to do so in a simple and consistent manner.
The World Wide Web did not exist and so he had to invent it.
Linked Data has taken the basic idea behind the linking of documents, applied it to data, and reused as much of the Web’s technology as possible. Linked Data uses HTTP and a standard called RDF to enable data to be interconnected via links, much as HTML enables documents to be linked. So Linked Data is part of the Web, just as documents are.
The two main problems this seeks to address are the lack of a common data format (it standardises data around one format, much as HTML has standardised documents on the web) and the lack of a way to link data directly, so that once data is linked it is not necessary for others to link it again. Although this may not sound like much, it is actually a very big deal: in time it will probably affect the way we all do business as much as the original Web has.
Linked Data works by requiring all data to be formatted in RDF. RDF is a World Wide Web Consortium (W3C) standard and stands for Resource Description Framework. The name itself isn’t too helpful in saying what RDF is, but putting that to one side the one thing that can be said about RDF is that it is about as simple as a ‘Standard’ can be. The basic format of all RDF is what is known as a triple.
A triple has the format:
Subject, Predicate, Object
Hence the name. It enables us to make lots of simple statements that combined represent our dataset, such as:
Hampshire is a County.
M5 connects to M6.
Portsmouth is also known as “Pompey”.
In each case we have a subject: Hampshire, M5 and Portsmouth, a predicate: “is a”, “connects to” and “is also known as” and an object: County, M6 and “Pompey”. The subject must always refer to a thing; the object can be either a thing such as M6, or a value such as “Pompey”.
The above statements are written in English. As data they are encoded and so are a lot less readable, but the basic principle is the same.
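To make the idea of a triple concrete, here is a minimal Python sketch that holds the three example statements as subject, predicate, object records and looks one of them up. The `Triple` type and the `objects_of` helper are names invented for this illustration; real RDF identifies things with URIs rather than plain words.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate and an object."""
    subject: str
    predicate: str
    obj: str  # a thing (such as M6) or a plain value (such as "Pompey")


# The three example statements from the text, held as plain strings.
statements = [
    Triple("Hampshire", "is a", "County"),
    Triple("M5", "connects to", "M6"),
    Triple("Portsmouth", "is also known as", "Pompey"),
]


def objects_of(subject: str, predicate: str, data: List[Triple]) -> List[str]:
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


print(objects_of("Portsmouth", "is also known as", statements))  # ['Pompey']
```

Because every statement has the same three-part shape, one small lookup function is enough to question the whole dataset, however many statements it holds.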
Linked Data also borrows another idea from the web of documents: the idea of identity as an address. Each document on the web can be found by reference to its URL, which both uniquely identifies the web page (document) and allows it to be located. Linked Data uses the more general URI, or Uniform Resource Identifier, rather than the URL. URIs are used to uniquely identify the things you are interested in (subjects and objects) and the predicates. The URI is the handle you use to represent the thing you are interested in, as you can’t put the thing itself on the Web.
Ordnance Survey has published Linked Data about all the administrative areas of Great Britain. In that data we have assigned URIs to the City of Portsmouth (http://data.ordnancesurvey.co.uk/id/7000000000037254), to the predicate ‘contains’ (http://data.ordnancesurvey.co.uk/ontology/spatialrelations/contains) and to Cosham, an administrative area within Portsmouth (http://data.ordnancesurvey.co.uk/id/7000000000017392). So the statement “Portsmouth contains Cosham” is published as:
http://data.ordnancesurvey.co.uk/id/7000000000037254 http://data.ordnancesurvey.co.uk/ontology/spatialrelations/contains http://data.ordnancesurvey.co.uk/id/7000000000017392
Well, not exactly, there are a few other bits and pieces that go around it, but these are just syntactic sugar for the benefit of a computer that we need not worry about at this stage.
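As an illustration of those “other bits and pieces”, the sketch below writes the same statement in N-Triples, one of several W3C text syntaxes for RDF. The choice of N-Triples is an assumption made for this example (the article does not say which syntax the Ordnance Survey service returns), and `to_ntriples_line` is a hypothetical helper name.

```python
# URIs taken from the text above.
PORTSMOUTH = "http://data.ordnancesurvey.co.uk/id/7000000000037254"
CONTAINS = "http://data.ordnancesurvey.co.uk/ontology/spatialrelations/contains"
COSHAM = "http://data.ordnancesurvey.co.uk/id/7000000000017392"


def to_ntriples_line(subject: str, predicate: str, obj: str) -> str:
    """Format three URIs as one N-Triples statement.

    In N-Triples each URI is wrapped in angle brackets and the
    statement is closed with a full stop.
    """
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples_line(PORTSMOUTH, CONTAINS, COSHAM)
print(line)
```

The extra punctuation carries no meaning of its own; it simply tells a parser where each URI and each statement begins and ends.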
But how does this really help? With a single, universal data standard, we don’t have to worry about what format the data is in: we know it is going to be RDF, and that means everything comes as a triple. This immediately removes the problem of converting data from one format to another. Secondly, as a dataset becomes more and more linked, the chances are that someone will already have made the links that are needed for our application.
Both of these can make very significant inroads into reducing the cost and time it takes to link datasets together so that they can be used to solve some problem. Of course Linked Data does not address all problems, but in time it can help with finding data, if we assume that a Linked Data equivalent of Google will emerge. Nor is Linked Data without problems of its own. Probably the most pressing is the ease with which it is possible to make bad links to other people’s data, for example linking data about Birmingham in England to Birmingham in Alabama. But Linked Data is nonetheless a very good move in the right direction.
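The two benefits described above, a common format and reusable links, can be sketched in a few lines of Python. The administrative-area triple uses the Ordnance Survey URIs quoted earlier; the habitat survey dataset, its `example.org` URIs and the `sites_within` helper are entirely hypothetical and exist only for this illustration.

```python
# URIs quoted in the article.
PORTSMOUTH = "http://data.ordnancesurvey.co.uk/id/7000000000037254"
CONTAINS = "http://data.ordnancesurvey.co.uk/ontology/spatialrelations/contains"
COSHAM = "http://data.ordnancesurvey.co.uk/id/7000000000017392"

# Hypothetical URIs for an imagined habitat survey dataset.
LOCATED_IN = "http://example.org/ontology/locatedIn"
SITE_42 = "http://example.org/survey/site/42"

# Two datasets from different publishers, both expressed as triples.
admin_areas = {(PORTSMOUTH, CONTAINS, COSHAM)}
habitat_survey = {(SITE_42, LOCATED_IN, COSHAM)}  # links to the same Cosham URI

# Because both are sets of triples, combining them needs no format conversion.
combined = admin_areas | habitat_survey


def sites_within(area, data):
    """Find survey sites located in any area that the given area contains."""
    contained = {o for s, p, o in data if s == area and p == CONTAINS}
    return sorted(s for s, p, o in data if p == LOCATED_IN and o in contained)


print(sites_within(PORTSMOUTH, combined))
```

The survey publisher made the link to Cosham once, by reusing its URI, and anyone combining the two datasets can then follow it. The same mechanism explains why a bad link is costly: had the survey pointed at the wrong URI, the query would quietly return the wrong answer.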
What is happening?
Like the original Web, Linked Data started in a small way, in 2007, with just a few interlinked datasets. Even then there were datasets with distinct geographic content that would be of interest to many in the environmental industry. One was GeoNames, the crowd-sourced gazetteer of world place names; another was DBpedia.
DBpedia is a Linked Data version of much of the content of Wikipedia and contains a lot of explicitly geographical and environmental information. But Linked Data is not just for the crowd-sourcing community; its value has also been recognised by governments and companies. For example, the BBC has begun to use Linked Data as the basis from which to generate many of its web pages.
It does not stop there: the US government is promoting the use of Linked Data to publish data, and companies such as Tesco have recognised the importance of Linked Data and are using it for things like stock control. Ordnance Survey was one of the first agencies to release Linked Data and people are already beginning to use this in anger, with current usage (March 2011) being 1.8 million.
For those interested in environmental data, Linked Data will make it much easier to integrate with other sources. If you need to use environmental data, look to see if it is available as Linked Data; if you are publishing any, you should seriously consider doing so as Linked Data. This article can do no more than scratch the surface of what Linked Data is, but if you want to find out more a good starting point is http://www.linkeddata.org.
Image 1: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/


