Land Governance Lost in Translation: Exploring Semantic Technologies to Increase Discoverability of New Technologies & Data
It is said that the languages we speak change our thoughts and the way we think. What is more, new research shows that the many subtle differences across languages might actually change the way we experience the world around us. Sara Jani, Project Monitoring and Knowledge Management Coordinator at ICARDA, says the following about the array of languages and words we use on a daily basis: “Our ancestors used to say each tongue (language) is a human being” ("كل لسان انسان").
Despite the increasingly interconnected world we live in, however, language and technology barriers continue to be a very serious constraint on effectively exchanging and learning from the wealth of information now available to us. Land, for example, is a topic that is debated in many languages, across many different academic disciplines and in all parts of the world. Many attempts have been made in the past to find common definitions and terminologies for issues related to land, but wide consensus or adoption has not yet been reached. Understandably so: one can only imagine the heated and controversial discussion needed to reach agreement on what exactly we mean when we use the words ‘property’ or ‘tenure rights’, for example. A new “land thesaurus”, LandVoc, hopes to change that.
THE POTENTIAL OF THE SEMANTIC WEB
It is no secret that there is a wealth of data and information available on the web, with more being added each and every day in every part of the world. It has become impossible for humans to digest all of this information. For example, a simple Google search for ‘surveying techniques’ returns over 34 million records! Even a curious mind and thirst for knowledge simply will not be enough to sift through this amount of information.
Despite this plethora of information, it is sometimes said, ironically, that the answer to the world’s problems lies in a PDF somewhere online. In essence, we have yet to reap the benefits of the mass of information being produced every day, because that information is hidden and fragmented, and is not published in ways that would turn it into knowledge that could bring about change. Furthermore, much of the information out there, especially when it comes to land rights, remains highly compartmentalized. This often happens by theme, sector or geography, and information often fails to move outside of these predetermined categories.
The semantic web aims to address this discrepancy. The goal of the ‘semantic web’ is to make the information available online machine-readable. What does this mean in simple terms? As mentioned above, humans cannot absorb all data and information. This means that important knowledge will never reach its full potential or, in the worst-case scenario, will remain unused altogether. Machines can help us read and digest this information at an unprecedented speed and scale. In order to effectively share knowledge and technologies across the globe and increase our collective efficiency, we should indeed embrace a tool like the semantic web.
Let’s look at an example here. If a user asks for a specific set of information from a database, the machine can only retrieve the relevant information if it also understands what the topic is. If user A, who uploads content to this database, can fill in any word to describe that content, the machine has no way of knowing the relationships between terms. A common issue arises when a resource is tagged with a synonym. When user A has tagged a piece of content “favela” and user B is searching for “slums”, there is no way for the machine to know that these are indeed closely related terms. Controlled vocabularies help to mend these gaps. They work with a unique ID for each concept, with the possibility of attaching several pieces of information to that ID: the preferred term, translations in any number of languages, and relationships between terms (A is related to B, or X influences Y, etc.). This way the machine can understand the languages and the nuances we use in them, and help retrieve the information most relevant to a user’s search and information needs.
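The “favela” and “slums” example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how AGROVOC or the Land Portal implement search: the concept IDs, the labels and the helper names (`build_index`, `concept_for`, `search`) are invented for the example, and treating “favela” as an alternative label of “slums” is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str                 # hypothetical ID, not a real AGROVOC identifier
    pref_label: str                 # the preferred term
    alt_labels: set = field(default_factory=set)  # synonyms for the same concept


# Toy controlled vocabulary (illustrative content only).
VOCAB = [
    Concept("c1", "slums", {"favela", "informal settlements"}),
    Concept("c2", "land tenure", {"tenure rights"}),
]


def build_index(vocab):
    """Map every label, lower-cased, to the ID of the concept it belongs to."""
    index = {}
    for concept in vocab:
        for label in {concept.pref_label, *concept.alt_labels}:
            index[label.lower()] = concept.concept_id
    return index


INDEX = build_index(VOCAB)


def concept_for(term):
    """Resolve a free-text term to a concept ID, or None if it is unknown."""
    return INDEX.get(term.strip().lower())


def search(term, tagged_resources):
    """Return titles whose tag resolves to the same concept as the query term."""
    target = concept_for(term)
    if target is None:
        return []
    return [title for title, tag in tagged_resources if concept_for(tag) == target]


# User A tagged "Report A" with "favela"; user B searches for "slums".
RESOURCES = [("Report A", "favela"), ("Report B", "tenure rights")]
print(search("slums", RESOURCES))  # ['Report A']
```

Because both the tag and the query are resolved to a concept ID before they are compared, the match no longer depends on the two users choosing the same word.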
AGROVOC is one such controlled vocabulary, built by and for the agricultural sector. It is facilitated by the Food and Agriculture Organization (FAO) of the United Nations. It covers “all areas of interest to the FAO, including food, nutrition, agriculture, fisheries, forestry, environment etc.”. The AGROVOC thesaurus was first published in English, Spanish and French in the early 1980s. In 2000, AGROVOC went digital. It has evolved and grown over the years, with a vibrant and international community of editors behind it, contributing new concepts and new translations every month. Today, AGROVOC consists of over 36,000 concepts and over 750,000 terms (synonyms or translations of those concepts, etc.) related to agriculture and is available in over 35 languages.
A controlled vocabulary such as AGROVOC has helped no fewer than 10 million users a year overcome the language barriers described above. Through AGROVOC’s technical infrastructure, computers can read concepts and understand that ‘maize’ as a concept is the same as ‘Maïs’ in French or ‘ذرة صفراء’ in Arabic. Translations, synonyms and relationships of this one concept are captured in one unique code, a ‘Uniform Resource Identifier’ (URI), that computers, including search engines, can read and understand.
International research organizations such as the International Center for Agricultural Research in the Dry Areas (ICARDA), the International Potato Center (CIP), the International Institute of Tropical Agriculture (IITA), WorldFish, the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) and World Agroforestry (ICRAF) use AGROVOC every day in their data management processes in order to achieve greater discoverability of their results and support development actors and policy makers.
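To make the role of the URI concrete, here is a minimal Python sketch of one identifier carrying labels in several languages. The URI below is a placeholder on example.org, not the real AGROVOC identifier for maize, and the function names are assumptions made for the illustration.

```python
# Placeholder URI for illustration; a real vocabulary publishes its own identifiers.
MAIZE_URI = "http://example.org/vocab/maize"

# One URI, many labels: language code -> label, as quoted in the text above.
LABELS = {
    MAIZE_URI: {"en": "maize", "fr": "Maïs", "ar": "ذرة صفراء"},
}


def uri_for(label):
    """Find the URI of the concept carrying this label in any language."""
    needle = label.casefold()
    for uri, by_lang in LABELS.items():
        if any(text.casefold() == needle for text in by_lang.values()):
            return uri
    return None


def translate(label, lang):
    """Return the label of the same concept in another language, if recorded."""
    uri = uri_for(label)
    if uri is None:
        return None
    return LABELS[uri].get(lang)


# The English and French labels resolve to the same identifier.
print(uri_for("MAIZE") == uri_for("Maïs"))  # True
```

A search engine that stores the URI rather than the word can then serve a French-speaking and an Arabic-speaking user from the same tagged resource.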
With such an incredible tool and even more incredible user base as AGROVOC, one quickly starts thinking: what about land? If the AGROVOC tool covers all areas of interest to the FAO, surely land governance must be one of the topics they cover. When the Land Portal Foundation first discovered AGROVOC and engaged with the team, only 20 concepts related to land governance were included in the AGROVOC vocabulary.
It is with this in mind that, in 2012, the Land Portal Foundation carried out a scoping study of online land information providers and the way they classified their information. To put things simply, we asked: what kind of tags are land information providers using? The main conclusion was that there was no standard vocabulary being used and no structured or uniform approach to publishing information. We saw a range of sophistication in the way organizations classified the materials they were publishing, from no classification at all to a standard set of keywords.
The Land Portal Foundation responded to this gap, not by creating yet another new standard, but by taking a widely accepted and used standard such as AGROVOC and enriching the concepts related to land within this vocabulary. By building on existing land glossaries, such as the FAO’s Land Tenure Thesaurus, or the Land Administration Domain Model or the Global Land Indicators Initiative, new concepts were added and translated to several languages. This particular set of land-governance related concepts in AGROVOC is now called “LandVoc - the linked land governance thesaurus”.
“The Land Portal collaboration with AGROVOC is an excellent example of how to work with an expert community. It is a community of experts on land governance working with the AGROVOC Editorial Community. The LandVoc is a real success story: not only a living controlled vocabulary, but actively used in the Land Portal information platform. Working with LandVoc has been a learning process for the AGROVOC Team, and it has been an extraordinary opportunity to work with a high-level team” say Kristin Kolshus and Imma Subirats of the FAO AGROVOC team.
LandVoc is an extremely powerful tool in making data and information more discoverable. It connects knowledge and experiences from across the world, bridging both language and cultural barriers. LandVoc is intended to be an unbranded linking tool between the different classification and tagging systems information providers in the land sector use. "It is important to highlight that LandVoc does not attempt to be a glossary of universally accepted concepts related to land governance that will never change: language lives and evolves, therefore so should vocabularies and thesauri. While the Land Portal facilitates the enrichments technically, we do not ‘own’ LandVoc nor are we the ones that decide whether a concept is warranted or a definition is inaccurate or not. LandVoc is, in the end, an unbranded product by and for the land governance community." say Laura Meggiolaro and Lisette Mey of the Land Portal Foundation.
Land concepts are contained within the AGROVOC hierarchy, but there is also a separate scheme within AGROVOC that contains only concepts related to land governance: “LandVoc”. This LandVoc scheme has its own hierarchy, independent of AGROVOC’s. This solution has allowed us to avoid duplication of efforts, while still making the thesauri relevant for specific expert communities. The AGROVOC team goes on to say that: “By introducing LandVoc as a specialized subvocabulary within AGROVOC, LandVoc benefits from AGROVOC infrastructure and the international network of editors. At the same time, LandVoc has enriched AGROVOC with a well-curated collection on land governance, and contributed their expertise in this key area. The Land Portal were pioneers in this regard. The recent addition of LandVoc content in Swahili was an exciting development.”
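The idea of one concept sitting in two schemes, each with its own hierarchy, can be sketched as follows. The scheme names, the concepts and every broader/narrower link below are invented placeholders and do not reproduce the actual AGROVOC or LandVoc hierarchies.

```python
# Each scheme maps a concept to its broader concept (None marks a top concept).
# All entries are illustrative placeholders.
SCHEMES = {
    "general": {
        "land": None,
        "land governance": "land",
        "land tenure": "land",
        "cereals": None,
        "maize": "cereals",
    },
    "land_only": {
        "land governance": None,
        "land tenure": "land governance",
    },
}


def path_to_top(concept, scheme):
    """Walk from a concept up to the top of the given scheme's hierarchy."""
    hierarchy = SCHEMES[scheme]
    if concept not in hierarchy:
        return []  # the concept is not part of this scheme
    path = [concept]
    while hierarchy[path[-1]] is not None:
        path.append(hierarchy[path[-1]])
    return path


print(path_to_top("land tenure", "general"))    # ['land tenure', 'land']
print(path_to_top("land tenure", "land_only"))  # ['land tenure', 'land governance']
print(path_to_top("maize", "land_only"))        # []
```

The concepts are stored once and shared, which is what avoids duplicated effort, while the specialized scheme can arrange them in the way that suits its own expert community.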
Building on this experience, AGROVOC is now exploring these options for other expert communities as well, such as fisheries and soil.
“Within the dynamic and available internet resources and communication between experts internationally, there is a need for a common information system on land terminologies. The efforts made by The Land Portal Foundation in developing and maintaining the thesaurus of land governance "LandVoc" will be minimizing the difference between translation and localization of lands' related vocabularies” says Mira Haddad, Senior Research Assistant of Spatial Analyses and Database Management at ICARDA.
We have seen how semantic technologies, and particularly the use of controlled vocabularies, can increase the discoverability of data and information considerably. AGROVOC has increased the visibility of agriculture data and information and serves an audience of over 1.8 million users per month. The Land Portal’s research showed that the land sector was far from reaching such a potential since no standards are being used to classify land data and information online.
The Land Portal saw this gap and worked with the AGROVOC team at FAO to increase the 20 land-related concepts in AGROVOC to 300 unique concepts, excluding the added translations and synonyms. This set of land-related concepts within AGROVOC, “LandVoc”, hopes to similarly increase the visibility of land data and information and improve the way we exchange land data across the world. More than that, it will serve as a reference for translations, and help to capture and understand the richness and complexity of the large variety of land governance terms.