The phenomenon of machine tags
The whole document in PDF form can be found here.
Tags in general are one of the most recognized Web 2.0 products. Going one step further, one can say that machine tags, as a part of the semantic web, should become one of the most recognized Web 3.0 products. Before we try to understand the uses of machine tags, we should first understand what tags and tagging mean, what kinds of tags exist and how internet users can exploit them.
There is no official definition of tags and tagging, but several characteristics apply to all tags. Tags are user-contributed (user-generated) descriptive strings, such as labels and keywords, that describe a piece of content. Those strings should be relevant and easily associated with the piece of content. By content we mean URLs, web pages, texts, images, videos, geographic maps, blog entries etc. Tags are not the same as keyword annotations. The difference is that tags are flat, disorganized, free-form strings made by users, whereas keyword annotations are usually part of a predefined vocabulary provided by authors, web systems (web sites, web directories, web platforms etc.) or librarians.
The fact that tags are made by humans according to their own understanding of the content is both an advantage and a disadvantage of tagging systems. It is an advantage in the sense that the user knows and understands the meaning of the content (data), and by adding tags he can more easily remember, retrieve, recognize, save, browse and search for content. The major disadvantage is that the same content can be tagged differently by different people. For example, images on Flickr could be tagged according to the place where they were taken (geo-tags) or according to their content. If we have an image of a mountain we can tag it with: “winter” (time of year when the image was taken), “Zlatibor” (place), “skiing” (activity shown in the image). But the same image could also be tagged with: “January” (winter month), “Obudojevica” (ski resort on Zlatibor), “skiing”. Another problem of tagging systems is that the “system” doesn’t understand the meaning of the tags. For example, the tag “java” can describe a programming language, coffee or an island; the tag “apple” can apply to both a computer company (Apple Inc.) and a fruit. In the case of individual tagging on a personal computer, those problems are not crucial, but in collaborative / shared tagging systems (like delicious.com, flickr.com, digg.com) they are critical.
In social bookmarking web sites (collaborative tagging communities), users can share tags with one another, retrieve tagged content online, and search, browse and filter tags. Examples of such communities are Delicious, Flickr and Digg. We can distinguish social bookmarking communities according to the type of content they are used to tag:
- Tagging for URL (for example: del.icio.us, stumbleupon.com)
- Tagging for photos (for example: flickr.com)
- Tagging for videos (for example: youtube.com)
- Tagging for news (digg.com, reddit.com, netscape.com)
- Tagging for books (librarything.com, openlibrary.org)
- Tagging for academic articles (citeulike.com)
- Tagging for retail products (amazon.com)
All of these collaborative tagging systems share the problems described above. Some of them try to resolve them by using machine tags.
The idea of machine tags follows the basic idea of the semantic web: to give a meaning to every tag, so that it can be understood and interpreted by machines. Machine tags keep the characteristics of “ordinary” tags, but also provide a variety of new possibilities.
Machine tags are an extension of “ordinary” tags: they are made by humans according to their understanding of the content and they are descriptive, but they are written in a specific format so that a machine can read them, understand them and perform a specific action according to them. They add extra semantic information about the tag and, indirectly, about the content. Machine tags are semi-automated (they must be added by humans, and then a machine can act on them) and they can be understood as a link between tags and keyword annotations (at the moment, machine tags are provided by the collaborative system as part of its API; users can add their own, but they will be parsed as regular, flat tags). The table below lists the characteristics of “ordinary” tags, machine tags and keyword annotations.
|Tags||Machine tags||Keyword annotations|
Table 1: Characteristics of the tags, machine tags and keyword annotations
Machine tags are also known as triple tags. The first examples of machine tags are geo-tags (specific identifiers of a geographical location) provided by Delicious and GeoBloggers.
The structure of “normal” tags, as defined above, is flat. They are usually single words, free-form strings given by users. Machine tags, by contrast, have a well-defined and simple structure. Note that the structure of machine tags can vary from one collaborative tagging system to another, since there is no standard for machine tags yet. In this work we focus on the structure of machine tags on Flickr.
Machine tags comprise three parts: a namespace, a predicate and a value.
Namespaces are used to distinguish between multiple meanings of the same term. Namespaces should be used when site-specific information is being encoded. The namespace defines a class or a facet that a tag belongs to ('geo', 'flickr', etc.). It describes “who is going to take care of the tag”.
The predicate is the type of value that is being defined. It is the name of a property within a namespace ('latitude', 'user', etc.) and describes “what the tag applies to”. The value is the specific value of the tag (“which one is this”). Not all machine tags need to have a value. The example provided by Flickr to explain machine tags is the “Upcoming.org” example. The starting assumption is that we have images from an event that is listed on upcoming.org. On Flickr, we can add the machine tag “upcoming:event=428084”, where “upcoming” is the namespace, “event” is the predicate and “428084” is the value. When this machine tag is added to the photo, it will be automatically shown on the upcoming.org “events” page. How? When the machine tag is added, a robot squirrel on the Flickr servers calls the robot squirrels running the Upcoming API and asks for the Upcoming event with ID 428084. The Upcoming robots answer: “event 428084 is named Flickr Turns 4”, and Flickr stores that name in its database. The next time a user loads that photo, the Upcoming icon with the name of the event is shown in the sidebar.
Image 1: structure of the machine tags
Another example is recording location information by entering latitude and longitude as geo:lat=12.345678 and geo:lon=12.345678.
The simple syntax rules of machine tags are given below:
- A "namespace":
Namespaces MUST begin with any character between a - z; remaining characters MAY be a - z, 0 - 9 and underbars. Namespaces are case-insensitive.
- A "predicate":
Predicates MUST begin with any character between a - z; remaining characters MAY be a - z, 0 - 9 and underbars. Predicates are case-insensitive.
- A "value":
Values MAY contain any characters that "plain vanilla" tags use. Values may also contain spaces but, like regular tags, they need to be wrapped in quotes.
- Namespaces and predicates are separated by a colon: ":"
- Predicates and values are separated by an equals symbol: "="
Like tags, there are no rules for machine tags beyond the syntax, described above, to specify the parts of a machine tag. For example, you could tag a photo with:
- flickr:id=2436387779 – for flickr photo id
- flora:tree=coniferous – user defined machine tag
- medium:paint=oil – user defined machine tags
- geo:quartier="plateaumontroyal" – geo tags
- geo:neighbourhood=geo:quartier – geo tags
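The syntax rules above can be checked mechanically. The following is a minimal Python sketch, not Flickr's own parser: the names `MachineTag` and `parse_machine_tag` are made up for illustration, and stripping the surrounding quotes from a value is an assumption about how quoted values should be stored.

```python
import re
from typing import NamedTuple, Optional

# namespace and predicate: a letter first, then letters, digits or underscores;
# a colon separates them, and "=" separates the predicate from the value.
_MACHINE_TAG = re.compile(
    r"^(?P<namespace>[a-z][a-z0-9_]*):(?P<predicate>[a-z][a-z0-9_]*)=(?P<value>.*)$",
    re.IGNORECASE | re.DOTALL,
)


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return the three parts of a machine tag, or None for an ordinary tag."""
    match = _MACHINE_TAG.match(raw.strip())
    if match is None:
        return None
    value = match.group("value")
    # Assumption: quotes only wrap values that contain spaces, so drop them.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    # Namespaces and predicates are case-insensitive, so normalise them.
    return MachineTag(
        match.group("namespace").lower(),
        match.group("predicate").lower(),
        value,
    )


if __name__ == "__main__":
    for tag in ["flickr:id=2436387779", 'geo:quartier="plateaumontroyal"', "skiing"]:
        print(tag, "->", parse_machine_tag(tag))
```

An ordinary tag such as "skiing" yields `None`, which mirrors the idea that anything not matching the syntax is treated as a regular, flat tag.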
The complete list of the namespaces and predicates that currently exist on Flickr can be found in Paul Mison’s machine tags browser application, which is described in the “Machine tags searching and browsing” section. Machine tags on Flickr are part of the API. Anyone can make machine tags, but in some cases they will be processed like normal, flat tags.
In the case of Flickr, “machine tags are added exactly the same as any other tag whether it is done through the website or the API. When the Flickr supercomputer processes your tags, we take a moment to check whether it is a machine tag.” In addition, machine tags are also queried through the API. How to add machine tags is described through examples in the “Examples” section, while searching and browsing of machine tags is covered in the “Machine tags search and browsing” section.
Besides normal tags that users add to pictures (called “raw” tags on Flickr), tags as they appear in URLs (“clean” tags on Flickr) and machine tags, Flickr introduced a new term: “machine tag extras”.
By “machine tag extras”, the developers of Flickr mean “the entire process of the machine tags as a kind of foreign key to access data stored on another website.” Take a look at the “upcoming.org” example again. Besides “upcoming:event=XXXX”, Flickr supports other web sites such as:
- Dopplr.com is a social travel web site that has launched a “Social Atlas” where users can recommend places to stay, eat and visit in the places they know. The structure of the machine tags for this web site is: “dopplr:(eat|stay|explore)=XXXX”
- OpenPlaques.org is a community-run web site set up to catalogue and document the blue plaques that hang across the UK to commemorate famous persons and events. The structure of the machine tags for this web site is: “openplaques:id=XXXX”
- OpenLibrary.org is an internet archive devoted to making a web page for “every book ever published”. The structure of the machine tags for this web site is: “openlibrary:id=XXXX”
- The Burning Man project is aimed at providing a digital space which encompasses the entire event and community, both in BRC and in archival form, long after the dust has settled. The structure of the machine tags for this web site is: “burningman:(camp|art|event)=XXXX”
- Last.fm is a music recommendation service. The structure of the machine tags for this web site is: “lastfm:event=XXXX”
The use of these machine tags is described in the examples below.
Each place on Dopplr.com can be tagged with 3 machine tags:
- dopplr:eat= - for tagging places with good food that you would like to recommend / remember
- dopplr:stay= - for tagging places that you would like to recommend / remember as good places to stay
- dopplr:visit= - for tagging places that should be visited in some city
The good point about Dopplr.com is that users don’t have to be familiar with the syntax and semantic rules of machine tags; they can just copy them from the Dopplr.com web site and paste them onto their photos on Flickr.
An example of the dopplr:stay= and dopplr:visit= machine tags is available here.
On the Dopplr.com web site, machine tags are used to connect different kinds of practical information about a tourist destination that might be relevant for users or (potential) visitors. Dopplr.com could be considered a new, extended recommendation service: besides simple ratings, it provides photos of the ranked item. If we take food as an example: instead of just providing textual feedback (or feedback in terms of ratings on a scale) about a restaurant and its “delicious” food, visitors also provide photos of the food. Based on the photos, other users (travelers, potential customers) can decide whether the restaurant is interesting for them, whether the portions are big enough etc. Accommodation can be ranked similarly: we don’t have to rate the cleanliness of the rooms, we can just connect photos with a particular hotel.
Since this web site is made only for the UK market, I took images from Flickr that have been tagged with the “openplaques:id=XXXX” machine tag. Besides id, the predicates that might be used are: context (describing famous houses), todo and test. If we take “openplaques:id=1372” on Flickr, we will see “additional information” at the bottom of the page, which links us to openplaques.org, where all the information about Thomas Lord is given. The machine tag can be seen under “tags / show machine tags”.
Image 2: Additional info on Flickr shows the openplaques.org web site, which provides additional information about Thomas Lord.
OpenPlaques.org represents the idea of how machine tags could be used in the process of gathering information and creating a database on any topic. If we consider that streets can change names over time, and for each of them we have an OpenPlaques.org ID, OpenPlaques.org could be used to store information about each name (person or event) and to analyse historical changes and influences in one country / region over time. Another interesting use of the OpenPlaques.org platform could be to see how “popular” or influential a person is. For example, if we have the same street names in several different cities, we could use a “link popularity” approach to define the popularity / influence of the person in some country / region.
The process of adding machine tags on openlibrary.org is also simple. To add a book, visitors need to register and fill in a form with basic information about the book (name, author, publishing date, publisher and, optionally, id). To add the machine tag one needs to add “openlibrary:id=XXXX”. OpenLibrary provides a virtual space for each cover, which contains: the title of the book, a description, different identifiers (openlibrary id, ISBN needed for linking with book sellers, WorldCat ID etc.), and information and links showing where we can buy / read / borrow the book.
Gathering books by their cover pages is an attempt to provide a virtual space for each book that has ever been published (note that one book can have several cover pages and publishers over time), but also to link all publishers, authors, languages and countries where the book has been published, and the critiques that exist for it. For example, some book editions are famous because of the critiques and overviews they contain, while others are interesting because of their illustrations. By using machine tags, OpenLibrary.org provides all information about a single book in one place, and with it users can access the book itself and the information related to it more easily.
Image 3: examples of images with the “openlibrary” namespace and the interface provided on OpenLibrary.org
In this case we can see two different goals of using machine tags: the user who provided “Markham's master-piece” wanted to keep the book in a web archive (together with its description and online forms of the book) and make it accessible to a broad audience, while in the case of “Delicious from Goa” it is obvious that machine tags are used (in)directly for promotional / commercial purposes (there is no description, other users cannot read it online, but links to booksellers are provided).
Adding images in the previously described examples is simple and anyone can do it: we all know which place we have visited, what the address is, who published a book that we like. But can anyone put astro tags on an image? Can amateurs know precisely which stars they see in the night sky? Astrometry.net introduced a new way to help amateurs and professionals recognize stars in the sky. It uses machine tags (the astro namespace) to label astronomy photos with their celestial subject and its location. To find the right coordinates of the stars and to correctly identify the sky, two basic predicates are needed:
- date and time: “astro:gmt=yyyy-mm-ddThh:mm”, where yyyy is the year, mm is the month, dd is the day, T separates date and time, and hh:mm represents the hour and minute (24-hour clock) when the picture was taken. For example, if the picture was taken on January 12, 2011 at 9:15 in the evening, we should tag it as astro:gmt=2011-01-12T21:15
- description of the main astronomical subject of the photo: “astro:subject=”. As the value of the tag we should add English words or letters in combination with numbers. The good point here is that if the user doesn’t know what is in the image, the Astrometry.net robot will return a list of all known astronomical objects in your picture. All you need to do is roll over your picture in Flickr to see what’s what and then create the appropriate tag.
Other tags that can be used are:
- “astro:pixelScale=” - describes how much of space each pixel in your photo shows
- “astro:RA=” - measures the right ascension of the centre of your photo. Right ascension (RA) is the space equivalent of Earth’s longitude.
- “astro:Dec=” - measures the declination of the centre of your photo. Declination is the space equivalent of Earth’s latitude, that is, how far north or south something is.
- “astro:name=” - name of the objects found in your photo.
- “astro:orientation=” - which way up your picture is. It is measured in degrees east of north.
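The two basic astro tags can be assembled from a timestamp and a subject name. The following Python sketch only illustrates the format described above: the helper names are hypothetical, naive datetimes are assumed to be in GMT already, and quoting a subject that contains spaces follows the general machine tag syntax rule rather than anything specific to Astrometry.net.

```python
from datetime import datetime, timezone


def astro_gmt_tag(taken_at: datetime) -> str:
    """Build the astro:gmt tag in the yyyy-mm-ddThh:mm form."""
    if taken_at.tzinfo is not None:
        # Timezone-aware values are converted to GMT/UTC first.
        taken_at = taken_at.astimezone(timezone.utc)
    return "astro:gmt=" + taken_at.strftime("%Y-%m-%dT%H:%M")


def astro_subject_tag(subject: str) -> str:
    """Build the astro:subject tag, quoting values that contain spaces."""
    subject = subject.strip()
    if " " in subject:
        return f'astro:subject="{subject}"'
    return f"astro:subject={subject}"


if __name__ == "__main__":
    # January 12, 2011 at 9:15 in the evening, already expressed in GMT.
    print(astro_gmt_tag(datetime(2011, 1, 12, 21, 15)))
    print(astro_subject_tag("M31"))
```

For the date used in the text, the first call prints astro:gmt=2011-01-12T21:15.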
The Astrometry.net robot works differently from the robots described in the “Upcoming.org” example. A detailed explanation of how the Astrometry.net robot works is given here. An example of an image that has been tagged with astro tags can be found on Flickr (image 4).
Image 4: Astro tags on Flickr
Several parties make astro tagging possible: the Astrometry.net project started at the University of Toronto, it uses Yahoo infrastructure to deliver and transform data, the Royal Observatory at Greenwich provides leadership and expertise, the US Naval Observatory catalog is used by the Astrometry.net robots etc. A list of resources that might be useful for further exploration of astro tags (and that have been used as resources for this example) is given here and here and here.
An interesting aspect of astro tags is that users with different levels of knowledge can tag their photos of the night sky like professionals. In this case, Flickr is used as a social hub: the community exploits photos and machine tags, but the robots and the information about stars (astro:subject) are provided by third parties. This is maybe the biggest potential of machine tags: to provide new insights, to make knowledge available to everyone and to connect professionals in the field with average users interested in the same topic.
There are several projects that use Flickr machine tags for different purposes:
- “Developing Academic Image Collection on Flickr”, made by a group at Lewis & Clark College in Portland. They are in the process of developing an educational collection of contemporary ceramics images using the photo sharing site Flickr as a back end. They are using Flickr machine tags, and the concept of Flickr as an application database layer. A link to the project is given here.
- Hacking Google Street View by Mikel Maron uses Flickr “upcoming” machine tags and GeoRSS from Upcoming.org. The full article can be found here.
- Utata uses Flickr machine tags in its projects. Whenever a project is published, they also publish the list of machine tags that should be used to tag the appropriate content (images). Utata is a collective of photographers, writers, and like-minded people who share a compelling interest in the arts. A link to their web site is given here.
Adding machine tags is exactly the same as adding “ordinary” tags, whether it is done through the web site (Flickr) or the API. If one wants to query machine tags, he should do it by using the API. There are 5 methods for browsing the hierarchies of machine tags that are added to photos on Flickr. The complete list of the machine tag methods can be found in the Flickr API documentation.
Existing methods for browsing machine tags are:
- flickr.machinetags.getNamespaces: this method returns a list of unique namespaces, optionally limited by a given predicate, in alphabetical order.
- flickr.machinetags.getPairs: this method returns a list of unique namespace and predicate pairs, optionally limited by predicate or namespace, in alphabetical order.
- flickr.machinetags.getPredicates: this method returns a list of unique predicates, optionally limited by a given namespace.
- flickr.machinetags.getRecentValues: this method fetches recently used (or created) machine tag values.
- flickr.machinetags.getValues: this method returns a list of unique values for a namespace and predicate.
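A query to one of these methods is an ordinary HTTP request. The Python sketch below only builds the request URL and does not contact Flickr. The helper name `machinetags_url` is hypothetical, the REST endpoint address is an assumption that does not come from this text, and "YOUR_API_KEY" is a placeholder for a real application key.

```python
from urllib.parse import urlencode

# Assumption: Flickr's REST endpoint; check the Flickr API documentation.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def machinetags_url(method: str, api_key: str, **filters: object) -> str:
    """Build the URL for a flickr.machinetags.* call.

    `filters` carries the optional limits such as namespace or predicate;
    entries set to None are left out of the query string.
    """
    params = {"method": f"flickr.machinetags.{method}", "api_key": api_key}
    params.update({k: v for k, v in filters.items() if v is not None})
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # All values used with the "geo" namespace and "latitude" predicate.
    print(machinetags_url("getValues", "YOUR_API_KEY",
                          namespace="geo", predicate="latitude"))
```

The resulting URL could be fetched with any HTTP client once a real key is supplied.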
For each method Flickr provides an API explorer so users can search and browse pictures according to namespaces / predicates / values. To do this, users need to have an API application key (they must be registered users). For example, if we want to know the values of the latitude on photos that are geo tagged (in this example I’m using http://www.flickr.com/photos/cci_media ) we have to use the flickr.machinetags.getValues method and the API Explorer. By adding “geo” as the namespace and “latitude” as the predicate (image 5) we get the following response (image 6):
Image 5: API explorer for the flickr.machinetags.getValues method
Image 6: Response to the query from image 5
Note that <rsp stat="ok"> confirms that the query succeeded, “24.54” is the value of the latitude and “usage” tells us roughly how popular the machine tag is.
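A response of this kind can be read with a few lines of Python. The sample XML below is an illustrative reconstruction based on the elements mentioned in the text (the `rsp` wrapper, the value “24.54” and the `usage` attribute); the exact layout and the usage count are assumptions, not a copy of the real response.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Illustrative sample only; attribute names other than stat and usage,
# and the usage count itself, are assumed.
SAMPLE_RESPONSE = """<rsp stat="ok">
  <values namespace="geo" predicate="latitude">
    <value usage="1">24.54</value>
  </values>
</rsp>"""


def read_values(xml_text: str) -> List[Tuple[str, int]]:
    """Return (value, usage) pairs from a getValues-style response."""
    root = ET.fromstring(xml_text)
    if root.get("stat") != "ok":
        raise ValueError("the API call did not succeed")
    return [
        (node.text or "", int(node.get("usage", "0")))
        for node in root.iter("value")
    ]


if __name__ == "__main__":
    for value, usage in read_values(SAMPLE_RESPONSE):
        print(f"latitude {value} is used on roughly {usage} photo(s)")
```

Checking `stat` first keeps error responses from being mistaken for an empty list of values.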
Another possibility is to use “A Flickr machine tag browser” made by Paul Mison, which can be found here. The advantage of Mison’s machine tag browser is that the user sees a list of namespaces, with the existing predicates and values for each namespace, and by selecting one namespace, one predicate and one value the user gets a list of images with the selected options. It can also be useful for getting to know the existing namespaces and their predicates on Flickr. The disadvantages are: no filtering options (the list is very long, so it would be good to have them), one cannot browse only namespaces / predicates / values, and the user doesn’t know whether and how accurate the lists are.
As shown in the examples, machine tags are very powerful. They are mainly used to connect resources from different places on the web. The Flickr examples show how much information can be gathered and processed for a single photo: geographical location, astronomical location, texts about the content of the image (book covers, information about famous persons, events, music etc.), commercial and non-commercial links etc. From the end user's point of view, the main advantage of machine tags is that users are not aware of using them: machine tags are invisible to end users. All the end user has to do is click on the provided links / icons to get more information about the resource (in the case of Flickr, photos).
Generally speaking machine tags can:
- improve getting new insights by connecting different resources or by allowing users to learn more about their work (as in the case of astro tags, where users can learn about their image)
- connect amateurs (average users interested in some topic) with professionals in the field in an indirect way: they do not communicate directly, but they exchange experiences and knowledge using different channels (texts, photos, videos that are tagged using machine tags)
- make knowledge available and easily accessible to everyone
- gather and analyze data from different perspectives
- be used to improve web search. According to the paper “Can social bookmarking improve web search?”, “popular query terms and tags overlap significantly (though tags and query terms are not correlated).” By using machine tags, our queries would be more precise; would that help search engines to retrieve more relevant and precise results?
Similarly to Flickr, machine tags could be applied to other social bookmarking communities and collaborative tagging systems. For example, the concept of connecting photos from Flickr with the appropriate “Upcoming.org” events (or with the “visit” tag on Dopplr) could also be applied to YouTube videos. This could be particularly interesting for music events. The same holds for news. Take, for example, news about a hurricane published on digg.com. While tagging the news, one could connect those tags with Flickr photos and the Red Cross and make the news more accurate. In the case of texts and academic articles, machine tags could be a way of representing resources. For example, for each resource that has been considered during the writing of the article (quoted or not), one can use a particular machine tag and collect all virtual resources in one place.
Here are the web references that have been used during the exploration of machine tags. Some of them are directly quoted (indicated in the work by external links):
- General data:
- Stanford University Research Lab
- Flickr links
- machine tag extras - open library
- astro tags
- burning man
- machine tag browser