mala::home Davide “+mala” Eynard’s website

24 Sep 2012

New TR: Multimodal diffusion geometry by joint diagonalization of Laplacians

Hi all,

this paper is something I am particularly happy to share, as it is the first report related to my new research theme (wow, this reminds me that I should update my research page!). The coolest aspect of this topic is that, despite looking different from my previous work, it actually has a lot of points in common with it.

As some of you may know, much of my previous work relied heavily on different implementations of the concept of similarity (e.g. similarity between tags, between tourism destinations, and so on). This concept has many interpretations, depending on how it is translated into an actual distance for automatic calculation (this is what typically happens in practice, no matter how "semantic" your interpretation is supposed to be).

One of the main problems is that, in a rich and social ecosystem like the Web, there are often several different ways to define or measure similarity between entities. For instance, two images could be considered similar according to some visual descriptors (e.g. SIFT, or color histograms), to the tags associated with them (e.g. "lake", "holiday", "bw"), to some descriptive text (e.g. a Wikipedia page describing what is depicted), to metadata (e.g. author, camera lens, etc.), and so on. Moreover, people might not agree on what is similar to what, as everyone has their own subjective way of categorizing stuff. The result is that often there is no single way to relate similar entities. This is sometimes a limitation (how can we say that our method is the correct one?) but also an advantage: for instance, when entities need to be disambiguated it is useful to have different ways of describing/classifying them. This is, I believe, an important step towards (more or less) automatically understanding the semantics of data.

The concept I like most behind this work is that there are indeed ways to exploit these different measures of similarity and (pardon me if I oversimplify) find some kind of average measure that takes all of them into account. This makes it possible, for instance, to tell apart different senses of the same word as it is used in dissimilar contexts, or photos that share the same graphical features but are assigned different tags. Some (synthetic and real-data) examples are provided, and finally some friends of mine will understand why I have spent weeks talking about swimming tigers ;-). The paper abstract follows:

"We construct an extension of diffusion geometry to multiple modalities through joint approximate diagonalization of Laplacian matrices. This naturally extends classical data analysis tools based on spectral geometry, such as diffusion maps and spectral clustering. We provide several synthetic and real examples of manifold learning, retrieval, and clustering demonstrating that the joint diffusion geometry frequently better captures the inherent structure of multi-modal data. We also show that many previous attempts to construct multimodal spectral clustering can be seen as particular cases of joint approximate diagonalization of the Laplacians."

… and the full text is available on arXiv. Enjoy, and remember that --especially in this case, as this is mostly new stuff for me-- comments are more than welcome :-)
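
To make the idea of a common basis a bit more concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm from the report: instead of a proper joint approximate diagonalization, it simply takes the eigenvectors of the averaged Laplacian as a shared basis, and then measures how far each modality's Laplacian is from being diagonal in that basis. The toy weight matrices and all function names are made up for illustration.

```python
import numpy as np


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def off_diagonal_energy(U, L):
    """Sum of squared off-diagonal entries of U^T L U (0 means U diagonalizes L)."""
    M = U.T @ L @ U
    off = M - np.diag(np.diag(M))
    return float(np.sum(off ** 2))


def shared_basis(laplacians):
    """Crude stand-in for a joint basis: eigenvectors of the averaged Laplacian.

    Assumption: this is only the simplest possible choice, not a real
    joint approximate diagonalization solver.
    """
    L_avg = sum(laplacians) / len(laplacians)
    _, U = np.linalg.eigh(L_avg)
    return U


# Hypothetical toy data: four items described by two modalities.
# Both modalities agree on the groups {0, 1} and {2, 3}, but each one
# contains a different weak "spurious" link.
W_visual = np.array([[0.0, 1.0, 0.0, 0.0],
                     [1.0, 0.0, 0.1, 0.0],
                     [0.0, 0.1, 0.0, 1.0],
                     [0.0, 0.0, 1.0, 0.0]])
W_tags = np.array([[0.0, 1.0, 0.0, 0.1],
                   [1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0],
                   [0.1, 0.0, 1.0, 0.0]])

L_visual = laplacian(W_visual)
L_tags = laplacian(W_tags)
U = shared_basis([L_visual, L_tags])

for name, L in [("visual", L_visual), ("tags", L_tags)]:
    print(name, "off-diagonal energy in the shared basis:", off_diagonal_energy(U, L))
```

The smaller the off-diagonal energies, the closer the shared basis is to diagonalizing both Laplacians at once; a real joint diagonalization method would search for the basis that minimizes their sum.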

7 Sep 2012

New paper: Exploiting tag similarities to discover synonyms and homonyms in folksonomies

[This is post number 5 of the "2012 publications" series. Read here if you want to know more about this]

I have posted a new publication in the Research page:

Davide Eynard, Luca Mazzola, and Antonina Dattolo. Exploiting tag similarities to discover synonyms and homonyms in folksonomies.

"Tag-based systems are widely available thanks to their intrinsic advantages, such as self-organization, currency, and ease of use. Although they represent a precious source of semantic metadata, their utility is still limited. The inherent lexical ambiguities of tags strongly affect the extraction of structured knowledge and the quality of tag-based recommendation systems. In this paper, we propose a methodology for the analysis of tag-based systems, addressing tag synonymy and homonymy at the same time in a holistic approach: in more detail, we exploit a tripartite graph to reduce the problem of synonyms and homonyms; we apply a customized version of Tag Context Similarity to detect them, overcoming the limitations of current similarity metrics; finally, we propose the application of an overlapping clustering algorithm to detect contexts and homonymies, then evaluate its performances, and introduce a methodology for the interpretation of its results."

The publisher (John Wiley & Sons, Ltd.) requested that the paper not be made directly available online. However, I have "the personal right to send or transmit individual copies of this PDF to colleagues upon their specific request provided no fee is charged, and further-provided that there is no systematic distribution of the Contribution, e.g. posting on a listserv, website or automated delivery." So, just drop me an email if you want to read it and I will send it to you (in a non-systematic way ;-))

9 Jul 2012

New paper: Harvesting User Generated Picture Metadata To Understand Destination Similarity

[This is post number 4 of the "2012 publications" series. Read here if you want to know more about this]

I have posted a new publication in the Research page:

Alessandro Inversini, Davide Eynard. Harvesting User Generated Picture Metadata To Understand Destination Similarity.

This is an extension of a previous work for the Journal of Information Technology & Tourism, providing additional and updated information gathered with new user surveys.

"Pictures about tourism destinations are part of the contents shared online through social media by travelers. User-generated pictures shared in social networks carry additional information such as geotags and user descriptions of places that can be used to identify groups of similar destinations. This article investigates the possibility of defining destination similarities relying on implicit information already shared on the Web. Additionally, the possibility of recommending one city on the basis of a given set of pictures is explored. Flickr.com was used as a case study as it represents the most popular picture sharing website. The results indicate that it is possible to group similar destinations according to picture-related information, and recommending destinations without requiring users' profiles or sets of explicit preferences."

 

18 Jun 2012

New (old) paper: Finding similar destinations with Flickr Geotags

[This is post number 3 of the "2012 publications" series. Read here if you want to know more about this]

I have posted a new publication in the Research page:

Davide Eynard, Alessandro Inversini, and Leonardo Gentile (2012). Finding similar destinations with Flickr Geotags.

"The amount of geo-referenced information on the Web is increasing thanks to the large availability of location-aware mobile devices and map interfaces. In particular, in photo collections like Flickr the coexistence of geographic metadata and text-based annotations (tags) can be exploited to infer new, useful information. This paper introduces a novel method to generate place profiles as vectors of user-provided tags from Flickr geo-referenced photos. These profiles can then be used to measure place similarity in terms of the distance between their matching vectors. A Web-based prototype has been implemented and used to analyze two distinct Flickr datasets, related to a chosen set of top tourism destinations. The system has been evaluated by real users with an online survey. Results show that our method is suitable to define similar destinations. Moreover, according to users, enriching place description with information from user activities provided better similarities".
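
For readers who want to see what "place profiles as vectors of tags" can look like in practice, here is a minimal sketch in Python. The abstract does not say which tag weighting or which distance the paper uses, so raw tag counts and cosine distance are assumptions of mine, and the photo data below is invented.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (place, tags attached to one geo-referenced photo).
photos = [
    ("Venice", ["canal", "gondola", "bridge"]),
    ("Venice", ["canal", "church"]),
    ("Amsterdam", ["canal", "bridge", "bike"]),
    ("Zermatt", ["mountain", "snow", "ski"]),
]


def place_profiles(photos):
    """Aggregate the tags of all photos taken in a place into one count vector."""
    profiles = {}
    for place, tags in photos:
        profiles.setdefault(place, Counter()).update(tags)
    return profiles


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


profiles = place_profiles(photos)
for other in ("Amsterdam", "Zermatt"):
    print("Venice vs", other, cosine_distance(profiles["Venice"], profiles[other]))
```

In this toy setup, places that share tags (Venice and Amsterdam) end up closer to each other than places with no tags in common (Venice and Zermatt).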

Filed under: papers, research

11 Jun 2012

New (old) paper: A Modular Framework to Learn Seed Ontologies from Text

[This is post number 2 of the "2012 publications" series. Read here if you want to know more about this]

I have posted a new publication in the Research page:

Davide Eynard, Matteo Matteucci, and Fabio Marfia (2012). A Modular Framework to Learn Seed Ontologies from Text.

"Ontologies are the basic block of modern knowledge-based systems; however the effort and expertise required to develop them are often preventing their widespread adoption. In this chapter we present a tool for the automatic discovery of basic ontologies –we call them seed ontologies– starting from a corpus of documents related to a specific domain of knowledge. These seed ontologies are not meant for direct use, but they can be used to bootstrap the knowledge acquisition process by providing a selection of relevant terms and fundamental relationships. The tool is modular and it allows the integration of different methods/strategies in the indexing of the corpus, selection of relevant terms, discovery of hierarchies and other relationships among terms. Like any induction process, also ontology learning from text is prone to errors, so we do not expect from our tool a 100% correct ontology; according to our evaluation the result is more close to 80%, but this should be enough for a domain expert to complete the work with limited effort and in a short time".

This work is part of the book "Semi-Automatic Ontology Development: Processes and Resources" edited by Maria Teresa Pazienza and Armando Stellato.

5 Jun 2012

Paper review: A Tutorial on Spectral Clustering

Lately I have been experimenting with spectral clustering. I find it a very interesting approach (well, family of approaches) to clustering, and I think the paper that most helped me get a grasp of it was "A Tutorial on Spectral Clustering", by Ulrike von Luxburg. In case you are curious about it, here is its abstract:

"In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. On the first glance spectral clustering appears slightly mysterious, and it is not obvious to see why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed".

Of course this is just a beginning, and if you are interested in the topic I would also suggest reading at least a couple more papers, such as Ng, Jordan, and Weiss: "On spectral clustering: analysis and an algorithm" or "Laplacian eigenmaps for dimensionality reduction and data representation" by Belkin and Niyogi.
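
As a taste of the recipe the tutorial walks through, here is a minimal sketch of unnormalized spectral clustering for the two-cluster case, in Python with NumPy: build the graph Laplacian L = D - W, take the eigenvector of the second-smallest eigenvalue (the Fiedler vector), and split the nodes by its sign. The general algorithm runs k-means on the first k eigenvectors; the sign split is a simplification I am assuming here, and the toy graph and function name are made up.

```python
import numpy as np


def spectral_bipartition(W):
    """Split a weighted graph in two using the sign of the Fiedler vector."""
    W = np.asarray(W, dtype=float)
    degree = np.diag(W.sum(axis=1))
    L = degree - W  # unnormalized graph Laplacian
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigenvectors = np.linalg.eigh(L)
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


# Hypothetical toy graph: two triangles joined by one weak edge (2-3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
W = np.zeros((6, 6))
for i, j, weight in edges:
    W[i, j] = W[j, i] = weight

labels = spectral_bipartition(W)
print(labels)
```

Which triangle gets label 0 and which gets label 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.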

While reading von Luxburg's paper, I took some notes that might be handy if you want to have a brief summary of the main concepts or explain them to somebody else. I actually reused them for a class at PoliMI, empirically demonstrating that in research nothing is useless ;-) Here are the slides I made, enjoy!



Filed under: papers, research

4 Jun 2012

New (old) paper: Destinations Similarity Based on User Generated Pictures’ Tags

[This is post number 1 of the "2012 publications" series. Read here if you want to know more about this]

I have posted a new publication in the Research page:

Alessandro Inversini, Davide Eynard, Leonardo Gentile, and Elena Marchiori (2012). Destinations Similarity Based on User Generated Pictures' Tags.

"Pictures about tourism destinations are part of the contents shared online through social media by travelers. Additional picture information, such as geo-tags and user description of a place, can be used to create groups of similar destinations. This paper investigates the possibility of defining destination similarities based on implicit information already shared on the Web. Flickr.com was used as a case study as it represents the most popular picture sharing website. Results show the possibility to group similar destinations based on visual components, represented by the contents of the pictures, and the related tag descriptions".

Filed under: papers, research

4 Jun 2012

2012 publications series

I have recently attended the "Promoting your academic profile on the Web" workshop at USI. It was a good workshop (thanks Lorenzo and Nadzeya!) and allowed me to get an idea about how well I am communicating who I am and what I do to the world. As you might know already, I think this is pretty important if your aim is to make your research (or in general your work) open and accessible to everyone.

The introspection that followed the workshop made me realize how often I put doing stuff before communicating it. I mean, of course I do that when I publish a paper, but too often I do not let others know that the paper I wrote exists! For instance, I realized that as of today I had not updated my publication list with any of the work published in 2012... D'oh :-)

For this reason I have decided not only to update that list, but also to write one post for each new paper, making the abstract and additional material available. In this post, instead, I will keep an updated list of links to this year's publications, so you can just watch this page and see when something new has been posted.

2012 publication series:

Filed under: blog, papers, research

17 Feb 2012

Internet Technology (2011-2012) assignments are online

After one year, here is another update regarding my Internet Technology class (see here for last year's update). Unfortunately it will also be the last one, at least for the class as it is now, because the master's program I was teaching this class for has been discontinued :-/. But hey, there are many ways in which knowledge can be shared and that master's program was only one, right?

So here they are, the new papers written by my dear students! This year fewer have been shared, but I think their quality kind of compensates for the quantity. So do not worry if you cannot access all of them, and enjoy the fact that the ones you can read are willingly shared by students with a CC BY-NC-SA license :-) If you are interested in any of the topics let me know and I might try to put you in contact with the authors.

Filed under: papers, teaching, web

14 Feb 2012

New year, new you

... starting from the blog theme. Of course I have just downloaded a ready-made one, otherwise with my taste you would have probably gotten something painful for your eyes ;-)

New year's resolutions? Plenty. But after last year's ones, my main resolution is no promises :-). And no creativity-killer posts: I'll try to stay far away from those topics I know will stop me from writing instead of encouraging me. I'll try to make this fun and useful, first of all for me. And if you find something useful here too, well, good for you ;-)

First post of the year, first after a long while... And to leave you with some more food for thought than you would get just by reading news about my WordPress themes, here you are:

Alon, Uri: "How To Choose a Good Scientific Problem". Molecular Cell, volume 35, issue 6, pp. 726-728. doi:10.1016/j.molcel.2009.09.013

Here's the abstract:

"Choosing good problems is essential for being a good scientist. But what is a good problem, and how do you choose one? The subject is not usually discussed explicitly within our profession. Scientists are expected to be smart enough to figure it out on their own and through the observation of their teachers. This lack of explicit discussion leaves a vacuum that can lead to approaches such as choosing problems that can give results that merit publication in valued journals, resulting in a job and tenure."

I found the paper very inspiring and I agreed with most of it. Here are a few sentences I particularly liked:

  • "A lab is a nurturing environment that aims to maximize the potential of students as scientists and as human beings."
  • "The projects that a particular researcher finds interesting are an expression of a personal filter, a way of perceiving the world. This filter is associated with a set of values: the beliefs of what is good, beautiful, and true versus what is bad, ugly, and false."
  • "... when one can achieve self-expression in science, work becomes revitalizing, self-driven, and laden with personal meaning."

What do you think about it? I think that this self-expression, this possibility of projecting my personal values into my work, is one of the main reasons I have chosen to do it. Of course, this is also constraining me somehow: what happens when I work with others? What if there is a clash of values between me and my collaborators? Finally, one last big question arises: how applicable is this to other jobs? Is there a chance for everyone to achieve this self-expression, or only for some? What about those who can't?

Ok, enough food for today ;-) One last link, which you might find interesting if you liked this paper too: Uri Alon Lab homepage, where you can find more materials for nurturing scientists.

Take care, have a great 2012!

Filed under: blog, papers, research