Contact
Département d'informatique
Université du Québec à Montréal
CP 8888, Succ. Centre-ville
Montréal (Québec) H3C 3P8
Phone: 514-987-3000, #5516
Office: PK-4525
Email: blondin_masse[dot]alexandre
[at]uqam[dot]ca
About
I completed my Ph.D. in mathematics and computer science under the supervision of Professors Srecko Brlek, from Université du Québec à Montréal, in Canada, and Laurent Vuillon, from Université de Savoie, in France. Since August 1st, 2014, I have been an assistant professor at Université du Québec à Montréal, in Canada.
Links
- My profile on Google Scholar
- My GitLab repositories
- My Bitbucket repositories
- My GitHub repositories
Graphs
I am currently working on a project in which we study the structure induced by dictionary definitions. This work is done in collaboration with Guillaume Chicoisne, Zhe Fu, Philippe Galinier, Yassine Gargouri, Stevan Harnad, Mélodie Lapointe, Martin Lavoie, Marcos Lopes, Mélanie Lord, Odile Marcotte, Olivier Picard, Fatiha Sadat and Philippe Vincent-Lamarre.
Dictionaries
We use a graph-theoretical approach to represent definitional relations. For instance, consider the dictionary D whose definitions are given in the table below:
| Word | Definition | Word | Definition |
|------|------------|------|------------|
| apple | red fruit | good | not bad |
| bad | not good | light | not dark |
| banana | yellow fruit | no | not |
| color | light dark | not | no |
| dark | not light | red | dark color |
| edible | good | tomato | red fruit |
| fruit | edible | yellow | light color |
D may be represented by a directed graph as in the following figure:
A small dictionary represented as a directed graph
More precisely, there is one vertex per word and there is an arc from vertex u to vertex v if the word u appears in the definition of the word v.
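This construction is easy to carry out in code. The sketch below, in Python, builds the arc set of the toy dictionary D; the names `toy` and `definition_arcs` are illustrative choices, not taken from the project's actual code:

```python
# Toy dictionary from the table above: each word maps to the
# list of words appearing in its definition.
toy = {w: d.split() for w, d in {
    "apple": "red fruit",  "bad": "not good",   "banana": "yellow fruit",
    "color": "light dark", "dark": "not light", "edible": "good",
    "fruit": "edible",     "good": "not bad",   "light": "not dark",
    "no": "not",           "not": "no",         "red": "dark color",
    "tomato": "red fruit", "yellow": "light color",
}.items()}

def definition_arcs(dictionary):
    """Arcs (u, v): the word u appears in the definition of the word v."""
    return [(u, v) for v, definition in dictionary.items() for u in definition]

arcs = definition_arcs(toy)
```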
Not all words have the same importance and frequency in language. When restricting our attention to dictionaries, we notice that some words appear in definitions while others do not. This is the case of the words apple, banana and tomato in the figure above: they do not have any successor. Therefore, if they are removed, it is still possible to recover them by looking at their definitions. This process may be repeated, so that the words fruit, red and yellow are removed, followed by the words color and edible, until no more words can be removed. The unique subdictionary resulting from this operation is called the grounding kernel (or simply kernel) of the dictionary (see next figure).
Grounding kernel of the toy dictionary
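The iterative removal described above can be sketched in a few lines of Python (a minimal illustration; `grounding_kernel` and `toy` are hypothetical names, not the project's code):

```python
# Toy dictionary from the table above (word -> words of its definition).
toy = {w: d.split() for w, d in {
    "apple": "red fruit",  "bad": "not good",   "banana": "yellow fruit",
    "color": "light dark", "dark": "not light", "edible": "good",
    "fruit": "edible",     "good": "not bad",   "light": "not dark",
    "no": "not",           "not": "no",         "red": "dark color",
    "tomato": "red fruit", "yellow": "light color",
}.items()}

def grounding_kernel(dictionary):
    """Repeatedly delete words that appear in no remaining definition
    (i.e. vertices without successors); the words left form the kernel."""
    kernel = dict(dictionary)
    while True:
        used = {u for definition in kernel.values() for u in definition}
        removable = [w for w in kernel if w not in used]
        if not removable:
            return kernel
        for w in removable:
            del kernel[w]
```

On the toy dictionary this removes apple, banana and tomato first, then fruit, red and yellow, then color and edible, exactly as in the text.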
The grounding kernel K has several interesting properties:
- Every word of the whole dictionary may be reached, through definitions alone, starting from K;
- Every cycle of the dictionary belongs to K;
- It is unique for each dictionary;
- Empirically, natural language dictionaries reduce to about 10% of their initial size when computing K.
The reader interested in an introduction to these concepts is referred to [1].
Minimum grounding sets
Although the grounding kernel has remarkable properties, it is not minimal: it suffices to notice that some other words may be removed without compromising the understanding of the remaining ones. Continuing with the toy dictionary, one may verify that three words are enough. Indeed, one can choose one word among bad and good, one among no and not and, finally, one among dark and light, which gives eight possibilities. For instance, imagine a person who knows only the words bad, not and light. This person may read the definition of good and learn it (since they know all the words in that definition), and so on, until every word in the dictionary is learned.
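This learning process can be simulated directly (an illustrative Python sketch; `learn_from` is a hypothetical name):

```python
# Toy dictionary from the table above (word -> words of its definition).
toy = {w: d.split() for w, d in {
    "apple": "red fruit",  "bad": "not good",   "banana": "yellow fruit",
    "color": "light dark", "dark": "not light", "edible": "good",
    "fruit": "edible",     "good": "not bad",   "light": "not dark",
    "no": "not",           "not": "no",         "red": "dark color",
    "tomato": "red fruit", "yellow": "light color",
}.items()}

def learn_from(starting_words, dictionary):
    """Starting from the given words, repeatedly learn any word whose
    definition uses only known words; return the final vocabulary."""
    known = set(starting_words)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in known and all(u in known for u in definition):
                known.add(word)
                changed = True
    return known
```

Starting from {bad, not, light}, the whole toy dictionary is eventually learned; starting from {no, not} alone, nothing new can be learned, since every other definition uses an unknown word.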
Therefore, it is of great interest to find such subsets of words, as small as possible and conveniently chosen. We shall call these grounding sets. Clearly, the grounding kernel is a grounding set but, as noted earlier, it is not minimal. We are thus interested in finding minimum grounding sets, and we have shown that this problem is equivalent to the minimum feedback vertex set problem for directed graphs [1], which is stated as follows.
Problem 1 Let G be a directed graph. The minimum feedback vertex set problem is the problem of finding a subset U of the vertices of G such that
- U is of minimum size;
- Removing the vertices of U from G, together with all their incident arcs, leaves G acyclic, i.e. with no cycle.
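For toy-sized graphs, Problem 1 can be solved by brute force, which is enough to check the earlier claim that three words suffice for the toy dictionary. The sketch below is illustrative Python, not the algorithm used in the project; `is_acyclic` and `minimum_fvs` are hypothetical names:

```python
from itertools import combinations

# Toy dictionary from the table above and its definition arcs (u, v):
# u appears in the definition of v.
toy = {w: d.split() for w, d in {
    "apple": "red fruit",  "bad": "not good",   "banana": "yellow fruit",
    "color": "light dark", "dark": "not light", "edible": "good",
    "fruit": "edible",     "good": "not bad",   "light": "not dark",
    "no": "not",           "not": "no",         "red": "dark color",
    "tomato": "red fruit", "yellow": "light color",
}.items()}
arcs = [(u, v) for v, d in toy.items() for u in d]

def is_acyclic(vertices, arcs):
    """Kahn-style peeling: the graph is acyclic iff it can be emptied by
    repeatedly removing vertices of in-degree 0."""
    remaining = set(vertices)
    while remaining:
        indegree = {v: 0 for v in remaining}
        for u, v in arcs:
            if u in remaining and v in remaining:
                indegree[v] += 1
        sources = [v for v in remaining if indegree[v] == 0]
        if not sources:
            return False  # a cycle survives
        remaining.difference_update(sources)
    return True

def minimum_fvs(vertices, arcs):
    """Exhaustive search over subsets of increasing size (feasible only
    for very small graphs; the general problem is intractable)."""
    vertices = list(vertices)
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            if is_acyclic(set(vertices) - set(subset), arcs):
                return set(subset)
```

On the toy dictionary the answer has size 3: one vertex out of each of the cycles bad/good, dark/light and no/not.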
Unfortunately, this problem is NP-hard, i.e. it is very unlikely that anyone will ever find a polynomial-time algorithm computing a minimum feedback vertex set of an arbitrary graph. On the other hand, the dictionary-like graphs we are studying have a very particular structure and a reasonable size, so exact solutions remain within reach in practice.
In August 2011, we were able to compute a minimum grounding set for two small English dictionaries. We are still working on an algorithm to compute such a set for WordNet and the Merriam-Webster, both of which are considerably larger. Our strategy combines the graph contraction operations presented in Lin and Jou's paper [2] with a linear program adapted to the minimum feedback vertex set problem.
Hierarchies
In parallel with the search for minimum grounding sets, we were interested in classifying words according to their importance in the symbol grounding process. More precisely, let G be some graph. We construct a new graph H as follows:
- Each strongly connected component of G is a vertex in H;
- Let C and D be two distinct strongly connected components of G. Then there is an arc (C,D) in H if there exist a vertex u in C and a vertex v in D such that (u,v) is an arc of G.
It is easy to show that the resulting graph is acyclic. In other words, H is the quotient graph obtained with respect to the equivalence relation "is in the same strongly connected component" on vertices.
Therefore, it is possible to associate to every strongly connected component (and thus to every word) an integer indicating its level in the order induced by the acyclic quotient graph. More precisely, level 0 words are those belonging to the core of the graph; level 1 words are those whose predecessors are all of level 0; level 2 words are those whose predecessors are of levels 0 or 1; and so on.
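The level computation can be sketched on the toy dictionary. This is illustrative Python, not the project's code, and it takes level 0 to be the source components of the quotient graph, which is one possible reading of the "core" above; `reachable` and `scc_levels` are hypothetical names:

```python
# Toy dictionary from the table above (word -> words of its definition).
toy = {w: d.split() for w, d in {
    "apple": "red fruit",  "bad": "not good",   "banana": "yellow fruit",
    "color": "light dark", "dark": "not light", "edible": "good",
    "fruit": "edible",     "good": "not bad",   "light": "not dark",
    "no": "not",           "not": "no",         "red": "dark color",
    "tomato": "red fruit", "yellow": "light color",
}.items()}

def reachable(start, succ):
    """Set of vertices reachable from start by following arcs."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def scc_levels(dictionary):
    """Level of each word: 0 for source components of the quotient graph,
    else 1 + the maximum level over its predecessor components."""
    succ = {}  # arcs u -> v: u appears in the definition of v
    for v, definition in dictionary.items():
        for u in definition:
            succ.setdefault(u, set()).add(v)
    reach = {w: reachable(w, succ) for w in dictionary}
    # Two words are in the same component iff they reach each other.
    comp = {w: frozenset(x for x in dictionary
                         if w in reach[x] and x in reach[w])
            for w in dictionary}
    memo = {}
    def level(c):
        if c not in memo:
            preds = {comp[u] for v in c for u in dictionary[v]
                     if comp[u] != c}
            memo[c] = 1 + max(map(level, preds)) if preds else 0
        return memo[c]
    return {w: level(comp[w]) for w in dictionary}
```

On the toy dictionary, {no, not} sits at level 0, the pairs {bad, good} and {dark, light} at level 1, and the fruit names end up at the top of the hierarchy.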
When inspecting various psycholinguistic variables on the words and their evolution along the induced hierarchy levels, we noticed that
- As the level of the words increases, the words become less imageable and more abstract;
- Level 0 words are more frequent and are learned earlier, while words of levels 1, 2, 3, ... are less frequent and learned later; however, there is no significant difference among words of levels 1, 2, 3, ...
The image below summarizes these observations:
Evolution of psycholinguistic variables according to the hierarchy on words
See [3] for more details about these statistical results.
In the next year, we wish to extend those results to other dictionaries in order to check whether they are universal or not. There are also challenging problems about definition disambiguation and semantic relationships. Please feel free to contact me if you are interested in the matter.
References
| [1] | A. Blondin Massé, G. Chicoisne, Y. Gargouri, S. Harnad, O. Marcotte and O. Picard. How is meaning grounded in dictionary definitions?, (TextGraphs-3) 3rd Workshop on Graph-Based Algorithms for Natural Language Processing (August 24th, 2008, Manchester, United Kingdom), 8p. |
| [2] | H.-M. Lin and J.-Y. Jou, On computing the minimum feedback vertex set of a directed graph by contraction operations, IEEE Trans. on CAD of Integrated Circuits and Systems 19(3): 295-307, 2000 |
| [3] | O. Picard, A. Blondin Massé, S. Harnad, O. Marcotte, G. Chicoisne and Y. Gargouri. Hierarchies in Dictionary Definition Space, (NIPS-Graphs 2009) Network Analysis and Learning with Graphs Workshop (December 11th, 2009, Whistler, Canada), 9p. |