What has been done in the morpho-syntactic profile? Gil Francopoulo, INRIA, 17 March 2005. I tried to initiate the DCR.
A huge list of datcats has been collected by Sébastien Guérin from the old ISO 12620, Multext and Eagles.
The first problem I faced was that various naming conventions were used for identifiers (upper case with spaces, camel case …). => Use camel case, that is:
1. The first letter is lower case.
2. When the identifier is a compound, each following word starts with an upper-case letter, without any space (e.g. grammaticalGender).
3. All other letters are lower case.
After that, I found some duplicates, which I deleted.
The list was still very large, and it is very difficult to figure out what the values are. So I defined 2 folders in order to gather datcats: /partOfSpeech/ and /morphologicalFeature/. Let’s note that these folders are just facilities inside the software.
I have done the job for the datcats used in English and French. The rest of the datcats are still there, as a huge flat list.
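The naming rule and the duplicate removal described above can be sketched as follows. This is only a minimal illustration, not the procedure actually used for the DCR: the function names to_lower_camel and deduplicate are hypothetical, and the sketch assumes that identifiers are separated by spaces, underscores, hyphens or an existing lower-to-upper case boundary.

```python
import re


def to_lower_camel(identifier: str) -> str:
    """Normalise an identifier to lower camel case (hypothetical helper)."""
    # Insert a space at every lower-to-upper boundary so that identifiers
    # already written in camel case are split into their words too.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier)
    # Split on whitespace, underscores and hyphens; drop empty pieces.
    words = [w for w in re.split(r"[\s_\-]+", spaced) if w]
    if not words:
        return ""
    # Rule 1: the first word is entirely lower case.
    head = words[0].lower()
    # Rules 2 and 3: each following word starts with an upper-case letter,
    # every other letter is lower case, and no space is kept.
    tail = "".join(w[:1].upper() + w[1:].lower() for w in words[1:])
    return head + tail


def deduplicate(identifiers):
    """Keep the first occurrence of each identifier after normalisation."""
    seen = set()
    result = []
    for name in identifiers:
        canonical = to_lower_camel(name)
        if canonical and canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result
```

With this sketch, "Grammatical Gender" and "grammaticalGender" collapse to a single identifier, grammaticalGender. Note that it only detects duplicates that differ in spelling convention; duplicates that differ in wording would still need to be found by hand.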
In the /partOfSpeech/ folder:
1. Complex datcats: /partOfSpeech/, /noun/ etc.
2. Simple datcats: /commonNoun/ etc.
In the /morphologicalFeature/ folder:
1. Complex datcats: /morphologicalFeature/, /grammaticalGender/ etc.
2. Simple datcats: /feminine/ etc.
No link between POS & MF: nowhere is it specified that a /noun/ can (or must) be decorated by /grammaticalGender/ and /grammaticalNumber/ in French.
It is not an ontology of datcats.
It is possible to access the datcats:
- as a flat list, e.g. /feminine/
- or through a one-level hierarchy, e.g. /grammaticalGender/ and then /feminine/
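The two access paths can be pictured with a small in-memory model. This is only a sketch under assumptions: the dictionary layout and the names FOLDERS, flat_list and values_of are hypothetical and say nothing about how the DCR software stores its data. Only datcats named in these slides are used as sample entries.

```python
# Hypothetical model: folder -> complex datcat -> its simple datcats (values).
# The two folders are independent; nothing here links /noun/ to
# /grammaticalGender/, which mirrors the absence of a POS-to-MF link.
FOLDERS = {
    "partOfSpeech": {"noun": ["commonNoun"]},
    "morphologicalFeature": {"grammaticalGender": ["feminine"]},
}


def flat_list(folders):
    """Access path 1: every simple datcat, as one flat sorted list."""
    return sorted(
        simple
        for complex_datcats in folders.values()
        for values in complex_datcats.values()
        for simple in values
    )


def values_of(folders, complex_datcat):
    """Access path 2: go through a complex datcat to reach its values."""
    for complex_datcats in folders.values():
        if complex_datcat in complex_datcats:
            return list(complex_datcats[complex_datcat])
    raise KeyError(complex_datcat)
```

Here flat_list(FOLDERS) reaches /feminine/ directly, while values_of(FOLDERS, "grammaticalGender") reaches it through the one-level hierarchy. The model has exactly one level below each complex datcat, so it is a grouping facility and not an ontology.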
Conclusion
What still needs to be done:
- do the same job on POS & MF for other languages
- gather the other datcats in the same way