Title 



Generating, sampling and counting subclasses of regular tree languages
 
Author 



 
Abstract 



To experimentally validate learning and approximation algorithms for XML Schema Definitions (XSDs), we need algorithms to generate uniformly at random a corpus of XSDs as well as a similarity measure to compare how closely the generated XSD resembles the target schema. In this paper, we provide the formal foundation for such a testbed. We adopt similarity measures based on counting the number of common and different trees in the two languages, and we develop the necessary machinery for computing them. We use the formalism of extended DTDs (EDTDs) to represent the unranked regular tree languages. In particular, we obtain an efficient algorithm to count the number of trees up to a certain size in an unambiguous EDTD. The class of unambiguous EDTDs encompasses the more familiar classes of single-type, restrained-competition and bottom-up deterministic EDTDs. The single-type EDTDs correspond precisely to the core of XML Schema, while the others are strictly more expressive. We also show how constraints on the shape of allowed trees can be incorporated. As we make use of a translation into a well-known formalism for combinatorial specifications, we get for free a sampling procedure to draw members of any unambiguous EDTD. When dropping the restriction to unambiguous EDTDs, i.e., taking the full class of EDTDs into account, we show that the counting problem becomes #P-complete and provide an approximation algorithm. Finally, we discuss uniform generation of single-type EDTDs, i.e., the formal abstraction of XSDs. To this end, we provide an algorithm to generate k-occurrence automata (k-OAs) uniformly at random and show how this leads to uniform generation of single-type EDTDs.
Language 



English
 
Source (book) 



Proceedings of the 14th International Conference on Database Theory (ICDT 2011), Uppsala, Sweden, March 21-24, 2011 / Milo, Tova [edit.]
Publication 



S.l. : ACM, 2011
 
ISBN 



978-1-4503-0529-7
 
Volume/pages 



p. 30-41
 
Full text (Publisher's DOI)


  
