Expressivity and Complexity of MongoDB (Extended Version)

Elena Botoeva, Diego Calvanese, Benjamin Cogrel, and Guohui Xiao

Technical Report arXiv:1603.09291, CoRR, arXiv.org e-Print archive, 2017. Available at https://arxiv.org/abs/1603.09291.

A significant number of novel database architectures and data models have been proposed during the last decade. While some of these new systems have gained in popularity, they lack a proper formalization and a precise understanding of the expressivity and the computational properties of the associated query languages. In this paper, we aim to fill this gap by considering MongoDB, a widely adopted document database that manages complex (tree-structured) values represented in a JSON-based data model and is equipped with a powerful query mechanism. We provide a formalization of the MongoDB data model and of a core fragment, called MQuery, of the MongoDB query language. We study the expressivity of MQuery, showing its equivalence with nested relational algebra. We further investigate the computational complexity of significant fragments of it, obtaining several (tight) bounds in combined complexity, which range from LOGSPACE to alternating exponential-time with a polynomial number of alternations. As a consequence, we also obtain a characterization of the combined complexity of nested relational algebra query evaluation.


@techreport{Corr-2017-mongodb,
   title = "Expressivity and Complexity of MongoDB (Extended Version)",
   year = "2017",
   author = "Elena Botoeva and Diego Calvanese and Benjamin Cogrel and Guohui Xiao",
   institution = "arXiv.org e-Print archive",
   number = "arXiv:1603.09291",
   note = "Available at https://arxiv.org/abs/1603.09291",
}