A Formal Presentation of MongoDB (Extended Version)

Elena Botoeva, Diego Calvanese, Benjamin Cogrel, Martin Rezk, and Guohui Xiao

Technical Report, arXiv.org e-Print archive. CoRR Technical Report arXiv:1603.09291v1 2016. Available at https://arxiv.org/abs/1603.09291v1.

A significant number of database architectures and data models have been proposed during the last decade. While some of these new systems have gained in popularity, their formal semantics are generally still missing. In this paper, we consider the symptomatic case of MongoDB, a widely adopted document database, in which roughly speaking relational tables correspond to collections, and tuples to documents. We provide a formalization of the JSON-based data model adopted by MongoDB, and of a core fragment of the MongoDB aggregation query language, MUPGL, which includes the match, unwind, project, group, and lookup operators. We study the expressiveness of MUPGL by defining a relational view of MongoDB databases and developing a translation from relational algebra to MUPGL. Notably, we show that the MUPG fragment is already at least as expressive as full relational algebra over (the relational view of ) a single collection, and in particular able to express arbitrary joins. We further investigate the computational complexity of MUPGL and of significant fragments of it.


@techreport{Corr-2016-mongodb-v1,
   title = "A Formal Presentation of MongoDB (Extended Version)",
   year = "2016",
   author = "Elena Botoeva and Diego Calvanese and Benjamin Cogrel and
Martin Rezk and Guohui Xiao",
   institution = "arXiv.org e-Print archive",
   number = "arXiv:1603.09291v1",
   note = "Available at https://arxiv.org/abs/1603.09291v1",
}
url