Uncertainty is ubiquitous in data and can take various forms. Usually, this is not formally taken into account: only the most likely data interpretation is kept for future processing, or all probable choices of correct information above a threshold are maintained. We claim this is not sufficient. There is a need for managing the imprecision in data more rigorously, and the current thesis addresses this problem with focus on semistructured information. What we present is a theoretical study of a probabilistic XML database management system. We present a general architecture of such a system and investigate four of its main aspects: (1) Underlying data models, where we introduce two novel probabilistic XML models: one that accounts for continuous probability distributions and another one based on recursive Markov chains. (2) Probabilistic updates, that we see as the main mechanism to embrace uncertain data in probabilistic models. (3) Query answering (which is vital for any data management system) with tree-pattern queries and monadic second-order queries. (4) Aggregate query answering, which is essential for data mining. For probabilistic XML data models we study expressiveness and succinctness, as well as (efficient) translations between various models. For updates, query answering, and aggregating we present a detailed complexity analysis and algorithms. We also illustrate our querying techniques on various data mining tasks.