Formalization of the classification pattern: survey of classification modeling in information systems engineering

Softw Syst Model. 2018;17(1):167-203. doi: 10.1007/s10270-016-0521-5. Epub 2016 Apr 16.

Abstract

Formalization is becoming more common in all stages of the development of information systems, as a better understanding of its benefits emerges. Classification systems are ubiquitous, and nowhere more so than in domain modeling. The classification pattern that underlies these systems provides a good case study of the move toward formalization, in part because it illustrates some of the barriers to formalization, including the formal complexity of the pattern and the ontological issues surrounding the "one and the many." Powersets are a way of characterizing the (complex) formal structure of the classification pattern, and their formalization has been extensively studied in mathematics since Cantor's work in the late nineteenth century. This formalization can be used to develop a useful benchmark. Various communities within information systems engineering (ISE) are gradually working toward a formalization of the classification pattern. However, for most of these communities, this work is incomplete, in that they have not yet arrived at a solution with the expressiveness of the powerset benchmark. This contrasts with the early, smooth adoption of powersets by other information systems communities, for example, to formalize relations. One way of understanding the varying rates of adoption is to recognize that the different communities have different historical baggage. Many conceptual modeling communities emerged from work on database design, and this heritage creates hurdles to adopting the high level of expressiveness that powersets offer. Another relevant factor is that these communities, particularly in domain modeling, often feel a responsibility to explain the semantics of whatever formal structures they adopt.
This paper aims to make sense of the formalization of the classification pattern in ISE and surveys its history in the literature, starting from the relevant theoretical works of the mathematical literature and gradually shifting focus to the ISE literature. The literature survey follows the evolution of ISE's understanding of how to formalize the classification pattern. The various proposals are assessed against the classic example of classification, the Linnaean taxonomy, with its powerset formalization serving as a benchmark for formal expressiveness. The broad conclusion of the survey is that (1) the ISE community is currently in the early stages of understanding how to formalize the classification pattern, particularly with respect to the expressiveness requirements exemplified by powersets, and (2) there is an opportunity to intervene and speed up the process of adoption by clarifying this expressiveness. Given the central place of the classification pattern in domain modeling, this intervention has the potential to lead to significant improvements.
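To make the powerset structure of the classification pattern concrete, here is a minimal Python sketch, assuming a toy universe of named organisms. All names (`powerset`, `organisms`, `canis_familiaris`, `canis`, `species_rank`, and the organism labels) are hypothetical illustrations, not the paper's own formalization. Each species is modeled as a set of organisms, that is, an element of P(organisms); the rank Species is modeled as a set of species, that is, a subset of P(organisms) and hence an element of P(P(organisms)).

```python
from itertools import combinations


def powerset(s):
    """Return the powerset of a finite set s as a frozenset of frozensets."""
    items = list(s)
    return frozenset(
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )


# Hypothetical toy universe of individual organisms (illustration only).
organisms = frozenset({"rex", "fido", "lupo", "tom", "felix"})

# Each species is a set of organisms, i.e. an element of P(organisms).
canis_familiaris = frozenset({"rex", "fido"})
canis_lupus = frozenset({"lupo"})
felis_catus = frozenset({"tom", "felix"})

# A genus is also a set of organisms; its species are subsets of it.
canis = canis_familiaris | canis_lupus

# The rank Species is a set of species: a subset of P(organisms),
# and therefore an element of P(P(organisms)).
species_rank = frozenset({canis_familiaris, canis_lupus, felis_catus})

print("rex" in canis_familiaris)             # True: organism instantiates species
print(canis_familiaris <= canis)             # True: species is a subclass of genus
print(canis_familiaris in species_rank)      # True: species instantiates the rank
print("rex" in species_rank)                 # False: membership does not chain
print(species_rank <= powerset(organisms))   # True: the rank sits one level up
```

The sketch keeps subclassing (subset, `<=`) separate from instantiation (membership, `in`): the genus `canis` contains its species as subsets, whereas each species is a member of `species_rank`. Because membership is not transitive, an individual organism is not itself an instance of the Species rank.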

Keywords: Classification; Classification system; Powerset; Powertype; Set theory.