In this article I would like to provide some high-level guidance for domain (data) modelling. This article is for you if you are relatively new to domain or concept modelling. It uses Accord Project Concerto but the principles should apply to many modelling languages.
Goals
Models can serve many different purposes, from documentation to data validation, to code generation, to computation. Before you start defining your domain model it’s worth spending a few minutes thinking about your motivations and goals. A model is an abstraction of reality, hence what you decide is important to include vs what you leave out will depend on your goals.
“The map is not the territory….”
Understand Your Domain
It’s impossible to create a good model to represent a domain if you don’t understand the domain. Ideally you also have access to Subject Matter Experts that you can interview, as well as existing models or sample documents.
You should consider using an LLM-driven tool, such as Finchbot.net, to help refine your understanding of the domain.
Understand Your Metamodel
Now that you have an idea of WHAT you would like to represent, it’s time to think about HOW to represent it. The tools in your toolbox are your metamodel. If you are creating a relational database this would be things like tables, columns, rows, constraints. If you are creating an API, this would be REST endpoints and JSON Schema or GraphQL. If you are creating a semantic web application it could be RDF triples or OWL ontologies.
For the purposes of this article we will be using Accord Project Concerto, a domain modelling language that can be converted to many of the formats above for deployment. Yes, you can think of it as a model for models!
We will describe the major elements of the Concerto metamodel in the sections below and describe how to use (and not use) them. For each I have provided a “linguistic hint” which can be useful to influence your modelling work.
Enumerations
| Linguistic Hint | The values for … are … |
| When to use | To model a static/fixed (known a-priori) set of values. |
| Good example | enum OrderStatus { |
| When not to use | If there are a very large number of values, or if the values change frequently. E.g. all the SKUs in a large product catalog. Do not use to indicate the type of a concept, instead define a concept type. If the set of values is open-ended or extensible it may indicate that this is not an enumeration, but is in fact an abstract concept (see below). |
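To make the enumeration guidance concrete, here is a minimal sketch in Concerto. The namespace and the individual status values are my own assumptions for illustration; they are not taken from the example above.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.orders@1.0.0

// A small, fixed set of values that is known up front.
// The values themselves are assumed.
enum OrderStatus {
  o PENDING
  o SHIPPED
  o DELIVERED
  o CANCELLED
}

// An enumeration is used as the type of a property
concept Order {
  o OrderStatus status
}
```

If you find yourself adding new values every week, that is usually a sign the set is not really fixed and an enumeration is the wrong tool.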
Concepts
| Linguistic Hint | We define the concept of …. as … |
| When to use | To model a concept/type in the domain. A concept is often a noun (a thing with properties). A concept may optionally be identified by an identifying field. Concerto also includes specialized types of concept: Asset, Participant, Event, Transaction which add more semantic hints. |
| Good example | concept Order identified by id { |
| When not to use | If the set of concepts is fixed/small and known a-priori, an enumeration is in some cases more appropriate. Use a scalar if the concept is a simple wrapper type for a primitive value. Use a map if the properties of the concept are unknown or very loosely defined. |
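A short sketch of an identified concept, plus one of the specialized concept types, might look like the following. The namespace, property names and the Product asset are assumptions made up for this illustration.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.orders@1.0.0

// A concept with an identifying field
concept Order identified by id {
  o String id
  o DateTime placedAt
  o Double total
}

// A specialized concept type: "asset" adds a semantic hint
// that this is a thing of value being tracked
asset Product identified by sku {
  o String sku
  o String description
}
```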
Abstract Concepts
| Linguistic Hint | Concept … is a generic … concept. |
| When to use | To model an abstract concept/type in the domain. An abstract concept is often a noun (a thing with properties) that must be specialized and cannot exist as-such. We can talk about an abstract concept Animal in general, but never say “create a new Animal”. Abstract concepts allow the shared/generic properties of a hierarchy/taxonomy of related concepts to be defined. |
| Good example | abstract concept Pet { |
| When not to use | When the concept could be instantiated/exist in the domain. |
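As a rough sketch, an abstract concept holds the shared properties and the concrete concepts specialise it. The namespace, the Dog and Cat concepts and all property names here are assumed for illustration.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.pets@1.0.0

// Cannot be instantiated directly; it only defines shared properties
abstract concept Pet {
  o String name
}

// Concrete specialisations that can be instantiated
concept Dog extends Pet {
  o Boolean goodWithChildren
}

concept Cat extends Pet {
  o Boolean indoorOnly
}
```

A property typed as Pet can then hold either a Dog or a Cat, but never a bare Pet.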
Concept Specialisation
| Linguistic Hint | The … is-a (type of) … |
| When to use | To model that a concept specialises another. A specialisation adds new properties to an existing concept, while retaining the names and semantics of existing properties. In the example below, an Employee is-a Person, so can be used anywhere a Person could be used. |
| Good example | concept Person { |
| When not to use | When the substitution principle is not desirable, i.e. it may not make sense in an automated assembly line to say that Robot extends Employee, as that would also state that a Robot is a Person. Also, because there is a single “is-a” relationship for a concept, you cannot state that a Robot is both an Employee AND a Machine. For these use cases it is better to think in terms of containment and features/properties: A Robot has-the Machine property as well as the Employee property. |
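The sketch below contrasts the two approaches: is-a for Employee, and containment for Robot. The namespace, the WorkAssignment and MachineSpec concepts and the property names are all assumptions introduced for this illustration.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.factory@1.0.0

concept Person {
  o String name
}

// is-a: an Employee can be used anywhere a Person is expected
concept Employee extends Person {
  o String employeeNumber
}

// Features modelled as separate concepts (assumed names)
concept WorkAssignment {
  o String station
}

concept MachineSpec {
  o String model
}

// has-a: a Robot contains both features without claiming to be a Person
concept Robot {
  o WorkAssignment assignment
  o MachineSpec spec
}
```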
Concept Properties
| Linguistic Hint | The concept …. has-a property … |
| When to use | To define the named properties of a concept, grouping them into a set that is appropriate for the domain. Use the meta properties (array, optional, validators) to define the multiplicity of the property, whether it is optional, and any property-specific validation. Has-a properties (in contrast to relationships) bind the lifecycle of the owning concept and the property together: when the owning concept is destroyed, so are all its owned properties. |
| Good example | concept Person { |
| When not to use | If the property definition occurs frequently across many concepts consider promoting it to a scalar. If the property is a relationship between two types of concepts consider using a relationship. |
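Here is a minimal sketch of the array, optional and validator meta properties on a single concept. The namespace, the Address concept, the property names and the validator bounds are assumptions chosen for illustration.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.people@1.0.0

concept Address {
  o String street
  o String city
}

concept Person {
  o String firstName
  o String lastName
  // optional: the property may be absent
  o String middleName optional
  // array: zero or more values
  o String[] nicknames
  // validators: a numeric range and a (deliberately loose) regex
  o Integer age range=[0,150]
  o String email regex=/^.+@.+$/
  // has-a: the Address is owned by, and lives and dies with, the Person
  o Address address
}
```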
Relationship Properties
| Linguistic Hint | The concept …. is related to … |
| When to use | To define the relationships between one concept and other identified concepts. Relationships are “smart pointers” in that they encode a one-way relationship from one concept to another, but allow each concept to have its own lifecycle. In the example below, destroying a Person does not destroy the Vehicle that is related to the Person, and vice versa. |
| Good example | concept Person identified by id { |
| When not to use | When the target concept has no identifying field: relationships can only be declared to identified concepts. |
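A relationship between a Person and a Vehicle could be sketched as follows. The namespace and the vin property are assumptions for illustration; the important part is the arrow syntax and that the target is an identified concept.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.vehicles@1.0.0

// The target of a relationship must be an identified concept
concept Vehicle identified by vin {
  o String vin
}

concept Person identified by id {
  o String id
  // "-->" declares a one-way reference; the Vehicle keeps its own lifecycle
  --> Vehicle vehicle
}
```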
Scalars
| Linguistic Hint | The … property is defined as … |
| When to use | To define new types of properties, useful to add custom validation facets to existing primitive types (String, Integer, Double, Long…). Scalars are also useful to document and convey additional semantics. |
| Good example | scalar SSN extends String regex=/^(?!(000|666|9))[0-9]{3}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$/ |
| When not to use | Can be overkill if the property is only being used in a small number of places. |
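To show where a scalar pays off, the sketch below reuses the SSN scalar from the example above in two concepts, so the validation is declared once. The namespace and the Employee and Contractor concepts are assumptions for illustration.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.hr@1.0.0

scalar SSN extends String regex=/^(?!(000|666|9))[0-9]{3}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$/

// Both concepts share the same validation and semantics via the scalar
concept Employee {
  o SSN ssn
}

concept Contractor {
  o SSN ssn
}
```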
Maps
| Linguistic Hint | Concept … has a collection/map/dictionary … accessed/indexed by … |
| When to use | To define open-ended concepts, where the property names are not known a-priori. Scalars are often useful to convey the semantics of String keys and/or values. |
| Good example | concept Contact { |
| When not to use | Do not use to short-circuit modelling, resorting to the “null model”, where everything is represented as maps, with strings as keys and objects as values! Read more about pitfalls with maps here. |
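A minimal sketch of a map declaration, using a scalar to give the String keys some meaning. The namespace, the Label scalar and the PhoneBook map are assumed names, and map declarations need a Concerto release that includes map support.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.contacts@1.0.0

// A scalar documents what the String key means, e.g. "home" or "work"
scalar Label extends String

// A map declares its key type first, then its value type
map PhoneBook {
  o Label
  o String
}

concept Contact {
  o String name
  // The set of labels is open-ended, so a map is a reasonable fit
  o PhoneBook phoneNumbers
}
```

Note that only the phone numbers are open-ended here; the rest of Contact is still modelled with named properties.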
Namespaces
| Linguistic Hint | The … model/domain defines the following concepts … |
| When to use | To group related concepts, scalars, enumerations etc. into a unit for modularity, reuse and lifecycle control. A namespace typically defines all the concepts required for a given (sub)domain. Often namespaces need to be granular enough to support a multiuser lifecycle. Namespaces are versioned and must explicitly import types from other namespaces. |
| Good example | namespace customer@1.0.0 |
| When not to use | Namespaces must be used to organise concepts etc. Do not create namespaces that are so granular that it is hard to map from a concept to its owning namespace. Namespaces should typically align with the business intent/domain, not arbitrary technical considerations. For complex applications it can be useful to think about an explicit namespace dependency graph/tree. |
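As a sketch of versioned namespaces and explicit imports, here are two model files where one depends on the other. The address namespace, the file names and all concept and property names are assumptions for illustration.

```concerto
// File: address.cto (hypothetical)
namespace address@1.0.0

concept Address {
  o String street
  o String city
}
```

```concerto
// File: customer.cto (hypothetical)
namespace customer@1.0.0

// Types from other namespaces are imported explicitly, with a version
import address@1.0.0.{Address}

concept Customer identified by customerId {
  o String customerId
  o Address billingAddress
}
```

Here the dependency graph has a single edge, from customer to address, which keeps it easy to see what has to change when address moves to a new version.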
Decorators
| Linguistic Hint | The namespace/concept/property …. has the metadata … |
| When to use | Decorators literally “decorate” a domain model with application specific metadata (sometimes called annotations). Decorations should be like the “wallpaper” on the “house” of your domain model. In the example below the pii (personally identifiable information) decorator has been added to two properties of Person (email and ssn) while the height property remains undecorated. Concerto doesn’t know anything about the semantics of decorators; they are defined by the application that is creating and reading the model. Decorators can be applied to namespaces, concepts/enums/maps/scalars or their properties. |
| Good example | concept Person {@pii @pii o Double height |
| When not to use | Do not use decorators to change the underlying semantics of the Concerto metamodel. You should assume that code that is reading the model will understand Concerto metamodel semantics but will ignore decorators. These readers should still be able to usefully introspect and manipulate the model. |
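The pii decorator described above could be sketched like this. The namespace and the String types of email and ssn are assumptions on my part; only the property names and the Double height come from the example.

```concerto
// Hypothetical namespace, for illustration only
namespace org.example.people@1.0.0

concept Person {
  // Decorated: the application decides what @pii means
  @pii
  o String email
  @pii
  o String ssn
  // Undecorated
  o Double height
}
```

A reader that ignores @pii entirely still sees a perfectly valid Person with three properties, which is exactly the behaviour you want from a decorator.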
Summary
I hope you found this introduction to data modelling best-practices, and the Concerto metamodel, useful. Don’t hesitate to comment with your own tips-and-tricks. The Accord Project Working Group and Discord channel are great places to share your experiences and ask questions.