data models

Overview of the MPEG-21 Media Contract Ontology

I was recently passed a pointer to the MPEG-21 Media Contract Ontology (thank you Niall!). This post is a high-level summary of the very informative paper (linked above) by Víctor Rodríguez-Doncel, Jaime Delgado, Silvia Llorente, Eva Rodríguez and Laurent Boch.

The Media Contract Ontology (MCO) is an OWL ontology formalizing a vocabulary to represent business contracts in the media content industry. MCO contracts are RDF documents using that vocabulary.

December 18, 2023

Why Concepts and Clauses?

Variables without concepts are not conceptually grounded.

What does {{amount}} mean in a template? How about {{height}}?

Contrast with a locale-independent concept model / ontology / data model:

concept Loan has-a-property Amount (a Monetary Amount)
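As a rough sketch of what that grounding could look like in code, the TypeScript below models a Loan concept whose amount is a Monetary Amount rather than a bare number. The names `MonetaryAmount`, `doubleValue`, `currencyCode` and `formatAmount` are illustrative assumptions, not taken from any published model.

```typescript
// Hypothetical concept model: all names here are illustrative assumptions.
interface MonetaryAmount {
  doubleValue: number;  // numeric magnitude
  currencyCode: string; // ISO 4217 code, e.g. "USD"
}

interface Loan {
  amount: MonetaryAmount; // the concept states what {{amount}} means
}

// Locale is a rendering concern, kept separate from the concept itself.
function formatAmount(loan: Loan, locale: string): string {
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency: loan.amount.currencyCode,
  }).format(loan.amount.doubleValue);
}

const loan: Loan = { amount: { doubleValue: 2500, currencyCode: "USD" } };
```

The same `loan` value can then be rendered for different locales by calling `formatAmount` with a different locale tag, without changing the underlying data.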

November 28, 2023

Tips for Concept Organisation

Imagine you are setting up a book shop. You will primarily sell a selection of books, but to diversify your revenue streams you will also sell board games, coffee and greeting cards. How should you organise your store so that your customers can find your products?

July 30, 2023

Maps Considered Harmful

Developers have an obsession with maps or dictionaries. Adding a map to an entity makes it “extensible”, insofar as arbitrary keys and values can now be associated with the entity, or so they think.
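To make the trade-off concrete, here is a minimal TypeScript sketch contrasting an entity extended with a dictionary against one with a declared property. The `Customer` types and the `loyaltyTier` field are hypothetical, chosen only for illustration.

```typescript
// Hypothetical entity "extended" with a dictionary: any key, any value.
interface CustomerWithMap {
  name: string;
  extensions: Record<string, unknown>;
}

// The same extension declared as a typed, optional property.
interface CustomerTyped {
  name: string;
  loyaltyTier?: "bronze" | "silver" | "gold";
}

const loose: CustomerWithMap = { name: "Ada", extensions: {} };
// A misspelled key and an unexpected value type both compile without complaint.
loose.extensions["loyaltyTeir"] = 42;

const typed: CustomerTyped = { name: "Ada", loyaltyTier: "gold" };
// typed.loyaltyTeir = 42; // rejected by the compiler: no such property

// Readers of the dictionary must guess the key and check the value at runtime.
const tier = loose.extensions["loyaltyTier"]; // undefined: the key was misspelled above
```

The dictionary version accepts anything, so the meaning of each key lives only in the heads of the people who wrote it; the typed version puts that meaning in the model.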

June 23, 2023

TemplateMark Compilation

This is part two of a series of articles on Accord Project TemplateMark. In this article I will show how TemplateMark can be statically compiled to TypeScript. I recommend you read Getting Started with TemplateMark if you have not yet done so.

May 18, 2023

Getting Started with TemplateMark

TemplateMark is a data format from Accord Project that describes formatted natural language with embedded variables. In this article I introduce TemplateMark, the problems it solves and how to use it as an application developer.

May 16, 2023

Custom Attributes, Properties and Concepts

In my job at DocuSign I provide pragmatic guidance to teams building various “smart” agreement features, which increasingly require a digital (aka computable aka symbolic) representation of knowledge, facts and rules; or what is called Knowledge Representation and Reasoning (KRR). In this article I introduce the high-level concepts and challenges, particularly related to “custom attributes” and machine learning.

May 2, 2023

Breaking the Language Barrier: Why Large Language Models Need Open Text Formats

Foundational LLMs are trained on huge corpora of text collected from the public Internet, including websites, books, Wikipedia, GitHub, academic papers, chat logs, Enron emails (!) etc. One of the better-known public collections of training data is The Pile, an 800 GB dataset of diverse text for language modelling.

In this article I will examine how the training sets for LLMs should influence your choice of data formats, and suggest best practices for data formats that can be generated by LLMs.

April 24, 2023

Does ChatGPT Understand?

If you work in technology you will be aware that OpenAI ChatGPT has taken off like a 🚀, and the press is filled with people making rash prognostications, from gloomsters and doomsters to hypesters, and everything between.

March 28, 2023