danielselman

Author

Dan Selman is a CTO with over 25 years of experience in the IT industry. He has created software products for BEA Systems, ILOG, IBM and others. He co-founded Clause, Inc. (acquired by DocuSign in 2021) and is a founder and maintainer of accordproject.org under the Linux Foundation.

TemplateMark Compilation

This is part two of a series of articles on Accord Project TemplateMark. In this article I will show how TemplateMark can be statically compiled to TypeScript. I recommend you read Getting Started with TemplateMark if you have not yet done so.

May 18, 2023

Getting Started with TemplateMark

TemplateMark is a data format from Accord Project that describes formatted natural language with embedded variables. In this article I introduce TemplateMark, the problems it solves and how to use it as an application developer.

May 16, 2023

Ethical LLM Whac-A-Mole

Anthropic recently released Claude, their LLM trained to be “helpful, honest, and harmless”. Much has been written about Anthropic’s laudable approach, including their philosophy of “constitutional AI”. In this post we take a look at how Claude works in practice, and the enormous challenges posed by using natural language as a general-purpose interface.

May 16, 2023

You Can See The Specialist Now

I’m becoming increasingly convinced that the conversational AI future is a mixture of general (foundational) large language models (LLMs) that can provide a high-level diagnosis of a situation or question, and which then delegate to specialized LLMs for deeper reasoning. The general LLM processes generic language to orchestrate calls to specialized services and LLMs with deep domain knowledge, and then potentially summarises and synthesises the results back into a general form for the end-user.

May 9, 2023

Custom Attributes, Properties and Concepts

In my job at DocuSign I provide pragmatic guidance to teams building various “smart” agreement features, which increasingly require a digital (aka computable aka symbolic) representation of knowledge, facts and rules; or what is called Knowledge Representation and Reasoning (KRR). In this article I introduce the high-level concepts and challenges, particularly related to “custom attributes” and machine learning.

May 2, 2023

Breaking the Language Barrier: Why Large Language Models Need Open Text Formats

Foundational LLMs are trained on huge corpora of text collected from the public Internet, including websites, books, Wikipedia, GitHub, academic papers, chat logs, Enron emails (!) and more. One of the better-known public collections of training data is called The Pile, an 800 GB dataset of diverse text for language modelling.

In this article I will examine how the training sets for LLMs should influence your choice of data formats, and suggest best practices for data formats that can be generated by LLMs.

April 24, 2023

The 8 Billion Person Question

The latest advances in artificial intelligence (particularly large language models) continue to reverberate. Even for an “old school” AI person like myself (who cut his teeth with Prolog) it is clear that there has been a step change in our ability to create computer systems that can interact with humans using natural language. GPT-4 et al. are exhibiting early signs of “common sense” and have encoded useful conceptual representations of the world. The debate rages on as to whether this is “intelligence”, but to an engineer like me, it sure seems useful!

April 21, 2023

Does ChatGPT Understand?

If you work in technology you will be aware that OpenAI’s ChatGPT has taken off like a 🚀, and the press is filled with people making rash prognostications, from gloomsters and doomsters to hypesters, and everything in between.

March 28, 2023