Metadata in Data Modelling
Metadata is widely interpreted as "data about data". This is a highly simplistic notion, although it is included in the ISO/IEC 11179 definition.
You could also argue that metadata is a description of data, but this covers an enormous range of topics.
Metadata covers two main areas:
- Data or information about, connected to or related to the concept being represented.
- A description or definition of the data container that is being used.
In the first case we are referring to things such as the lineage of the data, its ownership, its quality, its governance and the area of operations in which the data is used. For instance, the electronic tracks generated by radar systems contain some very basic information about the object being tracked, normally its position (lat-lon) and its direction, although the latter may be a derived piece of data. Primarily there is a prime concept being referred to, such as a book. The book has a conceptual definition, and it also has many attributes or properties that enhance the core definition, such as author. It may also have relationships to other concepts, such as publishing house or library, which may need to be included in the information system.
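As a rough illustration of this first kind of metadata, the sketch below keeps a radar track separate from the information about that track. Only position and direction come from the text; the class names, field names and sample values are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """The data itself: the basic content of a radar track."""
    latitude: float
    longitude: float
    heading_degrees: float  # direction; may be derived rather than measured


@dataclass
class DescriptiveMetadata:
    """Information about the data (hypothetical fields, not a standard)."""
    owner: str
    lineage: list[str] = field(default_factory=list)
    quality_note: str = ""
    area_of_operations: str = ""


@dataclass
class DescribedTrack:
    """A track together with the metadata that describes it."""
    track: Track
    metadata: DescriptiveMetadata


# Sample values are invented for illustration only.
record = DescribedTrack(
    track=Track(latitude=51.5, longitude=-0.12, heading_degrees=270.0),
    metadata=DescriptiveMetadata(
        owner="Example Air Traffic Unit",
        lineage=["radar sensor", "track correlator"],
        quality_note="heading derived from successive positions",
        area_of_operations="air traffic control",
    ),
)
```

Keeping the two classes apart makes the distinction visible: the track says where the object is, while the metadata says where the track came from and who is responsible for it.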
In the second case we are referring to the structure of the database, codebase or language in which the data is held. A database can be modelled using Entity-Relationship diagrams (ERDs), and there are several notations based around the core ERD concepts originally put forward by Chen (REF), such as Bachman, Martin or what is called "Crow's Foot". These representations of entities vary, although the core ideas behind them are very closely aligned. They are mostly applied to relational database management systems (RDBMS), which were the dominant database structure in use at the time these concepts were put forward. Today, however, there are many types of database available, such as XML databases, graph databases and key-value stores, all of which have a different structure and cannot easily be described using the RDBMS notations. Data is stored in a great many different systems, all of which have different descriptions.
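For a relational database, this second kind of metadata can usually be read back from the database itself. The sketch below is one possible illustration using Python's built-in sqlite3 module and SQLite's `PRAGMA table_info`; the `book` table and its columns are assumptions made for the example, not a schema taken from the text.

```python
import sqlite3

# An in-memory database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book ("
    " book_id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " subject TEXT)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(book)").fetchall()

for cid, name, col_type, notnull, default, pk in columns:
    nullability = "NOT NULL" if notnull else "NULL allowed"
    key = "primary key" if pk else ""
    print(name, col_type, nullability, key)
```

The rows returned here describe the container (column names, types and constraints) rather than any particular book, which is exactly the distinction drawn above. Other kinds of database expose their structure in different ways, if at all.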
Example diagrams
These are taken from the commercial diagramming applications LucidChart and SmartDraw and can be seen below:
The notation is different, but what is common is the idea that there are entities and relationships. Where two entities are related, there are a number of ways of describing that relationship: it can be optional or mandatory, and it can be one to many, many to one, or many to many. It is at this level of detail that the diagram notations differ, normally only slightly.
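Whatever the notation, the information carried by a relationship line is small. A minimal sketch, assuming hypothetical `Cardinality` and `Relationship` types, captures the two properties named above: cardinality and whether the relationship is optional or mandatory. The mandatory flags in the sample data are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Cardinality(Enum):
    ONE_TO_MANY = "one to many"
    MANY_TO_ONE = "many to one"
    MANY_TO_MANY = "many to many"


@dataclass(frozen=True)
class Relationship:
    """One relationship line on a diagram, independent of notation."""
    source: str
    target: str
    cardinality: Cardinality
    mandatory: bool

    def describe(self) -> str:
        kind = "mandatory" if self.mandatory else "optional"
        return f"{self.source} -> {self.target}: {self.cardinality.value}, {kind}"


# Hypothetical relationships; the mandatory/optional choices are assumed.
relationships = [
    Relationship("Publisher", "Book", Cardinality.ONE_TO_MANY, mandatory=True),
    Relationship("Book", "Author", Cardinality.MANY_TO_MANY, mandatory=False),
]

for relationship in relationships:
    print(relationship.describe())
```

Each notation draws these same facts with different symbols, which is why a model can usually be redrawn from one notation to another with little loss.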
Example: A Book
This is an ERD, generated with Visual Paradigm (another commercial diagramming tool), which illustrates a data model for a book.
Details of books are stored in a table called Book, and this table is related to one called Author and another called Publisher. The Book table stores information about the book, such as title, author, subject and publisher, although the information about the author and the publisher is stored in separate database tables. A book may have more than one author, and an author may have contributed to more than one book, so an intermediate link table is defined.
Similarly, the relationship between publisher and book is illustrated by the crow's foot symbol at one end:
The crow's foot indicates that one end of the relationship has many entities. A single entity is indicated by the single line and the cross at the opposite end of the line. Thus the relationship is indicated by the lines shown: in our example, the Book entity potentially has many authors and the Author entity potentially has (or has written) many books.
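The Book, Author and Publisher model described above could be expressed along the following lines. This is a sketch only: it uses Python's built-in sqlite3 module, and the column names, the `book_author` link table name and the sample rows are assumptions, since the text names only the three main tables and a few attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce these by default

conn.executescript("""
CREATE TABLE publisher (
    publisher_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- One publisher has many books: the "many" end of the crow's foot.
CREATE TABLE book (
    book_id      INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    subject      TEXT,
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);
-- Link table resolving the many-to-many relationship of books and authors.
CREATE TABLE book_author (
    book_id   INTEGER NOT NULL REFERENCES book(book_id),
    author_id INTEGER NOT NULL REFERENCES author(author_id),
    PRIMARY KEY (book_id, author_id)
);
""")

# Invented sample data: one book with two authors, one author with two books.
conn.execute("INSERT INTO publisher VALUES (1, 'Example Press')")
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "A. Writer"), (2, "B. Scribe")])
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)",
                 [(1, "First Title", "Modelling", 1),
                  (2, "Second Title", "Metadata", 1)])
conn.executemany("INSERT INTO book_author VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

rows = conn.execute("""
    SELECT b.title, a.name
    FROM book AS b
    JOIN book_author AS ba ON ba.book_id = b.book_id
    JOIN author AS a ON a.author_id = ba.author_id
    ORDER BY b.book_id, a.author_id
""").fetchall()
```

The link table holds nothing but pairs of keys, so a book with several authors, or an author with several books, needs no duplicated book or author rows.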
This kind of modelling allows the data modeller to build a set of entities in a relational database so that, ideally, data is stored in only one place, which makes for easier data management.
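A small sketch of what "stored in only one place" means in practice, again using sqlite3 with hypothetical table names, column names and sample values: the publisher's name is held once, so a single update is seen by every book that refers to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (
    publisher_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE book (
    book_id      INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);
INSERT INTO publisher VALUES (1, 'Example Press');
INSERT INTO book VALUES (1, 'First Title', 1), (2, 'Second Title', 1);
""")

# The name lives in exactly one row, so one UPDATE is enough.
conn.execute(
    "UPDATE publisher SET name = 'Example Press Ltd' WHERE publisher_id = 1"
)

publisher_names = [
    row[0]
    for row in conn.execute("""
        SELECT p.name
        FROM book AS b
        JOIN publisher AS p ON p.publisher_id = b.publisher_id
        ORDER BY b.book_id
    """)
]
```

Had the publisher's name been copied into every book row, the same change would have needed one update per book, with the risk of the copies drifting apart.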
For the purposes of managing a catalogue of books, perhaps as part of a bookstore or library, this kind of database is sufficient.
The data model we have illustrated is sufficient for simple data storage, which is what it has been designed for. For data governance, however, more information needs to be linked to or related to this core data model.
The way we do this is using a metadata registry.