Development on Clowder v2 is progressing. Recently our core development team had in-depth design discussions regarding the architecture of metadata for files and datasets in v2. We have begun implementing aspects of this architecture and wanted to describe it below.
Metadata Structure
In the database, metadata is composed of 4 pieces of information:
One change in v2 is improved handling predefined metadata fields. Metadata fields can be defined in the database with:
This means that users can refer to the metadata fields in their context rather than the more involved provision of links to URIs or JSON schema documents. The specification of data types will also allow the Clowder UI to provide widgets for users adding the metadata via the interface, e.g. date fields providing a calendar widget.
File Versioning
In v2, files are automatically versioned, meaning users can replace files as needed. Older file versions will remain accessible for viewing and download. Because metadata is often generated by extractors based on the contents of a file, changes to a file may necessitate re-running extractors or replacing metadata that is no longer applicable.
To manage this, metadata is associated with a specific file version in v2. When a file is updated to a new version any existing metadata will be carried over, but changes to the metadata will only affect that specific version. Older file versions will retain the old metadata ( permitted users can still modify previous metadata versions).
The intent is that if, for example, a text file was updated with additional contents, re-running the wordcount extractor will generate correct wordcounts for the new version will retaining the correct wordcounts for the old version as well.
** User vs. Extractor metadata **
As briefly mentioned above, the distinction between user and extractor metadata categories is going away. Every piece of metadata will now have a user associated for ownership purposes, even metadata generated by an extractor - in those cases, the user who triggered the extractor will be listed. We intend for this to reduce some of the complexity in metadata handling and conflict detection.
We are adding more features to update and replace metadata. In Clowder v1, running the same extractor multiple times would attach duplicate metadata to the resource unless the extractor itself was coded to avoid this; we are now building in duplication and replacement handling to make it easy to update specific fields or whole metadata objects and avoid inadvertent duplication.