Clowder Framework

Open Source Data Management for Long Tail Data

Data catalogs in the clouds

Clowder is a customizable and scalable data management framework to support any data format and multiple research domains.

Flexible Metadata Representation

Clowder supports both user-defined and machine-defined metadata. The system accepts metadata in a flexible representation based on JSON-LD. Users can add metadata entries directly from the UI, and extractors and external clients can attach metadata to files and datasets through the Web service API.

Human-readable metadata.
Machine-parsable metadata.
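
As a rough illustration of what a JSON-LD based metadata entry might look like, the sketch below wraps plain key/value metadata in a JSON-LD document that a client could then send to the Web service API. The field names (`agent`, `content`, `created_at`), the vocabulary URL, and the helper `build_jsonld_metadata` are assumptions made for this example and are not taken from Clowder's documented schema.

```python
import json
from datetime import datetime, timezone


def build_jsonld_metadata(content, agent_id, agent_type="User"):
    """Wrap plain key/value metadata in a JSON-LD document (hypothetical layout)."""
    return {
        "@context": {
            # Hypothetical vocabulary; a real deployment would point at its own terms.
            "@vocab": "https://example.org/terms#",
        },
        # Who or what produced the entry: a user, or an extractor.
        "agent": {"@type": agent_type, "@id": agent_id},
        "created_at": datetime.now(timezone.utc).isoformat(),
        # The metadata itself, as ordinary JSON.
        "content": content,
    }


doc = build_jsonld_metadata(
    {"species": "Quercus alba", "leaf_count": 42},
    agent_id="https://example.org/users/alice",
)
print(json.dumps(doc, indent=2))
```

The same helper could serve an extractor by passing `agent_type="Extractor"`, which keeps user-defined and machine-defined entries distinguishable while sharing one representation.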

Automatic Metadata Extraction

When new data is added to the system, whether through the web front end or the Web service API, a cluster of extraction services processes the data to extract relevant metadata and create web-based data visualizations.

Extend the system by creating new extractors to analyze data. Clowder uses the publish-subscribe model with a RabbitMQ broker: when certain events occur, such as the upload of a new file, a message is published to notify any listening metadata extractors that a new file is available. Each extractor can then use the public Web service API to analyze the data and write any relevant information back to Clowder.
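
The publish-subscribe flow described above can be sketched with a small in-memory broker. This is only a stand-in for RabbitMQ, written with the Python standard library so it runs without a server; the event name `file.added`, the message fields, and the `Broker` and `image_extractor` names are all hypothetical and do not reflect Clowder's actual message format.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class Broker:
    """In-memory stand-in for a message broker such as RabbitMQ."""

    def __init__(self):
        # event name -> list of (MIME type pattern, callback)
        self._subscribers = defaultdict(list)

    def subscribe(self, event, mime_pattern, callback):
        self._subscribers[event].append((mime_pattern, callback))

    def publish(self, event, message):
        """Deliver the message to every subscriber whose pattern matches."""
        delivered = 0
        for pattern, callback in self._subscribers[event]:
            if fnmatchcase(message["mime_type"], pattern):
                callback(message)
                delivered += 1
        return delivered


# Stands in for metadata written back to the system: file id -> entries.
metadata_store = {}


def image_extractor(message):
    # A real extractor would fetch the file through the Web service API,
    # analyze it, and post the results back; here we record a placeholder.
    entry = {"extractor": "image-info", "note": "placeholder metadata"}
    metadata_store.setdefault(message["file_id"], []).append(entry)


broker = Broker()
broker.subscribe("file.added", "image/*", image_extractor)

# Uploading files triggers notifications; only the image reaches the extractor.
broker.publish("file.added", {"file_id": "f1", "mime_type": "image/png"})
broker.publish("file.added", {"file_id": "f2", "mime_type": "text/csv"})
print(metadata_store)
```

The publisher never needs to know which extractors exist, so new extractors can be added by subscribing without changing the code that handles uploads.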

A partial list of extractors is available on GitHub, in the NCSA Bitbucket, and in the wiki.

Metadata about the image automatically extracted by cloud metadata extractors.

Data Visualizations

To preview the content of large files and to visualize the information in files and datasets in a meaningful way, Clowder lets developers write JavaScript-based widgets that can be added to files and datasets. Often these data previews are added by automatic extractions.

For example, the geospatial extractors watch for shapefiles and GeoTIFF files uploaded to Clowder and then submit the GIS layers to an instance of GeoServer. A custom JavaScript previewer visualizes the data on an interactive map.