Tuesday, July 3, 2012

Getting Started with MetCat | Provenance Aware Metadata Catalog for Apache Airavata


In this post, I describe how to get started with MetCat and walk through the basic steps: building, configuring, and starting it.




About Project


The MetCat project is developing a metadata catalog intended for integration with the Apache Airavata project. It focuses on capturing metadata from workflows and on supporting scalable metadata management and user-defined queries.
For more information, see the Project Site and the Project Host page.

(New to Airavata? Learn it in under 10 minutes by following the Airavata in 5 minutes and Airavata in 10 minutes tutorials.)


Current Release of MetCat


MetCat does not have a stable release yet, but you can build the MetCat 0.1-SNAPSHOT version from the developers' trunk. See the Building From Source section for more information.
(Known issues in MetCat 0.1-SNAPSHOT: this is a snapshot build with a limited feature set, and it is not recommended for production use.)

Requirements for MetCat

  • Java >= 1.6 (tested with Oracle Java)
  • A Linux environment is preferred, since MetCat currently has no executable launcher files for Windows. (You can still run it on Windows by executing the jars directly, but this has not been fully tested.)
  • A running Apache Airavata server, version 0.3-incubating or higher. (If you don't have a running server, set one up locally with the help of this Tutorial.)
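
The Java requirement above can be checked from a terminal before building. The following is only a minimal sketch: the function name `java_at_least_1_6` is made up for this example, and the script assumes `java -version` prints a line containing `version "..."`, as Oracle and OpenJDK builds usually do.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetCat): succeeds when a Java version
# string is 1.6 or newer. Handles the legacy "1.x.y_z" scheme as well as
# the newer "9" / "11.0.2" scheme.
java_at_least_1_6() {
    major=$(printf '%s' "$1" | cut -d. -f1)
    minor=$(printf '%s' "$1" | cut -d. -f2)
    # Drop any non-numeric suffix such as "-ea".
    major=${major%%[!0-9]*}
    minor=${minor%%[!0-9]*}
    [ -n "$major" ] || return 1
    if [ "$major" -gt 1 ]; then
        return 0
    fi
    [ "$major" -eq 1 ] && [ "${minor:-0}" -ge 6 ]
}

# Assumption: the version appears in quotes in the output of "java -version".
if command -v java >/dev/null 2>&1; then
    found=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if java_at_least_1_6 "$found"; then
        echo "Java ${found} satisfies the >= 1.6 requirement"
    else
        echo "Java version '${found:-unknown}' does not satisfy the >= 1.6 requirement"
    fi
else
    echo "java was not found on PATH"
fi
```

The check only looks at the version number; it does not tell you whether the installed Java is the Oracle one that MetCat was tested with.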

Building From Source


    • Unzip/untar the source archive, or check out the source from the svn location http://svn.codespot.com/a/apache-extras.org/metcat/trunk/
    svn checkout http://svn.codespot.com/a/apache-extras.org/metcat/trunk/ metcat-read-only
    
    • cd to the project folder and run
    mvn clean install
    • The compressed binary distributions are created in
    <your_project_source_location>/distribution/target
    


    Let's Configure MetCat

    • Extract apache-airavata-metcat-$VERSION-bin.tar.gz or unzip apache-airavata-metcat-$VERSION-bin.zip
    • We will refer to the extracted location as <METCAT_HOME>
    • Configure the Apache Cassandra properties in <METCAT_HOME>/cassandra/apache-cassandra-1.1.1/conf, but only if you need to.
    • Configure the MetCat properties in <METCAT_HOME>/conf
      • msgBrokerMonitor.properties: sets the MetCat server listener port and the Airavata message broker URL used to subscribe to workflow notifications.
      • cassandra.properties: sets the Cassandra server details and the keyspace used for data storage.
    (Do not change these properties if you are using a local Airavata server and a local Cassandra server with the default configuration. They are meant for advanced MetCat configuration.)
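
The extraction step above can be scripted so that <METCAT_HOME> ends up in a shell variable. This is a rough sketch under a few assumptions: `extract_metcat` is a name invented for this example, the destination directory is empty or does not exist yet, and the layout inside the archive (a single top-level folder or not) is not known in advance, so the function handles both cases.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetCat): unpack a .tar.gz distribution
# into an empty destination directory and print the resulting home folder.
extract_metcat() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    # If everything unpacked into one top-level directory, treat that
    # directory as the home folder; otherwise use the destination itself.
    count=$(ls -A "$dest" | wc -l | tr -d ' ')
    first="$dest/$(ls -A "$dest" | head -n 1)"
    if [ "$count" -eq 1 ] && [ -d "$first" ]; then
        printf '%s\n' "$first"
    else
        printf '%s\n' "$dest"
    fi
}

# Example usage (the destination path is illustrative; VERSION is whatever
# version string appears in the name of the archive you built):
#   METCAT_HOME=$(extract_metcat "apache-airavata-metcat-$VERSION-bin.tar.gz" "$HOME/metcat")
#   ls "$METCAT_HOME/conf"
```

With `METCAT_HOME` set this way, the paths mentioned in this post, such as `$METCAT_HOME/conf` and `$METCAT_HOME/bin`, can be used directly in later commands.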

    Now we're ready to start MetCat.

    Starting MetCat

    •   cd to <METCAT_HOME>/bin
    •   Start Cassandra by executing cassandra.sh 
    sh cassandra.sh
    
    •   Start the MetCat server by executing metcat-server.sh 
    sh metcat-server.sh
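
If you want to start both from a single script, the MetCat server should probably wait until Cassandra is accepting connections. The sketch below shows one way to do that with a small retry helper. Several things here are assumptions rather than MetCat facts: `wait_until` is a made-up helper, `nc -z` must be available on your system, cassandra.sh may or may not stay in the foreground, and 9160 is Cassandra's default Thrift port, which applies only if you have not changed the Cassandra configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetCat): run a command once per second
# until it succeeds or the given number of attempts is used up.
wait_until() {
    attempts="$1"
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Example usage (assumptions: cassandra.sh keeps running in the foreground,
# so it is sent to the background here, and Cassandra listens on port 9160):
#   cd "$METCAT_HOME/bin"
#   sh cassandra.sh &
#   if wait_until 30 nc -z localhost 9160; then
#       sh metcat-server.sh
#   else
#       echo "Cassandra did not come up in time" >&2
#   fi
```

Running the two scripts by hand in separate terminals, as described above, works just as well; the helper only saves you from guessing when Cassandra is ready.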
    

    Done... :)


    Monitoring workflow metadata

    Querying results back from the Cassandra database is not supported yet; this feature will be added later. Until then, you can use Cassandra-GUI to inspect the data stored in Cassandra.

    So run a workflow as described in the Airavata in 5 minutes and Airavata in 10 minutes tutorials, then use Cassandra-GUI to see the extracted workflow metadata and how it is stored in the Cassandra cluster.



    MetCat Road Map

    You can find the MetCat road map Here.


    Got an error, need some help, or interested in this project?

    Please do not hesitate to contact the MetCat developers at https://groups.google.com/d/forum/metcat-dev