Renga CLI and SDK for Python


A Python library for the Renga collaborative data science platform. It allows the user to create projects, manage datasets, and capture data provenance while performing analysis tasks.

NOTE:
renga-python is the Python library for Renga that provides an SDK and a command-line interface (CLI). It does not start the Renga platform itself; for that, refer to the Renga docs on running the platform.

This is the development branch of `renga-python` and should be considered highly volatile. The documentation for certain components may be out of sync.

Installation

The latest release is available on PyPI and can be installed using pip:

$ pip install renga

The development branch can be installed directly from the Git repository:

$ pip install -e git+https://github.com/SwissDataScienceCenter/renga-python.git@development#egg=renga

For more information about the Renga API see its documentation.

Use the Renga command line

Interaction with the platform can take place via the command-line interface (CLI).

Start by creating a folder where you want to keep your Renga project:

$ mkdir -p ~/temp/my-renga-project
$ cd ~/temp/my-renga-project
$ renga init

Create a dataset and add data to it:

$ renga dataset create my-dataset
$ renga dataset add my-dataset https://raw.githubusercontent.com/SwissDataScienceCenter/renga-python/development/README.rst

Run an analysis:

$ renga run wc < data/my-dataset/README.rst > wc_readme

Trace the data provenance:

$ renga log wc_readme

These are the basics, but there is much more that Renga allows you to do with your data analysis workflows.

For more information about using renga, refer to the Renga command line instructions.

Renga Command Line

The base command for interacting with the Renga platform.

renga (base command)

To list the available commands, either run renga with no parameters or execute renga help:

$ renga help
Usage: renga [OPTIONS] COMMAND [ARGS]...

Check common Renga commands used in various situations.

Options:
  --version            Print version number.
  --config PATH        Location of client config files.
  --config-path        Print application config path.
  --path <path>        Location of a Renga repository.  [default: .]
  --renga-home <path>  Location of Renga directory.  [default: .renga]
  -h, --help           Show this message and exit.

Commands:
  # [...]

Configuration files

Depending on your system, you may find the configuration files used by Renga command line in a different folder. By default, the following rules are used:

MacOS:
~/Library/Application Support/Renga
Unix:
~/.config/renga
Windows:
C:\Users\<user>\AppData\Roaming\Renga

If in doubt where to look for the configuration file, you can display its path by running renga --config-path.
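The default locations listed above can be mirrored in a few lines of Python. This is only a sketch of the rules as stated here: the helper name `default_config_dir` and its parameters are made up for illustration and are not part of renga, and the output of `renga --config-path` remains the authoritative answer.

```python
import ntpath
import posixpath


def default_config_dir(platform, home):
    """Return the documented default config directory (hypothetical helper).

    ``platform`` follows ``sys.platform`` conventions ("darwin", "win32",
    anything else is treated as Unix); ``home`` is the user's home directory.
    """
    if platform == "darwin":
        return posixpath.join(home, "Library", "Application Support", "Renga")
    if platform.startswith("win"):
        return ntpath.join(home, "AppData", "Roaming", "Renga")
    return posixpath.join(home, ".config", "renga")


mac_dir = default_config_dir("darwin", "/Users/alice")
unix_dir = default_config_dir("linux", "/home/alice")
windows_dir = default_config_dir("win32", r"C:\Users\alice")
```

On a real machine the same helper could be called with `sys.platform` and `os.path.expanduser("~")`.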

You can specify a different location via the RENGA_CONFIG environment variable or the --config command line option. If both are specified, then the --config option value is used. For example:

$ renga --config ~/renga/config/ init

instructs Renga to store the configuration files in your ~/renga/config/ directory when running the init command.
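The precedence between the --config option and the RENGA_CONFIG variable can be sketched as follows. The function `resolve_config_dir` and its `default` parameter are assumptions made for this example; only the ordering (option first, then environment variable, then the default) comes from the description above.

```python
import os


def resolve_config_dir(cli_option=None, environ=None, default="~/.config/renga"):
    """Pick the config directory: --config, then RENGA_CONFIG, then default."""
    environ = os.environ if environ is None else environ
    if cli_option:
        # An explicit --config value always wins.
        return cli_option
    # Fall back to the environment variable, then to the platform default.
    return environ.get("RENGA_CONFIG") or default


both = resolve_config_dir("~/renga/config/", {"RENGA_CONFIG": "/etc/renga"})
env_only = resolve_config_dir(None, {"RENGA_CONFIG": "/etc/renga"})
neither = resolve_config_dir(None, {})
```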

renga init

Create an empty Renga project or reinitialize an existing one.

Starting a Renga project

If you have an existing directory which you want to turn into a Renga project, you can type:

$ cd ~/my_project
$ renga init

or:

$ renga init ~/my_project

This creates a new subdirectory named .renga that contains all the necessary files for managing the project configuration.
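Since an initialized project is recognizable by its .renga subdirectory, a script can check for it before calling other commands. The helper below is a hypothetical illustration, not a renga function; the `mkdir` call merely stands in for `renga init`, which also writes files into that directory.

```python
import tempfile
from pathlib import Path


def find_renga_home(project_dir, renga_home=".renga"):
    """Return the project's Renga directory, or None if it is missing."""
    candidate = Path(project_dir) / renga_home
    return candidate if candidate.is_dir() else None


with tempfile.TemporaryDirectory() as project_dir:
    before = find_renga_home(project_dir)
    # Stand-in for `renga init`; the real command also writes files here.
    (Path(project_dir) / ".renga").mkdir()
    after = find_renga_home(project_dir)
    after_name = after.name
```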

renga dataset

Work with datasets in the current repository.

Manipulating datasets

Creating an empty dataset inside a Renga project:

$ renga dataset create my-dataset

Adding data to the dataset:

$ renga dataset add my-dataset http://data-url

This will copy the contents of data-url to the dataset and add it to the dataset metadata.

renga run

Track provenance of data created by executing programs.

renga log

Show provenance of data created by executing programs.

renga workflow

Workflow operations.

Projects

Model objects representing projects.

class renga.models.projects.Project(name=None, created=NOTHING, updated=NOTHING, version='1')[source]

Represent a project.

Type:

"foaf:Project"

Context:

{
  "name": "foaf:name",
  "created": "http://schema.org/dateCreated",
  "foaf": "http://xmlns.com/foaf/0.1/",
  "updated": "http://schema.org/dateUpdated",
  "version": "http://schema.org/schemaVersion"
}

class renga.models.projects.ProjectCollection(client=None)[source]

Represent projects on the server.

Example

Create a project and check its name.

>>> project = client.projects.create(name='test-project')
>>> project.name
'test-project'

Create a representation of objects on the server.

class Meta[source]

Information about individual projects.

model

alias of Project

create(name=None, **kwargs)[source]

Create a new project.

Parameters: name – The name of the project.
Returns: An instance of the newly created project.
Return type: Project
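The Context shown for Project maps attribute names to vocabulary terms, some of them written as compact IRIs such as foaf:name. A simplified sketch of how such a term expands to a full IRI is given below. It covers only string-valued context entries and is not the complete JSON-LD expansion algorithm; `expand_term` is a name invented for this example.

```python
PROJECT_CONTEXT = {
    "name": "foaf:name",
    "created": "http://schema.org/dateCreated",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "updated": "http://schema.org/dateUpdated",
    "version": "http://schema.org/schemaVersion",
}


def expand_term(term, context):
    """Expand a context term or compact IRI (prefix:suffix) to a full IRI."""
    value = context.get(term, term)
    prefix, sep, suffix = value.partition(":")
    # "http://..." also contains a colon, so skip values that are already IRIs.
    if sep and not suffix.startswith("//") and prefix in context:
        return context[prefix] + suffix
    return value


name_iri = expand_term("name", PROJECT_CONTEXT)
created_iri = expand_term("created", PROJECT_CONTEXT)
type_iri = expand_term("foaf:Project", PROJECT_CONTEXT)
```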

Datasets

Manage datasets and their metadata.

Dataset object

class renga.models.datasets.Dataset(name, created=NOTHING, identifier=NOTHING, authors=NOTHING, files=NOTHING)[source]

Represent a dataset.

Type:

"dctypes:Dataset"

Context:

{
  "scoro": "http://purl.org/spar/scoro/",
  "identifier": {
    "@type": "@id",
    "@id": "dctypes:Dataset"
  },
  "added": "http://schema.org/dateCreated",
  "name": "dcterms:name",
  "prov": "http://www.w3.org/ns/prov#",
  "foaf": "http://xmlns.com/foaf/0.1/",
  "email": "dcterms:email",
  "files": {
    "@container": "@index"
  },
  "dcterms": "http://purl.org/dc/terms/",
  "affiliation": "scoro:affiliate",
  "url": "http://schema.org/url",
  "created": "http://schema.org/dateCreated",
  "dctypes": "http://purl.org/dc/dcmitypes/",
  "authors": {
    "@container": "@list"
  }
}

from_jsonld(data)

Instantiate a JSON-LD class from data.
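To illustrate the idea behind from_jsonld, the sketch below builds a plain Python object from a compacted JSON-LD dictionary. `DatasetSketch` is a stand-in written for this example and is not the renga class; it only assumes, following the context above, that authors is an ordered list and files is a map keyed by index.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetSketch:
    """Illustrative stand-in for a dataset model (not the renga class)."""

    name: str
    created: str = None
    authors: list = field(default_factory=list)
    files: dict = field(default_factory=dict)

    @classmethod
    def from_jsonld(cls, data):
        """Build an instance from a compacted JSON-LD dictionary."""
        known = {item.name for item in fields(cls)}
        # Keywords such as "@type" and "@context" are not attributes.
        attrs = {key: value for key, value in data.items() if key in known}
        return cls(**attrs)


dataset = DatasetSketch.from_jsonld({
    "@type": "dctypes:Dataset",
    "name": "my-dataset",
    "files": {"README.rst": {"url": "http://data-url"}},
})
```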

Dataset file

Manage files in the dataset.

class renga.models.datasets.DatasetFile(path, url=None, authors=NOTHING, dataset=None, added=NOTHING)[source]

Represent a file in a dataset.

Type:

"http://schema.org/DigitalDocument"

Context:

{
  "scoro": "http://purl.org/spar/scoro/",
  "authors": {
    "@container": "@list"
  },
  "added": "http://schema.org/dateCreated",
  "email": "dcterms:email",
  "url": "http://schema.org/url",
  "name": "dcterms:name",
  "dcterms": "http://purl.org/dc/terms/",
  "foaf": "http://xmlns.com/foaf/0.1/",
  "affiliation": "scoro:affiliate"
}

from_jsonld(data)

Instantiate a JSON-LD class from data.

Author

class renga.models.datasets.Author(name, email, affiliation=None)[source]

Represent the author of a resource.

Type:

"dcterms:creator"

Context:

{
  "scoro": "http://purl.org/spar/scoro/",
  "dcterms": "http://purl.org/dc/terms/",
  "affiliation": "scoro:affiliate",
  "name": "dcterms:name",
  "foaf": "http://xmlns.com/foaf/0.1/",
  "email": "dcterms:email"
}

check_email(attribute, value)[source]

Check that the email is valid.
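A validator with this (attribute, value) shape might look like the sketch below. The regular expression is a deliberately loose assumption (something@domain.tld) and the function name is hypothetical; the rule renga actually applies may differ.

```python
import re

# Loose pattern: one "@", no whitespace, and a dot in the domain part.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_email_sketch(attribute, value):
    """Raise ValueError unless ``value`` looks like an email address."""
    if not isinstance(value, str) or not _EMAIL_RE.match(value):
        raise ValueError("{0} is not a valid email: {1!r}".format(attribute, value))
    return value


errors = []
for candidate in ["jane@example.com", "not-an-email"]:
    try:
        check_email_sketch("email", candidate)
    except ValueError as error:
        errors.append(str(error))
```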

from_commit(commit)[source]

Create an instance from a Git commit.

from_git(git)[source]

Create an instance from a Git repo.

from_jsonld(data)

Instantiate a JSON-LD class from data.

Tools and Workflows

Manage creation of tools and workflows using the Common Workflow Language (CWL).

Common Workflow Language

Renga uses CWL to represent runnable steps (tools) along with their inputs and outputs. Similarly, tools can be chained together to form CWL-defined workflows.
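As a rough illustration, a command such as `wc < README.rst > wc_readme` could be described by a CWL v1.0 CommandLineTool like the one built below. The identifiers `input_1` and `output_1` are made up for this sketch, and the file Renga actually generates may be structured differently.

```python
import json

# CWL documents are YAML; plain JSON is also accepted since it is a subset.
wc_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "wc",
    # The input file is fed to the command on standard input.
    "stdin": "$(inputs.input_1.path)",
    # Standard output is captured into this file.
    "stdout": "wc_readme",
    "inputs": {"input_1": {"type": "File"}},
    "outputs": {"output_1": {"type": "stdout"}},
}

wc_tool_json = json.dumps(wc_tool, indent=2)
print(wc_tool_json)
```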

Command-line tool

Represent a CommandLineTool from the Common Workflow Language.

class renga.models.cwl.command_line_tool.CommandLineTool(requirements=NOTHING, hints=NOTHING, label=None, doc=None, cwlVersion='v1.0', baseCommand='', arguments=NOTHING, stdin=None, stdout=None, stderr=None, inputs=NOTHING, outputs=NOTHING, successCodes=NOTHING, temporaryFailCodes=NOTHING, permanentFailCodes=NOTHING)[source]

Represent a command line tool.

get_output_id(path)[source]

Return an id of the matching path from default values.

to_argv(job=None)[source]

Generate arguments for system call.

class renga.models.cwl.command_line_tool.CommandLineToolFactory(command_line, directory='.', stdin=None, stderr=None, stdout=None)[source]

Command Line Tool Factory.

file_candidate(candidate)[source]

Return a path instance if it exists in current directory.

generate_tool()[source]

Return an instance of command line tool.

guess_inputs(*arguments)[source]

Yield command input parameters and command line bindings.

guess_outputs(paths)[source]

Yield detected output and changed command input parameter.

guess_type(value)[source]

Return new value and CWL parameter type.
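One plausible way to guess a CWL type from a command-line token is sketched below: integers first, then existing files, then plain strings. This ordering and the returned File structure are assumptions for illustration, not the rules renga implements.

```python
import os
import tempfile


def guess_type_sketch(value, directory="."):
    """Return (new_value, cwl_type) for one command-line token."""
    try:
        return int(value), "int"
    except ValueError:
        pass
    if os.path.isfile(os.path.join(directory, value)):
        # Tokens naming an existing file become CWL File objects.
        return {"class": "File", "path": value}, "File"
    return value, "string"


with tempfile.TemporaryDirectory() as workdir:
    open(os.path.join(workdir, "README.rst"), "w").close()
    guessed = [guess_type_sketch(token, workdir)
               for token in ["-l", "42", "README.rst"]]
```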

split_command_and_args()[source]

Return tuple with command and args from command line arguments.

validate_command_line(attribute, value)[source]

Check the command line structure.

validate_path(attribute, value)[source]

Path must exist.

watch(repo=None, no_output=False)[source]

Watch a Renga repository for changes to detect outputs.

Parameter

Represent parameters from the Common Workflow Language.

class renga.models.cwl.parameter.CommandInputParameter(id, streamable=None, type='string', description=None, default=None, inputBinding=None)[source]

An input parameter for a CommandLineTool.

to_argv()[source]

Format command input parameter as shell argument.

class renga.models.cwl.parameter.CommandLineBinding(position=None, prefix=None, separate=True, itemSeparator=None, valueFrom=None, shellQuote=True)[source]

Define the binding behavior when building the command line.

to_argv(default=None)[source]

Format command line binding as shell argument.
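The effect of prefix, separate and itemSeparator can be sketched as follows, based on how CWL v1.0 describes command line bindings. The function is a hypothetical simplification: it ignores position, valueFrom and shellQuote, and it refuses arrays without an item separator rather than guessing at their layout.

```python
def binding_to_argv(value, prefix=None, separate=True, item_separator=None):
    """Render one bound value as a list of argv tokens (simplified)."""
    if isinstance(value, (list, tuple)):
        if item_separator is None:
            raise ValueError("this sketch only handles arrays with itemSeparator")
        # Arrays are joined into a single token.
        value = item_separator.join(str(item) for item in value)
    rendered = str(value)
    if prefix is None:
        return [rendered]
    if separate:
        # Prefix and value are passed as two separate arguments.
        return [prefix, rendered]
    # Prefix and value are concatenated into one argument.
    return [prefix + rendered]


separate_args = binding_to_argv("out.txt", prefix="-o")
joined_args = binding_to_argv("out.txt", prefix="--output=", separate=False)
array_args = binding_to_argv([1, 2, 3], prefix="-n", item_separator=",")
plain_args = binding_to_argv("README.rst")
```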

class renga.models.cwl.parameter.CommandOutputBinding(glob=None)[source]

Define the binding behavior for outputs.

class renga.models.cwl.parameter.CommandOutputParameter(id, streamable=None, type='string', description=None, format=None, outputBinding=None)[source]

Define an output parameter for a CommandLineTool.

class renga.models.cwl.parameter.InputParameter(id, streamable=None, type='string', description=None, default=None, inputBinding=None)[source]

An input parameter.

class renga.models.cwl.parameter.OutputParameter(id, streamable=None, type='string', description=None, format=None, outputBinding=None)[source]

An output parameter.

class renga.models.cwl.parameter.Parameter(streamable=None)[source]

Define an input or output parameter to a process.

class renga.models.cwl.parameter.WorkflowOutputParameter(id, streamable=None, type='string', description=None, format=None, outputBinding=None, outputSource=None)[source]

Define an output parameter for a Workflow.

renga.models.cwl.parameter.convert_default(value)[source]

Convert a default value.

Process

Represent a Process from the Common Workflow Language.

class renga.models.cwl.process.Process[source]

Represent a process.

Types

Represent the Common Workflow Language types.

class renga.models.cwl.types.File(path)[source]

Represent a file.

Workflow

Represent workflows from the Common Workflow Language.

class renga.models.cwl.workflow.Workflow(inputs=NOTHING, requirements=NOTHING, hints=NOTHING, label=None, doc=None, cwlVersion='v1.0', outputs=NOTHING, steps=NOTHING)[source]

Define a workflow representation.

add_step(**kwargs)[source]

Add a workflow step.

get_output_id(path)[source]

Return an id of the matching path from default values.

class renga.models.cwl.workflow.WorkflowStep(run, id=NOTHING, in_=None, out=None)[source]

Define an executable element of a workflow.
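To show how steps chain together, the sketch below writes a two-step CWL v1.0 workflow as a Python dictionary and checks that every source refers to a workflow input or to a step output. The step names, the tool files `wc.cwl` and `sort.cwl`, and the helper `unresolved_sources` are all invented for this example.

```python
workflow = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "inputs": {"readme": "File"},
    "outputs": {
        "sorted_counts": {"type": "File", "outputSource": "sort_counts/result"},
    },
    "steps": {
        "count_words": {
            "run": "wc.cwl",
            "in": {"input_1": "readme"},
            "out": ["output_1"],
        },
        "sort_counts": {
            "run": "sort.cwl",
            # Consumes the output of the previous step.
            "in": {"input_1": "count_words/output_1"},
            "out": ["result"],
        },
    },
}


def unresolved_sources(wf):
    """Return (owner, source) pairs whose source is not defined anywhere."""
    available = set(wf["inputs"])
    for step_id, step in wf["steps"].items():
        available.update("{0}/{1}".format(step_id, out) for out in step["out"])
    missing = []
    for step_id, step in wf["steps"].items():
        for source in step["in"].values():
            if source not in available:
                missing.append((step_id, source))
    for output_id, output in wf["outputs"].items():
        if output["outputSource"] not in available:
            missing.append((output_id, output["outputSource"]))
    return missing


problems = unresolved_sources(workflow)
```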

Client

Creating a client

There are several ways to instantiate a client used for communication with the Renga platform.

  1. The easiest way is by calling the function from_env() when running in an environment created by the Renga platform itself.
  2. The client can be created from a local configuration file by calling from_config().
  3. Lastly, it can also be configured manually by instantiating a RengaClient class.

renga.client.from_env()

Return a client configured from environment variables.

RENGA_ENDPOINT

The URL to the Renga platform.

RENGA_ACCESS_TOKEN

An access token obtained from the Renga authentication service.

Example:

>>> import renga
>>> client = renga.from_env()

renga.cli._client.from_config()[source]

Create a new client for endpoint in the config.

Use the renga command-line interface to manage multiple configurations.
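The environment-based setup used by from_env() can be pictured with the sketch below, which only reads the two documented variables. The function name, the returned dictionary and the error raised for a missing endpoint are assumptions for illustration; they do not describe how renga itself behaves.

```python
import os


def settings_from_env(environ=None):
    """Collect client settings from RENGA_* environment variables."""
    environ = os.environ if environ is None else environ
    endpoint = environ.get("RENGA_ENDPOINT")
    if not endpoint:
        # Assumption: without an endpoint there is nothing to connect to.
        raise RuntimeError("RENGA_ENDPOINT is not set")
    return {
        "endpoint": endpoint,
        "access_token": environ.get("RENGA_ACCESS_TOKEN"),
    }


settings = settings_from_env({
    "RENGA_ENDPOINT": "http://localhost",
    "RENGA_ACCESS_TOKEN": "example-token",
})
```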

Client reference

class renga.client.RengaClient[source]

A client for communicating with a Renga platform.

Example:

>>> import renga
>>> client = renga.RengaClient('http://localhost')

Create a Renga API client.

Low-level API

This API is built on top of REST API endpoints exposed by Renga services.

Warning

Renga services are currently in beta preview and are subject to change in the foreseeable future.

HTTP clients for the Renga platform.

class renga.api.APIClient(endpoint=None, **kwargs)[source]

A low-level client for communicating with a Renga Platform API.

Example:

>>> import renga
>>> client = renga.APIClient('http://localhost')

Create a remote API client.

delete(*args, **kwargs)[source]

Perform the DELETE request and check its status code.

endpoint

Return endpoint value.

get(*args, **kwargs)[source]

Perform the GET request and check its status code.

post(*args, **kwargs)[source]

Perform the POST request and check its status code.

put(*args, **kwargs)[source]

Perform the PUT request and check its status code.
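The "perform the request and check its status code" pattern shared by these methods can be sketched without any network access. Everything below is hypothetical: `MiniAPIClient`, the injected `transport` callable, the accepted status codes and the `/api/projects` path are stand-ins chosen for the example, not the behaviour of renga.api.APIClient.

```python
class UnexpectedStatus(Exception):
    """Raised when a response carries a status code that was not expected."""


class MiniAPIClient:
    """Tiny client that validates status codes (illustrative only)."""

    def __init__(self, endpoint, transport):
        self.endpoint = endpoint.rstrip("/")
        # transport(method, url, **kwargs) must return (status, body).
        self._transport = transport

    def _request(self, method, path, expected=(200,), **kwargs):
        status, body = self._transport(method, self.endpoint + path, **kwargs)
        if status not in expected:
            raise UnexpectedStatus("{0} {1} returned {2}".format(method, path, status))
        return body

    def get(self, path, **kwargs):
        return self._request("GET", path, **kwargs)

    def post(self, path, **kwargs):
        kwargs.setdefault("expected", (200, 201))
        return self._request("POST", path, **kwargs)

    def put(self, path, **kwargs):
        return self._request("PUT", path, **kwargs)

    def delete(self, path, **kwargs):
        kwargs.setdefault("expected", (200, 204))
        return self._request("DELETE", path, **kwargs)


def fake_transport(method, url, **kwargs):
    """Pretend server that knows a single made-up route."""
    routes = {("GET", "http://localhost/api/projects"): (200, [])}
    return routes.get((method, url), (404, None))


client = MiniAPIClient("http://localhost/", fake_transport)
projects = client.get("/api/projects")
try:
    client.get("/missing")
    missing_failed = False
except UnexpectedStatus:
    missing_failed = True
```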

class renga.api.LocalClient(renga_home='.renga', datadir='data', path=NOTHING)[source]

A low-level client for communicating with a local Renga repository.

Example:

>>> import renga
>>> client = renga.LocalClient('.')

Projects

Client for handling projects.

class renga.api.projects.ProjectsApiMixin[source]

Client for handling projects.

create_project(project)[source]

Create a new project and register it on the knowledge graph.

get_project(project_id)[source]

Get existing project.

list_projects()[source]

Return an iterator for all projects.

Storage

Client for storage service.

class renga.api.storage.BucketsApiMixin[source]

Client for handling storage buckets.

create_bucket(**kwargs)[source]

Create a new storage bucket.

storage_bucket_metadata_replace(resource_id, data)[source]

Replace resource metadata.

storage_info()[source]

Return information about available bucket backends.

class renga.api.storage.FilesApiMixin[source]

Client for handling file objects in a bucket.

create_file(**kwargs)[source]

Create a new file object.

storage_authorize(resource_id=None, request_type=None)[source]

Request an authorization token for performing a file handle request.

storage_copy_file(resource_id=None, file_name=None, **kwargs)[source]

Request a file copy.

storage_file_metadata_replace(resource_id, data)[source]

Replace resource metadata.

storage_io_read(*args, **kwargs)[source]

Read data from the file.

Note

Use only with an access_token issued by the storage service.

storage_io_write(data)[source]

Write data to the file.

Note

Use only with an access_token issued by the storage service.

Deployer

Client for deployer service.

class renga.api.deployer.ContextsApiMixin[source]

Manage deployer contexts.

create_context(spec)[source]

Create a new deployer context.

create_execution(context_id, **kwargs)[source]

Create an execution of a context on a given engine.

execution_logs(context_id, execution_id)[source]

Retrieve logs of an execution.

execution_ports(context_id, execution_id)[source]

Retrieve port mappings for an execution.

get_context(context_id)[source]

Retrieve a context.

get_execution(context_id, execution_id)[source]

Retrieve an execution.

list_contexts()[source]

List all known contexts.

list_executions(context_id)[source]

List all executions of a given context.

stop_execution(context_id, execution_id)[source]

Stop a running execution.

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

Types of Contributions

Report Bugs

Report bugs at https://github.com/SwissDataScienceCenter/renga-python/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.

Write Documentation

Renga could always use more documentation, whether as part of the official Renga docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/SwissDataScienceCenter/renga-python/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up renga for local development.

  1. Fork the SwissDataScienceCenter/renga-python repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/renga-python.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv renga
    $ cd renga-python/
    $ pip install -e .[all]
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass tests:

    $ ./run-tests.sh
    

    The tests will provide you with test coverage and also check PEP8 (code style), PEP257 (documentation), flake8 as well as build the Sphinx documentation and run doctests.

    Before you submit a pull request, please reformat the code using yapf.

    $ yapf -irp .
    

    You may want to set up yapf styling as a pre-commit hook to do this automatically:

    $ curl https://raw.githubusercontent.com/google/yapf/master/plugins/pre-commit.sh -o .git/hooks/pre-commit
    $ chmod u+x .git/hooks/pre-commit
    
  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -s \
        -m "component: title without verbs" \
        -m "* NEW Adds your new feature." \
        -m "* FIX Fixes an existing issue." \
        -m "* BETTER Improves an existing feature." \
        -m "* Changes something that should not be visible in release notes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.
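The commit message convention from step 6 can be checked locally before pushing. The script below is a rough sketch: the regular expressions encode one reading of the convention (a "component: title" first line followed by "* " bullets, optionally tagged NEW, FIX or BETTER), and `check_commit_message` is a hypothetical helper, not part of the project's tooling.

```python
import re

TITLE_RE = re.compile(r"^[\w.-]+: \S.*$")
BULLET_RE = re.compile(r"^\* (?:(?:NEW|FIX|BETTER) )?\S.*$")


def check_commit_message(message):
    """Return a list of problems found in a commit message."""
    lines = [line for line in message.splitlines() if line.strip()]
    problems = []
    if not lines or not TITLE_RE.match(lines[0]):
        problems.append("title must look like 'component: title'")
    for line in lines[1:]:
        if line.startswith("Signed-off-by:"):
            # Trailer added by `git commit -s`.
            continue
        if not BULLET_RE.match(line):
            problems.append("unexpected line: " + line)
    return problems


good = check_commit_message(
    "cli: dataset import\n\n"
    "* NEW Adds your new feature.\n\n"
    "* Changes something that should not be visible in release notes.\n\n"
    "Signed-off-by: Jane Doe <jane@example.com>\n"
)
bad = check_commit_message("fixed stuff\n\nsome free text\n")
```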

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests and must not decrease test coverage.
  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring.
  3. The pull request should work for Python 2.7, 3.5, and 3.6. Check https://travis-ci.org/SwissDataScienceCenter/renga-python/pull_requests and make sure that the tests pass for all supported Python versions.

Changes

Version 0.1.0 (released TBD)

  • Initial public release.

License

Copyright 2017-2018 - Swiss Data Science Center (SDSC)
A partnership between École Polytechnique Fédérale de Lausanne (EPFL) and
Eidgenössische Technische Hochschule Zürich (ETHZ).

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Authors

Python SDK and CLI for the Renga platform.