Renga CLI and SDK for Python¶
A Python library for the Renga collaborative data science platform. It allows the user to create projects, manage datasets, and capture data provenance while performing analysis tasks.
NOTE: `renga-python` is the Python library for Renga that provides an SDK and a command-line interface (CLI). It does not start the Renga platform itself; for that, refer to the Renga docs on running the platform.
This is the development branch of `renga-python` and should be considered highly volatile. The documentation for certain components may be out of sync.
Installation¶
The latest release is available on PyPI and can be installed using `pip`:
$ pip install renga
The development branch can be installed directly from the Git repository:
$ pip install -e git+https://github.com/SwissDataScienceCenter/renga-python.git@development#egg=renga
For more information about the Renga API see its documentation.
Use the Renga command line¶
Interaction with the platform can take place via the command-line interface (CLI).
Start by creating a folder where you want to keep your Renga project:
$ mkdir -p ~/temp/my-renga-project
$ cd ~/temp/my-renga-project
$ renga init
Create a dataset and add data to it:
$ renga dataset create my-dataset
$ renga dataset add my-dataset https://raw.githubusercontent.com/SwissDataScienceCenter/renga-python/development/README.rst
Run an analysis:
$ renga run wc < data/my-dataset/README.rst > wc_readme
Trace the data provenance:
$ renga log wc_readme
These are the basics, but there is much more that Renga allows you to do with your data analysis workflows.
For more information about using renga, refer to the Renga command line instructions.
Renga Command Line¶
The base command for interacting with the Renga platform.
renga (base command)¶
To list the available commands, either run `renga` with no parameters or execute `renga help`:
$ renga help
Usage: renga [OPTIONS] COMMAND [ARGS]...
Check common Renga commands used in various situations.
Options:
--version Print version number.
--config PATH Location of client config files.
--config-path Print application config path.
--path <path> Location of a Renga repository. [default: .]
--renga-home <path> Location of Renga directory. [default: .renga]
-h, --help Show this message and exit.
Commands:
# [...]
Configuration files¶
Depending on your system, you may find the configuration files used by Renga command line in a different folder. By default, the following rules are used:
- macOS: `~/Library/Application Support/Renga`
- Unix: `~/.config/renga`
- Windows: `C:\Users\<user>\AppData\Roaming\Renga`
If you are unsure where to look for the configuration file, you can display its path by running `renga --config-path`.
You can specify a different location via the `RENGA_CONFIG` environment variable or the `--config` command line option. If both are specified, the `--config` option value is used. For example:
$ renga --config ~/renga/config/ init
instructs Renga to store the configuration files in your `~/renga/config/` directory when running the `init` command.
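The lookup rules above can be sketched in a few lines of Python. This is only an illustration of the documented precedence (the `--config` option first, then `RENGA_CONFIG`, then the platform default); the function names `default_config_dir` and `resolve_config_dir` are hypothetical and are not part of the Renga library.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def default_config_dir(platform, home):
    """Return the default directory listed above for a sys.platform-style name."""
    if platform == "darwin":
        return str(PurePosixPath(home) / "Library" / "Application Support" / "Renga")
    if platform == "win32":
        return str(PureWindowsPath(home) / "AppData" / "Roaming" / "Renga")
    # Everything else is treated as Unix in this sketch.
    return str(PurePosixPath(home) / ".config" / "renga")


def resolve_config_dir(cli_option=None, environ=None, platform="linux", home="/home/user"):
    """Apply the documented precedence: --config, then RENGA_CONFIG, then default."""
    environ = os.environ if environ is None else environ
    if cli_option:
        return cli_option
    if environ.get("RENGA_CONFIG"):
        return environ["RENGA_CONFIG"]
    return default_config_dir(platform, home)


# Both are set, so the command line option wins.
print(resolve_config_dir("~/renga/config/", {"RENGA_CONFIG": "/tmp/cfg"}))
```

When in doubt about the real location on your system, `renga --config-path` remains the authoritative answer.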
renga init¶
Create an empty Renga project or reinitialize an existing one.
Starting a Renga project¶
If you have an existing directory which you want to turn into a Renga project, you can type:
$ cd ~/my_project
$ renga init
or:
$ renga init ~/my_project
This creates a new subdirectory named `.renga` that contains all the necessary files for managing the project configuration.
renga datasets¶
Work with datasets in the current repository.
Manipulating datasets¶
Creating an empty dataset inside a Renga project:
$ renga dataset create my-dataset
Adding data to the dataset:
$ renga dataset add my-dataset http://data-url
This will copy the contents of `data-url` to the dataset and add it to the dataset metadata.
renga run¶
Track provenance of data created by executing programs.
renga log¶
Show provenance of data created by executing programs.
renga workflow¶
Workflow operations.
Projects¶
Model objects representing projects.
class renga.models.projects.Project(name=None, created=NOTHING, updated=NOTHING, version='1')
Represent a project.
Type: "foaf:Project"
Context:
{ "name": "foaf:name", "created": "http://schema.org/dateCreated", "foaf": "http://xmlns.com/foaf/0.1/", "updated": "http://schema.org/dateUpdated", "version": "http://schema.org/schemaVersion" }
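To make the role of the type and context concrete, here is a small sketch that builds a JSON-LD document for a project with plain dictionaries. It does not use the `Project` class, and the exact serialization produced by the library is not shown here; the helpers `project_as_jsonld` and `expand_term` and the sample values are hypothetical.

```python
import json

# The context listed above for Project.
PROJECT_CONTEXT = {
    "name": "foaf:name",
    "created": "http://schema.org/dateCreated",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "updated": "http://schema.org/dateUpdated",
    "version": "http://schema.org/schemaVersion",
}


def project_as_jsonld(name, created, updated, version="1"):
    """Build a JSON-LD document by hand (an illustration, not the library output)."""
    return {
        "@context": PROJECT_CONTEXT,
        "@type": "foaf:Project",
        "name": name,
        "created": created,
        "updated": updated,
        "version": version,
    }


def expand_term(term, context):
    """Resolve a short key to a full IRI, handling only simple prefix:suffix values."""
    value = context[term]
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value


doc = project_as_jsonld("my-renga-project", "2018-01-01T00:00:00Z", "2018-01-02T00:00:00Z")
print(json.dumps(doc, indent=2))
print(expand_term("name", PROJECT_CONTEXT))  # http://xmlns.com/foaf/0.1/name
```

The context is what lets a short key such as `name` be read as the FOAF `name` property by any JSON-LD aware consumer.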
Datasets¶
Manage datasets and their metadata.
Dataset object¶
class renga.models.datasets.Dataset(name, created=NOTHING, identifier=NOTHING, authors=NOTHING, files=NOTHING)
Represent a dataset.
Type: "dctypes:Dataset"
Context:
{ "scoro": "http://purl.org/spar/scoro/", "identifier": { "@type": "@id", "@id": "dctypes:Dataset" }, "added": "http://schema.org/dateCreated", "name": "dcterms:name", "prov": "http://www.w3.org/ns/prov#", "foaf": "http://xmlns.com/foaf/0.1/", "email": "dcterms:email", "files": { "@container": "@index" }, "dcterms": "http://purl.org/dc/terms/", "affiliation": "scoro:affiliate", "url": "http://schema.org/url", "created": "http://schema.org/dateCreated", "dctypes": "http://purl.org/dc/dcmitypes/", "authors": { "@container": "@list" } }
from_jsonld(data)
Instantiate a JSON-LD class from data.
Dataset file¶
Manage files in the dataset.
class renga.models.datasets.DatasetFile(path, url=None, authors=NOTHING, dataset=None, added=NOTHING)
Represent a file in a dataset.
Type: "http://schema.org/DigitalDocument"
Context:
{ "scoro": "http://purl.org/spar/scoro/", "authors": { "@container": "@list" }, "added": "http://schema.org/dateCreated", "email": "dcterms:email", "url": "http://schema.org/url", "name": "dcterms:name", "dcterms": "http://purl.org/dc/terms/", "foaf": "http://xmlns.com/foaf/0.1/", "affiliation": "scoro:affiliate" }
from_jsonld(data)
Instantiate a JSON-LD class from data.
Author¶
class renga.models.datasets.Author(name, email, affiliation=None)
Represent the author of a resource.
Type: "dcterms:creator"
Context:
{ "scoro": "http://purl.org/spar/scoro/", "dcterms": "http://purl.org/dc/terms/", "affiliation": "scoro:affiliate", "name": "dcterms:name", "foaf": "http://xmlns.com/foaf/0.1/", "email": "dcterms:email" }
from_jsonld(data)
Instantiate a JSON-LD class from data.
Tools and Workflows¶
Manage creation of tools and workflows using the Common Workflow Language (CWL).
Common Workflow Language¶
Renga uses CWL to represent runnable steps (tools) along with their inputs and outputs. Similarly, tools can be chained together to form CWL-defined workflows.
Command-line tool¶
Represent a `CommandLineTool` from the Common Workflow Language.
class renga.models.cwl.command_line_tool.CommandLineTool(requirements=NOTHING, hints=NOTHING, label=None, doc=None, cwlVersion='v1.0', baseCommand='', arguments=NOTHING, stdin=None, stdout=None, stderr=None, inputs=NOTHING, outputs=NOTHING, successCodes=NOTHING, temporaryFailCodes=NOTHING, permanentFailCodes=NOTHING)
Represent a command line tool.
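As an illustration of what such a tool description contains, the sketch below writes a CWL v1.0 `CommandLineTool` for a `wc` invocation that reads a file on standard input and captures standard output. It uses plain dictionaries rather than the class above, and it is not necessarily the document Renga generates for `renga run`; the names `wc_tool`, `wc_job`, `input_1` and `output_1` are hypothetical.

```python
import json


def wc_tool(stdout_name="wc_readme"):
    """Describe `wc < input > stdout_name` as a CWL v1.0 CommandLineTool."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": "wc",
        # Feed the input file to the process on standard input.
        "stdin": "$(inputs.input_1.path)",
        # Capture standard output into a named file.
        "stdout": stdout_name,
        "inputs": {"input_1": {"type": "File"}},
        "outputs": {"output_1": {"type": "stdout"}},
    }


def wc_job(path):
    """Build the job object that binds a concrete file to input_1."""
    return {"input_1": {"class": "File", "path": path}}


# JSON is valid YAML, so this output can be saved as a .cwl file.
print(json.dumps(wc_tool(), indent=2))
print(json.dumps(wc_job("data/my-dataset/README.rst"), indent=2))
```

The tool description stays independent of any particular file; the job object supplies the concrete input, which is what makes the step repeatable on other data.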
Parameter¶
Represent parameters from the Common Workflow Language.
class renga.models.cwl.parameter.CommandInputParameter(id, streamable=None, type='string', description=None, default=None, inputBinding=None)
An input parameter for a CommandLineTool.
class renga.models.cwl.parameter.CommandLineBinding(position=None, prefix=None, separate=True, itemSeparator=None, valueFrom=None, shellQuote=True)
Define the binding behavior when building the command line.
class renga.models.cwl.parameter.CommandOutputBinding(glob=None)
Define the binding behavior for outputs.
class renga.models.cwl.parameter.CommandOutputParameter(id, streamable=None, type='string', description=None, format=None, outputBinding=None)
Define an output parameter for a CommandLineTool.
class renga.models.cwl.parameter.InputParameter(id, streamable=None, type='string', description=None, default=None, inputBinding=None)
An input parameter.
class renga.models.cwl.parameter.OutputParameter(id, streamable=None, type='string', description=None, format=None, outputBinding=None)
An output parameter.
class renga.models.cwl.parameter.Parameter(streamable=None)
Define an input or output parameter to a process.
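The `position`, `prefix` and `separate` fields of `CommandLineBinding` follow the CWL rules for assembling a command line. The sketch below is a simplified, standalone illustration of those rules for scalar values only; it ignores arrays, booleans, `itemSeparator`, `valueFrom` and `shellQuote`, and the helper `build_arguments` is hypothetical rather than part of the library.

```python
def build_arguments(base_command, bound_values):
    """Assemble argv from (value, binding) pairs using simplified CWL rules.

    Each binding is a dict that may contain "position", "prefix" and "separate".
    """
    argv = list(base_command)
    # Lower positions come first; a missing position counts as 0.
    # sorted() is stable, so equal positions keep their given order.
    ordered = sorted(bound_values, key=lambda item: item[1].get("position") or 0)
    for value, binding in ordered:
        prefix = binding.get("prefix")
        separate = binding.get("separate", True)
        if prefix is None:
            argv.append(str(value))
        elif separate:
            # Prefix and value become two separate arguments.
            argv.extend([prefix, str(value)])
        else:
            # Prefix and value are joined into a single argument.
            argv.append(prefix + str(value))
    return argv


args = build_arguments(
    ["tool"],
    [
        ("README.rst", {"position": 2}),
        (10, {"position": 1, "prefix": "-n"}),
        ("utf-8", {"position": 1, "prefix": "--encoding=", "separate": False}),
    ],
)
print(args)  # ['tool', '-n', '10', '--encoding=utf-8', 'README.rst']
```

The full CWL specification uses a richer sort key and additional value types, so treat this as a reading aid for the field names rather than a replacement for a CWL runner.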
Process¶
Represent a `Process` from the Common Workflow Language.
Types¶
Represent the Common Workflow Language types.
Client¶
Creating a client¶
There are several ways to instantiate a client used for communication with the Renga platform.
- The easiest way is by calling the function `from_env()` when running in an environment created by the Renga platform itself.
- The client can be created from a local configuration file by calling `from_config()`.
- Lastly, it can also be configured manually by instantiating a `RengaClient` class.
Low-level API¶
This API is built on top of REST API endpoints exposed by Renga services.
Warning
Renga services are currently in beta preview status and are subject to change in the foreseeable future.
HTTP clients for Renga platform.
class renga.api.APIClient(endpoint=None, **kwargs)
A low-level client for communicating with a Renga Platform API.
Example:
>>> import renga
>>> client = renga.APIClient('http://localhost')
Create a remote API client.
endpoint
Return endpoint value.
class
renga.api.
LocalClient
(renga_home='.renga', datadir='data', path=NOTHING)[source]¶ A low-level client for communicating with a local Renga repository.
Example:
>>> import renga >>> client = renga.LocalClient('.')
Projects¶
Client for handling projects.
Storage¶
Client for storage service.
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/SwissDataScienceCenter/renga-python/issues.
If you are reporting a bug, please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.
Implement Features¶
Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.
Write Documentation¶
Renga could always use more documentation, whether as part of the official Renga docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/SwissDataScienceCenter/renga-python/issues.
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!¶
Ready to contribute? Here’s how to set up renga for local development.
Fork the SwissDataScienceCenter/renga-python repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/renga.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv renga
$ cd renga/
$ pip install -e .[all]
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass tests:
$ ./run-tests.sh
The tests will provide you with test coverage and also check PEP8 (code style), PEP257 (documentation), flake8 as well as build the Sphinx documentation and run doctests.
Before you submit a pull request, please reformat the code using yapf.
$ yapf -irp .
You may want to set up yapf styling as a pre-commit hook to do this automatically:
$ curl https://raw.githubusercontent.com/google/yapf/master/plugins/pre-commit.sh -o .git/hooks/pre-commit
$ chmod u+x .git/hooks/pre-commit
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -s \
    -m "component: title without verbs" \
    -m "* NEW Adds your new feature." \
    -m "* FIX Fixes an existing issue." \
    -m "* BETTER Improves an existing feature." \
    -m "* Changes something that should not be visible in release notes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
- The pull request should include tests and must not decrease test coverage.
- If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring.
- The pull request should work for Python 2.7, 3.5, and 3.6. Check https://travis-ci.org/SwissDataScienceCenter/renga-python/pull_requests and make sure that the tests pass for all supported Python versions.
License¶
Copyright 2017-2018 - Swiss Data Science Center (SDSC)
A partnership between École Polytechnique Fédérale de Lausanne (EPFL) and
Eidgenössische Technische Hochschule Zürich (ETHZ).
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Authors¶
Python SDK and CLI for the Renga platform.
- Swiss Data Science Center <contact@datascience.ch>