Amazon EMR Job Polling CLI

Command line interface for submitting and polling the state of Elastic MapReduce jobs.

Why?

By default, the Amazon EMR APIs are asynchronous. They return a Resource ID (ClusterId, StepId), and require additional calls to get the current state.

The AWS CLI does have wait functionality, but it only supports the commands wait cluster-running and wait cluster-terminated.

This project abstracts the EMR Steps API. You can submit Spark or MapReduce steps to a cluster, and poll until they succeed or fail. It can be useful if you have an ETL workflow that involves dependencies between EMR and non-EMR tasks.
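The submit-then-poll pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in emr.job_client: the function name poll_step, its parameters, and the injected get_state callable are all hypothetical. The state names are the step states reported by the EMR Steps API.

```python
import time

# Step states after which an EMR step will not change again.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def poll_step(get_state, interval_seconds=60, max_polls=120, sleep=time.sleep):
    """Call get_state() until it returns a terminal state, then return it.

    get_state is a hypothetical zero-argument callable. With boto3 it could
    wrap a call such as:
        emr.describe_step(ClusterId=cid, StepId=sid)["Step"]["Status"]["State"]
    """
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # Non-terminal (PENDING, RUNNING, CANCEL_PENDING): wait and retry.
        sleep(interval_seconds)
    raise TimeoutError("step did not reach a terminal state after %d polls" % max_polls)
```

Passing the state lookup and the sleep function in as arguments keeps the loop testable without a live cluster. A caller would treat COMPLETED as success and any other returned state as failure.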

Usage

Run tests in a virtual environment with tox
Install dependencies with pip install -r requirements.txt
Run python -m emr.job_client --help for help with the available options
Optional: modify the logging.yml configuration, or set the LOG_CFG environment variable to a custom file.
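One way the LOG_CFG override could work is sketched below. This is an assumption about the behaviour, not the project's actual loader: the setup_logging function and its parameters are hypothetical, and the fallback to logging.basicConfig when no file is found is an illustrative choice. PyYAML is only imported when a config file actually exists.

```python
import logging
import logging.config
import os


def setup_logging(default_path="logging.yml", env_key="LOG_CFG",
                  default_level=logging.INFO):
    """Configure logging from a YAML file and return the path that was chosen.

    The environment variable named by env_key takes precedence over
    default_path. If the chosen file does not exist, fall back to
    logging.basicConfig (an assumption made for this sketch).
    """
    path = os.environ.get(env_key) or default_path
    if os.path.exists(path):
        import yaml  # PyYAML; needed only when a config file is present
        with open(path) as handle:
            logging.config.dictConfig(yaml.safe_load(handle))
    else:
        logging.basicConfig(level=default_level)
    return path
```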

Note: an error will be raised if the named EMR cluster isn't found using your AWS profile or instance role.
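The lookup behind that note might look something like this sketch. The names find_cluster_id and ClusterNotFoundError are hypothetical, and the restriction to active states is an assumption. The input uses the same id/name/state shape that appears in the clusterList field of the example output; with boto3, such a list could be built from the Clusters entries returned by list_clusters.

```python
# Cluster states in which a cluster can still accept steps (assumed filter).
ACTIVE_STATES = {"STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"}


class ClusterNotFoundError(Exception):
    """Raised when no active cluster matches the requested name."""


def find_cluster_id(clusters, name):
    """Return the id of the first active cluster called name.

    clusters is a list of dicts with "id", "name" and "state" keys.
    """
    matches = [
        cluster["id"]
        for cluster in clusters
        if cluster["name"] == name and cluster["state"] in ACTIVE_STATES
    ]
    if not matches:
        raise ClusterNotFoundError("no active EMR cluster named %r" % name)
    return matches[0]
```

Failing fast here means a mistyped cluster name or a wrong AWS profile surfaces before any step is submitted.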

Example

Command:

python -m emr.job_client --env qa \
    --profile qa \
    --cluster-name Sandbox \
    --job-name WordCount \
    --job-runtime Java \
    --job-args "hdfs:///text-input/" \
    --artifact-path s3://us-east-1.elasticmapreduce/samples/wordcount.jar \
    --main-class org.apache.spark.examples.WordCount \
    --no-auto-terminate \
    --poll-cluster

Output:

INFO     environment=qa, cluster=Sandbox, job=WordCount, action=check-runtime, runtime=Java
INFO     environment=qa, cluster=Sandbox, job=WordCount, action=get-clusters, count=1, clusterList=[{"id": "j-6AEOL53QG34E", "name": "Sandbox", "state": "WAITING"}]
INFO     aws emr add-steps --profile qa --cluster-id j-2MTD0ERMUNR2A --steps Type=Spark,Name=WordCount,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--conf,'spark.app.name=WordCount',--class,org.apache.spark.examples.WordCount,--conf,'spark.driver.extraJavaOptions=-DenvironmentKey=qa',--conf,'spark.executor.extraJavaOptions=-DenvironmentKey=qa',s3://us-east-1.elasticmapreduce/samples/wordcount.jar,hdfs:///text-input/]
{
    "StepIds": [
        "s-1YGO9JYO1D6HV"
    ]
}
INFO     environment=qa, cluster=Sandbox, job=WordCount, action=add-job-step
INFO     environment=qa, cluster=Sandbox, job=WordCount, action=get-steps, clusterId=j-6AEOL53QG34E, numSteps=1
INFO     environment=qa, cluster=Sandbox, job=WordCount, action=poll-cluster, stepId=s-1GJOV3B7L7228, state=PENDING, createdTime=2017-12-28T18-20-08, minutesElapsed=1.0
INFO     environment=qa, cluster=Sandbox, job=WordCount, action=poll-cluster, stepId=s-1GJOV3B7L7228, state=RUNNING, createdTime=2017-12-28T18-20-08, minutesElapsed=2.0
INFO     environment=qa, cluster=Sandbox, job=WordCount, action=poll-cluster, stepId=s-1GJOV3B7L7228, state=RUNNING, createdTime=2017-12-28T18-20-08, minutesElapsed=3.0
INFO     environment=qa, cluster=Sandbox, job=WordCount, action=poll-cluster, stepId=s-1GJOV3B7L7228, state=RUNNING, createdTime=2017-12-28T18-20-08, minutesElapsed=4.0
INFO     environment=qa, cluster=Sandbox, job=WordCount, action=poll-cluster, stepId=s-1GJOV3B7L7228, state=COMPLETED, createdTime=2017-12-28T18-20-08, minutesElapsed=5.0
