- Installation
This assumes the DDF package has already been built successfully. First, install the requirements for pyddf:
$ cd <DDF_DIRECTORY>/python
$ pip install -r requirements.txt
$ python setup.py develop
Then we will need to set the $DDF_HOME environment variable:
$ cd <DDF_DIRECTORY>
$ export DDF_HOME=`pwd`
Optional: to make the DDF_HOME variable available in every shell session, add the following line to
~/.bash_profile (on macOS) or ~/.profile (on other Unix systems):
export DDF_HOME=<DDF_DIRECTORY>
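If you want to confirm the variable is visible from Python before importing ddf, a small check like the one below can help. This is only a sketch: `check_ddf_home` is a hypothetical helper, not part of pyddf, and it only verifies that DDF_HOME is set and points to an existing directory.

```python
import os


def check_ddf_home(environ=None):
    """Return the absolute DDF_HOME path, or raise if it is unset or not a directory.

    Hypothetical helper for illustration; pyddf does not ship this function.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("DDF_HOME")
    if not home:
        raise RuntimeError("DDF_HOME is not set; export it before importing ddf")
    # Expand "~" and relative paths such as "../" so the message is unambiguous.
    home = os.path.abspath(os.path.expanduser(home))
    if not os.path.isdir(home):
        raise RuntimeError("DDF_HOME does not point to a directory: " + home)
    return home
```

Calling `check_ddf_home()` with no arguments reads the current process environment; passing a dict is handy for trying it out without touching your shell.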
Now open your Python interpreter:
$ cd <DDF_DIRECTORY>/python
$ ipython
or, if you have not set the DDF_HOME variable yet:
$ cd <DDF_DIRECTORY>/python
$ DDF_HOME=../ ipython
Of course, the plain python interpreter works just as well if you don't have IPython.
Now, inside the Python interpreter, the DDF API is ready to use:
>>> import ddf
>>> from ddf import DDFManager, DDF_HOME
>>> dm = DDFManager('spark')
>>> dm.sql('set hive.metastore.warehouse.dir=/tmp/hive/warehouse', False)
>>> dm.sql('drop table if exists mtcars', False)
>>> dm.sql("CREATE TABLE mtcars (mpg double, cyl int, disp double, hp int, drat double, wt double,"
...        " qesc double, vs int, am int, gear int, carb string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '", False)
>>> dm.sql("LOAD DATA LOCAL INPATH '" + DDF_HOME + "/resources/test/mtcars' INTO TABLE mtcars", False)
>>> ddf = dm.sql2ddf('select * from mtcars', False)
>>> print('Columns: ' + ', '.join(ddf.colnames))
>>> print('Number of columns: {}'.format(ddf.cols))
>>> print('Number of rows: {}'.format(ddf.rows))
>>> ddf.summary()
>>> ddf.head(10)
>>> ddf.aggregate('sum(mpg), min(hp)', 'vs, am')
>>> ddf.five_nums()
>>> ddf.sample(10)
>>> dm.shutdown()
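In a script, as opposed to an interactive session, an exception partway through would skip the final dm.shutdown() call. One possible way to guard against that is a small context manager, sketched below. `ddf_session` is a hypothetical helper, not part of pyddf; it assumes only that the manager object has a shutdown() method, as used in the session above.

```python
from contextlib import contextmanager


@contextmanager
def ddf_session(make_manager):
    """Yield the manager built by make_manager() and always call shutdown() on it.

    Hypothetical helper for illustration; pyddf does not ship this function.
    """
    dm = make_manager()
    try:
        yield dm
    finally:
        # Runs whether the with-block finished normally or raised.
        dm.shutdown()


# Possible usage with pyddf (needs DDF_HOME set and a built DDF package):
#   from ddf import DDFManager
#   with ddf_session(lambda: DDFManager('spark')) as dm:
#       dm.sql('drop table if exists mtcars', False)
```

Passing a factory rather than a ready-made manager keeps the construction inside the helper, so the manager is only created at the point where its shutdown is already guaranteed.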
- Run tests
$ cd <DDF_DIRECTORY>/python
$ DDF_HOME=../ python examples/basics.py
$ DDF_HOME=../ python tests/test_ddf.py