ezData - A simplistic column based data framework.

tested with github workflow:

compatible with many existing dataframes: e.g. pandas

requirements: numpy, and matplotlib (for plotting only). Conversion to other formats requires the appropriate library.

.. notes::

* requirements: numpy, matplotlib
* conversion to other formats requires the appropriate library.
* some additional wrappers around bokeh, plotly and holoviews are in development.

:author: Morgan Fouesneau

Documentation and API: link

Example notebook in the examples directory

As this package is rapidly evolving, some features are only presented in the examples.

Installation

pip install git+https://github.com/mfouesneau/ezdata (--upgrade)

Why?

I always found myself writing snippets around numpy, matplotlib, pandas and other file readers. These are often the same things: read file foo and plot a against b where something takes some values. It always gets very complex when you want to make something non-standard, for instance: for each of the 10 classes given according to this selection, make a scatter plot with these specific markers, color coded by another column.

I was basically tired of all the packages doing fancy things and not allowing basics or requiring a lot of dependencies.

This package initially focused on easily manipulating column oriented data. In particular this package allows easy conversions to many common dataframe containers: dict, numpy.recarray, pandas.DataFrame, dask.DataFrame, astropy.Table, xarray.Dataset, vaex.DataSetArrays.

I extended this package to allow myself to plot these data in a very simple manner. Of course this did not cover all needs, and thus I added interfaces to holoviews/datashader.

What is this package?

I wrote this package on top of the most basic functions, in particular the methods of dict. It builds somewhat advanced access to column oriented data through 4 main classes: 2 handle data and the other 2 provide plotting shortcuts. This may not fit all needs, nor access to large data.

* dictdataframe: an advanced dictionary object. A simple-ish dictionary-like structure that can be used as an array on non-constant multi-dimensional column data. The :class:`DataFrame` container allows easier manipulation of the data, but is basically a wrapper of many existing functions around a dictionary object.
* simpletable: a simplified version of ezTables. The :class:`SimpleTable` allows easier manipulation of the data, but is basically a wrapper of many existing functions around a numpy.recarray object. It implements reading and writing ASCII, FITS and HDF5 files. The :class:`AstroTable`, built on top of the latter class, adds astronomy-related functions such as conesearch.
* plotter: this package implements :class:`Plotter`, a simple container for dictionary-like structures (e.g. :class:`dict`, :class:`np.recarray`, :class:`pandas.DataFrame`, :class:`SimpleTable`). It allows the user to plot directly using keys of the data, and also provides rapid group plotting routines (groupby and facets). Note that it also allows expressions instead of keys. This interface should basically work on any dictionary-like structure.
* DSPlotter: extends :class:`Plotter` to use datashader for some plots; it also allows expressions instead of keys. This interface requires holoviews and datashader.

Both data structures implement a common base for line and column access in the same transparent manner. For instance, these objects implement array slicing, shape and dtypes, on top of which they provide functions such as sortby, groupby, where, join, and the evaluation of expressions as keys (see examples below). Both also have direct access to a Plotter attribute. DSPlotter is experimental and requires more than the basic libraries, so it needs to be called separately.
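
To illustrate what "expressions as keys" and `where` mean in practice, here is a minimal sketch on a plain dictionary of numpy arrays. The helpers `evaluate` and `where` below are hypothetical names written for this illustration; they are not ezdata's implementation, only one possible way to get this behavior.

```python
import numpy as np


# Hypothetical helper, not part of ezdata: resolve a key, or evaluate an
# expression against a dictionary of equal-length numpy columns.
def evaluate(columns, expression):
    """Return columns[expression] if it is a key, else evaluate it."""
    if expression in columns:
        return columns[expression]
    # Expose only the columns and numpy to the expression.
    namespace = {"np": np, **columns}
    return eval(expression, {"__builtins__": {}}, namespace)


# Hypothetical helper, not part of ezdata: row selection from a condition.
def where(columns, condition):
    """Return a new column dictionary keeping rows where condition holds."""
    mask = np.asarray(evaluate(columns, condition), dtype=bool)
    return {name: values[mask] for name, values in columns.items()}


data = {
    "logT": np.array([3.5, 3.7, 3.9, 4.1]),
    "J": np.array([1.0, 2.5, 3.0, 4.0]),
}
selected = where(data, "(J > 2) & (10 ** logT > 5000)")
print(selected["logT"])
```

Using `eval` keeps the sketch short, but it should only be fed trusted expressions.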

The data classes allow easy conversions to many common dataframe containers: numpy.recarray, pandas.DataFrame, dask.DataFrame, astropy.Table, xarray.Dataset, vaex.DataSetArrays.
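
As a rough idea of what such conversions involve, the sketch below moves column data between a plain dictionary and a numpy.recarray using numpy only. It does not use ezdata itself, and the column names and values are made up for the example.

```python
import numpy as np

# Column-oriented data held as a plain dictionary of equal-length arrays
# (illustrative values only).
columns = {
    "name": np.array(["a", "b", "c"]),
    "mag": np.array([12.1, 13.4, 11.8]),
}

# dict -> numpy.recarray: one field per column.
rec = np.rec.fromarrays(list(columns.values()), names=list(columns.keys()))

# numpy.recarray -> dict: one entry per field.
back = {name: rec[name] for name in rec.dtype.names}

print(rec.dtype.names)
print(rec.mag.max())
```

With pandas installed, `pandas.DataFrame(columns)` accepts the same dictionary directly.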

Examples

Some data manipulation basics

    >>> t = SimpleTable('path/mytable.csv')
    # get a subset of columns only
    >>> s = t.get('M_* logTe logLo U B V I J K')
    # set some aliases
    >>> t.set_alias('logT', 'logTe')
    >>> t.set_alias('logL', 'logLo')
    # make a query on one or multiple column
    >>> q = s.selectWhere('logT logL', '(J > 2) & (10 ** logT > 5000)')
    # note that `q` is also a table object
    # makes a simple plot (see :module:`plotter`)
    >>> q.Plotter.plot('logT', 'logL', ',')
    # export the initial subtable to a new file
    >>> s.write('newtable.fits')
    # or 
    >>> s.write('newtable.hd5')

Convert to other dataframe structures

    >>> t = SimpleTable('path/mytable.csv')
    >>> t.to_pandas()
    >>> t.to_dask(npartitions=5)
    >>> d = DictDataFrame(t)

Make a single plot of 'RA', 'DEC' on which each region 'BRK' is represented by a different color (colormap or other) and different marker.

    >>> p = t.Plotter.groupby('BRK', markers='<^>v.oxs', colors='parula')
    >>> p.plot('CRA', 'CDEC', 'o')
    >>> import pylab as plt
    >>> plt.legend(loc='best', numpoints=1)
    >>> plt.xlim(plt.xlim()[::-1])
    >>> plt.xlabel('RA')
    >>> plt.ylabel('DEC')
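
The grouping behind this kind of plot can be sketched with plain numpy: split the rows by the unique values of one column and pair each group with a marker. The helper `group_rows` is a hypothetical name used for illustration, not ezdata's code, and the data values are invented.

```python
from itertools import cycle

import numpy as np


# Hypothetical helper, not ezdata's code: split rows by the unique values of
# one column and pair each group with a marker, reusing markers if needed.
def group_rows(columns, key, markers="<^>v.oxs"):
    values = columns[key]
    groups = []
    for value, marker in zip(np.unique(values), cycle(markers)):
        indices = np.flatnonzero(values == value)
        groups.append((value, marker, indices))
    return groups


data = {
    "BRK": np.array([1, 2, 1, 3, 2]),
    "CRA": np.array([10.0, 10.5, 11.0, 11.5, 12.0]),
    "CDEC": np.array([41.0, 41.2, 41.4, 41.6, 41.8]),
}
for value, marker, indices in group_rows(data, "BRK"):
    # Each group would then be drawn with one matplotlib call, for example
    # plt.plot(data["CRA"][indices], data["CDEC"][indices], marker, label=value)
    print(value, marker, data["CRA"][indices])
```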

Make a more complex plot: plot the histogram distribution of 'AV' per region given by 'BRK', with a given color scheme per region value and individual plots with shared axes.

    >>> import numpy as np
    >>> bins = np.linspace(t.AV.min(), t.AV.max(), 20)
    >>> t.Plotter.groupby('BRK', facet=True, colors=plt.cm.parula,
    ...                   sharex=True, sharey=True).hist('AV', bins=bins, normed=True)
    >>> for ax in plt.gcf().axes[-3:]: ax.set_xlabel('AV')
