Dask Release 1.1.0
I’m pleased to announce the release of Dask version 1.1.0. This is a major release with bug fixes and new features. The last release was 1.0.0 on 2018-11-29. This blog post outlines notable changes since the last release.
You can conda install Dask:
conda install dask
or pip install from PyPI:
pip install "dask[complete]" --upgrade
Full changelogs are available here:
Notable Changes
A lot of work has happened over the last couple of months, and we encourage people to look through the changelog to get a sense of the kinds of incremental changes that developers are working on.
There are also a few notable changes in this release that we’ll highlight here:
- Support for the recent Numpy 1.16 and Pandas 0.24 releases
- Support for Pandas Extension Arrays (see Tom Augspurger’s post on the topic)
- High level graphs in Dask Dataframe and operator fusion in simple cases
- Increased support for other libraries that look enough like Numpy and Pandas to work within Dask Array/Dataframe
Support for Numpy 1.16 and Pandas 0.24
Both Numpy and Pandas have been evolving quickly over the last few months. We’re excited about the changes to extensibility arriving in both libraries. The Dask array/dataframe submodules have been updated to work well with these recent changes.
Pandas Extension Arrays
In particular, Dask Dataframe now supports Pandas Extension Arrays, which makes it easier to use third-party Pandas packages like CyberPandas or Fletcher in parallel with Dask Dataframe.
For more information, see Tom Augspurger’s post.
High Level Graphs in Dask Dataframe
For a while Dask array has had some high level graphs for “atop” operations (elementwise, broadcasting, transpose, tensordot, reductions), which allow for reduced overhead and task fusion on computations within this class.
y = da.exp(x + 1).T # These operations get fused to a single task
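A self-contained version of the line above might look like the following sketch. The input array and chunk sizes are arbitrary choices for illustration, and the task counts it prints depend on the Dask version, so no particular numbers are claimed here.

```python
import numpy as np
import dask
import dask.array as da

# A small array split into two blocks along the second axis.
a = np.arange(12, dtype="float64").reshape(3, 4)
x = da.from_array(a, chunks=(3, 2))

# Add, exp, and transpose are all blockwise operations.
y = da.exp(x + 1).T

# dask.optimize returns a tuple of optimized collections.
(y_opt,) = dask.optimize(y)

# Compare the number of tasks before and after graph optimization.
print("tasks before optimization:", len(y.__dask_graph__()))
print("tasks after optimization: ", len(y_opt.__dask_graph__()))

result = y.compute()
```

Comparing the two printed counts gives a quick feel for how much of the graph was fused on your installed version.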
We’ve renamed atop to blockwise to be a bit more generic, and have also started applying it to Dask Dataframe, which helps to reduce overhead substantially when doing computations with many simple operations.
This still needs to be improved to increase the class of cases where it works, but we’re already seeing nice speedups on previously unseen workloads.
The da.atop function has been deprecated in favor of da.blockwise. There is now also a dd.blockwise which shares a common code path.
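For readers who have not called this function directly, here is a minimal sketch of da.blockwise on a small array. The index strings follow the same convention as the old atop function: each input is labelled with one letter per dimension, and the output label says how blocks are arranged in the result. The array contents and chunk sizes are arbitrary.

```python
import numpy as np
import dask.array as da

a = np.arange(24).reshape(4, 6)
x = da.from_array(a, chunks=(2, 3))

# Elementwise add: input and output blocks share the same "ij" layout.
doubled = da.blockwise(np.add, "ij", x, "ij", x, "ij", dtype=x.dtype)

# Transpose: np.transpose flips each block, and the "ji" output index
# tells Dask to swap the block positions as well.
transposed = da.blockwise(np.transpose, "ji", x, "ij", dtype=x.dtype)

doubled_result = doubled.compute()
transposed_result = transposed.compute()
```

Most users will never need to call this directly, since operators like x + x and x.T go through the same machinery.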
Non-Pandas dataframe and Non-Numpy array types
We’re working to make Dask a bit more agnostic to the types of in-memory array and dataframe objects that it can manipulate. Rather than having Dask Array be a grid of Numpy arrays and Dask Dataframe be a sequence of Pandas dataframes, we’re relaxing that constraint to a grid of Numpy-like arrays and a sequence of Pandas-like dataframes.
This is an ongoing effort that has targeted alternate backends like scipy.sparse, pydata/sparse, cupy, cudf and other systems.
There were some recent posts on arrays and dataframes that show proofs of concept for this with GPUs.
Acknowledgements
There have been several releases since the last time we had a release blog post. The following people contributed to the dask/dask repository since the 0.19.0 release on September 5th:
- Anderson Banihirwe
- Antonino Ingargiola
- Armin Berres
- Bart Broere
- Carlos Valiente
- Daniel Li
- Daniel Saxton
- David Hoese
- Diane Trout
- Damien Garaud
- Elliott Sales de Andrade
- Eric Wolak
- Gábor Lipták
- Guido Imperiale
- Guillaume Eynard-Bontemps
- Itamar Turner-Trauring
- James Bourbeau
- Jan Koch
- Javad
- Jendrik Jördening
- Jim Crist
- Jonathan Fraine
- John Kirkham
- Johnnie Gray
- Julia Signell
- Justin Dennison
- M. Farrajota
- Marco Neumann
- Mark Harfouche
- Markus Gonser
- Martin Durant
- Matthew Rocklin
- Matthias Bussonnier
- Mina Farid
- Paul Vecchio
- Prabakaran Kumaresshan
- Rahul Vaidya
- Stephan Hoyer
- Stuart Berg
- TakaakiFuruse
- Takahiro Kojima
- Tom Augspurger
- Yu Feng
- Zhenqing Li
- @milesial
- @samc0de
- @slnguyen
The following people contributed to the dask/distributed repository since the 0.19.0 release on September 5th:
- Adam Klein
- Brett Naul
- Daniel Farrell
- Diane Trout
- Dirk Petersen
- Eric Ma
- Jim Crist
- John Kirkham
- Gaurav Sheni
- Guillaume Eynard-Bontemps
- Loïc Estève
- Marius van Niekerk
- Matthew Rocklin
- Michael Wheeler
- MikeG
- NotSqrt
- Peter Killick
- Roy Wedge
- Russ Bubley
- Stephan Hoyer
- @tjb900
- Tom Rochette
- @fjetter