I’m pleased to announce the release of Dask version 1.1.0. This is a major release with bug fixes and new features. The last release was 1.0.0 on 2018-11-29. This blogpost outlines notable changes since the last release.

You can conda install Dask:

conda install dask

or pip install from PyPI:

pip install "dask[complete]" --upgrade

Full changelogs are available here:

Notable Changes

A lot of work has happened over the last couple of months, and we encourage people to look through the changelog to get a sense of the kinds of incremental changes that developers are working on.

There are also a few notable changes in this release that we’ll highlight here:

  • Support for the recent Numpy 1.16 and Pandas 0.24 releases
  • Support for Pandas Extension Arrays (see Tom Augspurger’s post on the topic)
  • High level graphs in Dask Dataframe and operator fusion in simple cases
  • Increased support for other libraries that look enough like Numpy and Pandas to work within Dask Array/Dataframe

Support for Numpy 1.16 and Pandas 0.24

Both Numpy and Pandas have been evolving quickly over the last few months. We’re excited about the changes to extensibility arriving in both libraries. The Dask array/dataframe submodules have been updated to work well with these recent changes.

Pandas Extension Arrays

In particular, Dask Dataframe now supports Pandas Extension Arrays, which makes it easier to use third-party Pandas packages like CyberPandas or Fletcher in parallel with Dask Dataframe.

For more information, see Tom Augspurger’s post.

High Level Graphs in Dask Dataframe

For a while Dask array has had some high level graphs for “atop” operations (elementwise, broadcasting, transpose, tensordot, reductions), which allow for reduced overhead and task fusion on computations within this class.

y = da.exp(x + 1).T  # These operations get fused to a single task

We’ve renamed atop to blockwise to be a bit more generic, and have also started applying it to Dask Dataframe, which helps to reduce overhead substantially when doing computations with many simple operations.

This still needs to be improved to increase the class of cases where it works, but we’re already seeing nice speedups on previously slow workloads.

The da.atop function has been deprecated in favor of da.blockwise. There is now also a dd.blockwise which shares a common code path.
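
To make the rename concrete, here is a small sketch that uses da.blockwise directly, alongside the high-level expression from earlier. The array shape, chunk sizes, and variable names are arbitrary choices for illustration.

```python
import operator

import numpy as np
import dask.array as da

# A small array split into a 2x2 grid of chunks (sizes chosen for illustration)
x = da.ones((4, 6), chunks=(2, 3))

# High-level expression: elementwise add, exp, and transpose are all blockwise operations
y = da.exp(x + 1).T

# The same building block called directly: add two arrays block by block.
# The index strings say how output indices relate to input indices.
z = da.blockwise(operator.add, "ij", x, "ij", x, "ij", dtype=x.dtype)

# Transpose: the output index order "ji" swaps the input order "ij",
# and np.transpose is applied to each individual block.
t = da.blockwise(np.transpose, "ji", x, "ij", dtype=x.dtype)

y_result = y.compute()
z_result = z.compute()
```

Code that still calls da.atop should be able to switch to da.blockwise with the same arguments.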

Non-Pandas dataframe and Non-Numpy array types

We’re working to make Dask a bit more agnostic to the types of in-memory array and dataframe objects that it can manipulate. Rather than having Dask Array be a grid of Numpy arrays and Dask Dataframe be a sequence of Pandas dataframes, we’re relaxing that constraint to a grid of Numpy-like arrays and a sequence of Pandas-like dataframes.

This is an ongoing effort that has targeted alternate backends like scipy.sparse, pydata/sparse, cupy, cudf and other systems.

There were some recent posts on arrays and dataframes that show proofs of concept for this with GPUs.

Acknowledgements

There have been several releases since the last time we had a release blogpost. The following people contributed to the dask/dask repository since the 0.19.0 release on September 5th:

  • Anderson Banihirwe
  • Antonino Ingargiola
  • Armin Berres
  • Bart Broere
  • Carlos Valiente
  • Daniel Li
  • Daniel Saxton
  • David Hoese
  • Diane Trout
  • Damien Garaud
  • Elliott Sales de Andrade
  • Eric Wolak
  • Gábor Lipták
  • Guido Imperiale
  • Guillaume Eynard-Bontemps
  • Itamar Turner-Trauring
  • James Bourbeau
  • Jan Koch
  • Javad
  • Jendrik Jördening
  • Jim Crist
  • Jonathan Fraine
  • John Kirkham
  • Johnnie Gray
  • Julia Signell
  • Justin Dennison
  • M. Farrajota
  • Marco Neumann
  • Mark Harfouche
  • Markus Gonser
  • Martin Durant
  • Matthew Rocklin
  • Matthias Bussonnier
  • Mina Farid
  • Paul Vecchio
  • Prabakaran Kumaresshan
  • Rahul Vaidya
  • Stephan Hoyer
  • Stuart Berg
  • TakaakiFuruse
  • Takahiro Kojima
  • Tom Augspurger
  • Yu Feng
  • Zhenqing Li
  • @milesial
  • @samc0de
  • @slnguyen

The following people contributed to the dask/distributed repository since the 0.19.0 release on September 5th:

  • Adam Klein
  • Brett Naul
  • Daniel Farrell
  • Diane Trout
  • Dirk Petersen
  • Eric Ma
  • Jim Crist
  • John Kirkham
  • Gaurav Sheni
  • Guillaume Eynard-Bontemps
  • Loïc Estève
  • Marius van Niekerk
  • Matthew Rocklin
  • Michael Wheeler
  • MikeG
  • NotSqrt
  • Peter Killick
  • Roy Wedge
  • Russ Bubley
  • Stephan Hoyer
  • @tjb900
  • Tom Rochette
  • @fjetter
