diff --git a/CMakeModules/Version.cmake b/CMakeModules/Version.cmake
index a4ea1b3560..bcc7409fd8 100644
--- a/CMakeModules/Version.cmake
+++ b/CMakeModules/Version.cmake
@@ -10,7 +10,7 @@ ENDIF()
SET(AF_VERSION_MAJOR "3")
SET(AF_VERSION_MINOR "4")
-SET(AF_VERSION_PATCH "0")
+SET(AF_VERSION_PATCH "1")
SET(AF_VERSION "${AF_VERSION_MAJOR}.${AF_VERSION_MINOR}.${AF_VERSION_PATCH}")
SET(AF_API_VERSION_CURRENT ${AF_VERSION_MAJOR}${AF_VERSION_MINOR})
diff --git a/README.md b/README.md
index 07bf01a47c..c0d12b81fd 100644
--- a/README.md
+++ b/README.md
@@ -1,35 +1,35 @@
-ArrayFire is a general-purpose library that simplifies the process of developing
-software that targets parallel and massively-parallel architectures including
+ArrayFire is a general-purpose library that simplifies the process of developing
+software that targets parallel and massively-parallel architectures including
CPUs, GPUs, and other hardware acceleration devices.
-To achieve this goal, ArrayFire provides software developers with a high-level
-abstraction of data which resides on the accelerator, the `af::array` object
+To achieve this goal, ArrayFire provides software developers with a high-level
+abstraction of data which resides on the accelerator, the `af::array` object
(or C-style struct).
Developers write code which performs operations on ArrayFire arrays which, in turn,
are automatically translated into near-optimal kernels that execute on the computational
-device.
-ArrayFire is successfully used on devices ranging from low-power mobile phones to
-high-power GPU-enabled supercomputers including CPUs from all major vendors (Intel, AMD, Arm),
-GPUs from the dominant manufacturers (NVIDIA, AMD, and Qualcomm), as well as a variety
+device.
+ArrayFire is successfully used on devices ranging from low-power mobile phones to
+high-power GPU-enabled supercomputers including CPUs from all major vendors (Intel, AMD, Arm),
+GPUs from the dominant manufacturers (NVIDIA, AMD, and Qualcomm), as well as a variety
of other accelerator devices on Windows, Mac, and Linux.
Several of ArrayFire's benefits include:
-* [Easy to use](http://arrayfire.org/docs/gettingstarted.htm), stable,
+* [Easy to use](http://arrayfire.org/docs/gettingstarted.htm), stable,
[well-documented](http://arrayfire.org/docs) API.
-* Rigorously Tested for Performance and Accuracy
+* Rigorously Tested for Performance and Accuracy
* Commercially Friendly Open-Source Licensing
* Commercial support from [ArrayFire](http://arrayfire.com)
* [Read about more benefits on Arrayfire.com](http://arrayfire.com/the-arrayfire-library/)
-
+
### Build and Test Status
-| | Linux x86_64 | Linux armv7l | Linux aarch64 | Windows | OSX |
-|:-------:|:------------:|:------------:|:-------------:|:-------:|:---:|
-| Build | [](http://ci.arrayfire.org/job/arrayfire-linux/job/build/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-tegrak1/job/build/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-tegrax1/job/build/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-windows/job/build/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-osx/job/build/branch/devel/) |
-| Test | [](http://ci.arrayfire.org/job/arrayfire-linux/job/test/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-tegrak1/job/test/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-tegrax1/job/test/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-windows/job/test/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-osx/job/test/branch/devel/) |
+| | Linux x86_64 | Linux aarch64 | Windows | OSX |
+|:-------:|:------------:|:-------------:|:-------:|:---:|
+| Build | [](http://ci.arrayfire.org/job/arrayfire-linux/job/build/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-tegrax1/job/build/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-windows/job/build/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-osx/job/build/branch/devel/) |
+| Test | [](http://ci.arrayfire.org/job/arrayfire-linux/job/test/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-tegrax1/job/test/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-windows/job/test/branch/devel/) | [](http://ci.arrayfire.org/job/arrayfire-osx/job/test/branch/devel/) |
### Installation
@@ -143,7 +143,7 @@ details.
### Trademark Policy
-The literal mark “ArrayFire” and ArrayFire logos are trademarks of
+The literal mark “ArrayFire” and ArrayFire logos are trademarks of
AccelerEyes LLC DBA ArrayFire.
If you wish to use either of these marks in your own project, please consult
[ArrayFire's Trademark Policy](http://arrayfire.com/trademark-policy/)
diff --git a/docs/pages/release_notes.md b/docs/pages/release_notes.md
index bdcd91158a..b944a0c9c1 100644
--- a/docs/pages/release_notes.md
+++ b/docs/pages/release_notes.md
@@ -1,6 +1,90 @@
Release Notes {#releasenotes}
==============
+v3.4.1
+==============
+
+Installers
+----------
+* Installers for Linux, OS X and Windows
+ * CUDA backend now uses [CUDA 8.0](https://developer.nvidia.com/cuda-toolkit).
+ * Uses [Intel MKL 2017](https://software.intel.com/en-us/intel-mkl).
+ * CUDA Compute 2.x (Fermi) is no longer compiled into the library.
+* Installer for OS X
+ * The libraries shipping in the OS X Installer are now compiled with Apple
+    Clang v7.3.1 (previously v6.1.0).
+ * The OS X version used is 10.11.6 (previously 10.10.5).
+* Installer for Jetson TX1 / Tegra X1
+ * Requires [JetPack for L4T 2.3](https://developer.nvidia.com/embedded/jetpack)
+ (containing Linux for Tegra r24.2 for TX1).
+ * CUDA backend now uses [CUDA 8.0](https://developer.nvidia.com/cuda-toolkit) 64-bit.
+  * Uses CUDA's cuSOLVER instead of the CPU fallback.
+ * Uses OpenBLAS for CPU BLAS.
+ * All ArrayFire libraries are now 64-bit.
+
+Improvements
+------------
+* Add [sparse array](\ref sparse_func) support to \ref af::eval().
+ [1](https://github.com/arrayfire/arrayfire/pull/1598)
+* Add OpenCL-CPU fallback support for sparse \ref af::matmul() when running on
+ a unified memory device. Uses MKL Sparse BLAS.
+* When using CUDA libdevice, pick the correct compute version based on the device.
+ [1](https://github.com/arrayfire/arrayfire/pull/1612)
+* OpenCL FFT now also supports prime factors 7, 11 and 13.
+ [1](https://github.com/arrayfire/arrayfire/pull/1383)
+ [2](https://github.com/arrayfire/arrayfire/pull/1619)
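The sparse improvements above can be sketched together in one short program. This is an illustrative example only (the matrix size and sparsity threshold are arbitrary), and it assumes an ArrayFire 3.4.1 installation with a backend that provides sparse support:

```cpp
#include <arrayfire.h>

int main() {
    // Build a mostly-zero dense matrix and convert it to sparse storage.
    af::array dense = af::randu(512, 512);
    dense = dense * (dense > 0.95f);   // keep roughly 5% of the entries
    af::array sp = af::sparse(dense);  // AF_STORAGE_CSR by default

    // Sparse arrays can now be passed to af::eval().
    af::eval(sp);

    // Sparse-dense matmul; on a unified memory OpenCL device this can
    // fall back to MKL Sparse BLAS on the CPU.
    af::array x = af::randu(512, 1);
    af::array y = af::matmul(sp, x);
    af_print(y);
    return 0;
}
```

Compile and link against ArrayFire (e.g. `-laf` for the unified backend) to try it.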
+
+Bug Fixes
+---------
+* Allow CUDA libdevice to be detected from a custom directory.
+* Fix `aarch64` detection on Jetson TX1 64-bit OS.
+ [1](https://github.com/arrayfire/arrayfire/pull/1593)
+* Add missing definition of `af_set_fft_plan_cache_size` in unified backend.
+ [1](https://github.com/arrayfire/arrayfire/pull/1591)
+* Fix initial values for \ref af::min() and \ref af::max() operations.
+ [1](https://github.com/arrayfire/arrayfire/pull/1594)
+ [2](https://github.com/arrayfire/arrayfire/pull/1595)
+* Fix distance calculation in \ref af::nearestNeighbour for the CUDA and OpenCL backends.
+ [1](https://github.com/arrayfire/arrayfire/pull/1596)
+ [2](https://github.com/arrayfire/arrayfire/pull/1595)
+* Fix OpenCL bug where scalars were passed incorrectly to compile options.
+ [1](https://github.com/arrayfire/arrayfire/pull/1595)
+* Fix bug in \ref af::Window::surface() with respect to dimensions and ranges.
+ [1](https://github.com/arrayfire/arrayfire/pull/1604)
+* Fix possible double free corruption in \ref af_assign_seq().
+ [1](https://github.com/arrayfire/arrayfire/pull/1605)
+* Add missing eval for the key array in \ref af::scanByKey in the CPU backend.
+ [1](https://github.com/arrayfire/arrayfire/pull/1605)
+* Fix creation of the sparse values array when using \ref AF_STORAGE_COO.
+  [1](https://github.com/arrayfire/arrayfire/pull/1620)
+  [2](https://github.com/arrayfire/arrayfire/pull/1621)
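The COO creation path fixed above can be exercised as follows. The triplet values here are illustrative, and the sketch assumes an ArrayFire 3.4.1 installation:

```cpp
#include <arrayfire.h>

int main() {
    // A 3x3 matrix with three non-zeros, specified as COO triplets.
    float vals[] = {5.0f, 8.0f, 3.0f};
    int   rows[] = {0, 1, 2};
    int   cols[] = {0, 2, 1};

    af::array values(3, vals);
    af::array rowIdx(3, rows);
    af::array colIdx(3, cols);

    // Create the sparse array directly in COO storage.
    af::array sp = af::sparse(3, 3, values, rowIdx, colIdx, AF_STORAGE_COO);

    // Convert back to dense to inspect the result.
    af_print(af::dense(sp));
    return 0;
}
```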
+
+Examples
+--------
+* Add a [Conjugate Gradient solver example](\ref benchmarks/cg.cpp)
+ to demonstrate sparse and dense matrix operations.
+ [1](https://github.com/arrayfire/arrayfire/pull/1599)
+
+CUDA Backend
+------------
+* When using [CUDA 8.0](https://developer.nvidia.com/cuda-toolkit),
+  computes 2.x are no longer in the default compute list.
+ * This follows [CUDA 8.0](https://developer.nvidia.com/cuda-toolkit)
+ deprecating computes 2.x.
+ * Default computes for CUDA 8.0 will be 30, 50, 60.
+* When using CUDA pre-8.0, the default selection remains 20, 30, 50.
+* CUDA backend now uses `-arch=sm_30` by default for PTX compilation,
+  unless compute 2.0 is enabled.
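For reference, the default CUDA 8.0 compute list (30, 50, 60) corresponds to nvcc flags like the following. This is shown only to illustrate what the defaults mean; ArrayFire's build generates the actual flags through CMake, and the file name here is a placeholder:

```shell
nvcc -c kernel.cu \
  -gencode arch=compute_30,code=sm_30 \
  -gencode arch=compute_50,code=sm_50 \
  -gencode arch=compute_60,code=sm_60 \
  -gencode arch=compute_30,code=compute_30  # embed sm_30 PTX for forward compatibility
```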
+
+Known Issues
+------------
+* \ref af::lu() on CPU is known to give incorrect results when built and run
+  on OS X 10.11 or 10.12 with the Accelerate Framework.
+ [1](https://github.com/arrayfire/arrayfire/pull/1617)
+  * Since the OS X Installer libraries use MKL rather than the Accelerate
+    Framework, this issue does not affect those libraries.
+
+
v3.4.0
==============