Data teams often struggle with siloed environments and complex configurations that slow down innovation. The Cloudera Data Science Workbench (CDSW) breaks these barriers by offering a self-service, collaborative IDE that brings R, Python, and Scala directly to your browser without the setup headaches.
A New Release From Cloudera
Cloudera recently released a new product called the Cloudera Data Science Workbench (CDSW).
The CDSW is positioned as a collaborative platform for data scientists, engineers and analysts, enabling larger teams to work in a self-service manner through a web browser. This browser application is effectively an IDE for R, Python and Scala - all your favorite toys!

The CDSW is deployed onto edge nodes of your CDH cluster, providing easy access to your HDFS data and the Spark2 and Impala engines. This means that team members can immediately start working on their projects, accessing full datasets and sharing analysis and results. A CDSW Project can include reusable code and snippets, libraries, etc., helping your teams to collaborate. Oh, and these projects can be linked with GitHub repos to help keep version history.
The workbench is used to fire up user sessions with R, Python or Scala inside dedicated Docker engines. These engines can be customized or extended, like any other Docker image, to include all your favorite R packages and Python libraries. Using HDFS, Hive, Spark2 or Impala, the workload can then be distributed over the CDH cluster by your preferred method, without having to configure anything. The engine (a container, really) runs for as long as the analysis does. Any logs or output files need to be saved in the project folder, which is mounted inside the engine and stored on the CDSW master node. The master node is a gateway node to the CDH cluster and can scale out to many worker nodes to distribute the Docker engines.
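To make that flow a bit more concrete, here is a minimal Python sketch of what a session might look like: push an aggregation to the cluster through Spark, then save the small result in the project folder. It assumes PySpark is available in the engine and that the session can reach the cluster without extra settings. The function names, the `output` directory and the file name are made up for illustration and are not part of the CDSW itself.

```python
import csv
from pathlib import Path


def count_by_column(hdfs_path, column):
    """Count rows per distinct value of `column` in a CSV file on HDFS."""
    # Imported here so the file-saving helper below also works without pyspark.
    from pyspark.sql import SparkSession

    # Assumption: the engine is already configured to reach the cluster,
    # so no master URL or other settings are passed explicitly.
    spark = SparkSession.builder.appName("cdsw-sketch").getOrCreate()
    df = spark.read.csv(hdfs_path, header=True, inferSchema=True)
    counts = df.groupBy(column).count().orderBy("count", ascending=False)
    # The aggregation runs on the cluster; only the small result is collected.
    return [(row[column], row["count"]) for row in counts.collect()]


def save_counts(rows, out_dir="output", filename="counts.csv"):
    """Write (value, count) pairs to a CSV file under the project folder."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    out_file = target / filename
    with out_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["value", "count"])
        writer.writerows(rows)
    return out_file
```

In a session you would chain the two, for example `save_counts(count_by_column("hdfs:///data/events.csv", "country"))`, where the path and the column name are hypothetical. Because the relative `output` directory resolves against the session's working directory, the file lands in the mounted project folder (assuming the session starts there) and outlives the engine.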

And under the hood we also have Kubernetes to schedule user workloads across the worker nodes and provide CPU and memory isolation.
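If you want a feel for what that scheduling means, here is a toy Python model of the idea: each engine asks for some CPU and memory, and it is placed on a worker that still has room. This is only an illustration under simplifying assumptions, not how the Kubernetes scheduler is implemented (the real thing filters and scores nodes on many more criteria, and the isolation itself is enforced by the container runtime). The `Worker` class, `place_engine` and all the numbers are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    cpu_free: float       # free CPU cores
    mem_free_gb: float    # free memory in GB
    engines: list = field(default_factory=list)


def place_engine(workers, engine_name, cpu, mem_gb):
    """Place an engine on the fitting worker with the most free CPU."""
    candidates = [
        w for w in workers
        if w.cpu_free >= cpu and w.mem_free_gb >= mem_gb
    ]
    if not candidates:
        return None  # no capacity anywhere: the engine would have to wait
    chosen = max(candidates, key=lambda w: w.cpu_free)
    # Reserving the requested resources up front is what keeps one user's
    # engine from eating into the share promised to another.
    chosen.cpu_free -= cpu
    chosen.mem_free_gb -= mem_gb
    chosen.engines.append(engine_name)
    return chosen.name
```

With two workers offering 4 and 8 free cores, a first 4-core engine goes to the bigger worker, a second one to the smaller worker, and an 8-core request then gets `None` because no single node has that much left.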

So far I find the IDE to be a bit too simple and lacking features compared to e.g. RStudio Server. But the ease of use and the fact that everything is automatically configured make the CDSW an absolute must for any Cloudera customer with data science teams. Also, I'm convinced that future releases will add loads of cool functionality.
I spent about two days building a new cluster on AWS and installing the Cloudera Data Science Workbench, which is an indication of how easy it is to get up and running. Btw, it also runs in the cloud (IaaS) ;)