Search Docs by Keyword

Table of Contents

KNIME on the FASRC clusters

Description

KNIME is an open-source data analytics, reporting, and integration platform that is meant to perform various aspects of machine-learning & data mining through its modular data pipelining concept. The platform offers a way to integrate various tasks ranging from developing analytic models to deploying them and sharing insights with your team. The KNIME Analytics Platform offers users 300+ connectors to multiple data sources and integrations to all popular machine learning libraries.

The software’s key capabilities include Data Access & Transformation, Data Analytics, Visualization & Reporting, Statistics & Machine Learning, Generative AI, Collaboration, Governance, Data Apps, Automation, AI Agents

Given KNIME’s wide scale use and applicability, we have converted it into a system-wide module that can be loaded from anywhere on any of the FASRC clusters, Cannon or FASSE. Additionally, we have packaged it as an app that can be launched using the cluster web interface, Open on Demand (OOD).

KNIME as a module

Knime is available as a module on the FASRC clusters. In order to know more about the module including the versions available and how to load one of them, execute from a terminal on the cluster: module spider knime

This would pull up the information on the versions of KNIME software that are available to load. For example, for a user jharvard on a compute node, the module spider command would produce the following output:


[jharvard@holy8a26602 ~]$ module spider knime/
knime:
Description:
An open-source data analytics, reporting, and integration platform meant to perform various aspects of machine-learning & data mining through its modular data pipelining concept.

Versions:
knime/5.4.3-fasrc01
knime/5.4.4-fasrc01

For detailed information about a specific "knime" package (including how to load the modules) use the module's full name.

Note that names that have a trailing (E) are extensions provided by other modules.

For example:
$ module spider knime/5.4.4-fasrc01


To load a specific module, one can execute: module load knime/5.4.3-fasrc01

Or, to load the default & typically the latest module, one can run: module load knime command. This would result in, e.g.:

[jharvard@holy8a26602 ~]$ module load knime
[jharvard@holy8a26602 ~]$ module list
Currently Loaded Modules:
  1) knime/5.4.4-fasrc01

Once the knime module is loaded, one can launch the GUI by running the knime executable on the terminal provided you ssh into the cluster using X11 forwarding, preferably with the -Y option, and that XQuartz (MacOS) or  MobaXterm (Windows) is installed on your local device that is being used to login to the cluster. For example:

ssh -Y jharvard@login.rc.fas.harvard.edu

[jharvard@holylogin05 ~]$ salloc -p test --x11 --time=2:00:00 --mem=4g
[jharvard@holy8a26602 ~]$ module load knime
[jharvard@holy8a26602 ~]$ knime

One can ignore the following libGL errors and should expect to see a GUI appear as shown in the screen shot below.
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast

Knime GUI launched directly on Cannon

Note: While you can launch KNIME directly on the cluster using X11 forwarding, it is laggy and doesn’t render itself well to faster executions that might be needed for certain KNIME workflows. To avoid issues associated with X11 forwarding, we recommend launching KNIME using OOD.

Both these modules are also available to use via the Knime OOD app, as explained below.

KNIME on OOD

KNIME can be run from Open OnDemand (OOD, formerly known as VDI) by choosing it from the Interactive Apps menu, and specifying your resource needs. Hit Launch, wait for the session to start, and click the “Launch Knime” button.

You can also launch KNIME from the Remote Desktop app on OOD.

Pre-installed Extensions

Both KNIME modules come with the following pre-installed extensions:

  1. For GIS: Geospatial Analytics Extension for KNIME
  2. For Programming:
    KNIME Python Integration
    KNIME Interactive R Statistics Integration
  3. For Machine Learning:
    KNIME H2O Machine Learning Integration
    KNIME XGBoost Integration
    KNIME Machine Learning Interpretability Extension
  4. For OpenAI, Hugging Face, and other LLMs: KNIME AI Extension
  5. For AI Assistant Coding: KNIME AI Assistant (Labs)
  6. For Google Drive Integration: KNIME Google Connectors

Note: New extensions cannot be installed by the users on the fly as modules don’t come with write permissions.

KNIME Tutorial

The link here takes you to the KNIME tutorial that has been prepared by Lingbo Liu from Harvard’s Center for Geographic Analysis (CGA). This tutorial is best executed by launching the Knime app on OOD.

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.