KNIME on the FASRC clusters
Description
KNIME is an open-source data analytics, reporting, and integration platform that is meant to perform various aspects of machine-learning & data mining through its modular data pipelining concept. The platform offers a way to integrate various tasks ranging from developing analytic models to deploying them and sharing insights with your team. The KNIME Analytics Platform offers users 300+ connectors to multiple data sources and integrations to all popular machine learning libraries.
The software’s key capabilities include Data Access & Transformation, Data Analytics, Visualization & Reporting, Statistics & Machine Learning, Generative AI, Collaboration, Governance, Data Apps, Automation, and AI Agents.
Given KNIME’s wide-scale use and applicability, we have made it available as a system-wide module that can be loaded from anywhere on either of the FASRC clusters, Cannon or FASSE. We have also packaged it as an app that can be launched from the cluster web interface, Open OnDemand (OOD).
KNIME as a module
KNIME is available as a module on the FASRC clusters. To learn more about the module, including the available versions and how to load one of them, run the following from a terminal on the cluster:
module spider knime
This lists the versions of KNIME that are available to load. For example, for a user jharvard on a compute node, the module spider command produces the following output:
[jharvard@holy8a26602 ~]$ module spider knime/
knime:
Description:
An open-source data analytics, reporting, and integration platform meant to perform various aspects of machine-learning & data mining through its modular data pipelining concept.
Versions:
knime/5.4.3-fasrc01
knime/5.4.4-fasrc01
For detailed information about a specific "knime" package (including how to load the modules) use the module's full name.
Note that names that have a trailing (E) are extensions provided by other modules.
For example:
$ module spider knime/5.4.4-fasrc01
To load a specific version, run:
module load knime/5.4.3-fasrc01
Or, to load the default (typically the latest) version, run:
module load knime
For example:
[jharvard@holy8a26602 ~]$ module load knime
[jharvard@holy8a26602 ~]$ module list
Currently Loaded Modules:
1) knime/5.4.4-fasrc01
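If a script needs to choose the newest version explicitly instead of relying on the default, the version list printed by module spider can be filtered. The following is a minimal sketch, not an official FASRC tool: latest_knime_module is a hypothetical helper name, and the commented usage line assumes that Lmod writes the spider listing to stderr (its usual default) and that GNU sort with version ordering is available.

```shell
#!/usr/bin/env bash
# Sketch: pick the highest knime/<version> entry from `module spider` output.
# latest_knime_module is a hypothetical helper name, not part of Lmod.
latest_knime_module() {
    # Read spider output on stdin, keep only knime/<version> tokens,
    # de-duplicate, sort by version, and print the last (highest) one.
    grep -oE 'knime/[0-9][^[:space:]]*' | sort -uV | tail -n 1
}

# On the cluster (assumption: spider output arrives on stderr):
#   module load "$(module spider knime 2>&1 | latest_knime_module)"
```

Loading the module by its full name, as in the commented line, keeps a job script reproducible even if the default version changes later.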
Once the knime module is loaded, you can launch the GUI by running the knime executable from the terminal, provided that you connected to the cluster via ssh with X11 forwarding (preferably with the -Y option) and that XQuartz (macOS) or MobaXterm (Windows) is installed on the local machine you use to log in. For example:
ssh -Y jharvard@login.rc.fas.harvard.edu
[jharvard@holylogin05 ~]$ salloc -p test --x11 --time=2:00:00 --mem=4g
[jharvard@holy8a26602 ~]$ module load knime
[jharvard@holy8a26602 ~]$ knime
You can ignore the following libGL errors; the GUI should appear as shown in the screenshot below.
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
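If the GUI does not appear at all, a common cause is that X11 forwarding never reached the compute node. The sketch below guards the launch with a check on the DISPLAY environment variable, which ssh -Y and salloc --x11 normally set. The function names x11_ready and launch_knime are hypothetical, and a set DISPLAY is only a necessary condition, not a guarantee that forwarding works.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start the KNIME GUI when no X11 display is available.
# x11_ready and launch_knime are hypothetical helper names.
x11_ready() {
    # Succeeds only when DISPLAY is set to a non-empty value.
    [ -n "${DISPLAY:-}" ]
}

launch_knime() {
    if ! x11_ready; then
        echo "DISPLAY is not set; reconnect with ssh -Y and request --x11" >&2
        return 1
    fi
    # Assumes the knime module has already been loaded in this shell.
    knime
}
```

After module load knime, calling launch_knime either starts the GUI or prints a hint about the missing display.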
Note: While you can launch KNIME directly on the cluster using X11 forwarding, it is laggy and doesn’t lend itself well to the faster execution that certain KNIME workflows might need. To avoid issues associated with X11 forwarding, we recommend launching KNIME using OOD.
Both of these modules are also available through the KNIME OOD app, as explained below.
KNIME on OOD
KNIME can be run from Open OnDemand (OOD, formerly known as VDI) by choosing it from the Interactive Apps menu, and specifying your resource needs. Hit Launch, wait for the session to start, and click the “Launch Knime” button.
You can also launch KNIME from the Remote Desktop app on OOD.
Pre-installed Extensions
Both KNIME modules come with the following pre-installed extensions:
- For GIS: Geospatial Analytics Extension for KNIME
- For Programming:
  KNIME Python Integration
  KNIME Interactive R Statistics Integration
- For Machine Learning:
  KNIME H2O Machine Learning Integration
  KNIME XGBoost Integration
  KNIME Machine Learning Interpretability Extension
- For OpenAI, Hugging Face, and other LLMs: KNIME AI Extension
- For AI Assistant Coding: KNIME AI Assistant (Labs)
- For Google Drive Integration: KNIME Google Connectors
Note: Users cannot install new extensions on the fly, because the module installations are not writable by users.
KNIME Tutorial
The link here takes you to the KNIME tutorial prepared by Lingbo Liu from Harvard’s Center for Geographic Analysis (CGA). This tutorial is best followed by launching the KNIME app on OOD.