Data Science platform

  • This directory tree documents how to develop data science workflows on KSL platforms

  • A bespoke quickstart guide for working with data science workloads

  • We will list the meta modules and their contents

  • We will describe how to install software in a self-service model using package managers such as conda and Spack, as well as containers

  • Cray’s Machine Learning development environment will also be discussed here

  • Hyperparameter libraries will be discussed here

  • Example jobscripts for distributed frameworks

  • A high-level introduction to the NGC container registry

  • Parallel data processing tools and techniques will be discussed, e.g. Dask and RAPIDS on CPUs and GPUs

  • Accelerating machine learning using multithreaded scikit-learn
    • Porting ML from CPUs to GPUs

  • Deep Learning on AMD Genoa CPUs, for workloads that do not qualify for GPUs or that involve model development on modest-sized datasets