Log 19 🛫

Objective

Deploy watsonx.ai on self-managed AWS infrastructure for customer software evaluation

Milestones

  1. Deploy and configure the boot node to establish a beachhead in the customer AWS environment
    • Complete
  2. Deploy OCP using the documented UPI installation steps
    • Complete
  3. Install Cloud Pak for Data
    • In Progress
  4. Deploy and configure watsonx.ai on self-managed AWS infrastructure in the reference environment, and document the process
    • In Progress

Summary

  • Awaiting entitlement key approval on the customer side

Decisions and Action Items (DAI)

  • The software evaluation is awaiting the customer's approval process, which blocks our ability to download software from cp.icr.io
    • Customer to provide approval by EOD Monday
  • Worker nodes are shut down until approval comes through
  • Drafted and sent instructions for the customer to resize the worker node disks for when the cluster is brought back online
  • Drafted and sent instructions for the customer to order a GPU Node
    • GPU node to be added to the cluster and then cordoned, drained, and shut down
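
The cordon, drain, and shut down sequence for the GPU node can be sketched as a small helper that only assembles the commands, so they can be reviewed before anyone runs them. This is a minimal sketch, not the procedure sent to the customer: the function name `park_node_commands`, the node name, and the instance ID are hypothetical, and it assumes the node is an EC2 instance that is stopped (not terminated) with the AWS CLI after draining.

```python
from typing import List


def park_node_commands(node_name: str, instance_id: str) -> List[List[str]]:
    """Return, in order, the CLI commands to cordon, drain, and stop a node.

    Nothing is executed here; the caller decides how and when to run them.
    """
    return [
        # Mark the node unschedulable so no new pods are placed on it.
        ["oc", "adm", "cordon", node_name],
        # Evict running pods. DaemonSet-managed pods are skipped and
        # emptyDir data on the node is discarded.
        ["oc", "adm", "drain", node_name,
         "--ignore-daemonsets", "--delete-emptydir-data"],
        # Assumption: the node is an EC2 instance that should be stopped,
        # not terminated, so it can be started again later.
        ["aws", "ec2", "stop-instances", "--instance-ids", instance_id],
    ]


if __name__ == "__main__":
    # Hypothetical node name and instance ID, for illustration only.
    for cmd in park_node_commands("gpu-worker-0.example.internal",
                                  "i-0123456789abcdef0"):
        print(" ".join(cmd))
```

Keeping the commands as lists rather than one shell string means they can be passed to `subprocess.run` later without quoting issues.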

Lessons Learned

  • The cluster sizing prepared for Cloud Pak for Data on OpenShift under-provisioned CPU resources and had to be adjusted
  • The watsonx.ai service requires larger local disks on worker nodes (500 GB)
  • The GPU node required for watsonx.ai seems to be a limited resource
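
The disk lesson lends itself to a quick pre-check before the cluster is brought back online. The sketch below flags worker nodes whose local disk is below the 500 GB figure noted above. The function name `undersized_workers` and the sample node names and sizes are hypothetical, and collecting the actual disk sizes is left to whatever inventory the customer already has.

```python
from typing import Dict, List

# Local disk size per worker node noted in the lessons above.
REQUIRED_DISK_GB = 500


def undersized_workers(disk_gb_by_node: Dict[str, int],
                       required_gb: int = REQUIRED_DISK_GB) -> List[str]:
    """Return the names of nodes whose local disk is below the requirement."""
    return sorted(name for name, size in disk_gb_by_node.items()
                  if size < required_gb)


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = {"worker-0": 120, "worker-1": 500, "worker-2": 300}
    print(undersized_workers(inventory))
```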

Next Steps

  • License and configure Cloud Pak for Data
    • Cloud Pak Considerations
      • Security scans needed on container images
      • Customer requires on-prem, offline install
      • Customer uses their own container registry, which might introduce extra effort or compatibility issues
      • Version compatibility with OpenShift (e.g. 4.10 required and customer has 4.11)
      • Supported storage not available
      • Multiple Cloud Paks on the same cluster
      • Custom connections to data sources not supported out of the box (OOTB)
      • AWS-specific: IAM users are required for install/deploy but are not allowed
      • OpenShift-specific: CoreOS requirement for control nodes
      • Automatic updating of the Cloud Pak can interrupt engagements (the solution is to always remove update polling from operators)
    • Resize local disks for worker nodes
    • Customer to order a GPU node and attach it to the cluster
  • Deploy watsonx.ai
  • Application configurations
  • Application validations
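
Of the Cloud Pak considerations above, the OpenShift version mismatch is the easiest to catch early with a script. The following is a minimal sketch of such a check. The function names are hypothetical, and the list of supported releases has to come from the Cloud Pak documentation for the release being installed; the values below only mirror the example in the list (4.10 required, customer on 4.11).

```python
from typing import Iterable, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Reduce a version such as '4.11.9' to its (major, minor) pair."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(cluster_version: str, supported: Iterable[str]) -> bool:
    """Return True if the cluster's major.minor matches a supported release."""
    cluster = parse_minor(cluster_version)
    return any(parse_minor(s) == cluster for s in supported)


if __name__ == "__main__":
    # Mirrors the example above: 4.10 is required, the customer runs 4.11.
    print(is_supported("4.11.9", ["4.10"]))
    print(is_supported("4.10.3", ["4.10"]))
```

Comparing parsed integer pairs instead of raw strings avoids treating "4.1" and "4.10" as the same release.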