Log 19
Objective
Deploy watsonx.ai on self-managed AWS infrastructure for customer software evaluation
Milestones
- Deploy and configure a boot node to establish a beachhead in the customer AWS environment
  - Complete
- Deploy OCP using the documented UPI installation steps
  - Complete
- Install Cloud Pak for Data
  - In Progress
- Deploy and configure watsonx.ai on self-managed AWS infrastructure in the reference environment, and document the process
  - In Progress
Summary
- Awaiting entitlement key approval on customer side
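Once the entitlement key is approved, it is typically used as the registry credential for cp.icr.io. The sketch below shows one way the corresponding pull-secret entry could be assembled. It assumes the standard entitled-registry convention of the literal username `cp` with the key as the password; the function names and the sample key are hypothetical and only for illustration.

```python
import base64
import json


def cp_icr_auth_entry(entitlement_key: str) -> dict:
    """Build the 'auths' entry for cp.icr.io from an entitlement key.

    Assumption: the registry username is the literal string "cp" and the
    entitlement key is the password.
    """
    token = base64.b64encode(f"cp:{entitlement_key}".encode()).decode()
    return {"cp.icr.io": {"auth": token}}


def merge_pull_secret(existing: dict, entitlement_key: str) -> dict:
    """Return a copy of an existing pull secret with cp.icr.io added."""
    merged = {"auths": dict(existing.get("auths", {}))}
    merged["auths"].update(cp_icr_auth_entry(entitlement_key))
    return merged


if __name__ == "__main__":
    # "EXAMPLE_KEY" is a placeholder, not a real entitlement key.
    print(json.dumps(merge_pull_secret({"auths": {}}, "EXAMPLE_KEY"), indent=2))
```

The merged JSON would then be applied to the cluster's global pull secret or to a namespace-scoped secret, depending on how the install is set up.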
Decisions and Action Items (DAI)
- Software evaluation is awaiting the customer's approval process, which blocks our ability to download software from cp.icr.io
  - Customer to provide approval by EOD Monday
  - Worker nodes shut down until approval comes through
- Drafted and sent instructions for the customer to resize the worker node disks for when the cluster is brought back online
- Drafted and sent instructions for the customer to order a GPU node
  - GPU node to be added to the cluster and then cordoned, drained, and shut down
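The cordon, drain, and shut down sequence for the GPU node could look roughly like the following. This is a minimal sketch that only builds the command lines; the node name and EC2 instance ID are hypothetical placeholders, and the drain flags shown are a common choice rather than the ones used in the instructions sent to the customer.

```python
import shlex


def gpu_node_park_commands(node_name: str, instance_id: str) -> list[list[str]]:
    """Return the commands to cordon, drain, and stop a node, in order."""
    return [
        # Mark the node unschedulable so no new pods land on it.
        ["oc", "adm", "cordon", node_name],
        # Evict running pods; DaemonSet pods are skipped and emptyDir data is discarded.
        ["oc", "adm", "drain", node_name,
         "--ignore-daemonsets", "--delete-emptydir-data"],
        # Stop (not terminate) the underlying EC2 instance.
        ["aws", "ec2", "stop-instances", "--instance-ids", instance_id],
    ]


if __name__ == "__main__":
    # Both identifiers below are placeholders.
    for cmd in gpu_node_park_commands("gpu-worker-0", "i-0123456789abcdef0"):
        print(shlex.join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try; each list could be passed to `subprocess.run(cmd, check=True)` once the names are confirmed.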
Lessons Learned
- The OpenShift sizing prepared for Cloud Pak for Data had to be adjusted because CPU resources were under-provisioned
- The watsonx.ai service requires larger local disks on worker nodes (500 GB)
- The GPU node required for watsonx.ai seems to be a limited resource
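For the disk size lesson above, a small helper can flag which worker volumes fall short of 500 GB and emit the matching resize commands. This sketch assumes the worker disks are EBS volumes and that the 500 GB figure is applied as 500 GiB; the volume IDs and current sizes are made up for illustration.

```python
import shlex

REQUIRED_GIB = 500  # assumption: the 500 GB requirement is applied as 500 GiB


def resize_commands(volumes: dict[str, int],
                    target_gib: int = REQUIRED_GIB) -> list[list[str]]:
    """Return `aws ec2 modify-volume` commands for volumes below the target.

    `volumes` maps an EBS volume ID to its current size in GiB. Volumes that
    already meet the target are skipped, since EBS volumes can only grow.
    """
    return [
        ["aws", "ec2", "modify-volume",
         "--volume-id", vol_id, "--size", str(target_gib)]
        for vol_id, size in sorted(volumes.items())
        if size < target_gib
    ]


if __name__ == "__main__":
    # Hypothetical inventory of worker node volumes.
    inventory = {"vol-0aaa": 120, "vol-0bbb": 500, "vol-0ccc": 300}
    for cmd in resize_commands(inventory):
        print(shlex.join(cmd))
```

Growing the EBS volume does not by itself grow the partition or filesystem inside the node; that step still has to happen in the operating system and is not shown here.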
Next Steps
- License and configure Cloud Pak for Data
- Cloud Pak Considerations
  - Security scans needed on container images
  - Customer requires on-prem, offline install
  - Customer uses their own container registry, which might introduce extra effort or compatibility issues
  - Version compatibility with OpenShift (e.g. 4.10 required and customer has 4.11)
  - Supported storage not available
  - Multiple Cloud Paks on the same cluster
  - Custom connections to data sources not supported out of the box
  - AWS-specific: IAM users are required for install/deploy but are not allowed
  - OpenShift-specific: CoreOS requirement for control nodes
  - Automatic updating of the Cloud Pak can interrupt engagements (the solution is to always remove update polling from operators)
- Resize local disks for worker nodes
- Customer to order a GPU node and attach it to the cluster
- Cloud Pak Considerations
- Deploy watsonx.ai
- Application configurations
- Application validations
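One of the Cloud Pak considerations above is removing update polling from operators. A plausible reading, and it is only an assumption here, is that this refers to the `spec.updateStrategy.registryPoll` setting on an Operator Lifecycle Manager CatalogSource. The sketch below strips that setting from a CatalogSource manifest represented as a plain dictionary; the catalog name and image are hypothetical.

```python
import copy


def without_registry_poll(catalog_source: dict) -> dict:
    """Return a copy of a CatalogSource manifest with registry polling removed."""
    result = copy.deepcopy(catalog_source)
    strategy = result.get("spec", {}).get("updateStrategy")
    if isinstance(strategy, dict):
        strategy.pop("registryPoll", None)
        # Drop the now-empty updateStrategy block so the manifest stays tidy.
        if not strategy:
            del result["spec"]["updateStrategy"]
    return result


if __name__ == "__main__":
    # Hypothetical manifest; name and image are placeholders.
    manifest = {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": "example-catalog",
                     "namespace": "openshift-marketplace"},
        "spec": {
            "sourceType": "grpc",
            "image": "registry.example.com/catalog:latest",
            "updateStrategy": {"registryPoll": {"interval": "45m"}},
        },
    }
    print(without_registry_poll(manifest)["spec"])
```

Setting `installPlanApproval: Manual` on the operator Subscriptions is another commonly used way to keep upgrades from being applied without review.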