
Data Preparation

Data preparation is a critical step for the askCSM use case: it ensures that the dataset is clean, accurate, and well suited for processing by large language models (LLMs). In this project, a CSV file serves as the primary data source. Preprocessing has two parts: making sure the file contains no missing values, and contextualizing each row by converting it into a coherent natural-language paragraph. The second part matters because a model understands and answers from full sentences more reliably than from bare column values. Once the data is preprocessed, the resulting CSV file is imported into Watson Discovery, where it serves as the knowledge repository for the virtual assistant. This allows the assistant to provide accurate and contextually relevant answers to customer inquiries about different software products.
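The two preprocessing parts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual script: the column names (product, version, support_status), the "not specified" placeholder for missing values, and the added "text" column are all hypothetical, and the project may handle missing values differently (for example, by dropping rows). The import into Watson Discovery is not shown.

```python
import csv
import io


def row_to_paragraph(row):
    """Join the cells of one row into sentences of the form 'The <column> is <value>.'"""
    sentences = [
        f"The {col.replace('_', ' ')} is {val}." for col, val in row.items()
    ]
    return " ".join(sentences)


def preprocess(csv_text, fill_value="not specified"):
    """Fill missing cells and add a natural-language 'text' field to every row.

    Assumption: a missing cell is an empty or whitespace-only value, and it is
    replaced with `fill_value` (a hypothetical placeholder) rather than dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        cleaned = {}
        for col, val in row.items():
            if col is None:
                # Extra cells beyond the header have no column name; skip them.
                continue
            val = (val or "").strip()
            cleaned[col.strip()] = val if val else fill_value
        cleaned["text"] = row_to_paragraph(cleaned)
        rows.append(cleaned)
    return rows


if __name__ == "__main__":
    # Hypothetical sample data; the second cell of the last column is missing.
    sample = "product,version,support_status\nExampleDB,11.5,\n"
    for record in preprocess(sample):
        print(record["text"])
```

The rows returned by preprocess can then be written back to a CSV file with csv.DictWriter, so that the "text" column travels with the original fields into the import step.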

For the detailed data preprocessing steps, see this link.