
Usage Documentation


Introduction

...

Objective

This document details how the product workflow has been designed around the use cases of end users. It explains the flow of the application, from injecting the datasets to how the data is uploaded to cloud storage.

Purpose: The purpose of this document is to describe how to upload data to cloud storage using the ingestion APIs so that it can be processed and ingested into the datasets.

Step-wise Ingestion process

The ingestion of the data can be done in two ways:

  1. Using CSV import

  2. Using the Ingestion API

Using CSV import

API: ingestion/csv

HTTP Method: POST

Note : Before starting to use this API, we first need to run the spec APIs for the CSV import, as it depends on the schema. Based on the schema, it will validate the data.

A. CSV import with Event

To ingest the data for an event, we need to run the spec/event API first. Attach the event CSV file to the API request body; it takes three parameters, as shown in the example below.

...

The parameters of the request body are:

1. file : attach the CSV file for importing

2. ingestion_type : specify the type of ingestion

3. ingestion_name : the name of the event
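A minimal sketch of this request as a curl command (the domain, file name, and event name are placeholder values, not confirmed by this document):

    curl -X POST "<domain_name>/ingestion/csv" \
      -F "file=@event_data.csv" \
      -F "ingestion_type=event" \
      -F "ingestion_name=student_attendance"

Here student_attendance is a hypothetical event name; it must match an event schema already created through the spec/event API.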

...

B. CSV import with Dataset

To ingest the data for a dataset, we need to attach the dataset CSV file to the request body, as shown in the below screenshot; it will call the ingestion/dataset API internally.

...

On hitting the POST API, it will validate the data and return an appropriate result with a success message.
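The request mirrors the event example above, with only the ingestion type and name changed (the dataset name is a hypothetical placeholder):

    curl -X POST "<domain_name>/ingestion/csv" \
      -F "file=@dataset_data.csv" \
      -F "ingestion_type=dataset" \
      -F "ingestion_name=student_attendance_by_school"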

C. CSV import with Dimension

We need to pass the CSV file in the request body, along with the ingestion type and ingestion name for the dimension, as shown in the below screenshot. It will call the ingestion/dimension API internally and generate the CSV files in the input-files folder.

...

It will validate the data, return a success message in the response, and generate a new CSV file.
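Again, only the ingestion type and name change (the dimension name is a hypothetical placeholder):

    curl -X POST "<domain_name>/ingestion/csv" \
      -F "file=@dimension_data.csv" \
      -F "ingestion_type=dimension" \
      -F "ingestion_name=school"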

Using the Ingestion API

Note : Before starting to use these APIs, we need to run the spec APIs. Based on the schema present in the database, the data present in the request body is validated using the AJV validator.

A. Execution of Event API

API : ingestion/event

HTTP Method: POST

Before calling the ingestion event API, the schema for the particular event name passed in the request body should be present in the database. The event data is then validated against the schema present in the database. After successful validation, the data is written to a CSV file and stored in the input-files folder.

...

The request body is :

...
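For illustration, the request body might look like the following (the event name and payload fields are hypothetical; the exact shape is defined in the spec YAML linked later in this document):

    {
      "event_name": "student_attendance",
      "event": {
        "school_id": "1001",
        "grade": "5",
        "date": "12/01/23",
        "total_students": 40,
        "students_present": 35
      }
    }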

B. Execution of Dimension API

API : ingestion/dimension

HTTP Method: POST

Note : To call the ingestion/dimension api, dimension schema for the particular dimension name should be present in the database.

The request body of the ingestion/dimension API is given in the below screenshot.

...

The POST API will validate the request body with the AJV validator; once the request body is validated, it will write the data to a CSV file stored in the input-files folder.
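A hedged sketch of a dimension request body (the dimension name and fields are hypothetical):

    {
      "dimension_name": "school",
      "dimension": {
        "school_id": "1001",
        "school_name": "ABC Public School",
        "district": "District A"
      }
    }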

C. Execution of Dataset API

API: ingestion/dataset

HTTP Method: POST

The request body for the API can be seen in the below screenshot.

...

The POST API will validate the request body with the AJV validator and write the data to a CSV file stored in the input-files folder.
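A hedged sketch of a dataset request body (the dataset name and fields are hypothetical):

    {
      "dataset_name": "student_attendance_by_school",
      "dataset": {
        "school_id": "1001",
        "date": "12/01/23",
        "attendance_percent": 87.5
      }
    }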

Execution of Schedule API

Note : To call the Schedule API, we first need to run the spec/pipeline API.

The request body of the Schedule API is given in the below screenshot.

The Schedule API helps to run the processor group at a particular time. It takes scheduled_at and pipeline_name in the request body, where pipeline_name is the processor group name present in NiFi.
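A minimal sketch of the request body, using the two fields named above (the pipeline name is a placeholder; this cron expression would fire every day at 02:00):

    {
      "pipeline_name": "event_pipeline",
      "scheduled_at": "0 2 * * *"
    }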

The transformer will read the csv files and ingest the data into the database.

...

Execution of File status API

There are two types of file-status API:

A. GET file-status API

B. PUT file-status API

A. Execution of GET file-status API

API: ingestion/file-status

HTTP Method: GET

To run the APIs, please use Postman as a tool. If Postman is already installed, skip the Setting up Postman section; otherwise install Postman by following the steps below.

Setting up Postman

  • Download the Postman application and import the collection.

  • Select the import option in Postman to import the collection. Please refer to the below screenshot.

...

The data can be ingested in two ways:

  1. Ingestion of data using CSV

  2. Ingestion of data using API

Ingestion of data using CSV

Ingestion of Dimension Data using CSV 

Select the csv_import folder in the Postman collection.

Step 1: Open the specified request & add the details

API Endpoint: <domain_name>/ingestion/new_programs

HTTP Method: POST      

...

This API will import the dimension CSV and upload it into the combined_input folder in the cloud if there are no errors.

Step 2: Build the request body with reference to the YAML file. The request body for the above API is described in the YAML: https://github.com/Sunbird-cQube/spec-ms/blob/dev/spec.yaml

Provide valid input details for the parameters shown below.

  • file : Attach the CSV file for importing

  • ingestion_type : Specify the type of ingestion

  • ingestion_name : Name of the dimension (the dimension name should be present in the database)
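A sketch of the full request as a curl command (the domain, file name, and dimension name are placeholders):

    curl -X POST "<domain_name>/ingestion/new_programs" \
      -F "file=@dimension_data.csv" \
      -F "ingestion_type=dimension" \
      -F "ingestion_name=school"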

...

Step 3: Click on the Send button; if the request is successful, the user should see a response message. Please refer to the below screenshot.

...

After successful execution of the CSV import API, we get a response, and we can check whether the file was uploaded through the GET file-status API. If the file was uploaded successfully, we will get the response as uploaded; if there is any error, the response will indicate that there was an error in the file.

Ingestion of Event Data using CSV

  • Select the csv_import folder from the Postman collection.

Step 1: Open the specified request & add the details

API Endpoint: <domain_name>/ingestion/new_programs

HTTP Method: POST  

This API will import the event CSV and upload it into the combined_input folder in the cloud if there are no errors. Then the adapter will use the same files to break the combined input down into multiple input files.

...

Step 2: Build the request body with reference to the YAML file. The request body for the above API is described in the YAML: https://github.com/Sunbird-cQube/spec-ms/blob/dev/spec.yaml

Provide valid input details for the parameters shown below.

  • file : Attach the CSV file for importing

  • ingestion_type : Specify the type of ingestion

  • ingestion_name : Name of the event (the event name should be present in the database)
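The request is the same as the dimension example above, with the ingestion type and name changed (the event name is a placeholder):

    curl -X POST "<domain_name>/ingestion/new_programs" \
      -F "file=@event_data.csv" \
      -F "ingestion_type=event" \
      -F "ingestion_name=student_attendance"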

...

Step 3: Click on the Send button; if the request is successful, the user should see a response message. Please refer to the below screenshot.

...

After successful execution of the CSV import API, we get a response, and we can check whether the file was uploaded using the GET file-status API. If the file was uploaded successfully, we will get the response as uploaded; if there are any errors, the response will indicate that there was an error in the file.

Things to take care of while ingesting Data / Debugging:

  • The date format should be correct. The accepted date format is DD/MM/YY.

  • The file name and the schema name stored in the table should be the same.

Error Monitoring

The error file will store all the error records from the ingestion process and will be uploaded to the appropriate cloud storage. The user can log in to the respective cloud storage and download the file to look at the error records present in the CSV. The below-mentioned steps specify how to access the error file.

  1. The cloud storage bucket/container will be named cQube-edu.

  2. Inside the bucket there will be multiple folders for the different processing stages; the ingestion error files will be stored in the ingestion_error folder. This folder will in turn contain a folder for each program, named the same as the <program_name>.

  3. The <program_name> folder will internally have a folder whose name is the current date.

  4. The user can access the current date folder, if it is present, and can view and download the error files in it to see the errors present in the CSV.

Please refer to the below screenshots of the different storage types (MinIO, Azure, and AWS) on how to access the error files.

Accessing error files in Minio Storage

...

Accessing error files in Azure

...

Accessing error files in AWS

...

To get the count of processed records and error records, the GET /ingestion/file-status API can be used.

File Status API 

  • GET file status API

GET file status API

Step 1:  Open the specified request & add the details

API Endpoint: <domain_name>/ingestion/file-status

HTTP Method: GET

With the help of this API, we can check the status of the uploaded file.

...

Step 2: Send the query parameters with reference to the YAML file. The query parameters for the above API are described in the YAML: https://github.com/Sunbird-cQube/spec-ms/blob/dev/spec.yaml

Provide valid input details for the parameters shown below.

...

Step 3: Click on the Send button; if the request is successful, the user should see a response message containing the status of the uploaded CSV file. Because the data present in the CSV is validated against the schema data, there may be a huge volume of data that needs to be validated, so this API helps us to know the file status of the uploaded CSV. If there are any errors present in the CSV, it will give us the status of the file.
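For illustration, the status check might be issued as follows (the query parameter names are placeholders, not confirmed by this document; consult the spec YAML above for the exact ones):

    curl -X GET "<domain_name>/ingestion/file-status?filename=event_data.csv&ingestion_type=event"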

B. Execution of PUT file-status API

API : ingestion/file-status

HTTP Method: PUT

This API is used to update the status of CSV files in the file_tracker table present in the DB. Once a CSV upload is successful, the files are stored in the input-files folder; this API is integrated to move those files into processing and then into the archived-files folder. With the help of this API, we update the file status in the file_tracker table based on whether it is in the processing state or the archived state.

The request body for the API is described in the YAML spec referenced above.

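As a hedged sketch, assuming hypothetical field names (filename and file_status are illustrative, not confirmed by the source), the body might look like:

    {
      "filename": "event_data.csv",
      "file_status": "Processing"
    }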

YAML File

YAML is a human-readable data serialization language, often used to create configuration files, and it can be used with any programming language.

YAML File

...

If the CSV files are too large, the upload process will take more time, which makes users wait longer to receive the response from the API. As a result, the upload process runs asynchronously, and this API was developed to know the status of the file at any particular time.

Schedule API

This API helps to schedule the processor group at any particular time.

Step 1: Open the specified request & add the details

API Endpoint: <domain_name>/spec/schedule

HTTP Method: POST

...

Step 2: Build the request body with reference to the YAML file: https://github.com/Sunbird-cQube/spec-ms/blob/dev/spec.yaml. Provide valid input details for the parameters shown below.

...

Step 3: Click on the send button for the request and if the request is successful the user should see a response message. The schedule time will be updated in the processor group.

...

The Schedule API helps to run the processor group in NiFi at a scheduled time. The schedule time can be updated by changing the cron expression for the scheduled_at property in the request body.
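For instance, setting scheduled_at to a different cron expression changes the schedule; the body below (pipeline name is a placeholder) would run the processor group every 30 minutes:

    {
      "pipeline_name": "event_pipeline",
      "scheduled_at": "*/30 * * * *"
    }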

V4-Data Emission API

This API is used for data migration. It helps to migrate the data to the cQube 4.0 version.

Step 1: Open the specified request and add the details.

API Endpoint: <domain_name>/ingestion/v4-data-emission

HTTP Method: GET

...

Step 2: Click on the Send button; if the request is successful, the user should see the response message “Files uploaded successfully”. The files will be uploaded to the emission folder created in the respective cloud storage.
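A minimal sketch of the call (the domain is a placeholder):

    curl -X GET "<domain_name>/ingestion/v4-data-emission"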

...

National Programs API

This API accepts the event data in the zip file format and adds it to the emission folder in the respective cloud storage.

Step 1: Open the specified request and add the details.

API Endpoint: <domain_name>/ingestion/national_programs

HTTP Method: POST

...

Step 2: Build the request body with reference to the YAML file: https://github.com/Sunbird-cQube/spec-ms/blob/dev/spec.yaml. Copy all of the content from this file and paste it into the Swagger Editor; with Swagger we can see the example request body for each API along with the expected response. Provide valid input details for the parameters shown below.
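A hedged sketch of the upload (the domain and zip file name are placeholders; the zip file carries the event data):

    curl -X POST "<domain_name>/ingestion/national_programs" \
      -F "file=@program_data.zip"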

...

Step 3: Click on the send button for the request and if the request is successful the user should see a response message as “File uploaded successfully”. The files will be uploaded to the emission folder created in the respective cloud storage.

The zip file will be extracted, read by the adapters, and then moved into the input folders.

...