Usage Documentation
Overview
This document details the product workflow: the process of ingesting data into cQube to create processed datasets, and how to access them.
Step-wise Ingestion process
Data can be ingested either by using CSV import or by using the ingestion APIs:
Using CSV import
API: ingestion/csv
HTTP Method: POST
Note: Before using this API, run the spec APIs, as CSV import depends on the schema. The data is validated based on that schema.
Events, Dimensions and Datasets can be ingested into cQube through the following steps:
A. CSV import with Event
To ingest event data, first run the spec/event API. Then attach the event file to the request body; the request takes three parameters, as shown in this example.
...
The parameters of the request body are:
1. file: Attach the CSV file to import.
2. ingestion_type: Specify the type of ingestion.
3. ingestion_name: Name of the event.
A runnable sketch of this request is shown after the screenshot below.
...
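For reference, the same request can also be sent outside Postman. Below is a minimal sketch using Python's requests library; the domain, file path, and event name are placeholder assumptions, and the form fields mirror the three parameters listed above.

```python
import requests

BASE_URL = "https://<domain_name>"  # replace with your cQube domain

def import_csv(file_path: str, ingestion_type: str, ingestion_name: str):
    """POST a CSV file to ingestion/csv as multipart form data."""
    with open(file_path, "rb") as f:
        response = requests.post(
            f"{BASE_URL}/ingestion/csv",
            files={"file": f},
            data={
                "ingestion_type": ingestion_type,
                "ingestion_name": ingestion_name,
            },
        )
    return response

# Event import: the file name and event name here are illustrative.
print(import_csv("student_attendance.csv", "event", "student_attendance").text)
```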
B. CSV import with Dataset
To ingest data for a dataset, attach the dataset CSV file to the request body, as shown in the screenshot below; the ingestion/dataset API is called internally.
...
On hitting the POST API, the data is validated and an appropriate result is returned with a success message.
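Assuming the import_csv helper from the sketch in section A, a dataset import differs only in the ingestion type and name (both values below are illustrative):

```python
# Hypothetical dataset import; ingestion_type becomes "dataset".
response = import_csv("attendance_by_school.csv", "dataset",
                      "attendance_by_school")
print(response.text)  # expected: a success message after validation
```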
C. CSV import with Dimension
Pass the CSV file in the request body along with the ingestion type and ingestion name for the dimension, as shown in the screenshot below. The ingestion/dimension API is called internally and generates the CSV files in the input-files folder.
...
The data is validated, a success message is returned in the response, and a new CSV file is generated.
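Again assuming the import_csv helper from section A, a dimension import passes the dimension's type and name (illustrative values):

```python
# Hypothetical dimension import; the generated CSV lands in input-files.
response = import_csv("school_dimension.csv", "dimension", "school")
print(response.text)
```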
Using the Ingestion API
Note: Before using this API, run the spec APIs. Based on the schema present in the database, the data in the request body is validated using the AJV validator.
A. Execution of Event API
API : ingestion/event
HTTP Method: POST
Before calling the ingestion/event API, the schema for the event name passed in the request body must be present in the database. The event data is then validated against that schema. After successful validation, the data is written to a CSV file stored in the input-files folder.
...
The request body is:
...
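The screenshot above shows the authoritative body; as a sketch only, the call might look like the following. The payload shape (an event_name plus an event object) and the field names are assumptions to be checked against the schema registered through the spec APIs.

```python
import requests

BASE_URL = "https://<domain_name>"  # placeholder

# Illustrative payload; real keys and fields must match the event schema.
payload = {
    "event_name": "student_attendance",
    "event": {
        "school_id": "SC001",
        "date": "2023-01-10",
        "students_present": 42,
    },
}

response = requests.post(f"{BASE_URL}/ingestion/event", json=payload)
print(response.status_code, response.text)
```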
B. Execution of Dimension API
API: ingestion/dimension
HTTP Method: POST
Note: To call the ingestion/dimension API, the dimension schema for the particular dimension name must be present in the database.
The request body of the ingestion/dimension API is shown in the screenshot below.
...
The POST API validates the request body with the AJV validator; once validation succeeds, it writes the data to a CSV file stored in the input-files folder.
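A dimension request would follow the same pattern; the key names below are assumptions to verify against the registered dimension schema.

```python
import requests

BASE_URL = "https://<domain_name>"  # placeholder

# Illustrative body; AJV validates it against the stored dimension schema.
payload = {
    "dimension_name": "school",
    "dimension": {"school_id": "SC001", "school_name": "ABC School"},
}
print(requests.post(f"{BASE_URL}/ingestion/dimension", json=payload).text)
```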
C. Execution of Dataset API
API: ingestion/dataset
HTTP Method: POST
The request body for the API is shown in the screenshot below.
...
The POST API validates the request body with the AJV validator and writes the data to a CSV file stored in the input-files folder.
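A similar sketch for datasets; again, the payload keys are assumptions rather than the confirmed format.

```python
import requests

BASE_URL = "https://<domain_name>"  # placeholder

# Illustrative body; the payload must match the registered dataset schema.
payload = {
    "dataset_name": "attendance_by_school",
    "dataset": {"school_id": "SC001", "attendance_pct": 87.5},
}
print(requests.post(f"{BASE_URL}/ingestion/dataset", json=payload).text)
```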
Execution of Schedule API
Note: To call the schedule API, first run the spec/pipeline API.
The schedule API runs a processor group at a particular time. It takes scheduled_at and pipeline_name in the request body, where pipeline_name is the name of the processor group present in NiFi. The transformer reads the CSV files and ingests the data into the database. The request body of the schedule API is shown in the screenshot below.
...
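A sketch of the call follows; the endpoint path, the timestamp format, and the pipeline name are all assumptions, since only the two body fields are documented above.

```python
import requests

BASE_URL = "https://<domain_name>"  # placeholder

# pipeline_name must match a NiFi processor group; the path and the
# timestamp format shown here are assumptions.
payload = {
    "pipeline_name": "attendance_pipeline",
    "scheduled_at": "2023-01-10T02:00:00Z",
}
print(requests.post(f"{BASE_URL}/ingestion/schedule", json=payload).text)
```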
Execution of File status API
There are two types of file-status API:
A. GET file-status API
B. PUT file-status API
A. Execution of GET file-status API
API: ingestion/file-status
HTTP Method: GET
This API checks the status of an uploaded CSV file. Because the CSV data is validated against the schema, a large volume of data may need to be validated; this API reports the processing status of the uploaded file, including whether any errors are present in the CSV. The request body is:
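As an illustration, assuming the uploaded file is identified by a query parameter (the parameter name below is a guess; the Postman collection has the authoritative request):

```python
import requests

BASE_URL = "https://<domain_name>"  # placeholder

# Assumed query parameter; the response should report the file's status.
response = requests.get(
    f"{BASE_URL}/ingestion/file-status",
    params={"file_name": "student_attendance.csv"},
)
print(response.text)
```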
B. Execution of PUT file-status API
API : ingestion/file-status
HTTP Method: PUT
This API updates the status of CSV files in the file_tracker table in the database. Once a CSV upload succeeds, the file is stored in the input-files folder; this API is then used to move the file to the processing folder and on to the archived-files folder, updating the file status in the file_tracker table to reflect whether the file is in the processing or archived state.
The request body for the API is:
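A sketch based on the description above; the field names and status values are assumptions, not the confirmed contract.

```python
import requests

BASE_URL = "https://<domain_name>"  # placeholder

# Assumed fields: the file to update and its new state
# ("processing" or "archived", per the description above).
payload = {
    "file_name": "student_attendance.csv",
    "file_status": "processing",
}
print(requests.put(f"{BASE_URL}/ingestion/file-status", json=payload).text)
```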
Yaml File
YAML is a human-readable data serialization language, often used to write configuration files, and can be used with any programming language.
...
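For instance, a small generic fragment (an illustration of YAML syntax, not an excerpt from cQube's spec.yaml) looks like this:

```yaml
# Key-value pairs, nesting by indentation, and lists with "-".
ingestion:
  input_folder: input-files
  formats:
    - csv
  validate: true
```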
Generic flow before running the APIs
Open the Postman application and import the provided Postman collection. To import the collection, follow these steps:
Download the provided Postman collection.
Select the Import option in Postman and import the downloaded JSON file.
Ingestion of Data using CSV
Select the csv_import folder in the Postman collection.
Step 1: Open the specified request and add the details.
API Endpoint: <domain_name>/ingestion/csv
HTTP Method: POST
Step 2: Build the request body with reference to the YAML file. The request body for the above API is attached in the screenshot below. YAML link: https://github.com/Sunbird-cQube/spec-ms/blob/dev/spec.yaml. Copy all the content from this file and paste it into the Swagger Editor; Swagger then shows the example request body and expected response for each API.
Step 3: Provide valid input details for the parameters shown below:
file: Attach the CSV file to import.
ingestion_type: Specify the type of ingestion.
ingestion_name: Name of the event.
Step 4: Click the Send button. If the request is successful, the user should see a success response message.
Event Import API
Dimension Import API