...
It reads the state-specific config.json file located in the ingest folder.
It then processes all the dimension grammars in the dimensions folder.
The dimension grammars are stored in the "spec.dimensionGrammar" table, and the dimension tables are created in the dimensions schema.
For each dimension grammar file, it also looks for the data files whose names match that grammar file's name and ingests the dimension data into the corresponding table.
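The file matching step can be sketched in TypeScript. This is a minimal illustration, not the tool's implementation. It assumes the fileNameFormat value from the sample config.json is applied literally, that ${dimensionName} contains no dots, and that ${index} is a numeric part counter. The helpers formatToRegex, dimensionForDataFile and groupDataFiles, and the file names in the usage lines, are hypothetical.

```typescript
// Hypothetical helper: turns a fileNameFormat such as
// "${dimensionName}.${index}.dimensions.data.csv" into an anchored RegExp.
// Assumption: ${dimensionName} has no dots and ${index} is a non-negative integer.
function formatToRegex(fileNameFormat: string): RegExp {
  // Escape regex metacharacters so the literal parts (dots, braces, $) match exactly.
  const escaped = fileNameFormat.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace("\\$\\{dimensionName\\}", "(?<dimensionName>[^.]+)")
    .replace("\\$\\{index\\}", "(?<index>\\d+)");
  return new RegExp(`^${pattern}$`);
}

// Returns the dimension a data file belongs to, or null if the name does not fit the format.
function dimensionForDataFile(
  fileName: string,
  fileNameFormat: string,
): { dimensionName: string; index: number } | null {
  const match = formatToRegex(fileNameFormat).exec(fileName);
  if (!match || !match.groups) return null;
  return { dimensionName: match.groups.dimensionName, index: Number(match.groups.index) };
}

// Groups data files by dimension so each dimension grammar can pick up all of its parts.
function groupDataFiles(fileNames: string[], fileNameFormat: string): Map<string, string[]> {
  const byDimension = new Map<string, string[]>();
  for (const name of fileNames) {
    const parsed = dimensionForDataFile(name, fileNameFormat);
    if (!parsed) continue; // not a data file, e.g. a grammar file
    const list = byDimension.get(parsed.dimensionName) ?? [];
    list.push(name);
    byDimension.set(parsed.dimensionName, list);
  }
  return byDimension;
}

// Usage with hypothetical file names: two parts of "district", one grammar file that is skipped.
const fmt = "${dimensionName}.${index}.dimensions.data.csv";
const grouped = groupDataFiles(
  [
    "district.0.dimensions.data.csv",
    "district.1.dimensions.data.csv",
    "district.grammar.csv",
    "school.0.dimensions.data.csv",
  ],
  fmt,
);
console.log(grouped.get("district")); // both district parts
```

Compiling the format into a regular expression keeps the naming rule in one place (config.json) instead of hard-coding it in the ingestion code.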
After the dimensions are ingested, the programs array in config.json is read and the event grammars are processed from each program's folder (the <program-name> folder given by that program's input.files path). The event grammars are stored in the spec."EventGrammars" table.
The dataset grammars are stored in the spec."datasetGrammars" table, and the dataset tables are created from the combinations of timeDimension, dimension, and metric present in the event grammars.
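The cross product described above can be pictured with a short TypeScript sketch. It is a deliberate simplification: the EventGrammarSummary shape, the sample granularities ("daily", "weekly"), the metric name "total_students" and the dataset naming pattern are all hypothetical. They only illustrate how one dataset per timeDimension, dimension and metric combination could be enumerated.

```typescript
// Hypothetical, simplified view of what an event grammar declares.
interface EventGrammarSummary {
  dimensions: string[]; // e.g. ["district", "school"]
  timeDimensions: string[]; // assumed granularities, e.g. ["daily", "weekly"]
  metrics: string[]; // e.g. ["total_students"]
}

// One dataset per (metric, timeDimension, dimension) triple.
interface DatasetSpec {
  name: string;
  metric: string;
  timeDimension: string;
  dimension: string;
}

// Builds the full cross product. The name pattern is illustrative only,
// not the tool's real table-naming convention.
function datasetCombinations(namespace: string, grammar: EventGrammarSummary): DatasetSpec[] {
  const specs: DatasetSpec[] = [];
  for (const metric of grammar.metrics) {
    for (const timeDimension of grammar.timeDimensions) {
      for (const dimension of grammar.dimensions) {
        specs.push({
          name: `${namespace}_${metric}_${timeDimension}_${dimension}`,
          metric,
          timeDimension,
          dimension,
        });
      }
    }
  }
  return specs;
}

const example = datasetCombinations("sch_att", {
  dimensions: ["district", "school"],
  timeDimensions: ["daily", "weekly"],
  metrics: ["total_students"],
});
console.log(example.length); // 4
console.log(example[0].name); // sch_att_total_students_daily_district
```

The number of generated tables grows multiplicatively (metrics × time dimensions × dimensions), which is one reason an explicit list of extra combinations is kept separate.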
In addition to these generated datasets, the user can specify further dimension combinations for which datasets should be created, using the whitelisted array in each program's dimensions object.
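A sketch of how whitelisted entries could be merged with the generated combinations. In the sample config.json each entry is a comma-separated list of dimension names. Beyond that, the following are assumptions rather than documented behavior: that globals.onlyCreateWhitelisted set to true suppresses the generated combinations, that blacklisted entries are removed afterwards, and that identical entries collapse to one dataset. The function names are hypothetical.

```typescript
// Hypothetical helper: "state,grade,subject" -> ["state", "grade", "subject"].
function parseCombination(entry: string): string[] {
  return entry
    .split(",")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);
}

// Key used to detect repeated entries; keeps the author's dimension order.
function comboKey(dims: string[]): string {
  return dims.join(",");
}

// Assumptions (not confirmed by this page): onlyCreateWhitelisted=true drops the
// generated combinations, and blacklisted entries are removed last.
function resolveCombinations(
  generated: string[][],
  whitelisted: string[],
  blacklisted: string[],
  onlyCreateWhitelisted: boolean,
): string[][] {
  const result = new Map<string, string[]>();
  const add = (dims: string[]) => result.set(comboKey(dims), dims);
  if (!onlyCreateWhitelisted) generated.forEach(add);
  whitelisted.map(parseCombination).forEach(add); // repeated entries collapse onto one key
  for (const entry of blacklisted) result.delete(comboKey(parseCombination(entry)));
  return Array.from(result.values());
}

const combos = resolveCombinations(
  [["district"]],
  ["textbookdiksha,grade,subject,medium", "textbookdiksha,grade,subject,medium"],
  [],
  true,
);
console.log(combos.length); // 1: the repeated whitelisted entry yields a single combination
```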
Sample config.json file
{
  "globals": {
    "onlyCreateWhitelisted": true
  },
  "dimensions": {
    "namespace": "dimensions",
    "fileNameFormat": "${dimensionName}.${index}.dimensions.data.csv",
    "input": {
      "files": "./ingest/JH/dimensions"
    }
  },
  "programs": [
    {
      "name": "DIKSHA",
      "namespace": "diksha",
      "description": "DIKSHA",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/diksha"
      },
      "./output": {
        "location": "./output/programs/diksha"
      },
      "dimensions": {
        "whitelisted": [
          "state,grade,subject,medium,board",
          "textbookdiksha,grade,subject,medium"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "School Attendance",
      "namespace": "sch_att",
      "description": "School Attendance",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/school-attendance"
      },
      "./output": {
        "location": "./output/programs/school-attendance"
      },
      "dimensions": {
        "whitelisted": [
          "gender,district",
          "gender,block",
          "gender,cluster",
          "school,grade",
          "gender,school",
          "gender,school,grade",
          "schoolcategory,district",
          "schoolcategory,block",
          "schoolcategory,cluster"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "PM Poshan",
      "namespace": "pm_poshan",
      "description": "PM Poshan",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/pm-poshan"
      },
      "./output": {
        "location": "./output/programs/pm-poshan"
      },
      "dimensions": {
        "whitelisted": [
          "district,categorypm"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "NAS",
      "namespace": "nas",
      "description": "NAS",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/nas"
      },
      "./output": {
        "location": "./output/programs/nas"
      },
      "dimensions": {
        "whitelisted": [
          "district,lo,subject,grade",
          "state,lo,subject,grade"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "UDISE",
      "namespace": "udise",
      "description": "UDISE",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/udise"
      },
      "./output": {
        "location": "./output/programs/udise"
      },
      "dimensions": {
        "whitelisted": [
          "district,categoryudise",
          "state,categoryudise"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "PGI",
      "namespace": "pgi",
      "description": "PGI",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/pgi"
      },
      "./output": {
        "location": "./output/programs/pgi"
      },
      "dimensions": {
        "whitelisted": [
          "state,district,categorypgi",
          "state,categorypgi"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "NISHTHA",
      "namespace": "nishtha",
      "description": "NISHTHA",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/nishtha"
      },
      "./output": {
        "location": "./output/programs/nishtha"
      },
      "dimensions": {
        "whitelisted": [
          "state,district,programnishtha",
          "state,programnishtha,coursenishtha",
          "state,programnishtha",
          "district,programnishtha"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "Student Progression",
      "namespace": "student_progression",
      "description": "Student Progression",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/student-progression"
      },
      "./output": {
        "location": "./output/programs/student-progression"
      },
      "dimensions": {
        "whitelisted": [
          "school,academicyear"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "School Infrastructure",
      "namespace": "school_infra",
      "description": "School Infrastructure",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/school-infra"
      },
      "./output": {
        "location": "./output/programs/school-infra"
      },
      "dimensions": {
        "whitelisted": [
          "school,academicyear"
        ],
        "blacklisted": []
      }
    },
    {
      "name": "Student Assessment",
      "namespace": "assessment",
      "description": "Student Assessment",
      "shouldIngestToDB": true,
      "input": {
        "files": "./ingest/JH/programs/student-assessment"
      },
      "./output": {
        "location": "./output/programs/student-assessment"
      },
      "dimensions": {
        "whitelisted": [
          "exam,grade,academicyear,subject,lo,school",
          "state,lo,subject,grade",
          "district,subject,grade"
        ],
        "blacklisted": []
      }
    }
  ]
}
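Because config.json drives every step, it can help to give it a type and check it on load. The interfaces below copy the field names from the sample (including the "./output" key exactly as written). Which fields are required, the checks performed, the assumed file location and the helper names parseIngestConfig and loadIngestConfig are assumptions for illustration, not part of the tool.

```typescript
import { readFileSync } from "node:fs";

// Shape of config.json as it appears in the sample; field names copied verbatim.
// Treating every field as required is an assumption.
interface ProgramConfig {
  name: string;
  namespace: string;
  description: string;
  shouldIngestToDB: boolean;
  input: { files: string };
  "./output": { location: string };
  dimensions: { whitelisted: string[]; blacklisted: string[] };
}

interface IngestConfig {
  globals: { onlyCreateWhitelisted: boolean };
  dimensions: {
    namespace: string;
    fileNameFormat: string;
    input: { files: string };
  };
  programs: ProgramConfig[];
}

// Parses the JSON text and runs a few sanity checks suited to a hand-edited file.
export function parseIngestConfig(text: string): IngestConfig {
  const config = JSON.parse(text) as IngestConfig;
  if (!Array.isArray(config.programs)) {
    throw new Error("config.json: 'programs' must be an array");
  }
  const seen = new Set<string>();
  for (const program of config.programs) {
    // Two programs sharing a namespace would most likely clash in the database.
    if (seen.has(program.namespace)) throw new Error(`duplicate namespace: ${program.namespace}`);
    seen.add(program.namespace);
    // Warn about repeated whitelisted entries, which are most likely copy-paste slips.
    const entries = program.dimensions?.whitelisted ?? [];
    const dupes = entries.filter((e, i) => entries.indexOf(e) !== i);
    if (dupes.length > 0) {
      console.warn(`${program.namespace}: repeated whitelisted entries: ${dupes.join(" | ")}`);
    }
  }
  return config;
}

// Hypothetical loader; the path is an assumed location such as "./ingest/JH/config.json".
export function loadIngestConfig(path: string): IngestConfig {
  return parseIngestConfig(readFileSync(path, "utf8"));
}
```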
yarn cli ingest-data: This command ingests the data into the dataset tables for all programs. It also provides an option to ingest the data for a single program.
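The exact option for a single program is not spelled out here, so no flag is shown. The TypeScript sketch below only illustrates the selection such an option implies: with no argument every program is used, otherwise the requested value is matched against a program's name or namespace. The matching rules, the error message and the skipping of programs whose shouldIngestToDB is false are assumptions, and selectPrograms is a hypothetical helper.

```typescript
// Minimal program shape needed here; field names match the sample config.json.
interface ProgramEntry {
  name: string;
  namespace: string;
  shouldIngestToDB: boolean;
}

// Hypothetical selection logic: no argument means "all programs"; otherwise match the
// requested value against name or namespace, case-insensitively. Skipping programs with
// shouldIngestToDB === false is an assumption based on the flag's name.
function selectPrograms<T extends ProgramEntry>(programs: T[], requested?: string): T[] {
  const wanted = requested?.trim().toLowerCase();
  const selected = programs.filter(
    (p) => !wanted || p.name.toLowerCase() === wanted || p.namespace.toLowerCase() === wanted,
  );
  if (wanted && selected.length === 0) {
    throw new Error(`No program named "${requested}" in config.json`);
  }
  return selected.filter((p) => p.shouldIngestToDB);
}

const programs: ProgramEntry[] = [
  { name: "DIKSHA", namespace: "diksha", shouldIngestToDB: true },
  { name: "School Attendance", namespace: "sch_att", shouldIngestToDB: true },
];
console.log(selectPrograms(programs).length); // 2
console.log(selectPrograms(programs, "sch_att")[0].name); // School Attendance
```

Failing loudly on an unknown program name avoids a run that silently ingests nothing.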
...