Indexing Content Model to Druid
Introduction:
This wiki discusses methods for indexing the content model into Druid and the challenges involved. The content model is not time-series data, and data already indexed in Druid cannot be updated, which makes querying the data challenging. The following sections discuss this in more detail.
Content Data Model
Index using transaction logs from Kafka
Index using Elasticsearch Snapshot
1. Content Data Model:
The table below lists the content model fields that are indexed. (The content model has more fields than those listed here. For simplicity, only a useful subset of the required fields is shown.)
S.No. | Field | Data Type | Field in Druid |
|---|---|---|---|
1 | author | String | author |
2 | badgeAssertions.assertionId | String | badgeAssertions_assertionId |
3 | badgeAssertions.badgeClassId | String | badgeAssertions_badgeClassId |
4 | badgeAssertions.badgeClassImage | String | badgeAssertions_badgeClassImage |
5 | badgeAssertions.badgeClassName | String | badgeAssertions_badgeClassName |
6 | badgeAssertions.badgeId | String | badgeAssertions_badgeId |
7 | badgeAssertions.createdTS | String | badgeAssertions_createdTS |
8 | badgeAssertions.issuerId | String | badgeAssertions_issuerId |
9 | badgeAssertions.status | String | badgeAssertions_status |
10 | board | String | board |
11 | channel | String | channel |
12 | compatibilityLevel | String | compatibilityLevel |
13 | contentType | String | contentType |
14 | createdBy | String | createdBy |
15 | createdFor | String | createdFor |
16 | createdOn | String | createdOn |
17 | creator | String | creator |
18 | dialcodes | String | dialcodes |
19 | framework | String | framework |
20 | gradeLevel | String | gradeLevel |
21 | identifier | String | identifier |
22 | keywords | String | keywords |
23 | language | String | language |
24 | lastPublishedBy | String | lastPublishedBy |
25 | lastPublishedOn | String | lastPublishedOn |
26 | lastSubmittedOn | String | lastSubmittedOn |
27 | lastUpdatedBy | String | lastUpdatedBy |
28 | lastUpdatedOn | String | lastUpdatedOn |
29 | license | String | license |
30 | mediaType | String | mediaType |
31 | medium | String | medium |
32 | mimeType | String | mimeType |
33 | name | String | name |
34 | objectType | String | objectType |
35 | organisation | String | organisation |
36 | origin | String | origin |
37 | owner | String | owner |
38 | pkgVersion | Long | pkgVersion |
39 | resourceType | String | resourceType |
40 | status | String | status |
41 | subject | String | subject |
42 | topic | String | topic |
43 | me_audiosCount | longSum | me_audiosCount |
44 | me_averageInteractionsPerMin | doubleSum | me_averageInteractionsPerMin |
45 | me_averageRating | doubleSum | me_averageRating |
46 | me_averageSessionsPerDevice | doubleSum | me_averageSessionsPerDevice |
47 | me_averageTimespentPerSession | doubleSum | me_averageTimespentPerSession |
48 | me_avgCreationTsPerSession | doubleSum | me_avgCreationTsPerSession |
49 | me_creationSessions | longSum | me_creationSessions |
50 | me_creationTimespent | doubleSum | me_creationTimespent |
51 | me_hierarchyLevel | longSum | me_hierarchyLevel |
52 | me_imagesCount | longSum | me_imagesCount |
53 | me_timespentDraft | doubleSum | me_timespentDraft |
54 | me_timespentReview | doubleSum | me_timespentReview |
55 | me_totalComments | longSum | me_totalComments |
56 | me_totalDevices | longSum | me_totalDevices |
57 | me_totalDialcodeAttached | longSum | me_totalDialcodeAttached |
58 | me_totalDialcodeLinkedToContent | longSum | me_totalDialcodeLinkedToContent |
59 | me_totalDownloads | longSum | me_totalDownloads |
60 | me_totalInteractions | longSum | me_totalInteractions |
61 | me_totalRatings | longSum | me_totalRatings |
62 | me_totalSessionsCount | longSum | me_totalSessionsCount |
63 | me_totalSideloads | longSum | me_totalSideloads |
64 | me_totalTimespent | doubleSum | me_totalTimespent |
65 | me_videosCount | longSum | me_videosCount |
66 | timestamp | Long | timestamp |
67 | version | Long | version |
68 | programId | String | programId |
69 | type | String | type |
70 | category | String | category |
71 | learningOutcome | | learningOutcome |
72 | qumlVersion | Long | qumlVersion |
73 | bloomsLevel | | bloomsLevel |
74 | rejectComment | String | rejectComment |
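The mapping in the table can be sketched in a few lines of Python. This is only an illustration, not the actual indexing job: it assumes a content record arrives as a nested dictionary, that nested fields such as badgeAssertions.assertionId are plain nested objects (lists of objects would need extra handling), and that the longSum and doubleSum entries in the Data Type column correspond to Druid aggregator types in a metricsSpec. The helper names and sample values are hypothetical.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts so that a.b becomes a_b, as in the table above."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the prefix along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def build_metrics_spec(metric_types):
    """Build Druid metricsSpec entries from a {field: aggregator} mapping."""
    return [
        {"type": agg, "name": field, "fieldName": field}
        for field, agg in metric_types.items()
    ]


# Hypothetical sample record; the values are made up for illustration.
content = {
    "identifier": "do_123",
    "author": "Jane",
    "badgeAssertions": {"assertionId": "a-1", "status": "active"},
    "me_totalDownloads": 5,
}

# Two of the metric fields from the table, with their aggregator types.
metric_types = {
    "me_totalDownloads": "longSum",
    "me_totalTimespent": "doubleSum",
}

flat = flatten(content)
metrics_spec = build_metrics_spec(metric_types)
```

Here `flat` has the key badgeAssertions_assertionId in place of the nested badgeAssertions.assertionId, and `metrics_spec` is a list that could be placed under metricsSpec in a Druid ingestion spec.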
2. Index using transaction logs from Kafka: