Discussion about this post

Aravind Mohanoor

The GDIT and the Call Center CSV files have very different schemas. I haven't looked into it in much detail yet, but I don't think they are even supposed to map perfectly.

This would also explain why the age field differs so much between them.

For starters, 690722 registrants have AGE_AT_VAX in the Call Center CSV file, and 791891 registrants have it in the GDIT CSV file. Even if the GDIT file is smaller in size, it clearly has age information on a lot more people.

But neither is a superset of the other. There are 104659 registrants who have an age in the GDIT CSV file but not in the Call Center CSV file. There are also 3490 registrants who have an age in the Call Center CSV file but not in the GDIT CSV file. These counts are consistent with the totals: 104659 - 3490 = 101169, which is exactly 791891 - 690722.
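For anyone who wants to reproduce this kind of comparison, here is a minimal sketch using only the Python standard library. It is not the exact script I ran. The `REGISTRANT_ID` column name is an assumption (use whatever key identifies a registrant in both files), the helper names are made up for illustration, and the sample rows are invented so the snippet runs on its own. Only `AGE_AT_VAX` comes from the actual files.

```python
import csv
import io


def ids_with_age(csv_file, id_col="REGISTRANT_ID", age_col="AGE_AT_VAX"):
    """Return the set of registrant IDs whose age field is non-empty.

    id_col is a hypothetical column name; replace it with the real key.
    """
    reader = csv.DictReader(csv_file)
    return {row[id_col] for row in reader if (row.get(age_col) or "").strip()}


def compare_age_coverage(ids_a, ids_b):
    """Split two ID sets into A-only, B-only, and shared."""
    return {
        "only_a": ids_a - ids_b,
        "only_b": ids_b - ids_a,
        "both": ids_a & ids_b,
    }


# Tiny invented samples standing in for the two real CSV files.
call_center_sample = io.StringIO("REGISTRANT_ID,AGE_AT_VAX\n1,34\n2,\n3,51\n")
gdit_sample = io.StringIO("REGISTRANT_ID,AGE_AT_VAX\n1,34\n2,47\n3,\n4,29\n")

call_center_ids = ids_with_age(call_center_sample)
gdit_ids = ids_with_age(gdit_sample)
result = compare_age_coverage(call_center_ids, gdit_ids)

print(len(call_center_ids), len(gdit_ids))
print(len(result["only_a"]), len(result["only_b"]), len(result["both"]))
```

With the real files you would pass `open(path, newline="")` instead of the `io.StringIO` samples. A quick sanity check is that the difference between the two "only" counts equals the difference between the two totals.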

Someone has to investigate these disparities. For now I am more focused on the free text, so you can take a look if you are interested.
