5 Comments

Jessica Rose wrote: "There are a total of 389,398 records that include a date of death which constitutes 3.5% of the total data set. Granted, I am not sure if this number represents all of the people who died, or simply all of the people who have death date data entered."

However, the yearly number of deaths in the record-level CSV file is identical to the yearly number of deaths at Eurostat and in the spreadsheets for deaths by ICD code published by the Czech Statistical Office, except that the record-level data has one death missing in 2021 (ec.europa.eu/eurostat/data/database, sars2.net/czech2.html#Deaths_by_ICD_code_region_age_group_and_year):

> system("wget sars2.net/f/{czicd.csv.gz,czpopdead.csv}")
> rec=fread("CR_records.csv",showProgress=F)
> icd=fread("czicd.csv.gz")[year>=2020,.(icd=sum(dead)),year]
> eurostat=fread("czpopdead.csv")[year>=2020,.(eurostat=sum(dead)),year]
> reclev=rec[,.(reclev=.N),.(year=year(DatumUmrti))]
> merge(merge(icd,eurostat),reclev)
   year    icd eurostat reclev
1: 2020 129289   129289 129289
2: 2021 139891   139891 139890
3: 2022 120219   120219 120219

---

The Janssen schedule had only one primary-course dose, so the second-dose fields are blank for everyone whose first dose was a Janssen vaccine:

rec=data.table::fread("CR_records.csv")

rec[,.(mean(is.na(Datum_2),na.rm=T)*100,.N),OckovaciLatka_1][order(-N)]

Only 15 people have the date of the second dose listed but the date of the first dose missing, and 44 have the date of the third dose listed but the date of the first dose missing, so it doesn't seem like a major problem:

> rec[is.na(Datum_1)&!is.na(Datum_2),.N]
[1] 15
> rec[is.na(Datum_1)&!is.na(Datum_3),.N]
[1] 44

---

The code below shows the average age of people by the type of their most recent vaccine dose, weighted by the number of person-days that people spent at each age. The record-level data only includes the year of birth, so I assigned a random date of birth to each person. The main reason why you got such a high rate of deaths per injection for AstraZeneca is that its recipients were much older: in my calculation the average age was about 68 for AstraZeneca, 55 for Moderna, 50 for Pfizer, and 45 for Janssen:

> b=data.table::fread("http://sars2.net/f/czbuckets.csv.gz")
> b=b[dose==0,type:="Unvaccinated"][type!=""]
> b[,.(alive=sum(as.double(alive))),.(age,type)][,.(meanage=round(weighted.mean(age,alive))),type][order(meanage)]|>print(r=F)
         type meanage
 Unvaccinated      38
        Other      42
      Novavax      42
      Janssen      45
       Pfizer      50
      Moderna      55
  AstraZeneca      68
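
As a rough illustration of the person-day weighting and the random date of birth described above, here is a minimal base R sketch. The people, dates, and column names are made up for the example and are not taken from CR_records.csv or czbuckets.csv.gz, and this is not necessarily how the buckets file was actually built:

```r
# Toy illustration only: the people, dates, and column names are hypothetical.
set.seed(1)
people <- data.frame(
  type       = c("A", "A", "B"),
  birth_year = c(1950, 1960, 1990),
  start      = as.Date(c("2021-03-01", "2021-04-01", "2021-06-01")),
  end        = as.Date(c("2022-02-28", "2021-12-31", "2022-05-31"))
)

# Only the year of birth is known, so draw a uniformly random day within that year.
year_start <- as.Date(paste0(people$birth_year, "-01-01"))
year_len   <- as.integer(as.Date(paste0(people$birth_year + 1, "-01-01")) - year_start)
people$dob <- year_start + floor(runif(nrow(people)) * year_len)

# Expand each person into one row per observed day, with age in completed years.
person_days <- do.call(rbind, lapply(seq_len(nrow(people)), function(i) {
  p   <- people[i, ]
  day <- seq(p$start, p$end, by = "day")
  # Subtract one year on days before this year's birthday.
  age <- as.integer(format(day, "%Y")) - p$birth_year -
    (format(day, "%m%d") < format(p$dob, "%m%d"))
  data.frame(type = p$type, age = age)
}))

# Collapse to person-days per type and age, then take the weighted mean age.
buckets  <- aggregate(days ~ type + age, data = transform(person_days, days = 1), FUN = sum)
mean_age <- sapply(split(buckets, buckets$type), function(d) weighted.mean(d$age, d$days))
print(round(mean_age, 1))
```

Weighting by person-days means that someone who spent a full year under a vaccine type counts more toward its average age than someone who spent only a few weeks under it.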

However, when I used the 2021 census population by 5-year age groups as the standard population, I still got a higher age-standardized mortality rate (deaths per 100,000 person-years) for AstraZeneca than for Moderna:

> std=fread("http://sars2.net/f/czcensus2021pop.csv")[,.(stdpop=sum(pop)),.(age=pmax(15,pmin(age%/%5*5,95)))]
> a=b[,.(alive=sum(alive),dead=sum(dead)),.(age=pmax(pmin(age%/%5*5,95),15),type)]
> merge(a,std)[type!="Other",.(asmr=round(sum(dead/alive*stdpop/sum(stdpop)*365e5))),type][order(-asmr)]|>print(r=F)
         type asmr
      Janssen 1493
 Unvaccinated 1475
  AstraZeneca 1231
      Moderna 1067
       Pfizer  861
      Novavax  682
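
To show why age standardization matters here, below is a small base R sketch of direct standardization with made-up numbers. The two types "X" and "Y", the two age groups, and all counts are hypothetical, chosen so that both types have the same mortality rate within each age group but very different age structures:

```r
# Hypothetical buckets: person-days alive and deaths by type and age group.
buckets <- data.frame(
  type  = c("X", "X", "Y", "Y"),
  age   = c(40, 80, 40, 80),
  alive = c(900000, 100000, 200000, 800000),  # person-days
  dead  = c(9, 20, 2, 160)
)
# Hypothetical standard population by age group.
std <- data.frame(age = c(40, 80), stdpop = c(700, 300))

m <- merge(buckets, std, by = "age")
# Age-specific rate: deaths per person-day scaled to 100,000 person-years.
m$rate <- m$dead / m$alive * 365e5

# Crude rate ignores the age structure of each type.
crude <- sapply(split(m, m$type), function(d) sum(d$dead) / sum(d$alive) * 365e5)
# Directly standardized rate: age-specific rates weighted by the standard population.
asmr <- sapply(split(m, m$type), function(d) sum(d$rate * d$stdpop / sum(d$stdpop)))

print(rbind(crude = crude, asmr = asmr))
```

In this toy case the crude rate of the older type Y is more than five times that of X, while the standardized rates are identical, because the only difference between the two types is their age distribution.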

Excellent info. With regard to birthdates, I was thinking of assigning everybody's birthday as July 1st and assuming everybody's actual birthday is within 6 months of that. Strange how children don’t seem to be represented proportionately in this data?

When I compared the number of people who were included in the record-level data to the resident population estimates in the 2021 census, I got about 101% of the census population in ages 0-9 and 10-19, which was roughly the typical percentage across age groups. Ages 30-39 got about 104% of the census population, but that might be because the record-level data includes non-residents (the percentage of immigrants peaks around ages 30-39): sars2.net/czech.html#Representation_of_age_groups_compared_to_2021_census_and_Eurostat.
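
The comparison above amounts to counting records per 10-year age group and dividing by the census population of the same group. A minimal base R sketch with made-up ages and census counts (none of these numbers come from the actual data):

```r
# Hypothetical ages of people in the record-level data.
records <- data.frame(age = c(3, 7, 12, 18, 31, 35, 38))
# Hypothetical census population by 10-year age group (lower bound of each group).
census <- data.frame(agegroup = c(0, 10, 30), pop = c(2, 2, 2))

# Bin ages into 10-year groups and count records per group.
records$agegroup <- records$age %/% 10 * 10
counts <- aggregate(n ~ agegroup, data = transform(records, n = 1), FUN = sum)

# Records as a percentage of the census population in each age group.
m <- merge(counts, census, by = "agegroup")
m$pct <- round(m$n / m$pop * 100)
print(m)
```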

Impressive work, Welcome the Eagle!

Awesome work as ever Albert Eagle88 :) Hats off to you :)
