The Data
The data is assembled from several sources. The first is US data on the virus in different geographic areas, compiled by the New York Times [NYT]. It consists of two separate tables, one at state granularity and the other with details for every county in each state [1,2,3]. The two tables are called statsS and statsC, for the state and county statistics, respectively. They contain a date column (date), the number of days measured from 2020-01-21 (dd), the name of the state (state), the county (county), the cumulative number of cases testing positive (cases), and the number of deaths (deaths), as reported each day.

We have also added a few other tables. One contains simple data about each state: the name, the FIPS code, and the postal code. We have also harvested a simple census summary for each county from the US Census Bureau. This table contains the FIPS code, state, county, population (Population), housing units (HousingUnits), total area (TotalArea), water area (WaterArea), land area (LandArea), population density (PopDensity), and housing density (HouseDensity). It can provide relevant sociometric information about the importance of different factors in infection rates. The census data has been modified to be consistent with the NYT data, i.e. the two cities [1,2] have been added and the relevant counties have been removed. To properly capture the average population density, we have used a population-weighted average over the different counties.

An additional table covers the policies and interventions that the different states and the federal authorities have implemented. The source is on GitHub, created by Jie Ying Wu of JHU. The columns are FIPS (fips), state, county, and one column for each type of intervention, containing the day on which the ruling was announced. The origin of these day numbers has been modified to 2020-01-21, matching the start date of the NYT data.
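To make the day numbering and the density averaging concrete, here is a minimal Python sketch. It assumes that dd is simply the number of calendar days elapsed since 2020-01-21, and that the population-weighted average is the standard weighted mean, sum(pop * density) / sum(pop). The exact procedure used to build the tables may differ. The function names and the toy numbers are hypothetical and are not taken from the census data.

```python
from datetime import date

# Day 0 of the tables, as stated in the text.
ORIGIN = date(2020, 1, 21)


def day_number(d: date) -> int:
    """Days elapsed since ORIGIN (assumed meaning of the dd column)."""
    return (d - ORIGIN).days


def weighted_density(counties):
    """Population-weighted mean of per-county population density.

    `counties` is a list of (population, density) pairs.
    This is the textbook weighted mean, an assumption about the
    averaging procedure mentioned in the text.
    """
    total_pop = sum(pop for pop, _ in counties)
    return sum(pop * dens for pop, dens in counties) / total_pop


# Toy values for illustration only, NOT real census numbers.
toy_counties = [(1000, 10.0), (3000, 50.0)]

print(day_number(date(2020, 3, 1)))    # 40
print(weighted_density(toy_counties))  # 40.0
```

The weighted mean gives more influence to the populous counties, so a merged area is not dominated by a large but sparsely populated county.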
The actions are (with column names): stay at home (stayhome), no gatherings of more than 50 people (gather50), no gatherings of more than 500 people (gather500), schools closed (schools), no dine-in service in restaurants (dinein), no public entertainment or public gym use (entgym), and a federal ban on foreign travel (fedGuide).

Here are some relevant SQL files for creating the database. These may be useful for seeing how a real-life application is put together.

Notes:
Using the data through SQL:
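As a rough illustration of the kind of query these tables support, the sketch below joins county statistics to the census table and computes cases per 100,000 residents on the latest reported day. It runs against an in-memory SQLite database as a stand-in for the actual server. The census table name (census), the column types, and all row values are assumptions made for the example, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schemas: column names follow the text, types are guesses.
conn.executescript("""
CREATE TABLE statsC (date TEXT, dd INTEGER, state TEXT, county TEXT,
                     cases INTEGER, deaths INTEGER);
CREATE TABLE census (fips TEXT, state TEXT, county TEXT,
                     Population INTEGER);
""")

# Toy rows for illustration only, NOT real counts.
conn.executemany("INSERT INTO statsC VALUES (?,?,?,?,?,?)", [
    ("2020-04-01", 71, "StateA", "CountyX", 50, 1),
    ("2020-04-02", 72, "StateA", "CountyX", 80, 2),
    ("2020-04-02", 72, "StateA", "CountyY", 10, 0),
])
conn.executemany("INSERT INTO census VALUES (?,?,?,?)", [
    ("00001", "StateA", "CountyX", 20000),
    ("00002", "StateA", "CountyY", 5000),
])

# Cases per 100k residents on the most recent day in the table.
query = """
SELECT s.state, s.county, s.cases,
       100000.0 * s.cases / c.Population AS per100k
FROM statsC AS s
JOIN census AS c
  ON c.state = s.state AND c.county = s.county
WHERE s.dd = (SELECT MAX(dd) FROM statsC)
ORDER BY per100k DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The join here uses the state and county names because those are the columns the text lists for the statistics tables. If a FIPS column is available on both sides, joining on it would be more robust against spelling differences.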
Using the data with IPython
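Once the rows are loaded into a Python session, two simple operations are often useful: turning the cumulative counts into daily increments, and re-expressing dd relative to the day an intervention was announced. The sketch below assumes the rows for one state or county are already available as plain lists sorted by dd. The helper names and the numbers are hypothetical.

```python
def daily_new(cumulative):
    """Convert cumulative counts into per-day increments.

    The result is one element shorter than the input, because the
    first day has no previous value to subtract.
    """
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def days_since(dd_values, intervention_day):
    """Shift day numbers so the intervention day becomes day 0.

    Both inputs count days from 2020-01-21, so a plain subtraction
    is enough.
    """
    return [dd - intervention_day for dd in dd_values]


# Toy series for illustration only, NOT real counts.
dd = [60, 61, 62, 63]
cases = [100, 130, 170, 220]
stayhome = 61  # hypothetical announcement day

new_cases = daily_new(cases)
offsets = days_since(dd[1:], stayhome)  # align with the increments

for offset, n in zip(offsets, new_cases):
    print(offset, n)
```

Because the intervention columns and dd share the same origin, no date parsing is needed to line the two up.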