Exploring the data.

Dedicated pages in these sub-sections allow in-depth exploration of each source's data if you're running the project locally.
Since generating these in-depth analytics requires parsing millions of reports, it consumes too many resources to serve these pages directly online - at this stage.
We are actively working on optimizing this to allow these searches directly online.
Our data is collected using scrapers (programs dedicated to fetching the data from a given website, written in Perl 5) browsing public sources:
Each algorithm's behavior is detailed on its dedicated page.
CDC & ECDC's data are updated weekly, every Friday.
Other public sources have been reviewed but haven't been integrated, for reasons detailed on their pages:

Running the project.

The complete project code can be downloaded from the GitHub repository. The code can be used & modified freely, provided a spirit of neutrality toward the data is preserved. The libraries used are subject to their own specific licenses.

General Dependencies.

You'll need a local SQL-compatible database server (for instance, MySQL Community Server - downloadable here).
Beware: MySQL Server's password encryption must be set to "Legacy" to support Perl's DBI driver.
You'll need Google Chrome & the Selenium driver corresponding to your OS in order to use the CDC & ECDC scrapers. Just install Chrome, and place your unzipped Selenium driver in the project's folder.
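On MySQL 8+, "Legacy" password encryption corresponds to the mysql_native_password authentication plugin, which can also be enabled per-account after installation. A sketch, assuming a hypothetical user named "openvaet" (adjust the user & password to your own setup):

```sql
-- Switch an existing account to the legacy authentication plugin,
-- so that Perl's DBI/DBD::mysql driver can connect.
ALTER USER 'openvaet'@'localhost'
  IDENTIFIED WITH mysql_native_password BY 'your_password';
FLUSH PRIVILEGES;
```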
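For reference, a minimal sketch of how a Selenium-driven scraper can open a page from Perl - assuming the CPAN module Selenium::Chrome (from the Selenium::Remote::Driver distribution) and the chromedriver binary unzipped in the project's folder; this is an illustration only, not the project's actual scraper code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical minimal example of a Selenium-driven fetch.
use Selenium::Chrome;

# "binary" points to the chromedriver executable placed in the project's folder.
my $driver = Selenium::Chrome->new(binary => './chromedriver');

$driver->get('https://vaers.hhs.gov/data.html');   # an example public source
print $driver->get_title(), "\n";                  # confirm the page loaded

$driver->shutdown_binary();                        # close Chrome & the driver
```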

Running The Project On Windows.

You'll need Strawberry Perl (downloadable here) (recommended) or ActivePerl (downloadable here) to run the code.
You'll need the required libraries, which you can install using cpanminus, by double-clicking the config/dependencies.bat file.

Running The Project On Linux.

You'll need the required libraries, which you can install using cpanminus, by executing the ./config/dependencies.sh file.

Database Building.

With MySQL Server installed, you can build the database from scratch by simply applying data/history.sql (the database as it was built during the project's construction).
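Assuming the server runs locally and the database is named "openvaet" (adjust the name & credentials to whatever config/openvaet.conf expects on your setup), this can look like:

```shell
# Create the database, then replay the project's history file into it.
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS openvaet"
mysql -u root -p openvaet < data/history.sql
```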

Configuration File.

Once the database is built from the history file, you must configure the config/openvaet.conf file (simply open it in a text editor such as Notepad, Sublime Text, etc.).
  • Replace the "secrets" entry with a 40-character random string.
  • Replace the "databaseHost", "databaseUser", "databasePort" & "databasePassword" entries with the values you configured when installing MySQL Server (remember: your MySQL password encryption must be set to Legacy to support Perl's driver).
  • Save your changes & exit.
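The exact syntax of config/openvaet.conf isn't reproduced here; as an illustration only, assuming a Perl-hash style configuration (common for Perl web applications), the entries above could look like:

```perl
{
    secrets          => 'Kq3vR8nTzW1xCdF6hJ9mPbL4sYgA7eU2oNwQiX5r',  # a 40-char random string
    databaseHost     => 'localhost',
    databasePort     => 3306,              # MySQL's default port
    databaseUser     => 'openvaet',        # hypothetical user name
    databasePassword => 'your_password',
}
```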

Reviewing the code.

All scripts required to run the project are Perl scripts, which means you can freely edit & read the code.
Despite our best efforts to guarantee that the code is bug-free and readable - and as a general guideline for your own security - we strongly encourage you to review the code.

Indexing or Updating CDC's Data.

Download the up-to-date VAERS data ("All Years Data*") (available at this URL).
The script will parse the VAERS data & index it in the database, allowing more optimized post-processing.

Indexing or Updating ECDC's Data.

Open a terminal in the project's folder & simply run "perl tasks/ecdc/get_eu_database.pl" (without quotes).
ECDC's platform is exceptionally unstable and crashes very regularly.
Although we did our best to make it as resilient as possible (2,000 more lines than the CDC scraper to do the same thing...), it still crashes - if only because ECDC regularly triggers random alerts blocking the scraper, or simply goes down for hours...
Therefore, the ECDC update must - at this stage - be monitored. If the script fails, simply close its Chrome session & resume it.
You'll need to run the scraper a first time to index all substances.
Then, activate the substances you wish to index from the substances page of the ECDC section linked above. (We originally indexed all substances, but ECDC's platform is, to date, simply too unstable to update it weekly this way.)

Unifying the data.

Open a terminal in the project's folder & simply run "perl tasks/generate_stats.pl" (without quotes). This script will generate the "end-user" JSON files used by the interface from the CDC & ECDC data.