The goal of the post-processing script is to calculate the NonHAPTOG emissions that are assigned to each speciation profile, so that the work of speciating the NonHAPTOG can be done outside of MOVES. The SQL script is a translation of work done by Claudia Toro in R.
The SQL script was tested to confirm that its output exactly matches the output of Claudia's R script. Documentation of that testing can be found elsewhere in this repository.
The design of the script is based on a SQL procedure: a function that builds queries and executes them based on its inputs. Calling a SQL procedure is a lot like calling a function in other languages such as R, Java, and Go.
Procedures are created in a database, so the database name must be prepended to the procedure name, unless the call is preceded by a `USE` statement. The inputs can be of any type, including strings.
```sql
-- Example Procedure Call
CALL exampleDB.exampleProcedure(exampleInput1, 'exampleInput2');
```

A procedure can also be called from the SQL command line, assuming it already exists.
```
mysql --user=moves --password=moves
MariaDB [(none)]> CALL exampleDB.exampleProcedure(exampleInput1, 'exampleInput2');
```

Finally, the procedure can be called from the command line as a one-liner by using the `--execute` flag.
```shell
mysql --user=moves --password=moves --execute="CALL exampleDB.exampleProcedure(exampleInput1, 'exampleInput2')"
```

When calling procedures using the `--execute` flag, care needs to be taken when handling strings. Because the code is, itself, a string, I stick to the rule that any string I pass into a procedure is wrapped in single quotes (`'exampleInput2'`), while the command as a whole is wrapped in double quotes.
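The quoting rule above can also be applied from a script. The following is a minimal Python sketch, not part of the original tooling: `build_call` and `build_mysql_argv` are hypothetical helper names, and every procedure argument is assumed to be a string. Passing the command as an argument list sidesteps the outer double quotes entirely, since no shell has to parse the statement.

```python
def build_call(procedure, *args):
    """Return a CALL statement with each argument wrapped in single quotes.

    Assumption: every argument is a string. Embedded single quotes are
    doubled, which is the standard SQL way to escape them in a literal.
    """
    quoted = ", ".join("'" + str(a).replace("'", "''") + "'" for a in args)
    return f"CALL {procedure}({quoted});"


def build_mysql_argv(statement, user="moves", password="moves"):
    """Return an argument list for the mysql client using --execute."""
    return [
        "mysql",
        f"--user={user}",
        f"--password={password}",
        f"--execute={statement}",
    ]


stmt = build_call("exampleDB.exampleProcedure", "exampleInput2")
argv = build_mysql_argv(stmt)
print(stmt)
print(argv)
```

To actually run the command, the list could be handed to `subprocess.run(argv, check=True)` on a machine where the `mysql` client is installed.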
The script does two things:
speciation_outside_moves_collected.speciate) that post-processes the output and writes the results to the collector database.

The procedure is run on a MOVES output database, and uses a working database (similar in principle to the MOVES execution database). These two database names are the procedure inputs, so calling the procedure looks like so:
```sql
CALL speciation_outside_moves_collected.speciate('db_results_inv_2018_20210412_batch0001_c34035_2018_7_other', 'speciate_working');
```

The procedure can be called in the SQL script itself after it is defined, on any number of databases. This probably requires modifying the script that defines the procedure, which I'd rather not do every time the procedure is run on a new set of databases.
A single SQL script can be created, where each line is a procedure call on an output database. For example, if we create a processor script "processNonHAPTOG.sql", like so:
```sql
CALL speciation_outside_moves_collected.speciate('db_results_inv_2018_20210412_batch0001_c34035_2018_10_start', 'speciate_working');
CALL speciation_outside_moves_collected.speciate('db_results_inv_2018_20210412_batch0001_c34035_2018_11_start', 'speciate_working');
CALL speciation_outside_moves_collected.speciate('db_results_inv_2018_20210412_batch0001_c34035_2018_12_start', 'speciate_working');
CALL speciation_outside_moves_collected.speciate('db_results_inv_2018_20210412_batch0001_c34035_2018_1_other', 'speciate_working');
-- more lines for other databases here
```

Then we can call 2 SQL scripts to do the processing:
```
mysql --user=moves --password=moves
MariaDB [(none)]> source speciationProcedure.sql;
MariaDB [(none)]> source processNonHAPTOG.sql;
```

Alternatively, we could combine them into one:
```sql
source speciationProcedure.sql
CALL speciation_outside_moves_collected.speciate('db_results_inv_2018_20210412_batch0001_c34035_2018_10_start', 'speciate_working');
-- more lines for other databases here
```

Another option is to do everything via batch using the `--execute` flag. The first step is to source the script, followed by any number of calls to the procedure.
```shell
mysql --user=moves --password=moves --execute="source speciationProcedure.sql;"
mysql --user=moves --password=moves --execute="CALL speciation_outside_moves_collected.speciate('db_results_inv_2018_20210412_batch0001_c34035_2018_evp', 'speciate_working');"
# more lines for other databases here
```

This option requires more advanced scripting skill. Once the SQL file with the procedure is sourced, it's possible to call the procedure in a more programmatic way. A script in almost any language (R, Python, Perl, Go, Java, etc.) can do the following:
This allows the flexibility to access the collector database after calling the procedure, so that the tables can be written to file, modified if necessary, or used in other operations.
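As a rough illustration of this programmatic approach, here is a minimal Python sketch. It assumes a DB-API style cursor whose driver accepts `%s` placeholders, as common MySQL/MariaDB drivers do. The function `speciate_all` and the class `DryRunCursor` are hypothetical names introduced for this example; `DryRunCursor` only records statements so the sketch runs without a database.

```python
PROCEDURE = "speciation_outside_moves_collected.speciate"


def speciate_all(cursor, output_dbs, working_db="speciate_working"):
    """Call the speciation procedure once per MOVES output database.

    Assumption: the procedure has already been created by sourcing the
    SQL file, and `cursor` follows the Python DB-API with %s placeholders.
    """
    for db in output_dbs:
        cursor.execute(f"CALL {PROCEDURE}(%s, %s)", (db, working_db))
    return len(output_dbs)


class DryRunCursor:
    """Stand-in cursor that records statements instead of executing them."""

    def __init__(self):
        self.log = []

    def execute(self, sql, params=()):
        self.log.append((sql, tuple(params)))


cursor = DryRunCursor()
n_calls = speciate_all(
    cursor,
    [
        "db_results_inv_2018_20210412_batch0001_c34035_2018_10_start",
        "db_results_inv_2018_20210412_batch0001_c34035_2018_11_start",
    ],
)
for sql, params in cursor.log:
    print(sql, params)
```

With a real connection, the same cursor could then query the collector tables and write them to file once the loop finishes.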
The procedure itself has 8 steps. For each one, design decisions and assumptions are noted. Appendix 1 contains a list of these assumptions, so that they can be easily revisited later on.
This step creates all the required tables in the collector database, if they do not already exist. The database has 6 tables. The first one is a base output table which contains the columns needed for SMOKE-MOVES, plus monthID:
```sql
CREATE TABLE IF NOT EXISTS speciation_outside_moves_collected.base_schema (
    monthID                SMALLINT(6),
    SMOKE_SCC              VARCHAR(10),
    togSpeciationProfileID VARCHAR(10),
    pollutantID            SMALLINT(6),
    pollutantName          VARCHAR(50),
    SMOKE_mode             VARCHAR(20),
    countyID               INT(11),
    ratio                  DOUBLE,
    PRIMARY KEY (monthID, SMOKE_SCC, togSpeciationProfileID, pollutantID, pollutantName, SMOKE_mode, countyID)
);
```

This base_schema table will not contain data. Instead, 4 tables are created from it, one for each SMOKE "mode". That way, changing the schema in the future requires changing one table rather than 4. These map to the 4 csv files created by Claudia's R script.
```sql
CREATE TABLE IF NOT EXISTS speciation_outside_moves_collected.exh_nhtog LIKE speciation_outside_moves_collected.base_schema;
CREATE TABLE IF NOT EXISTS speciation_outside_moves_collected.epm_nhtog LIKE speciation_outside_moves_collected.base_schema;
CREATE TABLE IF NOT EXISTS speciation_outside_moves_collected.evp_nhtog LIKE speciation_outside_moves_collected.base_schema;
CREATE TABLE IF NOT EXISTS speciation_outside_moves_collected.rfl_nhtog LIKE speciation_outside_moves_collected.base_schema;
```

The final table created is a SMOKE-MOVES mapping table, which maps combinations of MOVES process and road type to SMOKE processes. This is a utility table that's useful when speciating the output for SMOKE-MOVES. The data that populates this table is hardcoded in the script, so there's no need to move around a csv or other text file in addition to the SQL file. Hardcoding is also acceptable in this case because the mappings will not change very often.
```sql
CREATE TABLE IF NOT EXISTS speciation_outside_moves_collected.SMOKE_MOVES_mapping (
    processID     SMALLINT(6),
    processName   VARCHAR(50),
    roadTypeID    SMALLINT(6),
    rateTable     VARCHAR(10),
    SMOKE_process SMALLINT(6),
    SMOKE_mode    VARCHAR(20),
    PRIMARY KEY (processID, processName, roadTypeID, rateTable, SMOKE_process, SMOKE_mode)
);
```

The mapping table can be seen in Appendix 2.
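To illustrate the kind of lookup this mapping table supports, here is a small Python sketch. It copies only a handful of rows from Appendix 2 and keys them by (processID, roadTypeID). The name `smoke_lookup` is hypothetical, and the real script performs this mapping in SQL rather than in Python.

```python
# A few rows copied from Appendix 2, keyed by (processID, roadTypeID).
# Each value is (rateTable, SMOKE_process, SMOKE_mode).
SMOKE_MOVES_MAPPING = {
    (1, 1): ("RPHO", 92, "EXH_NHTOG"),
    (1, 2): ("RPD", 72, "EXH_NHTOG"),
    (2, 1): ("RPS", 72, "EXH_NHTOG"),
    (11, 1): ("RPV", 72, "EPM_NHTOG"),
    (12, 1): ("RPP", 72, "EVP_NHTOG"),
    (18, 3): ("RPD", 62, "RFL_NHTOG"),
    (90, 1): ("RPH", 53, "EXH_NHTOG"),
}


def smoke_lookup(process_id, road_type_id):
    """Return (rateTable, SMOKE_process, SMOKE_mode), or None if unmapped."""
    return SMOKE_MOVES_MAPPING.get((process_id, road_type_id))


print(smoke_lookup(1, 1))
print(smoke_lookup(18, 3))
print(smoke_lookup(99, 1))
```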
This step is fairly straightforward. It's worth noting, however, that the working database is not dropped after the script runs. Like the MOVES execution database, keeping it around after the script runs can be helpful for debugging.
This step gets the default database, the county database (CDB), the county that was run, and the year. The databases and county are read from the movesrun table, while the year comes from the county database's year table.
The script assumes that each output database contains only one MOVES run, so it only looks for one combination of default database, county database, and year. If there are multiple combinations in the output database, only the first will be processed.
There are two areas of concern for this step:
fuelsupply and fuelformulation tables, along with the default database's regioncounty and fuelsubtype tables.

This step is straightforward. It simply selects the rows of the movesoutput table for NonHAPTOG (pollutantID 88) emissions, keeping only the columns that are relevant and skipping others, such as the nonroad columns hpID and sectorID.
First, the MOVES output emissions are split by fuelSubtypeID using the county's fuel mix and market shares. Then the emissions are assigned to their speciation profile according to process, fuel subtype, and regulatory class. The final step is to convert the MOVES SCCs to SMOKE SCCs and aggregate accordingly.
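The first part of this step, splitting emissions by fuel subtype, comes down to weighting by market share. The Python sketch below shows only that arithmetic. The function name `split_by_fuel_subtype`, the fuelSubtypeID values, and the numbers are illustrative assumptions, and renormalizing the shares so they sum to 1 is also an assumption rather than something confirmed by the script.

```python
def split_by_fuel_subtype(emission, market_shares):
    """Split one emission quantity across fuel subtypes by market share.

    `market_shares` maps fuelSubtypeID -> market share. The shares are
    renormalized here (an assumption) so the parts add up to `emission`.
    """
    total_share = sum(market_shares.values())
    if total_share <= 0:
        raise ValueError("market shares must sum to a positive number")
    return {
        subtype: emission * share / total_share
        for subtype, share in market_shares.items()
    }


# Illustrative values only: 100 units of NonHAPTOG, two fuel subtypes.
split = split_by_fuel_subtype(100.0, {12: 0.75, 10: 0.25})
print(split)
```

Each resulting piece would then be matched to a speciation profile by process, fuel subtype, and regulatory class, as described above.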
This produces an intermediate table called nonhaptog_speciated.
In this step, the final intermediate table is broken out according to "SMOKE mode" (exhaust, permeation, evap, and refueling), and each part is inserted into its corresponding table in the collector database. During this process, the raw emissions output is normalized to give a weight to each profile within an SCC.
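The normalization can be pictured as dividing each profile's emissions by the total for its SCC. The Python sketch below makes simplifying assumptions: it groups by SCC only (the collector tables are also keyed by month, pollutant, mode, and county), it expects one row per SCC and profile pair, and the SCC and profile labels are placeholders. `profile_ratios` is a hypothetical name.

```python
from collections import defaultdict


def profile_ratios(rows):
    """Turn emissions into per-profile weights within each SCC.

    `rows` is a list of (SMOKE_SCC, togSpeciationProfileID, emission)
    tuples, assumed to be already aggregated to one row per pair.
    """
    totals = defaultdict(float)
    for scc, _profile, emission in rows:
        totals[scc] += emission
    return {
        (scc, profile): emission / totals[scc]
        for scc, profile, emission in rows
        if totals[scc] > 0  # skip SCCs with no emissions at all
    }


# Placeholder labels and values, for illustration only.
rows = [
    ("SCC_A", "P1", 30.0),
    ("SCC_A", "P2", 10.0),
    ("SCC_B", "P1", 5.0),
]
ratios = profile_ratios(rows)
print(ratios)
```

In this example the two profiles under `SCC_A` receive weights of 0.75 and 0.25, and the single profile under `SCC_B` receives a weight of 1.0.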
The collector tables have a strict primary key (everything but the assigned ratio is in the key), so if multiple MOVES runs with overlapping emissions are gathered into the same collector database, the procedure will generate an error for having duplicate primary keys. This can be changed, if needed.
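The duplicate-key behavior can be tried out without a MariaDB server. The Python sketch below uses an in-memory SQLite database as a stand-in, with a deliberately reduced two-column key and placeholder values. Note that SQLite spells the ignore variant `INSERT OR IGNORE`, whereas MariaDB uses `INSERT IGNORE INTO`; `REPLACE INTO` is accepted by both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for a collector table: everything except ratio is in the key.
conn.execute(
    "CREATE TABLE exh_nhtog ("
    " SMOKE_SCC TEXT, togSpeciationProfileID TEXT, ratio REAL,"
    " PRIMARY KEY (SMOKE_SCC, togSpeciationProfileID))"
)
conn.execute("INSERT INTO exh_nhtog VALUES ('SCC_A', 'P1', 0.75)")

# A plain INSERT with the same key fails, as described above.
try:
    conn.execute("INSERT INTO exh_nhtog VALUES ('SCC_A', 'P1', 0.60)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Ignoring the duplicate keeps the row that was written first.
conn.execute("INSERT OR IGNORE INTO exh_nhtog VALUES ('SCC_A', 'P1', 0.60)")
ratio_after_ignore = conn.execute("SELECT ratio FROM exh_nhtog").fetchone()[0]

# REPLACE overwrites the existing row with the new one.
conn.execute("REPLACE INTO exh_nhtog VALUES ('SCC_A', 'P1', 0.60)")
ratio_after_replace = conn.execute("SELECT ratio FROM exh_nhtog").fetchone()[0]

print(duplicate_rejected, ratio_after_ignore, ratio_after_replace)
```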
This is the final step, which updates the collector database tables as necessary. Right now, there are 2 adjustments:
The following list of assumptions and design decisions can be revisited and changed at any time.
`INSERT INTO`, so that duplicate keys don't get written. This assumes that none of the output databases have overlapping output (for example, if 2 runs have ONI activity). If this assumption isn't safe, either `INSERT IGNORE INTO` (which keeps the first data written and ignores the rest) or `REPLACE INTO` (which overwrites previously existing data if necessary) can be used instead.

| processID | processName | roadTypeID | rateTable | SMOKE_process | SMOKE_mode |
|---|---|---|---|---|---|
| 1 | Running Exhaust | 1 | RPHO | 92 | EXH_NHTOG |
| 1 | Running Exhaust | 2 | RPD | 72 | EXH_NHTOG |
| 1 | Running Exhaust | 3 | RPD | 72 | EXH_NHTOG |
| 1 | Running Exhaust | 4 | RPD | 72 | EXH_NHTOG |
| 1 | Running Exhaust | 5 | RPD | 72 | EXH_NHTOG |
| 2 | Start Exhaust | 1 | RPS | 72 | EXH_NHTOG |
| 11 | Evap Permeation | 1 | RPV | 72 | EPM_NHTOG |
| 11 | Evap Permeation | 2 | RPD | 72 | EPM_NHTOG |
| 11 | Evap Permeation | 3 | RPD | 72 | EPM_NHTOG |
| 11 | Evap Permeation | 4 | RPD | 72 | EPM_NHTOG |
| 11 | Evap Permeation | 5 | RPD | 72 | EPM_NHTOG |
| 12 | Evap Fuel Vapor Venting | 1 | RPP | 72 | EVP_NHTOG |
| 13 | Evap Fuel Leaks | 1 | RPV | 72 | EVP_NHTOG |
| 13 | Evap Fuel Leaks | 2 | RPD | 72 | EVP_NHTOG |
| 13 | Evap Fuel Leaks | 3 | RPD | 72 | EVP_NHTOG |
| 13 | Evap Fuel Leaks | 4 | RPD | 72 | EVP_NHTOG |
| 13 | Evap Fuel Leaks | 5 | RPD | 72 | EVP_NHTOG |
| 15 | Crankcase Running Exhaust | 2 | RPD | 72 | EXH_NHTOG |
| 15 | Crankcase Running Exhaust | 3 | RPD | 72 | EXH_NHTOG |
| 15 | Crankcase Running Exhaust | 4 | RPD | 72 | EXH_NHTOG |
| 15 | Crankcase Running Exhaust | 5 | RPD | 72 | EXH_NHTOG |
| 15 | Crankcase Running Exhaust | 1 | RPHO | 92 | EXH_NHTOG |
| 16 | Crankcase Start Exhaust | 1 | RPS | 72 | EXH_NHTOG |
| 17 | Crankcase Extended Idle Exhaust | 1 | RPH | 53 | EXH_NHTOG |
| 18 | Refueling Displacement Vapor Loss | 1 | RPD | 62 | RFL_NHTOG |
| 18 | Refueling Displacement Vapor Loss | 2 | RPD | 62 | RFL_NHTOG |
| 18 | Refueling Displacement Vapor Loss | 3 | RPD | 62 | RFL_NHTOG |
| 18 | Refueling Displacement Vapor Loss | 4 | RPD | 62 | RFL_NHTOG |
| 18 | Refueling Displacement Vapor Loss | 5 | RPD | 62 | RFL_NHTOG |
| 19 | Refueling Spillage Loss | 1 | RPD | 62 | RFL_NHTOG |
| 19 | Refueling Spillage Loss | 2 | RPD | 62 | RFL_NHTOG |
| 19 | Refueling Spillage Loss | 3 | RPD | 62 | RFL_NHTOG |
| 19 | Refueling Spillage Loss | 4 | RPD | 62 | RFL_NHTOG |
| 19 | Refueling Spillage Loss | 5 | RPD | 62 | RFL_NHTOG |
| 90 | Extended Idle Exhaust | 1 | RPH | 53 | EXH_NHTOG |
| 91 | Auxiliary Power Exhaust | 1 | RPH | 91 | EXH_NHTOG |