MOVES Speciation Post-Processing SQL Script

The goal of the post-processing script is to calculate the NonHAPTOG emissions that are assigned to each speciation profile, so that the work of speciating the NonHAPTOG can be done outside of MOVES. The SQL script is a translation of work done by Claudia Toro in R.

The SQL script was tested to confirm that its output exactly matches the output of Claudia's R script. Documentation of that testing can be found elsewhere in this repository.

SQL Procedures

The design of the script is based on a SQL procedure - a function that builds queries and executes them based on input. Calling a SQL procedure is a lot like calling a function in other languages such as R, Java, and Go.

Procedures are created in a database, so the database name must be prepended to the procedure name unless the call is preceded by a USE statement. The inputs can be of any type, including strings.

A procedure can also be called from the SQL command line, assuming it already exists.

Finally, the procedure can be called from the command line as a one-liner by using the --execute flag.

When calling procedures using the --execute flag, care needs to be taken when handling strings. Because the code is, itself, a string, I stick to the rule that any string I pass into a procedure is wrapped in single quotes ('exampleInput2'), while commands are wrapped in double quotes.
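As a minimal sketch of this quoting rule, the Python snippet below assembles a one-liner. It assumes the mysql command-line client, assumes the procedure lives in the collector database, and assumes the output database is passed before the working database; none of these are confirmed by the script. The database names are hypothetical, and connection options (user, password, host) are omitted.

```python
def build_call(proc_db, output_db, working_db):
    """Return a CALL statement with each string input wrapped in single quotes."""
    # Assumption: speciate takes the output database first, then the working database.
    return f"CALL {proc_db}.speciate('{output_db}', '{working_db}');"


def build_one_liner(statement):
    """Wrap the whole SQL command in double quotes for the --execute flag."""
    if '"' in statement:
        # A double quote inside the statement would end the command string early.
        raise ValueError("statement must not contain double quotes")
    return f'mysql --execute "{statement}"'


# Hypothetical database names, for illustration only.
call = build_call(
    "speciation_outside_moves_collected",
    "example_output_db",
    "example_working_db",
)
print(build_one_liner(call))
```

Keeping single quotes for the inputs and double quotes for the command means the shell never sees an unescaped quote of the kind it is using to delimit the command.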

Script Design

The script does two things:

  1. Drops and creates the collector database, with the hardcoded name speciation_outside_moves_collected.
  2. Creates the procedure (called speciate) that post-processes the output and writes the results to the collector database.

The procedure is run on a MOVES output database and uses a working database (similar in principle to the MOVES execution database). These two databases are the procedure's inputs.

Using the Script

Option 1: In the SQL script itself

The procedure can be called in the SQL script itself after it is defined, on any number of databases. This probably requires modifying the script that defines the procedure, which I'd rather not do every time the procedure is run on a new set of databases.

Option 2: Create a second SQL script

A second SQL script can be created in which each line is a procedure call on one output database. For example, suppose we create a processor script named "processNonHAPTOG.sql" that contains those calls.

We can then run 2 SQL scripts to do the processing: first the script that defines the procedure, then the processor script.

Alternatively, we could combine them into one.

Option 3: Batch/Bash script

Another option is to do everything via batch using the --execute flag. The first step is to source the script, followed by any number of calls to the procedure.

Option 4: Advanced scripting

This option requires more advanced scripting skill. Once the SQL file with the procedure is sourced, it's possible to call the procedure in a more programmatic way. A script in almost any language (R, Python, Perl, Go, Java, etc.) can do the following:

  1. Source the SQL script "speciationProcedure.sql"
  2. Take advantage of the naming convention for the output databases and list the databases that need to be run through the procedure
  3. For each database, call the procedure

This allows the flexibility to access the collector database after calling the procedure so that the tables can be written to file, modified if necessary, or for other operations.
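The three steps above can be sketched in Python as follows. This is only an illustration: the "*_out" naming pattern, the database names, the procedure's home database, and the argument order are all assumptions. In a real script the list of names would come from the server (for example from SHOW DATABASES), and each statement would be sent through a database client.

```python
import fnmatch


def select_output_dbs(all_dbs, pattern):
    """Keep only the databases whose names match the naming convention."""
    return sorted(name for name in all_dbs if fnmatch.fnmatchcase(name, pattern))


def plan_statements(all_dbs, pattern, proc_db, working_db):
    """List the statements to run: source the script once, then one call per database."""
    statements = ["source speciationProcedure.sql"]
    for output_db in select_output_dbs(all_dbs, pattern):
        # Assumption: output database first, working database second.
        statements.append(f"CALL {proc_db}.speciate('{output_db}', '{working_db}');")
    return statements


# Hypothetical server contents and naming convention.
databases = ["county_a_2020_out", "county_b_2020_out", "county_a_2020_in", "movesdb_default"]
for statement in plan_statements(
    databases, "*_out", "speciation_outside_moves_collected", "example_working_db"
):
    print(statement)
```

The same working database name is reused for every call here, on the assumption that this is acceptable because the procedure drops and recreates the working database each time it runs.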

Procedure Design

The procedure itself has 8 steps. For each one, design decisions and assumptions are noted. Appendix 1 contains a list of these assumptions, so that they can be easily revisited later on.

Step 1: Prepare the collector database

This step creates all the required tables in the collector database if they do not already exist. The database has 6 tables. The first is a base output table, which contains the columns needed for SMOKE-MOVES, plus monthID:

This base_schema table will not contain data. Instead, 4 tables are created from it, one for each SMOKE "mode". That way, changing the schema in the future requires changing one table rather than 4. These 4 tables map to the 4 CSV files created by Claudia's R script.

The final table created is a SMOKE-MOVES mapping table, which maps combinations of MOVES process and road type to SMOKE processes. This utility table is useful when speciating the output for SMOKE-MOVES. The data that populates it is hardcoded in the script, so there is no need to move an additional CSV or other text file around with the SQL file. Hardcoding is also acceptable here because the mappings will not change very often.

The mapping table can be seen in Appendix 2.
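A lookup against this mapping can be sketched in Python as below. The four entries are copied from Appendix 2; the dictionary layout and the function name are illustrative assumptions, not how the SQL script stores or queries the table.

```python
# (processID, roadTypeID) -> (rate, SMOKE_process, SMOKE_mode); rows copied from Appendix 2.
SMOKE_MAPPING = {
    (1, 1): ("RPHO", "92", "EXH_NHTOG"),
    (1, 2): ("RPD", "72", "EXH_NHTOG"),
    (11, 1): ("RPV", "72", "EPM_NHTOG"),
    (18, 1): ("RPD", "62", "RFL_NHTOG"),
}


def smoke_mode_for(process_id, road_type_id):
    """Return the SMOKE mode for a MOVES process and road type, or None if unmapped."""
    entry = SMOKE_MAPPING.get((process_id, road_type_id))
    return None if entry is None else entry[2]


print(smoke_mode_for(11, 1))
```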

Step 2: Drop and create working database

This step is fairly straightforward. It's worth noting, however, that the working database is not dropped after the script runs. Like the MOVES execution database, keeping it around after the script runs can be helpful for debugging.

Step 3: Read MOVES output metadata from output database

This step reads the default database, the county database (CDB), the county that was run, and the year. The databases and county are read from the movesrun table, while the year comes from the county database's year table.

The script assumes that each output database contains only one MOVES run, so it looks for only one combination of default database, county database, and year. If the output database contains multiple combinations, only the first will be processed.

Step 4: Get prerequisite data from the default database

There are two areas of concern for this step:

  1. The county's fuel mix is determined using the county database's fuelsupply and fuelformulation tables, along with the default database's regioncounty and fuelsubtype tables.
  2. The speciation profiles are obtained from the default database.

Step 5: Get NonHAPTOG emissions from output database

This is straightforward. It selects all NonHAPTOG (pollutantID 88) rows from the movesoutput table, keeping only the relevant columns and skipping others, such as the nonroad columns hpID and sectorID.

Step 6: Assign NonHAPTOG emissions to speciation profiles

First, the MOVES output emissions are split by fuelSubtypeID using the county's fuel mix and market shares. Then the emissions are assigned to their speciation profile according to process, fuel subtype, and regulatory class. The final step is to convert the MOVES SCCs to SMOKE SCCs and aggregate accordingly.

This produces an intermediate table called nonhaptog_speciated.
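The first two operations of this step (splitting by fuel subtype, then assigning a profile) can be sketched in Python as below. The data layout, the IDs, the market shares, and the profile names are all made up for illustration; the SCC conversion and aggregation are not shown, and the SQL script presumably does this work with joins rather than loops.

```python
from collections import defaultdict


def split_and_assign(emissions, market_shares, profiles):
    """Split emissions by fuel subtype, then total them per speciation profile.

    emissions:     list of (processID, fuelTypeID, regClassID, emissionQuant)
    market_shares: {fuelTypeID: {fuelSubtypeID: share}}, shares sum to 1 per fuel type
    profiles:      {(processID, fuelSubtypeID, regClassID): profile name}
    """
    totals = defaultdict(float)
    for process_id, fuel_type_id, reg_class_id, quant in emissions:
        for subtype_id, share in market_shares[fuel_type_id].items():
            # The profile depends on process, fuel subtype, and regulatory class.
            profile = profiles[(process_id, subtype_id, reg_class_id)]
            totals[(process_id, fuel_type_id, reg_class_id, profile)] += quant * share
    return dict(totals)


# Hypothetical inputs: one emission record and a fuel type sold as two subtypes.
example = split_and_assign(
    emissions=[(1, 1, 20, 100.0)],
    market_shares={1: {12: 0.75, 13: 0.25}},
    profiles={(1, 12, 20): "PROFILE_A", (1, 13, 20): "PROFILE_B"},
)
print(example)
```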

Step 7: Write the output to the collector database

In this step, the final intermediate table is broken out by "SMOKE mode" (exhaust, permeation, evap, and refueling), and each part is inserted into its corresponding table in the collector database. During this process, the raw emissions output is normalized to give a weight to each profile within an SCC.
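The normalization can be sketched in Python as below: each profile's emissions are divided by the total for its SCC, so the weights within an SCC sum to 1. The row layout, SCC value, and profile names are illustrative assumptions; the actual tables carry additional key columns that would also define the grouping.

```python
from collections import defaultdict


def normalize_within_scc(rows):
    """Convert raw emissions into each profile's share of its SCC total.

    rows: list of (scc, profile, emission) tuples
    """
    scc_totals = defaultdict(float)
    for scc, _profile, emission in rows:
        scc_totals[scc] += emission
    return [
        (scc, profile, emission / scc_totals[scc])
        for scc, profile, emission in rows
        if scc_totals[scc] > 0  # skip SCCs with zero total to avoid dividing by zero
    ]


# Hypothetical SCC with two profiles.
weights = normalize_within_scc([
    ("2201210072", "PROFILE_A", 3.0),
    ("2201210072", "PROFILE_B", 1.0),
])
print(weights)
```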

The collector tables have a strict primary key (everything but the assigned ratio is in the key), so if multiple MOVES runs with overlapping emissions are gathered into the same collector database, the procedure will generate an error for having duplicate primary keys. This can be changed, if needed.

Step 8: Adjust factors as necessary

This is the final step, which updates the collector database tables as necessary. Right now, there are 2 adjustments:

  1. All non-January CNG factors need to be changed to match the January factors.
  2. For years before 2010, all non-January factors for diesel source types 51, 61, and 62 need to be changed to match the January factors.
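The two adjustments above can be sketched in Python as below. The record layout is an assumption, as are the fuelTypeID values used for CNG (3) and diesel (2). How the SQL script identifies the affected rows in the collector tables is not shown here.

```python
CNG_FUEL_TYPE = 3      # assumed MOVES fuelTypeID for CNG
DIESEL_FUEL_TYPE = 2   # assumed MOVES fuelTypeID for diesel


def needs_january_factors(fuel_type_id, source_type_id, year_id):
    """Return True when non-January factors should be replaced by January's."""
    if fuel_type_id == CNG_FUEL_TYPE:
        return True
    return (
        fuel_type_id == DIESEL_FUEL_TYPE
        and source_type_id in (51, 61, 62)
        and year_id < 2010
    )


def apply_january_factors(factors):
    """factors: {(fuelTypeID, sourceTypeID, yearID, monthID, profile): ratio}"""
    adjusted = dict(factors)
    for fuel, source, year, month, profile in factors:
        if month != 1 and needs_january_factors(fuel, source, year):
            january_key = (fuel, source, year, 1, profile)
            if january_key in factors:
                # Overwrite the monthly factor with the January value.
                adjusted[(fuel, source, year, month, profile)] = factors[january_key]
    return adjusted


# Hypothetical factors: CNG, pre-2010 diesel, and post-2010 diesel.
factors = {
    (3, 42, 2020, 1, "PROFILE_A"): 0.6,
    (3, 42, 2020, 7, "PROFILE_A"): 0.5,
    (2, 62, 2005, 1, "PROFILE_B"): 0.9,
    (2, 62, 2005, 7, "PROFILE_B"): 0.8,
    (2, 62, 2015, 1, "PROFILE_B"): 0.9,
    (2, 62, 2015, 7, "PROFILE_B"): 0.7,
}
adjusted = apply_january_factors(factors)
```

In this sketch a non-January row with no matching January row is left unchanged; whether the SQL script behaves the same way in that case is not something the description above settles.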

Appendix 1: List of Assumptions and Design Decisions

The following list of assumptions and design decisions can be revisited and changed at any time.

Appendix 2: SMOKE-MOVES SCC Mapping

processIDprocessNameroadTypeIDrateSMOKE_processSMOKE_mode
1Running Exhaust1RPHO92EXH_NHTOG
1Running Exhaust2RPD72EXH_NHTOG
1Running Exhaust3RPD72EXH_NHTOG
1Running Exhaust4RPD72EXH_NHTOG
1Running Exhaust5RPD72EXH_NHTOG
2Start Exhaust1RPS72EXH_NHTOG
11Evap Permeation1RPV72EPM_NHTOG
11Evap Permeation2RPD72EPM_NHTOG
11Evap Permeation3RPD72EPM_NHTOG
11Evap Permeation4RPD72EPM_NHTOG
11Evap Permeation5RPD72EPM_NHTOG
12Evap Fuel Vapor Venting1RPP72EVP_NHTOG
13Evap Fuel Leaks1RPV72EVP_NHTOG
13Evap Fuel Leaks2RPD72EVP_NHTOG
13Evap Fuel Leaks3RPD72EVP_NHTOG
13Evap Fuel Leaks4RPD72EVP_NHTOG
13Evap Fuel Leaks5RPD72EVP_NHTOG
15Crankcase Running Exhaust2RPD72EXH_NHTOG
15Crankcase Running Exhaust3RPD72EXH_NHTOG
15Crankcase Running Exhaust4RPD72EXH_NHTOG
15Crankcase Running Exhaust5RPD72EXH_NHTOG
15Crankcase Running Exhaust1RPHO92EXH_NHTOG
16Crankcase Start Exhaust1RPS72EXH_NHTOG
17Crankcase Extended Idle Exhaust1RPH53EXH_NHTOG
18Refueling Displacement Vapor Loss1RPD62RFL_NHTOG
18Refueling Displacement Vapor Loss2RPD62RFL_NHTOG
18Refueling Displacement Vapor Loss3RPD62RFL_NHTOG
18Refueling Displacement Vapor Loss4RPD62RFL_NHTOG
18Refueling Displacement Vapor Loss5RPD62RFL_NHTOG
19Refueling Spillage Loss1RPD62RFL_NHTOG
19Refueling Spillage Loss2RPD62RFL_NHTOG
19Refueling Spillage Loss3RPD62RFL_NHTOG
19Refueling Spillage Loss4RPD62RFL_NHTOG
19Refueling Spillage Loss5RPD62RFL_NHTOG
90Extended Idle Exhaust1RPH53EXH_NHTOG
91Auxiliary Power Exhaust1RPH91EXH_NHTOG