Cache File Operation

This documents how caching works within EP-Launch. Caching is how workflow runs and output data are persisted on disk.

High Level Overview

At a high level, caching simply occurs by persisting a JSON file in a run directory. When a workflow starts, if the cache file does not exist, it is created. If it already exists, it is read, updated, and re-written. The cache file includes input parameters, including workflow name, and output parameters as defined by the workflow. When a workflow is done, the cache file for that directory is updated with output data. When a user browses to a folder in EP-Launch, if it has a cache file, that is parsed and previous output data is shown.
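The read, update, and re-write cycle described above can be sketched roughly as follows. This is a minimal illustration, not the EP-Launch implementation: the helper names `load_cache` and `save_cache` are hypothetical, and the sketch only assumes the `.eplaunch` file name and the `workflows` root key documented later on this page.

```python
import json
import os
import tempfile

CACHE_FILE_NAME = ".eplaunch"  # matches the documented FileName attribute


def load_cache(run_directory):
    """Return the cached data for a directory, or an empty structure if no cache file exists."""
    path = os.path.join(run_directory, CACHE_FILE_NAME)
    if not os.path.exists(path):
        return {"workflows": {}}
    with open(path) as f:
        return json.load(f)


def save_cache(run_directory, data):
    """Write the full cache structure back to disk, replacing any previous file."""
    path = os.path.join(run_directory, CACHE_FILE_NAME)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)


run_dir = tempfile.mkdtemp()      # stand-in for a real run directory
cache = load_cache(run_dir)       # no file yet, so an empty structure is returned
cache["workflows"]["Demo"] = {"files": {}}  # "Demo" is a made-up workflow name
save_cache(run_dir, cache)
reloaded = load_cache(run_dir)    # now parsed from the file just written
```

The whole file is rewritten on every save, which keeps the logic simple but is also why concurrent writers need the coordination described under Detailed Operation.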

Detailed Operation

In real operation within EP-Launch, two complications make caching harder than the overview suggests:

  • EP-Launch allows multiple workflows to be running, even within the same folder, and on the same file.
  • Workflow completion times are unpredictable; two workflows could complete in the same directory at essentially the same time.

The CacheFile class is documented in full below, under Cache Module Auto-Documentation. The GUI creates instances of this class to read or write cache data to disk. The important parts of the caching operation in EP-Launch are:

  • When a new folder is selected, a CacheFile instance is created to read data from disk, then released.
  • When a workflow is run, a CacheFile in the current directory is opened and workflow parameters are written, including workflow name, weather file name, and other data.
  • When a workflow is completed, a CacheFile is retrieved for the workflow’s directory, results are added from the workflow, and the cache is written.

Cache File Layout

The cache file is a simple JSON file. At the root of the JSON is an object with a single key, “workflows”, that captures the entire context. The value of this key is another object with one key for each workflow. The value of each workflow key is an object with a single key, “files”, whose keys correspond to files that have been run for this workflow. Each file object has two keys: “config” and “result”. The config key captures any input data related to this run; for now it is only the weather file. The result key captures all the output column data corresponding to this workflow run.

An example of the layout is provided here:

{
  "workflows": {
    "Get Site:Location": {
      "files": {
        "1ZoneEvapCooler.idf": {
          "config": {
            "weather": ""
          },
          "result": {
            "Site:Location []": "Denver Centennial CO USA WMO=724666"
          }
        }
      }
    },
    "EnergyPlus 8.9 SI": {
      "files": {
        "1ZoneEvapCooler.idf": {
          "config": {
            "weather": "MyWeather.epw"
          },
          "result": {
            "Errors": 0,
            "Warnings": 1,
            "Runtime [s]": 1.23,
            "Version": "8.9"
          }
        },
        "RefBldgMediumOfficeNew2004_Chicago.idf": {
          "config": {
            "weather": ""
          },
          "result": {
            "Errors": 0,
            "Warnings": 4,
            "Runtime [s]": 1.58,
            "Version": "8.9"
          }
        }
      }
    }
  }
}
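As a rough illustration of how this layout can be navigated, the sketch below looks up the result block for one file under one workflow. The function name `results_for_file` is hypothetical and not part of EP-Launch; the data is a trimmed copy of the example above.

```python
import json

example = json.loads("""
{
  "workflows": {
    "EnergyPlus 8.9 SI": {
      "files": {
        "1ZoneEvapCooler.idf": {
          "config": {"weather": "MyWeather.epw"},
          "result": {"Errors": 0, "Warnings": 1, "Runtime [s]": 1.23, "Version": "8.9"}
        }
      }
    }
  }
}
""")


def results_for_file(cache, workflow_name, file_name):
    """Return the result block for one file under one workflow, or {} if absent."""
    files = cache.get("workflows", {}).get(workflow_name, {}).get("files", {})
    return files.get(file_name, {}).get("result", {})


result = results_for_file(example, "EnergyPlus 8.9 SI", "1ZoneEvapCooler.idf")
missing = results_for_file(example, "EnergyPlus 8.9 SI", "NotRun.idf")  # never run
```

Using `.get` with an empty default at each level means a workflow or file that has never been run yields an empty result rather than a `KeyError`.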

Future Work

Timestamps need to be added to the run data to easily check for stale results when input files are changed.

Cache Module Auto-Documentation

This is the auto-generated documentation of the Cache module that may provide a deeper understanding of the topics described above.

class eplaunch.utilities.cache.CacheFile(working_directory)

Bases: object

Represents the file that is kept in each folder where workflows have been started. Keeps track of the most recent state of the file, with some metadata that is workflow dependent.

Usage:

To ensure thread-safety, this class employs a form of mutex, where the unique id is the current directory. Any worker function that wants to alter the queue should follow this process:

  • The worker should call the ok_to_continue() function, which will check the mutex and then wait a predetermined amount of time for the mutex to clear, or fail.
  • The worker should check the return value of this function and, if False, fail. If True, it should set up a block on the directory by adding the current directory to the cache_files_currently_updating_or_writing array.
  • The worker can then proceed to read the cache, modify it, and write to disk.
  • The worker must then release the mutex by removing the current directory from the list.
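
The process above can be sketched as follows. This is a standalone illustration under stated assumptions, not the CacheFile code: `ok_to_continue` here is a free function taking the directory as an argument (the documented method takes none), `update_cache_safely` is a hypothetical wrapper, and the interval and timeout values are assumed to be seconds and to correspond to the QueueCheckInterval and QueueTotalCheckTime attributes.

```python
import time

# Module-level list acting as the mutex queue (name taken from the documentation).
cache_files_currently_updating_or_writing = []

QUEUE_CHECK_INTERVAL = 0.1  # assumed: seconds between checks
QUEUE_TOTAL_CHECK_TIME = 5  # assumed: seconds before giving up


def ok_to_continue(directory, interval=QUEUE_CHECK_INTERVAL, total=QUEUE_TOTAL_CHECK_TIME):
    """Poll until the directory is no longer blocked; return False on timeout."""
    deadline = time.monotonic() + total
    while directory in cache_files_currently_updating_or_writing:
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def update_cache_safely(directory, update):
    """Run update() while the directory is marked as blocked; return True on success."""
    if not ok_to_continue(directory):
        return False
    cache_files_currently_updating_or_writing.append(directory)  # set up the block
    try:
        update()  # read the cache, modify it, and write it to disk
    finally:
        # Always release the block, even if update() raises.
        cache_files_currently_updating_or_writing.remove(directory)
    return True
```

In this sketch the check and the append are two separate steps, so two threads could in principle both pass the check before either adds the block; a strict lock would need a primitive such as `threading.Lock`.
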
FileName = '.eplaunch'
FilesKey = 'files'
ParametersKey = 'config'
QueueCheckInterval = 0.1
QueueTotalCheckTime = 5
ResultsKey = 'result'
RootKey = 'workflows'
WeatherFileKey = 'weather'
add_config(workflow_name, file_name, config_data)

This function is used to add a config data block for a workflow. A config data block contains data that is generally thought of as “input data” for a workflow, such as a weather file for a simulation run.

Parameters:
  • workflow_name – The name of the workflow to alter, as given by the workflow’s name() method
  • file_name – The file name of the file to alter
  • config_data – A map of data to write to this config section
Returns:

None

add_result(workflow_name, file_name, column_data)

This function is used to add a result data block for a workflow. A result data block contains data that is generally thought of as “output data” for a workflow, such as energy usage for a simulation run.

Parameters:
  • workflow_name – The name of the workflow to alter, as given by the workflow’s name() method
  • file_name – The file name of the file to alter
  • column_data – A map of data to write to this result section, the keys are expected to be defined by the workflow itself as given by the get_interface_columns() method
Returns:

None
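
A rough sketch of how add_config and add_result could populate the nested layout is shown here. It works on a plain dictionary instead of a CacheFile instance, the helper `_file_entry` is hypothetical, and replacing the whole block on each call (instead of merging into it) is an assumption, not documented behavior.

```python
def _file_entry(state, workflow_name, file_name):
    """Return the per-file object, creating intermediate levels as needed."""
    workflows = state.setdefault("workflows", {})
    files = workflows.setdefault(workflow_name, {}).setdefault("files", {})
    return files.setdefault(file_name, {})


def add_config(state, workflow_name, file_name, config_data):
    """Store the input-side data for one file under one workflow."""
    _file_entry(state, workflow_name, file_name)["config"] = dict(config_data)


def add_result(state, workflow_name, file_name, column_data):
    """Store the output column data for one file under one workflow."""
    _file_entry(state, workflow_name, file_name)["result"] = dict(column_data)


state = {}
add_config(state, "EnergyPlus 8.9 SI", "1ZoneEvapCooler.idf", {"weather": "MyWeather.epw"})
add_result(state, "EnergyPlus 8.9 SI", "1ZoneEvapCooler.idf", {"Errors": 0, "Warnings": 1})
```
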

get_files_for_workflow(current_workflow_name)

Gets a list of files that are found in this cache inside the given workflow name

Parameters: current_workflow_name – The name of a workflow (as determined by the name() function on the workflow)
Returns: A map with keys that are file names found in this workflow
ok_to_continue()

This function does the check-and-wait part of the mutex. If the current directory is not blocked, it returns immediately. If the current directory is blocked, it re-checks at a short interval for a limited total time, waiting for the mutex to be unlocked. If the mutex has not cleared by then, it returns False.

Returns: True or False, whether it is safe to write to this cache
read()

Reads the existing cache file, if it exists, and stores the data in the workflow_state instance variable. If the cache file doesn’t exist, this simply initializes the workflow_state instance variable.

Returns: None
write()

Writes out the workflow state to the previously determined cache file location. Note that this function does not protect for thread-safety! It is expected that functions that alter the state of the cache call write() within their own blocking structure.

Returns: None
eplaunch.utilities.cache.cache_files_currently_updating_or_writing = []

This is used as the mutex queue: the list of unique directories being altered at a given time.