Version 2 (modified by csa, 8 years ago)
--

Raw Data Filtering in ADEI

The filters are implemented in 'classes/filters'. There are two types of filters.

Group filters

These are the standard filters, which are applied to all items in the group. The simplest example is badvaluefilter.php, which searches the values of all group items for the specified value and replaces it with NULL (this is used, for example, to get rid of 900 in temperatures, a value that Armen uses to indicate errors). You can also drop a complete vector by returning true from the ProcessVector function.
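The replacement step described above can be sketched in a few lines of PHP. This is only an illustration of the idea, not the code from badvaluefilter.php: the function name replace_bad_values and its signature are hypothetical, and the real filter works through the ProcessVector interface, which is not reproduced here.

```php
<?php
// Hypothetical helper, NOT part of ADEI: replaces every occurrence of the
// sentinel $badvalue in one data vector with NULL.
function replace_bad_values(array $values, $badvalue): array
{
    foreach ($values as $i => $value) {
        // Loose comparison, so that 900 also matches 900.0
        if ($value !== null && $value == $badvalue) {
            $values[$i] = null;
        }
    }
    return $values;
}

// One vector of temperatures where 900 marks a read error
$vector = array(21.5, 900, 22.1);
$clean = replace_bad_values($vector, 900);
// $clean is array(21.5, null, 22.1)
```

The sketch returns a new array instead of modifying its argument, which keeps it easy to test in isolation.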

The configuration is pretty straightforward. You just need to add a "data_filters" array to the standard ADEI options, like this:

"data_filters" => array(
    "BADVALUEFilter" => array(
        "badvalue" => 900
    )
)

You will find a few examples in the KATRIN configuration. Everything inside the inner array is passed to the filter constructor as parameters.

Channel Filters

The second type is filters applied to individual channels. See item/rangeitemfilter.php, which checks whether each value lies within the configured range [min,max] and replaces it with NULL otherwise.
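The range check can be pictured roughly as follows. This is a minimal sketch of the logic only, not the code from item/rangeitemfilter.php: the function name filter_range is hypothetical, and treating a missing bound as "no limit on that side" is an assumption.

```php
<?php
// Hypothetical helper, NOT part of ADEI: returns the value unchanged if it
// lies within [min, max], or NULL otherwise. A bound left as NULL is not
// checked (an assumption of this sketch).
function filter_range($value, $min = null, $max = null)
{
    if ($value === null) {
        return null;            // nothing to check
    }
    if ($min !== null && $value < $min) {
        return null;            // below the lower bound
    }
    if ($max !== null && $value > $max) {
        return null;            // above the upper bound
    }
    return $value;
}

// With only "min" => 100 configured, smaller values are replaced with NULL
$kept = filter_range(150, 100);     // 150
$dropped = filter_range(50, 100);   // null
```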

Here the configuration is more elaborate. Check the ipecube configuration for the test kitcube setup. You still use the "data_filters" option, but you need to provide a list of the affected channels.

"data_filters" => array(
    array(
        "class" => "ITEMFilter",
        "filter" => "RANGEItemFilter",
        "item_mask" => array(
            array(
                "key" => "item_dependency_column",
                "items" => "/^PARS.PARS1.O.ND2.001.SUM$/",
            ),
            array(
                "key" => "item_extractor",
                "items" => "/^SUMExtractor$/",
            )
        ),
        "min" => 100
    ),
)

Here "class" is always "ITEMFilter". The "filter" option specifies which item filter you actually want to use. The "item_mask" contains one or more regular expressions that select channels: "items" is the regular expression, and "key" specifies the channel metadata the expression is applied to. The masks in the example are rather intricate, but normally you can just use "key" => "name" to filter channels based on their title. Finally, the rest of the inner array is passed to the filter constructor, so you can add any further parameters there, like "min" in the example.
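For the common case of selecting channels by title, a configuration might look like the sketch below. The channel pattern and the bounds are made up for illustration, and the "max" option is assumed to mirror "min" based on the [min,max] range mentioned above; check item/rangeitemfilter.php for the parameters actually supported.

```php
"data_filters" => array(
    array(
        "class" => "ITEMFilter",
        "filter" => "RANGEItemFilter",
        "item_mask" => array(
            array(
                "key" => "name",               // match on the channel title
                "items" => "/^Temperature/",   // hypothetical title pattern
            ),
        ),
        "min" => 0,      // illustrative lower bound
        "max" => 500,    // assumed option, mirrors "min"
    ),
)
```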