This is an implementation of an archiver for EPICS control systems that aims to archive millions of PVs.
Here are the main features.

System requirements

These are the prerequisites for the EPICS archiver appliance.
Optionally, we'd need:

In terms of hardware, for production systems, we'd need a reasonably powerful server with lots of memory for each appliance. For example, we use 24-core machines with 128 GB of memory and 15K RPM SAS drives for medium-term storage.

Storage

Out of the box, the following storage technologies/plugins are supported. To add support for other storage technologies, see the customization guide.

Architecture

Each appliance consists of four modules deployed in Tomcat containers as separate WAR files. For production systems, it is recommended that each module be deployed in a separate Tomcat instance (thus yielding four Tomcat processes). A sample storage configuration is outlined below, where we'd use:
  1. A ramdisk for the short-term store; in this storage stage, we'd store data at a granularity of an hour.
  2. SSD/SAS drives for the medium-term store; in this storage stage, we'd store data at a granularity of a day.
  3. A NAS/SAN for the long-term store; in this storage stage, we'd store data at a granularity of a year.
Architecture of a single appliance
A wide variety of such configurations is possible and supported. For example, if you have a powerful enough NAS/SAN, you could write straight to the long-term store, bypassing all the stages in between.
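To make partition granularity concrete, here is a small Python sketch of the sample three-stage layout. The stage labels STS/MTS/LTS, the `partition_key` helper and its naming scheme are hypothetical, chosen only to show how one timestamp lands in an hourly, daily or yearly partition, and how a configuration that writes straight to the long-term store simply skips the earlier stages.

```python
from datetime import datetime, timezone

# Hypothetical model of the sample configuration:
# (stage label, backing medium, partition granularity).
STAGES = [
    ("STS", "ramdisk", "hour"),   # short-term store
    ("MTS", "SSD/SAS", "day"),    # medium-term store
    ("LTS", "NAS/SAN", "year"),   # long-term store
]

# strftime patterns that bucket a timestamp into one partition.
# Illustrative naming only, not the appliance's actual file names.
PARTITION_FORMATS = {
    "hour": "%Y_%m_%d_%H",
    "day": "%Y_%m_%d",
    "year": "%Y",
}

def partition_key(pv_name, ts, granularity):
    """Name of the partition that an event at time `ts` falls into."""
    return "%s:%s" % (pv_name, ts.strftime(PARTITION_FORMATS[granularity]))

def stages_for(bypass_to_lts=False):
    """Stages data passes through; optionally write straight to the LTS."""
    return STAGES[-1:] if bypass_to_lts else list(STAGES)
```

For example, `partition_key("DEMO:PV1", datetime(2024, 3, 15, 13, 45, tzinfo=timezone.utc), "day")` gives `DEMO:PV1:2024_03_15`, so every event from that day shares one medium-term partition.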
The long-term store is shown outside the appliance as an example of a commonly deployed configuration. The appliances do not need to share any storage, so both of the following configurations are possible.
Multiple appliances sending data into one long term store
Multiple appliances sending data into different long term stores

Policies

All of these configurations can get quite tricky for end users to navigate. Rather than expose all of this variation, and to provide a simple interface to end users, the archiver appliance uses policies. Policies are implemented in a Python script, policies.py, that makes these decisions on behalf of the users. Policies are site-specific and identical across all appliances in the cluster. When a user requests that a new PV be archived, the archiver appliance samples the PV to determine its event rate, storage rate and other parameters. In addition, various fields of the PV like .NAME, .ADEL, .MDEL, .RTYP, etc. are also obtained. These are passed to policies.py, which then runs some simple code to configure the detailed archival parameters. The archiver appliance executes policies.py using an embedded Jython interpreter. Policies allow system administrators to support a wide variety of configurations suited to their infrastructure without exposing the details to their users.
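A hypothetical policies.py might look roughly like the following Python sketch. The function name `determinePolicy`, the dictionary keys (`eventRate`, `RTYP`, `samplingMethod`, `samplingPeriod`, `archiveFields`, `policyName`) and the 2 events/second threshold are assumptions for illustration; the real interface is defined by the sample policies.py that ships with your installation. The code avoids Python 3-only syntax so that it would also run under Jython.

```python
# Hypothetical policies.py sketch: names, keys and thresholds are assumptions,
# not the appliance's documented interface.

# Extra fields archived alongside the PV value, keyed by record type.
EXTRA_FIELDS_BY_RTYP = {
    "ai": ["HIHI", "HIGH", "LOW", "LOLO", "LOPR", "HOPR"],
    "motor": ["RBV", "DMOV"],
}

def determinePolicy(pvInfo):
    """Map measured PV characteristics to archival parameters."""
    policy = {}
    # Events/second, measured when the appliance sampled the PV.
    event_rate = pvInfo.get("eventRate", 0.0)
    if event_rate <= 2.0:
        # Slow PVs: record every change via a monitor.
        policy["samplingMethod"] = "MONITOR"
        policy["samplingPeriod"] = 1.0
        policy["policyName"] = "Default"
    else:
        # Fast PVs: bound the stored rate by sampling at a fixed period.
        policy["samplingMethod"] = "SCAN"
        policy["samplingPeriod"] = 0.5
        policy["policyName"] = "FastPV"
    # Unknown or missing record types get no extra fields.
    policy["archiveFields"] = list(EXTRA_FIELDS_BY_RTYP.get(pvInfo.get("RTYP"), []))
    return policy
```

For example, `determinePolicy({"eventRate": 0.5, "RTYP": "ai"})` selects `MONITOR` and adds the alarm and display limit fields of the ai record; users never see these details, only the resulting archival.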

Clustering

While each appliance in a cluster is independent and self-contained, all members of a cluster are listed in a special configuration file (typically called appliances.xml) that is site-specific and identical across all appliances in the cluster. appliances.xml is a simple XML file that contains the ports and URLs of the various webapps in each appliance. Each appliance has a dedicated TCP/IP endpoint called cluster_inetport for cluster operations such as cluster membership. On startup, the mgmt webapp uses the cluster_inetport of all the appliances in appliances.xml to discover the other members of the cluster. This is done using TCP/IP only (no broadcast/multicast support is needed).
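The following Python sketch parses a two-appliance appliances.xml and derives the cluster_inetport endpoints that a member would contact over plain TCP during discovery. The element names (`identity`, `cluster_inetport`, `mgmt_url`, `retrieval_url`), host names and port numbers are illustrative assumptions; treat the sample appliances.xml distributed with the archiver appliance as the authoritative schema.

```python
import xml.etree.ElementTree as ET

# Assumed layout of appliances.xml; element names, hosts and ports are illustrative.
SAMPLE_APPLIANCES_XML = """
<appliances>
  <appliance>
    <identity>appliance0</identity>
    <cluster_inetport>archappl0.example.org:16670</cluster_inetport>
    <mgmt_url>http://archappl0.example.org:17665/mgmt/bpl</mgmt_url>
    <retrieval_url>http://archappl0.example.org:17668/retrieval/bpl</retrieval_url>
  </appliance>
  <appliance>
    <identity>appliance1</identity>
    <cluster_inetport>archappl1.example.org:16670</cluster_inetport>
    <mgmt_url>http://archappl1.example.org:17665/mgmt/bpl</mgmt_url>
    <retrieval_url>http://archappl1.example.org:17668/retrieval/bpl</retrieval_url>
  </appliance>
</appliances>
"""

def parse_appliances(xml_text):
    """Return {identity: {child_tag: text}} for every <appliance> element."""
    root = ET.fromstring(xml_text.strip())
    cluster = {}
    for app in root.findall("appliance"):
        fields = {child.tag: (child.text or "").strip() for child in app}
        cluster[fields["identity"]] = fields
    return cluster

def cluster_endpoints(cluster):
    """(host, port) pairs for the cluster_inetport of every member."""
    endpoints = []
    for fields in cluster.values():
        host, port = fields["cluster_inetport"].rsplit(":", 1)
        endpoints.append((host, int(port)))
    return endpoints
```

Because every appliance reads the same file, each one arrives at the same member list without any broadcast or multicast traffic.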

The business processes are all cluster-aware; the bulk of the inter-appliance communication during normal operation uses JSON/HTTP on the other URLs defined in appliances.xml. All the JSON/HTTP calls from the mgmt webapp are also available for use in scripting; see the section on scripting.

The archiving functionality is split across members of the cluster; that is, each PV being archived is archived by one appliance in the cluster. However, both data retrieval and business requests can be dispatched to any appliance in the cluster; the receiving appliance routes or proxies the request to the appropriate appliance.
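As a toy model of this routing, the Python sketch below keeps a hypothetical PV-to-owner table and decides whether a request can be served locally or must be proxied to the owning appliance's retrieval URL. The table, identities and URLs are made up; they stand in for whatever lookup the appliances use internally.

```python
# Toy routing model: each PV is owned by exactly one appliance, and a request
# arriving at any other appliance is proxied to the owner.
# All names and URLs below are hypothetical.

PV_OWNER = {"DEMO:PV1": "appliance0", "DEMO:PV2": "appliance1"}
RETRIEVAL_URL = {
    "appliance0": "http://archappl0.example.org:17668/retrieval",
    "appliance1": "http://archappl1.example.org:17668/retrieval",
}

def route_request(local_identity, pv_name):
    """Return ("local", None) or ("proxy", retrieval URL of the owning appliance)."""
    owner = PV_OWNER.get(pv_name)
    if owner is None:
        raise KeyError("PV %s is not being archived" % pv_name)
    if owner == local_identity:
        return ("local", None)
    return ("proxy", RETRIEVAL_URL[owner])
```

Here `route_request("appliance0", "DEMO:PV2")` returns a proxy decision pointing at appliance1, which mirrors the situation in the figure below.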

Appliance 1 proxies a data retrieval request for a PV being archived by appliance 2.
In addition, users do not need to allocate PVs to appliances when requesting that new PVs be archived. The appliances maintain a small set of metrics during their operation and use these, in addition to the measured event and storage rates, to do automated capacity planning/load balancing.
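The capacity-planning logic itself is not described here, so the following Python sketch only illustrates the general idea with a deliberately simple, hypothetical heuristic: place a new PV on the appliance that keeps the most storage-rate headroom after adding it. The capacity and load numbers are invented inputs, not metrics the appliance is known to expose.

```python
# Hypothetical placement heuristic, not the appliance's actual algorithm.

def pick_appliance(appliances, new_pv_storage_rate):
    """appliances: {identity: (capacity_bytes_per_s, current_bytes_per_s)}.

    Returns the identity with the largest non-negative headroom after
    adding the new PV's storage rate.
    """
    best, best_headroom = None, None
    for identity in sorted(appliances):  # sorted for deterministic tie-breaking
        capacity, current = appliances[identity]
        headroom = capacity - current - new_pv_storage_rate
        if headroom >= 0 and (best_headroom is None or headroom > best_headroom):
            best, best_headroom = identity, headroom
    if best is None:
        raise RuntimeError("no appliance has room for this PV")
    return best
```

With two appliances of equal capacity, the less loaded one wins; a real planner would weigh several metrics rather than a single rate.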

Scripting

The archiver appliance comes with a web interface that has support for various business processes. The web interface communicates with the server principally using JSON/HTTP. The same web service calls are also available for use from external scripting tools like Python.
A sample Python script that prints out all the PVs in the cluster of appliances.
See the list of business logic accessible through scripting.
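A minimal Python sketch in the same spirit, assuming the mgmt webapp exposes a `getAllPVs` call under its business-logic URL that returns a JSON array of PV names (check the business logic list for the exact endpoint and any parameters, such as result limits). The base URL is a placeholder; substitute the mgmt URL of one of your appliances. The HTTP fetch is injectable so the parsing can be exercised without a live appliance.

```python
import json
from urllib.request import urlopen

# Placeholder mgmt business-logic URL (assumption); use the mgmt URL from appliances.xml.
MGMT_BPL_URL = "http://localhost:17665/mgmt/bpl"

def _http_get(url):
    """Fetch a URL and return the raw response body as bytes."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()

def get_all_pvs(mgmt_url=MGMT_BPL_URL, fetch=_http_get):
    """Return the PV names reported by the (assumed) getAllPVs call."""
    return json.loads(fetch(mgmt_url + "/getAllPVs"))
```

Usage: `for pv in get_all_pvs("http://archappl0.example.org:17665/mgmt/bpl"): print(pv)`. Since any appliance can answer business requests, pointing the script at a single appliance is enough.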

Screenshots

A screenshot of the home page.
We offer a wide variety of reports.
Metrics maintained by the appliances.
The appliances page offers a quick view of some JVM parameters.