SCADA Historian and Data Logging Design

Process historian systems for industrial data collection, compression, storage, and retrieval, covering OSIsoft PI, Ignition, and open-source options.

What Is a Process Historian?

A process historian is a time-series database optimized for high-speed industrial process data. Unlike relational databases optimized for transactional operations, historians are designed to ingest thousands of tags at 1-second or faster rates, store data efficiently using compression algorithms tailored to process signals, and retrieve it rapidly for trend display, reporting, and analysis. Major commercial historians include OSIsoft PI System (now part of AVEVA), Honeywell Uniformance PHD, AspenTech InfoPlus.21, and Ignition Tag Historian from Inductive Automation. Open-source options include InfluxDB with Telegraf collectors and TimescaleDB for PostgreSQL-based deployments.

The historian is the institutional memory of a plant. Process engineers, operations managers, and reliability teams depend on it for performance analysis, troubleshooting, compliance reporting, and optimization. Sites that lose historian data through server failure, storage exhaustion, or configuration errors lose the ability to reconstruct events, prove regulatory compliance, or correlate production data with quality outcomes.

Tag Hierarchy and Organization

Historian tag organization significantly affects long-term usability. Ad-hoc tag naming conventions become incomprehensible within a few years as personnel change. Best practice follows an equipment hierarchy modeled on the ISA-88 and ISA-95 equipment models: Enterprise / Site / Area / Unit / Equipment / Measurement. An example of a well-structured tag name is SITE1.AREA2.UNIT3.PUMP_001.DISCHARGE_PRESSURE. This hierarchy supports wildcard queries, such as requesting all measurements from Unit 3, and makes data discoverable without a documentation lookup.
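As a minimal sketch of why a dotted hierarchy helps, the snippet below filters a small, hypothetical tag list with shell-style wildcards from the Python standard library. The tag names and the `find_tags` helper are illustrative assumptions, not the query syntax of any particular historian.

```python
from fnmatch import fnmatchcase

# Hypothetical tag list; names follow the dotted hierarchy described above.
TAGS = [
    "SITE1.AREA2.UNIT3.PUMP_001.DISCHARGE_PRESSURE",
    "SITE1.AREA2.UNIT3.PUMP_001.MOTOR_CURRENT",
    "SITE1.AREA2.UNIT3.TANK_004.LEVEL",
    "SITE1.AREA2.UNIT4.PUMP_002.DISCHARGE_PRESSURE",
]


def find_tags(pattern: str, tags: list[str]) -> list[str]:
    """Return the tags matching a shell-style wildcard pattern, sorted by name."""
    return sorted(t for t in tags if fnmatchcase(t, pattern))


# Every measurement in Unit 3, regardless of equipment.
unit3_tags = find_tags("SITE1.AREA2.UNIT3.*", TAGS)

# Every discharge pressure on the site, regardless of unit.
pressure_tags = find_tags("*.DISCHARGE_PRESSURE", TAGS)
```

With ad-hoc names, neither query could be written as a single pattern.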

Tag attributes including engineering units, description, normal operating range, alarm limits, instrument tag number, and P&ID reference should be populated in the historian for every tag. This metadata makes the historian self-documenting and allows generic client tools to display tags intelligently without hardcoded configuration. Many historian platforms support custom attribute fields that can hold additional metadata like equipment class, process safety criticality level, or calibration interval.
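One way to picture a self-documenting tag is as a record that carries its own metadata. The sketch below is an assumption about how such a record might look in client code; the field names, the example values (units, limits, instrument and P&ID numbers), and the `describe` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TagMetadata:
    name: str
    description: str
    engineering_units: str
    normal_range: tuple[float, float]   # (low, high) in engineering units
    alarm_limits: tuple[float, float]   # (low alarm, high alarm)
    instrument_tag: str
    pid_reference: str
    custom: dict[str, str] = field(default_factory=dict)  # site-specific attributes

    def in_normal_range(self, value: float) -> bool:
        low, high = self.normal_range
        return low <= value <= high


def describe(meta: TagMetadata, value: float) -> str:
    """Generic display string built only from metadata, with no per-tag configuration."""
    return f"{meta.description}: {value} {meta.engineering_units}"


# Hypothetical example values for illustration only.
pump_pressure = TagMetadata(
    name="SITE1.AREA2.UNIT3.PUMP_001.DISCHARGE_PRESSURE",
    description="Pump 001 discharge pressure",
    engineering_units="bar",
    normal_range=(4.0, 6.0),
    alarm_limits=(3.0, 7.5),
    instrument_tag="PT-1001",
    pid_reference="PID-0042",
    custom={"equipment_class": "centrifugal_pump", "calibration_interval_months": "12"},
)

label = describe(pump_pressure, 5.2)
```

Because the display string is built from the metadata alone, the same client code works for any tag that has its attributes populated.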

Data Compression: Exception Reporting and Swinging Door

Raw process data at 1-second scan rates generates enormous volumes. A 10,000-tag system produces 864 million data points per day. Process historians use compression algorithms that store data efficiently without sacrificing analytical fidelity. The two dominant approaches are exception reporting and the swinging door algorithm.
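The raw volume figure follows from simple arithmetic: 86,400 seconds per day multiplied by the tag count. A short sketch, with a hypothetical helper name, makes it easy to repeat the estimate for other tag counts and scan rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def raw_points_per_day(tag_count: int, scan_interval_s: float = 1.0) -> int:
    """Number of uncompressed samples collected per day."""
    scans_per_day = SECONDS_PER_DAY / scan_interval_s
    return round(tag_count * scans_per_day)


one_second_scan = raw_points_per_day(10_000)        # the case quoted in the text
half_second_scan = raw_points_per_day(10_000, 0.5)  # doubling the scan rate doubles the volume
```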

Exception reporting (deadband compression) stores a new value only when the measured value differs from the last stored value by more than a configured deadband threshold. For a stable pressure transmitter with 0.1% noise, a 0.5% deadband can eliminate 95% or more of raw scans while preserving all meaningful changes. The stored exception data does not reproduce the original signal exactly: when retrieved with step interpolation, it reproduces the signal to within the deadband.
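The deadband rule can be sketched in a few lines. This is a simplified illustration, not any vendor's implementation; real products typically add options such as a maximum time between stored values, which are omitted here. The function name and sample data are assumptions.

```python
def deadband_compress(samples, deadband):
    """Keep a (timestamp, value) sample only when its value differs from the
    last stored value by more than `deadband`. The first sample is always kept."""
    stored = []
    for ts, value in samples:
        if not stored or abs(value - stored[-1][1]) > deadband:
            stored.append((ts, value))
    return stored


# Hypothetical pressure readings at 1-second intervals.
raw = list(enumerate([100.0, 100.1, 99.9, 100.2, 101.0, 101.1, 100.4]))
stored = deadband_compress(raw, deadband=0.5)


def step_value(stored_points, ts):
    """Step interpolation: hold the most recent stored value at or before ts."""
    return [v for t, v in stored_points if t <= ts][-1]


# Largest difference between any raw sample and its step reconstruction.
max_error = max(abs(v - step_value(stored, t)) for t, v in raw)
```

Seven raw samples reduce to three stored points, and the reconstruction error never exceeds the 0.5 deadband.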

The swinging door algorithm used by OSIsoft PI stores a value when no single straight line drawn from the last stored point can pass within a compression deviation threshold of every sample received since that point. This compresses slowly ramping signals such as temperatures and levels more aggressively than deadband compression while maintaining the ability to reconstruct the signal shape accurately. Compression ratios of 10:1 to 50:1 are typical for well-tuned historians, meaning a 10,000-tag system may store only about 17 to 86 million points per day rather than 864 million.
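The sketch below shows the textbook form of swinging door compression, assuming strictly increasing timestamps. It tracks the range of slopes from the last stored point that stay within `deviation` of every sample seen so far; when that range becomes empty, the previous sample is stored and becomes the new anchor. It is an illustration of the general technique, not the PI implementation, and the function name and test signal are assumptions.

```python
def swinging_door_compress(samples, deviation):
    """Textbook swinging door compression of (timestamp, value) samples.
    Assumes strictly increasing timestamps."""
    samples = list(samples)
    if len(samples) <= 2:
        return samples

    stored = [samples[0]]
    anchor_t, anchor_v = samples[0]
    min_upper = float("inf")    # tightest upper bound on the slope so far
    max_lower = float("-inf")   # tightest lower bound on the slope so far
    previous = samples[0]

    for t, v in samples[1:]:
        dt = t - anchor_t
        upper = (v + deviation - anchor_v) / dt
        lower = (v - deviation - anchor_v) / dt
        new_min_upper = min(min_upper, upper)
        new_max_lower = max(max_lower, lower)

        if new_max_lower > new_min_upper:
            # No single line from the anchor fits every sample any more:
            # store the previous sample and restart the doors from it.
            stored.append(previous)
            anchor_t, anchor_v = previous
            dt = t - anchor_t
            min_upper = (v + deviation - anchor_v) / dt
            max_lower = (v - deviation - anchor_v) / dt
        else:
            min_upper, max_lower = new_min_upper, new_max_lower
        previous = (t, v)

    stored.append(samples[-1])  # always keep the most recent sample
    return stored


# Hypothetical signal: a steady ramp for 5 seconds, then flat for 5 seconds.
ramp_then_flat = [(t, float(min(t, 5))) for t in range(11)]
compressed = swinging_door_compress(ramp_then_flat, deviation=0.1)
```

Eleven samples reduce to three: the start, the corner where the ramp ends, and the final value. A 0.1 deadband applied to the same signal would store every point on the ramp.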

Data Collection Architecture

Historian data collection uses several architectural patterns. The collector runs on a server that has network access to the data source, which could be an OPC-DA or OPC-UA server, DCS historian bridge, ICCP link, or direct communication to PLCs and RTUs. The collector applies exception reporting before sending data to the historian server, reducing network bandwidth. For distributed sites, collectors run locally at each site and forward compressed data to a central historian over the WAN.

OPC (originally OLE for Process Control, now Open Platform Communications) is the most widely used collection interface. OPC-DA (Data Access) is the older COM-based standard still widely used with Windows-based DCS and SCADA systems. OPC-UA (Unified Architecture) is the modern platform-independent standard that provides both data access and alarm/event collection in a single protocol with built-in security. Most major historian platforms now prefer OPC-UA collection because it eliminates DCOM configuration complexity and runs on non-Windows platforms.

Collection redundancy prevents data gaps caused by collector host failures: duplicate collectors both connect to the source and submit data to the historian. The historian deduplicates the incoming data and stores only unique time-value pairs, so duplicate submissions cause no harm beyond slightly increased network traffic.
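The deduplication step can be sketched as keeping the first arrival of each unique (tag, timestamp, value) record. The record layout and names below are assumptions for illustration; a real historian would do this inside its storage engine.

```python
def deduplicate(points):
    """Keep the first occurrence of each (tag, timestamp, value) record,
    preserving arrival order."""
    seen = set()
    unique = []
    for point in points:
        if point not in seen:
            seen.add(point)
            unique.append(point)
    return unique


# Hypothetical submissions from two redundant collectors, interleaved on arrival.
tag = "SITE1.AREA2.UNIT3.PUMP_001.DISCHARGE_PRESSURE"
incoming = [
    (tag, 1000, 5.1),  # collector A
    (tag, 1000, 5.1),  # collector B, same sample
    (tag, 1001, 5.3),  # collector A
    (tag, 1002, 5.2),  # collector B (collector A missed this scan)
    (tag, 1001, 5.3),  # collector B, late duplicate
]
archived = deduplicate(incoming)
```

The sample at timestamp 1002 survives even though only one collector delivered it, which is the point of running two.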

Retrieval and Analytics

Historian data retrieval APIs provide several interpolation modes that affect how data is returned for trend display and reporting. Raw retrieval returns exactly the stored compressed values and is fast but returns uneven point density. Interpolated retrieval returns values at a uniform time interval by linear interpolation between stored points, which is useful for calculations and reports that require evenly spaced data. Sampled retrieval returns a compressed subset of raw data optimized for trend display at a specific pixel width using downsampling algorithms that preserve visual shape with far fewer points than raw retrieval.
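The difference between interpolated and sampled retrieval can be sketched as follows. Both functions are simplified assumptions: `interpolated_retrieval` performs plain linear interpolation between stored points, and `minmax_downsample` keeps the minimum and maximum sample per pixel column, which is one common shape-preserving approach and not necessarily what a given product uses.

```python
from bisect import bisect_right


def interpolated_retrieval(stored, start, end, interval):
    """Return (t, value) at a uniform interval by linear interpolation
    between stored (timestamp, value) points sorted by time."""
    times = [t for t, _ in stored]
    result = []
    t = start
    while t <= end:
        i = bisect_right(times, t)
        if i == 0:
            value = stored[0][1]    # before the first point: hold the first value
        elif i == len(stored):
            value = stored[-1][1]   # at or after the last point: hold the last value
        else:
            (t0, v0), (t1, v1) = stored[i - 1], stored[i]
            value = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        result.append((t, value))
        t += interval
    return result


def minmax_downsample(points, start, end, pixel_width):
    """Keep the minimum and maximum sample in each of `pixel_width` time buckets."""
    bucket_span = (end - start) / pixel_width
    buckets = {}
    for t, v in points:
        if not start <= t <= end:
            continue
        idx = min(int((t - start) / bucket_span), pixel_width - 1)
        lo, hi = buckets.get(idx, ((t, v), (t, v)))
        if v < lo[1]:
            lo = (t, v)
        if v > hi[1]:
            hi = (t, v)
        buckets[idx] = (lo, hi)
    out = []
    for idx in sorted(buckets):
        out.extend(sorted(set(buckets[idx])))  # time order; one point if min == max
    return out


# Hypothetical stored points and a sawtooth signal for the trend example.
stored = [(0, 0.0), (10, 10.0), (20, 10.0)]
evenly_spaced = interpolated_retrieval(stored, start=0, end=20, interval=5)

sawtooth = [(t, float(t % 10)) for t in range(100)]
trend = minmax_downsample(sawtooth, start=0, end=100, pixel_width=10)
```

The trend keeps 20 of the 100 sawtooth samples but still shows every peak and trough, whereas taking every fifth sample could miss them.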

Advanced analytics on historian data include batch analysis (aligning batch runs by phase rather than calendar time to compare similar batches), regression calculations (correlating multiple tag values to identify root causes of quality variation), and equipment health monitoring (trending vibration, temperature, and current signatures over months to detect degradation before failure). Many historian platforms now include built-in analytics modules or connect to Python/R environments via REST APIs for custom analysis.
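Batch alignment, the first technique listed, can be sketched by re-expressing each batch's timestamps as seconds elapsed since its phase started. The batch data, phase start times, and helper name below are hypothetical; a real analysis would take phase boundaries from the batch execution system.

```python
def align_to_phase(points, phase_start):
    """Re-express (timestamp, value) samples as (seconds since phase start, value)."""
    return [(t - phase_start, v) for t, v in points if t >= phase_start]


# Two hypothetical batches of the same heating phase, run hours apart
# (timestamps in epoch seconds, values in degrees C).
batch_a = [(1000, 20.0), (1060, 35.0), (1120, 50.0)]
batch_b = [(8000, 21.0), (8060, 33.0), (8120, 49.0)]

aligned_a = align_to_phase(batch_a, phase_start=1000)
aligned_b = align_to_phase(batch_b, phase_start=8000)

# Compare the two batches at matching elapsed times.
differences = [
    (ta, vb - va)
    for (ta, va), (tb, vb) in zip(aligned_a, aligned_b)
    if ta == tb
]
```

Once aligned, the two temperature profiles can be overlaid or subtracted directly even though the batches ran at different calendar times.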

Compliance and Long-Term Archiving

Many industries require long-term retention of process data for regulatory compliance. EPA continuous emissions monitoring system (CEMS) data is commonly retained for 5 years, although the exact period depends on the applicable rule and permit; pharmaceutical batch records must be kept for at least one year after the batch expiration date under 21 CFR Part 211, with 21 CFR Part 11 governing the electronic form of those records; nuclear power plant data retention can extend through the plant license period. These requirements drive archiving strategies that move aged data from online high-speed storage to lower-cost archive tiers while maintaining retrieval capability.
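A tiering policy of this kind can be sketched as a function of sample age. The tier names and the default thresholds (one year online, five years total retention) are hypothetical values chosen for illustration; actual periods must come from the regulations and permits that apply to the site.

```python
from datetime import datetime, timedelta, timezone


def storage_tier(sample_time, now, online_days=365, retention_years=5):
    """Classify a sample by age. All thresholds are illustrative assumptions."""
    age = now - sample_time
    if age > timedelta(days=365 * retention_years):
        return "disposal_review"  # past retention: review before any deletion
    if age > timedelta(days=online_days):
        return "archive"          # lower-cost tier, still retrievable
    return "online"               # high-speed storage


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent = storage_tier(now - timedelta(days=30), now)
aged = storage_tier(now - timedelta(days=400), now)
expired = storage_tier(now - timedelta(days=365 * 5 + 10), now)
```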

Data integrity for compliance requires that stored values be immutable: the historian must prevent retroactive modification of historical data. Audit trail logging of any manual data entry or deletion is required for 21 CFR Part 11 applications. Some regulatory applications also require electronic signatures and time-stamped records of any manual entries. A historian platform's compliance capabilities must be evaluated carefully for regulated applications, since not all commercial historians meet 21 CFR Part 11 requirements out of the box.
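The idea of immutable collected values plus an audit trail for manual entries can be sketched as below. This is a toy, in-memory illustration under stated assumptions, not a 21 CFR Part 11 implementation: it has no electronic signatures, access control, or tamper-proof storage, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    recorded_at: datetime
    user: str
    tag: str
    sample_time: int
    old_value: Optional[float]
    new_value: float
    reason: str


class AuditedTagStore:
    """Collected values are never overwritten; manual entries add a new
    version and an audit record instead."""

    def __init__(self) -> None:
        self._versions: dict[tuple[str, int], list[float]] = {}
        self.audit_log: list[AuditRecord] = []

    def collect(self, tag: str, sample_time: int, value: float) -> None:
        key = (tag, sample_time)
        if key in self._versions:
            raise ValueError("collected value already exists and cannot be overwritten")
        self._versions[key] = [value]

    def manual_entry(self, tag: str, sample_time: int, value: float,
                     user: str, reason: str) -> None:
        versions = self._versions.setdefault((tag, sample_time), [])
        old_value = versions[-1] if versions else None
        versions.append(value)
        self.audit_log.append(AuditRecord(
            datetime.now(timezone.utc), user, tag, sample_time, old_value, value, reason))

    def history(self, tag: str, sample_time: int) -> tuple[float, ...]:
        """All versions, oldest first; the original collected value is retained."""
        return tuple(self._versions[(tag, sample_time)])


store = AuditedTagStore()
store.collect("SITE1.AREA2.UNIT3.TANK_004.LEVEL", 1000, 42.0)
store.manual_entry("SITE1.AREA2.UNIT3.TANK_004.LEVEL", 1000, 41.5,
                   user="jdoe", reason="transmitter recalibrated")
```

The design choice is that a correction never replaces the original reading: both versions remain retrievable, and the audit log records who made the change, when, and why.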
