2. Multivariate data analysis of online-sensors and spectroscopic data for the prediction of solvent composition parameters for MEA (2022)
Lars E. Williams a, Audun Drageset b,*, Bjørn Grung a
a University of Bergen, Department of Chemistry, Allégaten 41, 5020 Bergen, Norway
b Technology Centre Mongstad (TCM), 5954 Mongstad, Norway
Cost-effective operation of amine-based post-combustion CO2 capture facilities is important for successfully implementing the technology on a broad industrial scale to reach current climate objectives. Technology Centre Mongstad has benchmarked performance of such technologies in a generic amine plant since 2012. This work utilized historic plant process and laboratory data collected during a test campaign with 2-aminoethan-1-ol (MEA) in 2015. The aim of this work was to employ multivariate analysis to develop models to predict laboratory results for CO2 content (Total Inorganic Carbon) and amine functionalities (total alkalinity) in the amine solvent. Predictive models were made based on process variables alone, spectroscopic data alone and data fusion models. The process model could explain 99% of the variance for total inorganic carbon in the Lean solvent stream. The Rich solvent is more chemically complex and requires the use of spectroscopic data to explain 95-99% of the variance. In this work we demonstrated how multivariate data analysis can be employed to predict solvent parameters that can be reported in real time for improved control of the capture process.
Decarbonizing heavy industry is key to achieving the carbon mitigation goals outlined in the sixth assessment report from the Intergovernmental Panel on Climate Change (IPCC-6) [1]. Amine-based carbon capture is among the most mature technologies for decarbonizing existing industrial point sources of CO2 emissions. Technology Centre Mongstad (TCM) has operated and demonstrated generic and proprietary amine solvents for post-combustion carbon capture (PCCC) since 2012 [2]. TCM is located on the west coast of Norway in the vicinity of Equinor’s oil refinery at Mongstad. It has access to two distinctly different industrial flue gases, one from a combined-cycle gas turbine (CCGT)-based combined heat and power (CHP) plant and one from a residual fluid catalytic cracker (RFCC). Together with the ability to manipulate these flue gases (through dilution and CO2 recycling), this allows TCM to assess CO2 capture technologies under conditions that are representative of emissions from multiple industries [3]. Among the main objectives of TCM’s test campaigns is risk reduction (economic and environmental) for commercial application and full-scale deployment of Carbon Capture and Storage (CCS). Key among these are the open test campaigns with non-proprietary solvents such as aqueous 2-aminoethan-1-ol (commonly known as monoethanolamine or MEA) and the aqueous blend of 2-amino-2-methylpropan-1-ol (AMP) and piperazine (PZ). Data and learnings from these campaigns can be disseminated in line with TCM’s purpose of ensuring safe technology implementation to combat climate change.
MEA is a first-generation amine-based CO2 capture solvent. Amine-based absorption is a reversible reaction between an aqueous amine and an industrial flue gas containing CO2. The basic amine functionality reacts with CO2 to form a carbamate, removing the CO2 from the gas and trapping it in the liquid phase. This reaction is carried out in the absorber, which contains a packed gas-liquid contactor to ensure high mass transfer between the two phases at lower temperatures (30–60 °C, depending on solvent and plant configuration). The reaction can then be reversed by applying heat via a steam boiler in the stripper section (see Figure 1). CO2 is released as a product gas and the liquid amine is regenerated and returned to the absorber. A capture plant operating with MEA can capture over 90% of the CO2 (advanced solvents have demonstrated over 98% capture) and generates a CO2 product with high purity (99.9%) [3a]. The major challenge is the operational cost [4]. The capture plant should be operated under optimal conditions to minimize energy consumption, which is mainly tied to removing CO2 from the solvent and is often reported as specific reboiler duty (SRD). To achieve this, operators rely on accurate gas composition data at the inlet and outlet of the absorber as well as on the solvent composition. Gas composition is monitored online (via gas chromatography or infrared spectroscopy), and operators can quickly act on any changes in, for example, the CO2 concentration from the industrial source. In contrast, the solvent composition is usually measured through extractive samples and laboratory analysis, and results are only available after multiple hours and in some cases days.
This work utilized plant data from TCM’s MEA test campaign conducted in 2015 (July to October), funded by Gassnova, Equinor (formerly Statoil), Shell and Sasol (TCM’s owners in that period). The campaign’s primary objectives were to establish an updated baseline for plant performance with MEA, to verify the plant mass balance over a set of operational conditions, and to address other technology knowledge gaps [5]. The plant was operated with a 30 wt% aqueous MEA solution and the CHP flue gas (3.6 vol% CO2) for most of the test period. Throughout the campaign, laboratory samples were collected and analysed to (a) ensure tests were conducted with the correct amine concentration, (b) record the resulting lean loading (mole CO2 per mole of amine) during process optimization and (c) monitor solvent degradation and plant corrosion.
The TCM amine plant has a large array of analytical instruments in the rich (CO2-rich solvent after absorption) and lean (CO2-lean solvent after stripping) streams. Among the measured quantities are temperature, pH, conductivity, density, and pressure (see Figure 1). The data are used for general-purpose characterisation of the physical parameters during a test campaign. Such instruments can potentially be used to predict solvent parameters currently only available via laboratory analysis, such as (1) Total Inorganic Carbon (TIC), a measure of CO2 in the solvent, (2) total alkalinity (the total NH functionality in the solvent as determined via an acid-base titration) and (3) amine concentration. These are used to calculate the CO2 loading of the solvent (mole CO2 per mole of amine). During the campaign, total alkalinity was used as an analogue for amine concentration, as the NH functionality is predominantly from MEA. It is expected that predictive models would benefit from the addition of chemical information acquired through spectroscopy, as spectroscopic methods like FTIR can give information about the chemical bonds and functionality present in the solvent. Such chemical information is necessary if the degradation [6] of the solvent and its impact on the plant are to be monitored online [7].
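The loading calculation mentioned above can be sketched as a simple ratio. This is a minimal illustration, assuming that TIC and total alkalinity are both reported in mol/kg (the units are not stated in the text) and that total alkalinity stands in for the amine concentration, as it did during the campaign. The function name and the numerical values are hypothetical.

```python
def co2_loading(tic_mol_per_kg: float, alkalinity_mol_per_kg: float) -> float:
    """Return the CO2 loading in mol CO2 per mol amine.

    Assumes both inputs share the same unit (here mol/kg) and that total
    alkalinity is used as an analogue for the amine concentration.
    """
    if alkalinity_mol_per_kg <= 0:
        raise ValueError("total alkalinity must be positive")
    return tic_mol_per_kg / alkalinity_mol_per_kg


# Hypothetical lean and rich solvent values, for illustration only
lean_loading = co2_loading(1.0, 5.0)
rich_loading = co2_loading(2.4, 4.8)
print(f"lean: {lean_loading:.2f}, rich: {rich_loading:.2f} mol CO2 / mol amine")
```

Because the loading is a ratio of two predicted quantities, errors in either the TIC model or the alkalinity model carry over directly into the loading.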
In this paper we present how common online measurements such as pH and conductivity can be used to predict solvent parameters that were previously only obtained through laboratory analysis. In addition, the limitations of this concept, as well as how the addition of spectroscopy can improve model accuracy, are discussed. Implementing such models can improve plant efficiency and lower the frequency of sampling and analysis, reducing both exposure risks to operators and the costs of operating a laboratory.
Latent variables are linear combinations of the measured variables. They represent excellent tools for data visualization and quantitative modelling. Two main types of latent variable analysis have been used in this work: Principal Component Analysis (PCA) [8] and Partial Least Squares Regression (PLS) [9].
Any data set not based on an orthogonal design will have correlations among the variables. This makes it possible to go from a large number of measured variables to a much smaller number of latent variables while preserving the information content of the data. This concept is particularly useful for data exploration but can also be used in classification and regression analysis. In PCA, latent variables are referred to as principal components. These are constructed so that they capture as much of the variance in the data as possible. This is known as the maximum variance criterion. Each measured variable contributes to each latent variable, but the amount of contribution is different for different variables. Each measured object has a score on each latent variable, just like every object has a value for the measured variables. The collection of scores on a latent variable is referred to as a score vector, and a bivariate scatter plot of the first two score vectors after PCA is the two-dimensional plot that explains as much variation as possible. For this reason, PCA is extremely popular for data exploration, and it is used in a variety of scientific fields, although under different names.
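As a minimal sketch of the ideas above, PCA can be computed from the singular value decomposition of the mean-centred data matrix. The data below are synthetic (two hidden sources driving six correlated variables) and the function name `pca` is an illustrative assumption, not the software used in this work.

```python
import numpy as np


def pca(X, n_components):
    """PCA of a data matrix (objects in rows) via singular value decomposition."""
    Xc = X - X.mean(axis=0)                       # mean centring
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # one score per object and component
    loadings = Vt[:n_components].T                    # contribution of each measured variable
    explained = s**2 / np.sum(s**2)                   # fraction of variance per component
    return scores, loadings, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic data: 50 objects, 6 correlated variables driven by 2 hidden sources
hidden = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X = hidden @ mixing + 0.01 * rng.normal(size=(50, 6))

scores, loadings, explained = pca(X, n_components=2)
print("variance explained by PC1 and PC2:", explained.round(3))
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the score plot described above. In this synthetic case two components carry nearly all the variance because only two sources generated the six variables.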
All data contains noise. This fact means that the usefulness of principal components extends beyond data exploration. They can be used in classification and discrimination analysis, and regression. This can be done by calculating enough principal components to capture the systematic variation in the data. Using these principal components in further analysis and ignoring the residual variation left unmodelled ensures that noise in the data does not pollute the quality of the subsequent models. In this aspect, PCA can be seen as a denoising technique.
In this work, principal components have been used for data fusion [10], which is the term used for uniting different sets of data into one data set. The present work deals with traditional process data and infrared spectra used to characterize the state of an amine solution used to absorb CO2. It is tempting to combine these two measurement types to better describe the system’s state. Doing so, one immediately finds that the number of relevant spectral variables is significantly larger than the number of process variables. Simply fusing the two sets of measurements by adding one set of variables to the other will lead to a data matrix completely dominated by the spectral data. Deleting spectral variables so that the two sets become equal in size is another strategy, but there is a risk of throwing away useful information.
Furthermore, making a proper variable selection is not trivial. In this work, PCA is done on the spectral data. The number of principal components calculated and retained is enough to capture the systematic variation of the data. This number is independent of the number of variables in the data set but is decided by the number of independent sources of variation in the data. In this way one can reduce thousands of measured variables to a handful of principal components without any information loss. As the spectral variables are very correlated the reduction is substantial. This means the process data is joined with the significant score vectors from a PCA, not the measured spectral variables.
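The fusion strategy described above can be sketched as follows. This is an illustration on random data, not the implementation used in this work. The array sizes are hypothetical, and scaling the score vectors to unit variance before joining them with the process variables is an assumption made here so that no block dominates.

```python
import numpy as np


def autoscale(X):
    """Mean centre each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(X, n_components):
    """Score vectors of the leading principal components of X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def fuse(process, spectra, n_components):
    """Join process variables with spectral score vectors (not raw spectral variables)."""
    scores = pca_scores(spectra, n_components)
    # Assumption: the scores are autoscaled like the process variables
    return np.hstack([autoscale(process), autoscale(scores)])


rng = np.random.default_rng(1)
process = rng.normal(size=(40, 8))                                # hypothetical: 40 samples, 8 process variables
spectra = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 2000))   # 2000 spectral variables, 3 sources
fused = fuse(process, spectra, n_components=3)
print(fused.shape)  # (40, 11)
```

The 2000 spectral variables enter the fused matrix as three columns, so the eight process variables are no longer swamped.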
Principal components can also be used in regression analysis. This is referred to as Principal Component Regression (PCR). In multiple linear regression (MLR), one would model a response (e.g., total alkalinity of a solution) as a function of a set of measured variables, such as the process variables used in this work. There are many problems with this approach. In MLR it is assumed that the error in the independent variables is much less than the error in the response. This is not necessarily the case, and if this is violated the MLR model is not the optimal model. A further complication is that the regressors are assumed to be independent variables – they should be independent of each other. If they are not, neither the model predictions nor the interpretation of the effects of the regressors (the regression coefficients) may be trusted. Moving from a set of highly collinear measured variables to a set of orthogonal latent variables and using these as regressors elegantly avoids this problem. A further benefit of this approach is that the principal components are much less influenced by noise than the original data set. Where MLR uses all the data available, a latent variable regression model using principal components would only use the systematic part of the data as regressors.
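A minimal sketch of PCR on synthetic, highly collinear data is given below. The function name `pcr_fit` and all numerical values are illustrative assumptions. Because the score vectors are orthogonal, each regression coefficient on the scores reduces to a simple projection.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal Component Regression: least squares on the leading PCA scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T          # loadings of the retained components
    T = Xc @ V                       # orthogonal score vectors used as regressors
    q = (T.T @ yc) / np.sum(T**2, axis=0)
    b = V @ q                        # coefficients expressed for the measured variables
    return b, y_mean - x_mean @ b


rng = np.random.default_rng(2)
sources = rng.normal(size=(60, 2))
# Six collinear measured variables built from two underlying sources plus noise
X = sources @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(60, 6))
y = 3.0 * sources[:, 0] - 1.5 * sources[:, 1] + 0.05 * rng.normal(size=60)

b, b0 = pcr_fit(X, y, n_components=2)
y_hat = X @ b + b0
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R2 with two principal components: {r2:.3f}")
```

Only the two retained components enter the regression, so the noise carried by the remaining four directions does not reach the model.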
While PCR is in most cases preferable to MLR, it is still not the preferred latent variable regression technique. The problem with the principal components is that they are constructed to capture all the main sources of systematic behaviour. Not all these systematic sources of variation will be relevant to the regression problem. A simple example would be the prediction of the concentration of a compound in a sample using spectroscopy. While changes in the analyte concentration surely impact the spectra, so will any changes in the concentrations of other compounds present in the sample. This is systematic behaviour that will be picked up by the principal components, but it is not relevant for the prediction of our analyte. PLS is the solution to this problem.
In PLS the latent variables are calculated differently from the ones in PCA. Instead of focusing on the variance of the independent variables, PLS constructs latent variables that maximize the covariance between the independent variables and the response. This means that the latent variables are directly related to the response being modelled. Where PCR asks the question “what are the major sources of variation in my data?”, PLS asks “what part of my data varies in the same way as the response?”. This makes PLS a more powerful regression technique than PCR. They are both latent variable regression techniques, but the latent variables used in the regression are quite different.
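The difference in how the latent variables are built can be made concrete with a short single-response PLS (PLS1) sketch using the NIPALS-style deflation scheme. This is an illustrative implementation on synthetic data, not the software used in this work. The function name `pls1_fit` and the analyte and interferent example are assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS regression for a single response using NIPALS-style deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                  # weights proportional to covariance with the response
        w /= np.linalg.norm(w)
        t = E @ w                    # score vector
        tt = t @ t
        p = E.T @ t / tt             # X loadings
        q_a = (f @ t) / tt           # regression of the response on the score
        E = E - np.outer(t, p)       # deflate X
        f = f - q_a * t              # deflate the response
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, y_mean - x_mean @ b


rng = np.random.default_rng(4)
analyte = rng.normal(size=80)
interferent = 5.0 * rng.normal(size=80)      # large variation, irrelevant to the response
X = (np.outer(analyte, rng.normal(size=20))
     + np.outer(interferent, rng.normal(size=20))
     + 0.01 * rng.normal(size=(80, 20)))
y = analyte

b, b0 = pls1_fit(X, y, n_components=2)
y_hat = X @ b + b0
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R2 with two PLS components: {r2:.3f}")
```

The weight vector `w` is where the response enters: each component is steered towards the part of the data that covaries with `y`, whereas a principal component would follow the dominant interferent variation first.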
The TCM amine plant is a generic plant designed and built by Aker Solutions and Kværner with a capacity to treat up to 60 000 Sm3/h of post-combustion flue gas. The plant was operated with the combined heat and power flue gas with a CO2 concentration of 3.6 % and aqueous MEA (30 wt%). The absorber tower was operated with an 18- and 24-meter packing section and a lean amine flow of 43 000–70 000 kg/h. The stripper section was operated at 120.0–121.5 °C. Detailed plant parameters for the different test phases are described in the literature [5]. Solvent parameters were monitored via in-line liquid conductivity (Endress Hauser, resistance measurement conductivity meter), density (Coriolis mass flow, Proline Promass 80F) and pH (Endress Hauser, potentiometric pH measurement) instruments installed after the circulation pumps on the rich and lean solvent flows (see Table 1). The process data are measured at different time intervals. In this work, values averaged over a period of 15 minutes were used for all variables.
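The 15-minute averaging can be sketched as below. How the averaging windows were aligned is not stated in the text, so fixed clock-aligned windows are assumed here. The timestamps and pH readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_15min(readings):
    """Average (timestamp, value) readings within fixed, clock-aligned 15-minute windows."""
    bins = defaultdict(list)
    for ts, value in readings:
        # Round the timestamp down to the start of its 15-minute window
        start = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
        bins[start].append(value)
    return {start: sum(values) / len(values) for start, values in sorted(bins.items())}


t0 = datetime(2015, 7, 1, 12, 0)
# Hypothetical pH readings logged every 5 minutes
readings = [(t0 + timedelta(minutes=5 * i), 10.0 + 0.1 * i) for i in range(6)]
averaged = average_15min(readings)
for start, mean in averaged.items():
    print(start, round(mean, 2))
```

Putting every instrument on the same 15-minute grid gives one row per time window, which is the layout the multivariate models need.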
The process data needs to be cleaned up prior to further analysis as the measurements are carried out continuously, regardless of the system state. Measurements taken during outlying conditions were removed from the data sets. Examples of outlying conditions are system shutdown periods, the recovery phase after such shutdowns and periods of MEA reclamation.
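One simple way to carry out this clean-up is to list the periods to exclude and drop every measurement that falls inside them. The sketch below assumes the outlying periods are already known as start and end timestamps. The sample values and the excluded window are hypothetical.

```python
from datetime import datetime


def remove_periods(samples, excluded):
    """Drop (timestamp, value) samples that fall inside any excluded [start, end) interval."""
    return [
        (ts, value)
        for ts, value in samples
        if not any(start <= ts < end for start, end in excluded)
    ]


# Hypothetical hourly samples and one hypothetical shutdown-plus-recovery window
samples = [(datetime(2015, 8, 1, hour), 10.0 + hour) for hour in range(6)]
excluded = [(datetime(2015, 8, 1, 2), datetime(2015, 8, 1, 4))]
cleaned = remove_periods(samples, excluded)
print([ts.hour for ts, _ in cleaned])  # [0, 1, 4, 5]
```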
Extractive liquid samples were collected via a fast loop system equipped with a process sampler (DOPAK Inc.) by operators on a regular basis and delivered to TCM onsite laboratory for storage and further analysis.
Rich and lean liquid samples were analysed with a Bruker Alpha FTIR Spectrometer with a diamond ATR cell (Bruker Corporation). Spectra were recorded between ~4000 and ~425 cm-1. A plot of the raw spectra (Figure 2) shows that there is no relevant information above 3640 cm-1 or below 800 cm-1. The same applies to the region between 2750 cm-1 and 1730 cm-1. These regions were removed from the spectra prior to further analysis.
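The region selection can be sketched with a boolean mask over the wavenumber axis. The retained regions follow from the limits given above. The 2 cm-1 grid, the inclusive boundaries and the placeholder spectra are assumptions made for illustration.

```python
import numpy as np


def select_regions(wavenumbers, spectra, regions):
    """Keep only the spectral variables inside the given (high, low) wavenumber regions."""
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for high, low in regions:
        mask |= (wavenumbers <= high) & (wavenumbers >= low)
    return wavenumbers[mask], spectra[:, mask]


# Regions retained after removing >3640, 2750-1730 and <800 cm-1 (boundaries assumed inclusive)
KEPT_REGIONS = [(3640.0, 2750.0), (1730.0, 800.0)]

wavenumbers = np.arange(4000.0, 424.0, -2.0)      # hypothetical 2 cm-1 grid
spectra = np.ones((3, wavenumbers.size))          # placeholder for three recorded spectra
kept_wavenumbers, kept_spectra = select_regions(wavenumbers, spectra, KEPT_REGIONS)
print(wavenumbers.size, "->", kept_wavenumbers.size, "spectral variables")
```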
The number of samples used in the models varies for the different responses and type of data. The analytical laboratory did not measure all responses for all samples, and IR spectra were not recorded for all samples. Table 2 shows the number of samples used in each model.
Total Alkalinity is analysed via automated acid base titration with HCl (1.0 M). The reported uncertainty is 2%. Total Inorganic Carbon (TIC) is analysed with a TOC/TN Elementar (Elementar Analysensysteme GmbH). The reported uncertainty is 4%.
As the process data is a mix of variables expressed in different measurement units, the data was standardized and mean centred prior to modelling. An initial PLS regression of the process data with the lean TIC as the response yielded an 8-component model, with the number of components determined using cross validation [11]. The model explained more than 98 % of the variance in the TIC, which is more than satisfactory as the uncertainty in the laboratory measurements is approximately 4 %.
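The sketch below illustrates how standardization and cross validation can be combined to choose the number of PLS components. It uses plain k-fold cross validation on synthetic data as a simpler stand-in for the repeated double cross validation of [11]. The function names, the fold count and the data are assumptions for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS regression for a single response using NIPALS-style deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        q_a = (f @ t) / tt
        E, f = E - np.outer(t, p), f - q_a * t
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, y_mean - x_mean @ b


def rmsecv(X, y, max_components, n_folds=5, seed=0):
    """Root mean squared error of cross validation for 1..max_components."""
    order = np.random.default_rng(seed).permutation(len(y))
    press = np.zeros(max_components)
    for test in np.array_split(order, n_folds):
        train = np.setdiff1d(np.arange(len(y)), test)
        # Standardize with statistics from the training part only
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0, ddof=1)
        X_train, X_test = (X[train] - mu) / sd, (X[test] - mu) / sd
        for a in range(1, max_components + 1):
            b, b0 = pls1_fit(X_train, y[train], a)
            press[a - 1] += np.sum((y[test] - (X_test @ b + b0)) ** 2)
    return np.sqrt(press / len(y))


rng = np.random.default_rng(3)
sources = rng.normal(size=(60, 3))
X = sources @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(60, 10))
y = sources @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=60)

errors = rmsecv(X, y, max_components=6)
best = int(np.argmin(errors)) + 1
print("RMSECV per component count:", errors.round(3), "-> selected:", best)
```

Scaling inside each fold keeps the held-out samples from influencing the scaling parameters, so the cross-validated error stays an honest estimate of prediction error.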
Not all recorded variables contribute to the regression models. Since the presence of irrelevant predictors may be detrimental to the model, care was taken to remove any predictor variables with a small regression coefficient and a large uncertainty. This also leads to a simpler model, one with fewer PLS components. The final Lean TIC model contained six PLS components and captured 99 % of the variance of the response. Figure 3 shows the regression coefficients of the variables in the final model.
The enriched amine solution resulting from the CO2 capture is more complex chemically. Various chemical reactions take place in the mixture, and it is expected that a model based on the process data alone will struggle to explain the behaviour of the solution.
Figure 4 shows the poor performance of an optimized (irrelevant variables removed) PLS model of the Rich Total Alkalinity value. The two-component model explains only 52.81% of the response variance. This demonstrates that while the process variables may be enough to satisfactorily model the simpler lean system, more information is needed to obtain acceptable models for the more complex rich flow.
As shown in 4.1.2, process data alone is not sufficient for prediction of many of the properties. Infrared spectra represent an alternative information source. An infrared spectrum contains information on the functional groups and chemical bonds present in a sample. Figure 6 shows two spectra: One from a lean sample and one from a rich sample. It is evident that the two samples have strong similarities, but also that there are differences.
The models presented in this section use whole spectral profiles (from the regions described in the Experimental section) as predictor variables. This necessitates different pre-treatment compared to the process data. Standardization is not an appropriate tool for profile data, as that would inflate small, noisy variables and reduce the contribution from the larger, more interesting variables. The effects that usually cause problems when using spectral profiles as predictors are additive baseline effects and multiplicative effects due to light scattering. In this work, the baseline effects were handled by Savitzky-Golay second order differentiation [12] with a window size of 25 and a cubic polynomial. The multiplicative effects were handled by Extended Multiplicative Scatter Correction [13].
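The derivative step can be sketched with NumPy alone. The filter below fits a cubic polynomial to each 25-point window by least squares and returns the second derivative at the window centre, assuming equally spaced spectral variables with unit spacing. The synthetic spectrum is an illustration, and the Extended Multiplicative Scatter Correction step is not shown.

```python
import math

import numpy as np


def savgol_coefficients(window, polyorder, deriv):
    """Savitzky-Golay filter weights for the derivative at the window centre."""
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, polyorder + 1, increasing=True)   # columns x^0, x^1, ..., x^polyorder
    # Row `deriv` of the pseudo-inverse yields the fitted polynomial coefficient c_deriv;
    # the derivative at x = 0 is deriv! * c_deriv.
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)


def savgol_derivative(spectrum, window=25, polyorder=3, deriv=2):
    """Apply the filter along a spectrum; edge points without a full window are dropped."""
    coeffs = savgol_coefficients(window, polyorder, deriv)
    # np.convolve flips its kernel, so reverse the weights to apply them as written
    return np.convolve(spectrum, coeffs[::-1], mode="valid")


x = np.arange(200.0)
peak = np.exp(-0.5 * ((x - 100.0) / 10.0) ** 2)    # synthetic absorption band
baseline = 0.5 + 0.001 * x                         # additive offset and slope
d2_with_baseline = savgol_derivative(peak + baseline)
d2_peak_only = savgol_derivative(peak)
print("max difference caused by the baseline:", np.abs(d2_with_baseline - d2_peak_only).max())
```

A second derivative removes both a constant offset and a linear slope, which is why it handles additive baseline effects. SciPy's `scipy.signal.savgol_filter` provides the same filter with additional edge handling.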
Figure 7 shows the performance of a three-component PLS model for predicting the total alkalinity in the rich samples. For this model, the Total Alkalinity values were root transformed, which improves the model quality. The model captures 95.56 % of the variance of the data and, compared to the model shown in Figure 5, the IR model improves the performance to a remarkable extent.
Although not shown in this work, attempts were made to model the TIC of the rich solvent using only the online process data. The resulting model was only able to explain 59.46% of the response variance. It is therefore interesting to see if the performance improves when replacing the process data with the IR spectra. The best model of the IR data explained 81.66% of the response variance. While this is a clear gain, Figure 8 shows that there is still room for improvement. The next logical step is to combine the process and IR data.
An immediate challenge presents itself when trying to combine the online process data and the IR spectra. The number of spectral variables is more than 250 times larger than the number of process variables. This means that a simple fusion, where the spectra are simply added to the process variables, will lead to a data matrix completely dominated by the spectral information. More advanced fusion methods are therefore needed.
In this work, the spectral data was decomposed using Principal Component Analysis. Enough components were extracted to explain 99 % of the variance of the spectra. The number of components, which is typically quite small, was found using cross validation. The corresponding significant score vectors were subsequently appended to the process data as new variables.
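The variance criterion can be sketched as follows. This illustration uses only the cumulative explained variance on synthetic spectra. The cross validation used in this work to settle the number of components is not reproduced, and the function name and data sizes are assumptions.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.99):
    """Smallest number of principal components whose cumulative variance reaches the threshold."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # searchsorted returns the first index where cumulative >= threshold
    return int(np.searchsorted(cumulative, threshold)) + 1, cumulative


rng = np.random.default_rng(5)
# Synthetic spectra: 40 samples, 500 spectral variables, 3 independent sources of variation
spectra = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 500)) + 0.01 * rng.normal(size=(40, 500))
n_components, cumulative = n_components_for_variance(spectra)
print(n_components, cumulative[:4].round(4))
```

The count returned depends on the number of independent sources in the data, not on the number of spectral variables, which is what makes the reduction so large.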
Three principal components were enough to reach more than 99 % variance explained for the spectra used in the modelling of the rich TIC. The resulting model was able to explain 87 % of the variance of the response. The performance is illustrated in Figure 9. While far from perfect, it is still the best model available, and it is good enough to predict the general variations in the response.
While the model still has problems with picking up the finer changes in the response, it is clear that the combination of process and IR data has improved the performance.
This work has demonstrated the possibility of predicting relevant parameters describing the state of an MEA solvent used in CO2 capture. For some responses, solid models are achieved using only online process measurements. More complex situations benefit from the use of IR spectra. The best models are achieved by combining both sets of data using data fusion techniques.
This work has been carried out on historical data from a campaign several years old. This presents challenges, as the detection and exclusion of outlying conditions becomes more difficult. A model is never better than the data from which it was created, so removing outlying conditions from the data is important.
This work demonstrates how multivariate models from process and spectral data can be used to predict the state of the solvent mixture and the efficiency of the capture process. It is paramount that the sample set from which the model is created is representative of all the states the system may occupy in the future. Samples from both fresh solvent and solvent in various degrees of degradation must be used. This should not represent a problem for a plant operating under relatively constant conditions.
Continuous monitoring of plant conditions is vital for cost-effective plant operation. This includes the monitoring of solvent conditions in both the lean and the rich stream. Laboratory analyses contribute to the operational expenses (OPEX) of the capture plant, and a significant reduction in the scope of laboratory analysis can be a major cost saving for a full-scale plant. In addition, models can be incorporated into the control system so operators can take immediate action to keep the plant running optimally. As this work demonstrates, models can accurately predict standard laboratory parameters and can improve response time to changes in the plant. This work illustrates how technology developers can utilize process measurements and spectroscopy to improve the control of the plant. The method is not solvent or plant specific and can be used in screening historic datasets to guide further technology development.
The authors gratefully acknowledge the staff of TCM DA, Gassnova, Equinor, Shell and TotalEnergies for their contribution and work at the TCM DA facility. The authors also gratefully acknowledge Gassnova, Equinor, Shell, and TotalEnergies as the owners of TCM DA for their financial support and contributions.
[1] Intergovernmental Panel on Climate Change (IPCC), 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press.
[2] a) E. Gjernes, S. Pedersen, D. Jain, K. I. Åsen, O. A. Hvidsten, G. de Koeijer, L. Faramarzi, T. de Cazenove, Documenting modes of operation with cost saving potential at the Technology Centre Mongstad, 14th Greenhouse Gas Control Technologies Conference Melbourne 21-26 October 2018 (GHGT-14), b) A. Morken, S. Pedersen, E. R. Kleppe, A. Wisthaler, K. Vernstad, Ø. Ullestad, N. E. Flø, L. Faramarzi, E. S. Hamborg, Energy Procedia, 114, 2017, 1245-1262, c) C. Benquet, A. Knarvik, E. Gjernes, O. A. Hvidsten, E. R. Kleppe, S. Akhter, First Process Results and Operational Experience with CESAR1 Solvent at TCM with High Capture Rates (ALIGN-CCUS Project), Proceedings of the 15th Greenhouse Gas Control Technologies Conference 15-18 March 2021.
[3] a) K. Johnsen, E. R. Kleppe, L. Faramarzi, C. Benquet, E. Gjernes, T. de Cazenove, A. K. Morken, N. Flø, M. I. Shah, M. Aronsson, Ø. Ullestad, CO2 product quality: assessment of the range and level of impurities in the CO2 product stream from MEA testing at the Technology Centre Mongstad (TCM), Proceedings of the 14th Greenhouse Gas Control Technologies Conference Melbourne 21-26 October 2018 (GHGT-14), b) G. Lombardo, M. I. Shah, B. Fostås, O. A. Hvidsten, L. Faramarzi, T. de Cazenove, H. Lepaumier, P. Rogiers, Results from testing of a Brownian diffusion filter for reducing the aerosol concentration in a residual fluidized catalytic cracker flue gas at the Technology Centre Mongstad, 14th Greenhouse Gas Control Technologies Conference Melbourne 21-26 October 2018 (GHGT-14).
[4] a) E. Gjernes, S. Pedersen, D. Jain, K. I. Åsen, O. A. Hvidsten, G. de Koeijer, L. Faramarzi, T. de Cazenove, Documenting modes of operation with cost saving potential at the Technology Centre Mongstad, 14th Greenhouse Gas Control Technologies Conference Melbourne 21-26 October 2018 (GHGT-14), b) L. Faramarzi, D. Thimsen, S. Hume, A. Maxon, G. Watson, E. Gjernes, B. F. Fostås, G. Lombardo, T. Cents, A. K. Morken, M. I. Shah, T. de Cazenove, E. S. Hamborg, Results from MEA testing at the CO2 Technology Centre Mongstad: Verification of baseline results in 2015, Energy Procedia, 2017, 114, 1128-1145.
[5] E. Gjernes, S. Pedersen, T. Cents, G. Cents, G. Watson, B. F. Fostås, M. I. Shah, G. Lombardo, C. Desvignes, N. E. Flø, A. K. Morken, T. de Cazenove, L. Faramarzi, E. S. Hamborg, Energy Procedia, 2017, 114, 1146-1157.
[6] N. E. Flø, L. Faramarzi, T. de Cazenove, O. A. Hvidsten, A. K. Morken, E. S. Hamborg, K. Vernstad, G. Watson, S. Pedersen, T. Cents, B. F. Fostås, M. I. Shah, G. Lombardo, E. Gjernes, Results from MEA Degradation and Reclaiming Processes at the CO2 Technology Centre Mongstad, Energy Procedia, 2017, 114, 1307-1324.
[7] S. Hjelmaas, E. Storheim, N. E. Flø, E. S. Thorjussen, A. K. Morken, L. Faramarzi, T. de Cazenove, E. S. Hamborg, Results from MEA amine plant corrosion processes at the CO2 Technology Centre Mongstad, Energy Procedia, 2017, 114, 1166-1178.
[8] H. Hotelling, Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, 1933, 24, 417-441.
[9] S. Wold, M. Sjöström, L. Eriksson, PLS-regression: a basic tool of chemometrics, Chemometrics and Intelligent Laboratory Systems, 2001, 58, 109-130.
[10] E. Borràs, J. Ferré, R. Boqué, M. Mestres, L. Aceña, O. Busto, Data fusion methodologies for food and beverage authentication and quality assessment – a review, Analytica Chimica Acta, 2015, 891, 1-14
[11] P. Filzmoser, B. Liebmann, K. Varmuza, Repeated double cross validation, Journal of Chemometrics, 2009, 23, 160-171
[12] A. Savitzky, M. J. E. Golay, Smoothing and Differentiation of Data by Simplified Least Squares Procedures, Analytical Chemistry, 1964, 36, 1627-1639.
[13] H. Martens, E. Stark, Extended multiplicative signal correction and spectral interference subtraction: new preprocessing methods for near infrared spectroscopy, Journal of Pharmaceutical and Biomedical Analysis, 1991, 9, 625-635.