# Solved: Looking for a function/algorithm



## Jimmy the Hand (Jul 28, 2006)

Hello World,

I'm not sure if this is the proper forum to ask, but I don't know of a better place. Okay, here's the task.

I want to reproduce a UV spectrum plot from the raw data that a UV spectrophotometer provides. The instrument measures a *Signal* value and a *Background* value. From these two, with a correction *Factor*, it calculates a *Corrected Signal*.

The spectrometer's control software can export raw and manipulated signal data to a text file. So, from a certain laboratory test I exported the corresponding *Signal*, *Background* and *Corrected Signal* values as a function of time. See the attached Excel table.

What I'm looking for is how the *Corrected Signal* is calculated. In _theory_ (according to user manuals), the formula is


> *Corrected Signal* = *Signal* - *Factor* × *Background*


where *Factor* is a constant numeric value. In _practice_, however, this formula doesn't work. But the real formula must be something similar.

So I'm looking for a function, an algorithm that calculates _exactly_ the given *Corrected Signal* values from the given *Signal* and *Background* values.

Can anyone tell me how to start with this? Any suggestions or advice would be highly appreciated.
Thank you in advance,

Jimmy


----------



## OBP (Mar 8, 2005)

Jimmy, sorry, but your equation does not add up at all; it is off by a factor of 100.
It is much closer with C2/b2*i1 but even then you don't get an actual constant that gives all the results that you have.
The constant that comes closest is 1.164846039

See this for Signal to Noise Ratios.
http://en.wikipedia.org/wiki/Signal-to-noise_ratio


----------



## Jimmy the Hand (Jul 28, 2006)

Hi OBP,

Thanks for replying. I'm sorry, I don't follow the first part of your post.
You see, my English was developed on fantasy books and RPGs, and I still have problems with technical terms. Could you please rephrase what you wanted to say?

As for the signal to noise ratio. It's interesting but I don't think I have anything to do with it. 
In the data I posted last time, Signal and Background values are real, physical, measured values. (They are, in fact, electrical charges, because the instrument is a diode array detector.) These values are influenced by noise, all right. But the Corrected Signal is a *computed* value, and I'm looking for the exact formula the software uses to generate Corrected Signal from Signal and Background. I don't care about noise here. I could just as well ask for a formula to generate column D from columns B and C, and never even mention instruments or signals or measured values.

I thought, as I said last time, the formula was basically 
Corrected Signal = Signal - Factor * Background.

I still think this makes sense, but there must be a trick in the way the formula is used. Maybe the Factor is not a constant, but somehow generated from time or something. I was actually asking how to determine a certain (existing) correlation (i.e. the formula) between sets of data.
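One way to test whether a single constant Factor can explain three such columns is a least-squares fit. The sketch below is only an illustration: the function names and the sample numbers are made up, and it assumes the plain model Corrected Signal = Signal - Factor * Background with no offset or scaling.

```python
def estimate_factor(signal, background, corrected):
    """Least-squares estimate of Factor in: corrected = signal - Factor * background."""
    # Rearranged, (signal - corrected) = Factor * background is a line through the origin.
    num = sum(b * (s - c) for s, b, c in zip(signal, background, corrected))
    den = sum(b * b for b in background)
    if den == 0:
        raise ValueError("background is all zeros; Factor is undetermined")
    return num / den


def max_residual(signal, background, corrected, factor):
    """Largest absolute mismatch between the model and the exported values."""
    return max(abs(s - factor * b - c)
               for s, b, c in zip(signal, background, corrected))


# Made-up illustrative columns (not the data attached to this thread).
signal = [10.0, 12.0, 15.0, 11.0]
background = [4.0, 5.0, 6.0, 4.5]
corrected = [s - 1.5 * b for s, b in zip(signal, background)]

factor = estimate_factor(signal, background, corrected)
residual = max_residual(signal, background, corrected, factor)
print(factor, residual)
```

If the largest residual stays well above the rounding of the exported numbers, a single constant Factor does not describe the data, and either the model or one of the columns is suspect.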


----------



## OBP (Mar 8, 2005)

Jimmy, that is what I tried to do for you. Your formula, if you apply it as written, gives corrected values 100 times larger than the ones that you get from the computer.
So I tried figuring out a formula that would give the values that are provided by the computer.
It came out as Signal/Background*Factor where the factor is 1.164846039.
But even with that formula and factor the computed values can't be matched exactly to more than a couple of decimal places.
I thought the signal to noise ratio is what you are doing, i.e. the background is the "noise" and the factor is the "ratio".


----------



## IMM (Feb 1, 2002)

Something pretty fishy when background>signal unless signal has already had background removed from it.
Is background in different units than signal ( eg. background is in micro.. and signal in milli... ?)

--------
I'm not going to try and sort it out - but if you want a small treatise on the type of thing that is normally done - try this
http://www.varianinc.com/cgi-bin/reg?a='media/sci/apps/uv77.pdf'


----------



## OBP (Mar 8, 2005)

That is good thinking IMM.


----------



## IMM (Feb 1, 2002)

Actually - I suspect the numbers are raw _I_ and _Io_ values and he has to do the std. transmittance and absorbance thing with it.
If that is 2 channel data - it's not really surprising - but I dislike calling the reference channel "background".
I'd guess his "factor" will represent some offset between 2 beams or channels -but it would be nice to know that the detector(s) are linear and don't follow something like a diode equation.
If the truth be told tho' -- I am really too lazy to really look at it w/o knowing the physical setup.

Maybe he'll come back to this thread and I won't have to wonder if this is a dual beam kinetic study done at a fixed wavelength - or if it's a wall street stock chart


----------



## Jimmy the Hand (Jul 28, 2006)

First of all, I thank you very much for trying to help me. I didn't want to go into details, because my case is very special, so it requires a lot of explanation, and I didn't see much hope that anyone can tell me exactly the thing I want. But maybe it's worth making the effort, after all.

So. 
I said at first that this was a UV spectrometer. Well, not exactly. Actually, this is an Atomic Emission Detector for Gas Chromatography. The principles of operation are the following. Compounds eluting from a GC column are carried (by the GC carrier gas) into a microwave-induced plasma, where they are atomized and the atoms are excited. The excited atoms emit characteristic light, which goes into the spectrometer part. Here there is a rotating grating that resolves the incoming light according to wavelength, and sends the light to a photodiode array (PDA). By rotating the grating, the instrument controls the wavelength range that is detected by the PDA, e.g. 170-196 nm, 480-501 nm, or any possible slice of the whole spectrum.

This detector is used for selective detection of heteroelements like sulfur, nitrogen, oxygen, etc. My reason for calling it a spectrometer is that the AED is a rare type of instrument. For example, in our country there are only 4 of them. Chemists know spectrometers much better, as they are 10 times more widespread. Also, an AED is, in fact, a spectrometer. The particular application I'm working on is the analysis of sulfur. Sulfur is best detected around 181 nm. This (the 181 nm) is the reason why I called my instrument a *UV* spectrometer. It might have been misleading, sorry. This instrument measures emission, not absorbance. There's no reference beam here, nor lamp intensity fluctuations, cuvette or solvent absorptions. There's only the intensity of the emitted light. And noise, and background.

Now, about signal generation.
The detector is sensitive to sulfur but, in the vast majority of cases, sulfur is not alone in the plasma. Almost always there's carbon there as well. When the original sample is petroleum related, as in my case, it's only natural that you can't, by gas chromatography, separate a sulfur compound from the hydrocarbon matrix, so the plasma will always be full of carbon, and there might also be a little sulfur present. The signal of the carbon is what I call background. *It's not the same as noise.* See the attached images. "Carbon background.jpg" shows the effect of carbon on the spectra. In H2S there is no carbon, so it's a beautiful sulfur spectrum. In hexyl mercaptan (C6H14S) there are 6 carbon atoms, so the background is apparent, and if hexyl mercaptan is dissolved in gasoline, the background is much higher, because of the hydrocarbon matrix. In real samples, the carbon background can be *significantly higher* than the sulfur signal.

"Signal Generation.jpg" shows the method of background correction of the sulfur signal. In the diode array, some diodes are designated as sulfur signals, others are designated as backgrounds. Signal diodes are combined into one value, background diodes are combined into another value. As the figure shows, the Net (or Corrected) Signal is obtained by subtracting the Background value from the Signal value, and here is the mysterious BackAmount, which I called Factor previously.

The instrument controlling software is a modified version of Agilent Chemstation v10.02. It stores signal and background in two separate sets of raw data. (They are _sets_ of data, because signals are recorded as function of time, that's what makes it a chromatogram.) Chemstation allows me to load signal and background chromatograms separately, and export them into csv files. If I load signal and background chromatograms together, Chemstation does the subtracting, and in the meantime it calculates an ideal BackAmount (factor), which can be different for every single sample. I can export the corrected signal, as well, to csv. That's how I produced the 3 sets of data I posted before.

Now, my task is to write software to expand the capabilities of Chemstation, because we want to do a new type of sample evaluation. I've already decoded the Agilent binary (raw) data files, so I can convert them into series of numbers vs time. But I'm stuck with this background correction thing. I can't calculate my final result from Signal only, because that would definitely be wrong, I mean, very wrong. I must have the method Chemstation uses to generate a corrected signal.

After all this explanation, I think it's not important anymore, how I got the 3 series of data. I just need one formula that calculates one from the other two. Or, I need an algorithm that allows me to find this formula for myself.


----------



## OBP (Mar 8, 2005)

IMM, you were unbelievably close with your analysis.
Jimmy, this says it all (quote)
If I load signal and background chromatograms together, Chemstation does the subtracting, and in the meantime *it calculates an ideal BackAmount (factor), which can be different for every single sample*.
Can you just work it in reverse to obtain the BackAmount (factor) that the machine has used? Won't that be enough for your analysis?
I don't think you are ever going to be able to calculate it yourself from the raw data, as it obviously uses an algorithm you can't get access to, unless you can get it from the machine's makers. The signals may be peak values or they may be "area under the curve" values or rms values.


----------



## Jimmy the Hand (Jul 28, 2006)

Hi, OBP,

Knowing the BackAmount is no help at all. You see, I *know* the BackAmount, because Chemstation tells me, if I ask. The problem is that, if I apply the given formula to calculate Corrected Signal values, and use the BackAmount that Chemstation gives me, I get values different from those that are calculated by Chemstation. This means that either the formula is wrong, or the BackAmount (given by Chemstation) is wrong, or there is some hidden trick in the calculation.

You have analyzed my sets of data and said that there is not a single constant BackAmount that's valid for all Signal/Background/Corrected Signal trios. On the other hand, the formula was given to me on an official AED training, by official AED staff, in Atlanta, 1997. It must be correct, or else Agilent (or HP, I think back then it was still Hewlett-Packard) people were lying. I don't want to accept this possibility, so I'm more inclined to suspect that there is a trick in the calculation. All the more so, because I've had a tricky issue with HP/Agilent raw data format in the past.

The story is: Chemstation has always had the functionality to display raw data or export it to a text file. But the displayed/exported values were a constant 4/3 times greater than those stored in the binary file. And they were rounded, to spice up the problem a bit more. This little trick caused me no small headache when I was trying to unveil the secrets of the binary data structure.
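A hidden constant scale of this kind can be checked for numerically. The following is a minimal sketch, assuming the only differences between the two columns are one constant multiplier plus rounding to whole numbers; the function name, the tolerance and the sample values are made up for illustration.

```python
def constant_ratio(stored, exported, tolerance=1.0):
    """Return the exported/stored ratio if one constant explains every pair, else None.

    tolerance is the largest allowed mismatch, chosen to cover rounding of the
    exported values (an assumption for this sketch).
    """
    total_stored = sum(stored)
    if total_stored == 0:
        return None
    # Ratio of sums is less sensitive to rounding than averaging per-point ratios.
    ratio = sum(exported) / total_stored
    for s, e in zip(stored, exported):
        if abs(s * ratio - e) > tolerance:
            return None
    return ratio


# Made-up values: the exported column is the stored one times 4/3, rounded.
stored = [300, 600, 901, 1204]
exported = [round(s * 4 / 3) for s in stored]

ratio = constant_ratio(stored, exported)
print(ratio)
```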

So, my suspicion is that the formula will have to be modified. Like if Chemstation gives me a BackAmount of 1.0168, it is really 0.0168, or Exp(1.0168) or something.

By the way, signals are electronically generated this way (or so I was told):
There are the photodiodes in the array. Each of them is periodically (e.g. 5 times per second) charged up to full capacity. When they are hit by light, they lose some electric charge. Next time they are recharged, the instrument measures the amount of charge that is needed to reach full capacity again. So this measured charge is an average value over the recharge period, and is proportional to the amount of light-emitting atoms going through the plasma during a recharge period.


----------



## IMM (Feb 1, 2002)

> AED is a rare type of instrument


I'll say  -- never got much more exotic than Ni63 ECD, hot salt NPDs or flame photometric myself -- but I'll go through what you wrote and give it a little thought.
Have to sleep now tho'

As an aside - organo-sulphur labs ought to be on remote islands.
Most people's noses aren't very fond of them


----------



## OBP (Mar 8, 2005)

Jimmy, when you ask the Chemstation for the "factor", are they all different values?
If so, can you post some so that we can see any possible correlation?


----------



## Jimmy the Hand (Jul 28, 2006)

OBP said:


> Jimmy, when you ask the Chemstation for the "factor", are they all different values?
> If so, can you post some so that we can see any possible correlation?


I'm not sure I understand. 
When I load the signal and background chromatograms of a sample, as a result, I see only one chromatogram, which is calculated from the loaded ones by Chemstation. It uses an unknown algorithm to determine an auto-calculated BackAmount factor, which is best suited to eliminate matrix interference from that particular chromatogram. Auto-calculated factors depend greatly on the composition of their respective samples, so yes, they are practically all different.

In the software there is a menu where I can choose from recommended BackAmount factors, or supply one myself. There is a default factor, which is determined by analysing a standard test sample, there is the auto-calculated one, which differs from sample to sample, and there is a user-supplied value. The auto-calculated factors are OK most of the time. But if I don't like what I see, I can manipulate the chromatogram by supplying any factor I want to the background correction procedure. Then Chemstation recalculates the chromatogram and displays the new one. Changing the BackAmount factor has a significant effect on the chromatogram.

But there is *only one single* factor for each chromatogram. Any chosen (either recommended or user-supplied) factor value is applied to the entire chromatogram, to each data point of it.

So, I can give you any factor value you want, but it makes sense only if I provide the respective calculated chromatogram data points as well. Is this what you'd like? If so, do you want specific factors, or any one will do?


----------



## OBP (Mar 8, 2005)

Jimmy, that is right, I would like some results that you are trying to evaluate plus the "factor" that goes with each set.


----------



## Jimmy the Hand (Jul 28, 2006)

Hi OBP and IMM,

I exported the Background and Signal data, and Corrected Signals with 6 different Factor values, so that I can supply you with enough data to proceed. Somewhere during the process I noticed that the Signal data is not the same as the one I posted earlier. I swear I followed the same procedure of exporting raw data as before, yet I found that I can't reproduce that particular data column. I can't for the life of me tell how it happened. It seems I messed something up.

Anyway, with the new Signal data the formula works like a charm, as it should, so I guess this whole thread was one big chase after shadows.
Sorry for stealing your time, and thanks anyway.

Regards,
Jimmy

Edit:
For no obvious reason I decided to upload the new data columns. May someone have fun with it.


----------



## OBP (Mar 8, 2005)

Well you should be able to mark this as solved now. :up:


----------



## Jimmy the Hand (Jul 28, 2006)

Yes. Solved it is. By merit of chance and fortune.


----------



## IMM (Feb 1, 2002)

glad to hear it - I dislike negative peaks in a chromo rslt.


----------



## Jimmy the Hand (Jul 28, 2006)

IMM said:


> glad to hear it - I dislike negative peaks in a chromo rslt.


I dislike them, too. But with AED and this kind of background correction they are quite usual. Most of the time a big hydrocarbon peak in the matrix causes negative peaks in the sulfur (nitrogen, oxygen, etc.) chromatogram. You can say it's overcompensation, but if I decrease the BackAmount factor in order to remove negative peaks, then ghost peaks appear everywhere else. I think this can't be helped.


----------



## IMM (Feb 1, 2002)

Jimmy the Hand said:


> I dislike them, too. But with AED and this kind of background correction they are quite usual. Most of the time a big hydrocarbon peak in the matrix causes negative peaks in the sulfur (nitrogen, oxygen, etc.) chromatogram. You can say it's overcompensating, but if I decrease the BackAmount factor in order to remove negative peaks, then ghost peaks appear everywhere else. I think this can't be helped.


I wonder if standard runs on sulfur-free stock (in various concs. = std. addition style) to calibrate any non-linear responses of the array to the large peaks would help - then the analysis could be adjusted using the slope of the HC pk. (or magnitude) to fix the "factor" according to the 'hydrocarbon' height?
Is there also interference with the sulfur line intensities as a result of 'denser' matrix?

This is split injection, or cold-on-column or...?


----------



## Jimmy the Hand (Jul 28, 2006)

Well, this is starting to go way out of "development" category, at least in the computer programming sense of the word. I hope it's not a problem with moderators.

So it's a split injection. The split ratio is subject to change, however, as I have to optimize. The problem is with samples of low sulfur content. Current regulations in the EU allow max. 10 ppm sulfur in both gasoline and gas oil. That's pretty low, and if distributed among a few dozen or a few hundred sulfur compounds, the sulfur content per peak is most often below the detection limit.

The detector has a certain sensitivity to sulfur. If the sulfur concentration is low, I have to increase the injected amount or lower the split ratio in order to get sulfur peaks at all. Either way, the hydrocarbon matrix gets 'denser'. Since both the plasma and the photosensitive diodes have a limited capacity, an increasing HC matrix might (or will) result in
1. nonlinear response of carbon signal (the actual background of sulfur detection)
2. incomplete atomization in plasma
-- 2a. nonlinear response of sulfur, even in low sulfur concentrations
-- 2b. fast recombination of carbon atoms into a graphite-like black substance, which contaminates the system by forming deposits and plugs everywhere.

[...]

All the time I explained things so far, I was thinking of excuses as to why your suggestion is wrong, and why any efforts in this direction are futile. When I reached this point, I finally understood (at least I think I understood) what you meant. 
1. Measure optimal BackAmount factors for a range of HC peaks of different peak heights.
2. When analyzing real samples, always check the background height, and between the retention time limits of the current HC peak, always apply the appropriate (previously determined) factor to compensate for background.
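The two steps above could be sketched roughly as follows. This is only an illustration under assumptions: the retention-time windows and their factors are taken as already known, and every name and number here is hypothetical rather than anything Chemstation provides.

```python
def correct_with_windows(times, signal, background, default_factor, windows):
    """Background-correct a chromatogram with a different factor inside each window.

    windows: list of (t_start, t_end, factor) tuples, one per hydrocarbon peak.
    Outside every window, default_factor is used.
    """
    corrected = []
    for t, s, b in zip(times, signal, background):
        factor = default_factor
        for t_start, t_end, window_factor in windows:
            if t_start <= t <= t_end:
                factor = window_factor
                break  # first matching window wins
        corrected.append(s - factor * b)
    return corrected


# Made-up data: one hydrocarbon peak around t = 2 that needs a larger factor.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [10.0, 10.0, 20.0, 10.0, 10.0]
background = [2.0, 2.0, 8.0, 2.0, 2.0]

result = correct_with_windows(times, signal, background, 1.0, [(1.5, 2.5, 2.0)])
print(result)
```

A hard switch of the factor at the window edges can put a step into the corrected trace, so some blending across the edges might be needed in practice.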

I had considered this impossible, because Chemstation uses one single constant factor for the entire chromatogram, but then I realized that in the software I'm writing I might be able to include this functionality. But... man, this is a LOT of work!

Besides, and now I'm getting back to development after all, I don't know of a good algorithm for peak recognition, especially with overlapping peaks. Now, if someone showed me one, _that_ would be joyful. I think. Maybe I'll start another thread with this problem.


----------



## IMM (Feb 1, 2002)

It's been a long time since I played with needles - but most of the stuff I recall used Savitzky-Golay smoothing or similar on the data first to give a curve which could be picked mathemagically (on the derivative)
Here's something simple but probably not good enough
http://www.ncnr.nist.gov/staff/dimeo/riddle/get_peak_pos.pro

An article(short) on peak calcs (general = no code)
http://www.lcgcmag.com/lcgc/article/articleDetail.jsp?id=126171

You could perhaps grab this paper (I haven't seen it)
http://www.ncbi.nlm.nih.gov/entrez/...ve&db=PubMed&list_uids=16301076&dopt=Abstract
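For illustration, the smooth-then-pick-on-the-derivative idea can be sketched with the standard 5-point quadratic Savitzky-Golay first-derivative weights (-2, -1, 0, 1, 2)/10, which combine the smoothing and the differentiation in one pass. This is a minimal sketch and not code from any of the links above: the function names, the height threshold and the synthetic test trace are assumptions, and it will not resolve badly overlapping peaks.

```python
import math

# 5-point quadratic Savitzky-Golay first-derivative weights (divide by 10).
DERIV_5 = (-2, -1, 0, 1, 2)


def sg_derivative(y):
    """Smoothed first derivative per sample; result[k] belongs to y[k + 2]."""
    out = []
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        out.append(sum(c * v for c, v in zip(DERIV_5, window)) / 10.0)
    return out


def find_peaks(y, min_height=0.0):
    """Indices where the smoothed derivative changes from positive to non-positive."""
    deriv = sg_derivative(y)
    peaks = []
    for k in range(len(deriv) - 1):
        if deriv[k] > 0 and deriv[k + 1] <= 0:
            i = k + 2  # index in y that deriv[k] refers to
            idx = i if y[i] >= y[i + 1] else i + 1
            if y[idx] >= min_height:
                peaks.append(idx)
    return peaks


# Synthetic trace: two well-separated Gaussian peaks at samples 20 and 45.
trace = [math.exp(-((i - 20) / 3.0) ** 2) + 0.5 * math.exp(-((i - 45) / 4.0) ** 2)
         for i in range(70)]

peaks = find_peaks(trace, min_height=0.05)
print(peaks)
```

For noisy chromatograms a wider window than 5 points is probably needed, and the height threshold has to be tuned against the baseline noise.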

What language do you write in?

------
Just to complicate this further  
There are several 'diodes' in use here covering several lines per element (assuming that diagram you had earlier is close).
Are those available as separate channels?
If so - maybe that would be of some use in refining the analysis (assuming they don't all react the same way to increasing HC)
But that takes us back to algorithmics - multivariate fitting then  (perhaps partial least squares)


----------



## IMM (Feb 1, 2002)

> and between the retention time limits of the current HC peak, always apply the appropriate (previously determined) factor to compensate for background.


was thinking a multiplicative factor based on HC amount for the overall factor to adjust for density - not perfect.
We could call what I was thinking a 'quench factor' for the he** of it.
perhaps something along the lines of:
adjusted factor = quench factor * overall factor
where quench factor = fn(HC counts) 
The data would then be analyzed using the adj.factor to produce the 'gram - the quench factor is something I imagine as near unity, but reflecting the non-linearity of the response with increasing HC?
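As a rough sketch of that idea, the adjusted factor could be applied point by point as below. The shape of fn is unknown, so the linear quench function and its slope are purely hypothetical placeholders, as are all the names and numbers.

```python
def linear_quench(hc_counts, slope=1e-6):
    """Hypothetical quench factor: unity at zero HC, growing slowly with HC counts."""
    return 1.0 + slope * hc_counts


def quench_corrected(signal, background, hc_counts, overall_factor, quench_fn):
    """Apply adjusted factor = quench factor * overall factor at every data point."""
    corrected = []
    for s, b, hc in zip(signal, background, hc_counts):
        adjusted_factor = quench_fn(hc) * overall_factor
        corrected.append(s - adjusted_factor * b)
    return corrected


# Made-up data: the second point sits on a large hydrocarbon peak.
signal = [100.0, 200.0]
background = [10.0, 50.0]
hc_counts = [0.0, 100000.0]

result = quench_corrected(signal, background, hc_counts, 1.0, linear_quench)
print(result)
```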

Your idea is to do it for differing mol. weights as they elute?
Is this a completely non-selective narrow bore cap column (a boiling pt. column) ?

------edit
You may? find this of some use
http://www.nrbook.com/a/bookcpdf.html
Chap14 covers some of it.
The alg. I linked one post above has a 'window' of 4 points if I recall -- probably not really a suitable choice for this.


----------



## Jimmy the Hand (Jul 28, 2006)

Thanks for the links. I requested the article from the library, because it looks promising. However, to be honest, I'm not a mathemagician and, though I'm trying hard, things more complicated than addition often elude me. We'll see. If the peak recognition algorithm is good enough, and I can grasp its essence, I may use it in another project.

As for the diodes.
The diagram I gave is close I think. The diode array consists of 200+ diodes, which distribute over 26 nm, so for 1 nm there is an average of 8 diodes. In theory, you can set up a new algorithm of how an element signal and the corresponding background is produced from spectral data. However, those are absolutely unknown waters for me, and I don't presume I could do a better job in optimizing a new recipe than Agilent professionals who were trained for this task. I'm willing to accept that their factory settings are close to perfect.



> adjusted factor = quench factor * overall factor
> where quench factor = fn(HC counts)
> The data would then be analyzed using the adj.factor to produce the 'gram - the quench factor is something I imagine as near unity, but reflecting the non-linearity of the response with increasing HC?


I see your point. I can obtain overall factor and HC count. The only question is *fn* .



> Your idea is to do it for differing mol. weights as they elute?


My idea didn't involve molecular weights. It was to inject one pure hydrocarbon 100 times, each injection a little bigger (because of increased volume or decreased split) than the last. Then evaluate each injection by asking Chemstation to calculate the ideal factor for that particular amount of HC. Thus I get *absolute factor = fn(HC counts)*. So I can apply the absolute factor to each data point of the chromatogram, and have no need of an overall factor calculated by Chemstation. I'm not sure it's a working idea though. Yours looks as good as mine.
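A calibration series like that could be turned into *fn* by simple table lookup with linear interpolation. The sketch below assumes the calibration is a list of (HC counts, factor) pairs and holds the factor constant outside the calibrated range; the names and numbers are made up.

```python
from bisect import bisect_left


def make_factor_lookup(calibration):
    """Build fn(HC counts) -> factor from (hc_counts, factor) calibration pairs."""
    if not calibration:
        raise ValueError("calibration table is empty")
    points = sorted(calibration)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def lookup(hc):
        # Outside the calibrated range, hold the nearest end value (an assumption).
        if hc <= xs[0]:
            return ys[0]
        if hc >= xs[-1]:
            return ys[-1]
        j = bisect_left(xs, hc)  # xs[j - 1] < hc <= xs[j]
        x0, x1 = xs[j - 1], xs[j]
        y0, y1 = ys[j - 1], ys[j]
        return y0 + (y1 - y0) * (hc - x0) / (x1 - x0)

    return lookup


# Made-up calibration: ideal factor found at three hydrocarbon amounts.
factor_for = make_factor_lookup([(0, 1.0), (1000, 1.2), (2000, 1.6)])
print(factor_for(500), factor_for(1500), factor_for(5000))
```

The corrected value at each data point would then be Signal - factor_for(HC counts) * Background.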

By the way, I write in Delphi. And yes, usually I work with non-polar stationary phases, as there's no selective column for sulfur compounds. If there was, I wouldn't need a selective detector.

I had missed one earlier comment of yours about organic sulfur and nose. Well, I think everyone feels best in their own stink  
During the 10 years I worked with sulfur compounds I got to accept the odor. I almost got to like it. It's certainly better than some communal wastes


----------



## IMM (Feb 1, 2002)

good luck - I'd be interested to know how it goes.

--edit to add


> I don't presume I could do a better job in optimizing a new recipe than Agilent professionals who were trained for this task. I'm willing to accept that their factory settings are close to perfect.


I wasn't thinking of changing the recipe.
I was imagining summing them in the same fashion as the Agilent recipe did, but trying to extract a little information from them first. I was wondering about differences in quenching behaviour between the various lines (diodes) as a replacement for fn(HC counts).
It is unlikely that they could indicate how much quenching was going on though -- probably a very silly thought from me.


----------



## Jimmy the Hand (Jul 28, 2006)

IMM said:


> good luck - I'd be interested to know how it goes.


Okay. If you are still a member at TSG 5 years from now, I'll tell you.
Thanks for all the tips.


----------



## faital (Oct 6, 2006)

Hi, This post is very informative, however I would like some specific information. If someone can help me then please send me a private message. Best Regards,


----------



## Jimmy the Hand (Jul 28, 2006)

Of course, but how can anyone know whether they can help you if you don't describe the problem first?


----------

