File statdocs.txt, 13.0 KB (added by pcr, 13 years ago)

documentation for statistical procedures

1
2----- Documentation for Stat\cov2cor.pro -----
3 NAME:
4      COV2COR
5
6 PURPOSE:
7      CONVERT COVARIANCE MATRIX TO MATRIX OF CORRELATION COEFFICIENTS
8
9 CATEGORY:
10      statistics
11
12 CALLING SEQUENCE:
13      COV2COR, COV, SIGMA, COR
14
15 INPUTS:
16      COV = COVARIANCE MATRIX
17
18 OUTPUTS:
19      SIGMA = ERROR ARRAY
20      COR = CORRELATION MATRIX
21
22 COMMON BLOCKS:
23        NONE
24
25 SIDE EFFECTS:
26        NONE
27
28 REQUIREMENTS:
29       COV MUST BE SQUARE, SYMMETRIC, WITH POSITIVE DIAGONALS
30
31 PROCEDURE:
32        COR = COV / (SIGMA # SIGMA)
33        WITH:   SIGMA = SQRT(DIAG(COV))
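
The conversion can be sketched in Python (not the IDL source) as follows. This is a minimal illustration, assuming the matrix is passed as nested lists and that the correlation is obtained by dividing each covariance element by the product of the two corresponding standard deviations; the function name and the input checks are illustrative only.

```python
import math

def cov2cor(cov):
    """Return (sigma, cor) for a square covariance matrix given as nested lists."""
    n = len(cov)
    if any(len(row) != n for row in cov):
        raise ValueError("cov must be square")
    if any(cov[i][i] <= 0 for i in range(n)):
        raise ValueError("cov must have positive diagonal elements")
    # SIGMA = SQRT(DIAG(COV))
    sigma = [math.sqrt(cov[i][i]) for i in range(n)]
    # Element-wise division by the outer product of sigma with itself
    cor = [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(n)]
           for i in range(n)]
    return sigma, cor

# Example: variances 4 and 9, covariance 2
sigma, cor = cov2cor([[4.0, 2.0], [2.0, 9.0]])
```

With this input the standard deviations are 2 and 3, and the off-diagonal correlation is 2/(2*3).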
34
35 LOCAL PROCEDURE CALLED:
36        NONE
37
38 LOCAL FUNCTION USED:
39        NONE
40
41 LOCAL SYSTEM VARIABLE USED:
42        NONE
43
44 REVISION HISTORY:
45       Written by pcr 2010/02/01
46       from www.boulder.swri.edu/~layoung/idl/layoung_v5/math/cov2cor.pro
47
48----- Documentation for Stat\get_outlier_fences.pro -----
49 NAME:
50        GET_OUTLIER_FENCES
51
52 AUTHOR:
53       pierre.cruzalebes@oca.eu
54
55 PURPOSE:
56        return lower and upper fences identifying extreme values in tails
57       of univariate distribution (outliers).
58       Estimation of percentiles following NIST-recommended method
59       http://www.itl.nist.gov/div898/handbook/prc/section2/prc252.htm
60       For a small number of data, percentiles are estimated using the normal CDF
61
62 CATEGORY:
63       statistics
64
65 CALLING SEQUENCE:
66        FENCES=GET_OUTLIER_FENCES(X,ALFA,COEFF)
67
68 INPUTS:
69       X = experimental data distribution.
70       ALFA = fraction of experimental data included in the interpercentile range
71         (must be gt 0 and lt 1, e.g. =0.5 for quartile-based outliers).
72       COEFF = outlier coefficient defining types of outliers
73         (=1.5 for mild outliers, =3 for extreme outliers).
74
75 OUTPUTS:
76        Function result = FENCES = 2-element array of outlier fences
77         (FENCES(0)=lower fence, FENCES(1)=upper fence).
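
A minimal Python sketch of how such fences could be computed is given below. It is not the IDL source and rests on two assumptions: the interpercentile range is taken symmetrically, between the (1-ALFA)/2 and (1+ALFA)/2 percentiles, and the fences lie COEFF times that range beyond each percentile (the usual Tukey construction). Percentiles follow the NIST handbook rule, with rank p*(N+1) and linear interpolation. The small-sample branch based on the normal CDF is not reproduced, and the function names are illustrative.

```python
import math

def nist_percentile(sorted_x, p):
    """Percentile by the NIST rule: rank p*(N+1) with linear interpolation."""
    n = len(sorted_x)
    rank = p * (n + 1)
    k = math.floor(rank)   # integer part of the rank
    d = rank - k           # fractional part of the rank
    if k <= 0:
        return sorted_x[0]
    if k >= n:
        return sorted_x[-1]
    # 1-based ranks k and k+1 are 0-based indices k-1 and k
    return sorted_x[k - 1] + d * (sorted_x[k] - sorted_x[k - 1])

def outlier_fences(x, alfa, coeff):
    """Return (lower fence, upper fence) under the assumptions stated above."""
    if not 0.0 < alfa < 1.0:
        raise ValueError("alfa must be gt 0 and lt 1")
    xs = sorted(x)
    q_lo = nist_percentile(xs, (1.0 - alfa) / 2.0)
    q_hi = nist_percentile(xs, (1.0 + alfa) / 2.0)
    spread = q_hi - q_lo   # interpercentile range
    return (q_lo - coeff * spread, q_hi + coeff * spread)

# Quartile-based mild-outlier fences for the integers 1..11
fences = outlier_fences(range(1, 12), 0.5, 1.5)
```

For the integers 1 to 11 the quartiles are 3 and 9, so the mild-outlier fences fall at -6 and 18.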
78
79 OPTIONAL OUTPUT PARAMETERS:
80        NONE.
81
82 COMMON BLOCKS:
83        NONE.
84
85 SIDE EFFECTS:
86        NONE.
87
88 RESTRICTIONS:
89        NONE.
90
91 PROCEDURE:
92
93 LOCAL PROCEDURE CALLED:
94        NONE.
95
96 LOCAL FUNCTION USED:
97        NONE.
98
99 LOCAL SYSTEM VARIABLE USED:
100        NB_DATA_BOOT
101
102 REVISION HISTORY:
103      Written by pcr 2009/02/02
104      Last modification by pcr 2010/12/12
105
106----- Documentation for Stat\loca_scale_chi2.pro -----
107 NAME:
108        LOCA_SCALE_CHI2
109
110 AUTHOR:
111       pierre.cruzalebes@oca.eu
112
113 PURPOSE:
114        return location and scale parameters of standard chi-square distribution
115       for experimental data, using median (for location) and interpercentile range (for scale).
116       The interquartile range (ALFA=0.5) is usually used as a robust measure of scale.
117       Estimation of percentiles following NIST-recommended method
118       http://www.itl.nist.gov/div898/handbook/prc/section2/prc252.htm
119
120 CATEGORY:
121       statistics
122
123 CALLING SEQUENCE:
124        LOCA_SCALE_CHI2,X,ALFA,NU,LOC,SCA
125
126 INPUTS:
127       X = experimental data set.
128       ALFA = fraction of experimental data included in the interpercentile range (must be gt 0 and lt 1).
129       NU = number of degrees of freedom.
130
131 OUTPUTS:
132        LOC = location parameter of standard chi-square distribution.
133        SCA = scale parameter of standard chi-square distribution.
134
135 OPTIONAL OUTPUT PARAMETERS:
136        NONE.
137
138 COMMON BLOCKS:
139        NONE.
140
141 SIDE EFFECTS:
142        NONE.
143
144 RESTRICTIONS:
145        NONE.
146
147 PROCEDURE:
148
149 LOCAL PROCEDURE CALLED:
150        NONE
151
152 LOCAL FUNCTION USED:
153        NONE
154
155 LOCAL SYSTEM VARIABLE USED:
156        NONE
157
158 REVISION HISTORY:
159       Written by pcr 2008/01/24
160        last modification by pcr 2009/09/31
161
162----- Documentation for Stat\loca_scale_norm.pro -----
163 NAME:
164        LOCA_SCALE_NORM
165
166 AUTHOR:
167       pierre.cruzalebes@oca.eu
168
169 PURPOSE:
170        return location and scale parameters of standard normal distribution
171       for experimental data, using median (for location) and interpercentile
172       range (for scale).
173       Estimation of percentiles following NIST-recommended method
174       http://www.itl.nist.gov/div898/handbook/prc/section2/prc252.htm
175
176 CATEGORY:
177       statistics
178
179 CALLING SEQUENCE:
180        LOCA_SCALE_NORM,X,ALFA,LOC,SCA
181
182 INPUTS:
183       X = experimental data set.
184       ALFA = fraction of experimental data included in the interpercentile
185              range (must be gt 0 and lt 1).
186
187 OUTPUTS:
188        LOC = location parameter of standard normal distribution.
189        SCA = scale parameter of standard normal distribution.
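
One way to obtain these two parameters is sketched below in Python (not the IDL source). It assumes that the location is the sample median and that the scale is the interpercentile range of the data divided by the same range of the standard normal distribution, with percentiles taken symmetrically at (1-ALFA)/2 and (1+ALFA)/2. Function names are illustrative.

```python
import math
from statistics import NormalDist, median

def nist_percentile(sorted_x, p):
    """Percentile by the NIST rule: rank p*(N+1) with linear interpolation."""
    n = len(sorted_x)
    rank = p * (n + 1)
    k = math.floor(rank)
    d = rank - k
    if k <= 0:
        return sorted_x[0]
    if k >= n:
        return sorted_x[-1]
    return sorted_x[k - 1] + d * (sorted_x[k] - sorted_x[k - 1])

def loca_scale_norm(x, alfa):
    """Return (loc, sca): median and normal-equivalent scale of the data."""
    if not 0.0 < alfa < 1.0:
        raise ValueError("alfa must be gt 0 and lt 1")
    xs = sorted(x)
    p_lo, p_hi = (1.0 - alfa) / 2.0, (1.0 + alfa) / 2.0
    data_range = nist_percentile(xs, p_hi) - nist_percentile(xs, p_lo)
    std_normal = NormalDist()
    # Width of the same interpercentile range for a unit-variance normal
    unit_range = std_normal.inv_cdf(p_hi) - std_normal.inv_cdf(p_lo)
    return median(xs), data_range / unit_range

loc, sca = loca_scale_norm(range(1, 12), 0.5)
```

With ALFA=0.5 the divisor is the interquartile range of the standard normal distribution, about 1.349.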
190
191 OPTIONAL OUTPUT PARAMETERS:
192        NONE.
193
194 COMMON BLOCKS:
195        NONE.
196
197 SIDE EFFECTS:
198        NONE.
199
200 RESTRICTIONS:
201        NONE.
202
203 PROCEDURE:
204
205 LOCAL PROCEDURE CALLED:
206        NONE
207
208 LOCAL FUNCTION USED:
209        NONE
210
211 LOCAL SYSTEM VARIABLE USED:
212        NONE
213
214 REVISION HISTORY:
215       Written by pcr 2008/05/31
216        last modification by pcr 2009/04/14
217
218----- Documentation for Stat\mem_stdev.pro -----
219 NAME:
220        MEM_STDEV
221
222 AUTHOR:
223       pierre.cruzalebes@oca.eu
224
225 PURPOSE:
226       compute standard deviation of asymmetric distribution from maximum entropy principle.
227
228 CATEGORY:
229       statistics
230
231 CALLING SEQUENCE:
232        MEMSTD = MEM_STDEV(LOWER_ERROR,UPPER_ERROR)
233
234 INPUTS:
235        LOWER_ERROR =  lower error.
236        UPPER_ERROR =  upper error.
237
238 OPTIONAL INPUT PARAMETER:
239        none.
240
241 OUTPUTS:
242        Function result = MEMSTD = standard deviation of distribution.
243
244 OPTIONAL OUTPUT PARAMETER:
245        NONE.
246
247 COMMON BLOCKS:
248        NONE.
249
250 SIDE EFFECTS:
251        NONE.
252
253 RESTRICTIONS:
254        NONE.
255
256 PROCEDURE:
257       memvar=MEMSTD^2=UPPER_ERROR*LOWER_ERROR-(UPPER_ERROR-LOWER_ERROR)/lambda
258       where lambda is computed according to JCGM 100:2008, section 4.3.8, note 2
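
The following Python sketch (not the IDL source) shows one way to evaluate this formula. It assumes that lambda is the non-zero root of lambda = (exp(lambda*s) - 1) / (LOWER_ERROR*exp(lambda*s) + UPPER_ERROR), with s = LOWER_ERROR + UPPER_ERROR, which is the zero-mean condition for a density proportional to exp(-lambda*t) on [-LOWER_ERROR, UPPER_ERROR]. The root is found by bisection; the bracket, the handling of equal errors (uniform limit) and the function names are choices made for this illustration.

```python
import math

def mem_lambda(b_lo, b_up):
    """Non-zero root of the lambda equation, assuming 0 < b_lo < b_up."""
    s = b_lo + b_up

    def f(lam):
        e = math.expm1(lam * s)   # exp(lam*s) - 1, accurate for small lam
        return lam * (b_lo * (e + 1.0) + b_up) - e

    hi = 1.0 / b_lo               # f(hi) = b_up/b_lo + 1 > 0
    lo = hi / 2.0
    for _ in range(40):           # shrink until f(lo) < 0
        if f(lo) < 0.0:
            break
        hi = lo
        lo /= 2.0
    else:
        raise ValueError("errors too close to symmetric for this bracket")
    for _ in range(200):          # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mem_stdev(lower_error, upper_error):
    """Standard deviation from memvar = up*lo - (up - lo)/lambda."""
    if lower_error <= 0.0 or upper_error <= 0.0:
        raise ValueError("errors must be positive")
    if lower_error == upper_error:
        return lower_error / math.sqrt(3.0)   # uniform limit, lambda -> 0
    # The variance is unchanged when the two errors are swapped
    b_lo, b_up = sorted((lower_error, upper_error))
    lam = mem_lambda(b_lo, b_up)
    return math.sqrt(b_up * b_lo - (b_up - b_lo) / lam)

memstd = mem_stdev(1.0, 2.0)
```

Very large ratios of the two errors can overflow the exponential in this sketch; a production version would need a safer bracket.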
259
260 LOCAL PROCEDURE CALLED:
261        NONE
262
263 LOCAL FUNCTION USED:
264        NONE
265
266 LOCAL SYSTEM VARIABLE USED:
267        NONE
268
269 REVISION HISTORY:
270       Written by pcr 2009/10/05
271        Last modification by pcr 2010/12/07
272
273----- Documentation for Stat\plot_distri.pro -----
274 NAME:
275        PLOT_DISTRI
276
277 AUTHOR:
278       pierre.cruzalebes@oca.eu
279
280 PURPOSE:
281        plot chi-square and parameter statistics, used with fit_model and fit_sed
282
283 CATEGORY:
284       statistics
285
286 CALLING SEQUENCE:
287        PLOT_DISTRI,DELTACHI2,NU,PARA,FTIT,OUTPUT_FILE_STEM,WSKIP,CLEVEL
288
289 INPUTS:
290       DELTACHI2 = row vector of reduced chi-square differences.
291        NU = number of degrees of freedom.
292        FTIT = row vector of strings giving the axis titles of the free parameters.
293       PARA = 2-dim array of free model parameters, sorted in ascending order
294              of chi-square diff, length=length(DELTACHI2)*length(FTIT).
295        OUTPUT_FILE_STEM = output file stem for writing in postscript files.
296        WSKIP = index of first newly created window in current procedure.
297        CLEVEL = confidence level.
298
299 OUTPUT:
300        WSKIP = index of next newly created window after current procedure.
301
302 OPTIONAL OUTPUT PARAMETERS:
303        NONE.
304
305 COMMON BLOCKS:
306        NONE.
307
308 SIDE EFFECTS:
309        NONE.
310
311 RESTRICTIONS:
312        NONE.
313
314 PROCEDURE:
315       uniform order statistic medians (quantiles) given by
316       http://www.itl.nist.gov/div898/handbook/eda/section3/probplot.htm
317       standardization of mean-squares from
318       Wilson & Hilferty, Proc. of Nat. Acad. of Sciences of USA, 17, 684-688 (1931)
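
The two ingredients named above can be sketched in Python (not the IDL source). The first function returns the uniform order statistic medians as given in the NIST handbook. The second applies the Wilson & Hilferty cube-root transformation to a reduced chi-square, assuming that this is the standardization meant here; function names are illustrative.

```python
import math

def uniform_order_medians(n):
    """Uniform order statistic medians m(1..n) used for probability plots."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # m(i) = (i - 0.3175) / (n + 0.365) for the interior points
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)   # m(n) = 0.5^(1/n)
    m[0] = 1.0 - m[-1]         # m(1) = 1 - m(n)
    return m

def wilson_hilferty_z(reduced_chi2, nu):
    """Approximately standard normal score of chi2/nu with nu degrees of freedom."""
    c = 2.0 / (9.0 * nu)
    # (chi2/nu)^(1/3) is close to normal with mean 1 - c and variance c
    return (reduced_chi2 ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)

medians = uniform_order_medians(5)
z = wilson_hilferty_z(1.0, 9)
```

Plotting sorted, standardized values against the quantiles taken at these medians gives a probability plot that is close to a straight line when the assumed distribution fits.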
319
320 LOCAL PROCEDURE CALLED:
321        LOCA_SCALE_CHI2
322        RESET_PLOT
323
324 LOCAL FUNCTION USED:
325        NONE
326
327 LOCAL SYSTEM VARIABLE USED:
328        OBS_PATH
329        PLOT_DEV
330
331 REVISION HISTORY:
332       Written by pcr 2008/04/08
333        last modification by pcr 2011/01/14
334
335----- Documentation for Stat\rect_stdev.pro -----
336 NAME:
337        RECT_STDEV
338
339 AUTHOR:
340       pierre.cruzalebes@oca.eu
341
342 PURPOSE:
343       compute standard deviation of rectangular (uniform) distribution.
344
345 CATEGORY:
346       statistics
347
348 CALLING SEQUENCE:
349        RECSTD = RECT_STDEV(LOWER_ERROR,UPPER_ERROR,CONFID_LEVEL,LOWER_BOUND,UPPER_BOUND)
350
351 INPUTS:
352        LOWER_ERROR =  lower error.
353        UPPER_ERROR =  upper error.
354       CONFID_LEVEL = level of confidence.
355
356 OPTIONAL INPUT PARAMETER:
357        none.
358
359 OUTPUTS:
360        Function result = RECSTD = standard deviation of distribution.
361
362 OPTIONAL OUTPUT PARAMETER:
363        NONE.
364
365 COMMON BLOCKS:
366        NONE.
367
368 SIDE EFFECTS:
369        NONE.
370
371 RESTRICTIONS:
372        NONE.
373
374 PROCEDURE:
375       recvar=RECSTD^2=cover^2*(UPPER_ERROR+LOWER_ERROR)^2/12
376       where cover=sqrt(2)/3*inv_erf(CONFID_LEVEL)
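
A Python sketch of this formula (not the IDL source) is given below. The Python standard library has no inverse error function, so the sketch uses the identity sqrt(2)*inv_erf(p) = standard normal quantile at (1+p)/2. The LOWER_BOUND and UPPER_BOUND arguments of the calling sequence are not described in this header and are left out; the function name is illustrative.

```python
import math
from statistics import NormalDist

def rect_stdev(lower_error, upper_error, confid_level):
    """Standard deviation from recvar = cover^2 * (upper + lower)^2 / 12."""
    if not 0.0 < confid_level < 1.0:
        raise ValueError("confid_level must be gt 0 and lt 1")
    # cover = sqrt(2)/3 * inv_erf(confid_level)
    cover = NormalDist().inv_cdf((1.0 + confid_level) / 2.0) / 3.0
    recvar = cover ** 2 * (upper_error + lower_error) ** 2 / 12.0
    return math.sqrt(recvar)

# A 3-sigma confidence level (about 0.9973) gives cover = 1
three_sigma_level = math.erf(3.0 / math.sqrt(2.0))
recstd = rect_stdev(1.0, 1.0, three_sigma_level)
```

At the 3-sigma level the result reduces to the full width divided by sqrt(12).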
377
378 LOCAL PROCEDURE CALLED:
379        NONE
380
381 LOCAL FUNCTION USED:
382        NONE
383
384 LOCAL SYSTEM VARIABLE USED:
385        NONE
386
387 REVISION HISTORY:
388       Written by pcr 2009/10/15
389
390----- Documentation for Stat\tri_stdev.pro -----
391 NAME:
392        TRI_STDEV
393
394 AUTHOR:
395       pierre.cruzalebes@oca.eu
396
397 PURPOSE:
398       compute standard deviation and upper/lower bounds of triangular
399       distribution from quantiles.
400
401 CATEGORY:
402       statistics
403
404 CALLING SEQUENCE:
405        TRISTD = TRI_STDEV(MODE_VAL,LOWER_QUANT,UPPER_QUANT,LOWER_PROB,
406                          UPPER_PROB,UPPER_BOUND,LOWER_BOUND)
407
408 INPUTS:
409        MODE_VAL = mode value.
410        LOWER_QUANT =  lower quantile.
411        UPPER_QUANT =  upper quantile.
412        LOWER_PROB =  probability of lower quantile.
413        UPPER_PROB =  probability of upper quantile.
414
415 OPTIONAL INPUT PARAMETER:
416        none.
417
418 OUTPUTS:
419        Function result = TRISTD = standard deviation of
420                                  triangular distribution.
421       UPPER_BOUND = upper limit of triangular distribution
422       LOWER_BOUND = lower limit
423
424 OPTIONAL OUTPUT PARAMETER:
425        NONE.
426
427 COMMON BLOCKS:
428        NONE.
429
430 SIDE EFFECTS:
431        NONE.
432
433 RESTRICTIONS:
434        NONE.
435
436 PROCEDURE:
437       trivar=TRISTD^2=(LOWER_BOUND^2+MODE_VAL^2+UPPER_BOUND^2
438                       -LOWER_BOUND*MODE_VAL-UPPER_BOUND*MODE_VAL
439                       -LOWER_BOUND*UPPER_BOUND)/18
440       where LOWER_BOUND and UPPER_BOUND are triangular distribution
441             limits calculated from upper and lower quantiles
442             according to Kotz & Van Dorp (2004) section 1.6
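
The variance formula alone can be sketched in Python (not the IDL source) as below. The sketch takes the bounds as given and does not reproduce the Kotz & Van Dorp step that derives them from the quantiles; the function name is illustrative.

```python
import math

def tri_stdev_from_bounds(lower_bound, mode_val, upper_bound):
    """Standard deviation of a triangular distribution with known bounds."""
    a, c, b = lower_bound, mode_val, upper_bound
    if not (a <= c <= b) or a == b:
        raise ValueError("need lower_bound <= mode_val <= upper_bound, lower < upper")
    # trivar = (a^2 + c^2 + b^2 - a*c - b*c - a*b) / 18
    trivar = (a * a + c * c + b * b - a * c - b * c - a * b) / 18.0
    return math.sqrt(trivar)

# Symmetric triangle on [0, 1] with mode 0.5: variance 1/24
tristd = tri_stdev_from_bounds(0.0, 0.5, 1.0)
```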
443
444 LOCAL PROCEDURE CALLED:
445        NONE
446
447 LOCAL FUNCTION USED:
448        NONE
449
450 LOCAL SYSTEM VARIABLE USED:
451        EPSILON
452
453 REVISION HISTORY:
454       Written by pcr 2009/10/05
455        last modification by pcr 2010/12/07
456
457----- Documentation for Stat\wavg.pro -----
458 NAME:
459        WAVG
460
461 AUTHOR:
462       pierre.cruzalebes@oca.eu
463
464 PURPOSE:
465       compute weighted mean of data vector.
466
467 CATEGORY:
468       statistics
469
470 CALLING SEQUENCE:
471        WMEAN = WAVG(WEIGHT,DATA)
472
473 INPUTS:
474        WEIGHT = row vector of weights.
475        DATA = row vector of data (same length as WEIGHT).
476
477 OUTPUTS:
478        Function result = WMEAN = weighted mean.
479
480 OPTIONAL OUTPUT PARAMETER:
481        NONE.
482
483 COMMON BLOCKS:
484        NONE.
485
486 SIDE EFFECTS:
487        NONE.
488
489 RESTRICTIONS:
490        NONE.
491
492 PROCEDURE:
493
494 LOCAL PROCEDURE CALLED:
495        NONE
496
497 LOCAL FUNCTION USED:
498        NONE
499
500 LOCAL SYSTEM VARIABLE USED:
501        NONE
502
503 REVISION HISTORY:
504       Written by pcr 2008/05/27
505        last modification by pcr 2010/11/10
506
507----- Documentation for Stat\wstdev.pro -----
508 NAME:
509        WSTDEV
510
511 AUTHOR:
512       pierre.cruzalebes@oca.eu
513
514 PURPOSE:
515       compute weighted mean of data vector
516       and weighted standard deviation (or mean standard weighted error).
517
518 CATEGORY:
519       statistics
520
521 CALLING SEQUENCE:
522        WSTD = WSTDEV(WEIGHT,DATA,WMEAN,ERROR)
523
524 INPUTS:
525        WEIGHT = row vector of weights.
526        DATA = row vector of data (same length as WEIGHT).
527
528 OPTIONAL INPUT PARAMETER:
529        ERROR = row vector of errors (same length as WEIGHT).
530
531 OUTPUTS:
532        Function result = WSTD = unbiased estimator of weighted standard deviation (or mean weighted error).
533        WMEAN = weighted mean.
534
535 OPTIONAL OUTPUT PARAMETER:
536        NONE.
537
538 COMMON BLOCKS:
539        NONE.
540
541 SIDE EFFECTS:
542        NONE.
543
544 RESTRICTIONS:
545        NONE.
546
547 PROCEDURE:
548       WMEAN=SUM(WEIGHT*DATA)/SUM(WEIGHT)
549  following http://pygsl.sourceforge.net/reference/pygsl/node52.html
550  and http://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html :
551       WSTDEV^2=SUM(WEIGHT)/((SUM(WEIGHT))^2-SUM(WEIGHT^2))
552               *SUM(WEIGHT*(DATA-WMEAN)^2)
553  if ERROR vector provided :
554       WSTDEV^2=SUM(WEIGHT^2*ERROR^2)/(SUM(WEIGHT))^2
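
The formulas above translate into the following Python sketch (not the IDL source). Returning the pair (WSTD, WMEAN) and passing ERROR as an optional keyword are choices made for this illustration.

```python
import math

def wstdev(weight, data, error=None):
    """Return (wstd, wmean) following the PROCEDURE formulas."""
    if len(weight) != len(data):
        raise ValueError("weight and data must have the same length")
    sw = sum(weight)
    # WMEAN = SUM(WEIGHT*DATA) / SUM(WEIGHT)
    wmean = sum(w * d for w, d in zip(weight, data)) / sw
    if error is not None:
        # WSTDEV^2 = SUM(WEIGHT^2*ERROR^2) / SUM(WEIGHT)^2
        var = sum((w * e) ** 2 for w, e in zip(weight, error)) / sw ** 2
    else:
        # WSTDEV^2 = SUM(W) / (SUM(W)^2 - SUM(W^2)) * SUM(W*(DATA-WMEAN)^2)
        sw2 = sum(w * w for w in weight)
        scatter = sum(w * (d - wmean) ** 2 for w, d in zip(weight, data))
        var = sw / (sw ** 2 - sw2) * scatter
    return math.sqrt(var), wmean

wstd, wmean = wstdev([1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0])
```

With equal weights the first form reduces to the usual sample standard deviation with N-1 in the denominator.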
555
556 LOCAL PROCEDURE CALLED:
557        NONE
558
559 LOCAL FUNCTION USED:
560        NONE
561
562 LOCAL SYSTEM VARIABLE USED:
563        NONE
564
565 REVISION HISTORY:
566       Written by pcr 2008/05/27
567        last modification by pcr 2010/11/13
568
569----- Documentation for Stat\wsterr.pro -----
570 NAME:
571        WSTERR
572
573 AUTHOR:
574       pierre.cruzalebes@oca.eu
575
576 PURPOSE:
577       compute weighted mean of data vector
578       and standard error on weighted mean
579
580 CATEGORY:
581       statistics
582
583 CALLING SEQUENCE:
584        WSTE = WSTERR(WEIGHT,DATA,WMEAN)
585
586 INPUTS:
587        WEIGHT = row vector of weights.
588        DATA = row vector of data (same length as WEIGHT).
589
590 OPTIONAL INPUT PARAMETER:
591        NONE.
592
593 OUTPUTS:
594        Function result = WSTE = unbiased estimator of standard error
595                                on weighted mean.
596        WMEAN = weighted mean.
597
598 OPTIONAL OUTPUT PARAMETER:
599        NONE.
600
601 COMMON BLOCKS:
602        NONE.
603
604 SIDE EFFECTS:
605        NONE.
606
607 RESTRICTIONS:
608        NONE.
609
610 PROCEDURE:
611       WMEAN=SUM(WEIGHT*DATA)/SUM(WEIGHT)
612  following Gatz & Smith, Atmosph. Env. 29, 11, pp. 1185-1193 (1995) :
613       WSTERR^2 = (N/(N-1))/(SUM(WEIGHT))^2
614                * ( TOTAL((WEIGHT*DATA-AVG(WEIGHT)*WMEAN)^2)
615                  - 2.*WMEAN*TOTAL((WEIGHT-AVG(WEIGHT))
616                  * (WEIGHT*DATA-AVG(WEIGHT)*WMEAN))
617                  + WMEAN^2*TOTAL((WEIGHT-AVG(WEIGHT))^2) )
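
A Python sketch of this estimator (not the IDL source) is given below, reading the leading factor as N/(N-1) and AVG(WEIGHT) as the arithmetic mean of the weights. Returning the pair (WSTE, WMEAN) is a choice made for this illustration.

```python
import math

def wsterr(weight, data):
    """Return (wste, wmean): standard error on the weighted mean."""
    n = len(weight)
    if n != len(data) or n < 2:
        raise ValueError("need two or more points, same length for both vectors")
    sw = sum(weight)
    wbar = sw / n   # AVG(WEIGHT)
    wmean = sum(w * d for w, d in zip(weight, data)) / sw
    # The three sums of the PROCEDURE formula
    t1 = sum((w * d - wbar * wmean) ** 2 for w, d in zip(weight, data))
    t2 = sum((w - wbar) * (w * d - wbar * wmean) for w, d in zip(weight, data))
    t3 = sum((w - wbar) ** 2 for w in weight)
    var = n / ((n - 1) * sw ** 2) * (t1 - 2.0 * wmean * t2 + wmean ** 2 * t3)
    return math.sqrt(var), wmean

wste, wmean = wsterr([1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0])
```

With equal weights the second and third sums vanish and the result is the ordinary standard error of the mean, the sample standard deviation divided by sqrt(N).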
618
619 LOCAL PROCEDURE CALLED:
620        NONE
621
622 LOCAL FUNCTION USED:
623        NONE
624
625 LOCAL SYSTEM VARIABLE USED:
626        NONE
627
628 REVISION HISTORY:
629      Written by pcr 2010/11/13
630