Estimating SWU with Expert Opinion
by Joshua Pollack

So what is the separative power of the IR-1 centrifuge, really? We’ve touched on this question before, sometimes in connection with Natanz breakout scenarios. Lately, it’s come up in connection with the Qom facility, a.k.a. Fordow (see: Iran: Compliance in Defiance?, December 1, 2009). But simple answers are not forthcoming.

If you follow this blog, you’re probably already aware of the running exchange between Ivan Oelrich and Ivanka Barzashka of FAS on one hand, and David Albright and Paul Brannan of ISIS on the other. If not, see the FAS article in the Bulletin, the response by ISIS, the reply by FAS, and the rebuttal by ISIS.

This dispute, whose technicalities I won’t presume to referee, mainly underscores that estimates of the separative power of the IR-1 centrifuge are sensitive to a variety of assumptions. Open-source data do not suffice to provide a single, authoritative answer to the question, “How many kg SWU per annum?”

If You Don’t Know, Say So

When experts disagree, we can think of the answer as a range that indicates some uncertainty. (I’ve taken this approach before.) But what’s the right range?

Starting from Table 1 in the new FAS paper Calculating the Capacity of Fordow, we can assemble a collection of informed judgments. Then we can do something mildly controversial: we can average them and derive confidence intervals in the usual manner. Voilà! An instant range of IR-1 separative power estimates.

The rationale for averaging was aptly described in James Surowiecki’s 2004 book The Wisdom of Crowds. Each estimate “has two components: information and error. Subtract the error, and you’re left with the information.” The process of averaging “subtracts” the error because, if the individual errors are randomly distributed, they will largely cancel each other out.

Careful readers will notice that this requires that estimates be mutually independent. (Correlated errors are not random.) In that spirit, I’ve weeded the FAS table of all estimates that appear to be replications of earlier figures. I’ve used only the most recent estimate available from each expert, and have added a couple of estimates not included in the FAS table. I’ve taken the trouble to re-check the sources, too.
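
To make the error-cancellation idea concrete, here is a toy simulation. It has nothing to do with the IR-1 itself; the "true" value of 1.0 and the error spread of 0.5 are arbitrary, and the point is only that the average of many independent, zero-mean errors drifts toward zero as the number of experts grows.

    import random, statistics

    random.seed(0)
    true_value = 1.0      # arbitrary "true" figure, for illustration only
    trials = 10000

    for n in (1, 5, 25, 100):
        errors = []
        for _ in range(trials):
            # n independent "experts," each off by a random, zero-mean error
            estimates = [true_value + random.gauss(0, 0.5) for _ in range(n)]
            errors.append(abs(statistics.mean(estimates) - true_value))
        print(n, "experts: typical error of the average =",
              round(statistics.mean(errors), 3))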

This exercise produces a table of nine opinions. Two were given as ranges rather than point estimates; in these two cases, I’ve used the average of each range as the estimate. There are good reasons for excluding three of these opinions (Hibbs, Persbo, and Salehi) as being related to the nominal performance of the IR-1 on paper (or the performance of its predecessors), not its actual performance.

[Update | Dec. 7, 2009. I’ve added a 10th opinion, that of Houston Wood as related by Scott Kemp, to the bottom of the table. Notice that it’s a nominal estimate, a calculation of the “maximum performance of a P-1,” the immediate predecessor of the IR-1.]

Source                      | kg SWU/yr  | Date
Hibbs* in Nuclear Fuel      | 2          | Jan. 31, 2005
Lewis & “Feynman” at ACW    | 1.46       | May 16, 2006
Garwin at BAS               | 1.362      | Jan. 17, 2008
Persbo* at vThoughts        | 2.2        | Feb. 27, 2009
Salehi* in Fars News        | 2.1        | Sep. 22, 2009
Oelrich & Barzashka at FAS  | 0.5        | Sep. 25, 2009
Wisconsin Project           | 0.5        | Nov. 16, 2009
Albright & Brannan at ISIS  | 1.0 to 1.5 | Nov. 30, 2009
Kemp at ACW                 | 0.6 to 0.9 | Dec. 1, 2009
Wood* via Kemp at ACW       | 2.1 to 2.2 | Dec. 1, 2009

*Estimates of nominal performance

(I’m assuming that the Wisconsin Project’s figure was derived independently of FAS’s.)

Some Findings (Provisionally Speaking)

First, a caveat: I’ve used ye olde normal distribution to demonstrate the concept, which seems problematic when considering the lower range of the intervals it produces in this instance. A more sophisticated iteration might involve another distribution. Math wonks are encouraged to engage directly with the data table.

Excluding the nominal figures (asterisked in the table above) provides a mean estimate of 0.97 kg SWU/yr, +/-0.86 with 95% confidence. The resulting interval is 0.11 to 1.83 kg SWU/yr. (You see the problem with the lower end of the range.)

At 68% confidence – corresponding to one standard deviation – the result is 0.97 +/-0.44, with a resulting interval of 0.53 to 1.41 kg SWU/yr.
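
For anyone who wants to check the arithmetic, here is a minimal sketch of the same calculation in Python. It simply takes the six non-nominal estimates from the table (using range midpoints) and forms normal-theory intervals around their mean; it is an illustration of the recipe described above, not a substitute for it.

    import statistics

    # The six "actual performance" estimates from the table, with range midpoints;
    # the asterisked nominal figures are excluded.
    estimates = {
        "Lewis & Feynman":     1.46,
        "Garwin":              1.362,
        "Oelrich & Barzashka": 0.5,
        "Wisconsin Project":   0.5,
        "Albright & Brannan":  (1.0 + 1.5) / 2,
        "Kemp":                (0.6 + 0.9) / 2,
    }

    values = list(estimates.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)    # sample standard deviation

    print(f"mean = {mean:.2f} kg SWU/yr")
    print(f"68% interval: {mean - sd:.2f} to {mean + sd:.2f}")
    print(f"95% interval: {mean - 1.96 * sd:.2f} to {mean + 1.96 * sd:.2f}")

    # Dropping the 2006 Lewis & Feynman figure, as discussed a bit further on,
    # shifts the mean and both intervals downward slightly.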

The technique demonstrated here makes the latest ISIS estimate (1.0 to 1.5 kg SWU/yr) look pretty good. It’s more or less coterminous with the upper half of the first standard deviation.

But this is not the whole story. Even within the trimmed-down data table above, a decline in estimates is apparent over time. The larger FAS table shows a sustained decline; a few years ago, estimates of 2 or 3 kg SWU/yr were common, in ISIS papers and elsewhere. As mentioned earlier here, as new information about the performance of the IR-1 at Natanz has become available, it keeps driving down estimates of what the machine can do (see: Why Iran’s Clock Keeps Resetting, August 19, 2009). So it is not entirely surprising to see expert judgments converging somewhat south of the figures given here.

So what happens if we drop the first [remaining] estimate from the table above [i.e., Lewis & “Feynman”], on the grounds that it was produced before operations commenced at the FEP at Natanz? The result at 95% confidence is 0.87 +/-0.80, or 0.07 to 1.68 kg SWU/yr. (There’s that sticky lower end again.) The result at 68% confidence is 0.87 +/-0.41, or 0.46 to 1.28 kg SWU/yr. From the perspective of a “moving average,” then, the central estimates of FAS and ISIS — respectively at 0.5 and 1.25 — just about bracket the first standard deviation. Statistically speaking, the truth is probably somewhere between them.

Update. See the comments for further elaboration, in which the Poisson distribution and the bootstrap raise their heads. Fun times.

Comments

  1. Geoff Forden (History)

    Very interesting idea, Josh! I have problems with expert system analyses, but I can put that aside for the time being. What I would suggest, however, is that you use Poisson statistics rather than Gaussian. The problem with assuming a Gaussian is that it assumes the true value could be zero or even less than zero, a clearly unphysical situation; after all, who would run a centrifuge that had zero (or negative) enrichment capability?

    If you do use Poisson statistics, which are made for just this asymmetric situation, you get a most probable value of 0.97 kg-SWU/yr with 95% confidence levels between 0.35 and 2.1 (same units) for your reduced data set. But why use the reduced data set if you are doing an expert system analysis? After all, you are asking for the best judgment of a number of people. Perhaps those people who base their analysis on others know something? (This is one of the reasons why I don’t like expert system analysis.) For the full data set, the most probable value is 1.24 kg-SWU/yr with 95% confidence limits of 0.6 to 2.3 kg-SWU/yr. Those are the limits I would suggest using, and they rule out Oelrich-Barzashka, who have an unjustifiably small number.

    For the sake of completeness, the 68% Poisson limits are:

    Full set: 1.0 to 1.5

    Reduced set: 0.8 to 1.3

    One could do a “bootstrap” analysis of the error on the 95% limits to capture this dependence on which points to use, but that is probably using a sledgehammer to swat an annoying bug.
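
    [Sketch, for readers who want to try limits of this kind themselves: one recipe that lands in the same neighborhood as the numbers above is to treat the sum of the reduced set of estimates as a Poisson "count," with the number of estimates as the exposure, and use the exact chi-square interval for the rate (via SciPy). This is only a guess at the recipe, offered as an illustration, not a reconstruction of the calculation actually used above.]

        import scipy.stats as st

        # Reduced (non-nominal) set of estimates, with range midpoints.
        estimates = [1.46, 1.362, 0.5, 0.5, 1.25, 0.75]
        total, n = sum(estimates), len(estimates)

        # Exact (Garwood-style) interval for a Poisson rate.
        lower = st.chi2.ppf(0.025, 2 * total) / (2 * n)
        upper = st.chi2.ppf(0.975, 2 * (total + 1)) / (2 * n)
        print(f"rate = {total / n:.2f} kg SWU/yr, "
              f"95% limits roughly {lower:.2f} to {upper:.2f}")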

  2. Josh (History)

    Geoff:

    Thanks. That’s the sort of insight I was hoping was out there.

    For the record, I don’t think simple averaging is necessarily the best approach for this process, but what I’d prefer — expert calibration weighting — isn’t really possible under the present circumstances. Averaging (i.e., equal weights) puts a premium on independence.

  3. Geoff Forden (History)

    Josh,

    I couldn’t resist calculating the effect of either including or deleting various points in the data set on the 95% confidence limits. This, of course, produces a “second” range of numbers but for those interested the bootstrap (with replacement) gives the following effect on the upper and lower limits:

    Lower 95% confidence limit: 0.26 to 0.45

    Upper 95% CL: 1.9 to 2.3
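
    [Sketch: the mechanics of a bootstrap like this are straightforward. The version below resamples the full ten-estimate table (midpoints) with replacement and recomputes simple normal-theory 95% limits on each resample, so the bands it prints will not match the ones above, which are based on the Poisson limits; only the procedure is the same.]

        import random, statistics

        data = [2.0, 1.46, 1.362, 2.2, 2.1, 0.5, 0.5, 1.25, 0.75, 2.15]
        random.seed(0)

        lowers, uppers = [], []
        for _ in range(10000):
            resample = [random.choice(data) for _ in data]   # draw with replacement
            m, s = statistics.mean(resample), statistics.stdev(resample)
            lowers.append(m - 1.96 * s)
            uppers.append(m + 1.96 * s)

        lowers.sort(); uppers.sort()
        # Spread (middle 90%) of the recomputed limits across resamples.
        print("lower limit varies over roughly",
              round(lowers[500], 2), "to", round(lowers[9500], 2))
        print("upper limit varies over roughly",
              round(uppers[500], 2), "to", round(uppers[9500], 2))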

  4. Josh (History)

    Geoff,

    Thanks for contributing that. But for the bootstrap, which dataset did you use?

    In case it wasn’t clear, one of the most important reasons not to use the entirety of Table 1 in the FAS paper for our purposes here is that it includes multiple estimates provided by the same people, who have changed their minds over time as new information has become available. Using the whole table “drowns” up-to-date information in this larger “soup” of outdated estimates.

    For this reason, comparing only information provided in similar timeframes should minimize methodological objections about apples-to-oranges comparisons.

    (Incidentally, one of the consequences of including the older material, because of the downward trend over time, is to make low estimates look more like outliers than they really are.)

    Now, ideally, one would get the information from people in the same room on the same day. Unfortunately, the second-best approach — the one I’ve used at the end of this post — involving only 2008 and 2009 estimates, and only “actual,” not “nominal” estimates, leaves us with just five figures. It’s feasible, but five does seem like very few.

    I’m tempted to include the “nominal” figures — I guess that’s why I presented them in the table — but whatever else one is doing in an expert opinion drill, it should involve asking different people the same question. It should really involve doing so under controlled circumstances, too. That makes this more of an illustration of the concept than an instance of the “real thing.” For the time being, though, it ought to suffice to show a little of what’s possible.

  5. Mark Gubrud

    I see no argument for a poissonian distribution here. Note that the counted variable is the number of people who estimate a certain SWU rate per interval of SWU rates. A Poisson distribution would require that at least some people claim the IR-1s don’t work at all. That may be plausible, but it would also predict a long tail on the high side, i.e. lots of people claiming that the IR-1s work 2 or 3 times as well as they possibly can, since a substantial number of people are saying they only work 1/3 or 1/4 as well as they should. We don’t see that.

    Experience suggests that the distribution of realized performance as a fraction of nominal performance (i.e. if the Iranians did everything right and the IR-1s performed exactly as they should) should be heavily skewed low, i.e. performance would often be poor but would never exceed the nominal performance. I would expect the distribution of estimates to reflect this knowledge, and it appears to do so. I take it the nominal performance is something around 2.1. So, nobody estimates anything much higher, but different analysts discount different amounts from the nominal value, based on picking and weighing the available evidence, as influenced by their own biases and degree of effort.

    I would suggest that, first, there is some narrow distribution of estimates of the nominal performance, which we may as well take as a given value (e.g. 2.1). Analysts then apply a discount factor 1/(1+x), in effect saying the machines are down x/(x+1) of the time, or only work 1/(1+x) as well as they should.

    Possible models for the distribution of x include poissonian, gaussian, exponential, or power law. The latter two possibilities would make x=0 the most likely discount for an analyst to use, which may seem unreasonable, but look at Josh’s data. An exponential distribution seems the most likely to me. But this only predicts the distribution of analysts’ numbers; it does not tell us what the real number is.

    I am, however, rather impressed by Scott Kemp’s arguments, and would probably put my money on his estimates.

  6. Geoff Forden (History)

    Mark,

    I couldn’t disagree with you more. There is some “true” value for the P-1 enrichment rate, for which, a priori, we (the people asking for the expert opinion) have no estimate. Thus your concept of “discounting” a “nominal value” is not very helpful. The simplest way of seeing that is to consider the possibility that P-1s have carbon fiber rotors; then they could spin much, much faster and all the experts would be wrong. Of course, we usually accept that the P-1s have a metal rotor, but that is not absolutely certain; it’s simply what the experts assume—something we are letting the “expert opinion analysis” decide. (Again, I have to state that I am no fan of this type of analysis.)

    It is true that Poisson distributions are often thought of as a rate counting distribution, but with enough creative arguing, I believe that this experiment could be re-phrased to make it appear more in line with such standard usages. But regardless of the details, the Poisson has some very important characteristics: it’s always positive and it approaches a Gaussian for large enough “most likely values” (as measured by the width of the distribution of the data). Does that mean that what I call the 95% confidence intervals are really 95%? Perhaps not, but it gives an indication of the range of the experts’ opinions that is qualitatively similar to 95%.

    Josh, I used the full data set in the table for calculating the bands for the 95% CI limits. These bands indicate just how important individual data points are statistically to determining the values we use to characterize them. Dropping the two data points you mark with an asterisk produces a “95%” limit outside the indicated band, so they are statistically important. But since you argue for dropping them on a theoretical basis, that could be OK to do.

  7. Andreas Persbo

    Thanks for pointing out that I use the theoretical value in my calculations. I hope I make that reasonably clear myself. But as I’m close to Mark, I feel like I’m in good company.

  8. Mark Gubrud

    Geoff,

    Whatever the underlying process that generates the observed statistical distribution of analysts’ estimates may be, it can’t be one that would generate a Poissonian distribution of estimates, since that would imply that at least some analysts (given a large enough population of them) would estimate that the IR-1s don’t work at all, while others would estimate that they work much better than the physics says they can (the “nominal” value, given such assumptions as the use of metal rotors).

    By the same kind of argument, I would eliminate the gaussian and power law distributions for x in the model I proposed. That leaves Poisson and exponential distributions as possibilities for x (not the SWU estimates themselves, but the discount factor 1/(1+x) applied to the physics-based “nominal” estimate that I take it nearly all analysts agree on within a small variation).

    The Poisson distribution might be right for x if you assume that there is a correct value for x and that analysts are just better or worse at estimating it. I was making a different assumption, namely that analysts are making more or less effort, and are biased one way or another, in accounting for the evidence and factors that might make x larger (and hence the SWU rate lower). In the worst case, which the exponential model suggests is actually the most likely case (though less likely than all the other cases added up), the analyst just quotes the nominal number and makes no effort to estimate the discount. Josh’s data shows 3 examples of this.

    As I wrote, this really does not help us to perform a weighted average and assess with given confidence limits what the correct value of the SWU rate is, but it does point toward the lower end of the distribution, since it suggests that analysts tending toward this end are taking more factors into account, as against the natural “reluctance” to apply a large discount factor which is implied by the exponential distribution for x.

  9. John Field (History)

    Gamma distribution is a continuous distribution related to Poisson used to model defect statistics in semiconductor process control. The reason that it is useful is that it models the piece failure rate as a sum of Poissonian failure rates due to independent failure processes of unrelated type within the manufacturing process – e.g. from different machines, different fabrication steps, etc. What you end up with is the distribution of failure rate coming off the line.

    I think it might make sense to turn this around and say that the SWU of the centrifuge is the sum of a series of technological innovations of differing magnitude and differing probability of having occurred – each one of which independently endows a certain amount of SWU to the centrifuge. Clearly, some innovations are related to others and therefore the number of degrees of freedom would only be equal to the rank of the “technology innovation endowment matrix.” In our case, we probably wouldn’t bother and just fit the distribution to the data. This needs to be done with extreme care because of the nonlinear nature of any parameter estimator that one might use. The bootstrap, as an effectively Bayesian estimate, seems like a good starting point.

    I am a little busy today, but if people are favorably disposed to this reasoning, perhaps someone else could do it, or otherwise, maybe in a few days I could get to it.

  10. Josh (History)

    John:

    To my naive ears, at least, that sounds like a worthwhile idea. If you have the time and energy to pursue it, by all means do so.

    By the way, for completeness’s sake, I have now added a tenth estimate to the table: the nominal performance figure calculated by Houston Wood, as related by Scott Kemp. I’d overlooked it somehow.

    If anyone is aware of any estimates that were produced before this post, and that are not superseded by the estimates in the table above, and do not duplicate any of them, please call them out in the comments.

  11. John Field (History)

    OK, for what it’s worth, below is my answer for the fitted gamma distribution. I took the midpoints of the expert ranges; it would be possible to go back and include a weighting factor that accounts for the full ranges, but it seems like it would make only a very small difference – far less than the uncertainty in this exercise!

    mean = 1.537
    variance = 0.927
    gamma distribution shape parameters =
    k = 3.563
    theta = 0.550
    99% chance that it is above 0.36 SWU/yr
    90% chance that it is above 0.8 SWU/yr
    90% chance that it is below 3.35 SWU/yr
    95% chance that it is below 3.93 SWU/yr
    99% chance that it is less than about 5 SWU/yr
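
    [Sketch, for anyone who wants to try the fit: one way to do it is a maximum-likelihood gamma fit with the location pinned at zero, e.g. with SciPy. The data set below is all ten table midpoints; the parameters and tail points it produces depend on those choices and need not exactly match the numbers above.]

        import scipy.stats as st

        # All ten estimates from the table above, with range midpoints.
        data = [2.0, 1.46, 1.362, 2.2, 2.1, 0.5, 0.5, 1.25, 0.75, 2.15]

        k, loc, theta = st.gamma.fit(data, floc=0)   # shape k, scale theta, loc fixed at 0
        print(f"k = {k:.3f}, theta = {theta:.3f}")
        print("1% / 10% / 90% / 95% / 99% points:",
              [round(v, 2) for v in st.gamma.ppf([0.01, 0.10, 0.90, 0.95, 0.99],
                                                 k, scale=theta)])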

    The probability distribution looks like this:

    The Python generating script, along with the raw data output, can be found here.

    – John

  12. Geoff Forden (History)

    John’s suggestion of trying a Gamma distribution is a good one. Since I seem to have a lot of time on my hands, I thought I’d go ahead and try it. The resulting fit, for all of the data points in the new table, is shown here:

    What’s interesting about it is just how broad the 95% confidence intervals are. Naively, I would have expected a much better constraint than this, even at the 95% level. The most probable value is 1.0 SWU-kg/yr for the fitted Gamma distribution, with 0.08 and 7.7 SWU-kg/yr for the lower and upper CI respectively. (I don’t have the inclination right now to bother with errors on the CIs, nor with the differences between the mean and the most probable value, and so on.)

    So absolutely nothing is ruled out, at the 95% level. I’m inclined to believe that this is a good illustration of the lack of power of “expert systems” analysis. But that might be my natural prejudice against it just showing up.

  13. Geoff Forden (History)

    John and I have both tried to fit Gamma distributions to the data. There is clearly a difference, but I might well have made a mistake, since (I assume, being on the East Coast while he is on the West) I’ve been up longer than he has. Will check it again in the morning.

  14. Geoff Forden (History)

    I have checked my calculations and, other than finding that I inexplicably forgot Houston’s data point, I believe that I calculated the fit to a Gamma distribution correctly. There is a slight difference when Houston’s data point is included but it seems minor. I’m not familiar with python but, to my inexperienced eye, it appears that John is calculating the 95% CI from the fitted function, i.e. he is calculating the points at which the tails yield 5% above and 5% below. I believe the correct procedure is to calculate new distribution parameters that correspond to new distributions such that the observed data set has a 5% chance of being a random fluctuation from that distribution. That is what I have done and the two distributions, one high and one low, are shown in my graph above.

  15. John Field (History)

    Sorry, one very confusing thing I’m afraid . . .
    The plot in my post above was for the distribution with the expected shape parameters, which is not the same thing as the expected distribution. In other words, that plot is of a gamma distribution with the expected values of the parameters. This is a plot of both of the distributions, and you will see that they aren’t quite the same. Furthermore, it’s probably the expected distribution which is more useful . . .

    Sorry . . .

  16. John Field (History)

    Yes, Geoff is right that new distributions need to be calculated for confidence intervals. I believe that is taken into account with the expected distribution plot in my second post. Properly done, it seems, the 95% limits should be around 0.4 and 3.3 SWU/yr, below and above respectively. Or, at least, that’s what I get until Geoff and I manage to reconcile our answers.

  17. FSB

    So, 0.4-1.7 seems perfectly feasible given the analyses above.

    This range would include the Wisconsin Project, FAS, and ISIS values.

    I guess we can all hold hands again.

    BTW, I liked the analysis by Walsh, Pickering and Luers on the tactical myopia in dealing with Iran.

  18. Mark Gubrud

    The plotted fitted gamma distributions show exactly why the gamma and Poisson distributions are inappropriate models here: the long tails show substantial support above the physics-based “nominal” limit of 2.0-2.2 kg SWU/yr. How many analysts will claim that the IR-1s work at the level of 3 or 4 kg SWU/yr? I think we can immediately discount any who do claim that.

    Failure of a given statistical distribution to impose the correct boundary conditions is a strong clue that it does not model the underlying stochastic process. This is exactly the argument Geoff used against the Gaussian distribution, and it rules out Poisson and gamma just as well.

    There seems to be some confusion in this discussion about what is being modeled. Josh’s data is not a distribution of observed performance levels for the IR-1s. It is a distribution of assessments given by various analysts, who do not have access to direct measurements, of aggregate performance for the entire set of IR-1s. I see no relationship between this data set and defect statistics in the semiconductor industry.

  19. John Field (History)

    I disagree with Mark, although he is correct that it would be wrong to infer any tail probability from these distributions, as the tails do not accurately model the physical process. If Mark is saying that the whole exercise is pointless, allow me to show why, even in the face of great uncertainty about the model, it is useful to do statistical work, as follows:

    It is essential to realize that in any estimation of required time to produce a certain amount of material, the estimate will include a division by the SWU of the centrifuge. In this division, the fluctuations due to uncertainty play an important role. Therefore, the time required is NOT equal to the amount of SWUs required divided by the estimated SWU. In fact, these calculations have indicated that when you multiply by SWU, you should multiply by a number that is about 30% less than when you divide by SWU. This is due to the width of the distribution.

    So, if I want to calculate how long it takes for Iran to get 20 kg of HEU together, I divide by something around 2 SWU. If, on the other hand, I want to calculate how much material Iran will have produced after a year, then I multiply by about 1.5 SWU.
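
    [Sketch of the general point: because of the spread in the distribution, the expected time is not simply the requirement divided by the mean SWU value. The toy check below assumes the fitted gamma from comment 11 and an arbitrary requirement, purely to show that the two numbers differ; the size of the gap depends entirely on those assumptions.]

        import scipy.stats as st

        k, theta = 3.563, 0.550                  # fitted gamma parameters from comment 11
        swu = st.gamma.rvs(k, scale=theta, size=200000, random_state=1)

        need = 5000.0                            # arbitrary SWU requirement, illustration only
        naive_time = need / swu.mean()           # dividing by the mean SWU value
        mean_time = (need / swu).mean()          # averaging the time over the uncertainty

        print(f"naive: {naive_time:.0f}  expected: {mean_time:.0f}  "
              f"ratio: {mean_time / naive_time:.2f}")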

    To Mark’s objection, this is a contribution which would be largely independent of the exact form of the distribution chosen – Gaussian, Poisson, gamma. It would only vary a little bit. But, if we’re going to do the exercise, why shouldn’t we choose a distribution that has as many of the right properties as possible? The gamma at least has the properties of being a continuous distribution that is zero for SWU < 0 and goes exponentially to zero as SWU -> infinity. The Gaussian and Poissonian don’t even satisfy that. Meanwhile, I have suggested what I consider to be a perhaps plausible analogy between the mathematical origin of the gamma distribution and what might go on in the development of a centrifuge. I accept that Mark disagrees with this interpretation. That fact does not render the exercise pointless, as shown above.

    I am all for using a better distribution – if Mark has one to suggest I would be happy to repeat the calculations.

  20. Mark Gubrud

    John, my suggestion was posted above. To reiterate, I proposed that essentially all analysts agree on a physics-based assessment of the “nominal” performance, of about 2.1 +- 0.1 SWU. That’s a narrow enough distribution to just take it as 2.1 for all analysts. Then, different analysts differently assess the discount from that due to defects, downtime, and other factors which degrade the performance. Call this factor D, then performance is given by 2.1*D. Since D varies between the limits of 0 and 1, I suggested the form D=1/(1+x), where x varies between 0 and infinity. So the question becomes, What is the distribution of x? The gamma distribution may well be the correct answer, although I had suggested an exponential distribution.

  21. John Field (History)

    Mark,

    It would seem that the distribution of x as you define it would have mean ~ 1.0 and standard deviation ~0.95. If you like it in a gamma distribution, that would be
    k = 1.089
    theta = 0.91
    and plotted it is almost indistinguishable from an exponential distribution, exp(-x), as you had suggested. Of course, this is the distribution of the Gubrud derating factor, not the distribution of SWU, so it is not clear to me that there is any good reason to assume a gamma distribution for the Gubrud factor – might as well just call the Gubrud factor exponential, I suppose.
    Image here: http://www.dorringtoninstruments.com/acw/iranian_gubrud_derating.gif

    I’m quite happy now and I hope you are as well because it appears that we are saying basically the same thing just in a slightly different way.
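
    [Sketch of the transformation being discussed: x can be read straight off the table by inverting Mark's relation, x = nominal/SWU - 1. The nominal value of 2.1 and the use of all ten table midpoints are assumptions, so the mean and spread printed below are only indicative and need not match the figures quoted above.]

        import statistics

        nominal = 2.1
        swu = [2.0, 1.46, 1.362, 2.2, 2.1, 0.5, 0.5, 1.25, 0.75, 2.15]
        x = [nominal / s - 1 for s in swu]       # the "derating" each estimate implies

        print("x values:", [round(v, 2) for v in x])
        print("mean x =", round(statistics.mean(x), 2),
              " sd x =", round(statistics.stdev(x), 2))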

  22. bradley laing (History)

    Iran puts conditions on nuclear fuel swap
    By BARBARA SURK

    —I hope this is the quickest way to get this news headline to you.

  23. bradley laing (History)

    MANAMA (Reuters) – Iran needs up to 15 nuclear plants to generate electricity, its foreign minister said on Saturday, underlining Tehran’s determination to press ahead with a program the West suspects is aimed at making bombs.

    Manouchehr Mottaki, addressing a security conference in Bahrain, also cast further doubt on a U.N.-drafted nuclear fuel deal meant to allay international concern about the Islamic Republic’s atomic ambitions.

    —I hope this is the quickest way to show you this. It may not be.