It is amazing that a seriously defective draft paper (NOT YET PEER-REVIEWED) by Richard Muller is getting MASSIVE coverage in the press!
Benjamin D. Santer, a climate researcher at the Lawrence Livermore National Laboratory, says he found it troubling that Muller claimed such definitive results without his work undergoing peer review. Two peer-review reports on this paper raise serious questions about its methodology.
And yet, it is being reported in newspapers across the world as if it were the spoken word of God.
If the referee reports below are a sign of the quality of the paper, I have no doubt Richard Muller ("Professor!") is intent on hoodwinking the world.
In any event I have two BASIC problems (apart from the methodological ones) with Richard's widely publicised paper:
a) I need his studies to start from the year 1 AD, and show how temperatures have fared across 2000 years.
b) I need him to PROVE that there is a significant correlation between CO2 and temperatures observed over the past 2000 years.
By cherry-picking the lowest point of the Little Ice Age, and then FAILING entirely to prove correlation (there is NONE, if one eyeballs the data, particularly over the past 2000 years), he presents himself as a POLITICIAN, not a scientist.
Referee Report #1 (September 2011)
Influence of Urban Heating on the Global Temperature Land Average Using Rural Sites Identified from MODIS Classifications, by Wickham et al.

Overall Comment

This paper tries to do two things in a single, short paper, namely introduce a new global temperature data product with a much larger number of stations than are available in GHCN and related products, and provide a quantification of non-climatic biases in surface temperature records. While the authors have developed an impressive new data base, the paper unfortunately fails to do a satisfactory job of either task. First, it omits many of the technical details readers need to assess the new data base construction methodology. Second, the analysis of the urban-rural split is simplistic in light of where the current literature stands, and is not able to support the conclusions drawn. Specifically, the authors’ empirical results are consistent either with the stated conclusion or its opposite, and therefore they are in no position to say anything decisive.

I recommend that the paper be rejected in its current form. I have no doubt that presentation of an important new surface data base is a publishable contribution, as long as some major improvements to the manuscript are made, as detailed below. But with regard to the analysis of surface disruptions and the spatial distribution of temperature trends, the analysis presented herein has serious inadequacies that make it unpublishable in its current form.

Introduction of new data base

The paper referred to as Rohde et al. (2010) on page 6, lines 106-107, and elsewhere, does not appear to be a publication. It should not be listed in the references. Yet almost all the necessary technical details that should be in this manuscript are apparently in it instead. This is a disservice to the reader.
The material relegated to an unpublished source would appear to include just about everything that readers need to know about the new data set to decide on its validity.

A partial listing of technical material that needs to be incorporated into the present paper includes the following:
- List the source data sets and metadata.
- Explain the averaging methodology in detail, using sufficient math to permit independent replication.
- p. 7 lines 124-131: show the effect of varying the definition of “very rural” to something other than the assumed tenth of a degree separation. How important is this parameter?
- p. 10 lines 174-183: Explain how many missing months are permitted in a continuous series before the series is split, or discarded.
- p. 12 lines 210-212: Explain the rationale behind this apparently ad hoc statistical procedure for determining standard errors. Is this some sort of block bootstrap method? There needs to be reference to standard, mainstream statistical literature explaining why this resampling procedure is used and why the authors believe it yields asymptotically valid standard errors. If no theoretical guidance is available the authors could perhaps use Chebychev’s inequality to provide an upper bound on the variance.
- p. 14 lines 252-266: Provide a discussion of how the tradeoff between continuity and fragmentation affects the data quality. That is, the rule for terminating a series will determine whether there are a few long but intermittent series, or many short but continuous series. Under what circumstances is the latter a better measurement, and how is the choice optimized?
- p. 14 lines 252-266: The authors claim to have “taken into account spatial correlation” yet there isn’t a word anywhere in the paper about how this is done. Since the authors cite McKitrick 2010 and McKitrick and Nierenberg 2010 they presumably have read both papers, which contain (especially the latter) detailed explanations about how spatial autocorrelation is tested for and corrected in models of surface temperature trends. The elements of the discussion required for a proper treatment of this topic include reporting a robust LM statistic, a parametric model of the spatial weights, a description of the estimation method for computing the SAC terms and the optimal distance weighting parameter, and test results on the residuals to indicate whether the SAC model was adequate.
- p. 14 lines 252-266: Again in this section there is reference to a resampling method to compute standard deviations, but no explanation is given, nor is there any reference to statistical literature. Is this some sort of bootstrap method? An explanation is needed.
- p. 14 lines 268-272: With respect to the iterative weighting procedure, how do you know that this converges to a unique solution? It is possible the weights are path-dependent. We need to be shown some details about the convergence rule and the way the results are tested by trying different starting values.

In the absence of so much elementary material it is difficult even to review this paper. I understand that a great deal of work has gone into the project, and the release of a new data set with improved sampling characteristics is a valuable contribution. In rejecting the present manuscript I hope the authors will revisit the task of explaining their work with some alacrity and will resubmit a much expanded paper so that the new data base can be published.

Quantifying the effect of nonclimatic contamination of the data

Judging by the paper’s title this appears to be the topic the authors want to focus on. It is clear that, if published, this will be a very prominent paper and its findings will be wielded to considerable polemical effect: indeed one of the authors has already taken the liberty of announcing partial findings in Congressional testimony. Great care must be taken to ensure that findings are accurate and are fully supported by the empirical analysis. In this regard I note two problems: the paper reads as if the authors have been careless in reviewing the existing debate, and the empirical work does not imply the conclusions.

The authors cite, in passing, papers by de Laat and Maurellis and McKitrick and coauthors (pp. 5-6) that present evidence of significant surface data contamination. They also cite papers that argue for the absence of such contamination. Despite the fact that Wickham et al.
purport to adjudicate between these different literatures, they do not summarise or explain the very different methodologies involved, nor how their analysis relates to them, if at all.

On page 13, lines 234-235, the authors conclude that their result “agrees with the conclusions in the literature that we cited previously”, which is a baffling statement given that they cite papers that directly contradict one another. My overall impression is that the authors have not actually read all the papers they cite, and have not come to terms with the technical issues involved in the current debate. If it is their purpose to draw conclusions about the surface data contamination question they need to position their own analysis properly in the existing literature, which will require a detailed explanation of what has been done hitherto, and the use of an empirical framework capable of encompassing existing methodologies.

With regard to their own empirical work, a basic problem is that they are relating a change term (temperature trend) to a level variable (in this case MODIS classification) rather than to a corresponding change variable (such as the change in surface conditions).

I will give a simple example of why this is a flawed method, then I will demonstrate it empirically.

Suppose there are only two weather stations in the world, one rural and one urban. Suppose also that there is zero climatic warming over some interval, but there is a false warming due to local population growth, the effect of which is logarithmic, as is commonly assumed. Then the measured trends would be proportional to the slopes of the respective tangent lines to the logarithmic curve.

[Figure: tangent lines to the logarithmic warming curve at the rural (low-population) and urban (high-population) stations]
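The two-station example can be checked with a few lines of arithmetic. This is a minimal sketch, assuming (hypothetically) a bias of the form T = a·ln(P), where P is population density, and assuming both stations gain the same absolute number of people per decade; the function name `spurious_trend` and all numbers are illustrative and are not taken from either report. By the chain rule, dT/dt = a·(dP/dt)/P, so the station with the smaller P records the larger spurious trend even though its entire trend is bias.

```python
def spurious_trend(pop, growth, a=1.0):
    """Hypothetical warming rate from a bias T = a * ln(pop).

    If population rises by `growth` per decade, the chain rule gives
    dT/dt = a * (1 / pop) * growth, i.e. the slope of the tangent line
    to the log curve at `pop`, scaled by the growth rate.
    """
    return a * growth / pop


# Illustrative stations with identical absolute growth but very different density.
rural = spurious_trend(pop=10.0, growth=2.0)
urban = spurious_trend(pop=1000.0, growth=2.0)

# Neither station has any climatic warming, yet the rural trend is 100x larger.
print(f"rural trend: {rural:.4f}, urban trend: {urban:.4f}")
# prints "rural trend: 0.2000, urban trend: 0.0020"
```

Under these assumptions a rural-versus-urban split shows the rural station warming faster, even though every bit of warming at both stations is spurious.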
A sample split according to the rural/urban distinction would apparently show that the rural station has a higher trend than the urban one. Far from proving that there is no urban bias in the overall average, it is precisely the result we expect if there is such a bias! And the contrast would be larger, the wider the difference between “urban” and “very rural”. Consequently the authors’ univariate analysis cannot, in principle, be the basis of their assertion that there is little or no urbanization bias, since the results are consistent with such a bias being present.

To provide an empirical demonstration, I obtained the GEcon data base from Yale University (http://gecon.yale.edu/), which provides gridded population, GDP, climatic and other indicators over the 1990-2005 interval for 27,500 terrestrial grid cells at 1-degree resolution. I then interpolated CRU grid cell trends over 1990-2010 for the same grid cells. After removing cells with missing socioeconomic data, or in which more than 25% of the years are missing 4 or more months of temperature data, I was left with just under 18,000 grid cells with observations on the linear temperature trend, latitude, minimum temperature, standard deviation of precipitation, distance to coast, number of missing months in the temperature record, 2005 population per square km, change in population (1990 to 2005), 2005 GDP (US$, PPP-based) per square km, change in GDP per square km (1990 to 2005), 2005 GDP per capita and change in GDP per capita (1990 to 2005).

To replicate the results in Wickham et al., I regressed the vector of trends on a static measure of surface disruption, namely 2005 grid cell population/km2, using White’s heteroskedasticity-consistent standard errors. The results mirror those of Wickham et al. The coefficient on POP2005 is negative and significant, apparently indicating that regions with higher population per square km have slightly (but significantly) lower trends.
I then redid the same analysis using 2005 GDP/km2 as the measure of surface disruption.

Again the results mirror those of Wickham et al. The coefficient on GDP2005 is negative and significant, thus “confirming” that relatively undisturbed regions apparently have higher warming trends, a result they deem anomalous in light of prior expectations.

But it is not anomalous at all; it just reflects the fact that this class of empirical model cannot measure what the authors have tried to measure. The problem can be remedied by adding in fixed climatic covariates and socioeconomic change terms. Ideally I would also put the lower tropospheric trend terms on the right-hand side, but I don’t have them handy and they are not needed for the illustration.

In the multivariate model, Latitude, MinTemp and SD of precipitation are all significant. “Miss_months” indicates the number of missing months in the data series after 1990. It is significant, and indicates that the more missing months in a series, the higher the estimated trend.

Look carefully at GDP2005 and POP2005: they are still negative but they have become small and insignificant. 2005 per capita income (INC2005) is also insignificant.

Meanwhile the change term CHG_POP (population growth) is positive and significant, as is CHG_INC (income growth). In other words, it is the change in socioeconomic measures that correlates with the change in temperature over an interval of time, and once these effects are controlled the apparent contrast in trends based on a static measure of surface disruption such as GDP or population (or, likely, MODIS land classification) becomes insignificant and irrelevant.

The joint test on the socioeconomic variables has an F statistic of 132.47, which is extremely significant, indicating that we would reject the hypothesis that surface trends are unaffected by socioeconomic factors at the surface.
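The level-versus-change argument can be reproduced on simulated data. The sketch below is illustrative only: it uses synthetic numbers (not the GEcon or CRU data), a hand-rolled OLS with White's HC0 heteroskedasticity-consistent covariance, and a robust Wald test scaled as an F statistic. The helper names `ols_white` and `wald_f` and every parameter value are hypothetical. By construction, sparsely populated cells grow faster and only population growth biases the trend, so the population level has no direct effect.

```python
import numpy as np


def ols_white(y, X):
    """OLS with an intercept; returns coefficients and the White (HC0) covariance."""
    X = np.column_stack([np.ones(len(y)), X])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over observations of e_i^2 * x_i x_i'
    meat = X.T @ (X * resid[:, None] ** 2)
    return beta, xtx_inv @ meat @ xtx_inv


def wald_f(beta, cov, idx):
    """Robust Wald test that coefficients at positions idx are jointly zero, divided by q."""
    b = beta[idx]
    v = cov[np.ix_(idx, idx)]
    return float(b @ np.linalg.solve(v, b)) / len(idx)


rng = np.random.default_rng(0)
n = 5000
# Hypothetical data: sparse cells grow faster; only growth biases the trend.
pop_level = rng.uniform(0.0, 5.0, n)
pop_change = 1.0 - 0.15 * pop_level + rng.normal(0.0, 0.2, n)
trend = 0.1 + 0.05 * pop_change + rng.normal(0.0, 0.05, n)

# Naive model: trend regressed on the level variable only.
b_naive, cov_naive = ols_white(trend, pop_level)
t_naive = b_naive[1] / np.sqrt(cov_naive[1, 1])

# Augmented model: add the change term on the right-hand side.
b_full, cov_full = ols_white(trend, np.column_stack([pop_level, pop_change]))
t_full = b_full / np.sqrt(np.diag(cov_full))
f_joint = wald_f(b_full, cov_full, [1, 2])

print(f"naive: level coef {b_naive[1]:.4f} (t = {t_naive:.1f})")
print(f"full:  level coef {b_full[1]:.4f} (t = {t_full[1]:.1f}), "
      f"change coef {b_full[2]:.4f} (t = {t_full[2]:.1f})")
print(f"joint robust F on level and change: {f_joint:.1f}")
```

In this simulation the naive regression on the level alone returns a negative, strongly significant coefficient, while adding the change term pulls the level coefficient toward zero and leaves a positive, significant change coefficient. Real data would also need the additional covariates and spatial-autocorrelation checks the report calls for.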
Using the method outlined in McKitrick and Michaels (2007) to filter the trend vector, the mean trend falls from about 0.33 to 0.26 C/decade, indicating that the socioeconomic effects add up to a net warm bias of about 0.07 C/decade. This is comparable to the results in Table 6 of McKitrick and Nierenberg (2010), even though this is a different data set using different covariates on a different time period; but this part of the analysis is difficult to do without the full set of covariates, including the satellite-based trends.

To emphasize the contrast: on a large global data set, if I use a naïve analysis comparable to Wickham et al., namely relying on 2005 population as the only regression covariate, I get the same “anomalous” result that they do, namely that higher-population regions apparently have slightly lower trends than low-population regions. But when I remedy the conceptual weakness in their model by introducing change terms on the right-hand side, the population level turns out to be insignificant, and instead the population change term has a positive and significant effect on the trend, implying that population growth biases the surface trends upwards. Likewise per capita income growth, but not the level, is positively correlated with the size of the trend.

Conclusion

The simple univariate analysis in Wickham et al. does not establish a sound basis for their assertion that surface temperature data are unaffected by urbanization and related socioeconomic disruption of the surface. To draw such a conclusion would require setting up a model capable of measuring these effects. At least three improvements to the modeling framework are needed to bring the analysis up to the level of the current debate:
- Use of a suite of covariates that can identify the contrasting effects of different sources of bias such as anthropogenic surface processes, data inhomogeneities and regional atmospheric pollution;
- Comparison of the observed spatial trend pattern to those predicted in climate models so that a null hypothesis is clearly identified and spurious results can be ruled out;
- Examination of spatial autocorrelation of the model residuals to permit identification of the explanatory variables needed to yield iid residuals, in support of making asymptotically accurate inferences.

A very simple way to proceed would be to compute post-1979 gridded trends in the BEST archive and merge them with the McKitrick and Nierenberg data set, then run the code available online. I conjecture that the results will look a lot like those reported in McKitrick and Nierenberg (2010), but whatever the case I encourage the authors to use their new data set for such an analysis and see what emerges. Meanwhile I cannot recommend this draft for publication.

- Minor point: M&N should be cited:
McKitrick, Ross R. and Nicolas Nierenberg (2010) Socioeconomic Patterns in Climate Data. Journal of Economic and Social Measurement, 35(3,4), pp. 149-175. DOI 10.3233/JEM-2010-0336.

Signed review: Ross McKitrick.
Referee Report #2 (March 2012)
Second referee report: Influence of Urban Heating on the Global Temperature Land Average Using Rural Sites Identified from MODIS Classifications, by Wickham et al.

I now realize that the aim of this paper is much narrower than I had originally thought. The Rohde et al. paper at http://berkeleyearth.org/pdf/berkeley-earth-averaging-process.pdf is the “flagship” in which the BE data construction and methodological details are presented, and this paper is only focused on the urban heating issue. Consequently I can see that some of the technical details I asked for are written up elsewhere, and in their response the authors rely heavily on the existence of the Rohde et al. paper to justify leaving so much out of their own. That being the case, however, all the credit attached to the new data set construction and methodology belongs to the Rohde et al. paper, so the only basis for deciding on the publishability of this particular paper is whether it is a good analysis of the topic of urban contamination of the surface record.

A weak analysis on an old data set would certainly not be publishable; a good analysis on a new one probably would. This paper presents a weak analysis on a new data set, and the novelty of the data set cannot be weighed in its favour.
I had given some suggestions about how to fix the problems in the methodology in my earlier review, including one idea that would have been relatively straightforward to implement using easily available data. Unfortunately the authors have made no methodological improvements, and the arguments they offered for keeping their technique unchanged are, as I will explain, unpersuasive. So it will come as no surprise that my view of this draft remains unchanged from before.

On page 7, the sentences on lines 114 to 121 represent an improvement in the discussion of the range of findings in the published literature. But having drawn attention to the contradictory results in previous published analyses, the authors offer a weak explanation as to why some teams find an effect while others do not. They first suggest the issue comes down to a lack of adjustments in CRUTEM products. This is inconsistent with what CRU says about its own data. The CRU web page (http://www.cru.uea.ac.uk/cru/data/hrg/) presents two products: TS and CRUTEM. The TS series are not subject to adjustments for non-climatic influences, and for that reason users are cautioned not to use them for climate analysis; instead users are directed to the CRUTEM data based on its supposed additional processing:

Question One
Q1. Is it legitimate to use CRU TS 2.0 to 'detect anthropogenic climate change' (IPCC language)?
A1. No. CRU TS 2.0 is specifically not designed for climate change detection or attribution in the classic IPCC sense. The classic IPCC detection issue deals with the distinctly anthropogenic climate changes we are already experiencing. Therefore it is necessary, for IPCC detection to work, to remove all influences of urban development or land use change on the station data…. If you want to examine the detection of anthropogenic climate change, we recommend that you use the Jones temperature data-set.
This is on a coarser (5 degree) grid, but it is optimised for the reliable detection of anthropogenic trends. (http://www.cru.uea.ac.uk/cru/data/hrg/timm/grid/ts-advice.html)

Brohan et al. (2006, p. 6) don’t claim that their data are unadjusted; they say that the raw data may have been adjusted but they do not have original records so they can’t say what was done. Jones and Moberg (2003) say of the CRUTEM2 data set (emph added):

“All 2000+ station time series used have been assessed for homogeneity by subjective interstation comparisons performed on a local basis. Many stations were adjusted and some omitted because of anomalous warming trends and/or numerous nonclimatic jumps (complete details are given by Jones et al. [1985, 1986c]).”

So the CRUTEM products are not as raw as the authors imply, even if it is difficult for users to understand what the particular adjustments were. Even if CRUTEM3 is unadjusted, McKitrick and Nierenberg (2010) used both versions 2 and 3 in their analysis, with clear similarity in results between them, so the issue is moot.

The authors then try (lines 117-121) to draw a distinction between analysis of local trends and the global average. I don’t follow the logic here, since it is a global sample of local trends. Widespread problems in the local records will carry over to the global average. Had this been properly noted the sentence in question would read (emph added): “McKitrick and Michaels (2004, 2007) and McKitrick and Nierenberg (2010) also focus on finding the heating signal in a global sample of local trends rather than evaluating the effect on a global average.” Stated in this way, it would be clear that the authors are saying that the discovery of a global pattern of problems in local trends does not imply a problem exists in the global trend, which is a pretty weak position to take.

The more obvious, and plausible, explanation for the difference in results across the different studies is the difference in testing methodologies.
I demonstrated this in my previous report, showing that one set of results emerges as restricted estimates from a model whose general form indicates the opposite conclusions, and that the restrictions can be rejected.

The authors dismissed this demonstration by saying something that I confess I can’t make much sense of:

The empirical demonstration is interesting, but we view it as a way to do the “trend analysis” part of our paper “correctly”. That isn’t our goal. Our conclusions are based on the Berkeley Average on the very-rural stations compared to all stations.
Are they really saying it is not their goal to do the trend analysis “correctly”? I don’t think I have ever encountered a situation where authors have said of their own work that it was not their goal to do it correctly. I am sure they did not mean this, but I draw a blank at trying to figure out what they did mean. Later they say:

We are not asserting that surface temperature data are unaffected by urbanization, but that a global average based on data that includes stations that may have warmed due to urbanization is not significantly different to one based only on stations that are assumed not to contain urban effects.
I suspect that any reasonable reader, upon completing the paper, would be startled to learn that the authors did not intend to assert that surface temperature data are unaffected by urbanization. I think the above sentence was meant to say something like: “We are not claiming there are no contaminating influences in individual locations, only that they are too small and isolated to affect the global average.” Unfortunately the whole issue is whether their methodology reliably supports this conclusion, and in this draft they have done nothing to deal with the evidence that it does not; instead they simply assumed the problems away.

The authors dismissed the conceptual example with an argument that is both incorrect and beside the point. Ignoring their observation that the function could be a square root (which would look pretty much the same), they say that the argument relies on each tangent line being defined over an infinitesimal domain of the same length, and that the population change in the urban region would likely be larger in magnitude than in the rural region.

If the diagram were redrawn to reflect this case, the underlying point would emerge even more strongly, since for any concave function (such as the logarithm or the square root), an arc connecting two points has a flatter slope than does a tangent at the first point, and the farther apart the points, the flatter is the arc line. Hence a steeper slope in the rural sample is what we would expect if urbanization were a large effect in the data and the urban population increased more than did the rural population.

The point of this argument, to which the authors did not respond, was that their method is, in principle, unable to support the conclusions they draw, since their findings are consistent both with the absence and with the presence of a significant urban warming bias. Nothing in their response or their revised paper addresses this problem.
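The geometric claim about arcs and tangents holds for concave (downward-bending) functions such as the logarithm and the square root, and it can be checked numerically. This sketch compares the tangent slope at a hypothetical starting point x0 = 10 with the slopes of chords ("arc lines") to points progressively farther away; the helper name `chord_slope` and all values are illustrative assumptions.

```python
import math


def chord_slope(f, x0, x1):
    """Average slope of f between x0 and x1 (the 'arc line' in the text)."""
    return (f(x1) - f(x0)) / (x1 - x0)


# Concave example functions paired with their exact derivatives.
cases = {
    "log": (math.log, lambda x: 1.0 / x),
    "sqrt": (math.sqrt, lambda x: 0.5 / math.sqrt(x)),
}

x0 = 10.0  # hypothetical starting value
results = {}
for name, (f, f_prime) in cases.items():
    tangent = f_prime(x0)
    # Chords to points 1, 10 and 100 units beyond x0.
    chords = [chord_slope(f, x0, x0 + dx) for dx in (1.0, 10.0, 100.0)]
    results[name] = (tangent, chords)
    print(f"{name}: tangent {tangent:.4f}, chords {[round(c, 4) for c in chords]}")
```

For both functions every chord is flatter than the tangent at x0, and the chords flatten further as the second point moves away.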
Instead of addressing this problem, they seem to rule out one interpretation by assumption and then claim to have proven the other interpretation.

Moreover their empirical results are becoming harder and harder to reconcile with their own preferred interpretation. Between the last draft and this one, the negative rural/all trend divergences got even larger. Over the full sample the trend difference was -0.10 C/100yr before, and is now -0.14 C/100yr. On the subset of records ≥30 yrs, the trend difference was -0.12 C/100yr before, and is now -0.15 C/100yr. The authors downplay the negative divergence in their conclusions, and try to portray it as essentially a zero difference, but the number reported in the Conclusion, -0.10 ±0.24 C/100yr, seems to have been derived in a very different way than by differencing the trends in Table 1, not least since the standard error is on a far larger scale. (Unfortunately the reader is not informed how this was computed. Was it a time series regression on the post-1950 data in Figure 5B?)

The larger these negative divergences become, the less consistent the data are with the authors' preferred story, namely that there is no difference between samples, and the more consistent they are with the existence of a global-scale urbanization contamination problem as conjectured in the above figure.

Or maybe there are other explanations. For instance, the very rural data set is heavily dominated by stations in North America and northern Europe (Fig 2). If recent regional warming in northern mid-latitudes is stronger than in the Southern Hemisphere, the very rural sample is more heavily drawn from faster-warming regions. Then the Kriging method has to do more work to compensate for this. So one interpretation of the stronger relative warming in the very rural sample is that the Kriging method is not providing an adequate offset for the sample change through the spatial weighting system.
In other words, we have to assume the validity of their method to accept the interpretation of their results, since otherwise the results could just as well be interpreted as evidence against the validity of the methods. The authors do not present any evidence to suggest they considered how to rule this possibility out.

I had hoped that in response to my previous review the authors would have made some attempt to strengthen their methodology and rule out rival interpretations, and I even suggested a relatively straightforward test they could have done using data readily available online. The authors chose not to do any of these things. As before, I would be willing to re-read a major revision that deals with the methodological problems, but at this point the authors appear determined to leave their methodology unchanged, so, not surprisingly, my original recommendation against publication is also unchanged.

Signed review: Ross McKitrick