Copyright © 2009 Kayser et al; licensee BioMed Central Ltd.

Theory of sampling and its application in tissue based diagnosis

1UICC-TPCC, Institute of Pathology, Charite, Berlin, Germany
2Clin. & Exp. Pathology, Research Center Borstel, Borstel, Germany
3Deep Computing, IBM, Amsterdam, the Netherlands
4Institute of Pathology, University of Freiburg, Freiburg, Germany

Corresponding author. Klaus Kayser: klaus.kayser/at/charite.de; Holger Schultz: hschultz/at/fz-borstel.de; Torsten Goldmann: tgoldmann/at/fz-borstel.de; Jürgen Görtler: goertler/at/de.ibm.com; Gian Kayser: gian.kayser/at/uniklinik-freiburg.de; Ekkehard Vollmer: evollmer/at/fz-borstel.de

Received January 27, 2009; Accepted February 16, 2009.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background
A general theory of sampling and its application in tissue based diagnosis is presented. Sampling is defined as the extraction of information from certain limited spaces and its transformation into a statement or measure that is valid for the entire (reference) space. The procedure should be reproducible in time and space, i.e. give the same results when applied under similar circumstances. Sampling includes two different aspects, the procedure of sample selection and the efficiency of its performance. In practice, sample selection focuses on the search for the localization of specific compartments within the basic space, and on the search for the presence of specific compartments.
Methods
When a sampling procedure is applied in diagnostic processes two different procedures can be distinguished: I) the evaluation of the diagnostic significance of a certain object, which is the probability that the object can be grouped into a certain diagnosis, and II) the probability to detect these basic units. Sampling can be performed without or with external knowledge, such as size of searched objects, neighbourhood conditions, spatial distribution of objects, etc. If the sample size is much larger than the object size, the application of a translation invariant transformation results in Krige's formula, which is widely used in the search for ores. Usually, sampling is performed in a series of area (space) selections of identical size. The size can be defined in relation to the reference space or according to interspatial relationship. The first method is called random sampling, the second stratified sampling.

Results
Random sampling does not require knowledge about the reference space, and is used to estimate the number and size of objects. Estimated features include area (volume) fraction, numerical, boundary and surface densities. Stratified sampling requires knowledge of the objects (and their features) and evaluates spatial features in relation to the detected objects (for example grey value distribution around an object). It also serves to define the parameters of the probability function in so-called active segmentation.

Conclusion
The method is useful in the standardization of images derived from immunohistochemically stained slides, and is implemented in the EAMUS™ system http://www.diagnomX.de. It can also be applied to the search for "objects possessing an amplification function", i.e. a rare event with a "steering function". A formula to calculate the efficiency and potential error rate of the described sampling procedures is given.
Introduction Diagnostic surgical pathology or tissue – based diagnosis is confronted with remarkable changes in its environment and workflow. The technological progress has led to a broad application of molecular biological methods such as Fluorescent in Situ Hybridization (FISH), and other DNA – sequence amplification techniques [1,2]. Commercially available slide scanners digitize a complete glass slide within a few minutes, and permit the implementation of completely digitized images into routine diagnostics [3,4]. In other words, the workload of a pathologist increases steadily not only by increase of material, but, in addition, due to the mandatory introduction of new, still tissue – based diagnostic technologies. Thus, the question arises: How can the availability of and access to digitized histological slides (virtual slides) be used to release the diagnostic pathologist from time consuming work steps in order to make the pathologist's work more effective and disease related? In the early days of telepathology, which can be considered to be the "mother of the digital pathologist's world", several authors reported on the diagnostic accuracy of viewing digitized slides in comparison to conventional microscopy [4-8]. The results were clear: the diagnostic accuracy viewing at a digitized (or virtual) slide is indistinguishable to that of conventional microscopy; however, the required time is essentially longer [9,10]. The non appropriate and more time consuming search for appropriate fields of view or the performed sampling procedure are obviously one reason of these constraints. To our knowledge, the theory of sampling in cytology and histopathology has not been described in detail, and is nearly unknown in the environment of diagnostic pathologists. In this article we want to explain the main theoretical aspects and the derivatives of sampling which are performed in routine tissue – based diagnostics. 
The derived formulas will allow interested pathologists or scientists to search for applications that can diminish the sampling time in virtual slides. Basic aspects of sampling in digitized histological slides (virtual slides) Surgical pathology is a medical discipline that "extracts" information from human tissue and classifies the information in distinct terms that are called diagnoses. The common performance is to screen an organ or a tissue section for those spaces or areas that contain the most significant information, and try to classify this information seen in the specific field of view. Thus, tissue – based diagnosis is based upon a procedure to search for small samples that allow to derive information that is valid for the whole (or even patient). In other words, an appropriate sampling procedure is a precondition to evaluate accurate and reproducible diagnoses [2,4,11-15]. Therefore, a detailed definition and accurate description of the sampling method is a necessity if we want to further evaluate the diagnostic algorithms. This statement induces the definition of sampling as follows: Sampling is a method to derive information from a limited (small) compartment of a large (even unlimited) system that is valid for the entire (basic) system. The system can be a space, a function or set of functions, a body, an organ, a slide, or a DNA sequence. The definition includes the term information, which has again to be defined: Information is a property that is exchanged between a sender and a receiver. Information is a property that can be understood by, and allows the receiver to react in an adequate, i.e., predictive manner. This definition of sampling includes two different aspects, which depend upon each other: 1. the method of sampling, and 2. the aim of the sampling procedure, i.e., which information should be extracted. Different aims can require different methods of sampling, or at least different parameters of the same algorithm. 
The inclusion of an "aim" or "goal" to be assessed introduces the calculation of efficiency, or a cost/benefit estimation. The most frequently used sampling goals are (a) the search for the localization of specific items within the basic space, with the knowledge or assumption that the space under consideration contains such items, and (b) the search for the presence of specific items (tumour cells, ores, lobster, etc.), where the exact localization of these items is of minor interest (for example, the detection of tumour cells in a cytological smear).

The preconditions for applying an adequate sampling procedure in tissue-based diagnosis include that the number and size of the samples are limited. In addition, the detectable information has to be known. This information commonly depends upon additional (external) factors, and can be translated into diagnostic features that allow the detection and identification of a probe within the sampling space. These features can depend upon the size of the probes, their number, and their position within the collective, or even within the sampling space.

Let us assume that the final goal of our sampling method is the extraction of information from the entire space, and the classification of this information into a diagnosis. The diagnostic process can be separated into two different procedures: I) the evaluation of the diagnostic significance of a certain object or "basic unit", which is the probability that the object can be grouped into a certain diagnosis, and II) the probability to detect these basic units within the entire space. The detection probability of a wanted object then depends upon the size of the basic space A (field of view, organ, nucleus, etc.), the number and size of the samples (diagnostic frames) D, and the number and sizes of the detected objects [Ci], as demonstrated in figure 1.
Sampling aims, applications and examples
Sampling is basically an information detection and transformation procedure, and is thus undertaken to reach a certain final aim, for example to state a diagnosis, or to identify the presence or absence of certain objects. A time and space invariant translation of the sampling procedure can be assumed as long as we want to obtain reproducible results (figure 3).

Random and stratified sampling procedures
The prerequisite of any (random, stratified) sampling is at least a binary image, i.e. a foreground defining the basic units and a background have to exist. Any sampling procedure can be performed either as random or as stratified sampling (figure 5).
Random sampling is the selection of biologically meaningful units (nuclei, cells, proteins, etc.) at random. It is used to measure
- the frequency of the analyzed units in relation to each other or to the basic space (structure);
- certain features of the biological units in relation to the basic space (to further identify and classify the objects).

No information about the basic (reference) space is needed. The detection of biologically meaningful units is then equivalent to the segmentation of the image and the analysis of randomly chosen segmented elements. This procedure has formed the basis of numerous investigations since the 1950s. It is commonly called stereology [22-26]. In principle, a grid consisting of regular lines (points) of identical length (and identical distance in between) is overlaid on the image, and the number of hits (intersections) is counted. From the number of intersections the volume-adjusted frequency, size, and surface can be derived, independent of the orientation and shape of the elements. In a binary image the pixels (binary x, y points) can be used as a grid. Randomly chosen are the cutting angle (plane/volume) and the start point of the grid (pixel). Stratified are the selection of the grid (all pixels) and the count of intersections. Thus, the randomness of any random sampling is provided by the start of the procedure, for example by random selection of the (x, y) coordinates of the upper right position of the sample space. From the relation x/A (number of hits x/reference area A) two-dimensional (and also three-dimensional) parameters can be derived. These include the area density (Aa), the volume density (Vv), the boundary density (Ba), the numerical density (Na), and the surface density (Sv). It should be noted that this quite easily applied procedure permits the estimation of significant three-dimensional object features without any sophisticated three-dimensional reconstruction [23,27,18,29].
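The point-counting idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`make_binary_image`, `point_count_area_density`, `true_area_density`), the synthetic image of two discs, and the grid spacing are all assumptions made for illustration. The only random element is the start point of the grid, as in the text; the estimate of the area density Aa is the number of hits divided by the number of grid points.

```python
import random


def make_binary_image(width, height, discs):
    """Binary image as a list of rows: 1 inside any disc (cx, cy, r), else 0."""
    return [
        [
            1 if any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in discs) else 0
            for x in range(width)
        ]
        for y in range(height)
    ]


def point_count_area_density(image, spacing, x0, y0):
    """Estimate Aa as hits / grid points for a regular grid starting at (x0, y0)."""
    hits = 0
    points = 0
    for y in range(y0, len(image), spacing):
        for x in range(x0, len(image[0]), spacing):
            points += 1
            hits += image[y][x]
    return hits / points


def true_area_density(image):
    """Exact foreground fraction, using every pixel as a grid point."""
    return sum(sum(row) for row in image) / (len(image) * len(image[0]))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    image = make_binary_image(200, 200, [(50, 50, 30), (140, 120, 40)])
    spacing = 4
    # Random sampling enters only through the start point of the grid.
    x0, y0 = rng.randrange(spacing), rng.randrange(spacing)
    print(point_count_area_density(image, spacing, x0, y0), true_area_density(image))
```

In this sketch the image dimensions are multiples of the grid spacing, so averaging the estimate over all possible start points reproduces the exact foreground fraction. That is one way to see why a random start of an otherwise fixed grid gives an unbiased estimate in this simple setting.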
Stratified sampling, in contrast, is provided by a specific selection of intersections (objects). Its objective is the detection of specific objects (of known features), and the measurement of features of known specific objects, or the estimation of object-associated reference volumes, for example the density of proliferating cells related to the distance from the nearest vessel [3,9,18,21]. Using again a grid as a measurement tool, the cutting angle (plane/volume) and the start point of the grid (pixel) are also randomly chosen. Stratified are the selection of the grid (all pixels), and the count of specific intersections (for example large cells) only. Thus, stratified sampling is provided by a specific selection of intersections (objects). A classic example is its application in cytology, i.e. to find the diagnosis-relevant cell (tumor cell) within a large number of "normal" cells. One could try to
1. analyze only those areas which contain features of (any) cell (gray value selection at low magnification);
2. analyze within these areas only those cells which seem to be abnormal (gray value, size, moderate magnification);
3. analyze within these cells only those with abnormal nuclear size (DNA content) at high magnification;
4. terminate the procedure once the diagnosis-significant information has been obtained.

All other items are disregarded or neglected. The implementation of such an algorithm can significantly reduce the time required "to screen a slide" [5,12,30]. Stratified sampling requires some external knowledge in order to detect the biologically meaningful events such as cancer cells. The image features of a cancer cell have to be known if one would like to detect this event by stratified sampling. The alternative algorithm would be to "sample" all cells, and start, if possible, a statistical analysis. This would then try to evaluate the rare events (supposing that cancer cells are rare compared with normal cells). Again, some external knowledge would be necessary.
Obviously, this is related to the diagnosis function s(Ci, D). Stratified sampling requires an accurate segmentation of objects with known features. Independent of the actual segmentation procedure, the sampling can be performed as active or as passive sampling.

Active and passive sampling
Any segmentation procedure has to accurately define the area of an object, which is equivalent to detecting its boundary. For each pixel it has to be decided whether it belongs to the object or not, which can be written:

f(x, y, meaning) = [1,0], with f(x, y, object) = [1], and f(x, y, background) = [0]

This approach is called passive sampling, as it discriminates the object area by a simple yes-no function [14]. In other words, passive sampling is characterized by a constant relation between the objects and the grid (intersections). The intersection has the probability function p(i) = [1].

Active sampling is a different approach. It is characterized by an object-specific relation between the objects and the grid (intersections). The probability that a pixel belongs to an object ranges between 0 and 1. The intersection has a probability function p(i, o), i.e., the probability to detect the pixels that belong to a certain object depends on the object itself and its neighborhood [20]. For example, a pixel displays a probability of 0.7 that it belongs to the object. This probability can increase or decrease dependent upon additional parameters, such as size, orientation, or shape of neighboring objects. Naturally, the probability value of 0.7 itself might be used to define whether it is an "object" or a "background" pixel. The probability function p(i, o) can be calculated if we separate p(i, o) into its two components:

p(i, o) = gr(x, y) * af(gr, v)

Here gr(x, y) is the frequency distribution of different objects in the reference space v, and af(gr, v) is the detection probability in the space v.
If we assume that af(gr, v) = const in the reference space v, we can estimate p(i, o) by a set of measurements in different sample spaces and transform p(i, o) = [1] if gr(x, y) > const, and p(i, o) = [0] elsewhere. Active sampling has been reported to be an effective method to correct the variation in immunohistochemistry staining, for example to compute the threshold of positive staining intensity [2]. The classic problem is: At which staining intensity (color level) can an "immunohistochemically analyzed cell" be grouped into the positive class? The active sampling approach is to measure the ratio of positive to negative cells at different threshold levels in several randomly selected sample areas. We then select the discrimination threshold which results in (number of positive objects/number of negative objects) = const for all selected samples. A characteristic application is demonstrated in figure 7.
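The threshold selection by active sampling can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the published algorithm: the function names `positive_fraction` and `select_threshold` are hypothetical, each sample area is represented simply as a list of per-cell staining intensities, and "constant" is approximated by choosing the candidate threshold at which the positive fraction differs least between samples. The positive fraction is used instead of the positive/negative ratio to avoid division by zero, and thresholds that leave a sample without positive or without negative cells are skipped, because they would be trivially constant.

```python
def positive_fraction(intensities, threshold):
    """Fraction of cells whose staining intensity is at or above the threshold."""
    return sum(1 for v in intensities if v >= threshold) / len(intensities)


def select_threshold(samples, candidates):
    """Return the candidate threshold whose positive fraction varies least across samples.

    Thresholds that leave any sample without positive or without negative
    cells are skipped, since they would be trivially "constant".
    On ties, the first (lowest-listed) candidate is kept.
    """
    best = None
    best_spread = None
    for t in candidates:
        fractions = [positive_fraction(s, t) for s in samples]
        if any(f == 0.0 or f == 1.0 for f in fractions):
            continue
        spread = max(fractions) - min(fractions)
        if best_spread is None or spread < best_spread:
            best, best_spread = t, spread
    if best is None:
        raise ValueError("no candidate threshold separates every sample")
    return best


if __name__ == "__main__":
    # Hypothetical intensities from three randomly selected sample areas.
    # Each area holds 7 weakly and 3 strongly stained cells, but the
    # background staining level differs between areas.
    samples = [
        [10, 20, 30, 40, 50, 60, 70, 160, 170, 180],
        [30, 40, 50, 60, 70, 80, 90, 150, 175, 190],
        [5, 15, 25, 35, 45, 55, 65, 155, 165, 175],
    ]
    print(select_threshold(samples, range(0, 256, 10)))
```

With these invented data, thresholds inside the weakly stained range give positive fractions that differ between areas, whereas thresholds in the gap between the two groups give the same fraction in every area. Real measurements would rarely yield an exactly constant value, which is why the sketch minimizes the spread rather than testing for equality.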
Functional sampling
The idea of functional sampling focuses on the interpretation of rare events [12,14,15,31-33]. The question arises: Do certain rare cells exist within a cellular society (tissue) that possess a high functional power, similar to catalysts in chemistry? If yes, how can they be identified? Functional sampling is therefore defined as the search for a specific (key) function of rare biological objects within a different (majority) population. As the function to be analyzed might be unknown and we cannot observe the proposed function directly, we have to state the following prerequisites for proper analysis:
1. The specific object (cell) is rare within the basic population.
2. It has to possess regular neighborhood relations to objects of the basic population.
3. It has to be randomly distributed within the reference space.

The proposed algorithm tries to evaluate the distance properties between the rare events and the frequent events, and the general distribution of rare events within the reference space, as follows:
1. We perform a random sampling of the specific (rare) object (O) within the basic population Ni (to estimate O[Ni]).
2. We perform a stratified sampling "around" each detected specific object (to estimate Ni(O)).
3. If Ni(O) = constant we can assume a specific function of the object (cell) within the basic population (for example cellular immune competence, functional activation of cells, etc.).

An example is shown in figure 8.
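Steps 2 and 3 of the algorithm above can be sketched as follows. The sketch assumes that the rare objects have already been detected and that all objects are given as (x, y) coordinates; the function names, the neighborhood radius, and the use of a coefficient of variation with a tolerance of 0.2 as a test for "Ni(O) = constant" are illustrative assumptions, not part of the original method.

```python
import math
from statistics import mean, pstdev


def neighbour_counts(rare, frequent, radius):
    """For each rare object, count the frequent objects within `radius` (an estimate of Ni(O))."""
    return [sum(1 for f in frequent if math.dist(r, f) <= radius) for r in rare]


def is_roughly_constant(counts, max_cv=0.2):
    """Treat the counts as constant if their coefficient of variation is at most max_cv."""
    m = mean(counts)
    if m == 0:
        return False  # no neighbours at all: no neighbourhood relation to assess
    return pstdev(counts) / m <= max_cv


if __name__ == "__main__":
    # Hypothetical coordinates: two rare cells, each surrounded by four frequent cells.
    rare = [(0, 0), (100, 100)]
    ring = [(5, 0), (-5, 0), (0, 5), (0, -5)]
    frequent = ring + [(x + 100, y + 100) for x, y in ring]
    counts = neighbour_counts(rare, frequent, radius=6)
    print(counts, is_roughly_constant(counts))
```

A constant neighbor count around every rare object would, following the text, support the assumption of a specific function; a strongly varying count would not. The tolerance would have to be chosen with respect to the biological variance of the tissue under study.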
Sampling efficiency
As we have defined it, sampling is a procedure that aims to describe space and time-related properties in surgical pathology, i.e. in tissue-based diagnosis. Such an investigation can be performed in different manners, which can differ in efficiency. How can sampling efficiency be measured? Obviously, any sampling efficiency is closely related to the spatial distribution of the events searched for in the reference space. If the spatial distribution is known we can adjust the sampling procedure correspondingly. However, the spatial distribution is often not known, and we have to start with a random selection of certain compartments (samples). We can then measure the spatial distribution (frequency) p = s(N)/v, and the variation of p is the error E(p), which should be small in order to have an efficient sampling procedure. Usually, we can state that the reference space v >> s(Ne) (size of element e). The error E(p) can then be calculated according to
with
E(Ne) = error of detecting an individual event (i.e., probability of identifying/missing a tumor cell),
E(B(n)) = error of measuring all elements in the reference space (i.e., related to the biological variance of the tissue, dependent upon N),
E(Ne/v) = error of measuring the size of events e in relation to the size of the sampling space S (frequency of e in sample space sv).

We can derive the following statements from this formula:
1. We obtain the smallest sampling error if we select the reference volume as sample size, and if we are dealing with regular tissue (small biological variance).
2. The smaller the sample sizes in relation to the size of events, the bigger the sampling error, as long as the error of segmenting (identifying) each event does not increase.
3. The sampling error increases if we choose samples of different sizes.

Discussion
Taking and analyzing samples of a broad variety of tissues is a basic procedure in surgical pathology, or in tissue-based diagnosis. All diagnostic algorithms depend upon a correct and reliable sampling procedure, and extensive training in surgical pathology aims at identifying and sampling those tissue compartments that probably contain the most significant information to classify the disease present [7,19,34-38]. The majority of investigations address an optimum sampling procedure, for example how many sentinel lymph nodes should be investigated in relation to the stage of breast cancer [29,31], or "optimizing sampling of tomato fruit for carotenoid content", or how to perform endometrial sampling in patients with trophoblastic disease after suction curettage [39,40]. In the early days of stereology several authors paid attention to the sampling procedures, as the results of counting intersections are closely associated with the nature of the sampling method used [22,23]. Recently, sampling has returned to the focus of investigations, especially in live imaging [41].
Most of the investigations try to optimize the sampling, which is equivalent to evaluate the "best" stratified sampling method. In addition to medical applications, sampling plays a dominant role in geology, especially mining. In fact, Krige's sampling analysis can be considered to be the first approach to develop a "sampling theory" [12,20]. In this article we want to derive a scheme of sampling that permits a principle view of sampling, its different methods, and to calculate the efficiency of the used sampling method. In principle, two different algorithms exist, the random sampling and the stratified sampling [12,9]. Random sampling has to be performed, if no knowledge of the information searched for exists. It is the appropriate technique to measure features of biological units such as chromosomes, DNA fragments, nuclei, cells, vessels, etc. Its accuracy (error rate) can be predefined by number and size of the chosen samples in relation to the expected size of events and to the reference space. Its results can be implemented in additional classification algorithms, such as diagnostic procedures. The sampling can be terminated if a certain classification can be performed with a predefined accuracy, i.e, a diagnosis can be assessed with high certainty. The accurate measurement of events' features is a prerequisite, but not the aim of stratified sampling. Its implementation requires additional (external) information, and numerous investigations have been performed to "speed up" the procedure (or to make it more efficient) using spatial structures within the reference space. When an exponential event probability distribution is given, Krige's formula can be derived from stratified sampling. In addition to the discussed principle differences between random and stratified sampling procedures, passive and active sampling plays a major role in image segmentation algorithms. The common principle of active sampling associates neighbourhood knowledge (i.e. 
knowledge derived from general external observations) with the object under investigation, for example to accurately define its boundaries [18]. This approach has proven to be successful especially in measuring accurate thresholds for grading purposes in immunohistochemistry [2]. A further derived application is functional sampling, which is again a stratified sampling in principle. This procedure can assist in investigating the "biological importance" of rare events, which in our experience is largely unknown. In aggregate, a general theory of sampling is derived that has applications in numerous, if not all, natural sciences. They range from agriculture to mining, from aircraft maintenance to medicine. In surgical pathology it is of major importance that all diagnostic investigations start with appropriate sampling.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
KK and EV initialized the study and drafted the paper. HS, TG, JG and GK were involved in generating and evaluating the data and in the writing of the manuscript.

Acknowledgements
The financial support of the international Academy of Telepathology e.V., and of the Verein zur Förderung des biologisch-technologischen Fortschritts in der Medizin e.V. is gratefully acknowledged.

References