Feature Extraction Software Training

Insert AE Name and Date

Agenda
  Product structure and pricing
  Downloading and shipping of software and manual
  Installing Feature Extraction 7.5.1
  License types and redemption
  Activating FE software
  Data Workflow
  Features and Benefits of Feature Extraction software
  User Interface
  Grid Mode for Agilent and 3rd party microarrays
  Feature Extraction algorithms
  Feature Extraction results
  Known issues fixed in FE 7.5.1
Feature Extraction 7.5 Customer Training Updated 08/18/04

Product Structure
  Product# G2565BA consists of the scanner control software and 3 licenses of FE s/w.
  G2566AA is an internal product number to be used by CALC and is not orderable by the customer.
  FE is now offered as a stand-alone product, G2567AA, which includes 1 license of FE s/w shipped with a printed manual.
  Additional licenses can be purchased through G2568AA.
  Alternatively, customers can get FE along with Scanner Control by purchasing the scanner upgrade service contract.
  Or customers can upgrade only FE s/w by ordering G or G

Downloading FE 7.5.1 Software and Manual
  FE software and manual are available for download from:
    Agilent website
    eRoom - Gene Expression Informatics Software > Fe_cd7.5.1
    EPI Warehouse
    LSM website
  Customers can access the Agilent website. They cannot access the eRoom, EPI Warehouse, or LSM websites.

Shipping of FE 7.5.1 Software and Manual
  FE software and hard-copy manual will be shipped to:
    New scanner orders originating on or after June 16
    Upgrade service contracts originating on or after June 16
  Existing FE customers will NOT receive the FE installation CD or hard-copy manual in the mail; they must download both from the Agilent website.

Installing Feature Extraction 7.5.1
  Before installing FE 7.5.1, make sure that the previous version of FE is uninstalled completely
    Delete all associated .dll files
    Un-installation of FE will NOT delete the existing license file
    After FE is installed, the software will recognize and use the FE license
  Compatibility and known issues with FE 7.5.1
    Internal tool version is compatible with FE 7.5.1
    Concurrent (multiple) sessions of FE are NOT supported on the same PC
  Running concurrent sessions of FE s/w on the same PC will cause spurious error messages, inaccurate results, and/or failure in FE analysis!

License Types
  Two types of licenses are available:
    Node-locked licenses
      User must install Feature Extraction software on a specific PC
      User must provide the host ID of the PC that FE s/w is installed on
    30-day demo licenses
      Software is fully functional but expires 30 days after date of issue
      Run on any PC
  FE uses the SAME license file that FE used
    This means a free upgrade for existing FE customers!
  Customers using beta versions of FE 7.4.x will need to upgrade to FE by July 15, when the beta software self-inactivates
  30-day demo licenses can be requested from the Agilent website. After the license has expired, customers will not be able to view or feature extract images with the software. However, data files saved before the expiration date will not be deleted. Information on how to access the host ID is provided in the next slide.

License Redemption
  License key is redeemed at the Agilent website
    Also, from the FE 7.5 menu, click Help > Agilent License
  User must provide Agilent with the following:
    Order Number (available on Software Entitlement Certificate)
    Certificate Number (available on Software Entitlement Certificate)
    Host ID (available in FE 7.5, Help > About Analysis)
    address where the license will be sent
  The website (mentioned on the Certificate of Entitlement) connects to
  Host ID is a MAC or Ethernet address of the PC that FE is installed on.

Access Host ID in Feature Extraction Software
  Click Help > About Analysis
  About Analysis dialog displays
    Software version
    Host ID (MAC/Ethernet address)

Where to Save the License Key
  License file name ends with “.lic”
  License file should be saved in this directory: Program Files\Agilent\MicroArray
  The license file name can be anything as long as it ends with “.lic”. Usually the license file name will be the customer’s order number. It is highly recommended that users back up the license file before attempting to uninstall FE 7.1, since the license file will be deleted from the Program Files\Agilent\MicroArray directory during un-installation. Users will need to re-activate the software after re-installation.

Activating Feature Extraction Software
  Software needs to be activated after it is installed and the license file is saved in the directory Program Files\Agilent\MicroArray
  To activate Feature Extraction software, open an image file
    FLEXlm License Finder dialog pops up asking for the license file
    Select “Specify the License File” and browse to the directory where you have saved the license file
  FYI – It is not absolutely necessary that the license file be saved in the directory Program Files\Agilent\MicroArray. But for support purposes, we want users to save the license file in a common location so that we know where to access the license file in a support situation.

Data Flow
  [Data flow diagram. Labels: Feature Extraction Result Files; QC File (print file); Text; Shape; Scanner software; TIFF; JPEG; Image Analysis; MAGE-ML; Feature Extraction; GEML; Pattern File; Rosetta Resolver™ or Luminator™]
  FE now generates JPEG and MAGE-ML files in addition to the text, shp and GEML (xml) files that were already available in the previous versions. Users can send image files (tiff or jpg) and result files (mage-ml or geml) to Resolver or Luminator via FTP.

Features and Benefits of Feature Extraction Software

What You Can Do with Image Analysis Tool
  Visualize spots on microarray
  Change color and scale of image
  Flip and rotate image from landscape to portrait mode and vice versa
  Interactively position grids to find spots on microarrays
  Compare nominal spot centroid laid down by grid with centroid position for the spot
  Move centroid position to where you want it on the spot
  Select spots to ignore – these won’t be used in Feature Extraction
  Create histogram and line plots
  View visual results and outlier flags for features & backgrounds

What You Can Do with Feature Extraction Algorithms
  Find Spots – positions a grid and finds centroid positions of spots
  Spot Analyzer – removes outlier pixels and defines pixels for features & local backgrounds
  Poly Outlier Flagger – flags features and backgrounds that are non-uniformity outliers and population outliers
  Background Subtraction – corrects for the background and determines if the background-adjusted signal is positive and significantly different from background
  Deletion Control (25mer in-situ) – corrects for cross-hybridization
  Dye Normalization – selects features for dye bias evaluation and corrects for dye bias
  Ratio – calculates log (rProcessedSignal/gProcessedSignal), log ratio error, and p-value of log ratio for each feature
  Deletion controls may be available for sequences longer than 25mer.

Feature Extraction User Interface

User Interface Display panels
  Image Info (available when image is loaded)
    Single channel display
  Grid Definition (available when grid mode is on)
    Maximum Fit Movements for subgrids and spots
  Grid Adjustment (available when grid mode is on)
    Spot location (col, row)
    Spot center
    Ignore spot – The user can select spots to be ignored in analysis. No data for these spots, including feature number, row, and column information, will be displayed in the feature extraction output file.
  There are new display panels on the left side of the user interface. When an image is initially opened, only the Image Info panel is displayed. If Grid Mode is ON, then the Grid Definition panel and Grid Adjustment panel are displayed. Note: the “ignore spot” check box under the Grid Adjustment panel is available only when Adjust Spot is ON.

Toolbar Buttons for Grid Mode
  Crop Mode On/Off
    On = Crop
    Off = Zoom
  Grid Mode On/Off
  Adjust Main Grid
  Adjust Subgrid
  Skew Subgrid
  Preview Spot Centroids
  Undo and Redo
  The Edit menu also has these options: Preview Spot Centroids, Adjust Main Grid, Adjust Subgrid, Skew Subgrid, Grid Mode, Adjust Spot

Zooming In and Out
  Cropping button in the OFF mode is used for zooming in on any boxed area
  Toolbar buttons: Zoom in, Zoom out, 100 percent
  Click View > Zoom, then select a magnification
  Mouse shortcuts
    To zoom in - Ctrl + left double click on the image
    To zoom out - Ctrl + right double click on the image

Tools Menu
  Tools > Flip Upper Left to Lower Right (Landscape/Portrait)
  Tools > Preferences (to set default options)
    Image View tab – Set initial window size, option to start with crop mode, image color, data range of image display, and more
    Grid Mode tab – Start grid mode with a gene list type, default view zoom setting, and maximum fit movements for grid adjustment
    Feature Extraction tab – Search for grid file or design file first when analyzing Agilent microarray, default save directory for result files, option to save log file, and FTP settings to send result files to Resolver and Luminator
    Graph View tab – Histogram bin size and bin number
    General tab – Hyperlink to Agilent web site

Demo User Interface
  Show the user interface without an image and with an image opened.
    Show the Help menu
    Open a tiff image
    Show the display panels on the left side of main window
    Show and explain the Crop Mode icon and zooming function when crop mode icon is OFF
    Show toolbar buttons for Grid Mode
    Show the Tools menu for flipping image orientation and setting preferences

Using Grid Mode to Analyze Agilent and Non-Agilent Microarrays

Grid Mode Analysis
  Ability to grid and feature extract Agilent and non-Agilent microarrays scanned on an Agilent scanner
  New spot finding tool allows the user to interactively position and find spots on the microarray
  Accepts annotation and array layout information via the following gene lists
    Agilent grid files
    Agilent design files
    GAL files (GenePix Array Layout)
    Tab-delimited text files

Setting Up an Initial Grid
  Grid file is required to feature extract non-Agilent microarrays
  Agilent microarrays can be feature extracted with a grid file, if desired
  To create a grid, click on the Grid Mode On/Off icon to select a gene list type to grid
    _grid.csv, gal, xml, tab text
    no gene list
  Grids are saved as two files (_grid.csv, _feat.csv) and can be used to grid and analyze other arrays of the same layout
  Recommend saving the grid file in the same directory as the image – this is where the software looks when it needs a grid file
  Please emphasize that the image orientation must match the gene list orientation. If a *.gal file is used, make sure that the image is in portrait. Note that tab-text files for Agilent microarrays are available in landscape and portrait orientation. A _grid.csv file created for a particular microarray can be used on other images of the same microarray type. *.xml files are not recommended for Agilent cDNA microarrays since the xml files contain array layout info on fiducial probes. Because the fiducial probes are not on all rows, the grid geometry is not accurate.

Benefits to using Grid Mode
  If no gene list is available, a grid can be created de novo
  If a gene list is selected, Feature Extraction uses the layout information and annotation from the gene list to grid the microarray
  Users can interactively adjust the main grid, subgrids, and spots, and preview spot centroids
  Users can select spots to ignore in analysis
  Grid files can feature extract images that are too rotated to be used with Agilent design files (*.xml)
  [Toolbar callouts: Preview Spot Centroids, Adjust Main Grid, Adjust Subgrid, Skew Subgrid, Grid Mode, Adjust Spot]
  When an image is too rotated for a design file, the error message in Feature Extraction is usually “Description: (Fit2BrightSpots) The grid is too rotated to be valid.”

Demo How to Grid a Microarray
  Demo how to grid an Agilent or non-Agilent microarray. For an Agilent microarray, recommend using the .xml file as the gene list. For a non-Agilent microarray, recommend using a tab-text file or no gene list.

Set Design File Search Path (Agilent Microarrays Only)
  Feature Extraction checks the design file search path to find the microarray design file
  Click Tools > Preferences > Feature Extraction > Design File Search Path
  Click Browse in the “Configure Directory Path for finding Design Files” dialog box
  Locate the directory containing the design file in the “Browse for Folder” dialog box
  Click Add in the “Configure Directory Path for finding Design Files” dialog box

How to Manually Specify a Grid/Design File
  If Feature Extraction cannot find a grid file or design file, then the “Load Grid/Design File” dialog box appears
    Browse for the grid or design file and then click the “Load” button
  To avoid having to manually specify a grid or design file, do the following:
    Select the preferred search order for Agilent grid or design file
    If a design file is to be used, make sure the design file search path has been properly set
    If a grid file is to be used, make sure the grid file is saved in the same path as the TIFF image file

Feature Extraction Input Files
  Input Files - Required
    TIFF image of Agilent or non-Agilent microarray scanned on an Agilent scanner
    Array design file – describes the layout of probes and probe annotation
      Design file (.xml) – for Agilent microarrays
      Grid file (_grid.csv, _feat.csv) – for Agilent and non-Agilent microarrays
  Input Files - Optional
    Printing File (cDNA microarrays only) – Contains cDNA clones that failed printing QC and are to be ignored in analysis. Location of the printing file needs to be set in the Design File Search Path

Feature Extraction Output Files
  GEML – Expression data in the GEML 1.0 format
    Can be exported to Rosetta Resolver and Luminator software
  MAGE – Expression data in the MAGE-ML format
    Can be exported to Rosetta Resolver 4.0 and a future version of Luminator
  JPEG – Compressed version of image file in JPEG format
  Tab-delimited text – Expression data in tab-delimited text format
  Visual Result – “Shapes” annotation generated by and viewed in Feature Extraction
    Shows the feature size, local background region, raw signals, log ratio, gene name, non-uniformity and population outlier flags
    Allows for subsequent viewing of the “shapes” annotation without having to re-extract the scan image
  Currently, MAGE files cannot be exported to Resolver 3.0 or Luminator 2.0, but future versions of Resolver (v.4.0) and Luminator (v.3.0?) will accept MAGE files from Feature Extraction software. MAGE result files offer several benefits: 1) Automatic linkage of chip barcode to the profile 2) They contain information about the FE parameters used 3) They can be imported to ArrayExpress and Stanford Microarray Database.

MAGE-ML Result Files
  Feature Extraction result files can be saved in MAGE-ML format
  Microarray Gene Expression Markup Language (MAGE-ML) is a language designed to describe and communicate information about microarray-based experiments
  MAGE-ML is based on XML and can describe microarray designs, microarray manufacturing information, microarray experiment setup and execution information, gene expression data, and data analysis results.
  A format accepted by major public microarray databases such as ArrayExpress (EBI) and GEO (NIH)

Exporting Files to Resolver/Luminator (Intranet)
  FTP transfer of files
    GEML or MAGE-ML results
    TIFF or JPEG images
  FTP settings
    Destination: enter name where Resolver or Luminator resides
    FTP port: enter FTP port #
    User name: enter user’s name
    Password: enter password

Demo Feature Extraction and the Algorithm Modules

Running Feature Extraction
  [Screen captures with callouts: Barcode, design ID, filename; Array dimensions, array pattern, feature size, feature layout, probe names, etc.; QC information, flagged features]
  The 1st screen capture shows input files loaded for a microarray with an Agilent design file. The 2nd screen capture shows the input files loaded for a microarray with an Agilent grid file.

Default Parameters in Feature Extraction Modules
  Default parameters are loaded based on type of microarray, design or grid file, and settings changes saved during a run
    Default check boxes are marked (on) or (off)
    Default radio buttons are marked (*)
    Default numbers are displayed in parentheses
  Default parameters are only recommended when Agilent’s complete system is used (i.e. Agilent labeling and hybridization protocols, Agilent microarrays, and Agilent scanner)
    If there is any deviation from Agilent’s complete system, users need to carry out experiments to fine-tune the parameters
  If parameter numbers appear in red, they are different from the values optimized for the Agilent microarray system
    Users need to carry out experiments to optimize these values for their microarrays and protocols
  “Red” also means that the values are “conservative” estimates provided as place-holders; customers need to determine these values themselves.

Where can you find the default parameters
  Table 6 is on page 75 in the FE User Manual (v.7.1). For details of what parameters are loaded, refer to the tables on pages

Feature Extraction Algorithms

FindSpots Algorithm – Grid Initialization
  Locates all spots on microarray
  Finds corner spots for grid placement
  Places initial or nominal grid based on location of corner spots, spot size, and inter-spot distances obtained from the design file
  Finds bright spots (based on high intensity)
  Adjusts grid according to location of bright spots
  Finds dim spots by interpolating location from adjusted grid

FindSpots – Deviation Limit
  Dev Limit restricts how far a spot can deviate from the nominal grid position and still be called “found”
  Default deviation limit is automatically loaded
  User can change the default deviation limit between 0-70 microns (Agilent arrays)
    Setting the deviation limit too low can cause spots to be missed
    Setting the deviation limit too high can cause spots in adjacent rows and columns to be swapped.
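A minimal sketch of the deviation-limit check described above. It assumes the deviation is the straight-line distance, in microns, between the nominal grid position and the found centroid; the function name and the distance metric are illustrative assumptions, not the FE implementation.

```python
import math

def is_spot_found(nominal_xy, found_xy, dev_limit_um):
    """Return True if the found centroid lies within dev_limit_um of the nominal position.

    Assumption: deviation is measured as Euclidean distance in microns.
    """
    dx = found_xy[0] - nominal_xy[0]
    dy = found_xy[1] - nominal_xy[1]
    return math.hypot(dx, dy) <= dev_limit_um
```

With a limit of 70 microns, a centroid 50 microns from its nominal position counts as found, while one about 85 microns away does not.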

SpotAnalyzer Algorithm
  Determines which pixels represent the spot and the local background
    Spot size is determined by CookieCutter or WholeSpot method
      Optional: calculate spot size
    Local background area is determined by the radius distance
  Rejects outlier pixels in spot and local background based on Standard Deviation or Inter Quartile Range method (Default; more robust)
  Flags feature as saturated if > 50% of pixels remaining after outlier rejection have intensities above 65502
  To access the WholeSpot analysis method, the “calculate spot size” option must be turned on. There is no benefit to turning calculate spot size on if the CookieCutter spot analysis method is chosen.
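The saturation rule above can be sketched as follows. The threshold (65502) and the 50% cutoff come from the slide; the function name and the handling of an empty pixel list are assumptions made for illustration.

```python
SATURATION_THRESHOLD = 65502  # intensity above which a pixel counts as saturated (from the slide)

def is_saturated(inlier_pixels, threshold=SATURATION_THRESHOLD):
    """Flag a feature as saturated if more than 50% of its inlier pixels exceed the threshold.

    inlier_pixels: intensities remaining after outlier rejection.
    Assumption: a feature with no inlier pixels is reported as not saturated.
    """
    if not inlier_pixels:
        return False
    n_above = sum(1 for p in inlier_pixels if p > threshold)
    return n_above / len(inlier_pixels) > 0.5
```

Note that exactly 50% saturated pixels does not trigger the flag, since the rule is stated as strictly greater than 50%.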

SpotAnalyzer – Spot Size and Spot Analysis Methods
  Spot size is calculated when enabled in the UI
    Spot size determines the number of pixels that are chosen to represent a feature
    The spot size is reported with the final results as "SpotRadiusX" and "SpotRadiusY"
  Spot analysis methods use the spot size to define features
    For CookieCutter method
      Spot size is obtained from the XML design file or the calculation that the user selects from the SpotAnalyzer tab
    For WholeSpot method
      Spot size is obtained from the spot size calculation that the user selects from the SpotAnalyzer tab

SpotAnalyzer – What Defines Spot Size and Local Background
  [Figure panels: CookieCutter; WholeSpot]

SpotAnalyzer – Determination of Local Background Radius
  Minimum local background radius (Default)
  Adjusted local background radius (Max of n = 4)
    Where n is a minimum of 1 to a maximum of 4 sets of closest neighbors
      n = 1 has at least 8 nearest neighbors
      n = 2 has at least 24 nearest neighbors
      n = 3 has at least 48 nearest neighbors
      n = 4 has at least 80 nearest neighbors
  Maximum radius
    Example: CEILING [3.2] = 4
  [Diagram: “Self” feature surrounded by its 24 nearest neighbors (n = 2)]
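The neighbor counts quoted above (8, 24, 48, 80) match the number of features in a square block of (2n+1) x (2n+1) features, excluding the center feature. The sketch below assumes that square-block interpretation; the function name is illustrative.

```python
import math

def min_neighbors(n):
    """Neighbor count for n sets of closest neighbors.

    Assumption: neighbors form a (2n+1) x (2n+1) square block around the
    feature, and the feature itself ("Self") is not counted.
    """
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    return (2 * n + 1) ** 2 - 1

for n in range(1, 5):
    print(n, min_neighbors(n))  # 1 8, 2 24, 3 48, 4 80

# The slide's rounding example: a fractional value is rounded up.
print(math.ceil(3.2))  # 4
```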

SpotAnalyzer – Pixel Rejection Based on Standard Deviation
  Pixel outlier rejection for features and backgrounds in both colors
  +/- 2 SD encompasses ~95% of the distribution
  Feature intensity is the mean signal of inlier pixels
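A minimal sketch of SD-based pixel rejection as described above. It assumes a single pass using the sample standard deviation of all pixels in the region; whether FE iterates or uses a different estimator is not stated on the slide, and the function name is illustrative.

```python
import statistics

def reject_outliers_sd(pixels, n_sd=2.0):
    """Keep pixels within mean +/- n_sd * SD and return (inliers, mean of inliers).

    Assumptions: one rejection pass, sample standard deviation,
    at least two pixels in the region.
    """
    mean = statistics.mean(pixels)
    sd = statistics.stdev(pixels)
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    inliers = [p for p in pixels if lower <= p <= upper]
    return inliers, statistics.mean(inliers)
```

For example, nine pixels at 100 and one at 1000 give a mean of 190 and an SD of about 285, so the bright pixel is rejected and the reported feature intensity is 100.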

SpotAnalyzer – Pixel Rejection Based on Inter Quartile Range
  Interquartile Range (IQR) is the range of intensities between the 25th and 75th percentile
  Pixels of feature and background are rejected if they fall outside the lower and upper rejection boundaries
  Under a Gaussian distribution, ~99% of the distribution is encompassed between the lower and upper rejection boundaries when using 1.42*IQR
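The sketch below illustrates IQR-based rejection and checks the ~99% figure. It assumes the boundaries are placed at the 25th percentile minus 1.42*IQR and the 75th percentile plus 1.42*IQR; the slide does not spell out the boundary formula, so that placement and the function names are assumptions.

```python
from statistics import NormalDist, quantiles

def iqr_bounds(values, k=1.42):
    """Return (lower, upper) rejection boundaries.

    Assumption: lower = Q1 - k*IQR and upper = Q3 + k*IQR.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def gaussian_coverage(k=1.42):
    """Fraction of a Gaussian distribution lying between the two boundaries."""
    nd = NormalDist()
    q1, q3 = nd.inv_cdf(0.25), nd.inv_cdf(0.75)
    iqr = q3 - q1
    return nd.cdf(q3 + k * iqr) - nd.cdf(q1 - k * iqr)

print(round(gaussian_coverage(), 3))  # close to 0.99
```

Under these assumptions the boundaries sit about 2.59 standard deviations from the mean of a Gaussian, which is where the ~99% coverage comes from.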

PolyOutlierFlagger Algorithm
  Flags features and backgrounds as non-uniformity outliers based on statistical deviations from the Agilent noise model:
    Polynomial Variance Model – expected variances from array manufacturing, wet lab chemistry, and scanner noise
  Flags features and backgrounds as population outliers using:
    IQR Method – using intra-array replicate features and the associated background areas

PolyOutlierFlagger – NonUniformity Outlier
  Expected Variance [equation]
  Measured Variance [equation]
    n = # inlier pixels in feature or background
    X = raw pixel intensity in feature or background
    X bar = raw mean signal of feature or background
    x = mean signal of feature or background minus minimum signal of feature or background on array
    A (Gaussian) – variance estimated from labeling and feature synthesis
    B (Poisson) – variance estimated from scanning measurement or counting error
    C (Constant) – variance expected from electronic scanner noise and glass background noise
  Feature or background is flagged as non-uniformity outlier if: [equation]
    where CI is the confidence interval calculated from the chi-square distribution
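The slide's equations are not reproduced in this text, so the following is only a hedged sketch of how a polynomial variance model of this kind is commonly written. It assumes expected variance = A*x^2 + B*x + C (matching the Gaussian, Poisson, and Constant terms named above), measured variance as the sample variance of inlier pixels, and a flag raised when measured variance exceeds the expected variance times a confidence-interval multiplier. The multiplier is passed in as a plain number; deriving it from the chi-square distribution is not shown. All names and the exact form of the comparison are assumptions, not the FE implementation.

```python
def expected_variance(x, a, b, c):
    """Assumed polynomial noise model: a*x^2 (Gaussian) + b*x (Poisson) + c (Constant)."""
    return a * x * x + b * x + c

def measured_variance(pixels):
    """Sample variance of the inlier pixel intensities (assumed n - 1 denominator)."""
    n = len(pixels)
    mean = sum(pixels) / n
    return sum((p - mean) ** 2 for p in pixels) / (n - 1)

def is_nonuniformity_outlier(pixels, array_min_signal, a, b, c, ci_multiplier):
    """Assumed rule: flag when measured variance > ci_multiplier * expected variance."""
    mean = sum(pixels) / len(pixels)
    x = mean - array_min_signal  # mean signal minus minimum signal on the array
    return measured_variance(pixels) > ci_multiplier * expected_variance(x, a, b, c)
```

The coefficient values used in any real run would come from the noise model parameters loaded by the software; none are assumed here.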

PolyOutlierFlagger – Population Outlier
  Performs population statistics on features and background areas if the microarray has the minimum number of replicate features
  Feature or background is flagged as a population outlier if it falls outside the lower and upper rejection boundaries
  ~99% of the distribution is encompassed between the lower and upper rejection boundaries when using 1.42*IQR

PolyOutlierFlagger
  Pink triangles are features flagged as NonUnifOL

PolyOutlierFlagger
  Non-Uniformity Outliers are indicated by a colored inner ring (Feature) or colored outer ring (Local_BG)
  If a flagged feature appears “uniform”, try changing the color scales or looking at the single-channel window

BGSub Algorithm
  Estimates and corrects for systematic biases in data arising from:
    Substrate fluorescence
    Non-specific binding to substrate
    Possible biases introduced during scanning
    Artifacts from hyb and wash
  Determines if feature signal is significant compared to background
  Spatial detrend to correct for systematic signal gradients on the array
  Adjusts background globally (to a user-defined value) to correct for under- or over-estimation of the background

BGSub - Background Subtraction Methods
  No background subtraction
    This method does NOT subtract the background signal from the feature signal
    Feature raw signal (MeanSignal) is passed on to spatial detrend (if turned on)
    If the “no background subtraction” method is selected, then by default, the background is not adjusted globally
  Local Method
    Local background (Radius method)
  Global Methods:
    Average of all background areas
    Average of negative control features
    Minimum signal (feature or background)
    Minimum signal (feature) on array ~ simulated negative control

BGSub - Order of Background Correction
  Analysis flow for background correction is in this order:
    Background subtraction method
    Spatial detrend, if it is turned on
    Feature significance test
    Adjust background globally, if it is turned on
  Feature signal is passed on to and processed by the next method that is available

BGSub - Spatial Detrend Algorithm
  Decreases the contribution of any systematic signal gradient on the array to the “foreground” signal
  Estimates the surface of the “foreground” signal by picking the dimmest 1-2% of the array feature intensities
    “Foreground” signal is the portion of feature signal that is not related to the intended signal from dye-labeled target complementary to the probes on the feature
  SpatialDetrendSurfaceValue is determined for each feature per channel

The Problem – Differential Expression Gradient (FE 7.1)
  [Figure: differential expression gradient with FE default parameters; labels: Up-regulated, Down-regulated]

The Cause – Differential Expression Gradient
  Regional variations in the “foreground” are present on the microarrays
  In the previous version of FE (v.7.1.1), the background subtraction method did not adequately measure these variations
    Background was underestimated in some regions of the microarray
    Consequently, log ratios and differential expression calls were inaccurate

The Approach – New Background Estimation Method
  Previous FE default: local background subtraction
    Estimates non-specific signal on feature based on intensity of the area between features
  New FE default: no background subtraction with spatial detrend
    Estimates non-specific signal based on the dimmest 1% of feature intensities. This baseline is estimated regionally to account for variation.

Spatial Detrend – Estimating Foreground Intensity
  A “FilteredSet” of features is identified in a process known as Low Pass Filter
    Features with the dimmest 1% of feature intensity per window are selected
      If the “no background subtraction” option is selected, then feature intensity is raw mean signal. If a background subtraction option is selected, then feature intensity is background-subtracted signal.
    Window size is 10 columns x 10 rows of features
    Window moves horizontally and vertically on the array in increments of 5
  Foreground surface is estimated from the “FilteredSet” of features (i.e. features with the dimmest 1% feature intensity per window)
    2-D Loess algorithm fits a smooth surface through the “FilteredSet” of feature intensities using the 20% nearest neighborhood of filtered points
    For features NOT in the “FilteredSet”, a 2-D Loess algorithm with a similar neighborhood size of filtered points is used to predict the surface value for each feature
  Lastly, SpatialDetrendSurfaceValue is subtracted from MeanSignal (or BGSubSignal, if BG subtraction is selected) for each feature
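The Low Pass Filter step above can be sketched as a moving-window selection. The window size (10), increment (5), and percentage (1) come from the slides; the function name, the handling of arrays smaller than one window, and rounding the 1% count up to at least one feature per window are assumptions. The Loess surface fit is not shown.

```python
import math

def filtered_set(signal, window=10, increment=5, percentage=1.0):
    """Return the set of (row, col) positions picked by the low pass filter.

    signal: 2-D list of feature intensities indexed as signal[row][col].
    Assumptions: windows start at row/column 0 and advance by `increment`;
    each window contributes its dimmest `percentage` percent of features
    (at least one).
    """
    n_rows, n_cols = len(signal), len(signal[0])
    selected = set()
    for r0 in range(0, max(n_rows - window, 0) + 1, increment):
        for c0 in range(0, max(n_cols - window, 0) + 1, increment):
            cells = [(signal[r][c], r, c)
                     for r in range(r0, min(r0 + window, n_rows))
                     for c in range(c0, min(c0 + window, n_cols))]
            k = max(1, math.ceil(len(cells) * percentage / 100))
            cells.sort()  # dimmest features first
            selected.update((r, c) for _, r, c in cells[:k])
    return selected
```

Because the window advances by half its width, a single dim feature can be picked by several overlapping windows; using a set keeps it in the FilteredSet only once.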

Identify Features in FilteredSet by Low Pass Filter
  Default: Window = 10, Increment = 5, Percentage = 1

Low Pass Filter Schematic – Effect of Moving Window on Sampling
  [Figure panel: No Moving Window]
  Left diagram is a scatter plot of features in the “filtered set”. Each feature is identified as the dimmest 1% of feature intensity per window of 10 rows by 10 columns.

Low Pass Filter Schematic
  [Figure: Features from Low Pass Filter – Raw Green Intensities]
  Left diagram is a scatter plot of features in the “filtered set” identified on the array. Right diagram is a 3-D plot of gMeanSignal of the “filtered set” of features on the y-axis, row on the z-axis, and column on the x-axis.

2D Loess Fit – Estimate Foreground Surface to All Features
  [Figure panels: gMeanSignal; gSpatialDetrendSurfaceValue]

The Solution – No Differential Expression Gradient (FE 7.5)
  [Figure: no differential expression gradient with FE default parameters; labels: Up-regulated, Down-regulated]

BGSub – Feature Significance and Well Above BG
  Feature Significance Test
    Calculates significance of feature signal vs background signal (local or global) using:
      2-sided Student’s t-test (implemented as an incomplete Beta Function approximation)
    Feature gets a Boolean flag of 1 under the IsPositiveAndSignif column (in FE result file) if the calculated p-value is less than the user-defined max p-value
  Well Above Background Test
    If the background-subtracted signal is “well above” background as calculated by the equation: [equation]
    and the feature passes the IsPositiveAndSignif test, then the feature gets a Boolean flag of 1 under the IsWellAboveBG column in the Feature Extraction result file
  A significant p-value is LESS than the user-defined threshold.
    pValue (calculated) is derived from the 2-sided Student’s t-test (which is implemented in FE as an incomplete Beta Function approximation).
    P-value (max) is a user-defined number in the BGSub tab in FE.
    WellAboveSDMulti is the well above SD multiplier (e.g. 2.6, default in FE).
    BGSDUsed is the background standard deviation, based on the background method used (i.e. local or global).
      If the local background method is used, then BGSDUsed is the pixel-level standard deviation of the feature’s local background.
      If a global background method is used, then BGSDUsed is the standard deviation of the population of the background on the array. For example, if the background subtraction method is average of negative control features, then BGSDUsed is the SD of MeanSignals of negative control features.
      Note: if there is more than one negative control sequence, then BGSDUsed is calculated from the negative control sequence with the lowest SD, per channel.
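The two flags above can be sketched as below. The slide's "well above" equation is not reproduced in this text, so the sketch assumes the test is BGSubSignal > WellAboveSDMulti * BGSDUsed, which is consistent with the terms defined in the notes. The p-value is taken as an input rather than computed, and the requirement that the signal be positive is inferred from the flag's name. Function and parameter names are illustrative.

```python
def is_positive_and_signif(bg_sub_signal, p_value, max_p_value):
    """Flag 1 when the signal is positive and the calculated p-value is below the user-defined max."""
    return bg_sub_signal > 0 and p_value < max_p_value

def is_well_above_bg(bg_sub_signal, bg_sd_used, p_value, max_p_value,
                     well_above_sd_multi=2.6):
    """Assumed rule: the feature passes IsPositiveAndSignif AND
    bg_sub_signal > well_above_sd_multi * bg_sd_used.
    """
    return (is_positive_and_signif(bg_sub_signal, p_value, max_p_value)
            and bg_sub_signal > well_above_sd_multi * bg_sd_used)
```

With the default multiplier of 2.6 and a background SD of 20, a background-subtracted signal must exceed 52 to be called well above background under this assumed rule.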

BGSub – Global Background Adjustment
  Background subtraction errors arise from inaccuracies in background estimation
  Basic ideas behind global background adjustment
    Adjusts for over- or under-estimation of background in one channel relative to the other channel
    Corrects “hook” effect at the low-end intensity scale
    Applies background correction using a curve fitting method to adjust the initial background-subtracted intensities
  As mentioned earlier, this adjustment corrects for the non-specific signal due to the probes themselves, as is done more directly with the use of Negative Controls.

Adjust Background Globally to User-Specified Value
The Global Background Adjust algorithm is the same as in FE 7.1.1:
- Evaluates the background-subtracted signal and finds a rank-consistent set of features with low signal
- Finds a constant in each channel that moves the median of these signals to zero
In FE 7.5.1, the user can enter a constant value between 0 and 500 to "pad" all feature signals to that value.
- This has the effect of compressing log ratios, but decreases the variability (SD) in the log ratio across inter- and intra-array replicates.
Note: The "pad" is an exploratory tool for variance stabilization. Customers are advised NOT to use the "pad" for production purposes.
Reference: “Transformations…What For…Which One” by W. Huber

How is Adjust Background Globally Value Used
If the slope of the rank-consistent features in the red signal vs. green signal plot is > 1, the "pad" value chosen by the user is assigned to the green channel; if the slope is < 1, it is assigned to the red channel. The other channel receives the pad scaled by the slope.
Pad value = 50 and slope = 1.2
- A value of 50 is added to the green background-subtracted signal of all features
- A value of (50*1.2) = 60 is added to the red background-subtracted signal of all features
Pad value = 50 and slope = 0.5
- A value of 50 is added to the red background-subtracted signal of all features
- A value of (50/0.5) = 100 is added to the green background-subtracted signal of all features
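The two worked cases can be reproduced with a small Python sketch. The function name is hypothetical, and treating a slope of exactly 1 like the slope > 1 case is an assumption, since the slide does not cover it.

```python
def pad_offsets(pad, slope):
    """Return (green_offset, red_offset) added to BG-subtracted signals.

    slope is the slope of the rank-consistent features in the red vs.
    green signal plot. Treating slope == 1 like slope > 1 is an assumption.
    """
    if slope <= 0:
        raise ValueError("slope must be positive")
    if slope >= 1:
        # Pad is assigned to green; red gets the pad scaled up by the slope.
        return pad, pad * slope
    # Pad is assigned to red; green gets the pad divided by the slope.
    return pad / slope, pad
```

For example, pad_offsets(50, 1.2) gives 50 for green and about 60 for red, and pad_offsets(50, 0.5) gives 100 for green and 50 for red, matching the two cases on the slide.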

BGSub – Features Used for Global BG Adjustment
Select a suitable subset of the entire dataset (probes) for applying the global adjustment algorithm: features with no or negligible differential expression, i.e. features that pass the Rank Consistency Filter and lie along the central tendency line of the distribution (shown in red on the plot of r(MeanSignal) vs. g(MeanSignal)).
To be selected, features must meet the following criteria:
1. Control type = 0: not control probes. These features must have a value of 0 in the ControlType column of the FE result file.
2. Non population outlier. These features must have a Boolean of 0 in the columns r(IsFeatPopnOL), g(IsFeatPopnOL), r(IsBGPopnOL), g(IsBGPopnOL).
3. Non non-uniformity outlier. These features must have a Boolean of 0 in the columns r(IsFeatNonUnifOL), g(IsFeatNonUnifOL), r(IsBGNonUnifOL), g(IsBGNonUnifOL).
4. Pass the Rank Consistency Filter.
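The first three criteria amount to a simple row filter, sketched here in Python under stated assumptions: each feature is a dict keyed by column name, and the exact spellings in OUTLIER_COLUMNS (for example rIsFeatPopnOL for r(IsFeatPopnOL)) are assumptions that should be checked against the actual FEATURES table header. The Rank Consistency Filter (criterion 4) is not included.

```python
# Assumed column spellings; verify against the FEATURES table header.
OUTLIER_COLUMNS = [
    "rIsFeatPopnOL", "gIsFeatPopnOL", "rIsBGPopnOL", "gIsBGPopnOL",
    "rIsFeatNonUnifOL", "gIsFeatNonUnifOL",
    "rIsBGNonUnifOL", "gIsBGNonUnifOL",
]


def passes_basic_filters(feature):
    """feature: dict mapping column name -> value for one FEATURES row."""
    if feature["ControlType"] != 0:
        return False  # control probes are excluded
    # Every population and non-uniformity outlier flag must be 0.
    return all(feature[col] == 0 for col in OUTLIER_COLUMNS)
```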

Compute a correlation strength per feature
Identify features along the central tendency line (Rank Consistency Filter):
1. Transform(Intensity) -> Rank, separately for the red (R) and green (G) channels.
2. Compute a correlation strength per feature:
   Correlation Strength per feature = |Rank_R - Rank_G| / (number of features) < τ, where τ is the threshold percentile.
If you compare the rank of a given feature in the R and G channels, the ranks should be within τ percentile of each other.
Example:
- A feature should be correlated in the R and G channels within 5%ile: τ = 0.05
- A feature should be correlated in the R and G channels within 15%ile: τ = 0.15
[Table columns: Feature Number, Intensity_R (I_R), Rank_R, Intensity_G (I_G), Rank_G]
[Plot of R vs. G] The blue ellipse is the 5 percentile threshold. The red ellipse is the 15 percentile threshold, which is less stringent in the rank correlation.
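A minimal Python sketch of the filter follows, assuming the correlation strength is the absolute rank difference divided by the number of features and that a feature passes when this value is below τ. Function names and the tie-breaking rule are hypothetical; FE's actual handling of ties is not described here.

```python
def ranks(values):
    """Rank values from 1 (smallest) to N (largest); ties broken by order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def rank_consistent(red, green, tau=0.05):
    """Return booleans: True where |rank_R - rank_G| / N < tau."""
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    n = len(red)
    rank_r, rank_g = ranks(red), ranks(green)
    return [abs(r - g) / n < tau for r, g in zip(rank_r, rank_g)]
```

Raising tau from 0.05 to 0.15 admits features whose ranks differ more between channels, which corresponds to the wider (red) ellipse.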

BGSub – Calculation of Global BG Adjust Values
The algorithm determines the offset in the red and green channels using the features near the central tendency of the data, especially in the lower intensity range.
[Plot of R vs. G with origin (0,0), showing the median fit to the distribution (orange) and M' = IRMedian/IGMedian projected onto the median fit line: IRMedian_proj = RBGOffset = rBGAdjust, IGMedian_proj = GBGOffset = gBGAdjust]
Users can look in the STATS table (in the FE result file) for the BGAdjust values for the red and green channels. These values are calculated by the following steps:
1. Identify features that pass the Rank Consistency Filter, as described on the previous slide.
2. Keep only features below the X percentile cutoff. (The value of the X percentile cutoff cannot be disclosed.)
3. Perform a median fit on the distribution of features remaining after the X percentile cutoff. A Tolerance Envelope is computed about the median fit line (central black line) to define a region of interest. This envelope is defined by the black arrows parallel to the median fit line.
4. Add negative control features (if available in the dataset) to the data that lie below the X percentile cutoff and within the tolerance envelope.
5. Keep only features below the Y percentile cutoff, which is more stringent than X. (The value of the Y percentile cutoff cannot be disclosed.)
6. Compute the median MeanSignal for each channel (IRMedian and IGMedian). This median signal is then projected perpendicular to the median fit line, where the rBGAdjust and gBGAdjust values are read off the y-axis and x-axis, respectively.

BGSubSignal Calculation
BGSubSignal = MeanSignal – BGUsed
where BGSubSignal and BGUsed depend on the type of background method and the settings for spatial detrend and global background adjust.

BGSub – Before Global Background Adjustment
- After background subtraction, a green or red bias may exist at low signal intensity.
- If this bias is uncorrected, the log ratio vs. signal plot of a "self" array will not be symmetric about the log ratio axis.
We expect symmetry for self-arrays; it is harder to predict for differential expression arrays. For any array, if the bias is uncorrected there will be inaccuracies in the log ratios.

BGSub – After Global Background Adjustment
- The background adjustment algorithm corrects the bias in both the red and green channels.
- The resulting log ratio vs. signal plot is symmetrical around the log ratio axis for a "self" array.

New Default Parameters
BGSub Tab Ratio Tab Array Type Background Subtraction Method* Spatial Detrend Adjust Background Globally Auto-estimate Additive Errors** InSitu No background subtraction On Off cDNA 8x Format Local background subtraction Grid Files Minimum signal from feature On (to 0) Agilent
* The default Background Subtraction Method cannot be overridden in the design file (except for 8x format).
** If no negative control probe is found, "Auto Estimate Additive Error" is skipped (even if it was requested). FE instead uses the value(s) from the Additive Error edit box (loaded from the defaults or entered by the user) and reports a WARNING message at the end of the FE run.

DyeNorm Algorithm
- Estimates and corrects for dye bias arising from systematic variation such as:
  - Differences in labeling efficiency between the two dyes
  - Differences in power settings of the two lasers
- Selects the features used as the normalization set to evaluate the dye bias
  - Optional: omit features with background PopnOL from the normalization set
- Computes dye normalization factors and corrects dye bias using one of:
  - Linear (global method)
  - Linear&LOWESS (LOWESS method preceded by linear method)
  - LOWESS (local or non-parametric method)
When the option "Omit feature with background PopnOL" is chosen, only features with IsBGPopnOL = 0 are included in the normalization set. Features that are background population outliers are omitted from the normalization set. This helps cordon off features that may be sitting under hybridization artifacts.
- Linear: applies the linear dye normalization method.
- Linear&LOWESS: applies the linear dye normalization method first, followed by the LOWESS dye normalization method. NOTE: the Linear&LOWESS method in FE 7.1 is the same as the LOWESS method in FE 6.1.
- LOWESS: applies the LOWESS dye normalization method.

DyeNorm – Methods for Normalization Feature Selection
Selects a set of normalization features to evaluate the dye bias, using one of three methods:
- Rank Consistency Filter (Default)
  - Use features falling within the central tendency of the data, having consistent trends between the red and green channels
  - "Real-time housekeeping genes"
- Use all significant, non-control, and non-outlier features
  - IsPosAndSignif = 1 for each channel
  - ControlType = 0 for each channel
  - IsFeatNonUnifOL, IsFeatPopnOL, and IsSaturated = 0 for each channel
- Use a list of normalization genes
  - Housekeeping genes or genes that should not be differentially expressed

Compute a correlation strength per feature
Identify features along the central tendency line (Rank Consistency Filter):
1. Transform(Intensity) -> Rank, separately for the red (R) and green (G) channels.
2. Compute a correlation strength per feature:
   Correlation Strength per feature = |Rank_R - Rank_G| / (number of features) < τ, where τ is the threshold percentile.
If you compare the rank of a given feature in the R and G channels, the ranks should be within τ percentile of each other.
Example:
- A feature should be correlated in the R and G channels within 5%ile: τ = 0.05
- A feature should be correlated in the R and G channels within 15%ile: τ = 0.15
[Table columns: Feature Number, Intensity_R (I_R), Rank_R, Intensity_G (I_G), Rank_G]
Again, the rank consistency filter is used here to determine the set of normalization genes to evaluate the dye bias.
[Plot of R vs. G] The blue ellipse is the 5%ile threshold. The red ellipse is the 15%ile threshold, which is less stringent in the rank correlation than the 5 percentile threshold.

Features selected using the Rank Consistency Filter

DyeNorm – Linear Normalization Method
- Assumes dye bias is NOT intensity-dependent
- A global approach to dye normalization: forces the average log ratio to zero
  - Problem with this approach: not adequate for cases where the bias is intensity-dependent
- A global constant is determined separately for the red and green channels
  - LinearDyeNormFactor is calculated such that the geometric mean of the normalization features equals 1000
  - For example, if the geometric mean of the normalization features is 250, then the LinearDyeNormFactor is 4
LinearDyeNormFactor values (red and green channels) are in the STATS table of the FE result file. The equation for the linear dye norm factor is shown on slide 70.
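The rule "scale so that the geometric mean of the normalization features equals 1000" can be sketched in Python as follows. The function name is hypothetical, and which signal column feeds the calculation (presumably the background-subtracted signal of the normalization features) is an assumption.

```python
import math


def linear_dye_norm_factor(signals, target=1000.0):
    """Return target / geometric mean of the normalization features' signals.

    signals: positive signal values for one channel (assumed to be the
    background-subtracted signals of the normalization set).
    """
    if not signals or any(s <= 0 for s in signals):
        raise ValueError("signals must be a non-empty list of positive values")
    # Geometric mean computed in log space to avoid overflow on long lists.
    log_mean = sum(math.log10(s) for s in signals) / len(signals)
    return target / (10 ** log_mean)
```

For two features with signals 125 and 500, the geometric mean is 250, so the factor is approximately 4, as in the example above.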

DyeNorm – LOWESS Normalization Method
- LOWESS is locally weighted linear regression
- Handles data that has intensity-dependent dye bias
- Fits the locally weighted linear regression curve to the normalization features (chosen by the selection method)
- Determines the amount of dye bias from the curve for each feature's intensity
- Each feature gets a different LOWESS dye normalization factor for each channel
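To make "locally weighted linear regression" concrete, here is a minimal pure-Python sketch of the general technique with tricube weights. It is a generic illustration, not FE's implementation: the variables FE actually fits, its weighting function, and its span are not stated here, and the frac parameter (the fraction of points used in each local fit) is a hypothetical name.

```python
def lowess_fit(xs, ys, frac=0.5):
    """Locally weighted linear regression evaluated at each x in xs."""
    n = len(xs)
    k = max(2, min(n, int(round(frac * n))))  # neighbours per local fit
    fitted = []
    for x0 in xs:
        # Distance of every point to x0; the k-th smallest is the bandwidth.
        dists = [abs(x - x0) for x in xs]
        bandwidth = sorted(dists)[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at the bandwidth and beyond.
        w = [(1 - min(d / bandwidth, 1.0) ** 3) ** 3 for d in dists]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Weighted least-squares line evaluated at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

A smaller frac makes each fit more local, so the curve follows the data more closely; a larger frac gives a smoother curve that approaches a single straight-line fit.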

DyeNorm – Calculation of DyeNormFactor (DNF)
DNF is defined separately for the Linear, Linear&LOWESS, and LOWESS dye normalization methods; in each definition, n is the number of features in the normalization set (i.e. features with IsNormalization = 1).
You can find which features are used in the normalization set by filtering on IsNormalization = 1 in the FEATURES table.
- LinearDyeNormFactor for the red and green channels is given in the STATS table of the FE result text file.
- Linear&LOWESSDyeNormFactor is not given in the FE result text file, but it can be calculated from DyeNormSignal and BGSubSignal (FEATURES table) and LinearDyeNormFactor (STATS table).
- LOWESSDyeNormFactor is not given in the FE result text file, but it can be calculated from DyeNormSignal and BGSubSignal, which are given in the FEATURES table.
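One way to recover a per-feature factor from the result file is sketched below. It rests on two assumptions that are not confirmed here: that DyeNormSignal = BGSubSignal * DNF, and that for the Linear&LOWESS method the LOWESS part is what remains after dividing out LinearDyeNormFactor. The function name is hypothetical.

```python
def per_feature_dnf(dye_norm_signal, bg_sub_signal, linear_dnf=None):
    """Recover a per-feature dye norm factor from FEATURES-table values.

    Assumes DyeNormSignal = BGSubSignal * DNF. If linear_dnf (the
    LinearDyeNormFactor from the STATS table) is given, it is divided
    out, leaving only the LOWESS part of a Linear&LOWESS correction.
    """
    if bg_sub_signal == 0:
        raise ValueError("BGSubSignal must be non-zero")
    factor = dye_norm_signal / bg_sub_signal
    if linear_dnf is not None:
        factor /= linear_dnf
    return factor
```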

DyeNorm – Linear vs Non-Linear Fit
[Plot of Y vs. X: the blue line represents the linear fit; the red line represents the non-linear (LOWESS) fit.]
Linear fit: y = Slope*x + Intercept + scatter, i.e. y = m*x + c + e
Assumptions in the linear fit:
1. Scatter is Gaussian about a mean of 0.
2. The standard deviation of the scatter about a point on the curve is independent of the x-variable.

DyeNorm – LOWESS Locally Weighted Linear Regression
[Plot of Y vs. X]

DyeNorm – LOWESS Changing the Granularity of the Fit
[Two plots of Y vs. X]

DyeNorm – Calculating Dye Norm Signal
Dye normalized signal is calculated per feature, per channel.
LinearDyeNormFactor values (red and green) are displayed in the STATS table of the FE result text file. Linear&LOWESSDyeNormFactor and LOWESSDyeNormFactor are NOT displayed in the STATS table (nor in the FEATURES table), since every feature would have its own dye norm factor (i.e. DNF).

Ratio Algorithm
- Calculates the log ratio of red signal over green signal: Log(rProcessedSignal/gProcessedSignal)
- Calculates the significance of the log ratio
  - Log ratio error
  - p-value
- Determines if a feature is differentially expressed according to the error model used
- Auto-estimates additive error values
- Applies surrogate values to dye normalized signals for more accurate and reproducible log ratios

Ratio – Error Models
Three error models are available to estimate the random error on the log ratio:
- Agilent's propagated error method, based on pixel-level statistics
- Rosetta's Universal Error Model (UEM)
- The more conservative error estimate between propagated error and UEM (Default)
The p-value calculated is based upon the probability of log ratio = 0.
Recommendation: use the more conservative error estimate between propagated error and UEM.

Ratio – Propagated Error vs Universal Error
Propagated Error Model
- Measures the error on the log ratio by propagating the pixel-level error from calculations made in the analysis (e.g. raw signal and background subtraction)
- Good at capturing the error at the low intensity level
- Underestimates the error at the mid to high intensity level
Universal Error Model
- Measures the expected error between the red and green channels using the additive and multiplicative errors
  - Additive: constant noise term that dominates at the low intensity level
  - Multiplicative: intensity-scaled term that dominates at the high intensity level
- Good at capturing the error at the mid to high intensity level
- Underestimates the error in noisy features, especially at low signal ranges
Most conservative estimate of Propagated Error Model and Universal Error Model (Recommended)
- Evaluates both error models and reports the higher (more conservative) p-value of the two error models
The last column (ErrorModel) in the FE output file displays which error model is used: Propagated error = 0, UEM = 1.
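The "most conservative" option reduces to picking the larger of two p-values and recording which model supplied it. A minimal Python sketch, with hypothetical names and an assumed tie-breaking rule (ties go to the propagated model):

```python
PROPAGATED, UEM = 0, 1  # values shown in the ErrorModel column


def most_conservative(p_propagated, p_uem):
    """Return (p_value, error_model) for the larger of the two p-values.

    Ties are resolved in favour of the propagated model; that choice is
    an assumption, not documented behaviour.
    """
    if p_uem > p_propagated:
        return p_uem, UEM
    return p_propagated, PROPAGATED
```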

Auto-estimate Additive Error Values
In FE 7.1.1, a default additive error constant (25) is used for Agilent in-situ arrays processed using Agilent protocols and scanner.
In FE 7.5.1, the additive error value is auto-estimated for each array, per channel, by looking at:
- The standard deviation of the negative control features
- The spatial variability of the spatial detrend surface. This is the RMS difference between each point on the surface and the mean of the surface.
Note: Arrays with fewer than 500 features use only negative control features to auto-estimate the additive error, because the surface cannot be fitted through a small number of data points.
Auto-estimation of additive error should be used with the spatial detrend option turned on.
Note: Selection of the spatial detrend option is independent of selection of the auto-estimate additive error option. The spatial detrend surface will be determined for use by the auto-estimate, but it will not be subtracted from the data if the "spatial detrend" option is NOT selected.
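The surface-variability term described above is a plain root-mean-square deviation, sketched here in Python. How FE combines it with the negative-control standard deviation is not stated, so the sketch computes only that one ingredient plus the 500-feature rule; both function names are hypothetical.

```python
import math


def surface_rms(surface):
    """RMS difference between each surface point and the surface mean."""
    mean = sum(surface) / len(surface)
    return math.sqrt(sum((v - mean) ** 2 for v in surface) / len(surface))


def uses_detrend_surface(n_features):
    """Arrays with fewer than 500 features rely on negative controls only."""
    return n_features >= 500
```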

Ratio – Use of Surrogates
Log ratios are calculated from the red and green dye normalized signals.
Dye normalized signals cannot be used to calculate the log ratio if:
- BGSubSignal fails the IsPosAndSignif test
- BGSubSignal is less than its background standard deviation (i.e. BGSDUsed)
If either case occurs, a surrogate value is used instead of DyeNormSignal.
- The surrogate value is calculated as 1 SD of BG intensities x DyeNormFactor
  - For the local background method, SD of BG is at the pixel level of the local background
  - For the global background method, SD of BG is at the background population level on the array
- If a surrogate is used, a non-zero value is displayed in the SurrogateUsed column and ProcessedSignal = SurrogateUsed
- If a surrogate is not used, a zero value is displayed in the SurrogateUsed column and ProcessedSignal = DyeNormSignal
"BG intensities" depends on whether the background subtraction method is local or global.
- Local background subtraction: the BG intensities are the raw signals from the pixels making up the local background area.
- Global background subtraction with negative controls: the BG intensities are the raw signals (MeanSignals) from all negative control probes on the array.
- Global background subtraction with minimum feature: the BG intensities are from the pixels that make up that particular feature.
- Global background subtraction with minimum signal (i.e. from background or feature), where the minimum signal is from the background: the BG intensities are from the pixels that make up the background area.
You can see the value of "1 SD of BG intensities" by looking at the BGSDUsed column. BGSDUsed is the same as BGPixSDev if you are using the local background subtraction method, and the same as GlobalBGInlierSDev if you are using a global background subtraction method.
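The surrogate rule for a single channel can be sketched in Python as follows. This is an illustration under the rules stated above, with a hypothetical function name and argument order; it assumes BGSDUsed already reflects the chosen background method.

```python
def processed_signal(dye_norm_signal, bg_sub_signal, bgsd_used,
                     dye_norm_factor, is_pos_and_signif):
    """Return (ProcessedSignal, SurrogateUsed) for one channel."""
    # A surrogate is needed if the feature is not significant or its
    # background-subtracted signal is below the background SD.
    needs_surrogate = (not is_pos_and_signif) or (bg_sub_signal < bgsd_used)
    if needs_surrogate:
        surrogate = bgsd_used * dye_norm_factor
        return surrogate, surrogate
    return dye_norm_signal, 0.0
```

For example, with BGSDUsed = 10 and DyeNormFactor = 4, a feature whose BGSubSignal is 5 gets a surrogate of 40 in both the ProcessedSignal and SurrogateUsed positions.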

Surrogates: If signal is around the background signal, use Background_SD * DyeNormFactor
Case 1: R/G
- Both channels use DyeNormSignals; p-value and log ratio are calculated as usual.
- Log ratio error is calculated according to the error model chosen by the user.
Case 2: r/G
- r = rSurrogateUsed, G = gDyeNormSignal; p-value and log ratio are calculated as usual.
- If r/G > 1, FE software automatically sets LogRatio = 0 and pValueLogRatio = 1.
Case 3: R/g
- R = rDyeNormSignal, g = gSurrogateUsed.
- If R/g < 1, FE software automatically sets LogRatio = 0 and pValueLogRatio = 1.
Case 4: r/g
- Both channels use surrogates; FE software automatically sets LogRatio = 0 and pValueLogRatio = 1.
For signals using surrogates, the g(r)ProcessedSignal is equal to the g(r)SurrogateUsed value, which is used to calculate the log ratio. Log ratio error is calculated according to the error model chosen by the user.
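The four cases can be expressed as a short Python function. This is a sketch with hypothetical names; it assumes the logarithm is base 10 and that a zero in SurrogateUsed means no surrogate was applied. It returns only the log ratio and whether it was forced to 0, since the p-value calculation itself is not covered here.

```python
import math


def log_ratio(r_processed, g_processed, r_surrogate_used, g_surrogate_used):
    """Return (LogRatio, forced).

    forced is True when LogRatio is set to 0 (FE would also set
    pValueLogRatio = 1 in that situation).
    """
    r_is_surr = r_surrogate_used != 0
    g_is_surr = g_surrogate_used != 0
    ratio = r_processed / g_processed
    if r_is_surr and g_is_surr:        # Case 4: r/g
        return 0.0, True
    if r_is_surr and ratio > 1:        # Case 2: r/G with r/G > 1
        return 0.0, True
    if g_is_surr and ratio < 1:        # Case 3: R/g with R/g < 1
        return 0.0, True
    # Case 1, or a single surrogate that does not reverse the direction.
    return math.log10(ratio), False
```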

Ratio – pValue and Log Ratio Error Calculations
[Equation 1] [Equation 2]
A more detailed error modeling paper is available under a Confidential Disclosure Agreement (CDA).
xDev is the deviation of LogRatio from 0. This is analogous to a signal-to-noise metric. xDev is displayed in the FEATURES table in the FE result file.
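As a rough illustration of how a signal-to-noise style deviation can be turned into a p-value, the Python sketch below assumes that xDev is the log ratio divided by its error and that the p-value is the two-sided normal tail probability of |xDev|. Both formulas are assumptions made for illustration; they are not taken from the slide's Equation 1 and Equation 2, and the function names are hypothetical.

```python
import math


def x_dev(log_ratio, log_ratio_error):
    """Deviation of the log ratio from 0, in units of its error (assumed)."""
    return log_ratio / log_ratio_error


def p_value_log_ratio(xdev):
    """Two-sided normal tail probability of |xDev| (assumed form)."""
    return math.erfc(abs(xdev) / math.sqrt(2.0))
```

Under these assumptions, an xDev of 0 gives a p-value of 1, and an xDev of about 1.96 gives a p-value of about 0.05.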

Feature Extraction Results

Feature Extraction Visual Results
Click View > Extraction Results
- View Results
- View Outlier Only
- Hide Outer Local BG Ring
- Use Simple Colors
Click Help > Feature Extraction Output Quick Reference
Shape visual results can be viewed only with Feature Extraction
.shp files from v.7.1 cannot be opened with v and earlier

Feature Extraction Text Results
Open up an FE text file and describe the 3 tables:
- FEPARAMS: contains options and values of parameters used in the feature extraction run
- STATS: contains values of parameters used in or derived from statistical calculations
- FEATURES: contains results for each feature on the microarray, per channel
Point to the important results to look at:
- log ratio
- p-value log ratio
- feature and background non-uniformity outlier flags
- feature and background population outlier flags
- feature saturation flags
- surrogates used

Known Issues Fixed in FE 7.5.1
Known issues fixed in FE 7.5 are available in the Release Note.
- Release Note 7.5 is on the installation CD
- Downloadable from eRoom and EPI Warehouse

Known Typographical Errors in FE 7.5.1 Manual
The following equations are correct and will be amended in version 1.1 of the FE manual (p. 216), which will be available on the Agilent website:
- [r,g]SpatialDetrendRMSFit
- [r,g]SpatialDetrendRMSFilteredMinusFit

Visit our website for current info on Feature Extraction Software
Download the latest:
- Software
- 30-day License
- Example Images
- User Manual
- Technical Notes
View software showcase
