© 2012 IBM Corporation Content Analytics with Enterprise Search Putting Your Content in Motion Realize the value of content to transform your business.

Agenda: IBM Content Analytics and Enterprise Search
• Introduction
• Components
• Architecture
• Administration
• Security
• Development
• Integrations

IBM Content Analytics is a platform to derive rapid insight
• Transform raw information into business insight quickly without building models or deploying complex systems.
• Derive insight in hours or days, not weeks or months.
• Easy for all knowledge workers to use to search and explore content.
• Flexible and extensible for deeper insights.

Content Analytics: Going from raw information to rapid insight
• Aggregate and extract from multiple sources … to form large text-based collections from multiple internal and external sources (and types), including ECM repositories, structured data, social media and more.
• Organize, analyze and visualize … enterprise content (and data) by identifying trends, patterns, correlations, anomalies and business context from collections.
• Search and explore to derive insight … from collections to confirm what is suspected or uncover something new, before customizing models and integrating with other systems and processes.
• Uncover business insight through a unique visual-based approach.

IBM Content Analytics – A platform for rapid insight
• Multiple views for visual analysis, exploration and investigation
  – 8 unique views of content, including subdocument views
• Dynamically search and explore content for new business insight
  – Connections and Dashboard views to easily detect insights
  – Add your own custom views
• Powerful solution modeling and support for advanced classification tools for more accurate and deeper insight
  – Enhanced analytics configuration tools
• Deliver rapid insight to other systems, users and applications for a complete business view
  – Quickly generate Cognos BI reports, link between Cognos reports and ICA views
  – Deliver analysis to IBM Case Manager solutions

Content Analytics – A platform for rapid insight
[Screenshots of the views: Document Analysis, Facets, Time Series, Deviations / Trends, Facet Pairs, Connections, Sentiment, Dashboard]

Enterprise Search – Delivering analytics-driven search
• Secure, scalable enterprise search featuring high-performance faceted navigation, saved searches, search profiles, document previews, type-ahead and more
• Enterprise-wide content reach with support for ~30 content sources
• Standards-based environment, including Lucene and UIMA, for the analysis, discovery, composition, development and deployment of unstructured information
• Powerful, flexible, customizable user interface
  – Facet tree, time series, query tree, query builder, custom plug-ins, drag and drop panes, duplicate detection, document clustering and more

Text Analytics is the basis for Content Analytics
• What is Text Analytics? Text Analytics (NLP*) describes a set of linguistic, statistical, and machine learning techniques that allow text to be analyzed and key information to be extracted for business integration.
• What is Content Analytics? Content Analytics (Text Analytics + Mining) refers to the text analytics process plus the ability to visually identify and explore trends, patterns, and statistically relevant facts found in various types of content spread across internal and external content sources.
* Natural Language Processing
Example text: "Not only was the pick-up line at the counter very long, but I waited 30 minutes just to talk to a rude representative who gave me a car that smelled like smoke, had stained floor mats, a dented fender, and only half a tank of gas."

IBM Content Analytics – How it works
• Source information: corporate (Contact Center, Test Data, Dealer notes, ECM, etc.) and external (NHTSA, Edmunds, Consumer Reports, MotorTrend, etc.)
• Connected components shown: Content Analytics crawlers, IBM Master Data Mgmt, RDB, real-time NLP REST API, Content Push API
• Content Analytics UIMA pipeline + annotators: fine-grained control over the entities and facets that are created
• Analyzed content (and data), example: “Owner” “reports” “check engine lite” “flashes” “after refueling”...
  – Parts of speech: Noun, Verb, Noun Phrase, Prep Phrase
  – Annotations: Person, Issue, Warning, Driver action, Component
  – Extracted concept: Issue: “Engine Light”; Situation: “Refueling”

Content Analytics Miner
• Documents View lists documents limited by a query
• Facets View lists keywords in a facet
• Time Series View shows frequency changes over time
• Deviations View shows deviation of keywords on a cyclic timeline
• Trends View detects sharp increases over time
• Facet Pairs View shows two-dimensional facet correlation
• Connections View shows relationships between different facets
• Sentiment View shows the sentiment behind facets and content
• Dashboard View shows multiple analysis views in various charts and tables

Content Analytics Miner – Sentiment View
Explore the sentiment behind facets, and see positive/negative expressions and the content attributed to the sentiment.

Enterprise Search Application
• Basic Enterprise Search
  – facets view
  – type-ahead
  – save search
  – search within results
  – search by file type
  – user preferences
  – query expansion
  – thumbnails
  – and more

Search Application
Type-ahead search:
1. Suggests queries based on index content and past queries
2. Shows estimated results count as part of the suggestion
3. Customizable by Search Administrators

Search Application
1 – Toggles document properties, language, source and type on and off
2 – Allows users to set individual results display preferences

Enterprise Search Application
• Analytics-Driven Search
  – timeline view
  – facet correlation
  – named entity annotator
  – document clustering
  – document flagging
  – duplicate/near-duplicate identification
  – query builder
  – custom panels
  – and more

Architecture
[Architecture diagram; components shown:]
• Common Infrastructure: Scheduler, Logging, Control, Configuration, Monitor, Security
• Raw Data Store
• Crawler Framework (with Crawler Plug-in): Custom, QuickPlace, Domino Doc Mgt, Notes, SharePoint, Exchange Server, NNTP, DB2, JDBC Database, Content Integrator, DB2 Content Mgr, FileNet P8, Web, Seed List, Web Content Mgr, WebSphere Portal, Windows File System, Unix File System, and Agent for File System crawlers
• Search Collection and Analytics Collection, each with:
  – Document Processors 1 … X … Y (Parser, Doc Generator, UIMA annotators)
  – Indexer Service (Indexer)
  – Global Processing (Web Link Analysis, Thumbnail Generation, Cluster Analysis)
  – Exporter with Export Plug-in
  – Document Cache, Thumbnail Index, Facet Count Sub Index, Taxonomy Index, Search Index
• Search Runtime: Search Nodes 1 … X … Y, with the Search Application and REST API applications
• Analytics Runtime: Analytics Nodes 1 … X … Y, with the Content Miner Application and REST API applications
• Admin Application; Custom Point
• Optional BigInsights Server: UIMA annotators, Search Index

Scalability – Challenge and Approach
• Challenge
  – Achieve massive scale-out
  – Utilize a cloud environment as a resource pool
• Approach
  – Keep compatibility with the current version to respect existing customers
    · No end user impact
    · Seamless administration
  – Utilize current assets
    · UIMA infrastructure
    · UIMA annotators (LW, System-T, Takmi, …)
    · Various data source crawlers
  – Utilize BigInsights as the scale-out infrastructure

Seamless Scale-out options
Content Analytics with Enterprise Search offers 3 types of system configuration according to the volume of data:
• A POC with small data can be done on a single workstation
• A production system is deployed to 1 to N servers
• A production system analyzing big data uses BigInsights*
* BigInsights is supported only on Linux

Feature Overview: Collection on BigInsights
• Search & Text Analytics Capability
  – UIMA
  – System-T
  – Advanced Tuning Rules (Gumshoe)
• Scale Out
  – IBM Hadoop
  – ILEL BigIndex
• Flexible Job Flow
  – Orchestrator (a.k.a. MetaTracker)
• Easy Data Manipulation
  – JAQL
• Robust File System
  – GPFS (Shared Nothing Cluster version, not yet released)

ICAwES – Analytics Flow on BigInsights
[Flow diagram; elements shown:]
• Regular OS (IBM Content Analytics): various data sources, Crawler, Importer, RDS, Text Analytics / Search Runtime, Exporter, UI, other applications
• Document processing flow on the regular OS: Local Analysis (UIMA base), Global Analysis, Indexing Service Process
• Document processing flow on IBM InfoSphere BigInsights: Pre-Processing, UIMA Analysis, System-T Analysis (Gumshoe LA, Gumshoe GA), ICA GA, Indexing
  – Job flow controlled by Orchestrator (MetaTracker)
  – Operation by JAQL
  – Storage on HDFS/GPFS
• UIMA annotators: LanguageWare, TAKMI, user custom
• Analysis steps listed: Link Analysis, Dup Doc Elimination, Facet Grouping, Custom GA, Gumshoe Relevancy
• Other labels: Orchestrator Job Request, BigIndex, Slave Index, RDS Cache, Custom Data

Differences: In general
| | Regular collection | BigInsights collection |
| Time to refresh index | Quick | Lazy |
| Scalability | Under 10 servers | Over 100 servers |
| Flexibility | System must have peak capacity | System resources can be allocated as required |
| Best for the use case | Documents are continuously added/removed/updated; can have a powerful server | A large number of documents is processed at once; already have BigInsights; needs flexibility |

Differences: Supported features
| Feature | Regular collection | BigInsights collection |
| Rebuild from index | Supported; resumable | Supported but not resumable |
| Optional facet index (index for facet counting) | Supported; index for ILEL facets | Supported; string-based non-ILEL facet index |
| Thumbnail generation | Supported; can be skipped on rebuild; needs document cache | Supported; always rebuilt; can have thumbnails without cache |
| Document status | Supported | Supported; the index document status page also needs the searcher running |
| Custom GA (JAQL) | Not supported | Supported |
| Flag | Supported | Not supported |
| Export flagged document | Supported | Not supported |
| Reorg index | Supported | Not supported |

Easy Configuration
• Specify the BigInsights server information
  – The admin user can confirm the setting in the Topology View
• Specify “Use IBM BigInsights” while creating a collection
  – Configuration files, ICA libraries, UIMA PEARs (including custom PEARs) and other required modules are then distributed to the BigInsights servers automatically

Advanced configuration on BigInsights
• Maximum memory size of some Hadoop tasks (default: 1024 MB)
  – The maximum memory size has to be increased when the user has memory-consuming annotators
• Limit on the total RDS files to be processed at one time (default: unlimited)
  – Some temporary files used by JAQL/Hadoop grow in proportion to the input RDS file size
  – Storage is still required for the index update

Administration – Dashboard User Interface
• Provides a dashboard style UI for administration
• The administrator can move to the configuration panel in one step from these views
• Collection Dashboard View
  – Monitor the status of components in one panel
  – No need to switch between edit and monitor modes
• System Dashboard View
  – Monitor and manage multiple servers
• Security Dashboard View
  – Configure security settings

Administration – Collection Dashboard View
• All monitor and edit functions for collections are integrated into one view
Screenshot callouts:
  – Export monitor
  – Start / Stop multiple crawlers
  – Tree style context menu items link to the existing edit pages
  – Import
  – Progress status

Administration – Collection Actions
• The administrator can perform the following general actions for each collection
  – Settings
    · Edit collection settings
    · View collection settings
  – Logging
    · View log files
    · Configure log file options
    · Configure alerts
    · Configure email options for messages
  – Clone this collection
  – Delete this collection

Administration – Crawl and Import
Screenshot callouts:
  – Add a new crawler (link to the “create crawler” wizard)
  – Import CSV documents (link to the “Import CSV documents” wizard)
  – Start / Stop multiple crawlers

Administration – Parse and Index
Screenshot callouts:
  – Configure export (the export component is displayed below the component)
  – Expand tree menu
  – Link to the parse and index settings
  – Status and operations for each document processor
  – Show the annotator status (enabled/disabled); each annotator has its own icon
  – Status and operations for global processing
  – Link to annotator configuration

Administration – Search and Analytics
Screenshot callouts:
  – Configure export (the component is displayed when the export setting is configured; the deep inspector and Cognos BI report components have the same behavior)
  – Expand tree menu
  – Link to the search and text analytics settings
  – Link to the Query Statistics page
  – Status for each searcher

Administration – Export
• The administrator can export the following documents for use in other applications
  – Crawled documents (exported from “Crawl and Import”)
  – Analyzed documents (exported from “Parse and Index”)
  – Searched documents (exported from “Search”)

Administration – Confirmation Dialog for Auto Logout
• The Admin UI has an auto logout function: on every page, the administrator is logged out after 30 minutes without any operation
• (New feature) The Admin UI shows a confirmation dialog 5 minutes before the auto logout

Administration – Collection cloning
• A user can create a new collection whose configuration is cloned from another collection
  – Only the configuration is copied; data (such as the index) is not copied
  – Some collection options can be modified at cloning time
  – The collection type cannot be changed

Administration – System Dashboard View
• Administrators can configure and monitor multi-server settings with grid and topology views
Screenshot callouts:
  – Link to query statistics
  – Start / Stop multiple servers
  – Start / Stop server
  – Backup Server, Master Server, IBM InfoSphere BigInsights Server

Administration – Security Dashboard View
• Login, collection level and system level security can be checked and configured on a dashboard
Screenshot callouts: Login security; Collection level security; System level security

Administration – Roles
• The master administrator can define the role for each administrator
  – 9 roles are available (4 roles are new): Facet tree administrator, Rule-based category administrator, Dictionary administrator, Application customizer
  – For example, when a customer has analysts who maintain user dictionaries, a master administrator can assign them as dictionary administrators, who can edit user dictionaries but do not have the privilege to start or stop sessions
Screenshot callouts:
  – Administrators assigned as dictionary admin
  – Edit user dictionaries via the admin UI
  – Monitor only (no operations are allowed)
  – Show only an edit menu for the dictionary

Administration – Role Comparison
Columns: Description; Monitor; Operation (Start/Stop); Edit Configuration
• Master administrator: Administer all aspects of your system. Monitor: both collection types. Operation: all operations. Edit configuration: all configuration pages.
• Collection administrator: Edit, monitor, and control collection operations. Monitor: both collection types. Operation: all operations. Edit configuration: collection related pages.
• Operator: Monitor and control collection operations. Monitor: both collection types. Operation: all operations. Edit configuration: no.
• Monitor: Monitor collections. Monitor: both collection types. Operation / Edit configuration: no.
• Content analytics administrator: Edit and monitor analytic resources. Monitor: Content Analytics collections only. Operation / Edit configuration: analytic resource, rule-based category, facet tree, dictionary.
• Facet tree administrator: Configure the facet tree for analytics collections. Monitor: Content Analytics collections only. Operation: no. Edit configuration: facet tree.
• Rule-based category administrator: Configure rule-based categories. Monitor: both collection types. Operation: no. Edit configuration: rule-based category.
• Dictionary administrator: Configure dictionaries for analytics collections. Monitor: Content Analytics collections only. Operation: no. Edit configuration: dictionary.
• Application customizer: Customize applications. Monitor: both collection types. Operation: no. Edit configuration: configure applications via the customizer.

Search Customizer
• The administrator can modify the major search UI configurations through the customizer GUI
• Customization points
  – Server configuration: search server’s hostname, port, timeout, …
  – Appearance: displayed application name, logo image, show/hide links, data source icons, …
  – Default values for search UI preferences: search page, facets, top results, results, result columns
• No need to restart the search session
Screenshot callouts: Customizer Dialog; Customizer Controls

Search Customizer
• Title and URL Filter
  – Use a specific field value as the title or URL of a document (the field value can be modified with a regular expression)
  – Multiple filters can be defined (applied in order)
  – A filter can be enabled for a specific collection or data source
• Layout Customizer
  – Define the default pane and container layout by drag and drop
  – Specify the properties of the left, right, top and bottom containers: enabled or not, expanded or not, default width or height
• Analytics Mode
  – Enable analytics mode for the Enterprise Search Application
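
The regular-expression step of a title filter can be pictured with a short sketch. This is only an illustration of the idea (take one field value, rewrite it with a pattern, fall back to the original value when the pattern does not apply). The class name `TitleFilterSketch`, the pattern and the sample field value are hypothetical, and the customizer's real configuration format is not shown in this deck.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of a regex-based title filter; not the product's implementation. */
final class TitleFilterSketch {
    private final Pattern pattern;
    private final String replacement;

    TitleFilterSketch(String regex, String replacement) {
        this.pattern = Pattern.compile(regex);
        this.replacement = replacement;
    }

    /** Rewrites the field value if the pattern matches the whole value; otherwise returns it unchanged. */
    String apply(String fieldValue) {
        Matcher m = pattern.matcher(fieldValue);
        return m.matches() ? m.replaceFirst(replacement) : fieldValue;
    }

    public static void main(String[] args) {
        // Assumed example: derive a title from a path-like field by dropping the directories and the extension.
        TitleFilterSketch filter = new TitleFilterSketch("^.*/([^/]+)\\.[A-Za-z0-9]+$", "$1");
        System.out.println(filter.apply("/reports/2012/quarterly-summary.pdf"));
    }
}
```

Several such filters could be tried in order, with the first matching one supplying the title, which is one plausible reading of "multiple filters can be defined (in order)".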

Search Customizer – Examples
• Show fields as result table columns
• Change the order of columns in results pages
• Add or remove custom fields
[Screenshots: Default vs. Customized]

Query Statistics
• The query statistics UI shows:
  – Changes over time in the number of queries, number of users, average response time (ms) and worst response time (ms)
  – Query popularity
  – History of submitted queries
• Query Statistics enables you to:
  – Export history data to a CSV file
  – Change the time range, collection or user ID
  – Change the display of charts or a table
  – Refresh data automatically

Multiple Levels of Security
• System level security
  – OS, network security
  – Encryption
• Web application security
• Administrative security
• Collection level security
• Document level security
  – (also known as secure search)

Administrative Security
• The ICA administrator is usually referred to as “esadmin”
  – esadmin’s password is stored in es.cfg
  – esadmin can always use any resource in ICA, such as the OS, network, web application, etc.
• The password of the OS user and the one in es.cfg must be kept in sync. You can change the password in es.cfg with:
  – $ \bin\eschangepw[.sh] newpassword

Administrative Security
• esadmin can delegate parts of the administrative roles to individual users
• esadmin can define which collections are controlled by the specified users
  – For details on each role, see the Admin UI materials

Collection Level Control
• Each collection is associated with one or more Application IDs (AppIDs)
• Search applications present an AppID
  – They only see the collections associated with that Application ID
• Pre-defined AppIDs: All, Search, Analytics
  – Collections are automatically associated with these based on collection type
[Screenshot: Security dashboard in the Admin UI]
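
As a rough mental model of the AppID rule above (not the product's code), collection visibility can be sketched as a lookup from each collection to its set of AppIDs. The class name, the collection names and the data structure are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of collection-level control by Application ID. */
final class AppIdVisibilitySketch {

    /** Returns the collections whose AppID set contains the presented AppID, in map iteration order. */
    static List<String> visibleCollections(Map<String, Set<String>> appIdsByCollection, String appId) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : appIdsByCollection.entrySet()) {
            if (entry.getValue().contains(appId)) {
                visible.add(entry.getKey());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Assumed sample data: one analytics collection and one search collection.
        Map<String, Set<String>> appIdsByCollection = new LinkedHashMap<>();
        appIdsByCollection.put("support-tickets", Set.of("All", "Analytics"));
        appIdsByCollection.put("intranet", Set.of("All", "Search"));
        System.out.println(visibleCollections(appIdsByCollection, "Search"));
    }
}
```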

Document Level Security
• Ensures that users can only search documents they have access rights to
• Prerequisites for document level security
  – Web application authentication must be enabled, by login setting or global security
  – The collection must be enabled for security when it is created; this cannot be done after the collection is created
• Two types of access control are supported
  – Access control by security token (token security)
  – Inheriting the native ACLs derived from the data sources (native security)
  – Token security is used less often; it is needed only for special requirements

Document Level Security by Security Token
• You can assign security tokens at crawl time by
  – Adding a fixed value as the security token
  – Assigning the security token based on field values (only some crawlers)
  – Attaching the token programmatically using a custom crawler plug-in
• The search application must be customized to pass the tokens that the current user has
• The search engine returns a document only if the given tokens match the security tokens indexed for that document
[Diagram: Data source → Crawler (plug-in) → Parser → Indexer → Search Index → Search runtime (plug-in)]
1. Security tokens are assigned to documents, or extracted from the native data source
2. User authentication and credential retrieval
3. Results are filtered by matching security tokens with user credentials
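
Step 3 above (filtering results by matching tokens) can be sketched as follows. The deck does not spell out the matching rule, so this sketch assumes that a single shared token is enough to grant access; `TokenFilterSketch` and `IndexedDoc` are hypothetical names, and the real filtering happens inside the search runtime rather than in application code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of token-based result filtering. */
final class TokenFilterSketch {

    /** A search hit with the security tokens that were assigned to it at crawl time. */
    static final class IndexedDoc {
        final String id;
        final Set<String> tokens;

        IndexedDoc(String id, Set<String> tokens) {
            this.id = id;
            this.tokens = tokens;
        }
    }

    /** Keeps the IDs of hits that share at least one token with the user (assumed matching rule). */
    static List<String> filter(List<IndexedDoc> hits, Set<String> userTokens) {
        List<String> allowed = new ArrayList<>();
        for (IndexedDoc doc : hits) {
            if (!Collections.disjoint(doc.tokens, userTokens)) {
                allowed.add(doc.id);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        List<IndexedDoc> hits = List.of(
                new IndexedDoc("doc1", Set.of("hr")),
                new IndexedDoc("doc2", Set.of("sales", "managers")),
                new IndexedDoc("doc3", Set.of("legal")));
        System.out.println(filter(hits, Set.of("managers", "hr")));
    }
}
```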

Agenda: IBM Content Analytics and Enterprise Search
• Introduction
• Components
• Architecture
• Administration
• Security
• Development
• Integrations
• Technical Information

What is the Text Analytics Catalog?
• An Eclipse-based inventory of over 230 text analytics available for deployment into IBM Content Analytics
• Features include:
  – Analytics organized into an easy-to-browse tree of functional categories
  – Search function for rapid location of specific analytics
  – Ability to arrange the order of text analytic execution in the UIMA pipeline
  – A 3-step wizard for easy deployment into ICA

Why is the Text Analytics Catalog Useful?
• Greatly reduces the time to deploy multiple text analytics into ICAwES (from days to minutes)
  – Excellent for rapid development of demos and POCs
• Bridges the “learning curve” gap between what ICAwES offers out of the box and developing text analytics in Content Analytics Studio
  – Obviates the need to create a consolidated UIMA pipeline of selected text analytics in LanguageWare (this is done for you automatically)
  – Can be used to jump start the Content Analytics Studio development process
• Provides a one-stop shopping experience
  – Currently assets are spread among different groups and wikis in varying degrees of assembly, maturity, and documentation

How does the Text Analytics Catalog Work?
• Implemented as a folder (directory) tree within an Eclipse project
  – Text analytic PEAR and/or dictionary files are stored under category folders
• The Text Analytics Catalog browser is implemented as an Eclipse plug-in
  – Provides all the functionality to search, select, and deploy multiple text analytics from the catalog into ICA and LanguageWare

How to extend the Text Analytics Catalog
• To create new categories:
  – Create one or more new folders underneath the “Catalog Taxonomy” folder
• To add new text analytics:
  – Drag and drop .pear and/or .dic files into the catalog taxonomy folders
  – Then update their detailed information using the catalog browser

Content Analytics Studio
Content Analytics Studio is an integrated development environment for creating your own custom analysis engine.
With Content Analytics Studio, you can:
  – Create language and domain specific dictionaries
  – Write rules to match character patterns
  – Write rules to identify patterns of tokens and other annotations
  – Create UIMA annotators based on these dictionaries and rules
  – Annotate text documents and view the details of annotations
  – Annotate collections of documents
… all without needing to write code or understand the underlying technology

Content Analytics Studio
Content Analytics Studio supports an iterative process for building tailored content analytics with ICA:
• In ICA: browse annotation results, find possible patterns and add the flag to documents
• Extract flagged documents
• In Studio: Build, Create, Modify, Analyze, Validate
• Deploy the custom engine
• Configure ICA facets
[Diagram components: Crawler Session, Doc Processing Session (UIMA annotator), Index Service Session, Text Analytics & Search Session, ICA Document Cache, REST APIs]

Annotators
Shipped with IBM Content Analytics with Enterprise Search, with built-in facets:
• Person names
• Location names
• Organization names
• Part of speech, like noun, verb, adjective
• Phrases, like noun phrase, adjective-noun, predicate phrase
• Numbers
• Automatic clustered category creation
• Significant terms detection (actions and predicates)
** Apache RegEx engine where any RegEx can be plugged in (via XML config changes; there is no web UI for this, and facets are not built in).

Annotators
Publicly available in the Content Analytics Studio demonstrator workspace:
Dates; Days, Months, Years; Addresses; Cities; States; Postal Codes; People Aliases; Dates of Birth; Car Brands; Car Parts; Departments; Ordinals; Durations; First Names; Last Names; Titles; Crimes; Criminal Sentences; Trials

Content Analytics with Enterprise Search REST API
• The REST API is the official API set of ICAwES
  – programming language independent
  – easy for developers to try out
  – easy to understand because communication is text-based
  – enables loosely coupled integration with other products
  – compatible with IBM Search REST 2.x, which is beneficial for interoperability
• The REST API provides almost everything that was offered by SIAPI

REST API
• Custom search and admin applications can be implemented with the REST API
• Language independent
• Provides all required functions for creating a search UI
  – Search navigation
  – Facet navigation
  – Search functions
    · Faceted search
    · Fetch content, thumbnails and previous document
    · List spell correction, synonym expansions and type-ahead suggestions
    · And more…
• Provides the required functions for administering search
  – Managing collections
  – Controlling and monitoring components
  – Adding documents to a collection

Search REST API Topics (1)
• The Search REST API complies with IBM Search REST 2.0 and 2.1, which are supported by some other IBM products
  – WebSphere Portal
  – Web Content Management
  – IBM Connections
  https://w3-connections.ibm.com/wikis/home?lang=en#/wiki/Wd3961b7b20cc_4eda_a774_2373d278b232/page/Specifications
• The JSONP response format is introduced in ICAwES 3.0
  – Useful for JavaScript clients that need to work around the same-origin policy
  – Use it with care, because it can easily lead to security exposures

Search REST API Topics (2)
• To get facet counts
  – Call /facets/namespaces to get the namespace ID
  – Call /facets with the namespace ID to get the facet list and the specific facet ID you are interested in
  – Call /search (with search results) or /search/facet (without search results), specifying the namespace ID, the facet ID, count and depth
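
The three-call sequence can be sketched as a small URL builder. Only the paths /facets/namespaces, /facets and /search/facet come from the slide. The base URL and every parameter name (`namespace`, `facet`, `count`, `depth`) and sample value below are placeholders chosen for illustration, so check the REST API documentation for the real ones.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical URL builder for the facet-counting call sequence; parameter names are assumptions. */
final class FacetCountUrlsSketch {

    /** Appends URL-encoded query parameters to base + path, in insertion order. */
    static String url(String base, String path, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base).append(path);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String base = "http://ica.example.com/rest"; // placeholder, not a documented endpoint

        // Step 1: list the facet namespaces and pick a namespace ID from the response.
        System.out.println(url(base, "/facets/namespaces", new LinkedHashMap<>()));

        // Step 2: list the facets in that namespace and pick a facet ID.
        Map<String, String> step2 = new LinkedHashMap<>();
        step2.put("namespace", "keyword");
        System.out.println(url(base, "/facets", step2));

        // Step 3: request the counts for that facet without fetching search results.
        Map<String, String> step3 = new LinkedHashMap<>(step2);
        step3.put("facet", "car part");
        step3.put("count", "10");
        step3.put("depth", "1");
        System.out.println(url(base, "/search/facet", step3));
    }
}
```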

Admin REST API Topics (1)
• Document push API
  – add: one document can be added per request; either a String or a File can be specified as the content of the document
  – addMultiDocs: more than one document can be added per request; only a File can be specified as the content of the document
• How to specify a File as the value of a parameter?
  – Use multipart to specify the file as the content parameter of an HTTP POST request
  – e.g. with Apache Commons HttpClient:
      import org.apache.commons.httpclient.methods.*;
      import org.apache.commons.httpclient.methods.multipart.*;
      PostMethod postMethod = new PostMethod(url);
      // Send the file as the multipart parameter named "content"
      Part[] parts = { new FilePart("content", file) };
      RequestEntity request = new MultipartRequestEntity(parts, postMethod.getParams());
      postMethod.setRequestEntity(request);

Admin REST API Topics (2)
• Authentication
  – Access to the Admin REST API requires BASIC authentication as an ICAwES administrative user
  – In addition, the Admin REST API requires the user name and password in the parameters api_username and api_password on every request, to prevent Cross-Site Request Forgery (CSRF) attacks
  – Specify the same user name and password as those used for BASIC authentication
• Authorization
  – Each API specifies the user role required to execute it
• Limitation
  – If SSO is enabled, only esadmin (the default administrative user) can access the Admin REST API
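
A minimal sketch of the two credential pieces an Admin REST API request carries: the HTTP BASIC `Authorization` header and the `api_username` / `api_password` parameters with the same values. The class name and sample credentials are hypothetical, and whether the parameters travel in the query string or in the POST body is an assumption left to the API documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that prepares the credentials for an Admin REST API request. */
final class AdminAuthSketch {

    /** Value of the Authorization header for HTTP BASIC authentication. */
    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** The same credentials again as URL-encoded api_username / api_password parameters. */
    static String credentialParams(String user, String password) {
        return "api_username=" + encode(user) + "&api_password=" + encode(password);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Sample credentials for illustration only.
        System.out.println("Authorization: " + basicAuthHeader("esadmin", "secret"));
        System.out.println(credentialParams("esadmin", "secret"));
    }
}
```

Sending both sets of credentials over plain HTTP would expose the password twice, so a TLS-protected connection is the safer choice for such requests.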

Authentication and Authorization
• Authentication for REST API calls is controlled in the same way as UI login
  – Embedded server: login setting in the Admin UI
  – WAS: global security
• The authentication protocol is HTTP BASIC
  – The Admin API needs additional credential parameters for more security
• Authorization differs from API to API
  – Who can use an API? Read the API documentation

Connectors to Enterprise Repositories
• Collaboration: IBM Case Manager, IBM Lotus Connections, IBM Lotus Domino DM, IBM Lotus Domino, IBM Lotus Quickr (NSF & J2EE), Lotus Web Content Management, IBM WebSphere Portal
• Content Mgmt: IBM Case Manager, IBM Content Manager Enterprise Edition, FileNet Content Services, FileNet P8 Content Manager, Hummingbird DM, EMC/Documentum, CA-Datacom, Open Text Livelink Enterprise Server
• Data Management: DB2 for iSeries, DB2 UDB for Linux, UNIX, Windows, DB2 for z/OS, IMS, Informix Dynamic Server, Microsoft SQL Server, MySQL, Oracle, Software AG Adabas, Sybase
• Miscellaneous: Microsoft Exchange Server, Microsoft Windows SharePoint Services, SharePoint Server, Windows file systems, Network News Protocol Newsgroup, UNIX file systems, VSAM for z/OS, Web (HTTP or HTTPS)

Integration with WebSphere Portal
• Enterprise Search provides analytics-driven enterprise search capabilities to WebSphere Portal and related products
  – Provides a new search portlet and ESSearchPortlet (for classic search collections)

© 2012 IBM Corporation Enterprise Content Management 108 Integration with Cognos BI reports  From ICA Text Miner, a user can: ─Issue a request to create a report ─List the created reports ─Open the created report ─Delete the created report ─Cognos reports can link to and from Text Miner

© 2012 IBM Corporation Enterprise Content Management 110 Integration with Netezza  Use Cases ─ ICA Output for Content Integration with Netezza ─ ICA accesses content in Netezza warehouse ─ Cross system queries between ICA and Netezza

© 2012 IBM Corporation Enterprise Content Management 111 Integration with SPSS  Step 1: Search and explore (or mine) information to understand source data  Step 2: Customize by building content (NLP) and predictive models  Step 3: Expose insights to multiple users and systems (e.g. custom apps, mobile devices, dashboards) [Diagram labels: Analyzed Information; Text Mining / Analytics; Content (NLP) Modeling; Predictive Modeling; End Users; Analysts]

© 2012 IBM Corporation Enterprise Content Management 112 Integration with Case Manager  Case Manager is a default crawler  Default properties: –CmAcmBaseCase.FolderName –CmAcmCaseFolder.CmAcmCaseState –CmAcmCaseComment.CmAcmCommentText –Folder.ClassName –Folder.PathName –Folder.DateCreated –Document.ClassName –Document.DateCreated  Additional properties available via configuration

© 2012 IBM Corporation Enterprise Content Management 113 IBM Content Analytics: Analysis Export Capability  1. Crawled Document Export – Export documents with their metadata and content as they are crawled  2. Analyzed Document Export – Export documents with the results of text analytics (such as Natural Language Processing, Named Entity Extraction, classification, or user-implemented logic) before indexing  3. Searched Document Export – Export documents from the index, limited by search or analysis, with their original content [Diagram labels: Content Analytics; Crawler; Data Store; Parser / Tokenizer; UIMA Annotators; Indexer; Search Index; Plug-in Exporter; RDB; IBM Master Data Mgmt; InfoSphere; ECM Solutions; Content Intelligence Consumers; Import]
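
The three export points above differ only in where they sit in the processing flow. The following Python sketch is a conceptual toy, not the ICA export plug-in interface: the Document class, the run_pipeline function, and the stage names are all invented here to show when each export would fire.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    doc_id: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)
    annotations: List[str] = field(default_factory=list)


# An exporter receives the stage name and the document at that stage.
Exporter = Callable[[str, Document], None]


def run_pipeline(
    docs: Iterable[Document],
    annotate: Callable[[str], List[str]],
    query: str,
    export: Exporter,
) -> List[Document]:
    """Toy crawl -> analyze -> index -> search flow with three export hooks."""
    index: List[Document] = []
    for doc in docs:
        export("crawled", doc)       # 1: metadata and content as crawled
        doc.annotations = annotate(doc.content)
        export("analyzed", doc)      # 2: with analytics results, before indexing
        index.append(doc)
    for doc in index:
        if query in doc.content:     # stand-in for a real search
            export("searched", doc)  # 3: only documents matched by the search
    return index


if __name__ == "__main__":
    events = []
    run_pipeline(
        [Document("1", "engine noise"), Document("2", "brake failure")],
        annotate=lambda text: [w for w in text.split() if w in {"noise", "failure"}],
        query="brake",
        export=lambda stage, doc: events.append((stage, doc.doc_id)),
    )
    print(events)
```

In this toy, the first two exports see every document, while the third sees only the subset selected by the query, which mirrors the "limited by search or analysis" wording of the slide.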

© 2012 IBM Corporation Enterprise Content Management 116 Basic Analytics and Search Concepts  Structured Content – Data that has unambiguous values and is easily processed by a computer program  Unstructured Content – Information that is generally recorded in a natural language as free text  Text Analytics – A form of natural language processing that includes linguistic, statistical, and machine learning techniques for analyzing text and extracting key information  Collection – A set of data sources and options for crawling, parsing, indexing, and searching those data sources  Analytics Collection – A collection that is set up to be used for content mining  Search Collection – A collection that is set up to be used for search applications  Crawler – A software program that retrieves documents from data sources and gathers information that can be used to create search indexes  Annotator – A software component that performs specific linguistic analysis tasks and produces and records annotations  Parser – A program that interprets documents that are added to the data store. The parser extracts information from the documents and prepares them for indexing, search, and retrieval
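
To make the Annotator concept concrete, here is a small Python sketch of a rule-based annotator that records typed spans over free text. It is an illustration only: annotators in the product are UIMA components, and the Annotation class, the "PartNumber" type, and the part-number pattern below are assumptions made up for this example.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    type: str          # what was found
    begin: int         # start offset in the text
    end: int           # end offset (exclusive)
    covered_text: str  # the text span itself


# Hypothetical pattern: two capital letters, a hyphen, four digits.
PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{4}\b")


def annotate_part_numbers(text: str) -> List[Annotation]:
    """Turn unstructured text into structured, typed annotations."""
    return [
        Annotation("PartNumber", m.start(), m.end(), m.group())
        for m in PART_NUMBER.finditer(text)
    ]


if __name__ == "__main__":
    for annotation in annotate_part_numbers("Replace AB-1234 and CD-5678."):
        print(annotation)
```

Each annotation turns a piece of unstructured content into a structured value with an unambiguous type and position, which is what allows it to be indexed and counted later.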
