© 2012 IBM Corporation Content Analytics with Enterprise Search Putting Your Content in Motion Realize the value of content to transform your business.

1 © 2012 IBM Corporation Content Analytics with Enterprise Search: Putting Your Content in Motion. Realize the value of content to transform your business.

2 © 2012 IBM Corporation Enterprise Content Management 2 Agenda IBM Content Analytics and Enterprise Search  Introduction  Components  Architecture  Administration  Security  Development  Integrations

3 © 2012 IBM Corporation Enterprise Content Management 3 IBM Content Analytics is a platform to derive rapid insight  Transform raw information into business insight quickly without building models or deploying complex systems.  Derive insight in hours or days … not weeks or months.  Easy to use for all knowledge workers to search and explore content.  Flexible and extensible for deeper insights.

4 © 2012 IBM Corporation Enterprise Content Management Content Analytics: Going from raw information to rapid insight. Uncover business insight through a unique visual-based approach: (1) Aggregate and extract from multiple sources … to form large text-based collections from multiple internal and external sources (and types), including ECM repositories, structured data, social media and more. (2) Organize, analyze and visualize … enterprise content (and data) by identifying trends, patterns, correlations, anomalies and business context from collections. (3) Search and explore to derive insight … from collections to confirm what is suspected or uncover something new, before customizing models and integrating with other systems and processes.

5 © 2012 IBM Corporation Enterprise Content Management 5  Multiple views for visual analysis, exploration and investigation ─8 unique views of content, including subdocument views  Dynamically search and explore content for new business insight ─Connections and Dashboard views to easily detect insights ─Add your own custom views  Powerful solution modeling and support for advanced classification tools for more accurate and deeper insight ─Enhanced analytics configuration tools  Deliver rapid insight to other systems, users and applications for complete business view ─Quickly generate Cognos BI reports, link between Cognos reports and ICA views ─Deliver analysis to IBM Case Manager solutions IBM Content Analytics – A platform for rapid insight 5

6 © 2012 IBM Corporation Enterprise Content Management Content Analytics – A platform for rapid insight. Views shown: Document Analysis, Facets, Time Series, Deviations / Trends, Facet Pairs, Connections, Sentiment, Dashboard.

7 © 2012 IBM Corporation Enterprise Content Management 7 Enterprise Search – Delivering analytics-driven search  Secure, Scalable Enterprise Search featuring high-performance faceted navigation, saved searches, search profiles, document previews, type-ahead and more  Enterprise-wide content reach with support for ~30 content sources  Standards-based environment, including Lucene & UIMA, for the analysis, discovery, composition, development and deployment for unstructured information  Powerful, flexible, customizable User Interface ─Facet tree, time series, query tree, query builder, custom plug-ins, drag and drop panes, duplicate detection, document clustering and more 7

8 © 2012 IBM Corporation Enterprise Content Management Enterprise Search. Features shown: Facets, Time Series, Query Tree, Near Dup Detection, Find Similar, Clustering, Query Builder, Custom plug-in, Drag/Drop panes.

9 © 2012 IBM Corporation Enterprise Content Management Text Analytics is the basis for Content Analytics. What is Text Analytics? Text Analytics (NLP, Natural Language Processing) describes a set of linguistic, statistical, and machine learning techniques that allow text to be analyzed and key information to be extracted for business integration. What is Content Analytics? Content Analytics (Text Analytics + Mining) refers to the text analytics process plus the ability to visually identify and explore trends, patterns, and statistically relevant facts found in various types of content spread across internal and external content sources. Example text: "Not only was the pick-up line at the counter very long, but I waited 30 minutes just to talk to a rude representative who gave me a car that smelled like smoke, had stained floor mats, a dented fender, and only half a tank of gas."

10 © 2012 IBM Corporation Enterprise Content Management 10 Analyzed Content (and Data) “Owner” “reports” “check engine lite” “flashes” “after refueling”... Source Information Corporate (Contact Center, Test Data, Dealer notes, ECM, etc.) and External (NHTSA, Edmunds, Consumer Reports, MotorTrend etc.) Noun Verb Noun PhrasePrep Phrase Person Issue Warning Driver action Component Issue: “Engine Light” Situation: “Refueling” Extracted Concept Content Analytics UIMA Pipeline + Annotators Fine grain control over the entities and facets that are created Content Analytics Crawlers IBM Master Data Mgmt RDB Real-time NLP REST API Content Push API IBM Content Analytics – How it works

11 © 2012 IBM Corporation Enterprise Content Management  Introduction  Components  Architecture  Administration  Security  Development  Integrations 11 Agenda IBM Content Analytics and Enterprise Search

12 © 2012 IBM Corporation Enterprise Content Management  Content Analytics Miner  Enterprise Search Application  Content Analytics Studio Content Analytics with Enterprise Search - Components 12

13 © 2012 IBM Corporation Enterprise Content Management 13 Content Analytics Miner  Documents View lists documents limited by a query  Facets View lists keywords in a facet  Time Series View shows frequency changes over time  Deviations View shows deviation of keywords on cyclic timeline  Trends View detects sharp increase over time  Facet Pairs View shows two-dimensional facet correlation  Connections View shows relationships of different facets  Sentiment View shows the sentiment behind facets and content  Dashboard View shows multiple analysis views in various charts and tables 13

14 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Documents View 14 Basic document view

15 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Document Analysis View 15 Detailed document analysis view

16 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Facets View 16 Facets with corresponding keywords, frequency and correlation

17 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Time Series View 17 Time Series for selected content

18 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Deviations View 18 Deviations of a facet or facet value for a given period of time

19 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Trends View 19 Trends of a facet or facet value for a given period of time

20 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Facet Pairs Table View 20 Facet pairs show how one facet relates to another

21 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Facet Pairs Birdseye View 21 Quickly identify the highly correlated intersections among all the data

22 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Facet Pairs Grid View 22 Detailed view of the selected portion of the birdseye view

23 © 2012 IBM Corporation Enterprise Content Management 23 Content Analytics Miner – Connections View Identify relationships between different facets

24 © 2012 IBM Corporation Enterprise Content Management Content Analytics Miner – Sentiment View 24 Explore the sentiment behind facets, see positive/negative expressions and the content attributed to the sentiment

25 © 2012 IBM Corporation Enterprise Content Management 25 Content Analytics Miner – Dashboard View View multiple analysis results in one place

26 © 2012 IBM Corporation Enterprise Content Management 26 Enterprise Search Application  Basic Enterprise Search – facets view – type-ahead – save search – search within results – search by file type – user preferences – query expansion – thumbnails – and more 26

27 © 2012 IBM Corporation Enterprise Content Management 27 Search Application

28 © 2012 IBM Corporation Enterprise Content Management 28 Search Application Type ahead search: 1.Suggests queries based on index content and past queries 2.Shows estimated results count as part of suggestion 3.Customizable by Search Administrators
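
The type-ahead behavior described on this slide can be illustrated with a minimal sketch. It assumes suggestions are simple prefix matches drawn from past queries and indexed terms, ranked by past popularity; the function name `suggest` and both input structures are hypothetical and are not part of the product.

```python
from collections import Counter


def suggest(prefix, past_queries, index_counts, limit=5):
    """Return (suggestion, estimated_result_count) pairs for a typed prefix.

    past_queries: previously submitted query strings (hypothetical input).
    index_counts: dict mapping an indexed term to its document count
                  (hypothetical stand-in for the search index).
    """
    prefix = prefix.lower()
    popularity = Counter(q.lower() for q in past_queries)
    lowered_counts = {t.lower(): n for t, n in index_counts.items()}
    candidates = set(popularity) | set(lowered_counts)
    matches = [c for c in candidates if c.startswith(prefix)]
    # Rank by past-query popularity, then estimated hits, then alphabetically.
    matches.sort(key=lambda c: (-popularity[c], -lowered_counts.get(c, 0), c))
    return [(c, lowered_counts.get(c, 0)) for c in matches[:limit]]
```

Each suggestion carries its estimated result count, mirroring point 2 on the slide; the ranking rule is an illustrative choice only.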

29 © 2012 IBM Corporation Enterprise Content Management 29 Search Application Save your search and re-execute saved queries

30 © 2012 IBM Corporation Enterprise Content Management 30 Search Application Search within current results set

31 © 2012 IBM Corporation Enterprise Content Management 31 Search Application Quick select for file type searching

32 © 2012 IBM Corporation Enterprise Content Management 32 Search Application 1 – Toggles on and off document properties, language, source, type 2 – Allows users to set individual results display preferences

33 © 2012 IBM Corporation Enterprise Content Management 33 Search Application Automatic query expansion suggestions and spell check

34 © 2012 IBM Corporation Enterprise Content Management 34 Search Application Thumbnail view for first page of documents in results page

35 © 2012 IBM Corporation Enterprise Content Management 35 Search Application Faceted search provides drill-through capabilities out-of-the-box and customizable by the business

36 © 2012 IBM Corporation Enterprise Content Management 36 Enterprise Search Application  Analytics-Driven Search – timeline view – facet correlation – named entity annotator – document clustering – document flagging – duplicate/near-duplicate identification – query builder – custom panels – and more 36

37 © 2012 IBM Corporation Enterprise Content Management 37 Search Application Document Clustering

38 © 2012 IBM Corporation Enterprise Content Management 38 Search Application Timeline View

39 © 2012 IBM Corporation Enterprise Content Management 39 Search Application Named Entity Annotations High correlation color indicators

40 © 2012 IBM Corporation Enterprise Content Management 40 Search Application Document flagging

41 © 2012 IBM Corporation Enterprise Content Management 41 Search Application Near duplicate content identification Duplicate content identification

42 © 2012 IBM Corporation Enterprise Content Management 42 Search Application Query Builder

43 © 2012 IBM Corporation Enterprise Content Management 43 Search Application Customizable panels and layout

44 © 2012 IBM Corporation Enterprise Content Management  Introduction  Components  Architecture  Administration  Security  Development  Integrations 44 Agenda IBM Content Analytics and Enterprise Search

45 © 2012 IBM Corporation Enterprise Content Management 45 Architecture

46 © 2012 IBM Corporation Enterprise Content Management Architecture [diagram]. Common Infrastructure: Raw Data Store, Scheduler, Logging, Control, Configuration, Monitor, Security. Crawler Framework (with Crawler Plug-in): Custom Crawler, QuickPlace Crawler, Domino Doc Mgt Crawler, Notes Crawler, SharePoint Crawler, Exchange Server Crawler, NNTP Crawler, DB2 Crawler, JDBC Database Crawler, Content Integrator Crawler, DB2 Content Mgr Crawler, FileNet P8 Crawler, Web Crawler, Seed List Crawler, Web Content Mgr Crawler, WebSphere Portal Crawler, Windows File System Crawler, Unix File System Crawler, Agent for File System Crawler. Search Collection and Analytics Collection, each with: Document Processors 1, X, Y (Parser, Doc Generator, UIMA annotators); Indexer Service (Indexer); Global Processing (Web Link Analysis, Thumbnail Generation, Cluster Analysis); Exporter (Export Plug-in); and the stores Search Index, Taxonomy Index, Facet Count Sub Index, Thumbnail Index, Document Cache. Search Runtime: Search Nodes 1, X, Y serving the Search Application and REST API Application. Analytics Runtime: Analytics Nodes 1, X, Y serving the Content Miner Application and REST API Application. Also shown: Admin Application, Custom Point, optional BigInsights Server.

47 © 2012 IBM Corporation Enterprise Content Management Scalability – Challenge and Approach  Challenge – Achieve massive scale-out – Utilize cloud environment as resource pool  Approach – Keep compatibility with current version to respect existing customers No end user impact Seamless administration – Utilize current assets UIMA Infrastructure UIMA Annotators (LW, System-T, Takmi,…) Various data source crawlers – Utilize BigInsights as scale-out infrastructure

48 © 2012 IBM Corporation Enterprise Content Management Seamless Scale-out options. Content Analytics with Enterprise Search offers 3 types of system configuration according to the volume of data: (1) a POC with small data can be done on a single workstation; (2) a production system is deployed on 1 to N servers; (3) a production system analyzing big data utilizes BigInsights. Note: BigInsights is supported only on Linux.

49 © 2012 IBM Corporation Enterprise Content Management Feature Overview: Collection on BigInsights  Search & Text Analytics Capability – UIMA – System-T – Advance Tuning Rules (Gumshoe)  Scale Out – IBM Hadoop – ILEL BigIndex  Flexible Job Flow – Orchestrator (a.k.a. MetaTracker)  Easy Data Manipulation – JAQL  Robust File System – GPFS (Shared Nothing Cluster version, not yet released)

50 © 2012 IBM Corporation Enterprise Content Management ICAwES – Analytics Flow on BigInsights Crawler Importer Text Analytics / Search Runtime Exporter Document Processing Flow Indexing Service Process Global Analysis Local Analysis (UIMA base) Document Processing Flow IBM InfoSphere BigInsights Regular OS Various Data source Other App. UI Slave Index IBM Content Analytics Pre-Processing UIMA Analysis System-T Analysis - Gumshoe LA - Gumshoe GA IndexingICA GA Job Flow controlled by Orchestrator (MetaTracker) Operation by JAQL Custom Data HDFS/GPFS UIMA Annotators - LanguageWare - TAKMI - User Custom RDS Cache Orchestrator Job Request BigIndex - Link Analysis - Dup Doc Elimination - Facet Grouping - Custom GA - Gumshoe Relevancy RDS

51 © 2012 IBM Corporation Enterprise Content Management Differences: In general
Aspect | Regular collection | BigInsights collection
Time to refresh index | Quick | Lazy
Scalability | Under 10 servers | Over 100 servers
Flexibility | System must have peak capacity | System resources can be allocated as required
Best for the use case | Documents are continuously added/removed/updated; can have a powerful server | A large number of documents are processed at once; already have BigInsights; needs flexibility

52 © 2012 IBM Corporation Enterprise Content Management Differences: Supported features
Feature | Regular collection | BigInsights collection
Rebuild from index | Supported; resumable | Supported but not resumable
Optional facet index (index for facet counting) | Supported; index for ILEL facets | Supported; string-based non-ILEL facet index
Thumbnail generation | Supported; can be skipped on rebuild; needs document cache | Supported; always rebuilt; can have thumbnails without cache
Document status | Supported | Supported; the index document status page also needs the searcher running
Custom GA (JAQL) | Not supported | Supported
Flag | Supported | Not supported
Export flagged document | Supported | Not supported
Reorg index | Supported | Not supported

53 © 2012 IBM Corporation Enterprise Content Management Easy Configuration. Specify the BigInsights server information; the admin user can confirm the setting in the Topology View. Specify “Use IBM BigInsights” while creating a collection: configuration files, ICA libraries, UIMA PEARs (including custom PEARs) and other required modules are then distributed to the BigInsights servers automatically.

54 © 2012 IBM Corporation Enterprise Content Management Advanced configuration on BigInsights. Maximum memory size of some Hadoop tasks (default: 1024 MB): this has to be increased when memory-consuming annotators are used. Limit on the total RDS files processed at one time (default: unlimited): some temporary files used by JAQL/Hadoop grow in proportion to the input RDS file size. Storage is still required for index updates.

55 © 2012 IBM Corporation Enterprise Content Management  Introduction  Components  Architecture  Administration  Security  Development  Integrations 55 Agenda IBM Content Analytics and Enterprise Search

56 © 2012 IBM Corporation Enterprise Content Management Administration – Dashboard User Interface. Provides a dashboard-style UI for administration; the administrator can move to the configuration panel in one step from these views. Collection Dashboard View: monitor the status of components in one panel, with no need to switch between edit and monitor modes. System Dashboard View: monitor and manage multiple servers. Security Dashboard View: configure security settings.

57 © 2012 IBM Corporation Enterprise Content Management Administration – Collection Dashboard View  All monitor and edit functions for collections are integrated into one view Export monitor Start / Stop multiple crawlers Tree style context menu items links to existing edit page Import Progress status

58 © 2012 IBM Corporation Enterprise Content Management Administration – Collection Actions  Administrator can do the following general actions for each collection – Settings Edit collection settings View collection settings – Logging View log files Configure log file options Configure alerts Configure email options for messages – Clone this collection – Delete this collection

59 © 2012 IBM Corporation Enterprise Content Management Administration – Crawl and Import Add a new crawler ( Link to “create crawler” wizard) Import CSV documents (Link to “Import CSV documents” wizard) Start / Stop multiple crawlers

60 © 2012 IBM Corporation Enterprise Content Management Administration – Parse and Index. Screen callouts: Configure export (the export component is displayed below the component); Expand tree menu; Link to parse and index settings; Status and operations for each document processor; Annotator status (enabled/disabled), where each annotator has its own icon; Status and operations for global processing; Link to annotator configuration.

61 © 2012 IBM Corporation Enterprise Content Management Administration – Search and Analytics. Screen callouts: Configure export (the component is displayed when the export setting is configured; the deep inspector and Cognos BI report components behave the same way); Expand tree menu; Link to search and text analytics settings; Link to Query Statistics page; Status for each searcher.

62 © 2012 IBM Corporation Enterprise Content Management Administration – Export  Administrator can export the following documents for use in other applications – Crawled documents (exported from “Crawl and Import”) – Analyzed documents (exported from “Parse and Index”) – Searched documents (exported from “Search”)

63 © 2012 IBM Corporation Enterprise Content Management Administration – Confirmation Dialog for Auto Logout. The Admin UI logs the user out automatically after 30 minutes without any operation; this applies to all pages. (New feature) The Admin UI shows a confirmation dialog 5 minutes before the auto logout.

64 © 2012 IBM Corporation Enterprise Content Management Administration – Collection cloning  User can create a new collection which has configuration cloned from another collection ─Only configuration is copied and data (such as index) is not copied ─Some collection options can be modified at cloning ─Cannot change collection type

65 © 2012 IBM Corporation Enterprise Content Management Administration – System Dashboard View. Administrators can configure multi-server settings with grid and topology views. Screen callouts: Link to query statistics; Start / Stop multiple servers; Start / Stop server; Backup Server; Master Server; IBM InfoSphere BigInsights Server.

66 © 2012 IBM Corporation Enterprise Content Management Administration – Security Dashboard View  Login/Collection level/ System level security can be checked and configured on a dashboard Login security Collection level security System level security

67 © 2012 IBM Corporation Enterprise Content Management Administration – Roles. The master administrator can define the role for each administrator. 9 roles are available; 4 roles are new: Facet tree administrator, Rule-based category administrator, Dictionary administrator, Application customizer. For example, when a customer has analysts who maintain user dictionaries, a master administrator can assign them as dictionary administrators, who can edit user dictionaries but have no privilege to start or stop sessions. Screen callouts for administrators assigned as dictionary admin: Edit user dictionaries via the admin UI; Monitor only (no operations are allowed); Only the edit menu for dictionaries is shown.
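
The delegation example on this slide can be sketched as a small permission table. This is a hypothetical model only: the role names come from the slide, but the permission names and the `is_allowed` helper are invented here for illustration and do not reflect how the product stores roles.

```python
# Hypothetical permission model. Role names are from the slide; the
# permission names ("monitor", "operate", "edit_dictionary", "edit_all")
# are assumptions made for this sketch.
ROLE_PERMISSIONS = {
    "master_administrator": {"monitor", "operate", "edit_dictionary", "edit_all"},
    "dictionary_administrator": {"monitor", "edit_dictionary"},
    "monitor": {"monitor"},
}


def is_allowed(role, action):
    """Return True when the given role includes the requested action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

With this table, a dictionary administrator can edit dictionaries and monitor, but cannot start or stop sessions, which matches the example given on the slide.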

68 © 2012 IBM Corporation Enterprise Content Management Administration – Role Comparison
Role | Description | Monitor | Operation (Start/Stop) | Edit Configuration
Master administrator | Administer all aspects of your system | Both collection types | All operations | All configuration pages
Collection administrator | Edit, monitor, and control collection operations | Both collection types | All operations | Collection-related pages
Operator | Monitor and control collection operations | Both collection types | All operations | No
Monitor | Monitor collections | Both collection types | No
Content analytics administrator | Edit and monitor analytic resources | Content Analytics collections only | Analytic resource; Rule-based category; Facet tree; Dictionary; Rule-based category
Facet tree administrator | Configure facet tree for analytics collections | Content Analytics collections only | No | Facet tree
Rule-based category administrator | Configure rule-based categories | Both collection types | No | Rule-based category
Dictionary administrator | Configure dictionaries for analytics collections | Content Analytics collections only | No | Dictionary
Application customizer | Customize applications | Both collection types | No | Configure applications via customizer

69 © 2012 IBM Corporation Enterprise Content Management 69 Search Customizer  Administrator can modify major search UI configurations thru customizer GUI  Customization Points – Server Configuration Search server’s hostname, port, and timeout… – Appearance Displayed application name, logo image, show/hide links, data source icons… – Default value for search UI preference Search page, facets, top results, results, result columns  No need to restart the search session Customizer Dialog Customizer Controls

70 © 2012 IBM Corporation Enterprise Content Management Search Customizer. Title and URL Filter: use a specific field value as the title or URL of a document (the field value can be modified with a regular expression); multiple filters can be defined (in order); a filter can be enabled for a specific collection or data source. Layout Customizer: define the default pane and container layout by drag and drop; specify the properties of the left, right, top and bottom containers (enabled or not, expanded or not, default width or height). Analytics Mode: enable analytics mode for the Enterprise Search Application.
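
To make the Title and URL Filter idea concrete, here is a minimal sketch of ordered, regex-based title filters. It assumes the first matching filter wins, which the slide does not state; the `TitleFilter` class, its fields, and `resolve_title` are hypothetical names and do not correspond to the product's configuration format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TitleFilter:
    field: str                        # document field to read
    pattern: str                      # regular expression applied to the field value
    replacement: str                  # replacement template, e.g. r"\1"
    collection: Optional[str] = None  # restrict to one collection; None means any


def resolve_title(doc, collection, filters, default_title):
    """Apply filters in order; the first one that matches supplies the title."""
    for f in filters:
        if f.collection is not None and f.collection != collection:
            continue  # filter is not enabled for this collection
        value = doc.get(f.field)
        if value is None:
            continue
        match = re.search(f.pattern, value)
        if match:
            return match.expand(f.replacement)
    return default_title
```

For example, a filter on a `subject` field with pattern `^RE:\s*(.+)$` and replacement `\1` would turn "RE: Brake recall" into "Brake recall" for documents in the collection it is enabled for.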

71 © 2012 IBM Corporation Enterprise Content Management Search Customizer – Examples  Show fields as a result table column  Change the order of columns in results pages  Add or remove custom fields Default Customized

72 © 2012 IBM Corporation Enterprise Content Management 72 Query Statistics  Query statistics UI shows: – Time transition of Number of queries, number of users, average response time (ms), worst response time (ms) – Query popularity – History of submitted queries  Query Statistics enables you to: – Export history data to CSV file – Change time range, collection or user ID – Change display of charts or a table – Refresh data automatically

73 © 2012 IBM Corporation Enterprise Content Management  Introduction  Components  Architecture  Administration  Security  Development  Integrations 73 Agenda IBM Content Analytics and Enterprise Search

74 © 2012 IBM Corporation Enterprise Content Management Multiple Levels of Security. System level security (OS and network security, encryption); Web application security; Administrative security; Collection level security; Document level security (also known as secure search).

75 © 2012 IBM Corporation Enterprise Content Management System Level Security. The login setting can be configured on the security dashboard. Note: the ICA server must be restarted for the change to take effect.

76 © 2012 IBM Corporation Enterprise Content Management Web Application Security. When WebSphere Application Server (WAS) is used, global security needs to be configured for the login setting.

77 © 2012 IBM Corporation Enterprise Content Management Administrative Security. The ICA administrator is usually referred to as “esadmin”. esadmin’s password is stored in es.cfg. esadmin can always use any resource in ICA, such as the OS, network, web application, etc. The password of the OS user and the one in es.cfg must be kept synchronized. You can change the password in es.cfg with: $ \bin\eschangepw[.sh] newpassword

78 © 2012 IBM Corporation Enterprise Content Management Administrative Security. esadmin can delegate parts of the administrative roles to individual users. esadmin can define which collections the specified users are allowed to control. For details on each role, see the Admin UI materials.

79 © 2012 IBM Corporation Enterprise Content Management Collection Level Control. Each collection is associated with one or more Application IDs (AppIDs). A search application presents its AppID and sees only those collections associated with that AppID. Pre-defined AppIDs (All, Search, Analytics) are automatically associated with collections based on collection type. Configured on the security dashboard of the Admin UI.
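
The AppID check can be pictured as a simple lookup. The sketch below is illustrative only: the collection names, the custom `HRPortal` AppID, and the `visible_collections` function are hypothetical, and the way pre-defined AppIDs are attached to each collection type is an assumption based on the slide.

```python
# Hypothetical data: collection name -> AppIDs associated with it.
# The assumption here is that an analytics collection gets "All" and
# "Analytics", and a search collection gets "All" and "Search".
COLLECTION_APP_IDS = {
    "warranty_claims": {"All", "Analytics"},
    "intranet_search": {"All", "Search"},
    "hr_documents": {"HRPortal"},
}


def visible_collections(app_id, collection_app_ids=COLLECTION_APP_IDS):
    """Return the sorted names of collections associated with the presented AppID."""
    return sorted(
        name for name, ids in collection_app_ids.items() if app_id in ids
    )
```

An application presenting `Search` would see only `intranet_search` here, while one presenting an unknown AppID would see nothing.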

80 © 2012 IBM Corporation Enterprise Content Management Document Level Security. Ensures that users can search only the documents they have access rights to. Prerequisites: web application authentication must be enabled through the login setting or global security; the collection must be enabled for security when it is created (this cannot be done after the collection is created). Two types of access control are supported: access control by security token (token security), and inheriting the native ACLs derived from the data sources (native security). Token security is used less often; it is needed only to meet special requirements.

81 © 2012 IBM Corporation Enterprise Content Management Document Level Security by Security Token. You can assign security tokens at crawl time by: adding a fixed value as the security token; assigning the security token based on field values (only some crawlers); attaching the token programmatically using a custom crawler plug-in. The search application must be customized to pass the tokens that the current user has. The search engine returns a document only if the given tokens match the security tokens indexed for that document. Flow (data source, crawler with plug-in, parser, indexer, search index, search runtime): 1. Security tokens are assigned to documents, or extracted from the native data source. 2. User authentication and credential retrieval. 3. Results are filtered by matching security tokens with user credentials.
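
Step 3 of this flow, results filtering, can be sketched as a set intersection between the user's tokens and each document's indexed tokens. This is a minimal sketch, assuming a document is returned when at least one token matches; the hit structure and the `filter_by_tokens` function are hypothetical, and the real engine applies the filter inside the search runtime rather than on a result list.

```python
def filter_by_tokens(hits, user_tokens):
    """Keep only hits whose indexed security tokens overlap the user's tokens.

    hits: list of dicts with an "id" and a "tokens" collection (hypothetical shape).
    user_tokens: tokens the search application passes for the current user.
    """
    user_tokens = set(user_tokens)
    return [hit for hit in hits if user_tokens & set(hit["tokens"])]
```

A user holding only the token `sales` would receive documents tagged `sales` or `sales, finance`, but not documents tagged only `hr`.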

82 © 2012 IBM Corporation Enterprise Content Management  Introduction  Components  Architecture  Administration  Security  Development  Integrations  Technical Information 82 Agenda IBM Content Analytics and Enterprise Search

85 © 2012 IBM Corporation Enterprise Content Management Drag-n-Drop Local disk Text Analytics Catalog for ICA Internet download IBM Content Analytics Studio Text Analytics Catalog

86 © 2012 IBM Corporation Enterprise Content Management What is the Text Analytics Catalog? An Eclipse-based inventory of over 230 text analytics available for deployment into IBM Content Analytics. Features include: ─ Analytics organized into an easy-to-browse tree of functional categories ─ A search function for rapid location of specific analytics ─ The ability to arrange the order of text analytic execution in the UIMA pipeline ─ A 3-step wizard for easy deployment into ICA

87 © 2012 IBM Corporation Enterprise Content Management  Greatly reduces the time to deployment of multiple text analytics into ICAwES (from days to minutes) ─Excellent for rapid development of demos and POCs  Bridges the “learning curve” gap between what ICAwES offers Out-of- the-box and developing text analytics in Content Analytics Studio ─Obviates the need to create a consolidated UIMA pipeline in LanguageWare of selected text analytics (this is automatically done for you) Can be used to jump start the Content Analytics Studio development process  Provides a one stop shopping experience Currently assets are spread among different groups and wikis in varying degrees of assembly, maturity, and documentation Why is the Text Analytics Catalog Useful?

88 © 2012 IBM Corporation Enterprise Content Management  Implemented as a folder (directory) tree within an Eclipse project ─Text analytic pear and/or dictionary files stored under category folders  Text Analytic Catalog browser implemented as an Eclipse plugin ─Provides all the functionality to search, select, and deploy multiple text analytics from the catalog into ICA and LanguageWare How does the Text Analytics Catalog Work?

89 © 2012 IBM Corporation Enterprise Content Management How to extend the Text Analytics Catalog. To create new categories: ─ Simply create one or more new folders underneath the “Catalog Taxonomy” folder. To add new text analytics: ─ Simply drag and drop .pear and/or .dic files into catalog taxonomy folders ─ Then update their detailed information using the catalog browser

90 © 2012 IBM Corporation Enterprise Content Management With Content Analytics Studio, you can........ – Create language and domain specific dictionaries – Write rules to match character patterns – Write rules to identify patterns of tokens and other annotations – Create UIMA annotators based on these dictionaries and rules – Annotate text documents and view the details of annotations – Annotate collections of documents...... all without needing to write code or understand underlying technology Content Analytics Studio 90 Content Analytics Studio is an integrated development environment for creating your own custom analysis engine

91 © 2012 IBM Corporation Enterprise Content Management Content Analytics Studio 91 View Project Resources

92 © 2012 IBM Corporation Enterprise Content Management Content Analytics Studio 92 Sample text for building a model

93 © 2012 IBM Corporation Enterprise Content Management Content Analytics Studio 93 UIMA Pipeline components

94 © 2012 IBM Corporation Enterprise Content Management Content Analytics Studio 94 ICA Document Cache Studio Build Create Modify Analyze Validate Text Analytics & Search Session Index Service Session Annotator UIMA Doc Processing Session REST APIs Crawler Session Studio helps an iterative process to make tailored content analytics with ICA Extract flagged documents Deploy custom engine Configure ICA Facet Browse annotation results Find possible patterns and add the flag to documents Content Analytics Studio provides an iterative process to tailored Content Analytics

95 © 2012 IBM Corporation Enterprise Content Management 95 Annotators  Person names  Location names  Organization names  Part of speech like noun, verb, adjective  Phrases like noun phrase, adjective-noun, predicate phrase  Numbers  Automatic clustered category creation  Significant terms detection - actions and predicates **Apache RegEx engine where any RegEx can be plugged in (via XML config changes, no web UI for this, facets are not built in). Shipped with IBM Content Analytics with Enterprise Search & have built-in facets

96 © 2012 IBM Corporation Enterprise Content Management 96 Annotators  Dates  Days, Months, Years  Addresses  Cities  States  Postal Codes  People Aliases  Dates of Birth  Car Brands  Car Parts  Departments  Ordinals  Durations  First Names  Last Names  Titles  Crimes  Criminal Sentences  Trials Publicly available in Content Analytics Studio demonstrator workspace

97 © 2012 IBM Corporation Enterprise Content Management Content Analytics with Enterprise Search REST API  REST API is presented as an official API set of ICAwES – programming language independent – easy for developers to try out – easy to understand because of text communication – enables loosely coupled integration between other products – compatible with IBM Search REST 2.x, beneficial for interoperability  REST API provides almost everything which was offered by SIAPI

98 © 2012 IBM Corporation Enterprise Content Management 98 REST API  Custom Search and Admin applications can be implemented by REST API  Language independent  Provides all required functions for creating a search UI – Search navigation – Facet navigation – Search functions Faceted search Fetch content, thumbnails and previous document List spell correction, synonym expansions and type-ahead suggestions And more…  Provides required functions for administrating search – Managing collections – Controlling and monitoring components – Adding documents to a collection

99 © 2012 IBM Corporation Enterprise Content Management Search REST API Topics (1)  The Search REST API complies with IBM Search REST 2.0 and 2.1, which are supported by some other IBM products – WebSphere Portal – Web Content Management – IBM Connections https://w3-connections.ibm.com/wikis/home?lang=en#/wiki/Wd3961b7b20cc_4eda_a774_2373d278b232/page/Specifications  JSONP response format is introduced from ICAwES 3.0 – Useful for JavaScript binding to work around the same-origin policy Use it with care, because it can easily lead to security exposures

100 © 2012 IBM Corporation Enterprise Content Management Search REST API Topics (2)  To get facet counting – Call /facets/namespaces to get namespace ID – Call /facets with the namespace ID to get facet list and specific facet ID you are interested in – Call /search (with search result) or /search/facet (without search result) specifying the namespace ID, the facet ID, count and depth
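The three-step facet counting flow can be sketched as a small URL builder. The endpoint paths (/facets/namespaces, /facets, /search, /search/facet) come from the slide. The base URL and the query parameter names (`query`, `namespace`, `facetId`, `count`, `depth`) are placeholders assumed for illustration only, so check the REST API documentation for the real ones.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the facet counting call sequence; parameter names are hypothetical.
final class FacetCountUrls {
    private final String base;

    FacetCountUrls(String base) {
        this.base = base;
    }

    /** Step 1: list namespaces to find the namespace ID. */
    URI namespaces() {
        return URI.create(base + "/facets/namespaces");
    }

    /** Step 2: list the facets in a namespace to find the facet ID of interest. */
    URI facets(String namespaceId) {
        return URI.create(base + "/facets?namespace=" + enc(namespaceId));
    }

    /** Step 3: request counts, with search results (/search) or without (/search/facet). */
    URI counts(String query, String namespaceId, String facetId,
               int count, int depth, boolean withResults) {
        String path = withResults ? "/search" : "/search/facet";
        return URI.create(base + path
                + "?query=" + enc(query)
                + "&namespace=" + enc(namespaceId)
                + "&facetId=" + enc(facetId)
                + "&count=" + count
                + "&depth=" + depth);
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder host; the IDs would normally come from the responses to steps 1 and 2.
        FacetCountUrls urls = new FacetCountUrls("http://ica.example.com/api");
        System.out.println(urls.namespaces());
        System.out.println(urls.facets("keyword"));
        System.out.println(urls.counts("engine", "keyword", "part", 10, 1, false));
    }
}
```

Choosing /search/facet for the last step skips the result list, which is presumably the cheaper call when only the counts are needed.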

101 © 2012 IBM Corporation Enterprise Content Management Admin REST API Topics (1)  Document push API – add A single document can be added per request Either a String or a File can be specified as the content of the document – addMultiDocs More than one document can be added per request Only a File can be specified as the content of the documents  How to specify a File as the value for a parameter? – Use MultiPart to specify the file as the content parameter in an HTTP POST method – e.g. Apache Commons HttpClient PostMethod postMethod = new PostMethod(url); Part[] parts = {new FilePart("content", file)}; RequestEntity request = new MultipartRequestEntity(parts, postMethod.getParams()); postMethod.setRequestEntity(request);
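To show what the multipart request above carries on the wire, here is a minimal sketch that builds a single-file multipart/form-data body with the Java standard library only. It is not the Commons HttpClient implementation. The class `MultipartContent` and its methods are hypothetical, and only the field name `content` is taken from the slide.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hand-rolled multipart/form-data body with one file part (illustration only).
final class MultipartContent {

    /** Value for the Content-Type request header; the boundary must match the body. */
    static String contentType(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    /** Builds a body holding one file part followed by the closing boundary. */
    static byte[] filePart(String boundary, String field, String fileName, byte[] data)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + field
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(data);
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Sample boundary and file content are made up for the demonstration.
        String boundary = "----ica-demo-boundary";
        byte[] body = filePart(boundary, "content", "report.txt",
                "sample document text".getBytes(StandardCharsets.UTF_8));
        System.out.println(contentType(boundary));
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
```

In practice a library such as the one shown on the slide is the safer choice, since it also handles boundary generation and streaming of large files.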

102 © 2012 IBM Corporation Enterprise Content Management Admin REST API Topics (2)  Authentication – Access to the Admin REST API requires BASIC Authentication with an ICAwES administrative user – In addition, the Admin REST API requires the user name and password to be specified as values for the parameters api_username and api_password on every request, to prevent Cross-Site Request Forgery (CSRF) attacks Specify the same user name and password as those used for Basic Authentication  Authorization – Each API specifies the user role required to execute it  Limitation – If SSO is enabled, only esadmin (the default administrative user) can access the Admin REST API
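The double credential requirement can be sketched as two small helpers: one for the HTTP BASIC Authorization header and one for the api_username and api_password request parameters. The parameter names are from the slide. The class `AdminCredentials`, its method names, and the sample password are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds both credential carriers that an Admin REST API request needs.
final class AdminCredentials {

    /** Value for the Authorization header used by HTTP BASIC authentication. */
    static String basicAuthHeader(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** URL-encoded parameters that repeat the same credentials on every request. */
    static String credentialParams(String user, String password) {
        return "api_username=" + enc(user) + "&api_password=" + enc(password);
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // esadmin is the default administrative user named on the slide; the password is made up.
        System.out.println("Authorization: " + basicAuthHeader("esadmin", "secret"));
        System.out.println(credentialParams("esadmin", "secret"));
    }
}
```

Since both the header and the parameters carry the password, requests like this should only be sent over HTTPS.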

103 © 2012 IBM Corporation Enterprise Content Management Authentication and Authorization  Authentication for calling the REST API is controlled in the same way as UI login – Embedded server : login setting on Admin UI – WAS : global security  Authentication protocol is HTTP BASIC – Admin API needs additional credential parameters for more security  Authorization differs from API to API – Who can use an API? -> Read the API documentation

104 © 2012 IBM Corporation Enterprise Content Management  Introduction  Components  Architecture  Administration  Security  Development  Integrations 104 Agenda IBM Content Analytics and Enterprise Search

105 © 2012 IBM Corporation Enterprise Content Management 105 Connectors to Enterprise Repositories Collaboration IBM Case Manager IBM Lotus Connections IBM Lotus Domino DM IBM Lotus Domino IBM Lotus Quickr (NSF & J2EE) Lotus Web Content Management IBM WebSphere Portal Content Mgmt IBM Case Manager IBM Content Manager Enterprise Edition FileNet Content Services FileNet P8 Content Manager Hummingbird DM EMC/Documentum CA-Datacom Open Text Livelink Enterprise Server Data Management DB2 for iSeries DB2 UDB for Linux, UNIX, Windows DB2 for z/OS IMS Informix Dynamic Server Microsoft SQL Server MySQL Oracle Software AG Adabas Sybase Miscellaneous Microsoft Exchange Server Microsoft Windows SharePoint Services SharePoint Server Windows file systems Network News Protocol Newsgroup UNIX file systems VSAM for z/OS Web (HTTP or HTTPS)

106 © 2012 IBM Corporation Enterprise Content Management 106 Integration with Content Classification Content Classification adds meaningful category facets Content Classification clusters similar content

107 © 2012 IBM Corporation Enterprise Content Management 107 Integration with WebSphere Portal  Enterprise Search provides analytics-driven enterprise search capabilities to WebSphere Portal and related products –Provides new search portlet and ESSearchPortlet (for classic search collections)

108 © 2012 IBM Corporation Enterprise Content Management 108 Integration with Cognos BI reports  From ICA Text Miner, a user can: ─Issue a request to create a report ─List the created reports ─Open the created report ─Delete the created report ─Cognos reports can link to and from Text Miner

109 © 2012 IBM Corporation Enterprise Content Management 109 Integration with other UIs, including mobile

110 © 2012 IBM Corporation Enterprise Content Management 110 Integration with Netezza  Use Cases ─ ICA Output for Content Integration with Netezza ─ ICA accesses content in Netezza warehouse ─ Cross system queries between ICA and Netezza

111 © 2012 IBM Corporation Enterprise Content Management 111 Integration with SPSS Step 1: Search and explore (or mine) information to understand source data Step 2: Customize by building content (NLP) and predictive models Analyzed Information Text Mining / Analytics Content (NLP) Modeling Predictive Modeling End Users Analysts Step 3: Expose insights to multiple users and systems (e.g. custom apps, mobile devices, dashboards)

112 © 2012 IBM Corporation Enterprise Content Management 112 Integration with Case Manager  Case Manager is a default crawler  Default properties: –CmAcmBaseCase.FolderName –CmAcmCaseFolder.CmAcmCaseState –CmAcmCaseComment.CmAcmCommentText –Folder.ClassName –Folder.PathName –Folder.DateCreated –Document.ClassName –Document.DateCreated  Additional properties available via configuration

113 © 2012 IBM Corporation Enterprise Content Management 113 IBM Content Analytics: Analysis Export Capability Export 1 Crawled Document Export Export documents with their metadata and content as they are crawled 2 Analyzed Document Export Export documents with the results of text analytics such as Natural Language Processing, Named Entity Extraction, classification or user-implemented logic before indexing 3 Searched Document Export Export documents limited by search or analysis with original content from the index RDB Limit documents by search or analysis Content Analytics Crawler Data Store Parser / Tokenizer UIMA Annotators Indexer Search Index Plug-in Exporter IBM Master Data Mgmt Content Intelligence Consumers ECM Solutions Import InfoSphere

114 © 2012 IBM Corporation Enterprise Content Management 114 Thank You! Content Analytics Putting Your Content in Motion

115 © 2012 IBM Corporation Enterprise Content Management BACKUP 115

116 © 2012 IBM Corporation Enterprise Content Management 116 Basic Analytics and Search Concepts  Structured Content – Data that has unambiguous values and is easily processed by a computer program.  Unstructured Content – Information that is generally recorded in a natural language as free text.  Text Analytics – A form of natural language processing that includes linguistic, statistical, and machine learning techniques for analyzing text and extracting key information.  Collection – A set of data sources and options for crawling, parsing, indexing, and searching those data sources.  Analytics Collection – A collection that is set up to be used for content mining.  Search Collection – A collection that is set up to be used for search applications.  Crawler – A software program that retrieves documents from data sources and gathers information that can be used to create search indexes.  Annotator – A software component that performs specific linguistic analysis tasks and produces and records annotations.  Parser – A program that interprets documents that are added to the data store. The parser extracts information from the documents and prepares them for indexing, search, and retrieval.

