Published by Joel Lee
1
© 2012 IBM Corporation
Content Analytics with Enterprise Search
Putting Your Content in Motion: Realize the value of content to transform your business
2
© 2012 IBM Corporation Enterprise Content Management
Agenda: IBM Content Analytics and Enterprise Search
- Introduction
- Components
- Architecture
- Administration
- Security
- Development
- Integrations
3
IBM Content Analytics is a platform to derive rapid insight
- Transform raw information into business insight quickly, without building models or deploying complex systems.
- Derive insight in hours or days, not weeks or months.
- Easy for all knowledge workers to use to search and explore content.
- Flexible and extensible for deeper insights.
4
Content Analytics: going from raw information to rapid insight
- Aggregate and extract from multiple sources: form large text-based collections from multiple internal and external sources (and types), including ECM repositories, structured data, social media and more.
- Organize, analyze and visualize enterprise content (and data) by identifying trends, patterns, correlations, anomalies and business context from collections.
- Search and explore collections to derive insight: confirm what is suspected or uncover something new, before customizing models and integrating with other systems and processes.
Uncover business insight through a unique visual-based approach.
5
IBM Content Analytics – A platform for rapid insight
- Multiple views for visual analysis, exploration and investigation
  - 8 unique views of content, including subdocument views
- Dynamically search and explore content for new business insight
  - Connections and Dashboard views to easily detect insights
  - Add your own custom views
- Powerful solution modeling and support for advanced classification tools for more accurate and deeper insight
  - Enhanced analytics configuration tools
- Deliver rapid insight to other systems, users and applications for a complete business view
  - Quickly generate Cognos BI reports; link between Cognos reports and ICA views
  - Deliver analysis to IBM Case Manager solutions
6
Content Analytics – A platform for rapid insight
Views: Document Analysis, Facets, Time Series, Deviations / Trends, Facet Pairs, Connections, Sentiment, Dashboard
7
Enterprise Search – Delivering analytics-driven search
- Secure, scalable enterprise search featuring high-performance faceted navigation, saved searches, search profiles, document previews, type-ahead and more
- Enterprise-wide content reach, with support for ~30 content sources
- Standards-based environment, including Lucene and UIMA, for the analysis, discovery, composition, development and deployment of unstructured information
- Powerful, flexible, customizable user interface
  - Facet tree, time series, query tree, query builder, custom plug-ins, drag-and-drop panes, duplicate detection, document clustering and more
8
Enterprise Search
Features: Facets, Time Series, Query Tree, Near-Dup Detection, Find Similar, Clustering, Query Builder, Custom plug-in, Drag/Drop panes
9
What is Text Analytics?
Text Analytics (NLP*) describes a set of linguistic, statistical, and machine learning techniques that allow text to be analyzed and key information to be extracted for business integration.
What is Content Analytics?
Content Analytics (Text Analytics + Mining) refers to the text analytics process plus the ability to visually identify and explore trends, patterns, and statistically relevant facts found in various types of content spread across internal and external content sources.
Text Analytics is the basis for Content Analytics.
* Natural Language Processing
Example text: "Not only was the pick-up line at the counter very long, but I waited 30 minutes just to talk to a rude representative who gave me a car that smelled like smoke, had stained floor mats, a dented fender, and only half a tank of gas."
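To make the definition concrete, here is a minimal Python sketch of dictionary-based extraction applied to the example complaint above. The lexicon, the facet labels and the function name `extract_issues` are hypothetical illustrations, not ICA resources. Real text analytics adds tokenization, part-of-speech tagging and far richer dictionaries and rules.

```python
import re

# Hypothetical lexicon: phrase -> facet label. Illustrative only, not an ICA dictionary.
ISSUE_LEXICON = {
    "very long": "Service: wait time",
    "rude": "Staff: attitude",
    "smelled like smoke": "Vehicle: odor",
    "stained": "Vehicle: cleanliness",
    "dented": "Vehicle: damage",
    "half a tank": "Vehicle: fuel level",
}

SAMPLE = (
    "Not only was the pick-up line at the counter very long, but I waited "
    "30 minutes just to talk to a rude representative who gave me a car that "
    "smelled like smoke, had stained floor mats, a dented fender, and only "
    "half a tank of gas"
)


def extract_issues(text, lexicon=ISSUE_LEXICON):
    """Return (phrase, facet) pairs in the order the phrases occur in the text."""
    lowered = text.lower()
    hits = []
    for phrase, facet in lexicon.items():
        # Word boundaries stop "rude" from matching inside a word such as "prudent".
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
            hits.append((match.start(), phrase, facet))
    return [(phrase, facet) for _, phrase, facet in sorted(hits)]


for phrase, facet in extract_issues(SAMPLE):
    print(f"{facet:22} <- {phrase!r}")
```

Each hit becomes a facet value that can later be counted and correlated across a whole collection, which is the step that turns text analytics into content analytics.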
10
IBM Content Analytics – How it works
- Source information: corporate (contact center, test data, dealer notes, ECM, etc.) and external (NHTSA, Edmunds, Consumer Reports, MotorTrend, etc.)
- Input and integration points: Content Analytics crawlers, IBM Master Data Management, RDB, real-time NLP REST API, Content Push API
- Content Analytics UIMA pipeline and annotators, with fine-grained control over the entities and facets that are created
- Analyzed content (and data): "Owner" "reports" "check engine lite" "flashes" "after refueling"... is tagged with parts of speech (Noun, Verb, Noun Phrase, Prep Phrase) and semantic labels (Person, Issue, Warning, Driver action, Component)
- Extracted concept: Issue: "Engine Light"; Situation: "Refueling"
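The flow above can be sketched in Python as an ordered list of annotators that share one analysis structure, loosely modeled on how a UIMA pipeline passes a CAS from annotator to annotator. The `Analysis` class, the annotator functions and the spelling dictionary are hypothetical; a real ICA pipeline runs UIMA (Java) annotators such as LanguageWare.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """Stand-in for a UIMA CAS: the document text plus annotations added so far."""
    text: str
    annotations: list = field(default_factory=list)  # (type, covered text, value)

    def values(self, kind):
        return [value for k, _, value in self.annotations if k == kind]


def tokenizer(cas):
    for word in re.findall(r"[a-z]+", cas.text.lower()):
        cas.annotations.append(("Token", word, word))


def normalizer(cas):
    # Hypothetical spelling dictionary: rewrites token values, keeps the covered text.
    fixes = {"lite": "light"}
    cas.annotations = [
        (k, covered, fixes.get(value, value)) if k == "Token" else (k, covered, value)
        for k, covered, value in cas.annotations
    ]


def issue_annotator(cas):
    tokens = cas.values("Token")
    for first, second in zip(tokens, tokens[1:]):
        if (first, second) == ("engine", "light"):
            cas.annotations.append(("Issue", f"{first} {second}", "Engine Light"))


def situation_annotator(cas):
    if any(token.startswith("refuel") for token in cas.values("Token")):
        cas.annotations.append(("Situation", "refueling", "Refueling"))


PIPELINE = [tokenizer, normalizer, issue_annotator, situation_annotator]


def run_pipeline(text, pipeline=PIPELINE):
    cas = Analysis(text)
    for annotator in pipeline:  # order matters: later annotators read earlier output
        annotator(cas)
    return cas


def facets(cas, kinds=("Issue", "Situation")):
    """Collect facet values from the annotation types that feed facets."""
    return {kind: cas.values(kind) for kind in kinds if cas.values(kind)}


print(facets(run_pipeline("Owner reports check engine lite flashes after refueling")))
```

Dropping `normalizer` from the pipeline makes the Issue facet disappear, because `issue_annotator` then only sees the raw token "lite"; this is why the order of annotators in the pipeline matters.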
11
Agenda: IBM Content Analytics and Enterprise Search (Introduction, Components, Architecture, Administration, Security, Development, Integrations)
12
Content Analytics with Enterprise Search – Components
- Content Analytics Miner
- Enterprise Search Application
- Content Analytics Studio
13
Content Analytics Miner
- Documents view: lists documents limited by a query
- Facets view: lists keywords in a facet
- Time Series view: shows frequency changes over time
- Deviations view: shows deviation of keywords on a cyclic timeline
- Trends view: detects sharp increases over time
- Facet Pairs view: shows two-dimensional facet correlation
- Connections view: shows relationships between different facets
- Sentiment view: shows the sentiment behind facets and content
- Dashboard view: shows multiple analysis views in various charts and tables
14
Content Analytics Miner – Documents View
Basic document view
15
Content Analytics Miner – Document Analysis View
Detailed document analysis view
16
Content Analytics Miner – Facets View
Facets with corresponding keywords, frequency and correlation
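The slide does not give the correlation formula. A common way to express it, and the assumption behind this Python sketch, is a lift-style ratio: the keyword's density in the documents matching the current query divided by its density in the whole collection. ICA's exact calculation may differ; the function names and numbers are illustrative.

```python
def correlation(hits_in_subset, subset_size, hits_in_collection, collection_size):
    """Lift-style correlation of a facet keyword with the current result subset.

    1.0: the keyword is as frequent in the subset as in the whole collection.
    Above 1.0: over-represented in the subset; below 1.0: under-represented.
    """
    if not subset_size or not hits_in_collection or not collection_size:
        return 0.0
    subset_density = hits_in_subset / subset_size
    collection_density = hits_in_collection / collection_size
    return subset_density / collection_density


def rank_keywords(subset_counts, collection_counts, subset_size, collection_size):
    """Sort keywords by correlation, highest first."""
    scores = {
        keyword: correlation(subset_counts.get(keyword, 0), subset_size, total, collection_size)
        for keyword, total in collection_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# 10,000 documents overall; the current query matches 500 of them.
collection = {"engine light": 200, "brakes": 1000, "radio": 400}
subset = {"engine light": 60, "brakes": 50, "radio": 5}
for keyword, score in rank_keywords(subset, collection, 500, 10_000):
    print(f"{keyword:12} {score:.2f}")
```

Ranking by this ratio rather than by raw frequency is what lets a rare keyword such as "engine light" stand out above a frequent but evenly spread one such as "brakes".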
17
Content Analytics Miner – Time Series View
Time series for selected content
18
Content Analytics Miner – Deviations View
Deviations of a facet or facet value for a given period of time
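A Deviations view compares what was observed at each position of a cycle (for example, each day of the week) with what would be expected if the keyword were spread evenly. This Python sketch uses a standardized residual for that comparison. It is a swapped-in textbook measure, not necessarily the formula ICA uses, and the function names are hypothetical.

```python
import math
from collections import Counter
from datetime import date


def weekday_counts(dates):
    """Fold dates onto a weekly cycle: index 0 = Monday ... 6 = Sunday."""
    counts = Counter(d.weekday() for d in dates)
    return [counts.get(day, 0) for day in range(7)]


def deviations(keyword_counts, period_totals):
    """Standardized residual per cycle position: (observed - expected) / sqrt(expected).

    expected = documents at that position * keyword share of all documents.
    """
    grand_total = sum(period_totals)
    share = sum(keyword_counts) / grand_total if grand_total else 0.0
    scores = []
    for observed, total in zip(keyword_counts, period_totals):
        expected = total * share
        scores.append((observed - expected) / math.sqrt(expected) if expected > 0 else 0.0)
    return scores


# 70 documents spread evenly over the week; the keyword appears 7 times, all on Mondays.
print(deviations([7, 0, 0, 0, 0, 0, 0], [10] * 7))
print(weekday_counts([date(2012, 1, 2), date(2012, 1, 3)]))
```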
19
Content Analytics Miner – Trends View
Trends of a facet or facet value for a given period of time
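One simple way to detect the "sharp increase over time" that the Trends view highlights is to compare each period with a moving baseline of the periods before it. The window size, the 2x threshold and the function names below are assumptions for illustration, not ICA settings.

```python
def trend_scores(series, window=3):
    """For each period after the first `window`, the ratio of its count to the
    mean of the preceding `window` periods."""
    scores = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if baseline:
            score = series[i] / baseline
        else:
            # Anything after a run of zeros is an unbounded increase.
            score = float("inf") if series[i] else 0.0
        scores.append((i, score))
    return scores


def sharp_increases(series, window=3, threshold=2.0):
    """Indexes of periods whose count is at least `threshold` times the recent baseline."""
    return [i for i, score in trend_scores(series, window) if score >= threshold]


monthly_mentions = [10, 12, 11, 30, 12, 13]
print(sharp_increases(monthly_mentions))
```

Unlike the deviation sketch above the slide, this compares a period only with its recent past, so a keyword that is always high does not keep triggering.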
20
Content Analytics Miner – Facet Pairs Table View
Facet pairs show how one facet relates to another
21
Content Analytics Miner – Facet Pairs Birdseye View
Quickly identify the highly correlated intersections among all the data
22
Content Analytics Miner – Facet Pairs Grid View
Detailed view of the selected portion of the birdseye view
23
Content Analytics Miner – Connections View
Identify relationships between different facets
24
Content Analytics Miner – Sentiment View
Explore the sentiment behind facets, and see positive and negative expressions and the content attributed to the sentiment
25
Content Analytics Miner – Dashboard View
View multiple analysis results in one place
26
Enterprise Search Application
Basic enterprise search:
- facets view
- type-ahead
- save search
- search within results
- search by file type
- user preferences
- query expansion
- thumbnails
- and more
27
Search Application
28
Search Application
Type-ahead search:
1. Suggests queries based on index content and past queries
2. Shows the estimated result count as part of each suggestion
3. Customizable by search administrators
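Here is a minimal Python sketch of the behaviour listed above. Candidates come from indexed terms and past queries that start with the typed prefix, and each suggestion carries an estimated result count. In this sketch the estimate is simply the term's document count, and the ranking rule (past popularity first, then count) is an assumption. The function and parameter names are hypothetical, and administrator customizations are not modelled.

```python
from collections import Counter


def suggest(prefix, index_term_counts, past_queries, limit=5):
    """Return (suggestion, estimated result count) pairs for a typed prefix.

    index_term_counts: lowercase indexed term -> number of documents containing it
    past_queries: previously submitted query strings
    """
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    popularity = Counter(q.strip().lower() for q in past_queries)
    candidates = {term for term in index_term_counts if term.startswith(prefix)}
    candidates |= {query for query in popularity if query.startswith(prefix)}
    # Rank by past popularity, then by estimated result count, then alphabetically.
    ranked = sorted(
        candidates,
        key=lambda s: (-popularity[s], -index_term_counts.get(s, 0), s),
    )
    return [(s, index_term_counts.get(s, 0)) for s in ranked[:limit]]


index = {"engine": 120, "engine light": 45, "engineering": 300, "exhaust": 80}
history = ["engine light", "Engine light ", "engineering"]
print(suggest("eng", index, history))
```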
29
Search Application
Save your search and re-execute saved queries
30
Search Application
Search within the current result set
31
Search Application
Quick select for file type searching
32
Search Application
1. Toggles document properties, language, source and type on and off
2. Allows users to set individual results display preferences
33
Search Application
Automatic query expansion suggestions and spell check
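Spell-check suggestions are commonly produced by finding indexed terms within a small edit distance of the query term. The Python sketch below assumes Levenshtein distance and prefers the more frequent term on ties. ICA's actual spell-correction and query-expansion logic is not described in the slides, and `did_you_mean` is a hypothetical name.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def did_you_mean(term, vocabulary, max_distance=2):
    """vocabulary maps indexed terms to document frequency; None means no suggestion."""
    if term in vocabulary:
        return None
    close = [(edit_distance(term, word), -freq, word) for word, freq in vocabulary.items()]
    close = [c for c in close if c[0] <= max_distance]
    # Smallest distance wins; on ties, the more frequent term wins.
    return min(close)[2] if close else None


vocabulary = {"engine": 120, "engineer": 40, "light": 90}
print(did_you_mean("engnie", vocabulary))
```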
34
Search Application
Thumbnail view of the first page of documents in the results page
35
Search Application
Faceted search provides drill-through capabilities out of the box and can be customized by the business
36
Enterprise Search Application
Analytics-driven search:
- timeline view
- facet correlation
- named entity annotator
- document clustering
- document flagging
- duplicate/near-duplicate identification
- query builder
- custom panels
- and more
37
Search Application
Document clustering
38
Search Application
Timeline view
39
Search Application
Named entity annotations, with color indicators for high correlation
40
Search Application
Document flagging
41
Search Application
- Near-duplicate content identification
- Duplicate content identification
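Near-duplicate detection is often explained with word shingling and Jaccard similarity. The Python sketch below uses that textbook technique as a stand-in, since the slides do not say how ICA computes it. The shingle size k=3 and the 0.6 threshold are arbitrary assumptions, and `classify_pair` is a hypothetical name.

```python
import re


def words(text):
    return re.findall(r"\w+", text.lower())


def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles")."""
    tokens = words(text)
    return {tuple(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 1))}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_pair(doc_a, doc_b, near_threshold=0.6, k=3):
    """Label a document pair as duplicate, near-duplicate or distinct."""
    if words(doc_a) == words(doc_b):  # identical after case/punctuation normalization
        return "duplicate"
    if jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= near_threshold:
        return "near-duplicate"
    return "distinct"


original = ("The quarterly report shows revenue growth in all regions and strong "
            "demand for analytics products across the enterprise market")
edited = original.replace("strong", "solid")
print(classify_pair(original, edited))
```

With k=3, changing one word in this 19-word sentence alters three of its 17 shingles, giving a similarity of 14/20 = 0.7, which the sketch labels a near-duplicate.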
42
Search Application
Query Builder
43
Search Application
Customizable panels and layout
44
Agenda: IBM Content Analytics and Enterprise Search (Introduction, Components, Architecture, Administration, Security, Development, Integrations)
45
Architecture
46
Architecture (diagram)
- Common infrastructure: raw data store, scheduler, logging, control, configuration, monitor, security
- Crawler framework, with crawler plug-ins and custom crawlers: QuickPlace, Domino Document Manager, Notes, SharePoint, Exchange Server, NNTP, DB2, JDBC database, Content Integrator, DB2 Content Manager, FileNet P8, Web, Seed List, Web Content Manager, WebSphere Portal, Windows file system, UNIX file system and Agent for File System crawlers
- Search collection and analytics collection, each with: document processors 1 to Y (parser, document generator, UIMA annotators); an indexer service with the indexer and an export plug-in; global processing (web link analysis, thumbnail generation, cluster analysis); an exporter; a document cache, thumbnail index, facet count sub-index, taxonomy index and search index
- Search runtime: search nodes 1 to Y, serving the search application and the REST API application
- Analytics runtime: analytics nodes 1 to Y, serving the Content Miner application and the REST API application
- Admin application
- Optional: BigInsights server with a search index
47
Scalability – Challenge and Approach
Challenge
- Achieve massive scale-out
- Use a cloud environment as a resource pool
Approach
- Keep compatibility with the current version to respect existing customers
  - No end-user impact
  - Seamless administration
- Use current assets
  - UIMA infrastructure
  - UIMA annotators (LW, System-T, Takmi, ...)
  - Various data source crawlers
- Use BigInsights as the scale-out infrastructure
48
Seamless Scale-out Options
Content Analytics with Enterprise Search offers 3 types of system configuration, according to the volume of data:
- A POC with small data can be done on a single workstation
- A production system is deployed to 1 to N servers
- A production system analyzing big data uses BigInsights*
* BigInsights is supported only on Linux
49
Feature Overview: Collection on BigInsights
- Search and text analytics capability: UIMA, System-T, Advanced Tuning Rules (Gumshoe)
- Scale-out: IBM Hadoop, ILEL BigIndex
- Flexible job flow: Orchestrator (a.k.a. MetaTracker)
- Easy data manipulation: JAQL
- Robust file system: GPFS (Shared Nothing Cluster version, not yet released)
50
ICAwES – Analytics Flow on BigInsights (diagram)
- Components: crawler (various data sources), importer, text analytics / search runtime with slave index, exporter, UI and other applications; ICA runs on a regular OS alongside IBM InfoSphere BigInsights
- Job flow: controlled by the Orchestrator (MetaTracker) through job requests; operations run as JAQL over HDFS/GPFS (RDS, cache and custom data)
- Document processing flow: pre-processing; local analysis, UIMA-based (UIMA annotators such as LanguageWare, TAKMI and user custom annotators; System-T analysis; Gumshoe LA); global analysis (ICA GA: link analysis, duplicate document elimination, facet grouping, custom GA; Gumshoe GA; Gumshoe relevancy); indexing with BigIndex
51
Differences: In General
Aspect | Regular collection | BigInsights collection
Time to refresh index | Quick | Lazy
Scalability | Under 10 servers | Over 100 servers
Flexibility | System must have peak capacity | System resources can be allocated as required
Best for the use case | Documents are continuously added, removed or updated; a powerful server is available | A large number of documents is processed at once; BigInsights is already in place; flexibility is needed
52
Differences: Supported Features
Feature | Regular collection | BigInsights collection
Rebuild from index | Supported; resumable | Supported, but not resumable
Optional facet index (index for facet counting) | Supported; index for ILEL facets | Supported; string-based non-ILEL facet index
Thumbnail generation | Supported; can be skipped when rebuilding; needs the document cache | Supported; always rebuilt; thumbnails available without the cache
Document status | Supported | Supported; the index document status page also needs the searcher running
Custom GA (JAQL) | Not supported | Supported
Flag | Supported | Not supported
Export flagged documents | Supported | Not supported
Reorg index | Supported | Not supported
53
Easy Configuration
- Specify the BigInsights server information; the admin user can confirm the setting in the Topology view
- Specify "Use IBM BigInsights" while creating a collection
  - Configuration files, ICA libraries, UIMA PEARs (including custom PEARs) and other required modules are then distributed to the BigInsights servers automatically
54
Advanced configuration on BigInsights
- Maximum memory size of some Hadoop tasks (default: 1024 MB); the maximum must be increased when memory-consuming annotators are used
- Limit on the total number of RDS files processed at one time (default: unlimited)
- Temporary files used by JAQL/Hadoop grow in proportion to the input RDS file size, and storage is still required for index updates
55
Agenda: IBM Content Analytics and Enterprise Search (Introduction, Components, Architecture, Administration, Security, Development, Integrations)
56
Administration – Dashboard User Interface
Provides a dashboard-style UI for administration. From these views, the administrator can move to the configuration panel in one step.
- Collection Dashboard view: monitor the status of components in one panel, with no need to switch between edit and monitor modes
- System Dashboard view: monitor and manage multiple servers
- Security Dashboard view: configure security settings
57
Administration – Collection Dashboard View
All monitor and edit functions for collections are integrated into one view:
- Export monitor
- Start / stop multiple crawlers
- Tree-style context menu items link to the existing edit pages
- Import progress status
58
Administration – Collection Actions
The administrator can perform the following general actions for each collection:
- Settings: edit collection settings; view collection settings
- Logging: view log files; configure log file options; configure alerts; configure email options for messages
- Clone this collection
- Delete this collection
59
Administration – Crawl and Import
- Add a new crawler (link to the "Create crawler" wizard)
- Import CSV documents (link to the "Import CSV documents" wizard)
- Start / stop multiple crawlers
60
Administration – Parse and Index
- Expand tree menu: link to parse and index settings
- Status and operations for each document processor
- Annotator status (enabled/disabled), with its own icon for each annotator, and a link to annotator configuration
- Status and operations for global processing
- Configure export: the export component is displayed below the component
61
Administration – Search and Analytics
The component is displayed when the export setting is configured; the deep inspector and Cognos BI report components have the same behavior.
- Expand tree menu: link to search and text analytics settings
- Link to the Query Statistics page
- Status for each searcher
- Configure export
62
Administration – Export
The administrator can export the following documents for use in other applications:
- Crawled documents (exported from “Crawl and Import”)
- Analyzed documents (exported from “Parse and Index”)
- Searched documents (exported from “Search”)
63
Administration – Confirmation Dialog for Auto Logout
(New feature) The admin UI automatically logs the administrator out of all pages after 30 minutes without any operation, and shows a confirmation dialog 5 minutes before the automatic logout.
64
Administration – Collection Cloning
Users can create a new collection whose configuration is cloned from another collection:
- Only the configuration is copied; data (such as the index) is not copied
- Some collection options can be modified when cloning
- The collection type cannot be changed
65
Administration – System Dashboard View
Administrators can configure multi-server settings with grid and topology views:
- Link to query statistics
- Start / stop multiple servers
- Start / stop a server
- Servers shown: backup server, master server, IBM InfoSphere BigInsights server
66
Administration – Security Dashboard View
Login, collection-level and system-level security can be checked and configured on one dashboard:
- Login security
- Collection level security
- System level security
67
Administration – Roles
The master administrator can define the role of each administrator.
- 9 roles are available, 4 of which are new: Facet tree administrator, Rule-based category administrator, Dictionary administrator and Application customizer
- For example, when a customer has analysts who maintain user dictionaries, the master administrator can assign them as dictionary administrators, who can edit user dictionaries but do not have the privilege to start or stop sessions. Administrators assigned as dictionary administrators:
  - edit user dictionaries via the admin UI
  - can only monitor (no operations are allowed)
  - see only an edit menu for dictionaries
68
Administration – Role Comparison
Role | Description | Monitor | Operation (start/stop) | Edit configuration
Master administrator | Administer all aspects of your system | Both collection types | All operations | All configuration pages
Collection administrator | Edit, monitor, and control collection operations | Both collection types | All operations | Collection-related pages
Operator | Monitor and control collection operations | Both collection types | All operations | No
Monitor | Monitor collections | Both collection types | No | No
Content analytics administrator | Edit and monitor analytic resources | Content analytics collections only | Analytic resources | Rule-based categories, facet tree, dictionaries
Facet tree administrator | Configure the facet tree for analytics collections | Content analytics collections only | No | Facet tree
Rule-based category administrator | Configure rule-based categories | Both collection types | No | Rule-based categories
Dictionary administrator | Configure dictionaries for analytics collections | Content analytics collections only | No | Dictionaries
Application customizer | Customize applications | Both collection types | No | Configure applications via the customizer
69
Search Customizer
The administrator can modify major search UI configurations through the customizer GUI, with no need to restart the search session.
Customization points:
- Server configuration: search server's host name, port, timeout, ...
- Appearance: displayed application name, logo image, show/hide links, data source icons, ...
- Default values for search UI preferences: search page, facets, top results, results, result columns
(Screenshots: customizer dialog, customizer controls)
70
Search Customizer
- Title and URL filter: use a specific field value as the title or URL of a document (a field value modified by a regular expression can also be used); multiple filters can be defined, in order; each filter can be restricted to a specific collection or data source
- Layout customizer: define the default pane and container layout by drag-and-drop, and specify for the left, right, top and bottom containers whether each is enabled, whether it is expanded, and its default width or height
- Analytics mode: enable analytics mode for the Enterprise Search application
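The title and URL filter behaviour (an ordered list of rules, each optionally scoped to a collection or data source, each rewriting a field value with a regular expression) can be sketched in Python as follows. The `TitleFilter` class, the document layout, the field names and the first-match-wins rule are assumptions. In ICA these filters are configured in the customizer, not written as code.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TitleFilter:
    """One hypothetical title rule: take `field` and rewrite it with a regex."""
    field: str
    pattern: str
    replacement: str = r"\g<0>"       # default: leave the matched text unchanged
    collection: Optional[str] = None  # None = applies to every collection
    source: Optional[str] = None      # None = applies to every data source


def resolve_title(doc, filters, default_field="title"):
    """Apply filters in order; the first one that matches decides the title."""
    fields = doc.get("fields", {})
    for f in filters:
        if f.collection is not None and f.collection != doc.get("collection"):
            continue
        if f.source is not None and f.source != doc.get("source"):
            continue
        value = fields.get(f.field)
        if value is not None and re.search(f.pattern, value):
            return re.sub(f.pattern, f.replacement, value, count=1)
    return fields.get(default_field, "")


# Strip any number of leading "RE:" prefixes from e-mail subjects in the "claims" collection.
rule = TitleFilter("subject", r"^(?:RE:\s*)+(.*)$", r"\1", collection="claims")
doc = {"collection": "claims",
       "fields": {"subject": "RE: RE: Claim 4711 engine light", "title": "msg001.eml"}}
print(resolve_title(doc, [rule]))
```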
71
Search Customizer – Examples
- Show fields as result table columns
- Change the order of columns in results pages
- Add or remove custom fields
(Screenshots: default and customized)
72
Query Statistics
The query statistics UI shows:
- Number of queries, number of users, average response time (ms) and worst response time (ms) over time
- Query popularity
- History of submitted queries
Query statistics enables you to:
- Export history data to a CSV file
- Change the time range, collection or user ID
- Switch between chart and table displays
- Refresh data automatically
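The chart values and the CSV export described above can be illustrated with a small Python sketch over a query log. The `QueryLogEntry` record, the per-day grouping and the CSV column names are hypothetical. ICA produces these statistics itself, and its export format may differ.

```python
import csv
import io
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class QueryLogEntry:
    day: str          # e.g. "2012-06-01"
    user_id: str
    query: str
    response_ms: int


def daily_statistics(entries):
    """Per-day number of queries, distinct users, average and worst response time."""
    by_day = defaultdict(list)
    for entry in entries:
        by_day[entry.day].append(entry)
    rows = []
    for day in sorted(by_day):
        times = [e.response_ms for e in by_day[day]]
        rows.append({
            "day": day,
            "queries": len(times),
            "users": len({e.user_id for e in by_day[day]}),
            "avg_ms": sum(times) / len(times),
            "worst_ms": max(times),
        })
    return rows


def popular_queries(entries, top=10):
    """Query popularity, ignoring case and surrounding whitespace."""
    return Counter(e.query.strip().lower() for e in entries).most_common(top)


def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["day", "queries", "users", "avg_ms", "worst_ms"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


log = [
    QueryLogEntry("2012-06-01", "u1", "engine", 100),
    QueryLogEntry("2012-06-01", "u2", "Engine", 300),
    QueryLogEntry("2012-06-02", "u1", "brakes", 50),
]
print(to_csv(daily_statistics(log)))
```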
73
Agenda: IBM Content Analytics and Enterprise Search (Introduction, Components, Architecture, Administration, Security, Development, Integrations)
74
Multiple Levels of Security
- System level security: OS and network security, encryption
- Web application security
- Administrative security
- Collection level security
- Document level security (also known as secure search)
75
System Level Security
The login setting can be configured on the security dashboard.
Note: The ICA server must be restarted for the change to take effect.
76
Web Application Security
When running on WebSphere Application Server (WAS), global security must be configured for the login setting.
77
Administrative Security
The ICA administrator is usually referred to as “esadmin”:
- esadmin's password is stored in es.cfg
- esadmin can always use any resources in ICA, such as the OS, network and web application
The password of the OS user and the password stored in es.cfg must be kept synchronized. You can change the password in es.cfg with:
  $ \bin\eschangepw[.sh] newpassword
78
Administrative Security
- esadmin can delegate parts of the administrative roles to individual users
- esadmin can define which collections are controlled by the specified users
  - For details on each role, see the Admin UI materials
79
Collection Level Control
- Each collection is associated with one or more application IDs (AppIDs)
- A search application presents an AppID and sees only the collections associated with that AppID
- The predefined AppIDs (All, Search, Analytics) are automatically associated with collections based on the collection type
- Configured on the security dashboard of the Admin UI
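The AppID rule can be sketched in Python as a set lookup. The mapping of predefined AppIDs to collection types used here (All for every collection, Search for search collections, Analytics for content analytics collections) is an assumption read from the slide, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed mapping of predefined AppIDs to collection types.
PREDEFINED_APP_IDS = {
    "search": {"All", "Search"},
    "analytics": {"All", "Analytics"},
}


@dataclass
class Collection:
    name: str
    kind: str                                         # "search" or "analytics"
    custom_app_ids: set = field(default_factory=set)

    def app_ids(self):
        """Custom AppIDs plus the predefined ones implied by the collection type."""
        return self.custom_app_ids | PREDEFINED_APP_IDS[self.kind]


def visible_collections(app_id, collections):
    """Names of the collections that a search application presenting `app_id` may query."""
    return [c.name for c in collections if app_id in c.app_ids()]


collections = [
    Collection("intranet", "search", {"HRPortal"}),
    Collection("claims", "analytics"),
]
print(visible_collections("Analytics", collections))
```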
80
Document Level Security
Ensures that users can search only the documents they have access rights to.
Prerequisites for document level security:
- Web application authentication must be enabled, by the login setting or by global security
- Security must be enabled for the collection when it is created; this cannot be done after the collection is created
Two types of access control are supported:
- Access control by security token (token security)
- Native ACLs inherited from the data sources (native security)
Token security is rarely used; it is needed only for special requirements.
81
Document Level Security by Security Token
Security tokens can be assigned at crawl time by:
- Adding a fixed value as the security token
- Assigning the security token based on field values (only some crawlers)
- Attaching the token programmatically using a custom crawler plug-in
The search application must be customized to pass the tokens held by the current user. The search engine returns a document only if the given tokens match the security tokens indexed for that document.
Flow (diagram: data source, crawler with plug-in, parser, indexer, search index, search runtime):
1. Security tokens are assigned to documents, or extracted from the native data source
2. User authentication and credential retrieval
3. Results are filtered by matching security tokens with user credentials
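The three steps above can be sketched in Python. The sketch assumes that a document is returned when at least one of its security tokens is among the user's tokens, that documents without tokens are hidden, and that field-based tokens are comma-separated. `assign_tokens` and `secure_search` are hypothetical names, and the substring match stands in for the real search engine.

```python
def assign_tokens(doc, fixed_tokens=(), token_field=None):
    """Crawl-time step: attach fixed tokens plus tokens read from a document field."""
    tokens = set(fixed_tokens)
    if token_field and doc.get(token_field):
        tokens.update(t.strip() for t in doc[token_field].split(",") if t.strip())
    return {**doc, "security_tokens": tokens}


def secure_search(index, query, user_tokens):
    """Query-time step: keep matching documents whose tokens intersect the user's tokens."""
    user_tokens = set(user_tokens)
    return [
        doc["id"]
        for doc in index
        if query.lower() in doc["text"].lower() and doc["security_tokens"] & user_tokens
    ]


docs = [
    {"id": "d1", "text": "Engine light recall plan", "dept": "engineering, quality"},
    {"id": "d2", "text": "Engine light staffing plan", "dept": "hr"},
]
index = [assign_tokens(d, token_field="dept") for d in docs]
# The user's tokens would come from authentication and credential retrieval (step 2).
print(secure_search(index, "engine light", ["quality"]))
```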
82
Agenda: IBM Content Analytics and Enterprise Search (Introduction, Components, Architecture, Administration, Security, Development, Integrations, Technical Information)
85
IBM Content Analytics Studio Text Analytics Catalog
(Diagram: Internet download, local disk, Text Analytics Catalog for ICA, drag-and-drop, IBM Content Analytics Studio)
86
What is the Text Analytics Catalog?
An Eclipse-based inventory of over 230 text analytics available for deployment into IBM Content Analytics.
Features include:
- Analytics organized into an easy-to-browse tree of functional categories
- Search function for rapid location of specific analytics
- Ability to arrange the order of text analytic execution in the UIMA pipeline
- A 3-step wizard for easy deployment into ICA
87
Why is the Text Analytics Catalog Useful?
- Greatly reduces the time to deploy multiple text analytics into ICAwES (from days to minutes)
  - Excellent for rapid development of demos and POCs
- Bridges the “learning curve” gap between what ICAwES offers out of the box and developing text analytics in Content Analytics Studio
  - Obviates the need to build a consolidated UIMA pipeline of the selected text analytics in LanguageWare (this is done automatically)
- Can be used to jump-start the Content Analytics Studio development process
- Provides a one-stop shopping experience
  - Currently, assets are spread among different groups and wikis in varying degrees of assembly, maturity, and documentation
88
How does the Text Analytics Catalog Work?
- Implemented as a folder (directory) tree within an Eclipse project
  - Text analytic PEAR and/or dictionary files are stored under category folders
- The Text Analytics Catalog browser is implemented as an Eclipse plug-in
  - Provides all the functionality to search, select, and deploy multiple text analytics from the catalog into ICA and LanguageWare
89
How to Extend the Text Analytics Catalog
- To create new categories: create new folder(s) underneath the “Catalog Taxonomy” folder
- To add new text analytics: drag and drop .pear and/or .dic files into the catalog taxonomy folders, then update their detailed information using the catalog browser
90
Content Analytics Studio
Content Analytics Studio is an integrated development environment for creating your own custom analysis engine. With Content Analytics Studio, you can:
- Create language- and domain-specific dictionaries
- Write rules to match character patterns
- Write rules to identify patterns of tokens and other annotations
- Create UIMA annotators based on these dictionaries and rules
- Annotate text documents and view the details of annotations
- Annotate collections of documents
All without needing to write code or understand the underlying technology.
91
Content Analytics Studio
View project resources
92
Content Analytics Studio
Sample text for building a model
93
Content Analytics Studio
UIMA pipeline components
94
Content Analytics Studio
Content Analytics Studio provides an iterative process for building tailored content analytics with ICA (diagram):
- Studio: build, create, modify, analyze and validate the custom engine, then deploy it to ICA
- ICA: crawler session, UIMA document processing session with the annotator, index service session, text analytics and search session, document cache, REST APIs
- Iteration: configure ICA facets, browse annotation results, find possible patterns and flag documents, then extract the flagged documents back into Studio
95
Annotators shipped with IBM Content Analytics with Enterprise Search, with built-in facets
- Person names
- Location names
- Organization names
- Parts of speech such as noun, verb, adjective
- Phrases such as noun phrase, adjective-noun, predicate phrase
- Numbers
- Automatic clustered category creation
- Significant terms detection: actions and predicates
- **Apache RegEx engine, into which any regular expression can be plugged (via XML configuration changes; there is no web UI for this, and facets are not built in)
96
Annotators publicly available in the Content Analytics Studio demonstrator workspace
Dates; Days, Months, Years; Addresses; Cities; States; Postal Codes; People; Aliases; Dates of Birth; Car Brands; Car Parts; Departments; Ordinals; Durations; First Names; Last Names; Titles; Crimes; Criminal Sentences; Trials
97
Content Analytics with Enterprise Search REST API
The REST API is the official API set of ICAwES. It is:
– programming-language independent
– easy for developers to try out
– easy to understand, because communication is text-based
– suited to loosely coupled integration with other products
– compatible with IBM Search REST 2.x, which helps interoperability
The REST API provides almost everything that was offered by SIAPI.
98
REST API
Custom search and administration applications can be implemented with the REST API, in any programming language.
The REST API provides all the functions required to build a search UI:
– Search navigation
– Facet navigation
– Search functions: faceted search; fetching content, thumbnails and document previews; listing spelling corrections, synonym expansions and type-ahead suggestions; and more
It also provides the functions required to administer search:
– Managing collections
– Controlling and monitoring components
– Adding documents to a collection
99
Search REST API Topics (1)
The Search REST API complies with IBM Search REST 2.0 and 2.1, which are also supported by other IBM products:
– WebSphere Portal
– Web Content Management
– IBM Connections
Specification: https://w3-connections.ibm.com/wikis/home?lang=en#/wiki/Wd3961b7b20cc_4eda_a774_2373d278b232/page/Specifications
The JSONP response format was introduced in ICAwES 3.0.
– It is useful for JavaScript binding, because it works around the same-origin policy.
– Use it with care: it can easily lead to security exposures.
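To make the JSONP caution concrete: a JSONP response is the JSON payload wrapped in a call to a JavaScript function whose name the caller supplies, and the page loads it through a `<script>` tag, which the same-origin policy does not block. The generic sketch below (not ICAwES code; the class name and the allow-list pattern are assumptions) shows the wire format and one common safeguard, rejecting callback names that are not plain identifiers.

```java
import java.util.regex.Pattern;

// Generic illustration of the JSONP format, not ICAwES code.
class JsonpSketch {

    // Allow only dotted JavaScript identifiers such as "cb" or "app.onResults".
    private static final Pattern SAFE_CALLBACK =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    /** Wraps a JSON string as a JSONP response, rejecting unsafe callback names. */
    static String wrap(String callback, String json) {
        if (callback == null || callback.length() > 64 || !SAFE_CALLBACK.matcher(callback).matches()) {
            // An unchecked callback name would let a caller inject arbitrary script.
            throw new IllegalArgumentException("Rejected JSONP callback name: " + callback);
        }
        return callback + "(" + json + ");";
    }

    public static void main(String[] args) {
        System.out.println(wrap("app.onResults", "{\"total\":3}"));
    }
}
```

Callback validation addresses only injection. Because any site can include a JSONP URL with the visitor's session cookies, JSONP is generally unsuitable for results that depend on the user's access rights.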
100
Search REST API Topics (2)
To get facet counts:
– Call /facets/namespaces to get the namespace ID.
– Call /facets with the namespace ID to get the facet list and the ID of the specific facet you are interested in.
– Call /search (returns search results) or /search/facet (no search results), specifying the namespace ID, the facet ID, the count and the depth.
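The three facet-counting calls can be sketched as URL builders. Only the endpoint paths and the order of the calls come from the slide; the base URL and every query-parameter name (`collection`, `query`, `namespace`, `facet`, `count`, `depth`) are hypothetical placeholders, so check the ICAwES REST API reference for the real names.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: paths follow the slide; the base URL and all
// parameter names are assumptions, not the documented ICAwES names.
class FacetCountSketch {

    private final String baseUrl;

    FacetCountSketch(String baseUrl) {
        // Strip a trailing "/" so that paths can be appended uniformly.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    /** Step 1: list facet namespaces, to find the namespace ID. */
    URI namespaces(String collection) {
        return uri("/facets/namespaces", params("collection", collection));
    }

    /** Step 2: list the facets in one namespace, to find the facet ID. */
    URI facets(String collection, String namespaceId) {
        return uri("/facets", params("collection", collection, "namespace", namespaceId));
    }

    /** Step 3: facet counts, with search results (/search) or without them (/search/facet). */
    URI facetCounts(String collection, String query, String namespaceId, String facetId,
                    int count, int depth, boolean withResults) {
        String path = withResults ? "/search" : "/search/facet";
        return uri(path, params("collection", collection, "query", query,
                "namespace", namespaceId, "facet", facetId,
                "count", Integer.toString(count), "depth", Integer.toString(depth)));
    }

    private static Map<String, String> params(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps parameter order stable
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    private URI uri(String path, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return URI.create(baseUrl + path + "?" + query);
    }

    public static void main(String[] args) {
        FacetCountSketch api = new FacetCountSketch("http://localhost:8393/api/v10"); // hypothetical
        System.out.println(api.namespaces("col1"));
        System.out.println(api.facets("col1", "ns1"));
        System.out.println(api.facetCounts("col1", "contract", "ns1", "f1", 10, 2, false));
    }
}
```

Each URI can then be sent with `java.net.http.HttpClient`; the IDs for steps 2 and 3 come from parsing the previous response, which is omitted here.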
101
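Admin REST API Topics (1)
Document push API
– add: adds one document per request. The document content can be specified either as a String or as a File.
– addMultiDocs: adds more than one document per request. The document content can be specified only as a File.
How do you specify a File as the value of a parameter?
– Use a multipart request to send the file as the content parameter of an HTTP POST.
– For example, with Apache Commons HttpClient 3.x (url and file are defined elsewhere):
    import org.apache.commons.httpclient.*;                    // HttpClient
    import org.apache.commons.httpclient.methods.*;            // PostMethod, RequestEntity
    import org.apache.commons.httpclient.methods.multipart.*;  // Part, FilePart, MultipartRequestEntity
    PostMethod postMethod = new PostMethod(url);
    Part[] parts = { new FilePart("content", file) };           // throws FileNotFoundException
    RequestEntity request = new MultipartRequestEntity(parts, postMethod.getParams());
    postMethod.setRequestEntity(request);
    int status = new HttpClient().executeMethod(postMethod);   // throws IOException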
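Commons HttpClient 3.x is long retired, so the same multipart POST is sketched here with only the JDK (`java.net.http`, Java 11+), which has no multipart helper and needs the body built by hand. The `content` part name comes from the slide; the endpoint URI and any other form fields (for example, the credential parameters the Admin REST API requires) are supplied by the caller and are assumptions of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Sketch: a multipart/form-data POST built with the JDK only.
// The endpoint and extra field names are hypothetical; "content" is from the slide.
class MultipartPushSketch {

    /** Builds a multipart/form-data body: plain text fields, then one file part. */
    static byte[] multipartBody(String boundary, Map<String, String> fields,
                                String fileField, String fileName, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\"" + field.getKey() + "\"\r\n\r\n");
            write(out, field.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"\r\n");
        write(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.writeBytes(fileBytes);
        write(out, "\r\n--" + boundary + "--\r\n"); // closing delimiter
        return out.toByteArray();
    }

    /** Builds (but does not send) a POST that uploads one file as the "content" parameter. */
    static HttpRequest addRequest(URI endpoint, Map<String, String> fields, Path file) throws IOException {
        String boundary = "ica-" + UUID.randomUUID();
        byte[] body = multipartBody(boundary, fields, "content",
                file.getFileName().toString(), Files.readAllBytes(file));
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

To send the request: `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The Admin REST API also expects the BASIC `Authorization` header, which this sketch leaves out.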
102
Admin REST API Topics (2)
Authentication
– Access to the Admin REST API requires HTTP BASIC authentication as an ICAwES administrative user.
– In addition, to prevent Cross-Site Request Forgery (CSRF) attacks, every Admin REST API request must pass the user name and password as the values of the api_username and api_password parameters. Specify the same user name and password that you use for BASIC authentication.
Authorization
– Each API specifies the user role required to execute it.
Limitation
– If SSO is enabled, only esadmin (the default administrative user) can access the Admin REST API.
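Putting the two authentication requirements together, an Admin REST API request carries the administrator credentials twice: in the BASIC `Authorization` header and as the `api_username`/`api_password` parameters. The sketch below builds such a request with `java.net.http`; sending the two parameters as URL-encoded POST form fields is an assumption, and the endpoint is a placeholder.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: same credentials in the BASIC header and in api_username/api_password
// (the CSRF guard). Passing them as form fields is an assumption of this sketch.
// Use HTTPS: BASIC credentials are only Base64-encoded, not encrypted.
class AdminAuthSketch {

    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** Appends api_username/api_password, using the same values as the BASIC header. */
    static String withApiCredentials(String formBody, String user, String password) {
        String creds = "api_username=" + enc(user) + "&api_password=" + enc(password);
        return (formBody == null || formBody.isEmpty()) ? creds : formBody + "&" + creds;
    }

    /** Builds (but does not send) an authenticated Admin REST API POST. */
    static HttpRequest adminPost(URI endpoint, String formBody, String user, String password) {
        return HttpRequest.newBuilder(endpoint)
                .header("Authorization", basicAuthHeader(user, password))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(withApiCredentials(formBody, user, password)))
                .build();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

The CSRF protection works because a forged cross-site request can make the browser attach cached BASIC credentials automatically, but it cannot supply the explicit api_password parameter.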
103
Authentication and Authorization
Authentication for REST API calls is controlled in the same way as UI login:
– Embedded server: the login setting in the Admin UI
– WAS: global security
The authentication protocol is HTTP BASIC.
– The Admin API requires additional credential parameters for extra security.
Authorization differs from API to API.
– To find out who can use an API, read the API documentation.
104
Agenda: IBM Content Analytics and Enterprise Search
Introduction, Components, Architecture, Administration, Security, Development, Integrations
105
Connectors to Enterprise Repositories
– Collaboration: IBM Case Manager; IBM Lotus Connections; IBM Lotus Domino DM; IBM Lotus Domino; IBM Lotus Quickr (NSF & J2EE); Lotus Web Content Management; IBM WebSphere Portal
– Content Management: IBM Case Manager; IBM Content Manager Enterprise Edition; FileNet Content Services; FileNet P8 Content Manager; Hummingbird DM; EMC/Documentum; CA-Datacom; Open Text Livelink Enterprise Server
– Data Management: DB2 for iSeries; DB2 UDB for Linux, UNIX, Windows; DB2 for z/OS; IMS; Informix Dynamic Server; Microsoft SQL Server; MySQL; Oracle; Software AG Adabas; Sybase
– Miscellaneous: Microsoft Exchange Server; Microsoft Windows SharePoint Services; SharePoint Server; Windows file systems; Network News Protocol Newsgroup; UNIX file systems; VSAM for z/OS; Web (HTTP or HTTPS)
106
Integration with Content Classification
– Content Classification adds meaningful category facets.
– Content Classification clusters similar content.
107
Integration with WebSphere Portal
Enterprise Search provides analytics-driven enterprise search capabilities to WebSphere Portal and related products.
– It provides a new search portlet, plus ESSearchPortlet for classic search collections.
108
Integration with Cognos BI reports
From ICA Text Miner, a user can:
– Issue a request to create a report
– List the created reports
– Open a created report
– Delete a created report
Cognos reports can link to and from Text Miner.
109
Integration with other UIs, including mobile
110
Integration with Netezza
Use cases:
– ICA output for content integration with Netezza
– ICA accesses content in the Netezza warehouse
– Cross-system queries between ICA and Netezza
111
Integration with SPSS
Step 1: Search and explore (or mine) information to understand the source data.
Step 2: Customize by building content (NLP) and predictive models.
Step 3: Expose insights to multiple users and systems (e.g. custom apps, mobile devices, dashboards).
[Figure labels: Analyzed Information; Text Mining / Analytics; Content (NLP) Modeling; Predictive Modeling; End Users; Analysts]
112
Integration with Case Manager
A Case Manager crawler is provided by default.
Default properties:
– CmAcmBaseCase.FolderName
– CmAcmCaseFolder.CmAcmCaseState
– CmAcmCaseComment.CmAcmCommentText
– Folder.ClassName
– Folder.PathName
– Folder.DateCreated
– Document.ClassName
– Document.DateCreated
Additional properties are available through configuration.
113
IBM Content Analytics: Analysis Export Capability
1. Crawled Document Export: export documents, with their metadata and content, as they are crawled.
2. Analyzed Document Export: export documents, before indexing, with the results of text analytics such as natural language processing, named entity extraction, classification or user-implemented logic.
3. Searched Document Export: export documents, limited by search or analysis, with their original content from the index.
[Figure labels: Content Analytics; Crawler; Data Store; Parser / Tokenizer; UIMA Annotators; Indexer; Search Index; Plug-in Exporter; RDB; IBM Master Data Mgmt; InfoSphere; Content Intelligence Consumers; ECM Solutions; Import]
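Where the three export points sit in the processing flow can be modeled in a few lines of plain Java. This is a hypothetical model, not the ICA exporter plug-in API: the analyzer function stands in for the parser and UIMA annotators, a list stands in for the search index, a naive substring match stands in for search, and the exporter callback stands in for an export plug-in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical model of the three export points, not the ICA plug-in API.
class ExportPointsSketch {

    enum ExportPoint { CRAWLED, ANALYZED, SEARCHED }

    static final class Doc {
        final String id;
        final String content;
        final Map<String, String> metadata = new LinkedHashMap<>();
        final List<String> annotations = new ArrayList<>();

        Doc(String id, String content) {
            this.id = id;
            this.content = content;
        }
    }

    private final Function<Doc, List<String>> analyzer;   // stand-in for parser + annotators
    private final BiConsumer<ExportPoint, Doc> exporter;  // stand-in for an export plug-in
    private final List<Doc> index = new ArrayList<>();    // stand-in for the search index

    ExportPointsSketch(Function<Doc, List<String>> analyzer, BiConsumer<ExportPoint, Doc> exporter) {
        this.analyzer = analyzer;
        this.exporter = exporter;
    }

    /** Crawl-to-index path with export points 1 and 2. */
    void ingest(Doc doc) {
        exporter.accept(ExportPoint.CRAWLED, doc);   // 1: metadata and content as crawled
        doc.annotations.addAll(analyzer.apply(doc));
        exporter.accept(ExportPoint.ANALYZED, doc);  // 2: with analytics results, before indexing
        index.add(doc);
    }

    /** Export point 3: indexed documents limited by a (naive substring) search. */
    void exportSearchResults(String term) {
        for (Doc doc : index) {
            if (doc.content.contains(term)) {
                exporter.accept(ExportPoint.SEARCHED, doc);
            }
        }
    }

    public static void main(String[] args) {
        ExportPointsSketch pipeline = new ExportPointsSketch(
                d -> List.of("length=" + d.content.length()),
                (point, d) -> System.out.println(point + " " + d.id + " " + d.annotations));
        pipeline.ingest(new Doc("a", "contract renewal"));
        pipeline.exportSearchResults("contract");
    }
}
```

The design point the model captures is that points 1 and 2 fire once per document during ingestion, while point 3 is driven by a query over what has already been indexed.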
114
Thank You! Content Analytics: Putting Your Content in Motion
115
BACKUP
116
Basic Analytics and Search Concepts
– Structured content: data that has unambiguous values and is easily processed by a computer program.
– Unstructured content: information that is generally recorded in a natural language as free text.
– Text analytics: a form of natural language processing that includes linguistic, statistical and machine learning techniques for analyzing text and extracting key information.
– Collection: a set of data sources and options for crawling, parsing, indexing and searching those data sources.
– Analytics collection: a collection that is set up to be used for content mining.
– Search collection: a collection that is set up to be used for search applications.
– Crawler: a software program that retrieves documents from data sources and gathers information that can be used to create search indexes.
– Annotator: a software component that performs specific linguistic analysis tasks and produces and records annotations.
– Parser: a program that interprets documents that are added to the data store. The parser extracts information from the documents and prepares them for indexing, search and retrieval.