
What's New? SAP HANA SPS 07 Text Analysis (Delta from SPS 06 to SPS 07) SAP HANA Product Management November, 2013.


1 What's New? SAP HANA SPS 07 Text Analysis (Delta from SPS 06 to SPS 07) SAP HANA Product Management November, 2013

2 Agenda
©2013 SAP AG. All rights reserved. Public
New or Improved Text Analysis Features
- Custom dictionaries
- Custom configurations
- Indexing throughput
Improved Language Coverage
- Social Media extraction for Japanese & Simplified Chinese
- Numerical extraction for Simplified Chinese
- Core extraction for Russian
- Voice of Customer for Simplified Chinese
Related Topics
- Fulltext search
- Fuzzy search

3 New or Improved Text Analysis Features

4 New Custom Dictionary Support
You can now specify your own entity types and names for use with text analysis, which may be critical for particular industries or data domains.
- A single custom dictionary may support all languages or a single language
- Custom dictionaries reside in the HANA repository and benefit from its life cycle management
Steps
1. In the Development perspective of SAP HANA Studio, choose the project that will contain the new dictionary.
2. In the wizard, enter or select a parent folder and enter the dictionary file name. Your text analysis dictionary file is created locally and opens as an empty file in the text editor.
3. Enter your text analysis dictionary specification into the new file and save it locally.
4. Commit your new dictionary. The dictionary is now synchronized to the repository as a design-time object, and the icon shows that the dictionary is committed.
5. Activate the dictionary once you have finished editing it. The dictionary is created in the repository as a runtime object, and the icon shows that the dictionary is activated. This allows you and others to use the dictionary.
If you haven’t done so previously, you will need to create a custom text analysis configuration as well…
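
Step 3 asks for a dictionary specification, which is an XML document. The following Python sketch generates one from a nested mapping. The element and attribute names (dictionary, entity_category, entity_name, standard_form, variant) are assumptions about the dictionary layout and should be checked against the SAP HANA Text Analysis documentation; any XML namespace the real format requires is omitted, and build_dictionary is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_dictionary(categories):
    """Render {entity type: {standard form: [variants]}} as dictionary XML.

    The element and attribute names are an ASSUMED layout, not a verified
    schema; any required XML namespace is left out.
    """
    root = ET.Element("dictionary")
    for entity_type, names in categories.items():
        category = ET.SubElement(root, "entity_category", name=entity_type)
        for standard_form, variants in names.items():
            entity = ET.SubElement(
                category, "entity_name", standard_form=standard_form
            )
            for variant in variants:
                ET.SubElement(entity, "variant", name=variant)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical content: one PRODUCT entity with two spelling variants.
    print(build_dictionary({"PRODUCT": {"SAP HANA": ["HANA", "HANA DB"]}}))
```

The generated text would be pasted into the empty dictionary file before committing and activating it.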

5 New Custom Configuration Support
You can now customize the features and options used for text analysis rather than using the predefined configurations:
- LINGANALYSIS_BASIC
- LINGANALYSIS_STEMS
- LINGANALYSIS_FULL
- EXTRACTION_CORE
- EXTRACTION_CORE_VOICEOFCUSTOMER
Custom configurations allow you to suppress the default output and incorporate custom dictionaries. You can either:
- Create a new XML configuration file within SAP HANA Studio
- Copy one of the predefined configurations and modify it
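
A configuration, predefined or custom, takes effect when it is named in a full-text index definition. The Python sketch below only assembles such a statement as a string. The helper fulltext_index_sql and the index, table and column names are hypothetical, and the way a custom configuration is referenced (here a repository-style "package::name" string) is an assumption to verify against the SQL reference.

```python
PREDEFINED_CONFIGURATIONS = (
    "LINGANALYSIS_BASIC",
    "LINGANALYSIS_STEMS",
    "LINGANALYSIS_FULL",
    "EXTRACTION_CORE",
    "EXTRACTION_CORE_VOICEOFCUSTOMER",
)


def is_predefined(configuration):
    """True if the name is one of the five predefined configurations."""
    return configuration in PREDEFINED_CONFIGURATIONS


def fulltext_index_sql(index, table, column, configuration):
    """Build a CREATE FULLTEXT INDEX statement with text analysis enabled."""
    return (
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}") '
        f"CONFIGURATION '{configuration}' TEXT ANALYSIS ON"
    )


if __name__ == "__main__":
    # Predefined configuration on a hypothetical table.
    print(fulltext_index_sql("DOC_IDX", "DOCUMENTS", "CONTENT", "EXTRACTION_CORE"))
    # Custom configuration; the naming scheme is an assumption.
    print(fulltext_index_sql("DOC_IDX2", "DOCUMENTS", "CONTENT", "acme.ta::MY_CONFIG"))
```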

6 Greater Indexing Throughput
Improved scalability of the highlighted preprocessing steps:
- File filtering: converting binary document formats to text/HTML
- Tokenization: decomposing a word sequence, e.g. "the quick brown fox" -> "the" "quick" "brown" "fox"
- Stemming: reducing tokens to their linguistic base form, e.g. houses -> house; ran -> run
- Linguistic analysis: part-of-speech identification, e.g. quick: Adjective; houses: Plural Noun
Utilizes more threads and efficient data transfers
- Applies to all text analysis configurations
Results, depending upon hardware configuration:
- 50% greater throughput
- 30% less time
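
To make the preprocessing steps concrete, here is a toy Python sketch of tokenization, stemming and part-of-speech lookup. It uses whitespace splitting and two tiny hand-written tables covering only the slide's examples. It does not reflect how SAP HANA implements these steps, and every name in it is hypothetical.

```python
# Toy lookup tables covering only the examples on the slide.
STEMS = {"houses": "house", "ran": "run"}
PARTS_OF_SPEECH = {"quick": "Adjective", "houses": "Plural Noun"}


def tokenize(text):
    """Decompose a word sequence into tokens (whitespace split only)."""
    return text.split()


def stem(token):
    """Reduce a token to its base form; unknown tokens are returned lowercased."""
    lowered = token.lower()
    return STEMS.get(lowered, lowered)


def part_of_speech(token):
    """Look up a part of speech, or None if the toy table has no entry."""
    return PARTS_OF_SPEECH.get(token.lower())


if __name__ == "__main__":
    for token in tokenize("the quick brown fox ran past houses"):
        print(token, stem(token), part_of_speech(token))
```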

7 Improved Language Coverage

8 Available Text Analysis Configuration Options
Columns: Language | LINGANALYSIS_BASIC | LINGANALYSIS_STEMS | LINGANALYSIS_FULL | EXTRACTION_CORE | EXTRACTION_CORE_VOICEOFCUSTOMER
Arabic: ✓ X
Catalan: ✓ X X
Chinese (Simplified): ✓ IMPROVED
Chinese (Traditional): ✓ X X
Croatian: ✓ X X
Czech: ✓ X X
Danish: ✓ X X
Dutch: ✓ X
English: ✓
Farsi: ✓ X
French: ✓
German: ✓
Greek: ✓ X X X
Hebrew: ✓ X X X
Hungarian: ✓ X X X
Italian: ✓ X
Japanese: ✓ IMPROVED X
Korean: ✓ X
Norwegian (Bokmal): ✓ X X
Norwegian (Nynorsk): ✓ X X
Polish: ✓ X X X
Portuguese: ✓ X
Romanian: ✓ X X X
Russian: ✓ IMPROVED X
Serbian: ✓ X X
Slovak: ✓ X X
Slovenian: ✓ X X
Spanish: ✓
Swedish: ✓ X X
Thai: ✓ X X X
Turkish: ✓ X X X

9 Improved Social Media Extraction for Japanese & Simplified Chinese
Identifies SOCIAL_MEDIA entities and their corresponding offsets with high recall and precision
- Tags SOCIAL_MEDIA entities such as IDs (@MyTwitterName) or topics (#MyWeiboKeyword)
- Distinguishes between SOCIAL_MEDIA entities and emoticons like @__@
- Distinguishes between SOCIAL_MEDIA entities and email addresses
- Respects important Weibo and Twitter differences, e.g. #W-TOPIC# vs. #T-TOPIC1 #T-TOPIC2
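
The distinctions listed above can be illustrated with a small regular-expression sketch in Python. This is a toy approximation, not SAP's extraction logic. The labels ID, WEIBO_TOPIC and TWITTER_TOPIC and the function extract_social_media are hypothetical. Email addresses and Weibo-style topics are masked first so that later patterns cannot match inside them.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
WEIBO_TOPIC = re.compile(r"#([^#\s]+)#")      # topic enclosed in two '#'
TWITTER_TOPIC = re.compile(r"#([\w-]+)")      # topic with a leading '#'
# '@' not preceded by a word character or '@', followed by a name containing
# at least one letter or digit (so the emoticon @__@ is rejected).
USER_ID = re.compile(r"(?<![\w@])@(?=\w*[^\W_])(\w+)")


def _mask(pattern, text):
    """Blank out matches while keeping all character offsets stable."""
    return pattern.sub(lambda m: " " * len(m.group()), text)


def extract_social_media(text):
    """Return (label, value, offset) tuples sorted by offset."""
    entities = []
    masked = _mask(EMAIL, text)
    for m in WEIBO_TOPIC.finditer(masked):
        entities.append(("WEIBO_TOPIC", m.group(1), m.start()))
    masked = _mask(WEIBO_TOPIC, masked)
    for m in TWITTER_TOPIC.finditer(masked):
        entities.append(("TWITTER_TOPIC", m.group(1), m.start()))
    for m in USER_ID.finditer(masked):
        entities.append(("ID", m.group(1), m.start()))
    return sorted(entities, key=lambda entity: entity[2])


if __name__ == "__main__":
    sample = "@MyTwitterName likes #HANA @__@ mail bob@example.com #微博话题#"
    for entity in extract_social_media(sample):
        print(entity)
```

Masking with spaces rather than deleting keeps the reported offsets aligned with the original text, which matters because the slide stresses that entities are returned with their offsets.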

10 Improved Numerical Extraction for Simplified Chinese
Better identifies numerical entities with special characters
- CURRENCY: expressions denoting amounts of money, e.g. 33.8 万元; 港币五千万; 一百四十四亿七千万美元
- DATE: minimally composed of a number and month name, e.g. 7 月 2 日; 十月十七日
- MEASURE: expressions, e.g. 二百五十六公斤; 5.5 米
- TIME: clock times and time expressions, e.g. 8 时; 3 点零 5 分
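
As a rough illustration of why these expressions need special handling, the Python sketch below classifies the slide's examples with hand-written patterns that accept both Arabic digits and Chinese numerals. It is a toy covering only a few units and markers and is unrelated to the actual extraction rules in SAP HANA. The function classify and the pattern table are hypothetical.

```python
import re

# One or more digits or Chinese numeral characters, optionally space-separated.
NUM = r"(?:[0-9.〇零一二三四五六七八九十百千万亿两]\s*)+"

PATTERNS = [
    ("CURRENCY", re.compile(rf"港币\s*{NUM}|{NUM}(?:美元|元)")),
    ("DATE", re.compile(rf"{NUM}月\s*{NUM}日")),
    ("MEASURE", re.compile(rf"{NUM}(?:公斤|米)")),
    ("TIME", re.compile(rf"{NUM}[时点](?:\s*{NUM}分)?")),
]


def classify(expression):
    """Return the first entity type whose pattern matches the whole expression."""
    candidate = expression.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(candidate):
            return label
    return None


if __name__ == "__main__":
    for example in ["33.8 万元", "十月十七日", "5.5 米", "3 点零 5 分"]:
        print(example, classify(example))
```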

11 Additional Predefined Core Extractions for Russian
TITLE: President
PERSON: Barack Obama
PEOPLE: Greeks
LANGUAGE: Greek
ADDRESS1: 245 First Street Floor 16
ADDRESS2: Cambridge, MA 02142
LOCALITY: Cambridge
REGION@MINOR: Napa County
REGION@MAJOR: Connecticut
COUNTRY: Brazil
CONTINENT: South America
GEO_FEATURE: Mount Fuji
GEO_AREA: Scandinavia
ORGANIZATION@COMMERCIAL: AT&T
ORGANIZATION@EDUCATIONAL: University of Washington
ORGANIZATION@OTHER: FBI
PRODUCT: iPhone
TICKER: NYSE:SAP
SOCIAL_MEDIA@TWITTER_ID: @SAP
SOCIAL_MEDIA@TWITTER_TOPIC: #HANA
DATE: 2/14/2011
DAY: Monday
MONTH: June
YEAR: 2011
TIME: 3:47pm
TIME_PERIOD: 3 days, from 9 to 5pm
HOLIDAY: Memorial Day
CURRENCY: 17 euros
MEASURE: 217 meters
PERCENT: 4%
PHONE: 617-677-2030
URI@EMAIL
URI@IP
URI@URL
Syntactic Entities:
NOUN_GROUP: big umbrella
PROP_MISC: Cup o’ Soup

12 Improved Voice of Customer Extraction for Simplified Chinese
The following major fact types are classified:
- Sentiments: expression of a customer’s feelings about something
- Problems: a statement about something which impedes a customer’s work
- Requests: expression of a customer’s desire for an enhancement/change
- Profanity: defines a set of pejorative vocabulary
- Emoticons: expression of someone's feelings about the whole sentence or situation
Focuses on finer extraction of online reviews and implementing customer feedback
- Dramatic overall improvement in stances and topics
- Recall and precision test results improved significantly

13 Disclaimer
This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.

14 Thank you
Contact information
Anthony Waite
SAP HANA Product Management
To get the best overview of what’s new in SAP HANA SPS 07, read this

15 © 2013 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Please see for additional trademark information and notices.

