Presentation on theme: "Tokenization"— Presentation transcript:

1 Tokenization

2 C preprocessor - Phases 1 Tokenization: The preprocessor breaks the result into preprocessing tokens and whitespace. It replaces comments with whitespace.
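
The comment-replacement step can be sketched roughly as follows. This is a minimal illustration, not how any real preprocessor is implemented: the function name `strip_comments` is hypothetical, the regular expression is a simplification, and it assumes that string and character literals must be left untouched and that each comment becomes a single space.

```python
import re

# Alternatives are tried left to right at each position, so a quote starts a
# literal before the comment alternatives can see anything inside it.
_PATTERN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal (kept as-is)
    r"|'(?:\\.|[^'\\])*'"     # character literal (kept as-is)
    r"|/\*.*?\*/"             # block comment
    r"|//[^\n]*",             # line comment, up to but excluding the newline
    re.DOTALL,
)

def strip_comments(source):
    """Replace each comment with one space; leave literals unchanged."""
    def replace(match):
        text = match.group(0)
        # Only comments start with "/"; literals start with a quote.
        return " " if text.startswith("/") else text
    return _PATTERN.sub(replace, source)
```

Because the comment is replaced by whitespace rather than deleted, `int/* c */x` becomes `int x` and still splits into two tokens.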

3 Enterprise search - Content processing and analysis 1 As part of processing and analysis, tokenization is applied to split the content into tokens, which are the basic matching units. It is also common to normalize tokens to lower case to provide case-insensitive search, as well as to normalize accents to provide better recall.
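
As a rough sketch of the normalization described above, the following lowercases tokens and strips accents using Python's standard `unicodedata` module. The function names and the choice of whitespace splitting are assumptions made for illustration; a real search engine would use a more careful tokenizer.

```python
import unicodedata

def normalize_token(token):
    """Strip combining accent marks, then case-fold the token."""
    # NFKD decomposition splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def tokenize_for_search(text):
    # Splitting on whitespace is a simplifying assumption.
    return [normalize_token(t) for t in text.split()]
```

With this normalization, a query for "resume" would match a document containing "RÉSUMÉ", which is the recall improvement the slide refers to.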

4 Lexical analysis - Token 1 A token is a string of one or more characters that is significant as a group. The process of forming tokens from an input stream of characters is called tokenization.

5 Lexical analysis - Tokenization 1 Tokenization is the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input.
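
A minimal sketch of demarcating and classifying input in one pass, using Python's `re` module. The token classes (`NUMBER`, `IDENT`, `OP`) and the tiny grammar are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical token classes for a toy expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(source):
    """Return a list of (class, text) pairs; whitespace is discarded."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            # lastgroup names the alternative that matched, i.e. the token class.
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

The resulting list of classified tokens is what would be handed to a parser or other later stage.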

6 PerspecSys - Technology 1 The AppProtex Cloud Data Control Gateway secures data in software-as-a-service and platform-as-a-service provider applications through the use of encryption or tokenization. Gartner, a marketing research firm, refers to this type of technology as a cloud encryption gateway and categorizes providers of this technology as cloud access security brokers.

7 PerspecSys - Technology 1 Within the Gateway, organizations may define encryption and tokenization options at the field level.

8 PerspecSys - Standards 1 Its tokenization option was evaluated by Coalfire, a PCI DSS Qualified Security Assessor (QSA) and a FedRAMP 3PAO, to ensure that it adheres to industry guidelines.

9 Identity resolution - Data preprocessing 1 Standardization can be accomplished through simple rule-based data transformations or more complex procedures such as lexicon-based tokenization and probabilistic hidden Markov models

10 Lexing - Token 1 A token is a string of one or more characters that is significant as a group. The process of forming tokens from an input stream of characters is called tokenization.

11 Syntax (programming languages) - Levels of syntax 1 This modularity is sometimes possible, but in many real-world languages an earlier step depends on a later step; for example, the lexer hack in C exists because tokenization depends on context.

12 Tokenization (disambiguation) 1 * Tokenization in language processing (both natural and computer)

13 Tokenization 1 Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining. Tokenization is useful both in linguistics (where it is a form of text segmentation) and in computer science, where it forms part of lexical analysis.

14 Tokenization - Methods and obstacles 1 Typically, tokenization occurs at the word level. However, it is sometimes difficult to define what is meant by a word. Often a tokenizer relies on simple heuristics, for example:
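
One heuristic of this kind can be sketched as follows: treat runs of word characters as tokens and each punctuation mark as its own token. The function name and the exact regular expression are assumptions for illustration, not a standard definition of a word.

```python
import re

def simple_word_tokenize(text):
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)
```

Applied to "It's 5pm." this yields `It`, `'`, `s`, `5pm`, `.`, which shows why such heuristics struggle with contractions: the rule has no notion of what counts as a word.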

15 Tokenization - Methods and obstacles 1 Tokenization is particularly difficult for languages written in scriptio continua, which exhibit no word boundaries, such as Ancient Greek, Chinese (Huang, C., Simon, P., Hsieh, S., Prevot, L. (2007), "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Word break Identification", 7/P07-2018.pdf), or Thai.

16 Tokenization - Services 1 * TokenEx: Cost-effective tokenization solution on the market for one-time, recurring and archival transaction data.

17 Tokenization (data security) 1 Tokenization can be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII) ("What is Tokenization?", okenization.cfm).

18 Tokenization (data security) 1 In the payment card industry (PCI) context, tokens are used to reference cardholder data that is stored in a separate database, application or off-site secure facility ("Shift4 Corporation Releases Tokenization in Depth White Paper", 7_tokenizationindepth.cfm).

19 Tokenization (data security) 1 Building an alternate payments ecosystem requires a number of entities working together to deliver near field communication (NFC) or other technology-based payment services to end users. One issue is interoperability between the players; to resolve it, the role of trusted service manager (TSM) is proposed to establish a technical link between mobile network operators (MNOs) and providers of services, so that these entities can work together. Tokenization can help with this.

20 Tokenization (data security) 1 The Payment Card Industry Data Security Standard, an industry-wide standard that must be met by any organization that stores, processes, or transmits cardholder data, mandates that credit card data must be protected when stored ("The Payment Card Industry Data Security Standard", ds/pci_dss.shtml). Tokenization, as applied to payment card data, is often implemented to meet this mandate, replacing credit card numbers in some systems with a random value ("Can Tokenization of Creditcard Numbers Satisfy PCI Requirements?", baseAnswer/0,289625,sid14_gci1275256,00.html). Tokens can be formatted in a variety of ways.

21 Tokenization (data security) 1 Tokenization makes it more difficult for hackers to gain access to cardholder data outside of the token storage system. Implementation of tokenization could simplify the requirements of the PCI DSS, as systems that no longer store or process sensitive data are removed from the scope of the PCI audit ("Securing Data: What Tokenization Does").
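
The idea of replacing a card number with a random value held in a separate store can be sketched as below. `TokenVault` is a hypothetical in-memory class written for illustration only; it is not a PCI DSS compliant design, and a real token store would add access control, persistence, and encryption at rest.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory token store mapping random tokens to values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so one value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it cannot be reversed without the vault.
        token = secrets.token_hex(8)
        while token in self._token_to_value:
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]
```

Systems that handle only the token never see the card number, which is the basis for the reduced audit scope described in the slides above.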

22 Credit card fraud - Countermeasures 1 * Tokenization (data security) – not storing the full number in computer systems

23 Speech synthesis 1 This process is often called text normalization, pre-processing, or tokenization.

24 Informix - Key Products 1 There is also an advanced data warehouse edition of Informix. This version includes the Informix Warehouse Accelerator which uses a combination of newer technologies including in-memory data, tokenization, deep compression, and columnar database technology to provide extreme high performance on business intelligence and data warehouse style queries.

25 Yacc 1 Yacc produces only a parser (phrase analyzer); for full syntactic analysis this requires an external lexical analyzer to perform the first tokenization stage (word analysis), which is then followed by the parsing stage proper. Lexical analyzer generators, such as Lex or Flex, are widely available. The IEEE POSIX P1003.2 standard defines the functionality and requirements for both Lex and Yacc.

26 Credit card number - Security 1 * Tokenization: an artificial account number (token) is printed, stored or transmitted in place of the true account number.

27 OpenNLP 1 It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.

28 Index (search engine) - Document parsing 1 The terms 'indexing', 'parsing', and 'tokenization' are used interchangeably in corporate slang.

29 Index (search engine) - Document parsing 1 Natural language processing, as of 2006, is the subject of continuous research and technological improvement. Tokenization presents many challenges in extracting the necessary information from documents for indexing to support quality searching. Tokenization for indexing involves multiple technologies, the implementation of which is commonly kept as a corporate secret.

30 Index (search engine) - Challenges in natural language processing 1 The goal during tokenization is to identify words for which users will search

31 Index (search engine) - Tokenization 1 During tokenization, the parser identifies sequences of characters which represent words and other elements, such as punctuation, which are represented by numeric codes, some of which are non-printing control characters.

32 Index (search engine) - Language recognition 1 If the search engine supports multiple languages, a common initial step during tokenization is to identify each document's language; many of the subsequent steps are language dependent (such as stemming and part of speech tagging)

33 Index (search engine) - Format analysis 1 If the search engine supports multiple document formats, documents must be prepared for tokenization.

34 Index (search engine) - Section recognition 1 Some search engines incorporate section recognition, the identification of major parts of a document, prior to tokenization

35 Index (search engine) - Meta tag indexing 1 The design of the HTML markup language initially included support for meta tags for the very purpose of being properly and easily indexed, without requiring tokenization (Berners-Lee, T., Hypertext Markup Language - 2.0, RFC 1866, Network Working Group, November 1995).

36 Applesoft BASIC - Speed issues, features 1 Furthermore, because the language used tokenization, a programmer had to avoid using any consecutive letters that were also Applesoft commands or operations. One could not use the name SCORE for a variable because the interpreter would read the OR as a Boolean operator, rendering it SC OR E; nor could one use BACKGROUND, because the command GR invoked the low-resolution graphics mode, in this case creating a syntax error.

37 Identifier - In computer languages 1 However, a common restriction is not to permit whitespace characters and language operators; this simplifies tokenization by making it free-form and context-free.
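
The effect described above can be reproduced with a toy keyword scanner. This is a sketch under stated assumptions, not Applesoft's actual tokenizer: the `KEYWORDS` tuple is a small hypothetical subset, and the scanner simply looks for a keyword at every position, ignoring identifier boundaries.

```python
# Small hypothetical subset of the keyword table.
KEYWORDS = ("PRINT", "OR", "GR")

def crunch(text):
    """Split text into keywords and the literal runs between them."""
    pieces, literal, i = [], "", 0
    while i < len(text):
        for keyword in KEYWORDS:
            if text.startswith(keyword, i):
                # A keyword is recognized even in the middle of a name.
                if literal:
                    pieces.append(literal)
                    literal = ""
                pieces.append(keyword)
                i += len(keyword)
                break
        else:
            # No keyword starts here, so the character stays literal.
            literal += text[i]
            i += 1
    if literal:
        pieces.append(literal)
    return pieces
```

Under these assumptions, `crunch("SCORE")` splits the name into `SC`, `OR`, `E`, and `crunch("BACKGROUND")` into `BACK`, `GR`, `OUND`, matching the two failures the slide describes.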

38 Identifier - In computer languages 1 This overlap can be handled in various ways: these may be forbidden from being identifiers – which simplifies tokenization and parsing – in which case they are reserved words; they may both be allowed but distinguished in other ways, such as via stropping; or keyword sequences may be allowed as identifiers and which sense is determined from context, which requires a context-sensitive lexer

39 Tokens - Computing 1 ** Tokenization (data security), the process of substituting a sensitive data element

40 IVONA - Inside IVONA 1 This process is often called text normalization, pre-processing, or tokenization.

41 Identifier (computer science) - In computer languages 1 However, a common restriction is not to permit whitespace characters and language operators; this simplifies tokenization by making it free-form and context-free.

42 Underscore - Multi-word identifiers 1 However, spaces are not typically permitted inside identifiers, as they are treated as delimiters between tokens.

43 W-shingling 1 The document "a rose is a rose is a rose" can be tokenized as follows:
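
As an illustrative sketch, the document can be split on whitespace and its w-shingles collected as the set of all runs of w consecutive tokens. The function name `shingles` and the whitespace tokenizer are assumptions made for this example.

```python
def shingles(text, w):
    """Return the set of all contiguous w-token sequences in text."""
    tokens = text.split()  # whitespace tokenization, assumed for simplicity
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}
```

For "a rose is a rose is a rose" with w = 4, the eight tokens give five windows but only three distinct shingles, because the text repeats.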

44 Slot machines - Description 1 Recently, some casinos have chosen to take advantage of a concept commonly known as tokenization, where one token buys more than one credit

45 VTD-XML - Non-Extractive, Document-Centric Parsing 1 Traditionally, a lexical analyzer represents tokens (the small units of indivisible character values) as discrete string objects. This approach is designated extractive parsing. In contrast, non-extractive tokenization mandates that one keeps the source text intact, and uses offsets and lengths to describe those tokens.

46 CipherCloud 1 Hickey, CipherCloud Uses Encryption, Tokenization to Bolster Cloud Security, CRN, February 14, 2011

47 CipherCloud - Platform 1 Snooping, The Washington Times, August 18, 2013. The company uses tokenization, which is the process of substituting a sensitive data element with a non-sensitive equivalent.
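
The contrast can be sketched in a few lines: instead of copying each token into its own string, record only where it starts and how long it is. This is a generic illustration of the offset-and-length idea, not VTD-XML's actual record format; both function names are hypothetical.

```python
import re

def tokenize_offsets(text):
    """Describe each whitespace-delimited token as (offset, length)."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"\S+", text)]

def token_text(text, record):
    # The token is materialized only on demand, from the intact source text.
    offset, length = record
    return text[offset:offset + length]
```

The source text is never modified or split; a token's characters are read back from it only when a caller asks for them.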

48 Parsing expression grammar - Advantages 1 Parsers for languages expressed as a CFG, such as LR parsers, require a separate tokenization step to be done first, which breaks up the input based on the location of spaces, punctuation, etc. The tokenization is necessary because of the way these parsers use lookahead to parse CFGs that meet certain requirements in linear time. PEGs do not require tokenization to be a separate step, and tokenization rules can be written in the same way as any other grammar rule.

49 ProPay 1 ProPay, Inc. is an American financial services company headquartered in Lehi, Utah. The company provides payment solutions that include merchant accounts, payment processing, ACH services, pre-paid cards and other payment-related products. ProPay also provides end-to-end encryption and tokenization services. In December 2012, ProPay was acquired by Total System Services, Inc. (TSYS), a publicly traded company (NYSE: TSS).

50 ProPay - History 1 In 2009, ProPay was among a handful of companies that began to offer an end-to-end encryption and tokenization service (ProPay Unlocks ProtectPay Encrypted Credit Card Processing, 02/20/2009). At that time, ProPay also introduced the MicroSecure Card Reader®, allowing small merchants to securely accept card-present transactions (Pocket Credit Card Reader Takes Transactions on the Go, PC World, 01/07/2009). In 2010, ProPay received the Independent Sales Organization of the Year award from the Electronic Transaction Association (ProPay Receives 2010 Electronic Transaction Association ISO of the Year Award, Silicon Slopes, 04/20/2010).

51 Casio fx-7000G - Programming 1 Tokenization is performed by using characters and symbols in place of long lines of code to minimize the amount of memory being used

52 Cuban art 1 A movement that mirrored this artistic piece was underway in which the shape of Cuba became a token in the artwork in a phase known as tokenization

53 For More Information, Visit: m/the-tokenization-toolkit.html The Art of Service
