ICDM 2002 Using Text Mining to Infer Semantic Attributes for Retail Data Mining Rayid Ghani & Andrew Fano Accenture Technology Labs, USA.

1 ICDM 2002 Using Text Mining to Infer Semantic Attributes for Retail Data Mining Rayid Ghani & Andrew Fano Accenture Technology Labs, USA

2 Who are we? Accenture Technology Labs: the R&D group for Accenture. About 50 researchers in Chicago, Palo Alto (California) and Sophia Antipolis (France). Research in Data Mining, Machine Learning, Ubiquitous Computing, Wearable Computing, Language Technologies, Virtual & Augmented Reality, Collaborative Workspaces…

3 Current State of Retail Data Mining: Large amounts of data are captured about transactions. Each retailer has terabytes of data in its data warehouse. Several data mining algorithms are applied to this data.

4 Problem: Today’s transaction data can’t answer important marketing questions. What do your best selling items have in common? What about the worst sellers? What do the products a customer has purchased say about them? What do the products your competitors sell say about them?

5 What’s Missing? Captured data focuses on the transaction, not the product. Product information captured with transactions is typically limited to little more than SKU, size, brand and price. But what does a SKU mean?

6 Current Data Mining Practice: Treat products as generic unique entities/objects with no associated semantics. Semantics are applied by humans AFTER the algorithm has done the learning, e.g. when interpreting association rules or decision trees.

7 Product Semantics: What does a product mean? What does this shirt say about her? Is it conservative or flashy? Trendy or classic? Formal or casual? Where would we get this information?

8 Extract underlying attributes from product marketing descriptions. Marketing descriptions are designed to convey a particular image to customers. These descriptions implicitly contain these more elusive attributes. Example: DKNY Jeans Ruched Side-Tie Tee. Get back to basics with a fresh new look this season. The Ruched Side-Tie Tee has a drawstring tie at left hip with shirred detail down the side. Stretch provides a flattering, shapely fit. V-neck. SKU: 655432 UPC: 4200006200 Item: DKNYTee Price: $49

9 Training the System (diagram): Product Descriptions; Domain Experts; Product descriptions marked up with attribute values; Supervised Learning Algorithm; Learned Statistical Models

10 Inferring Attributes via Text Classification
– Build one classifier for each attribute type
– Simple statistical classifier: Naïve Bayes, multinomial model (McCallum & Nigam 1998)
– For all words in the descriptions and all attribute values: calculate P(word | attribute value) using the manually rated items
– Given a new item description: calculate P(attribute value | item description) for all attribute values, then choose the most likely value (Maximum Likelihood)
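The classifier described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`train_nb`, `posterior`, `classify`), the whitespace tokenizer, the add-one (Laplace) smoothing, and the four toy rated items are all assumptions made for the example. One such model would be trained per attribute type.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled, alpha=1.0):
    """Estimate P(value) and P(word | value) from (description, value) pairs."""
    docs = [(text.lower().split(), value) for text, value in labeled]
    vocab = {w for words, _ in docs for w in words}
    value_counts = Counter(value for _, value in docs)
    word_counts = defaultdict(Counter)
    for words, value in docs:
        word_counts[value].update(words)
    priors = {v: n / len(docs) for v, n in value_counts.items()}
    word_probs = {}
    for v in value_counts:
        # Add-alpha smoothing (an assumption) so no word has zero probability.
        total = sum(word_counts[v].values()) + alpha * len(vocab)
        word_probs[v] = {w: (word_counts[v][w] + alpha) / total for w in vocab}
    return priors, word_probs

def posterior(description, priors, word_probs):
    """Return P(value | description) for every attribute value."""
    log_scores = {}
    for v, prior in priors.items():
        score = math.log(prior)
        for w in description.lower().split():
            if w in word_probs[v]:  # words unseen in training are ignored
                score += math.log(word_probs[v][w])
        log_scores[v] = score
    top = max(log_scores.values())
    unnorm = {v: math.exp(s - top) for v, s in log_scores.items()}
    z = sum(unnorm.values())
    return {v: u / z for v, u in unnorm.items()}

def classify(description, priors, word_probs):
    """Pick the attribute value with the highest posterior probability."""
    post = posterior(description, priors, word_probs)
    return max(post, key=post.get)

# Hypothetical manually rated items for one attribute type (formality).
rated_items = [
    ("denim jean tee", "informal"),
    ("sweater tee denim", "informal"),
    ("crepe jacket skirt", "formal"),
    ("jacket seam lines crepe", "formal"),
]
priors, word_probs = train_nb(rated_items)
print(classify("denim tee", priors, word_probs))
```

Working in log space avoids numeric underflow on long descriptions; the probabilities are renormalized only at the end.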

11 Naïve Bayes Results Classification Accuracy

12 Can we get something for free? Semi-supervised Learning: Lots of product descriptions are available at minimal or no cost from retail websites, but labeling them is expensive. Can we use the unlabeled product descriptions to improve performance?

13 Semi-Supervised Learning: Apply algorithms that combine labeled and unlabeled data for classification
– Expectation-Maximization (Nigam et al. 1999)
– Co-Training (Blum & Mitchell 1999)
– Co-EM (Nigam & Ghani 2000)
– ECOC + Co-Training (Ghani 2002)

14 The EM Algorithm: Learn a Naïve Bayes classifier from the labeled data. E-Step: estimate labels for the unlabeled data. M-Step: probabilistically add them to the labeled data and re-learn the classifier.
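The loop on this slide can be sketched as follows. This is a hedged illustration of EM with a multinomial Naïve Bayes model, not the code behind the reported results: the function names (`m_step`, `e_step`, `em_naive_bayes`), the smoothing constant, the fixed iteration count, and the toy descriptions are assumptions for the example.

```python
import math
from collections import defaultdict

def m_step(docs, soft_labels, classes, vocab, alpha=1.0):
    """Re-estimate priors and P(word | class) from fractional label counts."""
    class_mass = {c: 0.0 for c in classes}
    word_mass = {c: defaultdict(float) for c in classes}
    for words, dist in zip(docs, soft_labels):
        for c, p in dist.items():
            class_mass[c] += p
            for w in words:
                word_mass[c][w] += p  # each word counts with weight P(c | doc)
    priors = {c: (class_mass[c] + alpha) / (len(docs) + alpha * len(classes))
              for c in classes}
    word_probs = {}
    for c in classes:
        total = sum(word_mass[c].values()) + alpha * len(vocab)
        word_probs[c] = {w: (word_mass[c].get(w, 0.0) + alpha) / total
                         for w in vocab}
    return priors, word_probs

def e_step(words, priors, word_probs):
    """Estimate P(class | document) under the current model."""
    logs = {c: math.log(priors[c])
               + sum(math.log(word_probs[c][w]) for w in words if w in word_probs[c])
            for c in priors}
    top = max(logs.values())
    unnorm = {c: math.exp(v - top) for c, v in logs.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}

def em_naive_bayes(labeled, unlabeled, iterations=10):
    lab_docs = [text.lower().split() for text, _ in labeled]
    unl_docs = [text.lower().split() for text in unlabeled]
    classes = sorted({label for _, label in labeled})
    vocab = {w for doc in lab_docs + unl_docs for w in doc}
    hard = [{c: 1.0 if c == label else 0.0 for c in classes} for _, label in labeled]
    # Initial model: learn from the labeled data only.
    priors, word_probs = m_step(lab_docs, hard, classes, vocab)
    for _ in range(iterations):
        # E-step: probabilistic labels for the unlabeled descriptions.
        soft = [e_step(doc, priors, word_probs) for doc in unl_docs]
        # M-step: re-learn from labeled plus probabilistically labeled data.
        priors, word_probs = m_step(lab_docs + unl_docs, hard + soft, classes, vocab)
    return priors, word_probs

# Hypothetical data: two labeled items, four unlabeled descriptions.
labeled = [("denim tee", "informal"), ("crepe jacket", "formal")]
unlabeled = ["denim sweater", "sweater jean tee", "jacket skirt seam", "crepe skirt"]
priors, word_probs = em_naive_bayes(labeled, unlabeled)
result = e_step("sweater jean".split(), priors, word_probs)
print(result)
```

In this toy setup "sweater" and "jean" never occur in a labeled item; the model can still associate them with a class because they co-occur with labeled words in the unlabeled descriptions.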

15 EM Results Classification Accuracy

16 A Peek at the Learned Models. Extremely Conservative: double-breasted, seasonless, trouser, classic, blazer. Not Conservative (Flashy): leopard, chemise, straps, flirty.

17 A Peek at the Learned Models. Informal: jean, denim, sweater, tee. Formal: jacket, skirt, lines, seam, crepe.

18 A Peek at the Learned Models. Loungewear: chemise, silk, kimono, lounge, robe, gown. Extremely Sporty: sneaker, rubber, miraclesuit, athletic, mesh.

19 Populating the Knowledge Base (diagram): New Product Descriptions; Learned Statistical Models; Product descriptions automatically marked up with attribute values; Product Semantics Knowledge Base

20 What can this be used for? Applications: example applications that we have built include a Recommender System, a Copywriter’s Workbench, and Competitive Comparisons.

21 Recommender System (diagram): Retailer’s Web Site; Extracted Descriptions of Products Browsed; Product Semantics Knowledge Base; Learned Statistical Models; Evolving User Profile; Query the Knowledge Base for Matching Products; Recommend Matching Products to User
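One way the profile-and-query steps on this slide could fit together is sketched below. The slides do not say how the user profile is updated or how matches are scored, so everything here is an assumption for illustration: the running-mean profile, the cosine similarity, the function names, and the toy knowledge base with its attribute scores.

```python
import math

def update_profile(profile, n_seen, product_attrs):
    """Fold one browsed product into a running mean of attribute scores."""
    keys = set(profile) | set(product_attrs)
    updated = {k: (profile.get(k, 0.0) * n_seen + product_attrs.get(k, 0.0)) / (n_seen + 1)
               for k in keys}
    return updated, n_seen + 1

def cosine(a, b):
    """Cosine similarity between two sparse attribute vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(profile, knowledge_base, exclude=(), top_k=3):
    """Return the ids of the products whose attributes best match the profile."""
    scored = [(cosine(profile, attrs), pid)
              for pid, attrs in knowledge_base.items() if pid not in exclude]
    return [pid for _, pid in sorted(scored, reverse=True)[:top_k]]

# Hypothetical knowledge base: product id -> inferred attribute scores.
knowledge_base = {
    "blazer":  {"conservative": 0.9, "formal": 0.8, "sporty": 0.1},
    "trouser": {"conservative": 0.8, "formal": 0.7, "sporty": 0.1},
    "sneaker": {"conservative": 0.2, "formal": 0.1, "sporty": 0.9},
    "chemise": {"conservative": 0.1, "formal": 0.2, "sporty": 0.1},
}

# The user browses one product; the profile evolves, then we query for matches.
profile, n_seen = update_profile({}, 0, knowledge_base["blazer"])
matches = recommend(profile, knowledge_base, exclude={"blazer"}, top_k=1)
print(matches)
```

Because matching happens in attribute space rather than on purchase co-occurrence, a product with no purchase history can be recommended as soon as its description has been classified.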

27 Advantages over Traditional Recommender Systems: This approach gives us some of the underlying attributes that characterize a customer’s preferences. We can therefore begin to explain a preference rather than simply rely on the co-occurrence of purchases (e.g. people who bought x also bought y). This helps with handling new or rapidly changing products, low-frequency products, and cross-category recommendations.

28 Cross-Category Recommendations: Difficult for collaborative filtering and content-based systems. Build a model of the user (personality, stylistic attributes). Taste in clothing might also be suggestive of taste in other products, say furniture and home decoration. Create models for different product classes and create mappings among these models.

29 Application II: Competitive Comparison Tool. Just as consumers may be profiled by what they buy, retailers can be profiled by what they sell. Track and compare how the positioning of products from different retailers changes over time. Brands can track how different retailers/stores position their products.

32 Application III: Copywriter’s Toolkit. Can this system be used to help write product descriptions? A tool for copywriters that provides feedback to help them position a product in a particular way. Writers can assess their descriptions and get word recommendations.

33 Screenshot: Classy and chic, this long-sleeve pinstripe shirt has the glamorous appeal of a 40s movie star or European songstress. Shirring along front button placket. Double-button extended cuffs. 3 1/2" side-seam slits. Cotton/polyester; dry clean. By BCBG Max Azria; imported. Increase Tone: skin, flirty, low-neck, slim-fit, straps,
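Word recommendations like the ones in this screenshot could, for instance, be derived from the learned P(word | attribute value) tables. The sketch below ranks words by how much more likely they are under the target value than under any other value. The scoring rule, the function name `suggest_words`, and the probability numbers are assumptions for illustration; the slides do not describe how the tool computes its suggestions.

```python
import math

def suggest_words(description, word_probs, target, top_k=3, floor=1e-6):
    """Suggest words, absent from the description, that push it toward `target`.

    word_probs maps attribute value -> {word: P(word | value)} and must
    contain at least one value other than `target`.
    """
    present = set(description.lower().split())
    others = [value for value in word_probs if value != target]
    scores = {}
    for word, p_target in word_probs[target].items():
        if word in present:
            continue  # already used by the copywriter
        # Compare against the strongest competing attribute value.
        p_rival = max(word_probs[value].get(word, floor) for value in others)
        scores[word] = math.log(p_target / p_rival)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical learned word probabilities for two attribute values.
word_probs = {
    "flashy":       {"flirty": 0.30, "straps": 0.25, "leopard": 0.20,
                     "classic": 0.05, "shirt": 0.20},
    "conservative": {"flirty": 0.02, "straps": 0.05, "leopard": 0.01,
                     "classic": 0.50, "shirt": 0.42},
}
suggestions = suggest_words("classic pinstripe shirt", word_probs, "flashy", top_k=2)
print(suggestions)
```

Ranking by a probability ratio rather than by raw probability keeps common but uninformative words such as "shirt" out of the suggestions.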

34 Summary: “Understand” a product and hence the individual customer. Use Text Learning (supervised and semi-supervised) to abstract from product descriptions to subjective, domain-specific features and create enhanced product databases. Create applications that have more semantic knowledge of products and can help understand consumer behavior. Provide Data Mining algorithms with semantic attributes to operate on, so they can build better and more domain-specific models.

35 Next Steps: Tracking Marketing Appeals. Can we begin to keep track of the kinds of messages individuals respond to?
– Status: Ralph Lauren "Polo" Solid Towels. From a collection of exquisite designer looks for the bath, each solid color shower towel sports the signature Ralph Lauren Polo player. Designed in thick and thirsty 100% cotton.
– Function: Charter Club "Cotton Twist" Bath Rugs. Prevent slipping and cold feet with these Charter Club “Cotton Twist” Bath Rugs. Choose from an assortment of colors. These rugs are made of 100% cotton twist with non-skid backing.
– “Feelgood”: Charter Club Solid Towels. Bask in the luxury of superior plush cotton. The deluxe Charter Club towels are composed of 100% ringspun cotton for supreme softness, with wider, more sophisticated dobby borders for longer wear. Online exclusively @macys.com.
– Quality: Charisma® Solid Bath Towels. Available in a range of vibrant colors to coordinate with any decor, these thirsty towels are designed in luxurious 100% soft Supima® cotton with an incredible 720 ringspun terry loops per square inch and finished with lock-stitched hems for extra strength.

