Dr. XiaoFeng Wang (Associate Professor, IUB) Cloud Security: New Challenges, New Opportunities
The Cloud is the future Ultra cost-effective computation The only scalable platform for data-intensive computing Cloud Computing The Cloud is not Secure Security/Privacy is the main concern Service providers avoid liability “…We strive to keep Your Content secure, but cannot guarantee that we will be successful at doing so, given the nature of the Internet.” - Amazon
Cloud Security: Anything New? Web Security Virtual Machine Security Secure Computing Outsourcing Data Storage Security and Privacy Decentralized Access Control and more
What We Believe The Cloud brings us New Challenges: E.g., Move applications into the Cloud Applications change New security problems emerge The Cloud brings us New Opportunities: E.g., Design for Data-Intensive Computing Hybrid Cloud Infrastructure
What We Found Surprising New Security Challenges in SaaS: fundamental vulnerability to side-channel attacks Susceptibility of API service integrations to logic flaws Surprising New Solution: Secure DNA Alignments can scale on the Cloud
Challenges in Making SaaS Right Joint Work with Rui Wang, Kehuan Zhang, Zhou Li (my students) and Dr. Shuo Chen, Dr. Shaz Qadeer (MSR) Related Publications: Oakland’10, CCS’10, Oakland’11
(1) split between client and server (2) state transitions driven by network traffic Desktop application Web application Worry about privacy? Let’s do encryption.
The Problem is serious High-profile Web Applications Serious information leaks: health records, family income, investment details, search queries The problem is fundamental stateful communication, low entropy input and significant traffic distinctions. Defense is non-trivial effective defense needs to be application specific. calls for a disciplined web programming methodology.
Impacts Your health, tax, and search data siphoned: Software-as-a-service springs SSL leak, The Register, 23 March 2010 Side-Channel Leaks in Web Applications, Freedom to Tinker, Ed Felten's blog, 23 March 2010 Researchers sound alarm on Web app "side channel" data leaks, Network World, 24 March 2010 Bruce Schneier commented on our work Someone wikied our work SaaS Apps May Leak Data Even When Encrypted, Study Says, darkReading, 26 March 2010 and more: Searching Google with “side-channel leaks in web applications”
Two most-popular online tax software. Design: a wizard-style questionnaire Tailor the conversation based on user’s previous input. Problem: Information Leaks from control flows E.g., Filing status, Number of children, Paid big medical bill, the adjusted gross income (AGI)
Entry page of Deductions & Credits, then Summary of Deductions & Credits, with three possible transitions: Full credit, Partial credit, Not eligible. All transitions have unique traffic patterns. Child credit by AGI (two children scenario): $0 to $110000 Full credit; $110000 to $150000 Partial credit; above $150000 Not eligible.
Entry page of Deductions & Credits, then either Summary of Deductions & Credits (Full credit or Not eligible) or Enter your paid interest (Partial credit). Student loan interest by AGI: $0 to $115000 Full credit; $115000 to $145000 Partial credit; above $145000 Not eligible.
Disabled Credit $24999 Retirement Savings $53000 IRA Contribution $85000 $105000 College Expense $116000 $115000 Student Loan Interest $145000 First-time Homebuyer credit $150000 $170000 Earned Income Credit $41646 Child credit * $110000 Adoption expense $174730 $214780 $130000 or $150000 or $170000 … $0 OnlineTax A can find more than 350 credits/deductions.
Funds you invested in? GIF image from MarketWatch. Just compare the image sizes! 3346 B 3312 B 3294 B Fund allocation Pie Charts
Inference of Fund Allocation Start: 80000 charts. Size of day 1: 800 charts. Size of day 2; Prices of the day: 80 charts. Size of day 3; Prices of the day: 8 charts. Size of day 4; Prices of the day: 1 chart.
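The narrowing process can be sketched as a filter over candidate allocations: each day, keep only the candidates whose chart would have the observed size under that day's prices. Everything below is a toy under stated assumptions. chart_size is a hypothetical stand-in (a small rasterised pie compressed with zlib) for the real GIF rendering, and the candidate space, prices, and secret allocation are made up.

```python
import math
import zlib
from itertools import accumulate, product


def chart_size(allocation, prices, n=32):
    """Hypothetical stand-in for the byte size of a pie-chart image."""
    values = [units * price for units, price in zip(allocation, prices)]
    total = sum(values)
    bounds = list(accumulate(v / total for v in values))
    pixels = bytearray()
    for y in range(n):
        for x in range(n):
            dx, dy = x - n / 2 + 0.5, y - n / 2 + 0.5
            if dx * dx + dy * dy > (n / 2) ** 2:
                pixels.append(0)  # background pixel outside the pie
                continue
            frac = (math.atan2(dy, dx) + math.pi) / (2 * math.pi)
            # First slice whose cumulative share covers this angle.
            fund = next((i for i, b in enumerate(bounds) if frac <= b), len(bounds) - 1)
            pixels.append(fund + 1)
    return len(zlib.compress(bytes(pixels)))


def narrow(candidates, observations):
    """Keep candidates consistent with every (prices, observed_size) pair."""
    for prices, observed_size in observations:
        candidates = [c for c in candidates if chart_size(c, prices) == observed_size]
    return candidates


# Made-up example: 3 funds, 1 to 5 units each, four days of prices.
candidates = list(product(range(1, 6), repeat=3))
secret = (2, 5, 3)
daily_prices = [(10, 20, 30), (11, 19, 33), (9, 22, 31), (12, 18, 29)]
observations = [(p, chart_size(secret, p)) for p in daily_prices]
remaining = narrow(candidates, observations)
```

The true allocation always survives the filter; how quickly the other candidates drop out depends on how sensitive the image size is to the allocation.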
Challenges in API Service Integrations Examples: Payment Services (PayPal, Amazon Payment, Google Checkout, etc.) Login Services (Google ID, Facebook Connect, etc.) Other cloud API services (Amazon EC2, Simple DB, etc.) Hybrid Web Applications Integrating API services from other parties VERY, VERY HARD TO DO IT RIGHT
Our Study: Cashier-as-a-Service 3 rd -party cashiers e.g., PayPal, Amazon Payments, Google Checkout A decision to be made jointly The web store handles the order The CaaS handles the payment
[Diagram: a checkout involving the Shopper, Buy.com, and PayPal (CaaS). Legend: RT = HTTP round-trip; a separate arrow style marks Web API calls. Labeled steps: RT1.a, RT1.b, RT2.a, RT2.b, RT3.a, RT3.a.a, RT3.a.b, RT3.b, RT4.a, RT4.b. Screens shown: a Pay Now button; “Please confirm: shipping address: xxxxxxxxxxxxxxxxxxx billing address: xxxxxxxxxxxxxxxxxx total amount: $39.54”; “Thank you for your order! Your order #12345 will be shipped. View the order”.] Many other payment services (PayPal Standard, PayPal Express, Amazon Simple Pay, Checkout By Amazon, Google Checkout, etc.) Different integrations
Mom, can I do X ? Mom Dad Naughty kid Sounds reasonable, but ask Dad to call me. Dad, Mom is ok about X’, can you call her? Sounds like a wacky idea. I am not sure. What do you think? I think it is fine. OK.
What We Studied Software powering web stores: NopCommerce (popular open-source), Interspire (ranked #1 by Top10Reviews.com), Amazon SDKs (used by stores to integrate Amazon Payments). Closed-source big stores: JR.com, Buy.com
Amazon Simple Pay Integrated in NopCommerce
Shopper Chuck to Jeff: Jeff, I want to buy this DVD.
Jeff to Chuck: Chuck, pay in Amazon with this signed letter: Dear Amazon, order#123 is $10, when it is paid, call me at 425-111-2222. [Jeff’s signature]
Chuck to Amazon: Amazon, I want to pay with this letter: Dear Amazon, order#123 is $10, when it is paid, call me at 425-111-2222. [Jeff’s signature]
Amazon to Jeff: Hi, $10 has been paid for order#123. payeeEmail= email@example.com [Amazon’s signature]
Jeff: Great, I will ship order#123!
Note: the phone number is analogous to the returnURL field in Amazon Simple Pay
How to Shop for Free: Pay to Mark but Check out from Jeff
Register an Amazon Seller Account: a $25 MasterCard gift card can work. We registered it under the name “Mark Smith” with PayPal, Amazon and Google using the card.
Shopper Chuck (and seller Mark) to Jeff: Jeff, I want to buy this DVD.
Jeff to Chuck: Chuck, pay in Amazon with this signed letter: Dear Amazon, order#123 is $10, when it is paid, call me at 425-111-2222. [Jeff’s signature]
Chuck to Amazon (CaaS): Amazon, I want to pay with this letter: Dear Amazon, order#123 is $10, when it is paid, call me at 425-111-2222. [Jeff’s signature replaced by Mark’s signature]
Amazon to Jeff: Hi, $10 has been paid for order#123. payeeEmail= firstname.lastname@example.org [Amazon’s signature]
Jeff: Great, I will ship order#123!
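The flaw in this exchange is that the store trusts any notice carrying the cashier's signature without checking who was paid. A minimal sketch of that check follows. It is not Amazon's or NopCommerce's actual code: the message format, the function names, the example addresses, and the use of a shared-key HMAC in place of the cashier's real signature scheme are all assumptions for illustration.

```python
import hashlib
import hmac

# Assumption: a shared demo key stands in for the cashier's signing key.
CAAS_KEY = b"caas-demo-key"


def sign(key, message):
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


def caas_payment_notice(order_id, amount, payee):
    """The cashier's signed statement: who was paid, how much, for which order."""
    message = f"order={order_id}&amount={amount}&payee={payee}"
    return message, sign(CAAS_KEY, message)


def store_accepts(message, signature, order_id, amount, expected_payee=None):
    """Store-side check. With expected_payee=None it mimics the flawed store."""
    if not hmac.compare_digest(signature, sign(CAAS_KEY, message)):
        return False
    fields = dict(pair.split("=", 1) for pair in message.split("&"))
    if fields["order"] != order_id or fields["amount"] != str(amount):
        return False
    # The missing step in the flawed integration: is the payee actually us?
    if expected_payee is not None and fields["payee"] != expected_payee:
        return False
    return True


# The shopper pays his own seller account ("Mark"), then presents the notice to Jeff.
notice, signature = caas_payment_notice("123", 10, "mark@example.test")
flawed_result = store_accepts(notice, signature, "123", 10)
fixed_result = store_accepts(notice, signature, "123", 10, expected_payee="jeff@example.test")
```

The signature is genuine in both calls; only the payee comparison separates the flawed store from the fixed one.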
PayPal Express Integrated in Interspire Session 1: pay for a cheap order (orderID1) in PayPal, but prevent the merchant from updating its status to PAID. (RT3.b) redir to TStore.com/updateOrder? orderID1 T* Session 2: place an expensive order (orderID2), but skip the payment step in PayPal. (RT3.b) redir to TStore.com/updateOrder? orderID2 T* (RT4.a) call TStore.com/updateOrder? orderID1 T* orderID2 T*
Google Checkout Integrated in Interspire The payment amount is calculated based on the cart at the moment when the shopper clicks “checkout” (RT2.a). The order is generated based on the cart at the payment time (RT3.a). Between RT2.a and RT3.a, the shopper can add more items to the cart.
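This is a time-of-check/time-of-use gap, and a toy model makes it concrete. The class below is a hypothetical store, not Interspire's code: the item names, prices, and method names are invented. The flawed path builds the order from the live cart at payment time, while the fixed path ships only the cart snapshot that was priced at checkout.

```python
class ToyStore:
    """Hypothetical store illustrating the checkout/payment race."""

    PRICES = {"dvd": 10, "laptop": 900}

    def __init__(self):
        self.cart = []
        self.amount_due = None
        self.priced_cart = None

    def add(self, item):
        self.cart.append(item)

    def checkout(self):
        """RT2.a: compute the amount the cashier will charge."""
        self.priced_cart = list(self.cart)  # snapshot of what was priced
        self.amount_due = sum(self.PRICES[item] for item in self.priced_cart)
        return self.amount_due

    def finalize_flawed(self, amount_paid):
        """RT3.a, flawed: the order is built from the cart as it is now."""
        if amount_paid != self.amount_due:
            raise ValueError("payment mismatch")
        return list(self.cart)

    def finalize_fixed(self, amount_paid):
        """RT3.a, fixed: the order is exactly the cart that was priced."""
        if amount_paid != self.amount_due:
            raise ValueError("payment mismatch")
        return list(self.priced_cart)


store = ToyStore()
store.add("dvd")
amount = store.checkout()   # shopper is asked to pay 10
store.add("laptop")         # added between RT2.a and RT3.a
flawed_order = store.finalize_flawed(amount)
fixed_order = store.finalize_fixed(amount)
```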
What We Found: Logic flaws in 9 checkout scenarios
Scenario: Description
NopCommerce with PayPal Standard: Pay arbitrary amount to check out
NopCommerce with Amazon Simple Pay: Pay to the shopper himself, but check out from the victim store.
Interspire with Amazon Simple Pay: No need to pay
Interspire with PayPal Express: Pay for a low-price order, check out a high-price one.
Interspire with PayPal Standard: Pay for an order, check out an arbitrary number of orders of the same price.
Interspire with Google Checkout: Pay for only the least expensive item in the cart
JR.com with Checkout-By-Amazon: Pay arbitrary amount to check out
Buy.com with PayPal Express: Pay for an order, check out an arbitrary number of orders.
Amazon Payments SDK: No need to pay
Dear Buy.com customer service, Last week I placed the two orders (Order Number: 54348156 Order number: 54348723) in buy.com. Both items were shipped recently, but I found that my paypal account has not been charged for the order 54348723 (the alcohol tester). My credit card information is: [xxxxxxxxx] The total of the order 54348723 is $5.99. Please charge my credit card. Thank you very much
From: Buy.Com Support Date: Sun, Jun 13, 2010 at 3:32 PM Subject: Re: Other questions or comments (KMM3534132I15977L0KM) To: Test Wang
Thank you for contacting us at Buy.com. Buy.com will only bill your credit card only when a product has been shipped. We authorize payment on your credit card as soon as you place an order. Once an item has shipped, your credit card is billed for that item and for a portion of the shipping and/or tax charges (if applicable). If there are items on "Back Order" status, your credit card is re-authorized for the remaining amount and all previous authorizations are removed. This is the reason you may have multiple billings for your order. …
A generic reply that misunderstood the situation
Dear buy.com customer service, I am a Ph.D. student doing research on e-commerce security. I bumped into an unexpected technical issue in buy.com's mechanism for accepting the paypal payments. I appreciate if you can forward this email to your engineering team. The finding is regarding the order 54348723. I placed the order in an unconventional manner (by reusing a previous paypal token), which allowed me to check out the product without paying. I have received the product in the mail. Of course I need to pay for it. Here is my credit card information [xxxxxxxxxxxx]. Please charge my card. The total on the invoice is $5.99.
Re: Other questions or comments (KMM3545639I15977L0KM) Buy.Com Support Wed, Jun 16, 2010 at 6:25 PM To: Test Wang
Hello Test, Thank you for contacting us at Buy.com. Based on our records you were billed on 6/10/2010 for $5.99. To confirm your billing information please contact PayPal at https://www.paypal.com/helpcenter or at 1-402-935-2050.
After our refund-eligible period, we mailed the products back by certified mail. We disclosed technical details to them.
What Do We Learn? SaaS is different Two party, multi-party applications Openness and complexity of the web platform Formal verification: how to build the model? Need new development methodologies Treat side channels seriously Better integration supports
Defense is Nontrivial Padding strategies Rounding Random padding Challenges Overheads (at least one-third bandwidth usage) Implementation hurdles Demand for the Change to the Web-app Development methodology
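The two padding strategies can be sketched in a few lines. This is only an illustration of the idea, not a recommended defense: the function names and block sizes are assumptions, and the three sizes are the pie-chart GIF sizes quoted earlier in the talk.

```python
import secrets


def pad_round(size, block=128):
    """Rounding: pad a response up to the next multiple of `block` bytes."""
    return -(-size // block) * block  # ceiling division, then scale back up


def pad_random(size, max_pad=128):
    """Random padding: add between 0 and `max_pad` bytes, chosen unpredictably."""
    return size + secrets.randbelow(max_pad + 1)


chart_sizes = [3346, 3312, 3294]

# A 128-byte block still separates the first chart from the other two.
rounded_128 = [pad_round(s, 128) for s in chart_sizes]

# A 512-byte block hides all three, at the cost of extra bandwidth.
rounded_512 = [pad_round(s, 512) for s in chart_sizes]
overhead_512 = sum(rounded_512) / sum(chart_sizes) - 1
```

Which block size is enough depends on how far apart the application's responses are, which is why an effective policy has to be application specific.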
Sidebuster Side-channel detection tool for web applications Information flow analysis to locate where leaks happen 1.Sensitive data “taints” network traffic 2.Content of the data associated with different traffic features Quantify the information being leaked
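One generic way to quantify a leak is to ask how many bits of the sensitive input an observer learns from a traffic feature. The sketch below assumes equally likely inputs and a deterministic feature, in which case the leak equals the entropy of the observed feature. It is a textbook measure offered for illustration, not a description of how Sidebuster computes its numbers, and the function name and example data are made up.

```python
import math
from collections import Counter


def leaked_bits(feature_by_input):
    """Bits learned about a uniformly distributed input from one observation.

    `feature_by_input` maps each possible sensitive input to the traffic
    feature (for example a packet size) the observer would see for it.
    """
    n = len(feature_by_input)
    group_sizes = Counter(feature_by_input.values()).values()
    return -sum((c / n) * math.log2(c / n) for c in group_sizes)


# Four filing statuses, each producing a distinct response size: full leak.
distinct = leaked_bits({"single": 510, "joint": 640, "separate": 702, "head": 588})

# Same inputs after padding to one size: nothing leaks.
padded = leaked_bits({"single": 1024, "joint": 1024, "separate": 1024, "head": 1024})

# Two inputs share a size: the observer learns less than 2 bits.
partial = leaked_bits({"single": 512, "joint": 512, "separate": 702, "head": 588})
```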
Secure and Scalable Read Mapping on Hybrid Clouds Joint work with Yangyi Chen, Bo Peng (my students) and Dr. Haixu Tang (bioinformatics expert) Related Publication: NDSS’12
Read Mapping Align a short DNA sequence (read) to a long one (reference genome) A prerequisite for all DNA analyses Computation intensive (involving millions of reads) [Figure: a read (about 100 bps), e.g. ATTGC, aligned against a reference genome (about 6 billion bps)]
Read Mapping on the Cloud Technical Challenges: millions of reads against a reference of billions of nucleotides; edit-distance based alignment. Cloud solutions: cost of sequencing < cost of mapping within organizations; Cloud computing is the only solution. Privacy: NIH disallows reads with human DNA to be given to the public Cloud
Secure Computation Outsourcing Too heavyweight for data-intensive computations. E.g., aligning two sequences of length 25 takes 3 minutes via homomorphic encryption/oblivious transfer/secret sharing [Atallah03] and 4 seconds via improved SMC [Jha07]. Problems with Secret Sharing: communication overheads, policy concerns
Data Anonymization Data aggregation and noise adding Vulnerable to re-identification attacks Given a reference population and a DNA sample, a read donor could be identified from aggregate data
Can we find a Better Solution? The Cloud is special: designed for simple, parallelizable tasks; good at processing a large amount of data; can work in a hybrid way (Public Cloud plus Private Cloud). The problem is special: small edit distance (<= 6)
Seed and Extend For an edit distance d, divide a read into d+1 seeds: at least one of the d+1 seeds matches an l-mer on its alignment exactly, because d edits can touch at most d seeds. Seeding: compare the seeds with the l-mers of the reference to find possible positions of a read. Extension: extend from those positions to find the right alignments. [Figure: a read and reference sequences with a matching l-mer highlighted]
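The two steps can be sketched on plaintext strings as follows. This is a small illustrative implementation of the general seed-and-extend idea, not the system described in the talk: the function names, the equal-length seed split, the windowed dynamic-programming check, and the toy sequences are all assumptions.

```python
def split_into_seeds(read, d):
    """Cut the read into d+1 non-overlapping seeds, returned as (offset, seed)."""
    size = len(read) // (d + 1)
    if size == 0:
        raise ValueError("read is too short for d + 1 seeds")
    return [(i * size, read[i * size:(i + 1) * size]) for i in range(d + 1)]


def best_edit_distance(read, window):
    """Smallest edit distance between `read` and any substring of `window`."""
    prev = [0] * (len(window) + 1)  # an alignment may start anywhere in the window
    for i, r in enumerate(read, 1):
        cur = [i] + [0] * len(window)
        for j, w in enumerate(window, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != w))
        prev = cur
    return min(prev)  # ... and may end anywhere


def map_read(read, reference, d):
    """Seeding: exact seed hits. Extension: verify each hit within distance d."""
    hits = set()
    for offset, seed in split_into_seeds(read, d):
        pos = reference.find(seed)
        while pos != -1:
            start = pos - offset  # implied start of the read (indels may shift it by up to d)
            lo, hi = max(0, start - d), min(len(reference), start + len(read) + d)
            if best_edit_distance(read, reference[lo:hi]) <= d:
                hits.add(start)
            pos = reference.find(seed, pos + 1)
    return sorted(hits)


reference = "GATTACAGGCTTACGATCCGTA"
read = "CATGCTTACGAT"  # reference[5:17] with one substitution (G -> T)
positions = map_read(read, reference, d=1)
```

The first seed contains the substitution and finds nothing; the second seed matches exactly and leads the extension to the right place.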
A Surprisingly Simple Solution Public Cloud does the seeding on encrypted data Simple, parallelizable, data-intensive Private Cloud does the extension Complicated, yet involving relatively small amount of data Goals High security assurance Outsourcing dominant portion of Workload Scalability Limited Inter-Cloud Communication
Our Design 1.Prepare encrypted reference sequences: Extract from the reference subsequences (l-mers) Encrypt unique l-mers Send ciphertext to the public cloud 2.Public Cloud: Match encrypted seeds to the encrypted references 3.Private Cloud: Extend the reads according to the matches
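A minimal sketch of the three steps, under stated assumptions: HMAC-SHA256 with a secret key stands in for whatever keyed, deterministic scheme the actual system uses, and the function names, the split between what is sent and what is kept, and the toy reference are all hypothetical.

```python
import hashlib
import hmac


def keyed_hash(key, lmer):
    """Deterministic keyed hash, so equal l-mers give equal ciphertexts."""
    return hmac.new(key, lmer.encode(), hashlib.sha256).hexdigest()


def prepare_reference(reference, l, key):
    """Step 1 (private side): hash every unique l-mer of the reference.

    Returns the set of ciphertexts to send to the public cloud and a
    ciphertext -> positions table that stays on the private side.
    """
    positions = {}
    for pos in range(len(reference) - l + 1):
        positions.setdefault(keyed_hash(key, reference[pos:pos + l]), []).append(pos)
    return set(positions), positions


def public_cloud_match(encrypted_reference, encrypted_seeds):
    """Step 2 (public side): report which encrypted seeds match, seeing no plaintext."""
    return [h for h in encrypted_seeds if h in encrypted_reference]


key = b"demo-secret-key"  # known only to the private cloud
reference = "ACGTACGGA"
public_part, private_positions = prepare_reference(reference, 4, key)

seeds = ["TACG", "TTTT"]
matches = public_cloud_match(public_part, [keyed_hash(key, s) for s in seeds])

# Step 3 (private side): translate matches back to positions, then extend from them.
candidate_positions = [private_positions[h] for h in matches]
```

Because the hash is deterministic, the public cloud can match without the key, which is also what opens the door to the frequency analysis discussed later in the talk.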
Does This Approach Work? Performance: Can we move the most workload to the public cloud? Security: Can sensitive information be inferred from encrypted sequences?
Short Seeds Problem: When d >= 5, seeds are too short (< 20 bps). Short seeds lead to lots of matches, which means more workload for the private cloud. Our idea: 2-seed combinations. E.g., given d=5, a read can be divided into 7 seeds, giving 2-seed combinations of 28 bps
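The arithmetic behind the example can be checked with a short sketch. It assumes a 100 bp read split into d+2 equal seeds: with at most d errors, at least two of the d+2 seeds are error-free, so at least one pair of seeds matches the reference exactly. The function name and the placeholder read are illustrative.

```python
from itertools import combinations


def seed_combinations(read, d):
    """Split the read into d+2 seeds and return every 2-seed combination."""
    n = d + 2
    size = len(read) // n
    seeds = [read[i * size:(i + 1) * size] for i in range(n)]
    return [(i, j, seeds[i] + seeds[j]) for i, j in combinations(range(n), 2)]


read = "ACGT" * 25                 # a placeholder 100 bp read
combos = seed_combinations(read, d=5)
combo_count = len(combos)          # C(7, 2) pairs of seeds
combo_lengths = {len(c) for _, _, c in combos}
```

Each combination is 28 bps long, compared with 16 bps for a single seed when the same read is split into d+1 = 6 seeds.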
Seed Combinations 2-combinations of Reference 12-mers: about 7 TB with SHA-1. 2-combinations of seeds (12 bps): about 2.8 GB for 10 M reads. [Figure: pairs of 12-mers taken from numbered positions along a reference sequence]
Menace of Frequency Analysis Performance gain comes with security threats: the Public Cloud needs to know when matches happen, which requires a deterministic cryptosystem, which brings the threat of frequency analysis. How serious is the problem? Over 80% of 24-mers are unique. The remaining 20% often carry little information
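Frequency analysis only helps an attacker when some ciphertexts repeat, so a natural first measurement is the share of l-mers that occur exactly once. The sketch below computes that share over the distinct l-mers of a sequence. The function name, the choice of denominator, and the toy sequences are assumptions; the 80% figure on the slide comes from the authors' own analysis of the reference genome, not from this code.

```python
from collections import Counter


def unique_fraction(sequence, l):
    """Fraction of distinct l-mers that occur exactly once in `sequence`."""
    counts = Counter(sequence[i:i + l] for i in range(len(sequence) - l + 1))
    return sum(1 for c in counts.values() if c == 1) / len(counts)


all_unique = unique_fraction("ACGTACGG", 4)   # every 4-mer is different
none_unique = unique_fraction("AAAAAA", 2)    # a single 2-mer, repeated
mixed = unique_fraction("ACACGT", 2)          # AC repeats; CA, CG, GT do not
```

Under a deterministic scheme, a unique l-mer produces a ciphertext that appears once and so carries no frequency signal.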
Outcomes of the Analysis Test group: the individuals are not in case and reference Data: YRI population from HapMap and Reference Genome (40 cases, 40 for the test group)
Performance Evaluation Implementation over Hadoop Clouds: Public Cloud: 20 nodes on FutureGrid (8-core 2.93GHz Intel, 24 GB memory, 862 GB local disk); Private Cloud: a single node. Data: 10 million reads of real human microbiome data with 4% human DNA, total 250 MB. Reference genome: Chromosome 1 (252.4 million bps), Chromosome 22 (52.3 million bps)
New Security Challenges, New Opportunities Side-channel leaks and API integrations Wouldn’t be such a serious problem without SaaS Secure read mapping Made possible by the cloud’s data processing capabilities Made practical by the hybrid-cloud infrastructure Key: features of the cloud and the problems it works on