Web Security and Privacy CS 136 Computer Security Peter Reiher December 1, 2011
Web Security Lots of Internet traffic is related to the web Much of it is financial in nature Also lots of private information flows around web applications An obvious target for attackers
The Web Security Problem Many users interact with many servers Most parties have little other relationship Increasingly complex things are moved via the web No central authority Many developers with little security experience Many critical elements originally designed with no thought to security Sort of a microcosm of the overall security problem
Aspects of the Web Problem
Who Are We Protecting? The server From the clients The clients From the server From each other
What Are We Protecting? The client’s private data The server’s private data The integrity (maybe secrecy) of their transactions The client and server’s machines Possibly server availability For particular clients?
Some Real Threats Buffer overflows and other compromises Client attacks server SQL injection Malicious downloaded code Server attacks client
More Threats Cross-site scripting Clients attack each other Threats based on non-transactional nature of communication Client attacks server Denial of service attacks Threats on server availability (usually)
Compromise Threats Much the same as for any other network application Web server might have buffer overflow Or other remotely usable flaw Not different in character from any other application’s problem And similar solutions
What Makes It Worse Web servers are complex They often also run supporting code Which is often user-visible Large, complex code base is likely to contain such flaws Nature of application demands allowing remote use
Solution Approaches Patching Use good code base Minimize code that the server executes Maybe restrict server access When that makes sense Lots of testing and evaluation Many tools for web server evaluation
SQL Injection Attacks Many web servers have backing databases Much of their information stored in database Web pages are built (in part) based on queries to database Possibly using some client input . . .
SQL Injection Mechanics Server plans to build a SQL query Needs some data from client to build it E.g., client’s user name Server asks client for data Client, instead, provides a SQL fragment Server inserts it into planned query Leading to a “somewhat different” query
An Example "select * from mysql.user where username = '" . $uid . "' and password=password('" . $pwd . "');" Intent is that user fills in his ID and password What if he fills in something else? ' or 1=1; -- '
What Happens Then? $uid has the string substituted, yielding "select * from mysql.user where username = '' or 1=1; -- '' and password=password('" . $pwd . "');" The where clause evaluates to true for every row Since 1 does indeed equal 1 And -- comments out rest of line If script uses truth of statement to determine valid login, attacker has logged in
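The same failure can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: it uses a hypothetical `users` table in an in-memory SQLite database instead of the `mysql.user` table and `password()` function from the slide, and the names `make_db` and `vulnerable_login` are made up for illustration.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    # Plaintext password only to keep the sketch short; not a recommendation.
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def vulnerable_login(conn, uid, pwd):
    """Build the query by string concatenation, as in the slide's example."""
    query = (
        "SELECT * FROM users WHERE username = '" + uid +
        "' AND password = '" + pwd + "'"
    )
    # Login "succeeds" whenever the query returns at least one row.
    return conn.execute(query).fetchone() is not None
```

With this sketch, `vulnerable_login(make_db(), "alice", "wrong")` is rejected, while `vulnerable_login(make_db(), "' or 1=1; -- ", "anything")` is accepted, because the injected fragment makes the where clause true for every row and comments out the password check.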
Basis of SQL Injection Problem Unvalidated input Server expected plain data Got back SQL commands Didn’t recognize the difference and went ahead Resulting in arbitrary SQL query being sent to its database With its privileges
Solution Approaches Carefully examine all input To filter out injected SQL Use database access controls Of limited value Randomization of SQL keywords Making injected SQL meaningless
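As a rough illustration of "carefully examine all input", the sketch below rejects user names that do not match an allowlist pattern. It also passes the values to the database as bound parameters, a widely used defense that this slide does not list and that is included here as an addition. The `users` table, the pattern, and the names `make_db` and `checked_login` are assumptions for the example.

```python
import re
import sqlite3

# Assumption: user names are short and alphanumeric. A real site may need
# a different rule; the point is to accept only what is expected.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def make_db():
    """Create a throwaway in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def checked_login(conn, uid, pwd):
    """Validate the user name, then query with bound parameters."""
    if USERNAME_RE.fullmatch(uid) is None:
        return False  # input contains characters a plain user name never has
    # The ? placeholders keep uid and pwd as data; they are never parsed as SQL.
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ? AND password = ?",
        (uid, pwd),
    ).fetchone()
    return row is not None
```

Here the slide's injected string fails the allowlist check, and a SQL fragment placed in the password field is compared as a literal string rather than executed.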
Malicious Downloaded Code The web relies heavily on downloaded code Full language and scripting language Mostly scripts Instructions downloaded from server to client Run by client on his machine Using his privileges Without defense, script could do anything
Types of Downloaded Code Java Full programming language Scripting languages JavaScript VBScript ECMAScript XSLT
Solution Approaches Disable scripts Not very popular Use secure scripting languages Also not popular Particularly with code writers Isolation mechanisms VM or application-based Vista mandatory access control
Cross-Site Scripting XSS Many sites allow users to upload information Blogs, photo sharing, Facebook, etc. Which gets permanently stored And displayed Attack based on uploading a script Other users inadvertently download it And run it . . .
The Effect of XSS Arbitrary malicious script executes on user’s machine In context of his web browser At best, runs with privileges of the site storing the script Often likely to run at full user privileges
Why Is XSS Common? Use of scripting languages widespread For legitimate purposes Most users leave them enabled in browser Only a question of getting user to run your script Often only requires fetching URL
Typical Effects of XSS Attack Most commonly used to steal personal information That is available to legit web site User IDs, passwords, credit card numbers, etc. Such information often stored in cookies at client side
Solution Approaches Don’t allow uploading of scripts Usually by carefully analyzing uploaded data Provide some form of protection in browser
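One common way to keep an uploaded script from running is to escape uploaded text before it is placed in a page, so the browser displays it instead of executing it. This is related to, but not the same as, analyzing the upload itself. The sketch below assumes a hypothetical comment feature; the function names and the `comment` markup are illustrative only.

```python
import html


def render_comment_unsafe(comment: str) -> str:
    """Insert uploaded text into the page verbatim (vulnerable to XSS)."""
    return '<div class="comment">' + comment + "</div>"


def render_comment_escaped(comment: str) -> str:
    """Escape HTML metacharacters so uploaded markup is shown as plain text."""
    return '<div class="comment">' + html.escape(comment) + "</div>"
```

For an upload such as `<script>steal(document.cookie)</script>` (where `steal` is a made-up function), the unsafe version emits a live script tag, while the escaped version emits `&lt;script&gt;...`, which the browser renders as text.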
Exploiting Statelessness HTTP is designed to be stateless But many useful web interactions are stateful Various tricks used to achieve statefulness Usually requiring programmers to provide the state Often trying to minimize work for the server
A Simple Example Web sites are set up as graphs of links You start at some predefined point A top level page, e.g. And you traverse links to get to other pages But HTTP doesn’t “keep track” of where you’ve been Each request is simply the name of a link
Why Is That a Problem? What if there are unlinked pages on the server? Should a user be able to reach those merely by naming them? Is that what the site designers intended?
A Concrete Example The ApplyYourself system Used by colleges to handle student applications For example, by Harvard Business School in 2005 Once all admissions decisions made, results available to students
What Went Wrong? Pages representing results were created as decisions were made Stored on the web server But not linked to anything, since results not yet released Some appliers figured out how to craft URLs to access their pages Finding out early if they were admitted
The Core Problem No protocol memory of what came before So no protocol way to determine that response matches request Could be built into the application that handles requests But frequently isn’t Or is wrong
Solution Approaches Get better programmers Or better programming tools Back end system that maintains and compares state Front end program that observes requests and responses Producing state as a result
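A back end that "maintains and compares state" could, for instance, remember which links it has already shown to each session and refuse requests for anything else. The sketch below is one possible shape for that idea, not a description of any real system; the class name `LinkTracker`, the session identifiers, and the example URLs are all hypothetical.

```python
class LinkTracker:
    """Remembers which URLs the server has offered to each session."""

    def __init__(self, site_links, start_page):
        self.site_links = site_links  # page -> list of pages it links to
        self.start_page = start_page
        self.offered = {}             # session id -> set of URLs offered so far

    def handle(self, session_id, url):
        """Return an HTTP-style status code for a request."""
        # A new session has only been offered the start page.
        allowed = self.offered.setdefault(session_id, {self.start_page})
        if url not in allowed:
            return 403  # never linked for this session, so refuse
        # Serving a page offers every link that appears on it.
        allowed.update(self.site_links.get(url, []))
        return 200
```

With `site_links = {"/": ["/apply", "/status"]}`, a request for a crafted URL such as `/decision/12345` is refused because no page served to that session ever linked to it, while `/status` is allowed once the session has fetched `/`.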
Conclusion Web security problems not inherently different from general software security problems But generality, power, ubiquity of the web make them especially important Like many other security problems, constrained by legacy issues
Privacy Data privacy issues Network privacy issues Some privacy solutions
What Is Privacy? The ability to keep certain information secret Usually one’s own information But also information that is “in your custody” Includes ongoing information about what you’re doing
Privacy and Computers Much sensitive information currently kept on computers Which are increasingly networked Often stored in large databases Huge repositories of privacy time bombs We don’t know where our information is
Privacy and Our Network Operations Lots of stuff goes on over the Internet Banking and other commerce Health care Romance and sex Family issues Personal identity information We used to regard this stuff as private Is it private any more?
Threats to Computer Privacy Cleartext transmission of data Poor security allows remote users to access our data Sites we visit can save information on us Multiple sites can combine information Governmental snooping Location privacy Insider threats in various places
Some Specific Privacy Problems Poorly secured databases that are remotely accessible Or are stored on hackable computers Data mining by companies we interact with Eavesdropping on network communications by governments Insiders improperly accessing information Cell phone/mobile computer-based location tracking
Data Privacy Issues My data is stored somewhere Can I control who can use it/see it? Can I even know who’s got it? How do I protect a set of private data? While still allowing some use? Will data mining divulge data “through the back door”?
Personal Data Who owns data about you? What if it’s really personal data? Social security number, DoB, your DNA record? What if it’s data someone gathered about you? Your Google history or shopping records Does it matter how they got it?
Protecting Data Sets If my company has (legitimately) a bunch of personal data, What can I/should I do to protect it? Given that I probably also need to use it? If I fail, how do I know that? And what remedies do I have?
Options for Protecting Data Careful system design Limited access to the database Networked or otherwise Full logging and careful auditing Using only encrypted data Must it be decrypted? If so, how to protect the data and the keys?
Data Mining and Privacy Data mining allows users to extract models from databases Based on aggregated information Often data mining allowed when direct extraction isn’t Unless handled carefully, attackers can use mining to deduce record values
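A simple case of deducing a record value from aggregates is a differencing attack: ask for a total over everyone, then for a total over everyone except the target, and subtract. The sketch below assumes a hypothetical interface that answers only aggregate salary queries; the records and the function names are invented for illustration.

```python
def total_salary(records, predicate):
    """The only query the interface allows: a sum over matching records."""
    return sum(r["salary"] for r in records if predicate(r))


def differencing_attack(records, target_name):
    """Recover one person's salary using two permitted aggregate queries."""
    everyone = total_salary(records, lambda r: True)
    all_but_target = total_salary(records, lambda r: r["name"] != target_name)
    return everyone - all_but_target


# Hypothetical data; the values are made up.
RECORDS = [
    {"name": "alice", "salary": 90000},
    {"name": "bob", "salary": 70000},
    {"name": "carol", "salary": 85000},
]
```

Neither query returns an individual record, yet `differencing_attack(RECORDS, "bob")` yields exactly Bob's salary, which is why aggregate access alone does not guarantee privacy.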
Insider Threats and Privacy Often insiders need access to private data Under some circumstances But they might abuse that access How can we determine when they misbehave? What can we do?
Network Privacy Mostly issues of preserving privacy of data flowing through network Start with encryption With good encryption, data values not readable So what’s the problem?
Traffic Analysis Problems Sometimes desirable to hide that you’re talking to someone else That can be deduced even if the data itself cannot How can you hide that? In the Internet of today?
Location Privacy Mobile devices often communicate while on the move Often providing information about their location Perhaps detailed information Maybe just hints This can be used to track our movements
Implications of Location Privacy Problems Anyone with access to location data can know where we go Allowing government surveillance Or a private detective following your moves Or a maniac stalker figuring out where to ambush you . . .
Some Privacy Solutions The Scott McNealy solution “Get over it.” Anonymizers Onion routing Privacy-preserving data mining Preserving location privacy Handling insider threats via optimistic security
Anonymizers Network sites that accept requests of various kinds from outsiders Then submit those requests Under their own or fake identity Responses returned to the original requestor A NAT box is a poor man’s anonymizer
The Problem With Anonymizers The entity running it knows who’s who Either can use that information himself Or can be fooled/compelled/hacked to divulge it to others Generally not a reliable source of real anonymity
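The weakness can be seen in even a toy anonymizer: to return responses, it has to keep a table mapping each forwarded request to the real requester. The sketch below is a simplified model with hypothetical names and addresses, not a description of any actual anonymizing service.

```python
import itertools


class Anonymizer:
    """Toy relay that forwards requests under its own identity."""

    def __init__(self, own_address):
        self.own_address = own_address
        self._ids = itertools.count(1)
        self.pending = {}  # request id -> real client address

    def forward(self, client_address, request):
        """Replace the client's address with the relay's before forwarding."""
        request_id = next(self._ids)
        # This table is the sensitive part: it records who asked for what.
        self.pending[request_id] = client_address
        return {"id": request_id, "source": self.own_address, "request": request}

    def route_response(self, request_id, response):
        """Look up the original requester and hand the response back."""
        client = self.pending.pop(request_id)
        return client, response
```

The destination only ever sees the relay's address, but anyone who can read `pending` (the operator, or whoever fools, compels, or hacks the operator) learns the real source of each request.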
Onion Routing Meant to handle issue of people knowing who you're talking to Basic idea is to conceal sources and destinations By sending lots of crypto-protected packets between lots of places Each packet goes through multiple hops
A Little More Detail A group of nodes agree to be onion routers Users obtain crypto keys for those nodes Plan is that many users send many packets through the onion routers Concealing who’s really talking
Sending an Onion-Routed Packet Encrypt the packet using the destination’s key Wrap that with another packet to another router Encrypted with that router’s key Iterate a bunch of times
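The layering steps above can be sketched as follows. Everything here is an assumption made for illustration: the "cipher" is a toy XOR keystream built from SHA-256 and is not secure, each node is assumed to share one symmetric key with the sender, and the JSON layer format is invented. Real onion routing systems use proper cryptography and a different packet layout.

```python
import hashlib
import json


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (NOT secure); the same call encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def build_onion(message: bytes, route, keys) -> bytes:
    """Encrypt for the destination, then wrap once per router, last hop first."""
    layer = {"next": None, "body": message.hex()}
    packet = toy_cipher(keys[route[-1]], json.dumps(layer).encode())
    for hop, next_hop in zip(reversed(route[:-1]), reversed(route[1:])):
        layer = {"next": next_hop, "body": packet.hex()}
        packet = toy_cipher(keys[hop], json.dumps(layer).encode())
    return packet


def peel(node, packet, keys):
    """A node removes its own layer and learns only the next hop."""
    layer = json.loads(toy_cipher(keys[node], packet).decode())
    return layer["next"], bytes.fromhex(layer["body"])


def deliver(packet, first_hop, keys):
    """Simulate forwarding until the destination removes the final layer."""
    node, path = first_hop, []
    while node is not None:
        path.append(node)
        node, packet = peel(node, packet, keys)
    return path, packet
```

For a route such as `["r1", "r2", "r3", "dest"]`, each router can decrypt only its own layer, so it sees where the packet goes next but not the message or the full path.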
In Diagram Form [Diagram labels: Source, Destination, Onion routers]
What’s Really in the Packet
Delivering the Message
What’s Been Achieved? Nobody improper read the message Nobody knows who sent the message Except the receiver Nobody knows who received the message Except the sender Assuming you got it all right
Issues for Onion Routing Proper use of keys Traffic analysis Overheads Multiple hops Multiple encryptions
Privacy-Preserving Data Mining Allow users access to aggregate statistics But don’t allow them to deduce individual statistics How to stop that?
Approaches to Privacy for Data Mining Perturbation Add noise to sensitive value Blocking Don’t let aggregate query see sensitive value Sampling Randomly sample only part of data
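The three approaches can each be sketched as a variant of a mean query. This is a simplified illustration with hypothetical function names; the Gaussian noise and the fixed sample size are arbitrary choices for the example, and nothing here is tuned to give a specific privacy guarantee.

```python
import random


def perturbed_mean(values, noise_scale, rng):
    """Perturbation: add random noise to each sensitive value before averaging."""
    noisy = [v + rng.gauss(0, noise_scale) for v in values]
    return sum(noisy) / len(noisy)


def blocked_mean(records, field, is_sensitive):
    """Blocking: leave sensitive records out of the aggregate entirely."""
    visible = [r[field] for r in records if not is_sensitive(r)]
    return sum(visible) / len(visible)


def sampled_mean(values, sample_size, rng):
    """Sampling: answer from a random subset rather than the full data."""
    sample = rng.sample(values, sample_size)
    return sum(sample) / len(sample)
```

A caller would pass its own generator, for example `rng = random.Random()`. In each case the answer stays close to the true aggregate while making it harder to pin down any single record, at some cost in accuracy.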
Preserving Location Privacy Can we prevent people from knowing where we are? Given that we carry mobile communications devices And that we might want location-specific services ourselves
Location-Tracking Services Services that get reports on our mobile device’s position Probably sent from that device Often useful But sometimes we don’t want them turned on So, turn them off then
But . . . What if we turn it off just before entering a “sensitive area”? And turn it back on right after we leave? Might someone deduce that we spent the time in that area? Very probably
Handling Location Inferencing Need to obscure that a user probably entered a particular area Can reduce update rate Reducing certainty of travel Or bundle together areas Increasing uncertainty of which was entered
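Both ideas can be sketched briefly. The functions below are illustrative assumptions, not a real location API: `to_cell` bundles nearby positions into one coarse grid cell, and `throttle` lowers the update rate by dropping reports that arrive too soon after the previous one.

```python
import math


def to_cell(lat, lon, cell_degrees):
    """Bundle areas: report only the grid cell that contains the position."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))


def throttle(reports, min_interval):
    """Reduce update rate: keep reports at least min_interval seconds apart.

    reports is a list of (timestamp_seconds, position) pairs in time order.
    """
    kept = []
    for timestamp, position in reports:
        if not kept or timestamp - kept[-1][0] >= min_interval:
            kept.append((timestamp, position))
    return kept
```

With a cell size of 0.1 degrees, two positions a couple of kilometers apart can fall in the same cell, so an observer cannot tell which of the bundled areas was entered. Fewer updates similarly leave gaps in the observed path.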
Conclusion Privacy is a difficult problem in computer systems Good tools are lacking Or are expensive/cumbersome Hard to get cooperation of others Probably an area where legal assistance is required