System Design and Memory Limits
Problem If you were integrating a feed of end-of-day stock price information (open, high, low, and closing price) for 5,000 companies, how would you do it? You are responsible for the development, rollout, and ongoing monitoring and maintenance of the feed. Describe the different methods you considered and why you would recommend your approach. The feed is delivered once per trading day in a comma-separated format via an FTP site. The feed will be used by 1,000 daily users in a web application.
Problem Let’s assume we have some scripts which are scheduled to get the data via FTP at the end of the day. Where do we store the data? How do we store the data in such a way that we can do various analyses of it?
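As a rough sketch of what such a scheduled script might do, the Python below downloads the file and parses it into records. The column order (company_name, open, high, low, close), the absence of a header row, the UTF-8 encoding, and the function names download_feed and parse_feed are all assumptions for illustration; the real feed layout would come from the provider.

```python
import csv
import io
from ftplib import FTP

FIELDS = ("open", "high", "low", "close")  # assumed column order after the name


def download_feed(host, user, password, remote_path):
    """Fetch the day's file over FTP and return its text (hypothetical helper)."""
    buf = io.BytesIO()
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrbinary(f"RETR {remote_path}", buf.write)
    return buf.getvalue().decode("utf-8")


def parse_feed(text):
    """Split feed text into (valid records, rejected rows) for monitoring."""
    records, rejected = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        try:
            if len(row) != 5 or not row[0].strip():
                raise ValueError("unexpected column count or empty name")
            prices = dict(zip(FIELDS, (float(v) for v in row[1:])))
            # Sanity check: open and close must lie within [low, high].
            if not (prices["low"] <= min(prices["open"], prices["close"])
                    and max(prices["open"], prices["close"]) <= prices["high"]):
                raise ValueError("prices outside the low/high range")
        except ValueError:
            rejected.append(row)
            continue
        records.append({"company_name": row[0].strip(), **prices})
    return records, rejected


# Made-up sample rows: one valid, one inconsistent, one non-numeric.
sample = (
    "ACME,10.5,11.0,10.25,10.75\n"
    "GLOBEX,20.0,19.0,21.0,20.5\n"
    "BAD,not,a,number,row\n"
)
records, rejected = parse_feed(sample)
```

Keeping the rejected rows rather than silently dropping them gives the monitoring side something to alert on, for example when the rejected count is nonzero or the record count is far from 5,000.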
Solution 1 Keep the data in text files. This would be very difficult to manage and update, and very hard to query, since every analysis would have to scan and re-parse the files. Keeping unorganized text files would lead to a very inefficient data model.
Solution 2 We could use a database. This provides the following benefits: Logical storage of the data. An easy way to run queries over the data. Example: return all stocks with an opening price > N AND a closing price < M. Advantages: Maintenance is easy once the database is installed properly. Rollback, backups, and security can be provided using standard database features. We don’t have to “reinvent the wheel.”
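The example query above can be sketched with Python's built-in sqlite3 module. SQLite, the table name daily_price, the column names, and the sample rows are all assumptions chosen for illustration; the slides do not name a particular database or schema.

```python
import sqlite3

# In-memory database for the sketch; a real deployment would use a server or file.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE daily_price (
        symbol      TEXT NOT NULL,
        trade_date  TEXT NOT NULL,   -- ISO date, one row per company per day
        open_price  REAL NOT NULL,
        high_price  REAL NOT NULL,
        low_price   REAL NOT NULL,
        close_price REAL NOT NULL,
        PRIMARY KEY (symbol, trade_date)
    )
    """
)

# Made-up rows standing in for one day's feed.
day = "2024-01-02"
conn.executemany(
    "INSERT INTO daily_price VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("AAA", day, 12.0, 12.5, 9.5, 9.8),
        ("BBB", day, 8.0, 8.4, 7.9, 8.2),
        ("CCC", day, 15.0, 15.1, 14.0, 14.2),
    ],
)


def stocks_open_above_close_below(conn, trade_date, n, m):
    """Return symbols with open > n AND close < m on the given day."""
    cur = conn.execute(
        "SELECT symbol FROM daily_price "
        "WHERE trade_date = ? AND open_price > ? AND close_price < ? "
        "ORDER BY symbol",
        (trade_date, n, m),
    )
    return [row[0] for row in cur]


matches = stocks_open_above_close_below(conn, day, 10, 10)
print(matches)  # ['AAA']
```

The composite primary key on (symbol, trade_date) also guards against loading the same day's file twice, since a repeated insert would fail rather than duplicate rows.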
Solution 3 If the requirements are not that broad and we just want to do a simple analysis and distribute the data, then XML could be another good option. Our data has a fixed format and fixed size: company_name, open, high, low, closing price. The XML could look like this:
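One possible layout, as a sketch only: the element and attribute names (stocks, company, name, date) and the sample values are assumptions, not taken from the original slide. The Python below builds such a document with the standard library's xml.etree.ElementTree and prints it.

```python
import xml.etree.ElementTree as ET


def build_feed_xml(trade_date, rows):
    """Build a <stocks> tree from (company_name, open, high, low, close) tuples."""
    root = ET.Element("stocks", {"date": trade_date})
    for name, open_, high, low, close in rows:
        company = ET.SubElement(root, "company", {"name": name})
        for tag, value in (("open", open_), ("high", high),
                           ("low", low), ("close", close)):
            ET.SubElement(company, tag).text = str(value)
    return root


# Made-up sample row for illustration.
root = build_feed_xml("2024-01-02", [("ACME", 10.5, 11.0, 10.25, 10.75)])
ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
# <stocks date="2024-01-02">
#   <company name="ACME">
#     <open>10.5</open>
#     <high>11.0</high>
#     <low>10.25</low>
#     <close>10.75</close>
#   </company>
# </stocks>
```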
Benefits: Very easy to distribute. This is one reason that XML is a standard data model for sharing and distributing data. Efficient parsers are available to parse the data and extract only the desired data. We can add new data to the XML file by carefully appending it. We would not have to re-query the database. However, querying the data could be difficult.
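To make the parsing and appending points concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The document layout (a stocks root with company elements carrying a name attribute), the sample values, and the helper names closing_price and append_company are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed document with made-up values.
FEED_XML = """
<stocks date="2024-01-02">
  <company name="ACME"><open>10.5</open><high>11.0</high><low>10.25</low><close>10.75</close></company>
  <company name="GLOBEX"><open>20.0</open><high>21.0</high><low>19.5</low><close>20.5</close></company>
</stocks>
""".strip()


def closing_price(root, name):
    """Extract only one company's closing price, or None if it is absent."""
    node = root.find(f"company[@name='{name}']/close")
    return None if node is None else float(node.text)


def append_company(root, name, open_, high, low, close):
    """Append a new <company> element at the end of the document."""
    company = ET.SubElement(root, "company", {"name": name})
    for tag, value in (("open", open_), ("high", high),
                       ("low", low), ("close", close)):
        ET.SubElement(company, tag).text = str(value)
    return company


root = ET.fromstring(FEED_XML)
acme_close = closing_price(root, "ACME")
append_company(root, "INITECH", 5.0, 5.5, 4.9, 5.25)
company_count = len(root.findall("company"))
```

Anything beyond this kind of lookup, such as the range filter from Solution 2, means walking the whole tree in application code, which is the querying difficulty noted above.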