Updating Computer Science Education Jacques Cohen Brandeis University Waltham, MA USA January 2007.


1 Updating Computer Science Education Jacques Cohen Brandeis University Waltham, MA USA January 2007

2 Topics
Preliminary remarks
Present state of affairs and concerns
Objectives of this talk
Trends (hardware, software, networks, others)
Illustrative examples
Suggestions

3 Present state of affairs and concerns
Huge increase in PC and internet usage.
Decreasing enrollment (USA mainly).

4

5 Possible Reasons
Previous high school preparation
Bubble burst (2000) + outsourcing
Widespread usage of computers by lay persons
Interest in interdisciplinary topics (e.g., biology, business, economics)
Public perception about: What is Computer Science?

6 The Nature of Computer Science
Two main components:
Theoretical and Experimental
Mathematics and Engineering
What characterizes CS is the notion of Algorithms
Emphasis on the discrete and logic
An interdisciplinary approach with other sciences may well revive interest in the continuous (or the use of qualitative reasoning)

7 Related fields
Sciences in general (scientific computing),
Management,
Psychology (human interaction),
Business,
Communications,
Journalism,
Arts, etc.

8 The role of Computer Science among other sciences (How we are perceived by the other sciences)
In physics, chemistry, biology, nature is the ultimate umpire.
Discovery is paramount
In math and engineering: aesthetics, ease of use, acceptance, and permanence play key roles

9 Uneasy dialogue with biologists
It is not unusual to hear from a physicist, chemist or biologist:
“If computer scientists do not get involved in our field, we will do it ourselves!!”
It looks very likely that the biological sciences (including, of course, neuroscience) will dominate the 21st century

10 Differences in approaches
Most scientific and creative discoveries proceed in a bottom-up manner
Computer scientists are taught to emphasize top-down approaches
Polya’s “How to solve it” often mentions: First specialize, then generalize.
Hacking is beautiful (mostly bottom-up)

11 Objectives
Provide a bird’s eye view of what is happening in CS education (USA) and attempt to make recommendations about possible directions. Hopefully, some of it would be applicable to European universities.
Premise
Changes ought to be gradual and depend on resources and time constraints

12 First we have to observe current trends
Generality, Storage, Speed, Networks, others.
Trying to make sense of present directions.
Difficult and risky to foresee the future, e.g., PC (windows, mouse), internet, parallelism
Topics influencing computer science education.
Trends in hardware, software, networks.

13 Huge volume of data (terabytes and petabytes)
Statistical nature of data
Clustering, classification
Probability and Statistics become increasingly important

14 Trend towards generality
Need to know more about what is going on in related topics
A few examples:
Robotics and mechanical engineering
Hardware, electrical engineering, material science, nanotechnology
Multi-field visualization (e.g., medicine)
Biophysics and bioinformatics

15 Nature of data structures
Sequences (strings), streams
Trees, DAGs, and Graphs
3D structures
Emphasis on discrete structures
Neglect of the continuous should be corrected (e.g., use of MATLAB)

16 Trends in data growth
How Much Information Is There In the World?
The 20-terabyte size of the Library of Congress is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.
From Lesk

17 Library of Congress data (cont)
1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
2. The 4 million maps in the Geography Division might scan to 200 TB.
3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.
This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).
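The estimate above is simple arithmetic, and a short sketch makes it easy to check. The figures are the ones quoted on the two slides; the dictionary keys and constant names are illustrative only, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
# Rough storage estimate for the Library of Congress, in terabytes (TB).
# All figures come from the slides; names and decimal units are assumptions.
MB_PER_TB = 1_000_000
GB_PER_TB = 1_000

estimates_tb = {
    "books": 20_000_000 * 1 / MB_PER_TB,        # 20 million books at 1 MB each
    "photographs": 13_000_000 * 1 / MB_PER_TB,  # 13 million JPGs at 1 MB each
    "maps": 200.0,                               # scanned maps, as quoted
    "movies": 500_000 * 1 / GB_PER_TB,           # 500,000 movies at 1 GB each
    "sound_recordings": 2_000.0,                 # "almost 2,000 TB", as quoted
}

total_tb = sum(estimates_tb.values())
print(f"total = {total_tb:,.0f} TB (about {total_tb / 1000:.1f} PB)")
```

The sum comes to roughly 2,700 TB, in line with the "perhaps about 3 petabytes" quoted on the slide.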

18 How Much Information Is There In the World?

19 Lesk’s Conclusions
There will be enough disk space and tape storage in the world to store everything people write, say, perform or photograph. For writing this is true already; for the others it is only a year or two away.

20 Lesk’s Conclusions (cont)
The challenge for librarians and computer scientists is to let us find the information we want in other people's work; and the challenge for the lawyers and economists is to arrange the payment structures so that we are encouraged to use the work of others rather than re-create it.

21 The huge volume of data implies:
Linearity of algorithms is a must
Emphasis on pattern matching
Increased preprocessing
Different levels of memory transfer rates
Algorithmic incrementality (avoid redoing tasks)
Need for approximate algorithms (optimization)
Distributed computing
Centralized parallelism (Blue Gene, Argonne)

22 The importance of pattern matching (searches) in large number of items
Pattern matching has to be “tolerant” (approximate)
Find closest matches (dynamic programming, optimization)
Sequences
Pictures
3D structures (e.g. proteins)
Sound
Photos
Video
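For sequences, "tolerant" matching can be sketched with a standard dynamic-programming recurrence: find the substring of a text with the smallest edit distance to a pattern. This is a minimal illustration, not something taken from the slides; the function name and the unit costs for substitution, insertion and deletion are assumptions.

```python
def best_approximate_match(pattern: str, text: str) -> tuple[int, int]:
    """Return (distance, end) for the substring of text closest to pattern.

    distance is the smallest edit distance found; end is the index in text
    just past the last character of the first substring achieving it.
    """
    m = len(pattern)
    # Column for the empty text prefix: matching i pattern chars costs i deletions.
    prev = list(range(m + 1))
    best = (prev[m], 0)
    for j, ch in enumerate(text, start=1):
        # cur[0] stays 0, so a match may start at any position in the text.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character left out
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


print(best_approximate_match("GATTACA", "CCGATTTACAGG"))
```

The running time is proportional to the product of the two lengths, which is why the later slides stress preprocessing and approximation for genome-scale searches.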

23 Trends in computer cycles (speed)
Moore’s law appears to be applicable until at least 2020

24 Use of supercomputers (2006)
Researchers at Los Alamos National Laboratory have set a new world's record by performing the first million-atom computer simulation in biology. Using the "Q Machine" supercomputer, Los Alamos computer scientists have created a molecular simulation of the cell's protein-making structure, the ribosome. The project, simulating 2.64 million atoms in motion, is more than six times larger than any biological simulations performed to date.

25 Graphical visualization of the simulation of a Ribosome at work

26 Network transmission speed (Lambda Rail Net)
USA backbone

27 Trends in Transmission Speed
The High Energy Physics team's demonstration achieved a peak throughput of 151 Gbps and an official mark of Gbps beating their previous mark for peak throughput of 101 Gbps by 50 percent.

28 Trends in Transmission Speed II
The new record data transfer speed is also equivalent to serving 10,000 MPEG2 HDTV movies simultaneously in real time, or transmitting all of the printed content of the Library of Congress in 10 minutes.
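Claims like these are easy to sanity-check with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and uses the 151 Gbps peak figure from the previous slide; the function names are illustrative, and the answer depends strongly on which size estimate is used for the printed collection.

```python
def terabytes_moved(minutes: float, rate_gbps: float) -> float:
    """Data volume (TB) transferred in the given time at a sustained rate."""
    bits = rate_gbps * 1e9 * minutes * 60
    return bits / 8 / 1e12


def transfer_minutes(size_terabytes: float, rate_gbps: float) -> float:
    """Time (minutes) needed to move size_terabytes at a sustained rate."""
    bits = size_terabytes * 1e12 * 8
    return bits / (rate_gbps * 1e9) / 60


print(f"{terabytes_moved(10, 151):.1f} TB in 10 minutes at 151 Gbps")
print(f"{transfer_minutes(20, 151):.1f} minutes for 20 TB at 151 Gbps")
```

At the peak rate, ten minutes moves on the order of 11 TB, the same order of magnitude as Lesk's 20-terabyte estimate for the printed text.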

29 Trend in Languages
Importance of scripting and string processing
XML, Java, C++; trend towards Python, Matlab, Mathematica
No ideal languages
No agreement on what the first language ought to be

30 A recently proposed language (Fortress 2006)
From Guy Steele, The Fortress Programming Language, Sun Microsystems
http://iic.harvard.edu/documents/steeleLecture2006public.pdf

31 Fortress Language (Sun, Guy Steele)

32 Meta-level approach to teaching
Learn 2 or 3 languages and assume that expertise in other languages can be acquired on the fly.
Hopefully, the same will occur in learning a topic in depth. Once in-depth research is taught using a particular area it can be extrapolated to other areas.
Increasing usage of canned programs or data banks. Typical examples: GraphViz, WordNet

33 Trends in Algorithmic Complexity
Overcoming the scare of NP problems (it happened before with undecidability)
3-SAT lessons
Mapping polynomial problems within NP
Optimization, approximate or random algorithms

34 Three Examples
Example I: The lessons of BLAST (preprocessing, incrementality, approximation)
Example II: The importance of analyzing very large networks (probability, sensors, sociological implications)
Example III: Time Series (data mining, pattern searches, classification)

35 Example I (History of BLAST): sequence alignment
Biologists matched sequences of nucleotides or amino acids empirically using Dot Matrices
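A dot matrix places one sequence along the rows and the other along the columns, and marks every cell where the two characters agree; similar regions then show up as diagonal runs. A minimal sketch follows, with the function name and the `*`/`.` rendering chosen here for illustration.

```python
def dot_matrix(a: str, b: str) -> list[str]:
    """One row per character of a, one column per character of b.

    A '*' marks positions where the two characters are equal, '.' otherwise.
    """
    return ["".join("*" if x == y else "." for y in b) for x in a]


# Two short, made-up nucleotide sequences differing in one position.
for row in dot_matrix("GATTACA", "GATCACA"):
    print(row)
```

Building the full matrix costs time and memory proportional to the product of the two lengths, which is what makes this approach impractical at genome scale.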

36 Dot matrices

37 No exact matching

38 Alignment with Gaps

39 Dynamic Programming Approach

40 Dynamic Programming complexity O(n^2)
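The quadratic cost comes from filling one table cell per pair of positions. The following is a minimal sketch of global alignment scoring in the Needleman-Wunsch style; the scoring values (+1 match, -1 mismatch, -1 per gap position) and the function name are assumptions made for illustration, not values from the talk.

```python
def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b, computed row by row.

    Time is O(len(a) * len(b)); only two rows are kept in memory.
    """
    # Row 0: aligning the empty prefix of a with b[:j] needs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag,               # align a[i-1] with b[j-1]
                         prev[j] + gap,      # gap in b
                         cur[j - 1] + gap)   # gap in a
        prev = cur
    return prev[len(b)]


print(alignment_score("GATTACA", "GATACA"))
```

Keeping only two rows gives the score but not the alignment itself; recovering one alignment requires the full table or a traceback structure.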

41 Two solutions with gaps Complexity can be exponential for determining all solutions

42 The BLAST approach: complexity is almost linear
Equivalent Dot Matrices would have the size
3 billion columns (human genome)
and
Z rows, where Z is the size of the sequence being matched against a genome (possibly tens of thousands)

43 BLAST Tricks
Preprocessing
Compile the locations in a genome containing all possible “seeds” (combinations of 6 nucleotides or amino acids)
Hacking
Follow diagonals as much as possible (BLAST strategy)
Use dynamic programming as a last resort
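The two tricks can be illustrated with a toy sketch: index every length-6 substring of the genome once, then look up each seed of the query and grow the hit along its diagonal. This is a simplification under stated assumptions, not the actual BLAST implementation: extension here stops at the first mismatch, whereas real BLAST scores extensions and tolerates mismatches. All function names are hypothetical.

```python
from collections import defaultdict


def build_seed_index(genome: str, k: int = 6) -> dict[str, list[int]]:
    """Preprocessing: map every k-letter seed to its positions in the genome."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def extend_on_diagonal(genome: str, query: str, g: int, q: int,
                       k: int) -> tuple[int, int, int]:
    """Grow an exact seed hit left and right while characters keep matching."""
    left = 0
    while (g - left - 1 >= 0 and q - left - 1 >= 0
           and genome[g - left - 1] == query[q - left - 1]):
        left += 1
    right = k
    while (g + right < len(genome) and q + right < len(query)
           and genome[g + right] == query[q + right]):
        right += 1
    return g - left, q - left, left + right  # genome start, query start, length


def seed_hits(genome: str, query: str, k: int = 6) -> list[tuple[int, int, int]]:
    """All distinct extended hits, longest first."""
    index = build_seed_index(genome, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], []):
            hits.add(extend_on_diagonal(genome, query, g, q, k))
    return sorted(hits, key=lambda h: -h[2])


print(seed_hits("TTTTGATTACAGGCCC", "AAGATTACAGGTT"))
```

The index is built once per genome and reused for every query, so the per-query work depends mainly on the query length and the number of seed hits rather than on the genome size.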

44 Lots of approximations but a very successful outcome
No multiple solutions
BLAST may not find best matches
The notion of p-values becomes very important (probability of matches in random sequences)
Tuning of the BLAST algorithm parameters
Mixture of hacking and theory
Advantage: satisfies incrementality

45 Example II (Networks and Sociology)

46 Money travels (bills)

47 Probabilities P(time,distance)

48 Money travels
The entire process could be implemented using sensors.
Mimics spread of disease.
The impact of computing will go deeper into the sciences and spread more into the social sciences (Jon Kleinberg, 2006)

49 Example III (Time Series)
Illustrates data mining and how much CS can help other sciences
Slides from Dr. Eamonn Keogh, University of California, Riverside, CA

50 Examples of time series

51 Time Series (cont 1)

52 Time Series (cont 2)

53 Time Series (cont 3)

54 Time Series (cont 4)

55 Time Series (cont 5)

56 Using Logic Programming in Multivariate Time Series (Sleep Apnea) from G Guimarães and L. Moniz Pereira

57 Back to curricula recommendations
Present status (USA) and suggested changes

58 Current recommended curricula ACM, SIGCSE 2001 (USA)
1. Discrete Structures (43 core hours)
2. Programming Fundamentals (54 core hours)
3. Algorithms and Complexity (31 core hours)
4. Programming Languages (6 core hours)
5. Architecture and Organization (36 core hours)
6. Operating Systems (18 core hours)
7. Net-Centric Computing (15 core hours)
8. Human-Computer Interaction (6 core hours)
9. Graphics and Visual Computing (5 core hours)
10. Intelligent Systems (10 core hours)
11. Information Management (10 core hours)
12. Software Engineering (30 core hours)
13. Social and Professional Issues (16 core hours)
14. Computational Science (no core hours)
From Domik G.: Glimpses into the Future of Computer Science Education, University of Paderborn, Germany

59 Changing Curricula
Two extremes
Increased Generality and Limited Depth
Limited Generality and Increased Depth

60 The two extremes in graphical form (figure; axes: Breadth (generality) and Depth)

61 The MIT pilot program for freshmen
At MIT there is a unified EECS department
Two choices for the first year course:
Robotics using probabilistic Bayesian approaches (CS)
Study of cell phones inside out (EE)

62 Concrete suggestions I
Teaching is inextricably linked to research.
Time and resources govern curriculum changes.
Gradual changes are essential.
Avoid overlap of material among different required courses.
If possible introduce an elective course on current trends in computer science.
Deal with massive data even in intro courses.

63 Concrete suggestions II
When teaching algorithms stress the potential of:
Preprocessing
Incrementality
Parallelization
Approximations
Taking advantage of sparseness
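Incrementality can be shown with a very small example: keeping the mean and variance of a growing data set up to date without revisiting earlier values. The sketch uses Welford's online update, which is a choice made here for illustration rather than something the talk prescribes; the class name is hypothetical.

```python
class RunningStats:
    """Mean and sample variance maintained incrementally (Welford's method)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        """Fold one new value in, using constant time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(value)
print(stats.mean, stats.variance)
```

Each new value costs constant work, so the statistics stay current as data streams in, instead of being recomputed from scratch over the whole collection.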

64 Concrete suggestions III
Emphasize probability and statistics
Bayesian approaches
Hidden Markov Models
Random algorithms
Clustering and classification
Machine learning and Data Mining
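As one illustration of the Bayesian item, the sketch below applies Bayes' rule to a discrete set of hypotheses: multiply each prior by the likelihood of the observation and renormalize. The robot-sensor scenario and all of its numbers are invented for the example.

```python
def posterior(prior: dict[str, float],
              likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' rule over discrete hypotheses.

    prior[h] is P(h); likelihood[h] is P(observation | h).
    Returns P(h | observation) for every hypothesis h.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical scenario: a robot is near a door 10% of the time, and its
# sensor reports "door" with probability 0.9 at a door and 0.2 at a wall.
prior = {"door": 0.1, "wall": 0.9}
likelihood_of_door_reading = {"door": 0.9, "wall": 0.2}
belief = posterior(prior, likelihood_of_door_reading)
print(belief)
```

Even with a fairly reliable sensor, the low prior keeps the posterior probability of "door" at one third, the kind of result that motivates teaching probability early.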

65 Finally, …
Encourage interdisciplinary work.
It will inspire new directions in computer science.
Thank you!!

66 Future of Computer Intensive Science in the U.S. (Daniel Reed 2006)
Ten years – a geological epoch on the computing time scale. Looking back, a decade brought the web and consumer , digital cameras and music, broadband networking, multifunction cell phones, WiFi, HDTV, telematics, multiplayer games, electronic commerce and computational science.
It also brought spam, phishing, identity theft, software insecurity, outsourcing and globalization, information warfare and blurred work-life boundaries. What will a decade of technology advances bring in communications and collaboration, sensors and knowledge management, modeling and discovery, electronic commerce and digital entertainment, critical infrastructure management and security?
What will it mean for research and education?
Daniel A. Reed is the director of the Renaissance Computing Institute. He also is Chancellor's Eminent Professor and Vice-Chancellor for Information Technology at the University of North Carolina at Chapel Hill.

67 Cyberinfrastructure and Economic Curvature: Creating Curvature in a Flat World (Sangtae Kim, Purdue, 2006)
Cyberinfrastructure is central to scientific advancement in the modern, data-intensive research environment. For example, the recent revolution in the life sciences, including the seminal achievement of sequencing the human genome on an accelerated time frame, was made possible by parallel advances in cyberinfrastructure for research in this data-intensive field.
But beyond the enablement of basic research, cyberinfrastructure is a driver for global economic growth despite the disruptive 'flattening' effect of IT in the developed economies. But even at the regional level, visionary cyber investments to create smart infrastructures will induce 'economic curvature', a gravitational pull to overcome the dispersive effects of the 'flat' world, and the consequential acceleration in economic growth.

68 Miscellaneous I
Claytronics
Game theory (economics - psychology)
Other examples in bioinformatics
Beautiful interaction between sequence (strings) and structures
Reverse engineering
In biology, Geography and Phenotype (external structural appearance) are of paramount importance
Systems Biology

69 Miscellaneous II
Crossword puzzle using Google
Skiena and statistical NLP

