PHYSICS AND THE CITY. Carvalho and Batty: Scaling in the Geography of US Computer Science 1 Scaling in the Geography of US Computer Science Rui Carvalho.

1 PHYSICS AND THE CITY. Scaling in the Geography of US Computer Science
Rui Carvalho and Michael Batty, University College London
Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)

2 Motivation
Why Geography?
– Scientists: who can I collaborate with in my city/country?
– Funding Agencies: where are new research centres emerging? Is the regional distribution of funds optimal?
– Scientometrics: distinguish between J. Smith (PSU) and J. Smith (UCL).
Preprint server challenges:
– [USA] NIH-funded investigators are required to submit their papers to PubMed within 1 year of publication (effective May 2, 2005);
– [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication.
Data mining challenges:
– Processing large databases promises to uncover knowledge hidden in the mass of available data;
– It can dramatically speed up achievements formerly reached solely by human effort, and provide new results that could not have been reached by humans unaided.
Statistical challenges:
– Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...
– Yet most “real world” phenomena are often far from equilibrium.
PNAS, 6 April 2004

3 Plan
Open Archives Datasets:
– Citeseer (Computer Science);
– arXiv.org (mainly Physics, but also Maths and CS).
Geographical Datasets:
– The US Census Bureau makes datasets for geocoding available on the web, but Europe lacks a unified ‘open-access’ database.
Plan:
– Extract ZIP codes from authors’ addresses;
– Map research centres geographically.
Questions about the research centres:
– How productive are they?
– Are there non-trivial spatial structures at a geographical scale?

4 Plan
Open Archives Datasets:
– Citeseer (Computer Science);
– arXiv.org (mainly Physics, but also Maths and CS).
Geographical Datasets:
– The US Census Bureau makes datasets for geocoding available on the web, but Europe lacks a unified ‘open-access’ database.
Plan:
– Extract ZIP codes from authors’ addresses;
– Map research centres geographically.
Questions about the research centres:
– How productive are they?
– Are there non-trivial spatial structures at a geographical scale?
Can Statistical Physics Help?
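The ZIP-extraction step of the plan could be sketched roughly as below. This is only an illustration, not the authors' parser: the pattern (a two-letter state abbreviation followed by a 5-digit ZIP, optionally ZIP+4), the function name `extract_zip` and the sample addresses are all assumptions made for the example.

```python
import re

# Assumed pattern: two-letter state abbreviation, then a 5-digit ZIP,
# optionally followed by a ZIP+4 suffix. Real address strings are messier.
ZIP_RE = re.compile(r"\b([A-Z]{2})[ ,]+(\d{5})(?:-\d{4})?\b")


def extract_zip(address):
    """Return the last 5-digit ZIP found after a state abbreviation, or None."""
    matches = ZIP_RE.findall(address)
    return matches[-1][1] if matches else None


if __name__ == "__main__":
    # Made-up addresses, for illustration only.
    samples = [
        "Dept. of Computer Science, Example University, Anytown, PA 12345-6789",
        "Example Lab, 1 Sample Road, Sometown, CA 54321",
        "University College London, Gower Street, London WC1E 6BT, UK",
    ]
    for address in samples:
        print(extract_zip(address), "<-", address)
```

Addresses that do not match the pattern (for example non-US ones) return `None`, so they can be filtered out before geocoding.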

5 What is Citeseer?
Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC); now at Penn State;
Archive of computer science research papers harvested from the web and submitted by users;
Currently (Dec 2005) contains over 730,000 documents;
Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;
Can search content in PostScript and PDF files.

6 Data Collecting and Parsing
Citeseer metadata:
– 525,055 computer science research papers;
– 399,757 (76.14%) of which are unique;
– 103,172 (25.81%) of the unique papers have one or more US authors;
– 2,975 different ZIP codes in the unique papers belong to the conterminous US (48 states plus the District of Columbia).
5 most productive ZIP codes:
1. Count: 3950 Zip: Carnegie Mellon Univ, Pittsburgh, PA;
2. Count: 3403 Zip: MIT, Cambridge, MA;
3. Count: 2954 Zip: Stanford Univ, CA;
4. Count: 2691 Zip: Univ California at Berkeley, CA;
5. Count: 2309 Zip: Univ Illinois at Urbana-Champaign, IL
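The percentages above follow directly from the counts, and a per-ZIP ranking is a simple tally. The sketch below checks the arithmetic and shows one possible way to count papers per ZIP code. The function `papers_per_zip`, the rule of counting a ZIP at most once per paper, and the toy ZIP codes are assumptions for illustration; the slides do not say how multi-author papers were counted.

```python
from collections import Counter

# Counts quoted on the slide.
TOTAL_PAPERS = 525_055
UNIQUE_PAPERS = 399_757
US_PAPERS = 103_172


def share_pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


def papers_per_zip(papers):
    """Tally papers per ZIP code.

    `papers` is a list of papers, each given as a list of author ZIP codes.
    Assumption: a ZIP is counted once per paper, however many authors share it.
    """
    counts = Counter()
    for author_zips in papers:
        counts.update(set(author_zips))
    return counts


if __name__ == "__main__":
    print(share_pct(UNIQUE_PAPERS, TOTAL_PAPERS))  # share of unique papers
    print(share_pct(US_PAPERS, UNIQUE_PAPERS))     # share with US authors
    # Hypothetical ZIP codes, not taken from the dataset.
    toy = [["11111", "11111", "22222"], ["11111"], ["33333"]]
    print(papers_per_zip(toy).most_common())
```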

7 Q1: How productive are the research centres?

8 Q2: Non-trivial spatial structures?

9 The Geography of Citeseer

10 Cartograms
Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, (2004)
Density-equalizing map projections: Diffusion-based algorithm and applications, Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)

11 Cartograms
Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, (2004)
Density-equalizing map projections: Diffusion-based algorithm and applications, Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)

12 Cartograms
Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, (2004)

13 Cartograms
Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, (2004)

14 Spatial Point Processes
Moments:
– First moment: ρ, the expected number of points per unit area;
– Second moment: Ripley’s K function. ρK(r) is the expected number of further points within distance r of a point. For a Poisson process in the plane, K(r) = πr²;
But neither the first nor the second moment gives a feel for the way in which the spatial distribution changes within an area.
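To make the definition of K(r) concrete, here is a minimal sketch of a naive estimator that ignores edge effects. It is not the estimator used in the talk: the function name `ripley_k_naive` and the normalisation |A| / (n(n-1)) are choices made for this example, and without edge correction the estimate is biased low for points near the border.

```python
import math
import random


def ripley_k_naive(points, r, area):
    """Naive estimate of Ripley's K(r), with no edge correction.

    K_hat(r) = area / (n * (n - 1)) * #{ordered pairs (i, j), i != j, d_ij <= r}
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    ordered_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                ordered_pairs += 2  # counts both (i, j) and (j, i)
    return area * ordered_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Uniform random points in the unit square (a stand-in for a Poisson pattern).
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(400)]
    r = 0.05
    print("estimate:", ripley_k_naive(pts, r, area=1.0))
    print("pi r^2  :", math.pi * r * r)
```

For a completely random pattern the estimate should sit close to πr² at small r; clustered patterns give larger values.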

15 The Two-Point Correlation Function
The two-point correlation function g(r) describes the probability of finding a point in volume dV(x_1) and another point in dV(x_2) at distance r = |x_1 - x_2|;
For a Poisson process, g(r) = 1;
Edge corrections (Ripley’s weights): take a circle centred on point x passing through another point y. If the circle lies entirely within the domain D, the point is counted once. If a proportion p(x,y) of the circle lies within D, the point is counted as 1/p(x,y) points.
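The edge-correction weight described above can be approximated numerically. The sketch below assumes a rectangular domain D and estimates p(x,y) by sampling points evenly around the circle; the function name `ripley_weight`, the `bounds` tuple and the sampling resolution are assumptions for illustration. The talk's domain is the US border, which would need a point-in-polygon test instead of the rectangle check.

```python
import math


def ripley_weight(x, y, bounds, n_angles=3600):
    """Approximate Ripley's edge-correction weight 1 / p(x, y).

    p(x, y) is the proportion of the circle centred on x and passing through y
    that lies inside the domain. Assumption: the domain is the rectangle
    bounds = (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = bounds
    radius = math.dist(x, y)
    inside = 0
    for k in range(n_angles):
        theta = 2 * math.pi * (k + 0.5) / n_angles
        px = x[0] + radius * math.cos(theta)
        py = x[1] + radius * math.sin(theta)
        if xmin <= px <= xmax and ymin <= py <= ymax:
            inside += 1
    if inside == 0:
        raise ValueError("no sampled point of the circle falls inside the domain")
    return n_angles / inside


if __name__ == "__main__":
    unit_square = (0.0, 0.0, 1.0, 1.0)
    print(ripley_weight((0.5, 0.5), (0.6, 0.5), unit_square))  # circle fully inside
    print(ripley_weight((0.0, 0.5), (0.1, 0.5), unit_square))  # centre on an edge
    print(ripley_weight((0.0, 0.0), (0.1, 0.0), unit_square))  # centre on a corner
```

A pair whose circle is only half inside the domain therefore counts roughly twice, compensating for the neighbours that could not be observed outside the border.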

16 Computation of the Two-Point Correlation Function
Intersection with border gives more than one polygon
Geographical range at which the two-point correlation function can be approximated by a power-law
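One common way to check a power-law approximation over a chosen range is a straight-line fit in log-log space. The sketch below assumes the form g(r) ≈ C r^(-γ) and fits it by ordinary least squares; the function name `fit_power_law` and the synthetic data are illustrative only, and the slides do not state which fitting procedure or exponent the authors obtained.

```python
import math


def fit_power_law(r_values, g_values):
    """Least-squares fit of log g = log C - gamma * log r.

    Returns (gamma, C) for the assumed model g(r) = C * r ** (-gamma).
    All inputs must be strictly positive.
    """
    xs = [math.log(r) for r in r_values]
    ys = [math.log(g) for g in g_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic data generated from g(r) = 3 * r ** -0.5, not real measurements.
    rs = [1.0, 2.0, 4.0, 8.0, 16.0]
    gs = [3.0 * r ** -0.5 for r in rs]
    gamma, c = fit_power_law(rs, gs)
    print("gamma:", gamma, "C:", c)
```

Only the distances inside the range where the log-log plot looks straight should be passed to the fit.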

17 Two-Point Correlation Function

18 Speculation: knowledge diffusion?

19 Speculation: Universality?

20 To find out more
Spatially Embedded Complex Systems Engineering (SECSE); members: UCL, Leeds, Southampton, Sussex

22 Plot of state R&D expenditure (NSF) vs population

23 Poisson Point Process
We say that a spatial process is completely random iff:
– The number of events N(A) in any planar region A with area |A| follows a Poisson distribution with mean λ|A|, where λ is the density of points;
– For any two disjoint regions A and B, the random variables N(A) and N(B) are independent.
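The two conditions above suggest a direct way to simulate a completely random pattern in a rectangle: draw the number of points from a Poisson distribution with mean λ|A|, then place them independently and uniformly. The sketch below does this with the standard library only; the function names and the rectangular window are assumptions for the example.

```python
import random


def sample_poisson(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential
    arrivals that occur before time `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_poisson_process(lam, width, height, rng):
    """Simulate a homogeneous Poisson point process of density `lam`
    on the rectangle [0, width] x [0, height]."""
    n = sample_poisson(lam * width * height, rng)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    points = simulate_poisson_process(lam=200, width=1.0, height=1.0, rng=rng)
    print(len(points), "points; first three:", points[:3])
```

Such simulated patterns are a convenient null model: an observed pattern whose K(r) or g(r) departs clearly from the simulated values is not completely random.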

