introduction Reading –Rubin chapter 2. –Rubin chapter 4 until page 153 –Library of Congress "Copyright Basics", available at http://www.copyright.gov/circs/circ1.html Structure –information science –information policy
Taylor's 1966 definition Information science is the science that investigates the properties and behavior of information, the forces governing the flow of information, and the means of processing information for optimum accessibility and usability. The processes include the origination, dissemination, collection, organization, storage, retrieval and use of information.
Rubin's organization 1. Information needs, information seeking, information use and information users. 2. Information storage and retrieval. 3. Defining the nature of information and its use. 4. Bibliometrics and citation analysis. 5. Management and administrative issues. ?. New areas.
JITA classification This is about the only publicly available library and information science classification scheme: http://eprints.rclis.org/jita.html It was created for the E-LIS system of Library and Information Science (LIS) eprints at http://eprints.rclis.org.
area 1: information needs Much of this literature waffles vaguely about imprecise concepts. Information seeking in context is now popular. Some broad trends: –People prefer personal to institutional sources. –People seldom see librarians as a source. –People make little effort.
Berrypicking (Bates 1989) Users sift through information like pickers of berries. –The query is constantly shifting. –Users may move through a variety of sources. –New information may give people new ideas and direction. –The value of information is all the bits and pieces gathered during the process. This contrasts sharply with information retrieval research.
Kuhlthau (1991) Proposes a 6-stage information seeking process –Initiation –Selection –Exploration –Formulation –Collection –Presentation IMHO perfectly useless.
area 2: information retrieval Information storage is not really much of an issue anymore. When I dealt with it, I took storage to include the organization of the information, which is a bit of a stretch. Ideally, one needs to know the retrieval needs before designing the organization of the information.
information retrieval This covers everything about how the user gets information out of an information system. It differs from data retrieval in that the retrieved data has to be relevant to the user. It is very difficult to say objectively what relevance is.
typical research This usually involves looking at a set of documents that have been classified. Then we can pick the computer algorithms that best separate the documents that satisfy the user need from those that do not. Usually this stuff is heavily mathematical/computational. I have been applying work from that area.
information retrieval performance How was it for you? The traditional measures are –precision = number of relevant documents retrieved divided by the total number of retrieved documents –recall = number of relevant documents retrieved divided by the total number of relevant documents. They only evaluate a search! I have done some work in that area.
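The two measures above can be sketched in a few lines. This is a minimal illustration, not a standard library routine: the function name `precision_recall` and the document identifiers are made up for the example, and relevance is assumed to be a simple yes/no judgment per document.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Guard against empty sets so the sketch never divides by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents come back, 3 documents are relevant overall.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p)  # 0.5, since 2 of the 4 retrieved documents are relevant
print(r)  # 2 of the 3 relevant documents were found
```

Note that both numbers describe one search only. They say nothing about whether the user was satisfied.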
information retrieval models They give a formal account of the retrieval process. There are three basic flavors –Boolean information retrieval –Vector information retrieval –Probabilistic information retrieval All are mathematical models. I would also add web information retrieval as a new type.
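To give a feel for one of these models, here is a rough sketch of vector retrieval. It assumes the simplest possible setup: documents and query are turned into raw term-count vectors and ranked by cosine similarity. Real systems usually add term weighting such as tf-idf. The function names and the three sample documents are invented for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a term-count vector (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, name) pairs, best match first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), name) for name, text in docs.items()]
    return sorted(scored, reverse=True)


# Hypothetical mini-collection.
docs = {
    "doc_a": "citation analysis of library science journals",
    "doc_b": "library catalog search interface",
    "doc_c": "cooking with berries",
}
ranking = rank("library citation analysis", docs)
for score, name in ranking:
    print(name, round(score, 3))
```

Unlike Boolean retrieval, which only says match or no match, the vector model gives every document a score, so the results can be sorted.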
web information retrieval This has become big business because finding a user's need is a way to connect them with advertising. One thing that has made Google such a success is that they discovered a way to make quality web sites appear at the top. Basically, a quality web site is one that has many links to it from other quality sites.
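The circular definition above (quality sites are those linked to by quality sites) can be resolved by iteration. The following is a simplified sketch in the spirit of PageRank, not Google's actual ranking. The damping value 0.85, the iteration count, the function name `link_scores`, and the four-page web are all assumptions made for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small base score regardless of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its score on, split evenly over its links.
                share = damping * score[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * score[p] / n
                for t in pages:
                    new[t] += share
        score = new
    return score


# Hypothetical web: three pages link to "c", and "c" links to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_scores(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Here "c" comes out on top because it has the most incoming links, and "a" beats "b" and "d" because its single incoming link comes from the high-scoring "c".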
information storage It can mean the preparation of information before searching –which fields are searchable? –can there be a variety of means to rank searches? –is there use of a controlled vocabulary? It is difficult to draw general conclusions, other than that advanced search features are not much used.
human-computer interface Tries to understand how users work with computer systems. The idea is to build user-friendly systems. But do not leave that to a computer designer as suggested by Rubin. Note that information systems go way beyond computers. This area is usually connected to psychology.
natural language processing Rubin classifies this as a part of human-computer interface. Natural language processing is still in its infancy. Speech recognition is the best developed part. Others are working on connecting computers to the brain.
artificial intelligence This has been around for a while. The field has developed a number of theoretical tools. Some of them are being used in practice now. Things like RDF, the Resource Description Framework, are based on artificial intelligence theory. It is a tool to aggregate knowledge from web resources. There is still no practical application that demonstrates the use of AI on the web.
Area 3: defining information & its value There is debate on the nature of –data (Thomas: things that can be processed in the information system) –knowledge (Thomas: stuff that is in people's heads) –information (something between data and knowledge). Rubin says it is meaning given to data. Rubin also talks about wisdom as knowledge applied for the benefit of humanity.
scientific view of information Usually information is modeled as something that reduces uncertainty. People have a rough idea about something, say tomorrow's temperature. The information is the fact that this something actually takes a precise value, as when we learn what the temperature is, or more generally that we have less uncertainty. There is an approach to measuring information through the concept of entropy. Thomas used to teach such stuff.
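As a small illustration of the entropy idea, the sketch below uses Shannon's entropy, measured in bits. The slide does not say which entropy measure is meant, so this choice is an assumption, and the temperature probabilities are invented. Information is then the drop in entropy between what we believed before and after.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Outcomes with probability 0 contribute nothing and are skipped.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical beliefs about tomorrow's temperature over four ranges.
before = [0.25, 0.25, 0.25, 0.25]  # no idea which range it will be
after = [0.5, 0.5, 0.0, 0.0]       # a forecast rules out two ranges

print(entropy(before))                   # 2.0 bits of uncertainty
print(entropy(after))                    # 1.0 bit of uncertainty
print(entropy(before) - entropy(after))  # 1.0 bit of information gained
```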
value of information Economists can use a probabilistic model to set out an approach that puts a value on information. But their definition is useless for practical purposes. Much of the work then involves some cost/benefit analysis. In such analysis one can reach almost any result one wants.
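One common textbook version of such a probabilistic model is the expected value of perfect information: compare the best expected payoff when acting on prior beliefs with the expected payoff when the true state is known before acting. Whether this is the definition the slide has in mind is an assumption, and the states, actions and payoffs below are invented.

```python
def value_of_information(prior, payoff):
    """Expected value of perfect information.

    prior maps each state to its probability.
    payoff maps each action to a dict of payoffs per state.
    """
    # Best expected payoff when choosing one action using the prior alone.
    without_info = max(
        sum(prior[s] * payoff[a][s] for s in prior) for a in payoff
    )
    # Expected payoff when the state is learned first and the best action is
    # then picked for that state.
    with_info = sum(
        prior[s] * max(payoff[a][s] for a in payoff) for s in prior
    )
    return with_info - without_info


# Hypothetical decision: carry an umbrella or not.
prior = {"rain": 0.3, "dry": 0.7}
payoff = {
    "umbrella": {"rain": 5, "dry": -1},
    "no_umbrella": {"rain": -10, "dry": 0},
}
print(round(value_of_information(prior, payoff), 2))
```

The result depends entirely on the payoff numbers one plugs in, which is one reason such analysis can reach almost any result.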
elements of value-added in libraries –access to resources –accuracy (for example of bibliographic data) –browsing (like in library stacks) –currency (things are up-to-date) –flexibility (through human interaction) –formatting (laying out the collection, signs) –interfacing (probably close to flexibility) –ordering (buy access to things) –access to means to get to resources
area 4: bibliometrics This is the application of quantitative methods to the study of information resources. It is mainly concerned with the structure of the resources. The typical example is citation analysis. Quantitative studies of use fall more to the first area of interest. An expanding area is the use of network analysis.
bibliometric laws Zipf's law relates to the usage of terms in text. Lotka's law relates to the number of papers written by authors. Bradford's law relates to the distribution of articles in a field across a number of periodicals.
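To make the first of these concrete: Zipf's law says, roughly, that a term's frequency is inversely proportional to its rank, so rank times frequency should stay about constant. The sketch below only builds the rank-frequency table. The function name `rank_frequency` and the toy text are made up, and a real check would need a large corpus.

```python
from collections import Counter


def rank_frequency(text):
    """Return rows of (rank, term, frequency, rank * frequency)."""
    counts = Counter(text.lower().split())
    # most_common() sorts terms from most to least frequent.
    return [
        (rank, term, freq, rank * freq)
        for rank, (term, freq) in enumerate(counts.most_common(), start=1)
    ]


# Toy text, far too short to show the law but enough to show the table.
table = rank_frequency("the the the the of of and")
for row in table:
    print(row)
```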
citation analysis This is the heart of bibliometrics. Two important concepts –bibliographic coupling means two documents share some references –co-citation means two documents are cited by the same documents Citation analysis is also important for scientific activity evaluation.
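Both concepts can be computed from the same data, a mapping from each document to its reference list. The following is a minimal sketch. The function names and the four-paper example are hypothetical, and counting shared items is just one simple way to measure strength.

```python
def coupling_strength(references, a, b):
    """Bibliographic coupling: number of references that a and b share."""
    return len(set(references.get(a, ())) & set(references.get(b, ())))


def cocitation_count(references, a, b):
    """Co-citation: number of documents that cite both a and b."""
    return sum(1 for refs in references.values() if a in refs and b in refs)


# Hypothetical data: each paper maps to the list of works it cites.
references = {
    "p1": ["x", "y", "z"],
    "p2": ["y", "z"],
    "p3": ["x", "p1", "p2"],
    "p4": ["p1", "p2"],
}
print(coupling_strength(references, "p1", "p2"))  # 2, they share y and z
print(cocitation_count(references, "p1", "p2"))   # 2, cited together by p3 and p4
```

Coupling is fixed once both papers are published, while the co-citation count can keep growing as new papers cite the pair.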
area 5: management & admin This is an expanding area in libraries. Rather than collecting physical books, libraries have to negotiate on-line access. The area covers all of information policy. Example problems are –copyright –censorship Measuring performance is part of user studies.
service evaluation This is an important area in libraries. Libraries need to demonstrate value in order to fight for their continued existence. They also need to examine usage of the systems that the vendors propose.
area 6: information architecture The art and science of organizing information and its interfaces so that seekers find what they want quickly. It is mainly used with respect to large web sites. It looks at the contents rather than technical factors or the look-and-feel. A related idea is usability.
area 7: knowledge management This comes from the business environment. It is a management fad that has overstayed its welcome.
information policy This is any –law –regulation –practice that affects the –creation –organization –acquisition –dissemination –evaluation of information.
private value of information Information has value for its creators. Some creators require that you pay them in order to use that information. US law encourages the private creation of information and knowledge. There is a market for information.
limiting access to information Commercial information providers are concerned about unpaid access. Other companies that do not primarily produce information may also be concerned about leaks of data such as –R&D data –financial data –product information
protecting privacy This is a major issue in society in general. Financial, health and other data are protected by law. In libraries, the concern has been the protection of circulation records. The Patriot Act has created fairly loose conditions under which law enforcement agencies can access circulation records.
freedom of information This refers to the idea that government information, other than –military secrets –law enforcement records –private medical and financial information, should be made available to the citizens so they can scrutinize government. This should be an important task for public libraries with respect to local government.
private dissemination of public information There has been a tendency to move the distribution of government documents away from the Government Printing Office to private companies. This has caused some concern, as the companies charge the taxpayers for something that has already been produced at the taxpayers' expense. Such companies can copyright the information that the government could not.
example: legal information In principle, the text of laws and other legal information should be free. Some of the older material is still in print form only and cannot be circulated without some cost. Recent material could all be made available on the web. The judicial system does not handle the upload and organization of its data well.
national security Protecting the cyber infrastructure has been made a priority. But there is not much that the government can do directly to protect private installations. There have been some restrictions on the distribution of formerly public government data that has been considered to provide information on potential terror targets.
the library awareness program The FBI started to monitor the use of libraries by foreign individuals in the 70s. Libraries were believed to be places where foreign agents could get critical intelligence to gain a technological edge. Since the material held in libraries is published and sold commercially, it seems quite silly to monitor its use.
the Patriot Act Some knee-jerk legislation to try to protect the USA –increases power to monitor citizens' behavior –authorizes roving wiretaps –intelligence authorities can require any business record: the person concerned in the record must not be informed; there is a gag order against disseminating information about the request; there is no independent judicial review of the request
control of expressions There has been a long history of censorship in all countries at all times. –IRA/Sinn Fein –sexually explicit material Sometimes the pressure on artistic works is indirect, e.g. through funding channels. Libraries generally fight censorship, but they have to keep their target communities in mind.
http://openlib.org/home/krichel Please shut down the computers now. Thank you for your attention!