Collections Information Database (CID)

Presentation on theme: "Collections Information Database (CID)"— Presentation transcript:

1 Collections Information Database (CID)
Gabriele Popp BFI Head of Information & Stephen McConnachie Collections Information Data Manager

2 BFI collections
National Archive of Film and Television: 60,000 fiction films; 120,000 non-fiction films; 750,000 television titles
Special Collections: 30,000 unpublished scripts; 15,000 film posters; 600 named paper collections; 1 million still images; 3,000 production & costume designs
Library: 53,000 books; 5,000 journal titles; almost 1 million indexed journal articles; over 1 million digitised press cuttings

3 “What am I? Where did I come from? Where am I going? How long have I got?”

4 The past

5 The future El Cid – data in safe hands

6 CID phase one
Combine moving image collection datasets into one single database, including:
TecRec: archive holdings (ca. 1.4m items)
BID: filmographic records (800k works / variants)
SIDX: subject index / thesaurus (50k entries)
DDE: direct data entry project data (50k)
Barcoding: standalone item tracking data (ca. 120k)
And build a workflow management system…

7 BFI collections management
High volume of access provision to film and video collections – direct access to viewing materials or online
Four sites in different geographical locations from Central London to Hertfordshire and Warwickshire
Conservation programmes on a large scale – from nitrate film to obsolete video

8 What we do with our collections
Restoration; Preparation for projection; Film printing; Film processing

9 … and more
Audio encoding; Video copying

10 What we wanted Workflow to deliver
Standard approach regardless of activity or combination of activities
Multiple activities grouped under jobs
Defined activities with the ability to repeat as required
Automated pick and return lists for retrieval from vaults
Assignment of staff resources
Working to deadlines for individual activities and the job overall
Tracking items as they move through workflow
Status for activities, items and jobs: In Progress, Finished, Cancelled or On hold
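The requirements above can be sketched as a small data model. This is a minimal illustration in Python, not CID's actual schema: the class and field names (`Job`, `Activity`, `item_ids`, `pick_list`, `overdue`) are assumptions made for the example; only the four status values come from the slide.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    # The four status values listed on the slide.
    IN_PROGRESS = "In Progress"
    FINISHED = "Finished"
    CANCELLED = "Cancelled"
    ON_HOLD = "On hold"


@dataclass
class Activity:
    name: str                 # e.g. "Pick items", "HD scanning"
    item_ids: list[str]       # hypothetical barcode-style identifiers
    deadline: date            # deadline for this activity alone
    assigned_to: str = ""     # staff resource (assumed free-text here)
    status: Status = Status.IN_PROGRESS


@dataclass
class Job:
    title: str
    deadline: date            # deadline for the job overall
    activities: list[Activity] = field(default_factory=list)
    status: Status = Status.IN_PROGRESS

    def pick_list(self) -> list[str]:
        """Unique item ids across all activities, in first-seen order."""
        seen: dict[str, None] = {}
        for activity in self.activities:
            for item_id in activity.item_ids:
                seen.setdefault(item_id, None)
        return list(seen)

    def overdue(self, today: date) -> list[Activity]:
        """Activities still in progress after their own deadline."""
        return [a for a in self.activities
                if a.status is Status.IN_PROGRESS and a.deadline < today]
```

A job holding a finished "Pick items" activity and an in-progress "HD scanning" activity would report only the scanning activity from `overdue()` once its deadline has passed, while `pick_list()` returns each item once even if several activities share it.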

11 Defined workflow activities
Vaults Operations: Pick items; Return items; Transport out; Transport in; Loan in; Loan out
Dry Lab Operations: Inspection; Technical selection; Preparation for printing; Preparation for scanning; Preparation for projection; Other preparation; Service on return
Digital Operations: SD scanning; HD scanning; 2K/4K scanning; Audio encoding; Video encoding; Transcoding; Ingest data; Data migration
Dubbing: Video copy; Audio copy
Wet Lab Operations: Film cleaning; Film printing; Film processing
Film Image Quality: Analogue image grading; Digital image grading; Digital image restoration/manipulation; Silent inter-title restoration; New title creation
Quality Control: Technical acceptance – Theatre; Video quality control; Audio quality control; Digital quality control

12 Who and How?
Supplier: Adlib
Off-the-shelf:
Archival data standard ISAD(G) for hierarchy
Museum standard Spectrum for collections management
New development of CEN compliant hierarchical data structure and Workflow

13 Timeline
Project initiation (Nov 2008)
Tender process starts (Oct 2009)
Appointment of supplier (March 2010)
Contract signed (May 2010)
Development starts (Aug 2010)
System goes live on 14 September 2011
Remaining collections (library, posters, stills and designs) added by end of 2012

14 Lessons learnt Opportunities
Being the first
In-house vs. external supplier
Legacy data – a problem that lasts forever?
Change management
New department, new roles & responsibilities, e.g. Information Specialists
Turning a project into business as usual

15 A new BFI policy
BFI consults widely on new Collection Policy archive-collections-policy (publ. 16 Nov 2011)
Includes documentation principles: Fitness for purpose, Efficiency and Quality
Commitment to standards: Cataloguing, Format, Information architecture and Vocabularies
Roles and responsibilities: Overall ownership, data quality, training
Fitness for purpose: all data created shall be fit for purpose and adhere to the defined guidelines agreed for each dataset. Data creation must aim to be accurate and reliable.
Efficiency: data shall be created only once within the dataset designated to hold that type of information. To avoid duplication of data and effort, all subsequent use of the same data shall link to the designated core data set.
Quality: no member of staff shall undertake data entry or editing unless they have received training in the relevant rules and appropriate system. Each dataset shall have a designated member of staff responsible for its data quality. Additional quality checks will be undertaken for a proportion of all data. Validated data shall not be altered to suit particular purposes.
Audit trail: all data shall be traceable through an audit trail as to who created or edited it. Data must reference its source, where applicable, and record source differences.
… implemented by the relevant communities of practice, such as archives or libraries.
Format: all datasets shall be compatible with international data exchange standards suitable for the type of material, such as EN and EN (European standards on interoperability of film databases).
Information architecture: all datasets shall have a variety of access points. To avoid search inefficiencies, ambiguity and duplication, all access points shall be standardised and derive from a single authoritative list for each category, such as film titles or people.
Subject and genre headings across all BFI collections shall be streamlined into a single terminology. Maintenance and development of this taxonomy structure shall be undertaken by an information professional; new terms added to the vocabulary shall be vetted by a centralised process.

16 The future in action – BFI vocabularies
Film and Television genre terms
Revision project began October 2010 on a flat list of 365 genres
Final list of 95 preferred and 66 non-preferred terms in a hierarchical structure
Data cleaning and enrichment on-going: so far over 200,000 records cleaned
Old genres: Compilations, Outtakes, Trailers, N-Lit, Film Noir, Adult films…
New genre term: Artists’ Moving Image
SN Works which use experimentation to subvert mainstream cinema and television practices
UF Abstract film; Avant-garde; Experimental; Structural films; Underground films
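Mapping non-preferred (UF) terms onto a preferred term is the core of this kind of genre cleaning. The sketch below shows one way it could work in Python; it is an illustration under stated assumptions, not the BFI's actual cleaning process. The function names `build_lookup` and `clean_genres` are hypothetical, and the only vocabulary data used is the "Artists' Moving Image" entry from the slide.

```python
# Preferred term -> non-preferred (UF) terms, taken from the slide example.
USE_FOR = {
    "Artists' Moving Image": [
        "Abstract film",
        "Avant-garde",
        "Experimental",
        "Structural films",
        "Underground films",
    ],
}


def build_lookup(use_for):
    """Map every preferred and non-preferred label to its preferred term."""
    lookup = {}
    for preferred, variants in use_for.items():
        lookup[preferred.casefold()] = preferred
        for variant in variants:
            lookup[variant.casefold()] = preferred
    return lookup


def clean_genres(record_genres, lookup):
    """Return (cleaned preferred terms, terms not found in the vocabulary).

    Matching ignores case and surrounding whitespace (an assumption);
    unknown terms are reported rather than silently dropped.
    """
    cleaned, unknown = [], []
    for term in record_genres:
        preferred = lookup.get(term.strip().casefold())
        if preferred is None:
            unknown.append(term)
        elif preferred not in cleaned:
            cleaned.append(preferred)
    return cleaned, unknown
```

A record tagged with both "Avant-garde" and "Experimental" would collapse to the single preferred term, while a term absent from the lookup is returned separately so that it can be reviewed by hand.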

17 Vocabularies – continued
Subject indexing terms and headings
61,000 terms/headings in one database
Consolidation of vocabularies now technically feasible
Work begun on methodology and strategy
UDC-based thesaurus (47,000 terms) for film and television programmes, e.g. Pubs: UF Public house; BT Public refreshment buildings; NT Fox Inn; RT Bars
In-house thesaurus (1,000 terms) for film stills, e.g. Accessory: BT Clothing; NT Hat
UDC classification scheme (4,000 terms) for books, e.g. Lantern slides
Subject heading list (9,000 headings) for journal articles, e.g. Cinemas. Latin America
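The thesaurus relationships shown above (UF, BT, NT, RT) can be modelled as a simple record per term. The following Python sketch is illustrative only: the `Term` class and the `expand` function are assumptions, not CID structures, and the two entries are the examples from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF


THESAURUS = {
    "Pubs": Term(
        "Pubs",
        broader=["Public refreshment buildings"],
        narrower=["Fox Inn"],
        related=["Bars"],
        used_for=["Public house"],
    ),
    "Accessory": Term("Accessory", broader=["Clothing"], narrower=["Hat"]),
}


def expand(label, thesaurus):
    """Return the label plus all of its narrower terms, depth-first.

    Labels without their own entry are treated as leaves.
    """
    result, stack = [], [label]
    while stack:
        current = stack.pop()
        if current in result:
            continue  # guard against cycles in the vocabulary
        result.append(current)
        term = thesaurus.get(current)
        if term is not None:
            stack.extend(reversed(term.narrower))
    return result
```

Expanding a search term to include its narrower terms is one common use of such a structure: a search for "Accessory" could then also match records indexed with "Hat".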

18 Meanwhile, in Europe …
A new European metadata standard on cataloguing and indexing of cinematographic works emerges…
EN 15907:2010 Film identification – Enhancing interoperability of metadata – Element sets and structures
Based on FRBR
More info at …
BFI becomes first implementer of new standard

19 Objectives for the collection
Manage: enable search using descriptive and filmographic data (names/country/subject/genre/synopsis) and technical data
Link: enable linking of film / TV collection to other BFI collections: stills, posters, designs, scripts, pressbooks, BFI Library books and periodicals
Digitise: enable linking of collection records to digitised video / images / documents (eg low-res proxy files, animation cels, correspondence)
Deliver: XML to enable delivery of records and linked digital assets through API, for web platforms (eg BFI website, wireless barcode scanners)
NOTE: 1. This is the background to our implementation of the CEN standard; it helps to explain why we needed it.
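The "Deliver" objective amounts to serialising a record and its linked assets as XML. The sketch below uses Python's standard `xml.etree.ElementTree` module. The element and attribute names (`work`, `title`, `country`, `genre`, `asset`, `href`) are invented for illustration and are not the CID API's or EN 15907's actual schema.

```python
import xml.etree.ElementTree as ET


def work_to_xml(work: dict) -> str:
    """Serialise a Work record (a plain dict) to an XML string.

    The layout is hypothetical: one <work> element with child elements
    for descriptive data and one <asset> element per linked digital file.
    """
    root = ET.Element("work", id=work["id"])
    ET.SubElement(root, "title").text = work["title"]
    ET.SubElement(root, "country").text = work["country"]
    for genre in work.get("genres", []):
        ET.SubElement(root, "genre").text = genre
    for asset in work.get("assets", []):
        ET.SubElement(root, "asset", type=asset["type"], href=asset["href"])
    return ET.tostring(root, encoding="unicode")
```

A web platform consuming such output would parse the XML and follow each `asset` link to the digitised file, such as a low-res proxy video.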

20 The data model: EN 15907*
NOTE: 1. This is the theoretical structure, with the allowed relationships between the entities. 2. The Work record is the only entity which has Content characteristics, describing the moving image content itself (shotlist, synopsis, genre, subject, etc). 3. Variant is an optional level, and we have not currently implemented it in our system. It requires intellectual analysis, work by work, and the volume of our data makes that difficult (800,000 work records). Eventually, we’ll consider the Variant.
* Film identification - Enhancing interoperability of metadata - Element sets and structures
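As a rough illustration of the hierarchy as implemented without the Variant level, the three entities can be sketched as nested Python dataclasses. The field names are assumptions chosen for the example and do not reproduce the EN 15907 element sets or the CID schema.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    description: str          # e.g. "35mm print" (technical data lives here)


@dataclass
class Manifestation:
    manifestation_id: str
    release_type: str         # e.g. "theatrical release"
    items: list[Item] = field(default_factory=list)


@dataclass
class Work:
    work_id: str
    title: str
    # Content characteristics are held on the Work only.
    synopsis: str = ""
    genres: list[str] = field(default_factory=list)
    manifestations: list[Manifestation] = field(default_factory=list)

    def all_items(self) -> list[Item]:
        """Every Item reachable from this Work through its Manifestations."""
        return [item for m in self.manifestations for item in m.items]
```

Because content description sits only on the Work, a search on genre or synopsis finds the Work first, and `all_items()` then walks down to the physical or digital holdings.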

21 Mapping BFI data to CEN NOTE:
Mapping the various complex BFI datasets to this neat model took a lot of thinking and a lot of work: a team of 8 full-time Documentation Assistants worked for one year to achieve the links between filmographic and technical records. Not easy and not cheap. Not all mapping can be achieved with bulk processes; some requires intellectual analysis, record by record. We started data mapping / linking in Spring 2010, and the system launched in Autumn 2011.

22 Work
Top level of the ladder: describes the general / abstract qualities of a moving image creation
Provides a single access point to all Manifestations / Items
Enables collection search using descriptive filmographic as well as technical data (eg country of production / director)
Can be standalone, collection, series or component part of larger work
NOTE: Serial Work example: Topical Budget newsreel is a Series Work in film. The Series Work applies most obviously in television, The Wire and Coronation Street being two prominent examples.

23 Work
Can define any moving image creation:
film
television
web
medical
surveillance
training video
recorded sporting events
etc
NOTE: 1. CEN CWS was designed around the model of feature films, but it works for all moving image creations – I hope we’ve proved that in our implementation at the BFI. 2. In any case they are in our system, with this structure.

24 Work
Includes content descriptions:
Descriptive text: synopsis; shotlist
Descriptive terms: subject indexing; genre

25 NOTE: A Work record in CID. Note the ‘obsolete’ status of the Genre term ‘Historical Drama’ – our genre terms are being cleaned, among many other data cleaning tasks.

26 Manifestation
An embodiment of a Work – includes information about specific release / distribution context, or the place in the life-cycle of the Work:
theatrical release
non-theatrical (community, training, prison, etc)
home viewing (blu-ray, laserdisc, etc)
pre-release (assembly edit, censorship submission, etc)
web exhibition / distribution
television transmission
etc.
NOTE: 1. The crucial thing about the Manifestation is that it contextualises the Items which belong to it: it clarifies their position in the life-cycle of the moving image work, and it therefore enables a very clear organisation of the object records in relation to the work. 2. It is an organising and contextualising principle.

27 Manifestation
Includes information about:
format (eg DCP, 35mm Film, Umatic)
language / usage (eg French dialogue, English subtitles)
sound / silent
colour / black and white
running time / duration / footage
distributing / exhibiting person or institution
date (of release, submission, publication, or other relevant contextualising date)
place (of release, submission, publication, etc)
NOTE: The format, language, sound and other information in the Manifestation can be directly replicated in the Items which belong to it, or the details can be modified in the specific Items – Items can be imperfect, incomplete, anomalous (almost by definition). So, the Manifestation represents an ‘ideal’ which helps to explain and contextualise the Item’s origins, and to understand its imperfections.
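The idea that a Manifestation describes an "ideal" which each Item may replicate or deviate from can be sketched as a fallback lookup. This is a hypothetical Python illustration, not how CID stores the data (the system may simply copy values into each Item record); the class, field and method names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Attributes an Item can either inherit from its Manifestation or override.
SHARED_FIELDS = ("format", "language", "sound", "colour")


@dataclass
class Manifestation:
    format: str
    language: str
    sound: str
    colour: str


@dataclass
class Item:
    manifestation: Manifestation
    # None means "same as the Manifestation"; a value records a deviation.
    format: Optional[str] = None
    language: Optional[str] = None
    sound: Optional[str] = None
    colour: Optional[str] = None

    def effective(self, name: str) -> str:
        """The Item's own value if set, otherwise the Manifestation's."""
        own = getattr(self, name)
        return own if own is not None else getattr(self.manifestation, name)

    def deviations(self) -> dict:
        """Fields where the Item differs: name -> (ideal, actual)."""
        result = {}
        for name in SHARED_FIELDS:
            own = getattr(self, name)
            ideal = getattr(self.manifestation, name)
            if own is not None and own != ideal:
                result[name] = (ideal, own)
        return result
```

A mute print of a sound film would be an Item with `sound="mute"` attached to a Manifestation whose `sound` is "sound"; `deviations()` then reports exactly that difference while every other attribute falls back to the Manifestation.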

28 NOTE: A Manifestation record in CID.

29 Item
The object-level record for an archive holding, a single example of a Manifestation.
Includes analogue objects (film, video cassettes and discs, etc) and digital files
Encompasses fragments or otherwise incomplete or defective objects (eg single reel of multi-reeler, mute print of sound film)
NOTE: A lot of film archive cataloguing is rich at the Item level (the BFI data is), but there have been two difficulties: 1. Contextualising the object record in useful, clear ways: the Manifestation enables that. 2. Searching the collection using descriptive data rather than object-describing technical data. A good example is country search: previously it was difficult to produce a list of archive holdings for works produced in Denmark, for example. The Work enables that.
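The country search mentioned in the note can be expressed as a walk from Work to Manifestation to Item. The Python sketch below uses flat lists of dicts linked by identifiers; the key names (`countries`, `work_id`, `manifestation_id`) and the function name are hypothetical and do not reflect the CID database layout.

```python
def holdings_by_country(works, manifestations, items, country):
    """Return ids of Items whose parent Work was produced in `country`.

    Each record is a plain dict. Items link to Manifestations, and
    Manifestations link to Works, mirroring the three-level hierarchy.
    """
    work_ids = {w["id"] for w in works if country in w["countries"]}
    manifestation_ids = {
        m["id"] for m in manifestations if m["work_id"] in work_ids
    }
    return [i["id"] for i in items
            if i["manifestation_id"] in manifestation_ids]
```

The production country is recorded once, on the Work, and every holding beneath it becomes findable through the links rather than through data repeated on each Item.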

30 Item
Includes all object-level technical data:
format / description (eg Camera Negative, Separation, Internegative, Digital Betacam, Umatic)
gauge / width (eg 35mm, 8mm, 1inch)
sound detail (eg combined as sound, mute)
colour detail (eg 2 Strip Technicolor, Ektachrome)
running time / duration / footage
condition / preservation requirements and priorities
acquisition history / status / access conditions
completeness / number of carriers / components
etc
NOTE: Some of this is obviously Collections Management information (eg access conditions, acquisition information) rather than object description.

31 NOTE: An Item record in CID.

32 Web browser view (online by early 2013)
NOTE: The Work record on the left displays details of titles, production country, synopsis, subjects, etc. The hierarchy view on the right shows how the Work, Manifestations and Items sit together. Users can navigate down from the Work to find the relevant Manifestation, then click through into the detailed Item record.
