A bit of current-awareness monitoring in information and documentation science
| by Fabrizio Tinti |









Wednesday, 15 July 2009

D-Lib Magazine (August 09)

Among the contents of the latest issue of D-Lib Magazine (vol. 15, no. 7-8, July-August 09):

Articles:

"This article will discuss how to measure the accuracy of Optical Character Recognition (OCR) output in a way that is relevant to the needs of the end users of digital resources. A case study measuring the OCR accuracy of the British Library's 19th Century Newspapers Database provides a clear example of the benefits to be gained from measuring not just character accuracy but also word and significant word accuracy. As OCR primarily facilitates searching, indexing and other means of structuring the user experience of online newspaper archives, measuring the word and significant word accuracy of the OCR output is very revealing of a resource's likely performance for these functions. Having such data is therefore extremely helpful for planning and quality assurance assessment. After briefly discussing the role of OCR in the text capture process and how OCR works, we give a detailed description of the methodology, statistical data gathering techniques and analysis used in this study. Our conclusions point the way forward with suggested actions to assist other mass digitization projects in applying these techniques."
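To make the three measures named in this abstract concrete, here is a minimal Python sketch. It is not the methodology of the British Library study: it assumes accuracy is defined as one minus the edit distance divided by the length of the reference text, and it approximates "significant words" as words outside a small, hypothetical stopword list.

```python
import re

# Hypothetical stopword list; the study's own definition of
# "significant words" is not given in the abstract.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on"}


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - edit_distance / len(reference), floored at 0."""
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def words(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def significant(tokens):
    """Tokens that are not in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ocr_report(truth, ocr):
    """Character, word and significant-word accuracy of OCR output."""
    t, o = words(truth), words(ocr)
    return {
        "character": accuracy(truth, ocr),
        "word": accuracy(t, o),
        "significant_word": accuracy(significant(t), significant(o)),
    }
```

On a toy pair such as "the cat sat on the mat" against "the cat sat on the rnat", the character score stays high while the word and significant-word scores drop further, which is the gap the abstract points to for search and indexing.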

"This article is motivated by the demand for unified access to the wealth of distributed digital cultural collections, allowing users to make queries and discover information about them through integrated processes. Our effort originates from the semantic interoperability perspective and considers CIDOC/CRM as the mediating schema, which integrates in an optimal way the semantics of the collection-level metadata schemas and application profiles. The research reveals the complexity of mapping metadata schemas to ontologies and resolves particular difficulties by presenting the crosswalk between Dublin Core Collections Application Profile and CIDOC/CRM."

Conference reports:

  • Doing So Much More: The Fourth Annual International Conference on Open Repositories (OR09)

Tuesday, 14 July 2009

OpenPub

Catalogablog reports:

"A specification is being developed for distribution of books via a catalog, the OpenPub standard.

OpenPub is an initiative of Lexcycle, Adobe, the Internet Archive, and O'Reilly Media to create an Open Publication Distribution System (OPDS) enabling the widespread discovery, description, and access of book and other published material on the open web. OPDS utilizes existing or emergent open standards and conventions such as ATOM with a priority on simplicity and extensibility.

Libraries might have an interest in this. Why are none among the developers?"

19:53 Posted in Cataloguing, Books, Metadata, Standards

Sunday, 12 July 2009

OpenMIC

METS Tool

"[...] OpenMIC includes a complete METS metadata implementation with structure map, descriptive metadata, source metadata, technical metadata and rights metadata documents. OpenMIC incorporates MODS, Dublin Core, MIX (NISO technical metadata for images) AES (technical metadata for sound recordings) and PREMIS. [...]"

(source: Catalogablog, 09/07/09)

10:52 Posted in Cataloguing, Metadata

Thursday, 09 July 2009

Book metadata workflow: a white paper

Streamlining Book Metadata Workflow

"The white paper was commissioned by NISO and OCLC as a follow-up to the Symposium for Publishers and Librarians held by OCLC on March 18-19, 2009 to discuss book metadata. This paper analyzes the current state of metadata creation, exchange, and use throughout the book supply chain. With the number of book formats multiplying and the amount of digital content growing rapidly, the metadata required to support the discovery, sale, and use of content by a global audience is increasing exponentially. At the same time economic pressures on all stakeholders in the supply chain from publishers, wholesalers, booksellers, metadata vendors, and librarians present greater challenges to providing quality and comprehensive metadata at every point in the cycle. Through interviews with over 30 industry representatives, Luther has created a book metadata exchange map illustrating the process and has identified opportunities for eliminating redundancies and making the entire process more efficient."

(source: NISO & OCLC, 30/06/09)

07:59 Posted in Books, Metadata, Standards

Monday, 29 June 2009

The Code4Lib Journal (no. 7)

On the menu of issue no. 7 of the Code4Lib Journal:

In 2000 a small public library system in New Zealand developed and released Koha, the world’s first open source library management system. This is the story of how that came to pass and why, and of the lessons learnt in their first foray into developing in open source.

This paper discusses the analysis of Apache web server logs from a faceted catalog interface (OPAC) at North Carolina State University. By grouping individual HTTP requests into user sessions and analyzing in that context, requests can be understood as particular user actions, with more specificity as to purpose and effect of an action. Client IP address and time are used as a sufficient proxy for determining user sessions from logs. Some initial exploratory findings of user behavior in the NCSU OPAC are provided, including that users make use of facets less than of text searching, and that some facet groups are used significantly more than others. Links are provided to the scripts used to make this session-based analysis, which could be modified for use with other facetted OPACs which use an Apache front-end.
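The session-grouping idea in this abstract can be sketched in a few lines of Python. This is not the NCSU scripts the paper links to: it assumes Apache Common Log Format input, uses client IP plus a 30-minute inactivity gap as the session boundary (the threshold is my assumption, the abstract does not give one), and the sample log lines are invented.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Common Log Format: host ident user [timestamp] "METHOD path ..."
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')

# Invented sample lines (documentation-range IP addresses).
SAMPLE = [
    '192.0.2.1 - - [29/Jun/2009:10:00:00 +0000] "GET /search?q=koha HTTP/1.1" 200 512',
    '192.0.2.1 - - [29/Jun/2009:10:05:00 +0000] "GET /search?q=koha&facet=format HTTP/1.1" 200 512',
    '192.0.2.1 - - [29/Jun/2009:11:30:00 +0000] "GET /search?q=mets HTTP/1.1" 200 512',
    '192.0.2.7 - - [29/Jun/2009:10:01:00 +0000] "GET /record/42 HTTP/1.1" 200 512',
]


def parse_line(line):
    """Return (ip, timestamp, path), or None if the line does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ip, stamp, _method, path = m.groups()
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return ip, when, path


def sessionize(lines, gap=timedelta(minutes=30)):
    """Group requests into (ip, [(timestamp, path), ...]) sessions."""
    by_ip = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            ip, when, path = parsed
            by_ip[ip].append((when, path))

    sessions = []
    for ip, hits in by_ip.items():
        hits.sort(key=lambda hit: hit[0])
        current = [hits[0]]
        for hit in hits[1:]:
            # A long silence from the same IP starts a new session.
            if hit[0] - current[-1][0] > gap:
                sessions.append((ip, current))
                current = [hit]
            else:
                current.append(hit)
        sessions.append((ip, current))
    return sessions
```

Calling `sessionize(SAMPLE)` yields three sessions: two for the first address (split by the long pause) and one for the second. Counting requests whose path carries a facet parameter within each session would be the next step toward the facet-versus-text-search comparison the paper reports.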

The UW-Madison Libraries Library Course Page system is used to deliver electronic reserves materials and course-focused library instruction webpages to students. As part of a rewrite of our system we broke the application into three component pieces: a file repository, a course timetable data service, and an interface application for building and viewing individual course pages. The new three-piece system was written with an inward facing service-oriented architecture that allowed us to choose the best technologies to solve each of the tasks the entire system needs to accomplish.

JAbbr is an online tool developed at Cornell University to help users decipher journal title abbreviations. This article discusses why these abbreviations are so problematic, and how traditional tools are often insufficient, and then describes the novel approach used by JAbbr. Given an abbreviation, JAbbr creates a regular expression for fuzzy matching, tests it against a list of serial titles extracted from the library catalog, and returns a list of possible matches to the user. JAbbr is available as a web site and as a web service.
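The abbreviation-to-regex step can be illustrated with a short Python sketch. The pattern below is my own guess at one workable approach, not JAbbr's actual expression: each abbreviated word is treated as a word prefix, and unabbreviated words such as "of the" may sit between matches. The title list is a small illustrative sample.

```python
import re

# Small illustrative list; JAbbr uses titles extracted from the library catalog.
TITLES = [
    "Journal of the American Chemical Society",
    "Journal of Chemical Physics",
    "American Journal of Sociology",
]


def abbreviation_to_regex(abbrev):
    """Build a fuzzy pattern: each token is a prefix of one title word."""
    tokens = re.findall(r"[A-Za-z]+", abbrev)
    if not tokens:
        return None
    # Between two matched words: separators, optionally skipping whole words.
    gap = r"\W+(?:\w+\W+)*?"
    body = gap.join(re.escape(tok) + r"\w*" for tok in tokens)
    return re.compile(r"\b" + body, re.IGNORECASE)


def candidates(abbrev, titles):
    """Return the titles that the abbreviation could stand for."""
    pattern = abbreviation_to_regex(abbrev)
    if pattern is None:
        return []
    return [title for title in titles if pattern.search(title)]
```

With this list, `candidates("J. Am. Chem. Soc.", TITLES)` keeps only the first title, because the tokens must appear in order as word prefixes.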

This article describes the workflow used by the University of Iowa Libraries to populate their institutional repository and their catalog with the data collected by ProQuest UMI Dissertation Publishing during the submission of students’ theses and dissertations. Re-purposing the metadata from ProQuest allowed the University of Iowa Libraries to streamline the process for ingesting theses and dissertations into their institutional repository. The article includes a discussion of the benefits and limitations of the workflow described.

This article presents the application of part-of-speech (POS) based statistical text analysis to the task of bibliographic metadata extraction from electronic dissertations. By using the approach described here it is possible to detect the title of a Ph.D. paper with an accuracy of about 80%. The accuracy measurements are done using a conceptually simple approach and implementation.

Tuesday, 23 June 2009

VMF project (Vocabulary Mapping Framework)

Major content metadata vocabularies to be mapped

"Work is under way to create an extensive and authoritative mapping of vocabularies from major content metadata standards, creating a downloadable tool to support interoperability across communities. The work is an expansion of the existing RDA/ONIX Framework into a comprehensive vocabulary of resource relators and categories, which will be a superset of those used in major standards from the publisher/producer, education and bibliographic/heritage communities (CIDOC CRM; DCMI; DDEX; DOI; FRBR; MARC21; LOM; ONIX; RDA – see reference section below for details). The resulting tool will be known as the Vocabulary Mapping Framework (VMF). The new vocabulary is not intended as a replacement for any existing standards, but as an aid to interoperability, whether automatic or human-mediated. The expanded Framework will include mappings of terms from code lists or allowed value sets in the existing standards to the RDA/ONIX vocabulary, enabling the computation of “best fit” mappings between any pairing of standards. The results of the VMF project will be formally presented at an event at the British Library on the morning of November 9th this year, and made available on the Web. The project, which is largely financed by a grant from the UK Joint Information Systems Committee (JISC), is being carried out by Godfrey Rust and Steffen Lindek of Rightscom and Gordon Dunsire, Depute Director of the Centre for Digital Library Research at Strathclyde University in Glasgow, Scotland, with input from other domain experts. A virtual Advisory Group drawn from interested parties is being convened. The International DOI Foundation, which fully endorses this work, will provide the web hosting facility as part of its commitment to promoting the wider use of interoperable metadata, and will use the mapping vocabulary wherever possible to support the association of metadata with DOI names. [...]"
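The hub idea described in the announcement (map each standard's terms once to a shared vocabulary, then derive pairwise mappings) can be sketched as follows. Everything here is hypothetical: the standards "A" and "B", their term identifiers, the hub terms and the broader-term fallback are invented for illustration and are not taken from VMF or the RDA/ONIX Framework.

```python
# Hypothetical hub vocabulary: each hub term points to its broader term.
HUB_BROADER = {
    "hub:author": "hub:creator",
    "hub:illustrator": "hub:creator",
}

# Hypothetical per-standard mappings into the hub vocabulary.
TO_HUB = {
    "A": {"a:aut": "hub:author", "a:ill": "hub:illustrator"},
    "B": {"b:writer": "hub:author", "b:creator": "hub:creator"},
}


def best_fit(term, source, target):
    """Map a term from one standard to another through the hub.

    Assumed notion of "best fit": use an exact hub match when one exists,
    otherwise climb to broader hub terms until the target has a match.
    """
    hub = TO_HUB[source].get(term)
    while hub is not None:
        matches = sorted(t for t, h in TO_HUB[target].items() if h == hub)
        if matches:
            return matches
        hub = HUB_BROADER.get(hub)
    return []
```

With N standards this needs N mappings into the hub rather than one mapping per pair of standards, which is the practical appeal of the approach. In this toy data, `best_fit("a:ill", "A", "B")` falls back to the broader `b:creator` because standard B has no illustrator term.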

(source: JISC, DOI, etc., 15/06/09 / via Catalogablog)

14:50 Posted in Cataloguing, Metadata

Friday, 19 June 2009

SKOS: W3C recommendation

SKOS is a W3C Proposed Recommendation

"The Semantic Web Deployment Working Group has published the Proposed Recommendation of SKOS Simple Knowledge Organization System Reference. SKOS provides a common data model for sharing and linking knowledge organization systems via the Web. SKOS is a vocabulary for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading schemes, taxonomies, folksonomies, and other similar types of controlled vocabulary. As an application of the Resource Description Framework (RDF), SKOS allows concepts to be composed and published on the World Wide Web, linked with data on the Web and integrated into other concept schemes. Along with this publication of the SKOS Reference Proposed Recommendation the Working Group has published an updated SKOS Primer Working Draft. Comments are welcome through 15 July."
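To give an idea of what a SKOS description looks like as RDF, here is a small Python sketch that writes N-Triples statements for a concept by hand. The SKOS and RDF property URIs are the real ones; the base URI `http://example.org/thesaurus/`, the scheme name and the concept identifiers are hypothetical, and a real project would more likely use an RDF library than string formatting.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_triples(concept_id, label, lang="en", broader=None):
    """Return N-Triples lines describing one skos:Concept."""
    subject = f"<{BASE}{concept_id}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{subject} <{SKOS}inScheme> <{BASE}scheme> .",
        f'{subject} <{SKOS}prefLabel> "{escape_literal(label)}"@{lang} .',
    ]
    if broader:
        # Link to the broader concept in the same (hypothetical) scheme.
        triples.append(f"{subject} <{SKOS}broader> <{BASE}{broader}> .")
    return triples


if __name__ == "__main__":
    for line in concept_triples("cats", "Cats", broader="mammals"):
        print(line)
```

Each concept gets a type, a scheme membership and a preferred label, plus an optional `skos:broader` link, which is the basic thesaurus structure the announcement describes.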

(source: Planet RDF, 16/06/09)

21:04 Posted in Metadata, Semantic web, Web

Friday, 12 June 2009

How to structure data (2)

The Metadata is the Interface: Better Description for Better Discovery of Archives and Special Collections, Synthesized from User Studies

"Structured metadata can be useful internally for collection management and public services, but is not always what users need most to discover primary sources, especially minimally-described collections and “hidden collections.” We understand archival standards for description and cataloging, but our users by and large don’t. Studies show that users often do not want to search for collections by provenance, for example, as important as this principle is for archival collections. One of several core competencies that special collections metadata librarians must have is “a keen understanding of users’ needs and preferences.” This is especially important now that discovery happens in multiple environments. Librarians and archivists need to manage archival collections by provenance, but also must describe what is in the collections for their users."

(source: OCLC / via ResourceShelf, 12/06/09)

21:55 Posted in Metadata

How to structure data

Blurring the distinction between metadata and content files: datasets

"Case study: An archival body has collated text documents, photographs, plans, and more to assist with the preservation of national heritage buildings and monuments. They would now like to have these digitized and stored in a library and generally made publicly accessible.
This gives rise to some interesting conceptual nuances in figuring out the best way to structure the data in a library (repository) record."

(source: Metalogger, 12/06/09)

21:24 Posted in Metadata

CIBER seminar: presentations

Italy - The presentations from the latest CIBER seminar (Comitato interuniversitario basi dati ed editoria in rete, the interuniversity committee for databases and online publishing), held in early June, are available here:

Worth a look in particular:
