Converting investigator-provided terms for describing research data into BIRNLex OWL/RDF concepts
Overview
The guidelines below and the attached spreadsheet template are designed to speed up the incorporation of testbed source database semantic terms (e.g., organisms, brain regions, cell types, imaging techniques, molecules, behavioral assessments) into fully formalized ontology classes in BIRNLex, with as little manual effort as possible. One way to constrain the task is to start from the content actually present in the testbed databases, rather than starting from the general topics covered (i.e., the domains listed above) and then attempting to describe every concept a research group might need to represent. Accordingly, a testbed participant is expected to begin with a dump of the unique terms from a given field in their database (e.g., by running a SELECT UNIQUE... SQL query on a field containing organism tags), which can then be imported into the spreadsheet template attached to this page. After that spreadsheet has been filled out according to the guidelines below, its content is run through an automatic process that generates an OWL/RDF representation along with BIRNLex IDs. The result is added to BIRNLex once the converted concepts pass final vetting by the original submitter. Finally, because the process begins with terms taken directly from a testbed source database, marking up that database with BIRNLex IDs can then be fully automatic: the preferred terms of the new BIRNLex classes map one-to-one onto the terms in the source database. This largely sidesteps the need for investigators to map their database terms into BIRNLex manually using the Concept Mapper, though they may still use that tool to vet the results of the auto-conversion.
Guidelines
- Run the SQL query "SELECT UNIQUE..." (SELECT DISTINCT in most databases other than Oracle) on the columns containing the terms you want incorporated into BIRNLex. Save the results in a text file with each term on a line by itself.
- Open this file in Excel using the import feature.
- Using the attached template spreadsheet (BIRNLex-Term-Convert-Template-2007-11-21.xlt), copy and paste this list of terms from your database into the TERM column. (Note that this is an Excel spreadsheet template file, which should be placed in your My Templates MS Office directory. You then create a copy of this template using the Project Gallery option in the Excel File menu.)
- Refer to the spreadsheet glossary worksheet for instructions on how to fill out the additional columns in the spreadsheet.
- Send the completed spreadsheet to Bill Bug. He will review its contents to determine whether any terms require special handling during the conversion process. If such cases are identified, they will be examined in a joint discussion with the original term providers, to ensure all terms are properly converted and mapped to BIRNLex.
- Bill Bug will autoconvert the spreadsheet content to OWL/RDF and will tell you where you can review the results to confirm that the conversion was successful and error-free. Any discrepancies are corrected in the Excel spreadsheet, and the autoconversion is run again.
- Once this process has completed successfully, the new or updated BIRNLex classes will be added to the BIRNLex OWL file release, and the BIRN-CC Mediation staff under Jeff Grethe will add the associated term-to-BIRNLex map to the Term Index Source. This way, source database domain experts will not need to map these terms to BIRNLex manually using the Concept Mapper.
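The conversion itself is done with Bill Bug's own code, which is not reproduced here. As a rough illustration only, the following Python sketch walks a toy SQLite table through the same stages: a unique-term dump (SELECT DISTINCT), generation of one OWL class per term, and the one-to-one term-to-ID markup of the source table. The namespace URL, the birnlex_NNNN ID pattern, the DEFINITION column name, the use of rdfs:label and skos:definition, the Turtle serialization, and the birnlex_id column are all assumptions made for this sketch, not actual BIRNLex conventions.

```python
import sqlite3

# Hypothetical namespace and ID pattern; real BIRNLex URIs and IDs are
# assigned by the actual conversion process, not by this sketch.
BASE = "http://example.org/birnlex#"
ID_PREFIX = "birnlex_"


def dump_unique_terms(conn, table, column):
    """Step 1: the unique, non-null terms in one column (SELECT DISTINCT)."""
    # Identifiers cannot be bound as SQL parameters, so `table` and `column`
    # must come from trusted input.
    sql = (f"SELECT DISTINCT {column} FROM {table} "
           f"WHERE {column} IS NOT NULL ORDER BY {column}")
    return [row[0] for row in conn.execute(sql)]


def _literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes, and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_owl_turtle(rows, start_id):
    """Emit one owl:Class per spreadsheet row, plus a term -> class ID map.

    Each row is a dict with a "TERM" key and an optional "DEFINITION" key
    (hypothetical column names standing in for the template's columns).
    """
    header = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix bl: <{BASE}> .",
    ]
    blocks, term_map = [], {}
    for offset, row in enumerate(rows):
        class_id = f"{ID_PREFIX}{start_id + offset}"
        term_map[row["TERM"]] = class_id
        # The source term becomes the class label verbatim, so the mapping
        # back to the database is one-to-one.
        props = [f"rdfs:label {_literal(row['TERM'])}"]
        if row.get("DEFINITION"):
            props.append(f"skos:definition {_literal(row['DEFINITION'])}")
        blocks.append(f"bl:{class_id} a owl:Class ;\n    " + " ;\n    ".join(props) + " .")
    return "\n".join(header) + "\n\n" + "\n\n".join(blocks) + "\n", term_map


def mark_up(conn, table, column, term_map):
    """Tag each source row with the ID of the class whose label equals its term."""
    conn.execute(f"ALTER TABLE {table} ADD COLUMN birnlex_id TEXT")
    conn.executemany(
        f"UPDATE {table} SET birnlex_id = ? WHERE {column} = ?",
        [(class_id, term) for term, class_id in term_map.items()],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimens (organism TEXT)")
    conn.executemany("INSERT INTO specimens VALUES (?)", [("rat",), ("mouse",), ("rat",)])
    terms = dump_unique_terms(conn, "specimens", "organism")
    turtle, term_map = to_owl_turtle([{"TERM": t} for t in terms], start_id=1000)
    print(turtle)
    mark_up(conn, "specimens", "organism", term_map)
    print(conn.execute("SELECT organism, birnlex_id FROM specimens").fetchall())
```

Because each class label is the source term verbatim, the markup step reduces to an exact-match UPDATE, which is what allows the final mapping step to run without manual work.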
Testbed labs who've followed this procedure
- UCLA
- contact:
Conversion Example
- The figures below depict how this process was used to convert the International Union of Pharmacology (IUPHAR) classification of voltage-gated ion channels to OWL/RDF.
General Workflow
Term-To-BIRNLex-Convert-Workflow-Nov2007.pdf
Other important information and links
Future
I'd also suggest we consider setting these spreadsheets up as wiki-based objects: contributors would bring up a wiki page with the spreadsheet content embedded so that it can be edited directly in a web browser, instead of passing an Excel spreadsheet around. We started on this a while back, but then backed off, expecting the BioPortal to help with something like this. We should still expect to use the BioPortal for searching, and perhaps the Marginalia for collecting notes and comments on BIRNLex classes will work out, too. However, at the Biocurators meeting I saw demos of two tools from the E. coli genome annotation community, GONUTS and EcoliWiki, that have modified the MediaWiki middleware to support community-edited spreadsheet content, somewhat like the Google Docs spreadsheet feature. The goal would be for every completed edit to such a spreadsheet to automatically trigger the code I have for generating the RDF/OWL file, which the domain experts who submitted the concepts/terms can then vet. After they approve the RDF/OWL representation (perhaps using the BioPortal as an in-house tool for reviewing work-in-progress OWL ontologies), we would automatically merge it into the current BIRNLex development release.
Future Guidelines
With the added convenience provided by the strategy described above, future guidelines would look more like the following:
- Run SQL "SELECT UNIQUE..." on the columns containing the terms you want incorporated into BIRNLex.
- Run a quick lexically-enhanced match against current BIRNLex terms (including synonyms, acronyms, etc., i.e., all lexical variant fields).
- Terms that don't match are used to auto-populate a wiki-based spreadsheet with all the required columns laid out (including things like synonym, definition, abbreviation, source URL, etc.).
- The submitting experts bring up that wiki page and add any additional property columns they need (e.g., for nerve cell types, "location of soma", "location of synaptic terminals", "neurotransmitter", etc.). If they specify the high-level BIRNLex class for such a column (e.g., brain region), they could then pick from existing BIRNLex brain regions to populate each cell.
- Save the results (they would go temporarily to an RDBMS).
- Click a "parse to BIRNLex format" button when they have finished working on this set of submitted terms.
- Go to another URL, where an in-house version of the BioPortal would make it easy for them to browse the BIRNLex RDF/OWL version of the concepts they have just submitted and do final QC/vetting.
- Once vetted, these new BIRNLex classes would be added to a working version of BIRNLex, available to all the BIRN tools that use BIRNLex.
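The lexically-enhanced match in the second step could work along the lines of the Python sketch below, which folds case, accents, punctuation, and whitespace before comparing each database term against every label, synonym, and acronym of the existing classes; terms with no match are the ones that would go to the wiki spreadsheet. The normalization rules, the shape of the `classes` dictionary, and the class IDs in the example are hypothetical; a production matcher would likely need more (e.g., word-order and plural handling).

```python
import re
import unicodedata


def normalize(term):
    """Reduce a term to a lexical key: no accents, case-folded, punctuation-free."""
    text = unicodedata.normalize("NFKD", term)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.casefold()
    text = re.sub(r"[-_/]", " ", text)    # treat hyphens, underscores, slashes as word breaks
    text = re.sub(r"[^\w\s]", "", text)   # drop other punctuation, so "M.R.I." -> "mri"
    return " ".join(text.split())         # collapse runs of whitespace


def build_index(classes):
    """Map each normalized lexical variant to the set of class IDs carrying it.

    `classes` has a hypothetical shape:
    {class_id: {"label": str, "synonyms": [str, ...], "acronyms": [str, ...]}}
    """
    index = {}
    for class_id, fields in classes.items():
        variants = [fields["label"], *fields.get("synonyms", []), *fields.get("acronyms", [])]
        for variant in variants:
            index.setdefault(normalize(variant), set()).add(class_id)
    return index


def match_terms(terms, index):
    """Split database terms into matched (term -> sorted IDs) and unmatched lists."""
    matched, unmatched = {}, []
    for term in terms:
        ids = index.get(normalize(term))
        if ids:
            matched[term] = sorted(ids)
        else:
            unmatched.append(term)
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical classes and IDs, for illustration only.
    classes = {
        "birnlex_A": {"label": "Cerebellar cortex", "synonyms": ["cortex of cerebellum"]},
        "birnlex_B": {"label": "Magnetic resonance imaging", "acronyms": ["MRI"]},
    }
    matched, unmatched = match_terms(
        ["cerebellar  cortex", "M.R.I.", "Purkinje cell"], build_index(classes)
    )
    print(matched)    # {'cerebellar  cortex': ['birnlex_A'], 'M.R.I.': ['birnlex_B']}
    print(unmatched)  # ['Purkinje cell']
```

Keying the index on the normalized form keeps each lookup a single dictionary access, and returning a list of IDs rather than a single ID leaves room for reporting terms that are ambiguous across several classes.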
Updates/Progress
on 07/10/2008 at 13:54