WarrZone 2000

What is Cheminformatics?

SciFinder records the original talk as Warr, W. A. Balancing the needs of the recruiters and the aims of the educators. Book of Abstracts, 218th ACS National Meeting, New Orleans, Aug. 22-26 (1999), COMP-081.


EXTRACT FROM 218TH ACS NATIONAL MEETING AND EXPOSITION, NEW ORLEANS, LOUISIANA, AUGUST 22-26, 1999

A report by Dr. Wendy A. Warr
Wendy Warr & Associates
March 2000

Wendy A Warr, Wendy Warr & Associates
6 Berwick Court, Holmes Chapel, Cheshire, CW4 7HZ, UK.
Email: wendy@warr.com

This paper considered whether cheminformatics is a discipline in its own right and discussed the demand for practitioners of the discipline. A "straw poll" was carried out, largely on the Internet. About 80% of the email responses were from industry (this in itself is noteworthy), so I also had telephone conversations with some well-known academics. In my talk, I thanked the experts (about 30 of them) who participated. About two years ago, messages on listservers on this subject were mainly critical of the use of the term "chemoinformatics". Some people felt that it was a neologism invented by information professionals who felt that "chemical information" was not a sexy enough name to safeguard their jobs. Opinion has now shifted towards acceptance of chemoinformatics as a discipline, although not everyone agrees about the definition, or even about the spelling: 50% of respondents like "chemoinformatics", and 20% "cheminformatics". A smaller proportion of respondents say "chemical informatics". One company even uses the spelling "chemiinformatics". Even though the term "chem(o)informatics" has become accepted, few people like the use of the job title "chemoinformatician". Pfizer and Aurora Biosciences, however, appear to use it publicly.

Greg Paris of Novartis supplied the following definition: "Chem(o)informatics is a generic term that encompasses the design, creation, organisation, storage, management, retrieval, analysis, dissemination, visualisation and use of chemical information, not only in its own right, but as a surrogate or index for other data, information and knowledge." (His full definition was abbreviated for the current presentation.) Frank Brown of Johnson & Johnson says the discipline is "mixing of information resources to transform data into information, and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimisation" (from Brown, F. K. Chemoinformatics: what is it and how does it impact drug discovery? Annu. Rep. Med. Chem. 1998, 33, 375-384). Mike Hann of Glaxo Wellcome has written on the subject in Hann, M.; Green, R. Chemoinformatics - a new name for an old problem? Current Opinion in Chemical Biology 1999, 3, 379-383.

A number of people have supplied me with job descriptions; some companies occasionally advertise jobs on my Web site. I read out two very different job descriptions, one for a "cheminformatician", one for a "research database administrator - chemiiinformatics". It is generally felt that the chemical information professional does online searching, data retrieval and database maintenance, searches the scientific literature and the Web, and uses the chemical entity as an index to the literature, etc. The skills required for these activities are a degree in chemistry and an MLS, knowledge of information sources, some computer skills, and foreign language skills if possible. Computational chemists are concerned with macromolecular targets, discovery projects, 3D data, protein-ligand docking, modelling, QSAR, and theoretical method development. They tend to operate in a narrow domain with just a few chemical structures.

It is the data explosion arising from the adoption of high throughput screening and combinatorial chemistry that has led to the acceptance of chemical informatics as a new discipline. The tasks involved are database design, SAR, programming, multivariate analysis, pattern recognition, library design, calculation of properties, data mining, chemical registration, and library enumeration. Skills needed are an understanding of (medicinal) chemistry; ISIS, Web, Visual Basic and ORACLE experience; ability to analyse and correlate data from massive data banks; programming (UNIX scripting, C++); interpersonal skills; enthusiasm; ability to look at the bigger picture; and project management skills.

When asked what industry sought, one well-known academic said the job candidate should have a "first" degree in chemistry or a related discipline [the audience seemed to agree with me that this means a master's degree or higher in the USA], some programming experience, and a PhD from the school of Philip Dean, or Johann Gasteiger, or Peter Johnson, or Irwin Kuntz, or Bob Pearlman, or Peter Willett. The graduates of Graham Richards' and Rod Hubbard's schools are more likely to follow careers in computational chemistry, as defined above.

I therefore asked "Where are they now?" for those people who had done research degrees under three of the above experts. Richards' group at Oxford has a Web page answering this question (at http://bellatrix.pcl.ox.ac.uk). Eighteen are academics, 14 are in the pharmaceutical or a related industry, and two are pursuing other careers. Willett's chemistry PhDs from 1987-1999 have the following career profiles:

Academia 7
Government IT 2
Web company 1
Pharma industry or related 11
CCDC 1

Johnson has recently turned out more entrepreneurs (at least two, not counting the ORAC/Synopsys team), plus one scientist in the pharmaceutical industry and one in academia.

Supply of chemical informatics experts is clearly not meeting the demand. One academic I spoke to in person says that he is asked (from Germany, Switzerland, the US and the UK) if he has a suitable job candidate about once every three weeks. A second academic has the same experience. There seem to be fewer than 12 candidates on the market each year, whereas more than 40 a year must be needed. My contact felt that more than 100 candidates graduating per annum would be too many.

Problems for educators include no access to combinatorial chemistry data, no access to high throughput screening data, shortage of the right teaching staff, a fast changing field, the time lag between supply and demand, and the status of interdisciplinary education (degrees with more than one specialism can be viewed with suspicion by student and teacher alike). Also, there are now more bioscience majors and fewer in the physical sciences (my thanks to Dana Roth for drawing my attention to this - see Nature, July 22, 1999, pp. 309-310).

Some specialised courses are beginning to appear, e.g. Indiana University's course in Chemical Informatics and the Virtual School of Molecular Sciences at Nottingham. On the whole, though, practitioners of chemical informatics still tend to learn "on the job".