SwissSimilarity

Frequently Asked Questions

    General, scientific and technical questions

    1. What is the general idea behind SwissSimilarity?
    2. Where do the libraries of molecules come from?
    3. What is a virtual library?
    4. How were the molecules prepared for screening?
    5. What are the available screening methods?
    6. What is the "combined score"?
    7. What are the differences between these approaches?
    8. How do the approaches account for the ligand flexibility?
    9. How much time is required for a screening job?
    10. Why is the time of computation fluctuating?
    11. Why is it not possible to screen some libraries with all available approaches?
    12. Does an aromatic or Kekulé representation change anything for the screening?
    13. Can I draw more than one structure at a time in the sketcher?
    14. Can I cut-and-paste a molecule for input?
    15. Sometimes a pop-up box appears with the message "Cannot retrieve sketcher instance from iframe". What does it mean?
    16. Can I use SwissSimilarity with an input not considered as small molecules (for instance peptides, proteins, other macromolecules)?
    17. Why do I get different results when screening GPCR ligands (ChEMBL) and GPCR ligands (GLASS)?
    18. What are the criteria applied for drug-likeness, lead-likeness and fragment-likeness?
    19. How long will the results remain on the server?
    20. How can I share the results?
    21. How confidential are users' queries?

    Answers

    What is the general idea behind SwissSimilarity?

    The objective of SwissSimilarity is to provide a simple web-based tool to perform ligand-based screening of several libraries of small molecules using several different technologies.

    Where do the libraries of molecules come from?

    Molecule libraries were taken from different origins:

    - DrugBank version 4.3. This library contains about 1400 drugs approved by the FDA, as well as drugs withdrawn due to toxicity issues, developmental compounds, illicit molecules and nutraceutical compounds.
    - Ligand Expo database, version 1. This library contains ligands found in the experimental structures available in the Protein Data Bank (PDB).
    - ChEMBL version 20. We selected only the molecules with a KD or an IC50 lower than 10 µM as measured by an assay of type B (measuring binding of compound to a molecular target), and tabulated with the highest confidence score. Molecules available as part of the ChEMBL Kinase SARfari and GPCR SARfari were also added to SwissSimilarity.
    - ChEBI, a library of Chemical Entities of Biological Interest, as of June 2015.
    - GLASS, the GPCR-Ligand Association database, as of June 2015.
    - HMDB, the Human Metabolome Database, version 3.6.
    - ZINC, a free database of commercially-available compounds for virtual screening, version 12.
    - Several vendors whose catalogues are available as part of ZINC: Asinex, AsisChem, Chembridge, ChemDiv, Enamine, innovaPharm, Maybridge, Otava, Selleckchem, Sigma-Aldrich, SPECS, TimTec, Vitas.
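    The ChEMBL selection rule described above can be sketched as a simple record filter. This is a minimal illustration, assuming a simplified, hypothetical `Activity` record (its field names are not the ChEMBL schema), activity values already normalized to nM, and that "highest confidence score" means ChEMBL's maximum target confidence score of 9.

```python
from dataclasses import dataclass

# Hypothetical, simplified activity record; field names are illustrative only.
@dataclass
class Activity:
    molecule_id: str
    standard_type: str      # e.g. "Kd" or "IC50"
    value_nM: float         # activity value, assumed normalized to nanomolar
    assay_type: str         # "B" = binding assay
    confidence_score: int   # ChEMBL target confidence score (0 to 9)

MAX_CONFIDENCE = 9          # assumed meaning of "highest confidence score"
THRESHOLD_NM = 10_000.0     # 10 µM expressed in nM

def passes_filter(a: Activity) -> bool:
    """True if the record is a Kd/IC50 below 10 µM from a top-confidence binding assay."""
    return (
        a.standard_type in {"Kd", "IC50"}
        and a.value_nM < THRESHOLD_NM
        and a.assay_type == "B"
        and a.confidence_score == MAX_CONFIDENCE
    )

def select_molecules(activities) -> list:
    """Unique molecule identifiers with at least one qualifying activity, sorted."""
    return sorted({a.molecule_id for a in activities if passes_filter(a)})
```

A molecule is kept as soon as one of its activity records qualifies; duplicates are collapsed by the set comprehension.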

    What is a virtual library?

    Unlike the databases mentioned in the section above, the molecules in a virtual library are not necessarily described in patents, articles or any catalogue. The virtual library available for screening in SwissSimilarity is an in-house collection of products synthesizable from purchasable reactants (from Sigma-Aldrich) using one of the 285'000'000 unique organic reactions derived from Hartenfeller et al.

    How were the molecules prepared for screening?

    Molecule libraries were retrieved as SMILES files from the sources listed above. Molecules with a molecular mass larger than 1500 g/mol were removed. The standardize program of ChemAxon, version 15.6.1.0, was used to remove salt counterions, neutralize the compounds and find the most frequent tautomer, while the structurechecker program was used to keep only molecules containing H, C, N, O, S, P, B, F, Cl, Br and I atoms. For Electroshape and Spectrophores, the most probable protonation state at pH 7.4 was estimated using the ChemAxon cxcalc program. 3D conformers were generated using the molconvert program: 20 different conformers per molecule were generated for Electroshape, Spectrophores, Shape-IT and Align-IT.
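    The two composition filters above (mass cut-off and allowed elements) can be sketched as follows. This is not the ChemAxon structurechecker; it assumes the molecular formula is already available as a dictionary of element counts (`keep_molecule` is a hypothetical helper) and uses rounded standard atomic weights to compute the average molecular mass.

```python
# Elements accepted in the screening libraries, as listed in this FAQ
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "S", "P", "B", "F", "Cl", "Br", "I"}
MAX_MASS = 1500.0  # g/mol; heavier molecules are removed

# Rounded standard atomic weights (g/mol) for the allowed elements only
ATOMIC_WEIGHTS = {
    "H": 1.008, "B": 10.81, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}

def keep_molecule(composition: dict) -> bool:
    """Return True if every element is allowed and the average mass is <= 1500 g/mol."""
    # Element check first, so the weight lookup below never hits an unknown element
    if not set(composition) <= ALLOWED_ELEMENTS:
        return False
    mass = sum(ATOMIC_WEIGHTS[element] * count for element, count in composition.items())
    return mass <= MAX_MASS
```

For example, aspirin (C9H8O4, about 180 g/mol) is kept, whereas sodium benzoate is rejected because of Na.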

    What are the available screening methods?

    Screening can be performed using the following approaches:

    - FP2 molecular fingerprints, from Open Babel.
    - Electroshape 5D (including atomic partial charges and lipophilicity contributions) for fast non-superpositional shape-based virtual screening.
    - Spectrophores. This fast non-superpositional shape-based virtual screening method, developed by Silicos-IT and implemented in Open Babel 2.3.2, uses one-dimensional descriptors generated from the property fields surrounding the molecules.
    - Shape-IT, a shape-based alignment tool developed by Silicos-IT, which represents molecules as a set of atomic Gaussians and performs molecular alignment as described by Grant and Pickup (J. Phys. Chem. 1995, 99, 3503).
    - Align-IT, a pharmacophore-based tool from Silicos-IT to align molecules by representing pharmacophoric features as Gaussian 3D volumes.


    What is the "combined score"?

    In addition to the scores of the methods mentioned above, it is possible to perform a consensus 2D/3D screening using a score based on both the FP2 Tanimoto coefficient (s1) and the Electroshape-5D Manhattan distance (s2). This combined score f(s1,s2) was developed for reverse screening in our SwissTargetPrediction web interface.

    It was obtained by logistic regression using

    $f(s_1, s_2) = \left(1 + \exp(-a_0 - a_1 s_1 - a_2 s_2)\right)^{-1}$

    where a0, a1 and a2 are parameters learned by the model to predict possible protein targets for a small molecule based on molecular similarity to known bioactive compounds. f(s1,s2) ranges from 0 for totally dissimilar molecules to 1 for perfectly identical molecules. This combined score was found to perform significantly better for drug-like molecules than FP2 or Electroshape-5D similarity used separately.
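    A minimal sketch of the logistic combination, written in a numerically stable form. The fitted values of a0, a1 and a2 are not given in this FAQ, so the parameters below are hypothetical placeholders chosen only so that the example behaves sensibly: a2 is negative because s2 is a distance, so a larger distance lowers the score.

```python
import math

def combined_score(s1: float, s2: float, a0: float, a1: float, a2: float) -> float:
    """Logistic combination f(s1, s2) = 1 / (1 + exp(-a0 - a1*s1 - a2*s2)).

    s1: FP2 Tanimoto coefficient; s2: Electroshape-5D Manhattan distance.
    """
    z = a0 + a1 * s1 + a2 * s2
    # Numerically stable logistic: never call exp() on a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical parameters for illustration only (NOT the fitted SwissTargetPrediction values)
A0, A1, A2 = -2.0, 6.0, -3.0

similar = combined_score(1.0, 0.0, A0, A1, A2)     # identical fingerprints, zero distance
dissimilar = combined_score(0.1, 2.0, A0, A1, A2)  # low Tanimoto, large distance
```

With these placeholder parameters, the identical pair scores close to 1 and the dissimilar pair close to 0, matching the range described above.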

    What are the differences between these approaches?

    FP2 fingerprints allow screening based on the similarity of chemical structures. Electroshape and Spectrophores allow screening by 3D shape similarity without requiring molecular superimposition. Shape-IT and Align-IT allow screening by molecular superimposition. Note that the Shape-IT score accounts only for shape similarity, independently of the positioning of pharmacophoric features.
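    FP2 screening compares 2D fingerprints with the Tanimoto coefficient, and Electroshape compares descriptor vectors with the Manhattan distance (both measures are named in the combined-score answer). A minimal sketch of the two measures, assuming a fingerprint is represented as the set of its "on" bit positions and a shape descriptor as a plain list of numbers; this is not the Open Babel or Electroshape code.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of set-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # two empty fingerprints: convention chosen for this sketch
    return len(bits_a & bits_b) / union

def manhattan(desc_a, desc_b) -> float:
    """Manhattan (L1) distance between two fixed-length descriptor vectors."""
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(desc_a, desc_b))
```

Note the opposite directions: a Tanimoto coefficient of 1 means identical fingerprints, whereas a Manhattan distance of 0 means identical descriptors.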

    How do the approaches account for the ligand flexibility?

    FP2 fingerprints account only for the 2D chemical structures, and not for 3D shape.

    For the other methods, which are 3D-based, 20 conformers of the user query molecule are compared to 20 conformers of each molecule in the library.
    For instance, an Electroshape screening of the ZINC Drug-Like library compares the 20 conformers of the user molecule to 200'000'000 conformers of commercially available molecules, i.e. a total of 4 billion comparisons, in 17 minutes.
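    The conformer handling and the arithmetic of the ZINC example can be sketched as below. Comparing every query conformer with every library conformer and keeping the closest pair is an assumption made for this illustration (the FAQ does not state how the 400 pairwise values per molecule are aggregated); `best_pair_distance` is a hypothetical helper.

```python
from itertools import product

N_CONFORMERS = 20  # conformers per molecule, as stated in this FAQ

def manhattan(a, b) -> float:
    """Manhattan (L1) distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def best_pair_distance(query_confs, library_confs) -> float:
    """Smallest distance over all query x library conformer pairs (assumed aggregation rule)."""
    return min(manhattan(q, lib) for q, lib in product(query_confs, library_confs))

# Back-of-the-envelope count for the ZINC Drug-Like example
library_molecules = 10_000_000
library_conformers = library_molecules * N_CONFORMERS   # 200'000'000
comparisons = N_CONFORMERS * library_conformers          # 4'000'000'000
comparisons_per_second = comparisons / (17 * 60)         # roughly 3.9 million per second
```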


    How much time is required for a screening job?

    The CPU time required for a screening depends on the size of the screened library and on the method. In terms of speed, the available methods rank as follows (fastest first): FP2 >> Electroshape > Spectrophores >> Shape-IT > Align-IT.

    Screening the 10'000'000 molecules of the ZINC Drug-like library with FP2 fingerprints takes about 30 seconds, while screening 20 conformers of 1'500 FDA-approved molecules with Align-IT takes about 9 minutes.
    An estimate of the duration is displayed when the mouse pointer hovers over a given library/method combination.


    Why is the time of computation fluctuating?

    The time needed for a specific screening (a given library with a given method) is not expected to fluctuate. However, your job may be queued while waiting for a CPU to become free. Information about the queue and the progress of the job is provided through the web interface.

    Why is it not possible to screen some libraries with all available approaches?

    The availability of a library/method combination depends on the required computational time: only combinations whose jobs take less than 30 minutes are offered.
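    The 30-minute rule can be illustrated with the durations quoted in this FAQ. The sketch assumes, for simplicity, that run time scales linearly with library size, which is how the hypothetical `extrapolate_seconds` helper estimates an Align-IT screen of the ZINC Drug-like library from the 9-minute FDA-approved example; the server's own estimates may be computed differently.

```python
LIMIT_SECONDS = 30 * 60  # only jobs under 30 minutes are offered

def extrapolate_seconds(ref_seconds: float, ref_size: int, target_size: int) -> float:
    """Scale a measured duration linearly with library size (simplifying assumption)."""
    return ref_seconds * target_size / ref_size

# (library, method) -> estimated seconds: three figures quoted in this FAQ, one extrapolation
estimates = {
    ("ZINC drug-like", "FP2"): 30,
    ("ZINC drug-like", "Electroshape"): 17 * 60,
    ("FDA approved", "Align-IT"): 9 * 60,
    ("ZINC drug-like", "Align-IT"): extrapolate_seconds(9 * 60, 1_500, 10_000_000),
}

def available_combinations(estimates: dict, limit_s: float = LIMIT_SECONDS) -> list:
    """Library/method pairs whose estimated duration is below the limit, sorted."""
    return sorted(combo for combo, seconds in estimates.items() if seconds < limit_s)
```

Under that linear assumption, Align-IT on ZINC Drug-like would take about 1000 hours, far beyond the limit, so only the three other combinations are returned.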

    Does an aromatic or Kekulé representation change anything for the screening?

    No. Either an aromatic or a Kekulé description can be entered without affecting the screening results.

    Can I draw more than one structure at a time in the sketcher?

    You should not draw multiple unconnected molecular fragments in the sketcher, because they would be considered as a single structure and translated into a single SMILES string of multiple fragments connected by dots. Besides, each SwissSimilarity screening is meant to take a single molecule as the input query.
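    Since disconnected fragments end up separated by dots in the SMILES string, a client-side check can simply count the dot-separated components before submitting. A minimal sketch, assuming the input is a plain SMILES string (not reaction SMILES); `fragment_count` and `check_single_molecule` are hypothetical helpers and do not validate the SMILES syntax itself.

```python
def fragment_count(smiles: str) -> int:
    """Number of dot-separated components in a plain SMILES string."""
    return len([part for part in smiles.strip().split(".") if part])

def check_single_molecule(smiles: str) -> None:
    """Raise ValueError unless the SMILES describes exactly one connected component."""
    n = fragment_count(smiles)
    if n != 1:
        raise ValueError(f"expected one molecule, got {n} fragments: {smiles!r}")
```

For example, a salt written as `CC(=O)[O-].[Na+]` counts as two fragments and would be rejected.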

    Can I cut-and-paste a molecule for input?

    You can easily paste a SMILES string into the dedicated text field. For a sketched molecule, the operation is trickier and depends on where the drawing comes from.

    Sometimes a pop-up box appears with the message "Cannot retrieve sketcher instance from iframe". What does it mean?

    This message is linked to the ChemAxon Marvin for JavaScript sketcher. It means that the sketcher is not properly connected to its remote server, in which case some functions may not work as expected. In our experience, it happens mainly when refreshing the SwissSimilarity page in the web browser. We recommend not refreshing any SwissSimilarity page and clicking on Home (top-right menu) instead.

    Can I use SwissSimilarity with an input that is not a small molecule (for instance peptides, proteins or other macromolecules)?

    It is feasible in practice, provided that the structure can be entered as SMILES. However, the proposed screening methods have an applicability domain ranging from drug-like organic compounds to very short oligopeptides or short oligosaccharides. For markedly different molecular structures, the similarity values (Tanimoto coefficient or Manhattan distance) are unlikely to be of any relevance. SwissSimilarity is intended for use in drug discovery and medicinal chemistry; any other usage requires extreme caution when interpreting the screening results.

    Why do I get different results when screening GPCR ligands (ChEMBL) and GPCR ligands (GLASS)?

    Although both GPCR ligands (ChEMBL) and GPCR ligands (GLASS) are good-quality databases of GPCR ligands, they are entirely different libraries. The former is a subset of ChEMBL, while the latter is a fully independent collection. They do not necessarily include the same molecules and are curated in different ways. As a consequence, the information they contain differs, and so do the screening results.

    What are the criteria applied for drug-likeness, lead-likeness and fragment-likeness?

    The criteria are those of ZINC, as described in the ZINC documentation.

    How long will the results remain on the server?

    Results are kept on the server for several weeks.

    How can I share the results?

    Each job is assigned a job number, which allows the results to be retrieved without performing the calculations again.

    Jobs can be retrieved at the address http://www.swisssimilarity.ch/result.php?job=123456789 where you can replace 123456789 by your job number.
    By clicking on the dedicated icon, you can send your collaborators an e-mail that is automatically filled in with the web address of the result page.
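    Result pages follow the address pattern given above, so a link can be built directly from a job number. A small sketch, assuming only the `job` query parameter shown in the example address; `result_url` is a hypothetical helper, not part of any SwissSimilarity API.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.swisssimilarity.ch/result.php"  # address pattern from this FAQ

def result_url(job_number) -> str:
    """Build the result-page address for a job number (int or str)."""
    # urlencode converts the value to str and percent-encodes it if needed
    return f"{BASE_URL}?{urlencode({'job': job_number})}"
```

For example, `result_url(123456789)` yields the example address given above.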


    How confidential are users' queries?

    The SIB Swiss Institute of Bioinformatics will not look at the users' queries. The queries will not be shared with third parties.

    However, we cannot guarantee that the server will never be hacked.