The PubChemR
package is designed for R users who need to
interact with the PubChem database, a free resource from the National
Center for Biotechnology Information (NCBI). PubChem is a key repository
of chemical and biological data, including information on chemical
structures, identifiers, chemical and physical properties, biological
activities, patents, health, safety, toxicity data, and much more.
This package simplifies the process of accessing and manipulating
this vast array of data directly from R, making it a valuable resource
for chemists, biologists, bioinformaticians, and researchers in related
fields. In this vignette, we will explore the various functionalities
offered by the PubChemR
package. Each function is designed
to allow users to efficiently retrieve specific types of data from
PubChem. We will cover how to install and load the package, provide
detailed descriptions of each function, and demonstrate their usage with
practical examples.
The PubChemR package can be installed either from the
Comprehensive R Archive Network (CRAN) or directly from its GitHub
repository, offering users the flexibility to choose between the stable
CRAN version or the latest development version with potentially newer
features and fixes.
For most users, installing PubChemR
from CRAN is the
recommended method as it ensures a stable and tested version of the
package. You can install it using the standard R package installation
command:
install.packages("PubChemR")
This command will download and install the PubChemR
package along with any dependencies it requires. Once installed, you can
load the package in your R session as follows:
library(PubChemR)
For users who are interested in the latest features and updates that
might not yet be available on CRAN, the development version of
PubChemR
can be installed from GitHub. This version is
likely to include recent enhancements and bug fixes but may also be less
stable than the CRAN release.
To install the development version, you will first need to install
the devtools
package, which provides functions to install
packages directly from GitHub and other sources. You can install
devtools from CRAN using:
install.packages("devtools")
Once devtools is installed, you can install the development version
of PubChemR
using:
devtools::install_github("selcukorkmaz/PubChemR")
This command downloads and installs the package from the specified GitHub repository. After installation, load the package as usual:
library(PubChemR)
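When a script has to run on machines where the package may or may not be present, the installation step can be guarded so that nothing is reinstalled unnecessarily. The following is a minimal sketch: `ensure_installed` is a hypothetical helper name, not a function of PubChemR, and it is demonstrated on the base `stats` package so that running it installs nothing.

```r
# Hypothetical helper (not part of PubChemR): install a package from CRAN
# only when it is not already available, then report whether it can be loaded.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# 'stats' ships with R, so this call returns TRUE without installing anything.
stats_available <- ensure_installed("stats")

# For this vignette the corresponding call would be:
# ensure_installed("PubChemR")
```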
The PubChemR
package offers a suite of functions
designed to interact with the PubChem database, allowing users to
retrieve and manipulate chemical data efficiently. Below is an overview
of the main functions provided by the package:
The get_aids
function is designed to retrieve Assay IDs
(AIDs) from the PubChem database. This function is useful for accessing
detailed assay data related to specific compounds or substances, which
is crucial in fields such as pharmacology, biochemistry, and molecular
biology.
The function supports a range of identifiers including integers (e.g., CID and SID) and strings (e.g., name, SMILES, InChIKey and formula). Users can specify the namespace and domain for the query, as well as the type of search to be performed (e.g., substructure, superstructure, similarity, identity).
Here are the main parameters of the function:
identifier: A vector of positive integers (e.g. cid, sid) or identifier strings (name, smiles, inchikey, formula).
namespace: Specifies the type of identifier provided.
domain: Specifies the domain of the query.
searchtype: Specifies the type of search to be performed.
options: Additional arguments.
In this example, we retrieve AIDs for the compounds with CID (Compound ID) 2244 (aspirin), 2519 (caffeine) and 3672 (ibuprofen):
aids_by_cid <- get_aids(
identifier = c(2244, 2519, 3672),
namespace = "cid",
domain = "compound"
)
aids_by_cid
#>
#> Assay IDs (AIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: CID
#> - Identifier: 2244, 2519, ... and 1 more.
#>
#> NOTE: run AIDs(...) to extract Assays ID data. See ?AIDs for help.
The above code retrieves AIDs for the compounds with CIDs 2244, 2519 and 3672. The output shows the request details including the domain (Compound), namespace (Compound ID), and identifier (2244, 2519, … and 1 more). This provides a summary of the query performed.
To retrieve the AIDs associated with these compounds, we use the AIDs function on the result. This getter function returns the results either as a tibble (data frame) or as a list, depending on the .to.data.frame argument.
aids <- AIDs(object = aids_by_cid, .to.data.frame = TRUE)
aids
#> # A tibble: 8,931 × 2
#> CID AID
#> <dbl> <dbl>
#> 1 2244 1
#> 2 2244 3
#> 3 2244 9
#> 4 2244 15
#> 5 2244 19
#> 6 2244 21
#> 7 2244 23
#> 8 2244 25
#> 9 2244 29
#> 10 2244 31
#> # ℹ 8,921 more rows
The output is a tibble (data frame) with two columns: CID and AID. The CID column contains the compound IDs (2244, 2519 and 3672), and the AID column contains the Assay IDs.
table(aids$CID)
#>
#> 2244 2519 3672
#> 3240 2362 3329
There are 8,931 rows in total, indicating 3,240 assays related to aspirin, 2,362 assays related to caffeine and 3,329 assays related to ibuprofen.
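As a quick consistency check, the per-compound counts printed by table(aids$CID) should add up to the number of rows reported in the tibble header. The sketch below types those counts in by hand rather than querying PubChem, so it runs offline; the variable names are illustrative only.

```r
# Per-compound counts copied from the table(aids$CID) output above.
assay_counts <- c("2244" = 3240, "2519" = 2362, "3672" = 3329)

# The counts should add up to the number of rows in the tibble.
total_rows <- sum(assay_counts)

# Share of rows contributed by each compound, as percentages.
assay_share <- round(100 * assay_counts / total_rows, 1)
```

With live data, the same check would be sum(table(aids$CID)) == nrow(aids).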
In this example, we retrieve Assay IDs for the substances with SIDs (Substance IDs) 103414350 and 103204295:
aids_by_sid <- get_aids(
identifier = c(103414350, 103204295),
namespace = "sid",
domain = "substance"
)
aids_by_sid
#>
#> Assay IDs (AIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Substance
#> - Namespace: SID
#> - Identifier: 103414350, 103204295
#>
#> NOTE: run AIDs(...) to extract Assays ID data. See ?AIDs for help.
The above code retrieves Assay IDs for the substances with SIDs (Substance IDs) 103414350 and 103204295. The output shows the request details including the domain (Substance), namespace (SID), and identifiers (103414350, 103204295). This provides a summary of the query performed.
To retrieve the Assay IDs associated with the SIDs 103414350 and 103204295, we use the AIDs function on the result. This getter function returns the results either as a tibble (data frame) or as a list, depending on the .to.data.frame argument.
AIDs(object = aids_by_sid, .to.data.frame = TRUE)
#> # A tibble: 8 × 2
#> SID AID
#> <dbl> <dbl>
#> 1 103414350 7810
#> 2 103414350 7815
#> 3 103414350 7816
#> 4 103414350 7820
#> 5 103414350 18990
#> 6 103204295 8712
#> 7 103204295 9506
#> 8 103204295 151808
The output is a tibble (data frame) with two columns: SID and AID. The SID column contains the substance IDs (103414350 and 103204295), and the AID column contains the Assay IDs. There are a total of 8 rows, with 5 assays related to 103414350 and 3 assays related to 103204295.
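Because the result is an ordinary two-column table, base R is enough to regroup it. The sketch below re-enters the eight rows printed above by hand (so it runs without a network connection) and splits the AIDs by substance; the object names are illustrative.

```r
# The eight rows shown in the output above, entered by hand.
sid_aids <- data.frame(
  SID = c(rep(103414350, 5), rep(103204295, 3)),
  AID = c(7810, 7815, 7816, 7820, 18990, 8712, 9506, 151808)
)

# One vector of AIDs per substance, named by SID.
aids_per_sid <- split(sid_aids$AID, sid_aids$SID)

# Number of assays per substance.
n_per_sid <- lengths(aids_per_sid)
```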
In this example, we retrieve Assay IDs for the compounds with the names paracetamol, naproxen, and diclofenac:
aids_by_name <- get_aids(
identifier = c("paracetamol", "naproxen", "diclofenac"),
namespace = "name",
domain = "compound"
)
aids_by_name
#>
#> Assay IDs (AIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: Name
#> - Identifier: paracetamol, naproxen, ... and 1 more.
#>
#> NOTE: run AIDs(...) to extract Assays ID data. See ?AIDs for help.
The output shows the request details including the domain (Compound), namespace (Name), and identifiers (paracetamol, naproxen, … and 1 more). This provides a summary of the query performed.
To retrieve the Assay IDs associated with the compound names, we use
the AIDs
function on the result:
aids <- AIDs(object = aids_by_name, .to.data.frame = TRUE)
aids
#> # A tibble: 5,281 × 3
#> NAME CID AID
#> <chr> <dbl> <dbl>
#> 1 paracetamol 1983 155
#> 2 paracetamol 1983 157
#> 3 paracetamol 1983 161
#> 4 paracetamol 1983 165
#> 5 paracetamol 1983 167
#> 6 paracetamol 1983 175
#> 7 paracetamol 1983 248
#> 8 paracetamol 1983 357
#> 9 paracetamol 1983 377
#> 10 paracetamol 1983 410
#> # ℹ 5,271 more rows
The output is a tibble with three columns: NAME, CID and AID. The NAME column includes compound names, the CID column contains the compound IDs, and the AID column contains the assay IDs.
table(aids$NAME)
#>
#> diclofenac naproxen paracetamol
#> 1593 1586 2102
There are 5,281 rows in total, indicating 1,593 assays related to diclofenac, 1,586 assays related to naproxen and 2,102 assays related to paracetamol.
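A common follow-up question is which assays are shared by all queried compounds. The sketch below shows one way to answer it with base R. It uses a small toy table in the same NAME/AID layout; the AID values are invented for illustration and are not real PubChem results.

```r
# Toy stand-in for the tibble returned by AIDs(); the AID values are invented
# for illustration and are not real PubChem results.
aids_toy <- data.frame(
  NAME = c("paracetamol", "paracetamol", "paracetamol",
           "naproxen", "naproxen", "diclofenac", "diclofenac"),
  AID  = c(101, 102, 103, 102, 103, 103, 104)
)

# One vector of AIDs per compound name.
aids_by_compound <- split(aids_toy$AID, aids_toy$NAME)

# Intersect the vectors pairwise to keep only AIDs present for every compound.
shared_aids <- Reduce(intersect, aids_by_compound)
```

With real data, aids_toy would simply be replaced by the tibble returned by AIDs().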
In this example, we retrieve Assay IDs (AIDs) for aspirin using its SMILES representation:
aids_by_smiles <- get_aids(
identifier = "CC(=O)OC1=CC=CC=C1C(=O)O",
namespace = "smiles",
domain = "compound"
)
aids_by_smiles
#>
#> Assay IDs (AIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: SMILES
#> - Identifier: CC(=O)OC1=CC=CC=C1C(=O)O
#>
#> NOTE: run AIDs(...) to extract Assays ID data. See ?AIDs for help.
The above code retrieves AIDs for aspirin with the SMILES notation CC(=O)OC1=CC=CC=C1C(=O)O. The domain is set to compound and the namespace is set to smiles to indicate that the identifier is a SMILES string.
To extract the AIDs associated with the SMILES representation, we use
the AIDs
function on the result:
AIDs(object = aids_by_smiles, .to.data.frame = TRUE)
#> # A tibble: 3,240 × 3
#> SMILES CID AID
#> <chr> <dbl> <dbl>
#> 1 CC(=O)OC1=CC=CC=C1C(=O)O 2244 1
#> 2 CC(=O)OC1=CC=CC=C1C(=O)O 2244 3
#> 3 CC(=O)OC1=CC=CC=C1C(=O)O 2244 9
#> 4 CC(=O)OC1=CC=CC=C1C(=O)O 2244 15
#> 5 CC(=O)OC1=CC=CC=C1C(=O)O 2244 19
#> 6 CC(=O)OC1=CC=CC=C1C(=O)O 2244 21
#> 7 CC(=O)OC1=CC=CC=C1C(=O)O 2244 23
#> 8 CC(=O)OC1=CC=CC=C1C(=O)O 2244 25
#> 9 CC(=O)OC1=CC=CC=C1C(=O)O 2244 29
#> 10 CC(=O)OC1=CC=CC=C1C(=O)O 2244 31
#> # ℹ 3,230 more rows
The output is a tibble with three columns: SMILES, CID and AID. The SMILES column includes SMILES representation of aspirin, the CID column contains the compound ID of aspirin, and the AID column contains the related assay IDs.
In this example, we retrieve Assay IDs for the compound with InChIKey (International Chemical Identifier Key) GALPCCIBXQLXSH-UHFFFAOYSA-N:
aids_by_inchikey <- get_aids(
identifier = "GALPCCIBXQLXSH-UHFFFAOYSA-N",
namespace = "inchikey",
domain = "compound"
)
aids_by_inchikey
#>
#> Assay IDs (AIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: INCHI_Key
#> - Identifier: GALPCCIBXQLXSH-UHFFFAOYSA-N
#>
#> NOTE: run AIDs(...) to extract Assays ID data. See ?AIDs for help.
The above code retrieves Assay IDs for the compound with InChIKey GALPCCIBXQLXSH-UHFFFAOYSA-N. The output shows the request details including the domain (Compound), namespace (INCHI Key), and identifier (GALPCCIBXQLXSH-UHFFFAOYSA-N). This provides a summary of the query performed.
To retrieve the Assay IDs associated with the InChIKey, we use the AIDs function on the result. This getter function returns the results either as a tibble (data frame) or as a list, depending on the .to.data.frame argument.
AIDs(object = aids_by_inchikey, .to.data.frame = TRUE)
#> # A tibble: 5 × 3
#> INCHIKEY CID AID
#> <chr> <dbl> <dbl>
#> 1 GALPCCIBXQLXSH-UHFFFAOYSA-N 44375542 7810
#> 2 GALPCCIBXQLXSH-UHFFFAOYSA-N 44375542 7815
#> 3 GALPCCIBXQLXSH-UHFFFAOYSA-N 44375542 7816
#> 4 GALPCCIBXQLXSH-UHFFFAOYSA-N 44375542 7820
#> 5 GALPCCIBXQLXSH-UHFFFAOYSA-N 44375542 18990
The output is a tibble (data frame) with three columns: INCHIKEY, CID, and AID. The INCHIKEY column contains the InChIKey (GALPCCIBXQLXSH-UHFFFAOYSA-N in this case), the CID column contains the compound ID (44375542), and the AID column contains the Assay IDs. This tibble format makes it easy to analyze and manipulate the data in R. There are 5 rows in total, indicating the assays related to the compound.
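An InChIKey is a 27-character string made of three blocks of upper-case letters (14, 10, and 1 characters) separated by hyphens, so a malformed key can be caught before a request is sent. The helper below is a hypothetical sketch, not part of PubChemR, and it checks only the shape of the string, not whether the key exists in PubChem.

```r
# Hypothetical helper (not part of PubChemR): check the 14-10-1 block layout
# of an InChIKey. This validates the format only, not the key's existence.
is_inchikey <- function(x) {
  grepl("^[A-Z]{14}-[A-Z]{10}-[A-Z]$", x)
}

# The key used in the example above has the expected layout.
key_ok <- is_inchikey("GALPCCIBXQLXSH-UHFFFAOYSA-N")

# A key with the final block missing does not.
key_bad <- is_inchikey("GALPCCIBXQLXSH-UHFFFAOYSA")
```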
In this example, we retrieve Assay IDs for compounds with the molecular formula C15H12N2O2:
aids_by_formula <- get_aids(
identifier = "C15H12N2O2",
namespace = "formula",
domain = "compound"
)
aids_by_formula
#>
#> Assay IDs (AIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: Formula
#> - Identifier: C15H12N2O2
#>
#> NOTE: run AIDs(...) to extract Assays ID data. See ?AIDs for help.
The above code retrieves Assay IDs for compounds with the molecular formula C15H12N2O2. The output shows the request details including the domain (Compound), namespace (Formula), and identifier (C15H12N2O2). This provides a summary of the query performed.
To retrieve the Assay IDs associated with this formula, we use the AIDs function on the result. This getter function returns the results either as a tibble (data frame) or as a list, depending on the .to.data.frame argument.
AIDs(object = aids_by_formula, .to.data.frame = TRUE)
#> # A tibble: 50,172 × 3
#> FORMULA CID AID
#> <chr> <dbl> <dbl>
#> 1 C15H12N2O2 1775 625220
#> 2 C15H12N2O2 1775 1094227
#> 3 C15H12N2O2 1775 1149315
#> 4 C15H12N2O2 1775 255686
#> 5 C15H12N2O2 1775 504845
#> 6 C15H12N2O2 1775 2313
#> 7 C15H12N2O2 1775 1096248
#> 8 C15H12N2O2 1775 136087
#> 9 C15H12N2O2 1775 1731409
#> 10 C15H12N2O2 1775 683946
#> # ℹ 50,162 more rows
The output is a tibble (data frame) with three columns: FORMULA, CID, and AID. The FORMULA column contains the molecular formula (C15H12N2O2), the CID column contains the compound ID, and the AID column contains the Assay IDs. This tibble format makes it easy to analyze and manipulate the data in R. There are 50,172 rows in total, indicating a comprehensive list of assays related to compounds with the specified molecular formula.
The get_cids
function is designed to retrieve Compound
IDs (CIDs) from the PubChem database. This function is particularly
useful for users who need to obtain the unique identifiers assigned to
chemical substances within PubChem.
The function queries the PubChem database using various identifiers such as names, formulas, or other chemical identifiers. It then extracts the corresponding CIDs and returns them in a structured format. This makes it a versatile tool for researchers working with chemical data.
Here are the main parameters of the function:
identifier: A vector of identifiers for which CIDs are to be retrieved. These can be integers (e.g., CID, SID, AID) or strings (e.g., name, SMILES, InChIKey).
namespace: Specifies the type of identifier provided. It can be ‘cid’, ‘name’, ‘smiles’, ‘inchi’, etc.
domain: The domain of the query, typically ‘compound’.
searchtype: The type of search to be performed, such as ‘substructure’ or ‘similarity’.
options: Additional arguments passed to the internal get_json function.
In this example, we retrieve Compound IDs for the compounds with the names aspirin, caffeine, and ibuprofen:
cids_by_name <- get_cids(
identifier = c("aspirin", "caffein", "ibuprofen"),
namespace = "name",
domain = "compound"
)
cids_by_name
#>
#> Compound IDs (CIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: Name
#> - Identifier: aspirin, caffein, ... and 1 more.
#>
#> NOTE: run CIDs(...) to extract Compound ID data. See ?CIDs for help.
The above code retrieves Compound IDs for the compounds named aspirin, caffeine (entered here as caffein), and ibuprofen. The output shows the request details including the domain (Compound), namespace (Name), and identifiers (aspirin, caffein, … and 1 more). This provides a summary of the query performed.
To retrieve the Compound IDs associated with the compound names, we
use the CIDs
function on the result:
CIDs(object = cids_by_name)
#> # A tibble: 3 × 2
#> Name CID
#> <chr> <dbl>
#> 1 aspirin 2244
#> 2 caffein 2519
#> 3 ibuprofen 3672
The CIDs
function call on the result extracts the
Compound IDs associated with the compound names. The output is a tibble
with two columns: Name and CID. The Name column contains the compound
names, and the CID column contains the Compound IDs. This tibble format
makes it easy to handle and analyze the data in R.
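Once names have been resolved, a named vector is a convenient way to look up a CID by name in later steps. The sketch below re-enters the three rows printed above by hand so that it runs offline; the object names are illustrative.

```r
# The three rows shown in the output above, entered by hand.
cid_table <- data.frame(
  Name = c("aspirin", "caffein", "ibuprofen"),
  CID  = c(2244, 2519, 3672)
)

# Named vector: the names are the query strings, the values are the CIDs.
cid_lookup <- setNames(cid_table$CID, cid_table$Name)

# Look up a single compound by the name that was used in the query.
ibuprofen_cid <- cid_lookup[["ibuprofen"]]
```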
In this example, we retrieve Compound IDs (CIDs) for a compound using its SMILES representation:
cids_by_smiles <- get_cids(
identifier = "C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O",
namespace = "smiles",
domain = "compound"
)
cids_by_smiles
#>
#> Compound IDs (CIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: SMILES
#> - Identifier: C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
#>
#> NOTE: run CIDs(...) to extract Compound ID data. See ?CIDs for help.
The above code retrieves CIDs for the compound with the SMILES notation C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O. The domain is set to compound and the namespace is set to smiles to indicate that the identifier is a SMILES string.
To extract the CIDs associated with the SMILES representation, we use the CIDs function on the result:
CIDs(object = cids_by_smiles)
#> # A tibble: 1 × 2
#> SMILES CID
#> <chr> <dbl>
#> 1 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 5793
The CIDs function call on the result extracts the CIDs
associated with the SMILES notation C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O. The output is a tibble with two columns:
SMILES and CID. The SMILES column contains the SMILES notation, and the
CID column contains the Compound IDs. This output shows that the
specified compound is associated with CID 5793.
In this example, we retrieve Compound IDs (CIDs) for a compound using its InChIKey:
cids_by_inchikey <- get_cids(
identifier = "HEFNNWSXXWATRW-UHFFFAOYSA-N",
namespace = "inchikey",
domain = "compound"
)
cids_by_inchikey
#>
#> Compound IDs (CIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: INCHI_Key
#> - Identifier: HEFNNWSXXWATRW-UHFFFAOYSA-N
#>
#> NOTE: run CIDs(...) to extract Compound ID data. See ?CIDs for help.
The above code retrieves CIDs for the compound with the InChIKey HEFNNWSXXWATRW-UHFFFAOYSA-N. The domain is set to compound and the namespace is set to inchikey to indicate that the identifier is an InChIKey.
To extract the CIDs associated with the InChIKey, we use the CIDs function on the result:
CIDs(object = cids_by_inchikey)
#> # A tibble: 1 × 2
#> INCHI_Key CID
#> <chr> <dbl>
#> 1 HEFNNWSXXWATRW-UHFFFAOYSA-N 3672
The CIDs function call on the result extracts the CIDs
associated with the InChIKey HEFNNWSXXWATRW-UHFFFAOYSA-N. The
output is a tibble with two columns: INCHI_Key and CID. The INCHI_Key
column contains the InChIKey, and the CID column contains the Compound
IDs. This output shows that the specified compound is associated with
CID 3672.
In this example, we retrieve Compound IDs (CIDs) for compounds with the molecular formula C15H12N2O2:
cids_by_formula <- get_cids(
identifier = "C15H12N2O2",
namespace = "formula",
domain = "compound"
)
cids_by_formula
#>
#> Compound IDs (CIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: Formula
#> - Identifier: C15H12N2O2
#>
#> NOTE: run CIDs(...) to extract Compound ID data. See ?CIDs for help.
The above code retrieves Compound IDs for compounds with the molecular formula C15H12N2O2. The output shows the request details including the domain (Compound), namespace (Formula), and identifier (C15H12N2O2). This provides a summary of the query performed.
To retrieve the Compound IDs associated with this formula, we use the CIDs function on the result. This getter function returns the results either as a tibble (data frame) or as a list, depending on the .to.data.frame argument.
CIDs(object = cids_by_formula, .to.data.frame = TRUE)
#> # A tibble: 5,039 × 2
#> Formula CID
#> <chr> <dbl>
#> 1 C15H12N2O2 1775
#> 2 C15H12N2O2 34312
#> 3 C15H12N2O2 2555
#> 4 C15H12N2O2 14650
#> 5 C15H12N2O2 129274
#> 6 C15H12N2O2 135290
#> 7 C15H12N2O2 928446
#> 8 C15H12N2O2 70052
#> 9 C15H12N2O2 135430309
#> 10 C15H12N2O2 25113764
#> # ℹ 5,029 more rows
The output is a tibble (data frame) with two columns: Formula and CID. The Formula column contains the molecular formula (C15H12N2O2), and the CID column contains the Compound IDs. This tibble format makes it easy to analyze and manipulate the data in R. There are 5,039 rows in total, indicating a comprehensive list of compounds related to the specified molecular formula.
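A formula search can return thousands of CIDs, and it is often practical to process them in smaller groups in later steps. The sketch below shows one base R way to split a vector into fixed-size batches, using the ten CIDs visible in the output above. The batch size of 4 is arbitrary, and batching is an assumption about the reader's workflow, not a documented requirement of PubChemR.

```r
# The ten CIDs visible in the output above, entered by hand.
first_cids <- c(1775, 34312, 2555, 14650, 129274,
                135290, 928446, 70052, 135430309, 25113764)

# Arbitrary batch size chosen for illustration.
batch_size <- 4

# Assign each position to a batch number (1, 1, 1, 1, 2, ...) and split on it.
batches <- split(first_cids, ceiling(seq_along(first_cids) / batch_size))

# Number of CIDs in each batch.
batch_lengths <- unname(lengths(batches))
```

Each element of batches could then be passed as the identifier argument of a later call.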
The get_sids
function is designed to retrieve Substance
IDs (SIDs) from the PubChem database. This function is essential for
users who need to identify unique identifiers assigned to specific
chemical substances or mixtures in PubChem.
The get_sids
function queries the PubChem database using
various identifiers and extracts the corresponding SIDs. It is capable
of handling multiple identifiers and returns a structured tibble (data
frame) containing the SIDs along with the original identifiers. This
makes it a versatile tool for researchers working with chemical
data.
Here are the main parameters of the function:
identifier: A vector specifying the identifiers for which SIDs are to be retrieved. These can be numeric or character vectors.
namespace: Specifies the type of identifier provided, with ‘cid’ as the default.
domain: The domain of the query, typically ‘compound’.
searchtype: Specifies the type of search to be performed, if applicable.
options: Additional arguments passed to the internal get_json function.
In this example, we retrieve Substance IDs (SIDs) for the compounds with CIDs (Compound IDs) 2244, 2519, and 3672:
sids_by_cid <- get_sids(
identifier = c(2244, 2519, 3672),
namespace = "cid",
domain = "compound"
)
sids_by_cid
#>
#> Substance IDs (SIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: CID
#> - Identifier: 2244, 2519, ... and 1 more.
#>
#> NOTE: run SIDs(...) to extract Substance ID data. See ?SIDs for help.
The above code retrieves Substance IDs for the compounds with CIDs (Compound IDs) 2244, 2519, and 3672. The output shows the request details including the domain (Compound), namespace (CID), and identifiers (2244, 2519, … and 1 more). This provides a summary of the query performed.
To retrieve the Substance IDs associated with these compound IDs, we use the SIDs function on the result. This getter function returns the results either as a tibble (data frame) or as a list, depending on the .to.data.frame argument.
sids <- SIDs(object = sids_by_cid, .to.data.frame = TRUE)
sids
#> # A tibble: 1,294 × 2
#> CID SID
#> <dbl> <dbl>
#> 1 2244 4594
#> 2 2244 87798
#> 3 2244 476106
#> 4 2244 602429
#> 5 2244 829042
#> 6 2244 832958
#> 7 2244 840714
#> 8 2244 3135921
#> 9 2244 5261264
#> 10 2244 7847177
#> # ℹ 1,284 more rows
The output is a tibble (data frame) with two columns: CID and SID. The CID column contains the compound IDs, and the SID column contains the Substance IDs.
table(sids$CID)
There are 1,294 rows in total. The table call above counts how many of these substances are related to each of the compound IDs 2244, 2519, and 3672.
In this example, we retrieve Substance IDs (SIDs) for the assay with AID (Assay ID) 1234:
sids_by_aids <- get_sids(
identifier = "1234",
namespace = "aid",
domain = "assay"
)
sids_by_aids
#>
#> Substance IDs (SIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Assay
#> - Namespace: AID
#> - Identifier: 1234
#>
#> NOTE: run SIDs(...) to extract Substance ID data. See ?SIDs for help.
The above code retrieves Substance IDs for the assay with AID (Assay ID) 1234. The output shows the request details including the domain (Assay), namespace (Assay ID), and identifier (1234). This provides a summary of the query performed.
To retrieve the Substance IDs associated with the assay ID 1234, we use the SIDs function on the result. This getter function returns the results either as a tibble (data frame) or as a list, depending on the .to.data.frame argument.
SIDs(object = sids_by_aids, .to.data.frame = TRUE)
#> # A tibble: 61 × 2
#> AID SID
#> <chr> <dbl>
#> 1 1234 845167
#> 2 1234 845769
#> 3 1234 847359
#> 4 1234 857446
#> 5 1234 857769
#> 6 1234 859251
#> 7 1234 864576
#> 8 1234 3714272
#> 9 1234 4252106
#> 10 1234 4259196
#> # ℹ 51 more rows
The output is a tibble (data frame) with two columns: AID and SID. The AID column contains the assay ID (1234 in this case), and the SID column contains the Substance IDs. This tibble format makes it easy to analyze and manipulate the data in R. There are 61 rows in total, indicating a list of substances related to the assay.
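In the output above the AID column is typed as character while SID is numeric, presumably because the identifier was passed as the string "1234". If the column is later joined or compared with numeric AIDs, it may be worth converting it first. The sketch below re-enters the first three printed rows by hand to show the conversion; it is an illustration, not a step required by PubChemR.

```r
# The first three rows shown in the output above, entered by hand.
# AID is stored as character, as in the printed tibble.
sids_by_assay <- data.frame(
  AID = rep("1234", 3),
  SID = c(845167, 845769, 847359),
  stringsAsFactors = FALSE
)

# Remember the original type, then convert the column to integer.
aid_type_before <- class(sids_by_assay$AID)
sids_by_assay$AID <- as.integer(sids_by_assay$AID)
aid_type_after <- class(sids_by_assay$AID)
```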
In this example, we retrieve Substance IDs for the compound with the name aspirin:
sids <- get_sids(
identifier = "aspirin",
namespace = "name",
domain = "compound"
)
sids
#>
#> Substance IDs (SIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: Name
#> - Identifier: aspirin
#>
#> NOTE: run SIDs(...) to extract Substance ID data. See ?SIDs for help.
The above code retrieves Substance IDs for the compound named aspirin. The output shows the request details including the domain (Compound), namespace (Name), and identifier (aspirin). This provides a summary of the query performed.
To retrieve the Substance IDs associated with the compound name
aspirin, we use the SIDs
function on the
result:
SIDs(object = sids)
#> # A tibble: 403 × 2
#> Name SID
#> <chr> <dbl>
#> 1 aspirin 4594
#> 2 aspirin 87798
#> 3 aspirin 476106
#> 4 aspirin 602429
#> 5 aspirin 829042
#> 6 aspirin 832958
#> 7 aspirin 840714
#> 8 aspirin 3135921
#> 9 aspirin 5261264
#> 10 aspirin 7847177
#> # ℹ 393 more rows
The SIDs function call on the result extracts the
Substance IDs associated with the compound name aspirin. The
output is a tibble with two columns: Name and SID. The Name column
contains the compound name (aspirin in this case), and the SID column
contains the Substance IDs. This tibble format makes it easy to handle
and analyze the data in R. There are 403 rows in total, indicating a
comprehensive list of substances related to the compound name
aspirin.
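The first ten SIDs printed for the name aspirin are the same as the first ten printed earlier for CID 2244. Set operations make this kind of comparison explicit. The sketch below compares only those ten visible rows, entered by hand, and says nothing about the remaining rows of either result.

```r
# First ten SIDs printed for CID 2244 in the earlier example.
sids_from_cid <- c(4594, 87798, 476106, 602429, 829042,
                   832958, 840714, 3135921, 5261264, 7847177)

# First ten SIDs printed for the name "aspirin" above.
sids_from_name <- c(4594, 87798, 476106, 602429, 829042,
                    832958, 840714, 3135921, 5261264, 7847177)

# SIDs present in one result but not in the other.
only_in_cid <- setdiff(sids_from_cid, sids_from_name)

# TRUE when both vectors contain exactly the same set of SIDs.
same_set <- setequal(sids_from_cid, sids_from_name)
```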
sids_by_smiles <- get_sids(
identifier = "C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O",
namespace = "smiles",
domain = "compound"
)
sids_by_smiles
#>
#> Substance IDs (SIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: SMILES
#> - Identifier: C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
#>
#> NOTE: run SIDs(...) to extract Substance ID data. See ?SIDs for help.
SIDs(object = sids_by_smiles)
#> # A tibble: 230 × 2
#> SMILES SID
#> <chr> <dbl>
#> 1 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 3333
#> 2 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 819111
#> 3 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 823016
#> 4 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 823057
#> 5 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 833240
#> 6 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 841535
#> 7 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 7847077
#> 8 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 8023353
#> 9 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 8153564
#> 10 C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O 14720288
#> # ℹ 220 more rows
In this example, we retrieve Substance IDs (SIDs) for a compound using its InChIKey:
sids_by_inchikey <- get_sids(
identifier = "BPGDAMSIGCZZLK-UHFFFAOYSA-N",
namespace = "inchikey",
domain = "compound"
)
sids_by_inchikey
#>
#> Substance IDs (SIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: INCHI_Key
#> - Identifier: BPGDAMSIGCZZLK-UHFFFAOYSA-N
#>
#> NOTE: run SIDs(...) to extract Substance ID data. See ?SIDs for help.
The above code retrieves SIDs for the compound with the InChIKey BPGDAMSIGCZZLK-UHFFFAOYSA-N. The domain is set to compound and the namespace is set to inchikey to indicate that the identifier is an InChIKey.
To extract the SIDs associated with the InChIKey, we use the SIDs function on the result:
SIDs(object = sids_by_inchikey)
#> # A tibble: 93 × 2
#> INCHI_Key SID
#> <chr> <dbl>
#> 1 BPGDAMSIGCZZLK-UHFFFAOYSA-N 106508
#> 2 BPGDAMSIGCZZLK-UHFFFAOYSA-N 6152946
#> 3 BPGDAMSIGCZZLK-UHFFFAOYSA-N 8159218
#> 4 BPGDAMSIGCZZLK-UHFFFAOYSA-N 10530904
#> 5 BPGDAMSIGCZZLK-UHFFFAOYSA-N 16165986
#> 6 BPGDAMSIGCZZLK-UHFFFAOYSA-N 36258367
#> 7 BPGDAMSIGCZZLK-UHFFFAOYSA-N 49834150
#> 8 BPGDAMSIGCZZLK-UHFFFAOYSA-N 49862448
#> 9 BPGDAMSIGCZZLK-UHFFFAOYSA-N 76795655
#> 10 BPGDAMSIGCZZLK-UHFFFAOYSA-N 91749770
#> # ℹ 83 more rows
The SIDs function call on the result extracts the SIDs
associated with the InChIKey BPGDAMSIGCZZLK-UHFFFAOYSA-N. The
output is a tibble with two columns: INCHI_Key and SID. The INCHI_Key
column contains the InChIKey, and the SID column contains the Substance
IDs. This output shows that the specified compound is associated with 93
substance entries, each represented by a SID.
sids_by_formula <- get_sids(
identifier = "C15H12N2O2",
namespace = "formula",
domain = "compound"
)
sids_by_formula
#>
#> Substance IDs (SIDs) from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: Formula
#> - Identifier: C15H12N2O2
#>
#> NOTE: run SIDs(...) to extract Substance ID data. See ?SIDs for help.
SIDs(object = sids_by_formula, .to.data.frame = TRUE)
#> # A tibble: 347 × 2
#> Formula SID
#> <chr> <dbl>
#> 1 C15H12N2O2 9647
#> 2 C15H12N2O2 74340
#> 3 C15H12N2O2 592179
#> 4 C15H12N2O2 596082
#> 5 C15H12N2O2 841957
#> 6 C15H12N2O2 3136997
#> 7 C15H12N2O2 4284342
#> 8 C15H12N2O2 5171921
#> 9 C15H12N2O2 7847578
#> 10 C15H12N2O2 7980312
#> # ℹ 337 more rows
The get_assays
function is designed to retrieve
biological assay data from the PubChem database. This function is
particularly useful for researchers and scientists who need descriptive
information about various biological assays.
The function queries the PubChem database using specified identifiers and returns a list of assay data. It is capable of fetching various assay information, including experimental data, results, and methodologies.
Here are the main parameters of the function:
identifier: A vector of positive integers specifying the assay identifiers (AIDs) for which data are to be retrieved.
operation: The operation to be performed on the input records, defaulting to NULL. Expected operations: record, concise, aids, sids, cids, description, targets/, , summary, classification.
options: Additional parameters for the query, currently not affecting the results.
In this example, we retrieve assay data for several specific AIDs:
assay_data <- get_assays(
identifier = c(485314, 485341, 504466, 624202, 651820),
namespace = "aid"
)
assay_data
#>
#> An object of class 'PubChemInstanceList'
#>
#> Number of instances: 5
#> - Domain: Assay
#> - Namespace: AID
#> - Identifier(s): 485314, 485341, ... and 3 more.
#>
#> * Run 'instance(...)' function to extract specific instances from the complete list, and
#> 'request_args(...)' to see all the requested instance identifiers.
#> * See ?instance and ?request_args for details.
The above code retrieves assay data for multiple AIDs. The output shows the request details, including the domain (Assay), namespace (Assay ID), and identifiers. It also provides instructions on how to retrieve specific instances from the complete list and view all requested instance identifiers.
To view the request arguments:
request_args(object = assay_data)
#> $namespace
#> [1] "aid"
#>
#> $identifier
#> [1] 485314 485341 504466 624202 651820
#>
#> $domain
#> [1] "assay"
#>
#> $operation
#> [1] "description"
To retrieve detailed information about a specific assay (e.g.,
651820), you can use the instance
function on the
result:
aid_651820 <- instance(object = assay_data, .which = 651820)
aid_651820
#>
#> An object of class 'PubChemInstance'
#>
#> Request Details:
#> - Domain: Assay
#> - Namespace: AID
#> - Identifier: 651820
#>
#> Instance Details:
#> - aid (2): [<named numeric>] id, version
#> - aid_source (1): [<named list>] db
#> - name (1): [<unnamed character>]
#> - description (11): [<unnamed character>]
#> - protocol (1): [<unnamed character>]
#> - comment (4): [<unnamed character>]
#> - xref (1): [<unnamed list>]
#> - results (35): [<unnamed list>]
#> - revision (1): [<unnamed numeric>]
#> - target (1): [<unnamed list>]
#> - activity_outcome_method (1): [<unnamed numeric>]
#> - dr (1): [<unnamed list>]
#> - grant_number (1): [<unnamed character>]
#> - project_category (1): [<unnamed numeric>]
#>
#> NOTE: Run getter function 'retrieve()' with element name above to extract data from corresponding list.
#> See ?retrieve for details.
The instance
function call on the result extracts
detailed information about the specific assay, including experimental
data, results, and methodologies. This information is crucial for
understanding the biological activity and properties of the compounds
tested in the assay.
To extract specific details from the assay data, you can use the
retrieve
function with various slots:
retrieve(object = aid_651820, .slot = "aid", .to.data.frame = TRUE)
#> # A tibble: 2 × 3
#> Identifier Name Value
#> <dbl> <chr> <dbl>
#> 1 651820 id 651820
#> 2 651820 version 1
This code extracts the Assay ID and version of the assay, providing a concise summary of the assay’s unique identifier and its version in the PubChem database.
retrieve(object = aid_651820, .slot = "aid_source", .to.data.frame = TRUE)
#> # A tibble: 1 × 3
#> Identifier name source_id
#> <dbl> <chr> <chr>
#> 1 651820 NCGC HCV100
This code retrieves the source information for the assay, including the name of the source and the source ID, which helps in identifying the origin of the assay data.
retrieve(object = aid_651820, .slot = "name", .to.data.frame = FALSE)
#> $Identifier
#> [1] 651820
#>
#> [[2]]
#> [1] "qHTS Assay for Inhibitors of Hepatitis C Virus (HCV)"
This code extracts the name of the assay, providing a clear description of the assay’s purpose and target.
retrieve(object = aid_651820, .slot = "description", .to.data.frame = FALSE, .verbose = TRUE)
#>
#> PubChem Assay Details (description)
#>
#> Hepatitis C virus (HCV) infects about 200 million people in the world. Many infected people progress to chronic liver disease including cirrhosis with a risk of developing liver cancer. To date, there is no effective vaccine for hepatitis C. Current therapy based on interferon is only effective in about half of the patients and is associated with significant adverse effects. The fraction of people with HCV who can complete a successful treatment is estimated to be no more than 10 percent. Recent development of direct-acting antivirals against HCV, such as protease and polymerase inhibitors, is promising but still requires combination with peginterferon and ribavirin for maximal efficacy. In addition, these agents are associated with high rate of resistance and many have significant side effects.
#>
#> Due to the lack of a culture system for infectious HCV, the search for new HCV drugs has been greatly hampered. Cell-based screen for HCV inhibitors in use today is based on the HCV replicon system, which only targets the RNA replication step of the viral lifecycle and does not encompass viral entry, processing, assembly and secretion. High-throughput screening (HTS) with an infectious HCV system would cover the complete spectrum of potentially druggable targets in all stages of HCV lifecycle, and would have more biological relevance than other cell-based assays. Moreover, targeting several key processes in the viral life cycle may not only increase antiviral efficacy; more importantly, it may also reduce the capacity of the virus to develop resistance to the compound.
#>
#> The goal of this project is to identify novel HCV inhibitors as new therapies for hepatitis C, using a highly sensitive and specific assay platform which is based on a HCV infectious cell culture system established in the laboratory and adapted for high-throughput HCV drug screen.
#>
#> NIH Chemical Genomics Center [NCGC]
#> NIH Molecular Libraries Probe Centers Network [MLPCN]
#>
#> MLPCN Grant: MH095511
#> Assay Submitter (PI): Jake Liang, NIDDK
This code retrieves the detailed description of the assay, including its purpose, the challenges addressed, and the methodology used. This is crucial for understanding the context and rationale behind the assay.
retrieve(object = aid_651820, .slot = "protocol", .to.data.frame = FALSE, .verbose = TRUE)
#>
#> PubChem Assay Details (protocol)
#>
#> The assay will start with plating 1,000 cells/well in 3 muL volume and culture for 4 h. Then 23 nL of compounds from the library collection will be added to each well, followed by adding 2.5 muL of HCVcc-Cre virus (~ 0.5 moi) and further cultured for 44 h before the luciferase assay. A volume of 4.5 muL luciferase substrates will be added to each well and the plates will be incubated at room temperature for 15 min. and then read for 15 sec. for the luciferase activity
This code retrieves the detailed protocol for conducting the assay, providing step-by-step instructions, including the materials needed, preparation steps, and the assay procedure. This is crucial for replicating the experiment and ensuring consistent results.
retrieve(object = aid_651820, .slot = "comment", .to.data.frame = FALSE, .verbose = TRUE)
#>
#> PubChem Assay Details (comment)
#>
#> Compound Ranking:
#>
#> 1. Compounds are first classified as having full titration curves, partial modulation, partial curve (weaker actives), single point activity (at highest concentration only), or inactive. See data field "Curve Description". For this assay, apparent inhibitors are ranked higher than compounds that showed apparent activation.
#> 2. For all inactive compounds, PUBCHEM_ACTIVITY_SCORE is 0. For all active compounds, a score range was given for each curve class type given above. Active compounds have PUBCHEM_ACTIVITY_SCORE between 40 and 100. Inconclusive compounds have PUBCHEM_ACTIVITY_SCORE between 1 and 39. Fit_LogAC50 was used for determining relative score and was scaled to each curve class' score range.
This code retrieves the depositor's comments, which explain how compound activity is evaluated in this assay. In this case, they describe how compounds are ranked by curve class and how PUBCHEM_ACTIVITY_SCORE is assigned (0 for inactive, 1 to 39 for inconclusive, and 40 to 100 for active compounds), which helps in interpreting the assay results.
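The score ranges quoted in the comment can be turned into a small lookup. The following is a minimal sketch, assuming integer scores between 0 and 100; classify_score is a hypothetical helper, not a PubChemR function, and the ranges apply to this assay only.

```r
# Hypothetical helper (not part of PubChemR): map activity scores to the
# outcome ranges quoted in the comment slot of AID 651820.
classify_score <- function(score) {
  cut(
    score,
    breaks = c(-1, 0, 39, 100),  # intervals (-1, 0], (0, 39], (39, 100]
    labels = c("inactive", "inconclusive", "active")
  )
}

# Example scores chosen to sit on and around the range boundaries.
outcome <- classify_score(c(0, 12, 39, 40, 87))
outcome
```

Scores outside 0 to 100 fall in no interval and come back as NA, which makes unexpected values easy to spot.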
retrieve(object = aid_651820, .slot = "xref", .to.data.frame = FALSE)
#> $Identifier
#> [1] 651820
#>
#> $xref
#> dburl
#> "http://www.ncgc.nih.gov"
This code retrieves the external references recorded for the assay. Here the only entry is dburl, the URL of the depositing database (NCGC); other assays may list further cross-references, such as publications or related assay IDs.
retrieve(object = aid_651820, .slot = "results", .to.data.frame = TRUE)
#> # A tibble: 74 × 8
#> Identifier tid name description type unit ac tc
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 651820 1 Phenotype Indicates type… 4 254 <NA> <NA>
#> 2 651820 2 Potency Concentration … 1 5 TRUE <NA>
#> 3 651820 3 Efficacy Maximal effica… 1 15 <NA> <NA>
#> 4 651820 4 Analysis Comment Annotation/not… 4 254 <NA> <NA>
#> 5 651820 5 Activity_Score Activity score. 2 254 <NA> <NA>
#> 6 651820 6 Curve_Description A description … 4 254 <NA> <NA>
#> 7 651820 7 Fit_LogAC50 The logarithm … 1 254 <NA> <NA>
#> 8 651820 8 Fit_HillSlope The Hill slope… 1 254 <NA> <NA>
#> 9 651820 9 Fit_R2 R^2 fit value … 1 254 <NA> <NA>
#> 10 651820 10 Fit_InfiniteActivity The asymptotic… 1 15 <NA> <NA>
#> # ℹ 64 more rows
This code retrieves a tibble that describes the result fields reported for the assay, one row per field, with its name (for example Potency, Efficacy, or Fit_LogAC50), description, data type, and unit. These definitions are essential for interpreting the per-compound measurements deposited for the assay.
retrieve(object = aid_651820, .slot = "revision", .to.data.frame = FALSE)
#> $Identifier
#> [1] 651820
#>
#> [[2]]
#> [1] 1
This code retrieves the revision number of the assay data, indicating the version of the data retrieved. This helps track changes and updates to the assay information over time.
retrieve(object = aid_651820, .slot = "activity_outcome_method", .to.data.frame = FALSE)
#> $Identifier
#> [1] 651820
#>
#> [[2]]
#> [1] 2
This code retrieves the method used to determine the activity outcome of the compounds in the assay. This information is crucial for understanding the criteria and process used to classify the compounds’ activity levels.
retrieve(object = aid_651820, .slot = "project_category", .to.data.frame = FALSE)
#> $Identifier
#> [1] 651820
#>
#> [[2]]
#> [1] 2
This code retrieves the category of the project under which the assay was conducted. This helps in identifying the broader context and objectives of the research project associated with the assay.
The get_compounds
function is designed to streamline the
process of retrieving detailed compound data from the extensive PubChem
database. This function is an invaluable tool for chemists, biologists,
pharmacologists, and researchers who require comprehensive chemical
compound information for their scientific investigations and
analyses.
The function interfaces directly with the PubChem database, allowing users to query and retrieve a wide array of data on chemical compounds. Upon execution, the function returns a list containing detailed information about each queried compound, such as its atoms, bonds, coordinates, charge, computed properties, and counts, as the example below shows.
Here are the main parameters of the function:
identifier
: A vector specifying the compound
identifiers. These identifiers can be either positive integers (such as
CIDs, which are unique compound identifiers in PubChem) or identifier
strings (such as chemical names, SMILES strings, InChI, etc.). This
parameter allows for flexible input methods tailored to the specific
needs of the user.
namespace
: Specifies the type of identifier provided in
the identifier parameter. Common values include cid (used in the example
below), name, smiles, and inchi, matching the identifier types listed above.
operation
: An optional parameter specifying the
operation to be performed on the input records. This can include
operations such as filtering, sorting, or transforming the data based on
specific criteria. By default, this parameter is set to NULL, indicating
no additional operations are performed.
searchtype
: An optional parameter that defines the type
of search to be conducted. This can be used to refine and specify the
search strategy, such as exact match, substructure search, or similarity
search. By default, this parameter is set to NULL, indicating a general
search.
options
: A list of additional parameters that can be
used to customize the query further. This can include options such as
result limits, output formats, and other advanced settings to tailor the
data retrieval process to specific requirements.
In this example, we retrieve compound data for specific CIDs (Compound IDs) 2244 and 5245:
compound_data <- get_compounds(
identifier = c(2244, 5245),
namespace = "cid"
)
compound_data
#>
#> An object of class 'PubChemInstanceList'
#>
#> Number of instances: 2
#> - Domain: Compound
#> - Namespace: CID
#> - Identifier(s): 2244, 5245
#>
#> * Run 'instance(...)' function to extract specific instances from the complete list, and
#> 'request_args(...)' to see all the requested instance identifiers.
#> * See ?instance and ?request_args for details.
The above code retrieves compound data for the compounds with CIDs 2244 and 5245. The output shows the request details, including the domain (Compound), namespace (Compound ID), and identifiers. It also provides instructions on how to retrieve specific instances from the complete list and view all requested instance identifiers.
To view the request arguments:
request_args(object = compound_data)
#> $namespace
#> [1] "cid"
#>
#> $identifier
#> [1] 2244 5245
#>
#> $domain
#> [1] "compound"
#>
#> $operation
#> NULL
#>
#> $options
#> NULL
#>
#> $searchtype
#> NULL
To retrieve detailed information about a specific compound, you can
use the instance
function on the result:
compound_2244 <- instance(object = compound_data, .which = 2244)
compound_2244
#>
#> An object of class 'PubChemInstance'
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: CID
#> - Identifier: 2244
#>
#> Instance Details:
#> - id (1): [<named list>] id
#> - atoms (2): [<named list>] aid, element
#> - bonds (3): [<named list>] aid1, aid2, order
#> - coords (1): [<unnamed list>]
#> - charge (1): [<unnamed numeric>]
#> - props (22): [<unnamed list>]
#> - count (10): [<named numeric>] heavy_atom, atom_chiral, atom_chiral_def, atom_chiral_undef, ...
#>
#> NOTE: Run getter function 'retrieve()' with element name above to extract data from corresponding list.
#> See ?retrieve for details.
The instance
function call on the result extracts
detailed information about the specific compound, including chemical
structures, properties, and identifiers.
To retrieve specific data elements from the compound data, you can
use the retrieve
function with the relevant slots:
retrieve(object = compound_2244, .slot = "id", .to.data.frame = TRUE)
#> # A tibble: 1 × 2
#> Identifier id
#> <dbl> <dbl>
#> 1 2244 2244
The retrieve
function call with the id slot
extracts the compound identifier (CID) for the specific compound. In
this case, the CID is 2244, confirming the identity of the compound.
retrieve(object = compound_2244, .slot = "atoms", .to.data.frame = FALSE)
#> $Identifier
#> [1] 2244
#>
#> $aid
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
#>
#> $element
#> [1] 8 8 8 8 6 6 6 6 6 6 6 6 6 1 1 1 1 1 1 1 1
The retrieve
function call with the atoms slot
extracts information about the atoms in the compound. The output
includes two vectors: aid, representing the atom IDs, and element,
representing the atomic numbers of the elements. For example, element 8
represents oxygen, and element 6 represents carbon.
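Because the element vector holds atomic numbers, a molecular formula can be tallied from it directly. The sketch below works on a copy of the vector printed above, so it runs without a network connection; the lookup table symbols covers only the three elements present here and is an illustrative assumption, not a PubChemR object.

```r
# Atomic numbers copied from the atoms slot output above (CID 2244).
element <- c(8, 8, 8, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1)

# Minimal lookup table: atomic number -> element symbol.
symbols <- c("1" = "H", "6" = "C", "8" = "O")

# Count atoms per element; table() sorts the symbols alphabetically.
counts <- table(unname(symbols[as.character(element)]))

# Paste symbols and counts together into a formula string.
mol_formula <- paste0(names(counts), as.integer(counts), collapse = "")
mol_formula

# Heavy atoms are all atoms other than hydrogen.
heavy_atoms <- sum(element != 1)
heavy_atoms
```

For this compound the alphabetical order of the symbols happens to match the usual carbon, hydrogen, then others ordering; a general formula builder would need to handle that ordering explicitly.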
retrieve(object = compound_2244, .slot = "bonds", .to.data.frame = FALSE)
#> $Identifier
#> [1] 2244
#>
#> $aid1
#> [1] 1 1 2 2 3 4 5 5 6 6 7 7 8 8 9 9 10 12 13 13 13
#>
#> $aid2
#> [1] 5 12 11 21 11 12 6 7 8 11 9 14 10 15 10 16 17 13 18 19 20
#>
#> $order
#> [1] 1 1 1 1 2 2 1 2 2 1 1 1 1 1 2 1 1 1 1 1 1
The retrieve
function call with the bonds slot
extracts information about the bonds in the compound. The output
includes three vectors: aid1 and aid2 represent the atom IDs involved in
each bond, and order represents the bond order (e.g., single, double
bonds).
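One way to sanity-check a bond table is to sum the bond orders around each atom. The sketch below does this on a copy of the three vectors printed above; the data frame name bonds and the variable valence are illustrative.

```r
# Bond table copied from the bonds slot output above (CID 2244).
bonds <- data.frame(
  aid1  = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 12, 13, 13, 13),
  aid2  = c(5, 12, 11, 21, 11, 12, 6, 7, 8, 11, 9, 14, 10, 15, 10, 16, 17, 13, 18, 19, 20),
  order = c(1, 1, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
)

# How many single and double bonds are there?
order_counts <- table(bonds$order)
order_counts

# Each bond contributes its order to both of its atoms, so stack the two
# atom columns and sum the orders per atom ID.
valence <- tapply(rep(bonds$order, 2), c(bonds$aid1, bonds$aid2), sum)
valence
```

Read together with the atoms slot, atoms 1 to 4 (oxygen) come out with a total bond order of 2, atoms 5 to 13 (carbon) with 4, and atoms 14 to 21 (hydrogen) with 1.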
retrieve(object = compound_2244, .slot = "coords", .to.data.frame = FALSE)
#> $Identifier
#> [1] 2244
#>
#> $type
#> [1] 1 5 255
#>
#> $aid
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
#>
#> $conformers
#> $conformers[[1]]
#> $conformers[[1]]$x
#> [1] 3.7320 6.3301 4.5981 2.8660 4.5981 5.4641 4.5981 6.3301 5.4641 6.3301
#> [11] 5.4641 2.8660 2.0000 4.0611 6.8671 5.4641 6.8671 2.3100 1.4631 1.6900
#> [21] 6.3301
#>
#> $conformers[[1]]$y
#> [1] -0.0600 1.4400 1.4400 -1.5600 -0.5600 -0.0600 -1.5600 -0.5600 -2.0600
#> [10] -1.5600 0.9400 -0.5600 -0.0600 -1.8700 -0.2500 -2.6800 -1.8700 0.4769
#> [19] 0.2500 -0.5969 2.0600
#>
#> $conformers[[1]]$style
#> $conformers[[1]]$style$annotation
#> [1] 8 8 8 8 8 8
#>
#> $conformers[[1]]$style$aid1
#> [1] 5 5 6 7 8 9
#>
#> $conformers[[1]]$style$aid2
#> [1] 6 7 8 9 10 10
The retrieve
function call with the coords slot
extracts the coordinates of the atoms in the compound. The output
includes details such as:
type
: Represents the type of coordinates.
aid
: Atom IDs for which the coordinates are
provided.
conformers
: Contains the conformer data, including x
and y coordinates for each atom. This provides the spatial arrangement
of the atoms in the compound. Only x and y values are returned here, so
the coordinates describe a two-dimensional layout of the structure.
retrieve(object = compound_2244, .slot = "props", .to.data.frame = TRUE)
#> # A tibble: 22 × 11
#> Identifier label name datatype release value implementation version software
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2244 Comp… Cano… 5 2021.1… 1 <NA> <NA> <NA>
#> 2 2244 Comp… <NA> 7 2021.1… 212 E_COMPLEXITY 3.4.8.… Cactvs
#> 3 2244 Count Hydr… 5 2021.1… 4 E_NHACCEPTORS 3.4.8.… Cactvs
#> 4 2244 Count Hydr… 5 2021.1… 1 E_NHDONORS 3.4.8.… Cactvs
#> 5 2244 Count Rota… 5 2021.1… 3 E_NROTBONDS 3.4.8.… Cactvs
#> 6 2244 Fing… SubS… 16 2021.1… 0000… E_SCREEN 3.4.8.… Cactvs
#> 7 2244 IUPA… Allo… 1 2021.1… 2-ac… <NA> 2.7.0 Lexiche…
#> 8 2244 IUPA… CAS-… 1 2021.1… 2-ac… <NA> 2.7.0 Lexiche…
#> 9 2244 IUPA… Mark… 1 2021.1… 2-ac… <NA> 2.7.0 Lexiche…
#> 10 2244 IUPA… Pref… 1 2021.1… 2-ac… <NA> 2.7.0 Lexiche…
#> # ℹ 12 more rows
#> # ℹ 2 more variables: source <chr>, parameters <chr>
The retrieve
function call with the props slot
extracts detailed properties of the compound, including information such
as label, name, data type, release, value, implementation, version,
software, and source. This comprehensive information covers various
physical, chemical, and structural properties of the compound.
retrieve(object = compound_2244, .slot = "count", .to.data.frame = TRUE)
#> # A tibble: 10 × 3
#> Identifier Name Value
#> <dbl> <chr> <dbl>
#> 1 2244 heavy_atom 13
#> 2 2244 atom_chiral 0
#> 3 2244 atom_chiral_def 0
#> 4 2244 atom_chiral_undef 0
#> 5 2244 bond_chiral 0
#> 6 2244 bond_chiral_def 0
#> 7 2244 bond_chiral_undef 0
#> 8 2244 isotope_atom 0
#> 9 2244 covalent_unit 1
#> 10 2244 tautomers -1
The retrieve
function call with the count slot
extracts various count metrics for the compound. The output is a
tibble with three columns: Identifier, Name, and Value. The counts are:
heavy_atom: The number of heavy (non-hydrogen) atoms in the compound.
atom_chiral, atom_chiral_def, atom_chiral_undef: Counts of chiral atoms and their defined/undefined states.
bond_chiral, bond_chiral_def, bond_chiral_undef: Counts of chiral bonds and their defined/undefined states.
isotope_atom: The number of isotopic atoms.
covalent_unit: The number of covalent units in the compound.
tautomers: The number of tautomers.
These counts provide insights into the compound’s chemical complexity and stereochemistry, which are essential for understanding its reactivity and biological activity.
The get_substances
function retrieves substance data
from the PubChem database based on a specified identifier and namespace.
This function is crucial for obtaining detailed information about a
substance, including its various identifiers, sources, synonyms,
comments, cross-references, and compound details.
Here are the main parameters of the function:
identifier
: A character or numeric vector specifying
the identifiers for the request. This can be a substance ID (SID), name,
or other supported identifier.
namespace
: Specifies the namespace for the request. The
default value is ‘sid’.
operation
: Specifies the operation to be performed on
the input records. The default value is NULL.
searchtype
: Specifies the type of search to be
performed. The default value is NULL.
options
: Additional parameters for the query. These can
be used to customize the search further.
In this example, we retrieve substance data for aspirin:
substance_data <- get_substances(
identifier = "aspirin",
namespace = "name"
)
substance_data
#>
#> An object of class 'PubChemInstanceList'
#>
#> Number of instances: 1
#> - Domain: Substance
#> - Namespace: Name
#> - Identifier(s): aspirin
#>
#> * Run 'instance(...)' function to extract specific instances from the complete list, and
#> 'request_args(...)' to see all the requested instance identifiers.
#> * See ?instance and ?request_args for details.
The above code retrieves substance data for the identifier “aspirin”. The output indicates that the request details include the domain (Substance), namespace (Name), and identifier (aspirin). It also mentions that you can run the instance(…) function to extract specific instances and request_args(…) to see all requested instance identifiers.
To see the arguments used in the request, use the request_args function:
request_args(object = substance_data)
#> $namespace
#> [1] "name"
#>
#> $identifier
#> [1] "aspirin"
#>
#> $domain
#> [1] "substance"
This output shows the namespace (“name”), identifier (“aspirin”), and domain (“substance”) used in the request.
To extract specific substance data, we use the instance function with the specified identifier:
substance_aspirin <- instance(object = substance_data, .which = "aspirin")
substance_aspirin
#>
#> Substance Data from PubChem Database
#>
#> Request Details:
#> - Domain: Substance
#> - Namespace: Name
#> - Identifier: aspirin
#>
#> Number of substances retrieved: 146
#>
#> Substances contain data within following slots;
#> - sid (2): [<named numeric>] id, version
#> - source (1): [<named list>] db
#> - synonyms (6): [<unnamed character>]
#> - comment (2): [<unnamed character>]
#> - xref (4): [<unnamed list>]
#> - compound (2): [<unnamed list>]
#>
#> NOTE: Run getter function 'retrieve()' with element name above to extract data from corresponding list.
#> See ?retrieve for details.
The above output shows the request details for aspirin and indicates that 146 substances were retrieved. It lists the slots available for further data extraction. These slots include sid, source, synonyms, comment, xref, and compound.
To extract data from the sid slot as a data frame:
retrieve(object = substance_aspirin, .slot = "sid", .to.data.frame = TRUE)
#> # A tibble: 2 × 3
#> Identifier Name Value
#> <chr> <chr> <dbl>
#> 1 aspirin id 4594
#> 2 aspirin version 10
This output shows the id and version for the substance “aspirin”. The id is 4594 and the version is 10.
To extract data from the source slot as a data frame:
retrieve(object = substance_aspirin, .slot = "source", .to.data.frame = TRUE)
#> # A tibble: 1 × 3
#> Identifier name source_id
#> <chr> <chr> <chr>
#> 1 aspirin KEGG C01405
This output shows the source information for “aspirin”. The source is KEGG, and the source ID is C01405.
To extract data from the synonyms slot:
retrieve(object = substance_aspirin, .slot = "synonyms", .to.data.frame = FALSE)
#> $Identifier
#> [1] "aspirin"
#>
#> [[2]]
#> [1] "2-Acetoxybenzenecarboxylic acid"
#>
#> [[3]]
#> [1] "50-78-2"
#>
#> [[4]]
#> [1] "Acetylsalicylate"
#>
#> [[5]]
#> [1] "Acetylsalicylic acid"
#>
#> [[6]]
#> [1] "Aspirin"
#>
#> [[7]]
#> [1] "C01405"
This output lists the synonyms for “aspirin”. These include “2-Acetoxybenzenecarboxylic acid”, “50-78-2”, “Acetylsalicylate”, “Acetylsalicylic acid”, “Aspirin”, and “C01405”.
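The returned list mixes the named Identifier element with unnamed synonym entries, so it is often convenient to flatten it into a plain character vector. The sketch below rebuilds the list shown above by hand, so that it runs offline, and then picks out entries shaped like CAS Registry Numbers; the variable names and the regular expression are illustrative and not part of PubChemR.

```r
# Hand-built copy of the list printed above: one named element followed by
# six unnamed synonym entries.
syn <- list(
  Identifier = "aspirin",
  "2-Acetoxybenzenecarboxylic acid",
  "50-78-2",
  "Acetylsalicylate",
  "Acetylsalicylic acid",
  "Aspirin",
  "C01405"
)

# Drop the Identifier element and flatten the rest to a character vector.
syn_vec <- unlist(syn[names(syn) != "Identifier"], use.names = FALSE)
syn_vec

# Keep entries with the CAS Registry Number shape: 2 to 7 digits, 2 digits,
# and a single check digit, separated by hyphens.
cas_like <- grep("^[0-9]{2,7}-[0-9]{2}-[0-9]$", syn_vec, value = TRUE)
cas_like
```

The pattern checks the shape only; it does not validate the check digit.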
To extract data from the comment slot with verbosity:
retrieve(object = substance_aspirin, .slot = "comment", .to.data.frame = FALSE, .verbose = TRUE)
#>
#> PubChem Substance Details (comment)
#>
#> Same as: <a href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=7847177">D00109</a>
#> Is a reactant of enzyme EC: 3.1.1.55
This output shows comments related to “aspirin”. It indicates that “aspirin” is the same as D00109 and is a reactant of the enzyme EC: 3.1.1.55.
To extract data from the xref slot with verbosity:
retrieve(object = substance_aspirin, .slot = "xref", .to.data.frame = FALSE, .verbose = TRUE)
#>
#> PubChem Substance Details (xref)
#>
#> > Source: regid
#> Value: C01405
#>
#> > Source: rn
#> Value: 50-78-2
#>
#> > Source: dburl
#> Value: http://www.genome.jp/kegg/
#>
#> > Source: sburl
#> Value: http://www.genome.jp/dbget-bin/www_bget?cpd:C01405
This output shows cross-references for “aspirin”. It includes the source “regid” with value C01405, the source “rn” with value 50-78-2, the source “dburl” with the URL for the KEGG database, and the source “sburl” with a specific URL for the compound in the KEGG database.
To extract data from the compound slot:
retrieve(object = substance_aspirin, .slot = "compound", .to.data.frame = FALSE)
#> $Identifier
#> [1] "aspirin"
#>
#> [[2]]
#> [[2]]$id
#> type
#> 0
#>
#> [[2]]$atoms
#> [[2]]$atoms$aid
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
#>
#> [[2]]$atoms$element
#> [1] 8 8 8 8 6 6 6 6 6 6 6 6 6
#>
#>
#> [[2]]$bonds
#> [[2]]$bonds$aid1
#> [1] 1 1 2 3 4 5 5 5 6 8 9 10 11
#>
#> [[2]]$bonds$aid2
#> [1] 6 10 7 7 10 6 7 8 9 11 12 13 12
#>
#> [[2]]$bonds$order
#> [1] 1 1 2 1 2 1 1 2 2 1 1 1 2
#>
#>
#> [[2]]$coords
#> [[2]]$coords[[1]]
#> [[2]]$coords[[1]]$type
#> [1] 1 3
#>
#> [[2]]$coords[[1]]$aid
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
#>
#> [[2]]$coords[[1]]$conformers
#> [[2]]$coords[[1]]$conformers[[1]]
#> [[2]]$coords[[1]]$conformers[[1]]$x
#> [1] 22.7278 19.0863 21.5033 23.9396 20.2981 21.5226 20.2981 19.0928 21.5226
#> [10] 23.9396 19.0928 20.2981 25.1450
#>
#> [[2]]$coords[[1]]$conformers[[1]]$y
#> [1] -15.8040 -14.0004 -13.9940 -17.9642 -15.8105 -16.5029 -14.6927 -16.5029
#> [9] -17.9133 -16.4964 -17.9133 -18.6250 -15.7977
#>
#>
#>
#>
#>
#> [[2]]$charge
#> [1] 0
#>
#>
#> [[3]]
#> [[3]]$id
#> [[3]]$id$type
#> [1] 1
#>
#> [[3]]$id$id
#> cid
#> 2244
This output shows detailed compound data for “aspirin”. It includes the atom IDs, elements, bond information, coordinates, and charge. Additionally, it provides an ID of the compound in PubChem (cid 2244).
Each section provides specific details about the substance “aspirin”, making it possible to analyze different aspects of the substance data from the PubChem database.
The get_properties
function facilitates the retrieval of
specific chemical properties of compounds from the PubChem database.
This function is essential for researchers and chemists who require
detailed chemical information about various compounds.
The function queries the PubChem database using specified identifiers and returns a list or dataframe containing the requested properties of each compound. These properties can include molecular weight, chemical formula, isomeric SMILES, and more, depending on the available data in PubChem and the properties requested. You may find the full list of properties at https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest#section=Compound-Property-Tables.
Here are the main parameters of the function:
properties
: A character vector specifying the
properties to be retrieved. This vector can include various chemical
properties like mass, molecular formula, InChI, etc.
identifier
: A vector of identifiers for the
compounds. These identifiers can be either positive integers (such as
CIDs, which are unique compound identifiers in PubChem) or identifier
strings (such as chemical names, SMILES strings, InChI, etc.).
namespace
: Specifies the type of identifier provided
in the identifier parameter. The default value is ‘cid’. Common values
for this parameter include cid, name, smiles, and inchi.
searchtype
: An optional parameter that defines the
type of search to be conducted. This can be used to refine and specify
the search strategy, such as exact match, substructure search, or
similarity search. By default, this parameter is set to NULL, indicating
a general search.
options
: Additional arguments for the query. These
can be used to customize the search further, but by default, it is set
to NULL.
propertyMatch
: A list that specifies matching
criteria for the properties. It includes:
.ignore.case
: A logical value indicating whether to
ignore case when matching property names. Default is FALSE.
type
: Specifies the type of match to be performed, such
as “contain”, “exact”, “all”. Default is “contain”.
In this example, we retrieve properties for the compounds “aspirin” and “ibuprofen”. The propertyMatch argument is used to specify matching criteria, such as ignoring case and using a “contain” type search. Therefore, this code retrieves the properties containing “mass”, “molecular”, and “inchi” for the compounds “aspirin” and “ibuprofen”, ignoring case sensitivity.
props <- get_properties(
properties = c("mass", "molecular", "inchi"),
identifier = c("aspirin", "ibuprofen"),
namespace = "name",
propertyMatch = list(
.ignore.case = TRUE,
type = "contain"
)
)
props
#>
#> An object of class 'PubChemInstanceList'
#>
#> Number of instances: 2
#> - Domain: Compound
#> - Namespace: Name
#> - Identifier(s): aspirin, ibuprofen
#>
#> * Run 'instance(...)' function to extract specific instances from the complete list, and
#> 'request_args(...)' to see all the requested instance identifiers.
#> * See ?instance and ?request_args for details.
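To see why the three short patterns above select six properties, the matching can be approximated with base R. This is only a sketch of the idea behind a case-insensitive "contain" match, not the package's implementation; the candidate vector lists the six property names returned in this example plus two other PubChem property names (XLogP and TPSA) added for contrast.

```r
# Candidate property names: six from this example, two extra for contrast.
candidates <- c(
  "MolecularFormula", "MolecularWeight", "InChI", "InChIKey",
  "ExactMass", "MonoisotopicMass", "XLogP", "TPSA"
)
patterns <- c("mass", "molecular", "inchi")

# A name matches if it contains any of the patterns.
any_pattern <- paste(patterns, collapse = "|")

# Case-insensitive match, analogous to .ignore.case = TRUE.
matched <- candidates[grepl(any_pattern, candidates, ignore.case = TRUE)]
matched

# Case-sensitive match: the lower-case patterns no longer hit any name.
matched_cs <- candidates[grepl(any_pattern, candidates, ignore.case = FALSE)]
matched_cs
```

The case-sensitive variant returns nothing here because every property name uses capital letters where the patterns use lower case, which is why ignoring case matters when the patterns are typed in lower case.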
To extract the properties of a specific compound, you can use the
retrieve
function with the .which argument:
retrieve(object = props, .which = "aspirin", .to.data.frame = TRUE)
#> # A tibble: 1 × 8
#> Identifier CID MolecularFormula MolecularWeight InChI InChIKey ExactMass
#> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 aspirin 2244 C9H8O4 180.16 InChI=1S… BSYNRYM… 180.0422…
#> # ℹ 1 more variable: MonoisotopicMass <chr>
This code extracts the properties of aspirin, providing a detailed summary of its CID, molecular formula, molecular weight, InChI, InChIKey, exact mass, and monoisotopic mass.
retrieve(object = props, .which = "ibuprofen", .to.data.frame = FALSE)
#> $Identifier
#> [1] "ibuprofen"
#>
#> $CID
#> [1] 3672
#>
#> $MolecularFormula
#> [1] "C13H18O2"
#>
#> $MolecularWeight
#> [1] "206.28"
#>
#> $InChI
#> [1] "InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)"
#>
#> $InChIKey
#> [1] "HEFNNWSXXWATRW-UHFFFAOYSA-N"
#>
#> $ExactMass
#> [1] "206.130679813"
#>
#> $MonoisotopicMass
#> [1] "206.130679813"
This code extracts the properties of ibuprofen and displays them as a list. The properties include CID, molecular formula, molecular weight, InChI, InChIKey, exact mass, and monoisotopic mass.
retrieve(object = props, .to.data.frame = TRUE, .combine.all = TRUE)
#> # A tibble: 2 × 8
#> Identifier CID MolecularFormula MolecularWeight InChI InChIKey ExactMass
#> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 aspirin 2244 C9H8O4 180.16 InChI=1S… BSYNRYM… 180.0422…
#> 2 ibuprofen 3672 C13H18O2 206.28 InChI=1S… HEFNNWS… 206.1306…
#> # ℹ 1 more variable: MonoisotopicMass <chr>
This code combines the properties of all retrieved compounds (aspirin and ibuprofen) into a single dataframe, making it easier to compare their properties side-by-side.
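Note that the tibble header above reports columns such as MolecularWeight as character (<chr>), so they need converting before any arithmetic. The sketch below shows the conversion on a small hand-built data frame that copies the fully printed values from the output above; props_df is an illustrative stand-in for the object returned by retrieve.

```r
# Hand-built stand-in for the combined tibble, using values printed above.
props_df <- data.frame(
  Identifier       = c("aspirin", "ibuprofen"),
  CID              = c(2244, 3672),
  MolecularFormula = c("C9H8O4", "C13H18O2"),
  MolecularWeight  = c("180.16", "206.28"),  # character, as in the output
  stringsAsFactors = FALSE
)

# Convert the character column to numeric before doing arithmetic.
props_df$MolecularWeight <- as.numeric(props_df$MolecularWeight)

# Difference in molecular weight between ibuprofen and aspirin.
mw_diff <- diff(props_df$MolecularWeight)
mw_diff
```

The same as.numeric step would apply to the other character-typed numeric columns, such as ExactMass and MonoisotopicMass.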
The get_synonyms
function is designed to retrieve
synonyms for chemical compounds or substances from the PubChem database.
It is particularly useful for obtaining various names and identifiers
associated with a specific chemical entity.
The function queries the PubChem database for synonyms of a given identifier (such as a Compound ID or a chemical name) and returns a comprehensive list of alternative names and identifiers. This can include systematic names, trade names, registry numbers, and other forms of identification used in scientific literature and industry.
Here are the main parameters of the function:
identifier
: The identifier for which synonyms are to be
retrieved. This can be a numeric value (like a Compound ID) or a
character string (like a chemical name).
namespace
: Specifies the namespace for the query.
Common values include: ‘cid’ (Compound Identifier) [default], ‘name’
(Chemical Name).
domain
: Specifies the domain for the request.
Typically, this is ‘compound’. The default value is ‘compound’.
searchtype
: Specifies the type of search to be
performed. The default value is NULL.
options
: Additional arguments for customization of the
request.
In this example, we retrieve synonyms for the compound “aspirin”:
synonyms <- get_synonyms(
identifier = "aspirin",
namespace = "name"
)
synonyms
#>
#> Synonyms from PubChem Database
#>
#> Request Details:
#> - Domain: Compound
#> - Namespace: Name
#> - Identifier: aspirin
#>
#> NOTE: run 'synonyms(...)' to extract synonyms data. See ?synonyms for help.
The above code retrieves synonyms for the compound “aspirin” using its name as the identifier. The namespace is set to “name” to indicate that the identifier is a chemical name.
The result is an object holding the synonyms for the compound “aspirin”. As the note in the output indicates, run synonyms(...) on it to extract the full list of names and identifiers associated with the compound.
The retrieved synonyms provide a comprehensive view of the different names and identifiers that can be used to reference the same chemical entity in scientific literature and industry.
The get_all_sources
function facilitates the retrieval
of a list of all current depositors for substances or assays from the
PubChem database. This function is particularly useful for users who
need to identify and analyze the sources of chemical data.
The function queries the PubChem database to obtain a comprehensive list of sources (such as laboratories, companies, or research institutions) that have contributed substance or assay data. This information can be crucial for researchers and professionals who are tracking the origin of specific chemical data or assessing the diversity of data sources in PubChem.
Here is the main parameter of the function:
domain: Specifies the domain for which sources are to
be retrieved. The domain can be either ‘substance’ or ‘assay’. The
default value is ‘substance’.
In this example, we retrieve all sources for substances:
substance_sources <- get_all_sources(
  domain = "substance"
)
substance_sources
#> [1] "001Chemical"
#> [2] "10X CHEM"
#> [3] "1st Scientific"
#> [4] "3A SpeedChemical Inc"
#> [5] "3B Scientific (Wuhan) Corp"
#> [6] "3WAY PHARM INC"
#> [7] "4C Pharma Scientific Inc"
#> [8] "A&J Pharmtech CO., LTD."
#> [9] "A1 BioChem Labs"
#> [10] "A2B Chem"
#> [11] "A2Z Chemical"
#> [12] "AA BLOCKS"
#> [13] "AAA Chemistry"
#> [14] "Aaron Chemicals LLC"
#> [15] "AAT Bioquest"
#> [16] "AbaChemScene"
#> [17] "Abacipharm Corp"
#> [18] "ABBLIS Chemicals"
#> [19] "Abbott Labs"
#> [20] "abcr GmbH"
#> [21] "Abe Lab, University of Texas MD Anderson Cancer Center"
#> [22] "ABI Chem"
#> [23] "AbMole Bioscience"
#> [24] "AbovChem LLC"
#> [25] "Abu Montakim Tareq, International Islamic University Chittagong"
#> [26] "Acadechem"
#> [27] "Accela ChemBio Inc."
#> [28] "Ace Therapeutics"
#> [29] "Acemol"
#> [30] "Aceschem Inc"
#> [31] "Acesobio"
#> [32] "Achem-Block"
#> [33] "Achemica"
#> [34] "Achemo Scientific Limited"
#> [35] "Achemtek"
#> [36] "Acmec Biochemical"
#> [37] "ACO Pharm Screening Compound"
#> [38] "Acorn PharmaTech Product List"
#> [39] "ACT Chemical"
#> [40] "Activate Scientific"
#> [41] "Active Biopharma"
#> [42] "Adooq BioScience"
#> [43] "Advanced Technology & Industrial Co., Ltd."
#> [44] "AEchem Scientific Corp., USA"
#> [45] "Agios Pharmaceuticals"
#> [46] "AHH Chemical co.,ltd"
#> [47] "AIBioTech, LLC"
#> [48] "AK Scientific, Inc. (AKSCI)"
#> [49] "AKos Consulting & Solutions"
#> [50] "Aladdin"
#> [51] "Alagar Yadav, Karpagam University"
#> [52] "Alcatraz Chemicals"
#> [53] "AlchemyPharm"
#> [54] "Alfa Chemistry"
#> [55] "AlfaChemInvent LLC"
#> [56] "Alichem"
#> [57] "Alinda Chemical Trade Company Ltd"
#> [58] "ALKEMIX"
#> [59] "Allbio Pharm Co., Ltd"
#> [60] "Alomone Labs"
#> [61] "Alsachim"
#> [62] "Amadis Chemical"
#> [63] "Amatye"
#> [64] "Ambeed"
#> [65] "Ambinter"
#> [66] "Ambit Biosciences"
#> [67] "Amfluoro"
#> [68] "AmicBase - Antimicrobial Activities"
#> [69] "Ampyridine Co.,Ltd"
#> [70] "AN PharmaTech"
#> [71] "Analytical Resources Core (ARC), Colorado State University (CSU)"
#> [72] "Angayarkanni Lab, Department of Microbial Biotechnology, Bharathiar University"
#> [73] "Angel Pharmatech Ltd."
#> [74] "Angene Chemical"
#> [75] "Annker Organics"
#> [76] "Ansion Pharma"
#> [77] "Anten Chemical"
#> [78] "Anward"
#> [79] "AOBChem USA"
#> [80] "AOBIOUS INC"
#> [81] "Apeiron Synthesis"
#> [82] "ApexBio Technology"
#> [83] "Apexmol"
#> [84] "Apollo Scientific"
#> [85] "April Scientific Inc."
#> [86] "Aribo Reagent"
#> [87] "Ark Pharm, Inc."
#> [88] "Ark Pharma Scientific Limited"
#> [89] "Aromalake Chemical"
#> [90] "Aromsyn catalogue"
#> [91] "Aronis"
#> [92] "Arromax Pharmatech Co., Ltd"
#> [93] "ASAS Labor GmbH"
#> [94] "ASCA GmbH - Angewandte Synthesechemie Adlershof"
#> [95] "ASINEX"
#> [96] "Assembly Blocks Pvt. Ltd."
#> [97] "AstaTech, Inc."
#> [98] "ATPase-Kinase Pharmacophores (AKP)"
#> [99] "Aurora Fine Chemicals LLC"
#> [100] "Aurum Pharmatech LLC"
#> [101] "AVA Biochem Switzerland"
#> [102] "AvaChem Scientific"
#> [103] "Avanti Polar Lipids"
#> [104] "Avantor Inc"
#> [105] "AX Molecules Inc"
#> [106] "Axispharm"
#> [107] "Axon Medchem"
#> [108] "AZEPINE"
#> [109] "B&C Chemical"
#> [110] "Baker Lab, Chemistry Department, The University of North Carolina at Chapel Hill"
#> [111] "Bangyong Technology Co., Ltd."
#> [112] "Bar-Sagi Lab, NYU School of Medicine"
#> [113] "Barrie Walker, BARK Information Services"
#> [114] "Baynoe Chem"
#> [115] "Be-Medicine"
#> [116] "Beijing Advanced Technology Co, Ltd"
#> [117] "Belisle Laboratory, Department of Microbiology, Immunology and Pathology, Colorado State University"
#> [118] "Beltsville Human Nutrition Research Center, ARS, USDA"
#> [119] "BenchChem"
#> [120] "BePharm Ltd."
#> [121] "BerrChemical"
#> [122] "Bertin Pharma"
#> [123] "Bestdo Inc"
#> [124] "Bhaskar Lab, Department of Zoology, Sri Venkateswara University, Tirupati, Andhra Pradesh, INDIA"
#> [125] "Bic Biotech"
#> [126] "BIDD"
#> [127] "BIND"
#> [128] "BindingDB"
#> [129] "BioAustralis Fine Chemicals"
#> [130] "BioChemPartner"
#> [131] "Biocore"
#> [132] "BioCrick"
#> [133] "BioCyc"
#> [134] "Biological Magnetic Resonance Data Bank (BMRB)"
#> [135] "Biomatrik Inc. (Monodispersed PEG Manufacturer)"
#> [136] "Biopharma PEG Scientific Inc"
#> [137] "Bioprocess Technology Lab, Department of Microbiology, Bharathidasan University"
#> [138] "Biopurify Phytochemicals"
#> [139] "Biorbyt"
#> [140] "Biosynce Pharmatech"
#> [141] "Biosynth"
#> [142] "BLD Pharm"
#> [143] "BOC Sciences"
#> [144] "Boehringer Ingelheim - opnMe.com"
#> [145] "Boerchem"
#> [146] "Bonglee Kim Lab, Department of Cancer Preventive Material Development, Kyung Hee University"
#> [147] "Boone Lab, Chemical Genomics, University of Toronto"
#> [148] "Boroncore"
#> [149] "Boronpharm"
#> [150] "Bradner/Qi Labs at DFCI"
#> [151] "Brenntag Connect"
#> [152] "Bright Pigments, Inc"
#> [153] "Broad Institute"
#> [154] "BroadPharm"
#> [155] "Bu Lab, School of Pharmaceutical Sciences, Sun Yat-Sen University"
#> [156] "Buhrlage Lab, Dana-Farber Cancer Institute and Novartis Institutes for BioMedical Research (Cambridge, Mass)"
#> [157] "Burek Lab, Department of Anaesthesiology, Intensive Care, Emergency and Pain Med, University Hospital Wuerzburg"
#> [158] "Burnham Center for Chemical Genomics"
#> [159] "C. David Weaver Laboratory, Vanderbilt University"
#> [160] "Calbiochem"
#> [161] "California Peptide Research, Inc."
#> [162] "Cancer Functional Genomics, Wellcome Trust Sanger Institute"
#> [163] "Cancer Research UK Cambridge Research Institute"
#> [164] "Cangzhou Enke Pharma Tech Co.,Ltd."
#> [165] "CAPOT"
#> [166] "Carbott PharmTech Inc."
#> [167] "Carcinogenic Potency Database (CPDB)"
#> [168] "Career Henan Chemical Co"
#> [169] "Cayman Chemical"
#> [170] "CC_PMLSC"
#> [171] "CCSbase"
#> [172] "CD Biosynsis"
#> [173] "CD Formulation"
#> [174] "CEGChem"
#> [175] "Center for Chemical Genomics, University of Michigan"
#> [176] "Center for Natural Product Technologies at UIC (CENAPT)"
#> [177] "CF Plus Chemicals"
#> [178] "ChangChem"
#> [179] "Changzhou Highassay Chemical Co., Ltd"
#> [180] "Changzhou Naide Chemical"
#> [181] "ChEBI"
#> [182] "Chem-Impex International, Inc."
#> [183] "Chem-Space.com Database"
#> [184] "Chemaphor Chemical Services"
#> [185] "ChemBank"
#> [186] "Chembase.cn"
#> [187] "ChemBioBank"
#> [188] "ChEMBL"
#> [189] "ChemBlock"
#> [190] "ChemBridge"
#> [191] "Chemchart"
#> [192] "ChemDB"
#> [193] "ChemDiv"
#> [194] "Chemenu Inc."
#> [195] "ChemExper Chemical Directory"
#> [196] "ChemFaces"
#> [197] "ChemFish Tokyo Co., Ltd."
#> [198] "Chemhere"
#> [199] "Chemical Biology Department, Max Planck Institute of Molecular Physiology"
#> [200] "Chemical Carcinogenesis Research Information System (CCRIS)"
#> [201] "chemical genetic matrix"
#> [202] "Chemical Probes Portal"
#> [203] "Chemical Synthesis Database"
#> [204] "ChemIDplus"
#> [205] "Chemieliva Pharmaceutical Co., Ltd"
#> [206] "ChemieTek"
#> [207] "Cheminformatics Friedrich-Schiller-University Jena"
#> [208] "ChemLabIndex"
#> [209] "ChemMol"
#> [210] "Chemodex Ltd."
#> [211] "Chemoproteomic Metabolic Pathway Resource, Scripps University"
#> [212] "Chemotion"
#> [213] "ChemProbes"
#> [214] "ChemShuttle"
#> [215] "Chemsoon"
#> [216] "ChemSpider"
#> [217] "ChemTik"
#> [218] "ChemWise"
#> [219] "Chen Lab, School of Medicine, Emory University"
#> [220] "CHESS fine organics"
#> [221] "China MainChem Co., Ltd"
#> [222] "Chiralblock Biosciences"
#> [223] "CHIRALEN"
#> [224] "Chirial Bio-material Co., Ltd."
#> [225] "Chiron AS"
#> [226] "Chris Southan"
#> [227] "Chung Lab, Department of Pediatrics, Emory University"
#> [228] "Circadian Research, Kay Laboratory, University of California at San Diego (UCSD)"
#> [229] "Ciulli Lab, Division of Biological Chemistry and Drug Discovery, University of Dundee"
#> [230] "Clearsynth"
#> [231] "Clinivex"
#> [232] "CLRI (CSIR)"
#> [233] "CMLD-BU"
#> [234] "Collaborative Drug Discovery, Inc."
#> [235] "Columbia University Molecular Screening Center"
#> [236] "Combi-Blocks"
#> [237] "Comparative Toxicogenomics Database (CTD)"
#> [238] "Compass Remediation Chemicals"
#> [239] "Cooke Chemical Co., Ltd"
#> [240] "CoreSyn"
#> [241] "Corson Lab, School of Medicine, Indiana University"
#> [242] "Cosutin Industrial"
#> [243] "Creasyn Finechem"
#> [244] "Creative Biogene"
#> [245] "Creative Biolabs"
#> [246] "Creative Enzymes"
#> [247] "Creative Proteomics"
#> [248] "CreativePeptides"
#> [249] "Crooks Lab, College of Pharmacy, University of Arkansas for Medical Sciences"
#> [250] "Crystallography Open Database (COD)"
#> [251] "CSNpharm"
#> [252] "Cure First"
#> [253] "cyandye llc"
#> [254] "Cyclic PharmaTech"
#> [255] "CYH Pharma"
#> [256] "CymitQuimica"
#> [257] "Dao Fu Chemical"
#> [258] "DAOGE BIOPHARMA"
#> [259] "Davey Lab, Department of Microbiology, NEIDL, Boston University"
#> [260] "Day Biochem"
#> [261] "DC Chemicals"
#> [262] "Debye Scientific Co., Ltd"
#> [263] "Denison Lab, Department of Environmental Toxicology, UC Davis"
#> [264] "Department of drug chemistry, Lithuanian University of Health Sciences"
#> [265] "Department of Molecular Cell Biology, Weizmann Institute of Science"
#> [266] "Department of Pharmacy, LMU"
#> [267] "Derbyshire Lab, Chemistry Department, Duke University"
#> [268] "DerMardirossian Lab, San Diego Biomedical Research Institute"
#> [269] "Dharmacon, a Horizon Discovery Group company"
#> [270] "Diabetic Complications Screening"
#> [271] "DiRusso Lab, Biochemistry Department, University of Nebraska"
#> [272] "DiscoveryGate"
#> [273] "Domainex"
#> [