Skip to content

1.6 PD BQ Exercises

Arjun Prasad edited this page Oct 15, 2024 · 5 revisions

Run some queries using the web interface

These are live data, so the numbers here could change as isolates are added to the system or withdrawn by submitters.

Many SQL tutorials are available on the web, you can easily search them up, and Google Gemini can be useful as well especially when you don't know the names of what you're trying to do. If you need an introduction to BigQuery see https://cloud.google.com/bigquery/docs/introduction-sql See also our 4. Some Useful Links page.

1. How many Acinetobacter baumannii are there in the Isolates Browser

We can count rows very easily using standard SQL. The only thing that is different is the way tables are referred to with "backticks" around the address of the table.

select count(1) 
from `ncbi-pathogen-detect.pdbrowser.isolates`
where taxgroup_name = 'Acinetobacter baumannii'

You should get a number ~ 36,000

2. How many of those A. baumannii have public assemblies?

Since not all of the isolates in the Pathogen Detection system have public assemblies in GenBank we can count just the ones that do.

select count(1)
from `ncbi-pathogen-detect.pdbrowser.isolates`
where taxgroup_name = 'Acinetobacter baumannii'
and asm_acc is not NULL

You should get a number ~ 30,000

3. How many have AST results for tobramycin?

In order to search repeated record fields like 'AST_phenotypes' we have to use a slightly different technique such as the unnest function which expands the contents of that field into its own table with each row including one row or group of elements from the AST_phenotypes field.

select count(1)
from `ncbi-pathogen-detect.pdbrowser.isolates` i
where taxgroup_name = 'Acinetobacter baumannii'
and asm_acc is not NULL
and (select count(1)
    from unnest(i.AST_phenotypes)
    where antibiotic like 'tobramycin' ) >=1

I got 811 at the time of writing this. A far smaller number than the number of A. baumannii isolates.

4. How many have AST results for tobramycin that conflict?

Some submitters have run their isolates through multiple phenotyping platforms and have submitted multiple different AST test results for the isolates. How often do different platformst give different results?

The AST browser has data from antibiotic susceptibility tests (AST), including several tests of the drug meropenem. We can search that table and "join" it with itself to identify isolates with conflicting data.

select count(distinct(ast1.biosample_acc))
from `ncbi-pathogen-detect.pdbrowser.ast` ast1
join `ncbi-pathogen-detect.pdbrowser.ast` ast2 
    on ast1.biosample_acc = ast2.biosample_acc 
    and ast1.phenotype != ast2.phenotype
    and ast1.antibiotic = 'tobramycin'
    and ast2.antibiotic = 'tobramycin'

148 isolates had multiple tobramycin test results that are different from each other

5. Take a look at the MICs for results that conflict

Very similar to the query above, but this time we want to take a look at the raw data. Looking at the data to see if it makes sense is often a part of good research.

select distinct ast1.target_acc, ast1.biosample_acc, ast1.phenotype, 
    ast1.mic, ast1.standard, ast1.platform
from `ncbi-pathogen-detect.pdbrowser.ast` ast1
join `ncbi-pathogen-detect.pdbrowser.ast` ast2
    on ast1.biosample_acc = ast2.biosample_acc
    and ast1.phenotype != ast2.phenotype
    and ast1.antibiotic = 'tobramycin'
    and ast2.antibiotic = 'tobramycin'
order by ast1.biosample_acc, ast1.mic, ast1.platform

Results for MICs that conflict

6. What is the distribution of MICs for tobramycin resistant A. baumannii?

MIC cutoffs can change, and submitters may make mistakes. As of 2023, the CLSI cutoff MIC for tobramycin resistance in Pseudomonas aeruginosa is 8 μg/mL. This means that isolates with a MIC of 8 μg/mL or higher are considered resistant to tobramycin.

What are the MICs for "resistant" isolates?

select mic, count(1), standard
from `ncbi-pathogen-detect.pdbrowser.ast`
where antibiotic = 'tobramycin'
and phenotype = 'resistant'
and taxgroup_name = 'Acinetobacter baumannii'
group by mic, standard
order by mic

7. What MICs do people call susceptible and resistant for tobramycin?

Adding the phenotype to the group by clause and removing resistant phenotype as a requirement allows us to view what people call different MICs.

select mic, phenotype, count(1), standard
from `ncbi-pathogen-detect.pdbrowser.ast`
where antibiotic = 'tobramycin'
and taxgroup_name = 'Acinetobacter baumannii'
group by mic, phenotype, standard
order by mic, phenotype

Notice that at an MIC of 8 there are both intermediate and resistant calls because of changes to the CLSI standard over time.

Linking between tables using "join"

Above we replicated some of the things that Mike showed you how to do in the browsers. As you have seen above with the joins, SQL has additional analysis capabilities that arent reflected in the browsers. For example we can link between tables to find isolates that are phenotypically resistant to a drug but don't have any resistance elements for that drug.

8. Find isolates tested for meropenem resistance with mic >= 2

The AST browser has data from antibiotic susceptibility tests (AST), including several tests of the drug meropenem.

select ast.target_acc, ast.biosample_acc, ast.antibiotic, ast.measurement_sign, ast.mic
from `ncbi-pathogen-detect.pdbrowser.ast` ast
where ast.antibiotic = 'meropenem'
and ast.mic >= 2

I got 1,684 rows of meropenem resistance tests at the time of this writing.

9. Find meropenem isolates with MIC >= 2 and no known carbapenem resistance genes or point mutations

SQL joins allow you to merge the data from two tables. We can "join" the AST table data from the above query to the MicroBIGG-E table to identify the isolates that don't have a carbapenem resistance gene or point mutation?

select ast.target_acc, ast.biosample_acc, ast.antibiotic, ast.measurement_sign, ast.mic
from `ncbi-pathogen-detect.pdbrowser.ast` ast

left join `ncbi-pathogen-detect.pdbrowser.microbigge` mb -- all rows from ast and any matching rows from mb
  on mb.target_acc = ast.target_acc
  and mb.subclass = 'CARBAPENEM'

where ast.antibiotic = 'meropenem'
and ast.mic >= 2

and mb.subclass is NULL -- no rows from mb match the requirements for the join
order by target_acc

I got 738 resistant tests for isolates without any "CARBAPENEM" resistance elements from MicroBIGG-E.

10. Join to the isolates browser to make sure AMRFinderPlus results are in MicroBIGG-E

One complication is that not all isolates have their AMRFinderPlus results in MicroBIGG-E, so the more refined way to do this is to also link to the isolates table to make sure that those isolates have results in MicroBIGG-E (they have public assembly accessions and AMRFinderPlus runs).

select ast.target_acc, ast.biosample_acc, ast.antibiotic, ast.measurement_sign, ast.mic
from `ncbi-pathogen-detect.pdbrowser.ast` ast
left join `ncbi-pathogen-detect.pdbrowser.microbigge` mb -- all rows from ast and any matching rows from mb
  on mb.target_acc = ast.target_acc
  and mb.subclass = 'CARBAPENEM'

inner join `ncbi-pathogen-detect.pdbrowser.isolates` i
  on i.target_acc = ast.target_acc
  and i.amrfinderplus_version is not NULL -- AMRFinderPlus was run
  and i.asm_acc is not NULL -- There are public assemblies in GenBank
  
where ast.antibiotic = 'meropenem'
and ast.mic >= 2
and mb.subclass is NULL -- no rows from mb match the requirements for the join
order by target_acc

Now I get 262 isolates with those characteristics.

Clone this wiki locally