Add catalog queries to `lksearch` #23

christinahedges · 2024-07-29T18:41:57Z

lksearch is our package for searching and finding data. These queries all require the internet to work. It seems that the logical place for catalog queries to live is inside lksearch.

Catalog queries are useful for users and can be agnostic of any particular file. @rebekah9969 has already opened a PR against lightkurve with this functionality. I have fixed the remaining issues with that PR and changed it to output a pandas.DataFrame object, to be more in line with the rest of lksearch.

Here's an example of the function call with output:

There are convenience functions to query each individual catalog by name and it will let you query by pixel distance.

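import astropy.units as u
from astropy.coordinates import SkyCoord
from astropy.time import Time
from lksearch import query_EPIC
# Query the EPIC catalog within 2 pixels of the resolved target position
catalog = query_EPIC(SkyCoord.from_name('Kepler-10'),
                     Time.now(),
                     radius=2*u.pixel,
                     magnitude_limit=18)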
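A radius given in pixels has to become an angle before a catalog can be searched. Here is a minimal sketch of that conversion, assuming nominal plate scales of roughly 3.98 arcsec per pixel for Kepler/K2 and 21 arcsec per pixel for TESS. The helper name and the lookup dict are hypothetical and are not taken from this PR.

```python
# Hypothetical helper, not part of lksearch: convert a search radius
# given in detector pixels into arcseconds.
# Assumed nominal plate scales, in arcsec per pixel.
PIXEL_SCALE_ARCSEC = {"kepler": 3.98, "k2": 3.98, "tess": 21.0}


def pixel_radius_to_arcsec(radius_pixels: float, mission: str) -> float:
    """Return the angular radius in arcseconds for a radius in pixels."""
    try:
        scale = PIXEL_SCALE_ARCSEC[mission.lower()]
    except KeyError:
        raise ValueError(f"Unknown mission: {mission!r}") from None
    return radius_pixels * scale


radius_arcsec = pixel_radius_to_arcsec(2, "K2")
print(f"{radius_arcsec:.2f} arcsec")
```

The same 2-pixel radius covers a much larger patch of sky for TESS than for K2, which is why the mission matters for a pixel-based query.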

src/lksearch/__init__.py

src/lksearch/catalog.py

christinahedges · 2024-07-30T13:57:13Z

@Nschanche thanks for a great review, I've updated this PR based on your comments, and a few more tweaks.

I updated epoch to have a sane default.
I changed the outputs to remove the "motion" columns. This was on purpose: the returned table has already been corrected for motion, so I didn't want users to be confused over which RA/Dec to use.
I added the ability to return a SkyCoord instead of a table, which will be helpful for some workflows.
I added name resolution, so if users put in "AU Mic" it will resolve inside the query.
I added a tutorial. To do this I had to make a new landing page so all the tutorials are accessible.
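To illustrate why a single RA/Dec pair is enough once the table has been corrected for motion, here is a rough sketch of linear proper-motion propagation. It is not the code in this PR. The function name is hypothetical, the RA proper motion is assumed to already include the cos(dec) factor (as in Gaia), and the small-angle treatment breaks down near the poles.

```python
import math


def propagate_position(ra_deg, dec_deg, pmra_masyr, pmdec_masyr, dt_years):
    """Linear proper-motion propagation (small-angle approximation).

    pmra_masyr is assumed to be mu_alpha * cos(dec), in mas/yr.
    dt_years is the time from the catalog epoch to the requested epoch.
    """
    mas_to_deg = 1.0 / 3.6e6  # 1 degree = 3.6e6 milliarcseconds
    dec_new = dec_deg + pmdec_masyr * dt_years * mas_to_deg
    # Divide by cos(dec) to turn the on-sky shift into a shift in RA
    ra_shift = pmra_masyr * dt_years * mas_to_deg / math.cos(math.radians(dec_deg))
    return (ra_deg + ra_shift) % 360.0, dec_new


# A source on the equator moving 1000 mas/yr east and 500 mas/yr south, 10 years on
ra_now, dec_now = propagate_position(10.0, 0.0, 1000.0, -500.0, 10.0)
print(ra_now, dec_now)
```

Returning only the propagated coordinates, rather than both the catalog position and the motion terms, leaves users with one unambiguous position for the requested epoch.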

@tylerapritchard could you give this a review?

@Nschanche and @tylerapritchard are you happy with this API? An alternative would be:

from astropy.coordinates import SkyCoord
from lksearch.catalogs import TICSearch
c = SkyCoord.from_name('Kepler-10')
sr = TICSearch(c)
sr.table
sr.skycoords

But I think this might be confusing because there's no e.g. download function, and no need to filter anything. I think this API is simpler for people, but what do you think?

tylerapritchard · 2024-07-31T14:03:13Z

This looks great - I'll review the code - but from a high level what if we renamed catalog to CatalogSearch to keep the naming theme but not change the API? I think this is still fairly consistent. Overall I think I like the simplicity of the current setup and building out the same API might be over-engineering this.

However, if we did want to change the API for consistency across the package, I could imagine a .download() function that saves a CSV file of the table (which could be a simple pass-through to dataframe.to_csv()) and a .filter() for someone who wanted to cut the table down based on, say, proper motion or magnitude after realizing they grabbed too many sources in their initial query.

christinahedges · 2024-08-01T14:20:52Z

Based on comments at TSC3 it seems like people want to be able to put in a TIC/KIC/EPIC ID and get likely matches. Rather than giving people an exact cross-match, we might return the likely hits from the catalog, each with a separation and relative flux, so they can make their own cut.
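As a rough illustration of "likely hits with a separation and relative flux", the sketch below attaches a flux ratio to each candidate using the standard magnitude relation and leaves the cut to the user. The function names, dict keys, and example rows are hypothetical and the values are made up.

```python
def relative_flux(mag: float, target_mag: float) -> float:
    """Flux ratio source/target implied by a magnitude difference."""
    return 10 ** (-0.4 * (mag - target_mag))


def rank_likely_matches(target_mag, candidates):
    """Attach relative flux to each candidate and sort by separation.

    `candidates` is a list of dicts with hypothetical keys
    'id', 'separation_arcsec' and 'mag'.
    """
    ranked = [
        {**row, "relative_flux": relative_flux(row["mag"], target_mag)}
        for row in candidates
    ]
    return sorted(ranked, key=lambda row: row["separation_arcsec"])


# Made-up candidates around a magnitude-10 target
candidates = [
    {"id": "B", "separation_arcsec": 4.2, "mag": 15.0},
    {"id": "A", "separation_arcsec": 0.3, "mag": 10.0},
]
ranked = rank_likely_matches(10.0, candidates)

# A user-defined cut: close sources at least 10% as bright as the target
keep = [r["id"] for r in ranked
        if r["separation_arcsec"] < 2 and r["relative_flux"] >= 0.1]
print(keep)
```

Returning both columns rather than a single "best" row keeps the judgment about what counts as a match with the user.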

We might need a facility to do this that can take in multiple TICIDs (see TessProposalTool...)

I think folks will want to get out extra keywords which we could add e.g. the stellar parameters.

christinahedges · 2024-08-05T20:17:43Z

I updated this PR to include stellar parameters as outputs and updated the tutorials.

@tylerapritchard has pointed out that we could have an API like the one I described above e.g.

from astropy.coordinates import SkyCoord
from lksearch.catalogs import TICSearch
c = SkyCoord.from_name('Kepler-10')
sr = TICSearch(c)
sr.table
sr.skycoords

But I think this is overkill given what we want unless we add in some crossmatching functionality from the TESS proposal tool stuff. Let's keep these as simple functions query_CATALOGNAME until the action items below are taken care of, then we can revisit this question.

I think the piece missing to make this work is a more flexible interface like in lksearch.MASTSearch so people can input e.g. a string of coordinates or a tuple of coordinates, e.g.

query_TIC(f"{ra}, {dec}")
query_TIC((ra, dec))

and a way to resolve TIC, KIC, and EPIC names so we can do:

query_TIC("TIC 3777809370")
query_gaia("TIC 3777809370")
query_EPIC("TIC 3777809370")
query_KIC("TIC 3777809370")

To do this we are going to need a way to query a given catalog by the ID and then return a SkyCoord object e.g.

from typing import List
from astropy.coordinates import SkyCoord

def TIC_to_SkyCoord(ids: List[str]) -> SkyCoord:
    ...
def KIC_to_SkyCoord(ids: List[str]) -> SkyCoord:
    ...
def EPIC_to_SkyCoord(ids: List[str]) -> SkyCoord:
    ...
def gaiadr3_to_SkyCoord(ids: List[str]) -> SkyCoord:
    ...

This should be a fast query that isn't a radius query to each catalog, because the unique ID name should give us a unique row without a radius query. We need to dig in to figure out how to do this in astroquery.

People will inevitably want a crossmatch, not a radius query. For now, let's keep this as a radius query. That means we want to be able to do the first two of these, but not the last.

Radius query: single Coordinate and radius -> many possible catalog rows
ID query: one or many catalog IDs -> exact match catalog rows
Crossmatch: one or many Coordinates -> same number of "best match" catalog rows
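The difference between the three query types can be sketched against a toy in-memory catalog. Everything here (the rows, the function names, the brute-force search) is made up for illustration and says nothing about how astroquery or lksearch would implement them.

```python
import math

# Toy catalog with made-up rows: id -> (ra_deg, dec_deg)
CATALOG = {
    1: (10.000, 20.000),
    2: (10.001, 20.000),
    3: (10.500, 20.500),
}


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def radius_query(ra, dec, radius_deg):
    """One coordinate and a radius -> zero or more catalog IDs."""
    return sorted(i for i, (r, d) in CATALOG.items()
                  if separation_deg(ra, dec, r, d) <= radius_deg)


def id_query(ids):
    """One or many IDs -> exactly those rows (KeyError if an ID is unknown)."""
    return {i: CATALOG[i] for i in ids}


def crossmatch(coords):
    """Many coordinates -> the single nearest catalog ID for each one."""
    return [min(CATALOG, key=lambda i: separation_deg(ra, dec, *CATALOG[i]))
            for ra, dec in coords]


print(radius_query(10.0, 20.0, 0.01))
print(id_query([3]))
print(crossmatch([(10.0, 20.0), (10.4, 20.4)]))
```

The radius query can return several rows for one position, the ID query never needs a search radius, and the crossmatch always returns one row per input, which is the behavior being deferred here.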

Once the below are addressed we can talk about a) whether we add crossmatching b) whether we change the API.

TODO:

Update interface so that query_X can accept a tuple of RA/Dec, a string of RA/Dec
Add ID queries for each catalog (TIC_to_SkyCoord, EPIC_to_SkyCoord, etc). Each function should have a check that the input string has the right format e.g. "TIC XXXX", "gaiadr3 XXXXX" etc. To enable this to be future proof, this should take in a list of TICs and return a SkyCoord with multiple entries. Users should not (usually) be using these functions directly, instead they will use query_TIC, query_EPIC etc.
Add interface for query_X that, when passed an ID from that catalog, will do an ID match instead of a radius query.
Add tests for ID queries e.g. query_TIC("TIC XXXXX") should return one result and it should be an exact match. query_KIC("TIC XXXXX") should not return a single result as there is not a one-to-one match between the two catalogs.
Update the tutorial to add an example of querying by ID.
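One possible shape for the input handling in the TODO items above, as a sketch only: classify what the user passed, then pick an ID match when the ID belongs to the catalog being queried and a radius query otherwise. The function names and the accepted ID prefixes are assumptions, and name resolution (e.g. "AU Mic") is deliberately left out, so such strings raise an error here.

```python
import re

# Assumed ID spellings; the exact accepted formats are not settled in this thread.
_ID_PATTERN = re.compile(r"^\s*(tic|kic|epic|gaiadr3)\s*(\d+)\s*$", re.IGNORECASE)


def classify_target(target):
    """Return ('id', catalog, number) or ('coords', ra_deg, dec_deg)."""
    if isinstance(target, str):
        match = _ID_PATTERN.match(target)
        if match:
            return ("id", match.group(1).lower(), int(match.group(2)))
        # Accept "ra, dec" or "ra dec" in decimal degrees
        parts = target.replace(",", " ").split()
    else:
        parts = list(target)
    if len(parts) != 2:
        raise ValueError(f"Cannot interpret {target!r} as an ID or RA/Dec pair")
    try:
        ra, dec = float(parts[0]), float(parts[1])
    except ValueError:
        raise ValueError(f"Cannot interpret {target!r} as an ID or RA/Dec pair") from None
    return ("coords", ra, dec)


def choose_query(target, catalog):
    """Decide between an ID match and a radius query for query_<catalog>."""
    kind, *rest = classify_target(target)
    if kind == "id" and rest[0] == catalog.lower():
        return "id_match"
    # Coordinates, or an ID from a different catalog that must first be
    # resolved to a position, both fall back to a radius query.
    return "radius_query"


print(choose_query("TIC 3777809370", "TIC"))
print(choose_query("TIC 3777809370", "KIC"))
print(classify_target((10.5, -20.25)))
```

Under this scheme query_TIC("TIC XXXXX") takes the exact-match path while query_KIC("TIC XXXXX") stays a radius query, which matches the behavior the tests in the TODO list are meant to check.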

christinahedges

@tylerapritchard This is a really awesome PR! There's lots of great functionality here. Other than the bug fixes we identified, the main comments here are:

You can remove the CatalogResult class and then have everything be in pandas dataframes. This will be important to keep the outputs consistent for users.
The names need to follow the Python style of having function names be lower case. I suggest we follow a verb_noun(noun) style for function names and make the API something like this:

from lksearch import MASTSearch, TESSSearch, KeplerSearch, K2Search
from lksearch.catalogs import query_region, query_id, find_alternate_names, match_names_to_catalogs
query_region(...)  # region params
query_id(ID)
find_alternate_names(name)
match_names_to_catalogs(name, match_strings=["tic", "kic"])

christinahedges · 2024-11-21T21:26:44Z

src/lksearch/CatalogSearch.py

+    input_catalog: str = None,
+    max_results: int = None,
+    return_skycoord: bool = False,
+    epoch: Union[str, Time] = None,


Can we update epoch across this PR to output_epoch so that it's clear this isn't changing the input?

christinahedges · 2024-11-21T21:31:27Z

src/lksearch/CatalogSearch.py

+
+def QueryID(
+    search_object: Union[str, int, list[str, int]],
+    catalog: str = None,


If we have an input_catalog, should this be called output_catalog?

christinahedges · 2024-11-21T21:43:40Z

src/lksearch/CatalogSearch.py

+
+# use simbad to get name/ID crossmatches
+def IDLookup(search_input: Union[str, list[str]], match: Union[str, list[str]] = None):
+    """Uses the Simbad name resolver and ids to disambiguate the search_input string or list.


This functionality is excellent. Can we break it into two functions so that we don't have two different types of output?

NameLookup -> Series/Table of all matching names

IDLookup -> Table of all matching IDs

christinahedges · 2024-11-21T21:45:05Z

src/lksearch/CatalogSearch.py

@@ -0,0 +1,616 @@
+"""Catalog class to search various catalogs for missions"""


This module name should be lowercase, not uppercase, to meet Python standards. Uppercase like this would indicate a class. CatalogResult can stay!

christinahedges · 2024-11-21T21:47:44Z

src/lksearch/CatalogSearch.py

+            for cat in np.atleast_1d(match):
+                mcat = cat.strip().replace(" ", "").lower()
+                cmatch = None
+                for sid in result[i]["id"]:


This appears to break with older astroquery versions, where the column name is "ID". You could do a check earlier to get the name of the first column and set it to "id".
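One way to make the lookup independent of the column's case is sketched below. This is a variant of the suggestion above (a case-insensitive search rather than renaming the first column), the helper name is hypothetical, and it has not been run against real Simbad output.

```python
from typing import Sequence


def find_column(colnames: Sequence[str], wanted: str = "id") -> str:
    """Return the actual column name matching `wanted`, ignoring case."""
    for name in colnames:
        if name.lower() == wanted.lower():
            return name
    raise KeyError(f"No column named {wanted!r} (any case) in {list(colnames)}")


# With an astropy Table `result`, the loop could then read the column
# whichever spelling the installed astroquery uses, along these lines:
#     id_col = find_column(result.colnames)
#     for sid in result[id_col]:
#         ...
print(find_column(["ID", "RA", "DEC"]))
print(find_column(["id", "ra", "dec"]))
```

Alternatively, astropy's Table.rename_column could be used once up front so the rest of the function can keep referring to "id".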

christinahedges · 2024-11-21T22:01:42Z

src/lksearch/CatalogSearch.py

+    return c
+
+
+class CatalogResult(Table):


If we remove this, we can remove the functionality to turn a table into a sky coordinate, and then everything can be pandas dataframes. I think this is the best trade: we still have the functionality with return_skycoord=True, and we get to use pandas throughout the lksearch package outputs.

add catalog queries

4c185c2

christinahedges added the enhancement New feature or request label Jul 29, 2024

christinahedges requested review from Nschanche and tylerapritchard July 29, 2024 18:41

christinahedges added 2 commits July 29, 2024 14:43

update ruff [skip ci]

e5f63ff

fix ci

979fe41

Nschanche reviewed Jul 29, 2024

christinahedges added 3 commits July 30, 2024 09:47

added documentation

73b2a1e

ruff

7988abb

hotfix name resolve [skip ci]

c1190c6

update docs

e765afd

christinahedges assigned tylerapritchard Aug 5, 2024

christinahedges mentioned this pull request Sep 17, 2024

interact_sky() enhancements: more catalogs (ZTF, VSX); upgrade to Gaia DR3 lightkurve/lightkurve#1433

Open

8 tasks

tylerapritchard added 9 commits October 7, 2024 15:18

pre-meeting commit

28725e3

xmatch update

8224ac3

resolve merge

9d58fd8

updates, moved catalog dict to json file in data

cc09089

tutorials and documentation updates

fbe9dba

updated idlooklup filtering and docs

06db262

removed FFI search test due to MAST deprecation. Remove FFI functionality? Separate PR probably

0a99bb1

ruff

8e2252f

fstring update

87862e3

christinahedges commented Nov 21, 2024
