Geoportal Facets app and DCAT in 1.2.9 #313
Comments
Hi, I recommend switching to Geoportal Server 2.x. This is a new 'generation' of Geoportal Server, based on Elasticsearch and implementing configurable faceted search as its starting point. You can find the application in its own GitHub repository.
Thanks @mhogeweg. We would like to switch, but we have a front-end application that depends on the v.1.2.x architecture, including the (optional) solr index. At this time we do not have the resources to rewrite this front-end app, so I need to make the latest 1.2.x version work for now. Hopefully we can migrate the app to your new architecture in the future. Any assistance with debugging this issue, or pointers, is greatly appreciated.
Is your site public by chance?
The front-end production website is here: http://portal.westcoastoceans.org/discover/ If you mean the upgraded geoportal 1.2.9 site with solr that I'm trying to debug, I'm experimenting with that on our dev server and it's not yet hooked up to the front-end.
What seems to break is the link to the XML. For example, for the first entry on the page above, the links are:
The XML link points to 127.0.0.1, which would be my machine. The link in the solrjson response does too. I suggest checking the configuration. I see this page has about 1600 items, while the vanilla gpt site has some 2100. Did you follow step 7 and have the GcService web app deployed?
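One way to check for this kind of misconfiguration is to scan the document links returned by the index for loopback hosts. The following is only a minimal sketch: the function names and the example paths are hypothetical, and it assumes the links have already been extracted from the response into a plain list of strings.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True when the URL's host is 'localhost' or a loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # The host is a DNS name rather than an IP literal.
        return False


def find_loopback_links(urls):
    """Return the subset of URLs that only resolve on the server itself."""
    return [u for u in urls if is_loopback_url(u)]


if __name__ == "__main__":
    # Made-up example paths; only the hosts matter here.
    sample = [
        "http://127.0.0.1:8080/example/document?id=1",
        "http://207.141.116.172/example/document?id=1",
    ]
    for link in find_loopback_links(sample):
        print("not reachable from outside:", link)
```

Any link it reports would only work from the server itself, which matches the symptom described above.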
Thanks @mhogeweg. Glad to see the gc129 site is working on your end. I'll try to do some more debugging on this end. I copied the configuration from our existing site and noticed that the links are set up to point to localhost, which I assumed was intended. I'm not too worried about the links not working since we are not using that aspect, but I could change them to the main (dev) URL: http://207.141.116.172 The discrepancy that you see between the Geoportal-Solr page and the vanilla gpt site is what I'm trying to debug. That difference is equal to the number of records that were pulled from the DCAT source: http://geo.wa.gov/data.json (WA Geospatial Open Data Portal). I followed step 7 and deployed a new gc service web app to go with this geoportal instance, naming it gc129 (instead of GcService). It has been working successfully as of yesterday, and as it spun up I could see the count of indexed files increasing until it hit 1585.
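The claim that the gap equals the number of DCAT records can be checked by counting the datasets in the data.json catalog and comparing that with the two totals. This is a rough sketch with hypothetical function names. It assumes the catalog follows the Project Open Data layout, where newer files wrap the records in a top-level `dataset` array and older ones are a bare list. The totals are passed in by hand rather than fetched.

```python
def count_dcat_datasets(catalog) -> int:
    """Count dataset records in a parsed data.json catalog.

    Assumes either an object with a "dataset" array or a bare
    top-level list of dataset records.
    """
    if isinstance(catalog, dict):
        return len(catalog.get("dataset", []))
    if isinstance(catalog, list):
        return len(catalog)
    raise TypeError("unexpected data.json structure")


def unexplained_gap(total_records: int, indexed_records: int, dcat_records: int) -> int:
    """Records missing from the index that the DCAT source does not account for."""
    return total_records - indexed_records - dcat_records


if __name__ == "__main__":
    # Tiny made-up catalog, only to show the call shape.
    catalog = {"dataset": [{"title": "a"}, {"title": "b"}]}
    print(count_dcat_datasets(catalog))
```

A result of zero from `unexplained_gap` would support the idea that only the DCAT records are being skipped. A non-zero result would suggest other records are failing as well.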
This app was created before we harvested DCAT. The app takes metadata and applies an XSLT transformation. That transformation did not include support for DCAT as a structure. I'm making some updates and will share shortly.
Attached are two XSLT files that should replace the corresponding files in the folder: These transformations take the metadata in the geoportal server index and prepare them for solr. The DCAT items were not indexed because the XSLT did not yet know how to deal with the format. Please check with these and see if the DCAT items get indexed (may require a tomcat stop/start).
Thank you very much @mhogeweg! This is a big help to us. I have installed the new config files and restarted tomcat, but haven't seen a change in the indexed files. Is there a way to trigger the indexing manually? I know it is scheduled via one of the config files to run in the middle of the night.
I did not detect any changes between the dc-toSolr.xslt you provided in the zip file and the one from the existing repo. Should there be changes in that file, or just in dc-base-toSolr.xslt?
It is just the dc-base one. The other one imports it, so you may keep the existing one. I included it because they 'go together'. I'll check on forcing solr to reindex the content.
I stopped tomcat, deleted all the files from the solr index (data folder), restarted tomcat, and watched the solr index repopulate from 0 records and stop at 1585 again. So, unfortunately, this .xslt file change does not appear to be working for me.
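For anyone repeating this test, the indexed count can be read straight from Solr with a match-all query that returns no rows. The response carries the total in `response.numFound`. The sketch below only builds the query URL and parses a response body. The helper names are hypothetical, and the core URL is an assumption that depends on how the solr web app is deployed.

```python
import json
from urllib.parse import urlencode


def count_query_url(core_url: str, query: str = "*:*") -> str:
    """Build a Solr select URL that asks for the match count only (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{params}"


def num_found(response_text: str) -> int:
    """Extract the total hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


if __name__ == "__main__":
    # Assumed core URL; adjust host, port, and core name to the actual deployment.
    print(count_query_url("http://localhost:8080/solr/collection1"))
```

Fetching that URL while the index repopulates and passing the body to `num_found` gives the same running count. Narrowing `query` to a field that identifies the DCAT records, if the schema has one, would show whether any of them arrive at all.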
@mhogeweg I added your xslt files to my Geoportal Facets for DCAT and solr is still not indexing the DCAT entries. Any suggestions?
I'm going to look into this a bit more. I harvested the geoportal129 site into our geoportal 2 sandbox: http://geoss.esri.com/geoportal2/#. If you open the 'source of origin' facet, you'll see your IP address listed with 1477 documents. My harvest indicated that 651 docs failed to publish (total 2128 retrieved). I'll try to understand why so many failed (likely a validation issue). Do you see any errors in your solr logs?
Thanks @mhogeweg. There are no solr errors in the tomcat (Catalina) logs. Is there another set of logs I should check? We have customized our site a bit to loosen validation, so maybe that's the reason validation is failing on your side.
I have deployed the geoportal facets application (solr v.4.1.0) to index a geoportal v.1.2.9 database. It is not indexing all the records, and it appears to be missing new records harvested from a DCAT source. I followed the instructions from the wiki here: https://github.com/Esri/geoportal-server/wiki/Geoportal-Facets-using-Apache-Solr
Any suggestions for debugging this issue? Is there a configuration that I'm missing?