Methinks you guys changed something in how you get AWS credentials on GeoLab when you went to the production version. A prototype I’d used to access s3 data in the MsPASS course that previously ran on GeoLab no longer works. I get an error saying my credentials are invalid (that is my summary of a typically very long Python exception dump).
Previously I used a thing Amazon calls an “access point”, along with the following incantation I got from your documentation pages:
That approach no longer seems to work: the client I create that way throws an error when I simply run list_objects_v2 on a prefix I know worked before:
“miniseed/AZ/2010/001/”.
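For comparison, the blocking pattern I mean is a one-call listing like the sketch below. It assumes a standard boto3-style client (for example one from `boto3.client("s3")`) whose credentials are already set up; the credential step from the documentation is not reproduced here. The bucket name, the key names, and the `FakeS3Client` stand-in (there only so the sketch runs offline) are hypothetical.

```python
def list_keys(client, bucket, prefix):
    """Return the object keys under prefix from a single list_objects_v2 call.

    client is assumed to be a blocking, boto3-style S3 client.
    """
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is missing from the response when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]


class FakeS3Client:
    """Hypothetical offline stand-in for a real S3 client."""

    def __init__(self, keys):
        self._keys = keys

    def list_objects_v2(self, Bucket, Prefix):
        matches = [k for k in self._keys if k.startswith(Prefix)]
        return {"Contents": [{"Key": k} for k in matches]} if matches else {}


client = FakeS3Client([
    "miniseed/AZ/2010/001/STA1.ms",  # hypothetical key names
    "miniseed/AZ/2010/002/STA1.ms",
])
print(list_keys(client, "example-bucket", "miniseed/AZ/2010/001/"))
# → ['miniseed/AZ/2010/001/STA1.ms']
```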
I gave this a try, and after digging into the documentation a bit I see why that might be useful for some applications. It really does add a lot of complexity to something relatively simple.
My next question: do I now have to use the “AsyncEarthscopeClient”, or is there a way to instantiate a standard s3 client and accept slower access? As I read it, this beast is designed to let one work on chunks of data as they come in. I can see why you’d want to do that, and I presume you guys are using that feature to filter only selected miniseed packets from an s3 object as it comes down. Have fun with that, but it adds a lot of complexity that I do not want to inflict on students in the MsPASS class next month.
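The “work on chunks as they come in” idea can be sketched in plain asyncio with an async generator consumed by `async for`. This is only an illustration of the pattern: `fake_stream` is a hypothetical stand-in for whatever the async client actually returns, not the AsyncEarthscopeClient API.

```python
import asyncio


async def fake_stream(data, chunk_size):
    """Hypothetical stand-in for an async object body that yields bytes in chunks."""
    for start in range(0, len(data), chunk_size):
        await asyncio.sleep(0)  # give up control, as a real network read would
        yield data[start:start + chunk_size]


async def count_bytes(stream):
    """Handle each chunk as soon as it arrives instead of waiting for the whole object."""
    total = 0
    async for chunk in stream:
        total += len(chunk)  # a real reader might parse or filter records here
    return total


print(asyncio.run(count_bytes(fake_stream(b"x" * 10_000, 4096))))  # → 10000
```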
I found that a quick solution, which might work as a hack fix for the upcoming class, is to use the “await” keyword before any s3 access request. I’m told there is a 1000-item limit on what a single list_objects_v2 request can return with the async client. I don’t think any network on earth at this point has 1000 stations running on one day, except maybe some node experiments, and I’m not sure those are in the miniseed archives anyway. I’ll worry about that later.
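S3’s ListObjectsV2 returns at most 1,000 keys per request, and the usual remedy is to follow continuation tokens until the response is no longer truncated. The sketch below does that with awaited calls; it assumes the async client exposes a boto3-style awaitable `list_objects_v2`, which is an assumption about the AsyncEarthscopeClient rather than its documented API. `FakeAsyncS3Client` and the key names are hypothetical.

```python
import asyncio


async def list_all_keys(client, bucket, prefix):
    """Collect every key under prefix, following continuation tokens.

    client is assumed to have an awaitable, boto3-style list_objects_v2.
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = await client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # Request the next page, starting where this one stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


class FakeAsyncS3Client:
    """Hypothetical offline stand-in that returns at most page_size keys per call."""

    def __init__(self, keys, page_size=1000):
        self._keys = keys
        self._page_size = page_size

    async def list_objects_v2(self, Bucket, Prefix, ContinuationToken=None):
        matches = [k for k in self._keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)
        page = matches[start:start + self._page_size]
        end = start + len(page)
        response = {"Contents": [{"Key": k} for k in page],
                    "IsTruncated": end < len(matches)}
        if response["IsTruncated"]:
            response["NextContinuationToken"] = str(end)
        return response


keys = [f"miniseed/XX/2010/001/STA{i:04d}.ms" for i in range(2500)]
client = FakeAsyncS3Client(keys)
print(len(asyncio.run(list_all_keys(client, "example-bucket", "miniseed/XX/2010/001/"))))  # → 2500
```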
The key question, then, is whether I have to use the async client or whether there is a way to access the s3 archives with a blocking s3 client.
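One generic way to get blocking behavior from async code is to wrap the coroutine in `asyncio.run`, which starts an event loop, runs the coroutine to completion, and returns its result. This is a sketch, not an EarthScope-endorsed approach: the async client is assumed to have an awaitable `list_objects_v2`, and `FakeAsyncS3Client` is hypothetical. Note that `asyncio.run` creates a fresh loop on every call, and it raises `RuntimeError` inside a Jupyter notebook, where a loop is already running; there one would `await` the coroutine directly in the cell.

```python
import asyncio


async def list_prefix_async(client, bucket, prefix):
    """Hypothetical async helper; client is assumed to have an awaitable list_objects_v2."""
    response = await client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


def list_prefix(client, bucket, prefix):
    """Blocking wrapper: run the coroutine on a new event loop and return its result.

    Works in a plain Python script. Depending on the client, one that holds
    network sessions may need to be created inside the coroutine instead.
    """
    return asyncio.run(list_prefix_async(client, bucket, prefix))


class FakeAsyncS3Client:
    """Hypothetical offline stand-in for an async S3-style client."""

    async def list_objects_v2(self, Bucket, Prefix):
        await asyncio.sleep(0)
        return {"Contents": [{"Key": Prefix + "STA1.ms"}, {"Key": Prefix + "STA2.ms"}]}


# Callers see an ordinary function: no async or await.
print(list_prefix(FakeAsyncS3Client(), "example-bucket", "miniseed/AZ/2010/001/"))
```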
Note: I did manage to hack this into something that now seems to work. I had to add “await” in multiple places and change function definitions from “def” to “async def”.
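The kind of change described above looks roughly like this. The “before” version is kept as comments; the “after” version turns the function into a coroutine and awaits the I/O call. The sketch also shows why the change spreads: any caller of an `async def` function must itself be `async def` in order to `await` it. The client and its method are assumed to be boto3-style, and `FakeAsyncS3Client` is hypothetical.

```python
import asyncio

# Before (blocking client):
# def count_objects(client, bucket, prefix):
#     response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
#     return len(response.get("Contents", []))


# After (async client): "def" becomes "async def" and the I/O call is awaited.
async def count_objects(client, bucket, prefix):
    response = await client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return len(response.get("Contents", []))


# The caller must also become a coroutine so it can await count_objects.
async def count_many(client, bucket, prefixes):
    return {p: await count_objects(client, bucket, p) for p in prefixes}


class FakeAsyncS3Client:
    """Hypothetical offline stand-in for an async S3-style client."""

    def __init__(self, keys):
        self._keys = keys

    async def list_objects_v2(self, Bucket, Prefix):
        await asyncio.sleep(0)
        return {"Contents": [{"Key": k} for k in self._keys if k.startswith(Prefix)]}


client = FakeAsyncS3Client(["a/1", "a/2", "b/1"])
print(asyncio.run(count_many(client, "example-bucket", ["a/", "b/"])))  # → {'a/': 2, 'b/': 1}
```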
To anyone reading this: this is no place for rookies. Stay away unless you are a Python expert with a deep understanding of how a program interacts with computer IO.
An update to my last comment, for anyone in the community reading this: I, at least, had never seen the “async” and “await” keywords that Sophia’s example above uses. They are a relatively new feature in Python, and their purpose is to improve the performance of IO-intensive scripts. After studying this problem more and working with the new AsyncEarthscopeClient for interacting with s3 data, I can endorse their decision to add that complication.
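Where the performance gain comes from can be shown with a toy example: while one request waits on the network, the event loop can run others. Below, `fetch` is a hypothetical stand-in for one s3 download, with `asyncio.sleep` mimicking network latency. Awaiting the downloads one at a time takes about the sum of the waits, while `asyncio.gather` overlaps them so the total is close to a single wait. Exact timings will vary by machine.

```python
import asyncio
import time


async def fetch(key, delay=0.2):
    """Hypothetical stand-in for one s3 download; the sleep mimics network wait."""
    await asyncio.sleep(delay)
    return f"data for {key}"


async def fetch_sequential(keys):
    # Each download starts only after the previous one finishes.
    return [await fetch(k) for k in keys]


async def fetch_concurrent(keys):
    # gather runs all downloads at once; results keep the order of keys.
    return await asyncio.gather(*(fetch(k) for k in keys))


keys = [f"STA{i}" for i in range(5)]
for runner in (fetch_sequential, fetch_concurrent):
    start = time.perf_counter()
    results = asyncio.run(runner(keys))
    print(runner.__name__, len(results), f"{time.perf_counter() - start:.1f} s")
```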
On the other hand, my warning that “this is no place for rookies” has been strongly reinforced by this experience. Using an async feature like the s3 client really complicates any algorithm. The reasons are many and, in my opinion, somewhat subtle, but they are very real. My recommendation for anyone who has to dive into this problem: ask our favorite AI for a summary of how to use “async” and “await” with an s3 client. That is a good start, as the documentation is otherwise overwhelming.
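Two of the subtler traps can be shown in a few lines. First, calling an `async def` function without `await` does not run the request; it returns a coroutine object. Second, `asyncio.run` refuses to start inside an event loop that is already running, which is the situation in a Jupyter notebook cell. `list_station_keys` is a hypothetical stand-in for an async s3 request.

```python
import asyncio


async def list_station_keys():
    """Hypothetical coroutine standing in for an async s3 request."""
    await asyncio.sleep(0)
    return ["STA1.ms", "STA2.ms"]


async def main():
    # Pitfall 1: without "await" you get a coroutine object, not the data.
    forgotten = list_station_keys()
    kind = type(forgotten).__name__
    forgotten.close()  # avoid the "coroutine was never awaited" warning

    # Correct usage: await the call to get the result.
    keys = await list_station_keys()

    # Pitfall 2: asyncio.run cannot be used inside a running event loop.
    nested = list_station_keys()
    try:
        asyncio.run(nested)
        raised = False
    except RuntimeError:
        raised = True
    finally:
        nested.close()
    return kind, keys, raised


print(asyncio.run(main()))  # → ('coroutine', ['STA1.ms', 'STA2.ms'], True)
```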
The other key point for EarthScope is that you need to think seriously about how to construct a simple API for accessing the seismic archives. Not many people are going to be able to deal with this, although with AI help maybe I’m wrong about that. The MsPASS team is going to work on one version of such an API that meshes with how MsPASS works. I’ll be using a prototype in the upcoming MsPASS course, but it will need some refining and won’t be stable for a few months. Students in the course should still find it helpful and a lot easier than using the primitives.
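As an illustration of how a simpler, blocking interface could be layered over async primitives (not what MsPASS or EarthScope will actually ship), one option is to keep a private event loop running in a background thread and submit coroutines to it with `asyncio.run_coroutine_threadsafe`. Because the coroutine runs on that separate loop, the blocking call also works from a Jupyter cell, where `asyncio.run` fails. `BlockingRunner` and `list_day` are hypothetical names; a real version would make the async client calls inside coroutines run on this loop.

```python
import asyncio
import threading


class BlockingRunner:
    """Run coroutines on a private event loop in a background thread,
    so ordinary blocking code (including Jupyter cells) can call them."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        # Submit the coroutine to the background loop and block until it finishes.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def close(self):
        # Stop the loop from its own thread, wait for the thread, then clean up.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def list_day(network, year, day):
    """Hypothetical async helper: builds the prefix a real version would list."""
    prefix = f"miniseed/{network}/{year}/{day:03d}/"
    await asyncio.sleep(0)  # placeholder for the awaited s3 request
    return prefix


runner = BlockingRunner()
# Students see a plain function call with no async or await.
print(runner.run(list_day("AZ", 2010, 1)))  # → miniseed/AZ/2010/001/
runner.close()
```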