Accessing s3 miniseed data

pavlis · June 11, 2026, 11:49am

Methinks you guys changed something in how you get AWS credentials on GeoLab when you went to the production version. A prototype for accessing s3 data that I built for the MsPASS course, and that ran on the previous version, no longer works. I get an error saying my credentials are invalid (a summary of a typically very long Python exception dump).

Previously I used a thing Amazon calls an “access point” and the following incantation I got from your documentation pages:

```python
import boto3
from botocore.config import Config
from earthscope_sdk import EarthScopeClient

from s3_worker_plugin import fetch_s3_client

# Get temporary AWS credentials from the (synchronous) EarthScope client
client = EarthScopeClient()
creds = client.user.get_aws_credentials()

# S3 Object Lambda access point alias, used in place of a bucket name
S3_ACCESS_POINT = "earthscope-mseed-res-na3mtd4fq5kz7pntcyr1uh46use2a--ol-s3"
BUCKET = S3_ACCESS_POINT

session = boto3.Session(
    aws_access_key_id=creds.aws_access_key_id,
    aws_secret_access_key=creds.aws_secret_access_key,
    aws_session_token=creds.aws_session_token,
)

# fetch_s3_client comes from the local s3_worker_plugin module
s3_client = fetch_s3_client(session)
```

That approach no longer seems to work: the client I create that way throws an error when I simply run list_objects_v2 on a prefix I know worked before, “miniseed/AZ/2010/001/”.

I found this new section of your documentation, which suggests to me that the old method is no longer valid.

The prototype s3 access I created a while back for the MsPASS course is broken until we can resolve this. This is likely something simple to fix.

sophia.parafina · June 11, 2026, 4:30pm

The SDK has been updated to make reading data from S3 faster and more efficient using async. Here’s an example of how to create a session.

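```python
# Top-level `await` works in a Jupyter notebook such as GeoLab; in a plain
# script, put this inside an `async def` and run it with asyncio.run().
from earthscope_sdk import AsyncEarthScopeClient
from botocore.config import Config

es = AsyncEarthScopeClient()

session = await es.user.get_aioboto3_session()
s3_client = await session.client(
    "s3",
    config=Config(
        # checksum verification doesn't work with S3 object lambda
        response_checksum_validation="when_required",
    ),
).__aenter__()
# The client context was entered by hand, so close it explicitly when done:
# await s3_client.__aexit__(None, None, None)
```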

Hope that helps,

sophia

pavlis · June 11, 2026, 6:27pm

Well, that explains why my notebook doesn’t work anymore. Do I still need the S3_ACCESS_POINT incantation to define the bucket name? I would think so.

I’ll give this a try and give you an update.

pavlis · June 11, 2026, 7:40pm

I gave this a try, and after digging into the documentation a bit I can see why that might be useful for some applications. But it really adds a lot of complexity to something relatively simple.

My next question: do I now have to use the “AsyncEarthScopeClient”, or is there a way to instantiate a standard s3 client, even if access is slower? As I read it, this beast lets one work on chunks of data as they come in. I can see why you’d want to do that, and I presume you are using that feature to filter only selected miniseed packets from an s3 object as it comes down. Have fun with that, but it adds a lot of complexity that I do not want to inflict on students in the MsPASS class next month.

A quick solution I found that might work as a hack fix for the upcoming class is to put the “await” keyword before every s3 access request. I’m told a single list_objects_v2 request returns at most 1000 keys (that is an S3 API limit, so it applies to the async client too). I don’t think any network on earth currently has 1000 stations running on one day, except maybe some node experiments, and I’m not sure those are in the miniseed archives anyway. I’ll worry about that later.
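If the 1000-key limit ever matters, a paginator follows the continuation tokens for you. The sketch below assumes `s3_client` is an async (aioboto3/aiobotocore) client like the one in Sophia's example; `list_all_keys` is a hypothetical helper name, not part of the EarthScope SDK.

```python
async def list_all_keys(s3_client, bucket, prefix):
    """Collect every key under `prefix`, following S3's 1000-key pages.

    `s3_client` is assumed to be an async (aioboto3/aiobotocore) S3 client.
    """
    # get_paginator() itself is not awaited; the pages are consumed with `async for`
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    async for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

In a notebook cell this would be called as `keys = await list_all_keys(s3_client, S3_ACCESS_POINT, "miniseed/AZ/2010/001/")`, assuming the access point alias from the first post is still the right bucket name.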

The key question, then, is whether I have to use the async client, or whether there is a way to access the s3 archives with a blocking s3 client.
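Pending an official answer, one generic workaround is to keep the async client but hide it behind an ordinary blocking function. This is a sketch, not an EarthScope-provided blocking client: `run_blocking` and `list_day` are hypothetical names, and the SDK calls are the ones from Sophia's post. It starts a fresh event loop in a worker thread so it also works inside Jupyter, where `asyncio.run()` refuses to start because a loop is already running.

```python
import asyncio
import threading


def run_blocking(coro):
    """Run a coroutine to completion and return its result, blocking the caller.

    A fresh event loop is started in a worker thread, so this also works inside
    Jupyter, where a loop is already running and asyncio.run() would fail.
    """
    outcome = {}

    def _worker():
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=_worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def list_day(bucket, prefix):
    """Hypothetical helper: list one day's keys with the async client.

    The session, client, and request are all created inside this coroutine,
    since aiohttp-based clients are, as a rule, bound to the loop they were made on.
    """
    # Imported here so run_blocking itself has no SDK dependency
    from botocore.config import Config
    from earthscope_sdk import AsyncEarthScopeClient

    es = AsyncEarthScopeClient()
    session = await es.user.get_aioboto3_session()
    async with session.client(
        "s3", config=Config(response_checksum_validation="when_required")
    ) as s3_client:
        resp = await s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]


# Usage from ordinary blocking code (S3_ACCESS_POINT as in the first post):
# keys = run_blocking(list_day(S3_ACCESS_POINT, "miniseed/AZ/2010/001/"))
```

This gives up the concurrency the async client offers (each call waits for the previous one) and pays a thread start per call, which is the price of a simple, blocking interface for students.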

pavlis · June 11, 2026, 7:49pm

Note: I did manage to hack this so that it now seems to work. I had to add “await” in multiple places and change function definitions from “def” to “async def”.
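The “def” to “async def” change described above can be sketched as follows. The function names are hypothetical, and `s3_client` is assumed to be an async client like the one created in Sophia's example. The point is that once a call is awaited, every function above it in the call chain has to become `async def` as well.

```python
# Before, with a blocking boto3 client:
#     def day_keys(s3_client, bucket, prefix):
#         resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
#         return [obj["Key"] for obj in resp.get("Contents", [])]


async def day_keys(s3_client, bucket, prefix):
    """After: the client call is awaited, so the function becomes `async def`."""
    resp = await s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # Only the first page (up to 1000 keys) is returned here
    return [obj["Key"] for obj in resp.get("Contents", [])]


async def count_day(s3_client, bucket, prefix):
    # Any function that calls an async function must itself await it,
    # so the change propagates up the call chain.
    keys = await day_keys(s3_client, bucket, prefix)
    return len(keys)
```

In a GeoLab notebook cell the top of the chain is simply `n = await count_day(s3_client, S3_ACCESS_POINT, "miniseed/AZ/2010/001/")`.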

Anyone reading this: this is no place for rookies. Stay away unless you are a Python expert and have a deep understanding of how a program interacts with computer I/O.

pavlis · June 13, 2026, 12:41pm

An update to my last comment, for anyone in the community reading this. I, for one, had never seen the “async” and “await” keywords that Sophia’s example above uses. They are a relatively recent addition to Python (introduced in version 3.5), and their purpose is to improve the performance of I/O-intensive scripts. After studying this problem more and working with the new AsyncEarthScopeClient for interacting with s3 data, I can endorse EarthScope’s decision to add that complication.
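The performance gain comes from overlapping many network waits instead of doing them one after another. A minimal sketch, assuming an async client like Sophia's: `fetch_many` and its `max_concurrent` parameter are hypothetical, and the limit of 8 simultaneous requests is an arbitrary choice, not an EarthScope recommendation.

```python
import asyncio


async def fetch_object(s3_client, bucket, key, limit):
    """Download one object's bytes; `limit` caps concurrent requests."""
    async with limit:
        resp = await s3_client.get_object(Bucket=bucket, Key=key)
        # Using the body as a context manager releases the connection afterwards
        async with resp["Body"] as stream:
            return await stream.read()


async def fetch_many(s3_client, bucket, keys, max_concurrent=8):
    """Download several objects concurrently; results follow the order of `keys`."""
    limit = asyncio.Semaphore(max_concurrent)
    tasks = [fetch_object(s3_client, bucket, k, limit) for k in keys]
    return await asyncio.gather(*tasks)
```

While one request is waiting on the network, the event loop starts or finishes others; with a blocking client the same downloads would run strictly in sequence.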

On the other hand, my warning that “this is no place for rookies” has been strongly reinforced by this experience. Using an async feature like the s3 client really complicates any algorithm. The reasons are many and somewhat subtle, in my opinion, but they are very real. My recommendation for anyone who has to dive into this problem: ask our favorite AI for a summary of how to use “async” and “await” with an s3 client. That is a good start, as the documentation is otherwise overwhelming.
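Two of the subtler traps are easy to show without any S3 access. This is an illustrative sketch: `get_count` is a hypothetical stand-in for any awaited client call.

```python
import asyncio
import inspect


async def get_count():
    """Stand-in for any awaited S3 call (hypothetical)."""
    await asyncio.sleep(0)
    return 3


def forgot_await():
    # Pitfall 1: calling an async function without `await` does not run it;
    # you get a coroutine object instead of the result.
    result = get_count()
    is_coro = inspect.iscoroutine(result)
    result.close()  # discard it to avoid a "never awaited" warning
    return is_coro


def loop_already_running():
    # Pitfall 2: asyncio.run() refuses to start when an event loop is already
    # running, which is the normal situation inside a Jupyter notebook.
    async def inner():
        coro = get_count()
        try:
            asyncio.run(coro)
        except RuntimeError:
            coro.close()
            return True
        return False

    return asyncio.run(inner())
```

Both functions return `True`: the first because the forgotten `await` leaves a coroutine object where data was expected, the second because `asyncio.run()` raised `RuntimeError` inside an already-running loop.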

The other key point for EarthScope is that you need to think seriously about how to construct a simple API for accessing the seismic archives. Not many people are going to be able to deal with this, although maybe with AI help I’m wrong about that. The MsPASS team is going to work on one version of that which meshes with how MsPASS works. I’ll be using a prototype in the upcoming MsPASS course, but it will need some refining and won’t be stable for a few months. Students in the course should still find it helpful and a lot easier than using the primitives.
