Documents

  • Metadata checker logfiles

  • Solar Orbiter Observing Plans (SOOPs)

Main issue for SWA-PAS datasets

The main issue for SWA-PAS CDF datasets concerns CDF variable names.

Extracted from the Solar Orbiter metadata dictionary V2.5

3.2.1.1 General conventions

The general conventions for the CDF variables for Solar Orbiter are provided in the following list:

*  CDF variable description and naming conventions shall be compliant with the ISTP
guidelines. In addition, CDF variable names shall contain capital letters only and shall not exceed 63 characters.

This rule is a good one, but unfortunately we did not respect it when creating our first CDF files.

We use a mix of uppercase, lowercase, and capitalized variable names.

As an example, for the solo_L1_swa-pas-3d dataset:

Variables: [
        'Epoch',
        'Duration', 'CCSDS_time', 'SCET',
        'SOURCE', 'SAMPLE', 'NB_SAMPLE',
        'FIRST_ENERGY', 'NB_ENERGY',
        'FIRST_ELEVATION', 'NB_ELEVATION',
        'FIRST_CEM', 'NB_CEM',
        'INFO', 'SCHEME', 'FULL_3D', 'COMPRESSED',
        'MAX_CNT_ENERGY', 'MAX_CNT_ELEVATION', 'MAX_CNT_CEM',
        'NB_K', 'K',
        'COUNTS',
        'Energy', 'delta_p_Energy', 'delta_m_Energy',
        'Azimuth', 'Elevation',
        'delta_Azimuth', 'delta_Elevation']

Implications

We can easily modify our variable names to put them in uppercase.

It will imply a full reprocessing and delivery of all PAS L1/L2 CDFs for the whole mission, but that will be the case for any update of the CDF metadata anyway.
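As a minimal sketch (plain Python, using the solo_L1_swa-pas-3d variable list above), the renaming can be expressed as an old-name to new-name mapping, with a check that no two names collide once uppercased:

```python
# Variable names of the solo_L1_swa-pas-3d dataset (from the list above)
variables = [
    'Epoch',
    'Duration', 'CCSDS_time', 'SCET',
    'SOURCE', 'SAMPLE', 'NB_SAMPLE',
    'FIRST_ENERGY', 'NB_ENERGY',
    'FIRST_ELEVATION', 'NB_ELEVATION',
    'FIRST_CEM', 'NB_CEM',
    'INFO', 'SCHEME', 'FULL_3D', 'COMPRESSED',
    'MAX_CNT_ENERGY', 'MAX_CNT_ELEVATION', 'MAX_CNT_CEM',
    'NB_K', 'K',
    'COUNTS',
    'Energy', 'delta_p_Energy', 'delta_m_Energy',
    'Azimuth', 'Elevation',
    'delta_Azimuth', 'delta_Elevation',
]

# Old name -> new (uppercase) name, for the reprocessing software
rename_map = {name: name.upper() for name in variables}

# Sanity check: two distinct old names must not collapse into the
# same uppercase name
assert len(set(rename_map.values())) == len(variables)
```

The same mapping can then drive both the CDF rewriting and the updates of the software that reads these variables by name.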

We will also have to modify/update all the software that uses these CDF files:

  • produce_L1

    C software that creates PAS L1 CDFs from L0 telemetry files

  • produce_L2

    Python software that creates PAS L2 CDFs from L1 ones

  • produce_L3

    IDL + Python software that creates PAS L3 CDFs from PAS L2 swa-pas-vdf ones

  • tools

    Various Python tools to manipulate PAS CDFs (plots, checks, statistics…)

  • cl software

    IDL software written by Emmanuel Penou to plot our Solar Orbiter datasets and those of many other experiments

  • AMDA software

    Multi-mission analysis tool written by IRAP to plot our datasets

  • SOAR

    Probably some implications for the SOAR SQL databases that describe the SWA-PAS datasets

Most of these modifications are easy to make (replacing the CDF variable names with their uppercase values), but the various software packages will probably not accept working with a mix of older/newer datasets.

⇒ it will be necessary to switch all the PAS CDF files and software from the older versions to the newer ones at a given time

⇒ not so easy to do

Metadata checker

It would be useful to install a copy of the CDF metadata checker on our computers, to make sure that all rules are satisfied before delivering a new CDF file to MSSL/SOAR.

This should avoid unnecessary round trips between IRAP and the SOAR archive.

Note

Is it possible to have a copy of the metadata checker?

Is it a Python tool?

Do you think this tool will continue to evolve?

If so, it could later detect new discrepancies that would imply a new full delivery of our CDFs:

450 GB for PAS L1 data, 880 GB for PAS L2

Global attributes

Missing global attributes

  • Software_version : missing for L1 datasets

  • TEXT : missing for L1 datasets

  • TARGET_NAME

  • TARGET_CLASS

  • TARGET_REGION

  • TIME_MIN : should be extracted from Epoch[0]

  • TIME_MAX : should be extracted from Epoch[-1]

  • SOOP_NAME

  • SOOP_TYPE

  • OBS_ID

  • LEVEL : missing

  • Instrument : missing

  • Data_product : missing

  • Acknowledgement : empty

  • Parents : missing for L1 datasets, OK for L2

There is no technical issue in updating these attributes.
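For TIME_MIN/TIME_MAX, a minimal sketch of the extraction from Epoch. The ISO-8601 encoding used here is an assumption; the exact format required by the metadata dictionary should be checked:

```python
from datetime import datetime

def time_min_max(epoch):
    """Derive TIME_MIN / TIME_MAX from the Epoch variable.

    `epoch` is assumed to be a non-empty, time-ordered sequence of
    datetimes, as read from the CDF. The ISO-8601 encoding is an
    assumption -- check the format required by the metadata dictionary.
    """
    return epoch[0].isoformat(), epoch[-1].isoformat()

# Example with the first/last records of a daily file
epoch = [datetime(2023, 11, 27, 0, 0, 4),
         datetime(2023, 11, 27, 23, 59, 56)]
time_min, time_max = time_min_max(epoch)
```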

Global attributes to update

Some messages from the metadata checker are a bit strange:

Source_name is ['SOLO>Solar Orbiter']
From dict, it should be: 'SOLO>Solar Orbiter'

Instrument is ['SWA-PAS>Solar Wind Analyser-Proton Alpha Sensor']
From dict, it should be: 'SWA-PAS>Solar Wind Analyser Proton Alpha Sensor'

I don’t understand these messages. Are they OK or not?

Some attributes are to be updated:

  • Descriptor

Descriptor is ['SWA-PAS>Solar Wind Analyser / Proton-Alpha Sensor']
From dict, it should be: 'SWA-PAS-3D>Solar Wind Analyser, Proton Alpha Sensor, etc'

Some attributes are logically dependent :

  • CDF filename = "solo_L2_swa-pas-vdf_20231127_V01.cdf"

  • Logical_file_id = "solo_L2_swa-pas-vdf_20231127_V01"

  • Logical_source = "solo_L2_swa-pas-vdf"

  • Instrument = "SWA-PAS>Solar Wind Analyser, Proton-Alpha Sensor"

  • Data_type = "L2> Level 2 Data"

  • LEVEL = "L2>Level 2 Data"

  • Data_product = "VDF>Velocity Distribution function"

  • Descriptor = "SWA-PAS-VDF>SWA PAS Velocity distribution function"

This was not so clear when reading the metadata standard document.

We have to check/update all these global attributes for each of the L1/L2/L3 datasets.
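Since these attributes are logically dependent, most of them could be derived from the CDF file name alone. A sketch of such a helper (the `derive_attributes` function is hypothetical, and the exact LEVEL string is an assumption based on the examples above):

```python
import re

def derive_attributes(cdf_filename):
    """Hypothetical helper: derive the logically dependent global
    attributes from a file name like solo_L2_swa-pas-vdf_20231127_V01.cdf.

    The "L2>Level 2 Data" pattern for LEVEL is an assumption taken from
    the examples above; check it against the metadata dictionary.
    """
    m = re.match(
        r"(?P<source>solo_(?P<level>L\d)_(?P<descriptor>[a-z0-9-]+))"
        r"_(?P<date>\d{8})_(?P<version>V\d{2})\.cdf$",
        cdf_filename)
    if m is None:
        raise ValueError(f"unexpected CDF file name: {cdf_filename}")
    level = m["level"]
    return {
        "Logical_file_id": cdf_filename[:-len(".cdf")],
        "Logical_source": m["source"],
        "LEVEL": f"{level}>Level {level[1]} Data",
    }
```

Such a helper would guarantee the consistency of these attributes across all reprocessed files.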

CDF variables

Missing CDF variables

  • QUALITY_FLAG

Currently we have no QUALITY_FLAG variable in our CDFs, only a quality_factor variable in our L2 datasets.

It is planned to add QUALITY_FLAG to the L2 files (computed from quality_factor).

Note

What should we do with the PAS L1 files?

Add QUALITY_FLAG with a default value?

Try to define a real quality flag?

  • QUALITY_BITMASK

We are not currently using this QUALITY_BITMASK value.

Note

Do we have to add this QUALITY_BITMASK, with a default value of 0?

And use some of its bits later…
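A hypothetical sketch of how QUALITY_FLAG could be computed from quality_factor: the 0–4 scale, the threshold, and the default for L1 files are placeholders, not an agreed definition:

```python
def quality_flag_from_factor(quality_factor, threshold=0.5):
    """Hypothetical mapping from the PAS quality_factor to QUALITY_FLAG.

    The 0..4 flag scale, the 0.5 threshold, and the default flag for
    files without a quality_factor (e.g. L1) are all assumptions to be
    replaced by the real definition.
    """
    if quality_factor is None:
        # L1 file: no quality_factor available, assumed default "good"
        return 3
    return 3 if quality_factor >= threshold else 1
```

QUALITY_BITMASK could similarly be filled with a default value of 0 everywhere, reserving its bits for later use.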

Variable attributes

We have a lot of attributes that will have to be updated or added.

  • SI_CONVERSION

  • VAR_NOTES

  • FORMAT

  • DISPLAY_TYPE

We have to populate these attributes…
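One way to organize this work is a per-variable template of the attributes to add. Every value below is illustrative, and the "factor>unit" syntax assumed for SI_CONVERSION must be checked against the metadata dictionary:

```python
# Hypothetical attribute template for one variable; all values are
# placeholders to be replaced by the dictionary-compliant ones.
VAR_ATTRS = {
    "Energy": {
        "SI_CONVERSION": "1.60218e-19>J",   # eV to joule, assumed syntax
        "FORMAT": "E12.4",
        "DISPLAY_TYPE": "spectrogram",
        "VAR_NOTES": "Centre of each PAS energy bin (placeholder text)",
    },
}

def check_si_conversion(value):
    """Minimal sanity check of the assumed 'factor>unit' syntax."""
    factor, sep, unit = value.partition(">")
    try:
        float(factor)
    except ValueError:
        return False
    return sep == ">" and bool(unit.strip())
```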

Technical considerations

The use of CDF files implies a full reprocessing of all the CDF files for the whole mission for any modification of the metadata.

We have to create a new CDF file, increment its version number, and deliver this new file to MSSL, then to SOAR.

Reprocessing/patching

Internally, we can easily modify the metadata content of a CDF, WITHOUT creating a new CDF file.

e.g.

from spacepy import pycdf
from datetime import datetime

filename = "solo_L1_swa-pas-mom_20231111_V01.cdf"

with pycdf.CDF(filename, readonly=False) as cdf:

        # add/modify a CDF global attribute
        cdf.attrs["NEW_ATTRIBUTE"] = "add some text here"
        cdf.attrs["Generation_date"] = datetime.now().isoformat(timespec="seconds")

        # add/modify some variable attributes
        cdf["Epoch"].attrs["VAR_NOTES"] = "add another text here"

        # add a new variable, filled with zeros
        # (pycdf.const.CDF_INT does not exist; use an explicit size such as CDF_INT1)
        cdf.new("QUALITY_FLAG", data=[0] * len(cdf["Epoch"]),
                type=pycdf.const.CDF_INT1, recVary=True)

        # rename a variable (uppercase)
        cdf["Epoch"].rename("EPOCH")

# Modifications are written back to the original file
Note

Do you think it would be possible to apply this kind of software patch to the SOAR CDF files, on a given dataset, to make some minor updates?

It could avoid the redelivery of one or more datasets for the whole mission.

Otherwise, we have to make a copy of each CDF file, update the copy and deliver the new one to MSSL ⇒ SOAR

$ cp solo_L1_swa-pas-mom_20231111_V01.cdf solo_L1_swa-pas-mom_20231111_V02.cdf

$ update_metadata solo_L1_swa-pas-mom_20231111_V02.cdf

$ PUT_MSSL solo_L1_swa-pas-mom_20231111_V02.cdf
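The version bump in the file name can be automated; a small sketch (the `bump_version` helper is hypothetical):

```python
import re

def bump_version(cdf_filename):
    """Hypothetical helper: increment the version number in a CDF file
    name, to name the copy that will carry the updated metadata."""
    def repl(m):
        return f"_V{int(m.group(1)) + 1:02d}.cdf"
    new_name, n = re.subn(r"_V(\d{2})\.cdf$", repl, cdf_filename)
    if n != 1:
        raise ValueError(f"unexpected CDF file name: {cdf_filename}")
    return new_name
```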