= SOAR Metadata report for SWA-PAS datasets

== Documents

* Metadata checker logfiles: link:/documents/METADATA/REPORT/pas[]
* Solar Orbiter Observing Plans (SOOPs): https://www.cosmos.esa.int/web/soar/soops[]

== Main issue for SWA-PAS datasets

The main issue for the SWA-PAS CDF datasets concerns the CDF variable names.
Extracted from the Solar Orbiter metadata dictionary V2.5:

~~~~
3.2.1.1 General conventions

The general conventions for the CDF variables for Solar Orbiter are provided in the following list:

* CDF variable description and naming conventions shall be compliant with the ISTP guidelines. In addition, CDF variable names shall contain capital letters only and shall not exceed 63 characters.
~~~~

This rule is a good one, but unfortunately we did not respect it when creating our first CDF files: we use a mix of uppercase, lowercase and capitalized variable names.

As an example, for the solo_L1_swa-pas-3d dataset:

----
Variables: [
'Epoch', 'Duration', 'CCSDS_time', 'SCET', 'SOURCE', 'SAMPLE',
'NB_SAMPLE', 'FIRST_ENERGY', 'NB_ENERGY', 'FIRST_ELEVATION',
'NB_ELEVATION', 'FIRST_CEM', 'NB_CEM', 'INFO', 'SCHEME', 'FULL_3D',
'COMPRESSED', 'MAX_CNT_ENERGY', 'MAX_CNT_ELEVATION', 'MAX_CNT_CEM',
'NB_K', 'K', 'COUNTS', 'Energy', 'delta_p_Energy', 'delta_m_Energy',
'Azimuth', 'Elevation', 'delta_Azimuth', 'delta_Elevation']
----

=== Implications

We can easily modify our variable names to put them in uppercase.
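The rename itself is mechanical. A minimal sketch (pure Python, hypothetical helper name) of deriving the old-to-new mapping, refusing any collision where two existing names would become identical once uppercased:

```python
def uppercase_mapping(varnames):
    """Map each CDF variable name to its uppercase form.

    Raises ValueError if two distinct names would collide after
    uppercasing (e.g. 'Energy' and 'ENERGY' in the same CDF).
    """
    mapping = {}
    seen = {}  # uppercased name -> original name
    for old in varnames:
        new = old.upper()
        if new in seen and seen[new] != old:
            raise ValueError(f"collision: {old!r} and {seen[new]!r} "
                             f"both map to {new!r}")
        seen[new] = old
        mapping[old] = new
    return mapping

# Excerpt of the solo_L1_swa-pas-3d variable list above
names = ["Epoch", "Duration", "CCSDS_time", "SOURCE",
         "delta_p_Energy", "Azimuth"]
print(uppercase_mapping(names)["delta_p_Energy"])  # DELTA_P_ENERGY
```

The collision check matters because CDF variable names are case-sensitive, so nothing prevents a file from containing two names that differ only in case.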
It will imply a full reprocessing and delivery of all the PAS L1/L2 CDFs for the whole mission; but that will be the case anyway for any update of the CDF metadata.

We will also have to modify/update all the software that uses these CDF files:

* produce_L1
+
--
C software that creates the PAS L1 CDFs from L0 telemetry files
--
* produce_L2
+
--
Python software that creates the PAS L2 CDFs from the L1 ones
--
* produce_L3
+
--
IDL + Python software that creates the PAS L3 CDFs from the PAS L2 swa-pas-vdf ones
--
* tools
+
--
Various Python tools to manipulate PAS CDFs (plots, checks, statistics...)
--
* cl software
+
--
IDL software written by Emmanuel Penou to plot our Solar Orbiter datasets and many other experiments
--
* AMDA software
+
--
Multi-mission analysis tool written by IRAP to plot our datasets
--
* SOAR
+
--
Probably some implications in the SOAR SQL databases that describe the SWA-PAS datasets
--

Most of these modifications are easy to make (replacing the CDF variable names by their uppercase values), but the various software packages will probably not accept working with a mix of older and newer datasets.

=> it will be necessary to switch the whole set of PAS CDF files and software at a given time, from older to newer ones +
=> not so easy to do

== Metadata checker

It would be interesting to install a copy of the CDF metadata checker on our computers, to make sure that all rules are satisfied before delivering a new CDF file to MSSL/SOAR.
It would avoid unnecessary round trips between IRAP and the SOAR archive.

[NOTE]
--
Is it possible to have a copy of the metadata checker? Is it a Python tool?
--

Do you think this tool will continue to evolve? If so, it may later detect new discrepancies that would imply another full delivery of our CDFs.
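Until a copy of the official checker is available, part of it is easy to reproduce locally. A minimal sketch (only the naming rule quoted from the metadata dictionary above: capital letters only, at most 63 characters; the helper name is hypothetical) that could be run on a variable name list before delivery:

```python
def check_varnames(varnames, max_len=63):
    """Return {name: [violations]} for CDF variable names that break
    the naming rules quoted above (capital letters only, no more
    than max_len characters)."""
    problems = {}
    for name in varnames:
        issues = []
        if name != name.upper():
            issues.append("not all uppercase")
        if len(name) > max_len:
            issues.append(f"longer than {max_len} characters")
        if issues:
            problems[name] = issues
    return problems

# Names from the solo_L1_swa-pas-3d example above (excerpt)
report = check_varnames(["Epoch", "SCET", "COUNTS", "delta_Azimuth"])
for name, issues in sorted(report.items()):
    print(name, "->", ", ".join(issues))
```

This obviously covers only one of the checker's rules; the full ISTP-compliance checks would still need the official tool.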
450 GB for PAS L1 data, 880 GB for PAS L2.

== Global attributes

=== Missing global attributes

* Software_version: missing for L1 datasets
* TEXT: missing for L1 datasets
* TARGET_NAME
* TARGET_CLASS
* TARGET_REGION
* TIME_MIN: should be extracted from Epoch[0]
* TIME_MAX: should be extracted from Epoch[-1]
* SOOP_NAME
* SOOP_TYPE
* OBS_ID
* LEVEL: missing
* Instrument: missing
* Data_product: missing
* Acknowledgement: empty
* Parents: missing for L1 datasets, OK for L2

There is no technical issue in updating these attributes.

=== Global attributes to update

Some messages of the metadata checker are a bit strange:

----
Source_name is ['SOLO>Solar Orbiter']
From dict, it should be: 'SOLO>Solar Orbiter'
Instrument is ['SWA-PAS>Solar Wind Analyser-Proton Alpha Sensor']
From dict, it should be: 'SWA-PAS>Solar Wind Analyser Proton Alpha Sensor'
----

I don't understand these messages: is it OK or not?
(Perhaps the brackets only mean the checker displays the attribute as a list of entries, in which case the Source_name values are identical; but the two Instrument strings really do differ: "Analyser-Proton" with a hyphen versus "Analyser Proton" with a space.)

Some attributes are to be updated:

* Descriptor

----
Descriptor is ['SWA-PAS>Solar Wind Analyser / Proton-Alpha Sensor']
From dict, it should be: 'SWA-PAS-3D>Solar Wind Analyser, Proton Alpha Sensor, etc'
----

Some attributes are logically dependent:

* CDF filename = "solo_L2_swa-pas-vdf_20231127_V01.cdf"
* Logical_file_id = "solo_L2_swa-pas-vdf_20231127_V01"
* Logical_source = "solo_L2_swa-pas-vdf"
* Instrument = "SWA-PAS>Solar Wind Analyser, Proton-Alpha Sensor"
* Data_type = "L2>Level 2 Data"
* LEVEL = "L2>Level 2 Data"
* Data_product = "VDF>Velocity Distribution Function"
* Descriptor = "SWA-PAS-VDF>SWA PAS Velocity distribution function"

This was not so clear when reading the metadata standard document.
We have to check/update all these global attributes for each of the L1/L2/L3 datasets.

== CDF variables

=== Missing CDF variables

* QUALITY_FLAG

Currently we have no QUALITY_FLAG variable in our CDFs, but a quality_factor variable in our L2 datasets.
It is planned to add QUALITY_FLAG in the L2 files (computed from quality_factor).

[NOTE]
--
What to do with the PAS L1 files?
Add QUALITY_FLAG with a default value? Try to define a real quality flag?
--

* QUALITY_BITMASK

We are not currently using this QUALITY_BITMASK value.

[NOTE]
--
Do we have to add this QUALITY_BITMASK, with a default value of 0, and use some of its bits later?
--

== Variable attributes

We have a lot of variable attributes that will have to be updated or added:

* SI_CONVERSION
* VAR_NOTES
* FORMAT
* DISPLAY_TYPE

We have to fill in these attributes...

== Technical considerations

The use of CDF files implies a full reprocessing of all the CDF files for the whole mission for any modification of the metadata.
We have to create a new CDF file, incrementing its version number, and deliver this new CDF file to MSSL, then to SOAR.

=== Reprocessing/patching

Internally, we can easily modify the content of the CDF metadata WITHOUT creating a new CDF file, e.g.:

[source, python]
----
from datetime import datetime
from spacepy import pycdf

filename = "solo_L1_swa-pas-mom_20231111_V01.cdf"

with pycdf.CDF(filename, readonly=False) as cdf:

    # add/modify a CDF global attribute
    cdf.attrs["NEW_ATTRIBUTE"] = "add some text here"
    cdf.attrs["Generation_date"] = datetime.now().isoformat(timespec="seconds")

    # add/modify a variable attribute
    cdf["Epoch"].attrs["VAR_NOTES"] = "add another text here"

    # add a new variable; pycdf.const has no generic CDF_INT type,
    # an explicit width such as CDF_INT1 must be given
    cdf.new("QUALITY_FLAG", recVary=True, type=pycdf.const.CDF_INT1)
    cdf["QUALITY_FLAG"] = [0] * len(cdf["Epoch"])

    # rename a variable (uppercase)
    cdf["Epoch"].rename("EPOCH")

# modifications are written back into the original file
----

[NOTE]
--
Do you think it would be possible to apply this kind of software patch to the SOAR CDF files, on a given dataset, to make some minor updates?
It could avoid the redelivery of one or more datasets for the whole mission.
--

Otherwise, we have to make a copy of each CDF file, update the copy, and deliver the new one to MSSL => SOAR:

----
$ cp solo_L1_swa-pas-mom_20231111_V01.cdf solo_L1_swa-pas-mom_20231111_V02.cdf
$ update_metadata solo_L1_swa-pas-mom_20231111_V02.cdf
$ PUT_MSSL solo_L1_swa-pas-mom_20231111_V02.cdf
----
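As a concrete example of what such an update script would compute, the missing TIME_MIN/TIME_MAX global attributes listed above can be derived from the Epoch variable. A minimal sketch (pure Python on a list of datetimes; the ISO 8601 output format is an assumption to be checked against the metadata dictionary, and reading Epoch from a real file would go through pycdf as in the patching example above):

```python
from datetime import datetime

def epoch_time_range(epochs):
    """Return (TIME_MIN, TIME_MAX) as ISO 8601 strings from Epoch values.

    Uses min()/max() rather than Epoch[0]/Epoch[-1], so the result is
    correct even if the records are not strictly time-ordered.
    NOTE: the ISO 8601 format here is an assumption; the metadata
    dictionary should be checked for the exact expected representation.
    """
    if not epochs:
        raise ValueError("empty Epoch variable")
    return min(epochs).isoformat(), max(epochs).isoformat()

epochs = [datetime(2023, 11, 11, 0, 0, 4),
          datetime(2023, 11, 11, 0, 0, 0),
          datetime(2023, 11, 11, 23, 59, 56)]
tmin, tmax = epoch_time_range(epochs)
print(tmin, tmax)  # 2023-11-11T00:00:00 2023-11-11T23:59:56
```

With pycdf, `cdf["Epoch"][...]` yields Python datetimes directly, so the same function applies unchanged before writing the two attributes into `cdf.attrs`.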