SOAR Metadata report for SWA-PAS datasets
=========================================

== Documents

* Metadata checker logfiles:

link:/documents/METADATA/REPORT/pas[]

* Solar Orbiter Observing Plans (SOOPs)

https://www.cosmos.esa.int/web/soar/soops[]

== Main issue for SWA-PAS datasets

The main issue for SWA-PAS CDF datasets concerns CDF variable names.

Extracted from the Solar Orbiter metadata dictionary V2.5:

----
3.2.1.1 General conventions

The general conventions for the CDF variables for Solar Orbiter are provided in the following list:

*  CDF variable description and naming conventions shall be compliant with the ISTP
guidelines. In addition, CDF variable names shall contain capital letters only and shall not exceed 63 characters.
----

This rule is a good one, but unfortunately we did not follow it when creating our first CDF files.

We are using a mix of uppercase, lowercase, and capitalized variable names.

As an example, for the solo_L1_swa-pas-3d dataset:

----
Variables: [
	'Epoch', 
	'Duration', 'CCSDS_time', 'SCET', 
	'SOURCE', 'SAMPLE', 'NB_SAMPLE', 
	'FIRST_ENERGY', 'NB_ENERGY', 
	'FIRST_ELEVATION', 'NB_ELEVATION',
	'FIRST_CEM', 'NB_CEM',
	'INFO', 'SCHEME', 'FULL_3D', 'COMPRESSED',
	'MAX_CNT_ENERGY', 'MAX_CNT_ELEVATION', 'MAX_CNT_CEM',
	'NB_K', 'K',
	'COUNTS',
	'Energy', 'delta_p_Energy', 'delta_m_Energy',
	'Azimuth', 'Elevation',
	'delta_Azimuth', 'delta_Elevation'] 
----

=== Implications

We can easily convert our variable names to uppercase.
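For instance, with spacepy's pycdf (the library used later in this report), a minimal renaming sketch, where the file name is just an example:

[source, python]
----
from spacepy import pycdf

filename = "solo_L1_swa-pas-3d_20231127_V01.cdf"

with pycdf.CDF (filename, readonly = False) as cdf:

	# snapshot the names first: renaming while iterating over
	# the live mapping would invalidate the iterator
	for name in list (cdf.keys ()):
		if name != name.upper ():
			cdf[name].rename (name.upper ())
----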

This implies a full reprocessing and redelivery of all PAS L1/L2 CDFs for the whole mission,
but that would be the case anyway for any update of the CDF metadata.

We will also have to modify/update all the software that uses these CDF files:

* produce_L1
+
--
C software that creates PAS L1 CDFs from L0 telemetry files
--

* produce_L2 
+
--
Python software that creates PAS L2 CDFs from L1 ones
--

* produce_L3
+
--
IDL + Python software that creates PAS L3 CDFs from PAS L2 swa-pas-vdf ones
--

* tools
+
--
Various Python tools to manipulate PAS CDFs (plots, checks, statistics...)
--

* cl software
+
--
IDL software written by Emmanuel Penou to plot our Solar Orbiter datasets,
as well as data from many other experiments
--

* AMDA software
+
--
Multi-mission analysis tool developed by IRAP to plot our datasets
--

* SOAR
+
--
There are probably some implications for the SOAR SQL databases that describe the SWA-PAS datasets
--

Most of these modifications are easy to make (replacing CDF variable names with their uppercase equivalents),
but the various software packages will probably not accept a mix of older and newer datasets.

=> it will be necessary to switch all the PAS CDF files and software from the old naming to the new one at the same time

=> not so easy to do


== Metadata checker

It would be useful to install a copy of the CDF metadata checker on our own computers,
to make sure that all rules are satisfied before delivering a new CDF file to MSSL/SOAR.

This would avoid unnecessary round trips between IRAP and the SOAR archive.

[NOTE]
--
Is it possible to have a copy of the metadata checker?

Is it a Python tool?
--

Do you think this tool will continue to evolve?

If so, it may later detect new discrepancies that would imply another full delivery of our CDFs.

450 GB for PAS L1 data, 880 GB for PAS L2.



== Global attributes

=== Missing global attributes

* Software_version: missing for L1 datasets
* TEXT: missing for L1 datasets
* TARGET_NAME
* TARGET_CLASS
* TARGET_REGION
* TIME_MIN: should be extracted from Epoch[0] (see the sketch after this list)
* TIME_MAX: should be extracted from Epoch[-1]
* SOOP_NAME
* SOOP_TYPE
* OBS_ID
* LEVEL: missing
* Instrument: missing
* Data_product: missing
* Acknowledgement: empty
* Parents: missing for L1 datasets, OK for L2

There is no technical issue in updating these attributes.
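For TIME_MIN/TIME_MAX, a minimal pycdf sketch (the file name is just an example, and the expected format, ISO time string or Julian date, should be checked against the metadata dictionary):

[source, python]
----
from spacepy import pycdf

filename = "solo_L2_swa-pas-vdf_20231127_V01.cdf"

with pycdf.CDF (filename, readonly = False) as cdf:

	# pycdf returns Epoch records as datetime objects
	epoch = cdf["Epoch"][...]

	# assumption: ISO 8601 strings are accepted here
	cdf.attrs["TIME_MIN"] = epoch[0].isoformat()
	cdf.attrs["TIME_MAX"] = epoch[-1].isoformat()
----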

=== Global attributes to update

Some messages from the metadata checker are a bit strange:

----
Source_name is ['SOLO>Solar Orbiter']
From dict, it should be: 'SOLO>Solar Orbiter'

Instrument is ['SWA-PAS>Solar Wind Analyser-Proton Alpha Sensor']
From dict, it should be: 'SWA-PAS>Solar Wind Analyser Proton Alpha Sensor'
----

I don't understand these messages. Is it OK or not?


Some attributes are to be updated:

* Descriptor

----
Descriptor is ['SWA-PAS>Solar Wind Analyser / Proton-Alpha Sensor']
From dict, it should be: 'SWA-PAS-3D>Solar Wind Analyser, Proton Alpha Sensor, etc'
----



Some attributes are logically dependent:

* CDF filename = "solo_L2_swa-pas-vdf_20231127_V01.cdf"

* Logical_file_id = "solo_L2_swa-pas-vdf_20231127_V01"

* Logical_source = "solo_L2_swa-pas-vdf"

* Instrument = "SWA-PAS>Solar Wind Analyser, Proton-Alpha Sensor"

* Data_type = "L2> Level 2 Data"

* LEVEL = "L2>Level 2 Data"

* Data_product = "VDF>Velocity Distibution function"

* Descriptor = "SWA-PAS-VDF>SWA PAS Velocity distribution function"


This was not so clear when reading the metadata standard document.

We have to check/update all these global attributes for each L1/L2/L3 dataset.
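As a starting point, most of these dependent attributes could be derived from the filename itself. Below is a hypothetical helper (the function name is ours, and the generated values would still have to be checked against the metadata dictionary):

[source, python]
----
import os.path

def dependent_attributes (filename):
	"""Derive the logically dependent global attributes from a CDF
	filename such as solo_L2_swa-pas-vdf_20231127_V01.cdf"""

	logical_file_id = os.path.splitext (os.path.basename (filename))[0]

	# e.g. solo / L2 / swa-pas-vdf / 20231127 / V01
	source, level, descriptor = logical_file_id.split ("_")[:3]

	return {
		"Logical_file_id": logical_file_id,
		"Logical_source": f"{source}_{level}_{descriptor}",
		"Data_type": f"{level}>Level {level[1:]} Data",
		"LEVEL": f"{level}>Level {level[1:]} Data",
		# the human-readable part after '>' must be taken from
		# the metadata dictionary, not generated
		"Descriptor": descriptor.upper (),
	}
----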


== CDF variables

=== Missing CDF variables

* QUALITY_FLAG

Currently we have no QUALITY_FLAG variable in our CDFs, but a quality_factor variable in our L2 datasets.

It is planned to add QUALITY_FLAG to the L2 files (computed from quality_factor).
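A minimal sketch of such a computation, assuming QUALITY_FLAG uses the usual 0 (bad) to 4 (excellent) scale; the thresholds, and even the direction of the quality_factor scale, are placeholders, not validated values:

[source, python]
----
import numpy as np

def quality_flag (quality_factor):
	"""Map the PAS quality_factor onto a 0 (bad) .. 4 (excellent)
	QUALITY_FLAG scale. The thresholds are placeholders."""

	qf = np.asarray (quality_factor)

	flags = np.full (qf.shape, 4, dtype = np.uint8)	# 4 = excellent
	flags[qf > 0.2] = 2				# placeholder: degraded
	flags[qf > 0.5] = 0				# placeholder: bad

	return flags
----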

[NOTE]
--
What to do with the PAS L1 files?

Add QUALITY_FLAG with a default value?

Try to define a real quality flag?
--

* QUALITY_BITMASK

We are not currently using this QUALITY_BITMASK variable.

[NOTE]
--
Do we have to add this QUALITY_BITMASK, with a default value of 0?

And define some of these bits later, as sketched below...
--
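If so, a possible starting point would be to reserve a few bits up front. The bit assignments below are purely hypothetical; nothing is defined yet, so delivered files would carry the default value 0:

[source, python]
----
from enum import IntFlag

class PASQualityBitmask (IntFlag):
	"""Hypothetical bit assignments for QUALITY_BITMASK"""
	SATURATED = 1 << 0
	LOW_STATISTICS = 1 << 1
	CALIBRATION_SUSPECT = 1 << 2
----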


== Variable attributes

We have a lot of variable attributes that will have to be updated or added:

* SI_CONVERSION
* VAR_NOTES
* FORMAT
* DISPLAY_TYPE

We will have to fill in these attributes, for example as sketched below.
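A minimal pycdf sketch for the Energy variable; the attribute names are standard ISTP ones, but all the values below are illustrative and must be replaced by the official ones from the metadata dictionary:

[source, python]
----
from spacepy import pycdf

filename = "solo_L2_swa-pas-vdf_20231127_V01.cdf"

with pycdf.CDF (filename, readonly = False) as cdf:

	attrs = cdf["Energy"].attrs

	attrs["SI_CONVERSION"] = "1.602176634e-19>J"	# eV to joules
	attrs["FORMAT"] = "E12.4"
	attrs["DISPLAY_TYPE"] = "spectrogram"
	attrs["VAR_NOTES"] = "Centre energy of each PAS energy bin"
----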


== Technical considerations

With CDF files, any modification of the metadata implies a full reprocessing of all the files
for the whole mission.

We have to create a new CDF file, increment its version number, and deliver the new file to MSSL, then to SOAR.

=== Reprocessing/patching 

Internally, we can easily modify a CDF file's metadata in place, WITHOUT creating a new CDF file.

.Example
[source, python]
----
from spacepy import pycdf
from datetime import datetime

filename = "solo_L1_swa-pas-mom_20231111_V01.cdf"

with pycdf.CDF (filename, readonly = False) as cdf:

	# add/modify a CDF global attribute

	cdf.attrs["NEW_ATTRIBUTE"] = "add some text here"

	cdf.attrs["Generation_date"] = datetime.now().isoformat(timespec="seconds")

	# add/modify some variable attribute

	cdf["Epoch"].attrs["VAR_NOTES"] = "add another text here"

	# add a new variable

	cdf.new ("QUALITY_FLAG", recVary = True, type = pycdf.const.CDF_INT)

	cdf["QUALITY_FLAG"] = [0] * len (cdf["Epoch"])

	# rename a variable (uppercase)

	cdf["Epoch"].rename ("EPOCH")


# Modifications are written back to the original file
----

[NOTE]
--
Do you think it would be possible to apply this kind of software patch to the SOAR CDF files, on a given dataset, to make some minor updates?

It would avoid the redelivery of one or more datasets for the whole mission.
--

Otherwise, we have to make a copy of each CDF file, update the copy, and deliver the new one to MSSL => SOAR:

----
$ cp solo_L1_swa-pas-mom_20231111_V01.cdf solo_L1_swa-pas-mom_20231111_V02.cdf

$ update_metadata solo_L1_swa-pas-mom_20231111_V02.cdf

$ PUT_MSSL solo_L1_swa-pas-mom_20231111_V02.cdf
----