mzMLmeta

class mzml2isa.mzml.mzMLmeta(in_file, ontology=None, complete_parse=False)

Class to store and obtain the meta information from the mzML file

The class uses the xpaths of mzML locations to extract meta information found at those locations. What is extracted is determined by the ontology terms and by a set of rules associated with each term, e.g. whether it can be repeated, whether it has associated software, and whether it has a value as well as a name.
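As a rough illustration of the xpath-then-cvParam idea described above, the following sketch uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. The helper name collect_cv_params, the "s" namespace prefix and the shape of the returned entries are assumptions made for this example and are not taken from mzml2isa.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written mzML-like fragment, for illustration only.
MZML = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <fileDescription>
    <fileContent>
      <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" value=""/>
      <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
    </fileContent>
  </fileDescription>
</mzML>"""

# Hypothetical namespace mapping; the "s" prefix is an assumption.
NS = {"s": "http://psi.hupo.org/ms/mzml"}


def collect_cv_params(root, xpath, ns):
    """Return name/accession (and value, if non-empty) of each cvParam at xpath."""
    found = []
    for element in root.findall(xpath, ns):
        for cv in element.findall("s:cvParam", ns):
            entry = {"name": cv.get("name"), "accession": cv.get("accession")}
            if cv.get("value"):  # only keep a value when one is actually given
                entry["value"] = cv.get("value")
            found.append(entry)
    return found


root = ET.fromstring(MZML)
params = collect_cv_params(root, "./s:fileDescription/s:fileContent", NS)
```

The real class additionally filters by ontology term and applies per-term rules; none of that is modelled here.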

tree

lxml.etree.ElementTree – the tree object created from the mzML file

ns

dict – a dictionary containing the xml namespace mapping

obo

pronto.Ontology – the ontology object

meta

dict – structured dictionary containing extracted metadata

env

dict – the environment variables, i.e. tag names that are not standardized across different mzML files.

build_env()

Build the env and the ns dictionaries.
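A minimal sketch of how a namespace mapping could be derived from the root element is shown here. It covers only the ns part; the function name namespace_of and the "s" prefix are assumptions for illustration, and the way build_env fills env is not reproduced.

```python
import xml.etree.ElementTree as ET


def namespace_of(root):
    """Return the namespace URI from a '{uri}tag' style root tag, or ''."""
    tag = root.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


root = ET.fromstring('<mzML xmlns="http://psi.hupo.org/ms/mzml"/>')
ns = {"s": namespace_of(root)}  # hypothetical prefix "s"
```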

cvParam_loop(elements, location_name, terms)

Loop through the elements and update the self.meta dict where applicable.

Parameters:
  • elements (iterator) – the elements containing the cvParam tags
  • location_name (str) – Name of the xml location
  • terms (dict) – terms that are to be extracted

data_file_content()

Extract the Data file content from all scans.

This method is called only when the FileContent xml Element contains no actual cvParam elements. This was observed in at least one file from a Waters instrument.

derived()

Get the derived meta information

The derived meta information includes all tags that are solely based on the file name, such as MS Assay Name, Derived Spectral Data File or Sample Name.
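The idea of deriving fields purely from the file name can be sketched as follows. The function name derive_from_filename and the flat string values are assumptions for this example; the actual structure stored in self.meta may differ.

```python
import os


def derive_from_filename(in_file):
    """Build file-name-based fields from a path (illustrative structure only)."""
    base = os.path.basename(in_file)   # e.g. "run_01.mzML"
    stem = os.path.splitext(base)[0]   # e.g. "run_01"
    return {
        "MS Assay Name": stem,
        "Derived Spectral Data File": base,
        "Sample Name": stem,
    }


derived = derive_from_filename("/data/study/run_01.mzML")
```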

extract_meta(terms, xpaths)

Extract meta information for CV terms based on their location in the xml file

Updates the self.meta dictionary with the relevant meta information

Parameters:
  • terms (dict) – The CV and search parameters required at the xml locations
  • xpaths (dict) – the xpath locations to search

See also

cvParam_loop

merge_entries(name)

An unoptimized way of merging meta entries that consist only of duplicates.

This is only useful when the spectrum_meta method is called, as a way of reducing the size of some meta entries that add no interesting information (for instance, when all binary data is compressed the same way, it is useless to know that for each scan).

Parameters: name (str) – the entry to de-duplicate
Returns: the entry list with duplicates removed
Return type: list

Note

Using an OrderedSet to deduplicate while preserving order may be a better choice for an actual implementation (see http://code.activestate.com/recipes/576694/).
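A simple order-preserving de-duplication can be sketched like this. It is not the library's code: the function name merge_duplicates and the dict-shaped entries are assumptions. Membership is tested against a list rather than a set so that unhashable entries such as dicts are accepted, at the cost of quadratic time.

```python
def merge_duplicates(entries):
    """Return entries with duplicates removed, keeping first-seen order."""
    unique = []
    for entry in entries:
        if entry not in unique:  # equality check, works for dicts too
            unique.append(entry)
    return unique


compression = [
    {"name": "zlib compression"},
    {"name": "zlib compression"},
    {"name": "no compression"},
    {"name": "zlib compression"},
]
merged = merge_duplicates(compression)
```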

meta_isa

Returns the metadata dictionary with actual ISA headers

meta_json

Returns the metadata dictionary in json format

mzrange()

Try to extract the m/z range of all scans.

polarity()

Iterates over all scans to get the average polarity.
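How the "average" polarity is computed is not stated here. One plausible reading, sketched below as an assumption, is to report a single polarity when every scan agrees and "alternating" when both occur. The function name summarize_polarity and the returned labels are hypothetical; the accessions are the PSI-MS terms for negative scan (MS:1000129) and positive scan (MS:1000130).

```python
POSITIVE_SCAN = "MS:1000130"
NEGATIVE_SCAN = "MS:1000129"


def summarize_polarity(scan_accessions):
    """Summarize per-scan polarity accessions into one label (assumed labels)."""
    seen = set(scan_accessions)
    has_pos = POSITIVE_SCAN in seen
    has_neg = NEGATIVE_SCAN in seen
    if has_pos and has_neg:
        return "alternating"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return None  # no polarity information found


label = summarize_polarity(["MS:1000130", "MS:1000129", "MS:1000130"])
```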

scan_num()

Extract the total number of scans.

software(soft_ref, name)

Get associated software of cv term. Updates the self.meta dictionary

Parameters:
  • soft_ref (str) – the reference to the software found in another xml “ref” attribute.
  • name (str) – Name of the CV term that the software is associated with.

spectrum_meta()

Extract information of each spectrum in entry lists.

This method is only called if the complete_parse parameter was set to True when the mzMLmeta object was created. It requires more time, since iterating through hundreds of elements is bound to be slower than iterating through just a few. It is believed to be useful when mzml2isa is used as a parsing library.

timerange()

Try to extract the Time range of all the scans.

The time range consists of the smallest and largest start times of the successive scans. The unit (most of the time minute) will be extracted as well if possible.
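Once the scan start times have been collected, the range computation itself is small. The sketch below assumes the times are already parsed into floats; the function name time_range and the returned dict layout are illustrative and not the library's own.

```python
def time_range(start_times, unit=None):
    """Return the smallest and largest scan start time, plus the unit if known."""
    if not start_times:
        return None  # nothing to summarize
    result = {"min": min(start_times), "max": max(start_times)}
    if unit is not None:
        result["unit"] = unit
    return result


rng = time_range([0.5, 0.1, 2.3], unit="minute")
```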

urlize()

Urlize all accessions within the meta dictionary
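The URL pattern used by urlize is not given here. As a sketch only, the following assumes the OBO Foundry PURL convention, in which an accession such as MS:1000031 maps to http://purl.obolibrary.org/obo/MS_1000031. The function name accession_to_url is hypothetical.

```python
def accession_to_url(accession):
    """Turn 'PREFIX:ID' into an OBO-style PURL; return other strings unchanged."""
    prefix, _, local_id = accession.partition(":")
    if not prefix or not local_id:
        return accession  # not a 'PREFIX:ID' accession, leave as is
    return "http://purl.obolibrary.org/obo/{}_{}".format(prefix, local_id)


url = accession_to_url("MS:1000031")
```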