mzMLmeta(in_file, ontology=None, complete_parse=False)
Class to store and obtain the meta information from an mzML file.
The class uses the XPaths of mzML locations and then extracts meta information at these locations. Which meta information is taken is determined by the ontology terms and a set of rules associated with each term, e.g. whether it can be repeated, whether it has associated software, and whether it has a value as well as a name.
lxml.etree.ElementTree – the tree object created from the mzML file
dict – a dictionary containing the XML namespace mapping
pronto.Ontology – the ontology object
dict – structured dictionary containing extracted metadata
dict – the environment variables, i.e. tag names that are not standard across different mzML files.
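The rule-driven extraction described above can be sketched as follows. This is a minimal illustration, not mzml2isa's implementation: it uses the standard-library xml.etree.ElementTree instead of lxml, and the rule table (RULES, its keys and flags) and the function name extract are hypothetical.

```python
import xml.etree.ElementTree as ET

# mzML 1.1 namespace URI, bound to the prefix "m" for the XPaths below.
NS = {"m": "http://psi.hupo.org/ms/mzml"}

# Hypothetical rule table: a location name, the XPath where its cvParams live,
# and two of the per-term rules mentioned above (repeatable, has a value).
RULES = {
    "instrument": {"xpath": ".//m:instrumentConfiguration/m:cvParam",
                   "repeatable": False, "has_value": False},
    "data_processing": {"xpath": ".//m:processingMethod/m:cvParam",
                        "repeatable": True, "has_value": True},
}


def extract(root, rules=RULES, ns=NS):
    """Collect name/accession (and value, when allowed) for each location."""
    meta = {}
    for location, rule in rules.items():
        entries = []
        for cv in root.findall(rule["xpath"], ns):
            entry = {"name": cv.get("name"), "accession": cv.get("accession")}
            if rule["has_value"] and cv.get("value"):
                entry["value"] = cv.get("value")
            entries.append(entry)
            if not rule["repeatable"]:
                break  # non-repeatable terms: keep only the first match
        if entries:
            meta[location] = entries
    return meta
```

Keeping the rules in data rather than in code is one way to let new locations be supported without touching the extraction loop.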
Build the env and the ns dictionaries.
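A sketch of how the namespace mapping could be derived from the parsed root element, assuming ElementTree's "{uri}localname" tag convention; the prefix "m" and the function name build_ns are illustrative choices, and the env dictionary is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_ns(root):
    """Return a {prefix: uri} mapping for the root element's namespace."""
    # ElementTree stores qualified tags as "{uri}localname"; this works for
    # both <mzML> and <indexedmzML> roots, which share the same namespace.
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return {"m": uri}
    return {}  # no namespace declared on the root
```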
cvParam_loop(elements, location_name, terms)
Loop through the elements and, where applicable, update the self.meta dict.
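A hypothetical stand-alone version of such a loop, assuming terms is a collection of accessions relevant to the location and that matches are appended to a list under location_name; the plain meta argument stands in for self.meta.

```python
import xml.etree.ElementTree as ET


def cvparam_loop(meta, elements, location_name, terms):
    """Record every cvParam whose accession is in terms (updates meta in place).

    meta          -- dict playing the role of self.meta (assumption)
    elements      -- iterable of cvParam elements
    location_name -- key under which matching entries are stored
    terms         -- accessions considered relevant for this location
    """
    for cv in elements:
        accession = cv.get("accession")
        if accession not in terms:
            continue  # skip terms the rules do not cover
        entry = {"name": cv.get("name"), "accession": accession}
        if cv.get("value"):
            entry["value"] = cv.get("value")
        meta.setdefault(location_name, []).append(entry)
    return meta
```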
Extract the data file content from all scans.
This method is called only when the FileContent XML element contains no actual cvParam elements. This was observed in at least one file from a Waters instrument.
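One possible fallback, sketched under the assumption that the file-content terms can be read from the cvParams attached directly to each spectrum element; content_terms and file_content_from_scans are hypothetical names, and duplicates are dropped in order of first appearance.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://psi.hupo.org/ms/mzml"}


def file_content_from_scans(root, content_terms, ns=NS):
    """Gather distinct file-content cvParams declared on individual spectra.

    content_terms: accessions treated as "file content" terms (assumption).
    """
    seen = {}  # accession -> name, insertion-ordered
    for cv in root.iterfind(".//m:spectrum/m:cvParam", ns):
        acc = cv.get("accession")
        if acc in content_terms and acc not in seen:
            seen[acc] = cv.get("name")
    return [{"accession": a, "name": n} for a, n in seen.items()]
```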
Get the derived meta information
The derived meta information includes all tags that are solely based on the file name, such as MS Assay Name, Derived Spectral Data File or Sample Name.
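A sketch of file-name-based derivation. The field names come from the description above, but the exact rules (basename for the data file, basename without extension for the assay and sample names) and the flat output shape are assumptions.

```python
import os


def derived_meta(in_file):
    """Derive file-name-based fields (derivation rules are assumptions)."""
    base = os.path.basename(in_file)      # e.g. "run01.mzML"
    stem = os.path.splitext(base)[0]      # e.g. "run01"
    return {
        "Derived Spectral Data File": base,
        "MS Assay Name": stem,
        "Sample Name": stem,
    }
```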
Extract meta information for CV terms based on their location in the XML file.
Updates the self.meta dictionary with the relevant meta information.
An unoptimized way of merging meta entries made only of duplicates.
This is only useful when the spectrum_meta method is called, as a way of reducing the size of some meta entries that add no interesting information (for instance, when all binary data is compressed the same way, it is useless to know that for each scan).
Parameters: name (str) – the entry to de-duplicate
Returns: the list of the list with deduplicated arguments
Return type: list
Using an OrderedSet to deduplicate while preserving order may be a good idea (see http://code.activestate.com/recipes/576694/ for an actual implementation).
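An order-preserving de-duplication can be done without the OrderedSet recipe by tracking seen keys in a set. This sketch assumes each entry is a flat dict with hashable values, so frozenset(entry.items()) can serve as its key; merge_duplicates is a hypothetical name.

```python
def merge_duplicates(entries):
    """Drop repeated entries while keeping the first-seen order."""
    seen = set()
    unique = []
    for entry in entries:
        key = frozenset(entry.items())  # assumes flat dicts with hashable values
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

For entries that are already hashable (strings, tuples), list(dict.fromkeys(items)) gives the same result, since dicts preserve insertion order in Python 3.7+.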
Returns the metadata dictionary with actual ISA headers.
Returns the metadata dictionary in JSON format.
Try to extract the m/z range of all scans.
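Once the lower and upper limits of each scan window have been read, the overall m/z range reduces to a min/max. This sketch assumes the limits are already parsed into (lower, upper) float pairs; mz_range is a hypothetical name.

```python
def mz_range(windows):
    """Return (lowest lower limit, highest upper limit) over all scans.

    windows: iterable of (lower, upper) floats, one per scan window
    (assumed already parsed); returns None when no window is found.
    """
    windows = list(windows)
    if not windows:
        return None
    return min(lo for lo, _ in windows), max(hi for _, hi in windows)
```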
Iterates over all scans to get the average polarity.
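A sketch of summarising per-scan polarities into a single label. Reporting "alternating" when both polarities occur is an assumption of this example, as is the input format (one "positive"/"negative" string, or None, per scan).

```python
from collections import Counter


def overall_polarity(polarities):
    """Summarise per-scan polarities into one label (labels are assumptions)."""
    counts = Counter(p for p in polarities if p in ("positive", "negative"))
    if not counts:
        return None            # no scan declared a polarity
    if len(counts) == 2:
        return "alternating"   # both polarities present in the file
    return next(iter(counts))  # the single polarity seen
```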
Extract the total number of scans.
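In mzML, the spectrumList element carries a count attribute, so the total can usually be read without iterating. This sketch trusts that attribute and falls back to counting spectrum elements when it is absent; scan_count is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://psi.hupo.org/ms/mzml"}


def scan_count(root, ns=NS):
    """Number of spectra: use spectrumList/@count, else count the elements."""
    spectrum_list = root.find(".//m:spectrumList", ns)
    if spectrum_list is None:
        return 0
    count = spectrum_list.get("count")
    if count is not None:
        return int(count)
    return len(spectrum_list.findall("m:spectrum", ns))
```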
Get the software associated with a CV term and update the self.meta dictionary.
Extract information of each spectrum in entry lists.
This method is only called if the complete_parse parameter was set to True when the mzMLmeta object was created. It takes more time, since iterating through hundreds of elements is bound to be more performance-hungry than iterating through just a few. It is believed to be useful when mzml2isa is used as a parsing library.
Try to extract the Time range of all the scans.
The time range consists of the smallest and largest times at which the successive scans were started. The unit (minutes, most of the time) is also extracted if possible.
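A sketch of the time-range computation, assuming each scan start time has already been read as a (value, unit) pair. The unit is reported only when all scans agree on it; no unit conversion is attempted. time_range is a hypothetical name.

```python
def time_range(start_times):
    """Smallest and largest scan start time, plus the unit if consistent.

    start_times: iterable of (value, unit) pairs; unit may be None.
    Values are compared as-is, without converting between units.
    """
    start_times = list(start_times)
    if not start_times:
        return None
    values = [float(v) for v, _ in start_times]
    units = {u for _, u in start_times if u}
    unit = units.pop() if len(units) == 1 else None  # None if missing or mixed
    return min(values), max(values), unit
```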
Convert all accessions within the meta dictionary into URLs.
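A sketch of the conversion that walks the nested meta structure and rewrites every "accession" value. The OBO Foundry PURL pattern (e.g. MS:1000031 to http://purl.obolibrary.org/obo/MS_1000031) is an assumed target scheme, and the helper names are hypothetical.

```python
import re

# Assumed URL scheme: OBO Foundry PURLs, PREFIX:digits -> .../obo/PREFIX_digits.
OBO_PURL = "http://purl.obolibrary.org/obo/{}_{}"
ACCESSION = re.compile(r"^([A-Za-z]+):(\d+)$")


def accession_to_url(accession):
    """Turn "PREFIX:digits" into a PURL; leave anything else unchanged."""
    match = ACCESSION.match(accession)
    return OBO_PURL.format(*match.groups()) if match else accession


def urlize(meta):
    """Return a copy of meta where every "accession" string is a URL."""
    if isinstance(meta, dict):
        return {
            key: accession_to_url(value)
            if key == "accession" and isinstance(value, str)
            else urlize(value)
            for key, value in meta.items()
        }
    if isinstance(meta, list):
        return [urlize(item) for item in meta]
    return meta
```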