data_management.scrape_bse
This script uses a web scraper to download all basis sets from the Basis Set Exchange (BSE) in the specified output format.
Usage
usage: scrape_bse.py [-h] [-o OUTFORMAT] [-g] destination

positional arguments:
  destination           Destination directory for basis set files.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTFORMAT, --outformat OUTFORMAT
                        Output format. (Default: NWChem)
  -g, --optimize_general
                        Toggle on optimizing general contractions. Default OFF.
This script creates the following files in the given destination:
+---destination
| <all_basis_set_files>
Classes
BSEBasisSetScraper | Web scraper to download basis sets from the Basis Set Exchange (BSE).
Functions
_write_basis_set(destination, basis_name, basis_data, extension) | Write the basis set out to a file.
main(args) | Entry point function to generate basis set files.
parse_args() | Parse command line arguments.
Module Contents
- class data_management.scrape_bse.BSEBasisSetScraper(base_url: str = 'https://www.basissetexchange.org', user_agent: str = 'NWChemEx BSE Basis Set Scraper', email: str = '', format: str = 'nwchem', uncontract_general: bool = False, uncontract_segmented: bool = False, uncontract_spdf: bool = False, optimize_general: bool = False, make_general: bool = False, header_toggle: bool = True)
Web scraper to download basis sets from the Basis Set Exchange (BSE).
- base_url
- filtered_basis_sets
- filtered_metadata
- filters
- valid_formats
- default_header_toggle
- default_make_general
- default_optimize_general
- default_uncontract_general
- default_uncontract_segmented
- default_uncontract_spdf
- add_filter(metadata_key: str, values: list) None
Add a metadata filter to the basis set list and update the filtered basis set list and metadata.
This function adds filters to the list of valid basis sets contained by this class based on metadata values scraped from BSE. If filters already exist for the metadata key given, the new values will be appended to the existing filter value list. Values must match exactly!
When multiple filter values exist for a metadata key, basis sets are guaranteed to contain at least one of the filter values, but not necessarily all filter values for the metadata key. However, filter values of different metadata keys are applied sequentially, so the filtered basis sets must contain at least one of the filter values for each metadata key.
For example:
    scraper.add_filter("family", ["pople", "dunning"])
    scraper.add_filter("role", ["orbital", "optri"])
will filter to all basis sets that are of either the “pople” or “dunning” families, but only if they have a role of “orbital” or “optri”.
The filtered basis set names are available from the data member filtered_basis_sets, and the full filtered metadata from filtered_metadata.
- Parameters:
metadata_key (str) – Key for the desired value in basis set metadata.
values (list) – Values of the metadata to filter by.
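The filter semantics described above (any value within one metadata key, every key across keys) can be sketched without touching the network. This is a minimal illustration, not the class's actual implementation: the helper names `add_filter` and `apply_filters`, the shape of the metadata dictionary, and the sample entries are all assumptions made for the example.

```python
def add_filter(filters, metadata_key, values):
    """Append values to the filter list for a key, creating it if needed."""
    filters.setdefault(metadata_key, []).extend(values)


def apply_filters(metadata, filters):
    """Keep entries that match at least one value for every filtered key."""
    kept = {}
    for name, info in metadata.items():
        matches_all_keys = True
        for key, allowed in filters.items():
            value = info.get(key)
            # Assumption: a metadata value is a string or a list of strings.
            present = value if isinstance(value, list) else [value]
            if not any(item in allowed for item in present):
                matches_all_keys = False
                break
        if matches_all_keys:
            kept[name] = info
    return list(kept), kept


# Hypothetical metadata, shaped only for this illustration.
metadata = {
    "6-31g": {"family": "pople", "role": "orbital"},
    "cc-pvdz": {"family": "dunning", "role": "orbital"},
    "cc-pvdz-rifit": {"family": "dunning", "role": "rifit"},
    "def2-svp": {"family": "ahlrichs", "role": "orbital"},
}

filters = {}
add_filter(filters, "family", ["pople"])
add_filter(filters, "family", ["dunning"])  # appended to the existing key
add_filter(filters, "role", ["orbital", "optri"])

names, filtered = apply_filters(metadata, filters)
print(names)
```

In this sketch the "rifit" entry is dropped by the role filter and the "ahlrichs" entry by the family filter, which mirrors the "at least one value per key, all keys required" rule.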
- download_basis_set(basis_name: str, elements: str = '') tuple
Download a single basis set. An optional string of elements can be provided or left empty to get all elements.
- Parameters:
basis_name (str) – BSE basis set name identifier.
elements (str, optional) – Comma-separated string of atomic numbers, defaults to “”
- Raises:
RuntimeError – Basis set could not be obtained from BSE.
- Returns:
Basis set name cleaned to be a file name and the text for the basis set file.
- Return type:
tuple
- download_valid_basis_sets() tuple
Download the list of basis sets available from BSE.
- Returns:
Collections of basis set names and metadata
- Return type:
tuple of list and dict
- download_valid_formats() list
Download the list of formats available from BSE.
- Returns:
Collection of format names
- Return type:
list
- get_extension(format: str = '') str
Get the extension for the given BSE format identifier. If no format identifier is given, the class default is used.
- Parameters:
format (str, optional) – BSE format identifier, defaults to “”
- Returns:
Basis set file extension
- Return type:
str
- set_header(user_agent: str = '', email: str = '') None
Generates the header to use in requests.
- Parameters:
user_agent (str, optional) – Description of who is pinging the BSE API, defaults to “”
email (str, optional) – Email to send to BSE (not shared), defaults to “”
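As a rough sketch of what building such a header might look like: the function name `build_headers` is hypothetical, and the use of the standard HTTP `User-Agent` and `From` fields is an assumption about how the user agent and email are transmitted, not something this page confirms.

```python
def build_headers(user_agent="", email=""):
    """Return a request header dict, skipping fields that were left empty."""
    headers = {}
    if user_agent:
        # Assumption: the description is sent as the User-Agent field.
        headers["User-Agent"] = user_agent
    if email:
        # Assumption: the contact email is sent as the standard From field.
        headers["From"] = email
    return headers


headers = build_headers("NWChemEx BSE Basis Set Scraper", "user@example.org")
print(headers)
```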
- set_default_format(format: str) None
Set the default format for basis sets.
- Parameters:
format (str) – Valid BSE format identifier for basis sets.
- get_filtered_basis_sets() tuple
Filter the existing valid basis sets based on metadata filters currently set in the class. This function does not change the class.
- Returns:
Returns a filtered list of basis set names and the filtered metadata dict
- Return type:
tuple of list and dict
- validate_basis_set_name(basis_name: str) None
Validate the basis name against the list of valid basis names retrieved from BSE.
- Parameters:
basis_name (str) – Name of the basis set
- Raises:
RuntimeError – Invalid basis name was given.
- validate_format_name(format: str) None
Validate the format name against the list of valid format names retrieved from BSE.
- Parameters:
format (str) – Name of the formatting option
- Raises:
RuntimeError – Invalid format option was given
- _create_params(elements: str = '') dict
Create the parameter dictionary for a BSE request.
- Parameters:
elements (str, optional) – Elements to retrieve bases for, defaults to “”
- Returns:
Dictionary of parameter names (keys) and their values
- Return type:
dict
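A minimal sketch of how such a parameter dictionary could be assembled from the options listed in the class constructor. The function name `create_params`, the query parameter names (assumed to match the constructor arguments, with `header_toggle` sent as `header`), and the choice to omit `elements` when it is empty are all assumptions for illustration.

```python
def create_params(
    elements="",
    uncontract_general=False,
    uncontract_segmented=False,
    uncontract_spdf=False,
    optimize_general=False,
    make_general=False,
    header_toggle=True,
):
    """Build a dict of request parameters from the scraper's options."""
    params = {
        "uncontract_general": uncontract_general,
        "uncontract_segmented": uncontract_segmented,
        "uncontract_spdf": uncontract_spdf,
        "optimize_general": optimize_general,
        "make_general": make_general,
        # Assumption: the header toggle is sent under the name "header".
        "header": header_toggle,
    }
    if elements:
        # Comma-separated atomic numbers, e.g. "1,6,8"; omitted for all elements.
        params["elements"] = elements
    return params


params = create_params(elements="1,6,8", optimize_general=True)
print(params)
```

How the boolean values are encoded on the wire is left to the HTTP client in this sketch.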
- data_management.scrape_bse._write_basis_set(destination: str, basis_name: str, basis_data: str, extension: str) None
Write the basis set out to a file.
- Parameters:
destination (str) – Destination directory for the basis set file.
basis_name (str) – Name of the basis set.
basis_data (str) – Text data to write to the basis set file
extension (str) – Extension for the basis set file. Must include the dot ‘.’ separator if one is needed.
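A minimal sketch of a writer with the same four arguments, assuming the file name is simply the basis name followed by the extension. Creating the destination directory, returning the path (the documented function returns None), and the ".nw" extension used in the demonstration are assumptions made for the example.

```python
import os
import tempfile


def write_basis_set(destination, basis_name, basis_data, extension):
    """Write basis_data to <destination>/<basis_name><extension>."""
    # Assumption: create the destination directory if it does not exist yet.
    os.makedirs(destination, exist_ok=True)
    path = os.path.join(destination, basis_name + extension)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(basis_data)
    # Returned only so the demonstration below can inspect the result.
    return path


with tempfile.TemporaryDirectory() as tmp:
    # ".nw" is a hypothetical extension; note that it includes the dot.
    path = write_basis_set(tmp, "sto-3g", "basis text\n", ".nw")
    file_name = os.path.basename(path)
    with open(path, encoding="utf-8") as handle:
        written = handle.read()

print(file_name)
```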
- data_management.scrape_bse.main(args: argparse.Namespace) None
Entry point function to generate basis set files.
- Parameters:
args (Namespace) – Command line argument namespace
- data_management.scrape_bse.parse_args() argparse.Namespace
Parse command line arguments.
- Returns:
Values of command line arguments.
- Return type:
Namespace
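The usage text at the top of this page can be approximated with a standard argparse parser. This is a sketch under stated assumptions: the optional `argv` parameter is added here so the parser can be exercised without a real command line (the documented function takes no arguments), and the lowercase default "nwchem" is inferred from the class default rather than confirmed for the script.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line options shown in the usage text."""
    parser = argparse.ArgumentParser(
        prog="scrape_bse.py",
        description="Download basis sets from the Basis Set Exchange (BSE).",
    )
    parser.add_argument(
        "destination",
        help="Destination directory for basis set files.",
    )
    parser.add_argument(
        "-o", "--outformat",
        default="nwchem",  # assumption: lowercase BSE format identifier
        help="Output format. (Default: NWChem)",
    )
    parser.add_argument(
        "-g", "--optimize_general",
        action="store_true",
        help="Toggle on optimizing general contractions. Default OFF.",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


args = parse_args(["-o", "gaussian94", "-g", "basis_sets"])
print(args.destination, args.outformat, args.optimize_general)
```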