data_management.scrape_bse

This script uses a web scraper to download all basis sets from the Basis Set Exchange (BSE) in the specified output format.

Usage

usage: scrape_bse.py [-h] [-o OUTFORMAT] [-g] destination

positional arguments:
  destination           Destination directory for basis set files.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTFORMAT, --outformat OUTFORMAT
                        Output format. (Default: NWChem)
  -g, --optimize_general
                        Toggle on optimizing general contractions. Default OFF.

This script creates the following files in the given destination:

+---destination
|       <all_basis_set_files>
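The usage message above implies an argparse setup along the following lines. This is a reconstruction from the help text only, not the module's actual parse_args; the helper name build_parser and the explicit argument list passed to parse_args are assumptions added so the sketch can run without a real command line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reconstruct the CLI described by the usage message (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="scrape_bse.py")
    parser.add_argument(
        "destination",
        help="Destination directory for basis set files.",
    )
    parser.add_argument(
        "-o", "--outformat",
        default="NWChem",
        help="Output format. (Default: NWChem)",
    )
    parser.add_argument(
        "-g", "--optimize_general",
        action="store_true",
        help="Toggle on optimizing general contractions. Default OFF.",
    )
    return parser


# Equivalent to running: scrape_bse.py -o gaussian94 -g ./basis_sets
args = build_parser().parse_args(["-o", "gaussian94", "-g", "./basis_sets"])
print(args.destination, args.outformat, args.optimize_general)
```

Omitting -o and -g leaves outformat at "NWChem" and optimize_general at False, matching the defaults listed in the help text.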

Classes

BSEBasisSetScraper

Web scraper to download basis sets from the Basis Set Exchange (BSE).

Functions

_write_basis_set(→ None)

Write the basis set out to a file.

main(→ None)

Entry point function to generate basis set files.

parse_args(→ argparse.Namespace)

Parse command line arguments.

Module Contents

class data_management.scrape_bse.BSEBasisSetScraper(base_url: str = 'https://www.basissetexchange.org', user_agent: str = 'NWChemEx BSE Basis Set Scraper', email: str = '', format: str = 'nwchem', uncontract_general: bool = False, uncontract_segmented: bool = False, uncontract_spdf: bool = False, optimize_general: bool = False, make_general: bool = False, header_toggle: bool = True)

Web scraper to download basis sets from the Basis Set Exchange (BSE).

base_url
filtered_basis_sets
filtered_metadata
filters
valid_formats
default_header_toggle
default_make_general
default_optimize_general
default_uncontract_general
default_uncontract_segmented
default_uncontract_spdf
add_filter(metadata_key: str, values: list) → None

Add a metadata filter to the basis set list and update the filtered basis set list and metadata.

This function adds filters to the list of valid basis sets contained by this class based on metadata values scraped from BSE. If filters already exist for the metadata key given, the new values will be appended to the existing filter value list. Values must match exactly!

When multiple filter values exist for a metadata key, a basis set passes if it matches at least one of them; it need not match all of them. Filters for different metadata keys are applied sequentially, so a filtered basis set must match at least one filter value for every filtered metadata key.

For example:

scraper.add_filter("family", ["pople", "dunning"])
scraper.add_filter("role", ["orbital", "optri"])

will filter to all basis sets that are of either the “pople” or “dunning” families, but only if they have a role of “orbital” or “optri”.

The filtered basis set names can be retrieved from the data member filtered_basis_sets, and the full filtered metadata from filtered_metadata.

Parameters:
  • metadata_key (str) – Key for the desired value in basis set metadata.

  • values (list) – Values of the metadata to filter by.
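The filter semantics described for add_filter (any value within a key, every key across keys) can be sketched as below. This is a minimal standalone illustration, not the class's implementation: the free functions add_filter and apply_filters, the shape of the metadata dict, and the basis set names are all assumptions made for the example.

```python
def add_filter(filters: dict, metadata_key: str, values: list) -> None:
    """Append values to the filter list for a key, creating the list if needed."""
    filters.setdefault(metadata_key, []).extend(values)


def apply_filters(metadata: dict, filters: dict) -> tuple:
    """Keep entries that match at least one value for every filtered key."""
    def matches(entry: dict) -> bool:
        for key, allowed in filters.items():
            value = entry.get(key)
            # Treat scalar and list-valued metadata uniformly.
            candidates = value if isinstance(value, (list, tuple)) else [value]
            if not any(candidate in allowed for candidate in candidates):
                return False  # fails this key, so the entry fails overall
        return True

    names = [name for name, entry in metadata.items() if matches(entry)]
    return names, {name: metadata[name] for name in names}


# Toy metadata for illustration only; not scraped from BSE.
metadata = {
    "basis-a": {"family": "pople", "role": "orbital"},
    "basis-b": {"family": "dunning", "role": "orbital"},
    "basis-c": {"family": "dunning", "role": "rifit"},
    "basis-d": {"family": "ahlrichs", "role": "orbital"},
}

filters = {}
add_filter(filters, "family", ["pople"])
add_filter(filters, "family", ["dunning"])  # appended to the existing list
add_filter(filters, "role", ["orbital", "optri"])

names, filtered = apply_filters(metadata, filters)
print(names)
```

Here "basis-c" is dropped because its role matches neither "orbital" nor "optri", and "basis-d" is dropped because its family is not in the family filter. Matching uses exact equality, in line with the "Values must match exactly!" note.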

download_basis_set(basis_name: str, elements: str = '') → tuple

Download a single basis set. An optional string of elements can be provided or left empty to get all elements.

Parameters:
  • basis_name (str) – BSE basis set name identifier.

  • elements (str, optional) – Comma-separated string of atomic numbers, defaults to “”

Raises:

RuntimeError – Basis set could not be obtained from BSE.

Returns:

The basis set name, cleaned for use as a file name, and the text of the basis set file.

Return type:

tuple

download_valid_basis_sets() → tuple

Download the list of basis sets available from BSE.

Returns:

Collections of basis set names and metadata

Return type:

tuple of list and dict

download_valid_formats() → list

Download the list of formats available from BSE.

Returns:

Collection of format names

Return type:

list

get_extension(format: str = '') → str

Get the extension for the given BSE format identifier. If no format identifier is given, the class default is used.

Parameters:

format (str, optional) – BSE format identifier, defaults to “”

Returns:

Basis set file extension

Return type:

str

set_header(user_agent: str = '', email: str = '') → None

Generate the header to use in requests.

Parameters:
  • user_agent (str, optional) – Description of who is pinging the BSE API, defaults to “”

  • email (str, optional) – Email to send to BSE (not shared), defaults to “”
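A request header built from these two values might look like the sketch below. The function name build_headers is hypothetical, and the choice of the standard HTTP header fields "User-Agent" and "From" is an assumption; the source does not say which fields set_header populates, nor how it treats empty strings (here they are simply omitted).

```python
def build_headers(user_agent: str = "", email: str = "") -> dict:
    """Build an HTTP header dict identifying the client (hypothetical helper)."""
    headers = {}
    if user_agent:
        # Describes which program is contacting the server.
        headers["User-Agent"] = user_agent
    if email:
        # Standard HTTP field for a contact address of the requester.
        headers["From"] = email
    return headers


headers = build_headers("NWChemEx BSE Basis Set Scraper", "someone@example.com")
print(headers)
```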

set_default_format(format: str) → None

Set the default format for basis sets.

Parameters:

format (str) – Valid BSE format identifier for basis sets.

get_filtered_basis_sets() → tuple

Filter the existing valid basis sets based on the metadata filters currently set in the class. This function does not modify the scraper's state.

Returns:

Filtered list of basis set names and the filtered metadata dict.

Return type:

tuple of list and dict

validate_basis_set_name(basis_name: str) → None

Validate the basis name against the list of valid basis names retrieved from BSE.

Parameters:

basis_name (str) – Name of the basis set

Raises:

RuntimeError – Invalid basis name was given.

validate_format_name(format: str) → None

Validate the format name against the list of valid format names retrieved from BSE.

Parameters:

format (str) – Name of the formatting option

Raises:

RuntimeError – Invalid format option was given
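Both validators follow the same pattern: check membership in a list retrieved from BSE and raise RuntimeError on a miss. A minimal sketch of that pattern is shown below; the shared helper validate_name, the exact (case-sensitive) comparison, and the error message wording are assumptions rather than details taken from the module.

```python
def validate_name(name: str, valid_names: list, kind: str) -> None:
    """Raise RuntimeError if name is not among the known valid names."""
    if name not in valid_names:
        raise RuntimeError(f"Invalid {kind} name: {name!r}")


# Placeholder list; the real one would come from download_valid_formats().
valid_formats = ["nwchem", "gaussian94"]

validate_name("nwchem", valid_formats, "format")  # passes silently

try:
    validate_name("not-a-format", valid_formats, "format")
except RuntimeError as error:
    message = str(error)
    print(message)
```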

_create_params(elements: str = '') → dict

Create the parameter dictionary for a BSE request.

Parameters:

elements (str, optional) – Elements to retrieve bases for, defaults to “”

Returns:

Dictionary of parameter names (keys) and their values

Return type:

dict
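One way such a parameter dictionary could be assembled is sketched below. The option names and default values mirror the constructor signature, but the key names actually sent to BSE, the 0/1 encoding of booleans, and the choice to omit options left at their defaults are all assumptions; create_params here is a standalone stand-in, not the private method itself.

```python
# Defaults copied from the constructor signature of BSEBasisSetScraper.
DEFAULTS = {
    "uncontract_general": False,
    "uncontract_segmented": False,
    "uncontract_spdf": False,
    "optimize_general": False,
    "make_general": False,
    "header_toggle": True,
}


def create_params(elements: str = "", **options: bool) -> dict:
    """Build a request parameter dict, skipping options left at their default."""
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")

    params = {}
    if elements:
        # Comma-separated atomic numbers, e.g. "1,6" for H and C.
        params["elements"] = elements
    for name, default in DEFAULTS.items():
        value = options.get(name, default)
        if value != default:
            params[name] = int(value)  # assumed 0/1 encoding
    return params


params = create_params("1,6", optimize_general=True)
print(params)
```

A dictionary like this could then be handed to an HTTP client as the query parameters of the request.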

data_management.scrape_bse._write_basis_set(destination: str, basis_name: str, basis_data: str, extension: str) → None

Write the basis set out to a file.

Parameters:
  • destination (str) – Destination directory for the basis set file.

  • basis_name (str) – Name of the basis set.

  • basis_data (str) – Text data to write to the basis set file

  • extension (str) – Extension for the basis set file. Must include the dot ‘.’ separator if one is needed.
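Because the extension must already carry its own dot, the file name is presumably formed by plain concatenation. The sketch below illustrates that reading; write_basis_set is a standalone stand-in for the private function, creating the destination directory and returning the path are additions made for the example, and the ".nw" extension and file contents are placeholders.

```python
import os
import tempfile


def write_basis_set(destination: str, basis_name: str,
                    basis_data: str, extension: str) -> str:
    """Write basis_data to <destination>/<basis_name><extension>."""
    os.makedirs(destination, exist_ok=True)
    # The extension is appended as-is, so it must include the "." if needed.
    path = os.path.join(destination, basis_name + extension)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(basis_data)
    return path  # returned only so the example can inspect the result


with tempfile.TemporaryDirectory() as destination:
    path = write_basis_set(
        destination, "example-basis", "placeholder basis text\n", ".nw"
    )
    with open(path, encoding="utf-8") as handle:
        written = handle.read()

print(os.path.basename(path))
```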

data_management.scrape_bse.main(args: argparse.Namespace) → None

Entry point function to generate basis set files.

Parameters:

args (Namespace) – Command line argument namespace

data_management.scrape_bse.parse_args() → argparse.Namespace

Parse command line arguments.

Returns:

Values of command line arguments.

Return type:

Namespace