torch_geometric.datasets.PCQM4Mv2
- class PCQM4Mv2(root: str, split: str = 'train', transform: Optional[Callable] = None, backend: str = 'sqlite', from_smiles: Optional[Callable] = None)
Bases: OnDiskDataset

The PCQM4Mv2 dataset from the “OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs” paper. PCQM4Mv2 is a quantum chemistry dataset originally curated under the PubChemQC project. The task is to predict the DFT-calculated HOMO-LUMO energy gap of molecules given their 2D molecular graphs.

Note
This dataset uses the OnDiskDataset base class to load data dynamically from disk.

- Parameters:
root (str) – Root directory where the dataset should be saved.
split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. If "holdout", loads the holdout dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
backend (str) – The Database backend to use. (default: "sqlite")
from_smiles (callable, optional) – A custom function that takes a SMILES string and outputs a Data object. If not set, defaults to torch_geometric.utils.from_smiles(). (default: None)
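
Example
The parameters above can be exercised with a minimal sketch like the following. The helper name load_pcqm4mv2, the VALID_SPLITS tuple, and the "data/PCQM4Mv2" path are hypothetical and exist only for illustration; only the PCQM4Mv2 constructor and its documented arguments come from this page. The sketch assumes PyTorch Geometric is installed, and that RDKit is available for the default SMILES conversion.

```python
from typing import Callable, Optional

# Split names taken from the parameter description above.
VALID_SPLITS = ("train", "val", "test", "holdout")


def load_pcqm4mv2(root: str, split: str = "train",
                  transform: Optional[Callable] = None,
                  backend: str = "sqlite"):
    """Hypothetical helper: check `split`, then construct the dataset."""
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    # Imported lazily so the check above runs even without PyG installed.
    from torch_geometric.datasets import PCQM4Mv2
    return PCQM4Mv2(root=root, split=split, transform=transform,
                    backend=backend)


# Example call (left commented out because the first instantiation
# downloads and processes the data, which may take a while):
#   dataset = load_pcqm4mv2("data/PCQM4Mv2", split="val")
#   data = dataset[0]  # a torch_geometric.data.Data object read from disk
```

Validating the split name before constructing the dataset is a convenience of this sketch, not something the class is documented to require; it simply surfaces a typo before any download or processing starts.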