Available Datasets =================== This page provides detailed information about the datasets available in the Network Datasets repository. Dataset Registry ---------------- All available datasets are listed in the ``registry.json`` file. Each dataset entry includes: * **name**: Unique identifier for the dataset * **version**: Dataset version number * **path**: Relative path to the dataset files * **summary**: Brief description of the dataset * **license**: License information (typically CC-BY-4.0) Current Datasets ---------------- toynet-11edges ~~~~~~~~~~~~~~ * **Version**: 1.0.0 * **License**: CC-BY-4.0 * **Path**: ``datasets/toynet-11edges/v1`` A small toy network with 8 nodes and 11 edges, designed for testing and learning purposes. **Files**: * ``nodes.json``: Node definitions with coordinates * ``edges.json``: Edge definitions connecting nodes * ``probs.json``: Edge failure probabilities **Use Cases**: * Testing algorithms and functions * Learning the data format * Quick prototyping ema-highway ~~~~~~~~~~~ * **Version**: 1.0.0 * **License**: CC-BY-4.0 * **Path**: ``datasets/ema-highway/v1`` Eastern Massachusetts benchmark highway network with nodes, edges, and probability files. **Files**: * ``nodes.json``: Highway intersection nodes * ``edges.json``: Road segments between intersections * ``probs_bin.json``: Binary failure probabilities * ``probs_mult.json``: Multi-state failure probabilities **Example reference**: Byun, J.-E., Ryu, H., & Straub, D. (2025). Branch-and-bound algorithm for efficient reliability analysis of general coherent systems. Structural Safety, 102653. **Use Cases**: * Transportation network analysis * Connectivity to critical facilities * Connectivity between communities * Emergency response planning Generated Example Collection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The ``datasets/generated/`` directory contains **synthetic example datasets** produced with ``ndtools.network_generator``. These examples are intended for tutorials, quick tests, and format demonstrations. They follow the same JSON schemas as all curated datasets. .. note:: See :file:`datasets/generated/README.md` for an overview, and :file:`datasets/generated/PROVENANCE.md` for the exact commands and parameters used to generate each example. **Layout** Each example resides in its own subdirectory with versioning: .. code-block:: text generated/ grid_8x8/ v1/data/{nodes.json, edges.json, probs.json, graph.png} er_60_p005/ v1/data/{...} ws_n60_k6_b015/ v1/data/{...} ba_n60_m3/ v1/data/{...} rg_n60_r017/ v1/data/{...} config_n60_deg3/ v1/data/{...} README.md PROVENANCE.md CHANGELOG.md **What’s inside each example** - :file:`nodes.json` — map of node id → attributes (at minimum: ``x``, ``y``) - :file:`edges.json` — map of edge id → ``{from, to, directed, ...}`` - :file:`probs.json` — per-edge binary probabilities (e.g., ``"0"``=failure, ``"1"``=working) - :file:`graph.png` — (optional) auto-rendered preview **Reproducibility & provenance** Each example’s parameters (model family, size, probabilities, seed, etc.) are recorded in :file:`generated/metadata.json` inside the dataset folder and summarized across the collection in :file:`generated/PROVENANCE.md`. Regenerate or extend the collection via the CLI examples shown there. distribution-substation-liang2022 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * **Version**: 1.0.0 * **License**: CC-BY-4.0 * **Path**: ``datasets/distribution-substation-liang2022/v1`` Example 110/220 kV distribution substation network based on Liang et al. (2022). Includes nodes, edges, macrocomponents, equipment fragility, and probability files. **Files**: * ``nodes.json``: Substation nodes with coordinates and attributes * ``edges.json``: Power line connections between substations * ``probs.json``: Edge failure probabilities * ``macrocomponents.json``: Component grouping information * ``equipment.json``: Equipment fragility data **Citation**: Liang, H., Blagojevic, N., Xie, Q., & Stojadinovic, B. (2022). Seismic risk analysis of electrical substations based on the network analysis method. Earthquake Engineering & Structural Dynamics, 51(11), 2690-2707. **Use Cases**: * Power grid reliability analysis * Seismic risk assessment * Infrastructure resilience studies Data Format ----------- All datasets follow a consistent JSON format defined by JSON schemas in the ``schema/`` directory. Node Format ~~~~~~~~~~~ Nodes are stored as a JSON object where keys are node IDs and values are attribute dictionaries: .. code-block:: json { "node_id": { "x": 0.0, "y": 0.0, "type": "source", "additional_attributes": "..." } } **Required attributes**: * ``x``: X-coordinate (number) * ``y``: Y-coordinate (number) **Examples of optional attributes**: * ``type``: Node type (string) * ``group_name``: Grouping identifier (string) * ``capacity``: Capacity value (number or string) * ``unit``: Unit of measurement (string) * Any other custom attributes Edge Format ~~~~~~~~~~~ Edges are stored as a JSON object where keys are edge IDs and values are connection dictionaries: .. code-block:: json { "edge_id": { "from": "node1", "to": "node2", "directed": false, "additional_attributes": "..." } } **Required attributes**: * ``from``: Source node ID (string) * ``to``: Target node ID (string) * ``directed``: Whether edge is directed (boolean) **Examples of optional attributes**: * ``eid``: Edge identifier (string) * ``macrocomponent_type``: Component type (string) * ``length``: Edge length (number) * Any other custom attributes Probability Format ~~~~~~~~~~~~~~~~~~ Probabilities are stored as a JSON object mapping edge IDs to probability dictionaries: .. code-block:: json { "edge_id": { "0": {"p": 0.05}, "1": {"p": 0.95} } } Where, for example, ``"1"` indicates the edge could imply active/working and ``"0"` failure. **Required attributes**: * ``int``: Integer state index starting from 0 * ``p``: Probability of the state (number between 0 and 1) **Examples of optional attributes**: * ``description``: Description of the state (string) Dataset Metadata ---------------- Each dataset includes a ``dataset.yaml`` file with metadata: .. code-block:: yaml name: dataset-name version: 1.0.0 title: Human-readable title license: CC-BY-4.0 description: > Detailed description of the dataset contacts: - name: Contact Name affiliation: Institution email: contact@example.com tags: [tag1, tag2, tag3] files: nodes: data/nodes.json edges: data/edges.json probs: data/probs.json citation: | Citation information Loading Datasets ---------------- Using ndtools ~~~~~~~~~~~~~ .. code-block:: python from ndtools.io import dataset_paths, load_json from pathlib import Path # Get dataset paths nodes_path, edges_path, probs_path = dataset_paths( Path('datasets'), 'dataset_name', 'v1' ) # Load data nodes = load_json(nodes_path) edges = load_json(edges_path) probs = load_json(probs_path) Direct Loading ~~~~~~~~~~~~~~ .. code-block:: python import json from pathlib import Path dataset_path = Path("datasets/dataset_name/v1/data") with open(dataset_path / "nodes.json") as f: nodes = json.load(f) with open(dataset_path / "edges.json") as f: edges = json.load(f) with open(dataset_path / "probs.json") as f: probs = json.load(f) Validation ---------- All datasets can be validated against their schemas: .. code-block:: bash # Validate all datasets python data_validate.py --root . # Validate specific dataset python data_validate.py --root . --dataset dataset-name Adding New Datasets ------------------- To add a new dataset to the repository: 1. Create a new directory following the naming convention: ``dataset_name/v1/`` (⚠️ **Don’t use hyphens (`-`)** — use **underscores (`_`)** in dataset names.) 2. Add your data files in the ``data/`` subdirectory 3. Create a ``dataset.yaml`` metadata file 4. Update the ``registry.json`` file 5. Validate your dataset using the provided validation tools See the :doc:`contributing` page for detailed instructions.