Available Datasets
This page provides detailed information about the datasets available in the Network Datasets repository.
Dataset Registry
All available datasets are listed in the registry.json file. Each dataset entry includes the following fields; an illustrative entry is sketched after the list:
name: Unique identifier for the dataset
version: Dataset version number
path: Relative path to the dataset files
summary: Brief description of the dataset
license: License information (typically CC-BY-4.0)
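The exact structure of registry.json is defined by the repository itself; the entry below is only an illustrative sketch built from the fields above, using values from the toynet-11edges dataset described later on this page (the top-level "datasets" key is an assumption):

{
  "datasets": [
    {
      "name": "toynet-11edges",
      "version": "1.0.0",
      "path": "datasets/toynet-11edges/v1",
      "summary": "A small toy network with 8 nodes and 11 edges.",
      "license": "CC-BY-4.0"
    }
  ]
}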
Current Datasets
toynet-11edges
Version: 1.0.0
License: CC-BY-4.0
Path:
datasets/toynet-11edges/v1
A small toy network with 8 nodes and 11 edges, designed for testing and learning purposes.
Files:
nodes.json: Node definitions with coordinates
edges.json: Edge definitions connecting nodes
probs.json: Edge failure probabilities
Use Cases:
Testing algorithms and functions
Learning the data format
Quick prototyping
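As a quick start, the toy network can be loaded with the helpers described under Loading Datasets below; a minimal sketch, assuming the repository root layout shown above:

from pathlib import Path
from ndtools.io import dataset_paths, load_json

# Resolve the standard file locations for the toy dataset
nodes_path, edges_path, probs_path = dataset_paths(
    Path('datasets'), 'toynet-11edges', 'v1'
)
nodes = load_json(nodes_path)  # 8 nodes
edges = load_json(edges_path)  # 11 edges
probs = load_json(probs_path)  # per-edge failure probabilities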
ema-highway
Version: 1.0.0
License: CC-BY-4.0
Path:
datasets/ema-highway/v1
Eastern Massachusetts benchmark highway network with node, edge, and probability files.
Files:
nodes.json: Highway intersection nodes
edges.json: Road segments between intersections
probs_bin.json: Binary failure probabilities
probs_mult.json: Multi-state failure probabilities
- Example reference:
Byun, J.-E., Ryu, H., & Straub, D. (2025). Branch-and-bound algorithm for efficient reliability analysis of general coherent systems. Structural Safety, 102653.
Use Cases:
Transportation network analysis
Connectivity to critical facilities
Connectivity between communities
Emergency response planning
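Because this dataset ships two probability files instead of a single probs.json, they can be read directly with the standard library; a minimal sketch, assuming the files live under v1/data/ as in the other datasets:

import json
from pathlib import Path

data_dir = Path("datasets/ema-highway/v1/data")
with open(data_dir / "probs_bin.json") as f:
    probs_bin = json.load(f)   # binary (working/failed) edge probabilities
with open(data_dir / "probs_mult.json") as f:
    probs_mult = json.load(f)  # multi-state edge probabilities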
Generated Example Collection
The datasets/generated/ directory contains synthetic example datasets produced with
ndtools.network_generator. These examples are intended for tutorials, quick tests,
and format demonstrations. They follow the same JSON schemas as all curated datasets.
Note
See datasets/generated/README.md for an overview, and datasets/generated/PROVENANCE.md
for the exact commands and parameters used to generate each example.
Layout
Each example resides in its own subdirectory with versioning:
generated/
grid_8x8/
v1/data/{nodes.json, edges.json, probs.json, graph.png}
er_60_p005/
v1/data/{...}
ws_n60_k6_b015/
v1/data/{...}
ba_n60_m3/
v1/data/{...}
rg_n60_r017/
v1/data/{...}
config_n60_deg3/
v1/data/{...}
README.md
PROVENANCE.md
CHANGELOG.md
What’s inside each example
nodes.json — map of node id → attributes (at minimum: x, y)
edges.json — map of edge id → {from, to, directed, ...}
probs.json — per-edge binary probabilities (e.g., "0" = failure, "1" = working)
graph.png — (optional) auto-rendered preview
Reproducibility & provenance
Each example’s parameters (model family, size, probabilities, seed, etc.) are recorded in a metadata.json file inside the example’s dataset folder and summarized across the collection in generated/PROVENANCE.md. Regenerate or extend the collection via the CLI examples shown there.
distribution-substation-liang2022
Version: 1.0.0
License: CC-BY-4.0
Path:
datasets/distribution-substation-liang2022/v1
Example 110/220 kV distribution substation network based on Liang et al. (2022). Includes node, edge, macrocomponent, equipment fragility, and probability files.
Files:
nodes.json: Substation nodes with coordinates and attributes
edges.json: Power line connections between substations
probs.json: Edge failure probabilities
macrocomponents.json: Component grouping information
equipment.json: Equipment fragility data
- Citation:
Liang, H., Blagojevic, N., Xie, Q., & Stojadinovic, B. (2022). Seismic risk analysis of electrical substations based on the network analysis method. Earthquake Engineering & Structural Dynamics, 51(11), 2690-2707.
Use Cases:
Power grid reliability analysis
Seismic risk assessment
Infrastructure resilience studies
Data Format
All datasets follow a consistent JSON format defined by JSON schemas in the schema/ directory.
Node Format
Nodes are stored as a JSON object where keys are node IDs and values are attribute dictionaries:
{
"node_id": {
"x": 0.0,
"y": 0.0,
"type": "source",
"additional_attributes": "..."
}
}
Required attributes:
x: X-coordinate (number)
y: Y-coordinate (number)
Examples of optional attributes:
type: Node type (string)
group_name: Grouping identifier (string)
capacity: Capacity value (number or string)
unit: Unit of measurement (string)
Any other custom attributes
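As a sanity check when consuming node data, the required coordinates can be verified in a few lines. This is an illustrative sketch, not part of the repository's validation tooling:

def check_nodes(nodes: dict) -> None:
    """Verify that every node carries numeric x and y coordinates."""
    for node_id, attrs in nodes.items():
        for key in ("x", "y"):
            if not isinstance(attrs.get(key), (int, float)):
                raise ValueError(f"node {node_id!r} lacks numeric {key!r}")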
Edge Format
Edges are stored as a JSON object where keys are edge IDs and values are connection dictionaries:
{
"edge_id": {
"from": "node1",
"to": "node2",
"directed": false,
"additional_attributes": "..."
}
}
Required attributes:
from: Source node ID (string)
to: Target node ID (string)
directed: Whether edge is directed (boolean)
Examples of optional attributes:
eid: Edge identifier (string)
macrocomponent_type: Component type (string)
length: Edge length (number)
Any other custom attributes
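To illustrate how the directed flag is typically consumed, the sketch below builds a one-hop adjacency map from an edges dictionary, adding the reverse link only for undirected edges (illustrative only, not a repository API):

from collections import defaultdict

def build_adjacency(edges: dict) -> dict:
    """Map each node ID to the set of node IDs reachable in one hop."""
    adj = defaultdict(set)
    for edge in edges.values():
        adj[edge["from"]].add(edge["to"])
        if not edge["directed"]:
            adj[edge["to"]].add(edge["from"])  # undirected: add reverse link
    return dict(adj)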
Probability Format
Probabilities are stored as a JSON object mapping edge IDs to probability dictionaries:
{
"edge_id": {
"0": {"p": 0.05},
"1": {"p": 0.95}
}
}
Here, for example, state "1" typically represents the edge being active/working and "0" represents failure.
Required attributes:
int: Integer state index starting from 0
p: Probability of the state (number between 0 and 1)
Examples of optional attributes:
description: Description of the state (string)
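Since each edge's states should form a probability distribution, a quick consistency check is often worthwhile; an illustrative sketch, separate from the repository's official validation tools:

import math

def check_probs(probs: dict, tol: float = 1e-9) -> None:
    """Verify that every edge's state probabilities sum to 1."""
    for edge_id, states in probs.items():
        total = sum(state["p"] for state in states.values())
        if not math.isclose(total, 1.0, abs_tol=tol):
            raise ValueError(f"edge {edge_id!r}: states sum to {total}")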
Dataset Metadata
Each dataset includes a dataset.yaml file with metadata:
name: dataset-name
version: 1.0.0
title: Human-readable title
license: CC-BY-4.0
description: >
Detailed description of the dataset
contacts:
- name: Contact Name
affiliation: Institution
email: contact@example.com
tags: [tag1, tag2, tag3]
files:
nodes: data/nodes.json
edges: data/edges.json
probs: data/probs.json
citation: |
Citation information
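dataset.yaml is plain YAML, so any YAML parser can read it; a minimal sketch assuming PyYAML is installed and that the file sits at the version root (both assumptions, not stated repository requirements):

from pathlib import Path
import yaml  # PyYAML; assumed available, not a stated dependency

meta = yaml.safe_load(
    Path("datasets/dataset_name/v1/dataset.yaml").read_text()
)
print(meta["name"], meta["version"], meta["license"])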
Loading Datasets
Using ndtools
from ndtools.io import dataset_paths, load_json
from pathlib import Path
# Get dataset paths
nodes_path, edges_path, probs_path = dataset_paths(
Path('datasets'), 'dataset_name', 'v1'
)
# Load data
nodes = load_json(nodes_path)
edges = load_json(edges_path)
probs = load_json(probs_path)
Direct Loading
import json
from pathlib import Path
dataset_path = Path("datasets/dataset_name/v1/data")
with open(dataset_path / "nodes.json") as f:
nodes = json.load(f)
with open(dataset_path / "edges.json") as f:
edges = json.load(f)
with open(dataset_path / "probs.json") as f:
probs = json.load(f)
Validation
All datasets can be validated against their schemas:
# Validate all datasets
python data_validate.py --root .
# Validate specific dataset
python data_validate.py --root . --dataset dataset-name
Adding New Datasets
To add a new dataset to the repository, follow the steps below; a scaffolding sketch follows the list.
1. Create a new directory following the naming convention: dataset_name/v1/ (⚠️ Don't use hyphens (`-`); use underscores (`_`) in dataset names.)
2. Add your data files in the data/ subdirectory.
3. Create a dataset.yaml metadata file.
4. Update the registry.json file.
5. Validate your dataset using the provided validation tools.
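As a convenience, the directory skeleton from steps 1 and 2 can be created programmatically; a minimal sketch based only on the layout described above (file contents still need to be written by hand):

from pathlib import Path

def scaffold_dataset(root: Path, name: str, version: str = "v1") -> Path:
    """Create the empty directory layout for a new dataset."""
    data_dir = root / name / version / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir.parent / "dataset.yaml").touch()  # metadata stub to fill in
    return data_dir

scaffold_dataset(Path("datasets"), "my_dataset")  # underscores, not hyphens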
See the contributing page for detailed instructions.