Available Datasets

This page provides detailed information about the datasets available in the Network Datasets repository.

Dataset Registry

All available datasets are listed in the registry.json file. Each dataset entry includes:

name: Unique identifier for the dataset
version: Dataset version number
path: Relative path to the dataset files
summary: Brief description of the dataset
license: License information (typically CC-BY-4.0)
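
As a rough sketch of how the registry might be consumed, the snippet below looks up an entry by name and checks that the five fields listed above are present. The top-level layout of registry.json is not described on this page, so the code assumes a JSON list of entry objects. The sample summary text and the helper find_dataset are hypothetical.

```python
import json

# Hypothetical registry content; assumes the top level is a list of entries.
# In practice this text would be read from registry.json.
registry_text = """
[
  {
    "name": "toynet-11edges",
    "version": "1.0.0",
    "path": "datasets/toynet-11edges/v1",
    "summary": "Small toy network for testing",
    "license": "CC-BY-4.0"
  }
]
"""

REQUIRED_FIELDS = ("name", "version", "path", "summary", "license")


def find_dataset(registry, name):
    """Return the first entry whose name matches, or None if absent."""
    for entry in registry:
        if entry.get("name") == name:
            return entry
    return None


registry = json.loads(registry_text)
entry = find_dataset(registry, "toynet-11edges")
missing = [field for field in REQUIRED_FIELDS if field not in entry]
print(entry["path"], "missing fields:", missing)
```

If the real registry turns out to be keyed by dataset name instead of being a list, the lookup reduces to a plain dictionary access.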

Current Datasets

toynet-11edges

Version: 1.0.0
License: CC-BY-4.0
Path: datasets/toynet-11edges/v1

A small toy network with 8 nodes and 11 edges, designed for testing and learning purposes.

Files:

nodes.json: Node definitions with coordinates
edges.json: Edge definitions connecting nodes
probs.json: Edge failure probabilities

Use Cases:

Testing algorithms and functions
Learning the data format
Quick prototyping

ema-highway

Version: 1.0.0
License: CC-BY-4.0
Path: datasets/ema-highway/v1

Eastern Massachusetts benchmark highway network with nodes, edges, and probability files.

Files:

nodes.json: Highway intersection nodes
edges.json: Road segments between intersections
probs_bin.json: Binary failure probabilities
probs_mult.json: Multi-state failure probabilities

Example reference: Byun, J.-E., Ryu, H., & Straub, D. (2025). Branch-and-bound algorithm for efficient reliability analysis of general coherent systems. Structural Safety, 102653.

Use Cases:

Transportation network analysis
Connectivity to critical facilities
Connectivity between communities
Emergency response planning

Generated Example Collection

The datasets/generated/ directory contains synthetic example datasets produced with ndtools.network_generator. These examples are intended for tutorials, quick tests, and format demonstrations. They follow the same JSON schemas as all curated datasets.

Note

See datasets/generated/README.md for an overview, and datasets/generated/PROVENANCE.md for the exact commands and parameters used to generate each example.

Layout

Each example resides in its own subdirectory with versioning:

generated/
  grid_8x8/
    v1/data/{nodes.json, edges.json, probs.json, graph.png}
  er_60_p005/
    v1/data/{...}
  ws_n60_k6_b015/
    v1/data/{...}
  ba_n60_m3/
    v1/data/{...}
  rg_n60_r017/
    v1/data/{...}
  config_n60_deg3/
    v1/data/{...}
  README.md
  PROVENANCE.md
  CHANGELOG.md

What’s inside each example

nodes.json: map of node id → attributes (at minimum: x, y)
edges.json: map of edge id → {from, to, directed, ...}
probs.json: per-edge binary probabilities (e.g., "0" = failure, "1" = working)
graph.png: (optional) auto-rendered preview
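
To illustrate the layout, the following sketch enumerates every example folder and the files in its v1/data directory. It is only a sketch under stated assumptions: the helper list_generated_examples is hypothetical (it is not part of ndtools), and the code builds a throwaway copy of the directory structure so that it runs without the repository.

```python
import tempfile
from pathlib import Path


def list_generated_examples(generated_root):
    """Map each example name to the sorted file names in its v1/data folder."""
    examples = {}
    for data_dir in sorted(Path(generated_root).glob("*/v1/data")):
        name = data_dir.parent.parent.name
        examples[name] = sorted(p.name for p in data_dir.iterdir() if p.is_file())
    return examples


# Build a temporary stand-in for datasets/generated/ with two examples.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "generated"
    for example in ("grid_8x8", "ba_n60_m3"):
        data_dir = root / example / "v1" / "data"
        data_dir.mkdir(parents=True)
        for file_name in ("nodes.json", "edges.json", "probs.json"):
            (data_dir / file_name).write_text("{}", encoding="utf-8")
    # Top-level files such as README.md are ignored by the glob pattern.
    (root / "README.md").write_text("placeholder", encoding="utf-8")
    found = list_generated_examples(root)

print(found)
```

Against a real checkout, the call would presumably be list_generated_examples("datasets/generated").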

Reproducibility & provenance

Each example’s parameters (model family, size, probabilities, seed, etc.) are recorded in generated/metadata.json inside the dataset folder and summarized across the collection in generated/PROVENANCE.md. Regenerate or extend the collection via the CLI examples shown there.

distribution-substation-liang2022

Version: 1.0.0
License: CC-BY-4.0
Path: datasets/distribution-substation-liang2022/v1

Example 110/220 kV distribution substation network based on Liang et al. (2022). Includes nodes, edges, macrocomponents, equipment fragility, and probability files.

Files:

nodes.json: Substation nodes with coordinates and attributes
edges.json: Power line connections between substations
probs.json: Edge failure probabilities
macrocomponents.json: Component grouping information
equipment.json: Equipment fragility data

Citation: Liang, H., Blagojevic, N., Xie, Q., & Stojadinovic, B. (2022). Seismic risk analysis of electrical substations based on the network analysis method. Earthquake Engineering & Structural Dynamics, 51(11), 2690-2707.

Use Cases:

Power grid reliability analysis
Seismic risk assessment
Infrastructure resilience studies

Data Format

All datasets follow a consistent JSON format defined by JSON schemas in the schema/ directory.

Node Format

Nodes are stored as a JSON object where keys are node IDs and values are attribute dictionaries:

{
  "node_id": {
    "x": 0.0,
    "y": 0.0,
    "type": "source",
    "additional_attributes": "..."
  }
}

Required attributes:

x: X-coordinate (number)
y: Y-coordinate (number)

Examples of optional attributes:

type: Node type (string)
group_name: Grouping identifier (string)
capacity: Capacity value (number or string)
unit: Unit of measurement (string)
Any other custom attributes
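
The required attributes can be checked with a few lines of standard-library Python. The sketch below is not the repository's validator; the node data and the helper check_nodes are made up for illustration.

```python
import json
from numbers import Real

# Illustrative node data; "n3" deliberately violates the required attributes.
nodes_text = """
{
  "n1": {"x": 0.0, "y": 0.0, "type": "source"},
  "n2": {"x": 1.5, "y": 2.0},
  "n3": {"x": "east", "type": "sink"}
}
"""


def check_nodes(nodes):
    """Return a list of problems with the required x and y attributes."""
    problems = []
    for node_id, attrs in nodes.items():
        for key in ("x", "y"):
            value = attrs.get(key)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                problems.append(f"{node_id}: '{key}' is missing or not a number")
    return problems


problems = check_nodes(json.loads(nodes_text))
print(problems)
```

Optional attributes such as type are passed over, which matches the rule that any other custom attributes are allowed.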

Edge Format

Edges are stored as a JSON object where keys are edge IDs and values are connection dictionaries:

{
  "edge_id": {
    "from": "node1",
    "to": "node2",
    "directed": false,
    "additional_attributes": "..."
  }
}

Required attributes:

from: Source node ID (string)
to: Target node ID (string)
directed: Whether edge is directed (boolean)

Examples of optional attributes:

eid: Edge identifier (string)
macrocomponent_type: Component type (string)
length: Edge length (number)
Any other custom attributes
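
One way to use the three required attributes is to turn the edge map into an adjacency list. The sketch below does this with made-up data and a hypothetical helper build_adjacency. It also rejects edges whose endpoints are not in the node map, which is an assumption about what a consistent dataset looks like and not a rule stated on this page.

```python
from collections import defaultdict

# Illustrative data in the node and edge formats described above.
nodes = {
    "n1": {"x": 0.0, "y": 0.0},
    "n2": {"x": 1.0, "y": 0.0},
    "n3": {"x": 1.0, "y": 1.0},
}
edges = {
    "e1": {"from": "n1", "to": "n2", "directed": False},
    "e2": {"from": "n2", "to": "n3", "directed": True},
}


def build_adjacency(nodes, edges):
    """Return {node_id: sorted list of neighbours reachable over one edge}."""
    neighbours = defaultdict(set)
    for edge_id, edge in edges.items():
        src, dst = edge["from"], edge["to"]
        if src not in nodes or dst not in nodes:
            raise ValueError(f"{edge_id} references an unknown node")
        neighbours[src].add(dst)
        if not edge["directed"]:
            # Undirected edges can be traversed in both directions.
            neighbours[dst].add(src)
    return {node_id: sorted(neighbours[node_id]) for node_id in nodes}


adjacency = build_adjacency(nodes, edges)
print(adjacency)
```

Here e1 is undirected, so n1 and n2 list each other, while the directed edge e2 only adds n3 as a neighbour of n2.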

Probability Format

Probabilities are stored as a JSON object mapping edge IDs to probability dictionaries:

{
  "edge_id": {
    "0": {"p": 0.05},
    "1": {"p": 0.95}
  }
}

Here, for example, state "1" could denote an active (working) edge and state "0" a failed edge.

Required attributes:

state key: Integer state index starting from 0, written as a string key (e.g. "0", "1")
p: Probability of the state (number between 0 and 1)

Examples of optional attributes:

description: Description of the state (string)
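
A quick consistency check for a probability file might look like the sketch below. It rests on two assumptions that this page does not state explicitly: state keys run contiguously from "0" to "n-1", and the state probabilities of each edge sum to 1. The helper check_probs and the sample values are hypothetical.

```python
import math

# Illustrative data; "e2" deliberately fails the sum-to-one assumption.
probs = {
    "e1": {"0": {"p": 0.05}, "1": {"p": 0.95}},
    "e2": {"0": {"p": 0.1}, "1": {"p": 0.3}, "2": {"p": 0.5}},
}


def check_probs(probs, tol=1e-9):
    """Return {edge_id: problem} for edges whose states look inconsistent."""
    problems = {}
    for edge_id, states in probs.items():
        # Assumed convention: keys are "0", "1", ..., "n-1".
        expected_keys = {str(i) for i in range(len(states))}
        if set(states) != expected_keys:
            problems[edge_id] = "state keys are not 0..n-1"
            continue
        values = [state["p"] for state in states.values()]
        if any(not 0.0 <= p <= 1.0 for p in values):
            problems[edge_id] = "p outside [0, 1]"
        elif not math.isclose(sum(values), 1.0, abs_tol=tol):
            problems[edge_id] = f"probabilities sum to {sum(values):.3f}"
    return problems


problems = check_probs(probs)
print(problems)
```

The same check applies to binary and multi-state files, since it only depends on the number of states listed per edge.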

Dataset Metadata

Each dataset includes a dataset.yaml file with metadata:

name: dataset-name
version: 1.0.0
title: Human-readable title
license: CC-BY-4.0
description: >
  Detailed description of the dataset
contacts:
  - name: Contact Name
    affiliation: Institution
    email: contact@example.com
tags: [tag1, tag2, tag3]
files:
  nodes: data/nodes.json
  edges: data/edges.json
  probs: data/probs.json
citation: |
  Citation information

Loading Datasets

Using ndtools

from ndtools.io import dataset_paths, load_json
from pathlib import Path

# Get dataset paths
nodes_path, edges_path, probs_path = dataset_paths(
    Path('datasets'), 'dataset_name', 'v1'
)

# Load data
nodes = load_json(nodes_path)
edges = load_json(edges_path)
probs = load_json(probs_path)

Direct Loading

import json
from pathlib import Path

dataset_path = Path("datasets/dataset_name/v1/data")

with open(dataset_path / "nodes.json") as f:
    nodes = json.load(f)

with open(dataset_path / "edges.json") as f:
    edges = json.load(f)

with open(dataset_path / "probs.json") as f:
    probs = json.load(f)
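
The three loads above can be wrapped in a small helper. This is a minimal sketch and not part of ndtools: load_dataset and its probs_file parameter are hypothetical, and the demonstration writes placeholder files to a temporary folder so that it runs without the repository.

```python
import json
import tempfile
from pathlib import Path


def load_dataset(data_dir, probs_file="probs.json"):
    """Load nodes, edges and probabilities from one dataset's data folder."""
    data_dir = Path(data_dir)
    files = (("nodes", "nodes.json"), ("edges", "edges.json"), ("probs", probs_file))
    loaded = {}
    for key, file_name in files:
        with open(data_dir / file_name, encoding="utf-8") as f:
            loaded[key] = json.load(f)
    return loaded


# Demonstration on placeholder files in a temporary data folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "dataset_name" / "v1" / "data"
    data_dir.mkdir(parents=True)
    (data_dir / "nodes.json").write_text('{"n1": {"x": 0.0, "y": 0.0}}', encoding="utf-8")
    (data_dir / "edges.json").write_text("{}", encoding="utf-8")
    (data_dir / "probs.json").write_text("{}", encoding="utf-8")
    dataset = load_dataset(data_dir)

print(sorted(dataset))
```

The probs_file parameter is there because some datasets ship differently named probability files, such as probs_bin.json and probs_mult.json in ema-highway.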

Validation

All datasets can be validated against their schemas:

# Validate all datasets
python data_validate.py --root .

# Validate specific dataset
python data_validate.py --root . --dataset dataset-name

Adding New Datasets

To add a new dataset to the repository:

Create a new directory following the naming convention dataset_name/v1/ (⚠️ Do not use hyphens (`-`) in dataset names; use underscores (`_`) instead.)
Add your data files in the data/ subdirectory
Create a dataset.yaml metadata file
Update the registry.json file
Validate your dataset using the provided validation tools

See the contributing page for detailed instructions.