Available Datasets

This page provides detailed information about the datasets available in the Network Datasets repository.

Dataset Registry

All available datasets are listed in the registry.json file. Each dataset entry includes:

  • name: Unique identifier for the dataset

  • version: Dataset version number

  • path: Relative path to the dataset files

  • summary: Brief description of the dataset

  • license: License information (typically CC-BY-4.0)
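
As a rough illustration of how these fields might be used, the sketch below filters registry entries by license. It assumes that registry.json holds a top-level JSON list of entry objects with the fields above; the actual top-level layout is not documented on this page, so adjust accordingly. The helper name `find_datasets` and the sample entries are hypothetical.

```python
import json


def find_datasets(registry, license_id=None):
    """Return the names of registry entries, optionally filtered by license."""
    return [
        entry["name"]
        for entry in registry
        if license_id is None or entry.get("license") == license_id
    ]


# Sample entries mirroring the fields listed above.
# Assumption: the registry is a JSON list; the real file may be laid out differently.
sample_registry = json.loads("""
[
  {"name": "toynet-11edges", "version": "1.0.0",
   "path": "datasets/toynet-11edges/v1",
   "summary": "Small toy network", "license": "CC-BY-4.0"},
  {"name": "ema-highway", "version": "1.0.0",
   "path": "datasets/ema-highway/v1",
   "summary": "Eastern Massachusetts highway network", "license": "CC-BY-4.0"}
]
""")

print(find_datasets(sample_registry, "CC-BY-4.0"))
```

To run this against the real file, replace the inline sample with `json.load` on registry.json once its layout has been confirmed.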

Current Datasets

toynet-11edges

  • Version: 1.0.0

  • License: CC-BY-4.0

  • Path: datasets/toynet-11edges/v1

A small toy network with 8 nodes and 11 edges, designed for testing and learning purposes.

Files:

  • nodes.json: Node definitions with coordinates

  • edges.json: Edge definitions connecting nodes

  • probs.json: Edge failure probabilities

Use Cases:

  • Testing algorithms and functions

  • Learning the data format

  • Quick prototyping

ema-highway

  • Version: 1.0.0

  • License: CC-BY-4.0

  • Path: datasets/ema-highway/v1

Eastern Massachusetts benchmark highway network with nodes, edges, and probability files.

Files:

  • nodes.json: Highway intersection nodes

  • edges.json: Road segments between intersections

  • probs_bin.json: Binary failure probabilities

  • probs_mult.json: Multi-state failure probabilities

Example reference:

Byun, J.-E., Ryu, H., & Straub, D. (2025). Branch-and-bound algorithm for efficient reliability analysis of general coherent systems. Structural Safety, 102653.

Use Cases:

  • Transportation network analysis

  • Connectivity to critical facilities

  • Connectivity between communities

  • Emergency response planning

Generated Example Collection

The datasets/generated/ directory contains synthetic example datasets produced with ndtools.network_generator. These examples are intended for tutorials, quick tests, and format demonstrations. They follow the same JSON schemas as all curated datasets.

Note

See datasets/generated/README.md for an overview, and datasets/generated/PROVENANCE.md for the exact commands and parameters used to generate each example.

Layout

Each example resides in its own subdirectory with versioning:

generated/
  grid_8x8/
    v1/data/{nodes.json, edges.json, probs.json, graph.png}
  er_60_p005/
    v1/data/{...}
  ws_n60_k6_b015/
    v1/data/{...}
  ba_n60_m3/
    v1/data/{...}
  rg_n60_r017/
    v1/data/{...}
  config_n60_deg3/
    v1/data/{...}
  README.md
  PROVENANCE.md
  CHANGELOG.md

What’s inside each example

  • nodes.json: map of node id → attributes (at minimum: x, y)

  • edges.json: map of edge id → {from, to, directed, ...}

  • probs.json: per-edge binary probabilities (e.g., "0" = failure, "1" = working)

  • graph.png: (optional) auto-rendered preview

Reproducibility & provenance

Each example’s parameters (model family, size, probabilities, seed, etc.) are recorded in a metadata.json file inside that example's dataset folder and summarized across the collection in generated/PROVENANCE.md. To regenerate or extend the collection, use the CLI examples shown there.

distribution-substation-liang2022

  • Version: 1.0.0

  • License: CC-BY-4.0

  • Path: datasets/distribution-substation-liang2022/v1

Example 110/220 kV distribution substation network based on Liang et al. (2022). Includes nodes, edges, macrocomponents, equipment fragility, and probability files.

Files:

  • nodes.json: Substation nodes with coordinates and attributes

  • edges.json: Power line connections between substations

  • probs.json: Edge failure probabilities

  • macrocomponents.json: Component grouping information

  • equipment.json: Equipment fragility data

Citation:

Liang, H., Blagojevic, N., Xie, Q., & Stojadinovic, B. (2022). Seismic risk analysis of electrical substations based on the network analysis method. Earthquake Engineering & Structural Dynamics, 51(11), 2690-2707.

Use Cases:

  • Power grid reliability analysis

  • Seismic risk assessment

  • Infrastructure resilience studies

Data Format

All datasets follow a consistent JSON format defined by JSON schemas in the schema/ directory.

Node Format

Nodes are stored as a JSON object where keys are node IDs and values are attribute dictionaries:

{
  "node_id": {
    "x": 0.0,
    "y": 0.0,
    "type": "source",
    "additional_attributes": "..."
  }
}

Required attributes:

  • x: X-coordinate (number)

  • y: Y-coordinate (number)

Examples of optional attributes:

  • type: Node type (string)

  • group_name: Grouping identifier (string)

  • capacity: Capacity value (number or string)

  • unit: Unit of measurement (string)

  • Any other custom attributes
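
The required attributes above can be checked with a few lines of Python. The following is a minimal sketch, not a replacement for the JSON schemas in schema/; the function name `invalid_nodes` and the sample data are hypothetical.

```python
from numbers import Real


def invalid_nodes(nodes):
    """Return the IDs of nodes that lack a numeric x or y attribute."""
    bad = []
    for node_id, attrs in nodes.items():
        for key in ("x", "y"):
            value = attrs.get(key)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                bad.append(node_id)
                break
    return bad


# Hypothetical sample data in the node format described above.
nodes = {
    "n1": {"x": 0.0, "y": 0.0, "type": "source"},
    "n2": {"x": 1.5},           # missing y
    "n3": {"x": "2", "y": 3},   # x is a string, not a number
}

print(invalid_nodes(nodes))
```

Optional attributes are deliberately ignored here, since any custom attribute is allowed.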

Edge Format

Edges are stored as a JSON object where keys are edge IDs and values are connection dictionaries:

{
  "edge_id": {
    "from": "node1",
    "to": "node2",
    "directed": false,
    "additional_attributes": "..."
  }
}

Required attributes:

  • from: Source node ID (string)

  • to: Target node ID (string)

  • directed: Whether edge is directed (boolean)

Examples of optional attributes:

  • eid: Edge identifier (string)

  • macrocomponent_type: Component type (string)

  • length: Edge length (number)

  • Any other custom attributes
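
To show how the `directed` flag might be consumed, here is a small sketch that turns an edge dictionary into an adjacency mapping. It assumes that an undirected edge can be traversed in both directions; the function name `build_adjacency` and the sample edges are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each node ID to a sorted list of the nodes reachable over one edge."""
    adjacency = defaultdict(set)
    for edge in edges.values():
        adjacency[edge["from"]].add(edge["to"])
        if not edge["directed"]:
            # Assumption: undirected edges are traversable both ways.
            adjacency[edge["to"]].add(edge["from"])
    return {node: sorted(neighbors) for node, neighbors in adjacency.items()}


# Hypothetical sample data in the edge format described above.
edges = {
    "e1": {"from": "n1", "to": "n2", "directed": False},
    "e2": {"from": "n2", "to": "n3", "directed": True},
}

adjacency = build_adjacency(edges)
print(adjacency)
```

Nodes without outgoing edges (such as `n3` in this sample) do not appear as keys, so look them up with `adjacency.get(node, [])`.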

Probability Format

Probabilities are stored as a JSON object mapping edge IDs to probability dictionaries:

{
  "edge_id": {
    "0": {"p": 0.05},
    "1": {"p": 0.95}
  }
}

Here, for example, state "1" may denote an active (working) edge and state "0" a failed edge.

Required attributes:

  • state key (e.g., "0", "1"): Integer state index starting from 0, written as a string because JSON object keys are strings

  • p: Probability of the state (number between 0 and 1)

Examples of optional attributes:

  • description: Description of the state (string)
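
The sketch below illustrates two common operations on this structure: checking that the state probabilities of each edge sum to 1, and drawing one random state per edge. It assumes that the listed states of an edge are exhaustive, which this page does not state explicitly. The function names `check_probs` and `sample_states` and the sample data are hypothetical.

```python
import math
import random


def check_probs(probs, tol=1e-9):
    """Return the IDs of edges whose state probabilities do not sum to 1."""
    return [
        edge_id
        for edge_id, states in probs.items()
        if not math.isclose(sum(s["p"] for s in states.values()), 1.0, abs_tol=tol)
    ]


def sample_states(probs, rng):
    """Draw one integer state per edge according to its probabilities."""
    sample = {}
    for edge_id, states in probs.items():
        keys = sorted(states, key=int)          # "0", "1", ... in numeric order
        weights = [states[k]["p"] for k in keys]
        sample[edge_id] = int(rng.choices(keys, weights=weights)[0])
    return sample


# Hypothetical sample data in the probability format described above.
probs = {
    "e1": {"0": {"p": 0.05}, "1": {"p": 0.95}},
    "e2": {"0": {"p": 0.3}, "1": {"p": 0.6}},   # sums to 0.9, so it is flagged
}

print(check_probs(probs))
print(sample_states(probs, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the sampling reproducible without touching the global random state.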

Dataset Metadata

Each dataset includes a dataset.yaml file with metadata:

name: dataset-name
version: 1.0.0
title: Human-readable title
license: CC-BY-4.0
description: >
  Detailed description of the dataset
contacts:
  - name: Contact Name
    affiliation: Institution
    email: contact@example.com
tags: [tag1, tag2, tag3]
files:
  nodes: data/nodes.json
  edges: data/edges.json
  probs: data/probs.json
citation: |
  Citation information

Loading Datasets

Using ndtools

from ndtools.io import dataset_paths, load_json
from pathlib import Path

# Get dataset paths
nodes_path, edges_path, probs_path = dataset_paths(
    Path('datasets'), 'dataset_name', 'v1'
)

# Load data
nodes = load_json(nodes_path)
edges = load_json(edges_path)
probs = load_json(probs_path)

Direct Loading

import json
from pathlib import Path

dataset_path = Path("datasets/dataset_name/v1/data")

with open(dataset_path / "nodes.json") as f:
    nodes = json.load(f)

with open(dataset_path / "edges.json") as f:
    edges = json.load(f)

with open(dataset_path / "probs.json") as f:
    probs = json.load(f)
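
After loading, a quick consistency check can catch edges that point at undefined nodes or that have no probability entry. This is a minimal sketch that assumes every edge is expected to have an entry in the probability file, which this page does not state explicitly. The function name `check_references` and the inline sample data are hypothetical; in practice you would pass the `nodes`, `edges`, and `probs` objects loaded above.

```python
def check_references(nodes, edges, probs):
    """Return (dangling, missing) as sorted lists of edge IDs.

    dangling: edges whose "from" or "to" is not a key of `nodes`.
    missing:  edges without an entry in `probs`.
    """
    dangling = sorted(
        edge_id
        for edge_id, edge in edges.items()
        if edge["from"] not in nodes or edge["to"] not in nodes
    )
    missing = sorted(edge_id for edge_id in edges if edge_id not in probs)
    return dangling, missing


# Hypothetical in-memory data standing in for the loaded files.
nodes = {"n1": {"x": 0, "y": 0}, "n2": {"x": 1, "y": 0}}
edges = {
    "e1": {"from": "n1", "to": "n2", "directed": False},
    "e2": {"from": "n2", "to": "n9", "directed": False},  # n9 is not defined
}
probs = {"e1": {"0": {"p": 0.05}, "1": {"p": 0.95}}}

dangling, missing = check_references(nodes, edges, probs)
print(dangling, missing)
```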

Validation

All datasets can be validated against their schemas:

# Validate all datasets
python data_validate.py --root .

# Validate specific dataset
python data_validate.py --root . --dataset dataset-name

Adding New Datasets

To add a new dataset to the repository:

  1. Create a new directory following the naming convention: dataset_name/v1/ (⚠️ Use underscores (`_`), not hyphens (`-`), in dataset names.)

  2. Add your data files in the data/ subdirectory

  3. Create a dataset.yaml metadata file

  4. Update the registry.json file

  5. Validate your dataset using the provided validation tools

See the contributing page for detailed instructions.
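
Steps 1 to 3 could be scripted roughly as follows. This is a sketch under several assumptions: dataset.yaml is assumed to sit in the version directory next to data/, the stub contains only a subset of the metadata fields, and the empty JSON objects are placeholders that may not pass schema validation. The function name `scaffold_dataset` and the dataset name `my_network` are hypothetical. The example writes into a temporary directory so that it does not modify a real repository.

```python
import json
import tempfile
from pathlib import Path

# Minimal metadata stub; fill in the remaining fields by hand.
DATASET_YAML_STUB = """\
name: {name}
version: 1.0.0
title: TODO
license: CC-BY-4.0
files:
  nodes: data/nodes.json
  edges: data/edges.json
  probs: data/probs.json
"""


def scaffold_dataset(root, name, version_dir="v1"):
    """Create <root>/<name>/<version_dir>/ with placeholder data files and a metadata stub."""
    if "-" in name:
        raise ValueError("use underscores instead of hyphens in dataset names")
    base = Path(root) / name / version_dir
    data_dir = base / "data"
    data_dir.mkdir(parents=True, exist_ok=False)
    for filename in ("nodes.json", "edges.json", "probs.json"):
        # Empty objects are placeholders only.
        (data_dir / filename).write_text(json.dumps({}), encoding="utf-8")
    (base / "dataset.yaml").write_text(
        DATASET_YAML_STUB.format(name=name), encoding="utf-8"
    )
    return base


with tempfile.TemporaryDirectory() as tmp:
    base = scaffold_dataset(tmp, "my_network")
    created = sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )
    print(created)
```

Updating registry.json and running the validation tools (steps 4 and 5) are left as manual steps here, because the registry layout is not specified on this page.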