How to contribute

We welcome contributions to the Network Datasets repository! This page provides guidelines for contributing datasets, code improvements, and documentation.

Types of Contributions

We accept several types of contributions:

  • New datasets: Infrastructure network datasets following our format

  • Code improvements: Bug fixes, new features, performance optimizations

  • Documentation: Improvements to existing docs, new tutorials

  • Testing: Additional test cases, validation improvements

  • Examples: New Jupyter notebooks, usage examples

Getting Started

  1. Fork the repository on GitHub

  2. Clone your fork locally:

    git clone https://github.com/your-username/network-datasets.git
    cd network-datasets
    
  3. Create a development environment:

    conda create -n network-datasets-dev python=3.9
    conda activate network-datasets-dev
    pip install -e ".[dev]"
    
  4. Install pre-commit hooks (optional but recommended):

    pip install pre-commit
    pre-commit install
    

Adding New Datasets

Dataset Structure

New datasets should follow this directory structure:

dataset-name/
├── dataset.yaml          # Dataset metadata
└── v1/                   # Version directory
    ├── data/             # Data files
    │   ├── nodes.json    # Node definitions
    │   ├── edges.json    # Edge definitions
    │   └── probs.json    # Probability data
    ├── docs/             # Documentation
    │   ├── README.md     # Dataset description
    │   ├── PROVENANCE.md # Data source information
    │   └── CHANGELOG.md  # Version history
    └── scripts/          # Analysis scripts (optional)
        └── example.ipynb
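
The layout above can be created by hand, but a small script avoids typos in file names. The following is a minimal sketch that uses only the Python standard library; `scaffold_dataset` is a hypothetical helper written for this guide, not part of the repository's tooling, and it assumes the version directory is named `v1` as in the tree shown.

```python
from pathlib import Path


def scaffold_dataset(root: Path, name: str, version_dir: str = "v1") -> list[Path]:
    """Create an empty skeleton for a new dataset and return the files created.

    Hypothetical helper: it only mirrors the directory tree described in this
    guide and does not write any real content.
    """
    base = root / name
    version = base / version_dir
    files = [
        base / "dataset.yaml",
        version / "data" / "nodes.json",
        version / "data" / "edges.json",
        version / "data" / "probs.json",
        version / "docs" / "README.md",
        version / "docs" / "PROVENANCE.md",
        version / "docs" / "CHANGELOG.md",
    ]
    for path in files:
        # Create parent directories as needed, then an empty placeholder file.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
    # The scripts/ directory is optional; it is created empty here.
    (version / "scripts").mkdir(parents=True, exist_ok=True)
    return files
```

Calling `scaffold_dataset(Path("."), "your-dataset-name")` from the repository root would create the skeleton. The placeholder files are empty, so the JSON and YAML files still need real content before they can pass validation.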

Required Files

dataset.yaml

Dataset metadata file with the following structure:

name: dataset-name
version: 1.0.0
title: Human-readable title
license: CC-BY-4.0
description: >
  Detailed description of the dataset including:
  - What type of infrastructure network
  - Number of nodes and edges
  - Data source and methodology
  - Use cases and applications
contacts:
  - name: Your Name
    affiliation: Your Institution
    email: your.email@example.com
tags: [power, transportation, water]  # choose the tags that apply to your dataset
files:
  nodes: data/nodes.json
  edges: data/edges.json
  probs: data/probs.json
citation: |
  Citation information for the dataset

nodes.json

Node definitions following the JSON schema:

{
  "node_id": {
    "x": 0.0,
    "y": 0.0,
    "type": "optional_type",
    "additional_attributes": "optional"
  }
}

edges.json

Edge definitions following the JSON schema:

{
  "edge_id": {
    "from": "node1",
    "to": "node2",
    "directed": false,
    "additional_attributes": "optional"
  }
}
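
A common mistake in hand-built datasets is an edge whose endpoint does not exist in nodes.json. The sketch below assumes that the `from` and `to` values are keys of the nodes object, which the examples above suggest but do not state outright; `find_dangling_edges` is a hypothetical helper, not part of the repository's validator.

```python
def find_dangling_edges(nodes: dict, edges: dict) -> dict[str, list[str]]:
    """Return {edge_id: [missing node IDs]} for edges with unknown endpoints.

    Assumption: "from" and "to" refer to keys of the nodes mapping.
    """
    dangling = {}
    for edge_id, edge in edges.items():
        missing = [edge[key] for key in ("from", "to") if edge[key] not in nodes]
        if missing:
            dangling[edge_id] = missing
    return dangling


# Toy data in the same shape as nodes.json and edges.json.
nodes = {"n1": {"x": 0.0, "y": 0.0}, "n2": {"x": 1.0, "y": 0.0}}
edges = {
    "e1": {"from": "n1", "to": "n2", "directed": False},
    "e2": {"from": "n2", "to": "n3", "directed": False},  # "n3" is not defined
}
print(find_dangling_edges(nodes, edges))  # {'e2': ['n3']}
```

An empty result means every edge endpoint was found among the node IDs.
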

probs.json

Probability data for edge failures:

{
  "edge_id": {
    "1": {"p": 0.95},
    "0": {"p": 0.05}
  }
}
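
Since the state probabilities of each edge are expected to sum to 1.0, it can help to check this locally before running the full validator. The following is a minimal sketch; `edges_with_bad_probs` and its tolerance are assumptions made for illustration, and the repository's own validation may apply different rules.

```python
import math


def edges_with_bad_probs(probs: dict, tol: float = 1e-9) -> list[str]:
    """Return the IDs of edges whose state probabilities are invalid.

    An edge is flagged if any probability lies outside [0, 1] or if the
    probabilities of its states do not sum to 1.0 within the tolerance.
    """
    bad = []
    for edge_id, states in probs.items():
        values = [state["p"] for state in states.values()]
        in_range = all(0.0 <= p <= 1.0 for p in values)
        sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
        if not (in_range and sums_to_one):
            bad.append(edge_id)
    return bad


# Toy data in the same shape as probs.json.
probs = {
    "e1": {"1": {"p": 0.95}, "0": {"p": 0.05}},
    "e2": {"1": {"p": 0.9}, "0": {"p": 0.2}},  # sums to 1.1
}
print(edges_with_bad_probs(probs))  # ['e2']
```

A tolerance is used rather than an exact comparison because floating-point sums such as 0.1 + 0.2 + 0.7 are not always exactly 1.0.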

Data Quality Guidelines

  • Coordinates: Use consistent units (e.g., kilometers) and a single coordinate system throughout the dataset

  • Node IDs: Use descriptive, unique identifiers

  • Edge IDs: Use descriptive, unique identifiers

  • Attributes: Include relevant metadata (capacity, type, etc.)

  • Probabilities: Ensure probabilities sum to 1.0 for each edge

  • Validation: All data must pass schema validation
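
One subtlety with the uniqueness guideline: because node and edge IDs are JSON object keys, Python's `json` module silently keeps the last value when a key is repeated, so duplicates can go unnoticed. The sketch below uses the standard `object_pairs_hook` parameter of `json.loads` to reject repeated keys; `load_json_unique` is a hypothetical helper, and whether the repository's validator performs this check is not stated here.

```python
import json


def load_json_unique(text: str) -> dict:
    """Parse JSON text, raising ValueError if any object repeats a key."""

    def reject_duplicates(pairs):
        # Called once per JSON object with its (key, value) pairs in order.
        seen = {}
        for key, value in pairs:
            if key in seen:
                raise ValueError(f"duplicate key: {key!r}")
            seen[key] = value
        return seen

    return json.loads(text, object_pairs_hook=reject_duplicates)


# The default parser keeps only the last value for a repeated key.
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}

try:
    load_json_unique('{"n1": {"x": 0.0}, "n1": {"x": 1.0}}')
except ValueError as err:
    print(err)  # duplicate key: 'n1'
```

The hook also runs for nested objects, so a repeated attribute name inside a single node or edge is rejected as well.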

Dataset Documentation

Create comprehensive documentation for your dataset:

README.md

Include:

  • Dataset overview and purpose

  • Data source and methodology

  • Network statistics (nodes, edges, connectivity)

  • Usage examples

  • Citation information

PROVENANCE.md

Include:

  • Original data source

  • Processing steps and transformations

  • Assumptions and limitations

  • Data quality notes

CHANGELOG.md

Track changes and updates to the dataset.

Validation

Before submitting, validate your dataset:

# Validate all datasets
python data_validate.py --root .

# Validate specific dataset
python data_validate.py --root . --dataset your-dataset-name

Update Registry

Add your dataset to the registry.json file:

[
  {
    "name": "your-dataset-name",
    "version": "1.0.0",
    "path": "your-dataset-name/v1",
    "summary": "Brief description of your dataset",
    "license": "CC-BY-4.0"
  }
]
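
Editing registry.json by hand works, but it is easy to leave a trailing comma or add the same dataset twice. The following is a minimal sketch that assumes registry.json is a JSON array of entries in the shape shown above; `add_registry_entry` is a hypothetical helper, not part of the repository's tooling.

```python
import json
from pathlib import Path


def add_registry_entry(registry_path: Path, entry: dict) -> list[dict]:
    """Append an entry to the registry file and return the updated list.

    Raises ValueError if an entry with the same name and version exists.
    Assumption: the registry is a JSON array of objects with "name" and
    "version" keys.
    """
    if registry_path.exists():
        entries = json.loads(registry_path.read_text(encoding="utf-8"))
    else:
        entries = []
    for existing in entries:
        if existing["name"] == entry["name"] and existing["version"] == entry["version"]:
            raise ValueError(f"{entry['name']} {entry['version']} is already registered")
    entries.append(entry)
    # Write with indentation so that diffs in pull requests stay readable.
    registry_path.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return entries
```

For example, `add_registry_entry(Path("registry.json"), {...})` with the five fields shown above would append the entry and rewrite the file.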

Code Contributions

Code Style

  • Follow PEP 8 style guidelines

  • Use type hints for function parameters and return values

  • Write docstrings for all public functions

  • Use meaningful variable and function names
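
As an illustration of these points, here is a small function with type hints, a docstring, and descriptive names. It is a hypothetical example written for this guide and is not an existing function in the code base; it assumes nodes and edges in the shape of nodes.json and edges.json.

```python
import math


def edge_length(nodes: dict[str, dict], edge: dict) -> float:
    """Return the straight-line length of an edge.

    Args:
        nodes: Mapping of node ID to attributes with "x" and "y" coordinates.
        edge: Edge attributes with "from" and "to" node IDs.

    Returns:
        The Euclidean distance between the two endpoints, in the same
        units as the node coordinates.
    """
    start = nodes[edge["from"]]
    end = nodes[edge["to"]]
    return math.hypot(end["x"] - start["x"], end["y"] - start["y"])


nodes = {"n1": {"x": 0.0, "y": 0.0}, "n2": {"x": 3.0, "y": 4.0}}
print(edge_length(nodes, {"from": "n1", "to": "n2", "directed": False}))  # 5.0
```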

Testing

  • Write tests for new functionality

  • Ensure all existing tests pass

  • Aim for good test coverage of the code you add or change, including edge cases

# Run tests
pytest tests/

# Run with coverage
pytest --cov=ndtools tests/
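
For contributors new to pytest, a test is a function whose name starts with `test_` and that uses plain `assert` statements. The sketch below is self-contained for illustration: `count_isolated_nodes` is a hypothetical helper defined in the same file, whereas a real test would import the function under test from the package.

```python
def count_isolated_nodes(nodes: dict, edges: dict) -> int:
    """Count nodes that are not an endpoint of any edge (hypothetical helper)."""
    connected = {edge[key] for edge in edges.values() for key in ("from", "to")}
    return sum(1 for node_id in nodes if node_id not in connected)


def test_count_isolated_nodes_finds_unconnected_node() -> None:
    nodes = {"a": {"x": 0.0, "y": 0.0}, "b": {"x": 1.0, "y": 0.0}, "c": {"x": 2.0, "y": 0.0}}
    edges = {"e1": {"from": "a", "to": "b", "directed": False}}
    # Only "c" has no incident edge.
    assert count_isolated_nodes(nodes, edges) == 1


def test_count_isolated_nodes_empty_network() -> None:
    # An empty network has no nodes, so nothing can be isolated.
    assert count_isolated_nodes({}, {}) == 0
```

Saved under tests/ in a file whose name starts with `test_`, these functions would be collected by the `pytest tests/` command shown above.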

Documentation

  • Update docstrings for modified functions

  • Add examples to the documentation

  • Update the API reference if needed

Pull Request Process

  1. Create a feature branch:

    git checkout -b feature/your-feature-name
    
  2. Make your changes and commit them:

    git add .
    git commit -m "Add your dataset: brief description"
    
  3. Push to your fork:

    git push origin feature/your-feature-name
    
  4. Create a pull request on GitHub with:

      • Clear description of changes

      • Reference to any related issues

      • Screenshots for UI changes

      • Test results

Pull Request Guidelines

  • Keep PRs focused on a single feature or dataset

  • Write clear, descriptive commit messages

  • Respond to review feedback promptly

  • Update documentation as needed

  • Ensure all tests pass

Review Process

All contributions are reviewed by maintainers:

  • Code quality: Style, functionality, tests

  • Data quality: Validation, documentation, format compliance

  • Documentation: Clarity, completeness, accuracy

  • Testing: Coverage, correctness

Reviewers may request changes before merging.

License

By contributing to this project, you agree that your contributions will be licensed under the same licenses as the project:

  • Code: MIT License

  • Data: CC-BY-4.0 License

This means your contributions can be used by others under these terms.

Getting Help

If you need help with contributing:

  • Open an issue on GitHub for questions

  • Check existing issues for similar questions

  • Read the documentation thoroughly

  • Ask in discussions for general questions

Recognition

Contributors will be recognized in:

  • The project’s README.md file

  • Release notes for significant contributions

  • The project’s documentation

Thank you for contributing to the Network Datasets project!