How to contribute
We welcome contributions to the Network Datasets repository! This page provides guidelines for contributing datasets, code improvements, and documentation.
Types of Contributions
We accept several types of contributions:
New datasets: Infrastructure network datasets following our format
Code improvements: Bug fixes, new features, performance optimizations
Documentation: Improvements to existing docs, new tutorials
Testing: Additional test cases, validation improvements
Examples: New Jupyter notebooks, usage examples
Getting Started
Fork the repository on GitHub
Clone your fork locally:
git clone https://github.com/your-username/network-datasets.git
cd network-datasets
Create a development environment:
conda create -n network-datasets-dev python=3.9
conda activate network-datasets-dev
pip install -e ".[dev]"
Install pre-commit hooks (optional but recommended):
pip install pre-commit
pre-commit install
Adding New Datasets
Dataset Structure
New datasets should follow this directory structure:
dataset-name/
├── dataset.yaml              # Dataset metadata
└── v1/                       # Version directory
    ├── data/                 # Data files
    │   ├── nodes.json        # Node definitions
    │   ├── edges.json        # Edge definitions
    │   └── probs.json        # Probability data
    ├── docs/                 # Documentation
    │   ├── README.md         # Dataset description
    │   ├── PROVENANCE.md     # Data source information
    │   └── CHANGELOG.md      # Version history
    └── scripts/              # Analysis scripts (optional)
        └── example.ipynb
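The layout above can be created by hand, but a short script avoids typos in file names. The following is a minimal sketch, not part of the repository tooling: `expected_files` and `scaffold_dataset` are hypothetical helper names, and the files are created empty as placeholders for you to fill in.

```python
from pathlib import Path

# Required files, as listed in the tree above.
DATA_FILES = ("nodes.json", "edges.json", "probs.json")
DOC_FILES = ("README.md", "PROVENANCE.md", "CHANGELOG.md")


def expected_files(name: str, version_dir: str = "v1") -> list[Path]:
    """Return the relative paths of the required files for one dataset."""
    base = Path(name)
    version = base / version_dir
    paths = [base / "dataset.yaml"]
    paths += [version / "data" / filename for filename in DATA_FILES]
    paths += [version / "docs" / filename for filename in DOC_FILES]
    return paths


def scaffold_dataset(root: Path, name: str, version_dir: str = "v1") -> None:
    """Create empty placeholder files for a new dataset under root."""
    for relative in expected_files(name, version_dir):
        target = Path(root) / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    # scripts/ is optional; it is created empty for analysis notebooks.
    (Path(root) / name / version_dir / "scripts").mkdir(parents=True, exist_ok=True)
```

Calling `scaffold_dataset(Path("."), "your-dataset-name")` from the repository root would create the skeleton in place; existing files are left untouched because `touch()` does not truncate them.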
Required Files
- dataset.yaml
Dataset metadata file with the following structure:
name: dataset-name
version: 1.0.0
title: Human-readable title
license: CC-BY-4.0
description: >
  Detailed description of the dataset including:
  - What type of infrastructure network
  - Number of nodes and edges
  - Data source and methodology
  - Use cases and applications
contacts:
  - name: Your Name
    affiliation: Your Institution
    email: your.email@example.com
tags: [power, transportation, water, etc.]
files:
  nodes: data/nodes.json
  edges: data/edges.json
  probs: data/probs.json
citation: |
  Citation information for the dataset
- nodes.json
Node definitions following the JSON schema:
{
  "node_id": {
    "x": 0.0,
    "y": 0.0,
    "type": "optional_type",
    "additional_attributes": "optional"
  }
}
- edges.json
Edge definitions following the JSON schema:
{
  "edge_id": {
    "from": "node1",
    "to": "node2",
    "directed": false,
    "additional_attributes": "optional"
  }
}
- probs.json
Probability data for edge failures:
{
  "edge_id": {
    "1": {"p": 0.95},
    "0": {"p": 0.05}
  }
}
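Because the three files refer to each other by ID, a quick cross-reference check can catch mistakes before you run the full validator. The sketch below is illustrative only: `find_reference_errors` is a hypothetical name, and the rule that every key in probs.json must match an edge ID is an assumption drawn from the examples above, not a documented requirement.

```python
def find_reference_errors(nodes: dict, edges: dict, probs: dict) -> list[str]:
    """Return human-readable messages for IDs that do not resolve."""
    errors = []

    # Every edge endpoint should name a node defined in nodes.json.
    for edge_id, edge in edges.items():
        for end in ("from", "to"):
            node_id = edge.get(end)
            if node_id not in nodes:
                errors.append(
                    f"edge {edge_id!r}: '{end}' refers to unknown node {node_id!r}"
                )

    # Assumption: probs.json is keyed by edge ID, as in the example above.
    for edge_id in probs:
        if edge_id not in edges:
            errors.append(f"probs entry {edge_id!r} has no matching edge")

    return errors
```

With the parsed contents of the three files passed in (for example via `json.load`), an empty list means every reference resolved.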
Data Quality Guidelines
Coordinates: Use consistent units (e.g., kilometers) and a single coordinate system
Node IDs: Use descriptive, unique identifiers
Edge IDs: Use descriptive, unique identifiers
Attributes: Include relevant metadata (capacity, type, etc.)
Probabilities: Ensure probabilities sum to 1.0 for each edge
Validation: All data must pass schema validation
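The probability guideline is easy to check mechanically. The following is a minimal sketch that assumes the probs.json layout shown earlier (edge ID, then state label, then an object with a "p" value); the function name and the tolerance are illustrative choices, not project settings.

```python
import math


def edges_with_bad_probabilities(probs: dict, tol: float = 1e-9) -> dict[str, float]:
    """Map each offending edge ID to the total its state probabilities add up to."""
    bad = {}
    for edge_id, states in probs.items():
        total = sum(state["p"] for state in states.values())
        # Compare with a tolerance so floating-point rounding is not flagged.
        if not math.isclose(total, 1.0, abs_tol=tol):
            bad[edge_id] = total
    return bad
```

An empty result means every edge's state probabilities sum to 1.0 within the tolerance.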
Dataset Documentation
Create comprehensive documentation for your dataset:
- README.md
Include:
* Dataset overview and purpose
* Data source and methodology
* Network statistics (nodes, edges, connectivity)
* Usage examples
* Citation information
- PROVENANCE.md
Include:
* Original data source
* Processing steps and transformations
* Assumptions and limitations
* Data quality notes
- CHANGELOG.md
Track changes and updates to the dataset.
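The network statistics requested for README.md can be computed directly from the node and edge files. The sketch below uses only the standard library and is one possible approach: `network_stats` is a hypothetical helper, and it treats every edge as undirected when counting connected components, which is an assumption you may want to change for directed networks.

```python
from collections import defaultdict, deque


def network_stats(nodes: dict, edges: dict) -> dict:
    """Return node count, edge count, and number of connected components."""
    # Build an undirected adjacency map from the "from"/"to" fields.
    adjacency = defaultdict(set)
    for edge in edges.values():
        adjacency[edge["from"]].add(edge["to"])
        adjacency[edge["to"]].add(edge["from"])

    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        # Each unseen node starts a new component; explore it breadth-first.
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)

    return {"nodes": len(nodes), "edges": len(edges), "components": components}
```

A single component means the network is connected; isolated nodes each count as their own component.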
Validation
Before submitting, validate your dataset:
# Validate all datasets
python data_validate.py --root .
# Validate specific dataset
python data_validate.py --root . --dataset your-dataset-name
Update Registry
Add your dataset to the registry.json file:
[
{
"name": "your-dataset-name",
"version": "1.0.0",
"path": "your-dataset-name/v1",
"summary": "Brief description of your dataset",
"license": "CC-BY-4.0"
}
]
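Editing registry.json by hand works, but a small script can guard against missing fields and duplicate entries. This is a sketch under stated assumptions: the helper names are hypothetical, the five fields are taken from the example entry above, and rejecting a repeated name and version pair is an illustrative rule rather than documented project policy.

```python
import json
from pathlib import Path

# Fields taken from the example registry entry above.
REQUIRED_FIELDS = {"name", "version", "path", "summary", "license"}


def add_registry_entry(registry: list, entry: dict) -> list:
    """Return a new registry list with entry appended, after basic checks."""
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        raise ValueError(f"entry is missing fields: {sorted(missing)}")
    for existing in registry:
        if (existing["name"], existing["version"]) == (entry["name"], entry["version"]):
            raise ValueError(f"{entry['name']} {entry['version']} is already registered")
    return registry + [entry]


def update_registry_file(path: Path, entry: dict) -> None:
    """Read a registry file (a top-level JSON list), add entry, and write it back."""
    registry = json.loads(path.read_text(encoding="utf-8"))
    updated = add_registry_entry(registry, entry)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

`add_registry_entry` returns a new list instead of mutating its argument, so a failed check leaves the original registry unchanged.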
Code Contributions
Code Style
Follow PEP 8 style guidelines
Use type hints for function parameters and return values
Write docstrings for all public functions
Use meaningful variable and function names
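As an illustration of these conventions, here is a short function with type hints, a docstring, and descriptive names. It is a hypothetical example written for this guide, not a function from the project; it assumes nodes carry "x" and "y" coordinates as in the nodes.json format.

```python
import math


def edge_length(nodes: dict[str, dict], edge: dict) -> float:
    """Return the straight-line length of an edge.

    Args:
        nodes: Mapping of node ID to node attributes, each with "x" and "y".
        edge: Edge attributes containing "from" and "to" node IDs.

    Returns:
        Euclidean distance between the two end nodes, in coordinate units.

    Raises:
        KeyError: If an end node is not present in nodes.
    """
    start = nodes[edge["from"]]
    end = nodes[edge["to"]]
    return math.hypot(end["x"] - start["x"], end["y"] - start["y"])
```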
Testing
Write tests for new functionality
Ensure all existing tests pass
Aim for good test coverage of the code you add or change
# Run tests
pytest tests/
# Run with coverage
pytest --cov=ndtools tests/
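For contributors new to pytest, a test is a function whose name starts with `test_` and that uses plain `assert` statements. The sketch below shows the shape of such a file; the file name and the `nodes_missing_coordinates` helper are hypothetical, and in a real contribution the helper would be imported from the package instead of being defined next to its tests.

```python
# Hypothetical contents of tests/test_nodes.py


def nodes_missing_coordinates(nodes: dict) -> list[str]:
    """Return IDs of nodes that lack a numeric "x" or "y" value."""
    return [
        node_id
        for node_id, attrs in nodes.items()
        if not all(isinstance(attrs.get(axis), (int, float)) for axis in ("x", "y"))
    ]


def test_complete_nodes_pass():
    nodes = {"n1": {"x": 0.0, "y": 1.5}, "n2": {"x": 2, "y": 3}}
    assert nodes_missing_coordinates(nodes) == []


def test_missing_or_non_numeric_coordinate_is_reported():
    nodes = {
        "n1": {"x": 0.0},                # "y" is absent
        "n2": {"x": 1.0, "y": "north"},  # "y" is not a number
        "n3": {"x": 1.0, "y": 2.0},
    }
    assert nodes_missing_coordinates(nodes) == ["n1", "n2"]
```

Each test covers one behavior, so a failure message points directly at what broke.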
Documentation
Update docstrings for modified functions
Add examples to the documentation
Update the API reference if needed
Pull Request Process
Create a feature branch:
git checkout -b feature/your-feature-name
Make your changes and commit them:
git add .
git commit -m "Add your dataset: brief description"
Push to your fork:
git push origin feature/your-feature-name
Create a pull request on GitHub with:
* Clear description of changes
* Reference to any related issues
* Screenshots for UI changes
* Test results
Pull Request Guidelines
Keep PRs focused on a single feature or dataset
Write clear, descriptive commit messages
Respond to review feedback promptly
Update documentation as needed
Ensure all tests pass
Review Process
All contributions are reviewed by maintainers:
Code quality: Style, functionality, tests
Data quality: Validation, documentation, format compliance
Documentation: Clarity, completeness, accuracy
Testing: Coverage, correctness
Reviewers may request changes before merging.
License
By contributing to this project, you agree that your contributions will be licensed under the same licenses as the project:
Code: MIT License
Data: CC-BY-4.0 License
This means your contributions can be used by others under these terms.
Getting Help
If you need help with contributing:
Open an issue on GitHub for questions
Check existing issues for similar questions
Read the documentation thoroughly
Ask in discussions for general questions
Recognition
Contributors will be recognized in:
The project’s README.md file
Release notes for significant contributions
The project’s documentation
Thank you for contributing to the Network Datasets project!