Labels: feature (New feature or request), lauzhack2022 (good issue for LauzHack 2022)
PROBLEM:
Distributed learning has a higher carbon cost than centralized learning (https://arxiv.org/abs/2102.07627)

The problem is worsened in non-IID settings

POSSIBLE SOLUTIONS:
This can be mitigated by several approaches:
- model compression
- communication compression (e.g. PowerSGD)
- client selection
- especially for non-IID settings: we can select clients to preferentially learn from, based on a trade-off between the value of the client's data and the carbon cost of using it --> this can be seen as a form of model personalization
- others I won't list yet...
PROPOSED APPROACH:
- The first step is computing the carbon footprint
- The carbon footprint of distributed learning = cost of compute + cost of communication. Both depend on the carbon intensity (https://app.electricitymaps.com/map) of where the computation takes place (i.e. the energy mix of the country or data centre, for example)
- Our tool, CUMULATOR (https://pypi.org/project/cumulator/) computes this (albeit imperfectly, as it does not take some important real-world hardware issues into consideration, but let's ignore that for now)
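A minimal sketch of the footprint formula above (the function name and the numbers are illustrative; this is not CUMULATOR's API):

```python
# Sketch: footprint = (compute energy + communication energy) * local carbon intensity.
def carbon_footprint(compute_kwh: float, comm_kwh: float,
                     intensity_gco2_per_kwh: float) -> float:
    """Return estimated emissions in grams of CO2-equivalent."""
    return (compute_kwh + comm_kwh) * intensity_gco2_per_kwh

# e.g. 0.5 kWh of compute + 0.1 kWh of communication at 300 gCO2eq/kWh
print(carbon_footprint(0.5, 0.1, 300))  # 180.0 gCO2eq
```

The carbon intensity factor is what makes location matter: the same training run emits very different amounts depending on the local energy mix.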
- Next is adapting and integrating CUMULATOR into DISCO
- OPTION 1: Sending CUMULATOR to each user --> monitoring local compute and comms --> sending results to be aggregated --> report results to all users
- This is maybe unfeasible, invasive, and creates more communication overhead than it is worth... to be explored
- OPTION 2: Asking each user to collect data on the determinants of carbon footprint (GPU/CPU brand, geographical location, epochs of local learning, and number of communication rounds) --> then communicating either those metrics directly or some privacy-preserving composite of them --> predicting the carbon footprint centrally from these metrics, compared against the model run on a benchmark GPU centrally
- others... to discuss
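A rough sketch of what OPTION 2's client report and central prediction could look like (all names and the simple energy model are assumptions, not DISCO's or CUMULATOR's API):

```python
from dataclasses import dataclass

@dataclass
class ClientReport:
    """Determinants of carbon footprint a client would report (hypothetical schema)."""
    tdp_watts: float      # rated power draw of the client's CPU/GPU
    hours_trained: float  # local training time (epochs * time per epoch)
    comm_rounds: int      # number of communication rounds
    kwh_per_round: float  # estimated energy per communication round
    intensity: float      # gCO2eq/kWh at the client's geographical location

def predict_footprint(r: ClientReport) -> float:
    """Central prediction of a client's emissions (gCO2eq) from reported metrics."""
    compute_kwh = r.tdp_watts * r.hours_trained / 1000
    comm_kwh = r.comm_rounds * r.kwh_per_round
    return (compute_kwh + comm_kwh) * r.intensity

report = ClientReport(tdp_watts=250, hours_trained=2,
                      comm_rounds=10, kwh_per_round=0.01, intensity=300)
print(round(predict_footprint(report), 1))  # 180.0
```

A privacy-preserving variant would have clients send only the composite (e.g. the final gCO2eq estimate) rather than the raw determinants.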
- Finally, displaying the results
- Displaying tangible results to users to communicate the cost of training the model.
- Perhaps add some tips about how to reduce this cost
- Definitely show the "value" gained for each "unit" of carbon
Example:
Each 1% of accuracy/F1, etc. costs 10 Wh (or 1 week of an average tree's carbon-recycling capacity)
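The example metric above could be produced by something like the following (hypothetical helper, illustrative numbers):

```python
def wh_per_accuracy_point(total_wh: float, accuracy_gain_pct: float) -> float:
    """Energy cost (Wh) per percentage point of accuracy/F1 gained."""
    return total_wh / accuracy_gain_pct

# e.g. 850 Wh spent training a model that reaches 85% accuracy
print(wh_per_accuracy_point(850, 85))  # 10.0 Wh per accuracy point
```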
- Once the above are done...we can explore carbon-optimized learning
- Monitoring the carbon impact of optimization techniques like compression
- Making the trade-offs necessary to perform "client selection" (deciding whom to communicate with to best balance accuracy against carbon footprint)
- etc!
Lauzhack candidate task