# Replication Preprocessing

A preprocessor checks that a replication job is ready to start before the job is created.

## Simple Replication Jobs

- Checks if Codex drives are ready
    - Ensure any source/destination on Codex VFS is fully built and sized.
- Checks sources
    - Makes sure they exist
    - Makes sure they are readable
- Checks outputs
    - Makes sure they are writable
    - Checks to make sure they have enough space based on the sources
    - Makes sure the destination is not a part of the source
    - Checks to see if there will be naming collisions on each output
        - Displays alert if auto-rename is not enabled
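
As an illustration of the "destination is not a part of the source" check, here is a minimal sketch. The function name `isDestination(_:insideSource:)` is hypothetical and is not taken from the actual preprocessor. It assumes that a destination equal to the source also counts as "part of the source", and it compares standardized path components only, without resolving symlinks.

```swift
import Foundation

/// Hypothetical helper: returns true when `destination` is the same location
/// as `source` or is nested somewhere beneath it.
/// Comparing whole path components (not string prefixes) avoids treating
/// "/Volumes/Card/ClipsBackup" as being inside "/Volumes/Card/Clips".
func isDestination(_ destination: URL, insideSource source: URL) -> Bool {
    let sourceParts = source.standardizedFileURL.pathComponents
    let destinationParts = destination.standardizedFileURL.pathComponents

    // A destination with fewer components than the source cannot be inside it.
    guard destinationParts.count >= sourceParts.count else { return false }

    // The destination is inside the source when its leading components match.
    return Array(destinationParts.prefix(sourceParts.count)) == sourceParts
}
```

Comparing components rather than raw strings is the main design choice here; a real check might also need to resolve symlinks or aliases before comparing.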


## Preset Replication Jobs

- Checks to make sure there are presets
- Checks everything simple jobs check


## Space Availability Check

This check can become expensive when sources are large. For this reason, a two-stage strategy was implemented.

### Fast Path Space Available Check

To avoid enumerating the entirety of a source, we can assume that:
- If a destination's volume has enough space available for the aggregate space used on all source volumes, then there is no need to enumerate the sources' contents to determine their aggregate size.
- It is important to note that even if the source is the ENTIRE root volume, we still cannot say that the space used on the volume equals the aggregate size of what would be offloaded.
    - This can be seen when offloading a network drive that reports more space used than its contents actually account for. The likely cause is that the trash needs to be emptied.
    
This is the current flow of the fast path check:
1. For each source, retrieve the volume-level space used (for Codex drives, retrieve the size of the item instead)
2. Sum the values retrieved for all sources
3. For each destination, compare the space available on the root volume to the summed size of all sources
    ```swift
    if destinationFreeSpace > naiveTotalBytesNeeded {
        // skip deep enumeration!
    }
    ```
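
The flow above can be sketched end to end as follows. This is a simplified model, not the actual implementation: `SourceEstimate` and `destinationsNeedingDeepCheck` are hypothetical names, and the byte counts are passed in directly. On Apple platforms the volume figures could plausibly come from `URLResourceValues` (`volumeTotalCapacity` minus `volumeAvailableCapacity` for space used), but the document does not say how they are retrieved.

```swift
import Foundation

/// Hypothetical stand-in for one source: the volume-level space used,
/// or, for a Codex drive, the size of the item itself.
struct SourceEstimate {
    let bytes: Int64
}

/// Returns the indices of destinations that FAIL the fast path, meaning
/// their free space is not strictly greater than the naive total.
/// An empty result means deep enumeration can be skipped entirely.
func destinationsNeedingDeepCheck(
    sources: [SourceEstimate],
    destinationFreeSpace: [Int64]
) -> [Int] {
    // Step 2: sum the per-source estimates.
    let naiveTotalBytesNeeded = sources.reduce(Int64(0)) { $0 + $1.bytes }

    // Step 3: a destination passes only if free space > naive total.
    return destinationFreeSpace.indices.filter {
        destinationFreeSpace[$0] <= naiveTotalBytesNeeded
    }
}
```

For example, one source using 500 GB checked against a destination with 600 GB free returns an empty array, so no source contents need to be read.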

### Fallback (Deep Enumeration)

Only when at least one destination fails the fast-path check do we enumerate all sources to get the true aggregate size of the items that will be offloaded.
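
A deep enumeration along these lines could look like the sketch below. It is an assumption-laden example rather than the real code: `deepSize(of:)` is a hypothetical name, it expects a directory URL, and it sums logical file sizes (`fileSize`), whereas the actual check may use allocated sizes or apply filters this document does not describe.

```swift
import Foundation

/// Hypothetical helper: walks every item under `root` and sums the
/// logical size of each regular file. This touches every file, which is
/// why it can be slow on large or network-backed sources.
func deepSize(of root: URL) -> Int64 {
    let keys: [URLResourceKey] = [.isRegularFileKey, .fileSizeKey]

    // The enumerator recurses into subdirectories by default.
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: keys
    ) else { return 0 }

    var total: Int64 = 0
    for case let fileURL as URL in enumerator {
        // Skip directories, symlinks, and anything whose values cannot be read.
        guard let values = try? fileURL.resourceValues(forKeys: Set(keys)),
              values.isRegularFile == true else { continue }
        total += Int64(values.fileSize ?? 0)
    }
    return total
}
```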

        
## Current Limitations and Performance Considerations

- Currently (05/15/25), all of these checks run synchronously on the main thread.
    - Slow source volumes with more total space used than a destination has available can cause significant beach balling, because the deep enumeration blocks the main thread.
        - Example:
            - Source: 500 GB network drive using 400 GB
            - Destination: 1 TB SSD with 200 GB available
        - To fix this, we will need to either refactor these checks to run on a background thread or defer the real size check until replication construction.
    - However, beach balling will not occur with a slow source when all destinations have enough space for the entire space used on the source volume, because no deep enumeration of the source occurs.
        - Example:
            - Source: 1 TB network drive using only 500 GB
            - Destination: 1 TB SSD with 600 GB available
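
One possible shape for the background-thread option is sketched below using Grand Central Dispatch. This is only an illustration of the idea, not a planned design: `runSpaceCheckInBackground` and its parameters are hypothetical, and the `check` closure stands in for whatever blocking work (such as deep enumeration) the preprocessor performs.

```swift
import Foundation
import Dispatch

/// Hypothetical wrapper: runs a blocking space check off the calling thread
/// and delivers the result on `callbackQueue` (the main queue by default,
/// so UI code can react to it safely).
func runSpaceCheckInBackground(
    check: @escaping @Sendable () -> Bool,
    callbackQueue: DispatchQueue = .main,
    completion: @escaping @Sendable (Bool) -> Void
) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Potentially slow work happens here, away from the main thread.
        let hasEnoughSpace = check()

        // Hop to the callback queue before reporting the result.
        callbackQueue.async {
            completion(hasEnoughSpace)
        }
    }
}
```

With this shape the caller returns immediately and the UI stays responsive, at the cost of job creation having to wait for the completion callback instead of a synchronous result.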


