Documentation

Conceptually how sync works

Simplified

For each data source, we store last_sync_data. When a sync is triggered, it fetches data for the interval from last_sync_data to today. During the sync we keep updating last_sync_data, and once the sync is done we save today's date. That way, if the sync fails in the middle, we don't need to start again from the beginning.
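
The checkpointing idea above can be sketched roughly as follows. This is only an illustration, not the project's actual code: sync_source, fetch_day and the state dict are hypothetical names, and it assumes the data is fetched one day at a time.

```python
from datetime import date, timedelta
from typing import Callable, Dict


def sync_source(
    state: Dict[str, date],
    fetch_day: Callable[[date], None],
    today: date,
) -> None:
    """Fetch each day from state["last_sync_data"] through today.

    Hypothetical sketch: the checkpoint is advanced after every day that
    was fetched successfully, so a failed run resumes near where it stopped.
    """
    day = state["last_sync_data"]
    while day <= today:
        fetch_day(day)                 # may raise; the checkpoint then stays put
        state["last_sync_data"] = day  # progress saved after each successful day
        day += timedelta(days=1)
```

If fetch_day raises partway through, last_sync_data still points at the last day that succeeded, so the next run continues from there instead of from the original start date.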

Full details

  • The last_sync_data is updated even when the data source tells us it doesn't have the data. This avoids endlessly refetching data that will never be there.
  • For each service we have a specific time window (e.g. 40 days) by which we always look back into the past from last_sync_data, because during that time the data in the source keeps updating and some reports only become available later. The sync interval is therefore effectively from (last_sync_data - 40 days) to today.
  • We cache all HTTP requests, each with a custom time to live, because we know some reports never change once they are available, while others change every hour. This saves us a lot of time and API calls.
    • And yes, this sometimes needlessly writes old data to the DB even though we know the data hasn't changed, but that's fine. I didn't want to do the caching at a higher level, because it would get really messy and data source dependent.
    • Note that we don't cache the response if the data source doesn't have the data. So it will always try to refetch, to pick up data that may eventually appear there.
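
As a rough illustration of the two mechanisms above (the look-back window and the per-request cache), here is a minimal sketch. The names sync_interval and TTLCache, the in-memory dict storage, and the rule "an empty or None response means no data yet" are all assumptions made for the example, not the project's actual implementation.

```python
import time
from datetime import date, timedelta
from typing import Any, Callable, Dict, Optional, Tuple


def sync_interval(last_sync: date, today: date, lookback_days: int) -> Tuple[date, date]:
    """Return (start, end), with start pushed back by the service's window."""
    return last_sync - timedelta(days=lookback_days), today


class TTLCache:
    """Hypothetical in-memory cache: one entry per URL, each with its own TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(
        self,
        url: str,
        ttl_seconds: float,
        fetch: Callable[[str], Optional[Any]],
    ) -> Optional[Any]:
        entry = self._entries.get(url)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # still fresh: skip the HTTP call
        response = fetch(url)
        if response:
            # Only non-empty responses are cached, until now + TTL.
            self._entries[url] = (self._clock() + ttl_seconds, response)
        # Empty/None responses are not cached, so the next call retries.
        return response
```

With this shape, a report that never changes can be given a very long ttl_seconds and an hourly report a short one, while the sync code above it stays unaware of the cache.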

How we get the revenue and costs data, accuracy, how the syncing works:


Previously we did an MVP project in which we successfully proved that we can integrate each service's API to fetch revenues and costs. Use it only if you are developing a whole API integration from scratch: