A 2M-token context window is powerful, but it is not free: Vertex AI charges per input token processed. Resending a massive repository on every chat turn will quickly exhaust your budget and add significant latency, because the model must re-process the entire 2M tokens on each request.
The solution is Context Caching: you pay to process the large, stable context once, Vertex AI stores it server-side, and every subsequent request references the cache by name instead of resending the tokens, with cached tokens billed at a reduced rate (plus an hourly storage charge while the cache is alive).
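Here is a minimal sketch of the flow using the Google Gen AI SDK (`google-genai`). The project ID, model name, file path, TTL, and question are placeholder assumptions, not values from this article; the point is the shape of the calls: create the cache once, then reference it on each turn.

```python
# Minimal context-caching sketch (pip install google-genai).
# Project ID, model, file path, and prompts below are placeholders.
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project="your-project-id",   # placeholder
    location="us-central1",
)

# Load the large, stable context once (e.g., a concatenated repo dump).
with open("repo_dump.txt") as f:
    repo_dump = f.read()

# Create the cache once; its tokens are processed and stored server-side.
cache = client.caches.create(
    model="gemini-1.5-pro-002",  # placeholder long-context model
    config=types.CreateCachedContentConfig(
        display_name="repo-cache",
        system_instruction="You are an expert on this codebase.",
        contents=[
            types.Content(
                role="user",
                parts=[types.Part.from_text(text=repo_dump)],
            )
        ],
        ttl="3600s",  # keep the cache alive for one hour
    ),
)

# Each chat turn now sends only the new question; the cached repository
# is referenced by name instead of being re-sent and re-processed.
response = client.models.generate_content(
    model="gemini-1.5-pro-002",
    contents="Where is the authentication middleware defined?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```

With this pattern, per-turn cost and latency scale with the size of the new prompt rather than the size of the repository, which is exactly what makes multi-turn chat over a 2M-token context practical.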