A number of architectures, both existing and emerging, are designed for high-performance embedded computing. These architectures, including Field Programmable Gate Arrays (FPGAs) and multi-core architectures such as IBM's Cell chip, contain on-chip and on-board memories that form a hierarchy similar to the cache hierarchy of a conventional processor. However, these chips use non-cached memory hierarchies in order to deliver higher performance with less chip area. In this talk, I will give an overview of these architectures and their memory hierarchies. I will then present our research into automatically allocating these memories for sliding-window image processing applications, a class of applications frequently implemented on such architectures. Our approach determines the optimal assignment of data to memories to minimize data accesses and maximize parallelism. For FPGA architectures, it also produces a block diagram of the design and an estimate of the achievable speedup before the detailed design is complete.
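To illustrate the access pattern the talk addresses, the following is a minimal sketch (not the authors' tool) of a 3x3 sliding-window mean filter buffered the way such kernels are commonly mapped to on-chip memory: two small line buffers hold the previous two rows so each pixel is fetched from large off-chip memory only once. The image dimensions and the choice of filter are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define W 8   /* hypothetical image width  */
#define H 6   /* hypothetical image height */

/* 3x3 mean filter over an 8-bit image. line0/line1 stand in for
 * on-chip line buffers holding rows y-2 and y-1; the current row is
 * streamed in from off-chip memory, so each input pixel is read once. */
static void mean3x3(const unsigned char in[H][W], unsigned char out[H][W])
{
    unsigned char line0[W], line1[W];

    memcpy(line0, in[0], W);
    memcpy(line1, in[1], W);

    /* Border pixels are simply copied through in this sketch. */
    for (int y = 0; y < H; y++)
        memcpy(out[y], in[y], W);

    for (int y = 2; y < H; y++) {
        for (int x = 1; x < W - 1; x++) {
            int sum = 0;
            for (int dx = -1; dx <= 1; dx++)
                sum += line0[x + dx] + line1[x + dx] + in[y][x + dx];
            out[y - 1][x] = (unsigned char)(sum / 9);
        }
        memcpy(line0, line1, W);   /* shift line buffers down one row */
        memcpy(line1, in[y], W);
    }
}
```

Deciding which arrays (line buffers, window registers, full frames) go into which physical memories, and how that choice affects achievable parallelism, is exactly the allocation problem the research automates.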