Explain the differences between Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Gated Recurrent Units (GRUs) in terms of their architecture, capabilities, and typical applications. What are the advantages and limitations of each? Provide a simple code example to illustrate the implementation of each architecture using either TensorFlow or PyTorch.
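As a starting point for the code part of the question, here is a minimal PyTorch sketch that wraps each of the three recurrent layers in the same small classifier. The class name `SequenceClassifier`, the helper `count_parameters`, and all sizes (8 input features, 16 hidden units, 2 classes, sequences of length 10) are assumptions chosen for illustration only; nothing here is tuned or trained.

```python
import torch
import torch.nn as nn


class SequenceClassifier(nn.Module):
    """Hypothetical wrapper: one recurrent layer (RNN, LSTM, or GRU) plus a linear head."""

    def __init__(self, cell_type, input_size=8, hidden_size=16, num_classes=2):
        super().__init__()
        # The three built-in layers share the same basic constructor arguments.
        layers = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}
        self.recurrent = layers[cell_type](input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x has shape (batch, seq_len, input_size).
        # nn.RNN and nn.GRU return (output, h_n); nn.LSTM returns (output, (h_n, c_n)).
        # Only `output` is used here, so the second element is ignored in all cases.
        output, _ = self.recurrent(x)
        # Classify from the hidden state at the last time step.
        return self.head(output[:, -1, :])


def count_parameters(module):
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())


torch.manual_seed(0)
batch = torch.randn(4, 10, 8)  # assumed toy input: 4 sequences, 10 steps, 8 features

models = {name: SequenceClassifier(name) for name in ("rnn", "lstm", "gru")}
logits = {name: model(batch) for name, model in models.items()}
recurrent_params = {name: count_parameters(model.recurrent) for name, model in models.items()}

for name in models:
    print(name, tuple(logits[name].shape), recurrent_params[name])
```

With identical input and hidden sizes, the LSTM layer holds four times the parameters of the plain RNN layer (four gate-like weight blocks) and the GRU layer three times, which gives a rough sense of the cost side of the trade-off the question asks about.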