The linear sequence of amino acids in a protein is its primary structure. Proteins are built from a set of only twenty standard amino acids, each of which has a distinct side chain.
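Because the primary structure is a sequence over a twenty-letter alphabet, it can be written as a string of one-letter residue codes and mapped to integers. The sketch below illustrates that idea only. The names `AMINO_ACIDS`, `TOKEN_OF`, `encode`, and `decode` are hypothetical, and the index assignment (alphabetical by one-letter code) is an arbitrary assumption, not the vocabulary of any particular model.

```python
# The twenty standard amino acids, by one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: the index given to each residue is arbitrary
# and chosen for illustration only.
TOKEN_OF = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str) -> list[int]:
    """Map a primary structure (one-letter codes) to integer tokens."""
    unknown = sorted(set(sequence) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    return [TOKEN_OF[aa] for aa in sequence]


def decode(tokens: list[int]) -> str:
    """Map integer tokens back to one-letter residue codes."""
    return "".join(AMINO_ACIDS[t] for t in tokens)


peptide = "MKTAYIAK"  # made-up example peptide
tokens = encode(peptide)
print(tokens)          # [10, 8, 16, 0, 19, 7, 0, 8]
print(decode(tokens))  # MKTAYIAK
```

The encoding is lossless for sequences made of standard residues, so `decode(encode(s))` returns `s`. Anything outside the twenty-letter alphabet is rejected rather than silently dropped.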
ESM3 can reason over protein sequence, structure, and function by representing each of them through an alphabet of discrete tokens that can be combined in a generative language model. This ...