How attention sinks keep language models stable (hanlab.mit.edu)
218 points by pr337h4m 6 days ago | 36 comments