Developer, 11-year reddit refugee

Zetaphor

  • 3 Posts
  • 54 Comments
Joined 8 months ago
Cake day: March 12th, 2024


  • I’m really enjoying Otterwiki. Everything is saved as markdown, attachments sit next to the markdown files in a folder, and version control is handled by an integrated git repo. It all lives in one directory and the application runs from a docker container.

    It’s the perfect amount of simplicity and is really just a UI on top of fully portable standard tech (sketched below).
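
    To show what I mean by “fully portable standard tech”, here’s a rough Python sketch of poking at the data directly. The path is just a guess at where a docker volume might land on your host, so adjust it to your own setup, and it assumes git is installed.

    ```python
    from pathlib import Path
    import subprocess

    # Hypothetical location of the Otterwiki data directory on the docker host
    wiki_dir = Path("/srv/otterwiki/app-data/repository")

    # Every page is just a markdown file sitting in a plain directory tree
    for page in sorted(wiki_dir.rglob("*.md")):
        print(page.relative_to(wiki_dir))

    # ...and the history is an ordinary git repo, so normal git tooling works on it
    log = subprocess.run(
        ["git", "-C", str(wiki_dir), "log", "--oneline", "-5"],
        capture_output=True, text=True, check=True,
    )
    print(log.stdout)
    ```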

  • To elaborate further on the other comment, it’s a person running a copy of the Lemmy software on their own server. I, for example, am running mine (and seeing this thread) from https://zemmy.cc. Thanks to federation, all of our different servers are able to talk to each other, so we can have a shared experience rather than everyone being on one centralized instance managed by one set of administrators (like reddit is).

    This provides resilience to the network. If reddit goes down, reddit is down. If lemmy.world goes down, you can still access every community that isn’t on lemmy.world, and if other servers were subscribed to a community on lemmy.world, you could still see its content from before the server went offline (it will resync once the server is back up). There’s a toy sketch of this caching idea at the end of this comment.

    If we put all of our eggs into a single basket, we have a single point of failure. If all of the major communities go to lemmy.world, then lemmy.world is that single point of failure. Doing that effectively just recreates the same issues we had with reddit, but with extra steps. By spreading larger communities across servers, we ensure that the outage (or permanent closure) of a single instance doesn’t take down half the active communities with it.
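
    If it helps, here’s a toy Python sketch of that caching idea. It has nothing to do with the real ActivityPub protocol; it just illustrates how subscribed servers keeping their own copies means content survives an outage.

    ```python
    # Toy model of federation resilience -- subscribed servers keep a local copy
    # of remote communities, so a remote outage doesn't take the content with it.

    class Instance:
        def __init__(self, name):
            self.name = name
            self.online = True
            self.communities = {}  # communities hosted here: name -> list of posts
            self.cache = {}        # local copies of remote communities we subscribe to

        def post(self, community, text):
            self.communities.setdefault(community, []).append(text)

        def subscribe(self, remote, community):
            # Subscribing pulls a copy of the remote community's posts
            self.cache[(remote.name, community)] = list(remote.communities.get(community, []))

        def read(self, remote, community):
            if remote.online:
                return remote.communities.get(community, [])
            # Remote is down, but we can still serve what we federated earlier
            return self.cache.get((remote.name, community), [])

    lemmy_world = Instance("lemmy.world")
    zemmy = Instance("zemmy.cc")

    lemmy_world.post("technology", "a post made before the outage")
    zemmy.subscribe(lemmy_world, "technology")

    lemmy_world.online = False                    # lemmy.world goes down...
    print(zemmy.read(lemmy_world, "technology"))  # ...but the content is still readable here
    ```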

  • Setting aside the obvious answer of “because capitalism”, there are a lot of obstacles to democratizing this technology. Training of these models is done on clusters of A100 GPUs, which are priced at $10,000 USD each. There’s also the fact that a lot of the progress is being made by highly specialized academics, often with the resources of large corporations like Microsoft behind them.

    Additionally, the curation of datasets is another massive obstacle. We’ve mostly reached the point of diminishing returns from just throwing all the data at the training of models; it’s quickly becoming apparent that the quality of the data is far more important than its quantity (see TinyStories as an example). This means a lot of work and research needs to go into qualitative analysis when preparing a dataset. You need a large corpus of input, each piece of which is above a quality threshold, but which as a whole also represents a wide enough variety of circumstances for you to reach emergence in the domain(s) you’re trying to train for (there’s a toy sketch of that filtering idea at the end of this comment).

    There is a large and growing body of open source model development, but even that only exists because of Meta “leaking” the original Llama models and, more recently, releasing Llama 2 with a commercial license. Practically overnight an entire ecosystem was born creating higher-quality fine-tunes and specialized datasets, but all of that was only possible because Meta invested the resources and made it available to the public.

    Actually, in hindsight it looks like the answer is still “because capitalism” despite everything I’ve just said.
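
    To make the dataset point a bit more concrete, here’s a toy Python sketch of the two constraints: a per-sample quality bar and enough coverage across domains. The quality_score function and the thresholds are made up for illustration; in a real pipeline the scoring is the hard, expensive part.

    ```python
    # Toy dataset curation: keep only samples above a quality threshold,
    # then check whether what's left still covers enough distinct domains.

    QUALITY_THRESHOLD = 0.8  # illustrative cutoff
    MIN_DOMAINS = 5          # illustrative coverage requirement

    def quality_score(sample):
        # Stand-in for real scoring (classifiers, heuristics, human ratings)
        return sample["score"]

    def curate(corpus):
        kept = [s for s in corpus if quality_score(s) >= QUALITY_THRESHOLD]
        domains = {s["domain"] for s in kept}
        return kept, len(domains) >= MIN_DOMAINS

    corpus = [
        {"text": "…", "domain": "code", "score": 0.9},
        {"text": "…", "domain": "dialogue", "score": 0.4},
        # ...a real corpus would have millions of these
    ]
    kept, diverse_enough = curate(corpus)
    print(len(kept), "samples kept; diverse enough:", diverse_enough)
    ```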