4chan gets real about software
9/7/2012: Updated with feedback from moot
4chan, the Internet's long-time dumping ground and butt of many a joke, is getting serious about software by making their biggest public-facing code change in nearly a decade, introducing an API and a bunch of new functionality.
Given its reputation, many commentators have already written this off with a shrug and a laugh. But 4chan is also one of the web's most popular and influential communities. It's the source of so many Internet-age cultural trends that even your grandma may be dimly aware that the clever picture she posted on her Facebook was trawled a thousand copies ago from the dark depths of /mlp/. Given that there's big money in all this, the API offers businesses a direct line to the heart of the machine.
As a professional software developer and long time 4chan user, I think this is a pretty interesting development. I talked yesterday afternoon to some of those who worked on 4chan's code over the years and know a little about why this is such an important development.
4chan, whose codebase is a heavily modified version of the Futallaby image board system, has suffered all kinds of software problems over the years. Note: moot says that Futallaby's code is almost entirely gone, and their software is named "Yotsuba" now. Volunteers running the site struggled with massive growth, hacks, denial-of-service attacks, regular crashes and much else besides. They were generally paid little, if at all, for their efforts, simply because there just wasn't any money to go around. Founder Chris Poole, aka moot, famously held $20,000 in credit card debt just trying to keep the site afloat. It was amazing that the site held on at all.
For nearly its entire history, 4chan was completely hands-off on software from the client side--i.e. you or anyone else interested in the data. Excepting messing with users by auto-playing obnoxious music or putting party hats on every post, the public-facing code changed little over the years and was aimed esclusively at web browsers. New features appear extremely rarely, and the developers I talked to could only identify of a handful in the last six years.
In May, however, 4chan announced a refactoring of the site's HTML output, the underlying structure of the page served to browsers. Yesterday, they announced three more big software-related changes:
• They're rolling the functionality of the most popular 4chan browser extensions into the site itself.
• They're adding a read-only JSON API, a way for outsiders to slurp up raw data on what's appearing at the site.
• Both of these changes are released and documented publicly on GitHub, a popular code repository.
May's HTML refactoring cleaned up years of cruft in 4chan's garbled source. This itself was significant, at the time, because it allowed users who had either written or thought about writing browser extensions to make much better versions with improved functionality. "mootykins" also asked that extension authors limit the number of requests they made to 4chan, in order to reduce load on the servers. As the default user experience is so sparse, extensions quickly grew to become a big part of 4chan users' experience. Their first official extension (for FireFox) was written in 2005, but user-written extensions appeared much earlier.
The API opens up new possibilities for third party developers. Where previously getting site content meant grabbing the HTML source (a horrible mess, even with the refactor) and attempting to parse it, developers can now grab content easily and parse it quickly in more versatile languages. This could lead to mobile phone apps (moot says this is unlikely, since Apple and Google both just kicked third party apps off their app stores), general site analytics, or simply detecting hot threads and trends throughout the site. With 4chan's tendency to generate new creative content, this is a pretty desirable feature.
Unfortunately, right now the API only works for individual threads and doesn't report info for a full board or for the site as a whole. Boards can be viewed as RSS in post-date order, but this doesn't include the most popular board, /b/. Also, 4chan's data is rendered as HTML before it's saved in the database, so the API doesn't do a fantastic job of separating out valuable info. Note: moot says that the API will soon be updated with endpoints for full boards. The 1.0 version was released to support the new inline extension
4chan is using version control and releasing information publicly. Although they've been using some form of version control since about 2006, this hasn't been well-known publicly; because of their chaotic nature I'd assumed they were still making changes live, on the public site. As recently as 2008, I was told, 4chan didn't have a real development environment set up for testing, though that may have changed since then— growth was so quick, and changes needed to be made so rapidly, that version control or development environment usage wasn't practical. 4chan's sharing of its code publicly (and letting people watch repositories where changes are being made) is a big step towards their code's transparency. They might even accept a pull request to the extensions script if a user made updates to it. For the most part, 4chan is deeply secretive. Most of the site's inner operations are rarely discussed, and people currently involved in the site didn't want to discuss its current workings even in broad terms.
So, if users and developers want this functionality, and these are positive changes for the site, why is this coming about only now after years of near-silence? First, browsers extensions became a popular early solution because not all their features were wanted by the whole community. Change on any site is hard, especially with a long-term user base. I know from experienc, in changing Boing Boing's design throughout the years, that even a slight change (or no change) can elicit some angry emails. And ourusers are pretty polite! I can't imagine what we'd do if we got DDoS'd by angry users every time we moved the nav bar.
These new updates on 4chan suggest two things: 4chan's userbase is slowly rolling over to where older, angrier users aren't around to complain, but also that 4chan is becoming more active in—and less afraid of—making site-wide changes. They're getting users used to it.
4chan's stability has also improved recently, so administrators are probably spending less time putting out fires. This may be partly due to them getting static cache flushing—a method of reducing how much load servers are placed under when users request pages—working properly for threads. Previously, each time someone posted, a new copy of the HTML thread had to be generated from scratch. Instead, now, the output is cached and a process periodically writes a new version on a schedule. Note: moot says this is the case, but that only three boards are rebuilt using a timer. 4chan never loaded content dynamically.
4chan's official browser extensions—not to mention encouraging other extension writers to throttle their countless users' manic request rate—probably improved server stability quite a bit as well.
Additionally, while 4chan mostly takes a laissez-faire approach to offensive content, it has strict rules. Most of 4chan's codebase is concerned with moderation and administrator functionality. Trolls and other obnoxious users may be effectively synonymous with 4chan—it's part of why there's very little money to be made there--but dealing with the worst remains a monstrous task.
Lastly, these changes were largely made by new, incoming volunteers. Traditionally, the volunteers working on the code don't have too much experience as software developers. In the early days, the developers were just cutting their chops on a large site; the "hackers", likewise, were script kiddies wreaking havoc with automated tools. 4chan must be attracting better developers now.
4chan's movements suggest that it's planning more active and organized development. They've made large changes to the site
and are closing down extensions. Note: moot says they're not "closing down" extensions. Its established user base is turning over more rapidly—or, perhaps, it's simply maturing. It's bringing in new developers, it's using version control, and its publicly releasing its source on Github. It's opened up with a JSON API so third party apps and projects can be made, even if the the available data are limited in scope. With a user base as large as has—22 million unique visitors making 1.3 billion pageviews in June this year—these changes should lead the site in interesting new directions.
2001 Nobel laureate in economics Joseph Stiglitz has a long history of being on the right side of history. For example: pricing the Iraq war at $3T; raising the alarm about sovereign wealth funds acquiring US debt; nailing the double-standard on bailouts for debt crises (and the way that this destabilizes poor countries); sounding the […]
In this 30 minute video, Harry Brignull rounds up his work on cataloging and unpicking “Dark Patterns,” (previously) the super-optimized techniques used by online services to lure their customers into taking actions they would not make otherwise and will later regret.
What to do if you’ve just signed up to work in one of the most expensive real-estate markets in the world, with almost all of your net worth tied up in illiquid shares in your employer’s company? Just ask a Silicon Valley bank for a 100% mortgage, which they’ll cheerfully supply on 24 hours’ notice, […]
Looks like all of your potential employers are hiring candidates with programming skills (which you don’t have). With all of the languages out there today, it’s tough to know where to start.With the Complete Front-End to Back-End Coding Bundle, you can beef your resume up in all the right places, no confusion necessary. This package of […]
Those of us who love music wish we could listen to it 24/7. But it’s impossible when we’re trying to converse with our friends, or when are swimming in the local pool.That is, until now. The KOAR Bone Conduction Bluetooth Headset, now 48% off, has changed the audio game.Made with lightweight titanium memory metal, this headset boasts patented bone conduction technology to transport sound […]
It’s one thing to enjoy dinner at home and a nice glass of Cabernet Sauvignon with your best friend, Netflix, but it’s another thing entirely to make that meal from scratch and get that wine delivered right to your doorstep.But what if we told you there’s a way to make this possible? To keep your social life, […]