4chan gets real about software
9/7/2012: Updated with feedback from moot
4chan, the Internet's long-time dumping ground and butt of many a joke, is getting serious about software by making their biggest public-facing code change in nearly a decade, introducing an API and a bunch of new functionality.
Given its reputation, many commentators have already written this off with a shrug and a laugh. But 4chan is also one of the web's most popular and influential communities. It's the source of so many Internet-age cultural trends that even your grandma may be dimly aware that the clever picture she posted on her Facebook was trawled a thousand copies ago from the dark depths of /mlp/. Given that there's big money in all this, the API offers businesses a direct line to the heart of the machine.
As a professional software developer and longtime 4chan user, I think this is a pretty interesting development. Yesterday afternoon I talked to some of the people who have worked on 4chan's code over the years, and I know a little about why this matters so much.
4chan, whose codebase is a heavily modified version of the Futallaby image board system, has suffered all kinds of software problems over the years. Note: moot says that Futallaby's code is almost entirely gone, and their software is named "Yotsuba" now. Volunteers running the site struggled with massive growth, hacks, denial-of-service attacks, regular crashes and much else besides. They were generally paid little, if at all, for their efforts, simply because there just wasn't any money to go around. Founder Chris Poole, aka moot, famously held $20,000 in credit card debt just trying to keep the site afloat. It was amazing that the site held on at all.
For nearly its entire history, 4chan was completely hands-off on software from the client side--i.e. you or anyone else interested in the data. Excepting messing with users by auto-playing obnoxious music or putting party hats on every post, the public-facing code changed little over the years and was aimed exclusively at web browsers. New features appeared extremely rarely, and the developers I talked to could identify only a handful in the last six years.
In May, however, 4chan announced a refactoring of the site's HTML output, the underlying structure of the page served to browsers. Yesterday, they announced three more big software-related changes:
• They're rolling the functionality of the most popular 4chan browser extensions into the site itself.
• They're adding a read-only JSON API, a way for outsiders to slurp up raw data on what's appearing at the site.
• Both of these changes are released and documented publicly on GitHub, a popular code repository.
May's HTML refactoring cleaned up years of cruft in 4chan's garbled source. This itself was significant, at the time, because it allowed users who had either written or thought about writing browser extensions to make much better versions with improved functionality. "mootykins" also asked that extension authors limit the number of requests they made to 4chan, in order to reduce load on the servers. As the default user experience is so sparse, extensions quickly grew to become a big part of 4chan users' experience. Their first official extension (for Firefox) was written in 2005, but user-written extensions appeared much earlier.
The API opens up new possibilities for third party developers. Where previously getting site content meant grabbing the HTML source (a horrible mess, even with the refactor) and attempting to parse it, developers can now grab content easily and parse it quickly in more versatile languages. This could lead to mobile phone apps (moot says this is unlikely, since Apple and Google both just kicked third party apps off their app stores), general site analytics, or simply detecting hot threads and trends throughout the site. With 4chan's tendency to generate new creative content, this is a pretty desirable feature.
Unfortunately, right now the API only works for individual threads and doesn't report info for a full board or for the site as a whole. Boards can be viewed as RSS in post-date order, but this doesn't include the most popular board, /b/. Also, 4chan's data is rendered as HTML before it's saved in the database, so the API doesn't do a fantastic job of separating out valuable info. Note: moot says that the API will soon be updated with endpoints for full boards. The 1.0 version was released to support the new inline extension.
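To make the "rendered as HTML" point concrete, here is a minimal Python sketch of what consuming a single thread's JSON might look like. The payload below is a made-up sample, and the field names (`posts`, `no`, `time`, `com`) are assumptions for illustration rather than a statement of the official schema; no network request is made.

```python
import json
from html.parser import HTMLParser


class CommentText(HTMLParser):
    """Collects the text of an HTML comment, turning <br> into newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def comment_to_text(html_comment):
    """Strip markup from one pre-rendered comment and decode entities."""
    parser = CommentText()
    parser.feed(html_comment)
    parser.close()
    return "".join(parser.parts)


def parse_thread(payload_text):
    """Turn a thread payload into a list of plain-text posts."""
    data = json.loads(payload_text)
    return [
        {
            "no": post["no"],
            "time": post["time"],
            # Some posts may be image-only, so the comment is optional.
            "text": comment_to_text(post.get("com", "")),
        }
        for post in data["posts"]
    ]


# Hypothetical sample standing in for a real API response.
SAMPLE_JSON = json.dumps({
    "posts": [
        {"no": 100, "time": 1346900000, "com": "First post<br>second line"},
        {"no": 101, "time": 1346900060,
         "com": '<span class="quote">&gt;implying</span><br>reply &amp; more'},
    ]
})

if __name__ == "__main__":
    for post in parse_thread(SAMPLE_JSON):
        print(post["no"], repr(post["text"]))
```

Even with JSON in hand, a client still has to run a small HTML parser over every comment, which is the extra step the paragraph above is complaining about.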
4chan is using version control and releasing information publicly. Although they've been using some form of version control since about 2006, this hasn't been well-known publicly; because of their chaotic nature I'd assumed they were still making changes live, on the public site. As recently as 2008, I was told, 4chan didn't have a real development environment set up for testing, though that may have changed since then. Growth was so quick, and changes needed to be made so rapidly, that using version control or a development environment wasn't practical. 4chan's sharing of its code publicly (and letting people watch repositories where changes are being made) is a big step towards their code's transparency. They might even accept a pull request to the extensions script if a user made updates to it. For the most part, 4chan is deeply secretive. Most of the site's inner operations are rarely discussed, and people currently involved in the site didn't want to discuss its current workings even in broad terms.
So, if users and developers want this functionality, and these are positive changes for the site, why is this coming about only now after years of near-silence? First, browser extensions became a popular early solution because not all their features were wanted by the whole community. Change on any site is hard, especially with a long-term user base. I know from experience, in changing Boing Boing's design throughout the years, that even a slight change (or no change) can elicit some angry emails. And our users are pretty polite! I can't imagine what we'd do if we got DDoS'd by angry users every time we moved the nav bar.
These new updates on 4chan suggest two things: 4chan's userbase is slowly rolling over to where older, angrier users aren't around to complain, but also that 4chan is becoming more active in—and less afraid of—making site-wide changes. They're getting users used to it.
4chan's stability has also improved recently, so administrators are probably spending less time putting out fires. This may be partly due to them getting static cache flushing—a method of reducing how much load servers are placed under when users request pages—working properly for threads. Previously, each time someone posted, a new copy of the HTML thread had to be generated from scratch. Instead, now, the output is cached and a process periodically writes a new version on a schedule. Note: moot says this is the case, but that only three boards are rebuilt using a timer. 4chan never loaded content dynamically.
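The difference between rebuilding on every post and rebuilding on a schedule can be sketched in a few lines of Python. This is only an illustration of the general idea, not 4chan's actual code: the `ThreadStore` class and its method names are hypothetical, and `flush()` stands in for whatever periodic process does the rewriting.

```python
import html


class ThreadStore:
    """Toy model: posts mark a thread dirty, and a scheduled flush re-renders it."""

    def __init__(self):
        self.posts = {}        # thread_id -> list of comment strings
        self.cache = {}        # thread_id -> last rendered static HTML
        self.dirty = set()     # threads changed since the last flush
        self.render_count = 0  # how many times we paid the rendering cost

    def render(self, thread_id):
        """Generate the whole thread page from scratch (the expensive part)."""
        self.render_count += 1
        items = "".join(
            f"<li>{html.escape(comment)}</li>" for comment in self.posts[thread_id]
        )
        return f"<ul>{items}</ul>"

    def add_post(self, thread_id, comment):
        """Record the post but defer rendering until the next flush."""
        self.posts.setdefault(thread_id, []).append(comment)
        self.dirty.add(thread_id)

    def flush(self):
        """Called on a timer: re-render each changed thread exactly once."""
        for thread_id in sorted(self.dirty):
            self.cache[thread_id] = self.render(thread_id)
        self.dirty.clear()

    def get_page(self, thread_id):
        """Readers are always served the cached copy."""
        return self.cache.get(thread_id, "")


if __name__ == "__main__":
    store = ThreadStore()
    for comment in ["a", "b", "c"]:
        store.add_post(1, comment)
    store.flush()
    print(store.render_count, store.get_page(1))
```

Three posts cost one render instead of three. The trade-off is that readers may see a page that is stale by up to one flush interval.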
4chan's official browser extensions—not to mention encouraging other extension writers to throttle their countless users' manic request rate—probably improved server stability quite a bit as well.
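For extension and script authors, throttling can be as simple as enforcing a minimum gap between requests. The Python sketch below is one way to do that; the `Throttle` class is hypothetical, and the interval in the example is an arbitrary placeholder rather than any limit 4chan has published.

```python
import time


class Throttle:
    """Enforces a minimum number of seconds between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None     # time at which the previous request was allowed

    def wait(self):
        """Block until a request is allowed; return how long we slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.2)  # placeholder interval for the demo
    for attempt in range(3):
        slept = throttle.wait()
        # A real client would issue its HTTP request here.
        print(f"request {attempt}: waited {slept:.2f}s")
```

Calling `wait()` before every request keeps even a tight polling loop from hitting the server more often than the chosen interval allows.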
Additionally, while 4chan mostly takes a laissez-faire approach to offensive content, it has strict rules. Most of 4chan's codebase is concerned with moderation and administrator functionality. Trolls and other obnoxious users may be effectively synonymous with 4chan—it's part of why there's very little money to be made there--but dealing with the worst remains a monstrous task.
Lastly, these changes were largely made by new, incoming volunteers. Traditionally, the volunteers working on the code haven't had much experience as software developers. In the early days, the developers were just cutting their teeth on a large site; the "hackers", likewise, were script kiddies wreaking havoc with automated tools. 4chan must be attracting better developers now.
4chan's movements suggest that it's planning more active and organized development. They've made large changes to the site and are closing down extensions. Note: moot says they're not "closing down" extensions. Its established user base is turning over more rapidly, or perhaps it's simply maturing. It's bringing in new developers, it's using version control, and it's publicly releasing its source on GitHub. It's opened up with a JSON API so third party apps and projects can be made, even if the available data are limited in scope. With a user base as large as it has (22 million unique visitors making 1.3 billion pageviews in June this year), these changes should lead the site in interesting new directions.