DAVID KRAVETS | Wired
Federal prosecutors added nine new felony counts against well-known coder and activist Aaron Swartz, who was charged last year with allegedly breaching hacking laws by downloading millions of academic articles from a subscription database via an open connection at MIT.
Swartz, the 25-year-old executive director of Demand Progress, has a history of downloading massive data sets, both to use in research and to release public domain documents from behind paywalls. He surrendered in July 2011, remains free on bond and faces dozens of years in prison and a $1 million fine if convicted.
Like last year’s original grand jury indictment on four felony counts, the superseding indictment unveiled Thursday accuses Swartz of evading MIT’s attempts to kick his laptop off the network while downloading millions of documents from JSTOR, a not-for-profit company that provides searchable, digitized copies of academic journals that are normally inaccessible to the public.
The scraping, carried out with a program named keepgrabbing.py, took place from September 2010 to January 2011 via MIT’s network and was invasive enough to bring down JSTOR’s servers on several occasions, according to the indictment.
Disclosure: Swartz was part of a small team that sold Reddit to Condé Nast, Wired’s parent company, and has done coding work for Wired.
In essence, many of the charges stem from Swartz allegedly breaching the terms of service agreement for those using the research service.
“JSTOR authorizes users to download a limited number of journal articles at a time,” according to the latest indictment. “Before being given access to JSTOR’s digital archive, each user must agree and acknowledge that they cannot download or export content from JSTOR’s computer servers with automated programs such as web robots, spiders, and scrapers. JSTOR also uses computerized measures to prevent users from downloading an unauthorized number of articles using automated techniques.”
MIT authorizes guests to use the service, which was the case with Swartz, who at the time was a fellow at Harvard’s Safra Center for Ethics.
The case tests the reach of the Computer Fraud and Abuse Act, which was passed in 1984 to enhance the government’s ability to prosecute hackers who accessed computers to steal information or to disrupt or destroy computer functionality.
The government, however, has interpreted the anti-hacking provisions to include activities such as violating a website’s terms of service or a company’s computer usage policy, a position a federal appeals court in April said means “millions of unsuspecting individuals would find that they are engaging in criminal conduct.” The 9th U.S. Circuit Court of Appeals, in limiting the reach of the CFAA, said that violations of employee contract agreements and websites’ terms of service were better left to civil lawsuits.
The rulings by the 9th Circuit cover the West, and not Massachusetts, meaning they are not binding in Swartz’s prosecution. The Obama administration has declined to appeal the ruling to the Supreme Court.
The indictment accuses Swartz of repeatedly spoofing the MAC address — an identifier that is usually static — of his computer after MIT blocked his computer based on that number. The grand jury indictment also notes that Swartz didn’t provide a real e-mail address when registering on the network. Swartz also allegedly snuck an Acer laptop bought just for the downloading into a closet at MIT in order to get a persistent connection to the network.
Swartz allegedly hid his face from surveillance cameras by holding his bike helmet up to his face and looking through the ventilation holes when going in to swap out an external drive used to store the documents. Swartz also allegedly named his guest account “Gary Host,” with the nickname “Ghost.”
Most of the nine new charges specify exact dates of the breaches, which include unauthorized computer access, computer fraud and unlawfully obtaining information. Generally, the original four-count indictment listed those allegations as single counts.
The 13th count in the superseding indictment, recklessly damaging a protected computer, is virtually the same as Count 4, the final count in the original indictment.
“The pace and volume of his automated requests impaired computers JSTOR used to provide service to researchers and research institutions and caused JSTOR to cut off legitimate MIT researchers for days at a time,” the amended indictment said.
Swartz’s attorney, Martin G. Weinberg, said his client would plead not guilty “to the series of restructured allegations” and intends to make “legal and factual defenses” to the allegations.
Swartz’s history of bulk downloads includes a study that combed through thousands of law review articles looking for law professors who had been paid by industry patrons to write papers. That study was published in 2008 in the Stanford Law Review.
Swartz is no stranger to the feds being interested in his skills at prodigious downloads.
In 2008, the federal court system decided to test free public access to PACER, its court record search system, at 17 libraries across the country. Swartz went to the 7th U.S. Circuit Court of Appeals library in Chicago and installed a small Perl script he had written. The code cycled sequentially through case numbers, requesting a new document from PACER every three seconds. In this manner, Swartz got nearly 20 million pages of court documents, which his script uploaded to Amazon’s EC2 cloud computing service.
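The pattern described here, stepping through sequential case numbers with a fixed pause between requests, can be sketched roughly as follows. This is not Swartz’s Perl script, which is not reproduced in this article. The names `crawl_sequential` and `fake_fetch` and the sample range are hypothetical, and the example makes no network requests.

```python
import time
from typing import Callable, Iterable, List


def crawl_sequential(
    case_numbers: Iterable[int],
    fetch: Callable[[int], str],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch() once per case number, pausing between requests."""
    results = []
    for i, number in enumerate(case_numbers):
        if i > 0:
            sleep(delay_seconds)  # fixed pause between consecutive requests
        results.append(fetch(number))
    return results


def fake_fetch(case_number: int) -> str:
    """Hypothetical stand-in for a document request; returns a placeholder."""
    return f"document-{case_number}"


if __name__ == "__main__":
    # Zero delay so the demonstration finishes immediately.
    print(crawl_sequential(range(1, 4), fake_fetch, delay_seconds=0.0))
```

Passing the `sleep` function as a parameter is only a convenience for testing the pacing without actually waiting.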
While the documents are in the public record and free to share, PACER normally charges 10 cents per page.
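For a sense of scale, the figures reported here imply a bill on the order of $2 million at PACER’s normal rate. The short calculation below treats “nearly 20 million pages” as a round 20 million, which is an assumption, so the result is an upper-end estimate. The function name is hypothetical.

```python
def pacer_cost_dollars(pages: int, cents_per_page: int = 10) -> int:
    """Return the per-page charge for a page count, in whole dollars."""
    # Integer arithmetic in cents avoids floating-point rounding.
    return pages * cents_per_page // 100


if __name__ == "__main__":
    # Assumed round figure for "nearly 20 million pages".
    print(f"${pacer_cost_dollars(20_000_000):,}")
```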