CIS 551 / TCOM 401

April 13, 2006


Lecture 23. Polymorphic Viruses and Evasion Techniques, Web Security

Scribed by Taehyun Kim



▪ Plan for today

-        To wrap up the discussion of viruses and worms

-        Web security



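▪ Earlybird

-        Looks at traffic on a large scale in the Internet.

-        Looks for common, invariant content in packets that occurs with high frequency and with large dispersion in source and destination addresses. This is the typical pattern that most worms exhibit.

-        To avoid being detected by Earlybird, attackers can violate one of its three assumptions: that worm content is invariant, that worms scan randomly (so their addresses are widely dispersed), and that worms spread quickly (so their content appears with high frequency).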
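A minimal Python sketch of the content-prevalence plus address-dispersion idea. It is not Earlybird's actual implementation: the substring length and both thresholds are hypothetical, and a real system would fingerprint substrings (Earlybird uses Rabin fingerprints) and sample them rather than store every one.

```python
from collections import defaultdict

# Hypothetical parameters, not Earlybird's real settings.
SUBSTRING_LEN = 8          # bytes per content substring
PREVALENCE_THRESHOLD = 3   # packets that must contain the substring
DISPERSION_THRESHOLD = 3   # distinct sources and distinct destinations required


def find_suspicious(packets):
    """packets: iterable of (src, dst, payload_bytes).
    Returns the substrings that are both prevalent and widely dispersed."""
    count = defaultdict(int)
    sources = defaultdict(set)
    dests = defaultdict(set)
    for src, dst, payload in packets:
        # Count each distinct substring at most once per packet.
        subs = {payload[i:i + SUBSTRING_LEN]
                for i in range(len(payload) - SUBSTRING_LEN + 1)}
        for sub in subs:
            count[sub] += 1
            sources[sub].add(src)
            dests[sub].add(dst)
    return {sub for sub, n in count.items()
            if n >= PREVALENCE_THRESHOLD
            and len(sources[sub]) >= DISPERSION_THRESHOLD
            and len(dests[sub]) >= DISPERSION_THRESHOLD}
```

A request that is repeated many times between the same two hosts is prevalent but not dispersed, so it is not flagged; bytes shared by packets flowing between many different hosts are.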



<Breaking 1st assumption>

▪ Polymorphic viruses/worms

-        Make it hard to find invariant content.

-        Mutate themselves as they replicate, so that offspring are less similar to their parents.

-        Looking for frequently appearing substrings is not going to catch them.

-        The mutation has to be done globally: any part of the worm that stays invariant can still be used as a signature.

-        They also have to be able to generate lots of random permutations (otherwise, they will still be detected by Earlybird).


▪ Strategy

-        Spam: use different subject lines. This is very simple.

-        Encryption: as the virus replicates, it generates random keys and encrypts its contents. The virus decrypts the main body of its code using a random key and jumps to the code it decrypted. When the virus creates a new instance of itself, it generates a fresh random key, encrypts most of the virus with this key, and then copies that encrypted version together with the bootstrapping part of the virus, which has the key embedded in it. The virus needs a bootstrapping part that accesses the key and decrypts the rest, but that decryption code is going to be invariant across instances of the virus, so it can still serve as a signature.

-        Project 1 buffer overflow: we had a lot of latitude over which assembly instructions we put at the beginning. NOPs are the easiest choice, but with just a little more work we can make the code look different for every instance.

- There are lots of ways to generate instructions that have the effect of a NOP.

- Reordering independent instructions.

- Using different register names.

- Using equivalent instruction sequences.

-        Suppose the worm is somehow able to infect a program that communicates over a secure (encrypted) channel. The worm will then be encrypted automatically as part of the program's normal behavior, so content-based filters will not be able to see it.
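A toy Python sketch of the encryption strategy, assuming a one-byte XOR "cipher" and a decryptor written as pseudo-assembly strings (both are illustrative assumptions, not real virus code). Each offspring gets a fresh random key, so its encrypted body differs; the decryptor would be identical in every instance unless it is also mutated, here by sprinkling in NOP-equivalent instructions as in the Project 1 discussion.

```python
import secrets

# Harmless stand-in bytes for the "main body" that gets encrypted.
BODY = b"main body of the program (stand-in bytes)"

# Illustrative decryptor stub as pseudo-assembly text (an assumption, not real
# machine code): loop over the body XORing each byte with KEY, then jump to it.
DECRYPTOR = ["mov ecx, LEN", "mov esi, BODY", "top: xor byte [esi], KEY",
             "inc esi", "dec ecx", "jnz top", "jmp BODY"]

# Pseudo-instructions chosen because they leave registers and flags unchanged.
NOP_EQUIVALENTS = ["nop", "xchg eax, eax", "mov ebx, ebx", "lea edx, [edx]"]


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the data."""
    return bytes(b ^ key for b in data)


def new_instance(body: bytes):
    """Return (decryptor_stub, key, encrypted_body) for one offspring."""
    key = secrets.randbelow(255) + 1          # fresh random key in 1..255
    encrypted = xor_bytes(body, key)
    # Insert 0-2 random NOP equivalents before each decryptor instruction,
    # so the stub also differs between instances.
    stub = []
    for instr in DECRYPTOR:
        stub.extend(secrets.choice(NOP_EQUIVALENTS)
                    for _ in range(secrets.randbelow(3)))
        stub.append(instr)
    return stub, key, encrypted
```

Stripping the NOP equivalents out of any stub gives back the same decryptor, which is why detectors try to normalize code before matching.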



<Breaking 2nd assumption>

▪ Worms don't have to scan randomly

-        You need some way of getting access to a list of good targets. Suppose you want a virus or worm that targets a particular kind of web server. The best way to spread your worm is to directly contact a whole bunch of web servers that are running a particular version of fetch. Google can supply such a list (e.g. pages that say "powered by php" or that advertise the fetch version).

-        You can make a meta-server worm, which uses the facilities of the Internet itself to figure out where other vulnerable machines might be.

-        You can also make use of topological information. If you infect a machine on a local network and have full control over that machine, you can listen to the network protocols to find out the names of other machines nearby. You don't have to generate your attack addresses randomly; you can just do a topological search of the space, looking for particular vulnerable software. Email viruses need to know valid addresses, and it would be pretty hard to guess them randomly. Instead, they can walk through address books.
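A small Python simulation contrasting random scanning with a topological walk. The address-space size, the set of vulnerable hosts, and the contact lists (standing in for address books or machine names overheard on the LAN) are all hypothetical.

```python
import random
from collections import deque


def random_scan(space_size, vulnerable, budget, seed=0):
    """Probe `budget` random addresses; return how many distinct vulnerable hosts were hit."""
    rng = random.Random(seed)
    found = set()
    for _ in range(budget):
        addr = rng.randrange(space_size)
        if addr in vulnerable:
            found.add(addr)
    return len(found)


def topological_spread(start, contacts, vulnerable, budget):
    """Breadth-first walk over contact lists; return how many new hosts were infected."""
    infected = {start}
    queue = deque([start])
    probes = 0
    while queue and probes < budget:
        host = queue.popleft()
        for peer in contacts.get(host, []):
            if probes >= budget:
                break
            probes += 1                  # every probe goes to a known machine
            if peer in vulnerable and peer not in infected:
                infected.add(peer)
                queue.append(peer)
    return len(infected) - 1


# Hypothetical setup: 21 vulnerable hosts spread across a 2**16 address space,
# each of which "knows" the next one (e.g. from an address book).
SPACE = 2 ** 16
HOSTS = list(range(0, SPACE, 3276))
VULNERABLE = set(HOSTS)
CONTACTS = {a: [b] for a, b in zip(HOSTS, HOSTS[1:])}
```

With 21 vulnerable hosts among 65536 addresses, each random probe succeeds with probability about 21/65536, so 20 random probes almost certainly find nothing, while walking the contact chain infects a new host with every probe.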



<Breaking 3rd assumption>

▪ Propagate slowly

-        This is somewhat counterintuitive. Most analyses focus on how quickly worms take over some fraction of the vulnerable machines. However, a worm might be more destructive in the long run if it takes a very slow approach to infecting machines, being careful not to set off any intrusion detection system that looks for unusual frequency behavior. By slowly and subtly changing a bunch of small things over time, it can corrupt a lot of data before anything is noticed.
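A sketch of the kind of frequency-based detector a slow worm tries to stay under. The sliding-window rule and both parameters are assumptions chosen for illustration, not any particular IDS.

```python
from collections import defaultdict, deque


def flag_fast_scanners(events, window, max_targets):
    """events: (time_sec, src, dst) tuples sorted by time.
    Flag any source that contacts more than `max_targets` distinct
    destinations within a sliding window of `window` seconds."""
    recent = defaultdict(deque)              # src -> recent (time, dst) pairs
    flagged = set()
    for t, src, dst in events:
        q = recent[src]
        q.append((t, dst))
        while q and q[0][0] <= t - window:   # drop contacts older than the window
            q.popleft()
        if len({d for _, d in q}) > max_targets:
            flagged.add(src)
    return flagged
```

A worm that contacts one new machine per minute eventually reaches as many targets as one that contacts ten per second; it simply never crosses the per-window threshold.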



▪ Witty Worm

-        It was trying to spread quickly.

-        It used a single-packet UDP attack.

-        Its spread was limited only by bandwidth: because the entire attack fit in one UDP packet, it did not need to wait for round trips across the Internet (as a TCP-based worm must, to complete a handshake before sending the exploit).

-        The payload was designed to slowly and randomly corrupt disk blocks over time.

-        The flaw had been announced the previous day. This means that some hacker was able to figure out the appropriate buffer overflow, construct the single-packet attack, and generate this worm in one day.

-        The UCSD network telescope picked it up.
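Back-of-the-envelope Python for why a single-packet UDP worm is bandwidth-limited while a TCP worm is latency-limited. The uplink speed, packet size, connection count, and round-trip time are hypothetical, not measurements of Witty, and the TCP bound is simplified (unanswered probes actually wait for a timeout, which is longer than one round trip).

```python
def udp_probe_rate(uplink_bits_per_sec, packet_bytes):
    """Single-packet UDP worm: each probe is one packet sent without waiting,
    so the rate is bounded by upstream bandwidth."""
    return uplink_bits_per_sec / (packet_bytes * 8)


def tcp_probe_rate(parallel_connections, rtt_sec):
    """TCP worm (simplified): each probe waits at least one round trip for the
    handshake, so k connections in flight give at most k / RTT probes per second."""
    return parallel_connections / rtt_sec


# Hypothetical numbers, not measurements of Witty.
print(udp_probe_rate(1_000_000, 1000))  # 125.0 probes/s on a 1 Mbit/s uplink
print(tcp_probe_rate(10, 0.5))          # 20.0 probes/s with 10 connections, 0.5 s RTT
```

Doubling the uplink doubles the UDP worm's rate, while the TCP worm only speeds up by holding more connections open at once.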



▪ Web Security

Q : What are the things you worry about when using web applications?

A : Class answers

-        Links can lie in multiple ways: they may not take you where you think they do.

-        More malicious users (authentication is a problem)

-        Cookies can reveal private information and raise security questions.

-        Spyware / Malware: mobile code problem (Java applets and so on)

-        Eavesdropping / keylogger

-        Knowing what's going on: configuration management (not so much a web security thing).

-        Embedded code / scripts / flash / ActiveX / executable contents

-        Authorization, etc. : access control

-        Profile stealing

-        Trusting remote sites with your confidential information



▪ Open Web Application Security Project (OWASP)

: looked at a large number of web applications to see what the biggest problems are.

-        Unvalidated input, buffer overflows, injection flaws, cross-site scripting, etc.

-        A lot of web developers (typically web designers) have no actual experience thinking about security.

-        Improper error handling: revealing too much information in the data you send back.

-        Insecure storage
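Injection flaws are one consequence of unvalidated input. A minimal Python sketch using the standard `sqlite3` module; the table and its contents are made up. The unsafe lookup pastes user input into the SQL text, and the parameterized lookup passes it as data.

```python
import sqlite3


def make_db():
    """In-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "a-secret"), ("bob", "b-secret")])
    return conn


def lookup_unsafe(conn, name):
    # Unvalidated input pasted directly into the SQL text: an injection flaw.
    return conn.execute(
        f"SELECT secret FROM users WHERE name = '{name}'").fetchall()


def lookup_safe(conn, name):
    # Parameterized query: the driver treats `name` as data, never as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe query becomes `... WHERE name = 'x' OR '1'='1'` and returns every user's secret.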




▪ HTTP

: the protocol on which the web is built.

-        Stateless: the server doesn't keep any connection state about which clients have contacted it in the past. This makes it hard to do things like having a session at Amazon that keeps track of what you have in your shopping cart. However, stateless protocols are much more scalable, so this is a trade-off.



▪ HTTP request headers

-        Connection information: modern versions of HTTP allow you to reuse the same TCP connection for multiple requests and responses with the server. HTTP itself is still stateless, but the server keeps the TCP connection open for the client. (In the original version, if a client downloads a web page containing 20 pictures, every picture requires another TCP handshake.)

-        Cookie information: whenever you contact a server, your web browser automatically uploads the cookies that the server previously left on your machine.

-        Client information: what kind of web browser and operating system are running. You can just snoop on some HTTP traffic and find out that people on a certain machine are running a certain version of Internet Explorer.

-        Preferences for the data the client gets back (e.g. the content types and languages it accepts).
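A Python sketch that assembles the text of an HTTP/1.1 GET request carrying the headers listed above, plus a parser showing what an eavesdropper on unencrypted traffic can read. The host, cookie, and User-Agent values are hypothetical.

```python
def build_request(host, path, cookies, user_agent, accept):
    """Assemble the raw text of an HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: keep-alive",       # ask to reuse this TCP connection
        f"User-Agent: {user_agent}",    # client information
        f"Accept: {accept}",            # preferences for the data returned
    ]
    if cookies:                         # cookies the server left earlier
        lines.append("Cookie: " + "; ".join(f"{k}={v}" for k, v in cookies.items()))
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_headers(raw):
    """What a snooper sees: split a captured request into its header fields."""
    head = raw.split("\r\n\r\n", 1)[0]
    fields = {}
    for line in head.split("\r\n")[1:]:     # skip the request line
        name, _, value = line.partition(": ")
        fields[name] = value
    return fields
```

Everything here, including the cookie, travels in the clear unless the connection is encrypted.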



▪ First issue of security

: URLs (which are used to identify content, servers, and clients)

-        A lot of the server security issues that come up on the web arise because URLs are complicated.

-        You can embed a lot of content in the URL itself. Because HTTP is stateless, one trick is to store state in the URL, which is why you often see really complicated strings of parameters when you go to a web page.

-        In terms of security, there are a lot of possibilities for misusing URLs. The most common is to change (tamper with) the URLs.

-        You can also make URLs unreadable: use IP addresses instead of domain names, or encode characters as Unicode escapes instead of plain ASCII.
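A Python sketch of two obfuscation tricks and why security checks should run on the decoded (canonical) form. The bare-integer host is a legacy notation many clients have accepted; `naive_blocks` and `canonical_blocks` are hypothetical filters, not taken from any real server.

```python
import ipaddress
from urllib.parse import unquote, urlsplit


def host_of(url):
    """Return the host a URL really points at, decoding a bare-integer IPv4 form."""
    host = urlsplit(url).hostname
    if host.isdigit():
        return str(ipaddress.IPv4Address(int(host)))
    return host


def naive_blocks(path):
    # Hypothetical filter that only inspects the raw text of the path.
    return "../" in path


def canonical_blocks(path):
    # Decode percent-escapes first, then check.
    return "../" in unquote(path)
```

For example, `http://3232235777/login` points at 192.168.1.1, and `%2e%2e%2f` slips past the naive filter even though it decodes to `../`.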



▪ Second Issue

: Stateless

-        Embed state in the URL.

-        Hidden input fields: the server can add these to its web pages, and the client will post that information back to the server. The server can save information across requests by giving you back a web page that has hidden input fields: it sets a hidden text box with some string it wants to remember, and the next time you submit the form, the browser posts that string back to the server. So the server can remember state across the client's actions by hiding information in hidden input fields. You can detect this easily just by looking at the source code of the web page you get back.

-        Cookies: these store data on the client machine.
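A minimal Python sketch of hidden-field state. Because the field is plain text in the page, the client can read and edit it. The HMAC check is a common mitigation added here for illustration (it was not part of the lecture); `SERVER_KEY`, the form, and the field names are hypothetical.

```python
import hashlib
import hmac
import html
import re

SERVER_KEY = b"hypothetical-server-secret"   # assumption: known only to the server


def sign(value: str) -> str:
    return hmac.new(SERVER_KEY, value.encode(), hashlib.sha256).hexdigest()


def render_form(cart: str) -> str:
    """Server side: remember `cart` in hidden fields of the page it returns."""
    return ('<form method="post" action="/checkout">'
            f'<input type="hidden" name="cart" value="{html.escape(cart)}">'
            f'<input type="hidden" name="mac" value="{sign(cart)}">'
            '<input type="submit"></form>')


def read_hidden(page: str) -> dict:
    """Client side: anyone can read (and edit) hidden fields in the page source."""
    pattern = r'<input type="hidden" name="(\w+)" value="([^"]*)">'
    return {name: html.unescape(value) for name, value in re.findall(pattern, page)}


def accept_post(fields: dict) -> bool:
    """Server side: accept the posted state only if its MAC still matches."""
    return hmac.compare_digest(fields["mac"], sign(fields["cart"]))
```

Changing `price=20` to `price=1` in the posted fields is trivial for the client, but without the server's key it cannot produce a matching MAC.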



▪ Cookies

-        Whenever the server wants to store something on the client, it sends back a reply with a Set-Cookie line in the header (name of the cookie = some value). These strings are remembered by the web browser.

-        The server can specify a path and a domain. Cookies can be associated with particular domain names and particular paths within a domain; these restrict when the cookies will be sent back to the server.

-        Setting a flag called secure tells the client to upload the cookie only over a secure (HTTPS) connection.

-        Whenever the client requests a URL from the server, the browser looks through its cookie cache to see whether any of its cookies match the URL's domain name and path. If so, it adds those cookies to the request message.

-        A new cookie with the same name (and domain and path) overwrites the old one. This can be exploited by cross-site scripting attacks.

-        Clients aren't required to purge cookies as soon as they expire.

-        Cookies are at most 4 KB long, to prevent malicious servers from overloading clients with too much cookie data.

-        HTTP proxy servers shouldn't cache Set-Cookie headers, because the cached state could be replayed to other clients.
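A simplified Python model of the browser-side rules above: overwriting by (name, domain, path), domain and path matching, and the secure flag. The matching rules are a simplification (real browsers also handle expiry, ordering, and more), and all names are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Cookie:
    name: str
    value: str
    domain: str
    path: str = "/"
    secure: bool = False


def set_cookie(jar, cookie):
    """A new cookie with the same name, domain, and path overwrites the old one."""
    key = (cookie.name, cookie.domain, cookie.path)
    jar[:] = [c for c in jar if (c.name, c.domain, c.path) != key]
    jar.append(cookie)


def domain_matches(host, domain):
    domain = domain.lstrip(".")
    return host == domain or host.endswith("." + domain)


def path_matches(req_path, cookie_path):
    if req_path == cookie_path:
        return True
    # "/cart" matches "/cart/items" but not "/cartoon".
    return req_path.startswith(cookie_path) and (
        cookie_path.endswith("/") or req_path[len(cookie_path)] == "/")


def cookie_header(jar, url):
    """The Cookie: header value a browser would attach to a request for `url`."""
    parts = urlsplit(url)
    host, path = parts.hostname, parts.path or "/"
    sent = [c for c in jar
            if domain_matches(host, c.domain)
            and path_matches(path, c.path)
            and (not c.secure or parts.scheme == "https")]
    return "; ".join(f"{c.name}={c.value}" for c in sent)
```

A secure cookie scoped to `/cart` is attached to `https://www.example.com/cart/items` but not to the same URL over plain HTTP, and never to another domain.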



▪ Cross-site scripting

-        Web browsers don't just display HTML; they also run code. On the client side, you can have embedded scripts that get executed by the browser locally. Servers also typically run executable content, because we want dynamically generated web pages with new information.

-        CGI: allows the server to run arbitrary code written in any programming language (typically C or Perl).

-        PHP: like a preprocessor for HTML
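A minimal sketch of the CGI model in Python: the web server passes request data to the program in environment variables such as `QUERY_STRING`, and the program writes headers, a blank line, and the HTML body to standard output. The `name` parameter is hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build what a CGI program writes to stdout: headers, a blank line, then the body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    # Escape user input before putting it in HTML (see cross-site scripting below).
    body = f"<html><body>Hello, {html.escape(name)}!</body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # Under a real web server, the request arrives via environment variables.
    sys.stdout.write(cgi_response(os.environ))
```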



How cross-site scripting works

-        Suppose you try to access foo.html, which doesn't exist. The server will give you back a web page with a "not found" error message. The URL contains foo.html, and foo.html somehow has to show up inside this error page as text. That means that the server, when it generates the error message, copies part of the URL into the HTML. Instead of foo.html, an attacker can put some JavaScript into the URL and trick a victim into following that link. If the server just copies this string into the HTML page, the JavaScript embedded in the error message will run on the victim's machine as if it came from the server's own site. However, this sort of attack is becoming well-known, so some modern servers won't just naively copy the URL.
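A Python sketch of the vulnerable error page described above and an escaped version, assuming the server has already percent-decoded the requested path. `html.escape` is from the standard library; the page layout is made up.

```python
import html


def not_found_page_unsafe(path):
    # Copies the requested path straight into the HTML: reflected XSS.
    return f"<html><body><h1>Not Found</h1><p>{path} was not found.</p></body></html>"


def not_found_page_safe(path):
    # Escapes <, >, &, and quotes so the path is displayed as text, not run as script.
    return ("<html><body><h1>Not Found</h1>"
            f"<p>{html.escape(path)} was not found.</p></body></html>")
```

With a path like `/<script>alert(document.cookie)</script>`, the unsafe page contains a live script that can read the site's cookies, while the safe page shows the text `&lt;script&gt;...` literally.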